Partial Word DFAs⋆

E. Balkanski¹, F. Blanchet-Sadri², M. Kilgore³, and B. J. Wyatt²

¹ Department of Mathematical Sciences, Carnegie Mellon University, Wean Hall 6113, Pittsburgh, PA 15213, USA, [email protected]
² Department of Computer Science, University of North Carolina, P.O. Box 26170, Greensboro, NC 27402–6170, USA, [email protected], [email protected]
³ Department of Mathematics, Lehigh University, Christmas-Saucon Hall, 14 East Packer Avenue, Bethlehem, PA 18015, USA, [email protected]

Abstract. Recently, Dassow et al. connected partial words and regular languages. Partial words are sequences in which some positions may be undefined, represented with a “hole” symbol ⋄. If we restrict what the symbol ⋄ can represent, we can use partial words to compress the representation of regular languages. Doing so allows the creation of so-called ⋄-DFAs, which recognize the compressed language and may be smaller than the DFAs recognizing the original language L. However, the ⋄-DFAs may still be larger than the NFAs recognizing L. In this paper, we investigate a question of Dassow et al. as to how these sizes are related.

1

Introduction

The study of regular languages dates back to McCulloch and Pitts' investigation of neuron nets (1943) and has developed extensively since (for a survey see, e.g., [7]). Regular languages can be represented by deterministic finite automata (DFAs), by non-deterministic finite automata (NFAs), and by regular expressions. They have found a number of important applications such as compiler design. There are well-known algorithms to convert a given NFA to an equivalent DFA and to minimize a given DFA, i.e., find an equivalent DFA with as few states as possible (see, e.g., [6]). It turns out that there are languages whose minimal DFAs have 2^n states while their equivalent minimal NFAs have only n states. Recently, Dassow et al. [4] connected regular languages and partial words. Partial words first appeared in 1974 and are also known under the name of strings with don't cares [5]. In 1999, Berstel and Boasson [2] initiated their combinatorics under the name of partial words. Since then, many combinatorial properties and algorithms have been developed (see, e.g., [3]). One of Dassow et al.'s motivations was to compress DFAs into smaller machines, called ⋄-DFAs.

This material is based upon work supported by the National Science Foundation under Grant No. DMS–1060775.


More precisely, let Σ be a finite alphabet of letters. A (full) word over Σ is a sequence of letters from Σ. We denote by Σ* the set of all words over Σ, the free monoid generated by Σ under the concatenation of words, where the empty word ε serves as the identity. A language L over Σ is a subset of Σ*. It is regular if it is recognized by a DFA or an NFA. A DFA is a 5-tuple M = (Q, Σ, δ, q0, F), where Q is a set of states, δ : Q × Σ → Q is the transition function, q0 ∈ Q is the start state, and F ⊆ Q is the set of final or accepting states. In an NFA, δ maps Q × Σ to 2^Q. We call |Q| the state complexity of the automaton. Many languages are classified by this property.

Setting Σ⋄ = Σ ∪ {⋄}, where ⋄ ∉ Σ represents undefined positions or holes, a partial word over Σ is a sequence of symbols from Σ⋄. Denoting the set of all partial words over Σ by Σ⋄*, a partial language L′ over Σ is a subset of Σ⋄*. It is regular if it is regular when considered over Σ⋄. In other words, we define languages of partial words, or partial languages, by treating ⋄ as a letter. They can be transformed to languages by using ⋄-substitutions over Σ. A ⋄-substitution σ : Σ⋄* → 2^{Σ*} satisfies σ(a) = {a} for all a ∈ Σ, σ(⋄) ⊆ Σ, and σ(uv) = σ(u)σ(v) for u, v ∈ Σ⋄*. As a result, σ is fully defined by σ(⋄); e.g., if σ(⋄) = {a, b} and L′ = {⋄b, ⋄c} then σ(L′) = {ab, bb, ac, bc}. If we consider this process in reverse, we can “compress” languages into partial languages.

We consider the following question from Dassow et al. [4]: Are there regular languages L ⊆ Σ*, L′ ⊆ Σ⋄* and a ⋄-substitution σ with σ(L′) = L such that the minimal state complexity of a DFA accepting L′, that is, the minimal state complexity of a ⋄-DFA accepting L, denoted by min⋄-DFA(L), is (strictly) less than the minimal state complexity of a DFA accepting L, denoted by minDFA(L)? Reference [4, Theorem 4] states that for every regular language L, we have minDFA(L) ≥ min⋄-DFA(L) ≥ minNFA(L), where minNFA(L) denotes the minimal state complexity of an NFA accepting L, and that there exist regular languages L such that minDFA(L) > min⋄-DFA(L) > minNFA(L). On the other hand, [4, Theorem 5] states that if n ≥ 3 is an integer, regular languages L and L′ exist such that min⋄-DFA(L) ≤ n + 1, minDFA(L) = 2^n − 2^{n−2}, minNFA(L′) ≤ 2n + 1, and min⋄-DFA(L′) ≥ 2^n − 2^{n−2}. This was the first step towards analyzing the sets:

Dn = {m | there exists L such that min⋄-DFA(L) = n and minDFA(L) = m},
Nn = {m | there exists L such that min⋄-DFA(L) = n and minNFA(L) = m}.

Our paper, whose focus is the analysis of Dn and Nn, is organized as follows. We obtain in Section 2 values belonging to Dn by looking at specific types of regular languages, followed by values belonging to Nn in Section 3. Due to the nature of NFAs, generating a sequence of minimal NFAs from a ⋄-DFA is difficult. However, in the case minDFA(L) > min⋄-DFA(L) = minNFA(L), we show how to use concatenation of languages to create an L′ with systematic differences between min⋄-DFA(L′) and minNFA(L′). We also develop a way of applying integer partitions to obtain such values. We conclude with some remarks in Section 4.
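As a concrete illustration of the ⋄-substitution just defined, here is a small Python sketch; it is ours and not part of the paper, the function name is made up, and '*' stands for the hole symbol ⋄.

from itertools import product

def apply_substitution(partial_words, sigma_hole):
    # Each hole independently ranges over sigma_hole; defined letters are kept.
    full_words = set()
    for w in partial_words:
        options = [sigma_hole if c == '*' else [c] for c in w]
        for choice in product(*options):
            full_words.add(''.join(choice))
    return full_words

# The example from the text: sigma(⋄) = {a, b} and L' = {⋄b, ⋄c}
print(sorted(apply_substitution({'*b', '*c'}, ['a', 'b'])))   # ['ab', 'ac', 'bb', 'bc']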


2


Constructs for Dn

This section provides some values for Dn by analyzing several classes of regular languages. In the description of the transition function of our DFAs and ⋄-DFAs, all the transitions lead to the error state (a sink non-final state) unless otherwise stated. Also, in our figures, the error state and the transitions leading to it have been removed for clarity.

We will often refer to the following algorithm. Given a ⋄-DFA M′ = (Q′, Σ⋄, δ′, q′0, F′) and a ⋄-substitution σ, Algorithm 1 gives a minimal DFA that accepts σ(L(M′)):
– Build an NFA N = (Q′, Σ, δ, q′0, F′) that accepts σ(L(M′)), where δ(q, a) = {δ′(q, a)} if a ∈ Σ \ σ(⋄) and δ(q, a) = {δ′(q, a), δ′(q, ⋄)} if a ∈ σ(⋄).
– Convert N to an equivalent minimal DFA.

First, we look at languages of words of equal length. We give three constructs. The first two both use an alphabet of variable size, while our third one restricts this to a constant k. We prove the second construct, which is illustrated in Fig. 1.

Theorem 1. For n ≥ 1, 2^⌈(n−1)/3⌉ + 2⌊(n−1)/3⌋ + (n − 1) mod 3 ∈ Dn.

Theorem 2. For n ≥ 1, if x = ⌊(√(1 + 8(n − 1)) − 1)/2⌋, then 2^x + n − 1 − x(x + 1)/2 ∈ Dn for languages of words of equal length.

Proof. We start by writing n as n = r + Σ_{i=1}^{x} i such that 1 ≤ r ≤ x + 1 (from the On-Line Encyclopedia of Integer Sequences, x is as stated). Let M = (Q, Σ, δ, q0, F) be the DFA defined as follows:
– Q = {(i, j) | 0 ≤ i < x, 0 ≤ j < 2^i, (i, j) ≠ (x − 1, 0)} ∪ {(i, 0) | x ≤ i ≤ x + r}, q0 = (0, 0), F = {(x + r − 1, 0)}, and (x + r, 0) is the error state;
– Σ = {a0, a1, c} ∪ {bi | 1 ≤ i < x};
– δ is defined as follows:
• δ((i, j), ak) = (i + 1, 2j + k) for all (i, j), (i + 1, 2j + k) ∈ Q, ak ∈ Σ, i ≠ x − 1, with the exception of δ((x − 2, 0), a0) = (x + r, 0),
• δ((x − 1, i), bj) = (x, 0) for all (x − 1, i) ∈ Q, bj ∈ Σ where the jth digit from the right in the binary representation of i is a 1,
• δ((i, 0), c) = (i + 1, 0) for x ≤ i < x + r.
Each word accepted by M can be written in the form w = u bi c^{r−1}, where u is a word of length x − 1 over {a0, a1} other than a0^{x−1}, and bi belongs to some subset of Σ unique for each u. This implies that M is minimal with 2^x + n − 1 − x(x + 1)/2 states. We can build the minimal equivalent ⋄-DFA for σ(⋄) = {a0, a1}, giving M′ = (Q′, Σ⋄, δ′, q′0, F′) with n states as follows:
– Q′ = {(i, j) | 0 ≤ i < x, 0 ≤ j ≤ i, (i, j) ≠ (x − 1, 0)} ∪ {(i, 0) | x ≤ i ≤ x + r}, q′0 = (0, 0), F′ = {(x + r − 1, 0)}, and (x + r, 0) is the error state;
– δ′ is defined as follows:

4

E. Balkanski, F. Blanchet-Sadri, M. Kilgore, and B. J. Wyatt

• δ′((i, 0), a1) = (i + 1, i + 1) for 0 ≤ i < x − 1,
• δ′((i, j), ⋄) = (i + 1, j) for all (i, j) ∈ Q′ \ {(x − 2, 0)} where i < x − 1,
• δ′((x − 1, i), b_{x−i}) = (x, 0) for 1 ≤ i < x,
• δ′((x + i, 0), c) = (x + i + 1, 0) for 0 ≤ i < r − 1.
Observe that L(M′) = {⋄^{x−i−1} a1 ⋄^{i−1} bi c^{r−1} | 1 ≤ i < x}, so σ(L(M′)) = L(M). Each accepted word consists of a unique prefix of length x − 1 paired with a unique bi ∈ Σ, and r states are needed for the suffix c^{r−1}, which implies that M′ is minimal over all ⋄-substitutions. Note that |Q′| = (Σ_{i=1}^{x} i) + r = n. ⊓⊔

Fig. 1. M (left) and M′ (right) from Theorem 2, n = 11, x = 4
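Algorithm 1 is used throughout this section, so a small illustrative sketch may help. The following Python code is ours, not the paper's: it assumes the ⋄-DFA is given as a total transition table (error state included, as in this section), writes the hole ⋄ as '*', and returns the state count of the minimal DFA by subset construction followed by Moore's partition refinement.

def algorithm1_size(alphabet, delta, start, finals, sigma_hole):
    """Number of states of a minimal DFA accepting sigma(L(M')), per Algorithm 1.

    delta maps (state, symbol) -> state and is total; sigma_hole is sigma(⋄)."""
    # Step 1: view M' as an NFA over the full alphabet.
    def nfa_step(q, a):
        succ = {delta[(q, a)]}
        if a in sigma_hole:
            succ.add(delta[(q, '*')])
        return succ

    # Step 2a: determinize by subset construction (reachable subsets only).
    start_set = frozenset([start])
    subsets, work, trans = {start_set}, [start_set], {}
    while work:
        S = work.pop()
        for a in alphabet:
            T = frozenset(p for q in S for p in nfa_step(q, a))
            trans[(S, a)] = T
            if T not in subsets:
                subsets.add(T)
                work.append(T)

    # Step 2b: count Myhill-Nerode classes by Moore's partition refinement.
    part = {S: int(any(q in finals for q in S)) for S in subsets}
    nblocks = len(set(part.values()))
    while True:
        sig = {S: (part[S],) + tuple(part[trans[(S, a)]] for a in alphabet)
               for S in subsets}
        labels = {}
        part = {S: labels.setdefault(sig[S], len(labels)) for S in subsets}
        if len(labels) == nblocks:
            return nblocks
        nblocks = len(labels)

# Toy check on a small cyclic ⋄-DFA of our own (3 states, sigma(⋄) = {a, b}):
Q, Sigma = [0, 1, 2], ['a', 'b']
d = {}
for i in Q:
    d[(i, 'a')] = d[(i, '*')] = (i + 1) % 3   # a and ⋄ both move forward
    d[(i, 'b')] = 0                           # b resets to the start state
print(algorithm1_size(Sigma, d, 0, {2}, {'a', 'b'}))   # prints 7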

Theorem 3. For k > 1 and l, r ≥ 0, let n = k(k + 2l + 3)/2 + r + 2. Then 2^{k+1} + l(2^k − 1) + r ∈ Dn, for languages of words of equal length.

Next, we look at languages of words of bounded length. The following theorem is illustrated in Fig. 2.

Theorem 4. For n ≥ 3, [n, n + (n − 2)(n − 3)/2] ⊆ Dn.

Proof. Write m = n + r + Σ_{i=l}^{n−3} i for the lowest value of l ≥ 1 such that r ≥ 0. Let M = (Q, Σ, δ, q0, F) be defined as follows:
– Σ = {a0, ar} ∪ {ai | l ≤ i ≤ n − 3};
– Q = {(i, 0) | 0 ≤ i < n} ∪ {(i, j) | aj ∈ Σ and 1 ≤ i ≤ j}, q0 = (0, 0), F = {(n − 2, 0)} ∪ {(i, i) | i ≠ 0, (i, i) ∈ Q}, and (n − 1, 0) is the error state;
– δ is defined by δ((0, 0), ai) = (1, i) for all ai ∈ Σ where i > 0, δ((i, j), a0) = (i + 1, j) for all (i, j) ∈ Q, i ≠ j, and δ((i, i), a0) = (i + 1, 0) for all (i, i) ∈ Q.
Then L(M) = {ai a0^{n−3} | ai ∈ Σ} ∪ {ai a0^{i−1} | ai ∈ Σ, i ≠ 0}. For each ai, i ≠ 0, M requires i states. These are added to the error state and the n − 1 states needed for a0^{n−2}. Thus, M is minimal with m states.
Let M′ = (Q′, Σ⋄, δ′, q′0, F′), where Q′ = {i | 0 ≤ i < n}, q′0 = 0, F′ = {n − 2}, and n − 1 is the error state; δ′ is defined by δ′(0, ⋄) = 1, δ′(0, ai) = n − 1 − i for all ai ∈ Σ, i > 0, and δ′(i, a0) = i + 1 for 1 ≤ i < n − 1. For σ(⋄) = Σ, we have σ(L(M′)) = L(M). Furthermore, M′ needs n − 1 states to accept ⋄a0^{n−3} ∈ L(M′), so M′ is minimal with n states. ⊓⊔


Fig. 2. M (top) and M′ (bottom) from Theorem 4, n = 7 and m = 15 (l = 3, r = 1)

Theorem 4 gives elements of Dn close to its lower bound. To find an upper bound, we look at a specific class of machines. Let n ≥ 2 and let

Rn = ({0, . . . , n − 1}, ({a0} ∪ {(αi)j | 2 ≤ i + 2 ≤ j ≤ n − 2})⋄, δ′, 0, {n − 2})   (1)

be the ⋄-DFA where n − 1 is the error state, and δ′ is defined by δ′(i, ⋄) = i + 1 for 0 ≤ i < n − 2 and δ′(i, (αi)j) = j for all (αi)j. Fig. 3 gives an example when n = 7. Set Ln = σ(L(Rn)), where σ is the ⋄-substitution that maps ⋄ to the whole alphabet. Note that Rn is minimal for L(Rn), since we need at least n − 1 states to accept words of length n − 2 without accepting longer strings. Furthermore, Rn is minimal for σ, as each letter (αi)j encodes a transition between a unique pair of states (i, j). This also implies that Rn is minimal for any ⋄-substitution.

The next two theorems look at the minimal DFA that accepts Ln. We refer the reader to Fig. 3 to visualize the ideas behind the proofs. In the DFA of Fig. 3, each explicitly labelled transition is for the indicated letters. From each state there is one transition that is not labelled; it represents the transition for each letter not explicitly labelled on a different transition from that state. (For example, from state 0, a3 transitions to {1, 3}, a2 transitions to {1, 2}, a4 transitions to {1, 4}, a5 transitions to {1, 5}, and all other letters a0, b3, b4, b5, c4, c5, d5 transition to {1}.)

The idea behind the proof of Theorem 6 is that we start with this DFA. We introduce a new letter, e, into the alphabet and add a new state, {2, 3, 4, 5}, along with a transition from {1, 3} to {2, 3, 4, 5} for e. We want to alter the ⋄-DFA to accommodate this, so we add a transition for e from 1 to 3 and from 3 to 5 (represented by dashed edges); all other states transition to the error state for e. Now consider the string a3e. We get four strings that correspond to some partial word that produces a3e after substitution: a3e, a3⋄, ⋄e, and ⋄⋄. When the ⋄-DFA reads the first, it halts in state 5; on the second, it halts in 4; on the third, it halts in 3; and on the fourth, it halts in 2, which matches the added state {2, 3, 4, 5}. Finally, we need to consider the effect of adding e and the described transitions to the ⋄-DFA: does it change the corresponding minimal DFA in other ways? To show that it does not, all transitions with dashed edges in the DFA represent the transitions for e. For example, from state {2, 3}, an e transitions to {3, 4, 5}.


Fig. 3. ⋄-DFA R7 (top, if the dashed edges are seen as solid) and minimal DFA for σ(L7) (bottom, if the dotted element is ignored and the dashed edges are seen as solid), where α0 = a, α1 = b, α2 = c, α3 = d and σ(⋄) = {a0, a2, a3, a4, a5, b3, b4, b5, c4, c5, d5}.

Theorem 5. Let Fib be the Fibonacci sequence defined by Fib(1) = Fib(2) = 1 and, for n ≥ 2, Fib(n + 1) = Fib(n) + Fib(n − 1). Then for n ≥ 1, Fib(n + 1) ∈ Dn.

Proof. For n ≥ 2, applying Algorithm 1, convert M′ = Rn to a minimal DFA M = (Q, Σ, δ, q0, F) that accepts Ln, where Q ⊆ 2^{0,...,n−1}. For each state {i} ∈ Q, 0 ≤ i ≤ n − 2, M requires additional states to represent each possible subset of one or more states of {i + 1, . . . , n − 2} that M′ could reach in i transitions. Thus M is minimal with number of states

1 + Σ_{i=0}^{n−2} Σ_{j=0}^{min{i, n−2−i}} C(n − 2 − i, j) = Fib(n + 1),

where the 1 refers to the error state, C denotes the binomial coefficient, and the inside sum refers to the number of states with minimal element i. ⊓⊔
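As a quick numerical check of the count in this proof, the following short Python snippet (ours, purely illustrative) compares the double sum with Fib(n + 1) for small n.

from math import comb

def fib(k):                       # Fib(1) = Fib(2) = 1
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a

def state_count(n):               # 1 + sum over i of sum over j of C(n-2-i, j)
    return 1 + sum(comb(n - 2 - i, j)
                   for i in range(n - 1)
                   for j in range(min(i, n - 2 - i) + 1))

for n in range(2, 12):
    assert state_count(n) == fib(n + 1)   # 2, 3, 5, 8, 13, ...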

Theorem 6. For n ≥ 3, the following is the least upper bound for m ∈ Dn in the case of languages of words of bounded length:

Σ_{i=0}^{n−1} C(n − 1 − ⌈log2 i⌉, i).

Our next result restricts the alphabet size to two.

Theorem 7. For n ≥ 1, (⌊n/2⌋(⌊n/2⌋ + 1) + ⌊(n−1)/2⌋(⌊(n−1)/2⌋ + 1))/2 + 1 ∈ Dn.

Finally, we look at languages with some arbitrarily long words.


Theorem 8. For n ≥ 1, 2^n − 1 is the least upper bound for m ∈ Dn.

Proof. First, let M′ be a minimal ⋄-DFA with ⋄-substitution σ. If we convert this to a minimal DFA accepting σ(L(M′)) using Algorithm 1, the resulting DFA has at most 2^n − 1 states, one for each non-empty subset of the set of states in M′. Thus an upper bound for m ∈ Dn is 2^n − 1.

Now we show that there exists a regular language L such that min⋄-DFA(L) = n and minDFA(L) = 2^n − 1. Let M′ = (Q′, Σ⋄, δ′, q′0, F′) with Q′ = {0, . . . , n − 1}, Σ = {a, b}, q′0 = 0, F′ = {n − 1}, and δ′ defined by δ′(i, α) = i + 1 for 0 ≤ i < n − 1, α ∈ {⋄, a}; δ′(n − 1, α) = 0 for α ∈ {⋄, a}; and δ′(i, b) = 0 for 0 ≤ i < n. Then M′ is minimal, since ⋄^{n−1} ∈ L(M′) but ⋄^i ∉ L(M′) for 0 ≤ i < n − 1.

After constructing the minimal DFA M = (Q, Σ, δ, q0, F) using Algorithm 1 for σ(⋄) = {a, b}, we claim that all non-empty subsets of Q′ are states in Q. To show this, we construct a word that ends in any non-empty subset P of Q′. Let P = {p0, . . . , px} with p0 < · · · < px. We start with a^{px}. Then create the word w by replacing the a in each position px − pi − 1, 0 ≤ i < x, with b. We show that w ends in state P by first showing that for each pi ∈ P, some partial word w′ exists such that w ∈ σ(w′) and M′ halts in pi when reading w′. First, suppose pi = px. Since |w| = px, let w′ = ⋄^{px}. For w′, M′ halts in px. Now, suppose pi ≠ px. Let w′ = ⋄^{px−pi−1} b ⋄^{pi}. After reading ⋄^{px−pi−1}, M′ is in state px − pi − 1, then in state 0 for b, and then in state pi after reading ⋄^{pi}.

Now suppose a partial word w′ exists such that w ∈ σ(w′) where M′ halts in p for p ∉ P. Suppose p > px. Each state i ∈ Q′ is only reachable after i transitions and |w′| = px, so M′ cannot reach p after reading w′. Now suppose p < px. Then M′ needs to be in state 0 after reading px − p symbols to end in p, so we must have w′[px − p − 1] = b. However, w[px − p − 1] = a, a contradiction. Furthermore, no states of Q are equivalent, as each word w ends in a unique state of Q. Therefore, M has 2^n − 1 states, and 2^n − 1 ∈ Dn. ⊓⊔
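The word w built in this proof is easy to test. Below is a small Python sketch of our own (the encoding is ours): it constructs w for a given subset P and computes the subset of states of M′ reachable over all partial words w′ with w ∈ σ(w′), checking that it is exactly P.

def reached_subset(n, w):
    # Subset of {0,...,n-1} reached in the NFA of Algorithm 1 applied to M'
    # from Theorem 8, where sigma(⋄) = {a, b}.
    current = {0}
    for c in w:
        nxt = set()
        for q in current:
            nxt.add((q + 1) % n)      # ⋄ and a both move forward cyclically
            if c == 'b':
                nxt.add(0)            # b resets to state 0
        current = nxt
    return current

def build_word(P):
    # w = a^{p_x} with the letter in each position p_x - p_i - 1 replaced by b.
    ps = sorted(P)
    px = ps[-1]
    w = ['a'] * px
    for p in ps[:-1]:
        w[px - p - 1] = 'b'
    return ''.join(w)

# Example with n = 4 and P = {1, 3}: w = "aba" and the reached subset is {1, 3}.
P = {1, 3}
print(build_word(P), reached_subset(4, build_word(P)))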

To further study intervals in Dn, we look at the following class of ⋄-DFAs. For n ≥ 2 and 0 ≤ r < n, let

Rn,r{s1, . . . , sk} = ({0, . . . , n − 1}, {a0, a1, . . . , ak}⋄, δ′, 0, {n − 1})   (2)

be the ⋄-DFA where {s1, . . . , sk} is a set of tuples whose first member is a letter ai, distinct from a0, followed by one or more states in ascending order, and where δ′(q, ai) = 0 for all (q, ai) that occur in the same tuple, δ′(i, ⋄) = i + 1 for 0 ≤ i ≤ n − 2, δ′(n − 1, ⋄) = r, and δ′(q, ai) = δ′(q, ⋄) for all other (q, ai). Since Rn,r{} is minimal for any ⋄-substitution, and since ⋄ and non-⋄ transitions from any state end in the same state, Algorithm 1 converts Rn,r{} to a minimal DFA with exactly n states. The next result looks at ⋄-DFAs of the form Rn,r{(a1, 0)}.

Theorem 9. For n ≥ 2 and 0 ≤ i < n, n + (n − 1)i ∈ Dn.

Proof. Let a0 = a and a1 = b, let r = n − i − 1, let σ(⋄) = Σ = {a, b}, and let M′ = Rn,r{(b, 0)}. Using Algorithm 1, let M = (Q, Σ, δ, {0}, F) be the minimal DFA accepting σ(L(M′)).


For all words over Σ of length less than n, M must halt in some state P ∈ Q, a subset of consecutive states of {0, . . . , n − 1}. Moreover, any state P ∈ Q of consecutive states of {0, . . . , n − 1}, with minimal element p, is reached by M when reading b^q a^p for some q ≥ 0. Also, any accepting states in Q that are subsets of {0, . . . , n − 1} of size n − r or greater are equivalent, as are any non-accepting states that are subsets of size n − r or greater such that the n − r greatest values in each set are identical. This implies that M requires Σ_{j=n−i}^{n} j states for words of length less than n.

For words of length n or greater, M may halt in a state P ∈ Q that is not a subset of consecutive states of {0, . . . , n − 1}, as for some r < p < n − 1, it is possible to have r, n − 1 ∈ P but p ∉ P. This only occurs when a transition from a state P with n − 1 ∈ P occurs, in which case M moves to a state P′ containing r, corresponding to δ′(n − 1, α) for all α ∈ Σ⋄. Thus, all states can be considered subsets of consecutive values if we consider r consecutive to n − 1 or, in other words, if we allow values from n − 1 to r to “wrap” around to each other. This means that M requires Σ_{j=1}^{i−1} j states for words of length n or greater. Therefore, Σ_{j=n−i}^{n} j + Σ_{j=1}^{i−1} j = n + (n − 1)i ∈ Dn. ⊓⊔
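The closing arithmetic is easy to confirm with a quick check (ours):

assert all(sum(range(n - i, n + 1)) + sum(range(1, i)) == n + (n - 1) * i
           for n in range(2, 40) for i in range(n))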

3

Constructs for Nn

Let Σ be an alphabet, and let Σi = {ai | a ∈ Σ} for all integers i, i > 0. Let σi : Σ → Σi be such that a ↦ ai, and let #j be a symbol in no Σi, for all i and j. Given a language L over Σ, the ith product of L and the ith #-product of L are, respectively, the languages

πi(L) = Π_{j=1}^{i} σj(L),    π′i(L) = σ1(L) Π_{j=2}^{i} {#j−1} σj(L).

In general, we call any construct of this form, languages over different alphabets concatenated with # symbols, a #-concatenation. With these definitions in hand, we obtain our first bound for Nn.

Theorem 10. For n > 0, [n − ⌊(n−1)/3⌋, n] ⊆ Nn.

Proof. Let L = {aa, ba, b} be a language over Σ = {a, b}. A minimal NFA recognizing πi(L) is defined as having 2i + 1 states, q0, . . . , q2i, with accepting state q2i, starting state q0, and transition function δ defined by δ(q2j, bj+1) = {q2j+1, q2(j+1)}, δ(q2j, aj+1) = {q2j+1}, and δ(q2j+1, aj+1) = {q2(j+1)} for j < i. It is easy to see this is minimal: the number of states is equal to the maximal length of the words plus one. A minimal ⋄-DFA recognizing πi(L) is defined as having 3i + 1 states, q0, . . . , q3i−1 and qerr, with accepting states q3i−1 and q3i−2, starting state q0, and transition function δ defined as follows:


– δ(q0, b1) = q2, δ(q0, ⋄) = q1, and δ(q1, a1) = q2;
– δ(q3j−1, aj+1) = q3j, δ(q3j−1, bj+1) = q3j+1, δ(q3j, aj+1) = q3(j+1)−1, and δ(q3j+1, aj+1) = q3(j+1)−1 for 0 < j < i;
– δ(q3j+1, aj+2) = δ(q3(j+1)−1, aj+2) and δ(q3j+1, bj+2) = δ(q3(j+1)−1, bj+2) for 0 < j < i − 1.
The ⋄-substitution corresponds to Σ1 = {a1, b1} here. This is minimal. Now, fix n and take any i ≤ ⌊(n−1)/3⌋. We can write n = 3i + r + 1, for some r ≥ 0. Let {αj}0≤j≤r be a set of symbols not in the alphabet of πi(L). Minimal NFA and ⋄-DFA recognizing πi(L) ∪ {α0 · · · αr} can clearly be obtained by adding to each a series of states q′0 = q0, q′1, . . . , q′r, with q′r+1 = q2i and q′r+1 = q3i−1 respectively, and with δ(q′j, αj) = q′j+1 for 0 ≤ j ≤ r. Hence, for i ≤ ⌊(n−1)/3⌋, we can produce a ⋄-DFA of size n = 3i + r + 1 which reduces to an NFA of size 2i + r + 1 = n − i. ⊓⊔

Our general interval is based on π′i(L), where no ⋄-substitutions exist over multiple Σi's. We need the following lemma.

Lemma 1. Let L, L′ be languages recognized by minimal NFAs N = (Q, Σ, δ, q0, F) and N′ = (Q′, Σ′, δ′, q′0, F′), where Σ ∩ Σ′ = ∅. Moreover, let # ∉ Σ, Σ′. Then L″ = L{#}L′ is recognized by the minimal NFA N″ = (Q ∪ Q′, Σ ∪ Σ′ ∪ {#}, δ″, q0, F′), where δ″(q, a) = δ(q, a) if q ∈ Q and a ∈ Σ; δ″(q, a) = δ′(q, a) if q ∈ Q′ and a ∈ Σ′; δ″(q, #) = {q′0} if q ∈ F; and δ″(q, a) = ∅ otherwise. Consequently, the following hold:
1. For any L, minNFA(π′i(L)) = i · minNFA(L);
2. Let L1, . . . , Ln be languages whose minimal DFAs have no error states and whose alphabets are pairwise disjoint, and without loss of generality, let minDFA(L1) − min⋄-DFA(L1) ≥ · · · ≥ minDFA(Ln) − min⋄-DFA(Ln). Then

min⋄-DFA(L1{#1}L2{#2} · · · Ln) = 1 + min⋄-DFA(L1) + Σ_{i=2}^{n} minDFA(Li).
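Lemma 1's NFA construction is mechanical. The following Python sketch is ours (the dict-based NFA representation and field names are assumptions, and the two state sets are assumed to have been renamed apart): it builds N″ from N and N′.

def concat_with_hash(N1, N2, hash_sym='#'):
    # N1, N2: NFAs as dicts with keys 'states', 'delta', 'start', 'finals',
    # over disjoint alphabets and disjoint state sets; 'delta' maps
    # (state, letter) -> set of states, with missing keys read as the empty set.
    delta = {}
    delta.update(N1['delta'])
    delta.update(N2['delta'])
    for q in N1['finals']:
        delta[(q, hash_sym)] = {N2['start']}   # the only # transitions
    return {'states': N1['states'] | N2['states'],
            'delta': delta,
            'start': N1['start'],
            'finals': N2['finals']}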

Theorem 11. Let L be a language whose minimal DFA has no error state. Moreover, assume min⋄-DFA(L) = minNFA(L). Fix some n and j, 0 < j ≤ ⌊(n − min⋄-DFA(L) − 1)/minDFA(L)⌋. Then n − j(minDFA(L) − min⋄-DFA(L)) − 1 ∈ Nn.

Proof. Since 0 < j ≤ ⌊(n − min⋄-DFA(L) − 1)/minDFA(L)⌋, we can write n = 1 + min⋄-DFA(L) + j · minDFA(L) + r for some r. Then, by Lemma 1(2), this corresponds to n = min⋄-DFA(π′j+1(L) ∪ {w}), where w is a word corresponding to an r-length chain of states, as we used in the proof of Theorem 10. We also have minNFA(π′j+1(L) ∪ {w}) = (j + 1) · min⋄-DFA(L) + r using Lemma 1(1) and our assumption that min⋄-DFA(L) = minNFA(L). Alternatively,

minNFA(π′j+1(L) ∪ {w}) = n − j(minDFA(L) − min⋄-DFA(L)) − 1.

Our result follows. ⊓⊔


The above linear bounds can be improved, albeit with a loss of clarity in the overall construct. Consider the interval of values obtained in Theorem 4. Fix an integer x. The minimal integer y such that x ≤ y + (y − 2)(y − 3)/2 is clearly nx = ⌈(3 + √(8x − 15))/2⌉, for x ≥ 4. Associate with x and nx the corresponding DFAs and ⋄-DFAs used in the proof of Theorem 4, i.e., let Ln,m be the language in the proof with minimal ⋄-DFA size n and minimal DFA size m. If we replace each ⋄-transition in the minimal ⋄-DFA and remove the error state, we get a minimal NFA of size n − 1 accepting Ln,m (this NFA must be minimal since the maximal length of a word in Ln,m is n − 2). Noting that all deterministic automata in question have error states, we get, using Lemma 1(1), that min⋄-DFA(π′i(Lnx,x)) = nx + (i − 1)(x − 1) and minNFA(π′i(Lnx,x)) = i(nx − 1). This allows us to obtain the following linear bound.

Theorem 12. For n > nx ≥ 4, [n − (x − nx)⌊(n − nx)/(x − 1)⌋ − 1, n] ⊆ Nn.

x the lower bound n − (x − nx ) n−n − 1 if we solve for i. Anything in the upper x−1 bound can be obtained by decreasing i or replacing occurrences of Lnx ,x with Lnx ,x−j (for some j) and in turn adding additional chains of states of length r, to maintain the size of the -DFA. t u

We can obtain even lower bounds by considering the sequence of DFAs defined in Theorem 8. Recall that for any n ≥ 1, we have a minimal DFA, which we call Mn, of size 2^n − 1; the equivalent minimal ⋄-DFA, M′n, has size n. Applying Algorithm 1 to M′n, the resulting NFA of size n is also minimal. Let n0 ≥ n1 ≥ · · · ≥ nk be a sequence of integers and consider

min⋄-DFA(L(Mn0){#1}L(Mn1) · · · {#k}L(Mnk)) = 1 + n0 + Σ_{i=1}^{k} (2^{ni} − 1),   (3)

where the equality comes from Lemma 1(2). Iteratively applying Lemma 1 gives

minNFA(L(Mn0){#1}L(Mn1) · · · {#k}L(Mnk)) = Σ_{i=0}^{k} ni.   (4)

To understand the difference between (3) and (4) in greater depth, let us view (n1, . . . , nk) as an integer partition, λ, or as a Young diagram, and assign each cell a value (see, e.g., [1]). In this case, the ith column of λ has each cell valued at 2^{i−1}. Transposing about y = −x gives the diagram corresponding to the transpose of λ, λT = (m1, . . . , mn1), in which the ith row has each cell valued at 2^{i−1}. Note that m1 = k and there are, for each i, mi terms of 2^{i−1}.


Fig. 4. λ = (6, 4, 1, 1) (left) and λT = (4, 2, 2, 2, 1, 1) (right)

Fig. 4 gives an example of an integer partition and its transpose. Define Π(λT) = Σ_{i=1}^{n1} 2^{i−1} mi = Σ_{i=1}^{k} (2^{ni} − 1) and Σ(λ) = Σ_{i=1}^{k} ni. Given this, we can view the language L described in (3) and (4), i.e., L = L(Mn0){#1}L(Mn1) · · · {#k}L(Mnk), as being defined by the integer n0 and the partition of integers λ = (n1, . . . , nk) with n0 ≥ n1. This gives

min⋄-DFA(L) = 1 + n0 + Π(λT)   and   minNFA(L) = n0 + Σ(λ).

To further understand this, we must consider the following sub-problem: if Π(λ) = n, what are the possible values of Σ(λ)? To proceed here, we define the sequence pn recursively as follows: if n = 2^k − 1 for some k, then pn = k; otherwise, letting n = m + (2^k − 1) for k maximal, pn = k + pm. This serves as the minimal bound for the possible values of Σ(λ).

Theorem 13. If Π(λ) = n, then Σ(λ) ≥ pn. Consequently, for all n and k = ⌊log2(n + 1)⌋, k + pn ∈ N1+k+n.

Proof. To show that pn is obtainable, we prove that the following partition, λn, satisfies Σ(λn) = pn: if n = 2^k − 1 for some k, λn = (1^k); otherwise, letting n = m + (2^k − 1) for k maximal, λn = λ_{2^k−1} + λm. Here, the sum of two partitions is the partition obtained by adding the summands term by term; (1^k) is the k-tuple of ones. Clearly, for partitions λ and λ′, Π(λ + λ′) = Π(λ) + Π(λ′) and Σ(λ + λ′) = Σ(λ) + Σ(λ′). By construction, Π(λn) = n and Σ(λn) = pn. To see this, if n = 2^k − 1 for some k, then Π(λn) = Π((1^k)) = Π((k)T) = 2^k − 1 = n and Σ(λn) = Σ((1^k)) = Σ_{i=1}^{k} 1 = k = pn. Otherwise,

Π(λn) = Π(λ_{2^k−1}) + Π(λm) = Π((1^k)) + Π(λm) = 2^k − 1 + m = n,
Σ(λn) = Σ(λ_{2^k−1}) + Σ(λm) = Σ((1^k)) + Σ(λm) = k + pm = pn.

To show that pn, or λn, is minimal, we can proceed inductively. From the above, each pn is obtainable by a partition of size k, where k is the maximal integer with n ≥ 2^k − 1. Alternatively, k = ⌊log2(n + 1)⌋. Fixing n, we get k + pn ∈ N1+k+n. ⊓⊔
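The recursion for pn and the witness partitions λn are easy to compute. The sketch below is ours: it also brute-forces the minimum of Σ(λ) over all partitions λ with Π(λ) = n, which checks Theorem 13 for small n (the base case p(0) = 0 is our convention so that the recursion terminates when n = 2^k − 1).

def p(n):
    # p_n: if n = 2^k - 1 then k; otherwise k + p_m with n = m + (2^k - 1), k maximal.
    if n == 0:
        return 0
    k = (n + 1).bit_length() - 1      # maximal k with 2^k - 1 <= n
    return k + p(n - (2 ** k - 1))

def min_sigma(n):
    # Brute-force minimum of Sigma(lam) over partitions lam = (lam_1 >= lam_2 >= ... >= 1)
    # with Pi(lam) = sum_i 2^(i-1) * lam_i equal to n.
    best = [None]
    def rec(remaining, i, prev, total):
        if remaining == 0:
            if best[0] is None or total < best[0]:
                best[0] = total
            return
        w = 2 ** (i - 1)
        for part in range(1, min(prev, remaining // w) + 1):
            rec(remaining - w * part, i + 1, part, total + part)
    rec(n, 1, n, 0)
    return best[0]

for n in range(1, 64):
    assert min_sigma(n) == p(n)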


4


Conclusion

For languages of words of equal length, Theorem 2 gives the maximum element in Dn found so far, and Theorem 3 gives that maximum element when we restrict to a constant alphabet size. For languages of words of bounded length, Theorem 6 gives the least upper bound for elements in Dn based on minimal ⋄-DFAs of the form (1), and Theorem 7 gives the maximum element found so far when we restrict to a binary alphabet. For languages with words of arbitrary length, Theorem 8 gives the least upper bound of 2^n − 1 for elements in Dn, a bound that can be achieved over a binary alphabet. We conjecture that for n ≥ 1, [n, 2^n − 1] ⊆ Dn. This conjecture has been verified for all 1 ≤ n ≤ 7 based on all our constructs from Section 2.

In Section 3, via products, Theorem 10 gives an interval for Nn. If we replace products with #-concatenations, Theorem 12 increases the interval further. Theorem 13 does not give an interval, but an isolated point not previously achieved. With the exception of this latter result, all of our bounds are linear. Some of our constructs satisfy min⋄-DFA(L) = minNFA(L), ignoring error states. As noted earlier, this is a requirement for #-concatenations to produce meaningful bounds. Constructs without this restriction are often too large to be useful.

References

1. Andrews, G.E., Eriksson, K.: Integer Partitions. Cambridge University Press (2004)
2. Berstel, J., Boasson, L.: Partial words and a theorem of Fine and Wilf. Theoretical Computer Science 218, 135–141 (1999)
3. Blanchet-Sadri, F.: Algorithmic Combinatorics on Partial Words. Chapman & Hall/CRC Press, Boca Raton, FL (2008)
4. Dassow, J., Manea, F., Mercaş, R.: Connecting partial words and regular languages. In: Cooper, S.B., Dawar, A., Löwe, B. (eds.) CiE 2012, Computability in Europe. Lecture Notes in Computer Science, vol. 7318, pp. 151–161. Springer-Verlag, Berlin, Heidelberg (2012)
5. Fischer, M., Paterson, M.: String matching and other products. In: Karp, R. (ed.) 7th SIAM-AMS Complexity of Computation, pp. 113–125 (1974)
6. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages, and Computation, international edition (2nd ed.). Addison-Wesley (2003)
7. Yu, S.: Regular languages. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 1, chap. 2, pp. 41–110. Springer-Verlag, Berlin (1997)