Languages Accepted by Integer Weighted Finite Automata

Vesa Halava
Turku Centre for Computer Science (TUCS), Lemminkäisenkatu 14 A, FIN-20520 Turku, Finland. Email: [email protected]

Tero Harju
Department of Mathematics, University of Turku, FIN-20014 Turku, Finland. Email: [email protected]

Turku Centre for Computer Science, TUCS Technical Report No 216, November 1998. ISBN 9521203293, ISSN 1239-1891
Abstract

We study the family of languages accepted by integer weighted finite automata. In particular, the closure properties of this family are investigated.

Keywords: weighted finite automata, closure properties, trio, semi-AFL
TUCS Research Group
Theory Group: Mathematical Structures in Computer Science
1 Introduction

The integer weighted finite automata, as studied in [4], [5] and [6], are closely related to the 1-turn one-counter automata considered by Baker and Book [1], Greibach [3], and especially Ibarra [7]. In our model the counter is replaced by a weight function on the transitions, and in doing so the finite automaton becomes independent of the counter. The differences between one-counter automata and integer weighted finite automata, as well as the deterministic integer weighted automata, are considered in [4].

We shall first give the definition of weighted finite automata. Consider a (nondeterministic) finite automaton $\mathcal{A} = (Q, A, \delta, q_0)$ without final states, with the states $Q$, the alphabet $A$, the set of transitions $\delta \subseteq Q \times A \times Q$ and the initial state $q_0$. We shall redefine the transitions using a set of edges $T = \{t_1, t_2, \dots, t_m\}$ and a transition function $\sigma\colon T \to \delta$. We do this to allow transitions $t_i$ and $t_j$ with $i \neq j$ but $\sigma(t_i) = \sigma(t_j)$. In other words, there may exist many copies of one transition in this new set of edges. Clearly this new definition of the transitions does not affect the language accepted by the automaton.

Let $G$, or $(G, \cdot, \iota)$, be a group with identity $\iota$. A ($G$-)weighted finite automaton $\mathcal{A}^\gamma$ consists of a finite automaton $\mathcal{A} = (Q, A, \sigma, q_0)$ as above and a weight function $\gamma\colon \{t_1, \dots, t_m\} \to G$ on the edges. To simplify the notation, we shall write the edges in the form
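\[ t = \langle q, a, p, z \rangle \]
if $\sigma(t) = (q, a, p)$ and $\gamma(t) = z$. Similarly, we shall write the transition function as a set $\sigma \subseteq Q \times A \times Q \times \mathbb{Z}$, where
\[ \sigma = \{ \langle q, a, p, z \rangle \mid \exists t \in T : \sigma(t) = (q, a, p) \text{ and } \gamma(t) = z \}. \]
In the figures we shall denote such an edge $t$ by $q \xrightarrow{(a,z)} p$.

Note that the function $\gamma$ does not affect the computations of the finite automaton $\mathcal{A}$; it is used only in the acceptance of words. Let $\pi = t_{i_0} t_{i_1} \cdots t_{i_n}$ be a path of $\mathcal{A}$, where $\sigma(t_{i_j}) = (q_{i_j}, a_j, q_{i_{j+1}})$ for $0 \le j \le n$. Define a morphism $\|\cdot\|\colon \{t_1, \dots, t_m\}^* \to A^*$ by setting $\|t\| = a$ if $\sigma(t) = (q, a, p)$. The weight of the path $\pi$ is the element
\[ \gamma(\pi) = \gamma(t_{i_0}) \cdot \gamma(t_{i_1}) \cdots \gamma(t_{i_n}) \in G. \]
Further, we let $L(\mathcal{A}^\gamma) = \|\gamma^{-1}(\iota)\|$, that is,
\[ L(\mathcal{A}^\gamma) = \{ w \in A^* \mid w = \|\pi\|,\ \gamma(\pi) = \iota \}, \]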
be the language accepted by $\mathcal{A}^\gamma$.

A configuration of $\mathcal{A}^\gamma$ is any triple $(q, w, g) \in Q \times A^* \times G$. A configuration $(q, aw, g_1)$ is said to yield a configuration $(p, w, g_1 \cdot g_2)$, denoted by
\[ (q, aw, g_1) \models_{\mathcal{A}^\gamma} (p, w, g_1 \cdot g_2), \]
if there is an edge $t$ such that $\sigma(t) = (q, a, p)$ with $\gamma(t) = g_2$. Let $\models^*_{\mathcal{A}^\gamma}$, or simply $\models^*$ if $\mathcal{A}^\gamma$ is clear from the context, be the reflexive and transitive closure of the relation $\models_{\mathcal{A}^\gamma}$.

We shall restrict to the case where the group of the automaton is the additive group of integers, namely $(\mathbb{Z}, +, 0)$. Such automata are called integer weighted finite automata and denoted by FA(Z). Note that our definition of the integer weighted finite automata is a restricted case of the extended finite automata of Mitrana and Stiebe [8]. In the extended finite automata the underlying automaton has final states $F \subseteq Q$ and transitions reading the empty word are allowed.

Let $\mathcal{A}^\gamma$ be an FA(Z). The empty word is always included in $L(\mathcal{A}^\gamma)$, since in an integer weighted finite automaton all states, including the initial state, are final. Therefore we know that not all regular languages can be accepted by an FA(Z). On the other hand, we have the following.

Theorem 1. For each regular language $L$, there exists an FA(Z) $\mathcal{A}^\gamma$ such that $L(\mathcal{A}^\gamma) = L \cup \{\varepsilon\}$.

Proof. Let $L$ be a regular language and let $\mathcal{A} = (Q, A, \delta, q_0, F)$ be a (nondeterministic) finite automaton such that $L(\mathcal{A}) = L$, where $\delta \subseteq Q \times A \times Q$. We may assume that $\mathcal{A}$ has one initial state $q_0$ and one final state $q_f$ (or two final states $q_0$ and $q_f$, if $\varepsilon \in L$) such that there are no transitions to $q_0$ and no transitions from $q_f$. It is well known that such an $\mathcal{A}$ exists for all regular languages $L$. We shall define an FA(Z) $\mathcal{A}^\gamma = (Q, A, \sigma, q_0)$, where $\sigma$ is a bijection, and therefore we may define $T = \delta$. One such required weight function $\gamma$ is defined, for $(p, a, q) \in \delta$, by
\[
\gamma((p, a, q)) =
\begin{cases}
0 & \text{if } p = q_0 \text{ and } q = q_f, \\
1 & \text{if } p = q_0 \text{ and } q \neq q_f, \\
-1 & \text{if } p \neq q_0 \text{ and } q = q_f, \\
0 & \text{if } p \neq q_0 \text{ and } q \neq q_f.
\end{cases}
\tag{1}
\]
It is obvious that $\gamma(\pi) = 0$ only in the case where $\pi$ is an accepting path of $\mathcal{A}$. This proves our claim.

As we mentioned, not all regular languages are accepted by an FA(Z). On the other hand, it is easy to show that not all languages accepted by integer weighted finite automata are regular, since the language $L = \{a^n b^n \mid n \ge 0\}$ is accepted by the FA(Z) of Figure 1, but $L$ is not regular.

[Figure 1: An FA(Z) accepting the language $\{a^n b^n \mid n \ge 0\}$. The diagram shows the states $q_0$ and $q_1$ with the edge labels $(a, 1)$, $(b, -1)$ and $(b, -1)$.]

It was proved in [5] that the universe problem, asking whether all input words are accepted by an integer weighted finite automaton, is undecidable. Actually, it was proved there that the universe problem is undecidable for a rather restricted type of 4-state integer weighted finite automata. In this type the edges on every possible path have first positive weights, then zero weights, then negative weights and then again zero weights. Any of these parts may be trivial. Such a restricted type of integer weighted finite automaton is called unimodal.
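To make the acceptance condition concrete, the following is a minimal Python sketch of an FA(Z) simulator; it is an illustration and not part of the original report. It tracks every reachable (state, weight) pair and accepts when some path has weight 0, since all states are final. The function name `accepts`, the edge tuple layout and the edge list `FIGURE_1` are assumptions; in particular `FIGURE_1` encodes one plausible reading of Figure 1 (a loop $(a, 1)$ on $q_0$, an edge $(b, -1)$ from $q_0$ to $q_1$, and a loop $(b, -1)$ on $q_1$).

```python
from typing import Iterable, Set, Tuple

# An edge <q, a, p, z>: source state, letter, target state, integer weight.
Edge = Tuple[str, str, str, int]


def accepts(edges: Iterable[Edge], q0: str, word: str) -> bool:
    """Return True if some path from q0 reads `word` with total weight 0."""
    edge_list = list(edges)
    # All (state, weight) pairs reachable after reading the current prefix.
    configs: Set[Tuple[str, int]] = {(q0, 0)}
    for letter in word:
        configs = {
            (p, g + z)
            for (q, g) in configs
            for (src, a, p, z) in edge_list
            if src == q and a == letter
        }
    # Every state is final, so only the accumulated weight matters.
    return any(g == 0 for (_, g) in configs)


# Assumed reading of Figure 1 (hypothetical edge list).
FIGURE_1 = [
    ("q0", "a", "q0", 1),
    ("q0", "b", "q1", -1),
    ("q1", "b", "q1", -1),
]

if __name__ == "__main__":
    for w in ["", "ab", "aabb", "aab", "abab", "ba"]:
        print(repr(w), accepts(FIGURE_1, "q0", w))
```

The set of configurations stays finite for each input word because a word of length $n$ admits only finitely many paths, so no bound on the weights is needed.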
2 Closure properties

In this section we consider the closure properties of the family of FA(Z) languages, denoted by $\mathcal{L}_{FA(\mathbb{Z})}$, under various operations. We shall begin with the union.

Theorem 2. The family of FA(Z) languages is closed under union.

Proof. Let $L_1$ and $L_2$ be two FA(Z) languages accepted by the FA(Z)'s $\mathcal{A}^{\gamma_1}$, where $\mathcal{A} = (Q_1, A, \sigma_1, q_1)$, and $\mathcal{B}^{\gamma_2}$, where $\mathcal{B} = (Q_2, A, \sigma_2, q_2)$, respectively. Assume that $Q_1 \cap Q_2 = \emptyset$. We define an FA(Z) $\mathcal{C}^{\gamma}$, where
\[ \mathcal{C} = (Q_1 \cup Q_2 \cup \{q_0\}, A, \sigma, q_0) \]
and $q_0 \notin Q_1 \cup Q_2$ is the new initial state. The edges in $\mathcal{C}^{\gamma}$ are as in $\mathcal{A}^{\gamma_1}$ and $\mathcal{B}^{\gamma_2}$, except that there is an edge $\langle q_0, a, p, z \rangle$ for $p \in Q_1 \cup Q_2$ in $\mathcal{C}^{\gamma}$ if the edge $\langle q_1, a, p, z \rangle$, where $p \in Q_1$, is in $\mathcal{A}^{\gamma_1}$ or the edge $\langle q_2, a, p, z \rangle$, where $p \in Q_2$, is in $\mathcal{B}^{\gamma_2}$. Clearly $L(\mathcal{C}^{\gamma}) = L(\mathcal{A}^{\gamma_1}) \cup L(\mathcal{B}^{\gamma_2}) = L_1 \cup L_2$. This proves the claim.

Next we consider the intersection. For this, let $L_1 = a^* \{b^n c^n \mid n \ge 0\}$ and $L_2 = \{a^n b^n \mid n \ge 0\}\, c^*$. These languages can be accepted by integer weighted finite automata, but
\[ L = L_1 \cap L_2 = \{a^n b^n c^n \mid n \ge 0\} \]
cannot be, since $L$ is not even a context-free language and it was proved in [4] that $\mathcal{L}_{FA(\mathbb{Z})} \subseteq \mathcal{L}_{CF}$, where $\mathcal{L}_{CF}$ denotes the family of context-free languages.

Theorem 3. The family of FA(Z) languages is not closed under intersection.

Next we consider the concatenation operation on languages.

Theorem 4. The family of FA(Z) languages is not closed under concatenation. In fact, the square $L^2$ of a language $L \in \mathcal{L}_{FA(\mathbb{Z})}$ need not be in $\mathcal{L}_{FA(\mathbb{Z})}$.

Proof. Let $L = \{a^n b^n \mid n \ge 0\}$. We shall show that the (one-counter) language
\[ L^2 = L \cdot L = \{a^n b^n a^m b^m \mid n, m \ge 0\} \]
cannot be accepted by an FA(Z). Assume on the contrary that there is an FA(Z) $\mathcal{A}^\gamma$ with states $Q$ and weight function $\gamma$ that accepts $L^2$. Since $L^2$ is an infinite language and it contains words of length greater than $|Q|$, necessarily there is a cycle in $\mathcal{A}$. We have two cases to consider.

1) Suppose $\mathcal{A}$ has only one cycle. Assume that the weight of the cycle is $d$. To accept all words in $L^2$ with only one cycle, necessarily $d = 0$, since $L^2$ is infinite. Let $uv \in L^2$ with $u, v \in L$ and $|uv| > |Q|$. Then we have a factorization $uv = xyz$, where $y$ is read during the cycle. Since $uv = xyz$ gives a path of weight 0, we get that $x y^k z \in L(\mathcal{A}^\gamma)$ for all $k \in \mathbb{N}$. But $L^2$ does not contain any word $x y^k z$ with $k > 2$ and $y \neq \varepsilon$.

2) Suppose that $\mathcal{A}$ has more than one cycle. If there exists a cycle of weight zero in any accepting path, then we get a contradiction as in the previous case. On the other hand, the weights of all cycles cannot be of the same sign, since $L^2$ is infinite. It follows that there must be $u, v \in L$ such that $uv = xyzrs$ and, in an accepting path $\pi = \pi_x \pi_y \pi_z \pi_r \pi_s$ of $uv$, $y$ is read during the cycle $\pi_y$ and $r$ is read during the cycle $\pi_r$, and $\gamma(\pi_y)$ and $\gamma(\pi_r)$ are of different sign. But now the word
\[ x\, y^{k|\gamma(\pi_r)|+1}\, z\, r^{k|\gamma(\pi_y)|+1}\, s \in L(\mathcal{A}^\gamma) \]
for each $k > 0$ is accepted by the path
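\[ \pi' = \pi_x\, \pi_y^{k|\gamma(\pi_r)|+1}\, \pi_z\, \pi_r^{k|\gamma(\pi_y)|+1}\, \pi_s, \]
since
\[
\begin{aligned}
\gamma(\pi') &= \gamma(\pi_x) + \gamma(\pi_y)\,(k|\gamma(\pi_r)| + 1) + \gamma(\pi_z) + \gamma(\pi_r)\,(k|\gamma(\pi_y)| + 1) + \gamma(\pi_s) \\
&= \gamma(\pi_x) + \gamma(\pi_y) + \gamma(\pi_z) + \gamma(\pi_r) + \gamma(\pi_s) + k\,\bigl(\gamma(\pi_y)\,|\gamma(\pi_r)| + \gamma(\pi_r)\,|\gamma(\pi_y)|\bigr) \\
&= 0.
\end{aligned}
\]
Since these new words are not in $L^2$, we get a contradiction.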
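The cancellation in the last display can be checked numerically. The short Python sketch below is an illustration only; the function name `pumped_weight` and the sample weights are hypothetical. It computes the weight of the pumped path from the weights of its five segments and shows that it stays 0 whenever the original path has weight 0 and the two cycle weights have opposite signs.

```python
def pumped_weight(wx: int, wy: int, wz: int, wr: int, ws: int, k: int) -> int:
    """Weight of the path in which the y-cycle is repeated k*|wr| + 1 times
    and the r-cycle is repeated k*|wy| + 1 times."""
    reps_y = k * abs(wr) + 1
    reps_r = k * abs(wy) + 1
    return wx + wy * reps_y + wz + wr * reps_r + ws


# Hypothetical segment weights: the original path has weight 0 and the
# two cycles have weights of opposite sign (3 and -2).
WX, WY, WZ, WR, WS = 2, 3, -1, -2, -2

if __name__ == "__main__":
    assert WX + WY + WZ + WR + WS == 0
    for k in range(1, 6):
        print(k, pumped_weight(WX, WY, WZ, WR, WS, k))
```

The extra contribution is $k\,(3 \cdot 2 + (-2) \cdot 3) = 0$ for every $k$, which is exactly the term that vanishes in the proof.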
Note that although the family of FA(Z) languages is not closed under intersection and concatenation, it is closed under these operations with regular languages (that contain the empty word). This is stated in the next theorem.

Theorem 5. Let $R$ be a regular language with $\varepsilon \in R$ and $L$ be an FA(Z) language. Then $R \cap L$, $RL$ and $LR$ are FA(Z) languages.

Proof. Assume that $R$ is accepted by the FA $\mathcal{B} = (Q, A, \delta, q_0, F)$ and $L$ by the FA(Z) $\mathcal{A}^{\gamma_1}$, where $\mathcal{A} = (P, A, \sigma_1, p_0)$. Assume also that $Q \cap P = \emptyset$. We can transform $\mathcal{B}$ into a $\mathcal{B}^{\gamma}$ accepting $R$, where $\gamma$ is defined as in (1). We may also assume that in $\mathcal{A}^{\gamma_1}$ all edges have even weight, since we can multiply each weight by 2 and the accepted language remains the same.

For $R \cap L$, we define $\mathcal{C}^{\gamma'}$, where $\mathcal{C} = (Q \times P, A, \sigma', (q_0, p_0))$, and there is an edge $\langle (q, p), a, (r, s), z_1 + z_2 \rangle$ in $\mathcal{C}^{\gamma'}$ for all edges $\langle q, a, r, z_1 \rangle$ in $\mathcal{B}^{\gamma}$ and $\langle p, a, s, z_2 \rangle$ in $\mathcal{A}^{\gamma_1}$. By the definition in (1) of $\gamma$, the weight of a path starting from $(q_0, p_0)$ and ending in $(q, p)$ is $0 \pmod 2$ if and only if $q \in F$, which is equivalent to the fact that the word read is in $R$. It follows that $w \in A^*$ is in $R \cap L$ if and only if it is in $L(\mathcal{C}^{\gamma'})$.

For $RL$, we construct an FA(Z) $\mathcal{C}^{\gamma'}$, where $\mathcal{C} = (Q \cup P, A, \sigma', q_0)$, by connecting the two FA(Z)'s by introducing new edges
\[ \langle f, a, p, z \rangle \]
for $f \in F$ and $a \in A$, if $\langle p_0, a, p, z \rangle$ is an edge in $\mathcal{A}^{\gamma_1}$. It is then clear that $L(\mathcal{C}^{\gamma'}) = RL$.
For $LR$, the same construction can be used, but this time we connect the two FA(Z)'s in the opposite order.

Next we shall consider the star operation. Let $K$ be a language and
\[ K^* = \bigcup_{i=0}^{\infty} K^i. \]
By using again the language $L = \{a^n b^n \mid n \ge 0\}$ and the proof of Theorem 4, we can show the following.

Theorem 6. The family of FA(Z) languages is not closed under star.

Proof. Assume that there is an FA(Z) $\mathcal{A}^\gamma$ accepting $L^*$ for $L = \{a^n b^n \mid n \ge 0\}$. It follows by Theorem 5 that the language
\[ L^* \cap a^* b^* a^* b^* = L^2 \]
can be accepted by an FA(Z), since $a^* b^* a^* b^*$ is a regular language. But this is a contradiction by the proof of Theorem 4.
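Theorems 5 and 6 both rely on intersecting an FA(Z) language with a regular language. The following Python sketch illustrates a product construction of this kind; it is a sketch under stated assumptions, not the authors' exact construction. Instead of normalizing the finite automaton as required by (1), it marks membership in $F$ by parity (weight $+1$ when an edge leaves $F$, $-1$ when it enters $F$), which is the scheme used later in the proof of Theorem 11, and it doubles the weights of the FA(Z). The names `accepts`, `product_with_regular`, `FIGURE_1` and `AB_STAR` are hypothetical, and `FIGURE_1` is an assumed reading of Figure 1.

```python
def accepts(edges, q0, word):
    """FA(Z) acceptance: some path from q0 reads `word` with weight 0."""
    configs = {(q0, 0)}
    for letter in word:
        configs = {(p, g + z) for (q, g) in configs
                   for (src, a, p, z) in edges if src == q and a == letter}
    return any(g == 0 for (_, g) in configs)


def product_with_regular(nfa_edges, nfa_q0, finals, fa_edges, fa_q0):
    """Build an FA(Z) for R ∩ L from an NFA for R (with nfa_q0 final, so
    that the empty word is in R) and an FA(Z) for L."""
    assert nfa_q0 in finals, "the construction assumes the empty word is in R"
    edges = []
    for (q, a, r) in nfa_edges:
        # Parity marker: the running sum is 0 in F and 1 outside F.
        marker = (r not in finals) - (q not in finals)
        for (p, b, s, z) in fa_edges:
            if a == b:
                # Doubling z keeps the FA(Z) weight even, so the total is 0
                # only if both the marker sum and the original weight are 0.
                edges.append(((q, p), a, (r, s), marker + 2 * z))
    return edges, (nfa_q0, fa_q0)


# Assumed reading of Figure 1: an FA(Z) for {a^n b^n | n >= 0}.
FIGURE_1 = [("q0", "a", "q0", 1), ("q0", "b", "q1", -1), ("q1", "b", "q1", -1)]
# Hypothetical NFA for the regular language (ab)*.
AB_STAR = [("r0", "a", "r1"), ("r1", "b", "r0")]

PRODUCT, START = product_with_regular(AB_STAR, "r0", {"r0"}, FIGURE_1, "q0")

if __name__ == "__main__":
    for w in ["", "ab", "aabb", "abab"]:
        print(repr(w), accepts(PRODUCT, START, w))
```

In this example the intersection of $(ab)^*$ with $\{a^n b^n \mid n \ge 0\}$ is $\{\varepsilon, ab\}$, so only the first two words are accepted by the product.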
The family of FA(Z) languages is not closed under complement, since each FA(Z) language contains the empty word, and therefore the empty word is not in its complement. But let us consider the complement modulo $\varepsilon$. For any language $L \subseteq A^*$, the complement modulo $\varepsilon$ of $L$ is
\[ \overline{L}^{\varepsilon} = (A^* \setminus L) \cup \{\varepsilon\}. \]
For example, let $A = \{a, b\}$ and $L = \{a^n b^n \mid n \ge 0\}$. Now there is a partition
\[ \overline{L}^{\varepsilon} = a^* \cup bA^* \cup aA^* b a A^* \cup \{a^n b^m \mid 0 < m < n\} \cup \{a^m b^n \mid n > m > 0\}, \]
and this language can be accepted by the FA(Z) in Figure 2. In other words, to prove that the family of FA(Z) languages is not closed under complement modulo $\varepsilon$, we have to use some other language than $L$. It is also clear that this new language cannot be regular, since the family of regular languages is closed under complement.

[Figure 2: An FA(Z) accepting the complement of $\{a^n b^n \mid n \ge 0\}$ modulo $\varepsilon$.]
Let $A = \{a, b, c\}$ and $S = \{a^n b^n c^n \mid n \ge 0\}$. Now $S$ is not an FA(Z) language, since it is not a context-free language. But
\[
\begin{aligned}
\overline{S}^{\varepsilon} = {} & a^* \cup \{b, c\} A^* \cup A^* b a A^* \cup A^* c \{a, b\} A^* \\
& \cup \{a^n b^m c^k \mid n \neq k;\ k, m, n \in \mathbb{N}\} \\
& \cup \{a^n b^m c^k \mid n \neq m;\ k, m, n \in \mathbb{N}\}
\end{aligned}
\]
is an FA(Z) language, since the languages on the first row are regular and the last two languages are easily seen to be FA(Z) languages. Since $\overline{(\overline{S}^{\varepsilon})}^{\varepsilon} = S$, we may write

Theorem 7. The family of FA(Z) languages is not closed under complement modulo $\varepsilon$.
It is obvious that the family of FA(Z) languages is closed under taking the image of a nonerasing morphism $h\colon A^* \to B^*$, since each transition reading a letter $a \in A$ can be replaced by a new path which reads the image $h(a)$ and has the same weight as the original transition. Note that the morphism must be nonerasing, since otherwise we would get $\varepsilon$-transitions.

Next we consider inverse morphisms.

Lemma 8. Let $h\colon B^* \to A^*$ be a morphism and $\mathcal{A}^\gamma$, where $\mathcal{A} = (Q, A, \sigma, q_0)$, be an FA(Z). Then there exists an FA(Z) $\mathcal{B}^{\gamma'}$ such that
\[ L(\mathcal{B}^{\gamma'}) = h^{-1}(L(\mathcal{A}^\gamma)) = \{ h^{-1}(w) \mid w \in L(\mathcal{A}^\gamma) \}. \]

Proof. Let $\mathcal{B}^{\gamma'}$, where $\mathcal{B} = (Q, B, \sigma', q_0)$, be the FA(Z) whose edges, for all $q \in Q$ and $b \in B$, are defined by
\[ \langle q, b, p, z \rangle \quad \text{if} \quad (q, h(b), 0) \models^*_{\mathcal{A}^\gamma} (p, \varepsilon, z), \]
where $p \in Q$ and $z \in \mathbb{Z}$. Note that if $h(b) = \varepsilon$, then there is a loop $\langle q, b, q, 0 \rangle$ for all $q \in Q$. Now it is straightforward to show that $(q_0, v, 0) \models^*_{\mathcal{B}^{\gamma'}} (q, \varepsilon, 0)$ if and only if $(q_0, h(v), 0) \models^*_{\mathcal{A}^\gamma} (q, \varepsilon, 0)$. This proves the claim.

We have proved

Theorem 9. The family of FA(Z) languages is closed under taking images of nonerasing morphisms and under arbitrary inverse morphisms.
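The construction in the proof of Lemma 8 can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: the names `run`, `inverse_morphism`, `accepts`, `FIGURE_1` and `H` are hypothetical, and `FIGURE_1` is an assumed reading of Figure 1. For each state $q$ and letter $b$, the sketch simulates the original automaton on $h(b)$ and adds one edge per reachable (state, weight) pair. Parallel edges with equal weight are merged, which does not change the accepted language.

```python
def run(edges, q, word):
    """All (state, weight) pairs reachable from (q, 0) by reading `word`."""
    configs = {(q, 0)}
    for letter in word:
        configs = {(p, g + z) for (s, g) in configs
                   for (src, a, p, z) in edges if src == s and a == letter}
    return configs


def inverse_morphism(edges, states, h):
    """Edges <q, b, p, z> with (q, h(b), 0) |=* (p, empty word, z)."""
    new_edges = set()
    for q in states:
        for b, image in h.items():
            # If h(b) is empty, run() returns {(q, 0)}: the loop <q, b, q, 0>.
            for (p, z) in run(edges, q, image):
                new_edges.add((q, b, p, z))
    return sorted(new_edges)


def accepts(edges, q0, word):
    return any(g == 0 for (_, g) in run(edges, q0, word))


# Assumed reading of Figure 1: an FA(Z) for {a^n b^n | n >= 0}.
FIGURE_1 = [("q0", "a", "q0", 1), ("q0", "b", "q1", -1), ("q1", "b", "q1", -1)]
# Hypothetical morphism h(c) = aa, h(d) = b.
H = {"c": "aa", "d": "b"}
B_EDGES = inverse_morphism(FIGURE_1, ["q0", "q1"], H)

if __name__ == "__main__":
    print(B_EDGES)
    for w in ["", "cdd", "cd", "ccdddd"]:
        print(repr(w), accepts(B_EDGES, "q0", w))
```

With this choice of $h$, the inverse image of $\{a^n b^n \mid n \ge 0\}$ is $\{c^m d^{2m} \mid m \ge 0\}$.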
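Next we shall consider the shuffle operation, denoted by $\sqcup\!\sqcup$. For $u, v \in A^*$ and $a, b \in A$, the shuffle is defined recursively by

1. $(au \sqcup\!\sqcup bv) = a(u \sqcup\!\sqcup bv) \cup b(au \sqcup\!\sqcup v)$, and
2. $(\varepsilon \sqcup\!\sqcup u) = (u \sqcup\!\sqcup \varepsilon) = \{u\}$.

For two languages $K, L \subseteq A^*$, define the shuffle by
\[ K \sqcup\!\sqcup L = \bigcup_{u \in K,\ v \in L} (u \sqcup\!\sqcup v). \]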
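The recursive definition of the shuffle translates directly into code. The following Python sketch is an illustration only (the function names `shuffle` and `shuffle_languages` are hypothetical); it follows the two rules of the definition for words and extends the operation to finite languages by taking the union over all pairs.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shuffle(u: str, v: str) -> frozenset:
    """Shuffle of two words, following the recursive definition."""
    # Rule 2: shuffling with the empty word returns the other word.
    if not u:
        return frozenset({v})
    if not v:
        return frozenset({u})
    # Rule 1: the result starts with the first letter of u or of v.
    first_from_u = {u[0] + w for w in shuffle(u[1:], v)}
    first_from_v = {v[0] + w for w in shuffle(u, v[1:])}
    return frozenset(first_from_u | first_from_v)


def shuffle_languages(K, L) -> set:
    """Shuffle of two finite languages: union over all pairs of words."""
    result = set()
    for u in K:
        for v in L:
            result |= shuffle(u, v)
    return result


if __name__ == "__main__":
    print(sorted(shuffle("ab", "c")))
    print(sorted(shuffle_languages({"", "ab"}, {"", "cd"})))
```

The memoization via `lru_cache` is a design choice to avoid recomputing shared suffix pairs; the number of resulting words can still grow exponentially in the word lengths.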
Theorem 10. The family of FA(Z) languages is not closed under shuffle.

Proof. Let $L_1 = \{a^n b^n \mid n \ge 0\}$ and $L_2 = \{c^m d^m \mid m \ge 0\}$. We shall prove that $L_1 \sqcup\!\sqcup L_2$ is not an FA(Z) language. Assume on the contrary that $L_1 \sqcup\!\sqcup L_2$ is an FA(Z) language. Then by Theorem 5 the language
\[ (L_1 \sqcup\!\sqcup L_2) \cap (a^+ b^+ c^+ d^+ \cup \{\varepsilon\}) = L_1 L_2 \]
is also an FA(Z) language, since $a^+ b^+ c^+ d^+$ is regular. Since $L_1 L_2$ has a morphic image $L_1^2$, also $L_1^2$ is an FA(Z) language. But this contradicts the proof of Theorem 4.

Note that all these closure properties hold also for unimodal FA(Z) languages, except that in the case of the inverse image of a morphism we have to assume that the morphism is nonerasing. Closure properties for unimodal and deterministic integer weighted finite automata are studied in [4].
3 Final states

In this section we shall consider the integer weighted finite automata with final states. This is done for the purposes of further closure properties of FA(Z) languages.

An integer weighted finite automaton with final states is defined as the integer weighted finite automaton in Section 1, except that the underlying automaton has final states, i.e., the underlying automaton is $\mathcal{A} = (Q, A, \delta, q_0, F)$, where $F \subseteq Q$ is the set of final states. Let $\sigma$ be an edge set and $\gamma$ be a weight function for $\mathcal{A}$. Acceptance by $\mathcal{A}^\gamma$ is now defined by the final states; in other words, we let
\[ L(\mathcal{A}^\gamma) = \{ w \in A^* \mid (q_0, w, 0) \models^*_{\mathcal{A}^\gamma} (q, \varepsilon, 0) \text{ for some } q \in F \} \]
be the language accepted by $\mathcal{A}^\gamma$. We shall denote the family of languages accepted by integer weighted finite automata with final states by $\mathcal{L}_{fFA(\mathbb{Z})}$.
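Theorem 11. $\mathcal{L}_{fFA(\mathbb{Z})} = \mathcal{L}_{FA(\mathbb{Z})} \cup \{\, L \setminus \{\varepsilon\} \mid L \in \mathcal{L}_{FA(\mathbb{Z})} \,\}$.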
Proof. It is obvious that $\mathcal{L}_{FA(\mathbb{Z})} \subseteq \mathcal{L}_{fFA(\mathbb{Z})}$, since in an FA(Z) all states are final. We need to show that, for each language $L \in \mathcal{L}_{fFA(\mathbb{Z})}$, either $L$ or $L \cup \{\varepsilon\}$ is accepted by an FA(Z). Consider now an arbitrary integer weighted finite automaton $\mathcal{A}^\gamma$ with final states, where $\mathcal{A} = (Q, A, \sigma, q_0, F)$. We have two cases depending on whether or not $\varepsilon$ is in $L(\mathcal{A}^\gamma)$.

(i) Assume that $\varepsilon \in L(\mathcal{A}^\gamma)$. This implies that $q_0 \in F$. The only thing we shall change in this case is the weight function $\gamma$. For $\langle q, a, p, z \rangle \in \sigma$, let $\langle q, a, p, z' \rangle$ be in $\sigma'$, where
\[
z' =
\begin{cases}
2z & \text{if } q, p \in F \text{ or } q, p \notin F, \\
2z + 1 & \text{if } q \in F \text{ and } p \notin F, \\
2z - 1 & \text{if } q \notin F \text{ and } p \in F.
\end{cases}
\]
We define an FA(Z) $\mathcal{B}^{\gamma'}$, where $\mathcal{B} = (Q, A, \sigma', q_0)$. By the definition of the edges in $\mathcal{B}^{\gamma'}$, the weight of a path is $0 \pmod 2$ only when we end in a state of $F$. If we end in a state of $F$, then the weight of the path is $2w$, where $w \in \mathbb{Z}$ is the weight of the same path in $\mathcal{A}^\gamma$. This proves that $L(\mathcal{B}^{\gamma'}) = L(\mathcal{A}^\gamma)$.

(ii) Assume now that $\varepsilon \notin L(\mathcal{A}^\gamma)$. Since $q_0$ is not a final state, we must ensure that the only word accepted in the initial state is $\varepsilon$. Therefore we define a new initial state $q_0' \notin Q$. The edges from $q_0'$ are defined in the following way. Let
\[ \sigma' = \sigma \cup \{ \langle q_0', a, p, z \rangle \mid \langle q_0, a, p, z \rangle \in \sigma \}. \]
Now the automaton $\mathcal{B}^{\gamma'}$ with $\mathcal{B} = (Q \cup \{q_0'\}, A, \sigma', q_0', F \cup \{q_0'\})$ accepts the language $L(\mathcal{A}^\gamma) \cup \{\varepsilon\}$. The claim follows then by case (i).

We proved that each nonempty language in $\mathcal{L}_{fFA(\mathbb{Z})}$ can be accepted by an integer weighted finite automaton where all states are final or all but the initial state are final.

In an integer weighted finite automaton with final states the number of final states does not make any difference, since it can be proved that each language in $\mathcal{L}_{fFA(\mathbb{Z})}$ can be accepted by an integer weighted finite automaton with zero, one or two final states, depending on whether $\varepsilon$ is in the language or not. This follows since we may define a new final state and make a copy of each edge ending in a final state to this new final state. No outgoing edges are defined for this new state. Clearly all accepted nonempty words are accepted also in this new final state. To have $\varepsilon$ accepted we must also have the initial state as a final state. Naturally, the empty language can be accepted only with zero final states.
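Case (i) of the proof of Theorem 11 can be checked on a small example. The Python sketch below is an illustration under assumptions: the names `run`, `accepts_with_finals`, `accepts_all_final`, `reweight`, `languages_agree` and the example automaton `EDGES` are hypothetical. It applies the reweighting (double each weight, add 1 when an edge leaves $F$, subtract 1 when it enters $F$) and compares the two acceptance modes by brute force on all short words.

```python
from itertools import product


def run(edges, q0, word):
    """All (state, weight) pairs reachable from (q0, 0) by reading `word`."""
    configs = {(q0, 0)}
    for letter in word:
        configs = {(p, g + z) for (q, g) in configs
                   for (src, a, p, z) in edges if src == q and a == letter}
    return configs


def accepts_with_finals(edges, q0, finals, word):
    # Acceptance with final states: weight 0 and the path ends in F.
    return any(g == 0 and q in finals for (q, g) in run(edges, q0, word))


def accepts_all_final(edges, q0, word):
    # Acceptance in a plain FA(Z): weight 0, every state counts as final.
    return any(g == 0 for (_, g) in run(edges, q0, word))


def reweight(edges, finals):
    # z' = 2z, plus 1 when the edge leaves F, minus 1 when it enters F.
    return [(q, a, p, 2 * z + (p not in finals) - (q not in finals))
            for (q, a, p, z) in edges]


# Hypothetical automaton with q0 final, so the empty word is accepted.
EDGES = [("q0", "a", "q1", 1), ("q1", "a", "q0", 0),
         ("q0", "b", "q0", -1), ("q1", "b", "q1", -1)]
FINALS = {"q0"}
NEW_EDGES = reweight(EDGES, FINALS)


def languages_agree(max_len=6):
    """Compare both automata on every word over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if (accepts_with_finals(EDGES, "q0", FINALS, w)
                    != accepts_all_final(NEW_EDGES, "q0", w)):
                return False
    return True


if __name__ == "__main__":
    print(languages_agree())
```

A brute-force comparison on short words is only a sanity check, not a proof; the proof is the parity argument given above.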
It is obvious that the closure properties proved for the family $\mathcal{L}_{FA(\mathbb{Z})}$ in Section 2 also hold for the family $\mathcal{L}_{fFA(\mathbb{Z})}$, with the difference that Theorem 5 can be stated as follows:

Theorem 12. Let $R$ be a regular language and $L$ be in $\mathcal{L}_{fFA(\mathbb{Z})}$. Then $R \cap L$, $RL$ and $LR$ are in $\mathcal{L}_{fFA(\mathbb{Z})}$.

The reason for studying the integer weighted finite automata with final states is that, by Theorem 12, we get that $\mathcal{L}_{fFA(\mathbb{Z})}$ is a semi-AFL, where AFL stands for abstract family of languages. Indeed, $\mathcal{L}_{fFA(\mathbb{Z})}$ is a trio, meaning that it is closed under $\varepsilon$-free morphisms, inverse morphisms, and intersections with regular languages. Furthermore, a trio is a semi-AFL if it is closed under union. For these definitions and for properties of a semi-AFL, we refer to [2].

A family of languages is called an AFL if it is a semi-AFL and closed under concatenation and Kleene plus. Obviously $\mathcal{L}_{fFA(\mathbb{Z})}$ is not an AFL, since by Theorem 4 it is not closed under concatenation. Neither can it be closed under Kleene plus, since by Theorem 3.1.2 in [2] this would imply that $\mathcal{L}_{fFA(\mathbb{Z})}$ is an AFL. This also follows by the proof of Theorem 6.

A finite transducer with accepting states is a 6-tuple $M = (K, A, B, H, p_0, F)$, where $K$ is a finite set of states, $A$ is the input alphabet, $B$ is the output alphabet, $H$ is a finite subset of $K \times A^* \times B^* \times K$, $p_0 \in K$ is the initial state and $F \subseteq K$ is the set of final states. The elements of $H$ are called moves. A finite transducer is a finite automaton with output, i.e., a move $(p, a, b, q)$ means that when we are in the state $p$ and read $a \in A^*$ as input, we move to the state $q$ and output $b \in B^*$. As usual, we define the relation $\models_M$ on $K \times A^* \times B^*$ by $(p, xw, z) \models_M (q, w, zy)$ for each $w \in A^*$ if $(p, x, y, q)$ is in $H$. The triple $(p, w, z)$ represents the fact that $M$ is in state $p$, $w$ is the input still to be read and $z$ is the output so far. For each word $w \in A^*$, we define the set
\[ M(w) = \{ z \in B^* \mid (p_0, w, \varepsilon) \models^*_M (q, \varepsilon, z) \text{ and } q \in F \}, \]
where $\models^*_M$ denotes the reflexive and transitive closure of $\models_M$.

The mapping $M$ from $2^{A^*}$ into $2^{B^*}$ is called a rational transduction. Note that $M$ is $\varepsilon$-free if $M(w)$ is $\varepsilon$-free for all $w \neq \varepsilon$ and $M(\varepsilon)$ contains $\varepsilon$ if and only if $p_0 \in F$. In [2], Corollary 2 of Theorem 3.2.1 states that each trio is closed under $\varepsilon$-free rational transductions, and therefore

Theorem 13. The family $\mathcal{L}_{fFA(\mathbb{Z})}$ is closed under $\varepsilon$-free rational transductions, i.e., for all $L \in \mathcal{L}_{fFA(\mathbb{Z})}$ and $\varepsilon$-free finite transducers $M$,
\[ M(L) \in \mathcal{L}_{fFA(\mathbb{Z})}. \]
A finite transducer $G = (K, A, B, H, p_0, F)$ is called a generalized sequential machine (gsm for short) if $H \subseteq K \times A \times B^* \times K$. $G$ is $\varepsilon$-free if $(p, a, \varepsilon, q) \notin H$ for all $p, q \in K$ and $a \in A$. A gsm mapping is defined as the rational transduction of a gsm, and the inverse of a gsm mapping is called an inverse gsm mapping. By Corollary 3 of Theorem 3.2.2 in [2], each trio is closed under inverse gsm mappings. Therefore

Theorem 14. The family $\mathcal{L}_{fFA(\mathbb{Z})}$ is closed under $\varepsilon$-free gsm mappings and arbitrary inverse gsm mappings.

The closure under $\varepsilon$-free gsm mappings follows from Theorem 13.

Let $A$ and $B$ be alphabets. A function $\tau\colon A^* \to 2^{B^*}$ is called a substitution if

1. $\tau(\varepsilon) = \{\varepsilon\}$, and
2. $\tau(xy) = \tau(x)\tau(y)$ for all $x, y \in A^*$.

A substitution is extended to languages by defining $\tau(L) = \bigcup_{w \in L} \tau(w)$. A substitution $\tau$ is said to be $\varepsilon$-free if $\tau(a)$ is $\varepsilon$-free for all $a \in A$. By Theorem 3.3.1 in [2], we have the next theorem.

Theorem 15. Let $\tau\colon A^* \to 2^{B^*}$ be a substitution such that, for all $a \in A$, $\tau(a)$ is an $\varepsilon$-free regular language, and let $L \in \mathcal{L}_{fFA(\mathbb{Z})}$. Then $\tau(L) \in \mathcal{L}_{fFA(\mathbb{Z})}$.

Finally, we give a negative result on the closure under $\varepsilon$-free substitutions. By Proposition 3.3.3 in [2], each trio closed under $\varepsilon$-free substitutions is an AFL.
Theorem 16. The family $\mathcal{L}_{fFA(\mathbb{Z})}$ is not closed under $\varepsilon$-free substitutions.

References

[1] B. Baker and R. Book, Reversal-bounded multipushdown machines, J. Comput. System Sci. 8 (1974), 315-332.
[2] S. Ginsburg, Algebraic and automata-theoretic properties of formal languages, North-Holland Publishing Co., Amsterdam, 1975, Fundamental Studies in Computer Science, Vol. 2.
[3] S. A. Greibach, An infinite hierarchy of context-free languages, J. Assoc. Comput. Mach. 16 (1969), 91-106.
[4] V. Halava, Finite Substitutions and Integer Weighted Finite Automata, Tech. Report 197, Turku Centre for Computer Science, August 1998.
[5] V. Halava and T. Harju, Undecidability in integer weighted finite automata, Fund. Inform., to appear.
[6] V. Halava and T. Harju, Undecidability of the equivalence of finite substitutions on regular language, Tech. Report 160, Turku Centre for Computer Science, February 1998.
[7] O. H. Ibarra, Restricted one-counter machines with undecidable universe problems, Math. Systems Theory 13 (1979), 181-186.
[8] V. Mitrana and R. Stiebe, The accepting power of finite automata over groups, New Trends in Formal Language (G. Păun and A. Salomaa, eds.), Lecture Notes in Comput. Sci., vol. 1218, Springer-Verlag, 1997, pp. 39-48.
Turku Centre for Computer Science Lemminkäisenkatu 14 FIN20520 Turku Finland http://www.tucs.abo.
University of Turku Department of Mathematical Sciences
Åbo Akademi University Department of Computer Science Institute for Advanced Management Systems Research
Turku School of Economics and Business Administration Institute of Information Systems Science