International Journal of Foundations of Computer Science Vol. 15, No. 5 (2004) 687-700 © World Scientific Publishing Company
FROM REGULAR WEIGHTED EXPRESSIONS TO FINITE AUTOMATA
JEAN-MARC CHAMPARNAUD, ERIC LAUGEROTTE, FAISSAL OUARDI* and DJELLOUL ZIADI†
Int. J. Found. Comput. Sci. 2004.15:687700. Downloaded from www.worldscientific.com by 41.140.217.76 on 03/20/13. For personal use only.
L.I.F.A.R., University of Rouen, 76134 Mont-Saint-Aignan Cedex, France

Received 29 October 2003
Accepted 18 January 2004
Communicated by Sheng Yu

ABSTRACT

In this article, we generalize the concepts of position automaton and ZPC-structure to regular K-expressions. We show that the extended ZPC-structure can be built in linear time w.r.t. the size of the K-expression and that the associated position automaton can be deduced from it in quadratic time.
1. Introduction

Weighted automata are state machines used in many practical and theoretical applications, such as computer algebra, non-linear control systems, image compression, speech recognition and text processing. Regular weighted expressions make it possible to encode infinite data by means of a finite representation. In 1961, Schützenberger proved the famous equivalence between weighted automata and regular weighted expressions.

Recent papers deal with the conversion of a regular weighted expression into a weighted automaton. Caron and Flouret [4] work on a subset of the set of regular weighted expressions and build the position automaton [8, 13] in cubic time. Lombardy and Sakarovitch [12] compute a K-automaton that turns out to be the generalization of the Antimirov automaton of the Boolean case [2, 7, 16].

In this paper, we extend the notion of ZPC-structure [15, 16] to regular K-expressions. In the Boolean case, a ZPC-structure is constructed in linear time and space w.r.t. the size of the expression, and the position automaton is constructed in quadratic time [5]. We show here that these complexity bounds still hold for regular K-expressions. Our approach differs noticeably from previous works [4, 12]. First, we define the language associated with a K-expression, which provides a sound basis for our extension of the notion of position automaton. Next, we improve the time complexity of the conversion of regular K-expressions: our study yields an algorithm that is quadratic w.r.t. the size of the K-expression.

This work is part of a more general project whose aim is to design algorithms for manipulating weighted automata [1]. Our approach here is to extend classical Boolean notions to the case of multiplicities [16, 6] and to adapt Boolean algorithms to this case [11].

In Section 2, we first introduce the notion of K-expression and define the language of a K-expression. Next, we give a brief description of formal series, and we define regular K-expressions and automata with multiplicities. In Section 3, we introduce the position automaton associated with a regular K-expression. In Section 4, we generalize the ZPC-structure [15, 16] to regular K-expressions and show that, using this structure, the position automaton can be computed in quadratic time w.r.t. the size of the regular K-expression.

* Corresponding author. Email: [email protected]
† Email addresses: {Jean-Marc.Champarnaud, Eric.Laugerotte, Djelloul.Ziadi}@univ-rouen.fr

2. Preliminaries

Let $A$ be a finite alphabet and let $(\mathbb{K}, \oplus, \otimes, 0, 1)$ be a semiring (commutative or not). The star operator $\circledast$ can be partially defined: the scalar $x^{\circledast} \in \mathbb{K}$ is a solution (if one exists) of the equations $y \otimes x \oplus 1 = y$ and $x \otimes y \oplus 1 = y$ [9], [10]^a. In the Boolean semiring $\mathbb{B} = \{0, 1\}$, the star of any scalar is 1. When the semiring is the set of rational numbers equipped with the standard operations, the star $x^{\circledast}$ is equal to $\frac{1}{1-x}$ for any number $x \in \mathbb{Q}$ such that $x \neq 1$. More precisely, if $|x| < 1$, one has $x^{\circledast} = 1 + x + x^2 + \cdots$. The following definition introduces the notion of K-expression.

Definition 1. K-expressions over an alphabet $A$ are inductively defined as follows:
• $a \in A$ and $k \in \mathbb{K}$ are K-expressions,
• if $F$ and $G$ are K-expressions, then $(F + G)$, $(F \cdot G)$ and $(F^*)$ are K-expressions.

When there is no ambiguity, the K-expression $(F \cdot G)$ is denoted $(FG)$. Let $E$ be a K-expression. We denote by $A_E$ the alphabet of $E$. The linearized version $\overline{E}$ of $E$ is the K-expression deduced from $E$ by indexing every letter occurrence with its position in $E$. Subscripted letters are called positions. The size of $E$, denoted $|E|$, is the size of the syntax tree of $E$. For example, if $E = (\tfrac{1}{6} \cdot a^* + \tfrac{1}{3} \cdot b^*)^* \cdot a^*$, we get $A_E = \{a, b\}$, $\overline{E} = (\tfrac{1}{6} \cdot a_1^* + \tfrac{1}{3} \cdot b_2^*)^* \cdot a_3^*$, $A_{\overline{E}} = \{a_1, b_2, a_3\}$ and $|E| = 13$.
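To make Definition 1 concrete, here is a minimal Python sketch, assuming $\mathbb{K} = \mathbb{Q}$ (via `fractions.Fraction`) and a hypothetical syntax-tree encoding with classes `Scalar`, `Letter`, `Sum`, `Prod` and `Star` that are not part of the paper. It computes the size $|E|$ and the linearized version $\overline{E}$ for the example expression above.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional, Union


# Hypothetical syntax-tree classes for K-expressions (Definition 1), with K = Q.
@dataclass(frozen=True)
class Scalar:
    k: Fraction


@dataclass(frozen=True)
class Letter:
    a: str
    pos: int = 0  # 0 for an ordinary letter, i > 0 for the position a_i


@dataclass(frozen=True)
class Sum:
    f: "Expr"
    g: "Expr"


@dataclass(frozen=True)
class Prod:
    f: "Expr"
    g: "Expr"


@dataclass(frozen=True)
class Star:
    f: "Expr"


Expr = Union[Scalar, Letter, Sum, Prod, Star]


def size(e: Expr) -> int:
    """|E|: the number of nodes of the syntax tree of E."""
    if isinstance(e, (Scalar, Letter)):
        return 1
    if isinstance(e, Star):
        return 1 + size(e.f)
    return 1 + size(e.f) + size(e.g)


def linearize(e: Expr, counter: Optional[List[int]] = None) -> Expr:
    """Index every letter occurrence with its position, from left to right."""
    if counter is None:
        counter = [0]
    if isinstance(e, Letter):
        counter[0] += 1
        return Letter(e.a, counter[0])
    if isinstance(e, Scalar):
        return e
    if isinstance(e, Star):
        return Star(linearize(e.f, counter))
    # Sum or Prod: the left operand is processed first.
    left = linearize(e.f, counter)
    return type(e)(left, linearize(e.g, counter))


def positions(e: Expr) -> List[str]:
    """Positions of a linearized expression, e.g. ['a1', 'b2', 'a3']."""
    if isinstance(e, Letter):
        return [f"{e.a}{e.pos}"]
    if isinstance(e, Scalar):
        return []
    if isinstance(e, Star):
        return positions(e.f)
    return positions(e.f) + positions(e.g)


# The example E = (1/6 . a* + 1/3 . b*)* . a*.
a, b = Letter("a"), Letter("b")
E = Prod(
    Star(Sum(Prod(Scalar(Fraction(1, 6)), Star(a)),
             Prod(Scalar(Fraction(1, 3)), Star(b)))),
    Star(a),
)
print(size(E))                  # 13
print(positions(linearize(E)))  # ['a1', 'b2', 'a3']
```

Counting every operator node, including the explicit products $k \cdot F$, is what gives $|E| = 13$ here.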
In order to introduce the language associated with a K-expression, we define the null term of a K-expression $E$, denoted $c(E)$.

Definition 2. Let $E$ be a K-expression. The null term $c(E)$ of $E$ is inductively defined as follows:

$$c(k) = k \quad \text{for all } k \in \mathbb{K}, \tag{1}$$
$$c(a) = 0 \quad \text{for all } a \in A, \tag{2}$$
$$c(F + G) = c(F) + c(G), \tag{3}$$
$$c(FG) = c(F)\,c(G), \tag{4}$$
$$c(F^*) = c(F)^*. \tag{5}$$

^a See page 16, Exercise 2.3.
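Definition 2 is a purely syntactic rewriting: $c(E)$ is again a K-expression in which every letter is replaced by 0. The sketch below illustrates this with a hypothetical tuple encoding of K-expressions (`("k", c)`, `("a", x)`, `("+", F, G)`, `(".", F, G)`, `("*", F)`), assuming $\mathbb{K} = \mathbb{Q}$ and the example coefficients used above.

```python
from fractions import Fraction

# Hypothetical tuple encoding of K-expressions over K = Q:
#   ("k", c)  scalar c      ("a", x)  letter x
#   ("+", F, G)  sum        (".", F, G)  product      ("*", F)  star


def null_term(e):
    """c(E) as in Definition 2; the result is a K-expression, not a scalar."""
    tag = e[0]
    if tag == "k":
        return e                                          # (1)
    if tag == "a":
        return ("k", Fraction(0))                         # (2)
    if tag in ("+", "."):
        return (tag, null_term(e[1]), null_term(e[2]))    # (3), (4)
    if tag == "*":
        return ("*", null_term(e[1]))                     # (5)
    raise ValueError(f"unknown node {tag!r}")


def show(e):
    """Render an expression with the parentheses used in the text."""
    tag = e[0]
    if tag in ("k", "a"):
        return str(e[1])
    if tag == "*":
        inner = show(e[1])
        return inner + "*" if e[1][0] in ("k", "a") else f"({inner})*"
    if tag == "+":
        return f"{show(e[1])} + {show(e[2])}"
    # Product: only sums need parentheses here.
    return " ".join(f"({show(x)})" if x[0] == "+" else show(x) for x in e[1:])


E = (".",
     ("*", ("+", (".", ("k", Fraction(1, 6)), ("*", ("a", "a"))),
                 (".", ("k", Fraction(1, 3)), ("*", ("a", "b"))))),
     ("*", ("a", "a")))
print(show(E))             # (1/6 a* + 1/3 b*)* a*
print(show(null_term(E)))  # (1/6 0* + 1/3 0*)* 0*
```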
The null term of $E = (\tfrac{1}{6} a^* + \tfrac{1}{3} b^*)^* a^*$ is $c(E) = (\tfrac{1}{6} 0^* + \tfrac{1}{3} 0^*)^* 0^*$. In the following, we denote by $T_E$ the set of subexpressions of the K-expression $c(E)$. We define the mapping ev, which associates with each term $t \in T_E$ its evaluation in $\mathbb{K}$.

Definition 3. The mapping ev is recursively defined from $T_E$ to $\mathbb{K}$ as follows:

$$\mathrm{ev}(t) = t \in \mathbb{K} \quad \text{if } |t| = 1, \tag{6}$$
$$\mathrm{ev}(t_1 + t_2) = \mathrm{ev}(t_1) \oplus \mathrm{ev}(t_2), \tag{7}$$
$$\mathrm{ev}(t_1 t_2) = \mathrm{ev}(t_1) \otimes \mathrm{ev}(t_2), \tag{8}$$
$$\mathrm{ev}(t^*) = \mathrm{ev}(t)^{\circledast}. \tag{9}$$
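A minimal sketch of Definition 3, assuming $\mathbb{K} = \mathbb{Q}$, where $x^{\circledast} = \frac{1}{1-x}$ for $x \neq 1$. The tuple encoding and the helper name `star_q` are hypothetical. Evaluating the null term of the example gives $(\tfrac{1}{6} + \tfrac{1}{3})^{\circledast} = (\tfrac{1}{2})^{\circledast} = 2$.

```python
from fractions import Fraction

# Hypothetical tuple encoding: ("k", c), ("+", t1, t2), (".", t1, t2), ("*", t).
# Terms of T_E contain no letters.


def star_q(x: Fraction) -> Fraction:
    """x⊛ in (Q, +, ×, 0, 1): the solution y of y·x + 1 = y, i.e. 1/(1 - x)."""
    if x == 1:
        raise ValueError("1⊛ is not defined in Q")
    return 1 / (1 - x)


def ev(t) -> Fraction:
    """Evaluation of a term t in T_E (Definition 3)."""
    tag = t[0]
    if tag == "k":
        return t[1]                      # (6)
    if tag == "+":
        return ev(t[1]) + ev(t[2])       # (7)
    if tag == ".":
        return ev(t[1]) * ev(t[2])       # (8)
    if tag == "*":
        return star_q(ev(t[1]))          # (9)
    raise ValueError(f"ev is not defined on {t!r}")


zero_star = ("*", ("k", Fraction(0)))    # 0* evaluates to 1
cE = (".",
      ("*", ("+", (".", ("k", Fraction(1, 6)), zero_star),
                  (".", ("k", Fraction(1, 3)), zero_star))),
      zero_star)
print(ev(cE))  # 2
```

In the Boolean semiring the same code would simply return 1 for every star, since $x^{\circledast} = 1$ there.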
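The next definition introduces the concept of compact language $L(E)$, which is obtained from the classical language $\mathcal{C}(E)$ by replacing each sequence of words of the form $u\alpha v, u\alpha^2 v, \ldots, u\alpha^i v, \ldots$ (with $\alpha \in \mathbb{K}$) in $\mathcal{C}(E)$ by the word $u\alpha^* v$, and by eliminating from $\mathcal{C}(E)$ each term in $\mathbb{K}^*$. For example:

$$\mathcal{C}\big((\tfrac{1}{2})^* a^*\big) = \{1, \tfrac{1}{2}, (\tfrac{1}{2})^2, \ldots, 1a, \tfrac{1}{2}a, (\tfrac{1}{2})^2 a, \ldots, 1aa, \tfrac{1}{2}aa, (\tfrac{1}{2})^2 aa, \ldots\}.$$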
Definition 4. Let $E$ be a K-expression. The "compact" (regular) language $L(E)$ over $(A \cup T_E)^*$ associated with $E$ is inductively defined as follows:

$$L(k) = \emptyset, \tag{10}$$
$$L(a) = \{1a1\}, \tag{11}$$
$$L(F + G) = L(F) \cup L(G), \tag{12}$$
$$L(FG) = L(F)L(G) \cup \{c(F)\}L(G) \cup L(F)\{c(G)\}, \tag{13}$$
$$L(F^*) = \{c(F)^*\}\Big(\bigcup_{i \geq 1} \big(L(F)\{c(F)^*\}\big)^i\Big). \tag{14}$$
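Since $L(E)$ is usually infinite, a program can only enumerate a finite part of it. The sketch below is a hypothetical enumerator for Definition 4 that keeps the words containing at most `m` letters, assuming $\mathbb{K} = \mathbb{Q}$ and the same hypothetical tuple encoding as before. Each word is a tuple of tokens, either a term of $T_E$ (rendered as a string) or a letter. Applied to $E = (\tfrac{1}{2})^* a^*$, it recovers the first words of the compact language given in the text.

```python
from fractions import Fraction

# Hypothetical tuple encoding: ("k", c), ("a", x), ("+", F, G), (".", F, G), ("*", F).
# A word of L(E) is a tuple of tokens ("t", term) or ("l", letter).


def null_term(e):
    tag = e[0]
    if tag == "k":
        return e
    if tag == "a":
        return ("k", Fraction(0))
    if tag == "*":
        return ("*", null_term(e[1]))
    return (tag, null_term(e[1]), null_term(e[2]))


def render(t):
    """Render a term of T_E; non-integer scalars get parentheses, e.g. (1/2)*."""
    tag = t[0]
    if tag == "k":
        v = t[1]
        return str(v) if v.denominator == 1 else f"({v})"
    if tag == "*":
        inner = render(t[1])
        return inner + "*" if t[1][0] == "k" else f"({inner})*"
    if tag == "+":
        return f"{render(t[1])} + {render(t[2])}"
    return " ".join(f"({render(x)})" if x[0] == "+" else render(x) for x in t[1:])


def n_letters(w):
    return sum(1 for kind, _ in w if kind == "l")


def concat(X, Y, m):
    """Concatenation of two sets of words, keeping words with at most m letters."""
    return {x + y for x in X for y in Y if n_letters(x) + n_letters(y) <= m}


def compact(e, m):
    """Words of L(E) (Definition 4) containing at most m letters."""
    tag = e[0]
    if tag == "k":
        return set()                                            # (10)
    if tag == "a":                                              # (11)
        return {(("t", "1"), ("l", e[1]), ("t", "1"))} if m >= 1 else set()
    if tag == "+":
        return compact(e[1], m) | compact(e[2], m)              # (12)
    if tag == ".":                                              # (13)
        LF, LG = compact(e[1], m), compact(e[2], m)
        cF = {(("t", render(null_term(e[1]))),)}
        cG = {(("t", render(null_term(e[2]))),)}
        return concat(LF, LG, m) | concat(cF, LG, m) | concat(LF, cG, m)
    if tag == "*":                                              # (14)
        cF_star = {(("t", render(("*", null_term(e[1])))),)}
        step = concat(compact(e[1], m), cF_star, m)   # L(F){c(F)*}
        powers, cur = set(), {()}
        while cur:   # each word of L(F) has a letter, so this stops after <= m rounds
            cur = concat(cur, step, m)
            powers |= cur
        return concat(cF_star, powers, m)
    raise ValueError(f"unknown node {tag!r}")


def show_word(w):
    return " ".join(text for _, text in w)


E = (".", ("*", ("k", Fraction(1, 2))), ("*", ("a", "a")))
for w in sorted(compact(E, 2), key=len):
    print(show_word(w))
# (1/2)* 0* 1 a 1 0*
# (1/2)* 0* 1 a 1 0* 1 a 1 0*
```

Note that $L((\tfrac{1}{2})^*)$ is empty by (10) and (14), so only the middle term $\{c(F)\}L(G)$ of (13) contributes here.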
The compact language associated with the K-expression $E = (\tfrac{1}{2})^* a^*$ is $L(E) = \{(\tfrac{1}{2})^*\, 0^*\, 1a1\, 0^*,\ (\tfrac{1}{2})^*\, 0^*\, 1a1\, 0^*\, 1a1\, 0^*, \ldots\}$. Note that, according to the formulas of Definition 4, words in $L(E)$ have the form $\alpha_1\, 1a_11\, \alpha_2\, 1a_21 \cdots \alpha_m\, 1a_m1\, \alpha_{m+1}$, where $\alpha_i \in T_E$. In the following, we present a brief description of formal series, and we define a subset of the set of K-expressions, usually called regular K-expressions, which are associated with regular series [3].

Definition 5. A (non-commutative) formal series with coefficients in $\mathbb{K}$ and variables in $A$ is a map from the free monoid $A^*$ to $\mathbb{K}$ which associates with each word $w \in A^*$ a coefficient $(S, w) \in \mathbb{K}$. A formal series is usually written as an infinite sum: $S = \sum_{u \in A^*} (S, u)\, u$. The support of the formal series $S$ is the language $\mathrm{supp}(S) = \{u \in A^* \mid (S, u) \neq 0\}$. The set of formal series over $A$ with coefficients in $\mathbb{K}$ is denoted by $\mathbb{K}\langle\langle A \rangle\rangle$. A semiring structure is defined on $\mathbb{K}\langle\langle A \rangle\rangle$ as follows [3, 10]:
$$(S + T, u) = (S, u) \oplus (T, u),$$
$$(ST, u) = \bigoplus_{u_1 u_2 = u} (S, u_1) \otimes (T, u_2), \quad \text{with } S, T \in \mathbb{K}\langle\langle A \rangle\rangle.$$
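For polynomials (series with finite support), the two operations above can be computed exactly. The sketch below assumes $\mathbb{K} = \mathbb{Q}$ and a hypothetical representation of a series as a Python dict from words (strings, with `""` for the empty word) to `Fraction` coefficients; the function names are illustrative. It also shows that the product is not commutative, since $ab$ and $ba$ are different words.

```python
from collections import defaultdict
from fractions import Fraction

# A polynomial of Q<A> as a dict {word: coefficient}; "" stands for the empty word.


def add(S, T):
    """(S + T, u) = (S, u) + (T, u)."""
    R = defaultdict(Fraction)
    for P in (S, T):
        for u, c in P.items():
            R[u] += c
    return {u: c for u, c in R.items() if c != 0}


def mul(S, T):
    """(ST, u) = sum over all factorizations u = u1 u2 of (S, u1)(T, u2)."""
    R = defaultdict(Fraction)
    for u1, c1 in S.items():
        for u2, c2 in T.items():
            R[u1 + u2] += c1 * c2
    return {u: c for u, c in R.items() if c != 0}


def support(S):
    """supp(S) = {u | (S, u) != 0}."""
    return {u for u, c in S.items() if c != 0}


S = {"": Fraction(1, 2), "a": Fraction(1)}   # 1/2 + a
T = {"": Fraction(1), "b": Fraction(3)}      # 1 + 3b
print(mul(S, T)["ab"], "ab" in mul(T, S))    # 3 False
```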
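A polynomial is a formal series with finite support. The set of polynomials is denoted by $\mathbb{K}\langle A \rangle$; it is a sub-semiring of $\mathbb{K}\langle\langle A \rangle\rangle$. The star of a series is defined by $S^* = \sum_{n \geq 0} S^n$, with $S^0 = \varepsilon$ and $S^n = S^{n-1} S$ if $n > 0$. Notice that the star of a formal series does not always exist.

Proposition 1 [10]. The star of a formal series $S \in \mathbb{K}\langle\langle A \rangle\rangle$ is defined if and only if $(S, \varepsilon)^{\circledast}$ is defined in $\mathbb{K}$. In this case:

$$S^* = (S, \varepsilon)^{\circledast}\, \big(S_0\, (S, \varepsilon)^{\circledast}\big)^* \tag{15}$$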
where the formal series $S_0$ is defined by $(S_0, \varepsilon) = 0$ and $(S_0, u) = (S, u)$ for any word $u \neq \varepsilon$. In the next computations, we use this construction of the star of a formal series.

Definition 6. The semiring of regular series $\mathbb{K}^{\mathrm{rat}}\langle\langle A \rangle\rangle \subseteq \mathbb{K}\langle\langle A \rangle\rangle$ is the smallest subset of $\mathbb{K}\langle\langle A \rangle\rangle$ which contains the polynomial semiring $\mathbb{K}\langle A \rangle$ and which is closed under addition, product and star (when the latter is defined).

The following definition introduces the notion of regular K-expression, which allows us to represent regular series by a finite writing.

Definition 7. A regular K-expression is inductively defined as follows:
• $a \in A$ and $k \in \mathbb{K}$ are regular K-expressions, which respectively denote the regular series $S_a = a$ and $S_k = k$,
• if $F$, $G$ and $H$ (such that $\mathrm{ev}(c(H)^*)$ exists) are regular K-expressions, which respectively denote the regular series $S_F$, $S_G$ and $S_H$, then $F + G$, $FG$ and $H^*$ are regular K-expressions, which respectively denote the regular series $S_F + S_G$, $S_F S_G$ and $S_H^*$.
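Formula (15) of Proposition 1 is what makes the star computable: $S_0 (S, \varepsilon)^{\circledast}$ has no term on $\varepsilon$, so its $k$-th power only contains words of length at least $k$, and the coefficients of all words of length at most $n$ are obtained from finitely many powers. The sketch below assumes $\mathbb{K} = \mathbb{Q}$ and the hypothetical dict representation of series; `star_series` and its truncation parameter `n` are illustrative names. For $S = \tfrac{1}{2} + a$ it gives $(S^*, a^k) = 2^{k+1}$.

```python
from collections import defaultdict
from fractions import Fraction


def star_q(x):
    """x⊛ = 1/(1 - x) in Q (undefined for x = 1)."""
    if x == 1:
        raise ValueError("1⊛ is not defined in Q")
    return 1 / (1 - x)


def mul(S, T, n):
    """Cauchy product, keeping only words of length <= n."""
    R = defaultdict(Fraction)
    for u1, c1 in S.items():
        for u2, c2 in T.items():
            if len(u1) + len(u2) <= n:
                R[u1 + u2] += c1 * c2
    return dict(R)


def star_series(S, n):
    """S* via formula (15), truncated to words of length <= n."""
    s = star_q(S.get("", Fraction(0)))               # (S, ε)⊛ must exist in K
    T = {u: c * s for u, c in S.items() if u != ""}  # S0 (S, ε)⊛ has no ε-term
    total = {"": Fraction(1)}                        # T^0 = ε
    power = {"": Fraction(1)}
    for _ in range(n):                 # T^k only holds words of length >= k
        power = mul(power, T, n)
        for u, c in power.items():
            total[u] = total.get(u, Fraction(0)) + c
    return {u: s * c for u, c in total.items() if s * c != 0}


S = {"": Fraction(1, 2), "a": Fraction(1)}   # S = 1/2 + a
res = star_series(S, 3)
print(*(res[w] for w in ("", "a", "aa", "aaa")))  # 2 4 8 16
```

The code keeps the scalar $(S, \varepsilon)^{\circledast}$ on the right of $S_0$ and on the left of the result, as in (15); over $\mathbb{Q}$ this order does not matter, but it would in a non-commutative semiring.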
Proposition 2. The evaluation of the null term of a regular K-expression $E$ is the coefficient of the empty word in the corresponding regular series:

$$\mathrm{ev}(c(E)) = (S_E, \varepsilon). \tag{16}$$
This result is shown by induction on the size of the regular K-expression, using Definition 2. By Proposition 2, regular K-expressions are K-expressions for which the evaluation of the null term is well defined.

Definition 8. Let $A$ be a finite alphabet and $\mathbb{K}$ be a semiring (commutative or not). We define an automaton with multiplicities $\mathcal{A} = (Q, q_0, \delta, F, \rho)$ as follows:
• $Q$ is a finite set of states,
•