
International Journal of Foundations of Computer Science Vol. 15 No. 5 (2004) 687-700 © World Scientific Publishing Company


FROM REGULAR WEIGHTED EXPRESSIONS TO FINITE AUTOMATA

JEAN-MARC CHAMPARNAUD, ERIC LAUGEROTTE, FAISSAL OUARDI* and DJELLOUL ZIADI†

Int. J. Found. Comput. Sci. 2004.15:687-700. Downloaded from www.worldscientific.com by 41.140.217.76 on 03/20/13. For personal use only.

L.I.F.A.R., University of Rouen, 76134 Mont-Saint-Aignan Cedex, France

Received 29 October 2003
Accepted 18 January 2004
Communicated by Sheng Yu

ABSTRACT

In this article we generalize the concepts of position automaton and ZPC-structure to regular K-expressions. We show that the extended ZPC-structure can be built in linear time w.r.t. the size of the K-expression and that the associated position automaton can be deduced from it in quadratic time.

1. Introduction

Weighted automata are state machines used in many practical and theoretical applications, such as computer algebra, non-linear control systems, image compression, speech recognition and text processing. Regular weighted expressions allow us to encode infinite data via a finite representation. In 1961, Schützenberger proved the famous equivalence between weighted automata and regular weighted expressions. Recent papers deal with the conversion of a regular weighted expression into a weighted automaton. Caron and Flouret [4] work on a subset of the set of regular weighted expressions and build the position automaton [8, 13] in cubic time. Lombardy and Sakarovitch [12] compute a K-automaton that turns out to be the generalization of the Antimirov automaton of the Boolean case [2, 7, 16].

In this paper, we extend the notion of ZPC-structure [15, 16] to regular K-expressions. In the Boolean case, a ZPC-structure can be constructed in linear time and space w.r.t. the size of the expression, and the position automaton can be constructed in quadratic time [5]. We show here that these complexity bounds still hold when regular K-expressions are considered. Our approach differs significantly from previous works [4, 12]. First, we define the language associated with a K-expression, which gives soundness to our extension of the notion of position automaton. Next, we aim to improve the time complexity of the conversion of regular K-expressions: our study yields an algorithm that is quadratic w.r.t. the size of the K-expression. This work is part of a more general project whose aim is to design an algorithm for manipulating weighted automata [1]. Here our approach is to extend classical Boolean notions to the case of multiplicities [16, 6] and to adapt Boolean algorithms to this case [11].

In Section 2, we first introduce the notion of K-expression and define the language of a K-expression. Next, we give a brief description of formal series, and we define regular K-expressions and automata with multiplicities. In Section 3, we introduce the notion of the position automaton associated with a regular K-expression. In Section 4, we generalize the ZPC-structure [15, 16] to regular K-expressions. We then show that, using this structure, the position automaton can be computed in quadratic time w.r.t. the size of the regular K-expression.

*Corresponding author. E-mail: [email protected]
†E-mail addresses: {Jean-Marc.Champarnaud, Eric.Laugerotte, Djelloul.Ziadi}@univ-rouen.fr

2. Preliminaries

Let A be a finite alphabet, and let $(\mathbb{K}, \oplus, \otimes, 0, 1)$ be a semiring (commutative or not). The star operator $\circledast$ can be partially defined: the scalar $x^{\circledast} \in \mathbb{K}$ is a solution $y$ (if one exists) of the equations $y \otimes x \oplus 1 = y$ and $x \otimes y \oplus 1 = y$ [9], [10]ᵃ. In the Boolean semiring $\mathbb{B} = \{0, 1\}$, the star of any scalar is 1. But when the semiring is the set of rational numbers equipped with the standard operations, the star $x^{\circledast}$ is equal to $\frac{1}{1-x}$ for any number $x \in \mathbb{Q}$ such that $|x| \neq 1$. More precisely, if $|x| < 1$, one has $x^{\circledast} = 1 + x + x^2 + \cdots$. In the following definition, we introduce the notion of K-expression.

Definition 1. K-expressions over an alphabet A are inductively defined as follows:
- $a \in A$ and $k \in \mathbb{K}$ are K-expressions,
- if F and G are K-expressions, then $(F + G)$, $(F \cdot G)$ and $(F^*)$ are K-expressions.

When there is no ambiguity, the K-expression $(F \cdot G)$ will be denoted $(FG)$. Let E be a K-expression. We denote by $A_E$ the alphabet of E. The linearized version $\overline{E}$ of E is the K-expression deduced from E by ranking every letter occurrence with its position in E. Subscripted letters are called positions. The size of E, denoted $|E|$, is the size of the syntactical tree of E.
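For example, if $E = (k_1 \cdot a^* + k_2 \cdot b^*)^* \cdot a^*$ for two scalars $k_1, k_2 \in \mathbb{Q}$, we get $A_E = \{a, b\}$, $\overline{E} = (k_1 \cdot a_1^* + k_2 \cdot b_2^*)^* \cdot a_3^*$, $A_{\overline{E}} = \{a_1, b_2, a_3\}$ and $|E| = 13$. In order to introduce the language associated with a K-expression, we define the null term of a K-expression E, denoted $c(E)$.

Definition 2. Let E be a K-expression. The null term $c(E)$ is inductively defined as follows:

$$
\begin{aligned}
c(k) &= k \quad \text{for all } k \in \mathbb{K}, && (1)\\
c(a) &= 0 \quad \text{for all } a \in A, && (2)\\
c(F + G) &= c(F) + c(G), && (3)\\
c(FG) &= c(F)\,c(G), && (4)\\
c(F^*) &= c(F)^*. && (5)
\end{aligned}
$$

ᵃ See page 16, Exercise 2.3.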
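Definitions 1 and 2 translate directly into a recursive data structure. The Python sketch below is one possible encoding, not the authors' implementation: it assumes $\mathbb{K} = \mathbb{Q}$ (through the standard `fractions.Fraction` type), and the node classes `Scalar`, `Letter`, `Sum`, `Prod` and `Star` as well as the functions `size`, `linearize`, `positions` and `null_term` are hypothetical names. It is applied to the example above with the illustrative values $k_1 = \frac12$ and $k_2 = \frac13$.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Tuple, Union


# Syntax-tree nodes of a K-expression (Definition 1), assuming K = Q.
@dataclass(frozen=True)
class Scalar:
    k: Fraction


@dataclass(frozen=True)
class Letter:
    a: str
    pos: int = 0  # 0 before linearization, i >= 1 afterwards


@dataclass(frozen=True)
class Sum:
    f: "Expr"
    g: "Expr"


@dataclass(frozen=True)
class Prod:
    f: "Expr"
    g: "Expr"


@dataclass(frozen=True)
class Star:
    f: "Expr"


Expr = Union[Scalar, Letter, Sum, Prod, Star]


def size(e: Expr) -> int:
    """|E|: number of nodes of the syntax tree of E."""
    if isinstance(e, (Scalar, Letter)):
        return 1
    if isinstance(e, Star):
        return 1 + size(e.f)
    return 1 + size(e.f) + size(e.g)


def linearize(e: Expr) -> Expr:
    """Rank every letter occurrence with its position, left to right."""
    counter = 0

    def go(x: Expr) -> Expr:
        nonlocal counter
        if isinstance(x, Letter):
            counter += 1
            return Letter(x.a, counter)
        if isinstance(x, Scalar):
            return x
        if isinstance(x, Star):
            return Star(go(x.f))
        return type(x)(go(x.f), go(x.g))  # left operand is ranked first

    return go(e)


def positions(e: Expr) -> List[Tuple[str, int]]:
    """Letter occurrences of E in left-to-right order."""
    if isinstance(e, Letter):
        return [(e.a, e.pos)]
    if isinstance(e, Scalar):
        return []
    if isinstance(e, Star):
        return positions(e.f)
    return positions(e.f) + positions(e.g)


def null_term(e: Expr) -> Expr:
    """c(E), equations (1)-(5): letters become 0, the structure is kept."""
    if isinstance(e, Scalar):
        return e
    if isinstance(e, Letter):
        return Scalar(Fraction(0))
    if isinstance(e, Star):
        return Star(null_term(e.f))
    return type(e)(null_term(e.f), null_term(e.g))


# Example E = (k1 a* + k2 b*)* a* with hypothetical values k1 = 1/2, k2 = 1/3.
k1, k2 = Fraction(1, 2), Fraction(1, 3)
E = Prod(
    Star(Sum(Prod(Scalar(k1), Star(Letter("a"))),
             Prod(Scalar(k2), Star(Letter("b"))))),
    Star(Letter("a")),
)
print(size(E))                  # 13
print(positions(linearize(E)))  # [('a', 1), ('b', 2), ('a', 3)]
```

Since `null_term` keeps every operator and only replaces letters by 0, applying it to `E` yields the tree of $(k_1 0^* + k_2 0^*)^* 0^*$.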


The null term of $E = (k_1 a^* + k_2 b^*)^* a^*$ is $c(E) = (k_1 0^* + k_2 0^*)^* 0^*$. In the following, we denote by $T_E$ the set of subexpressions of the K-expression $c(E)$. We define the mapping ev, which associates with each term $t \in T_E$ its evaluation in $\mathbb{K}$.

Definition 3. The mapping ev is recursively defined from $T_E$ to $\mathbb{K}$ as follows:

$$
\begin{aligned}
\mathrm{ev}(t) &= t \in \mathbb{K} \quad \text{if } |t| = 1, && (6)\\
\mathrm{ev}(t_1 + t_2) &= \mathrm{ev}(t_1) \oplus \mathrm{ev}(t_2), && (7)\\
\mathrm{ev}(t_1 t_2) &= \mathrm{ev}(t_1) \otimes \mathrm{ev}(t_2), && (8)\\
\mathrm{ev}(t^*) &= \mathrm{ev}(t)^{\circledast}. && (9)
\end{aligned}
$$
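As a minimal sketch of Definition 3, assuming $\mathbb{K} = \mathbb{Q}$ with its usual addition and multiplication and the star $x^{\circledast} = \frac{1}{1-x}$, the evaluation is a plain recursion over the term. The nested-tuple encoding of terms and the names `q_star` and `ev` are hypothetical; $k_1 = \frac12$ and $k_2 = \frac13$ are illustrative values only.

```python
from fractions import Fraction


def q_star(x: Fraction) -> Fraction:
    """Star in (Q, +, *, 0, 1): the y solving y*x + 1 = y, i.e. 1/(1 - x)."""
    if x == 1:
        raise ValueError("1 has no star in Q")
    return 1 / (1 - x)


def ev(t) -> Fraction:
    """Definition 3 on terms encoded as nested tuples (hypothetical encoding):
    ("k", q) for a scalar, ("+", t1, t2), (".", t1, t2) and ("*", t1)."""
    tag = t[0]
    if tag == "k":          # (6): |t| = 1
        return t[1]
    if tag == "+":          # (7)
        return ev(t[1]) + ev(t[2])
    if tag == ".":          # (8)
        return ev(t[1]) * ev(t[2])
    if tag == "*":          # (9)
        return q_star(ev(t[1]))
    raise ValueError(f"unknown term: {t!r}")


# c(E) = (k1 0* + k2 0*)* 0* with hypothetical values k1 = 1/2, k2 = 1/3.
zero_star = ("*", ("k", Fraction(0)))
c_E = (".",
       ("*", ("+", (".", ("k", Fraction(1, 2)), zero_star),
                   (".", ("k", Fraction(1, 3)), zero_star))),
       zero_star)
print(ev(c_E))  # 6
```

Here $0^{\circledast} = 1$, so $\mathrm{ev}(c(E)) = (\frac12 + \frac13)^{\circledast} = 6$. If the two coefficients summed to 1, the star of the inner sum would be undefined in $\mathbb{Q}$ and `q_star` would raise an error.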

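The next definition introduces the concept of compact language $L(E)$, which is obtained from the classical language $C(E)$ by replacing each sequence of words of the form $u\alpha v, u\alpha^2 v, \ldots, u\alpha^i v, \ldots$ in $C(E)$ by the word $u\alpha^* v$, and by eliminating from $C(E)$ each word of $\mathbb{K}^*$ (that is, each word made of scalars only). For example:

$$
C\big((\tfrac{1}{2})^* a^*\big) = \{1, \tfrac{1}{2}, (\tfrac{1}{2})^2, \ldots, 1a, \tfrac{1}{2}a, (\tfrac{1}{2})^2 a, \ldots, 1aa, \tfrac{1}{2}aa, (\tfrac{1}{2})^2 aa, \ldots\}.
$$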

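Definition 4. Let E be a K-expression. The "compact" (regular) language $L(E)$ over $(A \cup T_E)^*$ associated with E is inductively defined as follows:

$$
\begin{aligned}
L(k) &= \emptyset, && (10)\\
L(a) &= \{1a1\}, && (11)\\
L(F + G) &= L(F) \cup L(G), && (12)\\
L(FG) &= L(F)L(G) \cup \{c(F)\}L(G) \cup L(F)\{c(G)\}, && (13)\\
L(F^*) &= \{c(F)^*\} \bigcup_{i \ge 1} \big(L(F)\{c(F)^*\}\big)^i. && (14)
\end{aligned}
$$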
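Since $L(E)$ is infinite in general, the following Python sketch only enumerates its words containing at most $m$ letters, over $\mathbb{K} = \mathbb{Q}$. It illustrates equations (10) to (14) under stated assumptions and is not the authors' construction: the tuple encoding, the token representation of words and the function names are hypothetical, and the star case assumes the form $L(F^*) = \{c(F)^*\}\bigcup_{i\ge1}(L(F)\{c(F)^*\})^i$. The bound makes the star case terminate because, by induction on (10) to (14), every word of $L(F)$ contains at least one letter.

```python
from fractions import Fraction
from typing import Set, Tuple

# Hypothetical encoding: ("k", q), ("letter", a), ("+", F, G), (".", F, G), ("*", F).
# A word of L(E) is a tuple of tokens: ("null", text) for a term of T_E,
# ("sym", a) for the factor 1a1.
Word = Tuple[Tuple[str, str], ...]


def null_term(e):
    """c(E), equations (1)-(5)."""
    if e[0] == "k":
        return e
    if e[0] == "letter":
        return ("k", Fraction(0))
    if e[0] == "*":
        return ("*", null_term(e[1]))
    return (e[0], null_term(e[1]), null_term(e[2]))


def show(t) -> str:
    """Render a null term as text, e.g. (1/2)* or 0*."""
    if t[0] == "k":
        q = t[1]
        return str(q) if q.denominator == 1 else f"({q})"
    if t[0] == "*":
        inner = show(t[1])
        return inner + "*" if t[1][0] == "k" else f"({inner})*"
    op = "+" if t[0] == "+" else ""
    return f"({show(t[1])}{op}{show(t[2])})"


def letters(w: Word) -> int:
    return sum(1 for kind, _ in w if kind == "sym")


def cat(xs: Set[Word], ys: Set[Word], m: int) -> Set[Word]:
    """Concatenation of word sets, keeping words with at most m letters."""
    return {x + y for x in xs for y in ys if letters(x + y) <= m}


def compact(e, m: int) -> Set[Word]:
    """Words of L(E) with at most m letters."""
    tag = e[0]
    if tag == "k":                                   # (10)
        return set()
    if tag == "letter":                              # (11)
        return {(("sym", e[1]),)} if m >= 1 else set()
    if tag == "+":                                   # (12)
        return compact(e[1], m) | compact(e[2], m)
    if tag == ".":                                   # (13)
        lf, lg = compact(e[1], m), compact(e[2], m)
        cf = {(("null", show(null_term(e[1]))),)}
        cg = {(("null", show(null_term(e[2]))),)}
        return cat(lf, lg, m) | cat(cf, lg, m) | cat(lf, cg, m)
    # "*": assumed form {c(F)*} U_{i>=1} (L(F){c(F)*})^i
    cstar = {(("null", show(("*", null_term(e[1])))),)}
    block = cat(compact(e[1], m), cstar, m)
    result, power = set(), cstar
    for _ in range(m):      # each block adds at least one letter
        power = cat(power, block, m)
        result |= power
    return result


def render(w: Word) -> str:
    return "".join(s if kind == "null" else f"1{s}1" for kind, s in w)


E = (".", ("*", ("k", Fraction(1, 2))), ("*", ("letter", "a")))
for word in sorted(render(w) for w in compact(E, 2)):
    print(word)
# (1/2)*0*1a10*
# (1/2)*0*1a10*1a10*
```

On $E = (\frac12)^* a^*$, these are the first two words of the compact language given in the next paragraph; the factor $L((\frac12)^*)$ is empty because $L(\frac12) = \emptyset$.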

The compact language associated with the K-expression $E = (\tfrac{1}{2})^* a^*$ is $L(E) = \{(\tfrac{1}{2})^* 0^* 1a1 0^*, (\tfrac{1}{2})^* 0^* 1a1 0^* 1a1 0^*, \ldots\}$. Note that, according to the formulas of Definition 4, words in $L(E)$ have the form $\alpha_1 1 a_1 1 \alpha_2 1 a_2 1 \cdots \alpha_m 1 a_m 1 \alpha_{m+1}$, where $\alpha_i \in T_E$.

In the following, we give a brief description of formal series and we define a subset of the set of K-expressions, usually called regular K-expressions, which are associated with regular series [3].

Definition 5. A (non-commutative) formal series with coefficients in $\mathbb{K}$ and variables in A is a map from the free monoid $A^*$ to $\mathbb{K}$ which associates with each word $w \in A^*$ a coefficient $(S, w) \in \mathbb{K}$. A formal series is usually written as an infinite sum: $S = \sum_{u \in A^*} (S, u)\,u$. The support of the formal series S is the language $\mathrm{supp}(S) = \{u \in A^* \mid (S, u) \neq 0\}$. The set of formal series over A with coefficients in $\mathbb{K}$ is denoted by $\mathbb{K}\langle\langle A \rangle\rangle$. A semiring structure is defined on $\mathbb{K}\langle\langle A \rangle\rangle$ as follows [3, 10]:

- $(S + T, u) = (S, u) \oplus (T, u)$,
- $(ST, u) = \bigoplus_{u_1 u_2 = u} (S, u_1) \otimes (T, u_2)$, with $S, T \in \mathbb{K}\langle\langle A \rangle\rangle$.
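The two operations above can be tried out on polynomials, that is, on series with finite support. This Python sketch assumes $\mathbb{K} = \mathbb{Q}$; it stores a series as a dictionary from words (strings, with `""` standing for $\varepsilon$) to coefficients, and it truncates products to words of length at most `n`. The names `add` and `mul` and the truncation parameter are illustrative choices, not notation from the paper.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict

# A (truncated) series: word -> coefficient, "" is the empty word.
Series = Dict[str, Fraction]


def add(s: Series, t: Series) -> Series:
    """(S + T, u) = (S, u) + (T, u)."""
    out: Dict[str, Fraction] = defaultdict(Fraction)
    for series in (s, t):
        for u, c in series.items():
            out[u] += c
    return {u: c for u, c in out.items() if c != 0}


def mul(s: Series, t: Series, n: int) -> Series:
    """(ST, u) = sum over u1 u2 = u of (S, u1)(T, u2), for |u| <= n.
    The factor order c1 * c2 is kept, as K need not be commutative."""
    out: Dict[str, Fraction] = defaultdict(Fraction)
    for u1, c1 in s.items():
        for u2, c2 in t.items():
            if len(u1) + len(u2) <= n:
                out[u1 + u2] += c1 * c2
    return {u: c for u, c in out.items() if c != 0}


S = {"": Fraction(1, 2), "a": Fraction(1)}   # S = 1/2 + a
T = {"a": Fraction(1), "b": Fraction(1)}     # T = a + b
S_plus_T = add(S, T)       # 1/2 + 2a + b
S_times_T = mul(S, T, 3)   # 1/2 a + 1/2 b + aa + ab
```

Each pair of words contributes to exactly one product word $u_1 u_2$, so iterating over all pairs sums over every factorization of each word, as the second formula requires.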


A polynomial is a formal series with finite support. The set of polynomials is denoted by $\mathbb{K}\langle A \rangle$; it is a subsemiring of $\mathbb{K}\langle\langle A \rangle\rangle$. The star of a series is defined by $S^* = \sum_{n \ge 0} S^n$, with $S^0 = \varepsilon$ and $S^n = S^{n-1} S$ if $n > 0$. Notice that the star of a formal series does not always exist:

Proposition 1 [10]. The star of a formal series $S \in \mathbb{K}\langle\langle A \rangle\rangle$ is defined if and only if $(S, \varepsilon)^{\circledast}$ is defined in $\mathbb{K}$. In this case:

$$
S^* = (S, \varepsilon)^{\circledast} \big(S_0 \, (S, \varepsilon)^{\circledast}\big)^* \qquad (15)
$$

where the formal series $S_0$ is defined by $(S_0, \varepsilon) = 0$ and $(S_0, u) = (S, u)$ for any word $u \neq \varepsilon$. In the following computations, we use this construction of the star of a formal series.

Definition 6. The semiring of regular series $\mathbb{K}^{\mathrm{rat}}\langle\langle A \rangle\rangle \subseteq \mathbb{K}\langle\langle A \rangle\rangle$ is the smallest subset of $\mathbb{K}\langle\langle A \rangle\rangle$ that contains the polynomial semiring $\mathbb{K}\langle A \rangle$ and is closed under addition, product and star (when the latter is defined). The following definition introduces the notion of regular K-expression, which allows us to represent regular series by a finite writing.

Definition 7. A regular K-expression is defined inductively by:
- $a \in A$ and $k \in \mathbb{K}$ are regular K-expressions, which respectively denote the regular series $S_a = a$ and $S_k = k$,
- if F, G and H (such that $\mathrm{ev}(c(H)^*)$ exists) are regular K-expressions which respectively denote the regular series $S_F$, $S_G$ and $S_H$, then $F + G$, $FG$ and $H^*$ are regular K-expressions which respectively denote the regular series $S_F + S_G$, $S_F S_G$ and $S_H^*$.
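Proposition 1 and Definition 7 together give a way to compute $S_E$ coefficient by coefficient. The Python sketch below, again over $\mathbb{Q}$ with a hypothetical tuple encoding and hypothetical names (`scalar_star`, `star`, `denote`), truncates every series at words of length `n`. The truncation is an assumption made so that the computation is finite; the coefficients of words of length at most `n` remain exact because $S_0 (S,\varepsilon)^{\circledast}$ has no constant term, so its $i$-th power only contains words of length at least $i$. The star is computed through equation (15), never through the infinite sum $\sum_{n \ge 0} S^n$.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict

Series = Dict[str, Fraction]  # word -> coefficient, truncated at length n


def add(s: Series, t: Series) -> Series:
    out: Dict[str, Fraction] = defaultdict(Fraction)
    for series in (s, t):
        for u, c in series.items():
            out[u] += c
    return {u: c for u, c in out.items() if c != 0}


def mul(s: Series, t: Series, n: int) -> Series:
    out: Dict[str, Fraction] = defaultdict(Fraction)
    for u1, c1 in s.items():
        for u2, c2 in t.items():
            if len(u1) + len(u2) <= n:
                out[u1 + u2] += c1 * c2
    return {u: c for u, c in out.items() if c != 0}


def scalar_star(x: Fraction) -> Fraction:
    if x == 1:
        raise ValueError("(S, eps) has no star in Q")
    return 1 / (1 - x)


def star(s: Series, n: int) -> Series:
    """Equation (15): S* = (S,eps)^* (S0 (S,eps)^*)^*, up to length n."""
    c = {"": scalar_star(s.get("", Fraction(0)))}
    s0 = {u: v for u, v in s.items() if u != ""}  # S0: drop the constant term
    p = mul(s0, c, n)
    total, power = {"": Fraction(1)}, {"": Fraction(1)}
    for _ in range(n):  # p^i only has words of length >= i
        power = mul(power, p, n)
        total = add(total, power)
    return mul(c, total, n)


def denote(e, n: int) -> Series:
    """S_E for a regular K-expression encoded as ("k", q), ("letter", a),
    ("+", F, G), (".", F, G) or ("*", F); intended for n >= 1."""
    tag = e[0]
    if tag == "k":
        return {"": e[1]} if e[1] != 0 else {}
    if tag == "letter":
        return {e[1]: Fraction(1)}
    if tag == "+":
        return add(denote(e[1], n), denote(e[2], n))
    if tag == ".":
        return mul(denote(e[1], n), denote(e[2], n), n)
    return star(denote(e[1], n), n)


# E = (k1 a* + k2 b*)* a* with hypothetical values k1 = 1/2, k2 = 1/3.
a_star, b_star = ("*", ("letter", "a")), ("*", ("letter", "b"))
E = (".", ("*", ("+", (".", ("k", Fraction(1, 2)), a_star),
                      (".", ("k", Fraction(1, 3)), b_star))), a_star)
S_E = denote(E, 1)
print(S_E[""])  # 6
```

With these illustrative values, the constant term of $S_E$ is $(\frac12 + \frac13)^{\circledast} = 6$, which is also the value of $\mathrm{ev}(c(E))$ obtained from Definition 3.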

The evaluation of the null term of a regular K-expression is the coefficient of the empty word in the corresponding regular series:

$$
\mathrm{ev}(c(E)) = (S_E, \varepsilon). \qquad (16)
$$

This result is shown by induction on the size of the regular K-expression, using Definition 2. By Proposition 2, regular K-expressions are K-expressions for which the evaluation of the null term is well-defined.

Definition 8. Let A be a finite alphabet, and let $\mathbb{K}$ be a semiring (commutative or not). We define an automaton with multiplicities $\mathcal{A} = (Q, q_0, \delta, F, \rho)$ as follows:
• Q is a finite set of states,
•