PARIKH MATRICES, AMIABILITY AND ISTRAIL MORPHISM

November 9, S0129054110007702

2010 13:24 WSPC/INSTRUCTION

FILE

International Journal of Foundations of Computer Science, Vol. 21, No. 6 (2010) 1021–1033
© World Scientific Publishing Company

DOI: 10.1142/S0129054110007702


ADRIAN ATANASIU
Faculty of Mathematics and Computer Science, Bucharest University
Str. Academiei 14, Bucharest 010014, Romania
[email protected]
http://www.galaxyng.com/adrian atanasiu

Received 5 October 2009
Accepted 18 January 2010
Communicated by Arto Salomaa

Because the Parikh matrix mapping is not injective, this paper investigates properties of the set of words having the same Parikh matrix; such words are called "amiable" or "M-equivalent". The paper builds on the results obtained in [3] for the binary case. The aim is to distinguish amiable words by means of a morphism that provides additional information about them. The morphism proposed here is the Istrail morphism.

Keywords: Parikh matrix mapping; amiable words; scattered subwords; subword occurrences; Istrail morphism.

1. Preliminary

The idea of identifying binary sequences from the numbers of a's and b's they contain is quite old. Unfortunately this information (given by the Parikh mapping associated with the sequence) is insufficient: there are $\binom{x+y}{x}$ binary sequences $\alpha$ with $|\alpha|_a = x$ and $|\alpha|_b = y$ (we denote by $|\alpha|_w$ the number of occurrences of the scattered sequence $w$ in $\alpha$).

Once the Parikh matrix mapping ([10]) was defined, and especially once the Parikh matrix mapping associated with binary sequences ([3]) was studied (where $|\alpha|_{ab}$ is also taken into consideration), the number of sequences sharing the same characteristics decreased drastically. For instance, of the 184756 binary sequences $\alpha$ having $|\alpha|_a = 10$ and $|\alpha|_b = 10$, only 5448 have $|\alpha|_{ab} = 50$. This is almost 3% of all 184756 possible strings. Because this number is still quite large, the possibility of identifying sequences by this procedure is limited (especially for balanced sequences, where $|\alpha|_a$ is almost equal to $|\alpha|_b$).

A remarkable improvement seems to be the use of morphisms that distinguish amiable binary words through the Parikh matrices of their images. To this aim, the paper


proposes the Istrail morphism, which extends the binary sequence over a 3-letter ordered alphabet $\{a, b, c\}$. The number of occurrences of the subsequence $abc$ in the image distinguishes many binary sequences that are identical with respect to the Parikh matrix mapping. For instance, only 98 of the 5392 binary sequences $\alpha$ having $|\alpha|_a = 10$, $|\alpha|_b = 10$ and $|\alpha|_{ab} = 48$ have an image $\varphi(\alpha)$ under the Istrail morphism with $|\varphi(\alpha)|_{abc} = 770$. One can check that 98 is the maximum number of sequences (out of the 184756 possible) sharing the same values $(|\alpha|_a, |\alpha|_b, |\alpha|_{ab}, |\varphi(\alpha)|_{abc})$ (for further details see Example 4 below).

One problem seems to be the length of the images of words, which grows quickly when the Istrail morphism is applied several times. As will be shown, the Parikh matrices associated with the Istrail images of binary words $\alpha$ can be generated using only the information provided by the binary sequences $\alpha$.

2. Introduction

The Parikh matrix mapping (introduced in [10]) is an extension of the Parikh mapping ([11]). The extension is based on a special type of matrices, where the classical Parikh vector appears as the second diagonal (see footnote a).

We start with some basic notations and definitions. Let $\mathbb{N}$ be the set of nonnegative integers and $\Sigma$ be an alphabet. The set of all words over $\Sigma$ is $\Sigma^*$ and $\lambda$ denotes the empty word. For $\alpha \in \Sigma^*$, $|\alpha|$ denotes the length of $\alpha$. The mirror image of a word $\alpha \in \Sigma^*$, denoted $mi(\alpha)$, is defined by $mi(\lambda) = \lambda$ and $mi(x_1 x_2 \dots x_n) = x_n \dots x_2 x_1$, where $x_i \in \Sigma$, $1 \le i \le n$.

In this paper a ternary alphabet $\Sigma_1 = \{a, b, c\}$ will be used, on which an order relation $<$ is defined. Without loss of generality, we consider $a < b < c$. The basic properties will be defined over the binary alphabet $\Sigma = \{a, b\}$, following the results obtained in [3].

The number of occurrences of a letter $a \in \Sigma$ in a word $\alpha \in \Sigma^*$ is denoted by $|\alpha|_a$. If $u, v \in \Sigma^*$, then the word $u$ is a scattered subword of $v$ if $u = \beta_1 \beta_2 \dots \beta_r$ and $v = \gamma_0 \beta_1 \gamma_1 \dots \gamma_{r-1} \beta_r \gamma_r$, for some $r \ge 1$ and $\beta_i, \gamma_j \in \Sigma^*$.
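The notions of scattered subword and mirror image translate directly into code. The following is a minimal sketch for illustration only; the function names `subword_count` and `mirror` are assumptions of this example, not notation from the paper. It counts the occurrences of a word as a scattered subword with a standard dynamic-programming pass.

```python
def subword_count(word: str, u: str) -> int:
    """Return the number of occurrences of u in word as a scattered subword."""
    # ways[j] = number of ways to match the first j letters of u
    # inside the prefix of word read so far
    ways = [1] + [0] * len(u)
    for ch in word:
        # scan u from right to left so that ch only extends matches
        # that were completed before ch was read
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def mirror(word: str) -> str:
    """Mirror image of a word."""
    return word[::-1]


if __name__ == "__main__":
    # two binary words with the same numbers of a, b and ab
    for w in ("abba", "baab"):
        print(w, subword_count(w, "a"), subword_count(w, "b"), subword_count(w, "ab"))
```

The pass is linear in the length of the word for a fixed pattern, which matters later because the images under the morphism grow quickly.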
We denote by $|\alpha|_u$ the number of occurrences of $u$ in $\alpha$ as a scattered subword. For instance $|abab|_{ab} = 3$.

If $A$ and $B$ are two finite nonempty alphabets, a morphism on $A$ is a mapping $f : A^* \to B^*$ such that $f(uv) = f(u)f(v)$ for all $u, v \in A^*$. It is uniquely determined by its values on the alphabet $A$.

Footnote a: By the second diagonal of an $(s+1) \times (s+1)$ matrix $M$ we mean the diagonal of length $s$ immediately above the main diagonal.

Definition 1. Let $\Sigma = \{a_1, a_2, \dots, a_s\}$ be an ordered alphabet and $\mathcal{M}_{s+1}$ be the multiplicative monoid of $(s+1)$-dimensional upper-triangular matrices with nonnegative integral entries and unit diagonal. The Parikh matrix mapping, denoted $\Psi_s$, is the morphism $\Psi_s : \Sigma^* \to \mathcal{M}_{s+1}$


defined by the condition: if $k = 1, \dots, s$ and $\Psi_s(a_k) = (m_{i,j})_{1 \le i,j \le s+1}$, then $m_{i,i} = 1$ for each $1 \le i \le s+1$, $m_{k,k+1} = 1$, and all other elements of the matrix $\Psi_s(a_k)$ are 0.

Because in this paper $s = 2$ or $s = 3$, there will be no confusion if we denote $\Psi_s(\alpha)$ by $M_\alpha$. A matrix $M \in \mathcal{M}_{s+1}$ such that $M = M_\alpha$ for some word $\alpha \in \Sigma^*$ is called a Parikh matrix. The following result will be needed in the sequel.

Theorem 1 ([10]). Consider $\Sigma = \{a_1, a_2, \dots, a_s\}$ and $\alpha \in \Sigma^*$. The matrix $M_\alpha = \Psi_s(\alpha) = (m_{i,j})_{1 \le i,j \le s+1}$ has the following properties:
• $m_{i,j} = 0$ for all $1 \le j < i \le s+1$,
• $m_{i,i} = 1$ for all $1 \le i \le s+1$,
• $m_{i,j+1} = |\alpha|_{a_i \dots a_j}$ for all $1 \le i \le j \le s$.

Example 1. For the alphabet $\Sigma = \{a, b, c\}$, Theorem 1 implies that
$$M_\alpha = \begin{pmatrix} 1 & |\alpha|_a & |\alpha|_{ab} & |\alpha|_{abc} \\ 0 & 1 & |\alpha|_b & |\alpha|_{bc} \\ 0 & 0 & 1 & |\alpha|_c \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Definition 2. Two words $\alpha, \beta \in \Sigma^*$ are called "amiable", denoted $\alpha \sim_a \beta$, if and only if $M_\alpha = M_\beta$ (in [12] the term "M-equivalent" is used).

For further notions and results on the Parikh matrix mapping, as well as for language-theoretic considerations not detailed here, the reader is referred to [3], [5], [6], [7], [10], [12], [13] and the references given therein.

3. The Istrail Morphism

In order to obtain a good "disambiguation" of binary amiable words, a very interesting direction seems to be to use the information provided by the Istrail morphism ([8], [9]).

3.1. Definitions and general properties

Let us consider a binary ordered alphabet $\Sigma = \{a, b\}$ and its extension $\Sigma_1 = \{a, b, c\}$.

Definition 3. The Istrail morphism ([8]) is the mapping $\varphi : \Sigma_1^* \to \Sigma_1^*$ defined by
$$\varphi(a) = abc, \qquad \varphi(b) = ac, \qquad \varphi(c) = b.$$
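As an illustration of Definitions 1 and 2, the Parikh matrix of a word can be computed as a product of the elementary matrices $\Psi_s(a_k)$. The sketch below is only one possible way to do this; the helper names (`mat_mul`, `letter_matrix`, `parikh_matrix`, `amiable`) and the convention of passing the ordered alphabet as a string are assumptions of this example.

```python
def mat_mul(x, y):
    """Product of two square integer matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def letter_matrix(k, s):
    """Psi_s(a_k): identity of size s+1 with an extra 1 at position (k, k+1), 1-based."""
    m = [[int(i == j) for j in range(s + 1)] for i in range(s + 1)]
    m[k - 1][k] = 1
    return m


def parikh_matrix(word, alphabet):
    """Parikh matrix of word over the ordered alphabet (a string such as 'ab' or 'abc')."""
    s = len(alphabet)
    result = [[int(i == j) for j in range(s + 1)] for i in range(s + 1)]
    for ch in word:
        # Psi_s is a morphism: multiply the letter matrices in reading order
        result = mat_mul(result, letter_matrix(alphabet.index(ch) + 1, s))
    return result


def amiable(u, v, alphabet):
    """True when u and v have the same Parikh matrix."""
    return parikh_matrix(u, alphabet) == parikh_matrix(v, alphabet)


if __name__ == "__main__":
    for row in parikh_matrix("abca", "abc"):
        print(row)
    print(amiable("abba", "baab", "ab"), amiable("ab", "ba", "ab"))
```

By Theorem 1, the entries above the diagonal of the result are the subword counts listed in Example 1, so the same function can be used to read off $|\alpha|_{abc}$ for an image under $\varphi$.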


Let us define, for all $\alpha \in \Sigma^*$,
$$\varphi^0(\alpha) = \alpha, \qquad \varphi^{k+1}(\alpha) = \varphi^k(\varphi(\alpha)) = \varphi(\varphi^k(\alpha)).$$
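The iterated morphism is easy to experiment with. A minimal sketch follows; the dictionary `ISTRAIL` and the function names `phi` and `phi_k` are names chosen for this example.

```python
ISTRAIL = {"a": "abc", "b": "ac", "c": "b"}


def phi(word: str) -> str:
    """One application of the Istrail morphism, letter by letter."""
    return "".join(ISTRAIL[ch] for ch in word)


def phi_k(word: str, k: int) -> str:
    """Apply the morphism k times; k = 0 returns the word unchanged."""
    for _ in range(k):
        word = phi(word)
    return word


if __name__ == "__main__":
    for k in range(4):
        print(k, phi_k("a", k), phi_k("b", k))
```

Because the images are built explicitly, this approach is only practical for small k; the length doubles with each further application.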

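As a remark, $\varphi^k(\alpha) \in \Sigma^*$ only when $k = 0$; otherwise, $\varphi^k(\alpha) \in \Sigma_1^*$.

Lemma 1. For any $\alpha \in \Sigma_1^*$ and $k \in \mathbb{N}$:
1. $|\varphi^{k+1}(\alpha)|_a = |\varphi^{k+1}(\alpha)|_c = |\varphi^k(\alpha)|_a + |\varphi^k(\alpha)|_b$
2. $|\varphi^{k+1}(\alpha)|_b = |\varphi^k(\alpha)|_a + |\varphi^k(\alpha)|_c$
3. $|\varphi^{k+1}(\alpha)|_{aa} = |\varphi^{k+1}(\alpha)|_{ca} = |\varphi^{k+1}(\alpha)|_{cc} = |\varphi^k(\alpha)|_{aa} + |\varphi^k(\alpha)|_{ab} + |\varphi^k(\alpha)|_{ba} + |\varphi^k(\alpha)|_{bb}$
4. $|\varphi^{k+1}(\alpha)|_{ab} = |\varphi^k(\alpha)|_a + |\varphi^k(\alpha)|_{aa} + |\varphi^k(\alpha)|_{ac} + |\varphi^k(\alpha)|_{ba} + |\varphi^k(\alpha)|_{bc}$
5. $|\varphi^{k+1}(\alpha)|_{ac} = |\varphi^k(\alpha)|_a + |\varphi^k(\alpha)|_b + |\varphi^k(\alpha)|_{aa} + |\varphi^k(\alpha)|_{ab} + |\varphi^k(\alpha)|_{ba} + |\varphi^k(\alpha)|_{bb}$
6. $|\varphi^{k+1}(\alpha)|_{ba} = |\varphi^k(\alpha)|_{aa} + |\varphi^k(\alpha)|_{ab} + |\varphi^k(\alpha)|_{ca} + |\varphi^k(\alpha)|_{cb}$
7. $|\varphi^{k+1}(\alpha)|_{bb} = |\varphi^k(\alpha)|_{aa} + |\varphi^k(\alpha)|_{ac} + |\varphi^k(\alpha)|_{ca} + |\varphi^k(\alpha)|_{cc}$
8. $|\varphi^{k+1}(\alpha)|_{bc} = |\varphi^k(\alpha)|_a + |\varphi^k(\alpha)|_{aa} + |\varphi^k(\alpha)|_{ab} + |\varphi^k(\alpha)|_{ca} + |\varphi^k(\alpha)|_{cb}$
9. $|\varphi^{k+1}(\alpha)|_{cb} = |\varphi^k(\alpha)|_{aa} + |\varphi^k(\alpha)|_{ac} + |\varphi^k(\alpha)|_{ba} + |\varphi^k(\alpha)|_{bc}$

Proof. All these equalities are obtained from
$$(\forall x \in \Sigma_1) \quad |\varphi^{k+1}(\alpha)|_x = \sum_{u \in \Sigma_1,\ |\varphi(u)|_x > 0} |\varphi^k(\alpha)|_u \tag{1}$$
$$(\forall x, y \in \Sigma_1) \quad |\varphi^{k+1}(\alpha)|_{xy} = \sum_{u \in \Sigma_1,\ |\varphi(u)|_{xy} > 0} |\varphi^k(\alpha)|_u + \sum_{u, v \in \Sigma_1,\ |\varphi(u)|_x \cdot |\varphi(v)|_y > 0} |\varphi^k(\alpha)|_{uv} \tag{2}$$

For relation (1), let us denote $w = \varphi^k(\alpha)$. The character $x$ appears in $\varphi(w)$ only by applying the morphism $\varphi$ to a letter $u$ of $w$ whose image $\varphi(u)$ contains the letter $x$. Since no letter occurs twice in any $\varphi(u)$, the number of occurrences of $x$ in $\varphi(w)$ is the number of letters $u$ of $w$ having the property $|\varphi(u)|_x \ne 0$. Relation (2) is proved in a similar way: an occurrence of $xy$ in $\varphi(w)$ either lies inside the image $\varphi(u)$ of a single letter of $w$ (first sum), or takes $x$ from $\varphi(u)$ and $y$ from $\varphi(v)$ for two letters $u$, $v$ of $w$ with $u$ preceding $v$ (second sum).

The main problem is the exponential growth of the lengths of words. Generally we have

Theorem 2. For $\alpha \in \Sigma^*$ and $k \in \mathbb{N}$:
(1) $|\varphi^{k+1}(a)| = 3 \cdot 2^k$, $|\varphi^{k+1}(b)| = 2^{k+1}$, $|\varphi^{k+1}(c)| = 2^k$;
(2) $|\varphi^{k+1}(\alpha)| = 3 \cdot 2^k \cdot |\alpha|_a + 2^{k+1} \cdot |\alpha|_b$.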
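Before turning to the proof, relation (2) and the length formula of Theorem 2 can be checked numerically on small words. The following sketch is illustrative only: all function names are assumptions of this example, and the check covers a few sample words, not the general statements.

```python
from itertools import product

ISTRAIL = {"a": "abc", "b": "ac", "c": "b"}
LETTERS = "abc"


def phi(word):
    return "".join(ISTRAIL[ch] for ch in word)


def subword_count(word, u):
    """Occurrences of u in word as a scattered subword."""
    ways = [1] + [0] * len(u)
    for ch in word:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def pair_count_via_relation_2(word, x, y):
    """Right-hand side of relation (2), computed from counts in word only."""
    # first sum: letters u whose image already contains xy
    inside = sum(subword_count(word, u) for u in LETTERS
                 if subword_count(ISTRAIL[u], x + y) > 0)
    # second sum: ordered pairs uv with x in phi(u) and y in phi(v)
    across = sum(subword_count(word, u + v)
                 for u, v in product(LETTERS, repeat=2)
                 if x in ISTRAIL[u] and y in ISTRAIL[v])
    return inside + across


def length_via_theorem_2(alpha, k):
    """Length of the (k+1)-fold image of a binary word, as stated in Theorem 2 (2)."""
    return 3 * 2**k * alpha.count("a") + 2**(k + 1) * alpha.count("b")


if __name__ == "__main__":
    w = "abcab"
    for x, y in product(LETTERS, repeat=2):
        print(x + y, pair_count_via_relation_2(w, x, y), subword_count(phi(w), x + y))
```

The point of the comparison is that the left column never needs the image $\varphi(w)$ itself, only counts taken in $w$.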


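Proof. (1) Induction on $k$. For $k = 0$, obviously $|\varphi(a)| = 3$, $|\varphi(b)| = 2$, $|\varphi(c)| = 1$.

$k \to k+1$: Let us denote $x_k = |\varphi^k(a)|$, $y_k = |\varphi^k(b)|$. Then $|\varphi^k(c)| = |\varphi^{k-1}(\varphi(c))| = |\varphi^{k-1}(b)| = y_{k-1}$. We have
$$x_{k+1} = |\varphi^{k+1}(a)| = |\varphi^k(\varphi(a))| = |\varphi^k(abc)| = |\varphi^k(a)| + |\varphi^k(b)| + |\varphi^k(c)| = x_k + y_k + y_{k-1},$$
and
$$y_{k+1} = |\varphi^{k+1}(b)| = |\varphi^k(\varphi(b))| = |\varphi^k(ac)| = |\varphi^k(a)| + |\varphi^k(c)| = x_k + y_{k-1}.$$
The system of recurrences
$$\begin{cases} x_{k+1} = x_k + y_k + y_{k-1} \\ y_{k+1} = x_k + y_{k-1} \end{cases}$$
with initial values $y_1 = 2$, $y_2 = 4$ and $x_1 = 3$ has the solution $x_k = 3 \cdot 2^{k-1}$, $y_k = 2^k$.

(2) We shall prove by induction the relation $|\varphi^{k+1}(\alpha)| = 2^k |\varphi(\alpha)|$. For $k = 0$ it is obvious. For $k = 1$:
$$\begin{aligned} |\varphi^2(\alpha)| &= |\varphi^2(\alpha)|_a + |\varphi^2(\alpha)|_b + |\varphi^2(\alpha)|_c \\ &= (|\varphi(\alpha)|_a + |\varphi(\alpha)|_b) + (|\varphi(\alpha)|_a + |\varphi(\alpha)|_c) + (|\varphi(\alpha)|_a + |\varphi(\alpha)|_b) \\ &= 3|\varphi(\alpha)|_a + 2|\varphi(\alpha)|_b + |\varphi(\alpha)|_c \\ &= 3(|\alpha|_a + |\alpha|_b) + 2(|\alpha|_a + |\alpha|_c) + (|\alpha|_a + |\alpha|_b) \\ &= 6|\alpha|_a + 4|\alpha|_b = 2(3|\alpha|_a + 2|\alpha|_b) = 2|\varphi(\alpha)|. \end{aligned}$$
In this sequence of computations we used: (i) Lemma 1, (ii) $\alpha \in \Sigma^*$ (hence $|\alpha|_c = 0$), and (iii) $|\varphi(\alpha)| = |\varphi(\alpha)|_a + |\varphi(\alpha)|_b + |\varphi(\alpha)|_c = 3|\alpha|_a + 2|\alpha|_b$.

Let us consider $k \ge 1$. In this case
$$|\varphi^{k+1}(\alpha)| = |\varphi^k(\varphi(\alpha))| = 2^{k-1} |\varphi^2(\alpha)| = 2^{k-1} \cdot 2|\varphi(\alpha)| = 2^k |\varphi(\alpha)|.$$
Now, we use the fact that $|\varphi(\alpha)| = 3|\alpha|_a + 2|\alpha|_b$. Therefore
$$|\varphi^{k+1}(\alpha)| = 2^k (3|\alpha|_a + 2|\alpha|_b) = 3 \cdot 2^k |\alpha|_a + 2^{k+1} |\alpha|_b.$$

Because the length of the images $\varphi^k(\alpha)$ with $\alpha \in \Sigma^*$ increases quickly, it is difficult to work with the sequences $\varphi^k(\alpha)$ directly. So, we are interested in obtaining data about the Parikh matrices $M_{\varphi^k(\alpha)}$ using only the entries of
$$M_\alpha = \begin{pmatrix} 1 & |\alpha|_a & |\alpha|_{ab} \\ 0 & 1 & |\alpha|_b \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & n & q \\ 0 & 1 & p \\ 0 & 0 & 1 \end{pmatrix} \tag{3}$$

Theorem 3. Let $\alpha \in \Sigma^*$. Then, for any $k \in \mathbb{N}$: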


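(1) $|\varphi^{k+1}(\alpha)|_a = |\varphi^{k+1}(\alpha)|_c = \dfrac{p}{3} \cdot (-1)^k + \dfrac{3n + 2p}{3} \cdot 2^k$;
(2) $|\varphi^{k+1}(\alpha)|_b = \dfrac{2p}{3} \cdot (-1)^{k+1} + \dfrac{3n + 2p}{3} \cdot 2^k$.

Proof. Let us denote $x_k = |\varphi^k(\alpha)|_a$, $y_k = |\varphi^k(\alpha)|_b$, $z_k = |\varphi^k(\alpha)|_c$. We have to solve the system of recurrences
$$\begin{cases} x_k = x_{k-1} + y_{k-1} \\ y_k = x_{k-1} + z_{k-1} \\ z_k = x_k \end{cases}$$
with initial values $x_1 = n + p$, $x_2 = 2n + p$, $y_1 = n$. After the substitution of $z_{k-1} = x_{k-1}$, and then of $y_k$, we obtain
$$x_k = x_{k-1} + 2x_{k-2}, \qquad y_k = 2x_{k-1} \qquad (k \ge 2).$$
The solution is the set of relations asserted by the Theorem.

Therefore, for every $k \in \mathbb{N}$, the Parikh vector of the word $\varphi^k(\alpha)$ is completely determined by the Parikh vector of $\alpha$.

Theorem 4. Let $\alpha, \beta \in \Sigma^*$ be two amiable words. Then, for all $k \ge 0$,
$$|\varphi^k(\alpha)|_w = |\varphi^k(\beta)|_w, \qquad w \in \Sigma_1 \cup (\Sigma_1)^2.$$

Proof. Let us consider
$$M = \begin{pmatrix} 1 & n & q \\ 0 & 1 & p \\ 0 & 0 & 1 \end{pmatrix},$$
the common Parikh matrix of $\alpha$ and $\beta$. For any word $w$ with $|w| = 1$, the assertion follows from Theorem 3. It remains to prove the assertion for the words $w$ of length 2. We shall use induction on $k$. For $k = 0$, with Lemma 1 and $\varphi^0(\alpha) = \alpha \in \Sigma^*$, we can easily obtain the equalities
$$|\alpha|_{ab} = |\beta|_{ab}, \qquad |\alpha|_{ba} = |\beta|_{ba}.$$
Let us now suppose that $|\varphi^k(\alpha)|_{xy} = |\varphi^k(\beta)|_{xy}$ for all $x, y \in \Sigma_1$. Using (2) and the induction hypothesis we obtain
$$|\varphi^{k+1}(\alpha)|_{xy} = |\varphi^{k+1}(\beta)|_{xy}, \qquad \forall x, y \in \Sigma_1.$$

Theorem 5. Let $\alpha, \beta \in \Sigma^*$ be two binary words. If there exists $k \in \mathbb{N}$ with $\varphi^k(\alpha) \sim_a \varphi^k(\beta)$, then $\alpha \sim_a \beta$.

Proof. Let us consider the Parikh matrices of $\alpha$ and $\beta$:
$$M_\alpha = \begin{pmatrix} 1 & n_1 & q_1 \\ 0 & 1 & p_1 \\ 0 & 0 & 1 \end{pmatrix}, \qquad M_\beta = \begin{pmatrix} 1 & n_2 & q_2 \\ 0 & 1 & p_2 \\ 0 & 0 & 1 \end{pmatrix}.$$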


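From the system $|\varphi^k(\alpha)|_a = |\varphi^k(\beta)|_a$, $|\varphi^k(\alpha)|_b = |\varphi^k(\beta)|_b$ (see Theorem 3) we obtain $n_1 = n_2$, $p_1 = p_2$. From $|\varphi^0(w)|_{ab} = q$ for an arbitrary $w \in \Sigma^*$ (see (3)) and (2) we have $|\varphi^k(w)|_{ab} = f_{k,n,p} + g_k \cdot q$, where $f_{k,n,p}$ and $g_k$ are integers and $g_k \ne 0$. By replacing $n_2$ with $n_1$ and $p_2$ with $p_1$ in the equality $|\varphi^k(\alpha)|_{ab} = |\varphi^k(\beta)|_{ab}$ we obtain $q_1 = q_2$. Therefore $M_\alpha = M_\beta$, so $\alpha \sim_a \beta$.

Example 2. Let us suppose that for $\alpha, \beta \in \Sigma^*$ we have $\varphi^2(\alpha) \sim_a \varphi^2(\beta)$. Then:

(1) From Theorem 3: $|\varphi^2(\alpha)|_a = 2n_1 + p_1$, $|\varphi^2(\beta)|_a = 2n_2 + p_2$, $|\varphi^2(\alpha)|_b = 2n_1 + 2p_1$, $|\varphi^2(\beta)|_b = 2n_2 + 2p_2$. Therefore
$$\begin{cases} 2n_1 + p_1 = 2n_2 + p_2 \\ 2n_1 + 2p_1 = 2n_2 + 2p_2 \end{cases}$$
with solution $n_1 = n_2$, $p_1 = p_2$.

(2) From Lemma 1, with $n = |\alpha|_a$ and $p = |\alpha|_b$ as in (3):
$$|\varphi^2(\alpha)|_{ab} = 2n_1^2 + n_1 + p_1^2 + p_1 + 2n_1 p_1 + 2q_1,$$
$$|\varphi^2(\beta)|_{ab} = 2n_2^2 + n_2 + p_2^2 + p_2 + 2n_2 p_2 + 2q_2.$$
Because $|\varphi^2(\alpha)|_{ab} = |\varphi^2(\beta)|_{ab}$ and $n_1 = n_2$, $p_1 = p_2$, we obtain $q_1 = q_2$.

Finally, $M_\alpha = M_\beta$, that is $\alpha \sim_a \beta$.

Remark 1. A stronger result like
$$(\forall k \in \mathbb{N}) \quad \varphi^{k+1}(\alpha) \sim_a \varphi^{k+1}(\beta) \implies \varphi^k(\alpha) \sim_a \varphi^k(\beta)$$
is not true. For example, if $\alpha = abba$, $\beta = baab$, then:
1. $\alpha \sim_a \beta$,
2. $\varphi(\alpha) \not\sim_a \varphi(\beta)$ ($|\varphi(\alpha)|_{abc} = 8$, $|\varphi(\beta)|_{abc} = 12$),
3. $\varphi^2(\alpha) \not\sim_a \varphi^2(\beta)$ ($|\varphi^2(\alpha)|_{abc} = 65$, $|\varphi^2(\beta)|_{abc} = 53$),
4. $\varphi^3(\alpha) \sim_a \varphi^3(\beta)$ ($|\varphi^3(\alpha)|_{abc} = |\varphi^3(\beta)|_{abc} = 3318$).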
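The values quoted for the first two iterations in this example can be recomputed by building the images explicitly. The sketch below is illustrative; the function names are assumptions of this example, and it deliberately stops at $k = 2$, where the images are still short (20 letters).

```python
ISTRAIL = {"a": "abc", "b": "ac", "c": "b"}


def phi_k(word, k):
    """Apply the Istrail morphism k times."""
    for _ in range(k):
        word = "".join(ISTRAIL[ch] for ch in word)
    return word


def subword_count(word, u):
    """Occurrences of u in word as a scattered subword."""
    ways = [1] + [0] * len(u)
    for ch in word:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def binary_parikh_entries(alpha):
    """The three entries (n, p, q) of the Parikh matrix of a binary word, as in (3)."""
    return alpha.count("a"), alpha.count("b"), subword_count(alpha, "ab")


def abc_counts(alpha, max_k):
    """Number of scattered occurrences of abc in the k-th image, for k = 1..max_k."""
    return [subword_count(phi_k(alpha, k), "abc") for k in range(1, max_k + 1)]


if __name__ == "__main__":
    for word in ("abba", "baab"):
        print(word, binary_parikh_entries(word), abc_counts(word, 2))
```

The two words share the triple $(n, p, q)$, so they are amiable, while their images differ in the top-right entry of the Parikh matrix.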



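The converse of Theorem 5 is not true. The basic idea is that the entry $|\varphi^k(\alpha)|_{abc}$ of the Parikh matrix $M_{\varphi^k(\alpha)}$ depends on several 3-letter subsequences with nonregular properties:
$$\begin{aligned} |\varphi^k(\alpha)|_{abc} = {} & |\varphi^{k-1}(\alpha)|_a + 2|\varphi^{k-1}(\alpha)|_{aa} + |\varphi^{k-1}(\alpha)|_{ab} + |\varphi^{k-1}(\alpha)|_{ba} \\ & + |\varphi^{k-1}(\alpha)|_{aaa} + |\varphi^{k-1}(\alpha)|_{aab} + |\varphi^{k-1}(\alpha)|_{aca} + |\varphi^{k-1}(\alpha)|_{acb} \\ & + |\varphi^{k-1}(\alpha)|_{baa} + |\varphi^{k-1}(\alpha)|_{bab} + |\varphi^{k-1}(\alpha)|_{bca} + |\varphi^{k-1}(\alpha)|_{bcb} \end{aligned} \tag{4}$$
But the next result can be easily deduced (from (4)):
$$\forall \alpha \in \Sigma^*,\ k \in \mathbb{N}: \quad (\alpha \sim_a mi(\alpha)) \iff (\varphi^k(\alpha) \sim_a \varphi^k(mi(\alpha))).$$

4. Some Particular Cases

In the following we shall consider two particular cases: $k = 1$ (the Istrail morphism) and $k = 3$ (the Istrail tri-morphism).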


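4.1. The case k = 1

The next result is a major step in the attempt to disambiguate binary amiable words.

Theorem 6. Let $\alpha = ab\gamma ba$ and $\beta = ba\gamma ab$ ($\gamma \in \Sigma^*$) be two binary amiable words. Then $\varphi(\alpha)$ and $\varphi(\beta)$ are not amiable.

Proof. By applying the Istrail morphism $\varphi$ we have
$$\varphi(\alpha) = abcac\,\varphi(\gamma)\,acabc, \qquad \varphi(\beta) = acabc\,\varphi(\gamma)\,abcac.$$
Now, we obtain
$$|\varphi(\alpha)|_{abc} = 8 + |\varphi(\gamma)|_a + 4|\varphi(\gamma)|_b + |\varphi(\gamma)|_c + 2|\varphi(\gamma)|_{ab} + 2|\varphi(\gamma)|_{bc} + |\varphi(\gamma)|_{abc},$$
$$|\varphi(\beta)|_{abc} = 12 + 2|\varphi(\gamma)|_a + 4|\varphi(\gamma)|_b + 2|\varphi(\gamma)|_c + 2|\varphi(\gamma)|_{ab} + 2|\varphi(\gamma)|_{bc} + |\varphi(\gamma)|_{abc}.$$
Therefore
$$|\varphi(\beta)|_{abc} - |\varphi(\alpha)|_{abc} = 4 + |\varphi(\gamma)|_a + |\varphi(\gamma)|_c = 4 + 2(|\gamma|_a + |\gamma|_b) = 4 + 2|\gamma| > 0.$$

Theorem 7. Let $\alpha, \beta \in \Sigma^*$ be two binary words with $|\alpha|_w = |\beta|_w$ for $w \in \{aab, baa, bab\}$. Then
$$\varphi(\alpha) \sim_a \varphi(\beta) \iff \alpha \sim_a \beta.$$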

Proof. For $\alpha \in \Sigma^*$ with the Parikh matrix defined in (3), we can establish the evaluation
$$|\varphi(\alpha)|_{abc} = np + n^2 + \frac{n(n-1)(n-2)}{6} + |\alpha|_{bab} + |\alpha|_{aab} + |\alpha|_{baa}.$$
Indeed, from (4) it follows (for $k = 1$, where all terms containing the letter $c$ vanish):
$$|\varphi(\alpha)|_{abc} = |\alpha|_a + 2|\alpha|_{aa} + |\alpha|_{aaa} + |\alpha|_{ab} + |\alpha|_{ba} + |\alpha|_{aab} + |\alpha|_{baa} + |\alpha|_{bab}.$$
Using $|\alpha|_{a^i} = \binom{n}{i}$, $i \ge 1$, and $|\alpha|_{ab} + |\alpha|_{ba} = np$, we obtain
$$|\varphi(\alpha)|_{abc} = np + n^2 + n(n-1)(n-2)/6 + |\alpha|_{bab} + |\alpha|_{aab} + |\alpha|_{baa}.$$
From this relation, the Theorem is obvious.

The next result is important from the complexity point of view: the value $|\varphi(\alpha)|_{abc}$ can be determined directly from $\alpha$, without counting subsequences.

Theorem 8. Let $\alpha \in \Sigma^*$ be a word with $|\alpha| = s$ and $I = \{i \mid pr_i(\alpha) = a\}$ ($pr_i(\alpha)$ represents the $i$-th character of the string $\alpha$). Then
$$|\varphi(\alpha)|_{abc} = \sum_{i \in I} i \cdot (s + 1 - i).$$

Proof. $\alpha$ is defined over the binary alphabet $\Sigma = \{a, b\}$. Therefore the characters $b$ appear in $\varphi(\alpha)$ only by applying $\varphi$ to the a's of $\alpha$ (because $|\varphi(\alpha)|_b = |\alpha|_a$). Let $\alpha = xay$ be a representation which points out one occurrence of the letter


$a$; so, $\varphi(\alpha) = \varphi(x)\,abc\,\varphi(y)$. Then, the number of abc's generated by this $a$ is
$$(1 + |\varphi(x)|_a) \cdot (1 + |\varphi(y)|_c) = (1 + |x|_a + |x|_b)(1 + |y|_a + |y|_b) = (1 + |x|)(1 + |y|).$$
But $|x| + |y| = s - 1$; therefore, by denoting $|x| = i - 1$, this $a$ contributes $i \cdot (s + 1 - i)$ and the proof is finished.

From Theorem 8 some interesting results can be obtained; for example:

Corollary 1. $\forall n \ge 1$, $|(abc)^n|_{abc} = \binom{n+2}{3}$.

Example 3. Let us consider Example 3 from [3], where all words with the Parikh vector $\Psi = (19, 2)$ are listed. After applying the Istrail morphism, no two distinct amiable words $\alpha, \beta$ satisfy $\varphi(\alpha) \sim_a \varphi(\beta)$. Indeed,

Table 1
|α|_ab   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
|C_α|    1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10
#φ       1  1  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10

Table 1 lists all binary words having the Parikh vector $\Psi = (19, 2)$ (in fact, only the first half of Table 1 is constructed; according to Lemma 1 of [3], the second half is a reflected copy of the first half). For every value of $q = |\alpha|_{ab}$, the second row of the table shows the number of amiable words in the set $C_\alpha = \{w \mid w \sim_a \alpha\}$: the words having the Parikh matrix
$$M = \begin{pmatrix} 1 & 19 & q \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}$$
The third row gives the number of classes of amiable words into which the set $\{\varphi(w) \mid w \in C_\alpha\}$ is divided.

Example 4. The result obtained in Example 3 is the best possible; unfortunately, this situation does not always occur. Let us consider the Parikh matrix
$$M = \begin{pmatrix} 1 & 10 & 20 \\ 0 & 1 & 10 \\ 0 & 0 & 1 \end{pmatrix}$$
$C_\alpha$ contains 433 amiable words. After applying the Istrail morphism, the largest set of words whose images remain amiable is $C = \{\alpha \in \Sigma^* \mid M_\alpha = M,\ |\varphi(\alpha)|_{abc} = 750\}$. It has 14 elements, namely:

C = { abbbbbbbabbaaaaaaaba, babbbbbabbabaaaaabaa, babbbbbbabaaabaabaaa,
      bbabbabbbbbaaaaaabaa, bbabbbabbbaabaaabaaa, bbabbbbaabbbaaaabaaa,
      bbabbbbababaababaaaa, bbabbbbbaaaabbbaaaaa, bbbababbbbaaababaaaa,
      bbbabbababbabaabaaaa, bbbabbabbaababbaaaaa, bbbbaabbabbaabbaaaaa,
      bbbbabaabbabbabaaaaa, bbbbbaaaabbbbbaaaaaa }
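Theorem 8 gives a way to obtain $|\varphi(\alpha)|_{abc}$ in one pass over $\alpha$. The sketch below, with function names chosen for this example, compares that position-based sum with a direct count on the image, and also evaluates the sum on words of the form $a^n$, whose images are $(abc)^n$ as in Corollary 1.

```python
from math import comb

ISTRAIL = {"a": "abc", "b": "ac", "c": "b"}


def phi(word):
    return "".join(ISTRAIL[ch] for ch in word)


def subword_count(word, u):
    """Occurrences of u in word as a scattered subword."""
    ways = [1] + [0] * len(u)
    for ch in word:
        for j in range(len(u), 0, -1):
            if u[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def abc_count_theorem_8(alpha):
    """Sum of i * (s + 1 - i) over the 1-based positions i of the letter a in alpha."""
    s = len(alpha)
    return sum(i * (s + 1 - i)
               for i, ch in enumerate(alpha, start=1) if ch == "a")


if __name__ == "__main__":
    word = "bbbbbaaaabbbbbaaaaaa"  # last word of the set C in Example 4
    print(abc_count_theorem_8(word), subword_count(phi(word), "abc"))
    print([abc_count_theorem_8("a" * n) for n in range(1, 6)],
          [comb(n + 2, 3) for n in range(1, 6)])
```

The position-based sum needs no image at all, which is what makes the search over all words with a given Parikh matrix feasible in examples such as Example 4.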


Theorem 7 shows that the implication $\alpha \sim_a \beta \implies \varphi(\alpha) \sim_a \varphi(\beta)$ is true only in some circumstances; a special situation is pointed out by the next Lemma:

Lemma 2. Let $w, w' \in \Sigma^*$ be two binary words. If
(1) $w = \alpha_1\, ab\, \alpha_2\, ba\, \alpha_3\, ba\, \alpha_4\, ab\, \alpha_5$, $\quad w' = \alpha_1\, ba\, \alpha_2\, ab\, \alpha_3\, ab\, \alpha_4\, ba\, \alpha_5$,
(2) $|\alpha_1| = |\alpha_5|$, $|\alpha_2| = |\alpha_4|$,
then $\varphi(w) \sim_a \varphi(w')$.

Proof. Let us suppose $|\alpha_1| = i$, $|\alpha_1 ab \alpha_2| = j$; then, the first two a's in the representation of $w$ (see condition (1)) are at positions $i + 1$ and $j + 2$ respectively. Hence, the other two a's of $w$ are at positions $n + 1 - (i + 1) - 1 = n - i - 1$ and $n - j$ respectively, where $n = |w| = |w'|$. In $w'$, the four a's of the representation in condition (1) are situated at positions $i + 2$, $j + 1$, $n - j - 1$ and $n - i$ respectively. Then, by Theorem 8,
$$|\varphi(w)|_{abc} = S + (i+1)(n+1-i-1) + (j+2)(n+1-j-2) + (n-j)(j+1) + (n-i-1)(i+2) = |\varphi(w')|_{abc},$$
where $S$ is the number of abc's generated by all characters $a$ present in the sequence $\alpha_1 \alpha_2 \dots \alpha_5$. The other equalities between the entries of the two Parikh matrices associated with $\varphi(w)$ and $\varphi(w')$ respectively are checked easily.

4.2. The case k = 3

Let $\Sigma = \{a, b\}$ be a binary alphabet and $\alpha \in \Sigma^*$. We define the morphism
$$\Phi(\alpha) = \varphi^3(\alpha).$$
The next result can be established.

Theorem 9. Let us define the (constant) Parikh matrices:
$$A = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
Then for every word $\alpha = a_1 a_2 \dots a_n$, $a_i \in \Sigma$,
$$M_{\Phi(\alpha)} = (AXC)^n,$$
where the $i$-th $X$ ($i = 1, 2, \dots, n$) has the value $B$ if $i \in I$ (see Theorem 8); otherwise, the value of the $i$-th $X$ is $I_4$.

Proof. Because
$$M_{\Phi(\alpha)} = M_{\Phi(a_1)} \cdot M_{\Phi(a_2)} \cdot \ldots \cdot M_{\Phi(a_n)},$$


it only remains to check the equalities
$$M_{\Phi(a)} = ABC, \qquad M_{\Phi(b)} = AC.$$
We compute the values
$$\Phi(a) = \varphi^3(a) = (abca)(cbab)(cbac), \qquad \Phi(b) = \varphi^3(b) = (abca)(cbac)$$
(the parentheses are used only for grouping the subwords). It is easy to see that $A$ is the Parikh matrix of the word $abca$, $B$ is the Parikh matrix of $cbab$, and $C$ is the Parikh matrix of $cbac$.

Remark 2. For a word $\alpha \in \Sigma^*$ of length $s$, the matrix $M_{\Phi(\alpha)}$ is a product of $k = |\alpha|_a$ matrices
$$M_{\Phi(a)} = ABC = \begin{pmatrix} 1 & 4 & 9 & 17 \\ 0 & 1 & 4 & 9 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
and $s - k$ matrices
$$M_{\Phi(b)} = AC = \begin{pmatrix} 1 & 3 & 3 & 5 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
The order of the factors in the product is the same as the order of the occurrences of the characters $a$ and $b$ in $\alpha$.
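The factorization in Theorem 9 and Remark 2 can be reproduced with a few lines of code. This is a sketch under the assumptions of this example (the helper names are not from the paper): it builds $A$, $B$, $C$ as the Parikh matrices of $abca$, $cbab$, $cbac$, multiplies one factor $AXC$ per letter, and compares the result with the Parikh matrix of the explicitly constructed image.

```python
from functools import reduce

ISTRAIL = {"a": "abc", "b": "ac", "c": "b"}


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def parikh_matrix(word, alphabet="abc"):
    """Parikh matrix of word over the ordered alphabet."""
    size = len(alphabet) + 1
    m = [[int(i == j) for j in range(size)] for i in range(size)]
    for ch in word:
        k = alphabet.index(ch)
        # right-multiplying by the matrix of the k-th letter (0-based)
        # adds column k to column k + 1
        for row in m:
            row[k + 1] += row[k]
    return m


IDENTITY = parikh_matrix("")
A = parikh_matrix("abca")
B = parikh_matrix("cbab")
C = parikh_matrix("cbac")


def big_phi(alpha):
    """The image of alpha under three applications of the Istrail morphism."""
    for _ in range(3):
        alpha = "".join(ISTRAIL[ch] for ch in alpha)
    return alpha


def big_phi_matrix(alpha):
    """Product of one factor A X C per letter: X = B for 'a', X = I_4 for 'b'."""
    result = IDENTITY
    for ch in alpha:
        x = B if ch == "a" else IDENTITY
        result = reduce(mat_mul, [result, A, x, C])
    return result


if __name__ == "__main__":
    for word in ("a", "b", "ab"):
        print(word, big_phi_matrix(word) == parikh_matrix(big_phi(word)))
```

Working with 4 × 4 matrices keeps the cost linear in $|\alpha|$, whereas the explicit image has length $12|\alpha|_a + 8|\alpha|_b$.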

Lemma 3. Let $\alpha \in \Sigma^*$, $|\alpha| = s$. Then
$$M_{\Phi(\alpha)} = A X_1 Y^{m_1} X_2 Y^{m_2} \dots Y^{m_n} X_s C$$
where
(1) $Y = CA$;
(2) $X_i \in \{B, I_4\}$ and $X_1 X_2 \dots X_s = B^k$, $|\alpha|_a = k$;
(3) $\forall t \ge 1$,
$$Y^t = \begin{pmatrix} 1 & 3t & 3t^2 - t & \dfrac{3t(t-1)(2t+1)}{2} + 2t \\ 0 & 1 & 2t & 3t^2 \\ 0 & 0 & 1 & 3t \\ 0 & 0 & 0 & 1 \end{pmatrix};$$
(4) $\sum_{r=1}^{n} m_r = s - 1$ and exactly $k$ terms of the sum are nonzero.

Proof. The first two parts follow immediately from Theorem 9.

3. We have
$$Y = CA = \begin{pmatrix} 1 & 3 & 2 & 2 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
If we denote
$$Y^t = \begin{pmatrix} 1 & a_t & b_t & c_t \\ 0 & 1 & d_t & e_t \\ 0 & 0 & 1 & f_t \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
then from $Y^{t+1} = Y^t \cdot Y$ we obtain the recurrences
$$\begin{cases} a_{t+1} = a_t + 3, & b_{t+1} = b_t + 2a_t + 2, \\ d_{t+1} = d_t + 2, & e_{t+1} = e_t + 3d_t + 3, \\ f_{t+1} = f_t + 3, & c_{t+1} = c_t + 3a_t + 3b_t + 2 \end{cases}$$
with initial values $a_1 = 3$, $b_1 = 2$, $c_1 = 2$, $d_1 = 2$, $e_1 = 3$, $f_1 = 3$. The solution of these recurrences is the Parikh matrix from the Lemma.

4. The sum is obvious. The second remark is a result of the fact that $Y \cdot B \ne B \cdot Y$.
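As a numerical illustration of part (3), the first row of $Y^t$, which carries the counts of $a$, $ab$ and $abc$, can be compared with its closed form by repeated multiplication. The sketch assumes the matrices $A$ and $C$ of Theorem 9 (the Parikh matrices of $abca$ and $cbac$); the function names are chosen for this example, and only the first row is checked here.

```python
def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Parikh matrices of abca and cbac over the ordered alphabet a < b < c
A = [[1, 2, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
C = [[1, 1, 0, 0], [0, 1, 1, 1], [0, 0, 1, 2], [0, 0, 0, 1]]
Y = mat_mul(C, A)


def y_power(t):
    """Y to the power t (t >= 1), by repeated multiplication."""
    result = Y
    for _ in range(t - 1):
        result = mat_mul(result, Y)
    return result


def first_row_closed_form(t):
    """Closed form for the first row of Y^t; t(t-1) is even, so // is exact."""
    return [1, 3 * t, 3 * t * t - t, 3 * t * (t - 1) * (2 * t + 1) // 2 + 2 * t]


if __name__ == "__main__":
    for t in range(1, 5):
        print(t, y_power(t)[0], first_row_closed_form(t))
```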

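Theorem 10. $\forall w, w' \in \Sigma^*$, if $w = \alpha\, abba\, \beta$ and $w' = \alpha\, baab\, \beta$, then $\Phi(w) \sim_a \Phi(w')$.

Proof. The relation is true because
$$M_{\Phi(a)} \cdot M_{\Phi(b)}^2 \cdot M_{\Phi(a)} = M_{\Phi(b)} \cdot M_{\Phi(a)}^2 \cdot M_{\Phi(b)} = \begin{pmatrix} 1 & 28 & 340 & 3318 \\ 0 & 1 & 24 & 344 \\ 0 & 0 & 1 & 28 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

From this theorem it follows that some binary amiable words are distinguished by the morphism $\varphi$ but not by $\Phi$.

Corollary 2. $\Phi(\alpha\, ab\, \gamma\, ba\, \beta) \sim_a \Phi(\alpha\, ba\, \gamma\, ab\, \beta) \iff \gamma = \lambda$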

5. Conclusions and Future Work

The attempt to reduce the ambiguity (defined by the relation of amiability) of binary words by using morphisms is quite promising. As shown in [1], the number of amiable words whose images under a morphism remain amiable can be, for some morphisms, significantly lower. The Istrail morphism $\varphi$ is such a solution. It has some good properties: it is a weakly square-free morphism ([9]); among the 3-letter morphisms, it has the best behavior concerning the disambiguation of amiable words ([1]). Moreover, using this morphism, a good Message Authentication Code was proposed in [4].

But the Istrail morphism also shows the weaknesses of the 3-letter morphisms in the disambiguation process: there are binary words, for instance $\alpha$ and $mi(\alpha)$ with $mi(\alpha) \in C_\alpha$, which remain amiable no matter how many times the Istrail morphism is applied. Moreover, surprisingly, starting with $k = 3$ the classes of amiable words defined by the original Parikh matrices are partially recomposed.


Acknowledgments

I would like to thank the anonymous referees for their valuable comments and useful remarks about this paper.

References

[1] A. Atanasiu, Morphisms on Amiable Words, submitted to LATA 2010.
[2] A. Atanasiu, R. Atanasiu, I. Petre, Parikh Matrices and Amiable Words, TCS vol. 390, no. 1 (2008), pp. 102-109.
[3] A. Atanasiu, Binary amiable words, Intern. J. Found. Comput. Sci. 18, 2 (2007), 387-400.
[4] A. Atanasiu, R. Atanasiu, Message Authentication Code based on Parikh Matrices, SECITC C 2008, 27-28 Nov. 2008, Bucharest, pp. 7-14.
[5] A. Atanasiu, C. Martin-Vide, Al. Mateescu, On the injectivity of Parikh matrix mapping, Fundamenta Informaticae 49 (2001), 166-180.
[6] A. Atanasiu, C. Martin-Vide, Al. Mateescu, Codifiable languages and Parikh matrix mapping, Journal of Universal Computer Science 7, 9 (2001), 783-793.
[7] S. Fossé, G. Richomme, Some characterisations of Parikh matrix equivalent binary words, Inf. Processing Letters 92(2) (2004), 77-82.
[8] S. Istrail, On irreducible languages and nonrational numbers, Bull. Math. Soc. Sci. Math. R. S. Roumanie 21 (1977), 301-308.
[9] S. Kitaev, T. Mansour, P. Séébold, Counting ordered patterns in words generated by morphisms, Integers: Electronic Journal of Combinatorial Number Theory 8 (2008), #A03.
[10] Al. Mateescu, A. Salomaa, K. Salomaa, S. Yu, On the extension of the Parikh mapping, Theoret. Informatics Appl. 35 (2001), 551-564.
[11] R.J. Parikh, On context-free languages, J. Assoc. Comput. Mach. 13 (1966), 570-581.
[12] A. Salomaa, Connections between subwords and certain matrix mappings, Theor. Comput. Sci. 340 (2005), 188-203.
[13] A. Salomaa, Independence of certain quantities indicating subword occurrences, Theoretical Computer Science 362, 1 (2006), 222-231.