2011 International Conference on Document Analysis and Recognition

Combining of Off-line and On-line Feature Extraction Approaches for Writer Identification

Aymen Chaabouni, Houcine Boubaker, Monji Kherallah and Adel M. Alimi
REGIM: REsearch Group on Intelligent Machines, University of Sfax, National School of Engineers, BP 1173, Sfax, 3038, Tunisia
{ayman.chaabouni, houcine-boubaker, monji.kherallah, adel.alimi}@ieee.org

Haikal El Abed
Technische Universität Braunschweig, Institute for Communications Technology (IfN), Braunschweig, Germany
[email protected]

analysis. They use Gabor filters and co-occurrence matrices for feature extraction, and k-nearest neighbors with the Euclidean distance for identification. Bulacu et al. [5] used edge-based directional probability distributions as features for writer identification. Bensefia [6] completed the study made by Nosary [7] on writer invariants; this approach is based on comparing the respective graphemes of documents by means of a similarity measure. Schlapbach et al. [8] presented an on-line writer identification system based on Gaussian mixture models. To test the system, the writings of 200 writers were used; the identification rate was 88.96% at the text-line level and 98.56% at the paragraph level. In other work, Chapran and Fairhurst [9] proposed a method for dynamic writer identification that uses the relation between static and dynamic information in a handwritten text. To our knowledge, writer identification from on-line Arabic handwriting has not been addressed in the literature. However, some studies of off-line Arabic handwriting have been carried out. In a study of Arabic writer identification, Al-Zoubeidy et al. [10] adapted the approach proposed by Said et al. [4]: the features are extracted from the image of the writing using a Gabor filter and a multichannel co-occurrence matrix calculation. An identification rate of 92.8% was achieved with the Euclidean distance. Gazzah and Ben Amara [11] proposed the use of 2D discrete wavelet transforms for Arabic writer identification. Al-Ma'adeed [12] evaluated the performance of edge-based directional probability distributions in Arabic text-dependent writer identification. Another work on Arabic, using the benchmark IfN/ENIT database [13], [14], was carried out by Bulacu et al. [15], in which probability distribution functions are extracted.
We present in this paper a new method for text-dependent writer identification, in which the writer has to write the same text samples that were used in the training phase. This work extends, and is combined with, the work presented in [16], which identifies the writer from images of words. Our contribution in this study

Abstract: Writer identification remains a challenging area in the field of off-line handwriting recognition because only an image of the handwriting is available. Consequently, some information on the dynamics of writing, which is valuable for identifying the writer, is unavailable in off-line approaches, contrary to on-line approaches, where temporal and spatial information about the handwriting is available. In this paper we present a new method for writer identification based on Multi-Fractal features for both types of approaches. This method consists of extracting the multi-fractal dimensions from the images of Arabic words and from the on-line signals of the same words. In order to enhance the performance of our writer identification system, we have combined the on-line and off-line approaches, taking advantage of the ADAB database, which makes it possible to recover both the on-line signal and the image for the same handwriting. In this way, our work takes advantage of static and dynamic representations of handwriting in order to identify the writer in realistic conditions. The tests are performed on the writing of 100 writers from the ADAB database. The obtained results show the effectiveness of the proposed writer identification system.

Keywords: Writer Identification, Off-line, On-line, Multi-Fractal Features.

I. INTRODUCTION

Nowadays, it is important to identify or authenticate the author of a piece of writing or a signature. Indeed, person identification based on handwriting can be useful in a variety of applications: in banking activities that rely on the verification of signatures, and in criminal cases where a sample of handwriting is the only evidence available to investigators, as in the case of threat or ransom letters, bills of sale, wills, etc. Recently, remarkable progress has been made and different approaches have been proposed in the field of writer identification. These approaches are generally categorized as off-line, where only a scanned image of the handwriting is available, and on-line, where temporal and spatial information about the writing is available [1], [2], [3]. Among these approaches, we can cite the system of Said et al. [4], who present an off-line approach based on the texture

1520-5363/11 $26.00 © 2011 IEEE DOI 10.1109/ICDAR.2011.261


is to introduce the on-line features and their combination with the off-line approach. The main objective of this study is to exploit both the dynamic and the static information about the writing and to explore the utility of multi-fractal features as a new method for on-line and off-line writer identification. To evaluate our method, we used the ADAB database [17], [18], which makes it possible to recover the on-line and off-line signals for the same handwriting (Figure 4). The rest of the paper is organized as follows. In Section 2 we present the proposed methods. We then present our experimental results in Section 3. Finally, conclusions and future work are described in Section 4.

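$$\left\langle \left( \frac{M(R)}{M_0} \right)^{q-1} \right\rangle \approx \left( \frac{R}{L} \right)^{(q-1) D_q} \qquad (2)$$

where $\langle \cdot \rangle$ denotes the average over the centers.

B. Multi-Fractal Dimensions for On-line Handwriting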

In this sub-section, we present how multi-fractal analysis can be applied to on-line handwriting to exploit the dynamics of writing, such as velocity, pressure, and temporal and spatial information, which are not available with off-line data. For this purpose, we have adapted the DLA method presented in the previous sub-section in order to extract the multi-fractal dimensions for on-line handwriting. The procedure consists of randomly choosing N points. Then, to exploit the temporal order and the velocity of the writing, for every box of radius $R_i = R_1, R_2, \ldots, R_{max}$ centered on a randomly chosen point, we calculate the number of points inside this box, counting only the points that come before the current point in the temporal order (Figure 3). Thereafter, the counts are used to calculate $\log\langle (M_i(R)/M_0)^{q-1} \rangle/(q-1)$ versus $\log(R/L)$ for each value of q, where $\langle \cdot \rangle$ denotes the average over the centers. The $D_q$ values correspond to the slopes of the straight lines obtained by least-squares fitting. The main steps of the on-line multi-fractal feature extraction method are summarized in Figure 2.
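The temporal counting rule described above can be illustrated with a minimal sketch. The function name `count_past_points` is hypothetical, and two details are assumptions rather than statements from the paper: the box is taken to be a square of half-width `radius`, and the current point is counted together with its predecessors so that the count is never zero.

```python
def count_past_points(trajectory, center_index, radius):
    """Count points written no later than the current point that fall in its box.

    trajectory   -- list of (x, y) pen positions in temporal order
    center_index -- index of the current (center) point in the trajectory
    radius       -- half-width of the square box centered on the current point
    """
    cx, cy = trajectory[center_index]
    count = 0
    # Only indices 0..center_index are visited, so strokes written later are
    # ignored even when they pass through the same region of the page.
    for x, y in trajectory[: center_index + 1]:
        if abs(x - cx) <= radius and abs(y - cy) <= radius:
            count += 1
    return count


# A pen path that moves to the right and then comes back above its start.
path = [(0, 0), (1, 0), (2, 0), (3, 0), (2, 1), (1, 1), (0, 1)]
print(count_past_points(path, 1, 1))  # early point: few predecessors nearby
print(count_past_points(path, 5, 1))  # late point: earlier strokes are counted
```

Two points at the same position on the page can therefore receive different counts depending on when they were written, which is how the temporal order enters the features.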

II. PROPOSED METHODS

The proposed method is based on multi-fractal features. It consists of calculating the multi-fractal dimensions for the image and for the on-line signal of the same handwriting.

A. Multi-Fractal Dimensions for Off-line Handwriting

Given a textured image of size L in which $M_0$ pixels contain information, covered by boxes of size l, the multi-fractal dimensions $D_q$ of the image [19], [20] are defined by the following equation:

$$\sum_i \left( \frac{M_i}{M_0} \right)^{q} \approx \left( \frac{l}{L} \right)^{(q-1) D_q} \qquad (1)$$

where $M_i$ is the number of pixels in the i-th box, and q is a variable that makes it possible to distinguish fractal properties at different scales. A major difference between fractal (mono-fractal) [21], [22], [23], [24] and multi-fractal objects is that, for mono-fractals, $D_q$ is the same for all q between −∞ and +∞, whereas for multi-fractals, $D_q$ is a monotonically decreasing function of q over the same interval. In practice, the direct application of (1) has the disadvantage that the process is not able to resolve regions with high or low density of mass. This problem arises when q < 0. To solve it, a solution was proposed in [25], based on the generalized sandbox method that was used to demonstrate the multi-fractality of DLA (Diffusion Limited Aggregates). This procedure consists of randomly choosing N pixels belonging to the structure and counting, for every pixel i, the number of pixels $M_i$ inside boxes of linear dimension R centered on the selected pixel. The left-hand side of equation (1) can be interpreted as the average of the quantity $(M_i/M_0)^{q-1}$ according to the probability distribution $M_i/M_0$. When the centers of the boxes are chosen randomly, the averaging is made over this distribution, and consequently equation (1) takes the averaged form given in equation (2).
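As an illustration of the sandbox procedure just described, the following sketch estimates $D_q$ for a single value of q from a list of foreground pixel coordinates. It is a simplified reading of the method, not the authors' implementation: the helper names `sandbox_dq` and `least_squares_slope` are hypothetical, the boxes are assumed to be squares of half-width R, and the set of radii is chosen arbitrarily.

```python
import math
import random


def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[k], ys[k])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def sandbox_dq(pixels, size, q, radii, n_centers=100, rng=None):
    """Estimate D_q from a list of (x, y) foreground pixels (hypothetical helper).

    size is the image size L, radii the box half-widths R, and n_centers the
    number N of randomly chosen centers.
    """
    if q == 1:
        raise ValueError("q = 1 makes (q - 1) vanish and needs separate handling")
    rng = rng or random.Random()
    m0 = len(pixels)
    centers = rng.sample(pixels, min(n_centers, m0))
    log_sizes, log_moments = [], []
    for r in radii:
        total = 0.0
        for cx, cy in centers:
            # M_i(R): pixels inside the square box of half-width r around the center.
            m = sum(1 for x, y in pixels if abs(x - cx) <= r and abs(y - cy) <= r)
            total += (m / m0) ** (q - 1)
        # Average over the centers, then take log<...> / (q - 1) versus log(R / L).
        log_sizes.append(math.log(r / size))
        log_moments.append(math.log(total / len(centers)) / (q - 1))
    # D_q is the slope of the fitted straight line.
    return least_squares_slope(log_sizes, log_moments)


# Sanity check on a straight line of pixels, whose dimension should be close to 1.
line = [(x, 0) for x in range(1000)]
print(sandbox_dq(line, size=1000, q=2, radii=[2, 4, 8, 16, 32], rng=random.Random(0)))
```

Because every center is itself a foreground pixel, each box contains at least one pixel, so the negative powers that arise for q < 0 stay finite.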

[Figure 2. The Main Steps of On-line Multi-Fractal Features Extraction. Flowchart: select N random points; create boxes of radius $R_i$ centered on the randomly selected points; calculate the number of points in every box of radius $R_i = R_1 \to R_{max}$; calculate $\log\langle (M(R)/M_0)^{q-1} \rangle/(q-1)$ versus $\log(R/L)$ for a range of values of q; test whether the simulation number equals M: if no, run another simulation; if yes, calculate the final values of $D_q$ by averaging these M simulations.]

[Figure 1. Application of the method of DLA (extracted from [16]).]

[Figure 3. Multi-Fractal Dimensions for On-line Handwriting. The illustration shows nested boxes of radius R1, R2, ..., Rn centered on the current point, with an arrow indicating the direction of the temporal order of the writing. Legend: the points to be counted for the box of radius R1; the points to be counted together with the red points for the box of radius R2; the points to be counted together with the red and black points for the box of radius Rn.]

III. EXPERIMENTS AND RESULTS

Because the size of the handwriting varies greatly from one writer to another, a normalization procedure was applied to the scripts for both types of approaches. The multi-fractal dimensions are extracted from each binary word image. We randomly selected 100 pixels from the image ($M_0$ is about 400 pixels) and then applied the procedure described previously to extract the multi-fractal dimensions. This procedure is repeated 50 times (Figure 5). The final values of $D_q$ are obtained by averaging these 50 simulations. The same process is repeated to extract the multi-fractal dimensions for the on-line signals of the words, where 100 points are selected randomly from the on-line signal ($M_0$ is about 200 points) and the final values of $D_q$ are again obtained by averaging 50 simulations (Figure 6). Table I shows the results obtained for some Arabic words using the K-Nearest-Neighbor classifier.
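To make the TOP1 and TOP10 figures concrete, here is a minimal sketch of a nearest-neighbor ranking over feature vectors. The paper names only the K-Nearest-Neighbor classifier; the Euclidean distance, the reading of TOP-N as "the true writer is among the N best-ranked candidates", and the function names `rank_writers` and `top_n_rate` are assumptions made for this example.

```python
import math


def rank_writers(query, training_set):
    """Order writer labels by the distance of their closest training sample.

    training_set is a list of (writer_label, feature_vector) pairs.
    """
    best = {}
    for label, features in training_set:
        d = math.dist(query, features)  # Euclidean distance (assumed metric)
        if label not in best or d < best[label]:
            best[label] = d
    return sorted(best, key=best.get)


def top_n_rate(queries, training_set, n):
    """Fraction of queries whose true writer is among the n best candidates."""
    hits = 0
    for true_label, features in queries:
        if true_label in rank_writers(features, training_set)[:n]:
            hits += 1
    return hits / len(queries)


# Toy data with 2-dimensional features instead of the 41 values of D_q.
training = [("a", [0.0, 0.0]), ("a", [0.0, 1.0]), ("b", [5.0, 5.0]), ("c", [10.0, 10.0])]
tests = [("a", [0.0, 0.5]), ("b", [1.0, 1.0])]
print(top_n_rate(tests, training, 1), top_n_rate(tests, training, 2))
```

In the toy data the second query is closer to a sample of writer "a" than to its true writer "b", so it counts as a miss at rank 1 but as a hit at rank 2.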

We used the ADAB database for our experiments. We selected 100 writers, each of whom wrote the names of 24 Tunisian cities 12 times. Two thirds of the words were used for the training phase, and the rest were used for identification. The system was trained on the writing of the 24 Tunisian city names for each writer. The tests were then performed for each of these 24 city names.

[Figure 4. Image and On-line signal for the same Arabic Word. Panels: Off-line, textured image; On-line, coordinates (X, Y).]

Table I. REPRESENTATION OF SOME RESULTS

A very important step in calculating the multi-fractal dimensions is to determine the sequence of values of the moments q. In our case, the multi-fractal dimensions are calculated for −20 ≤ q ≤ 20 to obtain 41 features for each word image and 41 features for each on-line signal. For q = 1, Equation (2) is non-analytical, hence the choice of q ± ε, with ε = 0.001; the dimension is then computed as $D_q = (D_{q+\varepsilon} + D_{q-\varepsilon})/2$. The original data format is a binary image of the word for the off-line approach, and the coordinates (X, Y) of the word in temporal order for the on-line approach.
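The construction of the 41-value feature vector, including the special case q = 1, can be sketched as follows. The sketch assumes integer moments from −20 to 20, which is the reading consistent with 41 features. The function `multifractal_features` is hypothetical and takes any callable `dq` that returns an estimate of $D_q$ for q ≠ 1; `toy_spectrum` is a stand-in used only to exercise the code.

```python
def multifractal_features(dq, q_min=-20, q_max=20, eps=0.001):
    """Build the feature vector [D_q for q = q_min..q_max] from an estimator dq.

    dq is a callable that is only required to work for q != 1.
    """
    features = []
    for q in range(q_min, q_max + 1):
        if q == 1:
            # The estimate divides by (q - 1), so D_1 is replaced by the
            # mean of the two neighboring values D_{1+eps} and D_{1-eps}.
            features.append((dq(1 + eps) + dq(1 - eps)) / 2)
        else:
            features.append(dq(q))
    return features


def toy_spectrum(q):
    """Stand-in for a real D_q estimator: a decreasing line, undefined at q = 1."""
    if q == 1:
        raise ZeroDivisionError("D_q is not defined at q = 1")
    return 1.5 - 0.01 * q


vector = multifractal_features(toy_spectrum)
print(len(vector))
```

The estimator is never called at q = 1 itself, so the division by (q − 1) in the slope computation is avoided.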

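| Words | Off-line Features TOP1 | Off-line Features TOP10 | On-line Features TOP1 | On-line Features TOP10 | Combination TOP1 | Combination TOP10 |
|---|---|---|---|---|---|---|
| Word 1 (Arabic text illegible in source) | 80.9% | 96.5% | 84.6% | 98.5% | 93.2% | 98.5% |
| Word 2 (Arabic text illegible in source) | 80.1% | 96.5% | 83.2% | 97.1% | 92.5% | 98.2% |
| Word 3 (Arabic text illegible in source) | 79.2% | 94.8% | 82.4% | 96.5% | 91.9% | 97.4% |
| Word 4 (Arabic text illegible in source) | 78.7% | 93.2% | 81.6% | 94.1% | 90.8% | 96.6% |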

[Figure 5. Multi-Fractal spectrum for the image presented in Figure 1. Two panels plot Dq (from 0.9 to 1.8) against q (from −20 to 20).]

[Figure 6. Multi-Fractal spectrum for the On-line signal of the word presented in Figure 3. Two panels plot Dq (from 0.5 to 2.0) against q (from −20 to 20).]

The experimental results show that the on-line features characterize writing styles better than the off-line features. One explanation is that the dynamic features of writing are more discriminative than the static features. The best result is obtained for the first word listed in Table I, with an identification rate of 80.9% in TOP1 and 96.5% in TOP10 for the off-line features, and 84.6% in TOP1 and 98.5% in TOP10 for the on-line features. In order to improve the identification results, we combined both approaches. The best identification performance, a correct identification rate of 93.2% in TOP1, is obtained by combining the off-line and on-line multi-fractal features.
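The text does not state how the two feature sets are combined. One simple possibility, shown here purely as an assumption, is feature-level fusion: the off-line and on-line vectors are concatenated and compared with a single distance. The names `combine` and `nearest_writer` are hypothetical, and the toy values are invented for the illustration.

```python
import math


def combine(offline_features, online_features):
    """Concatenate the off-line and on-line vectors (assumed fusion scheme)."""
    return list(offline_features) + list(online_features)


def nearest_writer(query, references):
    """Return the writer whose reference vector is closest to the query."""
    return min(references, key=lambda label: math.dist(query, references[label]))


# Toy reference vectors: the two writers look alike off-line but differ on-line.
offline_refs = {"w1": [1.0, 1.0], "w2": [1.0, 1.1]}
online_refs = {"w1": [0.5, 0.5], "w2": [1.5, 1.5]}
combined_refs = {w: combine(offline_refs[w], online_refs[w]) for w in offline_refs}

# A sample written by w1 whose image happens to resemble w2 slightly more.
sample_offline = [1.0, 1.06]
sample_online = [0.55, 0.5]

print(nearest_writer(sample_offline, offline_refs))
print(nearest_writer(combine(sample_offline, sample_online), combined_refs))
```

In this toy case the off-line vector alone picks the wrong writer, while the concatenated vector recovers the right one, which mirrors the observation that errors made by one approach can be corrected by the other. Score-level fusion, where the two distances are computed separately and then merged, would be an equally plausible reading.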

IV. CONCLUSIONS

We presented a new method for writer identification. This method is based on the extraction of multi-fractal features from the images of words, as well as from their on-line signals. The fundamental objective of this study was to explore the potential utility of multi-fractal features as a new method to differentiate persons by means of their on-line and off-line writings. A comparison of the multi-fractal features for both types of approaches has shown the superiority of those applied to on-line handwriting. However, both types of approaches are complementary: the static information concerns the geometry of the writing and the pixel density, while the dynamic information concerns the temporal order and the speed of writing. Moreover, in most cases the errors occurring in one approach are corrected by the other. The result obtained by combining off-line and on-line features for the same handwriting shows that the multi-fractal features are rather powerful and lead to quite high recognition rates. The experimental studies presented in this paper were carried out on Arabic handwriting in both approaches, off-line and on-line. Since multi-fractal features are rather generic, their applicability to other scripts and languages, such as Latin, English, and Chinese, is of interest, and we plan to address it in future work.

In this work, writer identification is performed in text-dependent mode, where all writers write the same text; this is more difficult than signature identification, where each writer has his own signature. This reflects the fact that the inter-writer variability in writer identification is lower than in signature identification. Despite these difficulties, the recognition rates are very encouraging and demonstrate the effectiveness of multi-fractal features. The performance achieved by our approach using only the off-line features is better than the results reported by Al-Ma'adeed [12], which is, to our knowledge, the only work on writer identification using images of Arabic handwritten words with a text-dependent approach.


V. ACKNOWLEDGMENTS

This work was carried out within the framework of the DAAD project "On the way of information society". We are grateful to Dr. Volker Märgner for his helpful and valuable advice. In addition, we acknowledge the financial support of this work by grants from the General Direction of Scientific Research and Technological Renovation (DGRST), Tunisia, under the ARUB program 01/UR/11/02.

REFERENCES

[1] H. Boubaker, A. Chaabouni, M. Kherallah, A. M. Alimi, and H. El Abed, "Fuzzy Segmentation and Graphemes Modeling for Online Arabic Handwriting Recognition," in International Conference on Frontiers in Handwriting Recognition, 2010, pp. 695–700.

[2] M. Kherallah, L. Haddad, A. M. Alimi, and A. Mitiche, "Online handwritten digit recognition based on trajectory and velocity modeling," Pattern Recognition Letters, vol. 29, pp. 580–594, 2008.

[3] M. Kherallah, F. Bouri, and A. M. Alimi, "On-line arabic handwriting recognition system based on visual encoding and genetic algorithm," Engineering Applications of Artificial Intelligence, vol. 22, pp. 153–170, 2009.

[4] H. E. S. Said, G. S. Peake, T. N. Tan, and K. D. Baker, "Writer identification from non-uniformly skewed handwriting images," in Proceedings of the 9th British Machine Vision Conference, 1998, pp. 478–487.

[5] M. Bulacu, L. Schomaker, and L. Vuurpijl, "Writer identification using edge-based directional features," in Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2003, pp. 937–941.

[6] A. Bensefia, T. Paquet, and L. Heutte, "Grapheme based writer verification," in Proceedings of the 11th Conference of the Graphonomics Society (IGS), 2003, pp. 274–277.

[7] A. Nosary, L. Heutte, T. Paquet, and Y. Lecourtier, "Defining writer's invariants to adapt the recognition task," in Proceedings of the 5th International Conference on Document Analysis and Recognition (ICDAR), 1999, pp. 765–768.

[8] A. Schlapbach, M. Liwicki, and H. Bunke, "A writer identification system for on-line whiteboard data," Pattern Recognition, vol. 41, no. 7, pp. 2381–2397, July 2008.

[9] J. Chapran and M. C. Fairhurst, "Biometric writer identification based on the interdependency between static and dynamic features of handwriting," in Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition, 2006, pp. 505–510.

[10] L. M. Al-Zoubeidy and H. F. Al-Najar, "Arabic writer identification for handwriting images," in International Arab Conference on Information Technology, 2005, pp. 111–117.

[11] S. Gazzah and N. Ben Amara, "Arabic Handwriting Texture Analysis for Writer Identification Using the DWT-Lifting Scheme," in Ninth International Conference on Document Analysis and Recognition, vol. 2, 2007, pp. 1133–1137.

[12] S. Al-Ma'adeed, E. Mohammed, D. Al Kassis, and F. Al-Muslih, "Writer identification using edge-based directional probability distribution features for arabic words," in IEEE/ACS International Conference on Computer Systems and Applications, 2008, pp. 582–590.

[13] H. El Abed and V. Märgner, "The ifn/enit-database - a tool to develop arabic handwriting recognition systems," in IEEE International Symposium on Signal Processing and its Applications (ISSPA), 2007.

[14] H. El Abed and V. Märgner, "ICDAR 2009 – Arabic Handwriting Recognition Competition," International Journal on Document Analysis and Recognition, vol. 1433-2833, 2010.

[15] M. Bulacu, L. Schomaker, and A. Brink, "Text-independent writer identification and verification on offline Arabic handwriting," in Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR), vol. 2, 2007, pp. 769–773.

[16] A. Chaabouni, H. Boubaker, M. Kherallah, A. M. Alimi, and H. El Abed, "Fractal and Multi-fractal for Arabic Offline Writer Identification," in the 20th International Conference on Pattern Recognition, 2010, pp. 3793–3796.

[17] H. El Abed, M. Kherallah, V. Märgner, and A. M. Alimi, "ICDAR 2009 – Arabic Online Handwriting Recognition Competition," in Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR), vol. 3, July 2009, pp. 1388–1392.

[18] H. El Abed, M. Kherallah, V. Märgner, and A. M. Alimi, "On-line Arabic handwriting recognition competition – ADAB database and participating systems," International Journal on Document Analysis and Recognition, vol. 1433–2833, 2010.

[19] T. Vicsek, Fractal Growth Phenomena. World Scientific Publishing Co Pte Ltd, 1991.

[20] T. Stosic and B. D. Stosic, "Multifractal analysis of human retinal vessels," IEEE Transactions on Medical Imaging, vol. 25, pp. 1101–1107, 2006.

[21] B. Mandelbrot, Les Objets fractals : forme, hasard et dimension, survol du langage fractal. Flammarion, 1975.

[22] S. Ben Moussa, A. Zahour, A. Benabdelhafid, and A. M. Alimi, "New features using fractal multi-dimensions for generalized Arabic font recognition," Pattern Recognition Letters, vol. 31, no. 5, pp. 361–371, April 2010.

[23] N. Vincent, V. Bouletreau, P. Faure, H. Emptoz, and R. Sabourin, "Fractals and neurological methods in Handwriting Analysis," IGS, Italy, pp. 105–107, August 1997.

[24] N. Vincent, V. Bouletreau, H. Emptoz, and R. Sabourin, "How to Use Fractal Dimensions to Qualify Writings and Writers," Fractals, vol. 8, no. 1, pp. 85–97, 2000.

[25] T. Vicsek, F. Family, and P. Meakin, "Multifractal geometry of diffusion-limited aggregates," EPL (Europhysics Letters), vol. 12, no. 3, pp. 217–222, 1990.