2013 12th International Conference on Document Analysis and Recognition

Writer Identification and Writer Retrieval using the Fisher Vector on Visual Vocabularies

Stefan Fiel and Robert Sablatnig
Computer Vision Lab, Vienna University of Technology
Email: {fiel,sab}@caa.tuwien.ac.at

Abstract—In this paper a method for writer identification and writer retrieval is presented. Writer identification is the task of identifying the writer of a document from a database of known writers. In contrast to identification, writer retrieval is the task of finding documents in a database according to the similarity of their handwriting. The approach presented in this paper uses local features for these tasks. First, a vocabulary is calculated by clustering features using a Gaussian Mixture Model and applying the Fisher kernel. For each document image the features are calculated and the Fisher vector is generated using this vocabulary. The distance between these vectors is then used as a similarity measure for the handwriting and can be used for writer identification and writer retrieval. The proposed method is evaluated on two datasets, namely the ICDAR 2011 Writer Identification Contest dataset, which consists of 208 documents from 26 writers, and the CVL Database, which contains 1539 documents from 309 writers. Experiments show that the proposed method performs slightly better than previously presented writer identification approaches.

Figure 1. Sample page of the CVL Database where the writer used two pens (Writer Id: 201, Text Id: 3)

Figure 2. Sample page of the CVL Database where the writing style changes within the text (Writer Id: 191, Text Id: 1)

1520-5363/13 $26.00 © 2013 IEEE   DOI 10.1109/ICDAR.2013.114

I. INTRODUCTION

Writer identification is the task of identifying the writer of a handwritten document. A set of documents from known writers has to be available in advance so that a new document can be assigned to one of these writers. First, features are computed on the handwriting of a reference document; these features are then compared to the ones stored in the dataset. The writer with the highest similarity is then assigned to the document. Writer identification can thus be used for tasks in forensics and in historical document analysis, for instance to identify the writer of books whose authors are unknown.

The main goal of our approach is not to identify the writer but to find all documents written by the same writer as a reference document. This task, known as writer retrieval, is related to writer identification, but instead of having a database with features of known writers, the features of all documents have to be calculated and compared with the ones generated from a reference document. The documents are then ranked according to their similarity.

The challenges for writer identification and writer retrieval include the use of different pens, which changes the style of the writing for each person, the question of whether a text was written in a hurry, and the fact that one word is rarely written exactly the same way twice. Figure 1 shows an image taken from the CVL Database [1] in which a writer used two pens in one document. It can be seen that, at least for humans, the writing differs clearly; even the slant of the writing has changed. Figure 2 shows a document in which the writer changed the writing style within the text. It looks as if the middle part of the text was written very fast: the slant increases, the shape of several characters changes, and the writing looks different. Figure 3 shows two German documents of the CVL Database in which the same word appears four times at the beginning of a text line. The "D" is written in two different ways by the two writers, but other characters also differ from line to line. Another challenge, which is not covered by any database, is that the handwriting of a person may change over the years, which makes identification harder.

Figure 3. Sample pages which show the intra-writer difference (Writer Id: 923 and 26, Text Id: 6)

The current state of the art in writer identification can be divided into two groups: the first group analyzes the characters themselves, whereas the second uses textural features of the handwriting. When analyzing the characters themselves, the handwriting first has to be segmented from the background; the writer identification algorithm is therefore dependent on the segmentation method, which can introduce errors in the case of faded-out or blurred ink. After segmentation, features for individual writers can be calculated on the characters. Writer identification methods which use textural features skip this segmentation step, calculate features directly on the image, and are thus independent of segmentation. Brink et al. [2] show that with strong features 100 characters are sufficient for writer identification; with less powerful features, the text has to be longer than 200 characters. According to [2], the drawback of methods using textural features is that they are not as discriminative as features calculated on characters, so more text is needed for the correct identification of a writer.

This paper presents an approach using textural features, based on Fiel and Sablatnig [3], who use a Bag of Words (BoW) approach for writer identification and writer retrieval. In the proposed method, local features are first calculated on the image. Using a Gaussian Mixture Model (GMM) generated from a training dataset, the Fisher vector of each document image is calculated. The Fisher vector of one image is then compared to the vectors of other images, and the distance indicates the similarity between the handwritings.

This work is organized as follows: Section II gives a brief description of the current state of the art in writer identification. The methodology used is described in Section III, and the experiments and results are presented in Section IV. Finally, a short conclusion is given in Section V.

II. RELATED WORK

A writer identification method based on features extracted from text lines or characters is proposed by Marti et al. [4]. They use features like width, slant, and the three heights of the writing zones (descender height, ascender height, and the height of the writing itself). With a neural network classifier they achieve a recognition rate of 90.9%. Bulacu et al. [5] introduce new features which are extracted at character level, like the contour-hinge, the writer-specific grapheme emission, and the run-length. Using a k-NN classifier, a result of 89% is achieved. Grid Microstructure Features are introduced by Li and Ding [6] for writer identification: first the edges of the handwriting are extracted and the neighborhood of each border pixel is described; after computing global statistics, the probability density distribution of different pixel pairs is regarded as the feature representing the writing style. This method won the "ICDAR 2011 Writer Identification Contest" [7]. Jain and Doermann [8] propose a method for offline writer identification which uses K-adjacent segment features in a bag-of-features framework to model the handwriting. These features represent the relationship between sets of neighboring edges in an image and are used to form a codebook and to classify new handwritings using k-NN. With this approach they were able to win the "ICFHR 2012 Competition on Writer Identification" [9]. A binarization-free approach is presented by Hiremath et al. [10]: the writing is treated as a texture image, and writer identification can thus be seen as texture classification. They use subband images of the wavelet transform to compute co-occurrence matrices for 8 directions. When dealing with 30 writers at a time, the identification rate is 88%. The evaluation of all these methods has been carried out on different databases, so the results cannot be compared with one another. The proposed method is based on our previously presented work [3] and on the writer identification of Gordo et al. [11], who apply the bag of words model to musical scores.

III. METHODOLOGY

Our approach uses textural features for writer identification. First, local features are computed on the normalized image using the Scale Invariant Feature Transform (SIFT) of Lowe [12]. The Fisher kernels, which were introduced by Perronnin and Dance [13] and improved by Perronnin et al. [14], are applied to the visual vocabulary, which is then used to calculate a feature vector for each document image. This vector is then used to determine the similarity between two handwritings. In contrast to the method proposed in [3], where the BoW model was applied to the SIFT features and a histogram of occurrences was generated, this approach takes the distance from a feature to the center of a cluster into account. These steps will now be explained in detail.

A. Local Features

The first step of the proposed method is to calculate the SIFT features on the image. Most of the features lie on crossings, loops, and peaks of the characters and describe the curvature and slant of the writing. Because of the properties of handwriting, the rotation invariance of the SIFT features is restricted to 180 degrees: keypoints with an angle larger than 180 degrees have their angle mirrored, as proposed by Diem and Sablatnig [15] for character recognition. Thus, identical structures rotated by 180 degrees generate different SIFT features, while the descriptor remains robust to small rotation changes. Experiments show that this angle flipping of the keypoints can improve the results for writer identification by up to 5%, although, as the experiments in Section IV show, there was one task (namely the "Top 7" evaluation of the hard criterion) where the standard SIFT features performed better. Especially when there is only a small number of words in the image, the rotational dependence improves the performance by up to 5%. Figure 4 shows the same word written by two different writers, with selected SIFT features. The keypoints marked in the upper word generate descriptors similar to those marked in the lower word, even though they appear at completely different places in the writing. Since the upper and lower profiles of a handwriting are discriminative features, SIFT features with rotational dependence up to 180 degrees are used. The marked features of both words can therefore be distinguished, and the performance of the proposed method increases.

Figure 4. The same word written by two different writers, with some selected SIFT features. The marked keypoints in the upper word generate descriptors similar to the marked ones in the lower word. When using SIFT features with rotational dependence, these features can be distinguished easily.

Figure 5. Schematic partitioning of a 2D feature space when using the Fisher vector (colors) instead of k-means (dashed lines).
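The angle mirroring described above can be sketched as a small helper applied to each keypoint orientation before the descriptors are computed. This is our illustrative reading of the paper's scheme, not the authors' code; the function name and the OpenCV usage shown in the comments are assumptions.

```python
def mirror_angle(angle_deg: float) -> float:
    """Mirror a SIFT keypoint orientation into [0, 180).

    Angles of 180 degrees or more are flipped back by 180 degrees, so
    rotation invariance is disabled beyond half a turn: structures rotated
    by 180 degrees yield different descriptors, while the descriptor stays
    robust to small rotations.
    """
    return angle_deg - 180.0 if angle_deg >= 180.0 else angle_deg


# With OpenCV (hypothetical usage), this would be applied to each keypoint
# between detection and descriptor extraction:
#   sift = cv2.SIFT_create()
#   keypoints = sift.detect(image, None)
#   for kp in keypoints:
#       kp.angle = mirror_angle(kp.angle)
#   keypoints, descriptors = sift.compute(image, keypoints)
```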

B. Generating Vocabulary

Before a writer can be identified or the documents of a writer retrieved, a vocabulary has to be generated. This is done using a training set, which is different from the test set. The vocabulary describes common features of handwritings. For the generation of the vocabulary, SIFT features are calculated on the training set and then clustered. The visual words are represented by means of a GMM: each SIFT feature is considered an observation of a Gaussian mixture. With the Expectation-Maximization (EM) algorithm a GMM with a given number of distributions can be fitted. To speed up this fitting, as in [13], a Principal Component Analysis (PCA) is applied to the SIFT features to reduce their dimensionality from 128 to 64. The EM algorithm iteratively estimates the parameters of the different Gaussians. For each distribution three parameters are needed: the probability P(k|n) that a feature n stems from distribution k, the mean μ_k, and the covariance matrix Σ_k. To find the best values for the parameters, the overall likelihood is maximized. This estimation also yields the weight, or mixing coefficient, w_k of each distribution, which is the average responsibility that the component takes for explaining the features. The GMM with the parameters gained can then be seen as our vocabulary. Experiments showed that 50 Gaussians (clusters) performed best on the evaluation databases. In contrast to the BoW model, where k-means is used for clustering the feature space, here GMMs are used. Figure 5 shows a schematic partitioning of a 2D feature space: instead of the hard borders (dashed lines) generated by k-means, the Fisher kernel encodes additional information about the distribution of the descriptors (colors).

C. Generating the feature vector

With the vocabulary generated in the training step, a feature vector can be calculated for each image. Again, all SIFT features X = {x_t, t = 1...T} of the image are calculated and then the Fisher kernels are applied. The feature vector is computed by [14]:

    G_k^X = (1/√w_k) Σ_{t=1}^{T} P(k|x_t) · (x_t − μ_k)/σ_k

where G_k^X is the feature vector for one specific distribution k, w_k is the weight of the k-th distribution, and μ_k and σ_k are the mean and the standard deviation of that distribution. The feature vectors for all distributions are concatenated, resulting in a feature vector of dimension N·D, where N is the number of distributions and D the dimension of the features. In contrast to the BoW method, where k-means is used, this feature vector is not merely an occurrence histogram of the cluster centers: due to the influence of the probability by which distribution a SIFT feature is generated, the feature space is represented more precisely.

D. Calculating the similarity

Perronnin et al. [14] show that the cosine distance is a natural measure of similarity on Fisher vectors, but does not necessarily lead to optimal accuracy. For this reason they introduce the α normalization for the Fisher vector: every dimension of the vector is raised to the power of a value α ∈ [0, 1], using the same α for all dimensions. Our approach uses this distance with the same α as proposed in their paper (0.8), since it leads to the best results.
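As a minimal NumPy sketch of the steps above, assuming a diagonal-covariance GMM (weights w, means mu, standard deviations sigma) has already been fitted by EM, the Fisher vector, the α (power) normalization, and the cosine similarity could look as follows. Function names and the toy shapes are ours, not the authors'.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma):
    """First-order Fisher vector of descriptors X under a diagonal GMM.

    Per component k this computes
        G_k = (1 / sqrt(w_k)) * sum_t P(k|x_t) * (x_t - mu_k) / sigma_k
    and concatenates the G_k, giving a vector of dimension N*D for
    N components and D-dimensional descriptors.

    X: (T, D) descriptors; w: (K,) weights; mu, sigma: (K, D).
    """
    # Normalized deviations (x_t - mu_k) / sigma_k for every t, k.
    z = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]        # (T, K, D)
    # Posterior P(k|x_t) under the diagonal-covariance mixture.
    log_p = (np.log(w)[None, :]
             - 0.5 * np.sum(np.log(2.0 * np.pi * sigma ** 2), axis=1)[None, :]
             - 0.5 * np.sum(z ** 2, axis=2))                        # (T, K)
    log_p -= log_p.max(axis=1, keepdims=True)                       # stabilize
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    # Accumulate posterior-weighted deviations and scale by 1/sqrt(w_k).
    G = np.einsum('tk,tkd->kd', post, z) / np.sqrt(w)[:, None]      # (K, D)
    return G.ravel()

def power_normalize(v, alpha=0.8):
    """Alpha normalization: raise each dimension to the power alpha, keeping the sign."""
    return np.sign(v) * np.abs(v) ** alpha

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Two documents would then be compared via `cosine_similarity(power_normalize(fv1), power_normalize(fv2))`; fitting the 50-component GMM itself (e.g. with an EM implementation) is omitted here.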

Table I
SOFT CRITERION RESULTS ON THE ICDAR 2011 DATASET (IN %), "TOP 1" AND "TOP 2" COLUMNS

complete dataset    Top 1   Top 2
Tsinghua             99.5    99.5
MCS-NUST             99.0    99.5
Tebessa C            98.6   100.0
proposed method      99.5   100.0

cropped dataset     Top 1   Top 2
Tsinghua             90.9    93.8
MCS-NUST             82.2    91.8
Tebessa C            87.5    92.8
proposed method      91.3    94.2

IV. EXPERIMENTS AND RESULTS

The experiments have been carried out on two databases. The first is the database of the ICDAR 2011 Writer Identification Contest [7], which consists of 26 writers who have each written the same 8 documents in four different languages (English, Greek, German, and French). A second version of this dataset contains the same documents, but only the first two text lines of each document. The other database used is the CVL Database [1], in which 309 writers have written the same 5 text parts, and which thus consists of 1545 pages. For each writer there is one text in German; the 4 others are in English. Both databases provide evaluations of previous writer identification methods. To train the GMM, a different database, namely the TrigraphSlant dataset [16], has been used to ensure independence from the evaluation databases.

The evaluation is done in the same way as in the ICDAR 2011 contest. For each document, a ranking of the similarity to the other documents is generated, and the top N documents are checked as to whether they are from the same writer or not. Two criteria have been defined: for the soft criterion, at least one of the top N documents has to be from the same writer as the reference document to count as a correct hit. For the hard criterion, all N documents have to be written by the same writer. The values of N used for the soft criterion are 1, 2, 5, and 10. For the hard criterion the values 2, 5, and 7 are used for the ICDAR 2011 dataset and 2, 3, and 4 for the CVL Database. The values differ because of the different numbers of documents per writer: since in the ICDAR 2011 dataset each writer has 8 documents, 7 is the maximal number which can be used for the hard criterion; for the CVL Database the maximal number is 4.

To evaluate writer retrieval a new criterion is introduced, since for writer retrieval it is important to find all documents written by the same writer as the reference document. We calculate the percentage of correct documents in the top N of the ranking. For the ICDAR 2011 dataset the same N values as for the hard evaluation are used. Again, they differ between the two databases, because with a value of N higher than the number of remaining documents per writer, 100% cannot be achieved; for the CVL Database the values 2, 3, and 4 are evaluated.

The results of the experiments for the soft criterion on the ICDAR 2011 dataset are presented in Table I, together with the results of the three best-ranked methods of the contest: "Tsinghua" [6], "Tebessa C" [8], and "MCS-NUST".
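The three evaluation criteria described above can be sketched as follows; this is an illustrative toy implementation (not the contest's evaluation code), where a ranking is assumed to be a list of writer IDs ordered by decreasing similarity.

```python
def soft_criterion(ranked_writers, query_writer, n):
    """Soft criterion: at least one of the top-N documents is by the query's writer."""
    return query_writer in ranked_writers[:n]

def hard_criterion(ranked_writers, query_writer, n):
    """Hard criterion: all of the top-N documents are by the query's writer."""
    top = ranked_writers[:n]
    return len(top) == n and all(wr == query_writer for wr in top)

def retrieval_criterion(ranked_writers, query_writer, n):
    """Retrieval criterion: percentage of the top-N documents by the query's writer."""
    return 100.0 * sum(wr == query_writer for wr in ranked_writers[:n]) / n

# Toy example: a ranking of four retrieved documents by writer ID.
ranking = ["w1", "w2", "w1", "w1"]
print(soft_criterion(ranking, "w1", 2))       # True  (one hit in the top 2)
print(hard_criterion(ranking, "w1", 2))       # False ("w2" breaks the run)
print(retrieval_criterion(ranking, "w1", 3))  # 66.66... (2 of 3 correct)
```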

Table I (continued)
SOFT CRITERION RESULTS ON THE ICDAR 2011 DATASET (IN %), "TOP 5" AND "TOP 7" COLUMNS

complete dataset    Top 5   Top 7
Tsinghua            100.0   100.0
MCS-NUST             99.5    99.5
Tebessa C           100.0   100.0
proposed method     100.0   100.0

cropped dataset     Top 5   Top 7
Tsinghua             98.6    99.5
MCS-NUST             96.6    97.6
Tebessa C            97.6    99.5
proposed method      97.6    99.5

Table II
HARD CRITERION RESULTS ON THE ICDAR 2011 DATASET (IN %)

complete dataset    Top 2   Top 5   Top 7
Tsinghua             95.2    84.1    41.4
MCS-NUST             93.3    78.9    39.9
Tebessa C            97.1    81.3    50.0
proposed method      96.2    89.9    55.3

cropped dataset     Top 2   Top 5   Top 7
Tsinghua             79.8    48.6    12.5
MCS-NUST             71.6    35.6    11.1
Tebessa C            76.0    34.1    14.4
proposed method      81.3    47.1    15.4

Table III
RETRIEVAL CRITERION RESULTS ON THE ICDAR 2011 DATASET (IN %)

proposed method     Top 2   Top 3   Top 5   Top 7
complete dataset     99.3    98.7    97.2    91.2
cropped dataset      87.7    84.6    79.3    69.2

When using "Top 1" on the complete dataset, only one document was not retrieved correctly. For the other values, at least one document of the same writer is within the top N ranked documents. When the cropped dataset is used, the results for "Top 1" and "Top 2" are slightly better than those of the competition participants, whereas for "Top 5" the result is one percent worse than that of the best participant; the "Top 7" results are equally good. Table II shows the evaluation of the hard criterion. When using the complete dataset, the proposed method performed up to 5.4% better than the participants of the competition. When the mirroring of SIFT keypoint angles above 180 degrees is disabled, the result for "Top 7" increases to 59.1%. This result was achieved although all 7 top-ranked documents have to be from the same writer as the reference document, i.e. all other documents in the dataset written by the same writer have to be found. When using the cropped dataset, the proposed method achieved slightly higher results for "Top 2" and "Top 7". For the newly introduced writer retrieval criterion, no results are available for the participants of the ICDAR contest; thus only our results are listed in Table III. So when regarding the first two documents in the ranking


for the complete dataset, still 99.3% of these documents are assigned to the correct writer. This value drops to 91.2% when the first seven documents are taken into account. For the cropped dataset, the result for the first two documents is 87.7% and drops to 69.2% when the first seven documents of the ranking are regarded.

Next, experiments were carried out on the CVL Database. The results for the soft criterion are presented in Table IV, together with the results of two participants of the ICDAR 2011 contest; the third is missing in the evaluation of the database. In these experiments all methods perform nearly equally.

Table IV
SOFT CRITERION RESULTS ON THE CVL-DATABASE (IN %)

                    Top 1   Top 2   Top 5   Top 10
Tsinghua             97.7    98.3    99.0    99.1
Tebessa C            97.6    97.9    98.3    98.5
proposed method      97.8    98.6    99.1    99.6

The results of the hard criterion are presented in Table V. For "Top 2" and "Top 4" the results differ only slightly; only for "Top 3" did Tsinghua perform 5% better.

Table V
HARD CRITERION RESULTS ON THE CVL-DATABASE (IN %)

                    Top 2   Top 3   Top 4
Tsinghua             95.3    94.5    73.0
Tebessa C            94.3    88.2    73.0
proposed method      95.6    89.4    75.8

The results for the writer retrieval task are presented in Table VI. Again, the proposed method performs minimally better than the other algorithms. Remarkable is the result of the "Top 3" evaluation, where Tsinghua performed 5% better in the hard evaluation.

Table VI
RETRIEVAL CRITERION RESULTS ON THE CVL-DATABASE (IN %)

                    Top 2   Top 3   Top 4
Tsinghua             96.8    94.5    90.2
Tebessa C            96.1    94.2    90.0
proposed method      97.1    95.1    91.4

V. CONCLUSION

A method for writer identification and writer retrieval has been presented. The method uses SIFT features to describe the characteristics of the handwriting. The features of a training set are clustered using a GMM, which is used as a vocabulary. Each feature of a document is then predicted using this GMM and the Fisher vector is generated. After a power normalization, the vectors are compared using the cosine distance. The proposed method has been evaluated on two different datasets, and it has been shown that it can keep up with the state of the art and performs slightly better for writer retrieval tasks.

ACKNOWLEDGMENT

The authors would like to thank the Fraunhofer-Institute for Production Systems and Design Technology (IPK), Berlin for supporting the work.

REFERENCES

[1] M. Diem, S. Fiel, F. Kleber, and R. Sablatnig, "CVL-Database: An Off-line Database for Writer Retrieval, Writer Identification and Word Spotting," in ICDAR (forthcoming), 2013.
[2] A. Brink, M. Bulacu, and L. Schomaker, "How much handwritten text is needed for text-independent writer verification and identification," in ICPR, Dec. 2008, pp. 1–4.
[3] S. Fiel and R. Sablatnig, "Writer Retrieval and Writer Identification Using Local Features," in DAS, M. Blumenstein, U. Pal, and S. Uchida, Eds. IEEE, 2012, pp. 145–149.
[4] U.-V. Marti, R. Messerli, and H. Bunke, "Writer identification using text line based features," in ICDAR, 2001, pp. 101–105.
[5] M. Bulacu, L. Schomaker, and L. Vuurpijl, "Writer identification using edge-based directional features," in ICDAR, Aug. 2003, pp. 937–941.
[6] X. Li and X. Ding, "Writer Identification of Chinese Handwriting Using Grid Microstructure Feature," in Advances in Biometrics, ser. Lecture Notes in Computer Science, M. Tistarelli and M. Nixon, Eds. Springer Berlin/Heidelberg, 2009, vol. 5558, pp. 1230–1239.
[7] G. Louloudis, N. Stamatopoulos, and B. Gatos, "ICDAR 2011 Writer Identification Contest," in ICDAR, 2011, pp. 1475–1479.
[8] R. Jain and D. Doermann, "Offline Writer Identification Using K-Adjacent Segments," in ICDAR, Sept. 2011, pp. 769–773.
[9] G. Louloudis, B. Gatos, and N. Stamatopoulos, "ICFHR 2012 Competition on Writer Identification, Challenge 1: Latin/Greek Documents," in ICFHR, 2012, pp. 825–830.
[10] P. Hiremath, S. Shivashankar, J. Pujari, and R. Kartik, "Writer identification in a handwritten document image using texture features," in International Conference on Signal and Image Processing, Dec. 2010, pp. 139–142.
[11] A. Gordo, A. Fornés, E. Valveny, and J. Lladós, "A bag of notes approach to writer identification in old handwritten musical scores," in DAS. ACM, 2010, pp. 247–254.
[12] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[13] F. Perronnin and C. Dance, "Fisher Kernels on Visual Vocabularies for Image Categorization," in CVPR, 2007, pp. 1–8.
[14] F. Perronnin, J. Sánchez, and T. Mensink, "Improving the Fisher kernel for large-scale image classification," in ECCV. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 143–156.
[15] M. Diem and R. Sablatnig, "Recognizing Characters of Ancient Manuscripts," in Proceedings of the IS&T/SPIE Conference on Computer Image Analysis in the Study of Art, D. G. Stork and J. Coddington, Eds., vol. 7531, 2010.
[16] A. Brink, R. Niels, R. van Batenburg, C. van den Heuvel, and L. Schomaker, "Towards robust writer verification by correcting unnatural slant," Pattern Recognition Letters, vol. 32, no. 3, pp. 449–457, 2011.
