
On the Recognition of Online Handwritten Mathematics Using Feature-Based Fuzzy Rules and Relationship Precedence

Ray Genoe and Tahar Kechadi

Abstract—This paper describes an online recognition system, based on a pen-based interface, that can be used to analyse the structure of mathematical expressions. The topics covered include a feature-based spatial analysis technique and a procedure for constructing expressions based on relationship precedence. The algorithm discussed combines symbol recognition, spatial analysis and parsing techniques to generate an expression tree of the given mathematical expression.



I. INTRODUCTION

RESEARCH into the automatic recognition of handwritten expressions has recently been driven by the advent of powerful pen-based interfaces and the desire to combine the natural advantages of handwritten input with the data processing capability of computers [1], [2], [3], [4], [5], [6], [7], [8]. A number of researchers have created systems that can recognise a user’s handwriting and use this ability to perform simple tasks such as writing memos [9]. Compared with mathematical expressions, this is a relatively straightforward undertaking because the input is linear. The process usually involves sorting symbols with similar baselines from left to right. Many of these textual input systems do not offer mathematical expression recognition. This avoids several sources of complexity: the multi-tiered nature of sophisticated symbols such as fractions and integrals, the numerous relationships that mathematical expressions may contain, and the many ambiguities that this added complexity can introduce. Three main topics arise in the recognition of handwritten mathematical expressions: symbol recognition, spatial relationships and grammar. As symbol recognition has been widely explored, this paper is primarily concerned with the structural analysis of handwritten mathematical expressions.

Manuscript received December 14, 2007. Ray Genoe is with the School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland (phone: +353 1 7162403; email: [email protected]). Tahar Kechadi is with the School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland (phone: +353 1 7162478; email: [email protected]).

A. Problem Statement

When recognising handwritten mathematical expressions, the position and size of one symbol relative to another carry embedded mathematical meaning and must be investigated. Furthermore, the relationships discovered must be assembled so that they form a grammatically correct mathematical expression. A method for identifying the spatial relationships between symbols must therefore be established. It must also be able to work in tandem with an efficient procedure for determining the overall structure of the expression.

An online approach was adopted because of the advantages online recognition has over offline recognition, chiefly efficiency and interactivity. Most offline systems perform all structural analysis only after the user has finished writing and pressed a “button”. This can take considerably longer than in online systems. An efficient online system displays its interpretation each time the user lifts the pen after a stroke. Because structural analysis is performed after each stroke, the user gains a further benefit: misrecognised symbols or relationships can easily be identified and corrected as soon as they appear.

A previous attempt by the author to solve the problem of structural analysis used the symbols’ bounding boxes for spatial analysis [10]. Further testing showed that bounding boxes alone were not sufficient when analysing some symbols and relationships. In Figure 1, the first pair of symbols, b and c, should be interpreted as a multiplication relationship, and the second pair should be interpreted as a subscript relationship. The bounding boxes are identical for both pairs. Even so, humans recognise that the features of the letter b, primarily its ‘o-shape’, determine how it relates to the letter c. A similar problem arises when distinguishing the multiplication and superscript relationships between the two pairs of symbols g and c. This suggests that special consideration should be given to certain symbols when determining implicit relationships.

Fig. 1. Bounding Box Problem

This previous attempt also tried to maintain a constraint-free environment, meaning that a user could enter symbols in whichever order they preferred. While this was a useful approach, it became obvious that maintaining this environment would have negative implications for the scalability of the system. Mathematical expressions involve many symbols and relationships, so scalability is a very important issue and must be considered with respect to structural analysis. As a result, the previous approach to constructing the expression from the relationships discovered needed to be revised.

The next subsection reviews related research into the analysis of handwritten mathematical expressions. Section II then outlines an approach to some of the problems encountered in structural analysis. It describes a feature-based spatial analysis technique, an appropriate method for constructing mathematical expressions, and the algorithm that combines these techniques with the existing symbol recogniser.

B. Related Work

Chan and Yeung proposed a method to increase the efficiency of parsing mathematical expressions based on Definite Clause Grammar (DCG). DCG was used as a formalism to define a set of replacement rules for parsing mathematical expressions, but it is notoriously inefficient because of its frequent use of backtracking. They improved its efficiency by using left-factored rules and by preprocessing binding and fence symbols [8]. However, they did not address ambiguity resolution, error detection or error correction, and described these as issues for further research.

One system handles the problems caused by the ambiguous nature of handwriting with a soft-decision approach [1]. If ambiguity arises when analysing the relationship between two symbols, alternative solutions are generated. The output string produced by each alternative must be syntactically verified; otherwise that alternative is considered invalid. Generating alternative solutions in this way is highly important for handwriting recognition, as ambiguities arise regularly.
The authors did not report the complexity of the system, which could be an important factor when maintaining alternative solutions. Kosmala and Rigoll [11] presented a system based on Hidden Markov Models (HMMs), which have the advantage of segmenting and recognising symbols simultaneously. This avoids complex pre-processing, and the results are used in the spatial analysis phase of the system. Furthermore, HMMs have excellent learning capabilities and can adapt quickly to a new user’s handwriting style. However, the input is temporal and there is no two-dimensional grammar, so constraints are placed on the order in which the user enters symbols. For example, to enter a fraction the user must first enter the numerator, then the dividing line and finally the denominator. Zanibbi et al. proposed a method based on tree transformations [5], in which a recursive search identifies linear structures in an expression and constructs a Baseline Structure Tree. This tree is then subjected to lexical analysis to produce the final tree.

Garain et al. [12] approached the task by segmenting the expression into atomic boxes and then repeatedly merging adjacent boxes according to a set of production rules corresponding to spatial relationships. Their system analysed the expression once the user had finished entering it, i.e., offline. However, to increase efficiency, it did gather online information such as bounding box coordinates and the positions of symbols relative to the baselines of others. The authors in [13] proposed an approach called Fuzzy Shift-Reduce Parsing (FSRP). This approach builds on traditional shift-reduce parsing methods, which provide syntax checking and a basis for efficiency. Fuzzy logic is introduced to cope with the imprecision of handwritten input. If ambiguities arise, multiple parses are pursued and the most likely expression tree is selected as the result. However, setting the parameters of the individual rules is not an easy task.

II. PROPOSED APPROACH

A. Spatial Analysis

As mentioned previously, bounding box information alone may not be sufficient for analysing the relationships between symbols in some users’ handwriting. Identifying implied relationships, such as multiplication and scripts, can benefit from additional information about the features of the symbols in question. We therefore propose a new feature-based fuzzy rules technique for spatial analysis. Information about the features of various symbols is gathered during the symbol recognition phase. It is then used to establish the limits of the fuzzy functions that determine relationship confidence. Figure 2 illustrates how the system could determine these thresholds for a superscript relationship. The thresholds on the left (UL) represent the area where the top of the scripted symbol should appear. The thresholds on the right (LL) represent the area where the bottom of the scripted symbol should appear.

Fig. 2. Feature Based Thresholds

All thresholds are determined by the features of the left-hand symbol. The limits a, b and c are determined from the symbol’s top coordinate and height. The limits d, e and f are determined by its bottom feature (i.e., the ‘o-shape’). An example of a fuzzy function is illustrated in Figure 3. Optimal displacement lies between the limits c and d and returns a confidence of 100%.
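One way to picture such a fuzzy function is as a piecewise-linear (trapezoidal) membership function over vertical displacement. The sketch below is an illustration only and not the authors’ implementation. The text fixes only the plateau: displacement between c and d returns 100% confidence. The shape outside the plateau, and the exact roles of a, b, e and f, are not specified. We therefore assume linear ramps down to zero at two hypothetical outer limits, `zero_low` and `zero_high`.

```python
def fuzzy_confidence(x: float, zero_low: float, c: float, d: float, zero_high: float) -> float:
    """Trapezoidal membership over a vertical displacement x.

    Returns 1.0 on [c, d] (the optimal band named in the text) and ramps
    linearly to 0.0 at zero_low and zero_high. The outer limits and the
    linear ramps are illustrative assumptions.
    """
    if not (zero_low <= c <= d <= zero_high):
        raise ValueError("limits must satisfy zero_low <= c <= d <= zero_high")
    if c <= x <= d:
        return 1.0                                   # optimal displacement: full confidence
    if x <= zero_low or x >= zero_high:
        return 0.0                                   # outside the support: no confidence
    if x < c:
        return (x - zero_low) / (c - zero_low)       # rising edge between zero_low and c
    return (zero_high - x) / (zero_high - d)         # falling edge between d and zero_high


# Hypothetical limits (e.g. in pixels), as if derived from the left-hand symbol's features
print(fuzzy_confidence(15, 0, 10, 20, 30))  # 1.0
print(fuzzy_confidence(5, 0, 10, 20, 30))   # 0.5
```

In the feature-based scheme, the limit values passed in would come from the recognised symbol (its top, height and ‘o-shape’ bottom) rather than from its bounding box alone.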

Fig. 3. The Fuzzy Function

Fig. 4. Undesired relationship discovered

The method described above determines only one component of relationship confidence, namely vertical confidence. Most relationships are investigated by considering both vertical and horizontal confidence, and in some cases relative size is also used to weight the decision. Once the relationships have been determined for a symbol, the best relationship found is presented to the system for semantic analysis.

B. Semantic Analysis

Any relationships discovered for a new symbol are represented in an expression tree once they have been verified as semantically correct. To verify this, the system performs the following steps:
1) Test for regressive and undesired relationships.
2) Gather groups of symbols, such as numeric strings.
3) Gather relationships of equal or lesser precedence than the new relationship (REL).
4) Test for regressive and undesired relationships.

Progressive entries are symbols added in the natural handwriting order followed by most users. Regressive entries are symbols added to an expression in an order that departs from it. Regressive entries can occur for many reasons: the user may simply have an unusual handwriting style, or may have forgotten to enter a symbol or stroke earlier. Progressive entries are usually written left-to-right and top-down. For example, to enter a fraction to the right of a ‘+’ symbol, a user would write the plus, then the numerator, then the dividing line and finally the denominator. Some progressive entries vary, such as the summation function, for which users normally enter the ∑ symbol, then the upper bound, followed by the lower bound.

Undesired relationships are relationships that should not have been presented for semantic analysis. They usually occur when multiple relationships with similar confidence values have been identified between the new symbol and various other symbols. Figure 4 illustrates an example. The new symbol 2 is added to an existing superscript expression, and the system incorrectly presents the best relationship for the new symbol as superscript rather than concatenation.
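The selection of a best relationship, and the near-ties that lead to undesired relationships, can be illustrated with a small sketch. The text states only three things: vertical and horizontal confidence are combined, relative size sometimes weights the decision, and the best candidate is presented. Everything else here is an assumption: the multiplicative combination rule, the `Candidate` type, the relationship labels and the `margin` used to flag near-ties.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    kind: str                 # relationship type, e.g. "superscript" (labels are illustrative)
    other: str                # existing symbol the new symbol would relate to
    vertical: float           # vertical confidence in [0, 1]
    horizontal: float         # horizontal confidence in [0, 1]
    size_weight: float = 1.0  # optional relative-size weighting (assumed multiplicative)

    def confidence(self) -> float:
        # Assumed combination rule: product of the component confidences.
        return self.vertical * self.horizontal * self.size_weight


def best_relationship(cands: List[Candidate], margin: float = 0.1) -> Tuple[Optional[Candidate], bool]:
    """Return the best candidate and a flag that is True when the runner-up is within `margin`."""
    if not cands:
        return None, False
    ranked = sorted(cands, key=lambda c: c.confidence(), reverse=True)
    near_tie = len(ranked) > 1 and ranked[0].confidence() - ranked[1].confidence() < margin
    return ranked[0], near_tie


cands = [
    Candidate("superscript", "x", vertical=0.9, horizontal=0.8),
    Candidate("concatenation", "3", vertical=0.85, horizontal=0.8),
]
best, near_tie = best_relationship(cands)
print(best.kind, near_tie)  # superscript True
```

A near-tie flag of this kind only marks the situation the text describes as risky. The paper does not say how, or whether, such ties are resolved.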

Regressive entries and undesired relationships can easily be identified by examining the expression tree. Figure 5 shows how the tree might look before the new symbol in Figure 4 was added. Because the system relies on entries being progressive, it must examine the node representing the expression on the left-hand side of a newly discovered relationship. If this node does not exist, or is the left child of an internal node, then an error has occurred. The system’s ability to identify these cases is very important when constructing the expression, and could later form a useful part of the error detection process. However, the author would like to stress that dealing with regressive entries and undesired relationships is an area for future work.
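The tree test described above can be sketched in a few lines. The check itself follows the text: the node for the left-hand side of the new relationship must exist and must not be the left child of an internal node. The `Node` class with parent pointers, the `"^"` label and the example tree are assumed for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    label: str
    left: Optional[Node] = None
    right: Optional[Node] = None
    parent: Optional[Node] = field(default=None, repr=False)

    def set_children(self, left: Optional[Node], right: Optional[Node]) -> None:
        """Attach children and keep their parent pointers consistent."""
        self.left, self.right = left, right
        for child in (left, right):
            if child is not None:
                child.parent = self


def is_regressive_or_undesired(lhs: Optional[Node]) -> bool:
    """Rule from the text: an error has occurred if the node for the l.h.s.
    of the new relationship does not exist or is the left child of an internal node."""
    if lhs is None:
        return True
    return lhs.parent is not None and lhs.parent.left is lhs


# Hypothetical tree for a superscript expression x^2
x, two = Node("x"), Node("2")
sup = Node("^")
sup.set_children(x, two)
print(is_regressive_or_undesired(x))     # True: x is the left child of "^"
print(is_regressive_or_undesired(sup))   # False: the root is not a left child
print(is_regressive_or_undesired(None))  # True: no l.h.s. node exists
```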

Fig. 5. Using the tree to detect regressive/undesired relationships

Once it has been established that the new relationship (REL) is neither regressive nor undesired, the system can group sub-expressions to reflect the true nature of the relationship. As REL is progressive, only the left/upper symbol (c1) needs to be investigated; the right/lower symbol (c2) is the new symbol. Grouping sub-expressions consists of two parts:
1) If c1 is the right-most symbol in a string (e.g., “cos”, where “s” is c1) or in a multiplication, the corresponding sub-expression is set as c1.
2) Next, investigate the relationships around c1, and keep extending the sub-expression c1 while the relationships satisfy the following conditions:
   a. If REL is not a numerator relationship, the relationship must be of lesser precedence than REL and satisfy horizontal confidence with c2.
   b. If REL is a numerator relationship, the relationship must be of lesser precedence than REL and satisfy numerator confidence with c2.
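The growth loop in part 2 can be sketched as follows. The stopping conditions are taken from the text. The numeric precedence table is hypothetical; the paper does not give one. The neighbour list, the relationship labels and the `confidence_ok` callback are also hypothetical. The callback stands in for horizontal confidence, or for numerator confidence when REL is a numerator relationship.

```python
from typing import Callable, List, Tuple

# Hypothetical precedence table: a larger number means higher precedence.
PRECEDENCE = {"addition": 1, "multiplication": 2, "superscript": 3, "numerator": 4}


def grow_lhs(
    c1: str,
    neighbours: List[Tuple[str, str]],
    rel_kind: str,
    c2: str,
    confidence_ok: Callable[[str, str], bool],
) -> List[str]:
    """Extend the l.h.s. sub-expression that starts at c1.

    `neighbours` lists (relationship kind, symbol) pairs, visited right-to-left
    from c1. Growth continues while the relationship is of lesser precedence
    than REL and the symbol passes `confidence_ok` against c2.
    """
    group = [c1]
    for kind, symbol in neighbours:
        if PRECEDENCE[kind] >= PRECEDENCE[rel_kind] or not confidence_ok(symbol, c2):
            break                      # a condition fails: stop extending the sub-expression
        group.insert(0, symbol)        # prepend, since neighbours lie to the left of c1
    return group


# "a + b" written above a new fraction line: every symbol lies above the line.
above_line = {"a", "+", "b"}
print(grow_lhs("b", [("addition", "+"), ("addition", "a")], "numerator", "frac_line",
               lambda s, _c2: s in above_line))  # ['a', '+', 'b']
```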

Once REL has been updated with the appropriate sub-expression and verified as semantically correct, it may be represented in the expression tree. The algorithm for the entire process is given below.

C. The Algorithm
1) Identify the symbol (S) created by a new stroke. Resolve any multi-stroke symbol issues, if they exist.
2) If S is the first symbol added, go to 5; otherwise go to 3.
3) Determine all relationships shared by the new symbol using the spatial analysis technique.
4) If no relationship is found, go to 5; otherwise go to 6.
5) Create a node for S and go to 8.
6) Present the best relationship (REL) found for semantic analysis:
   a. Test for regressive and undesired relationships; if found, go to 5.
   b. Use the tree to get the appropriate sub-expression for the l.h.s. of REL.
   c. Test for regressive and undesired relationships; if found, go to 5.
7) Update the tree to include the semantically correct REL.
8) Await the next stroke.

D. Updating the Tree

Every internal node in the tree contains a sub-expression that reflects the contents of the nodes below it. A number of steps must therefore be carried out when adding new relationships to the tree. The relationship must be built appropriately, and the tree must be updated from the node for this relationship all the way up to the root. The root node of the tree then contains the overall expression. For every explicit binary operator, the user is required to add the left/upper operand, followed by the operator and then the right/lower operand. This means that two relationships are required to determine the confidence of a binary relationship: the left/upper and the right/lower. When the user adds the second relationship, the system identifies the node (X) in the tree that contains the operand. It then creates a node for the right/lower operand and sets this node as the right child of X. A different approach is required when dealing with certain groups of symbols. For example, suppose the user wishes to enter the mixed number 2¾.
The user enters the symbols one at a time, and the system correctly identifies the superscript relationship 2^3 when the second symbol is added. Once the remaining symbols have been added to complete the fraction, the system must revise this superscript relationship. The group of symbols forming the fraction does not have a superscript relationship with the 2. If the system fails to recognise this, it will produce an incorrect interpretation of the expression (i.e., 2^(3/4) instead of 2¾). Therefore, while a fraction is being created, it is split off from the rest of the tree until it is complete (i.e., both the numerator and the denominator have been added). The completed fraction is then treated as a single entity and reintroduced into the expression.
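The following is a minimal sketch of the tree bookkeeping in this subsection. It assumes that each node caches its sub-expression as a fully bracketed string; the string form, `ExprNode` and the helper names are all illustrative. A binary operator’s right operand is attached as the right child of X, and the cached text is refreshed from X up to the root. A fraction is built apart from the main tree and is attached only once it is complete.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class ExprNode:
    op: Optional[str] = None         # None marks a leaf symbol
    symbol: str = ""
    left: Optional[ExprNode] = None
    right: Optional[ExprNode] = None
    parent: Optional[ExprNode] = None
    text: str = ""                   # cached sub-expression below this node

    def recompute(self) -> None:
        """Rebuild this node's cached text from its children's cached text."""
        if self.op is None:
            self.text = self.symbol
        else:
            lhs = self.left.text if self.left else ""
            rhs = self.right.text if self.right else ""
            self.text = f"({lhs}{self.op}{rhs})"


def leaf(symbol: str) -> ExprNode:
    node = ExprNode(symbol=symbol)
    node.recompute()
    return node


def binary(op: str, left: ExprNode, right: Optional[ExprNode] = None) -> ExprNode:
    node = ExprNode(op=op, left=left, right=right)
    for child in (left, right):
        if child is not None:
            child.parent = node
    node.recompute()
    return node


def attach_right(x: ExprNode, operand: ExprNode) -> ExprNode:
    """Set `operand` as the right child of X, then refresh every ancestor up to the root."""
    x.right, operand.parent = operand, x
    node, root = x, x
    while node is not None:
        node.recompute()
        root, node = node, node.parent
    return root


# Ordinary binary operator: "a" and "+" entered first, "b" added later.
plus = binary("+", leaf("a"))
print(attach_right(plus, leaf("b")).text)                # (a+b)

# A fraction is built apart from the main tree, then reintroduced as one entity.
frac = binary("/", leaf("3"), leaf("4"))
print(attach_right(binary("+", leaf("k")), frac).text)   # (k+(3/4))
```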

Another case in which a previously identified relationship must be revised arises when a user enters a new symbol to the right of a minus sign (see Figure 6). In this case, the unary operation -3 should be treated as a group, and the relationships between this group and the number 2 should be investigated. If this investigation discovers a superscript relationship, the old relationship should be discarded; that is, the right-minus relationship between the 2 and the minus.

Fig. 6. A left-minus relationship is found but other relationships may be affected
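The control flow of steps 1 to 8 in subsection C can be summarised in code. Only the branching follows the listed steps. The paper does not specify interfaces, so every component is passed in as a hypothetical callable: the recogniser, spatial analysis, the regressive/undesired test, the l.h.s. grouping and the tree update. Representing relationships as (confidence, REL) pairs is also an assumption.

```python
from typing import Any, Callable, List, Sequence, Tuple


def process_stroke(
    stroke: Any,
    symbols: List[Any],
    identify: Callable[[Any], Any],
    find_relationships: Callable[[Any, List[Any]], Sequence[Tuple[float, Any]]],
    rejected: Callable[[Any], bool],
    attach_lhs: Callable[[Any], Any],
    add_node: Callable[[Any], None],
    add_relationship: Callable[[Any], None],
) -> str:
    """One pass of steps 1-8 for a single stroke; every callable is a hypothetical hook."""
    s = identify(stroke)                                   # 1) identify S (multi-stroke issues resolved here)
    previous = list(symbols)
    symbols.append(s)
    if previous:                                           # 2) S is not the first symbol: go to 3
        found = find_relationships(s, previous)            # 3) spatial analysis -> (confidence, REL) pairs
        if found:                                          # 4) at least one relationship: go to 6
            _, rel = max(found, key=lambda pair: pair[0])  # 6) best REL
            if not rejected(rel):                          # 6a) regressive/undesired test
                rel = attach_lhs(rel)                      # 6b) l.h.s. sub-expression from the tree
                if not rejected(rel):                      # 6c) test again
                    add_relationship(rel)                  # 7) update the tree
                    return "relationship"
    add_node(s)                                            # 5) create a node for S
    return "node"                                          # 8) await the next stroke


log: List[Any] = []
seen: List[Any] = []
hooks = dict(
    identify=str.upper,
    find_relationships=lambda s, prev: [(0.9, f"{prev[-1]}^{s}"), (0.4, f"{prev[-1]}{s}")],
    rejected=lambda rel: False,
    attach_lhs=lambda rel: rel,
    add_node=log.append,
    add_relationship=log.append,
)
print(process_stroke("x", seen, **hooks), process_stroke("2", seen, **hooks))  # node relationship
print(log)  # ['X', 'X^2']
```

Passing the components in as callables keeps the sketch neutral about how each stage is actually implemented.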

III. TESTING

The system was tested using 300 expressions entered by 12 users; each user’s dataset consisted of 25 mathematical expressions (see Appendix). This dataset was created solely to test the symbols and relationships that the system is currently capable of recognising. The users were given no information about how the system worked, and the system was offline at the time of data entry. No user therefore knew how the system had interpreted their previous expression. As a result, each continued with their natural style of handwriting throughout the production of their dataset.

Before discussing the results, the author would like to point out that a single inaccurately recognised relationship is enough for an expression to fail the system test. Furthermore, an expression fails the test if any of its symbols are misclassified prior to spatial/structural analysis. One of the users was asked to use the touchpad on a laptop, with which they had little or no prior experience. This resulted in a poor reflection of their natural style of handwriting. The numerous errors suggested that this test set (0.8% of the overall set) should not be considered a true reflection of the system’s capabilities. Furthermore, 0.4% of the expressions were entered incorrectly by the users. The pre-processing phase of the system, i.e., symbol recognition, failed to recognise at least one of the symbols in 25.1% of the expressions. In fact, 18.6% of these involved only a single misclassified symbol. Together, these issues accounted for 79 expressions (26.33% of the total). This paper is concerned with spatial and semantic analysis, so only the remaining 221 expressions are considered when testing that part of the system. The system accurately recognised 87.3% of these expressions.
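The headline figures can be cross-checked with a few lines of arithmetic. The counts of correctly and incorrectly recognised expressions are derived here by rounding 87.3% of 221. They are implied by the reported percentages rather than stated in the text.

```python
# Figures quoted in the Testing section
users, per_user = 12, 25
excluded = 79          # expressions removed before evaluating spatial/semantic analysis
accuracy = 0.873       # reported accuracy on the remaining expressions

total = users * per_user                 # 300 expressions in all
evaluated = total - excluded             # 221 expressions left for structural evaluation
correct = round(accuracy * evaluated)    # implied number recognised correctly
failed = evaluated - correct             # implied number recognised incorrectly

print(total, f"{excluded / total:.2%}", evaluated, correct, failed)  # 300 26.33% 221 193 28
print(f"{failed / evaluated:.1%}")                                  # 12.7%
```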

The remaining 12.7% of the expressions were incorrectly recognised for three reasons: no relationship was found between symbols (1%), undesired relationships were discovered (6.3%), or the expressions contained regressive entries (5.4%), which the system is currently not capable of handling.

IV. CONCLUSION

Testing the structural analysis phase of the system has yielded encouraging results. Although the system is designed with scalability in mind, this aspect has yet to be tested. Such testing will need to be carried out in tandem with improving the symbol recognition phase, because new symbols will be needed in order to introduce new relationships to the system. A more efficient symbol recogniser will be required, and a decision must be made whether to adopt a new symbol classifier or develop the current one. A more obvious area for future work is to facilitate regressive entries and handle undesired relationships. With respect to the current test set, this would raise the system’s recognition rate by approximately 11.7 percentage points. Regressive entries could be allowed by determining an efficient way to reorder and rebuild the expressions. Undesired relationships could possibly be dealt with by maintaining alternatives to the system’s best interpretation. Such alternatives will have to be developed with due care to minimise the complexity of the system.

APPENDIX


The Dataset for Testing

REFERENCES

[1] H.-J. Winkler, H. Fahrner and M. Lang, “A Soft-Decision Approach for Structural Analysis of Handwritten Mathematical Expressions,” 1995 Int. Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), Detroit, USA, Vol. 4, pp. 2459-2462, May 1995.
[2] E. Tapia, “Understanding Mathematics: A System for the Recognition of On-Line Handwritten Mathematical Expressions,” Freie Universität Berlin, 2004.
[3] S. Smithies, K. Novins and J. Arvo, “Equation Entry and Editing via Handwriting and Gesture Recognition,” Behaviour and Information Technology, 20(1):53-67, January 2001.
[4] R. Zanibbi, K. Novins, J. Arvo and K. Zanibbi, “Aiding Manipulation of Handwritten Mathematical Expressions through Style-Preserving Morphs,” Queen’s University, Canada, 2001.
[5] R. Zanibbi, D. Blostein and J. R. Cordy, “Recognising Mathematical Expressions Using Tree Transformations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 11, November 2002.
[6] K.-F. Chan and D.-Y. Yeung, “Mathematical Expression Recognition: A Survey,” Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (date unknown).
[7] N. E. Matsakis, “Recognition of Handwritten Mathematical Expressions,” Massachusetts Institute of Technology, May 1999.
[8] K.-F. Chan and D.-Y. Yeung, “An Efficient Syntactic Approach to Structural Analysis of On-line Handwritten Mathematical Expressions,” Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, August 1998.
[9] I. Poupyrev, T. Numada and S. Weghorst, “Virtual Notepad: Handwriting in Immersive VR,” in Proceedings of IEEE Virtual Reality Annual Symposium (VRAIS 1998), pp. 126-132, 1998.
[10] R. Genoe, J. A. Fitzgerald and T. Kechadi, “An Online Fuzzy Approach to the Structural Analysis of Handwritten Mathematical Expressions,” WCCI (FUZZ-IEEE), Vancouver, BC, Canada, July 16-21, 2006.
[11] A. Kosmala and G. Rigoll, “Recognition of On-Line Handwritten Formulas,” 6th Int. Workshop on Frontiers in Handwriting Recognition (IWFHR), Taejon, S. Korea, August 12-14, 1998.
[12] U. Garain and B. Chaudhuri, “Recognition of Online Handwritten Mathematical Expressions,” IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 34, No. 6, pp. 2366-2376, 2004.
[13] J. A. Fitzgerald, F. Geiselbrechtinger and T. Kechadi, “Structural Analysis of Handwritten Mathematical Expressions Through Fuzzy Parsing,” IASTED Int. Conference on Advances in Computer Science and Technology (ACST06), Puerto Vallarta, Mexico, January 23-25, 2006, pp. 151-156.