103 International Journal of Research and Reviews in Computer Science (IJRRCS)

Vol. 1, No. 2, June 2010

A Survey on Methods and Strategies on Touched Characters Segmentation

Tanzila Saba, Ghazali Sulong and Amjad Rehman
Graphics and Multimedia Department, Faculty of Computer Science and Information Systems, University Technology Malaysia, 81310 Skudai, Johor, Malaysia
[email protected]

Abstract: Character segmentation is a challenging problem in the field of optical character recognition, and touched characters make it harder still. Two broad classes of technique for touched character segmentation are identified: methods that perform explicit character segmentation and methods that perform implicit character segmentation. The basic methods used by each class are presented and the contributions of individual algorithms within each class are discussed. This is the first survey that focuses on touched character segmentation and provides segmentation rates and descriptions of the test data for the approaches discussed. Finally, the main trends in the field of touched character segmentation are discussed and some important contributions are presented.

Keywords: optical character recognition, touched character, documents analysis, character segmentation, explicit segmentation, implicit segmentation.

1. Introduction and Background

Optical character recognition (OCR) is one of the most challenging topics in the field of pattern recognition. For a document image, the first step is layout analysis and extraction of text lines. Each text line is then segmented into isolated individual character images. Finally, these character images are sent to the classifier and the corresponding class labels are obtained. The whole process, shown in Figure 1, is straightforward for well-printed or well-written documents.

Figure 1. Flow diagram of traditional OCR system

In practice, however, in many poor-quality or handwritten documents neighboring characters may easily touch or overlap each other, and as a result it is intractable to find the correct segmentation points by means of image analysis alone [1]. Touching characters are therefore error-prone in the segmentation stage, which contributes to recognition errors. Commonly, the problem is due to the conventional

segmentation approaches that fail to segment the word correctly and are mostly unable to determine the correct segmentation points. In addition, many segmentation approaches in the literature fail when dealing with the touched character segmentation problem [2]. There are two main reasons why the recognition rate decreases when dealing with touching characters. Firstly, for segmentation-based (explicit) approaches, it is hard to segment characters that have a touching part. Secondly, segmentation-free (implicit) methods cannot recognize the touched pattern, because the classifier classifies the touching characters as a single character. Moreover, misclassification arises from ambiguous touching character pairs such as "LI", "lo", "nn", "cl" and "rn". In some printed character fonts, "LI" is similar to "U" and "rn" is similar to "m". In these cases, even a robust character recognizer is unable to recognize them accurately [3]. One solution is to implement lexicon-based OCR to obtain contextual information. Touching character patterns emerge when two adjacent characters are written too close together, so that some parts of the characters are connected horizontally/left-right (e.g. Latin/Roman, Bangla alphabet) or vertically/up-down (Chinese alphabet). For numeral or digit strings in particular, touching is defined as two adjacent digits being connected by ligatures or horizontally/left-right. Moreover, in some cases the foreground pixels of two adjacent characters are shared. Thus, conventional segmentation may produce a broken character, which might lead to misinterpretation [4]. Conventional approaches can segment typical words accurately. In this regard, strategies such as projection profile, bounding box or contour tracing exhibit promising results [1].
However, they might lead to incorrect segmentation when dealing with touching characters, as shown in Figure 2. A number of strategies for segmenting touching characters are available in the literature, but these techniques mainly deal with connected numeral string segmentation [5-13] and machine-printed touched character segmentation [14-16, 32]. Touched character segmentation has also been investigated in different languages such as Chinese, Bangla, Arabic, Devanagari and Hangul [17-22]. However, to the best of the authors' knowledge, no similar study is available for Latin/Roman touched characters in cursive handwriting. In this paper, we present a survey focused on Roman touched character segmentation


and provide broad coverage of the topic. Following a review of the state of the art, the discussion is organized in two major categories that are further sub-divided accordingly. Section 2 presents the transformation from languages to symbols, and Section 3 covers observations of touched character segmentation problems. Section 4 presents a critical review of touched character segmentation techniques. Finally, the conclusion and future directions are given in Section 5.


Figure 2. Incorrect segmentation due to touched/overlapping neighbors

2. From Languages to Symbols

The touched character segmentation/recognition problem has been investigated by several researchers for both printed and handwritten characters. Some samples of touched characters are shown in Figure 3. However, most of the studies investigate touched numeral or digit strings, and only a few address alphabets. The problem of touching digit strings is popular in the research community because of its important practical applications, such as bank check forms, postal codes and accounting documents. Although electronic transactions are robust and reliable today, bank checks are still widely used because of their strong authentication, their simplicity and, above all, their independence from an internet connection or computer. One of the key issues in automated machine reading of bank checks is the segmentation of touching numeral strings, and several approaches have been reported in the literature [73]. Automatic machine reading is the main goal of investigation in document analysis and recognition. In this regard, researchers from different countries attempt to develop automatic machine reading for their own languages. The alphabets or characters of each language have different styles and characteristics, so the touching character problem differs from language to language and demands particular strategies. Touching characters can be found in any language, such as Latin/Roman, Chinese, Arabic or others. Accordingly, several segmentation techniques have been proposed for different languages such as Roman/Latin [23], Farsi/Arabic [17, 18, 24, 25], Chinese [26, 27], Japanese [20] and Hangul [28]. Segmentation and recognition of Roman printed characters is mature and stable. However, studies are still conducted on special problems in the printed environment, which are found in mathematical expressions [29, 5, 30], maps with multi-oriented characters [31, 32] and other documents that deal with special symbols. In addition, this investigation is conducted because common methods in the literature can be applied to characters in only one direction, horizontal or vertical (e.g., in Japanese documents), whereas in such cases characters or symbols are placed in horizontal, vertical or diagonal directions (e.g. in mathematical expressions). In cursive handwriting, different methods and strategies of segmentation have been introduced to improve the final recognition. Some are based on background analysis and others on foreground analysis [21]. Nevertheless, all these problems show that the topic remains an open issue for automatic machine reading.

Figure 3. Samples of touched characters (a) numeral string, (b) printed character, (c) handwritten touched cursive characters

3. Observation of Touched Character Segmentation Problems

Segmentation of touching cursive characters is more problematic than segmentation of linked/non-touching cursive characters. Therefore, it is necessary to investigate the touching characteristics first. Several comprehensive reviews, surveys and discussions on segmentation problems are available in the literature. In [33], an overview of techniques in machine-printed character segmentation is presented. Reviews of hand-printed and handwritten character segmentation methods are available in [34, 35]. Comprehensive observations on the touching Bangla character problem based on statistical features are reported in [36]. Unfortunately, touched cursive handwritten character segmentation is not discussed specifically, although it is more challenging than the conventional machine-printed character segmentation problem. Based on the observations of these researchers, the following aspects can be concluded:
(i) Mostly, touching characters consist of only two characters; touching of three or more characters is uncommon.


(ii) Some touching characters may resemble a single character. For example, "r" and "n" when touching may be misinterpreted as the character "m".
(iii) Touching characters commonly have an aspect ratio (width/height) larger than that of single isolated characters.
(iv) The horizontal thickness of the touching part might be greater.
(v) The vertical thickness at the touching part is typically less than the thickness of other parts.
(vi) Adjacent characters in most cases touch at the middle of the core zone rather than at its top or bottom.
Based on the above observations, one interesting fact is that the problem in observation (ii) will remain unsolved in the absence of a recognition-based segmentation technique. Besides that, the misinterpretation as the letter "m" can only be resolved by integrating the whole system with a lexicon. Another set of observations, defining six categories of touching digit strings, is reported in [37]. The categories are single-point touching, ligature touching, multiple-point touching, overlaps, noise and broken. Likewise, Yi-Kai and Jhing-Fa [38] explore touching digit problems. They classify the problems into two major categories: single-touching and multiple-touching. If there is only one run of pixels in the touching part, it is called single-touching; multiple-touching denotes the case where there is more than one run of pixels in the touching portion. Furthermore, different touching situations in two well-known numeral databases, IRIS-Bell'98 and TNIST, are investigated in [39]. The frequency of different types of touching digits is counted, and segmentation is performed in a manner quite similar to [38]. In this regard, three more types, namely overlap, cursive and broken, are introduced in [39]. Finally, the problem of touched printed characters and three types of merged characters (linear, nonlinear and overlapped) is highlighted in [40].
As mentioned above, these observations mostly deal with digit strings, and a few others deal with characters of the Bangla, Japanese and Chinese languages. Observation and study of touching characters in cursive handwriting is still an open problem. However, characters and numerals are known to share some characteristics, so these touching types could be applied to the problem of connected cursive handwriting. Approaches to recognizing numeral strings can be classified into segmentation-based and segmentation-free recognition methods. The former includes recognition-free and recognition-based segmentation methods, while the latter includes the holistic method, which recognizes a numeral string without segmentation. Although there are several points of view on the observations, categories and types of segmentation problems, they are to some extent correlated with each other.
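Observation (iii) in this section lends itself to a quick pre-check before any segmentation is attempted. The following is a minimal sketch, assuming a connected component is given as a binary image stored as a list of rows (1 for foreground, 0 for background). The function names and the threshold value of 1.2 are illustrative assumptions and are not taken from any of the surveyed papers.

```python
def bounding_box(image):
    """Return (top, left, bottom, right) of the foreground pixels, or None if empty."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def looks_touching(image, ratio_threshold=1.2):
    """Flag a component whose width/height ratio exceeds a threshold.

    ratio_threshold is a hypothetical value; in practice it would have to be
    tuned on isolated characters of the script and writing style at hand.
    """
    box = bounding_box(image)
    if box is None:
        return False
    top, left, bottom, right = box
    width = right - left + 1
    height = bottom - top + 1
    return width / height > ratio_threshold
```

A wide component is only a candidate: naturally wide letters such as "m" or "w" would also pass this test, which is why the surveyed methods combine it with other evidence.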

4. Observation of Touched Character Segmentation Problems

The development of a character recognition system that recognizes well-formed and well-spaced printed texts is a relatively simple process. However, in documents with many touching characters, the recognition rate of the OCR system is considerably lower [41]. A large proportion of the resulting recognition errors are due to segmentation errors. If touching characters are incorrectly segmented, the character recognizer cannot recognize the characters and this error affects succeeding characters as well.


In this section, numerous segmentation approaches available in the literature are discussed. Some authors have previously surveyed segmentation in a general scenario, often as part of a more comprehensive work such as [33-35, 42]. Generally, approaches to recognizing touching characters can be divided into two major classes: the segmentation-based approach and the holistic approach [43]. In the segmentation-based approach, the basic principle is to produce segmentation hypotheses using some criterion or function [36]; the hypothesis with a high confidence value is then selected based on the recognition result. In the holistic approach, the algorithms treat the word as a single entity and use features of the word to recognize it. The holistic approach uses a lexicon as its knowledge base and is suitable only for applications with a small, static lexicon. Both segmentation strategies reported in the literature are discussed below.

4.1 Segmentation Based Approaches (Explicit/Classical Segmentation)

Even though the touching problem was identified during the 1970s and several investigations were conducted during the 1980s and 1990s, segmentation accuracy is still low and needs improvement [41]. Two segmentation algorithms for uppercase serif printed characters, based on quasi-topological and topological features, are proposed in [44]. The quasi-topological segmentation uses a combination of feature extraction and character-width measurements. The features consist of three contour patterns: a black-bit peak contour histogram, a horizontal stroke and a character envelope contour. The topological segmentation, in contrast, involves only detection and measurement of the leading and trailing edges of character strokes. An objective function that uses the projection profile is suggested in [45]. The authors argue that the join between adjacent characters must have minimum vertical projection.
The objective function is calculated as the ratio of the second derivative of the projection-profile curve to its height. However, their method is only suitable for single-point touching characters. It fails on multi-point touching characters such as "oo", "oe" and "od", because there are no vertical strokes near the touching points and the projection-based function therefore does not peak there. Later, the technique was improved by introducing a peak-to-valley function [33]. In [46], a one-dimensional curve segment is used to represent a connected component and the assessment is based on the slope of the curve. Likewise, automatic decision rules are generated to test cutting candidates in [47]. Afterward, a machine-printed touched character segmentation approach using discriminating functions is proposed in [48], which enhances the strategy proposed in [2]. Pixel projection and profile projection techniques are employed as discrimination functions to handle heavily touching printed characters such as "oo", "od" and "oe". Furthermore, the authors apply forward segmentation along with a backward merge procedure based on the output of a character classifier that works on the components generated by the discriminating functions. Additionally, they include a spell checker to improve the final result. Despite all of these efforts, touched character segmentation and recognition accuracy remains low.
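The projection-based objective function described for [45] can be illustrated with a short sketch. It assumes a binary image stored as a list of rows (1 for foreground), approximates the second derivative by the discrete second difference, and skips empty columns to avoid division by zero. These choices and all function names are assumptions made for illustration; the exact formulation in [45] may differ.

```python
def vertical_projection(image):
    """Count the foreground pixels in each column."""
    return [sum(row[c] for row in image) for c in range(len(image[0]))]


def objective(profile):
    """Second difference of the profile divided by its height, per interior column.

    Columns with zero projection are skipped (assumption): they are already
    gaps between components and need no cut.
    """
    scores = {}
    for x in range(1, len(profile) - 1):
        if profile[x] == 0:
            continue
        second_diff = profile[x - 1] - 2 * profile[x] + profile[x + 1]
        scores[x] = second_diff / profile[x]
    return scores


def best_cut_column(image):
    """Return the column with the largest objective value, or None."""
    scores = objective(vertical_projection(image))
    return max(scores, key=scores.get) if scores else None
```

A deep, narrow valley in the profile gives a large positive second difference and a small height, so the ratio peaks at thin joins. For a pair such as "oo" the profile near the join is shallow and broad, which is consistent with the failure case noted above.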


To solve handwritten connected digit string problems, a stroke tracing strategy is explored in [49]. The touching point is detected by tracing the contour for significant right turns. Furthermore, a vertical histogram is calculated as the distance between the lower and upper contours, and the touching point is expected around the valley. However, relying on significant right turns might lead to errors when the digit itself contains significant right turns, such as '4', '6' and '7'. Moreover, the method works only for digit strings that are connected by a single stroke. A simple threshold on the recognition score is used to detect touched characters in [30]. Subsequently, the detected touching characters are blurred and separated based on the "valley" (i.e., the continuous local minima) on the intensity surface. However, this technique has two drawbacks: (i) the touching character detection scheme may over-detect or under-detect, and (ii) component characters that are heavily touching or have many detected valleys degrade segmentation performance. Segmentation of mathematical formulae based on the projection profile and the discrimination function of [45] is implemented in [29]. The original image is blurred into a gray-scale image using a Gaussian kernel, and the segmentation direction is then estimated by calculating minimal points horizontally and vertically. An interesting study of touching numerals based on background analysis is reported in [50]. Three segmentation strategies are proposed for the following situations: (i) a two-point path that does not cross the background region, (ii) a two-point path that crosses the background region, and (iii) a one-point path. Touching digits share the same partitioning path, so stroke width estimation is used to reconstruct the partitioning path. Figure 4 illustrates the general idea of partitioning and reconstruction in the algorithm. Nevertheless, their method increases the computational cost without providing promising results, since most classifiers can still recognize a character properly when it is segmented by the partitioning path alone, without complex reconstruction.
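The contour-distance histogram mentioned for [49] differs from a plain pixel count: it measures the vertical extent between the upper and lower contours in each column. The sketch below is one possible reading of that idea, assuming a binary image as a list of rows; the function names and the local-minimum rule are illustrative assumptions rather than the procedure of [49].

```python
def contour_thickness(image):
    """Per column, the distance from the topmost to the bottommost foreground pixel.

    Returns 0 for empty columns.
    """
    height = len(image)
    thickness = []
    for c in range(len(image[0])):
        rows = [r for r in range(height) if image[r][c]]
        thickness.append(rows[-1] - rows[0] + 1 if rows else 0)
    return thickness


def valley_columns(thickness):
    """Interior, non-empty columns that are local minima of the thickness curve."""
    valleys = []
    for x in range(1, len(thickness) - 1):
        here, left, right = thickness[x], thickness[x - 1], thickness[x + 1]
        is_minimum = here <= left and here <= right and (here < left or here < right)
        if here > 0 and is_minimum:
            valleys.append(x)
    return valleys
```

Unlike the pixel projection, a column through the middle of a loop (for example in "0") keeps its full thickness here, because only the outer contours are measured.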

Figure 4. Partitioning and reconstruction [50].

Segmentation of single- and multi-touching handwritten numeral strings is performed in [38]. A set of features is extracted from background and foreground pixels. Figure 5 shows four segment features selected from top, bottom, stroke and hole. Furthermore, fork points, end points and corner points are located. Possible segmentation points are constructed downward and upward by connecting a set of features according to heuristic rules. Finally, a mixture Gaussian probability function is used to decide the best segmentation path along the possible


segmentation points. In addition, useless strokes are successfully removed. Errors in the segmentation path are mostly due to overlapping parts. Moreover, the probability of a segmentation path becomes low when the segmentation point deviates too much from the center of the image and one of the segmented digits is wider than the others.

Figure 5. Four segment features: top, bottom, stroke and hole segment.

Likewise, a novel segmentation approach that takes a set of points based on contour and profile features is presented in [51]. Two kinds of points are considered to determine the segmentation path. First, local minima of the contour and profile features are defined as basic points (BP). Second, a point with more than two pixels in its neighborhood is defined as an intersection point (IP). Afterward, a Euclidean distance scheme is applied to determine the proximity between IP and BP. Nevertheless, this approach cannot solve the problem of multiple touching. A different strategy using a multiple agent-based segmentation approach is proposed in [52]. Two agents are developed to locate possible segmentation points. The first agent works on the original image and locates possible segmentation points based on an estimate of stroke width; a sudden change of stroke thickness is considered a possible segmentation point. Some work on the thinned image is also carried out to generate possible touched character segmentations. Four segment types are used to represent touching points: horizontal-top, horizontal-middle, horizontal-bottom and vertical segments. Several rules are applied in the negotiation and conflict resolution phase. Touching in Hangul characters is investigated in [53]. A run-length code that scans vertically and horizontally is used to define closed, half-opened and opened sections systematically. Furthermore, several adjacent run-length codes are analyzed using three touching type functions to detect touching point candidates. More study beyond these three touching type functions is desirable to overcome confusing touching cases. Moreover, the run-length code approach is extremely time-consuming. Because of its simplicity, projection analysis has been widely used by many researchers to solve segmentation.
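The intersection-point and proximity steps described for [51] can be sketched as follows. The sketch assumes a one-pixel-wide skeleton stored as a list of rows (1 for foreground) and uses the 8-neighbourhood; the basic points are assumed to be supplied by a separate contour/profile analysis that is not shown. All names are hypothetical, and the details of [51] may differ.

```python
import math


def intersection_points(skeleton):
    """Return (row, col) of foreground pixels with more than two foreground 8-neighbours."""
    height, width = len(skeleton), len(skeleton[0])
    points = []
    for r in range(height):
        for c in range(width):
            if not skeleton[r][c]:
                continue
            neighbours = sum(
                skeleton[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            )
            if neighbours > 2:
                points.append((r, c))
    return points


def closest_pair(ips, bps):
    """Return the (IP, BP) pair with the smallest Euclidean distance, or None."""
    pairs = ((ip, bp) for ip in ips for bp in bps)
    return min(pairs, key=lambda pair: math.dist(pair[0], pair[1]), default=None)
```

On skeletons with right-angle junctions the simple neighbour count also flags the pixels next to the junction, so a practical implementation would typically merge such clusters into a single point.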
A projection profile is implemented to locate potential touching points of Japanese handwritten characters based on the connection condition of lines [54]. A simplified projection along with the number of crossings is used to determine connected lines. Furthermore, potential points are evaluated using two evaluation approaches, which take into account the correlation between connected components and the connected


condition of lines. Finally, a fourth evaluation step is carried out to obtain correct segmentation points. Meanwhile, morphological structural features are investigated to separate touching handwritten numeral strings [55]. First, four types of structural region points are used to determine the lower and upper regions. Next, two left numerals are extracted based on the structural region points. Finally, as part of the segmentation strategy, several further steps are performed: detection of hole contour features, determination of the touching hole contour, recognition of the left numeral, determination of the single- or double-touching model, and the search for the touching point or match-touching point. Most of the process is conducted by applying tree-like rules. The whole system is encouraging but demands high computation and storage for preprocessing, detecting and analyzing the morphological structure. The water reservoir concept for locating the space between touching numeral strings is introduced in [56, 57]. The idea is that if water is poured from the top or bottom, the space will be filled by water. Prior to segmentation, isolated and touched numerals are separated using specific rules, with the distinction based on the size and number of water reservoirs. However, the whole scheme rests on the assumption that touching numerals always have larger and more numerous water reservoirs. Following separation, a bounding box (BB) is applied to the touching component to find the touching position, which is detected horizontally and vertically. The reservoir whose center of gravity lies in the vertical middle of the BB is defined as the best reservoir for touching. Moreover, when a water reservoir lies in the horizontal top, middle or bottom region, the touching numeral is detected in that region. After detecting the touching position and analyzing the profile of the reservoir, the initial feature points for segmentation are determined.
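The reservoir idea described above can be illustrated in one dimension: water poured from the top collects wherever the top profile of the component dips between two higher strokes. The sketch below is a simplified illustration only, assuming a binary image as a list of rows; it works on the column-wise top profile rather than on the full two-dimensional cavities used in [56, 57], and all function names are hypothetical.

```python
def top_profile_elevation(image):
    """Per column, the height of the topmost foreground pixel measured from the bottom."""
    height = len(image)
    elevation = []
    for c in range(len(image[0])):
        top = next((r for r in range(height) if image[r][c]), height)
        elevation.append(height - top)
    return elevation


def water_depths(elevation):
    """Depth of water held above each column when water is poured from the top."""
    n = len(elevation)
    left_max, right_max = [0] * n, [0] * n
    running = 0
    for i in range(n):
        running = max(running, elevation[i])
        left_max[i] = running
    running = 0
    for i in reversed(range(n)):
        running = max(running, elevation[i])
        right_max[i] = running
    return [min(left_max[i], right_max[i]) - elevation[i] for i in range(n)]


def top_reservoirs(image):
    """Return a list of (first_column, last_column, area) for each top reservoir."""
    depths = water_depths(top_profile_elevation(image))
    reservoirs, start = [], None
    for c, depth in enumerate(depths + [0]):  # trailing 0 closes an open run
        if depth > 0 and start is None:
            start = c
        elif depth == 0 and start is not None:
            reservoirs.append((start, c - 1, sum(depths[start:c])))
            start = None
    return reservoirs
```

Bottom reservoirs could be obtained in the same way by flipping the image vertically. The number and area of the reservoirs returned here correspond to the cues that the surveyed method uses to tell isolated numerals from touching ones.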
In this regard, the authors consider closed loops, reservoir heights and the distance from the center of the component. The initial feature points are then ranked and the best feature point (the highest-ranked point) is taken as the segmentation point. Finally, based on the best feature point, the closed loop positions and the morphological structure of the touching region, the cutting path is generated. Several drawbacks are reported: the ratio of the two segmented digits is too large, the cutting length is very long, the best reservoir does not exist, or the boundary of the reservoir contains a break point. In addition, the water reservoir approach might fail on broken characters, where the water cannot find a space to fill, and it is error-prone. An enhanced segmentation approach for touched Japanese (Kanji) characters is presented in [20]. To estimate the candidate separating points of the touched characters, projection and graphical analysis are performed. Finally, to generate a separating line, two evaluation procedures are applied to each of the segmented patterns: the overlapping width on the modified envelope and the angle of the virtual corner. However, the method deals with only two touched Kanji characters. An accuracy rate of 83% is reported. Moulds for several types of numeral strings are introduced in [58]. The authors define five types of segmentation mould, as shown in Figure 6, and mathematical models are used to design all of them. The multi-mould segmentation procedure evaluates four parameters: start point, height, slant angle slope and central point. Among the mould parameters, the start point is the crucial one to determine. Accordingly, it is based on


average stroke width and minimum up-down in the range; the optimum point is selected as the start point. Furthermore, the cost of each segmentation mould is calculated using a cost function that combines two factors: the number of segmented strokes and the crossing position of foreground pixels. The mould with minimum cost is taken as the final segmentation mould.

Figure 6. Types of segmentation path: (a) diagonal shape, (b) arc shape, (c) up-arc and down-diagonal shape, (d) up-diagonal and down-arc shape, (e) up-arc and down-arc [58].

A novel segmentation approach specifically for single-touching digit strings is proposed in [11]. The authors classify the problem into four types of single-touching numerals: (i) single touch that shares one point, (ii) single-segment touch with the same shared contour/stroke, (iii) smooth touch that shares one stroke smoothly, and (iv) single touch with an extra ligature. Pre-processing is required since this method is based on the thinned image. Segmentation starts by detecting two separation points based on the deepest valley (Pv) and the highest hill (Ph) of the histogram. If Pv and Ph do not exist, the central line of the image is taken as the vertical separation point. Additionally, three kinds of feature points (end, branch and cross points) closest to Pv and Ph are extracted. Phv is determined by finding the nearest vertical line between Pv and Ph. Finally, heuristic rules are employed for deciding the initial separation of the two digits as well as how to restore each of them. The heuristic rules work on the basis of the feature points that have been found. Encouraging results are shown on single-touching digit strings. However, the thinning process is computationally expensive and sometimes produces noise on strokes that may be extracted as useless feature points. Moreover, there is a high probability that the heuristic rules fail to remove ligatures/extra strokes and to reconstruct the digit string when broken characters are present. Graph representation has been widely used for stroke extraction. Using graph representation, thinning-based segmentation strategies for single and multiple touching problems of handwritten numerals are presented in [59]. At the pre-processing stage, the input image is enhanced by image smoothing, noise elimination and slant normalization [60], and thinned using Hilditch's algorithm [61]. In the thinned image, the 8-neighbours of each pixel are evaluated to find end points, T-joint points and crossing points. Moreover, edges


between two vertices are traced to locate significant turning points. Edges shorter than the average stroke width are removed and the related vertices are merged into one. Afterward, the two vertices of the touching points are located by calculating the upper valley and lower hill of the vertical histogram. Following graph representation, heuristic rules based on graph theory are applied to classify the touching as single-point, single-segment or multiple touching. In single-point touching, useless ligatures in the image are eliminated using bounding box strategies. In single-segment touching, two edges are duplicated and grouped into left and right parts to reconstruct the graph. To find the shortest path in the multiple-touching area, Dijkstra's algorithm [62] is applied. Finally, potential strokes of the two segmented numeral graphs are calculated using the estimated stroke width of the digits. However, graph representation needs an input image that is smooth and free of spurious strokes, and at the final stage the graph is reconstructed based on the numeral stroke width. Hence, pre-processing and post-processing contribute a high computation cost. In addition, the deepest and highest points on the valley and hill, respectively, might fall at undesired positions because of writing style, so the heuristic rules may misclassify the category of touching characters, which leads to incorrect segmentation. Character segmentation of Oriya script (Indian) is implemented in [63]. In Oriya script, statistical analysis shows that the touching position is mostly at the top, at the bottom or in the middle. Analysis and recognition are performed on the whole document to segment text into lines, words and characters. For character segmentation in particular, the method is based on the water reservoir concept [56]. As in previous methods, isolated and touched characters are separated for further processing. The water reservoir is used to determine the touching position.
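The shortest-path step attributed to [59] earlier in this section relies on Dijkstra's algorithm. The sketch below shows the algorithm itself on a small weighted graph given as an adjacency dictionary. How [59] builds its graph from the thinned strokes and how edge weights are chosen is not reproduced here; the graph layout, vertex labels and function name are assumptions for illustration only.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of the cheapest route from source to target.

    graph maps each vertex to a list of (neighbour, weight) pairs with
    non-negative weights. Returns (inf, []) if target is unreachable.
    """
    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        cost, vertex = heapq.heappop(queue)
        if vertex == target:
            path = [vertex]
            while vertex in previous:
                vertex = previous[vertex]
                path.append(vertex)
            return cost, path[::-1]
        if cost > best.get(vertex, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in graph.get(vertex, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = vertex
                heapq.heappush(queue, (new_cost, neighbour))
    return float("inf"), []


# Hypothetical stroke graph: vertices stand for feature points in the touching area.
strokes = {
    "top": [("a", 2), ("b", 5)],
    "a": [("top", 2), ("b", 1), ("bottom", 6)],
    "b": [("top", 5), ("a", 1), ("bottom", 2)],
    "bottom": [("a", 6), ("b", 2)],
}
cost, path = dijkstra(strokes, "top", "bottom")
```

In a segmentation setting the vertices would be end, T-joint and crossing points of the skeleton, and the returned path would be a candidate cut through the touching area.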
Three different heuristic segmentation strategies are applied, one for each touching position. Following touching segmentation, the output is passed to the isolated and touched detection module to check whether any of the segmented parts is still connected. If a touching pattern is detected, the output is fed to the segmentation module again. Using this iterative process, touching of more than three characters can be handled. However, the water reservoir might find an undesired position because of writing style, which makes segmentation error-prone. The Chinese language has a large set of characters. The primary set consists of 3,755 characters, along with different writing styles. The touching problem in Chinese characters is investigated in [19], where the authors employ a genetic algorithm to find the best segmentation path. For the fitness function, the mixture Gaussian function of [38] is used, with eight features defined as its parameters. Six features are adopted from [38]; they calculate ratios of height, width, number of black pixels and vertical length. Additionally, a convex-hull ratio feature is adopted from [27]. Lastly, a Y-coordinate variance feature is introduced, which calculates the Y-coordinate covariance of all points on the segmentation path. The segmentation path zone (SPZ) is determined as the boundary within which the segmentation path features are expected to be found. The SPZ extends 1/5 of the height above and below the middle of the touching characters image. However, the speed and efficiency of the method are lower than expected because of the randomness in the genetic algorithm. With the mixture Gaussian function, the computational cost is quite high even for

simple segmentation problems. Moreover, it only works for two touching Chinese characters.

A new algorithm for separating a touching pair of digits by using the graph representation of the pattern is proposed in [10]. The segmentation can be regarded as grouping the edges and vertices into two disconnected sub-graphs. This process is executed by applying graph theory methods, and certain heuristic rules are also employed. An accuracy rate of up to 88.40% is reported on the NIST database.

Jayarathna and Bandara [8] propose an approach for the segmentation of offline handwritten connected two-digit strings based on the analysis of the foreground pixel distribution. In the segmentation stage, the junction-based splitting technique decides the complete segments of the connected digit strings. Major segments are differentiated from minor segments using fuzzy characteristic values. At the character segmentation stage, all major segments are combined with each of the minor segments to generate a set of different connection sketches. However, the algorithm deals only with one-stroke connections of two adjacent digits.

Another segmentation scheme for touched and multi-oriented machine printed characters is introduced in [32]. The proposed segmentation of touching characters is based on the cavity regions in the background portion. The convex hull is used to generate all points in the cavity region. Furthermore, several predicted segmentation lines are taken from the convex hull residue. Finally, all predictions are fed to an SVM classifier, and the segmentation point with the highest recognition confidence is selected. In addition, the SVM classifier is trained on angular information computed over several zones obtained using circular rings and the convex hull. The authors report that, in experiments on 1200 touched character samples in multi-oriented directions, the proposed scheme achieves 94.44% segmentation accuracy. However, the approach is applicable to two touched characters only.
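The cavity idea used in [32] can be illustrated with a small sketch. The code below is not the authors' implementation; it only shows, under simplifying assumptions, how the convex hull residue (background pixels lying inside the convex hull of the ink) might be computed on a binary image. The function names and the list-of-lists image format (1 for ink, 0 for background) are hypothetical choices made for this illustration.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on the boundary of the counter-clockwise hull."""
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        if (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) < 0:
            return False
    return True


def hull_residue(image):
    """Background pixels (x, y) that fall inside the convex hull of the ink."""
    ink = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    hull = convex_hull(ink)
    return [(x, y)
            for y, row in enumerate(image)
            for x, v in enumerate(row)
            if not v and inside_hull(hull, (x, y))]
```

On a U-shaped pattern, the residue is exactly the open cavity between the two vertical strokes, which is where candidate segmentation lines would be sought. Choosing among those candidates (done with an SVM in [32]) is outside the scope of this sketch.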
Touched characters also present serious problems in the interpretation of mathematical expressions. A technique for touched character segmentation in mathematical expressions is presented in [16]. Partitioning paths are constructed based on contour features and the width-to-height ratio. However, the method deals only with machine printed mathematical expressions.

A Drop-fall splitting algorithm to segment handwritten touched numerals is introduced in [7]. A set of Drop-fall algorithms constructs four possible cutting paths. Based on a pattern-oriented strategy, the authors determine which Drop-fall algorithm produces the best separation path. Accordingly, the background region of touching digits is analyzed using water reservoirs. Finally, structural features are extracted to recognize the segmented digits. An accuracy rate of 88.18% is reported.

Recently, an enhancement for segmentation of two touched Roman cursive handwritten characters is proposed in [3]. According to the authors, touching pairs can be divided into three regions (i.e., left, right and middle), and these regions have unique characteristics. Using a self-organizing map (SOM), the touching parts are identified based on their characteristics. Figure 7 exhibits segmentation results. The methodology consists of three steps:
(i) Estimate the core zone.
(ii) Extract the feature points as input to the SOM.
(iii) Determine the segmentation column.
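To make the Drop-fall idea mentioned above for [7] more concrete, the following minimal sketch traces a single top-down cutting path. It is only one plausible formulation: the move priority (down, down-right, down-left, right, left, then cut through ink), the `drop_fall` name and the image format (1 for ink, 0 for background) are assumptions made for illustration, and the four path variants and the pattern-oriented selection of [7] are not reproduced.

```python
def drop_fall(image, start_x):
    """Trace a top-down cutting path as a list of (x, y) positions."""
    height, width = len(image), len(image[0])
    x, y = start_x, 0
    path = [(x, y)]
    visited = {(x, y)}

    def free(nx, ny):
        # A cell is usable if it is inside the image, is background,
        # and has not been visited (this prevents left-right oscillation).
        return 0 <= nx < width and image[ny][nx] == 0 and (nx, ny) not in visited

    while y < height - 1:
        for dx, dy in ((0, 1), (1, 1), (-1, 1), (1, 0), (-1, 0)):
            if free(x + dx, y + dy):
                x, y = x + dx, y + dy
                break
        else:
            # Blocked in every direction: cut straight down through the stroke.
            y += 1
        path.append((x, y))
        visited.add((x, y))
    return path
```

The marble follows the background wherever it can and cuts through ink only when it is trapped, so the resulting path tends to cross the touching stroke at the bottom of a valley. The choice of the starting column strongly affects the result and is left to the caller here.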

Figure 7. Results of SOM segmentation

The improved technique successfully solves single- and multiple-touching problems. It is a valuable contribution to the Roman cursive touched character problem, but it also has some serious deficiencies. First, it performs linear (vertical) segmentation, which results in broken characters. Consequently, one character is deprived of an important part while the other gains extra junk, which leads to misclassification at a later stage. Second, it is applicable only to two touched characters and fails for more than two. Additionally, it does not work when the ratio of the touched characters is uneven, owing to the complexity of the desired middle region, since the method operates in the core zone. Finally, it is a training-based algorithm for which an accurate training set is mandatory, yet touched character samples are very rare in almost all benchmark databases. Researchers therefore produce synthetic data, which is labour intensive as well as computationally expensive. Moreover, a SOM trained on two touched characters fails on the segmentation of three or more touched characters.

Likewise, a new approach to touching string segmentation is proposed in [64]. Supervised learning is performed on labelled examples, to which a Markov Random Field (MRF) is applied. Further, a propagation minimization method is employed to select the candidate patches based on the compatibility of neighbouring patches. The output of the MRF after iterative belief propagation forms a segmentation probability map. Finally, the cut position is extracted from the map. An accuracy rate of 94.8% is reported. However, the algorithm deals only with machine printed touched character segmentation.

A segmentation scheme for two connected numerals based on characteristic position and contour detection is proposed in [65]. First, a graph representation of the image is derived using characteristic positions, and the segmentation path is derived using contour analysis. Finally, a heuristic criterion is applied to choose the best segmentation from the candidates. An accuracy rate of up to 88.2% is reported at a speed of 2.7 ms.

Another handwritten touching numeral segmentation algorithm is proposed in [66]. The method uses a Drop-fall algorithm, based on the movement of a marble on either side of the touching characters, to select the cutting path of the fused components. However, no accuracy rate is reported. The authors also claim that the approach is applicable to the segmentation of touched handwritten Roman alphabets.

5. Recognition Based Segmentation (Implicit Segmentation)

Generally, most strategies in the literature over-segment the touching characters and then join adjacent partitions, using pattern recognition to make the segmentation decision. Meanwhile, other strategies use intelligent techniques to determine touching segmentation points. An intelligent system might achieve high segmentation accuracy; however, it needs a large amount of training data and time, which creates considerable overhead. In addition, for recognition-based approaches, the accuracy of segmentation depends greatly on the robustness of the recognizer.

Casey and Nagy [2] introduce a recursive segmentation-recognition method for machine printed touched characters. Segmentation decisions are confirmed by successful classification of the resulting character patterns. If classification fails, different partitions of the input pattern are explored using an adaptive window. However, even for a specific font with proportional characters, it is still not easy for their method to select a suitable decision on the first attempt.

Likewise, a touched character segmentation approach based on recognition results is proposed in [67]. Prospective touching points are found by accumulating the black pixels using an “AND” operation between neighboring columns. Furthermore, a decision tree and a set of additional rules are constructed to find the character component sequences. However, building a decision tree and searching for a correct path require exhaustive computation. Moreover, several paths may exist in the tree concurrently.

A procedure to split italic machine printed touching numerals in cadastral maps is proposed in [31]. First, the touching numeral string is fed to an OCR system based on Fourier descriptors [68]. Touching numerals are detected when the certainty value of a contour is less than some threshold. Following touched part detection, the convex hull of the outer contour is computed along with baseline detection.
Afterward, a parallelogram is used to detect italic fonts. Finally, the splitting position is determined using a peak-to-valley function [69].

Segmentation of touched printed Hangul (Korean) and alphanumeric characters is presented in [28]. An MLP (multi-layer perceptron) with 72 inputs and 60 outputs is trained to produce candidate segmentations of touching characters. Furthermore, two robust classifiers, for printed Hangul and alphanumeric characters, are employed to choose the correct cutting points among the candidates. The authors claim segmentation accuracy rates of up to 96.2%. However, they show that a lack of training data leads to segmentation errors.

For printed characters, a segmentation-recognition approach using side profiles is proposed in [70]. The cutting cost of several segmentation candidates is examined. In this regard, a confidence value is calculated by taking a few columns in the neighborhood of each candidate. The cutting path with the minimum cutting cost is considered first to segment the touching characters. Finally, pattern matching is performed repeatedly along the candidate cutting paths until a correct result is achieved. Nevertheless, touching problems involving combinations such as ‘r’ and ‘n’, ‘u’ and ‘i’, or ‘nn’ and ‘m’ are still incorrectly segmented.

The approach of Zhang and Suen [71] deals with numeral strings taken from the courtesy amounts of bank checks. Two
segmentation stages, named global and local segmentation, are proposed. Global segmentation generates a hypothesis tree consisting of sub-images, based on sub-image width and connected component distance. Further, the local segmentation stage is applied to process the touching problems that are rejected by global segmentation. In this regard, the significant contour points proposed in [72] locate the entry and exit points of segmentation paths. A segmentation path produces left and right sub-images, which are verified by a trained MLP classifier. In addition, Zhang and Suen [73] implement a double-zero classifier that specifically recognizes touching “00” as a single unit. However, their approach is only effective for single touching since it uses only corner points of the outer contour. Moreover, the MLP classifier is specifically trained for the “00” touching problem. Thus it cannot deal with multiple touching such as “88”.

In printed Devanagari characters, touching problems might appear due to ink spreading. Such problems are investigated using recognition-based segmentation in [74]. Horizontal and vertical projections are calculated to determine possible segmentation points. Moreover, further segmentation is conducted on any segmented image whose width is greater than the average, where the image contains shadows of characters.

Another segmentation-based recognition strategy, evaluated on 3,500 touching digit pairs from NIST (SD-19), is proposed in [12]. Several structural features are extracted to generate multiple hypotheses for further verification by an MLP digit classifier. The broken character problem is also taken into account using connected component analysis. When a broken part is detected, it is merged into a neighboring connected component. Furthermore, a touching type detector is applied to find break points for possible segmentation paths. The touching type detector consists of four structural analyses.
First, candidate break points, consisting of transition points, valley minima and mountain maxima, are defined from the upper and lower contours. Transition points are determined by calculating the difference of the upper and lower contours or by considering points that are greater than a threshold. Meanwhile, valley minima and mountain maxima are determined by tracing local minima and maxima. Second, ligature analysis is performed to check for the existence of ligatures and fake ligatures. If the average vertical difference between the starting and ending break points of a candidate ligature is smaller than a threshold, the candidate is considered a ligature. Otherwise, it is considered a fake ligature or part of a digit. Third, touching type analysis is employed to classify six types of touching based on the existence of a ligature or fake ligature. Fourth, non-vertical candidate break points are employed to avoid vertical segmentation that can cut through the body of a digit. Slant angle estimation is based on the vertical and diagonal directions of chain code pixels. Finally, all candidate break points are analyzed and deduced by a segment combination generator. Afterward, all segment combinations are verified using an MLP isolated digit recognizer. However, their approach is limited to strings of two touching digits. Moreover, segmentation accuracy for multiple touching is still quite low because it fails to detect the break points accurately. Several thresholds are adopted that might not be suitable for other digit data sets. In addition, it is reported that bad segmentation leads to misclassification of the digits “1” and “7”.

Recognition-based segmentation of touching characters in mathematical expressions is also presented in [5]. In this preliminary study, experiments are restricted to only two
touched characters. First, touching points are detected using aspect ratio and peripheral features. Second, candidate touching characters are extracted as two components. Each component is then verified using matching procedures. If the matching succeeds, both components are judged to be separate characters; otherwise, the pattern is judged to be an isolated character. The authors report a 51% success rate for touched character detection. In addition, three main reasons for failure are mentioned: (i) two character candidates may not be found in the touched character pattern; (ii) a single character may appear to be touched characters; (iii) touching characters are separated incorrectly into two candidate characters. Figure 8 exhibits the methodology.
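The verify-by-matching logic described above can be sketched as follows. This is only an illustration of the general recognition-based decision, not the procedure of [5]: the `match_score` callable stands in for any template matcher or classifier returning a confidence in [0, 1], and the threshold and function names are hypothetical.

```python
def split_at(image, col):
    """Split a binary image (list of rows) into left and right parts at column col."""
    left = [row[:col] for row in image]
    right = [row[col:] for row in image]
    return left, right


def verify_touching(image, candidate_cols, match_score, threshold=0.5):
    """Return the best cut column, or None if the pattern is judged isolated.

    A cut is accepted only if BOTH resulting components are matched with a
    confidence of at least `threshold`; among accepted cuts, the one whose
    weaker component scores highest is returned.
    """
    best = None
    for col in candidate_cols:
        left, right = split_at(image, col)
        score = min(match_score(left), match_score(right))
        if score >= threshold and (best is None or score > best[1]):
            best = (col, score)
    return None if best is None else best[0]
```

Taking the minimum of the two scores reflects the rule that both components must match; a pattern for which no candidate cut passes is treated as a single character. The quality of the decision therefore depends entirely on the matcher, which is the dependency on recognizer robustness noted at the start of this section.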

Figure 8. Diagram of the proposed approach [5] using template matching.

In a limited environment, recognition problems can be solved by defining a hierarchy of all possible words. Using this strategy, problems can be divided into small pieces, and thus higher recognition rates can be achieved. In this regard, recognition strategies for handwritten Chinese addresses are proposed in [75]. Chinese characters have complicated touching problems, hence it is difficult to segment all characters correctly. However, extracting the key characters of a Chinese character string is easier and more suitable for recognizing the whole word. Additionally, the authors enhance conventional key character extraction by calculating the matching distance of each single key character and its combinations. The candidate with the smallest matching distance is taken as the final recognition result. A holistic approach is applied to recognize two touching characters as a single unit. Hence, it can avoid the complex and error-prone segmentation of Chinese characters.

The use of critical concave points (CP) to determine candidate cutting points in machine printed touched characters is suggested in [76]. Character concave points consist of up-turned (∩) and down-turned (U) structures. Both structures are found using the chain code representation along with decision rules to determine other up-turned or down-turned structures. Figure 9 illustrates detected concave points. Following detection, there are several cutting candidates that should be ordered. Accordingly, the authors define three rules for the probability of a candidate segmentation:
(i) The best location on the x-axis is calculated by a special cutting cost function.
(ii) For cutting along the y-axis, there should be fewer pixels.
(iii) The cut should be closer to the top of the image.

Cutting is performed locally rather than over the whole image height to avoid slicing into unwanted parts. Finally, all possible cutting points are verified using a robust classifier to determine whether the segmented characters are correctly recognized or misclassified. Experiments are conducted on printed characters; however, the authors also claim accuracy for hand-printed characters. Detecting all critical concave points is time consuming because all contours must be traced. In addition, up-turned and down-turned concavities are only suitable for single touching; hence the method fails for long touching without concavity and for holes caused by multiple touching.
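The three ordering rules for cutting candidates can be combined in several ways; [76] is not quoted here on how they are weighted. The sketch below simply applies them in lexicographic order, which is an assumption, and treats the cutting cost function as a caller-supplied callable because its definition is not given in this survey. The local window reflects the statement that cutting is performed locally rather than over the whole image height.

```python
def ink_crossed(image, x, y, half_window=2):
    """Count ink pixels that a local vertical cut at column x would cross around row y."""
    top = max(0, y - half_window)
    bottom = min(len(image), y + half_window + 1)
    return sum(image[r][x] for r in range(top, bottom))


def rank_candidates(image, candidates, cutting_cost, half_window=2):
    """Order (x, y) cutting candidates from most to least preferred.

    Assumed priority: (i) lower cutting cost of the x location,
    (ii) fewer ink pixels crossed by the local cut,
    (iii) smaller y, i.e. closer to the top of the image.
    """
    return sorted(
        candidates,
        key=lambda p: (cutting_cost(p[0]),
                       ink_crossed(image, p[0], p[1], half_window),
                       p[1]),
    )
```

In a recognition-based scheme the ranked candidates would then be tried in order, with a classifier accepting or rejecting each cut.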

Figure 9. Black points show all critical concave points.

Contour and projection analysis is performed to segment touched handwritten numeral strings in [13]. First, the height of the input image is normalized and the width scale factor is calculated. To determine whether the input image is a single digit or a string, a single-digit classifier is adopted [77]. The single-digit classifier takes multi-scale directional element features as input and produces nested-subset Mahalanobis distances as output. A fuzzy inference function is then applied to map the output values into two categories, single digit and digit string. Further processing is conducted for digit strings, which contain more than one digit. Candidate segmentation points are generated from contour corner points and the horizontal projection. For this purpose, corner points of the digit string are traced on the top and bottom contours. Several rules are employed to eliminate unwanted corner points. Afterward, a projection-based algorithm adds candidate segmentation points that cannot be detected by contour analysis. All established end-points are used to determine candidate segmentation lines that produce cliques of the digit string image, as depicted in Figure 10. Finally, a mathematical expression is developed to obtain the optimal segmentation, and the search for the optimal segmentation is implemented in three steps:
(i) Recognize all cliques.
(ii) Evaluate the optimal recognition result of each clique.
(iii) Evaluate the optimal segmentation that maximizes the mathematical expression.
However, the proposed method is suitable for single-point touched handwritten digits only.
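The three-step search above is commonly realised as a dynamic program over candidate segmentation lines. The sketch below is a generic formulation and not the expression used in [13]: it assumes the string has been cut into `n_pieces` primitive pieces, that a hypothetical `confidence(start, end)` callable returns the best recognition confidence in (0, 1] for the clique spanning pieces `start` to `end - 1`, and that the objective is the product of clique confidences.

```python
import math


def best_segmentation(n_pieces, confidence, max_span=3):
    """Return the list of (start, end) cliques maximizing the product of confidences.

    Boundaries are numbered 0..n_pieces; a clique (start, end) merges the
    primitive pieces between those boundaries. `max_span` bounds how many
    pieces a single digit may cover.
    """
    best = [-math.inf] * (n_pieces + 1)   # best log-score reaching each boundary
    back = [0] * (n_pieces + 1)           # predecessor boundary on the best path
    best[0] = 0.0
    for end in range(1, n_pieces + 1):
        for start in range(max(0, end - max_span), end):
            c = confidence(start, end)
            if c <= 0.0 or best[start] == -math.inf:
                continue
            score = best[start] + math.log(c)
            if score > best[end]:
                best[end], back[end] = score, start
    if best[n_pieces] == -math.inf:
        return []                          # no admissible segmentation
    cliques, end = [], n_pieces
    while end > 0:
        cliques.append((back[end], end))
        end = back[end]
    return cliques[::-1]
```

Working with log-confidences turns the product into a sum, so each boundary needs only the best score of its predecessors; the cost is proportional to `n_pieces * max_span` recognizer calls rather than to the number of possible segmentations.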

Figure 10. Clique of digit string image.

6. Conclusion and Future Directions

This paper has presented the state of the art in touched character segmentation. A critical description of the major approaches is given. Our efforts have been directed at comparative performance evaluation, highlighting the strengths and weaknesses of individual algorithms. In the last decades, research in handwriting recognition has progressed considerably due to the availability of machines with high computational power. However, current systems still find application only in restricted domains such as bank check recognition, postal services and pharmacy, and have been tested only on small data sets. Future research needs systems for widespread applications at high speed and accuracy. One possible aid to touched character segmentation and recognition is the availability of linguistic information. It is also believed that wider use of context and classifier confidence will lead to improved accuracy. However, there is little experimental data to permit an estimation of the amount of improvement to be attributed to advanced techniques. Perhaps with the wider availability of benchmark databases for touched characters, experimentation will be carried out to shed light on this issue. Finally, it is hoped that this comprehensive discussion will provide insight into the concepts involved and will provide further guidelines. We apologize to researchers whose valuable contributions may have been overlooked.

References [1] S. Marinai, M. Gori, and G. Soda. “Artificial Neural Networks for Document Analysis and Recognition”. IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 27, No. 1, pp. 23-35, 2005. [2] R. G. Casey, and G. Nagy. “Recursive segmentation and classification of composite character patterns”. Proceedings of the 6th International Conference on Pattern Recognition, Munich, Germany. 1982. [3] F. Kurniawan, A. Rehman, M. Dzulkifli and S. Mariyam, S. “Self Organizing Features Map with Improved Segmentation to Identify Touching of Adjacent Characters in Handwritten Words”. Proceedings of the Ninth International Conference on Hybrid Intelligent Systems (HIS 2009) Shenyang, LiaoNing, China, Vol. 1, pp. 475-480, 2009. [4] H. Fujisawa. “A View on the Past and Future of Character and Document Recognition”. Proceedings of the Ninth International Conference on Document Analysis and Recognition, pp. 3-7, 2007. [5] A. Nomura, K. Michishita, S. Uchida and M. Suzuki “Detection and Segmentation of Touching Characters in Mathematical Expressions”. Proceedings of the Seventh

International Conference on Document Analysis and Recognition, Vol.1, pp. 126-130, 2003. [6] T. Zhang, X. Wang, C. Chen and J. Liu. “Connected Numeral Strings Segmentation Based on the Combination of Characteristic Position and Contour Detecting”. Proceedings of the First International Conference on Digital Image Processing, pp. 81-84, 2009. [7] M.A. Rui, Z. Yingnan, X. Yongquan, and Y. Yunyang. “A Touching Pattern-oriented Strategy for Handwritten Digits Segmentation”. Proceeding of International Conference on Computational Intelligence and Security, pp. 174-179, 2008. [8] U.K.S. Jayarathna, G.E.M.D.C. Bandara. “New Segmentation Algorithm for Offline Handwritten Connected Character Segmentation”. Proceedings of the First International Conference on Industrial and Information Systems, pp. 540-546, 2006. [9] S. Ouchtati, M. Bedda and A. Lachouri. “Segmentation and Recognition of Handwritten Numeric Chains” Journal of Computer Science. Vol. 3, No. 4, pp.242248, 2007. [10] M. Suwa. “Segmentation of connected handwritten numerals by graph representation”, Proceedings of Eighth International Conference on Document Analysis and Recognition, Vol. 2. 750 – 754, 2005. [11] A. Elnagar, and R. Alhajj “Segmentation of connected handwritten numeral strings”. Pattern Recognition, Vol. 36, No. 3, pp. 625-634, 2003. [12] K.K. Kim, J.H. Kim, C.Y. Suen. “Segmentation-based recognition of handwritten touching pairs of digits using structural features”, Pattern Recognition Letters, Vol. 23, No. 13, pp. 13-24, 2002. [13] L. Yun, C.S. Liu, X.Q. Ding, and F. Qiang,. “A recognition based system for segmentation of touching handwritten numeral strings”. Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition, pp. 294-299, 2004. [14] G. Yong, Z.Yan and H. Zhao. Touching String Segmentation Using MRF. International Conference on Computational Intelligence and Security, pp. 520-524, 2009. [15] L. Zhongkang, C. Zheru, S. Wan-Chi, S. Pengfei. 
“A Background-thinnig-based Approach for Separating and Recognizing Connected Handwritten Digit Strings”. Pattern Recognition, pp. 921-933, 1999. [16] X. Tian, and Y. Zhang. “Segmentation of Touching Characters in Mathematical Expressions Using Contour Feature Technique”. Proceedings of Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, pp. 206-209, 2007. [17] S. Wshah,. Z. Shi, and V. Govindaraju. “Segmentation of Arabic Handwriting Based on Both Contour and Skeleton Segmentation”. Proceedings of the Tenth International Conference on Document Analysis and Recognition, pp. 793 – 797, 2009. [18] Lorigo, L. and Govindaraju, V. “Segmentation and Prerecognition of Arabic Handwriting”. Proceedings of International Conference on Document Analysis and Recognition, Vol. 2, pp. 605-609, 2005. [19] X. Wei, S. Ma, and Y. Jin. “Segmentation of Connected Chinese Characters Based on Genetic Algorithm”. Proceedings of the Eighth International Conference on

Document Analysis and Recognition (ICDAR'05), Vol.2, pp. 645-649, 2005. [20] Y. Teruyuki, T. Shinji, Y. Tomohiro, S. Tsuyoshi, M. Eiji, O. Hisao. “A Segmentation System for Touching Handwritten Japanese Characters. Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition (IWFHR'02), pp. 407-412, 2002. [21] W. Seo, and B.J. Cho, “Efficient Segmentation Path Generation for Unconstrained Handwritten Hangul Characters”. Lecture Notes in Artificial Intelligence, Vol.3192, pp. 438–446, 2004. [22] J.H. Bae, K.C. Jung, J.W. Kim, H.J. Kim. “Segmentation of touching characters using an MLP”. Pattern Recognition Letters, Vol. 19, pp. 701–709, 1998. [23] A. Rehman, and M. Dzulkifli. “A Simple Segmentation Approach for Unconstrained Cursive Handwritten Words in Conjunction of Neural Network”. International Journal of Image Processing, Vol. 2(3), pp. 29-35, 2008. [24] A. Broumandnia, J. Shanbehzadeh, M. Nourani. “Segmentation of Printed Farsi/Arabic Words”. Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, pp. 761 – 766, 2007. [25] S. Wshah, Z. Shi, and V. Govindaraju. “Segmentation of Arabic Handwriting based on both Contour and Skeleton Segmentation Proceedings of the 10th International Conference on Document Analysis and Recognition, 793 – 797. [26] Z. Han, C.P. Liu, X.C. Yin. “A two-stage handwritten character segmentation approach in mail addresses recognition”. Proceedings of Eighth International Conference on Document Analysis and Recognition, Vol. 1, pp. 111-115, 2005. [27] X. Wei, and S. Ma. “Segmentation of touching Chinese character based on convex hull ratio feature”. Journal of Chinese Information Processing, 91-96, 2005. [28] B. Jin-Hak, J. Kee-Chul, K. Jin-Wook, and K. HangJoon. “Segmentation of touching characters using an MLP”. Pattern Recognition Letters, Vol. 19(8), pp. 701-709, 1998. [29] O. Masayuki, S. Syougo, and S. Tadashi. “Segmentation of Touching Characters in Formulas”. 
Proceedings of Third IAPR Workshop on Document Analysis Systems Theory and Practice, 1999. [30] M. Okamoto, S. Sakaguchi, S. and T. Suzuki. “Segmentation of touching characters in formulas”. Lecture Note in Computer Science Vol. (1655), 1999. [31] G. Monagan. “A Procedure for Segmenting Touching Numbers in Cadastral Maps”. Proceedings of IAPR Workshop on Machine Vision Applications (MVA 1994), Kawasaki, Japan, 1994. [32] P.P. Roy, U. Pal, and J. Llados, Recognition of Multioriented Touching Characters in Graphical Documents. Proceedings of Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 297-304, 2008. [33] Y. Lu. “Machine Printed Character Segmentation: An Overview”. Pattern Recognition, Vol. 28(1), pp. 67-80, 1995.

[34] Y. Lu, and M. Shridhar. “Character Segmentation in Handwritten Words–An Overview”. Pattern Recognition, Vol. 29, No. 1, pp. 77–96, 1996. [35] R.G. Casey and E. Lecolinet. “A Survey of Methods and Strategies in Character Segmentation”. IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 17, pp. 690-706, 1996. [36] U. Garain and B.B. Chaudhuri. “Segmentation of Touching Characters in Printed Devnagari and Bangla Scripts Using Fuzzy Multifactorial Analysis”. IEEE Transaction on Systems, Man and Cybernetics, Vol. 32(4), 449-459, 2002. [37] W. Xian, V.Govindaraju and S. Srihari. “Multi-experts for touching digit string recognition”. Proceedings of the Fifth International Conference on Document Analysis and Recognition, pp. 800-803, 1999. [38] C. Yi-Kai and W. Jhing-Fa. “Segmentation of Single or Multiple-Touching Handwritten Numeral String Using Background and Foreground Analysis”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11, 1304-1317, 2000. [39] X. Zhu, and X. Yin. “A New Textual/Non- textual Classifier for Document Skew Correction”. Proceedings of the 16th International Conference on Pattern Recognition (ICPR), pp. 480-482, 2002. [40] W. Xianghui, M. Shaoping and J. Yijiang. “Segmentation of Connected Chinese Characters Based on Genetic Algorithm”. Proceedings of the Eighth International Conference on Document Analysis and Recognition, 2009. [41] G. Yong, Z. Yan and H. Zhao. “Touching String Segmentation Using MRF”. International Conference on Computational Intelligence and Security, 520-524, 2009. [42] C.E. Dunn and P.S.P. Wang. “Character segmentation techniques for handwritten text - A Survey”. Proceedings of 11th International Conference on Pattern Recognition, 577-580, 1992. [43] S. Jiqiang, L. Zuo, M.R. Lyu, and C. Shijie. “Recognition of merged characters based on forepart prediction, necessity-sufficiency matching, and character-adaptive masking”. 
IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 35, No. 1, pp. 2-11, 2005. [44] R. L. Hoffman and J. W. Mccullough. “Segmentation Methods for Recognition of Machine-printed Characters”. IBM Journal of Research and Development, pp. 153-165, 1971. [45] S. Kahan, T. Pavlidis and H.S. Baird. “On the recognition of printed characters of any font and size”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 9(2), pp. 274–288, 1987. [46] H.J Lee, and M.C. Lee. “Understanding Mathematical Expressions in a Printed Document”. Proceedings of Second International Conference on Document Analysis and Recognition, pp. 502-505, 1993. [47] T. Bayer and U. Kressel. “Cut classification for segmentation”. Proceedings of International Conference on Document Analysis and Recognition, Tsukuba Science City, Japan, pp. 565–568, 1993. [48] S. Liang, M. Shridhar, and M. Ahmadi. “Segmentation of touching characters in printed document recognition”. Pattern Recognition, Vol. 27(6), 825–840, 1994.

[49] Z. Shi. and V. Govindaraju. “Segmentation and Recognition of Connected Handwritten Numeral Strings”. Pattern Recognition, Vol. 30(9), 1501-1504, 1997. [50] H. Jianming, Y. Donggang, and Y. Hong. “Construction of partitioning paths for touching handwritten characters”. Non-Linear Analysis, Vol. 20, No. 3, pp. 293-303, 1999. [51] L. E. S Oliveira, E. Lethelier, F. Bortolozzi, and R. Sabourin. “A new segmentation approach for handwritten digits”. Proceedings of the 15th International Conference on Pattern Recognition, Vol. 2, pp. 323 – 326, 2000. [52] R. Alhajj, F. Polat. and A. Elnagar. “Employing multiagents to identify touching of adjacent digits in handwritten Hindi numerals”. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Vol. 4, pp. 2725 – 2730, 2000. [53] J.J. Yoon and G. Kim. “An Approach for Active Segmentation of Unconstrained Handwritten Korean Strings using Run-length Code”. Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, Amsterdam, 2000. [54] T. Yamaguchi, T. Yoshikawa, T. Shinogi, S. Tsuruoka and M. Teramoto. “A segmentation method for touching Japanese handwritten characters based on connecting condition of lines. Proceedings of Sixth International Conference on Document Analysis and Recognition, 837-841, 2001. [55] D. Yu, and H. Yan. “Separation of touching handwritten multi-numeral strings based on morphological structural features”. Pattern Recognition, Vol. 34, No. 3, pp. 587-599, 2001. [56] U. Pal. A. Belaid, and C. Choisy. “Water Reservoir Based Approach for Touching Numeral Segmentation”. Proceedings of Sixth International Conference on Document Analysis and Recognition, 892-896, 2001. [57] U. Pal, A. Belaid, and C. Choisy, (2003). “Touching Numeral segmentation using water reservoir concept”. Pattern Recognition Letters, Vol. 24, No. 13, pp. 261272, 2003. [58] Z. Hong-Gang, L. Gang, X. Wei-Ran, and G. Jun. 
“An algorithm of handwritten digits segmentation based on multi-mould”. Proceedings of International Conference on Machine Learning and Cybernetics, Vol. 2, pp. 1081–1084, 2002. [59] S. Misako and N. Satoshi. “Segmentation of Handwritten Numerals by Graph Representation”. Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition, 2004. [60] F. Kimura, M. Shridhar, and Z. Chen. “Improvements of a lexicon directed algorithm for recognition of unconstrained handwritten words”. Proceedings of the Second International Conference on Document Analysis and Recognition, pp. 18-22, 1993. [61] C.J. Hilditch. “Linear Skelton from Square Cupboards”. Machine Intelligence, Vol. 4, pp. 403-420, 1969. [62] E.W. Dijkstra. “A note on two problems in connection with graphs”. Nuerische Mathematik, Vol. 1, pp. 269271, 1959. [63] N. Tripathy and U. Pal. Handwriting Segmentation of Un-constrained Oriya Text. Proceedings of the International Workshop on Frontiers in Handwriting Recognition, pp. 306-311, 2004.

[64] G. Yong, Z. Yan, and H. Zhao. “Touching String Segmentation Using MRF”. Proceedings of the International Conference on Computational Intelligence and Security, pp. 520-524, 2009.
[65] T. Zhang, X. Wang, C. Chen, and J. Liu. “Connected Numeral Strings Segmentation Based on the Combination of Characteristic Position and Contour Detecting”. Proceedings of the First International Conference on Digital Image Processing, pp. 81-84, 2009.
[66] A.V.S. Rao, M. Subbarao, N.V. Rao, A.S.C.S. Sastry, and L.P. Reddy. “Segmentation of Touching Hand written Numerals and Alphabets”. Proceedings of the Second International Conference on Computer and Electrical Engineering, pp. 304-307, 2009.
[67] S. Tsujimoto and H. Asada. “Resolving ambiguity in segmenting touching characters”. Proceedings of the First International Conference on Document Analysis and Recognition, 1991.
[68] O. Lorenz and G. Monagan. “Retrieval of Line Drawings”. Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, Nevada, 1994.
[69] Y. Lu. “On the segmentation of touching characters”. Proceedings of the International Conference on Document Analysis and Recognition, Tsukuba Science City, Japan, pp. 440-443, 1993.
[70] J. Min-Chul, S. Yong-Chul, and S.N. Srihari. “Machine printed character segmentation method using side profiles”. Proceedings of IEEE SMC '99 Conference on Systems, Man, and Cybernetics, 1999.
[71] S. Zhang and M.A. Karim. “A new impulse detector for switching median filters”. IEEE Signal Processing Letters, Vol. 9, pp. 360-363, 2002.
[72] N.W. Strathy, C.Y. Suen, and A. Krzyzak. “Segmentation of handwritten digits using contour features”. Proceedings of the Second International Conference on Document Analysis and Recognition, pp. 577-580, 1993.
[73] L.Q. Zhang and C.Y. Suen. “Recognition of courtesy amounts on bank checks based on a segmentation approach”. Proceedings of the Eighth International Workshop on Handwriting Recognition, pp. 298-302, 2002.
[74] V. Bansal and R.M.K. Sinha. “Segmentation of touching and fused Devanagari Characters”. Pattern Recognition, Vol. 35, No. 4, pp. 875-893, 2002.
[75] W. Chunheng, Y. Hotta, M. Suwa, and N. Naoi. “Handwritten Chinese address recognition”. Proceedings of the Ninth International Workshop on Frontiers in Handwriting Recognition, pp. 539-544, 2004.
[76] A. Ventzislav. “Using Critical Points in Contours for Segmentation of Touching Characters”. Proceedings of the 5th International Conference on Computer Systems and Technologies, 2004.
[77] J.Y. Zhang and X.Q. Ding. “Multi-Scale feature extraction and nested-subset classifier design for high accuracy handwritten character recognition”. Proceedings of the 15th International Conference on Pattern Recognition, Vol. 2, pp. 581-584.
