Address Block Location and Pin Code Recognition for Indian Postal Automation

K. Roy, U. Pal, and B.B. Chaudhuri

Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata-108, India. Email: [email protected], [email protected]

Abstract

In this paper, we present a system towards Indian postal automation. In the proposed system, the image is first decomposed into smaller blocks using a run-length smoothing approach. Based on the black-pixel density and the number of components inside a block, non-text blocks (postal stamp, postal seal etc.) are detected. Using positional information, the destination address block (DAB) is identified from the text blocks. Next, the pin code box is detected in the DAB and the numerals are extracted from it. Finally, the numerals are recognized using water reservoir, structural and statistical features, so that the letters can be sorted according to their pin codes.

1. Introduction

Postal automation has been a topic of research interest for the last few years, and many published works are available on postal automation for documents in non-Indian languages [1-6]. Some systems are available for postal automation in the USA, UK and Australia, but no work has been done towards the automation of the Indian postal system. To the best of our knowledge, there is no published work towards postal automation on Indian language scripts.

One of the most important components in postal automation is to locate the destination address block (DAB) and to extract the pin code on the envelope. Locating the DAB is difficult because an envelope contains not only the DAB but also several other meaningful blocks, such as the return address block, postage stamp block, graphics etc. Furthermore, there is wide variation due to different writing instruments, writing habits and the condition of the envelope surface. Detection of the pin code from the DAB is also a difficult problem. Some Indian postal documents have pin-code boxes (e.g. post-cards, Inland letters etc.) and some do not (e.g. ordinary envelopes, business letters etc.). From the experiment we noted that some people write the pin code outside the pin-box area of post-cards and inland letters, although these documents have a specified pin-box area.

System development towards postal automation for a country like India is more difficult than for other countries because of its multi-lingual and multi-script behaviour. An Indian postal document may be written in any of the 18 official languages of India. Moreover, some people write the destination address part of a letter in two or more scripts. For example, see Fig. 1(a). Here the destination address part is written in both Bangla and English. Thus, development of an Indian postal automation system is a challenging task.

In this paper, we propose a system towards Indian postal automation. In the proposed system, at first, using a run-length smoothing approach and the characteristics of different components, the postal stamp and postal seal parts are detected and removed from the documents. Next, based on positional information, the DAB is located and the pin code is extracted from the pin-code box. Finally, based on new features obtained from the water reservoir principle [7] as well as topological and structural features, the Bangla and English numerals of the pin code are recognized.

The rest of the paper is organized as follows. Pre-processing, including data collection, noise removal, postal stamp detection and deletion, DAB location, pin code box detection and pin code extraction, is described in Section 2. Section 3 deals with the recognition techniques for the numerals of the pin code. Finally, results and discussion are given in Section 4.

2. Preprocessing

2.1 Data collection and noise removal

Document digitization for the present work has been done from real-life data collected from a post office (Cossipore post office of North Kolkata circle, West Bengal, India). We use a flatbed scanner (manufactured by UMAX, Model AstraSlim) for the digitization. The digitized images are in gray tone, and we use a histogram based thresholding approach to convert them into two-tone (0 and 1) images. Here ‘1’ represents an object pixel and ‘0’ represents a background pixel. The digitized document images may be skewed, and we use the Hough transform to de-skew them. The digitized image may contain spurious noise pixels and irregularities on the boundary of the characters, leading to undesired effects on the system. Also, to improve recognition performance, broken numerals should be connected. For preprocessing we use the method described in [8].

2.2 Postal stamp detection and deletion

The binary image is processed to remove the postal stamps and other graphics parts present in the image. We use the run-length smoothing technique for this purpose [9]. At first, simple horizontal and vertical smoothing operations are performed. The two smoothing results are then combined in a logical AND operation. The results after the horizontal, vertical and logical AND operations on Fig. 1(a) are shown in Fig. 1(b), (c) and (d), respectively. The result of the logical AND operation is further smoothed to delete the stray parts (see Fig. 2(a)).

After run-length smoothing, component analysis is applied to the image to get individual blocks. Block-wise checking is then done to detect postal stamp or postal seal blocks. For each smoothed block component, we find the boundary of the component and check the density of black pixels over the corresponding area of the original image. We note that for a postal stamp or postal seal block the density of black pixels is very high compared with a text line block. Also, we noticed that a postal stamp or postal seal block contains many small components, whereas such small components are not present in other blocks. After detection of a postal stamp or postal seal block, we delete that block from the document before further processing.

The pin code box is the part where the pin code of the destination address is written. India uses a 6-digit pin code, which plays a vital role in determining the destination of mail pieces. Some postal documents (e.g. post card, Inland letter and envelope) have a specific pin code box, and people generally write the destination pin code inside this box. Here, at first, we detect whether there is a pin code box or not. If it exists, our method extracts the pin code from the pin code box.

Fig. 3: (a) Extracted part of pin box from the DAB shown in Fig. 2(b). (b) Extracted pin code numeral from the pin box.

For pin code box extraction we apply component labelling and select as candidates those components that satisfy the following criterion: the length of the component is greater than five times and less than seven times the width of the component. Since an Indian pin code box contains six square boxes, the length of a pin code box component will be about six times its width. Let X be the set of these selected components. If X contains only one component, then that component is the pin code box. If no candidate component is obtained, we assume that there is no pin code box. If there are two or more candidate components, we decide the best component for the pin code as follows. We scan each column of a selected component from the top and, as soon as we reach a black pixel, we stop and note the row value of this point. Let t_i be the row value of the i-th column obtained during top scanning. Similarly, we scan each column of the selected component from the bottom and, as soon as we reach a black pixel, we stop and note the row value of this point. Let b_i be the row value of the i-th column obtained during scanning from the bottom. We compute the absolute value of b_i - t_i for all columns. The selected component for which the mode of these absolute values is equal to the component width is chosen as the pin code box component.
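The candidate test and the column-scan tie-break described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: each component is taken to be a binary grid (1 = black) cropped to its bounding box, "length" is read as the number of columns and "width" as the number of rows, and the function names and the `tolerance` parameter are hypothetical. Because b_i - t_i is a difference of row indices, a box that is `width` rows tall gives a mode of `width - 1`, so the comparison allows a small tolerance instead of strict equality.

```python
from collections import Counter


def is_candidate(length, width):
    """Length must lie strictly between five and seven times the width."""
    return 5 * width < length < 7 * width


def column_extent_mode(comp):
    """Mode of |b_i - t_i| over all columns that contain a black pixel."""
    rows, cols = len(comp), len(comp[0])
    extents = []
    for c in range(cols):
        black_rows = [r for r in range(rows) if comp[r][c] == 1]
        if black_rows:  # t_i = first black row, b_i = last black row
            extents.append(abs(black_rows[-1] - black_rows[0]))
    return Counter(extents).most_common(1)[0][0] if extents else None


def select_pin_box(components, tolerance=1):
    """Return the pin code box component, or None if there is none."""
    candidates = [c for c in components if is_candidate(len(c[0]), len(c))]
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    for comp in candidates:
        width = len(comp)
        mode = column_extent_mode(comp)
        # Assumption: "equal to the component width" is checked up to a
        # small tolerance around width - 1 (index difference).
        if mode is not None and abs(mode - (width - 1)) <= tolerance:
            return comp
    return None
```

A thick horizontal stroke with the right aspect ratio passes the length test but fails the mode test, since most of its columns span far fewer rows than the bounding box.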

Fig. 1: (a) An example of a postal document image obtained from an Inland letter. (b) Horizontal run-length smoothing of Fig. 1(a). (c) Vertical run-length smoothing of Fig. 1(a). (d) Logical AND of Fig. 1(b) and Fig. 1(c).
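The smoothing steps illustrated in Fig. 1(b)-(d) can be sketched as below. This is a minimal sketch, not the exact procedure of [9]: it assumes the image is a list of rows with 1 for object pixels, fills only background gaps that lie between two object pixels, and uses hypothetical threshold parameters (`h_threshold`, `v_threshold`), since the thresholds used in the experiments are not stated here.

```python
def smooth_line(line, threshold):
    """Fill background gaps of at most `threshold` pixels between object pixels."""
    out = list(line)
    last_black = None
    for i, value in enumerate(line):
        if value == 1:
            if last_black is not None and 0 < i - last_black - 1 <= threshold:
                for k in range(last_black + 1, i):
                    out[k] = 1
            last_black = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Return the horizontal, vertical and AND-combined smoothing results."""
    horizontal = [smooth_line(row, h_threshold) for row in image]
    # Smooth each column, then transpose back to row-major order.
    columns = [smooth_line(col, v_threshold) for col in zip(*image)]
    vertical = [list(row) for row in zip(*columns)]
    combined = [
        [h & v for h, v in zip(h_row, v_row)]
        for h_row, v_row in zip(horizontal, vertical)
    ]
    return horizontal, vertical, combined
```

The AND step keeps a pixel only when both directions agree, so blocks merge only where object pixels are close both horizontally and vertically.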

2.3 DAB detection

Using positional information of the text blocks, we detect the DAB in an envelope image. In Indian postal documents, the address is generally written so that the DAB lies in the lower-right part of the document. Using this writing convention, we segment the DAB from the postal document.
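One possible reading of this positional rule is sketched below. The text does not specify how "lower-right" is measured, so the criterion used here (the text block whose centre is nearest to the bottom-right corner of the image) is purely an assumption for illustration, and `locate_dab` is a hypothetical name.

```python
def locate_dab(blocks, image_width, image_height):
    """Pick the text block closest to the bottom-right corner.

    blocks: list of (left, top, right, bottom) bounding boxes of text blocks
    that remain after stamp and seal removal.
    """
    def squared_distance_to_corner(box):
        left, top, right, bottom = box
        cx, cy = (left + right) / 2, (top + bottom) / 2
        return (image_width - cx) ** 2 + (image_height - cy) ** 2

    return min(blocks, key=squared_distance_to_corner) if blocks else None
```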

Fig. 2: (a) Smoothed version of Fig. 1(d). (b) Detected DAB part.

2.4 Pin Code Box Detection

After detection of the pin code box, its vertical and horizontal lines are detected and deleted. If the pin code box contains more than seven vertical lines, the last six images inside the pin code box are considered as the pin code number. The extracted pin code box and the pin code numerals from a pin box are shown in Fig. 3.

3. Numeral Recognition

After extraction of the numerals from the pin code box, we proceed to their recognition.

3.1 Feature selection and detection

The recognition results of a system mainly depend on the features used in the system. To take care of the variability involved in handwriting, the features are chosen with the following considerations: (a) independence of the writing styles of different individuals, (b) simplicity of detection, (c) independence of size, and (d) independence of rotation. The features used in the technique are described below.

3.1.1 Water reservoir principle based features

The water reservoir principle is as follows. If water is poured from a side of a component, the cavity regions of the component where water will be stored are considered as reservoirs [7]. We now define some terms on water reservoirs that will be used in feature extraction.

Top (bottom) reservoir: By top (bottom) reservoirs of a component we mean the reservoirs obtained when water is poured from the top (bottom) of the component. A bottom reservoir of a component is visualized as a top reservoir when water is poured from the top after rotating the component by 180°.

Left (right) reservoir: If water is poured from the left (right) side of a component, the cavity regions of the component where water will be stored are considered as left (right) reservoirs. For illustrations, see Fig. 4. Here top, bottom, left and right reservoirs are shown in four different numerals.

Water reservoir area: By the area of a reservoir we mean the area of the cavity region where water can be stored if water is poured from a particular side of the component. The number of pixels inside a reservoir is computed, and this number is considered as the area of the reservoir.

Water flow direction: The direction in which water overflows from a reservoir is called the water flow direction of the reservoir (see Fig. 4).

Reservoir base-line: A line passing through the deepest point of a reservoir and parallel to the water flow level of the reservoir is called the reservoir base-line.

Height of a reservoir: By the height of a reservoir we mean the depth of water in the reservoir.

Width of a reservoir: By the width of a reservoir we mean the distance between the two boundaries of the reservoir.

Not all the reservoirs obtained in this way are considered for processing. For each of the four directions (top, bottom, left and right), only the reservoir with the greatest height is considered. Some of the numerals show different behaviours in reservoir based features, and these behaviours are used for their recognition. The reservoir based features used in the recognition are reservoir height, reservoir width, reservoir area, flow direction etc. The first three features are normalized by dividing them by the character height, width and area, respectively.

3.1.2 Loop (hole) feature

The loop (hole) feature mainly searches for loops in the character. By a loop we mean a white region enclosed by black pixels. The height and width of each loop are obtained. If either the height or the width is less than the stroke width (RL), the corresponding loop is ignored. The stroke width RL is the length of the most frequently occurring horizontal black run in a component. In other words, RL is the statistical mode of the horizontal black run lengths of the component. For a component, RL is calculated as follows. The component is scanned row-wise (horizontally). Suppose the component has n different horizontal run lengths r1, r2, ..., rn with frequencies f1, f2, ..., fn, respectively. Then RL = ri, where fi = max(fj), j = 1, 2, ..., n.

The loops are found by simply inverting the image (i.e. taking the negative image) and applying component analysis to it. The area of a loop is the number of white pixels within the loop. The heights and areas of the largest loop and the second largest loop are considered as features. The loop features are normalized by dividing the loop height and the loop area by the character height and the character area, respectively. If a numeral has only one loop, we take the height and area of the other loop as zero. If a numeral has no loop, we take the area and height of both loops as zero.
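The stroke width RL and the loop search of Section 3.1.2 can be sketched as follows. The sketch makes several assumptions that are not stated in the text: the numeral is a binary grid (1 = black) cropped to its bounding box, white regions are grouped with 4-connectivity, a white region counts as a loop only if it does not touch the bounding-box border, and "character area" is taken to be the bounding-box area. All function names are hypothetical.

```python
from collections import Counter, deque


def stroke_width(comp):
    """RL: the mode of the horizontal black run lengths of the component."""
    runs = Counter()
    for row in comp:
        length = 0
        for value in list(row) + [0]:  # trailing 0 closes a run at row end
            if value == 1:
                length += 1
            elif length:
                runs[length] += 1
                length = 0
    return runs.most_common(1)[0][0] if runs else 0


def find_loops(comp):
    """Return (height, width, area) for every enclosed white region."""
    rows, cols = len(comp), len(comp[0])
    seen = [[False] * cols for _ in range(rows)]
    loops = []
    for r0 in range(rows):
        for c0 in range(cols):
            if comp[r0][c0] == 1 or seen[r0][c0]:
                continue
            queue, cells, open_region = deque([(r0, c0)]), [], False
            seen[r0][c0] = True
            while queue:  # flood fill one white region
                r, c = queue.popleft()
                cells.append((r, c))
                if r in (0, rows - 1) or c in (0, cols - 1):
                    open_region = True  # reaches the frame: not a loop
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and comp[nr][nc] == 0 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not open_region:
                rs = [r for r, _ in cells]
                cs = [c for _, c in cells]
                loops.append((max(rs) - min(rs) + 1,
                              max(cs) - min(cs) + 1, len(cells)))
    return loops


def loop_features(comp):
    """Normalized height and area of the two largest loops (zeros if absent)."""
    rl = stroke_width(comp)
    kept = [lp for lp in find_loops(comp) if lp[0] >= rl and lp[1] >= rl]
    kept.sort(key=lambda lp: lp[2], reverse=True)
    kept = (kept + [(0, 0, 0), (0, 0, 0)])[:2]
    char_height = len(comp)
    char_area = len(comp) * len(comp[0])  # assumption: bounding-box area
    return [kept[0][0] / char_height, kept[0][2] / char_area,
            kept[1][0] / char_height, kept[1][2] / char_area]
```

For a numeral with a single loop this yields two non-zero values followed by two zeros, matching the zero padding described in the text.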

Fig. 4: Reservoirs obtained from the top, left, bottom and right sides of the components are shown.

3.1.3 Profile based feature

Suppose each numeral is located within a rectangular boundary like a frame. The horizontal or vertical distances from any one side of the frame to the numeral edge form a group of parallel lines, which we call the profile. Left and right profiles of two numerals are shown in Fig. 5. If we compute the left or right profile of the numerals, we can notice some distinct differences among the numerals. For example, some numerals have one transition while other numerals have two or more transitions. By a transition we mean a change of the profile from increasing mode to decreasing mode or vice versa. In the right profile of the numeral shown in Fig. 5, the profiles from A to B are in decreasing mode (they decrease or remain constant), and from B to C the profiles are in increasing mode. From C to D they are again in decreasing mode. Thus, the right profile of this numeral has three transition points. The profile based features used in the recognition are the number of transitions and the distance of the profiles from the sides of the numeral boundary. For a numeral, these features are extracted from all four directions (i.e. left, right, top and bottom).
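A minimal sketch of the profile and transition computation is given below. It assumes the numeral is a binary grid (1 = black) cropped to its frame, skips rows without black pixels, and treats flat steps as continuing the current mode, in line with "decreases or remains constant" above. It counts only changes of mode; how the end points of the profile are counted in the text is not stated, so the counts may differ from those quoted for Fig. 5. The function names are hypothetical.

```python
def left_profile(comp):
    """Distance from the left frame edge to the first black pixel of each row."""
    profile = []
    for row in comp:
        dist = next((c for c, v in enumerate(row) if v == 1), None)
        if dist is not None:  # rows without black pixels are skipped
            profile.append(dist)
    return profile


def right_profile(comp):
    """Same as left_profile, measured from the right frame edge."""
    return left_profile([list(reversed(row)) for row in comp])


def count_transitions(profile):
    """Number of changes between increasing and decreasing mode."""
    transitions, mode = 0, 0
    for prev, cur in zip(profile, profile[1:]):
        step = (cur > prev) - (cur < prev)  # +1 rising, -1 falling, 0 flat
        if step == 0:
            continue  # a flat step keeps the current mode
        if mode != 0 and step != mode:
            transitions += 1
        mode = step
    return transitions
```

Top and bottom profiles follow by applying the same functions to the transposed grid.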

Fig. 5: Left and right profile features are shown.

Fig. 6: Ring feature is shown.

3.1.4 Ring based feature

To obtain rotation invariant features we use ring features. Here, a numeral is divided into four concentric rings whose radii are in arithmetic progression. The number of black pixels (number of 1’s) is calculated for each ring, and these counts are used as the features. An important characteristic of this feature is that the counts within the rings do not change if the numeral is rotated by any angle. For example, see Fig. 6. We divide the count of each ring by the total number of black pixels of the numeral to get the normalized feature.

3.1.5 Moment based feature

Moments and functions of moments have been extensively employed as invariant global features of an image in pattern recognition and image classification. Generally, these features are invariant under image translation, scale change and rotation. Here, the image is divided into four quadrants by partitioning it along the X and Y axes. For each quadrant, second order geometric moments are calculated as follows.

where (r, s) take on the values (1,1), (1,2), (2,1), (2,2) and where (m, n) take the values (1,0), (0,1), (1,1), (2,0), (0,2), (2,1), (1,2), (2,2), (3,0). These values were chosen experimentally. Here, B is the number of black pixels in the entire image, W and L are the width and the height of the image, and f(i, j) is the value of the image at (i, j). The moment based features used in the recognition are the moments calculated by the above formula for the above values of r and s. Dividing them by the maximum moment over the four quadrants, we get normalized moments.

For the recognition we use 68 features. Of these, 16 features come from the reservoirs (four from each of the four sides), 4 features from the loops, 8 features from the profiles of the four sides of the image, 4 features from the rings, and 36 features from the moments (9 from each of the four quadrants).

3.2 Numeral Recognition

Based on the above features, we use a neural network based scheme [10] for recognition of both English and Bangla numerals. Since the pin code in Indian postal documents may be written either in English or in the local state language, we propose a single scheme that handles numerals written in English and Bangla, for the state of West Bengal. We now briefly discuss the different parts of the neural network used in the proposed scheme.

For pattern classification, the number of neurons in the input layer of a multilayer perceptron (MLP) is determined by the number of features selected for representing the relevant patterns in the feature space. The neurons in the input layer act as sensory units. Neurons in the hidden and output layers compute the sigmoidal function of the sum of the products of the input values and the weights of the corresponding connections to each neuron. The number of neurons in the output layer of the MLP is determined by the number of possible pattern classes in the problem of interest. The class assigned to the output neuron producing the highest output value determines the class of the input pattern supplied to the MLP.

The present work selects a 2-layer perceptron for handwritten digit recognition. The numbers of neurons in the input and output layers of the perceptron are set to 68 and 16, respectively, as the feature set contains 68 features and the number of possible classes of handwritten numerals is 16. Although there are 20 numerals in total (10 English and 10 Bangla), we use only 16 classes in the output layer of the MLP. This is because the English “zero” and the Bangla “sunya (zero)” are similar in shape, and we consider these two as a single class. The other three pairs of similar shapes are: English “two” and Bangla “dui (two)”; English “four” and Bangla “aath (eight)”; and English “nine” and Bangla “sat (seven)”. Each pair is treated as one class at the time of classification. Since a pin code in Indian postal documents consists of 6 numerals, it is rare that all six numerals belong to these four classes. So, depending on the other numerals classified by our recogniser, we decide the language in which the pin code is written. The number of hidden units, the back propagation learning rate and the acceleration factor are set to 23, 0.45 and 0.7, respectively, based on our past experience.

The reject capability is necessary to allow the human operator to decide about the class of an input pattern rejected by the perceptron. An input pattern is rejected if the highest activation level of the output neurons lies below 0.5, or if the difference between the first and the second highest activation levels of the output neurons is less than 0.1. The reject criterion introduced in the 2-layer perceptron helps us to improve the confidence in how well the data is classified.
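The decision and reject rule of the perceptron can be sketched as follows. Only the two thresholds (0.5 and 0.1) come from the text; the function name, the parameter names and the use of `None` to signal a rejected pattern are assumptions made for illustration.

```python
def classify_with_reject(activations, min_top=0.5, min_margin=0.1):
    """Return the index of the winning output neuron, or None if rejected.

    activations: output-layer values, one per class (16 in this work).
    """
    ranked = sorted(range(len(activations)),
                    key=lambda k: activations[k], reverse=True)
    best, second = ranked[0], ranked[1]
    too_weak = activations[best] < min_top
    too_close = activations[best] - activations[second] < min_margin
    if too_weak or too_close:
        return None  # left to the human operator
    return best
```

For example, outputs of 0.8 and 0.75 for the two strongest classes are rejected because their margin is below 0.1, even though the winner exceeds 0.5.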

References:

1. R. Plamondon and S. N. Srihari, “On-line and off-line handwritten recognition: A comprehensive survey”, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, pp. 62-84, 2000.
2. U. Mahadevan and S. N. Srihari, “Parsing and Recognition of City, State, and ZIP Codes in Handwritten Addresses”, Proceedings Fifth International Conference on Document Analysis and Recognition (ICDAR99), pp. 325-328, 1999.
3. X. Wang and T. Tsutsumida, “A New Method of Character Line Extraction from Mixed-unformatted Document Image for Japanese Mail Address Recognition”, Proceedings Fifth International Conference on Document Analysis and Recognition (ICDAR99), Bangalore, India, September 1999, pp. 769-772.
4. D. Bartnik, V. Govindaraju, S. N. Srihari and B. Phan, “Reply Card Mail Processing”, Proceedings International Conference on Pattern Recognition, Brisbane, Australia, August 1998, pp. 633-636.
5. G. Kim and V. Govindaraju, “Handwritten Phrase Recognition as Applied to Street Name Images”, Pattern Recognition, 31(1), 1998, pp. 41-51.
6. S. N. Srihari and E. J. Keubert, “Integration of Hand-Written Address Interpretation Technology into the United States Postal Service Remote Computer Reader System”, 4th International Conference on Document Analysis and Recognition (ICDAR'97), pp. 892-896, 1997.
7. U. Pal, A. Belaid and Ch. Choisy, “Water Reservoir Based Approach for Touching Numeral Segmentation”, In Proc. Sixth Int. Conf. on Document Analysis and Recognition, pp. 892-896, 2001.
8. B. B. Chaudhuri and U. Pal, “A complete printed Bangla OCR system”, Pattern Recognition, vol. 31, pp. 531-549, 1998.
9. F. M. Wahl, K. Y. Wong and R. G. Casey, “Block segmentation and text extraction in mixed text/image documents”, Computer Graphics and Image Processing, vol. 20, pp. 375-390, 1982.
10. William E. Weidemen, Michael T. Manry, Hung-Chun Yau and Wei Gong, “Comparisons of a neural network and a nearest-neighbor classifier via the numeric handprint recognition problem”, IEEE Trans. Neural Networks, vol. 6, no. 6, Nov. 1995, pp. 1524-1530.

4. Experimental results

4.1 Result on stamp detection, DAB location and pin code extraction

The performance of the proposed system on postal stamp and seal detection, DAB location and pin code extraction is as follows. We tested our system on 460 postal images, and the accuracies of the postal stamp and seal detection, DAB location, and pin code extraction modules are 95.98%, 98.55% and 97.64%, respectively. From the experiment, we noticed that some errors in postal stamp/seal detection and DAB location were due to overlapping of the postal stamp/seal with the text portion of the address part. Some errors were also due to the poor quality of the images.

4.2 Result on numeral recognition

For the experiment on the proposed numeral recognition approach, 7720 numerals were collected. By randomly selecting 160 numerals of each digit class of both English and Bangla, a set of 3200 numerals was prepared for training the proposed recognition system. A test set of 4520 numerals was prepared from the rest of the database. The overall accuracy of the proposed system obtained from the experiment on the above data set is given in Table 1.

Table 1: Overall recognition accuracy on the training and the test set of data.

Data set    Recognition rate    Mis-recognition rate    Rejection rate
Training    98.83%              0.00%                   1.17%
Test        87.24%              4.94%                   7.82%

From the experiment we noted that the most confusing numeral pair was Bangla three and six (shown in Fig. 7(a)), which were confused in about 9.3% of cases. Their similar shapes put their confusion rate at the top position. The second most confusing pair is English one and two (see Fig. 7(b)), with a confusion rate of 4.6%.

Fig. 7: Examples of some confusing handwritten numeral pairs. (a) Bangla three and six. (b) English one and two.

The main advantage of the proposed method is that it is independent of size and style. Many existing systems normalize the size of the numeral before extracting features from it. We do not use any normalization of the image size here.