
Thresholding of Badly Illuminated Document Images through Photometric Correction

Shijian Lu and Chew Lim Tan
Department of Computer Science, School of Computing
National University of Singapore, Kent Ridge, 117543, Singapore

ABSTRACT

This paper presents a document image thresholding technique that binarizes badly illuminated document images through photometric correction. Based on the observation that illumination normally varies smoothly and that document images often contain a uniformly colored background, the global shading variation is estimated by using a two-dimensional Savitzky-Golay filter that fits a least squares polynomial surface to the luminance of a badly illuminated document image. With the knowledge of the global shading variation, shading degradation is then corrected through a compensation process that produces an image with roughly uniform illumination. Badly illuminated document images are accordingly binarized through the global thresholding of the compensated ones. Experiments show that the proposed thresholding technique is fast, robust, and efficient for the binarization of badly illuminated document images.

Categories and Subject Descriptors

I.4.6 [Image Processing and Computer Vision]: Segmentation—pixel classification; I.7.5 [Document and Text Processing]: Document Capture—document analysis, optical character recognition; I.4.8 [Image Processing and Computer Vision]: Scene Analysis—photometry, surface fitting

General Terms

Algorithms, Design, Experimentation

Keywords

Document image analysis, document image thresholding, badly illuminated document images

1. INTRODUCTION

Document image thresholding aims to divide a document image into two classes, namely, the foreground text and the blank background.


Figure 1: Four badly illuminated sample documents, including two text documents in (a) and (d), a map document in (b), and an engineering drawing in (c) (the white line in the document in Figure 1(a) is manually inserted for the subsequent discussions).

Though it is often implemented in the preprocessing stage, its performance may be crucial to the success of ensuing document management tasks such as OCR (optical character recognition), document image summarization, and document image retrieval. With the proliferation of digital libraries, an increasing number of document images with different characteristics are being produced and archived. The fast and efficient thresholding of these archived document images is very important for the document digitization process. On the other hand, digitized document images often suffer from different types of shading degradation, which is typically introduced in two ways. The first is through document reproduction, such as the document scanning and photocopying processes. For example, the spine region of the scanned document image shown in Figure 1(a) is often degraded by shading because thick bound volumes cannot be flattened over the scanner glass.

The second is caused by the lighting variation within a natural environment. For example, while capturing the map and the engineering drawing shown in Figures 1(b) and 1(c) with a digital camera, shading cannot be avoided in most cases due to the uncontrolled lighting variation. For badly illuminated document images, global thresholding does not work well and an adaptive thresholding technique is normally required. A large number of adaptive thresholding techniques have been reported in the literature [1, 6], and most of them are window-based, calculating a local threshold for each image pixel by using the intensity of the pixels within a small neighborhood window. As a typical representative, Niblack [4] proposes to estimate the local threshold from the local mean m and the standard deviation s, computed within a small neighborhood window of each image pixel, as follows:

T = m + k · s    (1)

where k is a user-defined parameter that normally lies between -1 and 0. Though Niblack's method is able to threshold badly illuminated document images, a few limitations exist. Firstly, it is very slow because it requires the calculation of the intensity mean and variance for each image pixel. Secondly, its performance is closely related to the parameter k in Equation (1): different choices of k produce different binarization results. Thirdly, its performance is closely related to the size of the neighborhood window as well [8]. If the window is too small, the thresholding may produce a large amount of background noise; if it is too large, the computation cost increases dramatically. To suppress the background noise, Sauvola and Pietikainen [3] later modify Niblack's formula in Equation (1) and propose a new formula as follows:

T = m · (1 + k · (s/R − 1))    (2)

where the parameter R refers to the dynamic range of the standard deviation and k is a positive number between 0 and 1. The new thresholding formula in Equation (2) reduces the background noise significantly. However, it requires knowledge of the document contrast to set the parameter R properly. For document images of low contrast such as the one in Figure 1(d), the value of R (128) recommended in [3] may erroneously classify all image pixels to the background. Similarly, the adaptive thresholding technique proposed by Gatos et al. [9] is sensitive to variation in the document contrast as well, because it uses Sauvola's method for the initial foreground estimation.
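For reference, the window-based rules of Equations (1) and (2) can be sketched in a few lines of NumPy; the helper names, the integral-image formulation, and the default parameter values below are illustrative choices of this sketch, not taken from [3, 4].

import numpy as np

def local_mean_std(gray, w):
    # Mean and standard deviation of each pixel's (2w+1) x (2w+1) neighborhood,
    # computed with summed-area tables so the cost is independent of w.
    g = gray.astype(np.float64)
    pad = np.pad(g, w, mode='reflect')
    ii = np.pad(pad, ((1, 0), (1, 0))).cumsum(0).cumsum(1)          # integral image
    ii2 = np.pad(pad * pad, ((1, 0), (1, 0))).cumsum(0).cumsum(1)   # of squared values
    h, wd = gray.shape
    n = (2 * w + 1) ** 2
    def box(I):   # window sums via four corner lookups
        return I[2*w+1:, 2*w+1:] - I[:h, 2*w+1:] - I[2*w+1:, :wd] + I[:h, :wd]
    mean = box(ii) / n
    var = np.maximum(box(ii2) / n - mean * mean, 0.0)
    return mean, np.sqrt(var)

def niblack(gray, w=10, k=-0.2):
    # Equation (1): T = m + k * s, evaluated per pixel
    m, s = local_mean_std(gray, w)
    return (gray > m + k * s).astype(np.uint8) * 255

def sauvola(gray, w=10, k=0.2, R=128.0):
    # Equation (2): T = m * (1 + k * (s / R - 1)), evaluated per pixel
    m, s = local_mean_std(gray, w)
    return (gray > m * (1 + k * (s / R - 1))).astype(np.uint8) * 255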

Figure 2: (a) The first-round polynomial surface Pf fitted by using the intensity of all pixels of the document in Figure 1(a); (b) the intensity of the document pixels and the shape of Pf along the white scan line highlighted in Figure 1(a).

In this paper, we propose a fast and robust document image thresholding technique that is capable of segmenting text from badly illuminated document images. The proposed technique is based on two observations about most typical document images. Firstly, compared with most natural scene images, documents including text documents, maps, and engineering drawings normally contain a large proportion of blank background. Besides, the background of most typical documents is of the same color and, in most cases, brighter than the foreground text. Therefore, the luminance variation within the background of badly illuminated document images can be assumed to be caused solely by the illumination variation. Secondly, the environmental illumination normally varies smoothly in a real scene. At the same time, text documents such as papers lying on tables and pages bound in thick volumes normally lie on a planar or smoothly curved surface. As a result, shading can be assumed to vary smoothly along the surface of badly illuminated document images.

Based on the two observations, we propose to threshold badly illuminated documents through the estimation and compensation of the shading variation. More specifically, we estimate the shading variation by using a two-dimensional Savitzky-Golay filter, which fits a least squares polynomial surface to the intensity of the pixels of badly illuminated document images. In [10], the Savitzky-Golay filter has been utilized for the thresholding of fingerprints, where two local windows, a background window and a focus window, are used for the estimation of local thresholds. We instead globalize the Savitzky-Golay filter and binarize badly illuminated document images through two rounds of surface fitting. In particular, the first round detects the background roughly and the second then estimates the global shading variation by using the detected background pixels. With the estimated shading variation, shading degradation is then corrected through a compensation process, which results in a roughly uniformly illuminated document image that can be easily binarized by a global thresholding technique such as Otsu's [2].

Compared with the reported adaptive thresholding techniques, the proposed technique has a few advantages. Firstly, it is much faster than most reported adaptive techniques, which require the estimation of a local threshold for each image pixel. Secondly, it is much more robust because it is not window-based and so is not sensitive to the size of the neighborhood window. At the same time, it is tolerant to variation in the document contrast because the shading-compensated document images still exhibit a bimodal histogram, though the two histogram peaks may be close to each other. Thirdly, it is more efficient: it requires no tuning of parameters such as the window size and the parameter k in Equations (1) and (2). At the same time, it produces little background noise and so does not require complicated post-processing [11].

2. PROPOSED METHOD

This section describes the proposed document image thresholding technique. In particular, we divide this section into four subsections, which deal with the two-dimensional Savitzky-Golay filter, the estimation of the global shading variation, the compensation of the global shading variation, and the implementations, respectively.

2.1 Two-Dimensional Savitzky-Golay Filter

Figure 3: (a) The second-round polynomial surface Ps fitted by using the background pixels detected with the first-round polynomial surface Pf; (b) background pixels and the shape of Ps along the white scan line highlighted in the document in Figure 1(a).

The Savitzky-Golay filter [5], also known as the least squares filter, is originally one-dimensional and designed to estimate the variation of signals degraded by various types of noise. The fundamental idea is to fit a least squares polynomial to the data surrounding each data point. The smoothed data are determined as the values of the fitted polynomial at the studied data points. Similar to the one-dimensional Savitzky-Golay filter, the two-dimensional Savitzky-Golay filter fits a least squares polynomial surface to two-dimensional data points such as images and can be used for data smoothing and noise removal as well. For example, a least squares polynomial surface of degree d can be represented as follows:

f(x, y) = Σ_{i+j=0}^{d} a_{i,j} · x^i · y^j    (3)

where a_{i,j}, i + j = 0, ..., d, gives the coefficients of the polynomial surface, which can be estimated as follows:

A = (X^T · X)^{−1} · X^T · I    (4)

where I refers to the intensity of the image pixels and the matrix X is constructed as follows:

X =
    ⎡ 1   x_0   y_0   x_0^2   x_0·y_0   ···   y_0^3 ⎤
    ⎢ 1   x_1   y_1   x_1^2   x_1·y_1   ···   y_1^3 ⎥
    ⎢ ⋮    ⋮     ⋮      ⋮        ⋮       ···    ⋮   ⎥
    ⎣ 1   x_n   y_n   x_n^2   x_n·y_n   ···   y_n^3 ⎦

where n refers to the number of image pixels within the studied document image. The original Savitzky-Golay filter is a local operator, which means that for each data point, it fits a smoothing polynomial using the data within a small neighborhood window. The smoothed data are then estimated as the value of the fitted polynomial at the studied data point. In this paper, we globalize the local Savitzky-Golay filter and fit the polynomial surface by using the intensity of all pixels within badly illuminated document images. The goal is to estimate the global shading variation by using the fitted polynomial surface, which will then be used for photometric correction as described in the following sections.
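As an illustrative sketch of the globalized fit of Equations (3) and (4) in Python, assuming a grayscale NumPy array as input; the function names and the coordinate normalization are choices of this sketch, and the optional grid subsampling echoes the speed-up mentioned in Section 3:

import numpy as np

def poly_terms(x, y, d=3):
    # Columns of the design matrix X: all monomials x**i * y**j with i + j <= d
    return np.stack([x**i * y**j
                     for i in range(d + 1)
                     for j in range(d + 1 - i)], axis=-1)

def fit_poly_surface(gray, d=3, step=4):
    # Fit the least squares polynomial surface of Equation (3) to pixel intensities;
    # `step` subsamples the pixel grid to keep the least squares problem small.
    h, w = gray.shape
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    x, y = xs.ravel() / w, ys.ravel() / h          # coordinates normalized to [0, 1]
    I = gray[ys, xs].ravel().astype(np.float64)    # sampled intensities
    X = poly_terms(x, y, d)
    A, *_ = np.linalg.lstsq(X, I, rcond=None)      # Equation (4), solved in least squares form
    yf, xf = np.mgrid[0:h, 0:w]                    # evaluate the surface on the full grid
    return (poly_terms(xf.ravel() / w, yf.ravel() / h, d) @ A).reshape(h, w)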

2.2 Global Shading Variation Estimation

Based on the two observations described in Section 1, we treat the foreground text as noise and estimate the shading variation through two rounds of the Savitzky-Golay filtering process. In particular, the first round aims to detect the blank background roughly, and the second further estimates the global shading variation by fitting the polynomial surface to the intensity of the background pixels detected in the first-round fitting. For the document image in Figure 1(a), Figure 2(a) shows the first-round polynomial surface Pf of order three that is fitted by using the intensity of all image pixels. The two curves in Figure 2(b) show the intensity of the document pixels and the shape of Pf along the white scan line in the document image in Figure 1(a).

Figure 4: (a) The compensated pixel intensity along the white scan line in the document image in Figure 1(a), which is calculated by Equation (6); (b) the compensated document image after the transformation in Equation (7).

As Figure 2(b) shows, the polynomial surface Pf is normally lower than the background intensity because the foreground text (normally in dark color) pulls it down. However, the surface shape still traces the variation of the pixel intensity nicely. The background can therefore be roughly detected based on the first-round polynomial surface Pf by using the threshold estimated as follows:

T = (1/N) · Σ_{i=1}^{W×H} [Pf(x_i, y_i) − G(x_i, y_i)],   ∀ G(x_i, y_i) < Pf(x_i, y_i)    (5)

where W and H refer to the width and height of the document image under study, G(x_i, y_i) and Pf(x_i, y_i) denote the intensity of the studied document image and the value of the first-round polynomial surface Pf at (x_i, y_i), respectively, and N refers to the number of image pixels that satisfy the condition G(x_i, y_i) < Pf(x_i, y_i). Therefore, if the pixel intensity G(x, y) is not smaller than Pf(x, y) by T, the corresponding pixel is classified to the background.

The second round of smoothing further estimates the global shading variation by fitting a polynomial surface Ps to the background pixels detected in the first round. Figure 3(a) shows the fitted second-round polynomial surface Ps of order three. For the white scan line highlighted in Figure 1(a), the two curves in Figure 3(b) show the intensity of the document pixels and the shape of Ps along that scan line.


Figure 5: (a) A document patch embedded with a graphic component that contains a large solid dark block; (b) the intensity of the document pixels as well as the estimated Pf along the white scan line highlighted in the document in Figure 5(a).

As Figure 3(b) shows, the second-round polynomial surface Ps traces the global shading variation much more accurately. Shading degradation can accordingly be compensated by using the estimated global shading variation, as described in the next subsection.
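A minimal sketch of this two-round process, assuming the least squares surface fit of Section 2.1; the helper names are ours, T follows Equation (5), and the background rule is the one stated above (a pixel is background unless it lies below Pf by more than T):

import numpy as np

def fit_surface(gray, mask=None, d=3):
    # Least squares polynomial surface (Equation (3)) fitted to the pixels selected
    # by `mask` (all pixels if mask is None); coordinates are normalized to [0, 1].
    h, w = gray.shape
    if mask is None:
        mask = np.ones_like(gray, dtype=bool)
    ys, xs = np.nonzero(mask)
    terms = lambda xx, yy: np.stack([xx**i * yy**j
                                     for i in range(d + 1)
                                     for j in range(d + 1 - i)], axis=-1)
    A, *_ = np.linalg.lstsq(terms(xs / w, ys / h),
                            gray[ys, xs].astype(np.float64), rcond=None)
    yf, xf = np.mgrid[0:h, 0:w]
    return (terms(xf.ravel() / w, yf.ravel() / h) @ A).reshape(h, w)

def estimate_shading(gray, d=3):
    # Round one: fit Pf to all pixels (text acts as noise) and detect the background.
    # Round two: refit the surface Ps to the detected background pixels only.
    g = gray.astype(np.float64)
    Pf = fit_surface(g, d=d)
    below = g < Pf
    T = np.mean(Pf[below] - g[below])              # Equation (5)
    background = g >= Pf - T                       # not darker than Pf by more than T
    Ps = fit_surface(g, mask=background, d=d)
    return Ps, background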

2.3 Global Shading Variation Compensation

Based on the global shading variation estimated in the last subsection, shading degradation can be corrected through a compensation process as follows:

N(x, y) = (G(x, y) − Ps(x, y)) / Ps(x, y)    (6)

where G(x, y) denotes the intensity of the studied document image and Ps(x, y) the value of the second-round polynomial surface at (x, y). In some rare cases, the pixel intensity G(x, y) may be larger than 2Ps(x, y) or even lower than zero, which makes N(x, y) exceed 1 or fall below -1. Under such circumstances, we restrict N(x, y) to 1 and -1, respectively. Therefore, the normalized pixel intensity N(x, y) in Equation (6) lies between -1 and 1. For the horizontal scan line in the document image in Figure 1(a), Figure 4(a) shows the pixel intensity normalized by using Equation (6). As Figure 4(a) shows, the shading degradation has been roughly corrected compared with the original pixel intensity in Figure 2(b). The compensated image can therefore be created by a transformation specified as follows:

G(x, y) = K · (1 + N(x, y))    (7)

where N(x, y) refers to the pixel intensity normalized by using Equation (6). The scale factor K is set at 127.5, and the transformation therefore converts the pixel intensity from [-1, 1] to [0, 255] accordingly. The luminance of the background within the compensated document images depends on the parameter K in Equation (7): if we set K at 127.5, the intensity of background pixels will lie around 127.5 and the intensity of the foreground text will lie close to zero.

Figure 6: Thresholding results of the badly illuminated document image in Figure 1(b) by using Otsu's method in (a), Niblack's method in (b), Sauvola's method in (c), and the proposed document thresholding technique in (d).

For the badly illuminated image in Figure 1(a), Figure 4(b) shows the image that is compensated by using Equations (6)-(7). As we can see, the shading degradation has been properly corrected. The badly illuminated image in Figure 1(a) can finally be binarized through the global thresholding of the compensated document image in Figure 4(b).
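A sketch of the compensation and the final global thresholding, assuming the normalization N(x, y) = (G(x, y) − Ps(x, y)) / Ps(x, y) clipped to [-1, 1] as in Equation (6), the rescaling of Equation (7) with K = 127.5, and a plain histogram-based Otsu threshold [2]; the helper names are ours:

import numpy as np

def compensate(gray, Ps, K=127.5):
    # Equations (6)-(7): normalize the intensity against the fitted surface Ps and
    # rescale to [0, 255] so that the background settles around K.
    N = (gray.astype(np.float64) - Ps) / np.maximum(Ps, 1e-6)   # guard against Ps ~ 0
    N = np.clip(N, -1.0, 1.0)                                   # restrict N(x, y) to [-1, 1]
    return K * (1.0 + N)

def otsu_threshold(img):
    # Otsu's global threshold [2]: maximize the between-class variance over 256 bins.
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist.astype(np.float64) / hist.sum()
    bins = np.arange(256)
    omega = np.cumsum(p)                  # probability of the dark class
    mu = np.cumsum(p * bins)              # cumulative first moment
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu[-1] * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(sigma_b))

def binarize(gray, Ps):
    comp = compensate(gray, Ps)
    return (comp > otsu_threshold(comp)).astype(np.uint8) * 255   # white background, black text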

2.4 Implementations

Two issues need to be considered for the implementation of the proposed document thresholding technique. The first is related to the order of the polynomial surface. As described in Section 2.2, we use a least squares polynomial surface of order three for the global shading variation estimation. Experiments with 51 badly illuminated images (described in the subsequent experiment section) show that a polynomial surface of order three can trace the global shading variation properly in most cases. For document images with a more complex shading variation, a polynomial surface of a higher order or even a B-spline surface may be required for the global shading variation estimation.

The second issue is related to the number of rounds of the background detection process. For most typical documents that contain a large proportion of uniformly colored background, such as the four document images in Figure 1, the background can be detected roughly with the first-round polynomial surface Pf. But for documents that contain large solid dark blocks, such as the embedded graphics shown in Figure 5(a), the background cannot be detected properly by Pf, as the dark blocks may pull Pf close to, and even lower than, the pixel intensity in the block interior, as illustrated in Figure 5(b). To threshold such embedded graphics properly, iterative fitting of the polynomial surface may be required to detect the background properly. After each round of surface fitting, the background pixels are detected and updated according to the threshold T estimated from Pf in Equation (5). The iterative detection process terminates automatically when there exist no image pixels whose intensity is lower than the just-fitted polynomial surface by T. For most documents, the iterative detection process terminates after three to four rounds of fitting.
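For such cases, the iterative detection can be sketched as a loop around the surface fit (for instance, the fit_surface helper sketched in Section 2.2); the stopping rule follows the description above, and max_rounds is a safety cap added for this sketch:

import numpy as np

def detect_background(gray, fit_surface, d=3, max_rounds=10):
    # Iteratively refit the polynomial surface to the current background estimate and
    # shrink it until no pixel lies below the just-fitted surface by more than T.
    g = gray.astype(np.float64)
    background = np.ones_like(g, dtype=bool)
    for _ in range(max_rounds):
        P = fit_surface(g, background, d)
        below = background & (g < P)
        if not below.any():
            break
        T = np.mean(P[below] - g[below])           # Equation (5) over the current set
        drop = background & (g < P - T)            # pixels darker than the surface by T
        if not drop.any():
            break                                  # termination criterion of Section 2.4
        background &= ~drop
    return background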

Table 1: Execution time and the character segmentation rate of the proposed document thresholding technique and the other three comparison methods.

                    Execution time (s)    Segmentation rate
Otsu's method       0.21                  62.29%
Niblack's method    14.72                 88.37%
Sauvola's method    14.77                 83.46%
Our method          0.89                  94.23%

Figure 7: Thresholding results of the badly illuminated document image in Figure 1(c) by using Otsu's method in (a), Niblack's method in (b), Sauvola's method in (c), and the proposed document thresholding technique in (d).

3. EXPERIMENTS AND DISCUSSIONS

The proposed method has been evaluated by using 51 badly illuminated document images, including 32 text images, 11 map images, and 8 engineering drawings. As a reference, we compare the proposed document image thresholding technique with Otsu's global thresholding method and with Niblack's and Sauvola's adaptive thresholding techniques, the last of which greatly outperforms others in terms of thresholding speed and efficiency as evaluated in [6].

The proposed document thresholding technique is fast. As Table 1 shows, it takes just 0.89 second on average for the binarization of the 51 test documents (all are binarized with two rounds of fitting). Though it is slower than Otsu's global technique, it is much faster than Niblack's and Sauvola's adaptive techniques, which take 14.72 and 14.77 seconds on average. Besides, Niblack's method may require more time if the post-processing is taken into consideration. The speed advantage of the proposed technique can be explained by the fact that it binarizes badly illuminated documents through photometric correction instead of estimating a local threshold for each document pixel. Besides, the speed of the proposed technique can be further improved by fitting the polynomial surface to regularly sampled pixels instead of all document pixels.

Figures 6-8 show the thresholding results of the three document images in Figures 1(b)-(d), which are picked from the 51 test document images described above. As Figures 6-8(a) show, Otsu's global thresholding technique cannot binarize badly illuminated document images properly because it performs the document binarization with a single global threshold. In most cases, it erroneously classifies heavily shaded background as foreground; as a result, the foreground text in those heavily shaded regions cannot be segmented from the background properly. Our experiments over the 32 text documents show that the character segmentation rate of Otsu's method reaches just 62.29%.

Figures 6-8(b)-(c) show the thresholding results of Niblack's and Sauvola's adaptive thresholding methods. In particular, the parameter k in Equations (1)-(2) is set at -0.2 and 0.2, respectively, the window size of both methods is set at 20, and the parameter R in Equation (2) is set at 128 as recommended in [3]. As Figures 6-8(b) show, besides its low speed, Niblack's method produces a large amount of noise in the background area. As an improvement, Sauvola's method suppresses the background noise greatly, as illustrated by the documents in Figures 6-7(c). However, for document images of lower contrast such as the one in Figure 1(d), Sauvola's method incorrectly classifies all foreground text pixels to the background, as shown in Figure 8(c). Our experiments over the 32 text documents show that the character segmentation rates of the two methods reach 88.62% and 79.81%, respectively.

The proposed document thresholding technique clearly outperforms the three comparison methods, as illustrated by the three documents in Figures 6-8(d). As Figures 6-8(d) show, the proposed technique produces little background noise and at the same time is tolerant to variation in the document contrast. The tolerance to contrast variation can be explained by the fact that the compensated document images normally produce a clear bimodal histogram, though the two histogram peaks may be apart by different distances, depending on the magnitude of the document contrast. For the 32 text documents tested, the character segmentation rate of the proposed technique reaches 94.17% on average.

Besides, the proposed technique can also find applications in some other tasks such as edge detection. For badly illuminated document images, the direct application of an edge detector such as Canny's [7] cannot locate edges properly within the heavily shaded document regions, as illustrated in Figure 9(a). However, edges can be located much better from the document images compensated by using our proposed technique, as illustrated in Figure 9(b).
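As a usage note, the comparison of Figure 9 can be reproduced with any off-the-shelf Canny implementation run on the original and on the compensated image; the file names and hysteresis thresholds below are placeholders:

import cv2

# 'original.png' stands for a badly illuminated input and 'compensated.png' for the
# image produced by the compensation of Section 2.3; both are hypothetical file names.
original = cv2.imread('original.png', cv2.IMREAD_GRAYSCALE)
compensated = cv2.imread('compensated.png', cv2.IMREAD_GRAYSCALE)

edges_shaded = cv2.Canny(original, 50, 150)        # edges lost in shaded regions, as in Figure 9(a)
edges_corrected = cv2.Canny(compensated, 50, 150)  # edges recovered after compensation, as in Figure 9(b)
cv2.imwrite('edges_original.png', edges_shaded)
cv2.imwrite('edges_compensated.png', edges_corrected)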

Figure 8: Thresholding results of the badly illuminated document image in Figure 1(d) by using Otsu's method in (a), Niblack's method in (b), Sauvola's method in (c), and the proposed document thresholding technique in (d).

Figure 9: (a) Canny's edge detector over the document in Figure 1(a); (b) Canny's edge detector on the document in Figure 4(b).

4. CONCLUSION

This paper reports a document thresholding technique that binarizes badly illuminated document images through photometric correction. A two-dimensional Savitzky-Golay filter is used, which estimates the global shading variation by fitting a least squares polynomial surface to the intensity of document pixels. Shading compensation is then implemented, which produces a roughly uniformly illuminated document image that can be binarized by a global thresholding technique. Experiments show the superior performance of the proposed technique.

Though the proposed document thresholding technique outperforms others greatly, its application may be limited by the two underlying assumptions, namely, that the illumination and the resulting shading vary smoothly along the document surface and that images contain a certain amount of uniformly colored background. Fortunately, both assumptions are satisfied by most typical documents, including text documents, maps, and engineering drawings. We will further study the thresholding of documents with local abrupt shading variations or complex backgrounds in our future work.

5. ACKNOWLEDGMENTS

This research is supported by the Agency for Science, Technology and Research (A*STAR), Singapore, under grant no. 0421010085.

6. REFERENCES

[1] M. Sezgin and B. Sankur, Survey over image thresholding techniques and quantitative performance evaluation, Journal of Electronic Imaging, vol. 13, no. 1, pp. 146-165, 2004.
[2] N. Otsu, A Threshold Selection Method from Gray-Level Histograms, IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
[3] J. Sauvola and M. Pietikainen, Adaptive document image binarization, Pattern Recognition, vol. 33, no. 2, pp. 225-236, 2000.
[4] W. Niblack, An Introduction to Digital Image Processing, Prentice-Hall, Englewood Cliffs, New Jersey, 1986.
[5] A. Savitzky and M. J. E. Golay, Smoothing and Differentiation of Data by Simplified Least Squares Procedures, Analytical Chemistry, vol. 36, pp. 1627-1639, 1964.
[6] O. D. Trier and T. Taxt, Evaluation of Binarization Methods for Document Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 3, pp. 312-315, 1995.
[7] J. Canny, A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, 1986.
[8] I. J. Kim, Multi-Window Binarization of Camera Image for Document Recognition, Ninth International Workshop on Frontiers in Handwriting Recognition, pp. 323-327, 2004.
[9] B. Gatos, I. Pratikakis, and S. J. Perantonis, Adaptive degraded document image binarization, Pattern Recognition, vol. 39, no. 3, pp. 317-327, 2006.
[10] M. Krzysztof, M. Preda, and M. Axel, Dynamic threshold using polynomial surface regression with application to the binarisation of fingerprints, Proceedings of SPIE, vol. 5779, pp. 94-104, 2005.
[11] S. D. Yanowitz and A. M. Bruckstein, A new method for image segmentation, Computer Vision, Graphics and Image Processing, vol. 46, no. 1, pp. 82-95, 1989.