A Scanner-Model-Based Approach to Bitmap Clustering and Document Image Improvement

Marshall Bern and David Goldberg
Xerox Palo Alto Research Center, 3333 Coyote Hill Rd., Palo Alto, CA 94304
[email protected], [email protected]

November 19, 1999

Abstract

We describe a method for improving scanned or faxed document images. As in the previous work of Hobby and Ho, our approach clusters together instances of the same symbol from all over the page and then computes a super-resolved representative for each cluster. Unlike the previous work, our method assumes a probabilistic model of the scanning process, and uses this model in both the clustering and super-resolution steps. This model-based approach also allows Bayesian prior distributions and principled reversal of scanner distortions such as gain.

1 Introduction

We consider the problem of improving the appearance of scanned or faxed documents. This problem is a special case of image restoration in which the degraded image is bi-level and the original is known to be a text document. We envision several applications for our document improvement method.


- Although faxes are usually readable, they are often difficult or unpleasant to read. Our method can make noticeable improvements to the appearance of a fax.
- Our method enables further processing of documents such as deskewing or optical character recognition (OCR). Current OCR has a high error rate on fax images, and hence a number of researchers have worked on fax image improvement for this purpose [1, 5, 12, 18].
- Finally, our method may also be useful for archival documents. If originals are no longer available, our method could improve the appearance of the existing scans. Even if the originals are available, it may be more cost-effective to perform high-speed, lower-quality scans and subsequently improve the image quality in software.

Our document improvement method appears to be the most effective such method to date; moreover, some of its constituent algorithms could be useful in tokenizing compressors such as DigiPaper [15] or DjVu [11]. We shall point out these byproducts as they arise. Our method clusters connected components of black pixels (roughly letters) from across the page, and computes a "most likely" representative for each cluster, with likelihood determined by a probabilistic model of the scanning process. The representatives are bitmaps themselves, but at higher resolution. An a priori (prior) probability distribution on bitmaps can influence the choice of the most likely representative; we describe a prior distribution based on so-called chain codes. Our method then uses the representatives to recluster connected components, and finally assembles the output page by replacing each member of a cluster by the cluster's representative.

2 Related Work

Hobby and Ho [8] at Lucent Technologies previously used the same basic strategy: cluster bitmaps, compute representatives, recluster, and then assemble the output page. In detail, however, our method differs at each step.

1. For the initial clustering, Hobby and Ho use a feature-based approach. We use DigiPaper's "Hausdorff matching" algorithm [13, 14], which is apparently more reliable.

2. To compute cluster representatives, Hobby and Ho use a method due to Hobby and Baird [7]: align the scans by centroids of black pixels, sum the scans to give a histogram, smooth the histogram to give a gray-level representative, and finally determine a polygonal outline that stays within a certain gray "tube" yet has a minimum number of inflection points [6]. We use a hill-climbing optimization procedure to approximate the most probable double-resolution representative. The method of Hobby et al. has certain advantages: it is faster, outlines can be rasterized at any resolution, and final characters are inherently smooth since they minimize numbers of inflection points. Our model-based method has the advantage that it can rigorously incorporate Bayesian priors and learned or guessed scanner distortion parameters. Viewing the results in Figure 1, we see that our method shows slightly more accurate reconstruction of sharp features such as serifs, and more reliable determination of the overall blackness of characters.

3. Hobby and Ho correct their initial clustering by comparing bitmaps against representative outlines, using an algorithm close to DigiPaper's Hausdorff matching algorithm. We combine—but never split—our conservative initial clusters by computing the probability that a given original gave rise to a set of scans. Our reclustering is very successful, often reducing the number of clusters by 30% without introducing any new errors.

4. Finally, Hobby and Ho appear to use the alignment computed in step 2 to reassemble the output document. (The ragged baselines in Figures 1(b) and 9, however, are largely due to deskewing before super-resolving.) We again appeal to our scanner model, placing our representatives in their likeliest positions. Our method thus produces super-resolved placement as well, an important appearance enhancement.

The work of Hobby et al. is the only directly comparable work, but we have lifted ideas from a number of other sources. Our scanner model is a variant of the model proposed by Baird and others [2, 3]. Cheeseman et al. [4] have used Bayesian priors defined by neighborhood statistics for super-resolution of gray-level photographic imagery. Thouin and Chang [17] devised a "bimodal smoothness score" (a sort of Bayesian prior) for low-resolution gray-level text images, such as video images. Chris Dance of XRCE (personal communication) suggested the use of chain codes [16] to define a priori distributions.


Figure 1. (a) Detail of a “fine” (about 200 dpi) fax. (b) Hobby-Ho improvement to 600 dpi. (Courtesy John Hobby) (c) Bern-Goldberg improvement to 400 dpi. Notice that (b) is smoother, but (c) has more accurate serifs and character weights. The c in (b)’s Church is an example of a misclustering.

3 Scanner Model

We model a scanner as a rectangular lattice (a grid) of point sensors that sample an original image. The outputs of the sensors are black and white pixels. In theory the original image may be anything, but we focus on the case in which the original image is itself a raster of pixels at twice the resolution of the output image.

We model a sensor as follows. Each sensor sees a roughly disk-shaped region of the original image and then outputs a black pixel with probability dependent upon the total weight of black in its region. The coefficients for pixels within the disk-shaped region define the sensor's point spread function, and the sigmoidal probability curve defines its response function. In practice, we have found that the point spread function shown in Figure 2(a) works fairly well, although its diameter is much smaller than empirically observed diameters. We made the arbitrary choice that sensors are positioned at corners rather than centers of the input pixels, and hence the point spread function has four center coefficients. (Center-centered sensors would allow a more sharply peaked point spread function, which might be better.) The sum of the coefficients is normalized to one.

The response function can be varied to model different scanner threshold settings. A sigmoid symmetric around 0.5, as shown in Figure 2(b), implies no scanner gain: the expected amount of output black equals the amount of input black. A sigmoid ramping up sharply between 0.2 and 0.6 would model a scanner with some gain, and the improved document image will be lighter than the input, as shown in Figure 3(b). The minimum and maximum probability values of 0.0001 and 0.9999 in the response function model "salt and pepper" noise. If the sensors' optical characteristics are known or can be inferred, then the point spread and response functions can be set specifically for a given scanned image. Alternatively, the document image improvement system can include a user-controlled "gain knob".
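To make the forward model concrete, here is a minimal sketch in Python (numpy/scipy). The sigmoid's center and steepness, and the exact sensor offset, are illustrative assumptions; the text fixes only the 0.0001/0.9999 clamps, the normalization of the point spread function, and the half-resolution sensor grid.

```python
import numpy as np
from scipy.signal import convolve2d

def response(weight, lo=0.0001, hi=0.9999, center=0.5, steepness=12.0):
    # Sigmoidal probability of outputting black; the clamps model salt-and-pepper
    # noise. center and steepness are illustrative guesses; a center below 0.5
    # would model a scanner with gain (blacker output than input).
    p = 1.0 / (1.0 + np.exp(-steepness * (weight - center)))
    return np.clip(p, lo, hi)

def simulate_scan(original, psf, rng):
    # original: boolean raster at double resolution; psf: point spread kernel,
    # normalized to sum to 1. Each sensor sees the psf-weighted black in its
    # disk; sensors sit on a grid of half the original's resolution (the exact
    # corner-of-pixel offset is glossed over in this sketch).
    weights = convolve2d(original.astype(float), psf, mode="same")
    probs = response(weights)[::2, ::2]
    return rng.random(probs.shape) < probs   # draw the bi-level output
```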

4 Improvement Algorithm

In this section we give the details for each step of our algorithm.

4.1 Initial Clustering

Our initial clustering is DigiPaper's clustering [13, 14]. All settings have default values, except for fracHausd, which we set slightly more conservatively, to 0.975 rather than 0.970. For completeness, we review DigiPaper's Hausdorff matching method below.


[Figure 2 appears here. Panel (a) shows the point spread coefficients (0.235, 0.118, 0.088, 0.059); panel (b) plots the probability of outputting black, from 0.0001 to 0.9999, against the weight of black in the disk, with the sigmoid ramping between weights of roughly 0.3 and 0.7.]

Figure 2. (a) The default point spread function of a sensor. (b) The probability that a sensor’s output pixel is black is a sigmoidal function of the weight of input black.

Figure 3. (a) Detail of a flatbed scan input. (b) Lightened. (c) Darkened.

A connected component is a maximal set of black pixels in the initial binary raster, such that each black pixel is connected to each other one by a path of adjacent black pixels. Diagonal adjacency counts, and thus each pixel has 8 neighbors. Let $A$ and $B$ be connected components, aligned by the centers of their bounding boxes. Let $|A|$ denote the number of black pixels in $A$, and $A \cap B$ the set of pixels that are black in both $A$ and $B$. Let $\bar{A}$ denote a one-pixel dilation of the black pixels in $A$, that is, the bitmap that has a black pixel wherever $A$ has either a black pixel or a white pixel orthogonally bordering a black pixel. (DigiPaper actually uses a topology-preserving dilation, which refuses to blacken a pixel if doing so would join two connected components in its 8-neighborhood.) Let $\partial A$ denote the boundary of $A$, that is, the set of black pixels with white neighbors. Connected components $A$ and $B$ match each other if

$$|A| - |A \cap \bar{B}| \le f(|\partial A|) \quad \text{and} \quad |B| - |B \cap \bar{A}| \le f(|\partial B|),$$

where $f(n)$ equals $0$ for $n \le 3$ and $0.025n$ for $n \ge 7$ (that is, $(1 - \texttt{fracHausd})\,n$), and interpolates linearly between these two lines for $3 < n < 7$. In other words, the number of pixels of $A$ lying outside $\bar{B}$ must be very small, and vice versa. An additional test stops a match if either $A \setminus \bar{B}$ or $B \setminus \bar{A}$ includes a set of more than three black pixels that can be enclosed by a $3 \times 3$ box.

DigiPaper uses Hausdorff matching to form clusters. Initially each connected component is in a cluster of its own and is that cluster's representative. DigiPaper then combines clusters whose representatives match. As cluster membership changes, either by combining clusters or by dropping members that no longer match the representative, cluster representatives are recomputed by thresholding aligned histograms, with the threshold set to preserve median blackness.
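A minimal sketch of that match test follows. A plain orthogonal dilation stands in for DigiPaper's topology-preserving one, the extra $3 \times 3$-cluster test is omitted, and the linear interpolation of $f$ is our reading of the definition above.

```python
import numpy as np
from scipy.ndimage import binary_dilation

EIGHT = np.ones((3, 3), bool)   # 8-connected structuring element

def f(n, frac_hausd=0.975):
    # 0 for n <= 3, (1 - fracHausd) * n for n >= 7, linear in between.
    slope = 1.0 - frac_hausd
    if n <= 3:
        return 0.0
    if n >= 7:
        return slope * n
    return slope * 7 * (n - 3) / 4.0

def boundary_size(X):
    # |dX|: black pixels with at least one white 8-neighbor.
    return int((X & binary_dilation(~X, EIGHT)).sum())

def hausdorff_match(A, B):
    # A, B: boolean bitmaps aligned by bounding-box centers, padded to a common
    # shape. Default (orthogonal) dilation approximates the one-pixel dilation.
    A_bar, B_bar = binary_dilation(A), binary_dilation(B)
    a_outside = int(A.sum()) - int((A & B_bar).sum())   # pixels of A outside B-bar
    b_outside = int(B.sum()) - int((B & A_bar).sum())
    return a_outside <= f(boundary_size(A)) and b_outside <= f(boundary_size(B))
```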


4.2 Optimal Representatives

Before we describe optimal representatives, we must first explain how to compute the probability that a given connected component $A$ is a scan of a given original image $B$. Let $\tau$ represent a translation of the scanner's sensor grid with respect to $B$. Let $w_{ij}(\tau)$ denote the weight of black in $B$ seen by the sensor in row $i$ and column $j$. Using the point spread and response functions given in Figure 2, we can compute the probability $p(w_{ij}(\tau))$ that the sensor's output pixel would be black. We can now write the probability that the pixel in row $i$ and column $j$ has value $A_{ij}$ (black or white), given $B$ and $\tau$:

$$P[A_{ij} \mid B, \tau] = \begin{cases} p(w_{ij}(\tau)) & \text{if } A_{ij} \text{ is black,} \\ 1 - p(w_{ij}(\tau)) & \text{if } A_{ij} \text{ is white.} \end{cases} \qquad (1)$$

We assume that sensors act independently. This assumption does not mean that the value of a pixel gives no information about the value of a neighboring pixel (not true, since sensor disks overlap!), only that the randomization of the response function is independent from sensor to sensor. Even so, this assumption is only an approximation, as fax images show some correlation both horizontally (between pixels scanned at the same time by different sensors) and vertically (between pixels scanned at different times by the same sensor). With our assumption, we can simply multiply the individual pixel probabilities to give the probability $P[A \mid B, \tau]$ that $A$ is a scan of $B$ at translation $\tau$:

$$P[A \mid B, \tau] = \prod_{ij} P[A_{ij} \mid B, \tau]. \qquad (2)$$

Conceptually, $A$ and $B$ are each padded with white pixels, and the indices $i$ and $j$ run over all positions in the union of the bounding boxes of $A$ and $B$. Equations (1) and (2) above assume a specific translation $\tau$. Since $\tau$ is unknown, we optimize over all possible translations:

$$P[A \mid B] = \max_{\tau} P[A \mid B, \tau]. \qquad (3)$$

In general, equation (3) may involve a difficult continuous optimization problem, but we limit the search to a discrete set of translations, namely translations within $B$'s double-resolution pixel lattice. We have found that if $A$ and $B$ have been pre-aligned by the centers of their bounding boxes as in DigiPaper, then we can limit the search to the 9 shortest vectors in this lattice, that is, a shift of $-1$, $0$, or $+1$ in each of the $x$- and $y$-coordinates. Finally, we can compute the probability of an entire cluster of bitmaps $\mathcal{C}$ by multiplying the probabilities of the individual bitmaps. (Since the probabilities become very small, for example $10^{-2000}$ for a large cluster, the program actually adds logarithms.)

$$P[\mathcal{C} \mid B] = \prod_{A \in \mathcal{C}} P[A \mid B]. \qquad (4)$$
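Equations (2)-(4) in log form, roughly as an implementation would compute them; here `scan_probs` is a hypothetical helper returning the per-pixel black probabilities $p(w_{ij}(\tau))$ for original $B$ at a given shift, on the padded common lattice of $A$ and $B$, following the scanner model of Section 3.

```python
import numpy as np

def log_likelihood(A, probs):
    # log P[A | B, tau] per equation (2), with probs[i,j] = p(w_ij(tau));
    # logs are summed because the raw products underflow (e.g. 10^-2000).
    return float(np.where(A, np.log(probs), np.log1p(-probs)).sum())

def best_translation(A, B, scan_probs):
    # Equation (3) restricted to the 9 shortest double-resolution shifts.
    shifts = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    return max((log_likelihood(A, scan_probs(B, s)), s) for s in shifts)

def cluster_log_likelihood(cluster, B, scan_probs):
    # log P[C | B], equation (4): sum the best per-scan log-likelihoods.
    return sum(best_translation(A, B, scan_probs)[0] for A in cluster)
```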

Now the optimal representative $B$ for a cluster $\mathcal{C}$ is the one that maximizes $P[\mathcal{C} \mid B]$. To see this, we can write, using Bayes' rule,

$$P[B \mid \mathcal{C}] = P[\mathcal{C} \mid B] \cdot \frac{P[B]}{P[\mathcal{C}]}, \qquad (5)$$

where $P[B]$ is the a priori probability of representative $B$, which for now we shall assume to be the same for all $B$'s, and $P[\mathcal{C}]$ is the a priori probability of the cluster $\mathcal{C}$, which is constant.


How do we find the $B$ that maximizes $P[\mathcal{C} \mid B]$? We use a simple-minded hill-climbing approach. The initial representative $B^0$ is simply DigiPaper's cluster representative with each pixel split into four double-resolution pixels. For each scan $A$ in the cluster, we find the translation $\tau$ that maximizes $P[A \mid B^0, \tau]$ by searching the 9 shortest vectors as above. Next we calculate and record $P[\mathcal{C} \mid B^0]$. We sum the translated scans to form a double-resolution histogram, which we use to guide our search for the optimal $B$. We flip pixels in $B^0$ (that is, change white to black or vice versa) based on this histogram. To compute the next representative $B^1$, we take only the most clearly indicated flips, specifically the white pixels where more than 60% of the scans have black and the black pixels where fewer than 40% of the scans have black. We realign scans with respect to $B^1$, calculate and record $P[\mathcal{C} \mid B^1]$, and update the histogram. For $B^2$ we are a little more aggressive, flipping white pixels over 55% and black pixels under 45%. We again repeat the alignment and updating cycle. For $B^3$ and subsequent representatives, we flip pixels according to the expected number in the corresponding histogram bin, rather than by fixed percentages: if the observed number exceeds the number predicted by the scanner model by more than a certain percentage, we flip white to black, and vice versa. We halt the cycle either when no pixels flip or after a fixed number of cycles (default four). The optimal representative is the $B^i$ with maximum $P[\mathcal{C} \mid B^i]$.

We arrived at this ad hoc optimization heuristic after a certain amount of experimentation. The reason to start with conservative flips and then gradually become more aggressive is that flipping a pixel tends to inhibit its neighbors from flipping; hence we want only "locally most flippable" pixels to qualify in the early rounds. On the other hand, a more sequential approach, such as flipping pixels one at a time starting from the most flippable, would be unacceptably slow. As a further speedup, we do not compute new alignments after $B^2$ or subsequent rounds. Typically there is a lot of flipping from $B^0$ to $B^1$, and only a little fine tuning—which rarely changes the alignments—in subsequent rounds. See Figure 4.
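A sketch of the fixed-percentage rounds of this schedule; realignment of the scans between rounds is folded into the assumed `black_frac_of` callback, and the model-based flip test used from $B^3$ onward is elided.

```python
def flip_round(B, black_frac, lo, hi):
    # One round of flips against the aligned double-resolution histogram;
    # black_frac[i, j] is the fraction of scans that are black at (i, j).
    B = B.copy()
    B[(~B) & (black_frac > hi)] = True    # white pixels most scans see as black
    B[B & (black_frac < lo)] = False      # black pixels most scans see as white
    return B

def climb(B0, black_frac_of, score_of, schedule=((0.40, 0.60), (0.45, 0.55))):
    # Conservative first round, then more aggressive, per the text's schedule.
    # black_frac_of(B) realigns the scans to B and rebuilds the histogram;
    # score_of(B) is log P[C | B]. Both callbacks are assumptions of this sketch.
    candidates, B = [B0], B0
    for lo, hi in schedule:
        B = flip_round(B, black_frac_of(B), lo, hi)
        candidates.append(B)
    return max(candidates, key=score_of)   # keep the best B^i seen
```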

4.3 Bayesian Priors

The cluster representatives computed by our optimization heuristic have some flaws. The representative for a large cluster (say at least five scans) looks noticeably better than a representative for a small cluster (two or three scans). Notice the difference between i, n, and o (large clusters) and gh and th (small clusters) in Figure 4(c). Worst of all, the representative for a singleton (one-member) cluster is identical to the scan, and hence remains at single resolution. In addition, an optimal representative may be a bit "furry" where a vertical edge lies halfway between two verticals of the double-resolution pixel grid. (Unfortunately, Figure 4 does not have any examples of singletons or fur!)

A principled approach to solving these problems is to define Bayesian prior probability distributions on representatives and incorporate them into the overall optimization using equation (5). We used chain codes to define prior distributions. A chain code is a string of letters N, E, S, W (for north, east, south, and west), representing the directions of boundary edges around a representative. Edges are oriented so that black is on the left, meaning counterclockwise around the outer boundary and clockwise around holes. We compiled transition probabilities for all chain codes of length five, meaning the relative frequencies of the next letter after each possible string of length five. Since boundary edges cannot double back on themselves, there are always three possible choices (corresponding to straight, turn left, and turn right) for each edge after the first, and hence a total of $4 \cdot 3^5 = 972$ transition probabilities in the table. We used Turing's rule for assigning probabilities to transitions that never occurred; that is, we assumed that all non-occurring transitions had the same probability and that altogether they had the same total probability as the once-occurring transitions.

Figure 4. (a) Detail of a fax input. (b) After one round of flipping pixels to optimize cluster representatives. (c) After four rounds of flipping. Notice that large clusters such as i, n, and o have improved more than small clusters such as gh and th.

Figure 5. (a) After four rounds of flipping including priors. The top of the t in te is less hooked than in Figure 4(c). (b) After reclustering, m, M, c, te and th have improved. (c) After breaking run-together letters, gh has improved.

The a priori probability $P[B^i]$ of a given representative $B^i$ is defined to be the product of the transition probabilities around (all connected components of) the boundary of the representative. This prior distribution penalizes furry representatives and rewards straight and smoothly curving representatives. The optimal representative is now defined to be the $B^i$ with maximum $P[\mathcal{C} \mid B^i] \cdot P[B^i]$. In an effort to guide the optimization heuristic toward representatives with better prior probabilities, the pixels on either side of an unlikely turn are marked as especially flippable, meaning that they can be flipped even if the histogram argues against it.

For large clusters the prior probability $P[B^i]$ (say $10^{-100}$) is negligible in log terms compared to $P[\mathcal{C} \mid B^i]$ (say $10^{-2000}$), and hence rarely affects the choice of representative. For clusters with only two or three members, however, the prior probability has an approximately equal voice in the outcome. Notice that the top of the t in te has improved slightly (less hooked) from Figure 4(c) to Figure 5(a). For singletons, the prior probability acts as a mild smoothing operation, which improves straight strokes and reduces staircasing along diagonals without rounding serifs.

We tried two different choices for the training set on which to compile the transition probabilities: the statistics from a clean PostScript master, and—in a sort of bootstrapping approach—the statistics from the representatives for the large clusters (at least ten members) on the scanned document itself. We could not discern any significant difference between the two choices, even when the PostScript master was the clean version of the scanned document.
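A sketch of evaluating the chain-code log-prior. The `table`, mapping each length-five context to next-letter probabilities (with Turing's rule already applied to never-seen transitions), is assumed precompiled from training boundaries.

```python
import math

def chain_code_log_prior(boundaries, table, order=5):
    # boundaries: one string over 'NESW' per connected boundary component,
    # oriented with black on the left. log P[B] is the sum of log transition
    # probabilities around every boundary loop.
    logp = 0.0
    for code in boundaries:
        closed = code + code[:order]          # boundary loops wrap around
        for k in range(len(code)):
            context, nxt = closed[k:k + order], closed[k + order]
            logp += math.log(table[context][nxt])
    return logp
```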


4.4 Reclustering

We now describe how we combine clusters using our scanner model and optimal representatives. We process clusters in decreasing order of their numbers of members. For each cluster, before we even start to compute its representative, we attempt to merge it with some larger cluster. If we can combine cluster $i$ with some large cluster $j$ (here large means more than three members), then we simply let the representative $B_j$ for cluster $j$ also serve as the representative for cluster $i$. If the larger cluster $j$ is itself small (no more than three members), however, then we recompute the combined cluster representative using scans from both clusters. Alternatively, we can simply stop trying to merge cluster $i$ with a larger cluster when the size of the larger cluster gets down to three; this alternative gives a significant speed-up, sacrificing only a small amount of final image quality.

How do we decide whether to merge clusters $i$ and $j$? Let $A_i$ denote DigiPaper's single-resolution exemplar for cluster $i$, and as above let $B_j$ denote our double-resolution representative for cluster $j$. We recluster using the probability $P[A_i \mid B_j]$ as given by equation (3). In order to compare $P[A_i \mid B_j]$ against a preset threshold, we must normalize it to account for the different sizes of connected components:

$$N[A_i \mid B_j] = \left(P[A_i \mid B_j]\right)^{1/p},$$

where $p$ is the number of pixels in $A_i$ (aligned with $B_j$) that are within a sensor disk's radius of a black pixel in either $A_i$ or $B_j$. We found that it is safe to declare a match whenever $N[A_i \mid B_j]$ exceeds a threshold value, with default equal to 0.70. We can interpret this threshold intuitively as saying that we should declare a match if the probability that $A_i$ is a scan of $B_j$ is at least the probability we would obtain if the scanner model predicted each pixel in $A_i$ with probability 0.70. We use a slightly more aggressive threshold of 0.68 in the case that cluster $i$ is a singleton and cluster $j$ has at least four members. As a practical speed-up, we do not bother to compute $N[A_i \mid B_j]$ if the bounding boxes for $A_i$ and $B_j$ differ too much in either width or height.

Reclustering improves output appearance significantly. In our example, m, M, c, te and th have been replaced by better representatives from Figure 5(a) to (b). On the full page from which this detail is extracted (Figures 6 and 7), over 230 out of 701 clusters can be combined with other clusters without introducing any noticeable errors. If we do quit trying for mergers when the larger cluster has fewer than four members, reclustering also saves some running time: the time spent testing for mergers of small clusters with large ones is more than repaid by the time saved in not computing optimal representatives. Finally, in one of the byproducts we promised, reclustering can also improve the compression performance of DigiPaper by 10% to 30% on scanned and faxed documents, with the smaller percentage typical for flatbed scans and the larger percentage typical for 200 dpi faxes. See our previous invention proposal, "A similarity measure for scanned bitmaps".

Experiments indicate that super-resolution is fairly important to the reclustering performance. Single-resolution representatives with single-resolution translations (that is, with a single-resolution increment for $\tau$ in equation (3)) find only about 40% of the valid mergers found by the double-resolution algorithm before starting to make mistakes; single-resolution representatives with double-resolution translations perform somewhat better, finding about two-thirds of the valid mergers found by the fully double-resolution algorithm.
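The normalized score and merge test, computed in log space to avoid underflow, a minimal sketch:

```python
import math

def normalized_score(log_p, p):
    # N[A_i | B_j] = P[A_i | B_j]^(1/p), from log P[A_i | B_j]; p counts pixels
    # of A_i (aligned with B_j) within a sensor disk's radius of a black pixel
    # in either bitmap.
    return math.exp(log_p / p)

def should_merge(log_p, p, i_is_singleton=False, j_has_four_members=False):
    # Default threshold 0.70; 0.68 when merging a singleton into a cluster
    # with at least four members.
    threshold = 0.68 if (i_is_singleton and j_has_four_members) else 0.70
    return normalized_score(log_p, p) >= threshold
```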

4.5 Breaking Run-Together Letters

For fax inputs, there remain many singleton clusters even after reclustering. Typically about half of these clusters are run-together letters. We added one more step to cope with these problem cases.


For each singleton cluster, we make a final pass through its representative, computing a sequence of "breakable positions" as follows. We count 2 for each orthogonal adjacency, and 1 for each diagonal adjacency, between column $c$ and column $c+1$. We say that the position between $c$ and $c+1$ is breakable if the total adjacency is no greater than 5, the last breakable position was at least 5 columns to the left, and the total number of black pixels 2 and 3 columns to the left and right is sufficiently large (at least 6 on each of left and right). (This last check avoids breaking a horizontal line at every fifth column.) We then try to match the partial bitmaps between successive breakable positions with previous clusters' representatives, as in Section 4.4. If we find a successful match (using the same thresholds for $N[A_i \mid B_j]$ as before), then we replace the partial bitmap by the representative of the larger cluster; otherwise we pass the partial bitmap along unchanged.

This crude algorithm finds quite a few matches, 177 on the full document in Figures 6 and 7. Most of the matches are correct, and the few that are not—such as breaking an n out of an m—are hardly noticeable. Figure 5(c) shows the results of the breaking step. Of the three run-together pairs, gh, te, and th, only gh had few enough adjacencies to be breakable, and indeed each of the two pieces of gh was successfully matched. As this tiny example suggests, there is substantial room for improvement in this step of our method. A smarter algorithm would parse run-together letters using dynamic programming, as in Document Image Decoding [9, 10].
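A sketch of the breakable-position scan; reading "2 and 3 columns to the left and right" as the two-column slice on each side of the break is our interpretation.

```python
import numpy as np

def breakable_positions(bitmap, max_adj=5, min_gap=5, min_mass=6):
    # bitmap: boolean array (rows x columns) for a singleton representative.
    positions, last = [], -min_gap
    for c in range(bitmap.shape[1] - 1):
        left, right = bitmap[:, c], bitmap[:, c + 1]
        adj = 2 * int((left & right).sum())           # orthogonal adjacencies
        adj += int((left[:-1] & right[1:]).sum())     # diagonal, down-right
        adj += int((left[1:] & right[:-1]).sum())     # diagonal, up-right
        if adj > max_adj or c - last < min_gap:
            continue
        mass_left = int(bitmap[:, max(0, c - 2):c].sum())
        mass_right = int(bitmap[:, c + 2:c + 4].sum())
        if mass_left >= min_mass and mass_right >= min_mass:  # avoids cutting a
            positions.append(c)                               # horizontal rule
            last = c
    return positions
```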

4.6 Assembling the Output

The final step of our method reassembles the complete output page, replacing each connected component (or matched piece of a connected component, as explained in Section 4.5) with its cluster's representative. The position for the representative is its most likely position according to the scanner model, found by first aligning centers of bounding boxes and then testing the 9 nearby double-resolution translations. We recomputed all of these alignments, even though most of them—all but the alignments of connected components of clusters that were later combined with other clusters—were computed at an earlier step.

5 Experimental Results

We show four pairs of "before" and "after" pictures in the appendix. All four were run with default thresholds, and all four show some clustering errors. From Figure 6 to Figure 7, the I in In the end in the first column has changed to l, and the C in the first line of the third column and the o in .com in the last column have changed to the primary text font. Figure 10 has a number of substitutions of the primary text font for slanted and bold fonts. Figure 12 also has a few substitutions of the text font for bold, as in Lemma and A volume calculation, along with the "catastrophic" error mamfolds in the last line above Section 2.2. Finally, Figures 13 and 14 give a 300 dpi flatbed scanner example and its improvement to 600 dpi. (The improvement may or may not be visible depending upon the printer that produced this document!) Again there are a few font substitution errors, notably the l's in cultures and colorants, and a similar catastrophic error, Ammal Cells in the second line under Biologie. (Both of these ni-to-m errors should be fixable by a dynamic programming approach to breaking up run-together letters.) We chose this last page in order to test our program's performance on accent marks. It made only one clear mistake: présente in the fourth line from the end changed to pr˙esente, with a dot accent in place of the acute.

Our error rates are probably lower than Hobby and Ho's error rates for the same amount of clustering. For example, in Figure 9 there are 42 font substitution errors, whereas Figure 10 has "only" 29. (Here we count each appearance of an error, even though one cluster merger can cause numerous substitutions.) Almost half of our errors occur in the initial DigiPaper clustering step, and would occur even with fracHausd set to 1.0. One more comment on this document example: most but not all of the poor baseline in Figure 9 is due to a deskewing operation that was done on the original 200 dpi image. This example shows the importance of super-resolved baselines and of deskewing after super-resolution.

Finally, a comment on our running times: they are fairly long. For example, the improvement from Figure 6 to Figure 7 represents 3 minutes and 32 seconds of processing on a Sun Ultra-60. (Figure 13 to 14 takes a little more time, the other documents quite a bit less.) On the other hand, the running time is easily improvable. Simply by not attempting to merge singletons with other singletons, the time drops to 1 minute and 55 seconds, and the quality difference is barely discernible.

6 Conclusions

We have given a principled, scanner-model-based method for improving document images. We believe that our method, or some fast approximation to our method, offers a practical solution to high-quality scanning needs.

Acknowledgments We thank William Rucklidge for answering our questions about DigiPaper and related methods, Chris Dance for help with statistics and chain codes, Yair Weiss for all things Bayesian, and John Hobby for sharing some data with us.

References

[1] T. Akiyama, N. Miyamoto, M. Oguro, and K. Ogura. Faxed document image restoration method based on local pixel patterns. SPIE Vol. 3305, 1998, 253–262.

[2] H.S. Baird. Document image defect models. In Structured Document Image Analysis, H.S. Baird, H. Bunke, and K. Yamamoto, eds., Springer, 1992, 546–556.

[3] E.H. Barney Smith. Characterization of image degradation caused by scanning. Pattern Recognition Letters 19 (1998) 1191–1197.

[4] P. Cheeseman, B. Kanefsky, R. Kraft, J. Stutz, and R. Hanson. Super-resolved surface reconstruction from multiple images. In Maximum Entropy and Bayesian Methods, G.R. Heidbreder, ed., Kluwer, 1996, 293–308. Also Technical Report FIA-94-12, NASA Ames Research Center, 1994. http://ic-www.arc.nasa.gov/ic/projects/bayesgroup/group/super-res/

[5] J.C. Handley and E.R. Dougherty. Optimal nonlinear fax restoration. Proc. of the SPIE Document Recognition Conference, SPIE Vol. 2181, 1994, 232–235.

[6] J.D. Hobby. Polygonal approximations that minimize the number of inflections. Proc. 4th Annual ACM-SIAM Symp. on Disc. Algorithms, 1993, 93–102.


[7] J.D. Hobby and H.S. Baird. Degraded character image restoration. Proc. 5th Annual Symp. on Document Analysis and Image Retrieval, 1996, 177–189.

[8] J.D. Hobby and T.K. Ho. Enhancing degraded document images via bitmap clustering and averaging. ICDAR '97: Fourth Int. Conf. on Document Analysis and Recognition, 1997.

[9] G.E. Kopec and P.A. Chou. Document image decoding using Markov source models. IEEE Trans. on Pattern Analysis and Machine Intelligence 15 (1994) 602–617.

[10] G.E. Kopec and M. Lomelin. Document-specific character template estimation. Proc. of the SPIE Document Recognition III Conference, SPIE Vol. 2660, 1996, 14–26.

[11] Y. LeCun, L. Bottou, P. Haffner, P. Howard, P. Simard, and Y. Bengio. See http://djvu.research.att.com and http://www.research.att.com/~yann.

[12] M. Oguro, T. Akiyama, and K. Ogura. Faxed document image restoration using gray level representation. ICDAR '97: Fourth Int. Conf. on Document Analysis and Recognition, 1997, 679–683.

[13] W. Rucklidge and D. Huttenlocher. See http://www3.cs.cornell.edu/digipaper.

[14] W. Rucklidge and D. Huttenlocher. U.S. Patent 5835638: Method and apparatus for comparing symbols extracted from binary images of text using topology preserved dilated representations of the symbols.

[15] W. Rucklidge and D. Huttenlocher. IP 960499. Docket 96114. Fontless structured document image representations for efficient rendering.

[16] G.M. Schuster, G. Melnikov, and A.K. Katsaggelos. Operationally optimal vertex-based shape coding. IEEE Signal Processing Magazine, November 1998, 91–108.

[17] P.D. Thouin and C.-I. Chang. A method for restoration of low-resolution text images. Proc. 1999 Symp. on Document Image Understanding Technology, Annapolis, 1999, 143–148.

[18] M.Y. Yoon, S.W. Lee, and J.S. Kim. Faxed image restoration using Kalman filtering. ICDAR '95: Third Int. Conf. on Document Analysis and Recognition, 1995, 677–680.


Figure 6. 200 dpi input fax.

Figure 7. Improvement to 400 dpi.

Figure 8. 200 dpi input fax (Courtesy John Hobby).

Figure 9. Hobby-Ho improvement to 600 dpi (Courtesy John Hobby).

Figure 10. Bern-Goldberg improvement to 400 dpi.

Figure 11. 200 dpi input fax.

Figure 12. 400 dpi improvement.

Figure 13. 300 dpi flatbed scan.

Figure 14. 600 dpi improvement.