
Document Image Processing for Paper Side Communications

Paulo Vinicius Koerich Borges, Student Member, IEEE, Joceli Mayer, Member, IEEE, Ebroul Izquierdo, Senior Member, IEEE

Abstract— This paper proposes the use of higher order statistical moments in document image processing to improve the performance of systems which transmit side information through the print and scan channel. Examples of such systems are multi-level 2-D bar codes and certification via text luminance modulation. These systems print symbols with different luminances, according to the target side information. In previous works, the detection of a received symbol is usually performed by evaluating the average luminance or spectral characteristics of the received signal. This paper points out that, whenever halftoning algorithms are used in the printing process, detection can be improved by observing that third and fourth order statistical moments of the transmitted symbol also change, depending on the luminance level. This work provides a thorough analysis for those moments used as detection metrics. A print and scan channel model is exploited to derive the relationship between the modulated luminance level and the higher order moments of a halftone image. This work employs a strategy to merge the different moments into a single metric to achieve a reduced detection error rate. A transmission protocol for printed documents is proposed which takes advantage of the resulting higher robustness achieved with the combined detection metrics. The applicability of the introduced document image analysis approach is validated by comprehensive computer simulations.

This work was supported by CNPq, Proc. No. 202288/2006-4. Paulo Borges and Joceli Mayer are with the LPDS, Dept. of Electrical Engineering, Federal University of Santa Catarina, Florianópolis, Brazil, 88.040-900. Tel: +55 48 3721-7627, Fax: +55 48 3721-9280. Ebroul Izquierdo is with the Multimedia and Vision Lab, Dept. of Electronic Engineering, Queen Mary University of London, Mile End Road, London E1 4NS, UK. Tel: +44 20 7882 5354, Fax: +44 20 7882 7997.

I. INTRODUCTION

Printed paper and its corresponding document image processing are critical for the storage, presentation and transmission of analogue and digital information. In addition to conventional text and images, paper communications include, for example, bar codes, bank and identity documents, and hardcopy text certification. Regarding bar codes, multi-level two-dimensional (2-D) bar codes [1], [2] have gained increased attention in the past few years. Instead of representing information with only black and white symbols, multi-level codes use gray levels to increase the symbol rate, as illustrated in Fig. 1(a). Consequently, a higher capacity version of 1-D bar codes is achieved. Examples of their utilization include representing encrypted information or serving as an auxiliary verification channel. A discussion regarding more applications and aspects related to coding/decoding of 2-D bar codes is given in [1], [2]. With respect to hardcopy text certification, several techniques have been proposed in the literature. Brassil et al. [26]

Fig. 1: Illustration of side communications over paper. (a) Multi-level 2-D bar code. (b) Example of text certification through luminance modulation.

propose and discuss several methods to embed and decode information in documents which can survive the print and scan (PS) channel. In one of the methods, called line-shift coding, a line is moved up or down according to the bit to be embedded. In order to perform blind detection, line centroids can be used as references. One disadvantage of this method is that the centroids must be uniformly spaced, which does not always occur in documents. Variations of the method include word and character shift coding [27], [23], [25], but they are essentially different implementations of the fundamental idea. Unfortunately, the line and word shifting techniques assume predictable spacing in the document. Equations, titles, and variable size logos or symbols complicate the coding process due to non-uniform spacing. An alternative class of methods (called pixel flipping) performs modifications on the character pixels [28], [13], such as flipping a pixel from black to white, and vice versa. In [30], for example, the modifications are performed according to the shape and connectivity of the characters. In weak noise conditions, this type of method presents a very high information embedding rate; however, since these methods rely on small dots, they require very high resolutions in both printing and scanning to reduce detection errors when the PS distortions are considered. For these pixel flipping techniques, a useful detection statistic when the signal is submitted to the PS process is proposed in [31], based on the compression bit rate of the modified signals. Another important technique is called text luminance modulation (TLM) [3], [4], [5]. It slightly modulates the luminance of text characters to embed side information. This modification is performed according to the target side information and can be set to cause a very low perceptual impact while remaining detectable after printing and scanning. An example of this technique is given in Fig. 1(b), where the intensity changes have been augmented to make them visible and to illustrate the underlying process. In contrast to some of the limitations


of the methods detailed above (non-blind detection, need for uniform spacing between lines, errors from segmentation, etc.), TLM can be designed to be more robust to these issues, as discussed in [6]. Due to the usefulness of TLM and multi-level 2-D bar codes, this work focuses on improving the detection in such systems. The contributions presented in this paper are manifold: (i) In TLM and multi-level 2-D bar codes, extraction of the embedded side information is usually performed by evaluating the average amplitude [1], [5], [6] or spectral characteristics [5] of the region of interest. However, considering that halftoning is usually employed in the printing process, other statistics of the received signal can also be exploited in the detection. One example is the sample variance, which can be effectively used as a detection metric, as proposed in [3]. In this work, higher order statistical moments such as skewness and kurtosis are used as detection metrics, extending the work presented in [3]. (ii) The provided analysis shows the relationship between the modulated luminance and the higher order statistics of a printed and scanned halftone region. Statistical assumptions related to the 1st and 2nd order moments and to the PS channel model are based on [3]. This model includes the quantization characteristics of the halftoning process, which are exploited here to derive the higher order statistics based detection metrics. (iii) Robustness against PS distortions, and consequently a reduced detection error rate, achieved by combining the proposed metrics into a single highly efficient metric. This unified model allows previously proposed metrics [5], [6] to be combined with the ones proposed in this paper. (iv) A practical protocol for the certification of printed documents is proposed which exploits the resulting high robustness of the proposed unified metrics. This paper is organized as follows.
Section II describes the halftoning process and a PS model, which can be seen as a noisy communications channel. Section III analyzes the relationship between the different moments and the modulated average luminance before PS. Section IV performs a similar analysis considering the PS distortions. Section V proposes that the discussed metrics be combined into a single metric using the Bayes classifier. Section VI proposes an authentication protocol for printed text documents. Experimental results are presented in Section VII, followed by conclusions in Section VIII.

II. THE PRINT AND SCAN CHANNEL

A. The Halftoning Process

Due to limitations of most printing devices, image signals are quantized to 0 or 1 prior to printing (in this paper, 0 represents white and 1 represents black). The output '0' represents a white pixel (do not print a dot), and '1' represents a black pixel (print a dot). A binary halftone image b is generated from an original image s, and this quantization is performed according to a direct comparison between the elements in a dithering matrix D and the elements in s. A detailed description of halftoning algorithms can be found in [7], [8]. The binary signal b is included in the PS channel model described in the next section.

B. A Print and Scan Channel

A number of analytical models of the PS channel have been presented in the literature [1], [2], [9], [6]. In general, the proposed models of the PS channel assume that the process can be modeled by low-pass filtering, the addition of Gaussian noise (white or colored), and non-linear gains, such as brightness and gamma alterations. Geometric distortions such as rotation, re-scaling, and cropping may also occur, but they are assumed controlled, as they can be compensated for. The contributions presented in this work are directly related to the effects caused by the halftoning process. For this reason, a PS model which includes the halftoned signal is employed to describe the process. A detailed description of this model is given in [3], where the PS operation is described by

y(m, n) = gs{ (gpr[b(m, n)] + η1(m, n)) ∗ h(m, n) } + η3(m, n)   (1)

In this equation, η1 represents the microscopic ink and paper imperfections and η3 is a noise term combining illumination noise and microscopic particles on the scanner surface. The term h represents a blurring effect combining the low-pass effects due to printing and due to scanning. The term gpr(·) in (1) represents a gain in the printing process and the term gs(·) represents the response of scanners. In the model in (1), b represents the halftone signal, generated from an original signal s, as described in Section II-A.
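The threshold comparison of Section II-A can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation; the 4 × 4 Bayer matrix values are a standard choice assumed here for concreteness.

```python
import numpy as np

# 4x4 Bayer dithering matrix, normalized to (0, 1); an illustrative
# choice of D -- the paper does not fix a particular matrix here.
BAYER4 = (1.0 / 16.0) * (np.array([[ 0,  8,  2, 10],
                                   [12,  4, 14,  6],
                                   [ 3, 11,  1,  9],
                                   [15,  7, 13,  5]]) + 0.5)

def halftone(s: np.ndarray) -> np.ndarray:
    """Quantize a gray-level image s (values in [0, 1], 1 = black) to a
    binary image b by direct comparison with the tiled dither matrix D:
    print a dot (b = 1) wherever s exceeds the local threshold."""
    rows = int(np.ceil(s.shape[0] / 4))
    cols = int(np.ceil(s.shape[1] / 4))
    D = np.tile(BAYER4, (rows, cols))[:s.shape[0], :s.shape[1]]
    return (s > D).astype(np.uint8)

# A constant mid-gray region becomes a spatial dot pattern whose
# density matches the input luminance.
s = np.full((8, 8), 0.5)
b = halftone(s)
print(b.mean())  # dot density approximates s0 = 0.5
```

This dot-density behavior is exactly the property exploited in Section III: the mean of the binary block tracks the input luminance s0.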

III. EFFECTS INDUCED BY THE HALFTONE

The halftoning algorithm quantizes the input to 'print a dot' and 'do not print a dot' according to the dither matrix coefficients. This quantization causes a predictable effect on the variance, skewness, and kurtosis of a halftoned region, according to the input luminance. For the variance, this dependence is discussed in [3]. In the following, this approach is extended to the skewness and the kurtosis, to help improve the detection in printed communications.

A. Skewness

The skewness measures the degree of asymmetry of a distribution around its mean [14]. It is zero when the distribution is symmetric, positive if the distribution is more spread to the right, and negative if it is more spread to the left, as illustrated in Fig. 2(a). The skewness of a halftone block b0 of size J × J is given by:

γ1b0 = (1/(J² σ³b0)) Σ_{m=1}^{J} Σ_{n=1}^{J} [b0(m, n) − b̄0]³   (2)

where b0(m, n) ∈ {0, 1}, b̄0 = (1/J²) Σ_{m=1}^{J} Σ_{n=1}^{J} b0(m, n), and J² is the number of coefficients in the dithering matrix

D. Since b0(m, n) ∈ {0, 1}, b0²(m, n) = b0(m, n), and (2) can be written as

γ1b0 = (b̄0 − 3b̄0² + 2b̄0³) / (b̄0 − b̄0²)^(3/2)   (3)

B. Kurtosis

The kurtosis is a measure of the relative flatness or peakedness of a distribution about its mean, with respect to a normal distribution [14]. A high kurtosis distribution has a sharper peak and flatter tails, while a low kurtosis distribution has a more rounded peak with wider “shoulders,” as illustrated in Fig. 2(b). The kurtosis of a halftone block b0 of size J × J is given by:

γ2b0 = (1/(J² σ⁴b0)) Σ_{m=1}^{J} Σ_{n=1}^{J} [b0(m, n) − b̄0]⁴ − 3
     = (b̄0 − 4b̄0² + 6b̄0³ − 3b̄0⁴) / (b̄0 − b̄0²)² − 3   (4)

Fig. 2: Illustration of the effect of positive and negative skewness and kurtosis. (a) Skewness. (b) Kurtosis.

To derive γ1b0 and γ2b0 as a function of the input luminance s(m, n), b0(m, n) must be generated from a constant gray level region, that is, s(m, n) = s0, m, n = 1, . . . , J, where s0 is a constant. Assuming that D is approximately uniformly distributed, as illustrated by pD in Fig. 3, the probability p of b(m, n) = 1, which is Pr[s0 > D(m, n)], is given by

p = Pr[s0 > D(m, n)] = (1/J²) Σ_{m=1}^{J} Σ_{n=1}^{J} b(m, n) = b̄ = s0   (5)

as illustrated by the area p in Fig. 3.

Fig. 3: Representation of the uniform distribution assumed for the coefficients of D and the distribution of b.

Substituting this result into (3) and (4) yields

γ1b0(s0) = (s0 − 3s0² + 2s0³) / (s0 − s0²)^(3/2)   (6)

γ2b0(s0) = (s0 − 4s0² + 6s0³ − 3s0⁴) / (s0 − s0²)² − 3   (7)

where γ1b0(s0) and γ2b0(s0) represent, respectively, the skewness and the kurtosis of a halftoned block that represents a region of constant luminance s0. In [3], an analysis similar to the above is presented for the variance, and it is shown that, with the same assumptions, the variance σ²b0(s0) is given by

σ²b0(s0) = s0 − s0²   (8)

C. Comments on γ1b and γ2b

The halftone signal b0 is binary and it is distributed according to b0(m, n) ∈ {0, 1}, with probabilities 1 − s0 and s0, respectively, as illustrated by pb in Fig. 3. Because the skewness and the kurtosis of b0 depend on s0, these moments can be used as detection metrics in text luminance modulation and multi-level bar codes, in addition to the average and variance metrics employed in [3]. Regarding the skewness, it is equal to zero when s0 = 0.5, when the distribution of b0 is symmetric, represented by two peaks of equal probability. The two symmetric peaks also flatten the distribution of y in (1), minimizing the kurtosis. When s0 < 0.5, b is composed of more white dots than black dots, leaning the distribution of y to the left and causing a positive skewness. The opposite occurs when s0 > 0.5, yielding a negative skewness. Likewise, the distribution of y becomes more peaky as s0 approaches the limits of the luminance range, consequently increasing the kurtosis.
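The closed forms (6)-(8) can be checked numerically. A short sketch follows; under the uniform-threshold assumption above, each halftone dot is 1 with probability s0, so a large Bernoulli(s0) sample stands in for the block b0 (the sample size and the value s0 = 0.3 are illustrative choices).

```python
import numpy as np

def moments_theory(s0):
    """Closed-form variance, skewness and (excess) kurtosis of a
    halftone block with input luminance s0 -- eqs. (8), (6) and (7)."""
    var = s0 - s0**2
    skew = (s0 - 3*s0**2 + 2*s0**3) / var**1.5
    kurt = (s0 - 4*s0**2 + 6*s0**3 - 3*s0**4) / var**2 - 3
    return var, skew, kurt

# Empirical check against a large simulated block.
rng = np.random.default_rng(0)
s0 = 0.3
b = (rng.random(10**6) < s0).astype(float)
d = b - b.mean()
var_e = (d**2).mean()
skew_e = (d**3).mean() / var_e**1.5
kurt_e = (d**4).mean() / var_e**2 - 3
print(moments_theory(s0))     # theoretical (variance, skewness, kurtosis)
print(var_e, skew_e, kurt_e)  # empirical values, close to theory
```

For s0 = 0.3 the theoretical values are (0.21, ≈0.873, ≈−1.238), and the sample moments of the simulated block agree to within sampling error, consistent with the sign discussion above (positive skewness for s0 < 0.5).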

IV. EFFECTS INDUCED BY THE PS CHANNEL

The statistical moments described in (3) and (4) are affected by the low-pass characteristic and the noise in the PS channel. Considering these channel distortions, γ1y and γ2y are derived in Sections IV-C and IV-D, respectively. Statistical and distortion assumptions for the analyses are discussed in Section IV-A. For simplicity, the (m, n) coordinate system is mapped to a one-dimensional notation.

A. Statistical and Distortion Assumptions

In the model in (1), let b(n) = b̄ + η2(n). The noise η2 is zero-mean with variance given by σ²η2 = σ²b, and it is distributed according to η2 ∈ {−s0, 1 − s0}, as illustrated in Fig. 4.

Fig. 4: Distribution of the noise η2.

Although gs in (1) is generally defined as non-linear, in many devices it can be approximated by a linear model [1]. This is particularly reasonable in a TLM application, because the detector operates in a small range of the luminance range [0, 1] due to the low perceptual impact requirement. For this reason, gs is assumed linear and φ in gs is approximated to 1 for simplicity. Assuming that b(m, n) is generated from a constant gray level region, that is, s(m, n) = s0 = b̄, (1) can be written as

y(n) = α[b̄ + η2(n)] ∗ h(n) + η1(n) ∗ h(n) + η3(n)   (9)

The term α represents a gain (see gpr in (1)) that varies slightly throughout a full page due to non-uniform printer toner distribution. Due to its slow rate of change, α is modeled as constant in n, but it varies with each realization i, satisfying α ∼ N(µα, σα²), where i represents the i-th symbol of a 2-D bar code or the i-th character in TLM. Due to the nature of the noise (discussed in Section II) and based on experimental observations, η1 and η3 can generally be modeled as zero-mean mutually independent Gaussian noise [1], [11], [2].

B. Variance

Based on the assumptions described above, it is shown in [3] that the sample variance of a scanned symbol y is given by

µσ²y = (µα² + σα²) σ²η2 rh(0) + σ²η1 rh(0) + σ²η3   (10)

In the following, an extension to the skewness and to the kurtosis is presented.

C. Skewness

The sample skewness of a scanned symbol y is given by

µγ1y = E{ (1/(σy³ N)) Σ_{n=1}^{N} [αs0 + αη2(n) ∗ h(n) + η1(n) ∗ h(n) + η3(n) − ȳ]³ }
     = (1/(σy³ N)) Σ_{n=1}^{N} E{ [αη2(n) ∗ h(n) + η1(n) ∗ h(n) + η3(n)]³ }   (11)

Recalling that η1, η2 and η3 are zero-mean mutually independent random variables and that third order moments of independent and identically distributed zero-mean random variables are zero, (11) becomes

µγ1y = (1/σy³) E{α³} E{[η2(n) ∗ h(n)]³}   (12)

Appendix I derives the term E{[η2(n) ∗ h(n)]³} in the equation above, yielding

µγ1y = (1/(σy²)^(3/2)) (3σα²µα + µα³) [(1 − s0)(−s0)³ + (1 − s0)³ s0] h3   (13)

where σy² is described by (10) and h3 is given by:

h3 = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} Σ_{r=−∞}^{∞} h(k) h(l) h(r)   (14)

D. Kurtosis

The sample kurtosis of a scanned symbol is given by

µγ2y = E{ (1/(σy⁴ N)) Σ_{n=1}^{N} [αs0 + αη2(n) ∗ h(n) + η1(n) ∗ h(n) + η3(n) − ȳ]⁴ }
     = (1/(σy⁴ N)) Σ_{n=1}^{N} E{ [αη2(n) ∗ h(n) + η1(n) ∗ h(n) + η3(n)]⁴ }
     = (1/(σy⁴ N)) Σ_{n=1}^{N} ( E{α⁴ [η2(n) ∗ h(n)]⁴} + 6 E{α² [η2(n) ∗ h(n)]² [η1(n) ∗ h(n)]²} + 6 E{α² [η2(n) ∗ h(n)]² η3²(n)} + E{[η1(n) ∗ h(n)]⁴} + 6 E{[η1(n) ∗ h(n)]² η3²(n)} + E{η3⁴(n)} )   (15)

Appendix II derives the term E{[η2(n) ∗ h(n)]⁴} in the equation above, yielding

µγ2y = (1/σy⁴) { (3σα⁴ + 6σα²µα² + µα⁴) [(1 − s0)(−s0)⁴ + (1 − s0)⁴ s0] h4 + 6(σα² + µα²) σ²η1 σ²η2 rh²(0) + 6(σα² + µα²) σ²η2 σ²η3 rh(0) + 3σ⁴η1 rh²(0) + 6σ²η1 σ²η3 rh(0) + 3σ⁴η3 }   (16)

where h4 is given by:

h4 = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} Σ_{r=−∞}^{∞} Σ_{s=−∞}^{∞} h(k) h(l) h(r) h(s)   (17)
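The channel model of this section can be exercised with a small simulation. The sketch below follows the form of eq. (9); the blur kernel and block length are illustrative assumptions (not the Butterworth fit of Section VII), and the noise and gain parameters are the typical values reported in Section VII. It illustrates the key qualitative prediction: the sign of the received skewness tracks the input luminance.

```python
import numpy as np

rng = np.random.default_rng(1)

def ps_channel(s0, n=4096, sigma1=0.018, sigma3=0.01, mu_a=0.8, sig_a=0.03):
    """Simulate one printed-and-scanned symbol following the form of
    eq. (9): per-symbol gain alpha, halftone signal b = s0 + eta2,
    blur h, printing noise eta1 and scanner noise eta3.
    The 3-tap averaging kernel is a toy stand-in for h."""
    alpha = rng.normal(mu_a, sig_a)            # slowly varying printer gain
    b = (rng.random(n) < s0).astype(float)     # halftone dots: Bernoulli(s0)
    eta1 = rng.normal(0.0, sigma1, n)          # ink/paper imperfections
    eta3 = rng.normal(0.0, sigma3, n)          # scanner noise
    h = np.ones(3) / 3.0                       # normalized toy blur kernel
    return (alpha * np.convolve(b, h, mode="same")
            + np.convolve(eta1, h, mode="same") + eta3)

def skew(y):
    d = y - y.mean()
    return (d**3).mean() / (d**2).mean()**1.5

# Light symbols (s0 < 0.5) keep a positive skewness after the channel,
# dark symbols (s0 > 0.5) a negative one, as derived above.
print(skew(ps_channel(0.2)) > 0, skew(ps_channel(0.8)) < 0)
```

The blur and additive noise shrink the magnitude of the moments relative to (6)-(7), which is exactly what eqs. (13) and (16) quantify, but the dependence on s0 survives and remains usable for detection.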

V. COMBINING THE METRICS

Extending the approach that combines the average and variance metrics discussed in [3], it is also possible to combine the skewness and the kurtosis, along with other metrics of a received symbol, into a single metric to reduce the detection error rate. Considering a stochastic interpretation of the detection metrics, the result of each metric is approximately normally distributed. For this reason, in this work the Bayes classifier


[15] is employed to combine the metrics, due to its optimal properties for normally distributed patterns [15]. Results of the detection using the Bayes classifier are given in Section VII. The reader is referred to [3] for an example of how to apply this classifier in the scenario discussed in this paper. Although some detection metrics perform better than others, because all of the first four statistical moments are useful to separate classes, combining them increases the distance between classes and consequently reduces the detection error rate [16], at the expense of increased computational complexity. It is also possible to combine useful spectral or other non-statistical metrics, although this is not discussed in this paper.

VI. A PRACTICAL AUTHENTICATION PROTOCOL

Taking advantage of the reduced error rate provided by combining the detection metrics, a practical protocol for document authentication based on TLM [5], [6] is proposed. It is similar to the system proposed in [33], but with an alternative detection method. Instead of employing TLM as a side message transmitter, the modulated characters (see Figure 1(b)) are used to ensure that no character in the document has been altered. This is achieved by combining cryptography, optical character recognition (OCR) [21], [22], and the detection of characters with modulated luminances, discussed in this paper. Notice, however, that the proposed system can also be used to authenticate digital documents that are not subject to the PS channel. The proposed framework for authentication scrambles the binary representation of the original text string with a key that depends on the string. The resulting scrambled vector is used to create another vector of dimension equal to the number of characters in the document. This is used as a rule to modulate each character individually, as illustrated in Figure 1(b).
A related approach for image authentication, in which a digital watermark [10] is generated with a key that is a function of some feature f of the original image, has been proposed in the literature, as in [19], [20], [24], for example. To prevent f from being modified by the embedding of the watermark itself, hence frustrating the watermark detection process, only characteristics of a portion of the image must be used. It is possible, for example, to extract features from the low-frequency components and to embed the watermark in the high-frequency components, as discussed in [10]. In contrast, in the authentication system proposed here, the modified character luminances do not alter the feature used to generate the permutation key, which is the characters' "meanings." The system is described in the following.

A. Encryption







• Let vector c = [c1, c2, . . . , cK] of size K represent a text string with K characters.
• Let vector s = [s1, s2, . . . , sK] represent the luminances of characters [c1, c2, . . . , cK], respectively.
• Let ci ∈ Ω (Ω = {a, b, c, . . . , X, Y, Z}, for example), where Ω has cardinality S.
• Let cbi be the binary representation of symbol ci.
• Let cb be the binary representation of c, where cb has size |cb| = K log2 S.
• Let κ = f(cb) be a function of cb. κ is used as a key to generate a pseudo-random sequence (PRS) k, such that the PRSs are ideally orthogonal for different keys κ.
• Let c′b = cb ⊕ k, where ⊕ represents the "exclusive or" (XOR) logical operation.
• Let M be a function that maps c′b, with |cb| bits, to another vector w, with K bits.
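The encryption chain above can be sketched as follows. This is an illustrative mock-up, not the paper's implementation: the key function f (a hash), the PRS generator (a seeded PRNG) and the mapping M (per-character parity) are assumed choices, and the public-key encryption step producing we is omitted.

```python
import hashlib
import random

def encode_luminances(c: str, S_bits: int = 8) -> list:
    """Sketch of the Section VI-A chain: c -> c_b -> kappa -> k ->
    c'_b -> w -> per-character luminances (public-key step omitted)."""
    # c_b: binary representation of the text string (8 bits/character here).
    cb = [int(bit) for ch in c for bit in format(ord(ch), f"0{S_bits}b")]
    # kappa = f(c_b): here, a hash of the bit string (illustrative f).
    kappa = hashlib.sha256(bytes(cb)).hexdigest()
    # k: pseudo-random sequence seeded by kappa.
    prs = random.Random(kappa)
    k = [prs.randint(0, 1) for _ in cb]
    # c'_b = c_b XOR k.
    cpb = [x ^ y for x, y in zip(cb, k)]
    # M: map |c_b| bits down to K bits, one per character; here the
    # parity of each character's bit group (illustrative mapping).
    K = len(c)
    w = [sum(cpb[i * S_bits:(i + 1) * S_bits]) % 2 for i in range(K)]
    # w selects each character's luminance, e.g. 0 -> 0.95, 1 -> 0.84.
    return [0.95 if bit == 0 else 0.84 for bit in w]

s = encode_luminances("SIGNAL")
print(len(s))  # one luminance per character -> 6
```

Because κ is derived from the text itself, changing any character reseeds the PRS and scrambles the whole luminance pattern, which is what gives the protocol its tamper sensitivity.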

Fig. 5: Encryption block diagram. Block 's/b' represents string-to-binary conversion. Block 'M' represents a mapping of c′b from |cb| bits to K bits. The symbol ⊕ represents the "exclusive or" (XOR) logical operation.

In order to provide security, w is encrypted with the private key of a public-key cryptosystem [17]. Public-key cryptosystems use two different keys, one for encryption, κe, and one for decryption, κd. The private key κe is only available to users who are allowed to perform the authentication process. On the other hand, anyone can have access to the public key κd to check whether a document is authentic, without the ability to generate a new authenticated document. Let we be the encrypted version of w based on the key κe, using a public-key encryption scheme such as RSA [17], for example. To authenticate the text document, vector s (which represents the luminances of the characters in the document) is modified such that s = we. Therefore, the document is authenticated by setting the luminance of each character ci equal to si.

B. Decryption

In the verification process, OCR is applied to the printed document. In addition, the luminance of each character is determined using the metrics proposed in this paper. Therefore, when testing the authenticity of the document, one has access to a received ĉ and a received ŝ, where ĉ and ŝ represent the received vectors c and s, respectively. It is assumed that the conditions are controlled such that no OCR or luminance detection errors occur. Moreover, one has access to a public key κd for decryption in the RSA algorithm and a scrambling key κ = f(ĉb), which depends on ĉ. Using the public key κd, it is possible to decrypt ŝ = ŵe into ŵ.


Using κ, it is possible to scramble ĉb (the binary representation of ĉ), yielding ĉ′b. Applying the same mapping rule M of the encryption process to ĉ′b yields a new vector ŵ′. If ŵ′ = ŵ, the document is assumed authentic. Otherwise, it is assumed that one or more characters have been altered. A block diagram of the authentication test process is given in Fig. 6.
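The verification side mirrors the encryption chain. The sketch below re-derives ŵ′ from the OCR'd text and compares it with the bits detected from the character luminances; as in the encryption sketch, the hash key, seeded PRS, parity mapping M and the threshold detector are illustrative assumptions, and the public-key step is omitted.

```python
import hashlib
import random

def expected_bits(text: str, S_bits: int = 8) -> list:
    """Recompute w' from the OCR'd text (mirror of the encryption
    chain: hash key, seeded PRS, XOR, parity mapping M)."""
    cb = [int(bit) for ch in text for bit in format(ord(ch), f"0{S_bits}b")]
    kappa = hashlib.sha256(bytes(cb)).hexdigest()
    prs = random.Random(kappa)
    k = [prs.randint(0, 1) for _ in cb]
    cpb = [x ^ y for x, y in zip(cb, k)]
    return [sum(cpb[i * S_bits:(i + 1) * S_bits]) % 2 for i in range(len(text))]

def authentic(ocr_text: str, luminances: list) -> bool:
    """Equality test of Fig. 6: bits detected from the character
    luminances must match the bits recomputed from the text itself."""
    detected = [0 if lum > 0.9 else 1 for lum in luminances]  # toy detector
    return detected == expected_bits(ocr_text)

# An unaltered document passes; editing a character reseeds the PRS,
# so the recomputed bits disagree with the printed luminances
# with very high probability.
lum = [[0.95, 0.84][b] for b in expected_bits("SIGNAL")]
print(authentic("SIGNAL", lum), authentic("S1GNAL", lum))
```
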

Fig. 6: Decryption block diagram. Block 's/b' represents string-to-binary conversion. Block 'M' represents a mapping of ĉ′b from |ĉb| bits to K bits. The symbol ⊕ represents the "exclusive or" (XOR) logical operation.

If an attacker changes one or more characters in the document such that ĉ ≠ c, ŵ and ŵ′ are two completely different (quasi-orthogonal) sequences with very high probability, failing the authentication test. A practical example of the proposed system is given in Section VII. Although OCR has been included in the detection process assuming that the document has been printed and scanned, the proposed authentication protocol can also be applied to digital documents. It does not require the use of appended files and it is robust to format conversions, such as .pdf to .ps, for example. Hence, unlike a digital signature, which protects the binary codes of the documents, the system proposed here protects the visual content, or the meaning, of the document.

VII. EXPERIMENTS

The goal of this section is to validate experimentally the analyses of Sections III and IV and to illustrate that higher order moments can be used to detect a luminance change in printed symbols. During the experiments, the noise and the distortion parameters of the PS channel vary depending on the printing and scanning devices used. The experiments are conducted with different combinations of printers and scanners, according to the legend in Table I. The printing and scanning resolutions were set to 300 dots/inch and pixels/inch, respectively. Typical values for the parameters in (1) are ση1 = 0.018, ση3 = 0.01, µα = 0.8, σα = 0.03.

TABLE I: Combinations of printers and scanners used in the experiments.

Printer       | Scanner      | Legend
HP IJ-855C    | Genius HR6X  | C1
HP IJ-855C    | HP 2300C     | C2
HP IJ-855C    | HP SJ-5P     | C3
HP IJ-870Cxi  | Genius HR6X  | C4
HP IJ-870Cxi  | HP 2300C     | C5
HP IJ-870Cxi  | HP SJ-5P     | C6
HP LJ-1100    | Genius HR6X  | C7
HP LJ-1100    | HP 2300C     | C8
HP LJ-1100    | HP SJ-5P     | C9

As discussed in [3], comparing the frequency responses of an original digital image and its PS version after several experiments, the response h(m, n) is well represented by a traditional Butterworth low-pass filter described by

H(f1, f2) = 1 / (1 + [F(f1, f2)/F0]^(2Q))   (18)

where Q is the filter order, F0 is the cutoff frequency, and F(f1, f2) is the Euclidean distance from point (f1, f2) to the origin (center) of the frequency spectrum. Although different filters could be used, for this model the filter order Q and cutoff frequency F0 which yield the best approximation of the frequency response of the process are determined experimentally through curve fitting. In the tests, these parameters are given by Q = 1 and F0 = 0.17 for the devices used. Using the noise, gain and blurring filter parameters described above, a character or symbol distorted with the proposed PS model is perceptually similar to an actual printed and scanned character, as illustrated in [3].

Regarding the perceptual impact of the method, if the modulation intensity exceeds a given perceptual threshold, it becomes less difficult for a human viewer to notice the modifications. Nevertheless, unlike regular images (such as natural photos), where the "meaning" of the image depends on the values of the pixels, in text documents the "meaning" is given by the shape of the characters, such that letters can be recognized and words interpreted. In this sense, characters could be of any color and, as long as they are readable, the information content of the modified document is exactly the same as that of the original all-black-character document. The relevance of the perceptual constraint is that it refrains the reader from being bothered or annoyed while reading, helping to ensure that the reader's attention is not drawn to the fact that the characters are modified.

A. Experiment 1

The effect of a halftone skewness level that depends on the input luminance is illustrated in Fig. 7, where two curves are presented. The black curve ('Theoretical') represents the theoretical skewness presented in (6). The gray curve ('Bayer') represents the skewness of a halftone block (before PS)

7

10

0.05

Theoretical Bayer

Theoretical Experimental 0.04

Variance

Skewness

5

0

0.03

0.02

−5 0.01

−10 0

0.2

0.4 0.6 Luminance

0.8

0 0

1

0.6

0.8

1

Fig. 9: The effect of variance dependent on the input luminance, after PS.

30

2

Theoretical Bayer

Theoretical Experimental 1 Skewness

20 Kurtosis

0.4

Luminance

Fig. 7: The effect of skewness dependent on the input luminance.

25

0.2

15 10 5

0

−1

0 0

0.2

0.4 0.6 Luminance

0.8

−2 0

1

Fig. 8: The effect of kurtosis dependent on the input luminance.

0.2

0.4 0.6 Luminance

0.8

1

Fig. 10: The effect of skewness dependent on the input luminance, after PS. 6 Theoretical Experimental

5 4 3 Kurtosis

generated using the Bayer dithering matrix [18]. Similar experiments are presented regarding the kurtosis, as shown in Fig. 8. These figures illustrate that the analyses of Section III are in accordance with the results obtained from a practical halftone matrix.

2 1 0

B. Experiment 2 This experiment illustrates the validity of the channel model described in Section II and the expected values of the higher order moments as a function of the input luminance, determined analytically in Section IV. The effect of a printed and scanned variance level that depends on the input luminance is illustrated in Fig. 9, where two curves are presented. The black curve (‘Theoretical’) represents the theoretical variance determined in (10). The gray curve (‘Experimental’) represents the variance of printed and scanned blocks, originally of size 32 × 32. Similar experiments are presented regarding the skewness and the kurtosis determined in (13) and (16), as shown in Figures 10 and 11, respectively. The ‘Experimental’ curve in these figures corresponds to the averaging of the results obtained for the nine combinations C1 − C9 of PS devices. Figure 12(a) shows the histogram of a PS block generated from a constant luminance value s0 = 0. Similarly, histograms for s0 = 90 and s0 = 180 are presented in Figures 12(b) and 12(c), respectively, illustrating a change in the shape of the distribution. C. Experiment 3 In this experiment a multi-level 2-D bar code is printed with a sequence of 56000 symbols with four possible lu-

−1 −2 0

0.2

0.4

0.6

0.8

1

Luminance

Fig. 11: The effect of kurtosis dependent on the input luminance, after PS. TABLE II: Experimental error rates for Metric # of Errors Average (µ) 667 Kurtosis (γ2 ) 1860 Comb. (µ, σ 2 ) 114 Comb. (µ, γ1 ) 259 Comb. (µ, σ 2 , γ1 ) 50 Comb. (µ, σ 2 , γ1 , γ2 ) 22 TABLE III: Experimental error rates for Metric # of Errors Average (µ) 157 2 Variance (σ ) 144 Skewness (γ1 ) 280 Kurtosis (γ2 ) 328 Comb. (µ, σ 2 ) 14 Comb. (µ, γ1 ) 27 Comb. (µ, σ 2 , γ1 ) 8 Comb. (µ, σ 2 , γ1 , γ2 ) 3

2-D bar codes. Error Rate 1.19 × 10−2 3.32 × 10−2 2.04 × 10−3 4.63 × 10−3 8.93 × 10−4 3.93 × 10−4

text watermarking. Error Rate 1.03 × 10−2 9.48 × 10−3 1.84 × 10−2 2.16 × 10−2 9.22 × 10−4 1.78 × 10−3 5.27 × 10−4 1.98 × 10−4

8

0.05 Bit 0 Bit 1

Variance

0.04 0.03 0.02 0.01

(a) Histogram for s0 = 0.

0 0.5

0.6

0.7

0.8

0.9

Luminance

Fig. 13: Decision boundary combining two metrics. Note the dispersion and the offset caused by the distortions of the PS channel.

(b) Histogram for s0 = 0.65.

of well-know 2-D bar codes used in practice, such as Data Matrix, Aztec Code and QR code, to the rate of multilevel 2-D bar codes. The multilevel codes show a superior performance in comparison the non-multilevel codes, when the modulation levels of the multilevels codes are properly assigned, according to the print-scan devices employed.

D. Experiment 4

(c) Histogram for s0 = 0.3.

Fig. 12: Histograms with different shapes illustrating the change in the skewness and in the kurtosis of a PS region, according to the input luminance s0 .
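The shapes in Figs. 9–12 follow from the binary halftoning noise of the channel model, which takes the value 1 − s0 with probability s0 and −s0 with probability 1 − s0. A small sketch of its closed-form moments, restated here for illustration, shows the symmetries observed in the curves:

```python
def halftone_noise_moments(s0):
    """Central moments of the zero-mean halftoning noise
    eta2 in {-s0, 1 - s0} with P(1 - s0) = s0, P(-s0) = 1 - s0."""
    p, q = s0, 1.0 - s0
    var = (-s0) ** 2 * q + (1 - s0) ** 2 * p  # = s0 (1 - s0)
    m3 = (-s0) ** 3 * q + (1 - s0) ** 3 * p   # third moment, as in (13)
    m4 = (-s0) ** 4 * q + (1 - s0) ** 4 * p   # fourth moment, as in (16)
    return var, m3, m4

# Variance and fourth moment are symmetric about mid-gray; the third
# moment is antisymmetric, vanishing at s0 = 0.5.
v1, m3_1, m4_1 = halftone_noise_moments(0.2)
v2, m3_2, m4_2 = halftone_noise_moments(0.8)
print(v1, m3_1, m4_1)
```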

minance levels (2 bits/symbol) drawn from the alphabet {0.08, 0.34, 0.65, 0.95}. Optimum values for the alphabet depend on the PS devices used, as discussed in [1] and [2], where the authors present a study on multilevel coding for the assumed PS channel. The original size (prior to printing) of each symbol is 8 × 8, corresponding to the size of one halftone block. Table II shows the bit error rates obtained when performing the detection using the four suggested metrics (average, variance, skewness, kurtosis) separately. The table also presents the result of combining the metrics with the Bayes classifier, illustrating a smaller error rate. Because the variance and the kurtosis are symmetric around the middle of the luminance range, they cannot be used alone as detection metrics. In [2], for example, in an experiment comparable to the first row (average detection) in Table II, the observed error rate was 1.817 × 10^−2. This rate is significantly improved when the multilevel coding with multistage decoding (MLC/MSD) proposed in [2] is employed. Notice, however, that these more advanced coding/decoding methods can also be applied to the higher order statistics proposed in this work. The size of the 8 × 8 cell used is comparable to traditional non-multilevel 2-D bar codes used in commercial applications. In [2] the authors compare the rate (in bytes per square inch)

This experiment implemented the text hardcopy watermarking system [5], [6], which embeds data by performing modifications in the luminances of characters, respecting a perceptual transparency requirement. A sequence of 15180 characters (as in 'abcdef...') is printed and scanned. The font type tested was 'Arial', size 12. The luminances of the characters were randomly modified to {0.95, 0.84} with equal probability, where 0.95 corresponds to bit 0 and 0.84 corresponds to bit 1. To determine to which class (bit 0 or bit 1) each received character belongs, the four metrics discussed so far were tested. The resulting error rates are given in Table III. An example of employing the Bayes classifier to combine two metrics (the average and the variance) is given in Fig. 13. Note that the decision boundary combining the metrics yields a reduced error rate, in comparison to the boundaries based solely on the average or the variance. Similarly, Fig. 14 illustrates a surface separating the two classes in the 3-D space formed by the average, the variance, and the skewness. Notice that this small error rate is achieved using standard consumer printing and scanning devices and regular paper. Using professional equipment, it is reasonable to assume that the error rate is close to zero in small documents (such as identification cards, passports, etc.), especially if complete perceptual transparency is not a requirement. These assumptions make the authentication protocol proposed in Section VI practical.
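Detection thus treats each scanned character region as a vector of four sample moments. A self-contained sketch of that feature extraction (population moments, pure Python); the binary test block below is illustrative, not actual scanned data:

```python
def moment_features(pixels):
    """Return (mean, variance, skewness, excess kurtosis) of a pixel block."""
    n = len(pixels)
    mu = sum(pixels) / n
    m2 = sum((p - mu) ** 2 for p in pixels) / n
    m3 = sum((p - mu) ** 3 for p in pixels) / n
    m4 = sum((p - mu) ** 4 for p in pixels) / n
    sigma = m2 ** 0.5
    return mu, m2, m3 / sigma ** 3, m4 / m2 ** 2 - 3.0

# A binary block with 75% white pixels behaves like Bernoulli noise:
# skewness (1 - 2p) / sqrt(p(1-p)) and excess kurtosis (1 - 6p(1-p)) / (p(1-p)),
# here with p = 0.75.
block = [0.0] * 25 + [1.0] * 75
mu, var, skew, kurt = moment_features(block)
print(round(mu, 4), round(var, 4), round(skew, 4), round(kurt, 4))
```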

E. Experiment 5

This section illustrates two applications of the authentication protocol proposed in Section VI: one in printed form and one in digital form.



Fig. 16: Scanned ID. The last digit in the birth date is modified from 8 to 9, as indicated by the arrow.

Fig. 14: Decision boundary combining three metrics.

Document Image Processing for Paper Side Communications

Fig. 15: ID authenticated using the protocol proposed in Section VI. Notice the modified character luminances.

Using the 8-bit ASCII standard, the parameters for this sample are:
• K = 49.
• c = {D,o,c,u,m,e,. . . ,i,o,n,s}.
• cb = [0100010001101111, . . . , 01110011].
which yields the modulation string we = [10011101, . . . , 1011].
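The binary representation cb is simply the concatenated 8-bit ASCII codes of the characters, with whitespace excluded; a quick sketch reproducing the values listed above:

```python
def to_bits(text):
    """Concatenate the 8-bit ASCII code of each non-space character."""
    chars = [ch for ch in text if not ch.isspace()]
    return chars, "".join(format(ord(ch), "08b") for ch in chars)

title = "Document Image Processing for Paper Side Communications"
c, cb = to_bits(title)
print(len(c))   # K = 49 characters
print(cb[:16])  # '0100010001101111' -> 'D','o', as listed above
```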

1) ID Card: An identification card is authenticated using TLM, as shown in Fig. 15. Notice that the intensity changes are visible to illustrate the underlying process. A modified version of the card is generated, where the last digit in the 'Valid Until' field is modified. To reduce the probability of OCR and luminance detection errors, only numbers and uppercase and lowercase characters of the alphabet are considered in this test. For the non-tampered document in Fig. 15, using an 8-bit ASCII table to represent the characters [17], the following parameters (discussed in Section VI) are obtained:
• K = 124.
• c = {I,N,T,E,R,N,. . . ,2,0,0,8}.
• cb = [0100100101001110, . . . , 00111000].
• κ = f(cb) = 1176020.
• k = [0100110110101110, . . . , 00010101].
• c′b = cb ⊕ k = [0000010011100000, . . . , 00101101].
• w = [11100, . . . , 0110].
• we = [01101, . . . , 0111].
we is composed of K elements, corresponding to the number of characters in the document. The document is authenticated by altering the luminance of each character ci in c according to the corresponding element wei of we. Notice that the character luminances in the document in Fig. 15 are modified according to we. After printing, the parameters are again obtained, based on the tampered printed document in Fig. 16. Because the last digit of the document is different, ŵ and ŵ′ are completely different sequences, failing the equality test shown in Fig. 6.
2) Paper Title: Some of the character luminances in the title of this paper are slightly modulated to a gray level. This can be verified using any screen capture tool and common image processing software. Increasing the luminance gain to a visible level, the text becomes:
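The encryption step c′b = cb ⊕ k is a plain bitwise XOR of the two bit strings; a sketch using the leading 16 bits listed above:

```python
def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    assert len(a) == len(b)
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

cb_head = "0100100101001110"  # first 16 bits of cb ('I','N')
k_head = "0100110110101110"   # first 16 bits of the key stream k
print(xor_bits(cb_head, k_head))  # '0000010011100000', matching c'b above
```

XOR is its own inverse, so the verifier recovers cb by applying the same key again.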

as illustrated in the character luminances above. If the 'D' in 'Document' is modified to 'd':
• K = 49.
• c = {d,o,c,u,m,e,. . . ,i,o,n,s}.
• cb = [0110010001101111, . . . , 01110011].
a completely different string we is generated:
we = [01100100, . . . , 1000].
and the authentication fails.

VIII. CONCLUSIONS

This paper reduces the detection error rate of printed symbols in cases where the luminances of the symbols depend on a message to be transmitted through the PS channel. It has been observed that, as a consequence of modifying the luminances, the halftoning in the printing process also modifies the higher order statistical moments of a symbol, such as the variance, the skewness and the kurtosis. Therefore, in addition to the average luminance and spectral metrics, these moments can also be used to detect a received symbol. This is achieved without any modification of the transmitting function. A PS channel model which accounts for the halftoning noise is described. Analyses determining the relationship between the average luminance and the higher order moments of a halftone image are presented, justifying the use of the new detection metrics. In addition to the extended detection metrics, this paper also proposes an authentication protocol for printed and digital documents, making it possible to determine whether one or more characters have been modified in a text document. The experiments illustrated the successful applicability of the new metrics, the reduced error rate achieved when the metrics are combined according to the Bayes classifier, and two possible applications of the proposed authentication protocol,


using as an example the title of this paper. Notice that the contributions presented can be combined with other methods, serving as a practical alternative for document authentication.

APPENDIX I

This appendix derives the result presented in (13). Expanding the convolution,

µy = E{[η2(n) ∗ h(n)]³}
   = E{( Σ_k h(k) η2(n−k) )³}
   = Σ_k Σ_l Σ_r h(k) h(l) h(r) E{η2(n−k) η2(n−l) η2(n−r)}    (19)

Let

h3 = Σ_k h³(k)    (20)

Recalling that η2 is uncorrelated noise and η2 ∈ {−s0, 1 − s0} with probabilities {1 − s0, s0} yields

E{η2(n−k) η2(n−l) η2(n−r)} ≠ 0 ⇔ k = l = r    (21)

and

E{η2³} = ∫ η2³ fη2(η2) dη2
       = ∫ η2³ [δ(η2 + s0)(1 − s0) + δ(η2 − 1 + s0) s0] dη2
       = (−s0)³(1 − s0) + (1 − s0)³ s0    (22)

Therefore, (19) can be written as

µy = h3 [(−s0)³(1 − s0) + (1 − s0)³ s0]    (23)

where h3 is given by (20).

APPENDIX II

This appendix derives the result presented in (16). Similarly,

µy = E{[η2(n) ∗ h(n)]⁴}
   = E{( Σ_k h(k) η2(n−k) )⁴}
   = Σ_k Σ_l Σ_r Σ_s h(k) h(l) h(r) h(s) E{η2(n−k) η2(n−l) η2(n−r) η2(n−s)}    (24)

Let

h4 = Σ_k h⁴(k)    (25)

Recalling that η2 is uncorrelated noise and η2 ∈ {−s0, 1 − s0} with probabilities {1 − s0, s0} yields

E{η2⁴} = ∫ η2⁴ fη2(η2) dη2
       = ∫ η2⁴ [δ(η2 + s0)(1 − s0) + δ(η2 − 1 + s0) s0] dη2
       = (−s0)⁴(1 − s0) + (1 − s0)⁴ s0    (26)

Therefore, (24) can be written as

µy = h4 [(−s0)⁴(1 − s0) + (1 − s0)⁴ s0]    (27)

where h4 is given by (25).

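The third-moment relation (23) can be checked exactly by enumerating all joint realizations of a short FIR filter driven by the binary halftoning noise; the three-tap filter below is an arbitrary illustrative choice, not a modeled PS response.

```python
from itertools import product

def third_moment_check(h, s0):
    """Exact E{y^3} for y(n) = sum_k h(k) eta2(n-k), with eta2 i.i.d. in
    {-s0, 1-s0} taking probabilities {1-s0, s0}, versus h3 * E{eta2^3}."""
    outcomes = [(-s0, 1 - s0), (1 - s0, s0)]  # (value, probability) pairs
    # Enumerate the 2^3 joint realizations of the three filter inputs.
    lhs = sum(
        p1 * p2 * p3 * (h[0] * v1 + h[1] * v2 + h[2] * v3) ** 3
        for (v1, p1), (v2, p2), (v3, p3) in product(outcomes, repeat=3)
    )
    h3 = sum(hk ** 3 for hk in h)
    rhs = h3 * ((-s0) ** 3 * (1 - s0) + (1 - s0) ** 3 * s0)  # eq. (23)
    return lhs, rhs

lhs, rhs = third_moment_check([0.5, 0.3, 0.2], 0.2)  # illustrative taps
print(abs(lhs - rhs) < 1e-12)
```

Because the noise is zero-mean and independent across taps, all cross terms of the cubed sum vanish, which is exactly the collapse from (19) to (23).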
REFERENCES

[1] N. D. Quintela and F. Pérez-González, "Visible encryption: Using paper as a secure channel," in Proc. of SPIE, USA, 2003.
[2] R. Villán, S. Voloshynovskiy, O. Koval, and T. Pun, "Multilevel 2D bar codes: towards high capacity storage modules for multimedia security and management," IEEE Transactions on Information Forensics and Security, Vol. 1, No. 4, pp. 405-420, December 2006.
[3] P. Borges and J. Mayer, "Text luminance modulation for hardcopy watermarking," Signal Processing, Vol. 87, pp. 1754-1771, 2007.
[4] A. K. Bhattacharjya and H. Ancin, "Data embedding in text for a copier system," in Proc. of IEEE Int'l Conf. on Image Processing, Vol. 2, 1999.
[5] R. Villán, S. Voloshynovskiy, O. Koval, J. Vila, E. Topak, F. Deguillaume, Y. Rytsar, and T. Pun, "Text data-hiding for digital and printed documents: theoretical and practical considerations," in Proc. of SPIE, Electronic Imaging, USA, 2006.
[6] P. V. Borges and J. Mayer, "Document watermarking via character luminance modulation," in IEEE Int'l Conf. on Acoustics, Speech and Signal Processing, May 2006.
[7] R. A. Ulichney, "Dithering with blue noise," Proc. of the IEEE, Vol. 76, No. 1, 1988.
[8] R. A. Ulichney, Digital Halftoning, 1988.
[9] K. Solanki, U. Madhow, B. S. Manjunath, and S. Chandrasekaran, "Modeling the print-scan process for resilient data hiding," in Proc. of SPIE, Electronic Imaging, USA, 2005.
[10] I. J. Cox, M. L. Miller, and J. A. Bloom, Digital Watermarking, Morgan Kaufmann, 2002.
[11] S. Voloshynovskiy, O. Koval, F. Deguillaume, and T. Pun, "Visual communications with side information via distributed printing channels: extended multimedia and security perspectives," in Proc. of SPIE, Electronic Imaging 2004, San Jose, USA, January 18-22, 2004.
[12] M. Norris and E. H. B. Smith, "Printer modeling for document imaging," in Proc. Int'l Conf. on Imaging Science, Systems and Technology, USA, 2004.
[13] T. Amano, "A feature calibration method for watermarking of document images," in Proc. of the Fifth Int'l Conf. on Document Analysis and Recognition, ICDAR '99, 20-22 Sept. 1999.
[14] D. Manolakis, V. Ingle, and S. Kogon, Statistical and Adaptive Signal Processing, McGraw-Hill, 2000.
[15] R. Duda, P. Hart, and D. Stork, Pattern Classification, Wiley-Interscience, 2000.
[16] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, 2006.
[17] B. Sklar, Digital Communications, Prentice-Hall, 2001.
[18] B. E. Bayer, "An optimum method for two-level rendition of continuous tone pictures," in IEEE Int'l Conf. on Communications, 1973.
[19] J. Cannons and P. Moulin, "Design and statistical analysis of a hash-aided image watermarking system," IEEE Trans. on Image Processing, Vol. 13, No. 10, pp. 1393-1408, Oct. 2004.
[20] X. Li and X. Xue, "Fragile authentication watermark combined with image feature and public key cryptography," in 7th International Conference on Signal Processing, ICSP '04, Vol. 3, 31 Aug.-4 Sept. 2004.
[21] S. Mori, C. Y. Suen, and K. Yamamoto, "Historical review of OCR research and development," Proceedings of the IEEE, Vol. 80, No. 7, pp. 1029-1058, July 1992.
[22] Y. Xu and G. Nagy, "Prototype extraction and adaptive OCR," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 21, No. 12, pp. 1280-1296, Dec. 1999.


[23] A. M. Alattar and O. M. Alattar, "Watermarking electronic text documents containing justified paragraphs and irregular line spacing," in Proc. of SPIE, Vol. 5306, June 2004.
[24] P. W. Wong and N. Memon, "Secret and public key image watermarking schemes for image authentication and ownership verification," IEEE Trans. on Image Processing, Vol. 10, No. 10, Oct. 2001.
[25] H. Yang and A. C. Kot, "Text document authentication by integrating inter character and word spaces watermarking," in Proc. IEEE Int'l Conf. on Multimedia and Expo, 2004.
[26] J. T. Brassil, S. Low, and N. F. Maxemchuk, "Copyright protection for the electronic distribution of text documents," Proceedings of the IEEE, Vol. 87, No. 7, pp. 1181-1196, July 1999.
[27] D. Huang and H. Yan, "Interword distance changes represented by sine waves for watermarking text images," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 11, No. 12, pp. 1237-1245, Dec. 2001.
[28] M. Wu and B. Liu, "Data hiding in binary image for authentication and annotation," IEEE Trans. on Multimedia, August 2004.
[29] M. Mese and P. P. Vaidyanathan, "Recent advances in digital halftoning and inverse halftoning methods," IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, Vol. 49, No. 6, pp. 790-805, June 2002.
[30] Q. Mei, E. K. Wong, and N. Memon, "Data hiding in binary text documents," in Proc. of SPIE, Security and Watermarking of Multimedia Contents III, Vol. 2, San Jose, CA, January 2001.
[31] M. Jiang, E. K. Wong, N. Memon, and X. Wu, "Steganalysis of degraded document images," in IEEE Workshop on Multimedia Signal Processing, October 2005.
[32] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Vol. I, Prentice Hall, 1993.
[33] R. Villán, S. Voloshynovskiy, O. Koval, F. Deguillaume, and T. Pun, "Tamper-proofing of electronic and printed text documents via robust hashing and data-hiding," in Proc. of SPIE-IST Electronic Imaging 2007, Security, Steganography, and Watermarking of Multimedia Contents IX, San Jose, USA, 2007.