Template Matching based on Quadtree Zernike Decomposition

Alessandro NERI, Marco CARLI, Veronica PALMA, Luca COSTANTINI
Applied Electronics Dept., Università degli Studi Roma TRE, Roma, Italy

ABSTRACT

In this paper a novel technique for rotation-independent template matching via quadtree Zernike decomposition is presented. Both the template and the target image are decomposed by using a complex polynomial basis. The template is analyzed in a block-based manner by means of a quadtree decomposition, which allows the system to better capture the object features. Searching for a complex pattern in a large multimedia database is based on a sequential procedure that verifies whether the candidate image contains each square of the ranked quadtree list, while refining, step by step, the location and orientation estimates.

Keywords: template matching, Zernike moments, quadtree decomposition

1. INTRODUCTION

Pattern localization and recognition is an important issue in many machine vision and image database management applications, such as robotics and automatic vehicle guidance. In this paper a novel technique for rotation-independent template matching via quadtree Zernike decomposition is presented. This decomposition is based on complex circular harmonic functions that form a complete orthogonal basis on the unit disk. Usually, pattern matching based on Zernike moments is performed by selecting a circle containing the object to be localized and approximating the portion of the image falling inside the circle by a truncated expansion in terms of Zernike polynomials up to a given order. Pattern location can then be estimated thanks to rotation invariants derived from the Zernike moments. Since a pattern can be easily steered by multiplying the expansion coefficients by a complex exponential factor whose phase is proportional to the rotation angle, effective procedures for maximum likelihood (ML) estimation of both location and orientation can also be devised. Nevertheless, many applications require the detection and localization of complicated patterns that have to be distinguished from similar objects differing only in a few fine details. In this case, the direct use of Zernike moments for computing the ML functional requires a large number of expansion terms. Here, in order to manage objects of arbitrary shape while reducing the computational workload, we partition the pattern to be localized into small square blocks using a quadtree decomposition. The size of each block is adapted to the local image content and is controlled by the quadratic norm of the error corresponding to the truncated Zernike expansion. The quadtree blocks are then ranked with respect to the energy of the low-pass filtered gradient or, equivalently, to the Fisher information on location and rotation. Maximum likelihood estimation of the location and orientation of the first block of the quadtree is performed by means of an iterative quasi-Newton procedure making use of Zernike moments. The estimation algorithm is an extension of the technique proposed for the Gauss-Laguerre approximation. Compared to the traditional ML technique based on matching a candidate image with a whole set of rotated versions of the pattern, this procedure requires only a local maximization of functionals derived from the Zernike coefficients. The estimated location and orientation are then employed to verify whether the current image contains the second block of the rank-ordered template quadtree list. If the quadratic norm of the difference between the current image and the subset of the reference template constituted by the first and the second square of the quadtree falls below a predefined threshold, the next block of the quadtree list is analyzed.

This work has been partially supported by Fondazione Ugo Bordoni (FUB).


The procedure stops either when the energy of the difference exceeds a threshold or when the last list element has been processed. This technique proves very useful for automatic localization purposes. The rest of the paper is organized as follows: the proposed scheme is described in detail in Sections 2-4, the experimental results are reported in Section 5, and Section 6 presents some concluding remarks.

2. THE ZERNIKE DECOMPOSITION

The Zernike polynomials were first proposed in 1934 by Zernike [1], [2]. They form a complete orthogonal basis set defined on the unit circle of the real plane and belong to the class of circular harmonic functions (CHFs), i.e., to the class of complex, polar-separable functions with harmonic angular shape. More specifically, let x = [x_1, x_2] denote the Cartesian coordinates of points of the real plane \mathbb{R}^2, and let \tilde{V}_{nm}(\rho, \theta) = V_{nm}(\rho\cos\theta, \rho\sin\theta) denote the expression of the Zernike polynomial V_{nm}(x_1, x_2) of order n and repetition index m in the polar coordinates \rho = \sqrt{x_1^2 + x_2^2} and \theta = \tan^{-1}(x_2/x_1). Then we have

\[
\tilde{V}_{nm}(\rho, \theta) = R_{nm}(\rho)\, e^{jm\theta}, \tag{1}
\]

where R_{nm}(\rho) is the Zernike radial profile, defined as

\[
R_{nm}(\rho) =
\begin{cases}
\displaystyle \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s\,(n-s)!\,\rho^{\,n-2s}}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}, & n-|m| \text{ even}, \\[2ex]
0, & n-|m| \text{ odd}.
\end{cases} \tag{2}
\]
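As a concrete illustration of Eqs. (1)-(2), the following minimal Python sketch (not the authors' code; NumPy is assumed) evaluates the radial profile R_nm and the polar-form basis function V_nm:

```python
import math
import numpy as np

def zernike_radial(n, m, rho):
    """Radial profile R_nm(rho) of Eq. (2); identically zero when n - |m| is odd."""
    m = abs(m)
    if (n - m) % 2 != 0:
        return np.zeros_like(rho, dtype=float)
    R = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        R += coeff * rho ** (n - 2 * s)
    return R

def zernike_poly(n, m, rho, theta):
    """Polar-form Zernike polynomial of Eq. (1): R_nm(rho) * exp(j*m*theta)."""
    return zernike_radial(n, m, rho) * np.exp(1j * m * theta)
```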

For a continuous function f(x), inside the unit disk centered at x_0, the following Zernike polynomial expansion holds:

\[
f(x) = \sum_{n=0}^{\infty} \sum_{m=-\infty}^{+\infty} A_{nm}(x_0)\, V_{nm}(x - x_0), \tag{3}
\]

with expansion coefficients A_{nm}(x_0), also referred to in the literature as Zernike moments, given by

\[
A_{nm}(x_0) = \frac{n+1}{\pi} \iint_{\|x - x_0\| \le 1} f(x)\, V^{*}_{nm}(x - x_0)\, dx_1\, dx_2. \tag{4}
\]

Since \tilde{V}_{nm}(\rho, \theta) can be rotated by an angle \varphi by multiplying it by the factor e^{-jm\varphi}, the expansion coefficients A^{(\varphi)}_{nm}(x_0) of an image f(x) rotated by an angle \varphi are related to the expansion coefficients A_{nm}(x_0) of f(x) by the following relationship:

\[
A^{(\varphi)}_{nm}(x_0) = A_{nm}(x_0)\, e^{-jm\varphi}. \tag{5}
\]
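As a hedged sketch (not the authors' implementation) of how Eq. (4) can be discretized on a square image block, the snippet below maps the pixel grid onto the unit disk, discards pixels falling outside it, and accumulates a Riemann-sum approximation of A_nm; it reuses the zernike_poly helper from the previous sketch. By Eq. (5), rotating the block only multiplies A_nm by e^{-jm\varphi}, so the magnitude of the moment computed this way should be (approximately) rotation invariant.

```python
import numpy as np

def zernike_moment(block, n, m):
    """Riemann-sum approximation of A_nm in Eq. (4) for a square image block."""
    N = block.shape[0]
    coords = (2.0 * np.arange(N) + 1.0) / N - 1.0     # pixel centers mapped to [-1, 1]
    x1, x2 = np.meshgrid(coords, coords, indexing="xy")
    rho = np.hypot(x1, x2)
    theta = np.arctan2(x2, x1)
    inside = rho <= 1.0                               # keep only pixels inside the disk
    dA = (2.0 / N) ** 2                               # pixel area in disk coordinates
    V = zernike_poly(n, m, rho, theta)                # helper from the sketch above
    return (n + 1) / np.pi * np.sum(block[inside] * np.conj(V[inside])) * dA
```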

Due to the invariance of their magnitude under image rotation and to the high efficiency of the image representation, Zernike moments are attractive image features that play an important role in various areas. To calculate the Zernike moments, the image (or region of interest) is first mapped onto the unit disk using polar coordinates, where the center of the image is the origin of the unit disk [3], [4]. Pixels falling outside the unit circle are not used in the calculation. The accuracy of Zernike moments computed via Eq. (4) suffers from a geometric approximation error [5], [6]. This is due to the fact that the total area covered by the square pixels involved in the computation of the Zernike moments is not exactly the unit disk. The cause of errors in computing Eq. (4) resides in the use of Cartesian coordinates, which are usually suitable for processing images digitized with square or rectangular pixels but, in this case, are not matched to the circular nature of the Zernike polynomials. The geometric error can be avoided by a tessellation of non-square pixels whose areas add up exactly to that of the unit disk. With reference to polar coordinates, the Zernike moments can be evaluated as follows [5]:


Figure 1. Zernike filters with order increasing from 0 to 80 and repetition index m = 0

\[
A_{nm} = \frac{n+1}{\pi} \int_{0}^{2\pi}\!\!\int_{0}^{1} \tilde{f}(\rho, \theta)\, R_{nm}(\rho)\, e^{-jm\theta}\, \rho\, d\rho\, d\theta, \tag{6}
\]

where \tilde{f}(\rho, \theta) = f(\rho\cos\theta, \rho\sin\theta). The image \tilde{f}(\rho, \theta) is approximated by a function \hat{f}(\rho, \theta) defined over a set of non-overlapping concentric sectors \Psi_{uv} that add up to the unit disk. The numerical approximation of A_{nm} is then [5]:

\[
\hat{A}_{nm} = \frac{n+1}{\pi} \sum_{u}\sum_{v} \hat{f}(\rho_{uv}, \theta_{uv})\, \psi_{nm}(\rho_{uv}, \theta_{uv}), \tag{7}
\]

where the summation is performed over all the sectors inside the unit circle, \hat{f}(\rho_{uv}, \theta_{uv}) is an estimate of \tilde{f}(\rho, \theta), and \psi_{nm}(\rho_{uv}, \theta_{uv}) is defined as

\[
\psi_{nm}(\rho_{uv}, \theta_{uv}) = \iint_{\Psi_{uv}} R_{nm}(\rho)\, e^{-jm\theta}\, \rho\, d\rho\, d\theta
= \int_{\rho^{(s)}_{uv}}^{\rho^{(e)}_{uv}} R_{nm}(\rho)\, \rho\, d\rho \int_{\theta^{(s)}_{uv}}^{\theta^{(e)}_{uv}} e^{-jm\theta}\, d\theta, \tag{8}
\]

where \rho^{(s)}_{uv} and \rho^{(e)}_{uv} denote the starting and ending radii of \Psi_{uv}, and \theta^{(s)}_{uv} and \theta^{(e)}_{uv} the starting and ending angles, respectively. In this way, the Zernike moments are computed without introducing any geometric error. In practical situations, the reconstruction of the image is performed by using just a finite number of Zernike moments, i.e.,

\[
\hat{f}(\rho, \theta) = \sum_{n=0}^{N} \sum_{m=-M}^{M} A_{nm}\, V_{nm}(\rho, \theta). \tag{9}
\]

The truncation error depends on both the number of Zernike’s moments employed and the width of the image.
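The sector weights of Eq. (8) factor into a radial and an angular integral, and the angular one has a closed form. The sketch below (an illustration assuming a ring-by-sector tiling of the unit disk, not the authors' code) evaluates Eq. (7) this way, reusing zernike_radial from the first sketch and integrating the radial factor numerically for brevity:

```python
import numpy as np

def sector_weight(n, m, rho_s, rho_e, th_s, th_e, nr=64):
    """psi_nm over one polar pixel, Eq. (8)."""
    rho = np.linspace(rho_s, rho_e, nr)
    radial = np.trapz(zernike_radial(n, m, rho) * rho, rho)   # numeric radial integral
    if m == 0:
        angular = th_e - th_s                                  # closed-form angular integral
    else:
        angular = (np.exp(-1j * m * th_e) - np.exp(-1j * m * th_s)) / (-1j * m)
    return radial * angular

def zernike_moment_polar(f_hat, n, m, rho_edges, theta_edges):
    """A_nm via Eq. (7); f_hat[u, v] samples the image in sector (u, v)."""
    A = 0.0 + 0.0j
    for u in range(len(rho_edges) - 1):
        for v in range(len(theta_edges) - 1):
            A += f_hat[u, v] * sector_weight(n, m,
                                             rho_edges[u], rho_edges[u + 1],
                                             theta_edges[v], theta_edges[v + 1])
    return (n + 1) / np.pi * A
```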


Figure 2. Illustration of the sector Ψ_uv, which represents a polar pixel at (ρ_uv, θ_uv).

3. THE QUADTREE CONSTRUCTION

For image retrieval, descriptors based on Zernike moments have been successfully used thanks to their discriminating power, noise resilience, information redundancy, and reconstruction capability. However, in order to manage complex objects of arbitrary shape that have to be distinguished from similar objects differing only in a few fine details, we partition the region of interest containing the object into square blocks using a quadtree decomposition [7]. Then, for each square of the decomposition, the truncated Zernike expansion is applied to the portion of the pattern falling inside the circle inscribed in that square. The quadtree squares are then ranked with respect to the energy of their Zernike components. Searching for a complex pattern in a large multimedia database is then based on a sequential procedure that verifies whether the candidate image contains each square of the ranked quadtree list, while refining, step by step, the location and orientation estimate. More in detail, let R represent the entire image and let P be a predicate equal to True whenever the accuracy of the Zernike decomposition is considered satisfactory. R is partitioned into smaller and smaller square regions R^{(i)}, so that P(R^{(i)}) = True for each R^{(i)}. We start with the entire image: if P(R) = False, we divide the image into four squares. If P is False for some region R^{(i)}, we further partition R^{(i)} into four quadrants, and so on. The quadtree technique can thus be seen as a tree in which each node has exactly four descendants, as illustrated in Fig. 3. In order to control the computational complexity of the whole procedure, we choose as predicate P the comparison with a threshold γ of the L2 norm of the approximation error obtained when reconstructing a square block of the image with a predefined number of Zernike moments. Let us denote with ξ_i the center of R^{(i)}, with δ_i its width, and with w_T(x) a square window of unitary width; then

\[
P(R^{(i)}) = \left\{ \left\| w_T\!\left(\frac{x - \xi_i}{\delta_i}\right) \left[ f(x) - \hat{f}(x) \right] \right\|^2 < \gamma \right\}.
\]

Thus, if the norm of the error between the image f(x) and the reconstructed image \hat{f}(x) exceeds the predefined threshold, the square is further split (Fig. 4).
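A minimal recursive sketch of this splitting rule follows (reconstruct_block is a hypothetical helper applying Eq. (9) up to the chosen order, and a minimum block size stops the recursion; this is an illustration under those assumptions, not the authors' implementation):

```python
import numpy as np

def quadtree_split(block, origin, gamma, order=20, min_size=8):
    """Return leaf blocks as (row, col, size) for which the predicate P holds."""
    r0, c0 = origin
    size = block.shape[0]
    # P(R_i): L2 norm of the truncated-Zernike reconstruction error below gamma
    err = np.linalg.norm(block - reconstruct_block(block, order))  # hypothetical helper
    if err < gamma or size <= min_size:
        return [(r0, c0, size)]
    half = size // 2                       # P is False: split into four quadrants
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves += quadtree_split(block[dr:dr + half, dc:dc + half],
                                     (r0 + dr, c0 + dc), gamma, order, min_size)
    return leaves
```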

4. THE TEMPLATE MATCHING PROCEDURE

According to the Cramér-Rao lower bound, the accuracy and reliability of the pattern location and rotation estimates are strictly related to the Fisher information, which is in fact proportional to the energy of the derivatives along two orthogonal directions and to the energy of the angular derivative or, equivalently, to


Figure 3. Quadtree-partitioned image.

the effective spatial and angular bandwidths. Therefore, since we adopt a sequential detection and estimation procedure that verifies whether each candidate image contains each square of the quadtree, in order to reduce the search time we first rank the template quadtree blocks on the basis of the energy of the mid and high angular and radial frequency components, computed directly from the Zernike expansion coefficients. Since the rotation of a pattern simply produces a linear phase shift of each Zernike expansion coefficient, proportional to the order of the angular harmonic, detection of the pattern belonging to the first square of the ranked quadtree list is performed by means of an iterative quasi-Newton procedure that also provides the maximum likelihood estimate of both the location and the orientation of that square block in each candidate image. The estimation algorithm is an extension of the technique proposed for the Laguerre-Gauss approximation [8]. More specifically, let f(x) be the observed image, containing a noisy, rotated, and scaled version of the reference pattern g(x). At position b = [b_1, b_2] and orientation \varphi we have

\[
w[R_\varphi(x - b)]\, f(x) = w[R_\varphi(x - b)]\, g[R_\varphi(x - b)] + v(x), \tag{10}
\]

where w(x) is a generic window, v(x) is the observation noise, modeled as a white, zero-mean, Gaussian random field with power density spectrum equal to N_0/4, and R_\varphi is the rotation matrix defined as

\[
R_\varphi = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}. \tag{11}
\]

The goal is to estimate the parameters b and \varphi. The estimation can be performed by maximizing the log-likelihood functional \Lambda[f(x); b, \varphi]:

\[
\ln \Lambda[f(x); b, \varphi] = -\frac{2}{N_0} \iint \left| w[R_\varphi(x - b)] \right|^2 \left| f(x) - g[R_\varphi(x - b)] \right|^2 dx_1\, dx_2. \tag{12}
\]

Direct maximum likelihood estimation is not straightforward, because the joint maximization with respect to b and \varphi implies a search in three dimensions. However, choosing as weighting window w(x) a disk of radius \sigma and using the Zernike expansion leads to a simpler and faster iterative procedure. In fact, expanding both w[R_\varphi(x - b)]f(x) and w(x)g(x), we obtain

\[
w[R_\varphi(x - b)]\, f(x) = \sum_{n=0}^{\infty} \sum_{m=-\infty}^{+\infty} Z^{f}_{nm}(b)\, \frac{1}{\sigma}\, V_{nm}\!\left(\frac{x - b}{\sigma}\right), \tag{13}
\]

Figure 4. Quadtree decomposition of the "Lena" image.

\[
w(x)\, g(x) = \sum_{n=0}^{\infty} \sum_{m=-\infty}^{+\infty} Z^{g}_{nm}\, \frac{1}{\sigma}\, V_{nm}\!\left(\frac{x}{\sigma}\right), \tag{14}
\]

where

\[
Z^{f}_{nm}(b) = \frac{n+1}{\pi} \iint_{\|x - b\| \le \sigma} f(x)\, \frac{1}{\sigma}\, V^{*}_{nm}\!\left(\frac{x - b}{\sigma}\right) dx_1\, dx_2, \tag{15}
\]

\[
Z^{g}_{nm} = \frac{n+1}{\pi} \iint_{\|x\| \le \sigma} g(x)\, \frac{1}{\sigma}\, V^{*}_{nm}\!\left(\frac{x}{\sigma}\right) dx_1\, dx_2. \tag{16}
\]

Substituting (15) and (16) into (12) and exploiting the orthogonality of the Zernike polynomials, we obtain

\[
\ln \Lambda[f(x); b, \varphi] = -\frac{2}{N_0} \iint_{\|x - b\| \le \sigma} \left| f(x) - g[R_\varphi(x - b)] \right|^2 dx_1\, dx_2
= -\frac{2}{N_0} \sum_{n=0}^{+\infty} \sum_{m=-\infty}^{+\infty} \frac{\pi}{(n+1)\sigma^2} \left| Z^{f}_{nm}(b) - Z^{g}_{nm}\, e^{jm\varphi} \right|^2. \tag{17}
\]

The evaluation of the extrema of the log-likelihood functional is carried out in two steps. First, for each discrete location of a grid, the rotation that maximizes the ML functional is determined; then a discrete log-likelihood map is constructed and its absolute maximum is localized. Let us denote as Zernike Moments Likelihood Map (ZMLM) the maximum of the truncated version of the above expression with respect to rotation, as a function of the pattern location b:

\[
ZMLM(b) = \max_{\varphi} \left\{ -\frac{2}{N_0} \sum_{n=0}^{N} \sum_{m=-M}^{M} \frac{\pi}{(n+1)\sigma^2} \left| Z^{f}_{nm}(b) - Z^{g}_{nm}\, e^{jm\varphi} \right|^2 \right\}. \tag{18}
\]

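As a hedged sketch of Eq. (18) for a single candidate location b (not the authors' code; it assumes the moments have been collected into dictionaries Zf and Zg keyed by (n, m), and uses SciPy's BFGS implementation for the inner maximization over the rotation):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(phi_arr, Zf, Zg, sigma, N0):
    """Negative of the bracketed term in Eq. (18) for a scalar rotation phi."""
    phi = phi_arr[0]
    s = 0.0
    for (n, m), zg in Zg.items():
        diff = Zf[(n, m)] - zg * np.exp(1j * m * phi)
        s += np.pi / ((n + 1) * sigma ** 2) * np.abs(diff) ** 2
    return 2.0 / N0 * s

def zmlm_value(Zf, Zg, sigma, N0, phi0=0.0):
    """ZMLM(b) of Eq. (18) and the maximizing rotation for one location b."""
    res = minimize(neg_log_likelihood, x0=[phi0],
                   args=(Zf, Zg, sigma, N0), method="BFGS")
    return -res.fun, float(res.x[0])
```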

This map indicates, point by point, the best matching between the region of interest and the reference image under all possible orientations [8]. For each location b, the evaluation of the rotation \hat{\varphi}(b) maximizing the likelihood functional can be performed by means of a quasi-Newton procedure such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. Once the ZMLM has been computed for each image of the dataset, the images are ranked on the basis of the absolute maximum of the associated ZMLM. The image presenting the highest ZMLM value is then retained as the best candidate for pattern matching, while the location \hat{b} of the absolute maximum and the corresponding \hat{\varphi}(\hat{b}) constitute the coarse estimate of the pattern position and orientation. This coarse estimate is employed to verify whether the candidate image also contains the second block of the rank-ordered list of quadtree elements. With respect to the first block, we build the ZMLM only for a limited set of possible locations, falling inside a small neighborhood of the site predicted on the basis of the coarse location and orientation estimates. In addition, the quasi-Newton procedure used to build the ZMLM of the second block is initialized with the estimated orientation of the first block. If the energy of the difference between the current image and the subset of the reference template constituted by the first and second squares of the quadtree falls below a predefined threshold, the location and rotation estimates are refined and the next square is analyzed. In particular, the actual rotation is estimated on the basis of the orientation of the line passing through the centers of the two blocks. The procedure ends when the last block in the list has been processed. If at some stage the energy of the difference exceeds the predefined threshold, the current image is discarded and the dataset item with the next highest ZMLM is considered as the candidate for pattern matching.
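The sequential verification just described can be summarized by the following sketch (coarse_estimate and refine_estimate are hypothetical helpers standing in for the full ZMLM search on the first block and for the restricted, neighborhood-limited search on the subsequent blocks; the control flow, not the helpers, is what is being illustrated):

```python
def sequential_match(image, ranked_blocks, threshold):
    """Verify the rank-ordered quadtree blocks, refining (b, phi) step by step."""
    # Full ZMLM search on the most informative block gives the coarse estimate.
    b, phi = coarse_estimate(image, ranked_blocks[0])         # hypothetical helper
    matched = [ranked_blocks[0]]
    for block in ranked_blocks[1:]:
        # Restricted search around the predicted site, initialized with phi.
        b, phi, err = refine_estimate(image, matched + [block], b, phi)  # hypothetical
        if err > threshold:
            return None                                       # candidate image discarded
        matched.append(block)
    return b, phi                                             # all blocks verified
```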

5. EXPERIMENTAL RESULTS

The algorithm has been tested on several images of size 128×128 pixels, possibly rotated by different angles. Regions of interest of 32×32 pixels have been considered (Fig. 5).


Figure 5. Quadtree decomposition of a region of interest (ROI) of the "Lena" image.


In the quadtree decomposition, the Zernike expansion is truncated at the 20th order, and a block is further partitioned if the norm of the approximation error exceeds a predefined threshold. In applying the template matching procedure to each block of the quadtree, the Zernike moments up to the 20th order are computed for each block of the candidate image; this leads to a descriptor array of 231 elements (the number of Zernike polynomials with n − |m| even up to order 20, i.e., the sum of (n+1) for n = 0, ..., 20, which equals 21·22/2 = 231), which turns out to be a good trade-off between computational cost and matching effectiveness. The blocks of the template and the blocks of the candidate image are compared and a log-likelihood map is created; the resulting ZMLM is shown in Fig. 6.


Figure 6. Map of the template matching error with zooming of the lowest value

The threshold that allows the algorithm to detect the image containing the pattern has been set to 1200; this value corresponds to a mean error of the Zernike moment modulus equal to 5. Figures 7 and 8 report some results related to the search for a ROI (in particular, Lena's hat) in a multimedia database. The database contains different images, some of which are noisy versions or JPEG and JPEG2000 encoded versions of the original images. The noisy images have been generated by adding to the original image a zero-mean Gaussian noise with variance equal to 0.001 and to 0.01, respectively. In the tables shown in these figures, the columns P1, P2, and P3 report the coordinates of the centers of the first three blocks found for every image in the database. For these locations, the associated norms of the differences between the reference pattern and the best-matching ROI of the current image, named Error1, Error2, and Error3, are also reported, together with the average norm of the error (Mean Error) and the estimated orientation (Angle).


Figure 7. Searching for Lena’s hat in the database

Figure 8 refers to the case in which the template (Lena's hat) is rotated by an angle of 30°. In this case, the mean error associated with the true image and with its noisy versions increases because of the interpolation error


introduced during the rotation. However, the mean error does not exceed the detection threshold, while the estimated angle shows an error between 1° and 3°.


Figure 8. Searching for Lena’s hat rotated by 30◦ in the database

In both cases, the algorithm is able to identify the true image and to discard the other images in the database. In order to evaluate the performance of the proposed method, the repeatability rate test [10] versus the rotation angle has been performed. The repeatability rate is the percentage of detected points that are repeated in both images. The images have been rotated by different angles from 0° to 50°.
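A small sketch of this measure, under the assumption that the detected points are given as (x, y) arrays and that a reference point counts as repeated when some point detected in the rotated image lies within a tolerance eps of its rotated position (an illustration of the metric of [10], not necessarily its exact protocol):

```python
import numpy as np

def repeatability_rate(pts_ref, pts_rot, angle_deg, center, eps=1.5):
    """Fraction of reference keypoints re-detected after rotating the image."""
    phi = np.deg2rad(angle_deg)
    R = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])
    mapped = (pts_ref - center) @ R.T + center        # reference points in the rotated frame
    dists = np.linalg.norm(mapped[:, None, :] - pts_rot[None, :, :], axis=2)
    repeated = np.sum(dists.min(axis=1) <= eps)       # nearest detected point within eps
    return repeated / len(pts_ref)
```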

Figure 9. Repeatability rate versus rotation angle (degrees).

The results in Fig. 9 show that Zernike moments provide a rather robust representation of templates. This is mainly a consequence of the rotation-invariance property of the Zernike moment magnitudes.

6. CONCLUSIONS

In this work we have presented a novel technique for template matching, based on Zernike moments and on a quadtree structure that allows a block-by-block search. Compared to the traditional ML approach based on matching the observed image with a whole set of rotated patterns, the present method requires only an iterative maximization of a functional computed from the Zernike moments. The experimental results show a high repeatability rate and a high detection rate. A local representation of the image has been obtained with Zernike moments, which, as an orthogonal basis, possess important mathematical properties. The likelihood map indicates, point by point, the best matching under all possible orientations. The use of this technique allows a reliable estimate of location and orientation, even for complex patterns.


REFERENCES

1. Y. Bin and P. Jia-xiong, "Improvement and invariance analysis of Zernike moments using as a region-based shape descriptor," in Proc. of Computer Graphics and Image Processing, 2002.
2. R. Mukundan and K. R. Ramakrishnan, Moment Functions in Image Analysis: Theory and Applications, World Scientific, 1998.
3. G. Amayeh, A. Erol, G. Bebis, and M. Nicolescu, "Accurate and efficient computation of high order Zernike moments," in Advances in Visual Computing, pp. 462-469, 2005.
4. C. Chong, R. Mukundan, and P. Raveendran, "An efficient algorithm for fast computation of pseudo-Zernike moments," International Journal of Pattern Recognition and Artificial Intelligence, pp. 1011-1023, 2003.
5. S. X. Liao and M. Pawlak, "On the accuracy of Zernike moments for image analysis," IEEE Trans. Pattern Analysis and Machine Intelligence 20, pp. 1358-1364, December 1998.
6. Y. Xin, M. Pawlak, and S. Liao, "Image reconstruction with polar Zernike moments," in Pattern Recognition and Image Analysis, LNCS 3687, 2005.
7. R. C. Gonzalez, R. E. Woods, and S. Eddins, Digital Image Processing Using MATLAB, Prentice Hall, 2004.
8. A. Neri and G. Iacovitti, "Maximum likelihood localization of 2-D patterns in the Gauss-Laguerre transform domain: theoretic framework and preliminary results," IEEE Transactions on Image Processing 13(1), 2004.
9. M. Carli, F. Coppola, G. Iacovitti, and A. Neri, "Translation, orientation and scale estimation based on Laguerre-Gauss circular harmonic pyramids," SPIE Conf. Photonics West, 2002.
10. L. Sorgi, N. Cimminiello, and A. Neri, "Keypoint selection in the Laguerre-Gauss transformed domain," in Proc. of 2nd Workshop on Applications of Computer Vision, ECCV, May 2006.
