AFFINE INVARIANT SHAPE REPRESENTATION AND RECOGNITION USING GAUSSIAN KERNELS AND MULTI-DIMENSIONAL INDEXING

Jezekiel Ben-Arie

Zhiqian Wang and K. Raghunath Rao

EECS Department University of Illinois at Chicago Chicago, IL 60607

ECE Department Illinois Institute of Technology Chicago, IL 60616

ABSTRACT

This paper presents a new approach for object recognition using affine-invariant recognition of image patches that correspond to object surfaces that are roughly planar. A novel set of Affine-Invariant Spectral Signatures (AISSs) is used to recognize each surface separately, invariant to its 3D pose. These local spectral signatures are extracted by convolving the image with a novel configuration of Gaussian kernels. The spectral signature of each image patch is then matched against a set of iconic models using Multi-Dimensional Indexing (MDI) in the frequency domain. Affine-invariance of the signatures is achieved by a new configuration of Gaussian kernels with modulation in two orthogonal axes. The proposed configuration of kernels is Cartesian with varying aspect ratios in two orthogonal directions. The kernels are organized in subsets where each subset has a distinct orientation. Each subset spans the entire frequency domain and provides invariance to slant, scale and limited translation. The complete set of orientations is utilized to achieve invariance to rotation and tilt. Hence, the proposed set of kernels achieves complete affine-invariance.

1. INTRODUCTION

The method presented in this paper is based on extracting a localized affine-invariant representation of image patches using a set of novel kernels. This approach, called iconic representation, has definite advantages over methods that are based on extracting and analyzing low-level features (such as lines, junctions, interest points, etc.). Feature extraction necessitates prior segmentation, which commits subsequent analysis to information that might be incomplete and may include substantial irrelevant clutter. Also, the large number of small features extracted leads to a combinatorial explosion in the complexity of matching image data to models. In contrast, in iconic representation, the given image is directly projected onto, or equivalently convolved with, a set of basis functions. The coefficients obtained from this linear operation form a signature that yields a rich and localized representation of the image shape, and that can also describe texture and other image attributes. A simplified feature image is not necessary, and thus early decisions (such as thresholding of edge detector outputs etc.) are completely avoided. Iconic representation can directly 'grasp' the entire representation of local features and shapes.

This work was supported by the Advanced Research Projects Agency under ARPA/ONR Grant No. N00014-93-1-1088.

A literature survey [1] reveals that many previous iconic approaches have used localized kernels such as Gabor functions [3] or Gaussian derivatives [6], and have achieved success with representation and recognition of image shapes. However, the main drawback of previous methods is that they can handle only similarity-transformed shapes (2D rotation, translation and scale) and do not provide invariance to the complete affine transformations that are required when a 2D shape is viewed in a 3D scene. We use the term 'affine transformation' to refer to orthographic projection and scaling, which quite accurately represents the perspective transformation of a planar shape when viewed from a distance, i.e. scaling, slant, tilt, 3D rotation and 3D translation.

The affine-invariant iconic representation is obtained by convolving the image with a new set of kernels and extracting spectral signatures which are affine-invariant. The kernels are based on Gaussians with two-dimensional modulation, i.e. 2D Gaussian derivatives. The configuration of kernels suggested in this paper is Cartesian with varying aspect ratios in two orthogonal directions. The kernels are organized in subsets where each subset has a distinct orientation. Each subset spans the entire frequency domain and provides invariance to slant, scale and limited translation. The union of differently oriented subsets, which now has a redundancy in the frequency domain, is utilized to achieve invariance in two additional degrees of freedom, i.e. rotation and tilt. Hence, complete affine-invariance is achieved by the entire set of kernels. Previous approaches tessellate the entire frequency domain in a polar fashion and are based on a polar configuration of kernels with one-dimensional modulation and constant aspect ratios.
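As a rough illustration of the kernel configuration outlined above (it is defined formally in Section 2), the following sketch builds oriented 2D Gaussian-derivative kernels on a geometric grid of standard deviations and evaluates the resulting signature at the centre of a patch. It is not the authors' implementation: the function names, the ratio `alpha = sqrt(2)`, the patch size, the sign convention of the rotation, the Gaussian-blob test shape, and in particular the `sigma_x**(m-1) * sigma_y**(n-1)` normalization (chosen here so that rescaling the shape translates the signature without changing its values) are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def gauss_deriv(u, sigma, order):
    """order-th derivative of exp(-u**2 / (2 sigma**2)) with respect to u."""
    # d^n/du^n exp(-u^2/2s^2) = (-1/s)^n He_n(u/s) exp(-u^2/2s^2),
    # where He_n is the probabilists' Hermite polynomial.
    coeffs = np.zeros(order + 1)
    coeffs[order] = 1.0
    return (-1.0 / sigma) ** order * hermeval(u / sigma, coeffs) * np.exp(-u**2 / (2 * sigma**2))

def kernel(m, n, sigma_x, sigma_y, theta_deg, half):
    """Oriented Gaussian-derivative kernel sampled on a (2*half+1)^2 grid."""
    t = np.deg2rad(theta_deg)
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Coordinates along the kernel axes (the sign convention is an assumption).
    xl = np.cos(t) * x - np.sin(t) * y
    yl = np.sin(t) * x + np.cos(t) * y
    g = gauss_deriv(xl, sigma_x, m) * gauss_deriv(yl, sigma_y, n)
    # Assumed normalization: the response then depends only on sigma / shape size.
    return sigma_x ** (m - 1) * sigma_y ** (n - 1) * g

def signature(patch, m, n, theta_deg, sigmas):
    """S[i, k]: convolution of the patch with kernel (sigmas[i], sigmas[k]) at the centre."""
    half = patch.shape[0] // 2
    sig = np.empty((len(sigmas), len(sigmas)))
    for i, sx in enumerate(sigmas):
        for k, sy in enumerate(sigmas):
            # Convolution at the centre = sum of the patch times the flipped kernel.
            sig[i, k] = np.sum(patch * kernel(m, n, sx, sy, theta_deg, half)[::-1, ::-1])
    return sig

def blob(half, rx, ry):
    """Hypothetical test shape: an elliptical Gaussian blob."""
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return np.exp(-x**2 / (2 * rx**2) - y**2 / (2 * ry**2))

alpha, sigma0 = np.sqrt(2.0), 3.0
sigmas = sigma0 * alpha ** np.arange(5)      # geometric progression of std. deviations
model = signature(blob(48, 6.0, 6.0), 2, 0, 0.0, sigmas)
# Enlarging the shape by alpha shifts the signature one step along both axes.
scaled = signature(blob(48, 6.0 * alpha, 6.0 * alpha), 2, 0, 0.0, sigmas)
# Foreshortening along y by 1/alpha = cos(45 deg) shifts it along sigma_Y only.
slanted = signature(blob(48, 6.0, 6.0 / alpha), 2, 0, 0.0, sigmas)
```

Under these assumptions, `scaled[1:, 1:]` reproduces `model[:-1, :-1]` and `slanted[:, :-1]` reproduces `model[:, 1:]`, i.e. scale and slant appear as pure translations of the signature on the standard-deviation grid, which is the property the recognition scheme relies on.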
In contrast to the method introduced here (which provides invariance to all geometrical distortions in scaled orthographic projections), previous iconic approaches provide only scale and rotation invariance, and are unable to compensate for 2D translation due to their one-dimensional Gaussian modulation.

The spectral signatures obtained from the kernels are used for affine-invariant recognition of image patches that correspond to approximately planar object surfaces. As explained in Section 3, the process of recognition also includes estimation of the 3D pose. The recognition is based on a Multi-Dimensional Indexing (MDI) scheme in the frequency domain. The indexing method provides robustness to partial distortion, background clutter, noise, illumination effects, and image degradations due to lower resolution. MDI [4] has several advantages over conventional low-dimensional hashing [5]. Specifically, by adding more dimensions to the indexing scheme, one can use very coarse quantization (and thus gain robustness), and still eliminate overcrowding of bins in the hash table, without reducing discrimination. MDI also provides databases with significantly larger retrieval size. In addition, we have shown [1] that our indexing of spectral signatures actually implements the optimal matching principles of our recently developed Expansion Matching (EXM) method [2], which is a robust and effective template matching method.

2. AFFINE-INVARIANT REPRESENTATION

In our method, an affine-invariant representation of the image is obtained by convolving with a set of Gaussian kernels. This set is centered at various 'interest locations' which correspond to centers of prominent image patches. A set of spectral signatures is then generated. Each signature represents a local image patch. These signatures are then independently recognized using Multi-Dimensional Indexing (MDI). The pose of each recognized patch is also obtained as a by-product. Complete objects can be recognized as a configuration of signatures.

We assume that the object surfaces are relatively small with respect to the viewing distance. Thus, projective transformations can be well approximated by affine transformations of the surface's points [1]. By the term affine transformation we refer to the transformation of a planar shape that undergoes 3D rotation and translation, and is then orthographically projected onto the image plane and scaled (reduced or increased in size). This projection can be represented by a sequence of transformations in the image plane which include translation, rotation and scale. In addition, to represent 3D rotation, slant (foreshortening) and tilt (rotation of the axis of foreshortening) transformations are necessary. More details are available in [1].

The affine-invariant representation is based on a set of elliptical 2D Gaussian-based kernels defined as

$$G^{m,n,\theta_l}_{\sigma_{X_i},\sigma_{Y_k}}(x,y) = \frac{\partial^m}{\partial x_l^m}\,\frac{\partial^n}{\partial y_l^n}\,\exp\left(-\frac{x_l^2}{2\sigma_{X_i}^2}-\frac{y_l^2}{2\sigma_{Y_k}^2}\right), \qquad \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\theta_l & \sin\theta_l \\ -\sin\theta_l & \cos\theta_l \end{pmatrix} \begin{pmatrix} x_l \\ y_l \end{pmatrix} \tag{1}$$

The standard deviations $\sigma_{X_i}$ and $\sigma_{Y_k}$ of these elliptical kernels vary in a geometrical progression with $i$ and $k$ as

$$\sigma_{X_i} = \alpha^{i-1}\sigma_0, \qquad \sigma_{Y_k} = \alpha^{k-1}\sigma_0, \qquad i,k = 1 \ldots N_\sigma \tag{2}$$

where the geometric ratio $\alpha > 1$ and the smallest standard deviation $\sigma_0$ are constants. The Gaussian in Eq. (1) is modulated in two orthogonal axes, which have orientation $\theta_l$ (denoted by $X_l$ and $Y_l$), by successive derivative operators of order $m$ and $n$ respectively, where $m,n = 0 \ldots (N_d - 1)$. We refer to the order of the derivatives $(m,n)$ as the frequencies of the modulation, since the frequency in the $X_l$ and $Y_l$ axes of these kernels is proportional to $\sqrt{m}$ and $\sqrt{n}$ respectively. The orientation of the kernel is denoted by $\theta_l$, which uniformly spans the range $[0, 360)$ degrees in discrete steps $l = 1 \ldots N_\theta$. Note that one could also use Gabor functions (sine/cosine modulation) here.

The above scheme generates a subset of modulated Gaussian kernels $K^{m,n,\theta_l} = \{G^{m,n,\theta_l}_{\sigma_{X_i},\sigma_{Y_k}};\ i,k = 1 \ldots N_\sigma\}$ with identical orientation $\theta_l$ and identical modulation frequencies (denoted by the order $m,n$ of the 2D partial derivatives), but with varying aspect ratio and size (indexed by $\sigma_{X_i}$ and $\sigma_{Y_k}$). For each orientation $\theta_l$, we have a cumulative subset $K^{\theta_l}$ of kernels which includes all the frequencies, i.e. $m,n = 0 \ldots N_d - 1$. The complete set of kernels $K$ consists of the union of all the subsets $K^{\theta_l}$ rotated to different orientations $\theta_l$, $l = 1 \ldots N_\theta$, that uniformly span 360 degrees. In practice, one can invoke symmetry properties and utilize kernels that span only one quadrant (90 degrees) of orientation.

Figure 1: Partial subset of kernels $K^{\theta_l}$ (only $\frac{\partial^2}{\partial x_l^2}\frac{\partial^3}{\partial y_l^3}$ displayed) for orientation $\theta_l = 0$ degrees in the spatial domain. Each subset $K^{\theta_l}$ completely spans the frequency domain.

On viewing each subset of kernels $K^{\theta_l}$ (Fig. 1) [1], one can see that each subset completely spans the band-limited frequency domain of interest. In fact, we have demonstrated this completeness of the kernels by accurately reconstructing a local image patch using one subset of Gaussian kernels both for analysis and synthesis [1]. However, there are two reasons for introducing the overcomplete set $K$. The first reason is that the redundancy in the complete set $K$ is utilized to overcome tilt and rotation. The second reason is that quadrature pairs of kernels, which were also found in the visual cortex [1], can be used to achieve translation invariance within the receptive field.

When this configuration of kernels is convolved with a local image patch $I(x,y)$, it generates a set of multi-dimensional spectral signatures $\{S^{m,n,\theta_l};\ m,n = 0 \ldots (N_d - 1);\ l = 1 \ldots N_\theta\}$ composed of the convolution coefficients of all the kernels. Mathematically,

$$S^{m,n,\theta_l}(\sigma_{X_i},\sigma_{Y_k}) = G^{m,n,\theta_l}_{\sigma_{X_i},\sigma_{Y_k}}(x,y) * I(x,y) \tag{3}$$

for $i,k = 1 \ldots N_\sigma$, where $*$ denotes convolution. The dimensions of the spectral signature correspond to $\sigma_{X_i}$, $\sigma_{Y_k}$. Hence the signatures differ in their orientation and frequencies, while within each signature we have a set of aspect ratios along the $X_l$, $Y_l$ axes.

The principle of slant-invariance (see Fig. 3) is that when the image patch corresponds to a slanted shape, say in axis $X_l$, the correctly oriented signature $S^{m,n,\theta_l}$ shifts in the direction of the slant, i.e. $\sigma_Y$, with respect to the signature of the unslanted shape. When the shape is scaled, all the signatures $\{S^{m,n,\theta_l};\ l = 1 \ldots N_\theta\}$ shift equally, i.e. diagonally, in the $(\sigma_X, \sigma_Y)$ plane. Hence, combined slant and scale results in a corresponding shift in the $(\sigma_X, \sigma_Y)$ plane. Fig. 4 displays contour plots of the signature of the airplane model and the corresponding signature when the airplane is slanted by 60 degrees with a tilt of 15 degrees. The signature does not change except for a translation in the $(\sigma_X, \sigma_Y)$ plane (see labels A and B on the plots for easy registration). This $(\sigma_X, \sigma_Y)$ plane translation between a model signature and the image signature can be used to compute the relative slant and scale between the two. The positional pose parameters can be retrieved from the X and Y image coordinates and scale (depth information).

Figure 3: Shifting property of the spectral signature in the $(\sigma_X, \sigma_Y)$ plane with respect to scaling and slanting of an arbitrary shape. [Axes: $\sigma_X$, $\sigma_Y$; annotated directions: scale, slant about X-axis, slant about Y-axis.]

Figure 4: Contour plots of the signature for the original airplane image (left) and shifted spectral signature obtained for the slanted and tilted shape (right). The labels A and B illustrate the shift in the signature.

Here, translation invariance is achieved by using odd and even derivatives as quadrature pairs, thus eliminating the phase in the frequency domain. This means that the combined output of the quadrature kernels is almost constant with translation within the localized window (receptive field).

Since shapes can be slanted and tilted in any orientation in space, one has to generate a subset of kernels for each tilt direction and for each orientation, which forms two rotational degrees of freedom. These two degrees of freedom are dealt with by using the complete set of kernels $K$ both for the model signature and for the image signature. As demonstrated in Fig. 2, even if the model is tilted and rotated, there is exact correspondence between four of the model signatures (marked by labels A through D) and four of the image signatures (marked by labels A' through D'). Using the orientations of these matching signatures, the rotation and tilt between the model and image are estimated.

Figure 2: Invariance of signatures to rotation and tilt of the shape. For the displayed rotation and tilt of the airplane shape, the oriented kernel pairs A-A', B-B', C-C', and D-D' yield invariant signatures. [Panels: Image and Model; annotations: matching signatures, kernels, tilt axis, tilt angle, rotation angle, slant, X-axis.]

3. RECOGNITION BY INDEXING

The recognition scheme is based on the affine-invariant nature of the spectral signatures described in Section 2. With respect to the spectral signatures of the model, the image signature could be shifted (due to slanting and/or scaling), or distorted due to the discrete nature of the orientation, the limited range of scales, partial occlusion or irrelevant clutter in the receptive field of the kernels. To robustly match signatures in the presence of these effects, we use a voting scheme based on Multi-Dimensional Indexing (MDI) [4].

In our experiments, the kernels $G^{m,n,\theta_l}_{\sigma_{X_i},\sigma_{Y_k}}(x,y)$ employ a set of standard deviations ranging over $\sigma_{X_i}, \sigma_{Y_k} = 3 \ldots 24$, a set of 7 frequencies $m,n = 0 \ldots 6$, and 24 orientations $\theta_l$ in steps of 15 degrees. The set of spectral signatures $S^{m,n,\theta_l}(\sigma_{X_i},\sigma_{Y_k})$ obtained by convolution with the kernels is used along with an MDI scheme for affine-invariant recognition.

The hash table is generated using all the signatures $S^{m,n,\theta_l}$. We use 6-derivative combinations resulting in 20-dimensional indices. For every model, the complete set of its 20-dimensional indices along with its pose ($\sigma_X$, $\sigma_Y$ and $\theta_l$) is stored in the hash table. Similarly, given an image patch to be invariantly recognized, its set of indices is compared with the hash table and each matching index adds one vote for the corresponding model in that entry. In addition, the values of $\sigma_X$, $\sigma_Y$ and $\theta_l$ stored in the entry are used to vote for the pose of each model in the entry. The total number of votes accumulated by each model (with pose) over all the indices of the test image is its matching score.
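The voting procedure described above can be sketched as follows. This is a minimal illustration, not the indexing scheme of [4] or the authors' code: the 20-dimensional index vectors are random stand-ins placed at bin centres rather than values derived from real signatures, and the quantization step `step`, the function names, the model names used as keys and the pose tuples are all hypothetical.

```python
import numpy as np
from collections import defaultdict, Counter

def quantize(index_vec, step):
    """Coarsely quantize a real-valued index vector into a hashable bin key."""
    return tuple(np.floor(np.asarray(index_vec) / step).astype(int))

def build_table(models, step):
    """models: {name: [(index_vec, pose), ...]} -> hash table of (name, pose) entries."""
    table = defaultdict(list)
    for name, entries in models.items():
        for index_vec, pose in entries:
            table[quantize(index_vec, step)].append((name, pose))
    return table

def vote(table, image_indices, step):
    """Each image index that hits a bin adds one vote to every (model, pose) stored there."""
    votes = Counter()
    for index_vec in image_indices:
        for name, pose in table.get(quantize(index_vec, step), ()):
            votes[(name, pose)] += 1
    return votes

rng = np.random.default_rng(0)
step = 2.0
# Hypothetical poses: (sigma_X shift, sigma_Y shift, orientation in degrees).
poses = [(0, 0, 0), (1, 0, 15), (1, 1, 30)]

def random_indices(count):
    """Stand-in for real signature indices: 20-D vectors placed at bin centres."""
    return step * (rng.integers(0, 5, size=(count, 20)) + 0.5)

# Each model stores 20 indices for each of the three poses.
models = {
    name: [(vec, pose) for pose in poses for vec in random_indices(20)]
    for name in ("airplane", "tank", "trackball")
}
table = build_table(models, step)

# Test patch: 12 of the "tank" indices at the second pose, perturbed by less than
# half a bin, plus 8 clutter indices that belong to no model.
tank_at_pose = [vec for vec, pose in models["tank"] if pose == poses[1]]
noisy = [vec + rng.uniform(-0.4, 0.4, size=20) for vec in tank_at_pose[:12]]
clutter = list(step * rng.uniform(0, 5, size=(8, 20)))
votes = vote(table, noisy + clutter, step)
best, score = votes.most_common(1)[0]
```

In this toy setting the perturbed indices stay inside their bins, so the winning entry is the stored model together with its pose, while clutter indices in a 20-dimensional table almost never land in an occupied bin. This is the effect of combining coarse quantization with many index dimensions that the text attributes to MDI.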

Figure 5: Six of the 26 model objects in the library. Close to 100% recognition is achieved over a wide range of slant, tilt, scale, and rotations.

We use a library of 26 objects, 8 of which are displayed in Fig. 5. These models consist of real gray-level images (128 × 128) of objects with some amount of texture as well. Experiments over varied conditions of slant, tilt, scale and rotation yielded close to 100% recognition with a high average Signal-to-Noise Ratio (SNR) of 10.3 (defined as the ratio of the highest vote to the next highest vote). The pose of each model was also estimated correctly in all experiments. Complete recognition was achieved over more than 3 octaves of scaling, slant angles of more than 80 degrees, and image rotation and shape tilt of 360 degrees. Two of the successfully recognized test images are displayed in Fig. 6.

Fig. 7 displays a test image with two slanted and tilted models in close neighborhood with some background clutter. Both the trackball and the airplane models are successfully recognized, demonstrating that the scheme is robust to neighboring shapes and background clutter in the image.

The scheme is also quite robust to additive noise. The test image of Fig. 8 has considerable additive white noise (SNR = -1.8 dB) and is still successfully recognized in an affine-invariant manner. Experiments in large amounts of colored noise (especially high frequency noise) also yield robust recognition rates.

Figure 9 illustrates a test image with a low resolution (32 × 32) model. This is a drastic reduction from the original 128 × 128 image size, which could occur due to large viewing distances. Despite the large degradation, experiments over all the 26 models (with scaling of 0.8 and rotation of 30 degrees) yielded a 90% recognition rate. These results show that the representation and recognition scheme is quite robust to the significant degradation that corresponds to lower resolution.

Figure 6: Test image of affine-transformed trackball model that was recognized.

Figure 7: Recognition of two neighboring objects (airplane and mouse) in background clutter.

Figure 8: Test image in white noise (SNR = -1.8 dB).

Figure 9: Low resolution test image of tank (32 × 32).

4. REFERENCES

[1] J. Ben-Arie, K. R. Rao and Z. Wang, "Iconic Representation and Recognition Using Affine-Invariant Spectral Signatures," to appear in ARPA Image Understanding Workshop 96, Palm Springs, CA, Feb. 1996, pp. TBD.

[2] J. Ben-Arie and K. R. Rao, "A Novel Approach For Template Matching by Non-Orthogonal Image Expansion," IEEE Transactions on Circuits & Systems for Video Technology, vol. 3, no. 1, Feb. 1993, pp. 71-84.

[3] J. Buhmann, J. Lange, C. von der Malsburg, J. C. Vorbruggen and R. P. Wurtz, "Object recognition with Gabor Functions in the Dynamic Link Architecture: Parallel Implementation on a Transputer Network," Neural Networks for Signal Processing, B. Kosko (Ed.), Prentice Hall, Englewood Cliffs, NJ, 1992, pp. 121-159.

[4] A. Califano and R. Mohan, "Multidimensional Indexing for Recognizing Visual Shapes," IEEE Trans. on PAMI, vol. 16, no. 4, April 1994, pp. 373-392.

[5] Y. Lamdan, J. Schwartz and H. Wolfson, "Geometric Hashing: A general and efficient model-based recognition scheme," Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, Tarpon Springs, MD, 1988, pp. 335-344.

[6] R. P. N. Rao and D. H. Ballard, "Object Indexing using an Iconic Sparse Distributed Memory," Proc. of Fifth Intl. Conf. on Comp. Vision, June 1995, Cambridge, MA, pp. 24-31.