PART-BASED BAYESIAN RECOGNITION USING IMPLICIT POLYNOMIAL INVARIANTS

Kaleem Siddiqi, Jayashree Subrahmonia, David Cooper and Benjamin B. Kimia
Division of Engineering, Brown University, Providence, RI 02912

ABSTRACT

We present an approach to recognition that is based on partitioning and invariant recognition in a Bayesian framework. The intended application domain is that of complex articulated objects in arbitrary position and under considerable occlusion. First, since the performance of traditional model-based recognition strategies degrades with increasing object database size, with partial occlusion, and with articulation, we employ a partitioning that does not rely on a priori primitives or models. Rather, this scheme decomposes segmented shapes into parts, where the form of each part is not known a priori, but is derived based on generic geometric assumptions about objects and their projections. Specifically, two types of parts, neck-based and limb-based, give rise to a shape decomposition that remains invariant under occlusion in the visible portion of the object, is unaltered under articulation of parts, is stable under slight changes in viewing geometry, and finally is robust to changes in resolution and scale. Second, the parts derived from the first stage are described by implicit polynomial curves. These polynomials represent the parts well and are computationally simple to fit to the data. However, the great advantage in using implicit polynomials is the algebraic invariance associated with them. Each part is represented by a vector of invariants that remains essentially independent of viewing geometry, and as such is suitable for matching purposes. The matching process is a Bayesian engine based on asymptotic distributions. In the conclusion section, we briefly indicate how this technology fits into a complete object recognition system.

1. PARTITIONING BASED ON LIMBS AND NECKS

Underlying recognition is an organization of objects and their parts into classes and hierarchies. In computer vision, the notion of recognition based on "parts" has become increasingly popular. Perhaps the most compelling support for this idea comes from recognition in the presence of occlusion: while local features

Figure 1: Left: Limb-based parts arise from part-lines whose end points are negative curvature minima, and whose end-point tangents continue smoothly from one to another (co-circular tangents [1]). Right: Neck-based parts arise from part-lines whose length is a local minimum, and which are the diameter of an inscribed circle. These necks correspond to second-order shocks of [2].

Figure 2: This figure illustrates that partitioning and scale bring out the coarse-level structure of the target, which is blended in with considerable occlusion.

Figure 3: The computed hierarchy of parts supports a semantic coarse-level description of a tank as "a small part centered on top of a larger part". Other objects would not match against such a description, leading to the possibility of "semantic matching" and even the possibility of classification of objects not in the database.

are sensitive to noise and other variations, global structures are susceptible to occlusion by other objects. In contrast, the stable computation of a few parts can lead to recognition that is robust to occlusions. In addition, objects are often composed of moving parts: while the description of each part remains intact, the relationships between parts change. Our partitioning scheme is based on decompositions along limbs and necks. Formally, a part-line is a curve whose end points rest on the boundary of that shape and which divides it into two connected components. A limb is a part-line going through a pair of negative curvature minima with co-circular boundary tangents on (at least) one side of the part-line, Figure 1 (left). A neck is a part-line which is also a local minimum of the diameter of an inscribed circle, Figure 1 (right). Intuitively, limbs result from composing two objects together, e.g., a gun attached to a tank, while necks result from narrowings for articulation, e.g., the neck that often exists between a turret and the main body of a tank. These part types are highly correlated with how people break shapes into parts [3]. In addition, they satisfy a number of computational constraints for recognition [4]. In particular, they are robust to occlusion, Figure 2, stable with changes in viewing geometry, and are designed to yield functional three-dimensional parts. Further, note that parts can cover a range of scales, with small parts being attached to larger ones. For example, at the coarse level a tank is the composition of two parts: the main body and the turret; at a finer level, the turret can be described as the composition of its main body, a gun, etc. Such a description allows for matching at coarse levels and is extremely important for efficient recognition. Our proposed parts incorporate an appropriate notion of scale for approximating shape [5]. To illustrate, Figure 3 depicts the parts of three targets, extracted from LADAR images, in scale.
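The limb criterion above hinges on locating negative curvature minima along a closed boundary. As a minimal illustration (not the authors' implementation), the following numpy sketch computes signed curvature on a uniformly sampled closed curve with periodic central differences and flags negative local minima; the "peanut" test shape and all helper names are our own assumptions.

```python
import numpy as np

def closed_curve_curvature(x, y):
    """Signed curvature of a uniformly sampled closed curve,
    using periodic central differences over [0, 2*pi)."""
    n = len(x)
    h = 2 * np.pi / n
    dx = (np.roll(x, -1) - np.roll(x, 1)) / (2 * h)
    dy = (np.roll(y, -1) - np.roll(y, 1)) / (2 * h)
    ddx = (np.roll(x, -1) - 2 * x + np.roll(x, 1)) / h**2
    ddy = (np.roll(y, -1) - 2 * y + np.roll(y, 1)) / h**2
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

def negative_curvature_minima(kappa):
    """Indices where curvature is a strict negative local minimum:
    candidate end points for limb part-lines."""
    left, right = np.roll(kappa, 1), np.roll(kappa, -1)
    return np.where((kappa < 0) & (kappa < left) & (kappa < right))[0]

t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
# "Peanut" shape: two lobes joined by a concave waist.
r = 1.0 + 0.6 * np.cos(2 * t)
kx, ky = r * np.cos(t), r * np.sin(t)
waist_idx = negative_curvature_minima(closed_curve_curvature(kx, ky))

# A circle, by contrast, has constant positive curvature (1/R)
# and no negative minima anywhere.
cx, cy = 2 * np.cos(t), 2 * np.sin(t)
circ_kappa = closed_curve_curvature(cx, cy)
```

The two minima found for the peanut sit at the waist, on opposite sides; pairing such minima (and checking co-circularity of the tangents) is what turns them into a limb part-line.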
Note the similarity that emerges between the two tanks at coarse levels of description.

2. RECOGNITION BASED ON POLYNOMIAL INVARIANTS

We now discuss the representation of 2D and 3D object parts by implicit polynomial curves, followed by Bayesian recognition based on vectors of algebraic invariants for the parts. Typically, we use a 4th degree implicit polynomial curve to represent a part. Such a curve is the zero set of the polynomial

    f(x, y) = sum_{i,j >= 0; i+j <= 4} a_ij x^i y^j,

i.e., the set of points where the polynomial is equal to 0. These polynomials are generalizations of the 2nd

9

6

11

4 − left rocket front

7

1 − wing

Y

Y F16.9 F16.9.fit

30.00

4

5

F16.1 F16.1.fit

120.00 110.00

25.00

100.00 90.00

20.00

80.00 70.00

15.00

60.00 10.00

12

50.00 40.00

5.00

30.00 20.00

0.00

10.00 -5.00

1 10

0.00 -10.00

-10.00

-20.00

3

-40.00

2 − fuselage nose

-50.00 -60.00

-25.00

2

-70.00 -30.00

-80.00 X -30.00

-20.00

3 − right rocket front

Y

-10.00

0.00

10.00

20.00

-50.00

0.00

50.00

11 + 13 − left missile

Y F16.8 F16.8.fit

30.00

X -100.00

30.00

10 + 12 − right missile

Y F16.3 F16.3.fit

100.00

-30.00

-15.00 -20.00

Y F16.6 F16.6.fit

30.00

rocket rocfit

30.00

90.00 80.00 70.00

25.00

25.00

25.00

20.00

20.00

20.00

60.00 50.00

15.00

15.00

15.00

10.00

10.00

10.00

5.00

5.00

5.00

40.00 30.00 20.00 10.00 0.00

0.00

0.00

0.00

-10.00 -20.00 -30.00

-5.00

-5.00

-5.00

-10.00

-10.00

-10.00

-40.00 -50.00

-15.00

-15.00

-15.00

-20.00

-20.00

-20.00

-25.00

-25.00

-25.00

-60.00 -70.00 -80.00 -90.00 -100.00

-30.00

-30.00

X

-30.00

X

X

X

Figure 4: Top Left: An F16 ghter jet and its parts pro-100.00

-50.00

0.00

50.00

100.00

-30.00

-20.00

-10.00

0.00

10.00

20.00

30.00

-30.00

-20.00

-10.00

0.00

10.00

20.00

30.00

-30.00

-20.00

-10.00

0.00

10.00

20.00

30.00

duced by the partitioning algorithm. Clockwise From Top Right: Bounded ts to a representative group of parts. The best tting polynomial representation is overlaid on the part boundary-curve data, and is so close that it is dicult to distinguish the two. 4th degree polynomials were used except for the bottom right where an 8th degree polynomial was used to obtain a better t than the 4th degree t in the adjacent box. By \bounded ts" we refer to a constrained subset of implicit polynomial curves [6].

degree curves, e.g., circles and ellipses; see [7] for additional details. Since an object to be recognized can be in arbitrary position with respect to the viewing camera, and since the coefficients of a polynomial fit to data change with translation and either rotation (Euclidean invariants) or general linear transformations (affine invariants) of the data, recognition is based on algebraic invariants: functions of the polynomial coefficients that do not change with translation and either rotation or linear transformations of the data. Euclidean invariants for an ellipse are the trace and the determinant, which are equivalent to the lengths of the major and minor axes of the ellipse. An example of a Euclidean invariant for a 4th degree polynomial in 2D is

    3 a_13^2 - 8 a_04 a_22 + 2 a_13 a_31 + 3 a_31^2 - 32 a_40 a_04 - 8 a_22 a_40,

where a_ij is the coefficient of x^i y^j [8]. Note that the invariants to be used depend on the application. For example, for aerial images of objects on the ground, where the part boundaries undergo a general linear transformation from one view to another, affine invariants are appropriate, unless the camera depression angle is known, in which case only Euclidean invariants are needed. An object part is recognized by comparing a vector of invariants measured for the part with a vector of invariants for each part in the database. Bayesian (minimum probability of error) recognizers are required because the measured invariants can vary due to noise in the image data and due to missing or additional data along the part boundary caused by occlusion and other factors. Our Bayesian recognizers are based on asymptotic behavior, which leads to:

    p(Z | G_k) ~ K exp( -(1/2) (G_k - G^)^t Sigma_G (G_k - G^) ),

where Z is the vector of data points along a part to
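The quoted Euclidean invariant can be checked numerically. The sketch below is our own device, not taken from [8]: it rotates the data, recovers the new leading-form (4th degree) coefficients by re-sampling the form and solving a small linear system, and confirms the combination above is unchanged. All helper names are hypothetical.

```python
import numpy as np

QUARTIC_MONOMIALS = [(4, 0), (3, 1), (2, 2), (1, 3), (0, 4)]

def euclidean_invariant(a):
    """The 4th-degree rotation invariant quoted in the text;
    a = (a40, a31, a22, a13, a04)."""
    a40, a31, a22, a13, a04 = a
    return (3 * a13**2 - 8 * a04 * a22 + 2 * a13 * a31
            + 3 * a31**2 - 32 * a40 * a04 - 8 * a22 * a40)

def rotate_leading_form(a, theta):
    """Coefficients of the leading quartic form after rotating the
    data by theta, recovered by sampling and least squares."""
    c, s = np.cos(theta), np.sin(theta)
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(12, 2))
    # Rotating the data by R is the same as composing f with R^-1.
    xr = c * pts[:, 0] + s * pts[:, 1]
    yr = -s * pts[:, 0] + c * pts[:, 1]
    vals = sum(a[k] * xr**i * yr**j
               for k, (i, j) in enumerate(QUARTIC_MONOMIALS))
    M = np.column_stack([pts[:, 0]**i * pts[:, 1]**j
                         for i, j in QUARTIC_MONOMIALS])
    return np.linalg.lstsq(M, vals, rcond=None)[0]

a = np.array([2.0, -1.0, 0.5, 3.0, 1.5])   # arbitrary test coefficients
a_rot = rotate_leading_form(a, 0.7)
```

The individual coefficients change under the rotation, but the invariant does not, which is exactly the property that makes it usable for matching.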

[Figure 5 graphic: numbered part boundaries for a truck and two tank silhouettes (labels include 1 − body, 1 − front, 2 − front wheels, 3 − cab top, 4 − turret, 4 − turret front, 5 − cannon), each with "data" and "4th_deg_fit" plots.]

Figure 5: Top Row: Parts produced by the partitioning algorithm for a truck and two tank silhouettes from real LADAR data. Top to Bottom: 4th degree polynomial curve fits to the parts. For each vehicle, the data sets for representative parts are shown on the left, and the representations of the best fitting implicit polynomial are superimposed on the data on the right. Unconstrained polynomials were used for all the parts except for the body and turret of the second tank: here bounded polynomials were used. Whereas the unconstrained representations include the data very well but also have spurious pieces, this does not degrade recognition significantly [9]. The bounded representations do not have spurious portions, but usually cannot capture as much detail. Parts in lower blocks are not to scale.

be recognized, K is a constant, G^ is the set of invariants measured from the sensed data, G_k is the set of invariants for the k-th part in the database, and Sigma_G is a weighting matrix derived from the measured data of the object to be recognized [9]. Recognition is based on the value of k for which the probability p(Z | G_k) is maximum, or equivalently, choosing k for which the Mahalanobis distance (G_k - G^)^t Sigma_G (G_k - G^) is minimum.
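A minimal sketch of this classification rule, assuming a toy database of invariant vectors and a diagonal weighting matrix (both made up here; the paper derives the weighting matrix from the measured data [9]):

```python
import numpy as np

def recognize_part(g_hat, database, sigma_g):
    """Pick the database part minimizing the Mahalanobis distance
    (G_k - G^)^t Sigma_G (G_k - G^), which is equivalent to
    maximizing the asymptotic probability p(Z | G_k)."""
    def dist(g_k):
        d = g_k - g_hat
        return d @ sigma_g @ d
    scores = {name: dist(g) for name, g in database.items()}
    return min(scores, key=scores.get), scores

# Toy database of 3-dimensional invariant vectors (made-up numbers).
db = {
    "tank_body": np.array([1.0, 0.2, -0.5]),
    "turret":    np.array([0.1, 1.1,  0.7]),
    "cannon":    np.array([-0.9, 0.4, 0.0]),
}
sigma = np.diag([2.0, 1.0, 0.5])          # stand-in weighting matrix
measured = np.array([0.95, 0.25, -0.45])  # noisy "tank_body" invariants
label, scores = recognize_part(measured, db, sigma)
```

The score dictionary plays the role of one row of the confusion matrices in Table 1: a small squared distance to the correct part and large distances to the others.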


[Figure 6 graphic: five boxes showing the tank1 body boundary, two partially occluded versions, and their fits; panel titles "tank1 − body", "tank1 − body: partial data one", "tank1 − body: partial data two".]

Figure 6: This figure illustrates the robustness of the fitting process to occlusion. We refer to the boxes from left to right. Box 1: part boundary of the first tank's body, Figure 5. Box 2: data obtained from the partially occluded part boundaries. Box 3: data in box 2 on which is superimposed the best fit polynomial representation for the data. Box 4: data in box 2 but with an additional boundary piece removed to illustrate additional occlusion. This data with its polynomial curve representation superimposed on it is shown in the last box. Observe that in both cases the fits cover the data and capture the global shape of the part.
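The behavior in Figure 6 can be mimicked with the simplest unconstrained algebraic fit: minimize ||Mc|| over unit-norm coefficient vectors c via SVD, where M is the matrix of monomials evaluated at the boundary points. This is not the bounded fitting of [6], just a sketch under that assumption: fitting a 4th degree implicit polynomial to only 60% of a circle still yields a zero set passing through the hidden arc.

```python
import numpy as np

def monomials(x, y, degree=4):
    """One column per monomial x^i y^j with i + j <= degree."""
    return np.column_stack([x**i * y**j
                            for i in range(degree + 1)
                            for j in range(degree + 1 - i)])

def fit_implicit_polynomial(x, y, degree=4):
    """Unit-norm coefficient vector minimizing ||M c||: the right
    singular vector of the smallest singular value of M."""
    M = monomials(x, y, degree)
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[-1]

# Full circle versus a 60% arc standing in for an occluded boundary.
t_full = np.linspace(0, 2 * np.pi, 300, endpoint=False)
t_arc = t_full[t_full < 1.2 * np.pi]
c = fit_implicit_polynomial(np.cos(t_arc), np.sin(t_arc))

# Evaluate the fitted polynomial on the *unoccluded* circle: the
# residual stays near zero even on the hidden portion.
residual = np.abs(monomials(np.cos(t_full), np.sin(t_full)) @ c).max()
```

The fit extrapolates here because any low-degree polynomial vanishing on a long arc of a circle must contain the circle as a factor; real part boundaries are not algebraic curves, which is why the paper's experiments (and the bounded fits of [6]) matter.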


3. CONCLUSION

In a complete recognition system, an object in the database is stored as a set of two-dimensional views, which in turn are stored as a hierarchy of parts in some standard position. In the first stage of recognition, parts derived from the image data are matched against stored parts in the database by a Bayesian comparison of feature vectors, i.e., vectors of invariants, thus generating a set of candidate objects with an associated view. Note that some parts found in the data will be due to clutter, and others will belong to objects that are not of interest and are not in the database. A role of the Bayesian recognizers is to reject these. In the second

TABLE 1(a) (tank1 vs. tank1 and tank1 vs. tank2):

rows: tank 1            columns: tank 1
        part1  part2  part3  part4  part5  part6
part1   00.00  **.**  **.**  23.91  **.**  **.**
part2   **.**  00.00  22.66  **.**  05.91  29.73
part3   **.**  44.93  00.00  **.**  41.15  06.85
part4   10.91  **.**  **.**  00.00  **.**  **.**
part5   **.**  16.53  27.77  **.**  00.00  32.01
part6   **.**  51.88  14.40  **.**  46.63  00.00

rows: tank 1            columns: tank 2
        part1  part2  part3  part4  part5
part1   06.91  23.77  **.**  **.**  **.**
part2   **.**  **.**  19.01  42.09  09.96
part3   **.**  **.**  29.53  45.22  09.39
part4   25.91  05.63  **.**  **.**  **.**
part5   **.**  **.**  20.93  18.87  12.93
part6   **.**  **.**  37.42  27.17  11.29

TABLE 1(b) (tank1 vs. truck):

rows: tank 1            columns: truck
        part1  part2  part3  part4  part5  part6
part1   29.11  41.65  43.95  21.39  10.13  49.71
part2   61.93  58.09  65.63  50.93  53.33  40.85
part3   43.01  23.71  40.95  49.77  48.31  19.19
part4   33.29  21.53  17.91  35.31  38.13  09.51
part5   59.19  63.01  58.95  55.41  57.31  25.99
part6   53.61  24.89  37.71  60.19  65.30  13.59

TABLE 1(c) (Occluded Data): rows are tank1 body with partial data.

                        columns: tank1
        part1  part2  part3  part4  part5  part6
part1a  02.83  **.**  **.**  23.09  **.**  **.**
part1b  02.92  **.**  **.**  21.42  **.**  **.**

                        columns: tank2
        part1  part2  part3  part4  part5
part1a  04.11  20.13  **.**  **.**  **.**
part1b  04.58  22.92  **.**  **.**  **.**

                        columns: truck
        part1  part2  part3  part4  part5  part6
part1a  06.09  34.56  41.05  05.47  04.99  09.16
part1b  19.61  35.77  45.97  11.32  05.13  11.03

Figure 7: An entry in the Table 1 part recognition confusion matrix is a "squared distance" measure between seven Euclidean invariants for a sensed part and seven Euclidean invariants for a part stored in the database. Each part in the database is in fixed but arbitrary position; the tank1 sensed parts are translated and rotated by 70 degrees with respect to corresponding ones in the database. This distance measure is the optimal minimum probability of error part recognizer. The left column is the part present in the sensed data. To illustrate the recognition significance of the matrix elements, the probability of the curve data measured for "tank1" part5 under the assumption that it came from "tank2" part5 is roughly proportional to e^(-12.93/2). The last table shows the results for recognition of the body of "tank1" when silhouette data is partially missing due to partial occlusion. The partial data used for recognition is that in boxes 2 and 4 of Figure 6, denoted parts 1a and 1b, respectively. Some entries are missing because the disparity in sizes of a pair of parts was huge and therefore does not require recognition. The part sizes used are those shown in the top three figures of the targets in Figure 5; hence, the tank bodies are roughly the same size.

stage, the Bayesian engine matches a stored hierarchy of parts from this list of candidate targets against the hierarchy of parts obtained from the data. The match is based on both structural and hierarchical relationships between parts as well as metric information from each part. The result is a recognition system that is robust under occlusion and movement of parts, stable with changes in viewing geometry and noise in the data, and which is suitable for large databases.

Acknowledgements

This research was supported by NSF grants IRI-9305630 and IRI-9224963.


4. REFERENCES

[1] Pierre Parent and Steven Warren Zucker. Trace inference, curvature consistency and curve detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 11(8):823-839, August 1989.
[2] Benjamin B. Kimia, Allen R. Tannenbaum, and Steven W. Zucker. Shapes, shocks, and deformations, I: The components of shape and the reaction-diffusion space. International Journal of Computer Vision, to appear, 1995.
[3] Kaleem Siddiqi, Kathryn J. Tresness, and Benjamin B. Kimia. Parts of visual form: Ecological and psychophysical aspects. Technical Report LEMS 104, LEMS, Brown University, June 1992.
[4] Kaleem Siddiqi and Benjamin B. Kimia. Parts of visual form: Computational aspects. IEEE Trans. on Pattern Analysis and Machine Intelligence, 17(3):239-251, March 1995.
[5] Benjamin B. Kimia, Allen R. Tannenbaum, and Steven W. Zucker. Entropy scale-space. In Carlo Arcelli, editor, Visual Form: Analysis and Recognition, pages 333-344, New York, May 1991. Plenum Press.
[6] D. Keren, J. Subrahmonia, and D. B. Cooper. Bounded and unbounded implicit polynomial curves and surfaces, Mahalanobis distances, and geometric invariants for robust object recognition. In Proceedings of the DARPA Image Understanding Workshop, pages 769-778, January 1992.
[7] D. Keren, D. B. Cooper, and J. Subrahmonia. Describing complicated objects by implicit polynomials. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16:38-54, January 1994.
[8] D. Keren. Some new invariants in computer vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1143-1149, November 1994.
[9] J. Subrahmonia, D. B. Cooper, and D. Keren. Practical reliable Bayesian recognition of 2D and 3D objects using implicit polynomials and algebraic invariants. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear, 1995. Also Tech. Report LEMS-107, Brown University.