LEARNING CORNER ORIENTATION USING CANONICAL CORRELATION

B. Johansson∗
Computer Vision Laboratory
Dept. of Electrical Engineering
Linköping University
[email protected]

M. Borga, H. Knutsson
Medical Informatics
Dept. of Biomedical Engineering
Linköping University
{knutte,magnus}@imt.liu.se

ABSTRACT

This paper shows how canonical correlation can be used to learn a detector for corner orientation that is invariant to corner angle and intensity. Pairs of images with the same corner orientation but different angle and intensity are used as training samples. Three different image representations are examined: intensity values, products between intensity values, and local orientation. The last representation gives a well-behaved result that is easy to decode into the corner orientation. To reduce dimensionality, parameters from a polynomial model fitted to the different representations are also considered. This reduction did not affect the performance of the system.

1. INTRODUCTION

It is often difficult to design detectors for complex features analytically. An alternative approach is to have a system learn a feature detector from a set of training data. One method based on canonical correlation is described in [1], [2], [3], where a system learns a detector for local orientation invariant to signal phase by showing the system pairs of sinusoidal patterns that have the same orientation but different phase. Products between pixel values were used as input samples, and the system learned linear combinations of the sample components which could be decoded in a simple manner into the orientation angle. It turned out that the linear combinations could be interpreted as quadrature or Gabor filters.

This paper shows that the same technique can be used to learn a descriptor of corner orientation which is invariant to corner angle and intensity. Three input representations are examined: intensity values, products between intensity values, and local orientation in double angle representation. The dimensionality of the input data can be quite large, especially if we use products between intensity values. Therefore, to reduce the dimensionality, parameters from a polynomial expansion model on the respective representations are also explored as input data.

∗ This work was supported by the Foundation for Strategic Research, project VISIT - VISual Information Technology.

2. CANONICAL CORRELATION

Assume that we have two stochastic variables x ∈ C^{M1} and y ∈ C^{M2} (M1 and M2 do not have to be equal). Canonical correlation analysis, CCA, can be defined as the problem of finding two sets of basis vectors, one for x and the other for y, such that the correlations between the projections of the variables onto these basis vectors are mutually maximized. For the case of only one pair of basis vectors we have the projections x = w_x^* x and y = w_y^* y (^* denotes conjugate transpose) and the correlation is written as

$$\rho = \frac{E[xy]}{\sqrt{E[x^2]\,E[y^2]}} = \frac{\mathbf{w}_x^{*}\mathbf{C}_{xy}\mathbf{w}_y}{\sqrt{\mathbf{w}_x^{*}\mathbf{C}_{xx}\mathbf{w}_x \;\mathbf{w}_y^{*}\mathbf{C}_{yy}\mathbf{w}_y}} \qquad (1)$$

where C_xy = E[xy^*], C_xx = E[xx^*], C_yy = E[yy^*]. It can be shown that the maximal canonical correlation can be found by solving an eigenvalue system [1]. The first eigenvectors w_x1, w_y1 give the projections that have the highest correlation ρ1. The next two eigenvectors have the second highest correlation, and so on. It can also be shown that the different projections are uncorrelated.
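For illustration, the canonical correlations and basis vectors can be estimated from sample data by solving the eigenvalue system mentioned above. The following is a minimal numpy/scipy sketch for real-valued data, not the authors' implementation; the regularization term and all variable names are our own.

```python
import numpy as np
from scipy.linalg import eigh

def cca(X, Y, reg=1e-9):
    """Estimate canonical correlations and basis vectors for real-valued,
    zero-mean sample matrices X (N x M1) and Y (N x M2).
    A sketch, not the authors' implementation."""
    N = X.shape[0]
    Cxx = X.T @ X / N + reg * np.eye(X.shape[1])   # E[x x^T] (+ small regularization)
    Cyy = Y.T @ Y / N + reg * np.eye(Y.shape[1])   # E[y y^T]
    Cxy = X.T @ Y / N                              # E[x y^T]
    # Maximizing eq. (1) leads to the generalized eigenproblem
    #   Cxy Cyy^-1 Cyx wx = rho^2 Cxx wx
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    rho2, Wx = eigh(M, Cxx)                        # eigenvalues in ascending order
    order = np.argsort(rho2)[::-1]                 # sort by correlation, descending
    rho2, Wx = rho2[order], Wx[:, order]
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx)          # wy proportional to Cyy^-1 Cyx wx
    Wy /= np.linalg.norm(Wy, axis=0, keepdims=True)
    return np.sqrt(np.clip(rho2, 0.0, 1.0)), Wx, Wy
```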

3. EXPERIMENT SETUP

In the experiments we have N pairs of training samples, (Ix^(n), Iy^(n)); see figure 1 for examples. Each pair has the same corner orientation but differs in other properties. The orientation varies between 0° and 360° with a resolution of 5°, giving a total of 72 values. The corner angle varies between 60° and 120° with a resolution of 5°, giving a total of 13 values. When using the intensity value representation it is not possible to learn corner intensity invariance; the representation is not descriptive enough. Therefore, in this case we will not vary the intensity. For the other representations both the corner angle and corner intensity vary as in figure 1. The training pairs consist of all combinations of the images above that have the same corner orientation but differ in corner angle. This gives a total of N = 72 × 13² = 12168 pairs. Gaussian noise was finally added. The noise actually helps the learning algorithm to find smoother and more robust vectors w_xk. This is not surprising, since the algorithm finds projections which are invariant to the noise (because the noise is not a common property), which implies a low-pass characteristic of the projections.

For the gray-level experiments an image size of 5 × 5 was used. For the local orientation experiments an image size of 9 × 9 was used. The gradient (Ix, Iy) was then computed using 5 × 5 differentiated Gaussian filters with σ = 1.2. The double angle representation is then computed as z = (Ix + iIy)². This representation has an argument that is double the local orientation angle. Z will therefore become invariant to edge sign (i.e. a positive or a negative edge with the same orientation), which in turn means that the representation is more invariant to intensity. The border values were finally removed in order to avoid border effects, leaving a 5 × 5 local orientation image.
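As a concrete illustration of this preprocessing, the sketch below computes the double angle representation of a gray-level patch. The separable derivative-of-Gaussian filters are our own construction under the stated parameters (5 × 5 filters, σ = 1.2, 9 × 9 input patch); the paper does not spell out the exact filter design.

```python
import numpy as np
from scipy.ndimage import convolve

def double_angle(patch, sigma=1.2, size=5, out_size=5):
    """Compute the double angle orientation image z = (Ix + i*Iy)**2 of a
    gray-level patch and crop the border (a 9x9 patch gives a 5x5 result).
    A sketch, not the authors' code."""
    patch = np.asarray(patch, dtype=float)
    r = size // 2
    u = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-u**2 / (2 * sigma**2))
    g /= g.sum()
    dg = -u / sigma**2 * g                          # derivative of Gaussian
    # Separable filtering: differentiate along one axis, smooth along the other.
    Ix = convolve(convolve(patch, dg[np.newaxis, :]), g[:, np.newaxis])
    Iy = convolve(convolve(patch, dg[:, np.newaxis]), g[np.newaxis, :])
    z = (Ix + 1j * Iy) ** 2                         # argument = 2 * local orientation
    b = (patch.shape[0] - out_size) // 2            # remove border effects
    return z[b:b + out_size, b:b + out_size]
```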

[Figure 1 appears here: 10 pairs of training samples (Ix, Iy) shown without noise, the same examples with noise added (PSNR = 10 dB), and the corresponding local orientation in double angle representation.]

Fig. 1. Examples of pairs of training samples for the experiments.

4. REPRESENTATIONS AND INTERPRETATIONS

As input to the CCA we have N pairs of training samples, (x^(n), y^(n)), in the form of one of the representations mentioned before. This section discusses some details regarding these representations and how to interpret the resulting CCA-vectors w_xk for the different representations.

Intensity values, I

Let Ix^(n) be an M × M image corresponding to one training sample. Let ix^(n) denote the same image after reshaping it into an M² × 1 vector, i.e. ix^(n) = vec(Ix^(n)). ix^(n) will be used as input samples to the system:

$$\begin{cases} \mathbf{x}^{(n)} = \mathbf{i}_x^{(n)} = \mathrm{vec}\big(\mathbf{I}_x^{(n)}\big) \\ \mathbf{y}^{(n)} = \mathbf{i}_y^{(n)} = \mathrm{vec}\big(\mathbf{I}_y^{(n)}\big) \end{cases} \qquad (2)$$

In this case the resulting CCA-vectors w_xk can simply be interpreted as linear filters on the gray-level image.
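A small sketch of how the training matrices for this representation might be assembled before running the CCA of section 2 (our own helper, not from the paper):

```python
import numpy as np

def intensity_samples(patches_x, patches_y):
    """Stack x = vec(Ix), y = vec(Iy) for N corresponding M x M patches into
    (N, M^2) data matrices, removing the mean so that eq. (1) uses proper
    covariances. Our own helper, not from the paper."""
    X = np.stack([np.asarray(p, dtype=float).ravel() for p in patches_x])
    Y = np.stack([np.asarray(p, dtype=float).ravel() for p in patches_y])
    return X - X.mean(axis=0), Y - Y.mean(axis=0)
```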


Products between intensity values, I × I

Products between intensity values can be generated with the outer product i_x i_x^T. The input to the learning algorithm is then this outer product reshaped into an M⁴ × 1 vector,

$$\begin{cases} \mathbf{x}^{(n)} = \mathrm{vec}\big(\mathbf{i}_x^{(n)} \mathbf{i}_x^{(n)T}\big) \\ \mathbf{y}^{(n)} = \mathrm{vec}\big(\mathbf{i}_y^{(n)} \mathbf{i}_y^{(n)T}\big) \end{cases} \qquad (3)$$

(In practice the dimensionality is reduced to M²(M² + 1)/2 due to the symmetry property of i_x^(n) i_x^(n)T.) The projection of x onto a resulting CCA-vector can then be written as

$$\mathbf{w}_{xk}^{T}\mathbf{x} = \mathbf{i}_x^{T}\,\mathbf{W}_{xk}\,\mathbf{i}_x \qquad (4)$$

where W_xk is w_xk reshaped into an M² × M² matrix. We can use the eigensystem of W_xk and write

$$\mathbf{w}_{xk}^{T}\mathbf{x} = \mathbf{i}_x^{T}\Big(\sum_j \lambda_{kj}\,\hat{\mathbf{e}}_{kj}\hat{\mathbf{e}}_{kj}^{T}\Big)\mathbf{i}_x = \sum_j \lambda_{kj}\big(\mathbf{i}_x^{T}\hat{\mathbf{e}}_{kj}\big)^{2} \qquad (5)$$

Hence, the projection can be computed as projections of the image i_x onto ê_kj followed by a weighted sum of squares. Note that ê_kj can be viewed as linear filters on the image. If we can remove some of the terms in the sum we can save a lot of computations. It would be tempting to keep the terms corresponding to the largest eigenvalues, but it turns out that |E[λ_kj (ê_kj^T i_x)²]| = |λ_kj| E[(ê_kj^T i_x)²] is a more relevant significance measure, since it measures the average energy in the subspace defined by ê_kj. The significance measure coincides with the eigenvalues in the experiments in this paper, but this is not always the case.
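As an illustration of equations (3)–(5), the sketch below forms the full M⁴-dimensional product sample and evaluates a learned projection as a weighted sum of squared filter responses. The symmetry-based dimensionality reduction is omitted, and ordering the terms by |λ| is a simplification of the significance measure above; names are our own.

```python
import numpy as np

def product_sample(patch):
    """x = vec(ix ix^T) for a gray-level patch (full M^4-dimensional version;
    the symmetric M^2(M^2+1)/2 reduction is omitted for clarity)."""
    ix = np.asarray(patch, dtype=float).ravel()
    return np.outer(ix, ix).ravel()

def projection_via_eigenfilters(wxk, patch, keep=None):
    """Evaluate w_xk^T x = ix^T Wxk ix as a weighted sum of squared linear
    filter responses, eq. (5). 'keep' limits the number of eigenvector terms;
    terms are ordered by |lambda| here (the paper's significance measure
    additionally weights by the signal energy)."""
    ix = np.asarray(patch, dtype=float).ravel()
    M2 = ix.size
    Wxk = np.asarray(wxk).reshape(M2, M2)
    Wxk = 0.5 * (Wxk + Wxk.T)                   # symmetrize
    lam, E = np.linalg.eigh(Wxk)                # Wxk = sum_j lam_j e_j e_j^T
    order = np.argsort(np.abs(lam))[::-1]
    lam, E = lam[order], E[:, order]
    if keep is not None:
        lam, E = lam[:keep], E[:, :keep]
    return np.sum(lam * (E.T @ ix) ** 2)
```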

Local orientation, Z

In this case the double angle representation is used as input samples,

$$\begin{cases} \mathbf{x}^{(n)} = \mathrm{vec}\big(\mathbf{z}_x^{(n)}\big) \\ \mathbf{y}^{(n)} = \mathrm{vec}\big(\mathbf{z}_y^{(n)}\big) \end{cases} \qquad (6)$$

and the resulting CCA-vectors can be interpreted as linear filters on the Z image. Note that in this case both the training samples and the CCA-vectors will be complex valued.

Polynomial model

To reduce dimensionality we model the intensity and local orientation with a second degree polynomial:

$$f(u, v) \sim r_1 + r_2 u + r_3 v + r_4 u^2 + r_5 v^2 + r_6 uv \qquad (7)$$

The parameter vector r_f = (r_1, ..., r_6)^T is found from a weighted least squares problem as r_f = A f, where the matrix A is a function of the polynomial basis functions, see [4]. Note that a projection on r_f can be transformed to a projection on f, as w_rf^* r_f = (A^T w_rf)^* f. r_I, r_I × r_I, and r_Z are then used as representations instead of I, I × I, and Z.
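A sketch of how r_f could be computed for a single patch by a weighted least squares fit of the basis in equation (7). The Gaussian applicability weighting is an assumption on our part; the paper refers to [4] for the exact formulation.

```python
import numpy as np

def poly_params(patch, sigma=None):
    """Fit f(u,v) ~ r1 + r2*u + r3*v + r4*u^2 + r5*v^2 + r6*u*v to a square
    patch by weighted least squares, rf = A f with A = (B^T W B)^-1 B^T W.
    A sketch; the weighting is our assumption (see [4] for the exact method)."""
    patch = np.asarray(patch, dtype=float)
    r = patch.shape[0] // 2
    u, v = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    u, v = u.ravel().astype(float), v.ravel().astype(float)
    B = np.stack([np.ones_like(u), u, v, u**2, v**2, u * v], axis=1)
    w = np.ones_like(u) if sigma is None else np.exp(-(u**2 + v**2) / (2 * sigma**2))
    A = np.linalg.solve(B.T @ (w[:, None] * B), (B * w[:, None]).T)
    return A @ patch.ravel()                    # rf = (r1, ..., r6)^T
```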

[Figures 2–5 appear here.]

Fig. 2. Resulting ρ_k for the different experiments (I, r_I, I × I, r_I × r_I, Z, r_Z; for I × I only ρ_1–ρ_50 are shown).

Fig. 3. Results from the I experiment (the CCA-vectors w_xk, |DFT(w_xk)|, and the projections w_xk^T i_x as a function of orientation, 0–360°).

Fig. 4. Results from the I × I experiment (eigenvalues λ_kl, significance measures |E[λ_kl (e_kl^T i_x)²]|, and projections of noise-free data).

Fig. 5. Results from the I × I experiment (the eigenvectors e_k1, e_k2 and their DFTs).

5. RESULTS

The resulting canonical correlations ρ_k for the different representations are shown in figure 2. In all cases we get about 4–8 large correlations and the remaining ones are significantly smaller. The absolute values are not critical as long as they are fairly high, since they depend on the noise added.

I case

Figure 3 shows the six first CCA-vectors and the projection of the noise-free data (with the same intensity) onto the vectors. Note that there are several curves, corresponding to different corner angles. The first two CCA-vectors are simply orthogonal edge filters and the projections vary as sinusoidal functions with 90° phase difference. The next two are sensitive to the double angle of the orientation.

I × I case

Figure 4 shows the eigenvalues λ_kj and significance measures for the six first CCA-vectors in the I × I experiment. Only two eigenvectors ê_kl were significant, and the figure also shows the projection of the noise-free data onto these vectors. Figure 5 shows the vectors and their corresponding Fourier transforms. If closely examined they can be interpreted as local edge filters!

Z case

Figure 6 shows the five first CCA-vectors and their projections onto noise-free data (argument and magnitude are shown).

They can actually be interpreted as rotational symmetry filters, which are well known to detect complex curvature, see e.g. [5]. Optimal patterns for these filters can be derived [6]. A prototype pattern for each filter is shown in figure 7. All patterns that can be described as rotations or parts of the prototype pattern (e.g. one of the trajectories) are also detected by the corresponding filter.

Polynomial cases

The resulting CCA-vectors for the polynomial experiments, when transformed to linear filters on the corresponding original representation, turned out to be very similar to the results for the original representations. They are therefore not shown in this paper due to lack of space.

6. DECODING CORNER ORIENTATION

The corner orientation angle can be decoded from the projections w_xk^* x depending on the representation:

I case

In this case we can simply take the angle of the vector (w_x1^T i_x, w_x2^T i_x). The left column in figure 8 shows this estimate as a function of the true value for noise-free data (the offset is not important since the system is unaware of the orientation reference value). The estimate is even more invariant to corner angle than the projections alone. However, as said before, the projections are not invariant to intensity and this simple decoding function will fail when the intensity varies.

[Figures 6–8 appear here.]

Fig. 6. Results from the Z experiment (the five first CCA-vectors w_k, and arg(w_k^* z_x) and |w_k^* z_x| for noise-free data as a function of orientation, 0–360°).

Fig. 7. Z result interpreted as rotational symmetries (prototype patterns for w_k1 ∼ e^{−1iφ}, w_k2 ∼ e^{−2iφ}, w_k3 ∼ e^{0iφ}, w_k4 ∼ e^{1iφ}, w_k5 ∼ e^{3iφ}).

Fig. 8. Left column: I case, decoding function angle(w_x2^T i_x, w_x1^T i_x). Right column: Z case, decoding function arg(w_x2^* z_x / w_x1^* z_x). Top row: noise-free data; bottom row: angular error on noisy data (std = 19.5° and 13.8°, respectively).

As an evaluation measure, 1000 noisy images with random corner orientation, angle and intensity were used. The angular error was computed and the mean angular error was removed. Finally the standard deviation of the error was computed. The result is also shown in figure 8.

I × I case

The projections do not behave as nicely as in the previous case and they are therefore more difficult to decode. But the projections are fairly invariant to corner angle and intensity and it should therefore in theory be possible to find a decoding function. This is not investigated further in this paper, though.

Z case

Since the first and second projections are sensitive to the third and fourth power of the orientation respectively, we can decode the projections into a corner orientation angle by taking the argument of the quotient (w_k2^* z_x)/(w_k1^* z_x). The magnitudes of the projections can be used as a certainty measure. The evaluation of this decoding function is shown in the right column in figure 8. Another decoding function could be to use the phase of the fourth projection, w_k4, since this is approximately the identity mapping. But the result would be less accurate, as can be inferred from the projection in figure 6.
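As a concrete illustration of the two decoding functions just described, here is a minimal sketch; the arctan2 argument order, the radian output, and the particular certainty measure are our own choices, and wx1, wx2, wk1, wk2 are assumed to be the learned CCA-vectors from the corresponding experiments.

```python
import numpy as np

def decode_orientation_I(wx1, wx2, ix):
    """I case: the angle of the vector (wx1^T ix, wx2^T ix), in radians,
    up to a fixed (unknown) orientation offset."""
    return np.arctan2(wx2 @ ix, wx1 @ ix)

def decode_orientation_Z(wk1, wk2, zx):
    """Z case: the argument of the quotient (wk2^* zx) / (wk1^* zx).
    The projection magnitudes serve as a certainty measure (one possible choice)."""
    p1 = np.vdot(wk1, zx)                        # wk1^* zx (complex scalar)
    p2 = np.vdot(wk2, zx)
    return np.angle(p2 / p1), min(abs(p1), abs(p2))
```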

7. DISCUSSION

It has often been argued, partly motivated by biological vision systems, that local orientation information should be used to detect more complex features. The results in the I × I and Z experiments further motivate this idea. Note that the I × I and Z representations are closely related, since the double angle Z is calculated from products between image gradient components. It may be possible to use the result from the quadratic model experiments, but the local orientation helps the system to learn a more well-behaved representation which is easier to decode.

It may be possible to use the same technique to learn other features and invariances. One drawback can be the amount of necessary training data. Preliminary experiments show that by using the polynomial model the number of training pairs can be smaller than if we use the image or local orientation directly. This is because the number of training samples needed is generally proportional to the number of input parameters.

8. REFERENCES

[1] M. Borga, Learning Multidimensional Signal Processing, Ph.D. thesis, Linköping University, SE-581 83 Linköping, Sweden, 1998. Dissertation No. 531, ISBN 91-7219-202-X.

[2] H. Knutsson and M. Borga, "Learning Visual Operators from Examples: A New Paradigm in Image Processing," in Proc. of ICIAP'99, invited paper.

[3] M. Borga and H. Knutsson, "Finding Efficient Nonlinear Visual Operators using Canonical Correlation Analysis," in Proc. of SSAB-2000, Halmstad, pp. 13–16.

[4] G. Farnebäck, "Spatial Domain Methods for Orientation and Velocity Estimation," Lic. thesis LiU-Tek-Lic-1999:13, Dept. EE, Linköping University, SE-581 83 Linköping, Sweden, 1999. Thesis No. 755, ISBN 91-7219-441-3.

[5] B. Johansson and G. Granlund, "Fast Selective Detection of Rotational Symmetries using Normalized Inhibition," in Proc. of ECCV-2000, vol. I, pp. 871–887.

[6] B. Johansson, "Backprojection of Some Image Symmetries Based on a Local Orientation Description," Report LiTH-ISY-R-2311, Dept. EE, Linköping University, SE-581 83 Linköping, Sweden, October 2000.