Combined Image Representation using Edgelets and Wavelets

David L. Donoho^a and Xiaoming Huo^b

^a,^b Stanford University, 370 Serra Mall, Stanford, CA 94305-4065

ABSTRACT

Recently many new methods of image representation have been proposed, including wavelets, cosine packets, brushlets, edgelets, and ridgelets. Typically each of these is good for a specific class of features, but not good for others. We propose a method of combining two image representations based on the 2-D wavelet transform and the edgelet transform. The 2-D wavelet transform is good at capturing point singularities, while the newly proposed edgelet transform is good at capturing linear singularities (edges). Both transforms have fast algorithms for digital images. Wavelets and edgelets together form an overcomplete dictionary. To find a sparse representation, we have had success by minimizing the objective function ||y − Φx||₂² + λρ(x), where y is the image, Φ is the matrix formed from all the basis vectors in both image representations, x is the vector of coefficients, and ρ(x) is a convex, l1-like, and separable function of x. Sparsity in the coefficients is achieved by adding the penalty term λρ(x). To develop an efficient algorithm to solve this problem, we utilized insights from interior point methods in constrained convex nonlinear optimization and conjugate gradient solvers from computational linear algebra. These methods combine to give a fast algorithm for our problem. Preliminary numerical experiments give promising results.

Keywords: Sparsity, Sparse Image Decomposition, Image Representation, Matching Pursuit, Basis Pursuit, Convex Optimization, Relaxation, Atomic Decomposition.

1. INTRODUCTION

Many new methods of signal/image representation have been proposed recently, including wavelets, wavelet packets, cosine packets, brushlets, edgelets, and ridgelets. Typically each of these is good for a specific class of features, but not good for others. For example, for 1-D signal representation, wavelets are effective at representing signals made of impulses, where "effective" means that only a few coefficients have large amplitudes; they are not effective at representing oscillatory signals. Conversely, the Fourier basis is good at representing oscillatory signals, but not at representing impulses. For 2-D images, 2-D wavelets are effective at representing point singularities and patches; edgelets are effective at representing linear singularities.1 Different transforms are effective at representing different image features, and an image is usually made up of several features. By combining several transforms, we have more flexibility to achieve a sparser representation. In this paper, we consider images made of point and linear singularities; consequently, a combination of 2-D wavelets and edgelets is considered.

We focus on two schemes of image representation: wavelets and edgelets. Figure 1 depicts both of them. Wavelets2 were intensively developed in the past decade. A 1-D Daubechies wavelet is supported on a finite interval. A typical wavelet basis has two indexes: a scale index j and a location index k. The larger the scale index, the smaller the support. Moreover, since the scale index always takes integer values, the size of the support of a wavelet function decreases exponentially at rate 2: if the scale index increases by 1, the size (measure) of the support of the corresponding wavelet function is divided by 2. The statement that wavelets are effective at representing signals made of impulses is based on the following: [S1] For a fixed scale, there are only a few wavelets whose support intersects the impulse, hence there will be only a few nonzero coefficients at this scale; [S2] For a finite-length (say length N) discrete-time signal, there are no more than log2 N scales.

Further author information: (Send correspondence to X. Huo) Xiaoming Huo: E-mail: [email protected]; David L. Donoho: E-mail: [email protected].
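The counting argument [S1]-[S2] can be illustrated numerically. The following sketch uses a plain Haar transform (a simplification; the paper discusses Daubechies wavelets) to count the nonzero coefficients of an impulse:

```python
import numpy as np

def haar_dwt(s):
    """Orthonormal Haar wavelet transform of a length-2^J signal,
    computed level by level; returns coarse + all detail coefficients."""
    s = np.asarray(s, dtype=float)
    out = []
    while len(s) > 1:
        avg = (s[0::2] + s[1::2]) / np.sqrt(2)   # coarser approximation
        det = (s[0::2] - s[1::2]) / np.sqrt(2)   # detail at this scale
        out.append(det)
        s = avg
    out.append(s)                                # final coarse coefficient
    return np.concatenate(out[::-1])

N = 1024
impulse = np.zeros(N)
impulse[337] = 1.0                               # position 337 is arbitrary
coeffs = haar_dwt(impulse)
n_nonzero = int(np.sum(np.abs(coeffs) > 1e-12))
print(n_nonzero)   # log2(N) + 1 = 11: one coefficient per scale, plus the coarse term
```

Only O(log2 N) of the N coefficients are nonzero, which is the content of [S1]-[S3].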

Figure 1. Wavelet (left) and edgelet (right).

[S3] Combining [S1] and [S2], the total number of significant coefficients is O(log2 N). Compared with N, O(log2 N) is negligible, so we say that wavelets are effective at representing signals made of impulses. 2-D wavelets are tensor products of two 1-D wavelets; for similar reasons, they are effective at representing point singularities. The 2-D wavelet transform has fast algorithms: for an N × N image, the computational complexity of the 2-D wavelet transform is O(N²) flops.

The edgelet1 transform was developed to represent needle-like features in images. Figure 1 (right) depicts some basis functions of the edgelet transform. Edgelets are 2-D objects taking various scales, locations and orientations. An edgelet system is constructed as follows:

[E1] Partition the unit square into dyadic sub-squares.
[E2] On each dyadic subsquare, put equally-spaced vertices on the boundary, starting from a corner. The distance between two neighboring vertices is a dyadic value.
[E3] Any line segment that connects two vertices obtained in [E2] within a subsquare, and does not coincide with part of the boundary of the subsquare, is an edgelet.
[E4] The edgelet system is the collection of all the edgelets described in [E3].

An N × N digital image can be considered as a sample of a continuous 2-D function: the value of each pixel is the value of the function at a point of the largest equally-spaced N × N point array embedded in the unit square. It is natural to require that each vertex mentioned in [E2] be located at a pixel. The cardinality of such an edgelet system is O(N² log2 N). Moreover, for any edgel (a line segment connecting a pair of pixels in an image), it takes at most O(log2 N) edgelets to approximate it within distance 1/N + δ, where δ is a constant. There is a fast algorithm to compute an approximate edgelet transform.1 For an N × N image, the complexity of the fast algorithm is O(N² log2 N).
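As a rough illustration of the O(N² log2 N) cardinality, the following sketch counts vertex pairs over dyadic subsquares with one-pixel vertex spacing. The counting conventions (fixed vertex spacing, the crude exclusion of boundary-aligned segments) are simplifications of my own, not the exact system of Ref. 1:

```python
import math

def count_edgelets(N):
    """Rough count of segments joining boundary vertices of each dyadic
    subsquare of an N x N pixel grid, excluding segments lying along a
    single side. Illustrative counting only, not the exact edgelet system."""
    total = 0
    for j in range(int(math.log2(N)) + 1):
        side = N // (2 ** j)                      # pixels per subsquare side
        if side < 2:
            break
        m = 4 * side                              # boundary vertices, 1-pixel spacing
        pairs = m * (m - 1) // 2                  # all vertex pairs
        same_side = 4 * ((side + 1) * side // 2)  # pairs lying on a common side
        total += (2 ** j) ** 2 * (pairs - same_side)
    return total

for N in (16, 32, 64):
    print(N, count_edgelets(N))
```

Per dyadic scale the count is on the order of 8N², and with about log2 N scales the total grows like N² log2 N, consistent with the cardinality stated above.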
The edgelet basis functions shown in Figure 1 are actually the basis functions of the fast edgelet transform. Combining edgelets and 2-D wavelets, we have an overcomplete system. Because of the over-completeness, for any given image there is no unique way to decompose it. We have developed an algorithm to find the decomposition that minimizes the l1 norm of the coefficients. The minimum l1 norm decomposition possesses some good properties.7 In particular, when the given image can be represented sparsely, the minimum l1 norm solution inclines toward that sparse representation. In numerical experiments, we have used a modification of the l1 norm amenable to optimization by Newton's method. To make sure that the algorithm converges, we use a damped Newton direction. In order to obtain the Newton direction, we use the conjugate gradient (CG) method to solve a system of linear equations. To speed up the software, we use

truncated CG in the first few Newton steps. We can do this because it is not necessary to obtain the exact Newton direction at early iterations.3,4 This paper is organized as follows. Some issues of methodology are described in Section 2, including fast algorithms. Section 3 presents some simulation results. Finally, we conclude in Section 4.

2. METHODOLOGY

2.1. Literature of Sparse Decomposition

There is by now extensive research on finding sparse decompositions. These methods can be roughly classified into three categories: greedy algorithms, global optimization algorithms, and algorithms exploiting special structure. The previously mentioned Basis Pursuit (BP) is a global optimization algorithm. A global optimization algorithm searches for a decomposition minimizing a specified objective function subject to some constraints. Another example of a global optimization algorithm is the method of frames2 (MOF), which finds the solution that minimizes the l2 norm of x (||x||₂). A greedy algorithm is a stepwise algorithm: at every step, it takes one or several elements from the dictionary into a linear superposition approximating the target image. A well-known example of a greedy algorithm is Matching Pursuit5 (MP). The idea of MP is that at every step, one adds the atom most correlated with the residual at that time. Some algorithms utilize the special structure of a dictionary. For example, best orthogonal basis6 (BOB) searches for a minimum additive entropy orthogonal basis in a dictionary that has an intrinsic binary tree structure; the dictionary can be cosine packets or wavelet packets. Since a greedy algorithm is stepwise, it runs the risk of being trapped in a bad sequence of steps; some examples are given in Refs. 7, 8. Numerical experiments, together with some recent advances in theoretical study, indicate that a global optimization algorithm like BP is more stable in recovering the original sparse decomposition, if it exists. But BP is a computationally intense method.
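To make the greedy strategy concrete, here is a minimal sketch of Matching Pursuit: at each step, add the atom most correlated with the current residual. The dictionary and data are made up for illustration, and the atoms are assumed unit-norm:

```python
import numpy as np

def matching_pursuit(y, Phi, n_steps):
    """Greedy MP sketch: repeatedly pick the dictionary atom most
    correlated with the residual (columns of Phi assumed unit-norm)."""
    r = y.astype(float).copy()
    x = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        c = Phi.T @ r                  # correlations with the residual
        k = int(np.argmax(np.abs(c)))  # best-matching atom
        x[k] += c[k]
        r -= c[k] * Phi[:, k]          # update the residual
    return x, r

# Usage: an overcomplete dictionary made of two orthonormal bases.
rng = np.random.default_rng(0)
Phi = np.linalg.qr(rng.standard_normal((8, 8)))[0]
Phi = np.hstack([Phi, np.eye(8)])       # 8 x 16 overcomplete dictionary
y = 3.0 * Phi[:, 2] + 0.5 * Phi[:, 10]  # 2-sparse synthetic image vector
x, r = matching_pursuit(y, Phi, 10)
print(np.linalg.norm(r))
```

Each step strictly reduces the residual energy, but nothing prevents an early greedy choice from being a bad one, which is the trap discussed above.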

2.2. Formulation

We have y = Φx, where y denotes an image, Φ is a matrix with each column being an atom in a dictionary, and x is the coefficient vector, x = (x1, x2, …, xN)ᵀ. We consider the following optimization problem:

(O1)    min_x ρ(x),    subject to y = Φx,

where ρ is a convex, separable function: ρ(x) = Σ_{i=1}^N ρ(x_i), with ρ a convex 1-D function. Inspired by Lagrange multipliers, in order to solve (O1), we consider the following problem:

(O2)    min_x ||y − Φx||₂² + λρ(x),

where λ is a constant. When λ goes to 0, the solution to (O2) hopefully converges to the solution to (O1). This idea is also illustrated in Figure 2. The horizontal axis is the distortion measure ||y − Φx||₂². The vertical axis is a measure of sparsity, in our case ρ(x). Without confusion, we call this plane a distortion-sparsity (D-S) plane. If there exists x such that (u, v) = (||y − Φx||₂², ρ(x)), then we say the point (u, v) on the D-S plane is feasible. Since ||y − Φx||₂² is a quadratic form and ρ(x) is convex, all the feasible points on the D-S plane form a convex set; we call this convex set the feasible area. For a fixed value of distortion, there is a minimum achievable value for ρ(x). The points so obtained form a curve, which we call the distortion-sparsity (D-S) curve; it is the boundary of the feasible area. We choose ρ as a convex, l1-like, and C² function. We choose ρ to be convex so that the optimization problem has a global solution. We choose ρ to be an l1-like function because [1] as in BP,7 an l1-like penalty function tends to preserve sparsity; [2] the l1 function is the best convexification of l0, in the sense that among all functions on [−1, 1], the pointwise maximum of all the convex functions that are upper-bounded by l0 is the l1 function.
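In code, the objective of (O2) is straightforward to evaluate. This sketch uses the specific smoothed l1 penalty introduced in Section 2.2 below; the tiny dictionary Φ and the parameter values are illustrative stand-ins, not the paper's wavelet+edgelet system:

```python
import numpy as np

def rho(x, alpha=10.0):
    """Smoothed l1 penalty (the l1-like function chosen in Sec. 2.2):
    rho(x) = |x| + exp(-alpha*|x|)/alpha - 1/alpha."""
    ax = np.abs(x)
    return ax + np.exp(-alpha * ax) / alpha - 1.0 / alpha

def objective(x, y, Phi, lam, alpha=10.0):
    """g(x) = ||y - Phi x||_2^2 + lam * sum_i rho(x_i), i.e. problem (O2)."""
    r = y - Phi @ x
    return float(r @ r + lam * rho(x, alpha).sum())

# Tiny illustrative dictionary: two orthonormal atoms plus one extra atom.
Phi = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])
y = np.array([2.0, 0.0])
print(objective(np.zeros(3), y, Phi, lam=0.1))   # ||y||^2 = 4.0 at x = 0
```

Sweeping λ traces out points near the D-S curve: a large λ favors the sparsity axis, a small λ favors the distortion axis.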

Figure 2. The Distortion-Sparsity (D-S) Plane. [Horizontal axis: distortion ||y − Σ_{i=1,…,N} c_i φ_i||₂²; vertical axis: a convex measure of sparsity ρ(c); the region shown is the feasible area.]

We choose ρ to be C² so that (O2) is tractable by simple Newton methods. Our choice of ρ is

ρ(x; α) = |x| + (1/α) e^{−α|x|} − 1/α,    for α > 0,

where α is a controlling parameter. When α → +∞, ρ(x; α) → |x|. Note

∂ρ(x; α)/∂x = 1 − e^{−αx} for x ≥ 0,    and    −1 + e^{αx} for x ≤ 0,

and ∂²ρ(x; α)/∂x² = α e^{−α|x|}. It is easy to verify that ρ is C². For fixed λ, we have

Proposition 2.1. If x̂ is the solution to (O2), then we have

||Φᵀ(y − Φx̂)||∞ ≤ λ/2.    (1)

A way to interpret the above result is that for the residual r = y − Φx̂, Φᵀr is its analysis (forward) transform, and the maximum amplitude of the analysis transform of the residual, ||Φᵀr||∞, is upper-bounded by λ/2. Hence if λ is small and if Φᵀ is norm-preserving (each column of Φ has almost the same l2 norm), the deviation of the reconstruction Φx̂ from the image y is upper-bounded by a small quantity (namely λ/2) in each direction given by the columns of Φ. So if we choose a small λ, the corresponding reconstruction cannot be much different from the original image. In the remainder of this paper, λ is always pre-fixed and we seek a solution to (O2).
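Proposition 2.1 rests on the fact that |∂ρ/∂x| ≤ 1 everywhere (this is what the appendix proof uses). A quick numerical check of that bound for the chosen ρ:

```python
import numpy as np

def rho_prime(x, alpha=10.0):
    """Derivative of rho(x; alpha): sign(x) * (1 - exp(-alpha*|x|)),
    which matches the two-branch formula above and is continuous at 0."""
    return np.sign(x) * (1.0 - np.exp(-alpha * np.abs(x)))

grid = np.linspace(-100.0, 100.0, 200001)
max_abs = float(np.max(np.abs(rho_prime(grid))))
print(max_abs)   # never exceeds 1; approaches 1 as |x| grows
```

Combined with the first-order condition of (O2), this |ρ′| ≤ 1 bound gives (1) directly.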

2.3. Newton Methods

For fixed λ and α, we use a Newton method to solve (O2). A Newton method starts from an initial guess x^(0). At every step, it generates a new vector closer to the true solution:

x^(i+1) = x^(i) + γ_i n(x^(i)),    i = 0, 1, …,

where γ_i is a damping parameter (γ_i is chosen by line search to make sure that the value of the objective function is reduced), and n(x^(i)) is the Newton direction, a function of the current guess x^(i). Let g(x) denote the objective function in (O2): g(x) = ||y − Φx||₂² + λρ(x). The gradient and Hessian of g(x) (denoted by ∇g(x) and Hg(x), respectively) are

∇g(x) = −2Φᵀy + 2ΦᵀΦx + λ (ρ′(x1), …, ρ′(xn))ᵀ,    (2)

Hg(x) = 2ΦᵀΦ + λ diag(ρ″(x1), …, ρ″(xn)).    (3)

The Newton direction n(x^(i)) satisfies

[Hg(x^(i))] · n(x^(i)) = −∇g(x^(i)).    (4)
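The damped Newton iteration can be sketched on a small dense instance. The sizes and parameters below are illustrative; the paper's solver uses CG and fast transforms rather than forming Φ and the Hessian explicitly:

```python
import numpy as np

def rho(x, a):
    return np.abs(x) + np.exp(-a * np.abs(x)) / a - 1.0 / a

def g(x, y, Phi, lam, a):
    r = y - Phi @ x
    return r @ r + lam * rho(x, a).sum()

def damped_newton(y, Phi, lam=0.05, a=10.0, iters=40):
    """Damped Newton for (O2): the direction solves eq. (4); binary
    backtracking keeps the objective from increasing."""
    x = np.zeros(Phi.shape[1])
    G2 = 2.0 * Phi.T @ Phi
    for _ in range(iters):
        grad = -2.0 * Phi.T @ (y - Phi @ x) \
               + lam * np.sign(x) * (1.0 - np.exp(-a * np.abs(x)))  # eq. (2)
        H = G2 + lam * np.diag(a * np.exp(-a * np.abs(x)))          # eq. (3)
        d = np.linalg.solve(H, -grad)                               # eq. (4)
        t, g0 = 1.0, g(x, y, Phi, lam, a)
        while g(x + t * d, y, Phi, lam, a) > g0 and t > 1e-10:
            t *= 0.5                                                # damping
        x = x + t * d
    return x

rng = np.random.default_rng(1)
Phi = rng.standard_normal((20, 30)) / np.sqrt(20)   # stand-in dictionary
x_true = np.zeros(30); x_true[0], x_true[1] = 4.0, -3.0
y = Phi @ x_true                                    # 2-sparse synthetic data
x_hat = damped_newton(y, Phi)
```

The Hessian in (3) is positive definite (2ΦᵀΦ is positive semidefinite and the diagonal term is strictly positive), so the Newton direction is always a descent direction.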

2.4. Conjugate Gradient Solver and Fast Algorithms

We use an iterative method to solve (4), so we only need a fast algorithm for multiplying a vector by the Hessian matrix Hg(x). From (3), it suffices to have fast algorithms for multiplying a vector by the matrices Φ and Φᵀ. In our case, Φ = [T1, T2], where T1 and T2 are the transform matrices of 2-D wavelets and edgelets. In the discrete case, the wavelet transform T1, the edgelet transform T2, and their adjoints T1ᵀ and T2ᵀ all have fast algorithms. Hence we have a fast algorithm to multiply a vector by Φ and Φᵀ, and hence by the Hessian. The fast algorithm for the 2-D wavelet transform is well known. The fast algorithm for the edgelet transform is based on the idea of the Fourier slice theorem: to perform the fast edgelet transform, we first apply a 2-D Fourier transform, then a discrete Cartesian-to-polar coordinate transform, then an inverse 1-D Fourier transform along the radial coordinate. The FFT provides a fast algorithm for both the 2-D Fourier transform and the inverse 1-D Fourier transform, and there is also a fast algorithm for the discrete Cartesian-to-polar coordinate transform using the idea of the fast fractional Fourier transform.9,10
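A matrix-free CG solver for (4) needs only the Hessian-vector product v ↦ 2Φᵀ(Φv) + λ diag(ρ″(x)) v, i.e. one multiply by Φ and one by Φᵀ per iteration. The sketch below uses a dense random matrix as a stand-in for [T1, T2]; in the paper's setting the fast wavelet/edgelet transforms would play that role:

```python
import numpy as np

def cg(apply_H, b, iters=200, tol=1e-10):
    """Conjugate gradient for H d = b, given only the product v -> H v."""
    d = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = apply_H(p)
        step = rs / (p @ Hp)
        d += step * p
        r -= step * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

rng = np.random.default_rng(2)
Phi = rng.standard_normal((16, 24)) / 4.0     # dense stand-in for [T1, T2]
x = 0.1 * rng.standard_normal(24)             # current Newton iterate
lam, a = 1.0, 5.0
w = lam * a * np.exp(-a * np.abs(x))          # lam * rho''(x_i) terms, eq. (3)
apply_H = lambda v: 2.0 * Phi.T @ (Phi @ v) + w * v
b = rng.standard_normal(24)                   # stands in for -grad g(x)
d = cg(apply_H, b)
print(np.linalg.norm(apply_H(d) - b))         # small residual
```

The Hessian itself is never formed; the entire solve costs a handful of forward and adjoint transforms, which is what makes the method practical for large images.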

2.5. Homotopy

The method described in Section 2.3 is for a fixed α. We observe that when α is small, it takes a small number of conjugate gradient (CG) iterations to solve the linear system in (4); when α is large, it takes a large number of CG iterations. This motivates us to apply the idea of homotopy. We start with a small α0. After using the Newton method to find the minimizer x0 of (O2), we choose a larger α1 > α0, and again use the Newton method to find the minimizer x1 of (O2), but this time we take x0 as the initial guess. We continue with a sequence α2, α3, … and a sequence x2, x3, …. We terminate the algorithm when the difference between two subsequent solutions xi and xi+1 is small enough. It is not hard to see that the sequence x0, x1, x2, x3, … converges to the solution to (O1) as αi → +∞.
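The continuation in α can already be seen in a scalar instance: warm-started Newton solves for increasing α drive the minimizer of (O2) toward the soft-threshold value y − λ/2. This is a hypothetical 1-D example (y > λ/2 > 0 assumed for brevity), not the paper's full image-scale procedure:

```python
import math

def solve_scalar(y, lam, a, x0, iters=50):
    """Scalar Newton for min_x (y - x)^2 + lam * rho(x; a)."""
    x = x0
    for _ in range(iters):
        g1 = -2.0 * (y - x) + lam * math.copysign(1.0 - math.exp(-a * abs(x)), x)
        g2 = 2.0 + lam * a * math.exp(-a * abs(x))   # always positive
        x -= g1 / g2
    return x

# Homotopy: increase a, warm-starting each solve at the previous minimizer.
y, lam, x = 1.0, 0.4, 0.0
for a in (1, 4, 16, 64, 256, 1024):
    x = solve_scalar(y, lam, a, x)
print(x)   # approaches the soft-threshold value y - lam/2 = 0.8
```

Each warm start lands very close to the next minimizer, so the later (stiffer, large-α) solves need only a few cheap iterations, which is exactly the payoff of the homotopy.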

2.6. Numerical Issues

Damped Newton direction. To make sure that the Newton method converges, we implement a binary backtracking scheme. It guarantees that at every iteration, the value of the objective function is reduced.

Truncated CG. In order to save computing time, in step [2] of Algorithm A, at some early Newton iterations we terminate the CG solver before it reaches high precision. We can do this because an inexact Newton direction does not hurt the precision of the final solution found by the Newton method.

The algorithm is implemented in Matlab. The fast algorithms for the wavelet transform and edgelet transform are implemented in C and called by Matlab through a Matlab MEX C interface.

Figure 3. Four test images: Car (Lichtenstein); Pentagon; Overlapped singularities; Separate singularities.

3. SIMULATIONS

Images. We test our algorithm on four images:

[1] a painting "In The Car", 1963, by Roy Lichtenstein;
[2] a pentagon;
[3] a point singularity and an overlapped line singularity;
[4] a point singularity and a separate line singularity.

They were chosen because [1] they have the features that we want to test our dictionary on. Our dictionary is made of 2-D wavelets and edgelets; the corresponding features are point singularities and linear singularities, and we believe these features will illustrate the key components of our approach. [2] They are simple enough to examine whether our basic assumption (that different transforms will automatically represent the corresponding features in a sparse image decomposition) is true. Figure 3 shows the four images.

Decomposition. Figure 4 shows the decompositions of the four images. Each row is for one image. On every row, the first square sub-image from the left is the original image; the second is the part of the image represented by wavelets (in other words, the inverse 2-D wavelet transform of the wavelet coefficients in the final solution); the third is the part of the image represented by edgelets; the last (rightmost) one is the superposition of the second and the third. On the first row, for the painting, we observe that the part associated with the wavelet transform captures the point features and patch features in the painting quite well, while the part associated with the edgelet transform captures many linear features. The same pattern is repeated in the third and fourth rows, but is less obvious in the second row. We speculate that when the size of the image is large, this phenomenon will become more significant.

Decay of coefficients. Figure 5 shows the decay of the amplitudes of the coefficients for three methods and four images. The three methods are [1] a wavelet-only dictionary, which is equivalent to the wavelet transform; [2] an edgelet-only dictionary; [3] a dictionary containing both wavelets and edgelets.

Figure 4. Decompositions. [Each row: Original; Wavelet + Edgelet = both.]

Figure 5. Decaying amplitudes of coefficients (vertical axes: log10(|c|(i)); horizontal axes: i). Dashed lines "- - -" are for the DCT-only dictionary; solid lines are for our combined wavelets+edgelets dictionary; dash-dotted lines are for the wavelet-only dictionary. From left to right, the images are: (1) painting by Roy Lichtenstein, (2) pentagon, (3) overlapped singularities and (4) separate singularities.

Figure 6. A global algorithm vs. Matching Pursuit. The solid lines are for the Matching Pursuit algorithm; the dashed lines are for the global algorithm (minimizing the l1 norm). Both use the dictionary made of both wavelets and edgelets. The four images are, from left to right: (1) painting by Roy Lichtenstein, (2) pentagon, (3) overlapped singularities and (4) separate singularities.

The horizontal axes are the rank indexes of the amplitudes of the coefficients, sorted from largest to smallest; the vertical axes are the logarithms of the amplitudes. We observe that for these four images, the combined strategy gives the sparsest decomposition, except for the second image. The set of DCT coefficients is the least sparse.

Comparison with MP. Figure 6 shows the comparison of our method with the Matching Pursuit algorithm. For all four images, we observe that the global algorithm tends to give sparser representations. If the singularities are separated, the two methods do not differ much. When an image contains overlapping singularities, our global algorithm tends to provide sparser decompositions, as in the image of the painting, the image of the pentagon, and the image containing overlapped singularities.

4. CONCLUSION

We have explored two ideas: [1] achieving a sparse image representation by combining several image representation schemes, and [2] using the minimum l1 norm solution to find a sparse decomposition in an overcomplete system. We modified the l1 norm slightly to enable a simpler algorithm. The current experimental results are promising: we have obtained sparser image representations than some current well-known methods.

APPENDIX A. PROOFS

A.1. Proof of Proposition 2.1

When x̂ is the solution to (O2), the first-order condition is

0 = λ∇ρ(x̂) + 2ΦᵀΦx̂ − 2Φᵀy,

where ∇ρ(x̂) is the gradient vector of ρ(x̂). Hence

(λ/2) ∇ρ(x̂) = Φᵀ(y − Φx̂).    (5)

Recall ρ(x) = Σ_{i=1}^N ρ(x_i). It is easy to verify that |ρ′(x_i)| ≤ 1 for all x_i. Hence for the left-hand side of (5), we have ||(λ/2) ∇ρ(x̂)||∞ ≤ λ/2, and (1) follows. □

ACKNOWLEDGMENTS

The authors want to thank Stephen Boyd and Michael Saunders for helpful discussions. Michael Saunders helped to develop the software. This project was partially supported by NSF ECS-9707111.

REFERENCES

1. D. L. Donoho, "Fast edgelet transforms and applications." Manuscript, September 1998.
2. I. Daubechies, Ten Lectures on Wavelets, SIAM, 1992.
3. R. S. Dembo, S. C. Eisenstat, and T. Steihaug, "Inexact Newton methods," SIAM J. Numer. Anal. 19(2), pp. 400-408, 1982.
4. R. S. Dembo and T. Steihaug, "Truncated Newton algorithms for large-scale unconstrained optimization," Math. Programming 26(2), pp. 190-212, 1983.
5. S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Transactions on Signal Processing 41, pp. 3397-3415, December 1993.
6. M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, Wellesley, 1994.
7. S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal of Scientific Computing 20(1), pp. 33-61, 1999 (electronic).
8. R. DeVore and V. Temlyakov, "Some remarks on greedy algorithms," Advances in Computational Mathematics 5(2-3), pp. 173-187, 1996.
9. D. L. Donoho, Digital Ridgelet Transform via Digital Polar Coordinate Transform. Stanford University, 1998.
10. D. H. Bailey and P. N. Swarztrauber, "The fractional Fourier transform and applications," SIAM Rev. 33(3), pp. 389-404, 1991.
11. G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 3rd ed., 1996.
12. S. Jaggi, W. Karl, S. Mallat, and A. Willsky, "High-resolution pursuit for feature extraction," Applied and Computational Harmonic Analysis 5, pp. 428-449, October 1998.
13. D. G. Luenberger, Introduction to Linear and Nonlinear Programming, Addison-Wesley, 1973.