Globally Convergent Edge-Preserving Regularized Reconstruction: An Application to Limited-Angle Tomography

Alexander H. Delaney and Yoram Bresler, Senior Member, IEEE

Abstract—We introduce a generalization of a recently proposed deterministic relaxation algorithm for edge-preserving regularization in linear inverse problems. This recently proposed algorithm transforms the original (possibly nonconvex) optimization problem into a sequence of quadratic optimization problems, and has been shown to converge under certain conditions when the original cost functional being minimized is strictly convex. We prove that our more general algorithm is globally convergent (i.e., converges to a local minimum from any initialization) under less restrictive conditions, even when the original cost functional is nonconvex. We apply this algorithm to tomographic reconstruction from limited-angle data by formulating the problem as one of regularized least-squares optimization. The results demonstrate that the constraint of piecewise smoothness, applied through the use of edge-preserving regularization, can provide excellent limited-angle tomographic reconstructions. Two edge-preserving regularizers—one convex, the other nonconvex—are used in numerous simulations to demonstrate the effectiveness of the algorithm under various limited-angle scenarios, and to explore how factors such as choice of error norm, angular sampling rate, and amount of noise affect reconstruction quality and algorithm performance. These simulation results show that for this application, the nonconvex regularizer produces consistently superior results.

Index Terms—Bayesian reconstruction, electron microscopy, inverse problems, nonlinear regularization, nonconvex optimization.

I. INTRODUCTION

IN MANY applications of computerized tomography, it is not possible to collect projection data over a complete angular range of 180°. Examples include electron microscopy, astronomy, geophysical exploration, nondestructive evaluation, and many others [1]. In such limited-angle cases, applying a standard full-data reconstruction algorithm, such as filtered backprojection (FBP) [2], results in poor reconstructions with severe artifacts. Because of the importance of the limited-angle problem, many specialized algorithms have been introduced over the past two decades. Approaches have included orthogonal expansion [3], [4], sinogram restoration [5], [6], maximum entropy techniques [7]–[9], Bayesian methods [10],

Manuscript received September 23, 1994; revised May 7, 1997. This work was supported in part by National Science Foundation Grant MIP 91-57377. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Ken D. Sauer. A. H. Delaney is a consultant, Sunnyvale, CA 94087 USA (e-mail: [email protected]). Y. Bresler is with the Coordinated Science Laboratory and the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: [email protected]). Publisher Item Identifier S 1057-7149(98)01008-2.

projection onto convex sets (POCS) [11]–[13], Fourier techniques [14]–[18], algebraic reconstruction techniques (ART) [19], iterative reprojection methods [20], and many others. A thorough list of earlier references can be found in [21], and comparisons of some more recent algorithms and further references can be found in [22] and [23]. In general, iterative algorithms that incorporate a priori knowledge about the solution have displayed the best results. This is not surprising because the limited-angle-tomography problem is inherently ill posed [24], and some type of a priori information must be introduced to regularize the problem. To date, however, most algorithms have produced disappointing results from limited-angle data. In our view, these poor results can be attributed mainly to the inadequate or inappropriate constraints that have been used to regularize the problem. Because the limited-angle problem is extremely underdetermined, these constraints will have a great effect on the quality of reconstruction. Typical constraints, such as known spatial support, maximum energy, minimum and maximum amplitudes, and maximum entropy, although widely applicable, are, unfortunately, not restrictive enough to resolve the indeterminacy inherent in the limited-angle problem. On the other hand, constraints such as global smoothness and closeness to a reference image can be quite effective for certain problems, but have limited applicability. In this paper, we show that the constraint of piecewise smoothness, which is applicable to most real images, can produce excellent reconstructions from limited-angle data. We formulate the tomography problem as a regularized weighted least-squares optimization problem, and propose a new family of regularization functionals that are meant to apply a constraint of piecewise smoothness on the image. These functionals typically penalize a (possibly nonconvex) function of the differences between neighboring pixels. Although edge-preserving regularization has been used successfully in applications other than tomography (e.g., see [25]–[27]), as well as in noisy, full-angle tomography such as positron emission tomography (PET) and SPECT [28]–[30], amazingly, it has not been applied to the limited-angle problem. We show that edge-preserving regularization does more than simply produce reconstructions with sharp edges—it reduces or eliminates many of the artifacts that are seen in typical limited-angle reconstructions. To perform the minimization of such edge-preserving regularized cost functionals, we present (see also [31]) a deterministic


relaxation algorithm, which is a generalization of one recently proposed by Charbonnier et al. [32], [33]. This algorithm converts the original (possibly nonconvex) optimization problem into a sequence of convex (or even quadratic) optimization problems, and has produced good results in our simulations. As usual, convergence is a key theoretical issue for these iterative algorithms. Various types of convergence results are possible, including the following:
1) local convergence—of the iterates to a local minimum of the cost function, provided they are initialized close enough to the local minimum;
2) convergence of the cost function (but not necessarily the iterates) from any initialization;
3) global convergence, defined as convergence of the iterates to a local minimum of the cost function from any initialization;
4) global convergence to a global minimum, from any initialization.
The strongest mode of convergence, 4, is, of course, the most desirable, but it is usually impossible for multimodal functions. In such cases, the best one can hope for is 3, global convergence. Aubert et al. [34] have performed a theoretical study of Charbonnier's algorithm, and have proved convergence of the cost function, 2, but have not proven that the algorithm is globally convergent when the original cost function is not strictly convex. In addition, their convergence results do not apply to underdetermined problems, such as the limited-angle-tomography problem. In this paper, we prove that our more general algorithm is globally convergent even when the cost function is nonconvex and the original problem is underdetermined, provided the minima are isolated. When the minima are not isolated (i.e., the cost function has flat regions), we prove several somewhat weaker but still very useful results, including so-called global subsequential convergence. We also prove that, regardless of the existence of flat regions in the cost, every isolated local minimum has a "basin of attraction," such that iterates initialized in this basin will converge to the local minimum. This can be valuable if heuristics or prior information are used for initialization near the global minimum. For the case of strictly convex regularization, we show that in underdetermined problems the cost function need not be strictly convex, but can be made so by appropriate choice of the edge-preserving penalty. Finally, by describing how the algorithm can be viewed as a form of adaptive regularization, we motivate the use of a heuristic enhancement, which experimentally improves reconstruction quality and initial convergence for nonconvex regularizers. The algorithm and the various theoretical results are applicable to a broad class of inverse problems. The limited-angle tomography problem presents a challenging test case for the algorithm, and is also of independent interest owing to its importance. Numerous simulation results performed under various limited-angle scenarios are used to compare the effectiveness of the proposed regularization functionals and to explore some of the factors that affect


algorithm performance and reconstruction quality. Although convex regularizers have been recommended on various theoretical grounds (e.g., [30]), our numerical results show that in this application a nonconvex regularizer produces consistently superior results to the particular convex penalty used here. This motivates our attention to the nonconvex case.

This paper is organized as follows. In Section II, a series-expansion approach is used to formulate the tomography problem in terms of a regularized least-squares optimization problem. In Section III, we describe the deterministic relaxation algorithm that is used to perform the optimization, and provide our convergence results. Extensive numerical results are presented in Section IV, and conclusions are presented in Section V.

II. REGULARIZED LEAST-SQUARES RECONSTRUCTION

Our goal is to reconstruct an image that is an estimate of the underlying object from a set of noisy, sampled projections of it collected at a finite number of view angles. Using a series-expansion approach [35], we represent the estimate image as a linear combination of shifted versions of a generating function: (1) where

is a vector of image coefficients, with . For example, a common choice for has been the square pixel. The set of sampled projections is then modeled as , where the projection operator is derived according to the particular projection geometry being used (e.g., see [36] and [37]). To find , we seek a solution to the following regularized weighted least-squares optimization problem: (2)

where and are the th regularization functional and regularization parameter, respectively, and is a weighted norm. The first term in (2) (the residual error) controls how "faithful" the reconstruction is to the data, whereas the second term (the regularization) controls how well the reconstruction matches our a priori knowledge of the solution, with controlling the relative contribution of the two. Note that for Gaussian measurement noise, the solution to (2) corresponds to a maximum a posteriori (MAP) estimate when is the standard weighted Euclidean norm with the inverse noise covariance matrix as the weighting matrix,1 and the regularization functional is set equal to an energy term determined by the choice of probability distribution used for the prior image model [39]. Because the limited-angle-tomography problem is severely underdetermined, the regularization functionals used in (2) will greatly affect the quality of the resulting reconstruction.

1 By appropriately choosing the weighting matrix, a close approximation to the MAP estimate can be found for Poisson-distributed data [38].
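For illustration, the following Python sketch evaluates a cost of the form (2) for a generic linear model. The projection matrix P, the weighting W, and the first-difference roughness penalty used here are placeholders chosen only for the example, not the operators used in the paper.

    import numpy as np

    def regularized_wls_cost(c, P, g, W, penalties, lambdas):
        # Evaluate ||g - P c||_W^2 + sum_i lambda_i * R_i(c), cf. (2).
        r = g - P @ c                       # residual in the projection domain
        data_term = r @ (W @ r)             # weighted squared error norm
        return data_term + sum(lam * R(c) for lam, R in zip(lambdas, penalties))

    # Toy underdetermined example: 5 measurements of 8 unknowns, identity weighting,
    # and a quadratic first-difference roughness penalty standing in for the regularizer.
    rng = np.random.default_rng(0)
    P = rng.standard_normal((5, 8))
    c_true = np.ones(8)
    g = P @ c_true
    W = np.eye(5)
    rough = lambda c: np.sum(np.diff(c) ** 2)
    print(regularized_wls_cost(c_true + 0.1 * rng.standard_normal(8), P, g, W, [rough], [0.5]))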


Ideally, we would like to find regularization functionals that lead to a solution that is stable, close to the true solution, and the minimum of a tractable (preferably convex) optimization problem. Unfortunately, it is usually difficult to meet all three of these conditions. For example, consider the standard regularization functional , where is a highpass filter. Although this regularization functional leads to a quadratic optimization problem with a stable solution, it results in a reconstruction that is globally smooth. Since most real images are not globally smooth, the resulting reconstruction may not be close to the true solution. In this paper, we use two different regularization functionals that are meant to enforce constraints that are consistent with many real images, namely positivity and piecewise smoothness. (Note that the constraint of spatial boundedness is implicitly enforced through (1) by choosing the number of coefficients to be finite and the generating function to have finite support.) The first regularization functional penalizes negative values of the estimate image and is zero wherever the estimate is nonnegative, where the estimate image is found from the coefficients by (1). Although this functional is not quadratic in the coefficients, it is piecewise quadratic, continuously differentiable, and convex. The second regularization functional penalizes a function of the gradient of the estimate image, and is defined by (3), where the argument is the magnitude of the gradient of the image at each location, and the potential is a positive function. (Note that the gradient of the image can be found from the coefficients by (1) in the obvious way as long as the generating function is differentiable.) Using these two regularization functionals in (2), we arrive at the actual cost function that we seek to minimize, (4). Although the first regularization functional, which enforces a constraint of positivity, improves reconstruction quality slightly, its effect compared to the second regularization functional (3) is small. Hence, we concentrate our attention on (3). Regularization functionals similar to (3) have been used by many researchers in a variety of ill-posed problems (e.g., [25]–[30]) and are often called edge-preserving regularizers. The key to this type of regularization is the function that determines how the gradient of the image is penalized. For example, choosing the quadratic, which corresponds to standard quadratic regularization, will cause large gradients to be heavily penalized, and thus deter the formation of sharp edges. Several authors have set this function equal to the Huber function (e.g., [40]), which is quadratic near zero, but is linear beyond a threshold point. Although the Huber function is convex, it does not penalize large gradients as severely as the purely quadratic choice. Geman and McClure [41] have suggested using

, where is a parameter to be selected. This choice is nonconvex, and penalizes large gradients even less than the Huber function. Other convex and nonconvex choices have been suggested by Green [28], Lange [29], Bouman [30], Geman and Reynolds [42], Hebert and Leahy [43], Blake and Zisserman [44], and others. For images that consist of piecewise-smooth regions with sharp boundaries, we would expect that better results could be obtained by using a nonconvex function, as opposed to a convex one, since sharp transitions could be penalized less severely. Unfortunately, using a nonconvex function leads to a nonconvex and usually multimodal optimization problem, which is difficult to solve. For example, standard optimization algorithms such as modified Gauss–Newton or conjugate gradients (CG's) can fail to converge even to local minima, and steepest descent can be extremely slow and exhibit severe numerical difficulties. Stochastic optimization algorithms, such as simulated annealing [45], require an extreme amount of computation to find a global minimum. Alternatively, deterministic relaxation algorithms (e.g., graduated nonconvexity [44], [46]), while not guaranteed to find a global minimum, require a reasonable amount of computation and have been used successfully in several applications. In the next section, we describe a deterministic relaxation algorithm that we use to (approximately) minimize (4).

III. DETERMINISTIC RELAXATION

A. Previous Work

Based on the work of Geman and Reynolds [42], Charbonnier et al. [32], [33] recently proposed a deterministic relaxation algorithm called ARTUR to minimize the cost functional (5), which is similar to the cost functional (4) that we are interested in minimizing. Their algorithm is based on transforming (5) into a half-quadratic dual functional by introducing an auxiliary variable and then alternately minimizing this dual functional, first with respect to the image and then with respect to the auxiliary variable. Given an initial estimate image, ARTUR consists of the steps (6) and (7), which are repeated until a stopping criterion is met. Aubert et al. [34] have performed a theoretical analysis of this algorithm and have proven that if the potential is convex and nondecreasing, and if the system matrix has full column rank, then the sequence of iterates converges to the unique minimum of the cost. Unfortunately, they provided no proof that the iterates converge, or even that a subsequence of those iterates converges to a local minimum, when the potential is


nonconvex or when the system matrix is rank deficient. Hence, even for convex potentials, their convergence result does not apply to the important class of underdetermined problems in which the system matrix does not have full column rank. In particular, their result does not apply to the limited-angle-tomography problem, which is a prime example of this type.

B. General Algorithm

We now present a generalization of the cost functional (5), and state an algorithm for its minimization that is a generalization of ARTUR. We prove that this algorithm is globally convergent (in the sense defined in the Introduction), even when it is applied to (4) or (5) with a nonconvex potential and a rank-deficient system matrix, provided the minima are isolated. Somewhat weaker but still very useful results are proved for the case when the minima are not isolated.

Consider the cost functional (8), where Q(f) and Vm(f), m = 0, ..., M − 1, are continuously differentiable, convex functionals, and the potential is a continuously twice differentiable strictly concave function. We will show that cost functionals of the foregoing form are minimized by the following algorithm.

Algorithm: Let the constraint set be the space of vectors whose norm is bounded by a constant M, and let J0 be a functional defined by (9). We assume that Q and Vm are such that for any fixed e, J0(·, e) is strictly convex with respect to f. For any f, define the function E(f) by (10) and the mapping2 by (11), which assigns to f a minimizer of J0[·, E(f)] over the constraint set. The algorithm consists of the iterative application of this mapping from some initial starting point, as in (12). For convenience, the algorithm and the map are denoted by the same symbol. We assume that M is some sufficiently large fixed constant, such that for any f the constraint in the minimization (11) is inactive. In other words, the unconstrained minimizer of J0[·, E(f)] satisfies3 the constraint strictly, so that it is a stationary point of J0[·, E(f)].

Note that, by the assumptions on the potential, its derivative is a continuously differentiable, strictly decreasing (hence invertible), positive function. Convexity of J0 with respect to e follows directly from its definition. (Note that the functional may still be nonconvex, depending on the choice of the potential.) The stronger assumption of strict convexity can be removed, leaving all subsequent results intact, by appropriate redefinition4 of the mapping. To save space, and because, as discussed later, the assumption of strict convexity is rather mild and a natural one in the context of regularized inversion, this generalization is left out.

The fixed points of the mapping are the points it maps onto themselves. By the construction in (11) and (12), these points are minima of J0 with respect to its first argument. But, importantly, they are also stationary points of the original cost functional, as stated in the following proposition.

Proposition 1: Let the set of fixed points of the mapping be defined by (13). Then it is equal to the set of stationary points of the functional defined in (8).
Proof: See Appendix A.

We have the following convergence results, with the corresponding proofs contained in the appendices. (We denote by a strictly decreasing sequence converging to .)

Theorem 1: Let any sequence of iterates be generated by the algorithm (from an arbitrary initial point). Then, using the usual Euclidean norm and metric, the sequence has convergent subsequences and all their limits are stationary points of the cost functional (14); the objective values are strictly decreasing and converge (15); the difference between successive iterates goes to zero (16); and either the sequence converges to a stationary point, or its accumulation points form a continuum in the set of stationary points (17).

2 A unique minimizer exists in the convex region {g : ||g||∞ < M} owing to the assumed strict convexity of J0(·; e).
3 Nonexistence of such an M is a deficiency of the choice of regularization, not of the algorithm. A finite M satisfying this assumption certainly exists when both Q(f) and Vn(f) are nonnegative definite (piecewise) quadratic in f (with finite coefficients), as is the case in the applications in this paper. More generally, let Q(f) and Vn(f) be nonnegative convex functions with some bounded (not necessarily unique) minima q and vm, respectively. Then, for e ≥ 0, J0(f, e) in (9) will have a bounded (not necessarily unique) minimum with respect to f located in the convex hull of {q, vm, m = 0, ..., M − 1}. Hence, if M is chosen so that {f : ||f|| < M} contains this convex hull, our assumption is satisfied.
4 In particular, of all the minimizers with respect to g of J0[g, E(f)] (of which there can be many), the update is defined to be the one closest to f (it can be shown to be unique). The convergence proof in this case is somewhat more involved, requiring the use of the Kuhn–Tucker optimality conditions to show that the mapping is upper-semicontinuous.
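To make the structure of (9)–(12) concrete, the following Python sketch runs the alternation on a small 1-D denoising problem rather than on the tomographic problem of the paper: the data-fit term, the squared first differences, the Geman–McClure-type weighting, and all parameter values are illustrative stand-ins, and the inner convex minimization is solved exactly with a linear solve instead of CG.

    import numpy as np

    def phi_prime(t, delta=0.2):
        # Derivative of a Geman-McClure-type concave potential phi(t) = t / (t + delta**2),
        # used here only as an example of a valid edge-preserving weighting.
        return delta**2 / (t + delta**2) ** 2

    def deterministic_relaxation(g, lam=2.0, n_iter=30, delta=0.2):
        # Alternate the two steps of (10)-(12): update the edge weights from the
        # current estimate, then minimize the resulting quadratic J0(., e) exactly.
        n = g.size
        D = np.diff(np.eye(n), axis=0)              # first-difference operator; V_n(f) = (Df)_n^2
        f = g.copy()                                # initialize with the data
        for _ in range(n_iter):
            e = phi_prime((D @ f) ** 2, delta)      # edge map: small weight across large jumps
            A = np.eye(n) + lam * D.T @ (e[:, None] * D)   # normal equations of the quadratic subproblem
            f = np.linalg.solve(A, g)               # exact minimizer of J0(., e)
        return f

    # Noisy piecewise-constant signal: edges survive while flat regions are smoothed.
    rng = np.random.default_rng(1)
    truth = np.repeat([0.0, 1.0, 0.3], 40)
    estimate = deterministic_relaxation(truth + 0.1 * rng.standard_normal(truth.size))
    print(np.round(estimate[::20], 2))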


A functional on a metric space will be said to have an isolated stationary point if there exists a neighborhood of that point containing no other stationary point of the functional.

Corollary 1: If the cost functional in Theorem 1 has only isolated stationary points, then the sequence of iterates converges to a stationary point from any initialization.
Corollary 2: If the cost functional in Theorem 1 is strictly convex, then the sequence of iterates will converge to its global minimum from any initialization.
Corollary 3: In Theorem 1 there exists a sequence of stationary points that the iterates approach arbitrarily closely.
Corollary 4: If a point in Theorem 1 is a strong local minimum of the cost functional, then there exists an open set containing it such that the iterates converge to it from any initial point in that set.

Theorem 1 holds under the most general conditions assumed thus far. It guarantees convergence of the objective function for any starting point. It falls short of guaranteeing convergence of the entire iteration sequence (to a single stationary point), admitting the possibility of accumulation points of the sequence forming a continuum. However, since the accumulation points are stationary points, this can only happen if the latter form a continuum. Hence Corollaries 1 and 2, which guarantee global convergence of the iteration sequence to a single stationary point, or to the global minimum, respectively.

Considering the pathological situation, it follows that the algorithm can fail to converge only when the cost is "flat" in certain regions, or along certain curves. Even in this case, the algorithm behaves in a very reasonable way, as follows.
1) The values of the objective are strictly decreasing, until they converge (15).
2) The difference between successive iterates goes to zero (16).
3) Corollary 3 shows that after sufficiently many iterations, the entire trajectory can be guaranteed to get arbitrarily close to the solution set.
4) Corollary 4 shows that regardless of the existence of "flats" in the cost, any strong local minimum is an "attractive" fixed point of the mapping; if the iteration is started from the basin of attraction of such a minimum, the iterates will converge to it.
5) Because an isolated minimum will be attractive, it follows that iterates cannot "bounce about" between, say, two isolated points of the solution set, getting ever closer to them, but converging to neither.
Hence, the only nonconvergent behavior that can be expected is "creeping" of the iterates, getting ever closer to a continuous set of stationary points (a "flat") of the cost, if such exists. Since a priori no one stationary point in a "flat" is preferred to another, these results are about as good as one could wish for in the presence of flats. By stopping the iteration far enough along (e.g., using the norm of the gradient of the cost as a stopping criterion) one is guaranteed to find a point arbitrarily close to the solution set. Furthermore, strong local minima, and in particular the global minimum if it is a strong local minimum, are "attractive" fixed points of the mapping, regardless

of the existence of "flats." Indeed, if the global minimum of the cost is not unique (or at least a strong local minimum), this should be considered a failing of the regularized criterion, rather than a shortcoming of the algorithm.

One last issue to be resolved is the nature of the stationary points to which the algorithm can converge. These can include, in addition to the desirable minima, undesirable maxima and saddle points, if such exist. However, in contrast to the attractive minima, the latter are unstable fixed points. This follows from the fact that the objective is strictly decreasing as long as the iterates keep moving. Hence, any perturbation away from a saddle point or a maximum will lead away from it. Since such perturbations are inherent to any numerical implementation, the "practical solution set" will only include local minima of the cost. (In fact, by (15), convergence to a maximum is only possible if initialized there.)

C. Application to Edge-Preserving Regularization

One way5 of applying these results to (4) is by setting (18)–(20). Note that Q and Vn so defined satisfy the differentiability and convexity assumptions of the algorithm in (8)–(10).6 With these definitions the two cost functionals coincide [cf. (4) and (8)]. Furthermore, the relationship of our algorithm to ARTUR can be seen by combining the definition of the edge map with (20) and substituting this in (10). Note that, by (20), for the edge-map function to be strictly decreasing (and for our theorems to hold), the potential must be a strictly concave function. (This is the same restriction that Aubert et al. derived in [34].)

We emphasize that the definitions of (18)–(20) are only one way of applying the proposed algorithm to the problem of minimizing (4). More general cost functionals of the form (8) can be used. In particular, Vn is not restricted to be the nth element of the squared magnitude of the gradient, but could be any nonnegative, differentiable, convex functional. Likewise, Q can be any nonnegative convex differentiable function, including, for example, more general norms of the data residual instead of the weighted 2-norm used here. Such a choice might be useful for a heavier-tailed generalized Gaussian noise model, and provide robustness against noise outliers.

Using the definitions of (18)–(20), it is interesting to examine the conditions on Q, Vn, and the potential under which the convergence results just presented will apply. First, as assumed throughout, the potential must be strictly concave.

5 Another half-quadratic algorithm, called LEGEND, more recently proposed by Charbonnier et al. [47] to minimize (5), can also be considered a special case of our general algorithm by appropriately choosing Q, Vn, and the potential.
6 Note that this particular choice precludes the use of potential functions whose derivatives are not well defined everywhere. In particular, it precludes the function proposed by Bouman and Sauer [30].


Second, the regularization operator can be any operator yielding a convex Vn, including, for example, a differential operator such that Vn is the weighted sum of squares of mixed horizontal and vertical derivatives up to some order. Recall, however, the assumption that J0 is strictly convex with respect to f. Clearly, this will be the case, regardless of the convexity of the regularization term, when the system matrix is a full-column-rank linear operator (matrix). The following results show that even when the system matrix is not full column rank, the regularization can be chosen to guarantee the strict convexity of J0.

Lemma 1: The cost function J0(·, e) is strictly convex with respect to f for any fixed e if and only if the null spaces of the system operator and of the regularization operator are disjoint.

Corollary 5: Let the regularization operator be a magnitude-of-gradient operator, and the system operator a tomographic projection operator. Then their null spaces are disjoint and J0 is strictly convex for any fixed e.
Proof: Under the stated conditions, for a nonzero image, the gradient is zero only when the image is a constant vector, in which case its projection cannot be zero. Hence the null spaces are disjoint, and the corollary follows immediately from Lemma 1.

Thus, Theorem 1 and Corollaries 3 and 4 apply under the above conditions for a nonconvex potential and a possibly rank-deficient system matrix. These conditions also guarantee that the optimization problem that needs to be solved in each iteration of the algorithm is strictly convex, and therefore easy to solve. A sufficient condition for the overall cost to be strictly convex (so that Corollary 2 applies) is given in the following result.

Theorem 2: If the potential is convex, then the cost functional is also convex. Furthermore, if the potential is strictly convex, then the cost functional is also strictly convex if and only if the null-space condition of Lemma 1 holds. In particular, this condition is satisfied when the regularization operator is a magnitude-of-derivative operator, and the system operator a tomographic projection operator.

Note that when both operators have nontrivial null spaces, as they often would in applications, strict convexity of the potential is not sufficient to make the cost strictly convex. It is sufficient, however, in the tomographic application with our particular choice of regularization. At this point, it is not clear what the guidelines are for choosing the operators to guarantee that the minima of the cost are isolated, so that Corollary 1 applies. However, as discussed earlier, existence of flats in the cost can be considered a failure of the regularization criterion, rather than of the algorithm.

Although we have presented the iterative application of the algorithm as a method of minimizing (4), it should be kept in mind that the convergence results just presented are asymptotic results. Since any algorithm must be stopped after a finite number of iterations, it is also useful to view each iteration as solving a convexly regularized reconstruction problem, in which the regularization is based on the solution from the previous step. Toward this end, the auxiliary variable at each step can be thought of as an edge map (or line process) that is calculated from the current estimate image according to (10). This edge map is then used in the minimization of (9) at the next step, to weight a quadratic function of the gradient so that less smoothing takes place at locations where the current estimate image had a large gradient. (Because the weighting is a strictly decreasing


function, the edge map will be a decreasing function of the squared gradient of the current estimate image.) The faster the weighting decreases, the less smoothing takes place at these locations, and hence the more "edge-preserving" is the regularization. In view of this interpretation of the minimization of (9) as the solution of an adaptively regularized inverse problem, the requirement that J0 be strictly convex becomes natural. Indeed, the goal of regularization for convex problems usually includes obtaining a unique solution, i.e., conversion to a strictly convex problem. It is natural to require that the adaptive regularization at least satisfy this within every iteration.

Ideally, we would like the edge map to correspond to that of the true image, not of the estimate image at the current step. However, if the estimate image improves at the next iteration, then the corresponding edge map will also improve. This improved edge map, in turn, is used to create a more accurate estimate image, and the cycle can be repeated. To satisfy the condition of Corollary 2 for convergence to the global minimum, it is necessary that the potential and, hence, the cost be convex. This condition essentially places a restriction on how fast the weighting decreases (i.e., how edge preserving the regularization is), which corresponds to our intuitive notion that the regularization should not constrain the next estimate to have the same edges as the current one, but rather should allow the edges of the estimate images to evolve.

With this in mind, we have found it useful, when using certain choices of weighting that correspond to nonconvex potential functions, to essentially blur the edge map used in the algorithm. That is, we set (21), where the linear operator produces the coefficients of a blurred version of the estimate image. (In our experiments we have used a 2-D Gaussian blur.) Note that convexity is not affected by the choice of blur. Experimentally, the effect of blurring the edge map, for certain choices of the potential, has been a reduction in false edges due to noise. In particular, edges of objects that are jagged with the unblurred edge maps become smooth with the blurred edge maps. We have also found that initial convergence can be improved by using a more severe blur for the initial iterations of the algorithm and then reducing (or eliminating) the blur for the final iterations. Note that these initial iterations do not affect the asymptotic convergence results of the algorithm, and can be thought of as providing a good starting point for the algorithm.

Finally, note that in an actual implementation it will usually not be possible to find an exact minimizer of the convex subproblem, which is needed in (11). In fact, in order to avoid excessive computation, it is actually desirable to calculate only an approximation to this minimizer. (This is the same rationale as for using an inaccurate line search in a gradient descent algorithm [48].) In our experiments, we have used the CG algorithm for the minimization of each convex subproblem, and have found that using 8 to 16 iterations per subproblem provides a good compromise between accuracy and computational savings.
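The edge-map blurring heuristic of (21) can be sketched as follows. The Gaussian width, the weighting function, and the test image are placeholders; the resulting weights would be fed to the next quadratic subproblem exactly as the unblurred edge map would be.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def blurred_edge_map(f_img, weight_fn, sigma=1.5):
        # Edge weights computed from a blurred copy of the estimate, as in (21):
        # blur the image, take the squared gradient magnitude, apply the weighting.
        fb = gaussian_filter(f_img, sigma)       # 2-D Gaussian blur of the current estimate
        gy, gx = np.gradient(fb)
        return weight_fn(gx**2 + gy**2)

    # Example: weights drop near the (blurred) edge of a bright square, so the next
    # quadratic subproblem smooths less there; the weighting function is a placeholder.
    img = np.zeros((32, 32))
    img[8:24, 8:24] = 1.0
    e = blurred_edge_map(img, lambda t: 1.0 / (1.0 + t / 0.01))
    print(float(e.min()), float(e.max()))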


Fig. 1. Potential functions used in the experiments.

IV. NUMERICAL EXPERIMENTS

In this section, we present numerical results from simulated data to demonstrate that edge-preserving regularization, applied through the proposed deterministic relaxation algorithm, can produce excellent reconstructions from limited-angle data collected from piecewise-smooth objects. In addition, we explore some of the factors that affect reconstruction quality, such as choice of error norm, choice of the potential function (and hence the weighting), maximum view angle, number of projection angles, and noise. Although the positivity constraint described earlier is included in all the following simulations that used the proposed iterative algorithm, experiments exploring this constraint are not included because we have found this constraint to have only a minor effect on reconstruction quality. We use two potential functions—one convex, the other nonconvex—and compare their performance in a series of simulations. Since it is the weighting function that is actually evaluated at each iteration of the algorithm, these potential functions are based on a family of functions defined by two positive parameters to be selected. The resulting potential is convex for one setting of the exponent. For our experiments we use two settings: one results in a convex potential identical to a function proposed by Lange [29], while the other results in a nonconvex potential identical to a function proposed by Hebert and Leahy [43].7 Note that for sufficiently small arguments, both functions are essentially quadratic, corresponding to quadratic regularization. Fig. 1 shows plots of these functions. Henceforth, we will refer to the edge-preserving regularization functional (3) with these choices as the convex EPR and the nonconvex EPR, respectively.

A schematic of the first phantom used in the simulations is shown in Fig. 2. This phantom consists of different regions of uniform density, with the numbers indicating the density/cm within each region. (The diameter of the phantom is 22.2 cm.) This phantom was chosen for the experiments because it provides a wide variation of object sizes, spacings, and densities. In particular, the closely spaced horizontal ellipses and rectangles provide a useful test for limited-angle algorithms when the wedge of missing spatial frequencies is centered about the axis (as is done in all the limited-angle simulations in this paper).

7 The function approaches the broken parabola as q → ∞.
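The exact parametric family of potential functions used in the experiments is not legible in this copy, so the following Python sketch contrasts two standard stand-ins with the same qualitative behavior discussed in Section II: a convex Huber-type potential and a nonconvex Geman–McClure-type potential. The threshold parameter is illustrative.

    import numpy as np

    def phi_huber(t, delta=1.0):
        # Convex stand-in: quadratic for |t| <= delta, linear beyond the threshold.
        a = np.abs(t)
        return np.where(a <= delta, t ** 2, 2.0 * delta * a - delta ** 2)

    def phi_geman_mcclure(t, delta=1.0):
        # Nonconvex stand-in: saturates, so large gradients are penalized only weakly.
        return t ** 2 / (delta ** 2 + t ** 2)

    # Compare how strongly an increasingly sharp transition (large t) is penalized.
    for t in (0.5, 2.0, 10.0):
        print(t, float(phi_huber(t)), round(phi_geman_mcclure(t), 3))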

Fig. 2. Schematic of the computer phantom used in this paper. (All the small objects within a dotted ellipse have the same density.)

As these horizontal objects become more elongated, more of their energy is concentrated in the spatial frequencies that are missing, making their recovery more difficult. In addition, there are seven very dense objects within the phantom (about five times denser than any other object), which accentuate errors due to radial and angular aliasing. Note that in order to see the details of the low-density objects we have chosen the gray-scale palette for the display of the images to truncate image amplitudes at a value of 0.4/cm, thus clipping the display of the high-density objects.

The synthetic data for this phantom were collected in the form of simulated detected photon counts, which were then converted into projection data by dividing the detected photon counts by the incident photon count, and then taking the log. For the following experiments the detected photon counts were simulated to be Poisson-distributed random variables with means determined by the corresponding noiseless raysums of the phantom. The raysums were grouped together to form a set of parallel-ray projections at a discrete set of angles. The raysums were created by first analytically calculating the projections on a fine discrete grid, digitally filtering the projections with a simulated unit-width integrating detector response, and then subsampling. For reference, Fig. 3 shows full-angle FBP reconstructions, each displayed on a 240 × 240 grid, created from 180 uniformly spaced projections with incident photon counts (per ray) of 200 000, 20 000, and 2000, from top to bottom. The FBP algorithm used a ramp filter that was spectrally windowed by a Hanning window with a (zero-level) cutoff frequency of .

As stated earlier, the CG algorithm was used to (approximately) solve each convex subproblem, stopping after 8–16 iterations per subproblem. In addition, the first few iterations of the deterministic relaxation algorithm were performed with a blurred edge map. The cost functional minimized in the following simulations is identical to (4), except for the addition of an edge-preserving regularizer similar to (3), but applied to the second derivative of the image.
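The photon-count model just described can be sketched as follows. The raysum values and incident counts are placeholders; the conversion from counts to projection data follows the description in the text (divide by the incident count and take the negative log), with a clamp added here only to avoid taking the log of zero counts.

    import numpy as np

    def simulate_projections(raysums, n_incident, rng):
        # Detected counts are Poisson with mean n_incident * exp(-raysum); projections
        # are recovered as -log(counts / n_incident), as described in the text.
        counts = rng.poisson(n_incident * np.exp(-raysums))
        counts = np.maximum(counts, 1)           # illustrative guard against zero counts
        return -np.log(counts / n_incident)

    rng = np.random.default_rng(2)
    true_raysums = np.array([0.1, 0.5, 1.0, 3.0])    # placeholder line integrals
    for n_incident in (200_000, 20_000, 2_000):      # incident counts used in the experiments
        print(n_incident, np.round(simulate_projections(true_raysums, n_incident, rng), 3))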


Fig. 3. Full-angle FBP reconstructions, using 200 000, 20 000, and 2000 incident photon counts, from top to bottom.

The basis function in (1) was chosen to be the tensor product of the 1-D cubic B-spline8 [49], [50], with 240 × 240 image coefficients used to represent the estimate image. This basis function was chosen because it offers a good compromise between spatial localization and approximation capability. All displayed images are sampled on a 240 × 240 grid with unity spacing in each coordinate, and consist of the estimate image (not the coefficients). In addition, for each simulation, the regularization parameter and the parameter used in the potential functions were experimentally determined so that roughly the best qualitative reconstruction resulted in each case. (See [51] for a study of the effects of these parameters on full-data reconstructions.)

8 The 1-D cubic B-spline can be calculated as b0 ∗ b0 ∗ b0 ∗ b0, where b0(t) = 1 for |t| ≤ 1/2, and zero otherwise.
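Footnote 8 defines the 1-D cubic B-spline as the threefold convolution of the unit box with itself; the following sketch forms it numerically by repeated discrete convolution (the sampling step is an arbitrary choice for the example).

    import numpy as np

    def cubic_bspline_samples(dt=0.01):
        # Form b0 * b0 * b0 * b0 by repeated discrete convolution of a sampled
        # unit box b0(t) = 1 for |t| <= 1/2; the support of the result is [-2, 2].
        t0 = np.arange(-0.5, 0.5 + dt, dt)
        b0 = np.ones_like(t0)
        b = b0
        for _ in range(3):
            b = np.convolve(b, b0) * dt          # dt factor approximates continuous convolution
        t = np.arange(b.size) * dt - 2.0
        return t, b

    t, b3 = cubic_bspline_samples()
    print(round(float(b3[np.argmin(np.abs(t))]), 3))   # value at t = 0, close to 2/3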


Although not done in this paper, these parameters could also be found by using a more systematic method, such as generalized cross-validation [52].

In the first set of simulations, limited-angle reconstructions using three different norms are compared to examine how the choice of error norm in (4) affects reconstruction quality and algorithm performance. The first norm (referred to as POISSON in the figures) is a standard weighted Euclidean norm that uses a diagonal weighting matrix that explicitly accounts for the Poisson-distributed photon counts. The diagonal entries of the matrix are equal to the detected photon counts divided by the incident photon count (see [38] for more details). This norm more heavily weights errors corresponding to raysums with high detected photon counts, since these raysums are less noisy. The second norm (referred to as CONSTANT in the figures) is the standard unweighted Euclidean norm, which is optimal when the noise is additive white Gaussian. The third norm (referred to as RAMP in the figures) is a spectrally weighted norm in which the discrete-time Fourier transform (DTFT) of each sampled parallel-ray projection of the residual is weighted spectrally, with the weight defined in terms of a Hanning window. This spectrally weighted norm has been suggested not for statistical reasons, but so that an approximate Parseval's relationship exists between the projection-domain and image-domain inner products [20], [53].

Typically, the choice of error norm is based on the noise statistics of the data in order to provide an optimal reconstruction at convergence. For example, when the photon counts are Poisson-distributed, the POISSON error norm should always outperform the CONSTANT and RAMP error norms, in terms of reconstruction quality at convergence. In the absence of noise, however, the choice of error norm should have no effect on reconstruction quality at convergence. In such a case, other factors, such as rate of convergence and computational cost, may affect how the error norm is chosen. The following experiment is meant to show how reconstruction quality and rate of convergence are affected by each of the three error norms described previously.

Fig. 4 shows plots of normalized mean squared error (NMSE) versus cumulative CG iteration number for the three error norms considered, using the convex EPR and the nonconvex EPR in the proposed deterministic relaxation algorithm. (NMSE is the standard MSE divided by the mean of the squared reference image.) For all experiments, the FBP reconstruction was used as the initial image, with the initial edge map being derived from it according to (10). Note that 8–16 CG iterations are performed with a particular edge map (performing the minimization in (11) for one iteration (12) of the algorithm) before the CG algorithm is restarted with a new edge map. These restarts are reflected by the sudden drops in the plots of NMSE versus CG iteration. The reconstructions for this example were made from projections collected every one degree (resulting in a missing angle of 30° out of 180°).
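The three error norms and the NMSE figure of merit can be sketched as follows. The diagonal POISSON weighting follows the verbal description above; the RAMP weighting is written here as a Hanning-tapered ramp, an assumption suggested by the norm's name, and the discretization details (FFT length, frequency normalization) are chosen only for illustration.

    import numpy as np

    def nmse(estimate, reference):
        # Normalized MSE: standard MSE divided by the mean of the squared reference image.
        return np.mean((estimate - reference) ** 2) / np.mean(reference ** 2)

    def poisson_norm(residual, detected_counts, incident_count):
        # Weighted Euclidean norm with diagonal weights equal to detected/incident counts.
        return np.sum((detected_counts / incident_count) * residual ** 2)

    def constant_norm(residual):
        # Standard unweighted Euclidean norm.
        return np.sum(residual ** 2)

    def ramp_norm(projection_residual):
        # Spectrally weighted norm: weight the DTFT of a sampled projection residual
        # by a ramp tapered to zero by a Hanning window (illustrative discretization).
        R = np.fft.rfft(projection_residual)
        w = np.linspace(0.0, 1.0, R.size)                 # normalized frequency axis
        weight = w * (0.5 + 0.5 * np.cos(np.pi * w))      # Hanning-windowed ramp
        return np.sum(weight * np.abs(R) ** 2) / R.size

    rng = np.random.default_rng(3)
    res = rng.standard_normal(64)
    print(constant_norm(res), ramp_norm(res), nmse(res, np.ones(64)))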


Fig. 4. NMSE versus CG iteration number, for convex and nonconvex EPR's, with 30° missing angle. Effect of error norm for different incident photon counts.

In all plots, the NMSE for the RAMP error norm shows the fastest initial rate of decrease, while the NMSE for the POISSON error norm shows the slowest initial rate of decrease. This is not surprising, since the convergence rate of the CG algorithm is dependent on the distribution of the eigenvalues of the matrix [54], where the matrix involves the adjoint of the system operator and the weighting operator resulting from the particular error norm used. As explained in [37], the RAMP error norm can increase the rate of convergence of the CG algorithm by making this matrix better conditioned. Note, also, that the NMSE for the RAMP and CONSTANT error norms is affected more by the incident photon count than is the POISSON error norm. Fig. 5 shows reconstructions after 300 CG iterations, with a missing angle of 30°, using the three error norms and the nonconvex EPR. (Hence, the nine reconstructions shown in

Fig. 5 correspond to the left column of Fig. 4 at 300 iterations.) The left, middle, and right columns used the POISSON, CONSTANT, and RAMP error norms, respectively, and the top, middle, and bottom rows used projections with incident photon counts of 200 000, 20 000, and 2000 counts, respectively. Note that the images shown were taken at 300 CG iterations so that the reconstructions using the POISSON error norm (left column of Fig. 5) are not taken at convergence (as indicated by Fig. 4). It is fair to assume, however, that the reconstruction using the RAMP error norm with high counts (upper righthand corner of Fig. 5) represents (approximately) the best that any error norm could do, since the noise is low and convergence appears to have been reached (as indicated by Fig. 4). Note that for moderate to high incident photon counts, the reconstructions that used the CONSTANT and RAMP error norms are of high quality, although for low incident



Fig. 5. Reconstructions with 30° missing angle, using the nonconvex EPR. Top row to bottom: incident photon counts of 200 000, 20 000, and 2000. Left column to right: POISSON, CONSTANT, and RAMP error norms.

photon counts these reconstructions show streaking artifacts that are not present in the reconstruction that used the POISSON error norm. (At 2000 incident counts the number of photons detected was as few as zero or one for rays passing through the dense objects—hence, this is a demanding test case.) As expected, Fig. 5 shows that it is critical to use the POISSON error norm only for low incident counts. This observation is important because a fast algorithm recently proposed in [37] can be used to minimize the convex subproblems only if a spectrally weighted norm, such as the CONSTANT or RAMP norm, is used in (4). This fast algorithm was used in all the simulations in this paper that used the CONSTANT or RAMP error norms to facilitate the

collection of data—as well as to demonstrate the effectiveness of the fast Fourier algorithm for limited-angle tomography. Using the low-end Silicon Graphics Indy workstation, each iteration of the fast algorithm (one CG iteration) took about 3.5 s for a 240 × 240 reconstruction—over ten times faster than the standard matrix-based CG algorithm that was used with the POISSON error norm. In the next set of simulations, limited-angle reconstructions using the deterministic relaxation algorithm with the convex EPR, nonconvex EPR, and standard quadratic regularization are compared under different limited-angle scenarios. Fig. 6 shows NMSE versus CG iteration number for the three regularization functionals, with moderate and high incident photon counts. (Again, the FBP reconstruction was


Fig. 6. NMSE versus CG iteration number for different missing angles and incident photon counts, with RAMP error norm. Comparison of quadratic, convex, and nonconvex EPR’s.

used as the initial starting point for all experiments, with the initial edge map being derived from the FBP reconstruction.) The reconstructions in these plots used the RAMP error norm and projections collected every one degree, with missing angles of 20°, 40°, or 60°, out of 180°. In all cases, the NMSE's for both edge-preserving regularization functionals are much lower than the NMSE for the quadratic regularization, with the nonconvex EPR performing significantly better than the convex EPR. Note that for large missing angles, the NMSE for the quadratic case actually increases slightly after the first 20–30 CG iterations before leveling off. This behavior reflects the interaction of the inconsistent constraints that are being applied in the quadratic case. That is, quadratic regularization penalizes the energy of the weighted spectrum and, hence, favors solutions with as little spectral energy as possible. In particular, for the

limited-angle problem, quadratic regularization tends to produce reconstructions with very little energy in the missing wedge of frequencies. We conjecture that during the initial iterations, the penalty on the residual error, combined with the constraints of spatial boundedness and positivity, help to recover some of the missing frequencies and cause the NMSE to drop. At some point, however, the quadratic regularization reduces the spectral energy of the solution, causing the NMSE to rise again, until an equilibrium is reached. Note that beyond the first few CG iterations, the reconstructions using the nonconvex EPR always have the lowest NMSE for any given iteration number. Since the computational cost per CG iteration is roughly the same for all three regularization functionals, these plots show that for a given amount of computation, the nonconvex EPR provides the lowest NMSE. The disadvantage of using the nonconvex EPR is that the



Fig. 7. Reconstructions using the RAMP error norm and 200 000 incident photon counts. Left column to right column: missing angles of 20°, 40°, and 60°. Top row to bottom row: quadratic regularization, convex EPR, and nonconvex EPR.

deterministic relaxation algorithm is (at best) guaranteed to converge only to a stationary point of (4), as opposed to convergence to the global minimum for the convex EPR. In practice, we have not found this to be a problem for the proposed nonconvex , especially when the first few iterations of the deterministic relaxation algorithm were performed with a blurred edge map to obtain a good starting point, as described earlier. (Alternatively, the first few iterations could be performed with the convex EPR for the same result.) Fig. 7 shows reconstructions (after 300 CG iterations) using the three regularization functionals with different missing angles. For these reconstructions, the RAMP error norm was used, and the incident photon count was 200 000. The top, middle,

and bottom rows used the quadratic regularization, convex EPR, and nonconvex EPR, respectively, and the left, middle, and right columns had missing angles of 20°, 40°, and 60° (out of 180°), respectively. Note that most of the artifacts present in the quadratic case have been eliminated by the edge-preserving regularizers. In addition, many of the horizontal edges in the phantom have been accurately recovered with the edge-preserving regularization, especially with the nonconvex EPR. This demonstrates that the edge-preserving regularizers are not simply creating "sharper" versions of the quadratically regularized reconstructions. Because the edge-preserving regularization attempts to enforce a constraint of piecewise smoothness on the reconstruction—a constraint consistent with the original phantom from which data were collected—it


Fig. 8. NMSE versus number of projection angles, using the RAMP error norm and nonconvex EPR. Fixed total incident photon count.

actually helps to resolve some of the indeterminacy inherent in the limited-angle problem. Next, we demonstrate how the angular sampling rate affects the quality of limited-angle reconstructions. In many instances, it is desirable to keep the total radiation dosage as low as possible when collecting projection data. In these cases, a tradeoff exists between the number of angles at which projections are collected and the number of photons detected at each angle. That is, one might ask whether it is better to collect many noisy projections, or fewer low-noise projections. Fig. 8 attempts to answer this question by showing the NMSE (after 300 CG iterations) versus the number of uniformly spaced projection angles used. For these reconstructions the RAMP error norm was used along with the nonconvex EPR. The incident photon count is 200 000 for 25 projection angles, and is reduced by half as the number of projection angles doubles, so that the total dosage is constant for all experiments. These plots show that as long as the angular sampling is not extremely sparse, there is not much difference in NMSE for different angular sampling rates. Fig. 9 confirms this observation by showing the reconstructions (after 300 CG iterations) for three different angular sampling rates. These results, in combination with those in Figs. 4 and 5, indicate that in many cases the incident photon count can be increased to a rate high enough that the fast algorithm described in [36] and [37] can be used in the proposed iterative algorithm without causing additional artifacts. Thus far, all simulations have used projection data collected from a piecewise-constant phantom. We now present results using data collected from a piecewise-smooth phantom to demonstrate that the proposed edge-preserving regularizers are effective for this case as well. The piecewise-smooth phantom is identical to the piecewise-constant phantom, except that a two-dimensional (2-D) sinusoid has been added to it. Fig. 10 shows a (noiseless) full-angle reconstruction of this phantom. A plot of NMSE versus iteration number for a missing angle of 20° is shown in Fig. 11 for the different regularization functionals. These reconstructions used the RAMP error norm with an incident photon count of 200 000. Note that the results with the piecewise-smooth phantom are very similar to the results with the piecewise-constant phantom, except for a slight increase in the NMSE for the nonconvex EPR. However, the nonconvex EPR is still superior to the convex EPR. Fig. 12 shows the reconstructions for the 20°-missing-

Fig. 9. Reconstructions using the nonconvex EPR, with 30° missing angle. From top to bottom: 25 angles at 200 000 counts, 100 angles at 50 000 counts, and 400 angles at 12 500 counts.

angle case (after 300 CG iterations), for the three different regularization functionals, and Fig. 13 shows vertical slices through these reconstructions (slightly left of center), comparing each regularized reconstruction to the (noiseless) full-angle reconstruction. Note that the reconstructions using the proposed edge-preserving regularizers closely match the full-angle reconstruction, and do not contain the "banding" artifacts that can result from using some nonconvex edge-preserving regularizers.

V. CONCLUSIONS

We have introduced a generalization of the deterministic relaxation algorithm of Charbonnier et al. [32], [33] for


Fig. 10. Full-angle reconstruction of the piecewise-smooth phantom.

Fig. 11. NMSE versus CG iteration number, using the RAMP error norm (piecewise-smooth phantom).

edge-preserving regularization in linear inverse problems, and proved its global convergence even when the cost functional being minimized is nonconvex. Applying it to tomographic reconstruction from limited-angle data, we have shown that edge-preserving regularization can provide excellent reconstructions in this challenging problem, and that the proposed deterministic relaxation algorithm is an effective tool for minimizing the resulting (possibly nonconvex) cost functionals. Of the two edge-preserving regularizers proposed, the nonconvex one was shown to outperform the convex one in all experiments. In addition, for moderate to high photon counts, spectrally weighted error norms were shown to provide reconstruction quality comparable to that of a standard error norm that is commonly used for Poisson-distributed data, while exhibiting superior convergence properties. This observation suggests that the recently proposed fast Fourier algorithm [36], [37], which is restricted to using a spectrally weighted error norm, can be used in many practical limited-angle problems to perform the minimization needed by the deterministic relaxation algorithm.

Fig. 12. Reconstructions of the piecewise-smooth phantom, using the RAMP error norm with 20° missing angle, and 200 000 counts. From top to bottom: quadratic regularization, convex EPR, nonconvex EPR.

APPENDIX

A. Proof of Proposition 1

Let denote the gradient of the cost functional, and the gradient of J0 with respect to its first argument. Observe first that (A.1) and (A.2), so that, because , we have (A.3). Now, if f is a fixed point of the mapping, then from (9)–(13) we have (A.4)


Hence (A.5), so that the stationary points of the cost functional coincide with the fixed points of the mapping.

Fig. 13. Vertical slices through piecewise-smooth reconstructions of Fig. 12.

B. Preliminary Results

Defining the space , we introduce the function (B.1), where is defined by (B.2). First note the following lemmas concerning and .

Lemma 2: The function is strictly convex on .
Proof: Using the definition we have (B.3). Noting that , because is strictly decreasing, it is clear that (B.3) implies that for . Hence, the function is strictly convex on (0, 1), and convex on (0, 1], and therefore, by the continuity at , also strictly convex on (0, 1].
Remark: In the sequel we only use the convexity, rather than the strict convexity.

Lemma 3: For any given , .
Proof: Let . First note that the function is convex, because by Lemma 2 it is convex on (0, 1]. Hence, any relative minimizer is also a global minimizer. Next, taking the partial derivative with respect to yields (B.4), which vanishes for . Hence, it is a relative minimizer and, therefore, also a global minimizer.

Next we define another function (B.5), where . Its properties are given by the following lemma.

Lemma 4: Let . Then , with equality only when .
Proof: Because , it follows from (B.1) that (B.6), with equality only when is also a minimizer of , which would then imply that . Next, by Lemma 3, , so that, combining with (B.6), we have (B.7), with equality only if .

Lemma 5: Let the cost functional be as defined in (8). Then (B.8), where is a constant.


Proof: We verify that … and … have identical gradients. Indeed

(B.9)

(B.10)

where the equality in (B.10) follows from (A.3) and (B.4).

C. Proof of Theorem 1

The proof uses the concept of a point-to-set map …, which associates with every point … a set …. An algorithm is then defined by …, where it is immaterial which of the points in the set … is assigned to …. The fixed points of … are defined to be points satisfying …, rather than just …. For these points, the mapping is a point-to-point mapping. We will use the results in [55], which we state here for reference as needed.⁹

Let … be a closed subset of …, and … a continuous function. A point-to-set map … will be said to be strictly monotonic (with respect to a function …) at … if … implies … whenever … is not a fixed point of …. … is said to be upper semicontinuous (u.s.c.) at … if … and … imply …. These properties will be said to hold on … if they hold at all points of …. Finally, a mapping will be said to be uniformly compact on … if there exists a compact set …, independent of …, such that … for all ….

Theorem 3—Meyer, 1976 [55]: Let … be a point-to-set mapping such that, on …, … is uniformly compact, u.s.c., and strictly monotonic with respect to the function …. If … is any sequence generated by the algorithm corresponding to …, then all accumulation points of … will be fixed points, … where … is a fixed point, …, and either … converges, or the accumulation points of … form a continuum.

We show that the conditions of Theorem 3 are met for the algorithm …, with …, the function …, and the set of fixed points ….

1) … is uniformly compact on …, since by (11), for any …, … given by … satisfies …, i.e., lies in the closed and bounded (and therefore compact) set ….

2) By Lemma 4, … is strictly monotonic with respect to ….

3) To prove upper semicontinuity, let …, and suppose … and …. Then by (9)–(11) we have …, or

(C.1)

Because all operators in (C.1) are continuously differentiable, letting … yields …. This implies, by the strict convexity of … with respect to …, that …, so that …, establishing that … is u.s.c. on ….

Having shown that Theorem 3 applies, the conclusions of Theorem 1 follow. In particular, by compactness any sequence of iterates … has convergent subsequences. Their limits, which are the accumulation points of …, are fixed points, which, by Proposition 1, must be stationary points of …. The strictly monotone convergence of … follows from the convergence of … in Theorem 1, from the strict monotonicity of … (Lemma 4), and from Lemma 5, which identifies ….

⁹These results are stronger than the similar global convergence theorem (cf. [48]), in that they show convergence of the entire sequence of iterates, rather than only subsequential convergence. The stronger result follows precisely from the more restrictive definition of the map and its fixed points, such that at a fixed point the map is a point-to-point map.

D. Proofs of Corollaries 1–4

Proof of Corollary 1: By (14) the accumulation points are fixed points of …, and in turn, by Proposition 1, they are stationary points of …. Hence, by the conditions of the corollary, they cannot form a continuum, and by (17) the sequence must converge.

Proof of Corollary 2: Since … is strictly convex, it has a single stationary point, which is also its global minimizer. The corollary then follows immediately from Corollary 1.

Proof of Corollary 3: Suppose the corollary is not true. Then there exists a subsequence … of … such that … for all …. By compactness, though, there must be an … such that the subsequence … converges to some …. Clearly … for all …. But, by Theorem 1, …, which is a contradiction.

Proof of Corollary 4: The result follows from the following.

Theorem 4—Meyer, 1976 [55]: Let … satisfy the conditions of Theorem 3. If … is an isolated fixed point of … that is also a strong local minimum of …, then there exists an open set … containing … such that … implies … converges to ….

Once again, the combination of Proposition 1 and Lemma 5 identifies the fixed points of … with the stationary points of …, so that our result follows immediately from Theorem 4.
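For readers who prefer symbols to the prose definitions used above, the block below restates the three properties in generic notation; the names Z, A, phi, and K here are placeholders rather than the paper's own symbols.

```latex
% Generic restatement of the three conditions on a point-to-set map A on a
% closed set Z, with descent function phi (placeholder notation).
\begin{itemize}
  \item Uniform compactness: there is a compact set $K \subset Z$,
        independent of $z$, with $A(z) \subset K$ for all $z \in Z$.
  \item Strict monotonicity (w.r.t.\ $\phi$): $y \in A(z)$ implies
        $\phi(y) < \phi(z)$ whenever $z$ is not a fixed point of $A$.
  \item Upper semicontinuity: $z_k \to z$, $y_k \in A(z_k)$, and
        $y_k \to y$ together imply $y \in A(z)$.
\end{itemize}
```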


E. Strict Convexity of … and …

Proof of Lemma 1: Recall that … is convex, but not strictly convex, because … has a nontrivial null space. We will show that … likewise is convex, but not strictly convex. However, the sum … will be strictly convex if and only if …. Note first that the squared magnitude of the image gradient can be represented as

(E.1)

for an appropriate … matrix …. The Hessian of … with respect to … is therefore given by

(E.2)

To check for positive definiteness of …, consider

(E.3)

Because …, (E.3) only vanishes for … for some …. Thus, …, with equality only for …. Therefore, … is strictly convex with respect to … for …, and only convex elsewhere. It follows that … will be strictly convex everywhere if and only if …, because this condition eliminates the possibility that neither term of … is strictly convex (and the sum of a strictly convex and a convex function is strictly convex).

Proof of Theorem 2: The proof follows a similar argument as in the proof of Lemma 1. Let …. Using the relation …, it is easy to show that

(E.4)

where

(E.5)

Note that, by the Cauchy–Schwarz inequality, for any …,

(E.6)

so that … is well defined. Suppose that for a given …, … for …, where the cardinality of the index set … is …. Then

(E.7)

where the inequality follows from (E.6), so that … is convex. If …, then …, and …, so that … is convex. If … is strictly convex, we establish strict convexity of …. Consider first the case …. Then, because …, the second sum in (E.7) is strictly positive, and …. Next, consider the case …. In this case, (E.7) reduces to

(E.8)

so that strict convexity of … follows in the same way as in the proof of Lemma 1 and under the same conditions on … and ….
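Since (E.6) is invoked only by number above, the block below records the standard finite-dimensional Cauchy–Schwarz inequality on which such a bound typically relies; the vectors u and v are generic placeholders, and this is the textbook inequality rather than the paper's specific display.

```latex
% Standard Cauchy--Schwarz inequality for real vectors u, v in R^n:
\left( \sum_{i=1}^{n} u_i v_i \right)^{2}
  \;\le\;
\left( \sum_{i=1}^{n} u_i^{2} \right)
\left( \sum_{i=1}^{n} v_i^{2} \right),
% with equality iff u and v are linearly dependent.
```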

ACKNOWLEDGMENT

The authors thank the anonymous reviewers, J. Fessler, and the Associate Editor, K. D. Sauer, for their careful reading of the manuscript and helpful comments.

REFERENCES

[1] S. Deans, The Radon Transform and Some of Its Applications. New York: Wiley, 1983.
[2] S. W. Rowland, "Computer implementation of image reconstruction formulas," in Image Reconstruction From Projections. Topics in Applied Physics, G. T. Herman, Ed. New York: Springer-Verlag, 1979, vol. 32, pp. 9–79.
[3] A. Peres, "Tomographic reconstruction from limited angular range," J. Comput. Assist. Tomogr., vol. 3, pp. 800–803, 1979.
[4] T. Inouye, "Image reconstruction with limited view angle projections," in Proc. Int. Workshop on Physics and Engineering in Medical Imaging, Pacific Grove, CA, Mar. 1982.
[5] J. L. Prince and A. S. Willsky, "Constrained sinogram restoration for limited-angle tomography," Opt. Eng., vol. 29, pp. 535–544, 1990.
[6] H. Kudo and T. Saito, "Sinogram recovery with the method of convex projections for limited-data reconstruction in computed tomography," J. Opt. Soc. Amer. A, vol. 8, pp. 1148–1160, 1991.
[7] M. C. Lawrence, M. A. Jaffer, and B. T. Sewell, "The application of the maximum entropy method to electron microscopic tomography," Ultramicroscopy, vol. 31, pp. 285–301, 1989.
[8] R. T. Smith, C. K. Zoltani, G. J. Klem, and M. W. Coleman, "Reconstruction of tomographic images from sparse data sets by a new finite element maximum entropy approach," Appl. Opt., vol. 30, pp. 573–582, 1991.
[9] M. L. Reis and N. C. Roberty, "Maximum entropy algorithms for image reconstruction from projections," Inverse Probl., vol. 8, pp. 623–644, 1992.
[10] K. M. Hanson, "Bayesian and related methods in image reconstruction from incomplete data," in Image Recovery: Theory and Application, H. Stark, Ed. New York: Academic, 1987, pp. 79–127.
[11] H. Peng and H. Stark, "One-step image reconstruction from incomplete data in computer tomography," IEEE Trans. Med. Imag., vol. 8, pp. 16–31, 1989.
[12] J. S. Jaffe, "Limited angle reconstruction using stabilized algorithms," IEEE Trans. Med. Imag., vol. 9, pp. 338–344, 1990.
[13] J. Carazo and J. L. Carrascosa, "Information recovery in missing angular data cases: An approach by the convex projections in three dimensions," J. Microsc., vol. 145, pp. 23–43, 1987.
[14] F. A. Grunbaum, "A study of Fourier space methods for limited-angle image reconstruction," Numer. Funct. Anal. Optim., vol. 2, pp. 31–42, 1980.


[15] K.-C. Tam and V. Perez-Mendez, "Limited-angle three-dimensional reconstructions using Fourier transform iterations and Radon transform iterations," Opt. Eng., vol. 20, pp. 586–589, 1981.
[16] M. I. Sezan and H. Stark, "Applications of convex projection theory to image recovery in tomography and related areas," in Image Recovery: Theory and Application, H. Stark, Ed. New York: Academic, 1987, pp. 415–462.
[17] I. Fujieda, K. Heiskanen, and V. Perez-Mendez, "Versatility of the CFR algorithm," IEEE Trans. Nucl. Sci., vol. 37, pp. 585–588, 1990.
[18] J. S. Karp, G. Muehllehner, and R. M. Lewitt, "Constrained Fourier space method for compensation of missing data in emission computed tomography," IEEE Trans. Med. Imag., vol. 7, pp. 21–25, 1988.
[19] A. H. Andersen, "Algebraic reconstruction in CT from limited views," IEEE Trans. Med. Imag., vol. 8, pp. 50–55, 1989.
[20] B. P. Medoff, "Image reconstruction from limited data: Theory and applications in computerized tomography," in Image Recovery: Theory and Application, H. Stark, Ed. New York: Academic, 1987, pp. 29–78.
[21] R. Rangayyan, A. P. Dhawan, and R. Gordon, "Algorithms for limited-view computed tomography: An annotated bibliography and a challenge," Appl. Opt., vol. 24, pp. 4000–4012, 1985.
[22] D. Verhoeven, "Limited-data computed tomography algorithms for the physical sciences," Appl. Opt., vol. 32, pp. 3736–3754, 1993.
[23] P. Oskoui and H. Stark, "A comparative study of three reconstruction methods for a limited-view computer tomography problem," IEEE Trans. Med. Imag., vol. 8, pp. 43–49, 1989.
[24] M. E. Davison, "The ill-conditioned nature of the limited angle tomography problem," SIAM J. Appl. Math., vol. 43, pp. 428–448, 1983.
[25] W. Qian and D. M. Titterington, "Bayesian image restoration: An application to edge-preserving surface recovery," IEEE Trans. Pattern Anal. Machine Intell., vol. 15, pp. 748–752, 1993.
[26] M. Gokmen and C.-C. Li, "Edge detection and surface reconstruction using refined regularization," IEEE Trans. Pattern Anal. Machine Intell., vol. 15, pp. 492–499, 1993.
[27] R. R. Schultz and R. L. Stevenson, "A Bayesian approach to image expansion for improved definition," IEEE Trans. Image Processing, vol. 3, pp. 233–242, 1994.
[28] P. J. Green, "Bayesian reconstructions from emission tomography data using a modified EM algorithm," IEEE Trans. Med. Imag., vol. 9, pp. 84–93, 1990.
[29] K. Lange, "Convergence of EM image reconstruction algorithms with Gibbs smoothing," IEEE Trans. Med. Imag., vol. 9, pp. 439–446, 1990.
[30] C. Bouman and K. Sauer, "A generalized Gaussian image model for edge-preserving MAP estimation," IEEE Trans. Image Processing, vol. 2, pp. 296–310, 1993.
[31] A. H. Delaney and Y. Bresler, "Efficient edge-preserving regularization for limited-angle tomography," in Proc. IEEE Int. Conf. Image Processing, Washington, DC, Oct. 1995, vol. III, pp. 176–179.
[32] P. Charbonnier, L. Blanc-Feraud, and M. Barlaud, "An adaptive reconstruction method involving discontinuities," in Proc. ICASSP, Minneapolis, MN, Apr. 1993.
[33] P. Charbonnier, L. Blanc-Feraud, and M. Barlaud, "ARTUR: An adaptive deterministic relaxation algorithm for edge-preserving tomographic reconstruction," Tech. Rep. 93-76, I3S, Univ. Nice, Sophia Antipolis, France, 1993.
[34] G. Aubert, M. Barlaud, L. Blanc-Feraud, and P. Charbonnier, "Deterministic edge-preserving regularization in computed imaging," Tech. Rep. 94-01, I3S, Univ. Nice, Sophia Antipolis, France, 1994.
[35] Y. Censor, "Finite series-expansion reconstruction methods," Proc. IEEE, vol. 71, pp. 409–419, 1983.
[36] A. H. Delaney and Y. Bresler, "A fast iterative tomographic reconstruction algorithm," in Proc. ICASSP, Detroit, MI, May 1995, vol. 4, pp. 2295–2298.
[37] A. H. Delaney and Y. Bresler, "A fast and accurate Fourier algorithm for iterative parallel-beam tomography," IEEE Trans. Image Processing, vol. 5, pp. 740–753, May 1996.
[38] K. Sauer and C. Bouman, "Bayesian estimation of transmission tomograms using segmentation based optimization," IEEE Trans. Nucl. Sci., vol. 39, pp. 1144–1152, 1992.
[39] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721–741, 1984.
[40] R. Stevenson and E. Delp, "Fitting curves with discontinuities," in Proc. 1st Int. Workshop on Robust Computer Vision, Seattle, WA, Mar. 1990, pp. 127–136.
[41] S. Geman and D. McClure, "Bayesian image analysis: An application to single photon emission tomography," in Proc. Amer. Stat. Assoc. Stat. Comput. Sect., 1985, pp. 12–18.
[42] D. Geman and G. Reynolds, "Constrained restoration and the recovery of discontinuities," IEEE Trans. Pattern Anal. Machine Intell., vol. 14, pp. 367–383, 1992.


[43] T. Hebert and R. Leahy, "A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors," IEEE Trans. Med. Imag., vol. 8, pp. 194–202, 1990.
[44] A. Blake and A. Zisserman, Visual Reconstruction. Cambridge, MA: MIT Press, 1987.
[45] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671–680, 1983.
[46] A. Blake, "Comparison of the efficiency of deterministic and stochastic algorithms for visual reconstruction," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 2–12, 1989.
[47] P. Charbonnier, L. Blanc-Feraud, G. Aubert, and M. Barlaud, "Two deterministic half-quadratic regularization algorithms for computed imaging," in Proc. ICIP, Austin, TX, Nov. 1994.
[48] D. G. Luenberger, Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1984.
[49] M. Unser, A. Aldroubi, and M. Eden, "B-spline signal processing: Part I—Theory," IEEE Trans. Acoust., Speech, Signal Processing, vol. 41, pp. 821–833, 1993.
[50] K. M. Hanson and G. W. Wecksung, "Local basis-function approach to computed tomography," Appl. Opt., vol. 24, pp. 4028–4039, 1985.
[51] D. S. Lalush and B. M. W. Tsui, "Simulation evaluation of Gibbs prior distributions for use in maximum a posteriori SPECT reconstructions," IEEE Trans. Med. Imag., vol. 11, pp. 267–275, 1992.
[52] G. Golub, M. Heath, and G. Wahba, "Generalized cross-validation as a method for choosing a good ridge parameter," Technometrics, vol. 21, pp. 215–223, 1979.
[53] J. K. Older and P. C. Johns, "Matrix formulation of computed tomogram reconstruction," Phys. Med. Biol., vol. 38, pp. 1051–1064, 1993.
[54] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1989.
[55] R. R. Meyer, "Sufficient conditions for the convergence of monotonic mathematical programming algorithms," J. Comput. Syst. Sci., vol. 12, pp. 108–121, 1976.

Alexander H. Delaney was born in Anchorage, AK, in 1965. He received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Illinois, Urbana-Champaign, in 1987, 1989, and 1995, respectively. From 1989 to 1991, he was a Design Engineer at KLA Instruments, San Jose, CA. He is currently a consultant in Sunnyvale, CA.

Yoram Bresler (SM’93) received the B.Sc. (cum laude) and M.Sc. degrees from Technion—Israel Institute of Technology, Haifa, in 1974 and 1981, respectively, and the Ph.D. degree from Stanford University, Stanford, CA, in 1985, all in electrical engineering. From 1974 to 1979, he served as an Electronics Engineer in the Israeli Defense Force. From 1979 to 1981, he was a consultant for the Flight Control Laboratory, Technion, developing algorithms for autonomous TV aircraft guidance. From 1985 to 1987, he was a Research Associate at the Information Systems Laboratory, Stanford University, where his research involved sensor array processing and medical imaging. In 1987, he joined the University of Illinois, Urbana-Champaign, where he is currently an Associate Professor with the Department of Electrical and Computer Engineering and the Bioengineering Program, and Research Associate Professor at the Computer and Systems Research Laboratory. His current research interests include multidimensional and statistical signal processing and their applications to inverse problems in imaging and sensor array signal processing, and to diagnostic and scientific visualization. Dr. Bresler was an Associate Editor for the IEEE TRANSACTIONS ON IMAGE PROCESSING from 1992 to 1993, and is currently on the editorial board of Machine Vision and Applications. He is a member of the IEEE Image and Multidimensional Signal Processing Technical Committee. In 1988 and 1989, he received the Senior Paper Awards from the IEEE Acoustics, Speech, and Signal Processing Society. He is the recipient of a 1991 NSF Presidential Young Investigator Award and a 1995 Technion—Israel Institute of Technology Fellowship.