REGULARISED, SEMI-LOCAL HURST ESTIMATION VIA GENERALISED LASSO AND DUAL-TREE COMPLEX WAVELETS

C. Nafornita, A. Isar
Communications Department, Politehnica University Timisoara
{corina.nafornita@upt.ro, alexandru.isar@upt.ro}

J. D. B. Nelson
Department of Statistical Science, University College London
[email protected]

ABSTRACT

Semi-local Hurst estimation is considered for random fields where the regularity varies in a piecewise manner. The recently developed generalised lasso is exploited to propose a spatially regularised Hurst estimator. Dual-tree complex wavelets are used to formulate the usual log-spectrum regression problem and an interlaced penalty matrix is constructed to form a 2-d fused lasso constraint on the double-indexed parameters. We thus extend a regularity-based denoising approach and demonstrate the utility of our method with experiments.

1. INTRODUCTION

Strictly self-similar processes are invariant in distribution, up to a constant, under spatial (or temporal) scalings. That these processes display long-range dependence continues to afford them much interest in myriad image processing applications such as texture super-resolution [1], artwork classification [2], surface inspection [3], analysis of fMRI data [4] and mammograms [5], adaptive filtering for sonar [6], regularity-controlled denoising and de-trending [7, 8, 9], and many more [10]. The long-range dependency is encapsulated by regularity parameters such as the Hurst or Hölder exponents. These shape the spatial correlation structure and determine the smoothness present in complex textures and natural phenomena. In the near-trivial case, regularity is assumed constant throughout the data. However, this global formulation has limited value for image processing. Images typically comprise multiple textures and, as such, often manifest multiple Hurst exponents throughout their spatial support. To this, and other, ends, the powerful multifractal formalism of Brown et al. [11] has now been successfully integrated into various image processing problems in the multiresolution framework [10, 12, 13].
Assuming that pointwise smoothness estimation of a process with quickly varying regularity is a futile pursuit, this formalism instead provides a means to describe how regularity varies over the image.

J. D. B. Nelson was partially supported by grants from the Dstl, TSB, and D2U-EPSRC.

978-1-4799-5751-4/14/$31.00 ©2014 IEEE

In this paper, we consider a somewhat more elementary case whereby the Hurst exponent is allowed to vary in a piecewise constant manner. This semi-local statistical description of image data may, for example, be appropriate for segmentation or adaptive denoising-type sub-problems where an image can be thought of as a disjoint union of textures. We demonstrate numerically that this setting does indeed allow reasonably accurate pointwise estimates of the Hurst exponent. After taking care of some preliminaries we illustrate, in Section 3, that recent developments in statistical computing [14] can be leveraged to exploit the piecewise constant Hurst prior. Moreover, our construction is such that it can easily be extended to the case where the Hurst exponent varies as a polynomial. In Section 4, a number of experiments confirm that our lasso-based approach holds an advantage over the usual least-squares estimator (or linear extensions thereof). In this manner, we extend a regularity-based denoising approach of Echelard and Lévy Véhel [7].

2. WEAKLY SELF-SIMILAR PROCESSES

If a self-similar process f satisfies some further modest technical conditions (stochastic continuity and non-triviality, cf. [15]) then there must exist an exponent H > 0 such that f(λ·) =_d λ^H f(·). In fact, we only require here a more general form of self-similarity, whereby the invariance property is satisfied over the first two orders of statistics.

Definition 1 (weak statistical self-similarity). Let (Ω, F, P) be a probability space and let λ ∈ R, H ∈ (0, 1), t ∈ T ⊂ R². The stochastic field f : Ω × T → R is said to have weak statistical self-similarity, and we write f ∈ WS_H, if

  E f(λ·) = λ^H E f(·)   and   E f(λt) f(λ·) = λ^{2H} E f(t) f(·).

It immediately follows that strictly self-similar processes are weakly self-similar; the class WS_H contains the much-studied fractional Brownian surface and has a power-law spectrum.

2.1. Multiresolution analysis

As follows, and elsewhere [10], wavelets offer a natural and compelling means to study self-similar processes.
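Before turning to the estimation machinery, the power-law scaling that wavelets reveal in self-similar processes can be illustrated with a small numerical sketch. This is illustrative only and is not the paper's pipeline: it synthesises 1-d fractional Brownian motion exactly by Cholesky factorisation of the fBm covariance (the paper's experiments use the 2-d incremental Fourier method [19]), and it uses a crude Haar-type detail whose normalisation makes the log2-energy slope equal to 2H rather than the 2k(H+1) of the proposition below.

```python
import numpy as np

rng = np.random.default_rng(0)
H_true, n, n_rep = 0.7, 1024, 20

# Exact 1-d fBm synthesis via Cholesky factorisation of the fBm
# covariance C(s,t) = (s^2H + t^2H - |s-t|^2H)/2 on the grid (0,1].
t = np.arange(1, n + 1) / n
C = 0.5 * (t[:, None]**(2*H_true) + t[None, :]**(2*H_true)
           - np.abs(t[:, None] - t[None, :])**(2*H_true))
L = np.linalg.cholesky(C + 1e-10 * np.eye(n))

def haar_detail_energy(x, k):
    """Mean squared Haar-type detail at dyadic scale k (block size 2^k):
    difference of block-half means, averaged over all blocks."""
    m = 2**k
    blocks = x[:(len(x) // m) * m].reshape(-1, m)
    d = blocks[:, :m//2].mean(axis=1) - blocks[:, m//2:].mean(axis=1)
    return np.mean(d**2)

# Average energies over several realisations; for fBm the block-mean
# detail variance scales as (2^k/n)^{2H}, so the log2-energy is linear
# in k with slope 2H under this normalisation.
ks = np.arange(1, 6)
E = np.zeros(len(ks))
for _ in range(n_rep):
    f = L @ rng.standard_normal(n)
    E += np.array([haar_detail_energy(f, k) for k in ks])
slope = np.polyfit(ks, np.log2(E / n_rep), 1)[0]
H_hat = slope / 2   # recovers roughly H_true = 0.7
```

The log-linearity of the energy across scale is exactly what the regression-based estimators of Section 2.2 exploit.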


ICIP 2014

Proposition 2 (e.g. Nelson & Kingsbury [6]). Let f ∈ WS_H. Then

  E |(Wf)(· ; k, m)|² ∝ 2^{2k(H+1)},

where W denotes the wavelet transform operator, defined by (Wf)(t; k, m) := ⟨2^{-k} f, ψ_m(2^{-k}· − t)⟩, with some suitable wavelet ψ defined over space t, orientation m, and at the kth finest scale level.

The Hurst exponent describes smoothness: values close to one (zero) will be relatively smooth (rough). If H is fixed over the entire support, it can be estimated by taking the log of both sides of the proportionality in Prop. 2 and computing the slope of the regression via least squares. In general, regularity can vary with respect to space and direction and we have H = H_m(t). In this case, the Hurst parameter can still be estimated by carrying out least squares over localised cones, and directional subbands, in the wavelet domain. In practice E|·|² is approximated by the sample 2nd moment Ē in the region δ_t, scale k, and orientation m, namely

  Ē_km(t) := |δ_t|^{-1} Σ_{δ_t} |(Wf)(· ; k, m)|²  ∝~  2^{2k(H_m(t)+1)}.   (1)

When spatial localisation of Hurst is important, the pointwise estimate |(Wf)(· ; k, m)|² is used. The energy of the discrete wavelet coefficients, computed over a dyadic 2-d grid, can then be computed on the integer lattice (the original pixel locations) at each scale level by appropriate interpolation.

2.2. Estimation

Recently, Nelson and Kingsbury [16] proposed and explored the use of the dual-tree complex wavelet estimator for the case where the Hurst exponent is both anisotropic and piecewise locally varying. They showed that (shift-invariant) dual-tree wavelets provide Hurst estimates with greater accuracy and less variance than other, shift-variant, decimated wavelet transforms; a key reason for this is that the shift invariance of dual-tree wavelets provides more stable energy estimates, especially near singularities or considerable oscillations [17].

Index the spatial domain as t ∈ T = {t_i}_{i=1}^n. Since the same analysis can be applied in each direction as required, we drop m and let y_k[i] := log Ē_k(t_i) denote the log sample second moment of the wavelet magnitudes about the location t_i and scale k. Throughout we use dual-tree wavelets to compute the sample energy Ē_km(t), but other wavelets or measures may be used to derive Ē without loss of generality. Given the power law, the log sample second moments will ideally follow the simple linear model y_k[i] = k β_2[i] + β_1[i], with slope β_2[i] = 2(H_i + 1), where H_i = H[t_i]. In practice, the energies at some of the finest (low SNR) and coarsest (poorly localised) scale levels are excluded from the regression. The corresponding least squares problem takes the form min_{β_i} ||y_i − X_0 β_i||², with

  y_i = [y_{k−}[i], …, y_{k+}[i]]^T,   X_0 = [1 k−; … ; 1 k+],   β_i = [β_1[i], β_2[i]]^T.

Here, the 1st (k− − 1) finest, and (k+ − 1) coarsest, scale levels are discarded. The Hurst estimate is then simply obtained from the estimate of the slope via Ĥ_i = β̂_2[i]/2 − 1. The pointwise least-squares solver can be readily extended to the entire spatial domain via min_B ||Y − X_0 B||², where Y = [y_1, …, y_n] ∈ R^{K×n} and B = [β_1, …, β_n] ∈ R^{2×n}. As in the pointwise case, the solution (X_0^T X_0)^{-1} X_0^T Y only involves the inversion of a two-by-two matrix.

3. GENERALISED LASSO FORMULATION

The main drawback with the Hurst estimation described above is that the estimated regularity can vary quite markedly according to the behaviour at the finest, and usually noisier, scales. Ideally, we might want some spatial smoothing, but only at those locations where the linear least-squares solution was a poor fit to the log spectrum. This motivates us to propose the generalised lasso [14, 18] as a means to spatially regularise the Hurst estimates. This non-linear smoother takes the form

  argmin_β ||y − Xβ||² + λ ||Dβ||_1,

where y is the response (data) vector, X is a matrix of predictor variables (the model), β is a vector of model parameters, and D is a penalty matrix. This problem reduces to the so-termed 1-d fused lasso when D has a main diagonal of negative ones, an upper diagonal of positive ones, and is zero elsewhere.

3.1. Regularised Hurst estimation

The generalised lasso framework can accommodate a spatially regularised version of Hurst estimation as follows. We rearrange the least-squares problem described in Section 2.2 as argmin_{β ∈ R^{2n}} ||y − Xβ||² with y = [y_1^T, …, y_n^T]^T ∈ R^{Kn}, β = [β_1^T, …, β_n^T]^T ∈ R^{2n}, and

  X = I_n ⊗ X_0 = diag(X_0, …, X_0) ∈ R^{Kn×2n},

where ⊗ denotes the Kronecker product and I_n is the n-by-n identity matrix. A modification of the so-called '2-d fused lasso' penalty is used to impose spatial regularisation on both parameters β_1[i] and β_2[i]. For the case where we have defined β as above, this is achieved by designing an interlaced version of the usual 2-d penalty, namely D = [D_2^T, D_{2n_0}^T]^T, where D_2 is the banded sub-matrix with rows of the form [1, 0, −1] which performs the horizontal differences β_j[i] − β_j[i+1] and, likewise, D_{2n_0} is the banded sub-matrix which performs the vertical differences β_j[i] − β_j[i+n_0] (an offset of 2n_0 entries in the interlaced vector), where n_0 is the width of the image over which the β parameters are defined. The extra zeros between the +1 and −1 have the effect that the odd-numbered rows of the penalty matrix produce differences in the intercept parameter β_1[i] and the even-numbered rows produce differences in the slope parameter β_2[i]; hence the term 'interlaced'. Like the usual 2-d fusion penalty, it can easily be seen that the interlaced version also has rank equal to the number of columns, which is less than the number of rows. As such,


we can follow similar arguments to those of Tibshirani [14] to conclude that our interlaced fusion penalty is 'generalised' in the sense that it cannot be reduced to the standard lasso problem.

[Fig. 1. Mean error of 'curves4', over all pixels and instances, and standard error (dotted lines), over the instances, with respect to the smoothing parameter: (a) PF, versus σ; (b) lasso, versus λ. Note the OLS error is 0.187.]

[Fig. 2. Mean Hurst estimates of four fractional Brownian surfaces (over 100 simulations each). 1st column: true Hurst H; 2nd: Ĥ_OLS; 3rd: Ĥ_PF; 4th: Ĥ_LASSO.]

Table 1. Mean absolute error and, in brackets, error standard deviation of the Hurst estimators: OLS, PF, and lasso.

Data     | OLS           | PF            | Lasso
chequers | 0.171 (0.045) | 0.145 (0.030) | 0.125 (0.022)
curves3  | 0.251 (0.183) | 0.181 (0.065) | 0.118 (0.025)
curves4  | 0.187 (0.070) | 0.145 (0.028) | 0.111 (0.019)
curves5  | 0.177 (0.481) | 0.159 (0.029) | 0.125 (0.021)

[Fig. 3. Mean absolute error histograms of the OLS, PF, and lasso methods for the 'curves5' data.]

4. EXPERIMENTS

Two main sets of experiments were carried out: Hurst estimation and spatially adaptive, regularity-based denoising.
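For concreteness, the interlaced penalty of Section 3.1 might be assembled as follows. This is a hypothetical reconstruction, not the authors' code: the row ordering and sign convention are assumptions, and in practice D would be built sparse and handed to a generalised-lasso solver such as the genlasso R package used in the experiments.

```python
import numpy as np

def interlaced_fused_penalty(n0, n1):
    """Interlaced 2-d fusion penalty for the stacked parameter vector
    beta = [b1[0], b2[0], b1[1], b2[1], ...] over an n1-by-n0 pixel grid
    in row-major order.  Each row of D takes the first difference of the
    SAME parameter (j=0: intercept b1, j=1: slope b2) at horizontally or
    vertically adjacent pixels, so the +1 and -1 in a horizontal row are
    two entries apart and in a vertical row 2*n0 entries apart."""
    n = n0 * n1          # number of pixels
    p = 2 * n            # number of interlaced parameters
    rows = []
    for i in range(n):
        r, c = divmod(i, n0)
        for j in (0, 1):
            if c + 1 < n0:                      # horizontal neighbour
                d = np.zeros(p)
                d[2*i + j], d[2*(i + 1) + j] = 1.0, -1.0
                rows.append(d)
            if r + 1 < n1:                      # vertical neighbour
                d = np.zeros(p)
                d[2*i + j], d[2*(i + n0) + j] = 1.0, -1.0
                rows.append(d)
    return np.array(rows)

# A 2x3 grid of pixels gives 12 parameters and 14 difference rows.
D = interlaced_fused_penalty(3, 2)
```

Each row of D contains exactly one +1 and one −1, so Dβ lists all first differences of the intercept and slope maps, and the ℓ1 term λ||Dβ||₁ in the generalised lasso fuses neighbouring parameters into piecewise constant regions.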

4.1. Estimation experiments

The incremental Fourier synthesis method [19] favoured by the FracLab toolbox [20] was used to simulate fractional Brownian surfaces. These were furnished with piecewise Hurst parameters by stitching together surfaces generated with the same underlying white noise process but different spectral slopes, in the same way as [16]. This ensures that there were no 'artificial' artefacts caused by jumps from one piecewise region to the next. Four image types were designed and one hundred instances of each design were used for testing. The underlying Hurst parameters are depicted with respect to space in the first column of Figure 2. These are referred to here as 'chequers', 'curves3', 'curves4', and 'curves5'. In keeping with the emphasis on the localised nature of our analysis, all images were restricted in size to 64 × 64.

Three methods were used to estimate the Hurst parameter, namely: (i) the usual least squares technique (OLS); (ii) least squares followed by post-filtering (PF), i.e. convolution of the β_i maps with a 5 × 5 Gaussian filter; and (iii) the proposed generalised lasso, described above. As well as FracLab, we used Kingsbury's DTCWT Matlab toolbox and Tibshirani's genlasso R package. The finest and coarsest scale levels were discarded from the regression (k− = 2, k+ = 5) in all cases. Methods (ii) and (iii) require one parameter each to be set: the variance of the Gaussian filter and the λ of the lasso. However, the results obtained here (cf. Fig. 1) suggest that the optimal settings are very stable across instances of the same image type and that both methods are superior to least squares over very large intervals of the parameter values. To illustrate their realistic potential, we used three hold-out images to train the optimal parameter settings for each image type. Generally, if training data is not available, the smoothing parameters should be set in accordance with how rapidly one expects, or wants, the Hurst parameter to vary.
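As a concrete sketch of the pointwise estimator used as the OLS baseline, the following hypothetical helper regresses log2 wavelet energies on the scale index over levels k− to k+ and returns Ĥ = β̂₂/2 − 1; the function name and the synthetic energies are illustrative, not the toolbox code used in the paper.

```python
import numpy as np

def hurst_ols(log2_energy, k_minus=2, k_plus=5):
    """Pointwise OLS Hurst estimate from log2 wavelet energies.
    log2_energy[k-1] holds y_k for scale levels k = 1..K; levels outside
    [k_minus, k_plus] are excluded, mirroring the paper's setup."""
    ks = np.arange(k_minus, k_plus + 1)
    y = np.asarray(log2_energy, dtype=float)[ks - 1]
    X0 = np.column_stack([np.ones_like(ks), ks])       # rows [1, k]
    beta1, beta2 = np.linalg.lstsq(X0, y, rcond=None)[0]
    return beta2 / 2 - 1                               # H = slope/2 - 1

# Noiseless synthetic energies obeying log2 E_k = 2k(H+1) + c
# recover H exactly (H = 0.3, arbitrary intercept c = -1.5).
H = 0.3
y = [2 * k * (H + 1) - 1.5 for k in range(1, 7)]
print(round(hurst_ols(y), 3))   # → 0.3
```

Running this per pixel over the interpolated dual-tree energy maps gives the OLS baseline; the PF variant would then smooth the resulting β maps with a Gaussian filter.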
Columns 2-4 in Figure 2 illustrate the mean Hurst estimates of each method. The banding effect in the OLS method is due to the fact that, near the boundaries of the piecewise regions, the Hurst parameter estimation is disturbed by the conflicting statistics of the neighbouring regions. The PF method smooths these artefacts out but at the cost of spatial coherence. Only the lasso-based method appears, in the mean, to be able to convincingly cope with this effect. Furthermore, as Table 1 shows, lasso also obtains smaller error variance. Figure 3 shows the absolute error histograms of the methods for the ‘curves5’ image. Although not shown here due to space restrictions, the OLS errors follow a very similar


distribution over all data types. This is perhaps not too surprising since OLS is a more localised method than PF or lasso. The lasso's errors are smaller than the PF's, which in turn are smaller than the OLS's. The advantage of the lasso, as well as the PF, becomes steadily weaker as the Hurst exponent varies more. This is to be expected since spatial smoothing has less significance when regularity varies more rapidly in space.

4.2. Denoising

Various strategies have been devised to exploit the Hurst or Hölder (regularity) exponent for denoising [7, 8, 9, 21, 22]. The general idea assumes that the signal of interest follows a power law and uses shrinkage to mitigate any significant deviation from that model. Often, the regularity is known or assumed, as in [9]. We follow, in spirit, the work of Echelard and Lévy Véhel [7], where the regularity is estimated, and extend this by allowing the regularity to vary piecewise.

The four image types were simulated as before. High-frequency Gaussian noise was then added (in the wavelet domain to the finer scale levels: σ = 4 at k = 1 and σ = 2 at k = 2). Then, scale levels k = 3, …, 5 were used to estimate the Hurst parameter using OLS, PF, and lasso. Any (dual-tree) wavelet coefficients at levels k = 1 or k = 2 which had a magnitude above the estimated power law decay were shrunk to the expected value as defined by the power law model.

Figures 4 and 5 illustrate a typical example on the 'curves3' data. For comparison, in addition to the OLS, PF, and lasso regularity-based shrinkage methods, a non-adaptive hard thresholding was also implemented which simply shrunk all coefficients in the first two finest scale levels to zero. It can be seen that the hard shrinkage over-smooths the data. The optimal variance parameter for the PF method was so small that it gave almost identical results to the OLS method. The PF and OLS did better than hard shrinkage but, because their Hurst estimation is not as good as the lasso-based method's, they under- and over-shrunk various parts of the images. As suggested by Table 2 and confirmed by the error histogram in Fig. 6, this led to many more large errors than the lasso. Some of these artefacts are clearly visible in Figs 4 and 5.

Table 2. Mean absolute error and, in brackets, error standard deviation of the regularity-based denoisers: OLS, PF, lasso, and hard thresholding. See also Fig. 6.

Data     | OLS           | PF            | Lasso         | Hard
chequers | 0.200 (0.211) | 0.200 (0.211) | 0.163 (0.151) | 0.373 (0.329)
curves3  | 0.169 (0.175) | 0.169 (0.175) | 0.135 (0.118) | 0.354 (0.344)
curves4  | 0.190 (0.193) | 0.190 (0.193) | 0.154 (0.137) | 0.523 (0.467)
curves5  | 0.189 (0.185) | 0.189 (0.185) | 0.153 (0.135) | 0.356 (0.305)

[Fig. 4. Regularity-based denoising. The 'curves3' data is subjected to high-frequency noise. Hard thresholding, together with the OLS, PF, and lasso regularity-based methods, are used to denoise the data. Panels: original, noisy, hard, OLS, PF, Lasso.]

[Fig. 5. Top: a typical lasso-regularity-denoised example from the 'curves3' image data, plotted with respect to vectorised pixel index (legend: original, hard, PF, Lasso). Bottom: a zoomed-in segment of the top figure.]

[Fig. 6. Mean absolute error histogram of the regularity-based denoising methods: OLS, PF, and the lasso.]

5. CONCLUSION

We have introduced a new spatially regularised Hurst estimation method which exploits situations where regularity is constant over unknown regions in an image. We have shown that, by framing the problem in terms of the generalised lasso, the solution can be obtained with powerful methodology from computational statistics. We have furthermore translated this idea to image reconstruction to arrive at a regularity-based denoising method which adapts to piecewise varying regularity. Much further work is possible: higher order differencers (cf. [14]); anisotropic estimations (cf. [23]) and subsequent filtering; combination with data-driven methods (cf. [24]); adaptive basis selection (extending e.g. [25]); and establishing and comparing theoretical results on convergence and conditions thereof (cf. other local regularity-based denoising, e.g. [22]).
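As a supplement, the shrinkage rule described in Section 4.2 might be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the envelope 2^{(kβ₂+β₁)/2} (the RMS magnitude implied by the fitted log2 power law) and the function itself are assumptions about how "shrunk to the expected value" would be realised per pixel.

```python
import numpy as np

def regularity_shrink(coeffs, k, beta1, beta2):
    """Shrink wavelet coefficients at fine scale k whose magnitude exceeds
    the fitted power-law envelope 2^{(k*beta2 + beta1)/2}, pulling them
    back onto the envelope with sign preserved; beta1/beta2 may be the
    per-pixel intercept/slope maps estimated at the coarser levels."""
    envelope = 2.0 ** ((k * beta2 + beta1) / 2)   # model RMS magnitude
    mag = np.abs(coeffs)
    scale = np.where(mag > envelope, envelope / np.maximum(mag, 1e-12), 1.0)
    return coeffs * scale

# With intercept 0 and slope 2 at scale k=1 the envelope is 2, so only
# coefficients whose magnitude exceeds 2 are pulled back onto it.
out = regularity_shrink(np.array([0.5, 4.0, -8.0]), 1, 0.0, 2.0)
```

Coefficients inside the envelope are untouched, so textures consistent with the estimated regularity pass through while noise spikes at the fine scales are suppressed.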


6. REFERENCES

[1] I. Zachevsky and Y. Y. Zeevi, "Single-image superresolution of self-similar textures," IEEE International Conference on Image Processing, 2013.
[2] P. Abry, H. Wendt, and S. Jaffard, "When Van Gogh meets Mandelbrot: Multifractal classification of painting's texture," Signal Processing, vol. 93, no. 3, pp. 554–572, 2013.
[3] J. Blackledge and D. Dubovitski, "A surface inspection machine vision system that includes fractal analysis," International Society for Advanced Science and Technology, Journal of Electronics and Signal Processing, vol. 3, no. 2, pp. 76–89, 2008.
[4] E. Bullmore, J. Fadili, M. Breakspear, R. Salvador, J. Suckling, and M. Brammer, "Wavelets and statistical analysis of functional magnetic resonance images of the human brain," Statistical Methods in Medical Research, vol. 12, no. 12, pp. 375–399, 2003.
[5] P. Kestener, J. M. Lina, P. Saint-Jean, and A. Arneodo, "Wavelet-based multifractal formalism to assist in diagnosis in digitized mammograms," Image Analysis and Stereology, vol. 20, no. 3, pp. 169–174, 2001.
[6] J. D. B. Nelson and N. G. Kingsbury, "Fractal dimension based sand ripple suppression for mine hunting with sidescan sonar," International Conference on Synthetic Aperture Sonar and Synthetic Aperture Radar, 2010.
[7] A. Echelard and J. Lévy Véhel, "Wavelet denoising based on local regularity information," Proceedings of the European Signal Processing Conference, 2008.
[8] P. Flandrin, P. Gonçalvès, and G. Rilling, "Detrending and denoising with empirical mode decompositions," in Proceedings of the European Signal Processing Conference, 2004, pp. 1581–1584.
[9] B. D. Vidakovic, G. G. Katul, and J. D. Albertson, "Multiscale denoising of self-similar processes," Journal of Geophysical Research, vol. 105, no. D22, pp. 27049–27058, 2000.
[10] P. Abry, P. Gonçalvès, and J. Lévy Véhel, Scaling, Fractals and Wavelets, Wiley, 2009.
[11] G. Brown, G. Michon, and J. Peyrière, "On the multifractal analysis of measures," Journal of Statistical Physics, vol. 66, pp. 775–790, 1992.
[12] P. Abry, P. Flandrin, M. S. Taqqu, and D. Veitch, Self-similarity and Long-range Dependence Through the Wavelet Lens, pp. 527–556, Birkhäuser, 2002.
[13] S. Jaffard, B. Lashermes, and P. Abry, "Wavelet leaders in multifractal analysis," in Wavelet Analysis and Applications, T. Qian, M. I. Vai, and X. Yuesheng, Eds., Applied and Numerical Harmonic Analysis, pp. 201–246, Birkhäuser, 2007.
[14] R. J. Tibshirani and J. Taylor, "The solution path of the generalized lasso," Annals of Statistics, vol. 39, no. 3, pp. 1335–1371, 2011.
[15] P. Embrechts and M. Maejima, "An introduction to the theory of self-similar stochastic processes," International Journal of Modern Physics B, vol. 14, pp. 1399–1420, 2000.
[16] J. D. B. Nelson and N. G. Kingsbury, "Dual-tree wavelets for estimation of locally varying and anisotropic fractal dimension," IEEE International Conference on Image Processing, pp. 341–344, 2010.
[17] I. W. Selesnick, R. G. Baraniuk, and N. G. Kingsbury, "The dual-tree complex wavelet transform," IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 123–151, 2005.
[18] Y. She, "Sparse regression with exact clustering," Electronic Journal of Statistics, vol. 4, pp. 1055–1096, 2010.
[19] L. M. Kaplan and C.-C. J. Kuo, "An improved method for 2-d self-similar image synthesis," IEEE Transactions on Image Processing, vol. 5, no. 5, pp. 754–761, 1996.
[20] Inria Saclay / Ecole Centrale de Paris Regularity Team, "FracLab Toolbox," http://fraclab.saclay.inria.fr, 2014.
[21] E. Deléchelle, J.-C. Nunes, and J. Lemoine, "Empirical mode decomposition synthesis of fractional processes in 1D and 2D space," Image and Vision Computing, vol. 23, no. 9, pp. 799–806, 2005.
[22] P. Legrand and J. Lévy Véhel, "Local regularity-based image denoising," IEEE International Conference on Image Processing, vol. 2, pp. 377–380, 2003.
[23] C. Nafornita and A. Isar, "Estimating directional smoothness of images with the aid of the hyperanalytic wavelet packet transform," International Symposium on Signals, Circuits, and Systems, 2013.
[24] L. Trujillo, P. Legrand, G. Olague, and J. Lévy Véhel, "Evolving estimators of the pointwise Hölder exponent with genetic programming," Information Sciences, vol. 209, pp. 61–79, 2012.
[25] O. Rioul, Ondelettes Régulières : Application à la Compression d'Images Fixes, Ph.D. thesis, ENST, Paris, 1993.
