


A Content Separation Image Fusion Approach: Toward Conformity Between Spectral and Spatial Information

Vladimir Buntilov and Timo R. Bretschneider, Member, IEEE

Abstract—For many remote-sensing applications, it is desirable to have the best possible spatial resolution in order to resolve fine features on the earth's surface. At the same time, a high spectral resolution is needed to distinguish among different ground covers. This paper presents an approach that is based on separating the spatial and spectral characteristics of features extracted from the high-frequency undecimated wavelet transform components. The extracted content, which is expressed in terms of wavelet transform modulus maxima, is used to construct wavelet coefficients that reflect the fine features of the more highly resolved scene and are in agreement with the spectral properties of the lower resolved multispectral bands. The method effectively sharpens multispectral data and preserves the spectral characteristics of features to a great extent. The experiments have shown that the proposed framework can be applied to fuse data sets of both high- and low-correlated optical as well as near-infrared imagery of low m/p spatial resolution ratio, such as SPOT4.

Index Terms—Image sharpening, spatial quality, spectral preservation, wavelet transform modulus maxima.

I. INTRODUCTION

THE MAIN requirement for an image fusion algorithm in remote sensing is the preservation of the spectral fidelity of the data, i.e., the radiometric properties of objects have to remain unchanged. In the same context, possible artifacts in the fused product have to be avoided. However, recent comparative analyses of different image fusion techniques showed that, although the quality of the merged products has been greatly improved since the early days of image fusion, the problem of satisfying the aforementioned requirements is still open and requires further attention [1], [2].

Among the numerous fusion techniques, multiresolution-transform-based fusion approaches are of particular interest since they allow control over the amount of spatial information to be injected into the fused product. The wavelet transform, being a typical representative of multiresolution decomposition techniques, has been successfully employed for image fusion tasks over many years. The high-frequency components of a decomposed image (details) are considered to represent the spatial information of the scene.

Manuscript received February 10, 2006; revised August 16, 2006. The authors are with the School of Computer Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TGRS.2007.903048

Hence, the wavelet details of the more highly resolved panchromatic band are integrated into the less highly resolved multispectral bands in order to enhance their spatial resolution. However, many authors have recognized that, without proper modification of these components, degradation of spatial features and color distortions occur in the fused product. Thus, recent wavelet-based image fusion algorithms mainly focus on the adjustment of the high-frequency content of the more highly resolved band in order to achieve consistency with the spectral properties of the lower resolved bands [3]–[6].

In this paper, the degree to which the wavelet decomposition separates the spectral and spatial characteristics of features is investigated. Starting from the observation that a modification of the details usually changes both the spectral and the spatial characteristics of the reflected features, an ambiguity of the high-frequency wavelet coefficients is described. This finding is of particular interest since it contradicts the spectral preservation requirements in remote-sensing image fusion. Hence, there is a need for a method that, when modifying the details, recognizes and treats the spectral and spatial contents independently. In particular, this is of significance if the spectral similarity of the input bands is low. However, to our knowledge, a desirable degree of content separability is not achieved by any currently existing fusion technique.

In this paper, a novel method for image fusion is presented, which is based on an extended separation of spectral and spatial contents using the undecimated wavelet transform (content separation fusion, CSF). The high-frequency wavelet coefficients of the input data are used to extract the spectral and spatial attributes of the observable features in the form of the wavelet transform modulus maxima. Afterward, synthetic detail coefficients, which contain the spatial information from the more highly resolved band and, moreover, are in agreement with the spectral properties of the lower resolved multispectral band, are constructed. Eventually, these details are combined with the approximation coefficients of the less highly resolved band in order to form the fused product. The conducted experiments have demonstrated that the proposed method is able to effectively sharpen both highly and weakly correlated sets of scenes. Furthermore, the spectral signatures of the features in the fused product are preserved, provided that the spatial resolution ratio of the merged data is within certain limits, e.g., SPOT4 multispectral/panchromatic imagery.

The remaining part of this paper is organized as follows. Section II presents theoretical considerations regarding the proposed fusion method, while its implementation details are given in Section III.



Fig. 1. Distortion of a modeled step edge. (a) Example of overshooting distortion. (b) Example of smoothing distortion. (c) Wavelet detail coefficients of the step edges in (a). (d) Wavelet detail coefficients of the step edges in (b).

Section IV describes the experiments for the performance assessment, with the actual results shown in Section V. Finally, Section VI concludes this paper.

II. PROPOSED TECHNIQUE

Since the pioneering days of image fusion, the performance of fusion algorithms has been greatly improved in terms of preserving the radiometry and sharpening the image at the same time. Nevertheless, the occurrence of introduced distortions in the spectral information is still an open issue and was addressed in recently proposed fusion methods [1], [2], [7]. In many real-world scenarios, image fusion is employed to enhance images for visual perception only, i.e., the sharpened product is used by human observers who benefit from the fact that subtle features appear more noticeable. Therefore, the processing goal is to sharpen the image to the most suitable extent while keeping the spectral distortions within reasonable levels. However, the largest benefit of fused data products lies in their further utilization by other applications, and the requirements imposed by these applications on the fused product may result in images that are not as sharp and visually appealing as images produced specifically for human perception. An example is given by the aforementioned spectral signature preservation since a violation can possibly create new classes in the fused product, which occur neither in the original multispectral data nor in reality.

A. Problem Analysis

As shown by a number of investigations related to the performance of fusion algorithms, methods based on multiresolution analysis, and particularly the wavelet transform as a typical representative of it, are able to deal with spectral distortions more effectively than other techniques [2], [4], [7]–[10]. This is due to their capability to separate spectral and spatial information to a certain degree. In spite of this, spectral distortions are still observable in wavelet-based fused products, and both the spectral and the spatial quality greatly depend on the construction methods of the high-frequency coefficients.

Examples of two typical distortions, overshooting and smoothing, which may occur during wavelet-based fusion, are given in Fig. 1. The simulated 1-D responses of a high-resolution (p) and a low-resolution (m) scanner imaging a step edge are shown in Fig. 1(a), whereby the intensity levels of p and m are different due to the differences in the sensors' spectral responses. Edge profile i represents what a multispectral sensor with the spatial resolution capability of the panchromatic sensor would observe. The corresponding detail coefficients dm, dp, and di of the responses' wavelet transform using the quadratic spline wavelet [11] are depicted in Fig. 1(c). The profile of f in Fig. 1(a) represents the fused signal if the approximation of the low-resolved m is directly combined with the details taken from the highly resolved p without any changes. The differences in the details of p and i cause two overshoots on both sides of the edge in the reconstructed signal. Nevertheless, it can be seen that the resolution has been increased since


the edge is resolved more sharply, i.e., has a steeper rise. Fig. 1(b) shows another example where the response of the high-resolution scanner is lower than the response of the low-resolution instrument. The corresponding wavelet transforms are depicted in Fig. 1(d). In this case, although the spatial resolution of p is higher, inserting the details without proper modification distorts the features and decreases the resolution significantly, as shown by the shape of f. In this context, the investigation found that the actual shapes of the distortions are specific to the chosen wavelet. It is worthwhile mentioning that, with increased dissimilarity of the spectral responses of the two underlying sensors, the distortion problem becomes more significant if an adaptation of the coefficients is not accounted for.

From the examples in Fig. 1(c) and (d), it can be concluded that the values of the wavelet details of the more highly resolved signal can be either smaller or larger than the corresponding values of the less highly resolved signal. Thus, analyzing these values without taking the spectral properties into account cannot characterize the spatial resolution of the signal. At the same time, comparing the wavelet transforms of m and i shows that the information about the spatial resolution can still be found in the detail coefficients. Although the spectral properties of these signals are identical, i.e., they differ only in the sharpness of the edge, their wavelet transforms are different: The shape of dm is more stretched in the horizontal direction, and the maximum value of di is larger. This ambiguity of the wavelet coefficients concerning spectral and spatial information makes it difficult to construct details that meet the requirements of preserving the spectral as well as enhancing the spatial content. Hence, the solution for this problem lies in the further separation of spectral and spatial information within the sets of wavelet coefficients.

B. Spatial Information

Edges of physical objects on earth are represented in the sensed data as sharp variations in pixel intensities convolved with the imaging system's inherent point spread function. Hereafter, a 1-D spatial feature is defined as an interval of pixels where the data exhibit a certain variability. This definition can be straightforwardly extended to the 2-D case. Our previous work has shown that only regions with spatial variability need fusion intervention, while homogeneous areas preferably remain unchanged [12]. Therefore, wavelets are particularly useful for describing spatial features since the detail coefficients tend to have nonzero values where variations occur within the scene.

To characterize the spatial information of the features, their position and shape descriptors are established in the following manner: It is natural to define the position of a spatial feature as the location where it locally exhibits the largest degree of variability. In the proposed method, the high-frequency wavelet coefficients are used as an analog of the gradient. For further details, the work by Mallat et al. provides more theoretical considerations regarding the description of a signal's variability by its undecimated wavelet transform [13]. In summary, it was shown that the wavelet transform modulus maxima (WM) occur at points of locally extreme variability, provided that the wavelet function is chosen appropriately [14].

Fig. 2. Propagation of WM (marked by circles) across four levels of decomposition (D1, . . . , D4: wavelet transform details of signal S; A4: approximation signal of the fourth level).

Therefore, in this paper, the positions of features at a given scale of decomposition are defined as the positions of the corresponding wavelet transform modulus maxima. It has to be noted that, for some types of objects, more than one local modulus maximum exists at the same decomposition level, e.g., for the bell-shaped feature in Fig. 2. In this case, it is assumed that such a more complex structured feature is constituted of two (or more) simple features (for example, step edges), for each of which only one maximum can be found.

The next step is to represent the shape of a feature from its wavelet transform as the second descriptor of spatial information. Obviously, the actual shape of a feature is encompassed in both the low- and high-frequency components of the signal. The low frequencies are automatically taken into account in the approximation part of the wavelet transform, and for the fused image, they are usually taken "as is" from the multispectral band. Thus, hereafter, only the high-frequency part of a shape's description is considered in more detail. Mathematically, an irregularity in a signal can be characterized by its Lipschitz exponent, which reflects the order of the polynomial approximating the signal within the neighborhood of the irregularity [14]. Mallat et al. proved that the local shape of sharp variations can be characterized by the evolution of their wavelet transform modulus maxima propagating across scales [11], [14]. For example, the magnitudes of the WM corresponding to the sharp step edge in Fig. 2 do not change from scale to scale, which shows that the feature has a low regularity. In contrast, the WM magnitudes of the less sharply resolved step edge grow as the scale number increases, which indicates that the actual feature is smooth. Thus, the rate at which the WM corresponding to the same feature change from scale to scale can be used to describe the sharpness of the feature, which, in this paper, is referred to as the high-frequency part of the feature's shape description. As shown later in Section III-C, the proposed fusion method does not require an explicit calculation of the features' Lipschitz regularity, which significantly simplifies the implementation.
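This propagation behavior can be reproduced in a small 1-D experiment. The following sketch is illustrative only: a normalized box-smoothing derivative stands in for the quadratic spline wavelet of [11], and detail_coefficients and modulus_maxima are hypothetical helper names.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def detail_coefficients(signal, level):
    """Derivative-of-smoothing detail at dyadic scale 2**level.

    The factor 2**level normalizes the response so that an ideal step
    edge produces a modulus maximum equal to the step height once the
    kernel covers the whole transition (a stand-in for the quadratic
    spline wavelet used in the paper).
    """
    scale = 2 ** level
    smoothed = uniform_filter1d(signal.astype(float), size=scale, mode="nearest")
    return scale * np.gradient(smoothed)

def modulus_maxima(details):
    """Positions and magnitudes of the local maxima of |details| (1-D WM)."""
    mag = np.abs(details)
    interior = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:]) & (mag[1:-1] > 1e-6)
    idx = np.where(interior)[0] + 1
    return idx, mag[idx]

# A sharp step edge versus the same edge blurred over eight samples.
x = np.arange(128)
sharp = (x >= 64).astype(float) * 100.0
blurred = np.clip((x - 60) / 8.0, 0.0, 1.0) * 100.0

for level in range(1, 5):
    _, wm_sharp = modulus_maxima(detail_coefficients(sharp, level))
    _, wm_blurred = modulus_maxima(detail_coefficients(blurred, level))
    print(level, wm_sharp.max(), wm_blurred.max())
# The sharp edge's WM stays at the step height (100) on every level,
# whereas the blurred edge's WM grows with the level and reaches 100
# only once the analyzing kernel spans the whole transition.
```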


Fig. 3. Illustration of energy preservation concept for enhancing a 1-D step edge.

In summary, the high-frequency part of the spatial information of a sharp variation can be characterized by the locations of the corresponding wavelet transform modulus maxima and their evolution through different scales. Although the WM approach is not the only concept suitable for spatial characterization, it was found to provide a very comprehensive framework and is therefore used in this paper. Other techniques include zero crossings of the wavelet transform [13], [15], extrema of Gaussian-filtered data [16], and zero crossings of Laplacian-of-Gaussian-filtered signals [17].

C. Spectral Information

The digital number (DN) of a pixel is linearly related to the integral over the energy reflected and emitted from the corresponding earth surface within a given spectral range. By virtually increasing the spatial resolution and keeping the spectral sensitivity untouched, the total energy received by the pixels corresponding to the analyzed area remains identical. An example of this energy preservation rule is presented for the 1-D case of a step-edge feature in Fig. 3. The original spatially low-resolved feature is depicted as profile A′ABB′. After virtually increasing the spatial resolution of the sensing system, the profile of the feature becomes A′ADD′BB′. The total energy sensed by the scanner from the surface of the feature can be represented by the area under the feature's profile and is the same for both the low- and highly resolved cases. Accordingly, the spectral information of a feature is sufficiently described by the homogeneous DNs of the pixels before the feature's spatial occurrence and by the difference of DNs after the occurrence, e.g., for the case shown in Fig. 3, the corresponding DNs of regions A′A and BC.

The outlined approach coincides with the wavelet transform splitting the original signal into a low-frequency component, which shows the general trend of DNs at a coarse level, and high-frequency components representing the variability of the data. It has been found that the magnitude of the wavelet transform modulus maxima is directly related to the DNs' variability within the neighborhood of a corresponding singularity, provided that the transform is performed using the proper type of wavelet [11]. For the instance shown in Fig. 3, the value of BC is proportional to the value of the WM, which is assured by the linearity of the wavelet transform, given that the analyzing wavelet is sufficiently broad to cover the entire feature. The width of the wavelet can be altered by changing the decomposition level of the analysis. Consequently, the spectral information of features can be denoted by their WM values at the analysis level of decomposition since they characterize the total energy received by the sensor from the features' regions.
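The linearity just invoked can be checked with the same kind of toy experiment as in Section II-B: at a sufficiently deep level, the modulus maximum recovers the DN difference BC regardless of how the edge is blurred. Again, a box-smoothing derivative is only a stand-in for the quadratic spline wavelet, and the numbers are purely illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

# Edges of different heights, all blurred over six pixels.
x = np.arange(128)
scale = 2 ** 4   # a "deep" analysis level: kernel wider than the transition
for height in (40.0, 100.0, 250.0):
    edge = np.clip((x - 60) / 6.0, 0.0, 1.0) * height
    detail = scale * np.gradient(uniform_filter1d(edge, size=scale, mode="nearest"))
    print(height, round(np.abs(detail).max(), 1))
# The maximum tracks the edge height (the segment BC in Fig. 3) linearly,
# which is the spectral descriptor exploited by CSF.
```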

Fig. 4. Construction of the high-frequency wavelet coefficients D(f) using spectral (spe(·)) and spatial (spa(·)) information, which was extracted from the wavelet detail coefficients D(p) and D(m).

D. Fusion of Spectral and Spatial Contents

In the preceding two sections, it was shown that, by using the WM, it is possible to separate the spatial and spectral information of a signal contained in its wavelet detail coefficients. Thus, a fusion method can combine purely spatial information and spectral information from two sources in order to benefit from their complementary potential and minimize distortions. This concept is depicted graphically in Fig. 4, where D(p) and D(m) represent the high-frequency wavelet coefficients of the two signals, with spa(·) and spe(·) denoting the separated spatial and spectral information of the corresponding details.

Previously, a technique based on WM was used to merge multiple images of the same spatial resolution in order to construct a gray-level image that includes edge information from all bands [18]. At the same time, a similar approach of combining multiscale edge graphs was proposed in [19]. These approaches, however, contradict the main objectives of image fusion in remote sensing [10]. In the proposed technique, the individual spatially low-resolved bands are sharpened by the more highly resolved band. The source with higher confidence in spatial content, i.e., the panchromatic image, is used to extract the positions and shapes of spatial features, while the spectral information is taken from the multispectral bands. The extracted spectral and spatial information are fused, and the result is represented by a set of WM, which are used to construct the actual details employing a reconstruction algorithm [20]. This proposed technique is referred to as CSF. More details about the implementation of CSF are given in the next section.

III. IMPLEMENTATION

A. Analysis Level

After the preprocessing of the images, i.e., resampling and alignment (more details in Section IV), the undecimated wavelet transform is applied using quadratic spline wavelet filters [11]. The redundant wavelet decomposition was chosen, instead of its critically sampled (decimated) counterpart, since the former performs better if employed for image fusion [21].



Fig. 5. Extracting the spectral information of a resampled feature (blue). (a) The wavelet (black) effectively covers the original step-edge feature. (b) After resampling, the wavelet function needs to be dilated (as shown by arrows) in order to extract the spectral characteristic.

The deepest level of decomposition, which, in this paper, is called the analysis level, is used to extract the spectral information of the features. The actual level is chosen according to the spatial resolution ratio of the image set, whereby Fig. 5 shows graphically the necessity for its careful adjustment: A feature in the original sequence can be analyzed by a certain wavelet, as represented in Fig. 5(a), but after resampling, as shown in Fig. 5(b), the original wavelet function, which is represented by the dashed line, no longer covers the whole feature. Thus, it is not able to extract the spectral characteristics. Instead, the analyzing function needs to be dilated (solid line in the same figure), whereby the degree of dilation is controlled by the decomposition level. On the other hand, the spatial support of the analyzing wavelet function has to be minimized in order to optimally capture small features. For example, it was found that, in the case of a resampling factor of two, which applies to the SPOT4 bands, the second decomposition level is the best choice for the analyzing level, while it changes to the third level for IKONOS imagery.

B. Wavelet Transform Modulus Maxima Extraction

In the 2-D case, the wavelet transform is performed by separable consecutive filtering operations along the rows and columns of an image by applying the corresponding low- and high-pass filters. At each level, the high-frequency coefficients are the products of high-pass filtering and are denoted by $W_k^{\{H,V\}}(\{m,p\})$ with the vertical (H) or horizontal (V) direction of the corresponding single band (multispectral m or panchromatic p) taken at decomposition level k. The vertical and horizontal components can be recombined to form the 2-D magnitude $M_k$ and angle $A_k$ images

$$M_k(\cdot) = \sqrt{\left(W_k^H(\cdot)\right)^2 + \left(W_k^V(\cdot)\right)^2}, \qquad A_k(\cdot) = \arg\left(W_k^H(\cdot) + i\,W_k^V(\cdot)\right). \tag{1}$$

The WM image of the corresponding level k is denoted as $WM_k(\cdot)$ and represents the points of local maxima of the corresponding $M_k$ extracted from $W_k^H(\cdot)$ and $W_k^V(\cdot)$. The local maxima points are determined using Canny's algorithm for edge detection, with $k = 1, \ldots, N_a$, where $N_a$ is the analysis level. The $WM_k(\cdot)$ image has nonzero values only at those pixels where $M_k$ exhibits a local maximum in the direction of the gradient given by $A_k$. For these locations, both magnitude and angle values are recorded.
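The recombination in (1) followed by maxima extraction can be sketched as below. This is a compact stand-in for the Canny-style detection used in the paper: the gradient direction is quantized to four neighbor axes, borders are handled by wrap-around for brevity, and wm_image is a hypothetical name.

```python
import numpy as np

def wm_image(WH, WV):
    """Eq. (1) plus non-maximum suppression along the gradient direction.

    WH, WV: horizontal/vertical high-frequency planes of one level.
    Returns M at pixels that are local maxima of M along the direction
    given by A, and zero elsewhere (the WM image of that level).
    """
    M = np.hypot(WH, WV)                                # magnitude image M_k
    A = np.arctan2(WV, WH)                              # angle image A_k
    q = (np.round(A / (np.pi / 4)).astype(int)) % 4     # four axes modulo pi
    offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}
    wm = np.zeros_like(M)
    for k, (dy, dx) in offsets.items():
        ahead = np.roll(M, (-dy, -dx), axis=(0, 1))
        behind = np.roll(M, (dy, dx), axis=(0, 1))
        keep = (q == k) & (M >= ahead) & (M > behind)
        wm[keep] = M[keep]
    return wm
```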

C. Fusion of Spectral and Spatial Contents

After extracting $WM_{1,\ldots,N_a}(m)$ and $WM_{1,\ldots,N_a}(p)$, the spatial and spectral information are fused, and $WM_{1,\ldots,N_a}(f)$ is constructed, where f denotes the fused product. Level $N_a$ is used for analysis purposes only since the spatial resolutions of p and m at this level are identical. Therefore, the actual fusion is performed on levels $1, \ldots, N_a - 1$. To simplify the fusion algorithm, the $WM_k(\cdot)$ images are split into their horizontal and vertical components, which allows the data to be processed rowwise and columnwise, respectively. The components are calculated by the simple masking operation

$$WM_{1,\ldots,N_a}^{\{H,V\}}(\cdot) = \begin{cases} W_{1,\ldots,N_a}^{\{H,V\}}(\cdot), & \text{where } WM_{1,\ldots,N_a}(\cdot) \neq 0 \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$
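In code, the masking in (2) is a single selection per component; directional_wm is a hypothetical helper continuing the sketch above.

```python
import numpy as np

def directional_wm(wm, WH, WV):
    """Eq. (2): keep the directional detail coefficients only at
    modulus-maxima positions and zero them elsewhere."""
    keep = wm != 0
    return np.where(keep, WH, 0.0), np.where(keep, WV, 0.0)
```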

1) Spectral Signature Extraction: As shown in Section II-B, the spatial information of a feature is defined as the positions of the WM points corresponding to this feature at different scales and their relative changes from scale to scale, which account for the Lipschitz regularity of a variation. Let $[WM_{1,\ldots,N_a}^{\{H,V\}}(\{m,p\})]_s$ be the set of WM points at scales $1, \ldots, N_a$ that corresponds to a certain image feature s observable in both m and p. Since the spatial information of the fused feature is taken from p, it is reasonable to set

$$\left[WM_{1,\ldots,N_a-1}^{\{H,V\}}(f)\right]_s = \lambda_s^{\{H,V\}} \cdot \left[WM_{1,\ldots,N_a-1}^{\{H,V\}}(p)\right]_s \tag{3}$$

where the $\lambda_s^{\{H,V\}}$ are scaling coefficients. In this case, the spatial information from p is preserved since this operation does not create new maxima. Moreover, the relative changes across scales remain untouched, thus keeping the feature's Lipschitz regularity unchanged without explicitly calculating it. The coefficients $\lambda_s^{\{H,V\}}$ are chosen in such a way that $[WM_{N_a}^{\{H,V\}}(f)]_s = [WM_{N_a}^{\{H,V\}}(m)]_s$, i.e., the spectral characteristic of the fused feature is the same as that of the original one at the analyzing level, as given by

$$\lambda_s^{\{H,V\}} = \left[WM_{N_a}^{\{H,V\}}(m)\right]_s \Big/ \left[WM_{N_a}^{\{H,V\}}(p)\right]_s. \tag{4}$$

Note that the coefficient $\lambda_s$ depends solely on the feature's spectral characteristics in p and m and does not depend on the feature's orientation.

In the proposed method, all $\lambda_s^{\{H,V\}}$ corresponding to individual features are combined into an image Λ, which has the same size as the wavelet transform planes of the input images. Hence, the nonzero values in Λ are the coefficients for multiplication with the corresponding points of $WM_{1,\ldots,N_a-1}^{\{H,V\}}(p)$ in order to obtain the WM of the sharpened band. Zero values in Λ indicate that, if there are local maxima points in $WM_{1,\ldots,N_a-1}(m)$ at the corresponding locations, then those have to be transferred to $WM_{1,\ldots,N_a-1}(f)$ without changes, with the purpose of preserving both the spatial and the spectral properties of the multispectral band.


The horizontal and vertical components of $WM_{N_a}(m)$ and $WM_{N_a}(p)$ are processed individually, and the resulting values of $\lambda^H$ and $\lambda^V$, which were calculated according to (4), are used to form $\Lambda^H$ and $\Lambda^V$. Obviously, the spectral characteristic of a spatial feature does not depend on its orientation; thus, theoretically, nonzero $\lambda^H$ and $\lambda^V$ corresponding to the same feature should coincide in their values. Nevertheless, it was found that $\lambda^H$ and $\lambda^V$ might vary in value because of noise, misregistration errors, disturbances from other adjacent features, etc. In order to stabilize the estimation of the spectral properties, $\Lambda^H$ and $\Lambda^V$ are combined into a single Λ by selecting the coefficients from either $\Lambda^H$ or $\Lambda^V$, depending on the orientation of the feature. In the case of diagonal orientation, $\Lambda = (\Lambda^H + \Lambda^V)/2$.

It is clear from the definition of $\Lambda^{\{H,V\}}$ that, in order to calculate the ratios of the WM, the points of $WM_{N_a}^{\{H,V\}}(m)$ and $WM_{N_a}^{\{H,V\}}(p)$ that represent the same feature first have to be found. In the ideal case, these points are located at identical positions, which is sometimes violated in real-world imagery. Again, noise, slight misregistration of the imagery, and the influence of adjacent objects might disturb the actual locations of $WM_{N_a}^{\{H,V\}}(m)$ and $WM_{N_a}^{\{H,V\}}(p)$ corresponding to the same features. Therefore, in order to simplify the algorithm and to increase its robustness, the calculation of the two 2-D components $\Lambda^{\{H,V\}}$ is performed separately for each column or row of $WM_{N_a}^{\{H,V\}}(\{p,m\})$. In particular, the algorithm is given as follows, where $WM_{N_a}^{\{H,V\}}(\{p,m\})$ denotes the column or row of the corresponding component:

1) At the positions where the values of $WM_{N_a}^{\{H,V\}}(p)$ and $WM_{N_a}^{\{H,V\}}(m)$ are both nonzero, the output value is equal to the ratio of $WM_{N_a}^{\{H,V\}}(m)$ to $WM_{N_a}^{\{H,V\}}(p)$.
2) At the positions where the values of $WM_{N_a}^{\{H,V\}}(p)$ are equal to zero, the output is zero.
3) At the positions where the values of $WM_{N_a}^{\{H,V\}}(p)$ are nonzero but the values of $WM_{N_a}^{\{H,V\}}(m)$ are equal to zero, search for nonzero values of $WM_{N_a}^{\{H,V\}}(m)$ within a two-pixel neighborhood in both directions. If a pixel with a nonzero value is found and it is not assigned to any other point in $WM_{N_a}^{\{H,V\}}(p)$ from step 1, then the output at the analyzed pixel is equal to the ratio of the corresponding pixels of $WM_{N_a}^{\{H,V\}}(m)$ to $WM_{N_a}^{\{H,V\}}(p)$. Otherwise, the output is zero.

Fig. 6. Flowchart of WM construction for the fused image. WM are determined from the wavelet details W of m and p. The spectral signatures extracted from WM are combined into the Λe image and used to construct the WM of the sharpened product.
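A row-wise sketch of this three-step matching is given below; lambda_row is a hypothetical helper, and marking step-3 matches as used (so that one m-maximum cannot serve two p-maxima) is an added safeguard that the text does not spell out.

```python
import numpy as np

def lambda_row(wm_m, wm_p, search=2):
    """Row-wise construction of one row of Lambda via eq. (4).

    wm_m, wm_p: one row of a WM component of m and p at the
    analysis level. Returns the corresponding row of Lambda.
    """
    lam = np.zeros_like(wm_p, dtype=float)
    used_m = np.zeros(wm_m.shape, dtype=bool)
    # Step 1: coincident maxima -> ratio of the WM values.
    both = (wm_p != 0) & (wm_m != 0)
    lam[both] = wm_m[both] / wm_p[both]
    used_m |= both
    # Step 3: p has a maximum but m does not -> search +/- `search` pixels.
    for i in np.where((wm_p != 0) & (wm_m == 0))[0]:
        lo, hi = max(i - search, 0), min(i + search + 1, wm_m.size)
        for j in range(lo, hi):
            if wm_m[j] != 0 and not used_m[j]:
                lam[i] = wm_m[j] / wm_p[i]
                used_m[j] = True   # safeguard against double assignment
                break
    # Step 2 (p is zero) leaves lam at zero implicitly.
    return lam
```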

2) Constructing the WM of the Fused Image: As can be seen from the construction algorithm, the zero values in Λ correspond to three cases.

Case 1) Neither $WM_{N_a}(p)$ nor $WM_{N_a}(m)$ shows the presence of a feature.

Case 2) $WM_{N_a}(p)$ shows that a feature is present at the position, but the algorithm is not able to locate the corresponding feature in $WM_{N_a}(m)$, i.e., the method failed to determine the spectral property of the feature. Thus, if there are nonzero values of $WM_{1,\ldots,N_a-1}(m)$ present, they are taken to $WM_{1,\ldots,N_a-1}(f)$ without modification.

Case 3) $WM_{N_a}(p)$ shows no presence of any feature. In this case, if there are nonzero values in $WM_{1,\ldots,N_a-1}(m)$ present, they are taken without any changes since the spatial properties of the feature cannot be determined from p.

According to the model in (3), the nonzero values of Λ are multiplied with the corresponding values of $WM_{1,\ldots,N_a-1}^{\{H,V\}}(p)$. Thus, the maxima points representing individual features through scales $1, \ldots, N_a$ have to be determined. In order to simplify the algorithm, it was assumed that, if $[WM_{1,\ldots,N_a}(p)]_s$ are the nonzero points that represent a certain feature s, then $[WM_{1,\ldots,N_a-1}(p)]_s$ are located within a small neighborhood of $[WM_{N_a}(p)]_s$. Therefore, all the values of $WM_{1,\ldots,N_a-1}(p)$ within this neighborhood are multiplied with the corresponding coefficient. For this purpose, a new image $\Lambda_e$ is used, in which every nonzero value of Λ is expanded to fill its neighboring pixels. The optimal size of the neighboring region was determined empirically and, for example, is taken as a circle with a radius of two pixels for the fusion of SPOT4 data sets.

Fig. 6 shows the diagram that represents the construction of the WM for the fused image, which is the core of the CSF approach. Finally, to compute the actual high-frequency wavelet coefficients of the fused image from $WM_{1,\ldots,N_a-1}(f)$, the iterative reconstruction technique developed by Liew and Law was applied [20]. It was found that ten iterations are sufficient for convergence.
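The expansion of Λ into Λe can be sketched as a grey-scale dilation over a disk-shaped footprint. Taking the larger coefficient where expanded regions overlap is an assumption; the paper does not specify a tie-breaking rule.

```python
import numpy as np
from scipy.ndimage import grey_dilation

def expand_lambda(lam, radius=2):
    """Lambda_e: every nonzero coefficient of Lambda fills a disk-shaped
    neighborhood (radius of two pixels for the SPOT4 case). Overlaps
    resolve to the larger coefficient via the dilation's maximum."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disk = (yy ** 2 + xx ** 2) <= radius ** 2
    return grey_dilation(lam, footprint=disk)
```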


IV. EXPERIMENT

To compare the performance of the proposed technique with other fusion methods, a set of SPOT4 multispectral and panchromatic images was selected. The scenes were acquired over Jakarta, Indonesia, in June 2004. First, the four multispectral bands (m1: 0.50–0.59 µm, m2: 0.61–0.68 µm, m3: 0.78–0.89 µm, m4: 1.58–1.75 µm) with a ground-projected pixel size of 20 m were resampled using bicubic interpolation to match the size of the more highly resolved panchromatic (0.61–0.68 µm) image p with a spatial resolution of 10 m. Afterward, the multispectral bands were coregistered with respect to p with subpixel precision using ENVI's registration tool. The test scene included different types of ground covers such as forestry, plantations, marine environment, and built-up areas in order to analyze the fusion methods for a variety of spectral and spatial contents. The benchmarking was performed for ten fusion methods.

1) Wavelet-based fusion with full detail substitution rule (WT) [22]: Each band mi of the multispectral image was sharpened individually by combining the approximation of mi with the details of p. The same wavelet as described in Section III was used for analysis. Before the decomposition, histogram matching was applied to the panchromatic band.
2) Additive wavelet technique using the "à-trous" decomposition method (AWT) [9]: The high-frequency wavelet plane, which was extracted from the histogram-matched band p, was added to each mi individually.
3) Gram–Schmidt spectral sharpening (GS), as implemented in the ENVI package [23]: The Gram–Schmidt transformation was applied to the spatially low-resolved image set, with the first transformation component subsequently replaced by the spatially highly resolved panchromatic data.
4) Principal-component-analysis-based method (PCA) [24]: In this method, the first principal component pc1 of the decorrelated multispectral bands was replaced by the histogram-matched panchromatic band.
5) Intensity–hue–saturation-based method (IHS) [25]: In this method, the imagery was sharpened by replacing its intensity component I by the histogram-matched panchromatic band. Since IHS-based methods can only be applied to triplets of data, the four available bands were recombined into two groups of three components. The first group m421 was formed by the bands that are most correlated with respect to p, i.e., m4, m2, and m1, while the second group m341 contained the least correlated m3.
6) Combination of PCA- and wavelet-based techniques (PCA–WT) [26]: In this method, the first principal component pc1 of the decorrelated multispectral bands was replaced by its sharpened counterpart pc1*. To obtain pc1*, pc1 was spatially enhanced by p using the WT method.
7) Combination of IHS- and wavelet-based techniques (IHS–WT) [27]: In this method, the intensity component was sharpened by p using the WT method.

8) The proposed method CSF applied to each band mi individually. The second level of decomposition was used as the analyzing level.
9) A modification of the proposed method called PCA–CSF, where the CSF technique was applied to sharpen the first principal component of the set m1,...,m4.
10) A modification of the proposed method called IHS–CSF, where the CSF technique was applied to sharpen the corresponding intensity component of the selected group of bands.

The quality of the fused products was compared qualitatively and quantitatively, taking both the spatial and the spectral characteristics of the fused images into account. Since the formal quality assessment protocol [10] requires an "ideal" image to be compared with the fused products, both p and mi were degraded by a factor of two using ENVI's aggregation algorithm, and the fusion was repeated using the same techniques as described previously. Thus, the original multispectral scenes served as the theoretical images that would be sensed by the multispectral scanner if it had the spatial resolving ability of the panchromatic instrument.

The following objective indexes were employed for the evaluation of spectral quality: bias; correlation (Cc); the universal image quality index (Q) and its counterpart Qavg, which is calculated for a sliding window of size 8 × 8 pixels and averaged over the entire scene [28]; the root-mean-square error (rmse); and ERGAS [4], which accumulates the rmse over all available bands (see the sketch below). The fused product was compared with the corresponding multispectral image in the case of original-resolution data merging and with the ideal image in the downgraded data set fusion. It must be noted that, although the ERGAS index is band independent, for the sake of conciseness, it is placed together with the other indexes, which are calculated individually for each band.

To assess the spatial quality of the merged data, the high-frequency information from the fused product and the reference image was compared by calculating the correlation and the Qavg index between the high-pass-filtered images (Spatcc and SpatQ, respectively). In the original approach proposed by Zhou et al. [8], the panchromatic image served as the reference image. However, in this paper, the ideal image was used instead of p since the specific case of fusing weakly correlated data cannot be assessed well by the original approach. Therefore, the indexes of spatial quality were calculated only for the case when the ideal spatial information was available, i.e., for the downgraded data set.

All indexes were calculated both for the entire test scene as well as for extracted 100 × 100 pixel subimages. Using small subimages allows restricting the fusion performance analysis to particular objects with specific characteristics of interest (color and shape). This enables the accentuation of typical positive and negative aspects of the tested fusion methods and makes the evaluation more specific. The quantitative evaluations presented in the next section are performed for the selected small subimages, while the complete results are publicly available on the authors' website [29].
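For reference, two of the listed indexes can be written down compactly. The formulas follow the published definitions of ERGAS [4] and Q [28]; the array shapes and function names are assumptions made for illustration.

```python
import numpy as np

def ergas(fused, reference, ratio):
    """ERGAS: 100 * (h/l) * sqrt(mean over bands of (RMSE_b / mean_b)**2).

    fused, reference: arrays of shape (bands, rows, cols);
    ratio: pixel-size ratio h/l, e.g., 10/20 for SPOT4 pan vs. multispectral.
    """
    diff = fused.astype(float) - reference.astype(float)
    rmse = np.sqrt((diff ** 2).mean(axis=(1, 2)))
    means = reference.astype(float).mean(axis=(1, 2))
    return 100.0 * ratio * np.sqrt(np.mean((rmse / means) ** 2))

def q_index(x, y):
    """Universal image quality index Q of Wang and Bovik [28]."""
    x = x.astype(float).ravel()
    y = y.astype(float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
```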


Fig. 7. Examples for fusion of highly correlated SPOT4 data. (a) 100 × 100 pixels subscene m2(1) of the original “red” multispectral band. (b) Corresponding subscene of the panchromatic band p. (c) IHS. (d) AWT. (e) WT. (f) CSF. (g) IHS–WT. (h) IHS–CSF. (i) PCA–WT. (j) PCA–CSF.

Fig. 8. Examples for fusion of weakly correlated SPOT4 data. (a) 100 × 100 pixels subscene m3(2) of the original near-infrared band. (b) Corresponding subscene of the panchromatic band p. (c) IHS. (d) AWT. (e) WT. (f) CSF. (g) IHS–WT. (h) IHS–CSF. (i) PCA–WT. (j) PCA–CSF.

V. RESULTS AND DISCUSSION

A. Spatial Quality

As shown in Section II-A, the spatial quality of fused data depends greatly on the spectral differences between the input images. Therefore, the spatial quality of the fused products was evaluated separately for highly and weakly correlated data. Fig. 7 shows 100 × 100 pixel subimages of the fused multispectral band m2, depicting Jakarta's Bung Karno stadium. In order to differentiate between the entire scene of a certain band mi and the extracted subscenes, hereafter, the latter are denoted by adding an extra subscript in parentheses to the band's name, e.g., the aforementioned 100 × 100 pixel subimage of m2 is denoted as m2(1). Note that m2 is highly correlated with p. For those techniques that fuse triplets of bands, i.e., IHS, IHS–WT, and IHS–CSF, the fused result for m2(1) was taken from the simultaneous fusion of the strongly correlated group m421. Examining the region around the stadium, it can be concluded that all tested fusion methods increase the spatial resolution, although to a variable extent. The sharpest products are obtained by IHS, PCA, AWT, WT, and their combinations, while the methods that incorporate CSF give slightly less sharp results.

The fused products of less correlated data are presented in Fig. 8. The shown taxiways and runways of Soekarno–Hatta airport have an opposite spectral response in subscene m3(2) of the near-infrared band with respect to p.


Fig. 9. Cross-sectional profiles of step edges extracted from the SPOT4 fused products. (a) Multispectral band m2. (b) Multispectral band m3.

TABLE I
SPECTRAL AND SPATIAL QUALITY ASSESSMENT OF THE FUSED RESULTS FOR SPOT4 SUBREGIONS (VALUES MEASURED WITH RESPECT TO ORIGINAL/DEGRADED RESOLUTION). VALUES HIGHLIGHTED IN BOLD SHOW THE TWO BEST PERFORMANCES FOR A PARTICULAR QUALITY INDEX. NA (NOT AVAILABLE) IS SHOWN IF THE FUSION TECHNIQUE WAS NOT EVALUATED

In this case, the visually sharpest results are produced by the methods that incorporate the proposed CSF technique, namely, CSF, IHS–CSF, and PCA–CSF. Methods based on WT, i.e., WT, IHS–WT, and PCA–WT, create serious spatial distortions near the edges of the sealed surfaces and fail to sharpen the imagery. Fig. 9 demonstrates the spatial quality of the fused products more vividly, showing extracted profiles corresponding to band m2, which is highly correlated with p, and to the weakly correlated m3. Only two fusion methods were selected for visualization. As can be seen from Fig. 9(a), for the highly correlated band, the method based on the CSF technique provides less sharp results than the one based on WT. However, for the less correlated band, the CSF method is more suitable than its WT-based counterpart: The latter does not enhance but rather distorts the spatial feature, as shown in Fig. 9(b).

B. Spectral Quality

The spectral quality of the fused products was assessed individually with respect to the degree of correlation, for the groups m421 and m341.

In general, it was concluded from the experiments that, for highly correlated data, PCA and IHS distort the spectral characteristics most strongly. The WT technique distorts the colors less, while CSF, IHS–CSF, PCA–CSF, IHS–WT, and PCA–WT preserve the original spectral characteristics very well. For weakly correlated data, the spectral properties of the sharpened bands are greatly distorted by the IHS, PCA, and WT methods. IHS–WT and PCA–WT changed the original colors slightly less; however, the distortions are still very noticeable, and the results are far from satisfactory. On the contrary, the techniques based on CSF preserved the spectral properties of the original data very well.

A quantitative evaluation of the spectral properties was conducted over all test regions for all four bands. Table I presents the results only for selected fusion techniques and for the subareas shown in Figs. 7 and 8, i.e., the highly correlated case m2(1) and the weakly correlated instance m3(2). The complete evaluation results can be found on the authors' website [29]. Since the original multispectral image itself bears the color information about the objects, comparing it with the corresponding fused scene reflects the spectral quality of the latter.


TABLE II SPECTRAL AND SPATIAL QUALITY ASSESSMENT OF THE FUSED RESULTS FOR IKONOS BLUE BAND SUBREGION. VALUES HIGHLIGHTED IN BOLD SHOW THE TWO BEST PERFORMANCES ACHIEVED FOR A PARTICULAR QUALITY INDEX

The first six quality indexes in Table I account for the spectral quality, and the first value in each cell reflects the original-resolution data set fusion case. In order to pick the leaders among the fusion methods, the two best values for each quality index, i.e., those closest to the ideal value, are emphasized in bold individually for each test region. As can be seen from Table I, both for the highly and the less highly correlated case, the CSF-based methods often outperform their analogs based on WT. While for some quality indexes, e.g., bias, Cc, and Q, the leadership is mostly shared between the WT- and CSF-based techniques, the other measures (rmse, ERGAS, and Qavg) show a significant superiority of the CSF-based methods. It must be noted that the IHS, GS, and PCA techniques show unsatisfactory fusion results for both correlation cases. In general, the quantitative evaluation of the merged products' quality is in agreement with the visual inspections.

C. Quality Assessment of the Spatially Degraded Data Set

The second value in the cells of Table I represents the performance evaluation on the fused data set that was previously spatially degraded by a factor of two. Similar to the results discussed in Sections V-A and V-B, the GS, IHS, and PCA methods are among the poorest in terms of their ability to bring the downgraded data set back to the original spatial resolution. The experiments on the downgraded data set reveal a trend that was also observable during the visual inspections of the fused nondowngraded data but was not clearly recognized in the quantitative assessment. Namely, the CSF-based fusion techniques are the undoubted leaders in the case of weakly correlated data, while for highly correlated data, the WT-based and AWT techniques outperform all other methods. This holds true for both the spectral and the spatial quality of the fused products. The reason for this phenomenon is that, for the fusion of highly correlated data, the high-frequency details of the panchromatic image, which are later injected into the lower resolved multispectral bands, are already very close to the ideal high-frequency components. Thus, any alteration of these details deteriorates the result. Moreover, the "select-all" fusion rule [30] employed in this paper for the WT-based and AWT techniques ensures a great stability and spatial congruence of the fusion results over the whole scenes since all the details are taken by definition from a single source.

D. Comments on the Quality Assessment Results

From the visual inspection of the fused products, it can be concluded that the CSF-based techniques give generally less sharp results than the other fusion methods. The main reason for this is that the proposed method has to establish the correspondence between features in p and m in order to enhance the spatial information. Thus, if it fails to find this relation, only the spectral and spatial information from m contribute to the fused product. Therefore, the images sharpened by CSF-based methods might contain problematic areas that totally coincide with the low-resolved m. This unnatural patchy look is unfamiliar to human observers, particularly when an elongated physical object extends over both problematic and sharpened regions [29].

Another drawback of the CSF-based techniques is that the accuracy with which the spatial descriptors of a feature are determined depends on the presence of other objects in the vicinity. The best accuracy is achieved in the case of an isolated object. However, a strong influence of adjacent features changes the model of WM propagation across scales [14], thus rendering the proposed fusion model described by (3) and (4) inapplicable. Whether a spatial feature can be regarded as isolated was discussed in [31]. Therefore, the CSF-based methods show poor performance in areas in which spatial features are located close to each other, e.g., texture regions [29].

It must be noted that the degree of spatial and color distortion introduced by a fusion method differs from object to object. The selected subregions of interest were chosen to accentuate the advantages and drawbacks of the analyzed fusion methods. Thus, the provided numbers of the quantitative comparison should be treated with care if taken to represent the methods' performance for the entire scene.

E. Initial Experiments With the IKONOS Data Set

Taking into account the rising popularity of very high resolution data in different applications of remote sensing, the experiments were repeated for IKONOS data. The blue band was chosen since fusion techniques often fail to effectively sharpen it due to its low correlation with the panchromatic band. From the visual inspection of the fused products, it was concluded that the CSF-based methods give satisfactory results, although it is not obvious which technique performs best. As shown by the quantitative evaluations presented in Table II, for most of the quality indexes, which were calculated for


the selected subregion, the WT-based and AWT techniques outperform all other techniques, including those based on CSF. According to the authors' observations and experience, the ratio of spatial resolution between the multispectral and panchromatic images, which is two for SPOT4 and four for IKONOS, has an influence on the performance of the CSF-based techniques. The main reason for the deteriorated performance of the CSF-based methods on the IKONOS data set is the increased number of problematic areas described in Section V-D. Although the most obvious solution for this problem might be to employ other fusion methods, e.g., WT-based methods, for these identified regions, the current quality-evaluation experiments focus on the assessment of "pure" techniques, i.e., the fusion methods are always applied to the data set individually, without mixing them. Therefore, at the current stage, it is not recommended to use CSF-based techniques alone for the fusion of data if the spatial resolution ratio of m to p is larger than two.

VI. CONCLUSION

A new method for image fusion, which is called CSF, has been proposed in this paper. The method relies on the separation of the spatial and spectral information of features. It has been shown that, in order to satisfy the requirement of spectral preservation, a careful separation of the features' spatial and spectral characteristics is needed before the actual fusion. This is accomplished by the undecimated wavelet transform and the utilization of the wavelet transform modulus maxima. The details of the wavelet transform are used to extract both the spatial and the spectral information about the features. The spectral characteristics are determined at a sufficiently high decomposition level. Afterward, the extracted spatial information of the input images is integrated, taking proper account of the spectral content. To reconstruct the high-frequency wavelet coefficients, an iterative approach is used. Finally, the acquired details are injected into the lower resolved band, yielding an image that is in agreement with the goals of remote-sensing-related image fusion.

Visual and objective performance evaluations of the proposed technique have been conducted using imagery acquired by SPOT4 and IKONOS. Both the spectral and the spatial quality of the fused products were assessed. It can be concluded that, for the given data, the proposed method is able to preserve spectral information better than any other tested technique. In particular, this has been vividly demonstrated for weakly correlated bands. The CSF-based methods produce slightly less sharp results compared to other techniques when the input images are highly correlated, whereas the experiments on the low-correlated set of bands have demonstrated that the proposed technique enhances the spatial resolution more effectively than the other fusion methods. It was found that the CSF-based methods perform best on data sets in which the spatial resolution ratio of the images to be fused is small (SPOT4), while for larger resolution ratios (IKONOS), the conventional WT-based or AWT techniques achieve better quality.

REFERENCES

[1] F. Laporterie-Dejean, H. de Boissezon, G. Flouzat, and M.-J. Lefevre-Fonollosa, "Thematic and statistical evaluations of five panchromatic/multispectral fusion methods on simulated PLEIADES-HR images," Inf. Fusion, vol. 6, no. 3, pp. 193–212, Sep. 2005.
[2] Z. Wang, D. Ziou, C. Armenakis, D. Li, and Q. Li, "A comparative analysis of image fusion methods," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1391–1402, Jun. 2005.
[3] B. Aiazzi, L. Alparone, S. Baronti, and A. Garzelli, "Context-driven fusion of high spatial and spectral resolution images based on oversampled multiresolution analysis," IEEE Trans. Geosci. Remote Sens., vol. 40, no. 10, pp. 2300–2312, Oct. 2002.
[4] T. Ranchin, B. Aiazzi, L. Alparone, S. Baronti, and L. Wald, "Image fusion—the ARSIS concept and some successful implementation schemes," ISPRS J. Photogramm. Remote Sens., vol. 58, no. 1/2, pp. 4–18, Jun. 2003.
[5] A. Garzelli and F. Nencini, "Interband structure modeling for pan-sharpening of very high-resolution multispectral images," Inf. Fusion, vol. 6, no. 3, pp. 213–224, Sep. 2005.
[6] X. Otazu, M. Gonzalez-Audicana, O. Fors, and J. Nunez, "Introduction of sensor spectral response into image fusion methods: Application to wavelet-based methods," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 10, pp. 2376–2385, Oct. 2005.
[7] A. Garzelli, F. Nencini, L. Alparone, B. Aiazzi, and S. Baronti, "Pan-sharpening of multispectral images: A critical review and comparison," in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2004, vol. 1, pp. 81–84.
[8] J. Zhou, D. L. Civco, and J. A. Silander, "A wavelet transform method to merge Landsat TM and SPOT panchromatic data," Int. J. Remote Sens., vol. 19, no. 4, pp. 743–757, 1998.
[9] J. Nunez, X. Otazu, O. Fors, A. Prades, V. Pala, and R. Arbiol, "Multiresolution-based image fusion with additive wavelet decomposition," IEEE Trans. Geosci. Remote Sens., vol. 37, no. 3, pp. 1204–1211, May 1999.
[10] L. Wald, T. Ranchin, and M. Mangolini, "Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images," Photogramm. Eng. Remote Sens., vol. 63, no. 6, pp. 691–699, Jun. 1997.
[11] S. Mallat and S. Zhong, "Characterization of signals from multiscale edges," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 7, pp. 710–732, Jul. 1992.
[12] V. Buntilov and T. Bretschneider, "A fusion evaluation approach with region relating objective function for multispectral image sharpening," in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2005, vol. 4, pp. 2830–2833.
[13] S. Mallat, "Zero-crossings of a wavelet transform," IEEE Trans. Inf. Theory, vol. 37, no. 4, pp. 1019–1033, Jul. 1991.
[14] S. Mallat and W. L. Hwang, "Singularity detection and processing with wavelets," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 617–643, Mar. 1992.
[15] Z. Berman and J. S. Baras, "Properties of the multiscale maxima and zero-crossings representations," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3216–3231, Dec. 1993.
[16] A. Witkin, "Scale-space filtering: A new approach to multi-scale description," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Mar. 1984, vol. 9, pp. 150–153.
[17] R. Hummel and R. Moniot, "Reconstructions from zero crossings in scale space," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 12, pp. 2111–2130, Dec. 1989.
[18] P. Scheunders, "Multiscale edge representation applied to image fusion," Proc. SPIE, vol. 4119, pp. 894–901, 2000.
[19] S. G. Nikolov, D. R. Bull, C. N. Canagarajah, M. Halliwell, and P. N. T. Wells, "Fusion of 2-D images using their multiscale edges," in Proc. IEEE Int. Conf. Pattern Recog., 2000, vol. 3, pp. 41–44.
[20] A. W. C. Liew and N. F. Law, "Reconstruction from 2-D wavelet transform modulus maxima using projection," Proc. Inst. Electr. Eng.—Vis. Image Signal Process., vol. 147, no. 2, pp. 176–184, Apr. 2000.
[21] Y. Chibani and A. Houacine, "Redundant versus orthogonal wavelet decomposition for multisensor image fusion," Pattern Recognit., vol. 36, no. 4, pp. 879–887, Apr. 2003.
[22] D. Yocky, "Image merging and data fusion by means of the discrete 2-D wavelet transform," J. Opt. Soc. Amer. A, Opt. Image Sci., vol. 12, no. 9, pp. 1834–1841, Sep. 1995.
[23] C. A. Laben and B. V. Brower, "Process for enhancing the spatial resolution of multispectral imagery using pan-sharpening," U.S. Patent 6 011 875, Jan. 4, 2000.
[24] V. K. Shettigara, "A generalized component substitution technique for spatial enhancement of multispectral images using a higher resolution data set," Photogramm. Eng. Remote Sens., vol. 58, no. 5, pp. 561–567, May 1992.
[25] W. J. Carper, T. M. Lillesand, and R. W. Kiefer, "The use of intensity–hue–saturation transformations for merging SPOT panchromatic and multispectral data," Photogramm. Eng. Remote Sens., vol. 56, no. 4, pp. 459–467, 1990.
[26] J. Li, Y. Zhou, and D. Li, "PCA and wavelet transform for fusing panchromatic and multi-spectral images," Proc. SPIE, vol. 3719, pp. 369–377, 1999.
[27] A. S. Kumar, B. Kartikeyan, and K. L. Majumdar, "Band sharpening of IRS-multispectral imagery by cubic spline wavelets," Int. J. Remote Sens., vol. 21, no. 3, pp. 581–594, 2000.
[28] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Process. Lett., vol. 9, no. 3, pp. 81–84, Mar. 2002.
[29] [Online]. Available: http://www.ntu.edu.sg/home5/bunt0001/fusion.html
[30] H. Li, B. S. Manjunath, and S. K. Mitra, "Multisensor image fusion using the wavelet transform," Graph. Models Image Process., vol. 57, no. 3, pp. 235–245, May 1995.
[31] L. M. Bruce, "Isolation criteria for the wavelet transform mod-max method," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 45, no. 8, pp. 1084–1087, Aug. 1998.

Vladimir Buntilov was born in Tambov, Russia, in 1981. He received the B.Sc. and M.Sc. degrees in applied physics and mathematics from Moscow Institute of Physics and Technology, Dolgoprudny, in 2002 and 2004, respectively. He is currently working toward the Ph.D. degree in computer engineering at Nanyang Technological University, Singapore.


Timo R. Bretschneider (S’00–M’02) received the M.Sc. and Ph.D. degrees from the Technical University of Clausthal, Clausthal-Zellerfeld, Germany, in 1998 and 2001, respectively. From 1999 to 2000, he was a Researcher with the University of Canterbury, Christchurch, New Zealand, specializing in remote sensing. Since 2002, he has been an Assistant Professor with the Nanyang Technological University, Singapore, leading a research team in novel remote sensing approaches and is actively involved in the design of minisatellites. He is a Reviewer of reputable international journals and participates in various technical, organization, and steering committees of international conferences. He is the author or coauthor of numerous publications in satellite image processing, parallel architectures, and data retrieval.