An Unsupervised Classification Approach for Polarimetric SAR Data Based on the Chernoff Distance for Complex Wishart Distribution

Mohammed Dabboor, Michael J. Collins, Senior Member, IEEE, Vassilia Karathanassi, and Alexander Braun

Abstract


A new unsupervised classification approach for Polarimetric Synthetic Aperture Radar (POLSAR) data is proposed in this study. The Wishart-Chernoff distance is calculated and used in an agglomerative hierarchical clustering approach. Initial segmentation of POLSAR data into clusters is obtained based on the total backscattering power (SPAN) combined with the entropy, alpha angle and anisotropy. The complex Wishart clustering is performed to optimize the initialization. Optimized clusters with minimum Wishart-Chernoff distance are merged hierarchically into an appropriate number of classes. The appropriate number of classes is estimated based on the data log-likelihood algorithm. Classification results show that the use of Wishart-Chernoff distance is superior to the Wishart test statistic distance. The effectiveness of the proposed Wishart-Chernoff distance is demonstrated using Advanced Land Observing Satellite (ALOS) polarimetric SAR data.

I. INTRODUCTION

Classification of land cover types within an image is one of the many important applications of polarimetric SAR imagery. Various supervised and unsupervised classification techniques have been developed for POLSAR data. Many of them are based on physical scattering mechanisms obtained from different polarimetric decomposition methods. Classification approaches that involve the analysis of images produced by the Pauli [1] and Freeman-Durden [2] polarimetric decomposition methods were studied in [3], [4], [5], [6], [7]. Other approaches are based on the Cloude-Pottier decomposition method [1], [3], [8]. Classification of POLSAR data into eight classes based on the entropy and alpha angle (H/α) plane was proposed in [9]. This classification was improved in [10] by applying the complex Wishart classifier to the H/α space, and extended to sixteen classes by including the anisotropy parameter [11]. An attempt to improve the classification of POLSAR data by applying fuzzy concepts in the H/α plane was presented in [12]. In these approaches, data are generally classified into a predefined number of classes by dividing the two-dimensional H/α or the three-dimensional H/A/α space into a fixed number of subspaces [9], [10], [11], [12]. In many cases, however, the predefined number of classes might not correspond to the appropriate number of classes in the POLSAR data. Agglomerative clustering can overcome this drawback: the POLSAR data are first segmented into small clusters, which are later merged into an appropriate number of classes (large clusters). The appropriate number of classes can be estimated with algorithms such as the log-likelihood algorithm [13], and a similarity criterion is needed to decide which clusters to merge.
In [14], a test statistic for the equality of two cluster center coherency matrices, together with an associated asymptotic probability of obtaining a smaller value of the test statistic, is derived and applied successfully to change detection in POLSAR data. An agglomerative clustering scheme using H/A/α and, additionally, the total backscattering power (SPAN) was discussed in [13]. There, a Wishart test statistic distance is proposed and used to test pairwise the equality of cluster center coherency matrices and to merge clusters into an appropriate number of classes, which is estimated with the data log-likelihood algorithm. However, the distance criterion used for merging clusters affects the estimated appropriate number of classes. Moreover, the Wishart test statistic distance depends on the number of samples in the clusters, so that clusters with few samples tend to be merged with clusters that have many samples. Thus, the Wishart test statistic distance is a reasonable criterion only for clusters with comparable numbers of samples. In this paper, the Chernoff distance is mathematically derived for the complex Wishart distribution (Wishart-Chernoff distance). The derived Wishart-Chernoff distance measures the similarity between two complex Wishart distributions and is independent of the number of samples in each distribution. It is used as the distance criterion in an agglomerative clustering approach for unsupervised POLSAR classification. Agglomerative clustering is a bottom-up approach that merges clusters hierarchically into an appropriate number of large clusters (classes). The clusters to be merged result from a segmentation of the POLSAR data based on the H/A/α parameters and the SPAN, and the data log-likelihood algorithm is used to estimate the appropriate number of classes.
The estimated appropriate number of classes based on the Wishart-Chernoff distance is compared with the number obtained with the Wishart test statistic distance. Furthermore, the unsupervised classification of POLSAR data using the Wishart-Chernoff distance criterion is investigated and compared with the classification results obtained with the Wishart test statistic distance criterion. The remainder of this paper is organized as follows. Section II briefly presents basic concepts of radar polarimetry. The Wishart-Chernoff distance is mathematically derived in Section III. The proposed classification methodology is described in Section IV, and its implementation and the obtained results are discussed in Section V. Conclusions are provided in Section VI.

Published in 2013: IEEE Transactions on Geoscience and Remote Sensing, 51(7), 4200–4213. Dabboor and Collins are with the Department of Geomatics Engineering, Schulich School of Engineering, University of Calgary, Calgary, Alberta, T2N 1N4, Canada, email: [email protected], [email protected]. Karathanassi is with the Laboratory of Remote Sensing, School of Rural and Surveying Engineering, National Technical University of Athens, Heroon Polytechniou 9, Zographos, 15780, Greece, email: [email protected]. Braun is with the Department of Geosciences, School of Natural Sciences and Mathematics, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX, 75080, USA, email: [email protected]


II. BASIC POLARIMETRIC SAR CONCEPTS

A scattering matrix [S] can be transformed into a scattering vector k_p using the basis set of Pauli spin matrices. The scattering vector in the general monostatic backscattering configuration is given by [1]

$$ \vec{k}_p = \frac{1}{\sqrt{2}} \left[ S_{HH} + S_{VV},\; S_{HH} - S_{VV},\; 2S_{HV} \right] \tag{1} $$

where S_HV is the scattering element for horizontal (H) transmitting and vertical (V) receiving polarizations. Then, the n-look coherency matrix T can be written as

$$ T = \frac{1}{n} \sum_{i=1}^{n} \vec{k}_p(i) \cdot \vec{k}_p^{\,*}(i) \tag{2} $$

where ∗ denotes the complex conjugate transpose. By definition, the coherency matrix T is a 3 × 3 Hermitian positive semidefinite matrix. Based on the Cloude-Pottier decomposition of the coherency matrix into eigenvalues and eigenvectors, three parameters can be derived [1]. The polarimetric entropy H (0 ≤ H ≤ 1) is defined from the eigenvalues as

$$ H = -\sum_{i=1}^{3} P_i \log_3 P_i \tag{3} $$

where $P_i = \lambda_i / \sum_{j=1}^{3} \lambda_j$ and λi (i = 1, 2, 3, with λ1 ≥ λ2 ≥ λ3) is the ith eigenvalue of the coherency matrix T. This parameter is an indicator of the number of effective scattering mechanisms that take place in the scattering process. The anisotropy A (0 ≤ A ≤ 1) describes the relative importance of the two secondary scattering mechanisms:

$$ A = \frac{\lambda_2 - \lambda_3}{\lambda_2 + \lambda_3} \tag{4} $$
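The quantities in (1) to (4) can be computed per pixel with a few lines of NumPy. The sketch below is illustrative only: it assumes the reciprocal case with complex scattering elements S_HH, S_HV and S_VV, and the helper names `pauli_vector`, `coherency_matrix` and `entropy_anisotropy` are hypothetical, not part of the original method.

```python
import numpy as np

def pauli_vector(s_hh, s_hv, s_vv):
    """Pauli scattering vector k_p of eq. (1), monostatic (reciprocal) case."""
    return np.array([s_hh + s_vv, s_hh - s_vv, 2.0 * s_hv], dtype=complex) / np.sqrt(2.0)

def coherency_matrix(k_vectors):
    """n-look coherency matrix T of eq. (2).

    k_vectors: array-like of shape (n, 3), one Pauli vector per look.
    """
    k = np.asarray(k_vectors, dtype=complex)
    # Average of the outer products k(i) k(i)^* over the n looks.
    return np.einsum("ni,nj->ij", k, k.conj()) / k.shape[0]

def entropy_anisotropy(T):
    """Entropy H of eq. (3) and anisotropy A of eq. (4) for a 3x3 coherency matrix."""
    lam = np.linalg.eigvalsh(T)[::-1]      # real eigenvalues, sorted lambda1 >= lambda2 >= lambda3
    lam = np.clip(lam, 0.0, None)          # remove tiny negative round-off
    p = lam / lam.sum()                    # pseudo-probabilities P_i
    nz = p[p > 0]                          # convention: 0 * log 0 = 0
    H = max(0.0, float(-np.sum(nz * np.log(nz)) / np.log(3.0)))  # also avoids printing -0.0
    denom = lam[1] + lam[2]
    A = float((lam[1] - lam[2]) / denom) if denom > 0 else 0.0
    return H, A

# Single look of an ideal trihedral (S_HH = S_VV = 1, S_HV = 0): one mechanism, so H is ~0.
T = coherency_matrix([pauli_vector(1.0, 0.0, 1.0)])
print(entropy_anisotropy(T))
```

The eigenvalues are re-sorted in descending order because `eigvalsh` returns them in ascending order, while (4) assumes λ1 ≥ λ2 ≥ λ3.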

The anisotropy A yields additional information only for medium values of H, because in this case secondary scattering mechanisms play a role in the scattering process. The alpha angle α (0 ≤ α ≤ 90°) represents the type of scattering mechanism:

$$ \alpha = \sum_{i=1}^{3} P_i \alpha_i \tag{5} $$

where cos(αi) is the magnitude of the first component of the ith eigenvector ei of the coherency matrix.

III. WISHART-CHERNOFF DISTANCE DERIVATION

A. Complex Wishart Distribution

Assuming that the target scattering vector k_p follows the complex Gaussian distribution, the probability density function of the n-look sample coherency matrix T is given by [15] as

$$ p(T) = \frac{n^{qn}\, |T|^{n-q} \exp\{-n \operatorname{Tr}(V^{-1} T)\}}{k(n,q)\, |V|^{n}} \tag{6} $$

where q = 3 for the reciprocal case and q = 4 for the bistatic case, Tr is the trace of a matrix, and $k(n,q) = \pi^{q(q-1)/2}\, \Gamma(n) \cdots \Gamma(n-q+1)$ (Γ is the Gamma function). V = E{T} is the expected value of the sample coherency matrix and can be estimated by

$$ V = \frac{1}{N} \sum_{i=1}^{N} T_i \tag{7} $$

where N is the number of samples and Ti is the ith sample coherency matrix.
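As a numerical illustration of (6) and (7), the following sketch evaluates the complex Wishart log-density using log-determinants for numerical stability. It assumes Hermitian positive definite T and V and n ≥ q, so that every Gamma factor is defined; the function names are hypothetical.

```python
import math
import numpy as np

def log_k(n, q):
    """ln k(n, q), with k(n, q) = pi^{q(q-1)/2} Gamma(n) Gamma(n-1) ... Gamma(n-q+1)."""
    return 0.5 * q * (q - 1) * math.log(math.pi) + sum(math.lgamma(n - j) for j in range(q))

def wishart_logpdf(T, V, n):
    """ln p(T) for the n-look complex Wishart density of eq. (6).

    T, V: q x q Hermitian positive definite matrices (sample and expected coherency).
    n:    number of looks; n >= q is assumed.
    """
    T = np.atleast_2d(np.asarray(T, dtype=complex))
    V = np.atleast_2d(np.asarray(V, dtype=complex))
    q = T.shape[0]
    # For Hermitian positive definite matrices, slogdet's second output is ln|det|.
    logdet_T = np.linalg.slogdet(T)[1]
    logdet_V = np.linalg.slogdet(V)[1]
    trace_term = np.trace(np.linalg.solve(V, T)).real   # Tr(V^{-1} T), real for HPD inputs
    return (q * n * math.log(n) + (n - q) * logdet_T - n * trace_term
            - log_k(n, q) - n * logdet_V)

def cluster_center(Ts):
    """Sample estimate of V = E{T}, eq. (7): the mean of N sample coherency matrices."""
    return np.mean(np.asarray(Ts, dtype=complex), axis=0)
```

For q = 1 the density reduces to a Gamma density (shape n, scale V/n), which gives a quick sanity check of the normalizing constant k(n, q).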

B. Chernoff Error Bound

Assume that a data vector x belongs to one of two possible classes: ω1 or ω2. It is required to determine which of the following two hypotheses is more likely [16]:

H1: x comes from the class ω1
H2: x comes from the class ω2

It is usually assumed that the data vectors belonging to each class ωi, i = 1, 2, follow a certain distribution p(x|ωi). According to Bayesian decision theory [17], the optimal decision rule is

$$ P(\omega_1)\, p(x|\omega_1) \;\underset{H_2}{\overset{H_1}{\gtrless}}\; P(\omega_2)\, p(x|\omega_2) \tag{8} $$

where P(ωi) is the a priori probability. A classification error occurs if a data vector x belongs to one class but falls in the decision region of the other class. Computing the error probability exactly is difficult, especially in high dimensions, because of the discontinuous nature of the decision regions [17]. However, in the two-category case, the Chernoff bound gives an upper bound on the error. For the complex Wishart distribution, the Chernoff bound is given by [17]

$$ P(\text{error}) \le P^{\beta}(\omega_1)\, P^{1-\beta}(\omega_2) \int p^{\beta}(T|\omega_1)\, p^{1-\beta}(T|\omega_2)\, dT \tag{9} $$

where β is a parameter, 0 ≤ β ≤ 1. The integral in (9) can be evaluated analytically, yielding [17]

$$ \int p^{\beta}(T|\omega_1)\, p^{1-\beta}(T|\omega_2)\, dT = \exp\{-f(\beta)\} \tag{10} $$

where f(β) is a function of the parameter β, called the Wishart-Chernoff distance. The Wishart-Chernoff distance measures the similarity between two complex Wishart distributions. It can be calculated by defining

$$ L = p^{\beta}(T|\omega_1)\, p^{1-\beta}(T|\omega_2) \tag{11} $$

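Taking (6) into account, (11) can be written as follows:

$$ \begin{aligned} L &= \frac{n^{\beta qn} |T|^{\beta(n-q)} \exp\{-\beta n \operatorname{Tr}(V_1^{-1}T)\}}{k^{\beta}(n,q)\, |V_1|^{\beta n}} \times \frac{n^{(1-\beta)qn} |T|^{(1-\beta)(n-q)} \exp\{-(1-\beta) n \operatorname{Tr}(V_2^{-1}T)\}}{k^{1-\beta}(n,q)\, |V_2|^{(1-\beta)n}} \\ &= \frac{n^{qn} |T|^{n-q} \exp\{-\beta n \operatorname{Tr}(V_1^{-1}T) - n \operatorname{Tr}(V_2^{-1}T) + \beta n \operatorname{Tr}(V_2^{-1}T)\}}{k(n,q)\, |V_1|^{\beta n} |V_2|^{(1-\beta)n}} \\ &= \frac{n^{qn} |T|^{n-q} \exp\{\operatorname{Tr}(-\beta n V_1^{-1}T - n V_2^{-1}T + \beta n V_2^{-1}T)\}}{k(n,q)\, |V_1|^{\beta n} |V_2|^{(1-\beta)n}} \end{aligned} \tag{12} $$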

Letting $V = \beta n V_1^{-1} + n V_2^{-1} - \beta n V_2^{-1}$, (12) can be rewritten as

$$ L = \frac{n^{qn} |T|^{n-q} \exp\{-\operatorname{Tr}(VT)\}}{k(n,q)\, |V_1|^{\beta n} |V_2|^{(1-\beta)n}} \tag{13} $$

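So, according to [18], the integral in (10) is equal to

$$ \begin{aligned} \int p^{\beta}(T|\omega_1)\, p^{1-\beta}(T|\omega_2)\, dT &= \int \frac{n^{qn} |T|^{n-q} \exp\{-\operatorname{Tr}(VT)\}}{k(n,q)\, |V_1|^{\beta n} |V_2|^{(1-\beta)n}}\, dT \\ &= \frac{n^{qn}\, n^{n(1-q)}\, k(n,q)\, |V|^{-n}}{k(n,q)\, |V_1|^{\beta n} |V_2|^{(1-\beta)n}} \\ &= \frac{n^{n} |V|^{-n}}{|V_1|^{\beta n} |V_2|^{(1-\beta)n}} = \left( \frac{n |V|^{-1}}{|V_1|^{\beta} |V_2|^{1-\beta}} \right)^{n} = \exp\{-f(\beta)\} \end{aligned} \tag{14} $$

Thus, the Wishart-Chernoff distance can be defined as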

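$$ \begin{aligned} f(\beta) &= -n \ln\left( \frac{n |V|^{-1}}{|V_1|^{\beta} |V_2|^{1-\beta}} \right) \\ &= -n \ln\left( \frac{n\, |\beta n V_1^{-1} + n V_2^{-1} - \beta n V_2^{-1}|^{-1}}{|V_1|^{\beta} |V_2|^{1-\beta}} \right) \\ &= -n \ln\left( \frac{n^{1-q}\, |\beta V_1^{-1} + (1-\beta) V_2^{-1}|^{-1}}{|V_1|^{\beta} |V_2|^{1-\beta}} \right) \\ &= -n \ln\left( \frac{|\beta V_1^{-1} + (1-\beta) V_2^{-1}|^{-1}}{|V_1|^{\beta} |V_2|^{1-\beta}} \right) - n(1-q) \ln(n) \end{aligned} \tag{15} $$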

In (15), the second term can be dropped, since it is constant when q and n are fixed. In addition, the factor n in the first term is a scale parameter that can be omitted for simplicity, and the Wishart-Chernoff distance for n-look processed polarimetric SAR data becomes

$$ f(\beta) = -\ln\left( \frac{|\beta V_1^{-1} + (1-\beta) V_2^{-1}|^{-1}}{|V_1|^{\beta} |V_2|^{1-\beta}} \right) \tag{16} $$
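Expanding the logarithm in (16) gives f(β) = ln|βV1^{-1} + (1−β)V2^{-1}| + β ln|V1| + (1−β) ln|V2|, which is convenient to evaluate with log-determinants. A minimal sketch, assuming Hermitian positive definite center matrices (the helper names are hypothetical):

```python
import numpy as np

def _logdet(M):
    # For Hermitian positive definite M, det(M) is real and positive, so the
    # log-modulus returned by slogdet equals ln|M|.
    return np.linalg.slogdet(M)[1]

def wishart_chernoff(V1, V2, beta):
    """Wishart-Chernoff distance f(beta) of eq. (16).

    Uses f = ln|beta V1^{-1} + (1-beta) V2^{-1}| + beta ln|V1| + (1-beta) ln|V2|,
    which is -ln of the ratio in (16).
    """
    V1 = np.asarray(V1, dtype=complex)
    V2 = np.asarray(V2, dtype=complex)
    B = beta * np.linalg.inv(V1) + (1.0 - beta) * np.linalg.inv(V2)
    return _logdet(B) + beta * _logdet(V1) + (1.0 - beta) * _logdet(V2)

def chernoff_g(V1, V2, beta):
    """g(beta) = exp{-f(beta)}."""
    return float(np.exp(-wishart_chernoff(V1, V2, beta)))
```

For identical centers the function returns zero for every β, and for β = 0 or β = 1 it returns zero for any pair of centers, so g equals one in those cases.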

The Chernoff distance is a distance measure between two distributions and can be used as a criterion for estimating the similarity between the respective probability densities. Since the simplified Wishart-Chernoff distance in (16) is independent of the number of looks, it can be applied to multi-look processed or speckle-filtered POLSAR data. In addition, the Wishart-Chernoff distance is symmetric: the distance between two complex Wishart distributions with centers V1 and V2 for β = β1 equals the distance between V2 and V1 for β = β2, where β2 = 1 − β1. Moreover, the Wishart-Chernoff distance is independent of the polarization basis: as shown in Appendix A, covariance matrices, coherency matrices, and circular-polarization matrices produce identical classification results. The optimum Wishart-Chernoff distance, which best describes the similarity between two distributions, is obtained by finding the value βopt that minimizes

$$ g(\beta) = \frac{|\beta V_1^{-1} + (1-\beta) V_2^{-1}|^{-1}}{|V_1|^{\beta} |V_2|^{1-\beta}} = \exp\{-f(\beta)\} \tag{17} $$

The key benefit is that this optimization is carried out in the one-dimensional β space [17]. The value βopt is the value of β that minimizes the upper bound on the classification error probability. Simulated data are used to study the behavior of g(β) empirically. The parameter βopt is calculated using the one-dimensional golden-section search method [19]. When two distributions are identical, the two center coherency matrices have equal eigenvalues and eigenvectors, and the Wishart-Chernoff distance is expected to be zero, as shown in Fig. 1 (top). In this case, as shown in Fig. 1 (bottom), g(β) is constant and independent of β, with g(β) = 1. [Fig. 1 about here.]
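The search for βopt can be sketched with a standard golden-section search on [0, 1]. Since ln|·| is concave on positive definite matrices and βV1^{-1} + (1−β)V2^{-1} is affine in β, f(β) in (16) is concave in β, so maximizing f (equivalently, minimizing g in (17)) is a unimodal one-dimensional problem. The tolerance and function names below are illustrative assumptions, not the settings used with [19].

```python
import math
import numpy as np

def wishart_chernoff(V1, V2, beta):
    """f(beta) of eq. (16) via log-determinants (Hermitian positive definite inputs)."""
    ld = lambda M: np.linalg.slogdet(M)[1]
    B = beta * np.linalg.inv(V1) + (1.0 - beta) * np.linalg.inv(V2)
    return ld(B) + beta * ld(V1) + (1.0 - beta) * ld(V2)

def beta_opt(V1, V2, tol=1e-8):
    """Golden-section search on [0, 1] for the beta that maximizes f(beta),
    i.e. minimizes g(beta) = exp{-f(beta)} in eq. (17).

    Returns (beta_opt, f(beta_opt)).
    """
    f = lambda x: wishart_chernoff(V1, V2, x)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0          # 1/phi, about 0.618
    a, b = 0.0, 1.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:       # maximizer lies in [a, d]; old c becomes the new d
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:             # maximizer lies in [c, b]; old d becomes the new c
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    beta = 0.5 * (a + b)
    return beta, f(beta)

V1 = np.diag([1.0, 2.0, 4.0])
V2 = np.diag([4.0, 2.0, 1.0])   # same eigenvalues, eigenvectors swapped by a symmetric permutation
print(beta_opt(V1, V2))         # beta close to 0.5 (compare Appendix B)
```

Each iteration reuses one of the two interior evaluations, so only one new evaluation of f (two 3 × 3 inversions and three log-determinants) is needed per step.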

Two distributions are considered similar if their center coherency matrices are similar matrices related through an orthogonal, symmetric, invertible matrix, which implies that they have equal eigenvalues but different eigenvectors. In this case, βopt is found to be 0.5 (see Appendices B and C), and the overlap area between the two distributions is symmetric with respect to the optimal decision boundary. Here, the Wishart-Chernoff distance depends on the rotation angle between the orthonormal eigenvectors of the first and second center coherency matrices. For the two distributions in Fig. 1 (top), for example, the Wishart-Chernoff distance is small, with g(βopt) close to one for βopt = 0.5 (Fig. 1, bottom). Two distributions are considered dissimilar if they have different eigenvalues and either different or equal eigenvectors. In this case, βopt differs from 0.5. In Fig. 1 (bottom), βopt is 0.25 and g(βopt) is close to zero, i.e., the Wishart-Chernoff distance between the two distributions is larger, as shown in Fig. 1 (top). The true Chernoff bound, which gives the optimum upper bound on the classification error probability, is obtained from the estimated βopt:

$$ P(\text{error}) \le P^{\beta_{opt}}(\omega_1)\, P^{1-\beta_{opt}}(\omega_2) \left[ n^{1-q}\, g(\beta_{opt}) \right]^{n} \tag{18} $$

IV. METHODS

A. Study Area and Data

The study area in our research is an area of western Somerset and eastern Devon in the United Kingdom, on the south shore of the Bristol Channel. An Ordnance Survey map of the study area is shown in Fig. 2a. The towns of Minehead, in Somerset, and Tiverton and Cullompton, in Devon, are indicated on the map. In our study, Advanced Land Observing Satellite (ALOS) full polarimetric SAR data (Level 1.1 Quad Polarimetric Mode), acquired in May 2007 with a slant range resolution of 9.5 m and an azimuth resolution of 4.5 m, are used. To reduce speckle noise, Lee's POLSAR speckle filter [20], which uses a multiplicative noise model and a 7 × 7 directional window, was applied to the original POLSAR image. The application of the Lee speckle filter does not interfere with the estimation of the similarity between the clusters. An RGB representation of the despeckled ALOS scene is shown in Fig. 2c. The dimension of the ALOS scene is 1248 × 18432 pixels. In Fig. 2b, an AVNIR-2 RGB optical image (acquired in June 2006 with 10 m resolution) of the study area is presented. Using this optical image and the map, we identified several land cover types that are the focus of this study.


[Fig. 2 about here.]

B. Experimental Methods

1) Initialization and Calculation of the Appropriate Number of Classes: The purpose of this study is to present an unsupervised agglomerative classification algorithm based on the derived Wishart-Chernoff distance. The initialization for the agglomerative hierarchical clustering can be obtained by dividing the POLSAR data into clusters based on the SPAN/H/A/α space. The SPAN is equal to the trace of the coherency matrix T:

$$ \text{SPAN} = T_{11} + T_{22} + T_{33} \tag{19} $$

where T11, T22, and T33 are the diagonal elements of the coherency matrix. As suggested in [13], a maximum of 48 optimized initial clusters, which is also the initial number of classes for the agglomerative hierarchical clustering, can be obtained in three steps: 1) dividing the SPAN histogram into three clusters (low, medium, and high density); 2) further dividing these into 24 clusters based on the H/α plane; and 3) using the anisotropy A to divide the 24 clusters into 48. After each step, complex Wishart clustering is performed to optimize the initialization. The optimized initial clusters are then merged hierarchically and iteratively into an appropriate number of classes based on the derived Wishart-Chernoff distance. In each iteration, the distances between all possible pairs of clusters are calculated, and the two clusters with the minimum distance are merged, reducing the number of clusters by one. After each merge, complex Wishart clustering is performed to refine the locations of the cluster centers, and the cluster center coherency matrices are recalculated. Merging continues until the appropriate number of classes is obtained. In this paper, the appropriate number of classes is estimated with the data log-likelihood algorithm [13]. The mth data log-likelihood is defined as

$$ L_m(X) = \sum_{l=1}^{N} \ln\left( \sum_{i=1}^{m} \frac{N_i}{N} \exp\{-n \cdot d_m(T_l, V_i)\} \right) \tag{20} $$

where m is the number of clusters, Ni is the number of samples within the ith cluster, N is the number of samples within the whole data set, and dm(Tl, Vi) measures the distance between the lth sample coherency matrix Tl and the ith cluster center coherency matrix Vi:

$$ d_m(T_l, V_i) = \ln|V_i| + \operatorname{Tr}(V_i^{-1} T_l) \tag{21} $$
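Equations (20) and (21) can be evaluated directly. The sketch below adds one assumption not stated in the text: the inner sum is computed with the log-sum-exp trick, since exp{−n dm} underflows easily for bright pixels or many looks. The function names are hypothetical, and N is taken as the sum of the cluster sizes, which equals the number of samples when the clusters partition the data.

```python
import numpy as np

def wishart_distance(T, V):
    """d_m(T_l, V_i) of eq. (21): ln|V_i| + Tr(V_i^{-1} T_l)."""
    return np.linalg.slogdet(V)[1] + np.trace(np.linalg.solve(V, T)).real

def data_log_likelihood(Ts, centers, counts, n):
    """L_m(X) of eq. (20) for a partition into m = len(centers) clusters.

    Ts:      the N sample coherency matrices T_l
    centers: the m cluster center coherency matrices V_i
    counts:  the m cluster sizes N_i (assumed to sum to N)
    n:       number of looks
    """
    counts = np.asarray(counts, dtype=float)
    log_w = np.log(counts / counts.sum())            # ln(N_i / N)
    total = 0.0
    for T in Ts:
        d = np.array([wishart_distance(T, V) for V in centers])
        e = log_w - n * d                            # ln[(N_i / N) exp{-n d_m}]
        e_max = e.max()
        # log-sum-exp: ln sum_i exp(e_i), computed without underflow
        total += e_max + np.log(np.exp(e - e_max).sum())
    return float(total)
```

In the merging loop, L_m would be evaluated after every merge, from the initial number of clusters down to m = 1, and the resulting curve inspected for the point where it stops flattening.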

The data log-likelihood algorithm can reveal the inner structure of POLSAR data [13], because it quantitatively measures how well a given number of clusters fits the inner structure of the data. A log-likelihood value is calculated for each number of clusters m, from m equal to the number of clusters produced by the SPAN/H/A/α initialization down to m = 1.

2) Testing: Estimating the accuracy of unsupervised classification algorithms is not straightforward. One cannot compare reference pixels to classified pixels as in supervised classification, since unsupervised classifiers do not generate labeled classes; instead, clusters must be compared to reference classes. In our study, we follow the approach of [21] to roughly estimate accuracy. Using the map shown in Fig. 2a, blocks of reference pixels are identified. We then examine the classified images and calculate the percent error for each reference class. For our qualitative evaluation, we examine four types of land cover: urban (Minehead, Tiverton, and Cullompton), forest (coniferous and non-coniferous), streams, and agricultural fields. For the quantitative analysis, we examine four classes: forest, ocean, natural grassland, and urban.

V. RESULTS

The proposed unsupervised classification approach is applied to the L-band ALOS POLSAR data of the study area. In this case study, the initial division of the data based on SPAN/H/A/α produces 40 clusters. The appropriate number of classes is calculated with the data log-likelihood algorithm. As discussed in [22], as m decreases, the data log-likelihood initially flattens until a certain point, called the “elbow”, after which it decreases monotonically. This point suggests that the appropriate number of clusters has been reached. Fig. 3 shows the data log-likelihood when the Wishart-Chernoff distance is used to merge clusters. [Fig. 3 about here.] As shown in Fig. 3, the point where the flattening stops and the decrease starts is found at m = 18, which corresponds to the appropriate number of classes. For comparison, the same initial clusters produced from SPAN/H/A/α are used as input for agglomerative clustering based on the Wishart test statistic distance [13],

$$ d(V_1, V_2) = (N_1 + N_2) \ln|V| - N_1 \ln|V_1| - N_2 \ln|V_2| \tag{22} $$

where

$$ V = \frac{1}{N_1 + N_2} \sum_{i=1}^{N_1 + N_2} T_i \tag{23} $$

N1 and N2 are the numbers of samples in the two clusters under consideration, whose center coherency matrices are V1 and V2, respectively. The appropriate number of classes is again calculated with the data log-likelihood algorithm, now using the Wishart test statistic distance as the merging criterion. The data log-likelihood results are shown in Fig. 3. In this case, the appropriate number of classes is determined as m = 8. Although the data log-likelihood is a function of the number of clusters, the distance criterion plays an important role in estimating the appropriate number of clusters. With the Wishart-Chernoff distance, the estimated number of classes is much higher than with the Wishart test statistic distance, because the Wishart-Chernoff distance is independent of the number of samples in the clusters. This has the advantage of preserving detailed information, which allows small thematic land cover types to be discriminated. With the Wishart test statistic distance, the number of samples acts as a weight, and the estimated distance is reduced when the two clusters differ greatly in size. This results in a loss of detailed information. The Wishart-Chernoff distance-based classification of the entire study area is shown in Fig. 2d. Several regions (shown within boxes in Fig. 2d) are selected, and their classes are compared with the classes resulting from the classification based on the Wishart test statistic distance. The resulting classes from both methods are also compared with the corresponding surface types that appear in the optical image and the map of the study area. When the classification results of the two distances are compared with the map of the study area, more classes can be seen in the high-energy returns associated with urban and forest areas in the case of the Wishart-Chernoff distance.
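To make the contrast between the two merging criteria concrete, the sketch below implements (22) and (23) next to the Wishart-Chernoff distance and performs one merging step (selecting the closest pair) under either criterion. It is a simplification under stated assumptions: βopt is found on a coarse grid rather than by golden-section search, the complex Wishart re-clustering after each merge is omitted, and the pooled center of (23) is computed from the cluster centers and sizes, which is equivalent to averaging all N1 + N2 samples. All names and the toy cluster configuration are hypothetical.

```python
import itertools
import numpy as np

BETA_GRID = np.linspace(0.0, 1.0, 201)   # coarse stand-in for golden-section search

def logdet(M):
    """ln|M| for a Hermitian positive definite matrix."""
    return np.linalg.slogdet(M)[1]

def wishart_test_statistic(V1, N1, V2, N2):
    """Wishart test statistic distance d(V1, V2) of eq. (22).

    The pooled center V of eq. (23), the mean of all N1 + N2 samples, equals the
    size-weighted mean of the two cluster centers.
    """
    V = (N1 * V1 + N2 * V2) / (N1 + N2)
    return (N1 + N2) * logdet(V) - N1 * logdet(V1) - N2 * logdet(V2)

def wishart_chernoff_opt(V1, V2):
    """Max over beta in BETA_GRID of f(beta), eq. (16); independent of cluster sizes."""
    inv1, inv2 = np.linalg.inv(V1), np.linalg.inv(V2)
    return max(logdet(b * inv1 + (1.0 - b) * inv2) + b * logdet(V1) + (1.0 - b) * logdet(V2)
               for b in BETA_GRID)

def closest_pair(centers, counts, criterion):
    """Index pair (i, j), i < j, with the smallest distance: one merging step.

    criterion: "chernoff" for the Wishart-Chernoff distance, "test" for eq. (22).
    """
    best = None
    for i, j in itertools.combinations(range(len(centers)), 2):
        if criterion == "chernoff":
            d = wishart_chernoff_opt(centers[i], centers[j])
        elif criterion == "test":
            d = wishart_test_statistic(centers[i], counts[i], centers[j], counts[j])
        else:
            raise ValueError("criterion must be 'chernoff' or 'test'")
        if best is None or d < best[0]:
            best = (d, i, j)
    return best[1], best[2]

# Toy example: a very large cluster, a two-sample cluster, and two medium clusters.
centers = [np.eye(3), 4 * np.eye(3), 10 * np.eye(3), 12 * np.eye(3)]
counts = [100000, 2, 1000, 1000]
print(closest_pair(centers, counts, "chernoff"))   # (2, 3): centers differ by a factor of 1.2
print(closest_pair(centers, counts, "test"))       # (1, 2): the two-sample cluster is absorbed
```

In this toy configuration the test statistic is dominated by the size of the smaller cluster, so the two-sample cluster is merged into a much larger neighbor whose center differs by a factor of 2.5, whereas the Wishart-Chernoff criterion merges the pair with the most similar centers regardless of size.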
Urban: Three main classes are obtained for the urban areas with the Wishart-Chernoff distance, as shown in Fig. 4, whereas only one class is obtained with the Wishart test statistic distance. Comparing the two results with the urban blocks appearing in the optical image and the area map, it is evident that urban areas are better captured by the Wishart-Chernoff distance. [Fig. 4 about here.]

Forest: The forest results are shown in Fig. 5. The forested area discriminated by the Wishart test statistic distance (bottom left) is shown in sea green and agrees well with the map and optical image. This polygon includes both coniferous and non-coniferous stands; the non-coniferous stands appear lighter green on the outer edges of the forest in the optical image. The Wishart-Chernoff distance is able to separate non-coniferous (dark green) from coniferous (sea green) stands, as seen in the bottom right. [Fig. 5 about here.]

Streams: Results for two streams are shown in Fig. 6. The optical image and the map of the study area indicate the presence of streams, which are represented as a small class (brown) with the Wishart-Chernoff distance. With the Wishart test statistic distance, however, the streams are misclassified as forest and urban areas. The Wishart test statistic results might suggest that the streams are covered by trees and canopy, since they are classified as forest. Although this interpretation is reasonable in principle, our method shows that it does not hold, since the streams are detectable with Wishart-Chernoff distance clustering. [Fig. 6 about here.]

Agricultural Fields: The study area contains various agricultural fields and croplands with different types of crops, shown in Fig. 7. Assigning the resulting classes to thematic crop types was not possible for two reasons. First, agricultural fields and croplands are not mapped in the available study area map. Second, the optical image of the study area was acquired one year before the polarimetric data. Thus, a crop-level comparison is not meaningful, because the crop types might have changed. Four main classes (yellow, orange, light green, and green) corresponding to agricultural fields and croplands are obtained with the Wishart-Chernoff distance. With the Wishart test statistic distance, three classes (yellow, light green, and green) are extracted. Several areas are misclassified as natural grassland (cyan); these areas appear as noise with approximately linear shapes. [Fig. 7 about here.]


An accuracy assessment of the two distances for POLSAR classification can be performed using test samples. These test samples can be selected for the main surface types by labeling pixels of the POLSAR data, using the optical image and the map of the study area as a guide. Test samples for the following cover types are indicated as polygons in Fig. 2c, each surrounded by a larger rectangle to help the reader locate it: ocean (orange), two forest types (green and sea green), natural grassland (cyan), and urban blocks (red). The percentage classification error for the selected test samples is calculated for each classification result. Table I presents the relative classification errors for the Wishart-Chernoff distance and the Wishart test statistic distance. The classification error on the ocean is zero for both methods. The classification error of the urban areas is significantly higher with the Wishart test statistic distance than with the Wishart-Chernoff distance. The Wishart test statistic distance appears to give better results (zero classification error) than the Wishart-Chernoff distance (7.25% classification error) for the coniferous forest type. However, this is misleading, because with the Wishart test statistic distance the two forest types are merged into one class, which is why its classification error for the non-coniferous forest type is 100%. The overall classification accuracy is 82.6% with the Wishart-Chernoff distance and 54.6% with the Wishart test statistic distance. [TABLE 1 about here.]

VI. CONCLUSIONS


The proposed Wishart-Chernoff distance is a matrix distance based on the probability densities of the complex Wishart distribution. The derived Wishart-Chernoff distance is always non-negative, and it is zero only if the two probability distributions are identical. An unsupervised agglomerative approach for ALOS POLSAR data classification is discussed in this paper. POLSAR data are divided into clusters based on the SPAN/H/A/α space, and the Wishart-Chernoff distance is used to merge clusters hierarchically into an appropriate number of classes, which is calculated with the data log-likelihood algorithm. Although the data log-likelihood algorithm estimates the appropriate number of classes by taking the inner data structure into account, the estimated number differs according to the distance criterion used to merge the clusters. Both the Wishart-Chernoff distance and the Wishart test statistic distance exploit the nature of the complex Wishart distribution. However, the Wishart-Chernoff distance tends to produce a higher number of classes than the Wishart test statistic distance, preserving more detailed information in the POLSAR data, which corresponds to detailed land cover types. Because the Wishart-Chernoff distance is independent of the number of samples, it is a more robust distance criterion. Promising classification results are obtained with the Wishart-Chernoff distance, capturing details as well as uniform areas with a relatively large number of classes, while the classes remain consistent and are not mixed in homogeneous areas. Thus, the proposed Wishart-Chernoff distance can be applied when detailed information is of interest. Comparing the Wishart-Chernoff distance criterion with other distance criteria, such as the Manhattan distance and the Bartlett distance, will be the subject of future work.

APPENDIX A

Below, we prove that the Wishart-Chernoff distance is independent of the polarization basis. Assume a scattering vector v, expressed in a new polarization basis, which is related to the scattering vector u expressed in the original polarization basis by

$$ v = P u \tag{24} $$

where P is a constant matrix. Then, a multi-look coherency matrix can be formed as

$$ Z = \frac{1}{n} \sum_{k=1}^{n} v(k) \cdot v(k)^{*} = P\, T\, P^{*} \tag{25} $$

where

$$ T = \frac{1}{n} \sum_{k=1}^{n} u(k)\, u(k)^{*} $$

and u(k) and v(k), k = 1, ..., n, are the scattering vectors in the original and new polarization bases, respectively. So, the ith cluster center coherency matrix can be calculated as

$$ M_i = \frac{1}{N_i} \sum_{j=1}^{N_i} Z_j = P\, V_i\, P^{*}, \quad \text{where} \quad V_i = \frac{1}{N_i} \sum_{j=1}^{N_i} T_j \tag{26} $$

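Thus, the Wishart-Chernoff distance between two clusters M1 = P V1 P∗ and M2 = P V2 P∗ is

$$ \begin{aligned} f(\beta) &= -\ln \frac{|\beta (P V_1 P^{*})^{-1} + (1-\beta)(P V_2 P^{*})^{-1}|^{-1}}{|P V_1 P^{*}|^{\beta}\, |P V_2 P^{*}|^{1-\beta}} \\ &= -\ln \frac{|\beta P^{*-1} V_1^{-1} P^{-1} + (1-\beta) P^{*-1} V_2^{-1} P^{-1}|^{-1}}{|P V_1 P^{*}|^{\beta}\, |P V_2 P^{*}|^{1-\beta}} \\ &= -\ln \frac{|P^{*-1} \left( \beta V_1^{-1} + (1-\beta) V_2^{-1} \right) P^{-1}|^{-1}}{|P V_1 P^{*}|^{\beta}\, |P V_2 P^{*}|^{1-\beta}} \end{aligned} \tag{27} $$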

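Since |AB| = |A| |B|, the Wishart-Chernoff distance becomes

$$ \begin{aligned} f(\beta) &= -\ln \frac{|P^{*-1}|^{-1}\, |\beta V_1^{-1} + (1-\beta) V_2^{-1}|^{-1}\, |P^{-1}|^{-1}}{|P|^{\beta} |V_1|^{\beta} |P^{*}|^{\beta}\, |P|^{1-\beta} |V_2|^{1-\beta} |P^{*}|^{1-\beta}} \\ &= -\ln \frac{|P^{*}|\, |\beta V_1^{-1} + (1-\beta) V_2^{-1}|^{-1}\, |P|}{|P^{*}|\, |V_1|^{\beta} |V_2|^{1-\beta}\, |P|} \\ &= -\ln \frac{|\beta V_1^{-1} + (1-\beta) V_2^{-1}|^{-1}}{|V_1|^{\beta} |V_2|^{1-\beta}} \end{aligned} \tag{28} $$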

The factors |P| and |P∗| cancel in (28), so the result holds for any invertible constant matrix P. A matrix P that represents a change of polarization basis is, in addition, unitary, satisfying $P^{-1} = P^{*}$ and $|\det P| = 1$.
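The invariance shown in (28) is easy to check numerically. The sketch below, assuming randomly generated Hermitian positive definite centers, compares f(β) before and after the congruence P V P∗ for (a) a random unitary P and (b) the matrix that maps the Pauli vector k_p of (1) to the lexicographic vector [S_HH, √2 S_HV, S_VV], i.e. the coherency-to-covariance change of basis. All names are hypothetical.

```python
import numpy as np

def wishart_chernoff(V1, V2, beta):
    """f(beta) of eq. (16) via log-determinants (Hermitian positive definite inputs)."""
    ld = lambda M: np.linalg.slogdet(M)[1]
    B = beta * np.linalg.inv(V1) + (1.0 - beta) * np.linalg.inv(V2)
    return ld(B) + beta * ld(V1) + (1.0 - beta) * ld(V2)

def random_hpd(rng, q=3):
    """Random q x q Hermitian positive definite matrix (test data only)."""
    X = rng.standard_normal((q, q)) + 1j * rng.standard_normal((q, q))
    return X @ X.conj().T + q * np.eye(q)

rng = np.random.default_rng(0)
V1, V2 = random_hpd(rng), random_hpd(rng)

# (a) A random unitary P from the QR decomposition of a complex Gaussian matrix.
P_rand, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
# (b) Pauli -> lexicographic basis: maps k_p of eq. (1) to [S_HH, sqrt(2) S_HV, S_VV].
P_lex = np.array([[1, 1, 0], [0, 0, np.sqrt(2)], [1, -1, 0]]) / np.sqrt(2)

for P in (P_rand, P_lex):
    M1, M2 = P @ V1 @ P.conj().T, P @ V2 @ P.conj().T      # cluster centers as in eq. (26)
    for beta in (0.2, 0.5, 0.8):
        print(beta, wishart_chernoff(V1, V2, beta), wishart_chernoff(M1, M2, beta))
```

The two printed distances agree for each β up to round-off; the same would hold for any invertible P, since the determinant factors cancel.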


APPENDIX B

Here, we prove that, in the case of similar complex Wishart distributions, the optimal value of β is equal to 0.5. Assume two similar complex Wishart distributions, where V1 and V2 are Hermitian positive definite similar matrices (V1 and V2 have equal eigenvalues but different eigenvectors). Then,

$$ V_2 = U^{-1} V_1 U \tag{29} $$

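where U is an orthogonal, symmetric, invertible 3 × 3 transformation matrix. Thus,

$$ \begin{aligned} g(\beta) &= \frac{|\beta V_1^{-1} + (1-\beta) V_2^{-1}|^{-1}}{|V_1|^{\beta} |V_2|^{1-\beta}} \\ &= \frac{|\beta V_1^{-1} + (1-\beta)(U^{-1} V_1 U)^{-1}|^{-1}}{|V_1|^{\beta}\, |U^{-1} V_1 U|^{1-\beta}} \\ &= \frac{|\beta V_1^{-1} + U^{-1} V_1^{-1} U - \beta U^{-1} V_1^{-1} U|^{-1}}{|V_1|} \\ &= \frac{1}{|\beta I + V_1 U^{-1} V_1^{-1} U - \beta V_1 U^{-1} V_1^{-1} U|} \end{aligned} \tag{30} $$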

where I is the identity matrix. Let

$$ A = V_1 U^{-1} V_1^{-1} U = V_1 V_2^{-1} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} $$

Then g(β) can take the following form:

$$ g(\beta) = \frac{1}{|\beta(I - A) + A|} \tag{31} $$

The matrix A is complex and has the following two properties, which are proved in Appendix C: 1) the determinant of A is equal to one, det(A) = 1; and 2) the trace Tr(A) is a real number, with

$$ \operatorname{Tr}(A) = \operatorname{Tr}(A^{-1}) = \frac{1}{\det(A)} \operatorname{Tr}(\operatorname{adj}(A)) = \operatorname{Tr}(\operatorname{adj}(A)) $$

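where adj(A) is the transpose of the matrix of cofactors of A. This implies that

$$ \operatorname{Tr}(A) = a + e + i = \det\begin{bmatrix} a & b \\ d & e \end{bmatrix} + \det\begin{bmatrix} e & f \\ h & i \end{bmatrix} + \det\begin{bmatrix} a & c \\ g & i \end{bmatrix} \tag{32} $$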



Thus the denominator of (31) can be written as       1 0 0 a b c a b c |β(I − A) + A| = β  0 1 0  −  d e f  −  d e f  0 0 1 g h i g h i   β + a(1 − β) (1 − β)b (1 − β)c β + e(1 − β) (1 − β)f  =  (1 − β)d (1 − β)g (1 − β)h β + i(1 − β)

|β(I − A) + A| = (β + a(1 − β)) β 2 + e(β − β 2 ) + i(β − β 2 ) + ei(1 − β)2 − f h(1 − β)2  − b d(1 − β)2 β + di(1 − β)3 − f g(1 − β)3  + c dh(1 − β)3 − g(1 − β)2 β − ge(1 − β)3 Consequently,

(33)

IN T

So, the determinant of the denominator can take the form,

(32)



(34)

|β(I − A) + A| = β 3 + e(β 2 − β 3 ) + i(β 2 − β 3 ) + ei(1 − β)2 β − f h(1 − β)2 β

+ a(1 − β)β 2 + ae(1 − β)2 β + ai(1 − β)2 β + aei(1 − β)3 − af h(1 − β)3 − bd(1 − β)2 β − bdi(1 − β)3 + bf g(1 − β)3

+ cdh(1 − β)3 − cg(1 − β)2 β − cge(1 − β)3

Taking into account that

$$\det(A) = aei - afh - bdi + bfg + cdh - cge = 1$$

and

$$\mathrm{Tr}(A) = a + e + i = ae - bd + ei - fh + ai - cg,$$

the determinant takes the following form:

$$|\beta(I-A)+A| = \beta^3 + \mathrm{Tr}(A)\beta^2 - \mathrm{Tr}(A)\beta^3 + \mathrm{Tr}(A)(1-\beta)^2\beta + (1-\beta)^3 \tag{36}$$
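Collecting the powers of β in (36) gives a more compact form of the denominator. This rearrangement is our own elementary algebra rather than a step of the original derivation; it uses only the two identities β²(1 − β) + β(1 − β)² = β(1 − β) and β³ + (1 − β)³ = 1 − 3β(1 − β):

```latex
\begin{aligned}
|\beta(I-A)+A| &= \beta^3 + \mathrm{Tr}(A)\,\beta^2(1-\beta) + \mathrm{Tr}(A)\,\beta(1-\beta)^2 + (1-\beta)^3 \\
&= \beta^3 + (1-\beta)^3 + \mathrm{Tr}(A)\,\beta(1-\beta) \\
&= 1 - 3\beta(1-\beta) + \mathrm{Tr}(A)\,\beta(1-\beta) \\
&= 1 + \bigl(\mathrm{Tr}(A) - 3\bigr)\,\beta(1-\beta)
\end{aligned}
```

Since β(1 − β) is symmetric about β = 0.5, where it attains its maximum of 1/4, the denominator is extremal at β = 0.5 whenever Tr(A) ≠ 3. Moreover, the eigenvalues of A = V1V2⁻¹ are positive reals (A is similar to V2^{-1/2} V1 V2^{-1/2}) whose product is det(A) = 1, so the AM–GM inequality gives Tr(A) ≥ 3; the extremum is therefore a maximum of the denominator and a minimum of g(β).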

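So, to find the value of β that minimizes g(β), we need the value of β at which the derivative of the denominator equals zero. We set

$$\frac{d}{d\beta}|\beta(I-A)+A| = 3\beta^2 + 2\mathrm{Tr}(A)\beta - 3\mathrm{Tr}(A)\beta^2 + \mathrm{Tr}(A)(1-\beta)^2 - 2\mathrm{Tr}(A)(1-\beta)\beta - 3(1-\beta)^2 = 0 \tag{37}$$

Expanding,

$$
\begin{aligned}
&3\beta^2 + 2\mathrm{Tr}(A)\beta - 3\mathrm{Tr}(A)\beta^2 + \mathrm{Tr}(A) - 2\mathrm{Tr}(A)\beta + \mathrm{Tr}(A)\beta^2 - 2\mathrm{Tr}(A)\beta + 2\mathrm{Tr}(A)\beta^2 - 3 + 6\beta - 3\beta^2 \\
&\quad = \mathrm{Tr}(A) - 2\mathrm{Tr}(A)\beta - 3 + 6\beta = 0
\end{aligned}
$$

This leads to

$$\beta\left(6 - 2\mathrm{Tr}(A)\right) = 3 - \mathrm{Tr}(A) \;\Rightarrow\; \beta = \frac{3 - \mathrm{Tr}(A)}{2\left(3 - \mathrm{Tr}(A)\right)} = 0.5 \tag{38}$$

provided Tr(A) ≠ 3. When Tr(A) = 3, the derivative vanishes identically, the denominator equals one for every β, and g(β) is constant.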
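A numerical sanity check of (38), under stated assumptions: the sketch builds V2 = U⁻¹V1U from a random Hermitian positive definite V1 and a Householder reflection U (one convenient choice of a real matrix that is both orthogonal and symmetric), evaluates g(β) on a grid, and confirms that the minimum sits at β = 0.5. The function name `g` and the random test matrices are illustrative choices, not the authors' code.

```python
import numpy as np


def g(V1, V2, beta):
    """g(beta) = |beta V1^-1 + (1-beta) V2^-1|^-1 / (|V1|^beta |V2|^(1-beta)), computed in log space."""
    mix = beta * np.linalg.inv(V1) + (1.0 - beta) * np.linalg.inv(V2)
    log_g = -(np.linalg.slogdet(mix)[1]
              + beta * np.linalg.slogdet(V1)[1]
              + (1.0 - beta) * np.linalg.slogdet(V2)[1])
    return np.exp(log_g)


rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
V1 = X @ X.conj().T + np.eye(3)                  # Hermitian positive definite
v = rng.standard_normal(3)
U = np.eye(3) - 2.0 * np.outer(v, v) / (v @ v)  # Householder reflection: real, symmetric, orthogonal
V2 = U @ V1 @ U                                  # equals U^-1 V1 U because U^-1 = U
A = V1 @ np.linalg.inv(V2)
T = np.trace(A).real

betas = np.linspace(0.0, 1.0, 1001)
g_vals = np.array([g(V1, V2, b) for b in betas])
beta_star = betas[np.argmin(g_vals)]
print("Tr(A) =", T, " grid minimizer of g:", beta_star)
```

Because g(β) = 1/(1 + (Tr(A) − 3)β(1 − β)) in this similar-matrix case, the grid values are symmetric about 0.5, and the reported minimizer should sit at 0.5 up to floating-point ties.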

The reader should note that this case is a special case of the general polarimetric SAR basis transformation, in which V1 and V2 are related by a unitary similarity transformation. Let V1 = E1Λ1E1∗ and V2 = E2Λ2E2∗ be two similar matrices, where Ei (i = 1, 2) are the complex-valued matrices of eigenvectors and Λi (i = 1, 2) are the diagonal matrices of eigenvalues, with Λ1 = Λ2. By definition, the identities E∗E = EE∗ = I and E⁻¹ = E∗ hold. Thus, whenever the eigenvalues are equal, i.e., Λ1 = Λ2, we find that

$$V_1 = E_1 \Lambda_1 E_1^* = E_1 (E_2^* E_2)\, \Lambda_1\, (E_2^* E_2) E_1^* = (E_1 E_2^*)\, E_2 \Lambda_1 E_2^*\, (E_2 E_1^*) = (E_1 E_2^*)\, V_2\, (E_1 E_2^*)^*$$
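The general eigenvector transformation can be checked numerically. The sketch below assumes two random unitary eigenvector matrices E1 and E2 (obtained from QR factorizations, an illustrative choice) sharing one diagonal eigenvalue matrix, forms W = E1E2∗, and verifies that W V2 W∗ reproduces V1 and that W is unitary. The helper `random_unitary` is hypothetical.

```python
import numpy as np


def random_unitary(rng, n):
    """Random complex unitary matrix from the QR factorization of a Gaussian matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q


rng = np.random.default_rng(2)
n = 3
lam = np.diag(rng.uniform(0.5, 3.0, n))   # shared eigenvalues: Lambda_1 = Lambda_2

E1, E2 = random_unitary(rng, n), random_unitary(rng, n)
V1 = E1 @ lam @ E1.conj().T
V2 = E2 @ lam @ E2.conj().T

W = E1 @ E2.conj().T                             # general eigenvector transformation E1 E2^*
print(np.allclose(W @ V2 @ W.conj().T, V1))      # V1 = (E1 E2^*) V2 (E1 E2^*)^*
print(np.allclose(W.conj().T @ W, np.eye(n)))    # (E1 E2^*)^* = (E1 E2^*)^-1
print(np.allclose(W, W.T))                       # W is in general not symmetric
```

The last check illustrates why the symmetric, orthogonal U of Appendix B is only a special case: a generic W obtained this way is complex and not symmetric.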

This implies that the general eigenvector transformation is given by the matrix (E1E2∗) and its Hermitian conjugate, (E1E2∗)∗ = (E1E2∗)⁻¹. This transformation is more general than the similarity transformation discussed in this study, where the transformation matrix is not only orthogonal but also symmetric.

APPENDIX C

Here we prove the two properties of the matrix A stated in Appendix B.

1) det(A) = 1. We define A such that

$$A = V_1 V_2^{-1} = V_1 U^{-1} V_1^{-1} U \tag{39}$$

Thus,

$$\det(A) = \det(V_1 V_2^{-1}) = \det(V_1 U^{-1} V_1^{-1} U) \tag{40}$$

So,

$$\det(A) = \det(V_1)\det(U^{-1})\det(V_1^{-1})\det(U) = \det(V_1)\,\frac{1}{\det(U)}\,\frac{1}{\det(V_1)}\,\det(U) = 1 \tag{41}$$

2) Tr(A) = Tr(A⁻¹). Since U is an orthogonal matrix, its transpose is equal to its inverse: Uᵀ = U⁻¹. Furthermore, U is symmetric, U = Uᵀ. This leads to U = Uᵀ = U⁻¹. The trace of A can be written as

$$\mathrm{Tr}(A) = \mathrm{Tr}(V_1 V_2^{-1}) = \mathrm{Tr}(V_1 U^{-1} V_1^{-1} U) = \mathrm{Tr}(V_1 U V_1^{-1} U) \tag{42}$$

The trace of A⁻¹ can be written as

$$\mathrm{Tr}(A^{-1}) = \mathrm{Tr}\left((V_1 V_2^{-1})^{-1}\right) = \mathrm{Tr}\left((V_1 U^{-1} V_1^{-1} U)^{-1}\right) = \mathrm{Tr}(U^{-1} V_1 U V_1^{-1}) \tag{43}$$

Since U = U⁻¹, and using the cyclic property of the trace, Tr(A⁻¹) can be rewritten as

$$\mathrm{Tr}(A^{-1}) = \mathrm{Tr}(U^{-1} V_1 U V_1^{-1}) = \mathrm{Tr}(U V_1 U V_1^{-1}) = \mathrm{Tr}(V_1 U V_1^{-1} U) = \mathrm{Tr}(A) \tag{44}$$
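The two properties proved above can also be confirmed numerically. This sketch assumes a random Hermitian positive definite V1 and a Householder reflection as one example of a real matrix U with U = Uᵀ = U⁻¹; it checks det(A) = 1 and Tr(A) = Tr(A⁻¹), and also that the sum of the three principal 2 × 2 minors (that is, Tr(adj(A)) for a 3 × 3 matrix) matches Tr(A), as used in (32). The helper `principal_minor_sum` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
V1 = X @ X.conj().T + np.eye(3)                   # Hermitian positive definite
v = rng.standard_normal(3)
U = np.eye(3) - 2.0 * np.outer(v, v) / (v @ v)   # real, symmetric, orthogonal: U = U^T = U^-1
V2 = np.linalg.inv(U) @ V1 @ U                    # eq. (29)

A = V1 @ np.linalg.inv(V2)
A_inv = np.linalg.inv(A)


def principal_minor_sum(M):
    """Sum of the three principal 2x2 minors of a 3x3 matrix, i.e. Tr(adj(M))."""
    pairs = [(0, 1), (1, 2), (0, 2)]
    return sum(np.linalg.det(M[np.ix_(p, p)]) for p in pairs)


print("det(A)     =", np.linalg.det(A))
print("Tr(A)      =", np.trace(A))
print("Tr(A^-1)   =", np.trace(A_inv))
print("Tr(adj(A)) =", principal_minor_sum(A))
```

The imaginary part of Tr(A) printed here should be at round-off level, consistent with the statement in Appendix B that the trace of A is real.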

ACKNOWLEDGEMENT

The authors would like to thank the Japan Aerospace Exploration Agency (JAXA) for providing ALOS PALSAR data under the program AOALO.3728. We also thank our anonymous referees who have helped us improve the paper.

REFERENCES

[1] S. R. Cloude and E. Pottier, “A review of target decomposition theorems in radar polarimetry,” IEEE Transactions on Geoscience and Remote Sensing, vol. 34, no. 2, pp. 498–518, 1996.
[2] A. Freeman and S. Durden, “A three-component scattering model for polarimetric SAR data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 3, pp. 963–973, 1998.
[3] R. Paladini, M. Martorella, and F. Berizzi, “Classification of man-made targets via invariant coherency-matrix eigenvector decomposition of polarimetric SAR/ISAR images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 8, pp. 3022–3034, 2011.
[4] V. Karathanassi and M. Dabboor, “Land cover classification using E-SAR polarimetric data,” in Proceedings of the XX International ISPRS Congress “Geo-Imagery Bridging Continents”, Istanbul, Turkey, July 2004, Commission VII, pp. 280–285.
[5] M. Dabboor and V. Karathanassi, “A knowledge-based classification method for polarimetric SAR data,” in Proceedings of the SPIE, SAR Image Analysis, Modeling, and Techniques VII, Bruges, Belgium, Sep. 2005, vol. 5980, pp. 109–121.
[6] M. Dabboor, V. Karathanassi, and A. Braun, “Multilevel hierarchical segmentation method for polarimetric SAR data based on scattering behaviour and histograms,” Canadian Journal of Remote Sensing, vol. 36, no. 2, pp. 142–153, 2010.
[7] J.-S. Lee, M. R. Grunes, E. Pottier, and L. Ferro-Famil, “Unsupervised terrain classification preserving polarimetric scattering characteristics,” IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 4, pp. 722–731, 2004.
[8] C. Lardeux, P. L. Frison, C. Tison, J. C. Souyris, B. Stoll, B. Fruneau, and J. P. Rudant, “Support vector machine for multifrequency SAR polarimetric data classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 12, pp. 4143–4152, 2009.
[9] S. R. Cloude and E. Pottier, “An entropy based classification scheme for land applications of polarimetric SAR,” IEEE Transactions on Geoscience and Remote Sensing, vol. 35, no. 1, pp. 68–78, 1997.
[10] J. S. Lee, M. Grunes, T. Ainsworth, L. Du, D. Schuler, and S. Cloude, “Unsupervised classification using polarimetric decomposition and the complex Wishart classifier,” IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 5, 1999.
[11] E. Pottier and J. S. Lee, “Application of the H/A/α polarimetric decomposition theorem for unsupervised classification of fully polarimetric SAR data based on the Wishart distribution,” in Proceedings of the Committee Earth Observing Satellite, SAR Workshop, Toulouse, France, Oct. 1999, pp. 335–340.
[12] S. E. Park and W. Moon, “Unsupervised classification of scattering mechanisms in polarimetric SAR data using fuzzy logic in entropy and alpha plane,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 8, pp. 2652–2664, 2007.
[13] F. Cao, W. Hong, Y. Wu, and E. Pottier, “An unsupervised segmentation with an adaptive number of clusters using the SPAN/H/α/A space and the complex Wishart clustering for fully polarimetric SAR data analysis,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 11, pp. 3454–3467, 2007.
[14] K. Conradsen, A. A. Nielsen, J. Schou, and H. Skriver, “A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 1, pp. 4–19, 2003.
[15] J. S. Lee, M. R. Grunes, and R. Kwok, “Classification of multi-look polarimetric SAR imagery based on complex Wishart distribution,” International Journal of Remote Sensing, vol. 15, no. 11, pp. 2299–2311, 1994.
[16] M. E. Ayadi, M. Kamel, and F. Karray, “Toward a tight upper bound for the error probability of the binary Gaussian classification problem,” Pattern Recognition, vol. 41, no. 6, pp. 2120–2132, 2008.
[17] R. Duda, P. Hart, and D. Stork, Pattern Classification. New York: Wiley Interscience, 2000.
[18] C. G. Khatri, “On certain distribution problems based on positive definite quadratic functions in normal vectors,” The Annals of Mathematical Statistics, vol. 37, no. 2, 1966.
[19] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing, 3rd ed. Cambridge, UK: Cambridge University Press, 2007.
[20] J. S. Lee, M. R. Grunes, and G. D. Grandi, “Polarimetric SAR speckle filtering and its implication on classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 5, pp. 2363–2373, 1999.
[21] P. R. Kersten, J.-S. Lee, and T. L. Ainsworth, “Unsupervised classification of polarimetric synthetic aperture radar images using fuzzy clustering and EM clustering,” IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 3, pp. 519–527, 2005.
[22] R. Tibshirani, G. Walther, and T. Hastie, “Estimating the number of clusters in a data set via the gap statistic,” Journal of the Royal Statistical Society, Series B, vol. 63, pp. 411–423, 2001.

LIST OF FIGURES

1. The graph of f(β) (top) and g(β) (bottom) for two distributions that are statistically identical, similar, or dissimilar. In the identical-distributions case, the curves are the solid lines f(β) = 0 and g(β) = 1.
2. (a) A map of the study area. (b) AVNIR-2 RGB optical image of the study area. The study area is 83 km N-S and 12 km W-E. (c) RGB of the despeckled POLSAR data (red = T22, green = T33, blue = T11). (d) POLSAR data classification based on the Wishart-Chernoff distance.
3. Plot of the data log-likelihood calculated using the Wishart-Chernoff distance to merge clusters (solid line) and the data log-likelihood calculated using the Wishart test statistic distance to merge clusters (dashed line).
4. Urban results for Minehead (top), Tiverton (center), and Cullompton (bottom). For each result, the urban blocks are shown in the map of the study area (top left) and in the optical image (top right). The Wishart test statistic result is shown bottom left and the Wishart-Chernoff distance result bottom right.
5. Forests as shown in the map of the study area (top left) and the optical image (top right). The Wishart test statistic result is shown bottom left and the Wishart-Chernoff distance result bottom right.
6. Two results for streams: Mill Hill (top) and Badgworthy Hill (bottom). The map of the study area is shown top left, the optical image top right. The Wishart test statistic result is shown bottom left and the Wishart-Chernoff distance result bottom right.
7. Agricultural fields as shown in the optical image at the top. The Wishart test statistic result is shown in the center and the Wishart-Chernoff distance result at the bottom.

FIGURES

[Fig. 1 plot: f(β) (top panel) and g(β) (bottom panel) versus β from 0.1 to 1.0, with curves for identical, similar, and dissimilar distributions.]

Fig. 1: The graph of f(β) (top) and g(β) (bottom) for two distributions that are statistically identical, similar, or dissimilar. In the identical-distributions case, the curves are the solid lines f(β) = 0 and g(β) = 1.

[Fig. 2 panels: (a) Area Map, with Minehead, Tiverton, and Cullompton labelled; (b) AVNIR-2 Image; (c) POLSAR Image; (d) Classified Image, with class legend: Cyan, Orange, Yellow, Brown, Light green, Green, Dark green, Sea green.]

Fig. 2: (a) A map of the study area. (b) AVNIR-2 RGB optical image of the study area. The study area is 83 km N-S and 12 km W-E. (c) RGB of the despeckled POLSAR data (red = T22, green = T33, blue = T11). (d) POLSAR data classification based on the Wishart-Chernoff distance.


Fig. 3: Plot of the data log-likelihood calculated using the Wishart-Chernoff distance to merge clusters (solid line) and the data log-likelihood calculated using the Wishart test statistic distance to merge clusters (dashed line).


Fig. 4: Urban results for Minehead (top), Tiverton (center), and Cullompton (bottom). For each result, the urban blocks are shown in the map of the study area (top left) and in the optical image (top right). The Wishart test statistic result is shown bottom left and the Wishart-Chernoff distance result bottom right.


Fig. 5: Forests as shown in the map of the study area (top left) and the optical image (top right). The Wishart test statistic result is shown bottom left and the Wishart-Chernoff distance result bottom right.


Fig. 6: Two results for streams: Mill Hill (top) and Badgworthy Hill (bottom). The map of the study area is shown top left, the optical image top right. The Wishart test statistic result is shown bottom left and the Wishart-Chernoff distance result is bottom right.


Fig. 7: Agricultural fields as shown in the optical image at the top. The Wishart test statistic result is shown in the center and the Wishart-Chernoff distance result is at the bottom.


LIST OF TABLES

I. Percent error for reference regions.

TABLES

TABLE I: Percent error for reference regions.

Land cover               Number of pixels   Wishart-Chernoff distance (% error)   Wishart test statistic distance (% error)
Ocean                    983                0                                     0
Urban                    680                47.3                                  83.9
Coniferous forest        515                7.3                                   0
Non-coniferous forest    316                3.8                                   100
Natural grassland        515                0                                     1.8