
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 720

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 8, NO. 4, JULY 2011

Land Cover Classification for Remote Sensing Imagery Using Conditional Texton Forest With Historical Land Cover Map

Zhen Lei, Tao Fang, and Deren Li

Abstract—In this letter, we propose a “conditional texton forest” (CTF) method to utilize widely available historical land cover (HLC) maps in land use/cover classification on high-resolution images. The CTF is based on texton forest (TF), which is a popular and powerful method in image semantic segmentation due to its effective use of spatial contextual information, its high accuracy, and its fast speed in multiclass classification. The proposed CTF method nonparametrically aggregates a bank of TFs according to HLC information and exploits the fact that different types of HLC follow different transition rules. The performance of CTF is compared with that of support vector machine (SVM), Markov random field (MRF), and a naive TF method which uses historical data directly as a feature channel. On average, CTF achieves a 2%–5% higher classification accuracy than the other classifiers in our experiment. The classification speed of CTF is similar to that of TF, five times faster than MRF, and hundreds of times faster than SVM. Given the abundance of HLC data, the proposed method can be expected to be useful in a wide range of socioeconomic and environmental studies.

Index Terms—Ensemble classifier, land cover, land cover change, land use, random forest.

Manuscript received August 18, 2010; accepted October 28, 2010. This work was supported in part by the National Key Basic Research and Development Program of the People’s Republic of China under Grant 2006CB701303 and in part by the National High Technology Research and Development Program of the People’s Republic of China under Grant 2006AA12Z105. Z. Lei and T. Fang are with the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]). D. Li is with Wuhan University, Wuhan 430079, China (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2010.2103045

1545-598X/$26.00 © 2011 IEEE

I. INTRODUCTION

A major difficulty in land cover classification using remotely sensed data is the high spectral variation that a single land cover class can exhibit because of the high heterogeneity of complex landscapes [1]. Many different classifiers, such as random forest [2], have been proposed to combine multisource remotely sensed data and geographic data, including elevation, slope, and aspect [3], to improve classification accuracy. Random forest [2] is an ensemble of decision trees. As noted in [4], random forest methods are much faster than traditional classifiers such as support vector machine (SVM), and their classification accuracy is similar to that of SVM. As evidenced by their successful applications to multiclass classification of general-scene images [4], [5] and land cover images [3], [6], random forest methods can effectively combine different sources of information.

Recently, Shotton et al. [7] proposed the semantic texton forest by extending the clustering random forest method [8] to pixel-level image classification. To simplify the notation, we use “texton forest” (TF) in place of “semantic texton forest” when there is no ambiguity. Schroff et al. [9] extended this approach by incorporating multiple feature sources. A distinctive characteristic of the aforementioned methods is that they utilize contextual information in local neighborhoods under a random forest framework. Although pioneering research in [10] uses TF to combine remotely sensed data with height information, to the best of the authors’ knowledge, there is no reported work on using TF or random forest to incorporate historical land cover (HLC) information, even though a tremendous amount of HLC data has been generated and accumulated for more than half a century.

From a different perspective, many researchers have focused on incorporating multiple sources of remotely sensed data as well as ancillary geographic information system (GIS) data [11], [12]. For example, Solberg et al. [11] introduced a general framework based on the Markov random field (MRF) model which integrates multiple sources of remotely sensed data, ancillary GIS data, and HLC maps. Three types of contextual information, namely, spatial, spectral, and temporal information, are utilized. This method was extended by a number of studies such as [13]. All these MRF approaches improve the classification by exploiting the temporal components of multitemporal imagery in terms of land cover transition probabilities. However, these approaches require an additional procedure for the choice or estimation of MRF parameters [14]. Moreover, Bayesian approaches to classification and regression trees [15]–[17] can draw prior information from training data according to Bayesian theory. They also have the potential to incorporate historical data into their prior model.

In this letter, we propose a “conditional texton forest” (CTF) method in which a bank of TFs is aggregated according to HLC. By building on TF [7], our method preserves the merits of TF, such as stability, speed, and an accuracy comparable to that of SVM, while allowing the classifier to exploit the temporal contextual information embedded in the combination of the current remotely sensed image and HLC maps. Compared with MRF-based methods [11], our method does not need to estimate MRF parameters. It should be noted that the proposed CTF method is not a method for general semantic segmentation of ordinary images as in [7]. The need for additional historical ground-truth (GT) images largely restricts the proposed method to Earth observation or other similar applications where repeated observations are available. Given the wide availability of HLC maps, the method can be expected to be useful for different real socioeconomic and environmental applications.

II. METHOD

Just as “what we are” highly depends on “what we were,” the current land cover category of a piece of land is highly dependent on its past category. The gist of our method is that we can partition the land cover classification problem into a set of conditional classification subproblems, one for each HLC category. For each subproblem, we use the state-of-the-art TF [7], which can exploit the spatial contextual information.

A. Texton Forest

Like other random forest classifiers, TF learns multiple decision trees for a data set and then combines the predictions from each decision tree. As discussed in detail in [7], the most significant difference between TF and other random forests is the split function. In TF, the split functions on branch nodes act on textons (i.e., small image patches of size d × d pixels around a central pixel). These functions can be the following:

1) the value $p_{x,y,b}$ of a single pixel (x, y) in color channel b of image patch p;
2) the sum $p_{x_1,y_1,b_1} + p_{x_2,y_2,b_2}$;
3) the difference $p_{x_1,y_1,b_1} - p_{x_2,y_2,b_2}$; or
4) the absolute difference $|p_{x_1,y_1,b_1} - p_{x_2,y_2,b_2}|$,

where $(x_1, y_1)$ and $(x_2, y_2)$ are a pair of pixels inside the texton, and $b_1$ and $b_2$ are the color channels of the two pixels, respectively. By allowing the split function to be calculated over a neighborhood of a central pixel, the spatial contextual information is recorded in the process of selecting the best split in many random trials. In our proposed method, we do not change these TF settings.

B. CTF for Land Cover Classification

Since different types of HLC tend to follow different change rules, we propose to use a higher level ensemble of TFs with one TF for each historical category. As shown in Fig. 1, we partition the image pixels into subsets according to their HLC categories. Then, we train a TF independently within each subset to capture the unique class transition rule of that category. We call such an ensemble a “conditional ensemble” and the conditional ensemble of texton forests a CTF. We use the term “conditional” to refer to the manner in which TFs are trained and used conditionally based on a specific historical category.

Fig. 1. Workflow of CTF for land cover classification.

The CTF is quite simple in notion. Suppose that D is an image containing feature channels derived from remotely sensed imagery, and that C and G are the coregistered HLC image and GT image, respectively, each with a category index at every pixel location (x, y). For each category i of the m HLC categories, we can define CTF as the collection of

$$\mathrm{TF}_i = \mathrm{TextonForestLearning}(D_i), \quad i = 1, \ldots, m \tag{1}$$

where $\mathrm{TF}_i$ is the TF corresponding to category $i$ and the subsets $D_i = \{D(x, y) \mid C(x, y) = i\}$ form a partition of the training data $D$, i.e., $\bigcup_i D_i = D$ and $D_i \cap D_j = \emptyset$ if $i \neq j$. This method of generating the training sample sets $\{D_i\}$ is different from the random selection process in bagging ensembles. In the classification phase of CTF, for a given pixel (x, y), the TF with class index C(x, y) is selected to classify the feature vector D(x, y), rather than averaging all subclassifier outputs as is the case with bagging ensembles.

In general, historical information helps the conditional ensemble to effectively capture the unique transition rule in each category. On the other hand, it also poses a higher information requirement. In terms of its algorithm, CTF needs to preprocess data before training and classification. This includes the conversion of HLC from GIS format to a raster image C, the coregistration of images C, D, and G, and the selection of training samples. We select the training samples using a random process to avoid excessive intervention by human experts, as described in Section III. We also include the first derivatives of color in the feature channels as in [4] and [10]. This gives about a 0.5% improvement.

CTF needs to build multiple TFs from the same set of training data. This raises a concern about the scarcity of training samples (particularly for the less populated categories). However, two factors can relieve this issue. First, the accuracy of CTF saturates quickly as the number of training samples increases. As shown in Fig. 5(d), the forests trained with 5% of the data work almost as well as those trained with 50% of the data. Second, in real applications, training samples can often be manually collected. This allows for selectively enriching the training sets of less populated categories.
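The partition-then-select logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the class names `ConditionalEnsemble` and `MajorityClassifier`, the `fit`/`predict_one` interface, and the toy data are all assumptions made here, and the majority-vote member is a hypothetical stand-in for a real texton forest.

```python
from collections import Counter, defaultdict


class MajorityClassifier:
    """Hypothetical stand-in for a texton forest: always predicts the most
    frequent label seen during training. Any classifier exposing the same
    fit/predict_one interface could be plugged in instead."""

    def fit(self, features, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict_one(self, feature):
        return self.label_


class ConditionalEnsemble:
    """One member classifier per historical (HLC) category."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier
        self.members = {}

    def fit(self, features, historical, labels):
        # Partition the training pixels by HLC category: D_i = {D | C = i}.
        subsets = defaultdict(lambda: ([], []))
        for feature, hlc, label in zip(features, historical, labels):
            subsets[hlc][0].append(feature)
            subsets[hlc][1].append(label)
        # Train one independent member on each subset.
        self.members = {
            hlc: self.make_classifier().fit(fs, ls)
            for hlc, (fs, ls) in subsets.items()
        }
        return self

    def predict(self, features, historical):
        # Select the single member indexed by each pixel's HLC category;
        # there is no averaging over members. An HLC category that was
        # absent from training raises KeyError in this sketch.
        return [
            self.members[hlc].predict_one(feature)
            for feature, hlc in zip(features, historical)
        ]


# Toy data: per-pixel feature vectors, HLC categories, and current GT labels.
train_features = [[0.1], [0.2], [0.3], [0.4], [0.5]]
train_hlc = ["farm", "farm", "farm", "building", "building"]
train_labels = ["farm", "forest", "forest", "building", "building"]

ctf = ConditionalEnsemble(MajorityClassifier).fit(
    train_features, train_hlc, train_labels
)
predictions = ctf.predict([[0.9], [0.9]], ["farm", "building"])
print(predictions)
```

With the toy data, the member trained on historical farm pixels predicts forest and the member trained on historical building pixels predicts building, so identical features receive different labels depending only on the HLC category.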
Statistically, a TF is a procedure to estimate the conditional probability of categories based on observations: $\mathrm{TF}(\mathit{observ}) \to P(\mathit{class} \mid \mathit{observ})$, where $\mathit{class}$ and $\mathit{observ}$ represent the current land cover class and the observed features, respectively. Likewise, CTF is a collection of statistical estimators

$$\mathrm{CTF}(\mathit{observ}, \mathit{class}') \to P(\mathit{class} \mid \mathit{observ}, \mathit{class}') \tag{2}$$

where $\mathit{class}'$ represents the HLC class. This indicates that the outcome of CTF can be viewed as a transition probability $P(\mathit{class} \mid \mathit{class}')$ from HLC to the current land cover class, conditional on the observed features in the training sample set. To visualize this conditional transition probability, we show in Fig. 2 an instance of it estimated from a small data set. The rows are histograms of the current land cover class for the five HLC classes present in the data set, namely, farm lands, roads, waste lands, water bodies, and buildings. The current land cover categories include not only the historical categories but also a new category, forest. Although the data set is quite small, the figure already shows certain transition patterns. For example, most classes except farm lands do not significantly change to other classes. This relative stability could well be the case in many practical applications in land cover studies since they are often carried out in an incremental manner. Furthermore, some farm lands have changed to building, and a significant percentage has changed to the forest class. However, very few building areas have changed to farm land.

Fig. 2. Histogram of conditional land cover transition.

The proposed CTF method has several advantages. First, it is effective in learning and utilizing the conditional land cover transition probabilities $P(\mathit{class} \mid \mathit{class}')$ by dividing and conquering. By incorporating such transition probabilities, the discriminative power of CTF increases since each TF focuses on the more likely cases for its historical category. For example, if the HLC class of an area is building and the current local image feature is ambiguous between building and farm land, the building category now has a higher probability to win. An alternative strategy is to use HLC information directly as a normal feature channel in a TF. Although this alternative strategy also helps to improve classification performance, it ignores the knowledge that the HLC is the starting point of a land cover transition, and thus, it does not perform as well as CTF, as shown by our experimental results in Section III-C.

Second, CTF inherits the speed of random forest methods. For example, Bosch et al. [4] reported that their random-forest-based method is 40 times faster than an SVM-based reference method. We use TF as the reference method for comparing computational complexity. Suppose that the computational complexities of the learning and classifying phases of TF are $O_{\mathrm{TF\_learn}}$ and $O_{\mathrm{TF\_classify}}$, and that $m$ is the number of HLC classes. The learning computational complexity of CTF is then $m \times O_{\mathrm{TF\_learn}}$ because $m$ TFs are built. The classification computational complexity of CTF is $O_{\mathrm{TF\_classify}}$ because only one TF is used in the classification stage.
Therefore, the computational complexity of CTF is linear with respect to the complexity of TF.

Third, CTF does not require the choice of controlling parameters which are difficult to determine automatically [11], [14].

Finally, CTF does not require that the numbers of HLC classes and current land cover classes be the same. These two sets are independent elements in (2). For example, in Fig. 2, the forest class is new and not present in the historical classification. Conceivably, one can make use of an HLC map with a completely different classification scheme as long as it contributes information to the present-day land cover. In the case when all pixels in the historical image belong to one single class, i.e., when there is no historical information, the CTF reduces to a normal TF.

III. EXPERIMENT

In this section, we describe the experimental results of the proposed CTF method on two test sites. The first site is a small suburban town of Xuzhou, Jiangsu, China. Fused QuickBird imagery acquired in August 2007 and an HLC map produced in 2003 are used. The scene contains a highway, some aqueducts, and newly developing zones surrounded by farm lands. It is representative of suburban scenarios. A second, more complex site is chosen near Nanking, Jiangsu, China, as shown in Fig. 3(a). An aerial image with a resolution of 0.25 m per pixel acquired in April 2008 and an HLC map produced in 1999 are used. In the second scene, there are a town and several villages surrounded by farm lands, a lake, and many ponds. It is a typical rural scenario in Eastern China. For both sites, we use the visual judgment of human experts to build GTs. We use six classes in classifying our test sites, namely, farm lands, forests, roads, waste lands, water bodies, and buildings.

Fig. 3. CTF results on Nanking data. (a) Original image. (b) HLC. (c) GT. (d) Classification result. Testing blocks are with black margins.

To evaluate the effectiveness of classification, we used several performance measures commonly used in multiclass remote sensing classification. These include the confusion matrix, the overall accuracy, and the Kappa coefficient, which measures the agreement between two classifications [18].

While the main procedure of CTF is described in Section II-B, some implementation details are worth noting. The code for TF we used is based on the implementation generously made available by [19]. We also used the same criteria for evaluating classification results as [19] to make the comparison easier. When splitting training/testing data, approximately half of all image blocks are randomly selected for training, and the rest are used for testing. Because randomness is involved both in the splitting of training/testing samples and in the training procedure of TF, the results of different runs of a given experiment setting will not be exactly the same. By default, we report the average of five runs for each specific parameter setting.
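The three performance measures named above can be computed from a pair of label sequences as in the following minimal sketch. The function names and the toy labels are assumptions made for illustration, and placing the reference map on the rows of the confusion matrix is a convention chosen here rather than one taken from the letter. The Kappa formula is Cohen's definition [18]: the observed agreement corrected by the agreement expected by chance.

```python
def confusion_matrix(reference, predicted, labels):
    """Count pixels per (reference class, predicted class) pair.
    Rows index the reference map, columns the predicted map."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of pixels on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the chance agreement from the row/column totals."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    if expected == 1.0:
        # Both maps contain one identical class only; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example with three classes and six pixels.
labels = ["farm", "road", "water"]
reference = ["farm", "farm", "farm", "road", "road", "water"]
predicted = ["farm", "farm", "road", "road", "road", "water"]

matrix = confusion_matrix(reference, predicted, labels)
print(matrix)
print(overall_accuracy(matrix))
print(cohen_kappa(matrix))
```

In the toy example, five of six pixels agree (overall accuracy 5/6), the chance agreement is 13/36, and the Kappa coefficient is therefore 17/23, or about 0.739.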


TABLE I
CONFUSION MATRIX OF TEST-IMAGE-ONLY CLASSIFICATION: OVERALL ACCURACY IS 87.3% AND KAPPA COEFFICIENT IS 0.7848

A. Classification Results

Fig. 3 shows the land cover classification results of a random run of CTF on the Nanking data set. In Fig. 3(d), the randomly chosen testing image blocks are marked with black margins. All training samples are randomly chosen from the rest of the image area. In Fig. 3, we observe that CTF achieves good performance. Multiple changes relative to the HLC map have been captured, such as shrunken water bodies, forests in old building areas, and a new road structure in the center bottom area. To quantitatively evaluate the overall performance, we calculated the confusion matrix, the overall accuracy, and the Kappa coefficient as presented in Table I, in which each row corresponds to a different class in the test map.

To check that the proposed method, like other random-forest-based methods [2], does not overfit, we also classify the whole image, which includes both training and testing image patches. The same CTF learned for Fig. 3 is used for the classification. When the training data are added to the test set, the accuracy and the Kappa coefficient do not improve; instead, they drop slightly from 87.3% to 86.8% and from 0.7848 to 0.7727, respectively. This suggests that the proposed CTF is not overfitting.

B. Influence of Factors

To better illustrate the characteristics of CTF, we test the influence of the degree of land cover change in the data set and that of the parameters of TF. Strictly speaking, the degree of land cover change (DoLCC) should be defined in terms of the similarity/difference between the GT of historical and current land cover. However, as historical GT is not available in most cases and HLC maps are produced to approximate historical GT, we use the Kappa coefficient between the HLC and the current GT image to measure the DoLCC, where lower Kappa coefficients indicate greater land cover change.

To investigate the influence of DoLCC, we artificially create two hypothetical data sets based on the Nanking data by modifying the HLC images. In one data set, we use the GT map as the ideal HLC map and call it “Nanking GT.” This represents the case when there is absolutely no change in land cover. In the other data set, we deliberately distort the HLC map by changing the boundaries and attributes of GIS objects for a greater DoLCC to test the robustness of our method. We call it “Nanking distorted.” The Kappa coefficients measuring the DoLCC of the Nanking distorted, Nanking, Xuzhou, and Nanking GT data sets are 0.4556, 0.6015, 0.8385, and 1, respectively, representing variations from a very large amount of land cover change to no change.

In Fig. 4, we plot the influence of DoLCC on the accuracy and Kappa of CTF. From the figure, we can observe that first, in all cases, smaller DoLCC (i.e., higher Kappa between HLC and GT) leads to better classification performance. Second, when the HLC map tends to perfectly match the current GT (i.e., toward the right end of each curve), the classification of CTF also tends to be perfectly correct. Overall, this figure suggests that the degree of land cover change between HLC and current GT does have a strong influence on classifier performance.

Fig. 4. Influence of the DoLCC on CTF classification.

Fig. 5. Influence of TF parameters on the CTF method: (a) Tree number per forest. (b) Random feature number. (c) Texton size. (d) Training sample ratio.

In addition to DoLCC, parameters of TF, such as the number of trees, the number of features, the training sample ratio, and the texton size, may also affect the performance of CTF. We investigate such influences by varying one parameter at a time and fixing all other parameters to their default values. By default, we use five trees in each forest, 400 random feature evaluations in each tree node split function, a 40-pixel-wide square texton, and 50% of the samples for training. Fig. 5 presents the resulting performance curves. As mentioned before, all the statistics are averages of five independent runs. From Fig. 5, we note that the proposed method is not significantly influenced by the TF parameters in our test range. The test ranges are relatively broad and include relatively low values at one end and reasonably high values at the other end, as are used in practice. For example, the sampling ratio has been chosen to vary from 5% to 50%, but the performance of the classifier does not significantly decay at the low end when only 5% of the samples are used for training. Overall, these diagrams suggest that the proposed method is robust over a relatively broad range of parameter settings.
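The one-parameter-at-a-time protocol with five-run averaging can be sketched as follows. The default values are those stated above; everything else is an assumption made for illustration: the dictionary keys, the function names, and in particular the test ranges, of which only the 5% and 50% training ratios are given in the text. The `dummy_evaluate` function is a placeholder for training and scoring a CTF and returns no meaningful accuracy.

```python
from statistics import mean

# Default TF parameters as stated in the text (key names are hypothetical).
DEFAULTS = {
    "trees_per_forest": 5,
    "features_per_split": 400,
    "texton_size": 40,
    "train_ratio": 0.50,
}

# Placeholder test ranges. Only the 0.05 and 0.50 training ratios come from
# the text; all other values are illustrative assumptions.
RANGES = {
    "trees_per_forest": [1, 5, 10],
    "features_per_split": [100, 400, 800],
    "texton_size": [20, 40, 60],
    "train_ratio": [0.05, 0.25, 0.50],
}


def one_at_a_time(defaults, ranges):
    """Yield (varied parameter, setting) pairs in which exactly one
    parameter departs from its default and all others stay fixed."""
    for name, values in ranges.items():
        for value in values:
            setting = dict(defaults)
            setting[name] = value
            yield name, setting


def mean_over_runs(evaluate, setting, runs=5):
    """Average a score over independent runs, one random seed per run."""
    return mean([evaluate(setting, seed) for seed in range(runs)])


def dummy_evaluate(setting, seed):
    """Placeholder for 'train a CTF with this setting and seed, then return
    its accuracy'. It returns a constant so that the sketch runs."""
    return 0.0


sweep = [
    (name, setting, mean_over_runs(dummy_evaluate, setting))
    for name, setting in one_at_a_time(DEFAULTS, RANGES)
]
print(len(sweep))
```

Replacing `dummy_evaluate` with a real training and scoring routine would yield one averaged score per setting, which is the kind of data plotted as one curve per parameter.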


TABLE II
COMPARISON OF METHODS ON DIFFERENT DATA SETS

C. Comparison

Finally, we compared our method with three state-of-the-art alternatives. First, we compared it with the MRF-based method [11]. We followed [11] and determined the user-defined parameters $\beta_{sp} = 2000$ and $\beta_g = 500$ using the suggested procedure. In the comparison with the pixel-based multiclass SVM method [12], we followed their method except that we expanded their feature vector to use the same feature channels as our CTF experiment, including color, gradients, and HLC information channels. To illustrate the effectiveness of the conditional ensemble, we also compared our method with a naive alternative TF method which directly uses HLC information as one of the feature channels.

The methods were tested on six data sets. Four of them were introduced before: “III” denotes “Xuzhou,” and “IV,” “V,” and “VI” denote “Nanking distorted,” “Nanking,” and “Nanking GT,” respectively. Since the TF and SVM methods, unlike the CTF and MRF methods, do not inherently require HLC, we also tested them on pure remotely sensed data, with the Xuzhou data set as “I” and the Nanking data set as “II.” The results are presented in Table II. The DoLCC of the data sets, which is the Kappa between HLC and GT, is also listed for reference.

Although the MRF-based method in [11] can easily utilize spatial and temporal contextual information, the inefficiency of its “data cost function” based on feature space distance makes it 3%–7% less accurate than CTF, as shown in Table II. SVM typically has slightly better performance than random forest methods [4]. However, because CTF utilizes spatial contextual information, it is about 2% more accurate on average than the pixel-based SVM method in [12]. We also find that CTF is about 2% more accurate than TF. As discussed in Section II-B, this shows that the conditional ensemble effectively improves the classification accuracy. In summary, the CTF method shows an advantage in accuracy over the alternative methods we have compared.

We also measured the average running times of five independent runs of the CTF, TF, MRF, and SVM methods in the same test setting on the Nanking data set. Normalized by image size, training on one million pixels took 429, 350, 3, and 393 s, respectively, and classifying one million pixels took 29, 27, 165, and 10688 s, respectively. Confirming our previous analysis of computational complexity, CTF classifies at almost the same speed as TF but is several times faster than MRF and hundreds of times faster than SVM. On the other hand, CTF training is slightly slower than that of TF and SVM, and MRF needs little time for training.

IV. CONCLUSION

In this letter, we have explored measures to enhance the TF methods by incorporating both spatial and temporal contextual information embedded in the current remotely sensed imagery and HLC maps. In particular, a CTF method has been proposed to utilize the unique land cover transition pattern for each historical class and nonparametrically train a collection of TFs. Our experimental results have shown that, due to its utilization of both spatial and temporal context information, the proposed CTF is more accurate than the MRF method, the pixel-based SVM method, and a naive TF method. Our results have also shown that the CTF is not overfitting and that its performance is significantly influenced by DoLCC. In addition, CTF classification has about the same speed as TF and is five times faster than MRF and roughly 300 times faster than an SVM method. From the view of multisource land cover classification studies, our CTF provides a novel, simple, yet effective way to incorporate temporal and spatial contextual information. However, it cannot utilize multiple HLC maps or multiple historical remotely sensed images at the same time. In our future work, we plan to further investigate these issues.

REFERENCES

[1] D. Lu and Q. Weng, “A survey of image classification methods and techniques for improving classification performance,” Int. J. Remote Sens., vol. 28, no. 5, pp. 823–870, Jan. 2007.
[2] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001.
[3] P. O. Gislason, J. A. Benediktsson, and J. R. Sveinsson, “Random forests for land cover classification,” Pattern Recognit. Lett., vol. 27, no. 4, pp. 294–300, Mar. 2006.
[4] A. Bosch, A. Zisserman, and X. Munoz, “Image classification using random forests and ferns,” in Proc. ICCV, 2007, pp. 1–8.
[5] V. Lepetit and P. Fua, “Keypoint recognition using randomized trees,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, pp. 1465–1479, Sep. 2006.
[6] J. Ham, Y. Chen, M. M. Crawford, and J. Ghosh, “Investigation of the random forest framework for classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 492–501, Mar. 2005.
[7] J. Shotton, M. Johnson, and R. Cipolla, “Semantic texton forests for image categorization and segmentation,” in Proc. CVPR, 2008, pp. 1–8.
[8] F. Moosmann, B. Triggs, and F. Jurie, “Fast discriminative visual codebooks using randomized clustering forests,” in Proc. Adv. Neural Inf. Process. Syst., 2007, vol. 19, pp. 985–992.
[9] F. Schroff, A. Criminisi, and A. Zisserman, “Object class segmentation using random forests,” presented at the Br. Machine Vision Conf., 2008.
[10] S. Kluckner, T. Mauthner, M. Roth, and H. Bischof, “Semantic classification in aerial imagery by integrating appearance and height information,” in Proc. ACCV, 2009, pp. 477–488.
[11] A. Solberg, T. Taxt, and A. Jain, “A Markov random field model for classification of multisource satellite imagery,” IEEE Trans. Geosci. Remote Sens., vol. 34, no. 1, pp. 100–113, Jan. 1996.
[12] L. M. He, F. S. Kong, and Z. Q. Shen, “Multiclass SVM based land cover classification with multisource data,” in Proc. Mach. Learn. Cybern., 2005, pp. 3541–3545.
[13] D. Liu, K. Song, J. R. G. Townshend, and P. Gong, “Using local transition probability models in Markov random fields for forest change detection,” Remote Sens. Environ., vol. 112, no. 5, pp. 2222–2231, May 2008.
[14] N. W. Park, “Accounting for temporal contextual information in landcover classification with multi-sensor SAR data,” Int. J. Remote Sens., vol. 31, no. 2, pp. 281–298, Mar. 2010.
[15] W. Buntine, “Learning classification trees,” Stat. Comput., vol. 2, no. 2, pp. 63–73, Jun. 1992.
[16] H. A. Chipman, E. I. George, and R. E. McCulloch, “Bayesian CART model search,” J. Amer. Stat. Assoc., vol. 93, no. 443, pp. 935–948, Sep. 1998.
[17] D. G. T. Denison, B. Mallick, and A. Smith, “A Bayesian CART algorithm,” Biometrika, vol. 85, no. 2, pp. 363–377, Jun. 1998.
[18] J. Cohen, “A coefficient of agreement for nominal scales,” Educ. Psychol. Meas., vol. 20, no. 1, pp. 37–46, Apr. 1960.
[19] M. Johnson, Semantic Texton Forests Implementation, 2009. [Online]. Available: www.matthewajohnson.org/research/stf.html