
J. R. Martínez-de Dios and A. Ollero. A Multiresolution Threshold Selection Method Based on Training. Lecture Notes in Computer Science, vol. 3211, 2004, pp. 90-97.

A MULTIRESOLUTION THRESHOLD SELECTION METHOD BASED ON TRAINING

J.R. Martinez-de Dios and A. Ollero
Grupo de Robótica, Visión y Control. Departamento de Ingeniería de Sistemas y Automática. Escuela Superior de Ingenieros. Universidad de Sevilla. Camino de los Descubrimientos, 41092 Sevilla (Spain).
Phone: +34-954487357; Fax: +34-954487340. Email: {jdedios, aollero}@cartuja.us.es

Abstract. This paper presents a new training-based threshold selection method for grey-level images. One of the main limitations of existing threshold selection methods is their inability to adapt to specific vision applications. The proposed method provides a procedure for adapting threshold selection to a specific application. It is based on the analysis of multiresolution decompositions of the image histogram, supervised by fuzzy systems into which the particularities of the application are introduced by means of a training process. The method permits simple training-based adaptation to computer vision applications, and its performance exhibits significant robustness to changes in illumination conditions and to noise in the images. The method has been extensively applied in various computer vision applications, one of which is described in the paper.

1. Introduction

Image segmentation is an essential process in image analysis. Threshold selection methods can be classified according to the information on which they rely for object/background classification [10]. Some methods rely on grey-level information and ignore spatial dependence, such as those based on maximization of entropy functions [15], [8], [1], maximization of class separability [14] and minimization of misclassification error [9]. Other thresholding methods use spatial information; for example, the method described in [13] preserves the connectivity of regions. All these methods are based on general object/background separability criteria and cannot adapt to specific vision applications.

The thresholding problem is highly dependent on the computer vision application. It is not possible to find a general solution, and specific solutions necessarily involve specific aspects of the vision problem. This paper presents a training-based method for threshold selection. The method uses knowledge extracted from training images of a vision application to supervise a multiresolution threshold selection. It thus provides a procedure for designing threshold selection methods particularized to specific applications. The proposed method is robust to changes in illumination, and its flexibility permits straightforward adaptation to specific vision applications.

This paper is organized as follows. Section 2 presents the multiresolution method for threshold selection. The training process is detailed in Section 3 and the selection of descriptors in Section 4. Section 5 describes the application of the presented method to an automatic infrared detection system in outdoor urban scenarios. Final remarks are presented in the Conclusions.

2. Multiresolution scheme for threshold selection

Most histogram-based threshold selection methods assume that pixels of the same object have similar intensities. Thus, objects in the images are represented as histogram modes. Some methods aim to identify the mode or modes corresponding to the object of interest (the object modes). The method presented in this paper divides the identification and selection of object modes into L+1 steps, based on the analysis of approximations of the image histogram at different levels of resolution l.

Let $h(z)$ be the histogram of an image with $NL$ intensity levels. Let $CA^l(z)$ be the multiresolution approximation of $h(z)$ at resolution $l \in \{L, L-1, \ldots, 0\}$. $CA^l(z)$ is computed from $h(z)$ by applying Mallat's approximation decomposition algorithm [12] at level l. The wavelet decomposition uses Haar basis functions due to their efficient implementation [7]. Notice that the wavelet tree is used as a bank of low-pass filters, and that the low-pass filtering effect increases with l. This paper also uses the concept of histogram region, defined as a set of values $\{z : a \le z \le b\}$, where a and b are two intensity levels of the image in the range $a, b \in [0, NL-1]$.

Fig. 1 shows the general scheme of the presented method. The first two steps are the computation of $h(z)$ and of the wavelet approximation of $h(z)$ at level L, $CA^L(z)$. Then comes the iterative application of the Mode Selection System from l=L to l=0. At each level l, the Mode Selection System carries out the function $f: \{IHROI^l\} \rightarrow \{SHROI^l\}$. It selects the modes in $\{IHROI^l\}$ that correspond to the object of interest and, with the selected modes, computes the output histogram region $\{SHROI^l\}$ at level l. The selection of the object modes is described in Section 2.1.

The histogram region with the selected modes at level l, $\{SHROI^l\}$, is analyzed with more resolution at level l-1: assuming dyadic decompositions, $\{SHROI^l\} = \{z : a \le z \le b\}$ is transformed at level l-1 to $\{IHROI^{l-1}\} = \{z : 2a \le z \le 2b\}$. The application of the Mode Selection System from level l=L to l=0 analyzes the histogram at increasing resolutions and performs an iterative restriction of the histogram region of interest. The resulting histogram region, $\{SHROI^0\}$, contains the modes selected as object modes in $h(z)$, since $CA^0(z) = h(z)$. Finally, the threshold value is computed as the lower value of $\{SHROI^0\}$.
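The coarse-to-fine scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Haar approximation uses the orthonormal scaling $(a+b)/\sqrt{2}$ per step (the paper does not state its normalisation), and `select_region` is a placeholder for the Mode Selection System of Section 2.1. The region mapping follows the text literally ($a \to 2a$, $b \to 2b$).

```python
import math

def histogram(pixels, num_levels):
    """Count how many pixels take each intensity level 0..num_levels-1."""
    h = [0] * num_levels
    for p in pixels:
        h[p] += 1
    return h

def haar_approximation(h, level):
    """Haar approximation coefficients of h after `level` dyadic steps.

    Each step replaces adjacent pairs by (a + b) / sqrt(2), so the length
    halves at every step and level 0 returns h unchanged (CA^0 = h).
    """
    ca = [float(v) for v in h]
    for _ in range(level):
        if len(ca) % 2:
            raise ValueError("length must be divisible by 2**level")
        ca = [(ca[i] + ca[i + 1]) / math.sqrt(2) for i in range(0, len(ca), 2)]
    return ca

def refine_region(a, b):
    """Map {z: a <= z <= b} at level l to {z: 2a <= z <= 2b} at level l-1."""
    return 2 * a, 2 * b

def select_threshold(h, top_level, select_region):
    """Run the loop from l = L down to l = 0.

    `select_region(ca, region, level)` is a hypothetical stand-in for the
    Mode Selection System; it returns the selected region (a, b).
    """
    region = (0, len(h) // 2 ** top_level - 1)  # whole histogram at level L
    for level in range(top_level, -1, -1):
        region = select_region(haar_approximation(h, level), region, level)
        if level > 0:
            region = refine_region(*region)
    return region[0]  # threshold = lower value of {SHROI^0}
```

Any callable with the same signature can be passed as `select_region`, which keeps the multiresolution loop independent of how the modes are actually chosen.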

[Figure 1: Image → histogram computation → wavelet decomposition at level L → Mode Selection System at level L → Mode Selection System at level L-1 → ... → Mode Selection System at level 0 → Threshold]

Fig. 1. General scheme of the multiresolution method for threshold selection.

2.1 Mode Selection System

The Mode Selection System can be divided into three steps: histogram characterization, iterative mode selection and computation of $\{SHROI^l\}$. The Mode Selection System at level l analyzes $CA^l(z)$ over $\{IHROI^l\}$, selects the modes corresponding to the object and computes $\{SHROI^l\}$ from the selected modes.

Assuming a probabilistic (Bayesian) approach commonly considered in the literature, histograms can be modeled as a mixture of Gaussian probability density functions. Histogram decomposition into Gaussian modes can be considered an unsupervised problem, which can be solved by the method described in [3] and modified in [5]. Assume S is the set of modes selected as corresponding to the object, $S = \{\text{selected } \omega_i\}$. The histogram can be divided into two components: one with the selected modes, $h_s(z) = \sum_{\omega_i \in S} P(\omega_i)\, p(z \mid \omega_i)$, and one with the non-selected modes, $h_u(z) = \sum_{\omega_i \notin S} P(\omega_i)\, p(z \mid \omega_i)$. The aim of the Mode Selection System is to select the modes such that $h_s(z)$ contains the modes corresponding to the object and $h_u(z)$ the modes corresponding to the background.

Interpreting histogram modes is not an easy problem, and precise mode interpretation is highly dependent on the application. The Fuzzy Supervising System, also denoted by FSS, is responsible for selecting the histogram modes according to knowledge of the specific application. The aim of the FSS is to recognize mode features in order to classify a mode as corresponding to the object or to the background. Consider the selection of mode $\omega_u$. Let $\{descriptor_d\}$, $d \in \{1, 2, \ldots, D\}$, be a set of features able to describe $\omega_u$. The knowledge of the application is introduced in the FSS during the FSS Training (Section 3). The FSS uses the features of $\omega_u$, expressed in terms of $\{descriptor_d\}$, together with the knowledge of the application to decide whether $\omega_u$ is considered part of the object or part of the background.

The operation of the FSS is detailed in Section 2.2. The selection of $\{descriptor_d\}$ is detailed in Section 4. To generalize the expressions for level l, substitute $\omega_i$ by $\omega_i^l$, $S$ by $S^l = \{\text{selected } \omega_i^l\}$, and $h_s(z)$ and $h_u(z)$ by $CA_s^l(z)$ and $CA_u^l(z)$.

Fig. 2 shows the scheme of the Mode Selection System at level l, where j is the number of modes into which $CA^l(z)$ is decomposed. At each iteration the selection of one mode, denoted by $\omega_u^l$, is analyzed. If a mode is selected as corresponding to the object, $CA_s^l(z)$ and $CA_u^l(z)$ are updated. The proposed method assumes that the object corresponds to the modes with higher intensity levels in the histogram. In order to select all the modes corresponding to the object, the iterations continue in descending order (from $\omega_j^l$ to $\omega_0^l$) until a mode is not selected. Once the iterative selection has finished, $\{SHROI^l\}$ is computed as the histogram region $\{z : TH_s^l \le z \le (NL/2^l) - 1\}$, which is lower bounded by $TH_s^l$, the value that optimally distinguishes $CA_s^l(z)$ from $CA_u^l(z)$, i.e. $CA_s^l(TH_s^l) = CA_u^l(TH_s^l)$. The computation of $TH_s^l$ consists in solving a simple 2nd-order equation (see [4]).
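To illustrate where the 2nd-order equation comes from, the sketch below solves $P_1\,p(z \mid \omega_1) = P_2\,p(z \mid \omega_2)$ for two weighted Gaussians by taking logarithms. It assumes, as a simplification that the text does not state, that each side of $CA_s^l(TH_s^l) = CA_u^l(TH_s^l)$ is represented by a single Gaussian mode; the function names are hypothetical.

```python
import math

def gaussian_intersections(p1, mu1, s1, p2, mu2, s2):
    """Real solutions z of p1*N(z; mu1, s1) = p2*N(z; mu2, s2).

    Taking logs of both sides gives a*z**2 + b*z + c = 0.
    """
    a = s1**2 - s2**2
    b = 2.0 * (mu1 * s2**2 - mu2 * s1**2)
    c = (s1**2 * mu2**2 - s2**2 * mu1**2
         + 2.0 * s1**2 * s2**2 * math.log((s2 * p1) / (s1 * p2)))
    if abs(a) < 1e-12:  # equal variances: the equation becomes linear
        return [-c / b] if abs(b) > 1e-12 else []
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []  # the two weighted densities never cross
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)])

def threshold_between(p1, mu1, s1, p2, mu2, s2):
    """Return the intersection lying between the two means, or None."""
    lo, hi = min(mu1, mu2), max(mu1, mu2)
    for z in gaussian_intersections(p1, mu1, s1, p2, mu2, s2):
        if lo <= z <= hi:
            return z
    return None
```

With equal priors and equal standard deviations the result reduces to the midpoint of the two means, which is a convenient sanity check.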

[Figure 2: flowchart with three stages. (1) $\{IHROI^l\}$ characterization: mode characterization at level l. (2) Iterative mode selection: initialize $CA_s^l(z) = 0\ \forall z$ and $i = 0$; take $\omega_u^l = \omega_{j-i}^l$; the FSS at level l returns Select($\omega_u^l$) or NotSelect($\omega_u^l$); on Select, $CA_s^l(z) = CA_s^l(z) + P(\omega_u^l)\,p(z \mid \omega_u^l)$ and $i = i + 1$. (3) Computation of $\{SHROI^l\}$.]

Fig. 2. Scheme of the Mode Selection System at level l.

2.2 Fuzzy Supervising System

Fig. 3 depicts the block diagram of the FSS. The input of the FSS is $\{descriptor_d\}$, $d \in \{1, 2, \ldots, D\}$. The output, $y \in [0,1]$, represents a possibility value for selecting $\omega_u^l$ as part of the object. $\omega_u^l$ is selected if $y \ge \alpha$, where $\alpha \in [0,1] \subset \Re$ will be called the FSS decision threshold. The FSS receives as inputs the features of mode $\omega_u^l$ and computes the possibility of considering $\omega_u^l$ part of the object according to the knowledge of the application incorporated in the FSS during the training. If $y \ge \alpha$, the FSS considers $\omega_u^l$ part of the object and $\omega_u^l$ is selected. If $y < \alpha$, $\omega_u^l$ is considered as corresponding to the background and is not selected.
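The iterative selection of Section 2.1 combined with the decision rule $y \ge \alpha$ can be sketched as follows. This is a minimal illustration, not the authors' implementation: modes are represented as hypothetical `(prior, mean, std)` tuples, "higher intensity" is taken to mean a larger mode mean, and `fss` is any callable returning a value in [0, 1] (the trained fuzzy system itself is not modelled here).

```python
import math

def gaussian(z, mu, sigma):
    """Gaussian probability density p(z | mode)."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def select_modes(modes, fss, alpha=0.5):
    """Visit modes from highest to lowest mean; stop at the first rejection.

    modes: list of (prior, mean, std) tuples, one per Gaussian mode.
    fss:   hypothetical callable(mode, selected_so_far) -> y in [0, 1].
    Returns (selected, unselected), both ordered by descending mean.
    """
    ordered = sorted(modes, key=lambda m: m[1], reverse=True)
    selected = []
    for mode in ordered:
        if fss(mode, selected) >= alpha:  # Select
            selected.append(mode)
        else:                             # NotSelect ends the iteration
            break
    return selected, ordered[len(selected):]

def mixture(z, modes):
    """Evaluate sum_i P(w_i) p(z | w_i) over the given modes."""
    return sum(p * gaussian(z, mu, s) for p, mu, s in modes)
```

Calling `mixture` on the two returned lists gives the selected and non-selected components, whose sum equals the mixture over all modes.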

[Figure 3: block diagram of the FSS at level l. Input: $\{descriptor_d\}$. Output: y, compared with the decision threshold $\alpha$.]

[...] $> \alpha$, where $\alpha$ is the FSS decision threshold and $\beta \in [0, 1] \subset \Re$ will be named the protection band width. If the decision is not to select $\omega_u^l$, the output should be $dy_k = \alpha - \beta$ [...] $jn_{NS}$ is found. A measure of the relevance of input-output pair k ($PR_k$) can be defined as the reduction in the number of misclassified pixels obtained by taking the correct Select/NotSelect decision:

$$PR_k = \max(jn_S, jn_{NS}) - \min(jn_S, jn_{NS}) \qquad (4)$$

Conflicts between contradictory pairs, those with similar inputs but different outputs, are solved by removing the pairs with lower value of PRk. The set of pairs contains the desired input-output values for the FSS such that the application of the method to the training images provides the desired thresholds, thn . The following step consists in training a fuzzy system in order to incorporate the knowledge expressed in terms of these input-output pairs.
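The relevance measure of Eq. (4) and the conflict filtering can be sketched as below. Several details are assumptions made for illustration only: the pair layout `(descriptors, dy, pr)`, the per-descriptor tolerance `tol` used to decide that two inputs are "similar", and the reading of "different outputs" as desired outputs on opposite sides of $\alpha$. The text does not specify any of these.

```python
def pair_relevance(jn_s, jn_ns):
    """PR_k of Eq. (4): misclassified-pixel reduction of the correct decision."""
    return max(jn_s, jn_ns) - min(jn_s, jn_ns)

def filter_conflicts(pairs, alpha=0.5, tol=0.05):
    """Drop the less relevant pair of every contradictory couple.

    pairs: list of (descriptors, dy, pr), descriptors being a tuple of floats.
    Two pairs conflict when every descriptor differs by at most `tol`
    (an assumed similarity test) and their dy lie on opposite sides of alpha.
    """
    removed = set()
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            if i in removed or j in removed:
                continue
            (xi, dyi, pri), (xj, dyj, prj) = pairs[i], pairs[j]
            similar = all(abs(a - b) <= tol for a, b in zip(xi, xj))
            if similar and (dyi >= alpha) != (dyj >= alpha):
                removed.add(i if pri < prj else j)  # keep the more relevant pair
    return [p for k, p in enumerate(pairs) if k not in removed]
```

The quadratic pairwise scan is adequate for small training sets; a larger set would call for a spatial index over the descriptor vectors.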

3.2 Fuzzy Identification

Fuzzy Identification consists in training the FSS to approximate the input-output pairs that contain the knowledge of the specific application. In this paper Fuzzy Identification is performed by applying the ANFIS method. ANFIS [6] uses a hybrid learning algorithm to identify the parameters of Sugeno-type fuzzy inference systems. It combines the least-squares method and the backpropagation gradient descent method to train the fuzzy parameters to fit a data set. In this paper ANFIS receives as input a set of NP filtered pairs and trains the FSS by minimizing the following mean squared error:

$$MSE = \frac{1}{NP} \sum_{k=1}^{NP} (e_k)^2, \qquad e_k = dy_k - FSS(\{descriptor_d\}_k) \qquad (5)$$

where $FSS(\{descriptor_d\}_k)$ is the output of the FSS for input $\{descriptor_d\}_k$. The FSS is considered to be successfully trained if it classifies all the pairs without error, i.e. if it satisfies:

$$\text{IF } (dy_k \ge \alpha) \text{ THEN } (FSS(\{descriptor_d\}_k) \ge \alpha) \qquad (6)$$

$$\text{IF } (dy_k < \alpha) \text{ THEN } (FSS(\{descriptor_d\}_k) < \alpha) \qquad (7)$$

ANFIS iterates until the rules in Eqs. (6) and (7) are satisfied for all pairs. Once the ANFIS algorithm has converged, the trained FSS approximates the input-output pairs extracted from the training images. Thus, the threshold selection method computes $th_n$ as the threshold value for $image_n$ and is therefore adapted to the conditions of the specific application.
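The error of Eq. (5) and the stopping criterion of Eqs. (6) and (7) are simple to express in code. The sketch below only evaluates them on lists of desired outputs $dy_k$ and FSS outputs; it does not implement ANFIS itself, and the function names are hypothetical.

```python
def mse(desired, outputs):
    """Mean squared error of Eq. (5) over the NP input-output pairs."""
    errors = [dy - y for dy, y in zip(desired, outputs)]
    return sum(e * e for e in errors) / len(errors)

def is_trained(desired, outputs, alpha=0.5):
    """Eqs. (6)-(7): every output lies on the same side of alpha as dy_k."""
    return all((dy >= alpha) == (y >= alpha) for dy, y in zip(desired, outputs))
```

Note that `is_trained` can hold while the MSE is still non-zero: the criterion only requires each output to fall on the correct side of $\alpha$.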

4. Selection of descriptors

The set of descriptors represents the features used for mode interpretation in the mode selection. The particular descriptors depend on the specific application. In the examples shown in this paper the descriptors are selected to provide information on illumination conditions and mode similarity. None of these features can provide by itself enough information for mode selection, and a combination of them is required. Substituting or introducing descriptors to consider specific aspects of an application does not involve any significant change in the method.

Changes in illumination conditions have a severe impact on the performance of most thresholding methods. The effect of illumination on $\omega_u^l$ can be parameterized in terms of its mean and standard deviation. To make $\mu_{\omega_u^l}$ and $\sigma_{\omega_u^l}$ independent of l, the factor introduced by the dyadic decomposition, $2^l$, should be considered, i.e. $\mu^*_{\omega_u^l} = 2^l \mu_{\omega_u^l}$ and $\sigma^*_{\omega_u^l} = 2^l \sigma_{\omega_u^l}$.

Several segmentation techniques have been developed based on theoretical and empirical definitions of mode similarity measures, such as the Kullback-Leibler distance [2] or the Weighted-Mean-Variance distance [11]. In this work, mode similarity mainly refers to the notion of "proximity" and is estimated by a simplified version of the Weighted-Mean-Variance distance:

$$d(\omega_s^l, \omega_u^l) = \frac{\mu_{\omega_s^l} - \mu_{\omega_u^l}}{\mu_{\omega_s^l}} \qquad (8)$$
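As a small worked example of these descriptors, the sketch below rescales a mode's mean and standard deviation by $2^l$ and evaluates the distance of Eq. (8). The function names are hypothetical, and Eq. (8) is implemented as reconstructed here, without an absolute value.

```python
def normalised_stats(mu, sigma, level):
    """Undo the dyadic scaling: mu* = 2**l * mu and sigma* = 2**l * sigma."""
    scale = 2 ** level
    return scale * mu, scale * sigma

def mode_distance(mu_s, mu_u):
    """Simplified Weighted-Mean-Variance distance of Eq. (8)."""
    return (mu_s - mu_u) / mu_s
```

Because Eq. (8) is a ratio of means, it takes the same value whether it is computed from the level-l means or from the rescaled ones.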

5. Results and discussions

The presented method has been applied in an automatic infrared detection system for outdoor urban scenarios, installed on the roof of the building of the School of Engineering of Seville. The aim of this system is to detect the presence of vehicles and lights in the surroundings of a building. The images in Fig. 4a-c show some of the training images. The desired threshold values for the images in Fig. 4a-c (see Table 1a) were selected to fit the application specifications by using an image processing tool such as MATLAB™. The desired thresholded images are shown in Fig. 4d-f. Thresholding in a detection application is not a simple object/background pixel classification, since the object of interest may not be present in the image. The method accounts for this by using training images both with and without objects of interest; for instance, Fig.4c contains two objects of interest and Fig.4c does not contain any.

Fig. 4. a-c) Three training images of an infrared detection system in outdoor urban scenarios; d-f) images thresholded with the desired threshold values (shown in Table 1a).

The number of resolution levels of the multiresolution analysis is L=4. The FSS Training is executed with 43 training images. The parameters of the method were α=0.5 and β=0.25. τ was set to 0.8 to increase the weight of detection error pixels in the training. Table 1b shows the threshold values resulting from applying the trained method to the images shown in Fig. 4a-c. The small differences between the desired and resulting thresholds (Table 1a and Table 1b) arise because the FSS is trained to best approximate the desired threshold values, and small errors cannot be eliminated in the general case.


a)  image            a)     b)     c)
    desired thn      126    105    118

b)  image            a)     b)     c)
    resulting thn    128    102    118

Table 1. a) Desired threshold values for the training images in Fig. 4a-c; b) resulting threshold values provided by the method with the trained FSS.

In this example the mean number of misclassified pixels per image is 4, which has no practical relevance as shown in the resulting thresholded images with the trained method (Fig. 5a-c). Fig. 6 shows one test image and its corresponding satisfactorily thresholded image.

Fig. 5. a-c) Thresholded images provided with the trained FSS corresponding to the images in Fig. 4a-c.

Fig. 6. a) Test image; b) image thresholded by the method with the trained FSS.

Fig. 7 shows the images resulting from applying the threshold values computed by Otsu's [14], Kittler-Illingworth's [9] and Ridler-Calvard's [16] methods to the image shown in Fig. 6a. They contain a high number of background pixels classified as object pixels. The proposed method is based on object/background separability criteria learned from application images, while existing techniques rely on generic criteria.

Fig. 7. Thresholded images resulting from applying a) Otsu's, b) Kittler-Illingworth's and c) Ridler-Calvard's methods to the image shown in Fig. 6a.


6. Conclusions

This paper presents a new training-based threshold selection method for grey-level images. It provides a new procedure for designing threshold selection methods particularized to specific applications. The application-adaptation capabilities of the method are based on the use of knowledge of the specific application extracted from a set of training images. The training process extracts knowledge from the training images and incorporates it into a supervising system via the ANFIS method. This trained system is used to supervise a multiresolution threshold selection. The main contribution of the presented method with respect to existing techniques is that it relies on object/background separability criteria learned for the specific application, while existing techniques rely on generic criteria. The method has been extensively applied in various computer vision problems, such as automatic infrared detection in outdoor urban scenarios, which is described in the paper. Its performance exhibits considerable robustness to illumination conditions and noise in the images.

Acknowledgements

The work described in this paper has been developed in the frame of the following projects, funded by the European Commission: "SPREAD Forest Fire Prevention and Mitigation" (EVG1-CT-2001-00043), "COMETS: Real-time coordination and control of multiple heterogeneous unmanned aerial vehicles" (IST-2001-34304) and "EURFIRELAB" (EVR1-CT-2002-40028). Partial funding has been obtained from the CROMAT project, funded by the Spanish national Research and Development Plan (DPI2002-04401C03-03).

References

1. Abutaleb A. S., "Automatic thresholding of gray level pictures using two-dimensional entropy", Computer Vision, Graphics and Image Processing, vol. 47, (1989), 22-32.
2. Do M. and M. Vetterli, "Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance", IEEE Transactions on Image Processing, vol. 11, no. 2, (2002), 146-158.
3. Duda R.O., P.E. Hart and D.G. Stork, "Pattern Classification", John Wiley and Sons, (2001).
4. Gonzalez R.C. and R.E. Woods, "Digital Image Processing", Addison-Wesley, (1992).
5. Iñesta J.M. and J. Calero, "Robust Gray-Level Histogram Gaussian Characterization", SSPR&SPR 2002, Lecture Notes in Computer Science 2396, (2002), 833-841.
6. Jang J.-S. R., "ANFIS: Adaptive-Network-based Fuzzy Inference Systems", IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, (1993), 665-685.
7. Kaiser G., "The Fast Haar Transform", IEEE Potentials, May/April, (1998), 34-37.
8. Kapur J.N., P.K. Sahoo and A.K.C. Wong, "A new method for gray-level picture thresholding using the entropy of the histogram", Computer Vision, Graphics and Image Processing, vol. 29, (1985), 273-285.
9. Kittler J. and J. Illingworth, "Minimum error thresholding", Pattern Recognition, vol. 19, (1986), 41-47.
10. Lee S. U., Y.S. Chung and R. H. Park, "A comparative performance study of several global thresholding techniques for segmentation", Computer Vision, Graphics and Image Processing, vol. 52, no. 2, (1990), 171-190.
11. Ma W. Y. and B. S. Manjunath, "Texture Features and Learning Similarity", in Proc. IEEE Conference on Computer Vision and Pattern Recognition, (1996), 425-430.
12. Mallat S., "A Theory for Multi-resolution Signal Decomposition: the Wavelet Representation", IEEE Trans. PAMI, vol. PAMI-11, (1989), 674-693.
13. O'Gorman L., "Binarization and multithresholding of document images using connectivity", Graphical Models and Image Processing, vol. 56, no. 6, (1994), 494-506.
14. Otsu N., "A threshold selection method from grey-level histograms", IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, (1979), 62-66.
15. Pun T., "A new method for grey-level picture thresholding using the entropy of the histogram", Signal Processing, vol. 2, no. 3, (1980), 223-237.
16. Ridler T.W. and S. Calvard, "Picture Thresholding using an Iterative Selection Method", IEEE Transactions on Systems, Man and Cybernetics, vol. 8, (1978), 630-632.