Region Based Segmentation and Classification of Multispectral Chromosome Images

Petros S. Karvelis, Student member, IEEE, Dimitrios I. Fotiadis, Senior member, IEEE, Alexandros Tzallas, Student member, IEEE, Unit of Medical Technology & Intelligent Information Systems, Dept. of Computer Science, University of Ioannina, GR 45110 Ioannina, Greece {pkarvel, fotiadis}@cs.uoi.gr, [email protected]

Ioannis Georgiou Genetics Unit, Dept. of Obstetrics & Gynecology, Medical School, GR 45110 Ioannina, Greece [email protected]

Abstract

Multiplex Fluorescent In Situ Hybridization (M-FISH) is a recently developed chromosome imaging technique in which each chromosome class appears in a distinct color. Although this technique makes the analysis of chromosome images easier, it still exhibits misclassification errors that can be misinterpreted as chromosome abnormalities. A new method for multichannel image segmentation and region classification is proposed. The segmentation of M-FISH images is based on a multichannel watershed transform, which defines regions with the same spectral characteristics. A region Bayes classification method is then used to classify the resulting regions. The classifier was trained and tested on non-overlapping chromosome images and an overall accuracy of 89% was achieved. The superiority of the proposed method over methods that use pixel-by-pixel classification is demonstrated.

1. Introduction

Chromosomes are the condensed form of the genetic material, and their images taken during cell division are useful for the diagnosis of genetic disorders and the study of cancer [1]. Normal human cells contain 46 chromosomes, consisting of 22 pairs of homologous chromosomes and two sex-determining chromosomes (XY: male, XX: female). The procedure of assigning every chromosome to a class is called karyotyping [1]. Many images often have to be inspected, and since visual inspection is time consuming and expensive, many attempts have been made to automate chromosome image analysis. To make the imaging of chromosomes easier, a new cytogenetic technique was proposed [2]. In this technique all chromosomes are labeled with five fluors and a fluorescent DNA stain called DAPI (4',6-diamidino-2-phenylindole). DAPI attaches to DNA and thus labels all chromosomes. The other five fluors attach to specific DNA sequences in such a way that each chromosome class absorbs a unique combination of fluors. Thus, at least five fluors are needed for combinatorial labeling to uniquely identify all 24 chromosome classes (with k fluors there are 2^k - 1 nonempty fluor combinations, and 2^4 - 1 = 15 < 24 <= 31 = 2^5 - 1). Each fluor is visible in one of the spectral channels, so an M-FISH image consists of five images, each being the response of the chromosomes to a particular fluor (Fig. 1). Several methods have been proposed for M-FISH image segmentation and classification [3-7]. Most of these methods are based on pixel-by-pixel classification and treat classification as a problem with 5 features (the five channels) and 24 categories [3,6,7]. Methods that use all six channels in order to incorporate the background as an additional class have also been proposed [4,5]. Although the performance of these methods is very promising (accuracy ~90%) for some sets of images [3-5,7], the average pixel classification accuracy for the whole set (200 images) was only 68% ± 17.5% [6].

Figure 1: Five channel M-FISH image data: (a) Aqua fluor, (b) Far red fluor, (c) Green fluor, (d) Red fluor, (e) Gold fluor and (f) Color M-FISH image.

In this study, region based segmentation and classification is proposed as a two-stage process. First, the M-FISH chromosome image is segmented into regions; the resulting regions are then classified [7]. A key difference between our method and those already developed is the use of a multichannel image segmentation method that produces regions which are homogeneous with respect to measured characteristics such as intensity and color. This is quite different from the classical pixel-by-pixel approach, in which spatial context (image space) is not considered and classification is performed directly in the feature space. In this way a better segmentation scheme is obtained, which results in higher classification accuracy [8].

2. Materials and Method

2.1. First Stage: Multichannel Image Segmentation (MIS)

In this stage a mask of the pixels to be classified is created. The watershed transform is used to decompose the multichannel image into a set of homogeneous regions. To apply the watershed transform to the multichannel data, the gradient of the multichannel image must be defined. Instead of applying Sobel masks to each component of the M-FISH image and then combining the results, we use a more sophisticated approach [9]. Consider the multichannel image $\mathbf{I}(x,y): \mathbb{R}^2 \rightarrow \mathbb{R}^5$ as a two-dimensional vector field with five components $I_i$, $1 \le i \le 5$, and the direction $\mathbf{n} = [\cos\varphi, \sin\varphi]^T$ defined by the angle $\varphi$. The directional derivative of $\mathbf{I}(x,y)$ consists of the directional derivatives of its components. Since the directional derivative of each component is given by the projection of its gradient onto the direction $\mathbf{n}$, the first directional derivative of $\mathbf{I}(x,y)$ is:

$$\frac{\partial \mathbf{I}}{\partial \mathbf{n}} = \left[ \frac{\partial I_1}{\partial \mathbf{n}}, \frac{\partial I_2}{\partial \mathbf{n}}, \ldots, \frac{\partial I_5}{\partial \mathbf{n}} \right]^T = \left[ \nabla I_1 \cdot \mathbf{n}, \nabla I_2 \cdot \mathbf{n}, \ldots, \nabla I_5 \cdot \mathbf{n} \right]^T = \begin{bmatrix} I_1^x & I_2^x & \cdots & I_5^x \\ I_1^y & I_2^y & \cdots & I_5^y \end{bmatrix}^T \mathbf{n} = J\,\mathbf{n}, \quad (1)$$

where the matrix $J$, which contains the derivatives of each component, is the Jacobian. The Euclidean norm is used to define the magnitude of change:

$$\| J\,\mathbf{n} \|^2 = (J\,\mathbf{n})^T (J\,\mathbf{n}) = \mathbf{n}^T (J^T J)\, \mathbf{n}. \quad (2)$$

The term $\mathbf{n}^T (J^T J)\,\mathbf{n}$ is the Rayleigh quotient of the matrix $J^T J$, and its extrema are given by the eigenvalues of $J^T J$. Since the image function $\mathbf{I}(x,y)$ is defined over two spatial directions, $J^T J$ is a $2 \times 2$ matrix and the eigenvalue problem has a simple analytical solution [9]. Because the watershed algorithm is highly sensitive to intensity variations in the gradient image, the watershed transform (WT) produces image partitions containing a large number of regions. A widely used method to reduce the number of minima in a grayscale image is grayscale reconstruction, which suppresses all minima with depth less than a threshold [10]. The computation of the watershed transform [11] is the next step of our method. The WT is a popular segmentation method that originated from mathematical morphology. The image is considered as a topographical relief, where the height of each point is related to its grey level, and imaginary rain falls on the terrain; the watersheds are the lines separating the catchment basins. The output of the watershed algorithm is a tessellation of the input image into its catchment basins, each characterized by a unique label. The pixels that belong to the watershed lines are assigned a special label.
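To make the first stage concrete, a minimal Python sketch is given below (an illustration only, not the authors' implementation). It assumes the M-FISH channels are stacked in a floating-point array of shape (H, W, 5), uses NumPy and scikit-image, and treats the minimum-suppression depth h as an arbitrary parameter; it computes the multichannel gradient magnitude as the square root of the largest eigenvalue of $J^T J$ at each pixel (Eqs. 1-2), suppresses shallow minima by grayscale reconstruction [10], and applies the watershed transform [11]:

import numpy as np
from skimage import filters, morphology, segmentation

def multichannel_gradient(img):
    """Gradient magnitude of an (H, W, 5) image: sqrt of the largest eigenvalue of J^T J (Eqs. 1-2)."""
    C = img.shape[-1]
    # Per-channel partial derivatives, i.e. the entries of the Jacobian J.
    Ix = np.stack([filters.sobel_v(img[..., c]) for c in range(C)], axis=-1)
    Iy = np.stack([filters.sobel_h(img[..., c]) for c in range(C)], axis=-1)
    # Entries of the symmetric 2x2 matrix J^T J at every pixel.
    a = np.sum(Ix * Ix, axis=-1)
    b = np.sum(Ix * Iy, axis=-1)
    c = np.sum(Iy * Iy, axis=-1)
    # Closed-form largest eigenvalue of [[a, b], [b, c]].
    lam_max = 0.5 * (a + c + np.sqrt((a - c) ** 2 + 4.0 * b ** 2))
    return np.sqrt(lam_max)

def watershed_labels(img, h=0.05):
    """Label map of the multichannel watershed; watershed-line pixels receive label 0."""
    grad = multichannel_gradient(img)
    # Grayscale reconstruction by erosion suppresses minima shallower than h [10],
    # reducing over-segmentation by the subsequent watershed transform.
    grad = morphology.reconstruction(grad + h, grad, method='erosion')
    # Watershed transform [11]: with no explicit markers, regional minima are used.
    return segmentation.watershed(grad, watershed_line=True)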

2.1.1. Creation of a binary mask using Otsu's method

Applying the WT to the greyscale-reconstructed multichannel gradient magnitude produces a number of regions. However, two major problems must be handled effectively: (a) the M-FISH image often contains artefacts that appear in some channels but not in the greyscale DAPI channel, and (b) regions in the central areas of the chromosomes (the centromeres) usually fail to hybridize [12]. For this reason we superimpose the watershed lines on a binary mask of the DAPI channel. Let:

$$W_{mask}(x,y) = \begin{cases} 0, & \text{if } L_W(x,y) \in \text{watershed line} \\ 1, & \text{elsewhere} \end{cases} \quad (3)$$

where $L_W(x,y)$ is the labelled watershed segmentation, and let $B_{Otsu}$ be the binary mask created by applying Otsu's method [13] to the DAPI channel. A new mask is then defined as:

$$W_B = W_{mask} \text{ AND } B_{Otsu}, \quad (4)$$

where AND is the logical AND operator. Fig. 2 shows two examples where the mask $W_B$ corrects the two aforementioned sources of error.
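A similarly minimal sketch of the masking step of Eqs. (3)-(4), assuming the label map returned by the watershed sketch above (label 0 on watershed lines) and the greyscale DAPI channel as inputs; the function name is illustrative:

import numpy as np
from skimage.filters import threshold_otsu

def binary_mask(labels, dapi):
    """Combine the watershed-line mask (Eq. 3) with the Otsu mask of the DAPI channel (Eq. 4)."""
    w_mask = labels != 0                   # 0 on watershed lines, 1 elsewhere
    b_otsu = dapi > threshold_otsu(dapi)   # Otsu binarisation of the DAPI channel [13]
    return np.logical_and(w_mask, b_otsu)  # W_B = W_mask AND B_Otsu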


Figure 2: (a) Elimination of an artefact (purple region) in an M-FISH image. The artefact is detected as a chromosome region by the multichannel watershed transform and eliminated using the binary mask $W_B$. (b) Detection of an unhybridized centromere using the binary mask $W_B$, when the watershed segmentation fails to detect it.

2.2. Second Stage: Region Bayes Classification (RBC)

For the classification stage we use a Bayes classifier modified to handle region rather than pixel classification. Suppose that a segmented region $R_i$, $1 \le i \le Q$, where $Q$ is the number of regions produced by the multichannel watershed segmentation, consists of $l$ pixels. Each pixel is a vector $z \in \mathbb{R}^5$ measuring the fluorescent intensity in each of the five channels. Let $Z = \{z_1, z_2, \ldots, z_l\}$ be the set of the $l$ pixel vectors of the region, and let $\omega_i$, $i = 1, \ldots, 24$, denote the 24 chromosome classes. The likelihood of the region, $p(Z \mid \omega_i)$, is computed as [8]:

$$p(Z \mid \omega_i) = p(z_1, z_2, \ldots, z_l \mid \omega_i) = \prod_{k=1}^{l} p(z_k \mid \omega_i) = \left( \frac{1}{(2\pi)^{5/2} |\Sigma_i|^{1/2}} \right)^{l} \exp\left( -\frac{1}{2} \sum_{k=1}^{l} (z_k - \mu_i)^t \Sigma_i^{-1} (z_k - \mu_i) \right), \quad (5)$$

where $\mu_i$ is the 5-component mean vector and $\Sigma_i$ is the $5 \times 5$ covariance matrix of class $\omega_i$, with $|\Sigma_i|$ and $\Sigma_i^{-1}$ its determinant and inverse, respectively. Working with the natural logarithm and dropping all terms that are the same for all classes, the Bayes rule [14] assigns a test sample $Z$ to the class $\omega_i$

$$\text{if } DS_i(Z) > DS_j(Z) \ \ \forall j \neq i, \quad (6)$$

$$\text{where } DS_i(Z) = -\frac{l}{2} \ln |\Sigma_i| - \frac{1}{2} \sum_{k=1}^{l} (z_k - \mu_i)^t \Sigma_i^{-1} (z_k - \mu_i) + \ln P(\omega_i). \quad (7)$$

The prior class probabilities $P(\omega_i)$ must also be estimated from a training set. $P(\omega_i)$ is calculated as:

$$P(\omega_i) = \frac{\#\,\text{pixels in class } \omega_i}{\sum_{k=1}^{24} \#\,\text{pixels in class } \omega_k}. \quad (8)$$
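The region classifier of Eqs. (5)-(8) can be sketched as follows (again an illustration rather than the authors' code). It assumes the chromosome classes are numbered 1-24 in the training class map, that the per-class means mu (24 x 5), covariances sigma (24 x 5 x 5) and priors have been estimated from training pixels, and that the pixels of a region are stacked as the rows of an l x 5 matrix Z:

import numpy as np

def estimate_priors(class_map, n_classes=24):
    """Eq. (8): prior of each class = its pixel count over the total labelled pixel count."""
    counts = np.array([(class_map == c).sum() for c in range(1, n_classes + 1)], dtype=float)
    return counts / counts.sum()

def classify_region(Z, mu, sigma, priors):
    """Assign the region whose pixels are the rows of Z to the class maximising DS_i (Eqs. 6-7)."""
    l = Z.shape[0]
    scores = np.empty(len(priors))
    for i in range(len(priors)):
        diff = Z - mu[i]                  # (l, 5) deviations from the class mean
        inv = np.linalg.inv(sigma[i])
        # Sum over the region of the terms (z_k - mu_i)^t Sigma_i^{-1} (z_k - mu_i).
        mahal = np.einsum('kj,jm,km->', diff, inv, diff)
        _, logdet = np.linalg.slogdet(sigma[i])
        scores[i] = -0.5 * l * logdet - 0.5 * mahal + np.log(priors[i])
    return np.argmax(scores) + 1          # classes numbered 1..24 in this sketch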

3. Results

We evaluated our method using seventeen M-FISH images from the public ADIR database [15]. Each M-FISH image set also contains a labeled class-map image, in which each pixel is labeled according to the class to which it actually belongs. Two images were used for training and the remaining fifteen for testing. Both the segmentation and the classification stages were evaluated, and their performance was measured in terms of accuracy. More specifically, for the segmentation stage we ran two experiments: in the first, the M-FISH image was segmented using only the multichannel watershed (MIS); in the second, the binary mask $W_B$ was additionally used to eliminate artefacts and detect unhybridized regions (centromeres). The best results (overall accuracy 83.59%) were obtained using both MIS and the binary mask $W_B$. The proposed classification method (RBC) was compared with a pixel-by-pixel classification technique [3-6] and showed better classification accuracy (89.88%). The results are summarized in Table 1, and an example of the procedure is shown in Fig. 3.
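For reference, a pixel-accuracy score of the kind reported in Table 1 could be computed as below; this is only an assumption about the scoring, since the exact protocol is not detailed here, and it takes a predicted label image, the ground-truth class map and a boolean chromosome mask:

import numpy as np

def pixel_accuracy(predicted, class_map, mask):
    """Fraction of chromosome pixels (mask == True) whose predicted class equals the ground truth."""
    return float(np.mean(predicted[mask] == class_map[mask]))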

Table 1: Segmentation and classification rates of the proposed method and comparison of the classification performance with a pixel-by-pixel approach.

#Image  | MIS without W_B (%) | MIS with W_B (%) | Pixel-by-Pixel [3] (%) | Proposed Method (RBC) (%)
1       | 49.67               | 89.94            | 86.60                  | 97.48
2       | 64.01               | 89.98            | 86.70                  | 97.62
3       | 45.86               | 89.09            | 85.00                  | 95.04
4       | 69.59               | 72.14            | 85.20                  | 96.17
5       | 65.44               | 76.44            | 83.90                  | 96.18
6       | 57.61               | 90.76            | 56.30                  | 66.30
7       | 71.71               | 77.20            | 82.10                  | 95.66
8       | 70.89               | 85.30            | 86.40                  | 93.90
9       | 75.14               | 62.10            | 81.70                  | 80.50
10      | 51.89               | 90.13            | 89.30                  | 96.59
11      | 57.81               | 92.02            | 86.50                  | 95.62
12      | 63.97               | 95.41            | 61.80                  | 72.70
13      | 64.98               | 76.01            | 70.70                  | 94.32
14      | 50.31               | 93.84            | 75.50                  | 85.70
15      | 78.29               | 73.46            | 82.40                  | 84.40
Overall | 62.48 ± 9.94        | 83.59 ± 9.89     | 80.01 ± 9.78           | 89.88 ± 9.85

Figure 3: Schematic of the procedure: the M-FISH image is segmented by the multichannel watershed (MIS with $W_B$) and the resulting regions are classified (RBC). A separate color is used to represent each chromosome class in the classification map (RBC).

4. Discussion

An automated method for the segmentation and classification of multispectral chromosome images has been presented. The chromosome image is first decomposed into a set of homogeneous regions; each region is then classified using a region Bayes classifier. Our methodology has been evaluated on the publicly available M-FISH database, and overall accuracies of 83.59% and 89.88% were obtained for the segmentation and classification stages, respectively. The proposed method presents several advantages, which can be summarized as follows: (a) the segmentation of each chromosome into regions imitates the procedure followed by an expert to identify chromosome rearrangements (abnormalities) [6]; (b) the use of Otsu binarization greatly simplifies the detection of chromosome regions that have not been hybridized (Fig. 4), providing a more accurate segmentation of the M-FISH image [12]; (c) region Bayes classification (RBC) provides better classification accuracy than the pixel classifier [3,6]; and (d) RBC proved to be more computationally efficient, since the average CPU time for classification is 36.4 s (± 10.9) compared to 53.5 s (± 13.5) for the pixel-by-pixel classifier. Our future work will focus on further testing of the proposed method on a larger image dataset. In addition, improvements in the segmentation stage are necessary to resolve the problem of overlapping chromosomes.

Figure 4: Averaged fluor signals along the chromosomal axis.

5. References

[1] M. Thompson, R. McInnes, and H. Willard, Genetics in Medicine, 5th Edition, WB Saunders Company, Philadelphia, 1991.
[2] M.R. Speicher, S.G. Ballard, and D.C. Ward, "Karyotyping human chromosomes by combinatorial multi-fluor FISH," Nat. Gen., vol. 12, 1996, pp. 341-344.
[3] M.P. Sampat, A.C. Bovik, J.K. Aggarwal, and K.R. Castleman, "Pixel-by-pixel classification of M-FISH images," in Proc. of the 24th IEEE EMBS Ann. Intern. Conf., 2002, pp. 999-1000.
[4] H. Choi, K.R. Castleman, and A. Bovik, "Joint segmentation and classification of M-FISH chromosome images," in Proc. of the 26th IEEE EMBS Ann. Intern. Conf., 2004, pp. 1636-1639.
[5] M.P. Sampat, A.C. Bovik, J.K. Aggarwal, and K.R. Castleman, "Supervised parametric and non-parametric classification of chromosome images," Patt. Recogn., vol. 38, 2005, pp. 1209-1223.
[6] W.C. Schwartzkopf, A.C. Bovik, and B.L. Evans, "Maximum-likelihood techniques for joint segmentation-classification of multispectral chromosome images," IEEE Trans. Med. Imag., vol. 24, 2005, pp. 1593-1610.
[7] P.S. Karvelis, D.I. Fotiadis, M. Syrrou, and I. Georgiou, "A watershed based segmentation method for multispectral chromosome images classification," in Proc. of the 28th IEEE EMBS Ann. Intern. Conf., 2006, pp. 3009-3012.
[8] D.A. Landgrebe, "The development of a spectral-spatial classifier for earth observational data," Patt. Recogn., vol. 12, no. 3, 1980, pp. 165-175.
[9] C. Drewniok, "Multi-spectral edge detection - Some experiments on data from Landsat-TM," Intern. J. Rem. Sens., vol. 15, 1994, pp. 3743-3765.
[10] L. Vincent, "Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms," IEEE Trans. Imag. Proc., vol. 2, 1993, pp. 176-201.
[11] L. Vincent and P. Soille, "Watersheds in digital spaces: An efficient algorithm based on immersion simulations," IEEE Trans. Patt. An. Mach. Intellig., vol. 13, 1991, pp. 583-598.
[12] O. Henegariu, P. Bray-Ward, S. Artan, G. Vance, M. Qumsyieh, and D. Ward, "Small marker chromosome identification in metaphase and interphase using centromeric multiplex FISH (CM-FISH)," Lab. Investig., vol. 81, 2001, pp. 475-481.
[13] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Syst. Man Cybern., vol. 9, 1979, pp. 62-66.
[14] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, 1990.
[15] The ADIR M-FISH Image Database. Available at: http://www.adires.com/05/Project/MFISH_DB/MFISH_DB.shtml