AN EFFECTIVE SYSTEM FOR OPTICAL MICROSCOPY CELL IMAGE SEGMENTATION, TRACKING AND CELL PHASE IDENTIFICATION

Jun Yan1,2, Xiaobo Zhou2,3, Qiong Yang4, Ning Liu2,4, Qiansheng Cheng1, Stephen T.C. Wong2,3

1 LMAM, School of Mathematical Science, Peking University, Beijing, P.R. China, 100871
2 HCNR-Center for Bioinformatics, Harvard Medical School, Boston, MA 02115
3 Molecular Imaging Center, Brigham & Women’s Hospital/HMS, Boston, MA 02115
4 Microsoft Research Asia, 49 Zhichun Road, Beijing, P.R. China, 100871

ABSTRACT

The lack of automated screening systems that can handle large volumes of time-lapse optical microscopy images is a bottleneck of modern bio-imaging research. In this paper, we propose an effective automated analytic system that can be used to acquire, track and analyze the cell-cycle behaviors of a large population of cells. We use the traditional watershed algorithm for cell nuclei segmentation and then propose a novel hybrid merging method to merge the resulting fragments. After a distance and size based tracking procedure, the fragment merging results are further improved using sequence context information. Finally, the cell nuclei are accurately classified into different phases by a continuous Hidden Markov Model (HMM). Experimental results show that the proposed system is effective for cell sequence segmentation, tracking and cell phase identification.

1. INTRODUCTION

Time-lapse fluorescence microscopy is attracting increasing attention, mainly because of its potential to enable new, high-throughput ways of conducting drug discovery and quantitative cellular studies [3, 4]. However, a bottleneck in time-lapse fluorescence microscopy research and applications is the lack of automated high-content screening systems for the analysis of optical microscopy time-lapse images. In this paper, we propose a novel automated analytic screening system that can be used to acquire [5], track [6] and analyze [4] the cell-cycle behaviors of a population of cells effectively.

Taking a sequence of time-lapse fluorescence microscopy images as input, our system includes four key steps: image preprocessing; cell nuclei segmentation and fragment merging; cell nuclei tracking; and finally cell phase identification. Fig. 1 shows a flow chart of the proposed system. The image preprocessing procedure consists of image enhancement, adaptive thresholding, morphological filtering and a distance transformation. Regarding the segmentation problem, we propose a novel hybrid fragment merging algorithm based on the traditional watershed segmentation [5]. Our hybrid merging algorithm divides the features of cells into two classes: features with trend and features with no trend. It computes a score for each fragment from each of the two classes of features, and then combines the two scores to make the final decision. We also improve the segmentation results in the tracking procedure by using contextual information. After distance and size based tracking, cell phase identification is conducted with a continuous Hidden Markov Model (HMM) [2, 8], which utilizes both feature information and contextual information.

Figure 1. System overview (flow chart: Pre-processing, Segmentation, Hybrid merging, Tracking, Ambiguous? If yes: Merging correction; if no: Phase identification)

The rest of this paper is organized as follows. In Section 2, we summarize the image preprocessing and introduce the novel nuclei segmentation approach. In Section 3, after a short introduction to distance and size based tracking, we discuss how to improve the segmentation accuracy by using contextual information. In Section 4, the continuous HMM is used for context based cell phase identification. In Section 5, we show our experimental results. In Section 6, we present our conclusions and future work.

2. IMAGE PREPROCESSING AND SEGMENTATION

Before segmentation, we preprocess the images to remove noise, discard undesirable features, and correct illumination artifacts. We first enhance the images using the algorithm proposed in [1] with a flat disk-shaped structuring element of radius 4. An adaptive threshold [7] is then applied to obtain binary images. After that, the images are processed with a morphological opening filter to eliminate small islands and holes. Finally, a distance transformation is computed for better segmentation.
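As an illustration of the morphological opening step, the following minimal sketch applies an erosion followed by a dilation to a small binary image stored as a list of lists. The 3x3 square structuring element, the function names and the toy image are assumptions made for this example only; the structuring element used for the opening filter is not stated in the text.

```python
def erode(img):
    """Binary erosion with a 3x3 square element; pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            keep = True
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < h and 0 <= cc < w) or img[rr][cc] == 0:
                        keep = False
            out[r][c] = 1 if keep else 0
    return out


def dilate(img):
    """Binary dilation with a 3x3 square element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w:
                            out[rr][cc] = 1
    return out


def opening(img):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(img))


# Toy binary image (hypothetical): a 3x3 "nucleus" plus one isolated noise pixel.
img = [[0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        img[r][c] = 1
img[5][5] = 1

opened = opening(img)
for row in opened:
    print("".join(str(v) for v in row))
```

In this toy case the isolated pixel is removed while the 3x3 block survives unchanged, which is the "small island" behavior described above.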

After preprocessing, the commonly used watershed segmentation algorithm [5] is applied. However, it introduces a serious over-segmentation problem. To merge the fragments generated by the watershed algorithm, we transform all the fragments of an image into a vector space. 12 features are used in this paper [4]. We represent all the fragments in frame $t$ by an $n \times d$ matrix $X$, where $n$ is the number of fragments in frame $t$ and $d$ is the number of features.

Probability Density Function (PDF) merging is one of the most widely used merging approaches in feature space. It computes a score $S_{pi} = \phi(x_i, \mu, \Sigma)$ for each fragment $x_i$, where $\phi$ is a Gaussian kernel and $\mu$ and $\Sigma$ stand for the mean and covariance matrix of the training data, respectively. It selects the fragments with small scores, i.e. fragments far away from the training data, to merge. However, it is not always true that a real cell should be close to the training data in every specific feature. As an example, for the feature $f$ = perimeter / perimeter of convex hull, we should assume that if a fragment $i$ is a cell, $f_i$ should be close to 1. However, the values of $f$ in the training data are always larger than 1, so the PDF merging algorithm introduces mistakes. For instance, in our experiments the mean value of feature $f$ in the training data is 1.2. If the $f$ values of two fragments are 1.05 and 1.15 respectively, which one is more likely to be a cell? If we must choose one to merge, the traditional PDF merging will choose the first one, whereas intuitively we should choose the second one.

In contrast to PDF merging, which utilizes all the features to compute the scores of fragments, another category of approaches, such as roughness merging [4], consists of single feature based algorithms. For instance, roughness merging computes a score $S_{Ri} = p_i^2 / (4\pi A_i)$ for each fragment $x_i$, where $p_i$ and $A_i$ are the perimeter and area of the fragment. As a criterion, we merge fragments $i$ and $j$ if $S_{Rij} < S_{Ri}$ and $S_{Rij} < S_{Rj}$, where $S_{Rij}$ is the roughness score of the fragment consisting of fragment $i$ and fragment $j$. Most single feature based algorithms rely on the fact that the features imply assumptions. For example, roughness merging assumes that a cell should be as circular as possible, i.e. $S_{Ri}$ of a real cell should be close to 1. However, single feature based approaches ignore the information in other effective features.

In this paper, we propose to divide the features into two classes: (1) features with trend, such as the “roughness” and “f” in the previous examples, which imply some assumptions; and (2) features with no trend. If we cannot tell whether a feature has a trend or not, we assign it to the features with no trend. We then compute the PDF scores of fragments based on the features with no trend and compute a score based on a feature with trend. Finally, we combine the two scores. We select roughness as the feature with trend in this paper. In other words, our hybrid merging algorithm combines the PDF score on the features with no trend and the roughness score. Suppose

$0 \le \tilde{S}_{Ri} = \exp\{-(S_{Ri} - 1)\} \le 1$, $\tilde{S}_{pi} = S_{pi} / (S_{pi} + S_{pj} + S_{pij})$;

our hybrid score is then represented as $S_i = w_1 \tilde{S}_{Ri} + w_2 \tilde{S}_{pi}$, where $w_1$ and $w_2$ are weights which can be computed from training data by regression. We merge cell $i$ and cell $j$ if and only if $\tilde{S}_{Rij} > \tilde{S}_{Ri}$ and $\tilde{S}_{Rij} > \tilde{S}_{Rj}$. Figure 2(c) shows the result of hybrid merging.
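The scores discussed in this section can be illustrated with a minimal sketch. It computes the roughness score of two half-disc fragments and of the full disc they form, maps the roughness through the exponential normalization, normalizes three PDF scores, and combines them with weights. The weights (0.5 each), the PDF values and the standard deviation 0.1 are hypothetical numbers chosen for this example, and applying the final comparison to the combined scores is an assumption about which score enters the merge test. All function names are illustrative.

```python
import math


def roughness(perimeter, area):
    """Roughness score p^2 / (4*pi*A); it equals 1 for a perfect circle."""
    return perimeter ** 2 / (4.0 * math.pi * area)


def normalized_roughness(s_r):
    """Map a roughness score (>= 1) into (0, 1] via exp(-(S_R - 1))."""
    return math.exp(-(s_r - 1.0))


def normalized_pdf(s_pi, s_pj, s_pij):
    """Divide each PDF score by the sum over fragment i, fragment j and their union."""
    total = s_pi + s_pj + s_pij
    return s_pi / total, s_pj / total, s_pij / total


def hybrid_score(s_r, s_p_norm, w1=0.5, w2=0.5):
    """Weighted sum of normalized roughness and normalized PDF score (weights are hypothetical)."""
    return w1 * normalized_roughness(s_r) + w2 * s_p_norm


def better_than_both(s_ij, s_i, s_j):
    """True when the merged candidate scores higher than both fragments."""
    return s_ij > s_i and s_ij > s_j


def gaussian_pdf_1d(x, mean, std):
    """One-dimensional Gaussian density, used for the f = 1.05 vs 1.15 example."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Two half-discs of a unit circle: area pi/2, perimeter pi + 2 (arc plus diameter).
r_half = roughness(math.pi + 2.0, math.pi / 2.0)
# The merged fragment is the full unit disc: perimeter 2*pi, area pi.
r_disc = roughness(2.0 * math.pi, math.pi)

# Hypothetical PDF scores for fragment i, fragment j and the merged fragment.
p_i, p_j, p_ij = normalized_pdf(0.01, 0.01, 0.03)

h_i = hybrid_score(r_half, p_i)
h_j = hybrid_score(r_half, p_j)
h_ij = hybrid_score(r_disc, p_ij)
print("roughness half-disc:", round(r_half, 4), "full disc:", round(r_disc, 4))
print("merge by hybrid score:", better_than_both(h_ij, h_i, h_j))

# PDF scores for f = 1.05 and f = 1.15 with training mean 1.2 (std 0.1 is assumed).
print(gaussian_pdf_1d(1.05, 1.2, 0.1) < gaussian_pdf_1d(1.15, 1.2, 0.1))
```

The last line reproduces the counterexample from the text: with a training mean of 1.2, the fragment with f = 1.05 receives the smaller PDF score and would therefore be the one selected for merging, although it is closer to the ideal value 1.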

3. TRACKING

After the watershed segmentation and the hybrid merging procedure, we establish the correspondence between nuclei at time $t$ and $t+1$ using the distances between them and their sizes. In our experiments, the matching results were not sensitive to the definition of distance, so we use the Euclidean distance between centroids as the distance between cells. Once we have the distance matrix, we match the cell nuclei in frames $t$ and $t+1$ based on the distances and their sizes. Suppose $x_i^{(t)}$ is the $i$-th cell in frame $t$. We scan this matrix to find all possible candidates for its daughter cells, and the candidates in frame $t+1$ are added one by one in order of their distance to the nucleus at time $t$: the nucleus with the smallest distance is added first and the one with the largest distance last. Each time a nucleus is added, the sum of the candidate sizes is compared with the size of $x_i^{(t)}$. If the sum is more than 10% larger than the size of $x_i^{(t)}$, we stop and discard the latest candidate. We choose 10% as the threshold because, according to our observations, the area of a cell does not increase by more than 10% in the next frame. We ignore all cells close to the border of an image.

If a many-to-one correspondence is found, it is treated as an overlap of cells, and we record their position and size information. If a one-to-many correspondence is found, there are three cases. If the “many” cells in frame $t+1$ share borders and the sum of their sizes is close to the size of their parent cell, they are probably caused by over-segmentation and we merge them. If their ancestors were involved in a many-to-one case, the one-to-many correspondence is attributed to the overlap of the ancestors, and we match the cells using the recorded position and size information. Otherwise, we recognize the case as a cell splitting.
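The candidate accumulation step of the tracking procedure can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the dictionary layout, the function name `match_daughters` and the toy coordinates are assumptions, and the candidate list is assumed to have been prefiltered to "possible" candidates by some other means.

```python
import math


def match_daughters(parent, candidates, growth_limit=0.10):
    """Add candidates from frame t+1 nearest-first until the summed size would
    exceed the parent size by more than growth_limit (10% in the text)."""
    px, py = parent["centroid"]
    ordered = sorted(
        candidates,
        key=lambda c: math.hypot(c["centroid"][0] - px, c["centroid"][1] - py),
    )
    chosen, total = [], 0.0
    for cand in ordered:
        if total + cand["size"] > (1.0 + growth_limit) * parent["size"]:
            break  # stop and discard the latest candidate
        chosen.append(cand)
        total += cand["size"]
    return chosen


# Hypothetical nucleus in frame t and three nuclei in frame t+1.
parent = {"id": "P", "centroid": (10.0, 10.0), "size": 100.0}
candidates = [
    {"id": "C", "centroid": (30.0, 30.0), "size": 90.0},
    {"id": "A", "centroid": (11.0, 10.0), "size": 55.0},
    {"id": "B", "centroid": (9.0, 11.0), "size": 50.0},
]

chosen = match_daughters(parent, candidates)
print([c["id"] for c in chosen])
```

Here the two nearby nuclei are accepted (summed size 105, within 10% of the parent size 100) and the distant third one is rejected, giving a one-to-many correspondence that the case analysis above would then classify as over-segmentation, ancestor overlap or splitting.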

4. CELL PHASE IDENTIFICATION

The performance of traditional feature based cell phase identification approaches depends heavily on the extracted features and the design of the classifiers. In this paper, we propose to improve cell phase identification by utilizing both feature information and contextual information in a continuous Hidden Markov Model (HMM). Figure 2 shows the four phases in our studied data.

Figure 2. A sampled cell cycle (cell with blue star). (a) inter phase; (b) prophase; (c) metaphase; (d) anaphase.

The occurrence of phases in a sequence of cells can be regarded as a stochastic process; hence, the cell sequence can be represented as a Markov chain in which the phases are the hidden states. The occurrence of the first phase in the sequence is characterized by the initial probability of the Markov chain, and the occurrence of each other phase, given the occurrence of its previous phase, is characterized by the transition probability. Given a set of training cell sequences, the initial and transition probabilities of the Markov chains representing the cell sequences are calculated. In addition, we assume that each hidden state generates continuous visible states that can be described by $R$ Gaussian mixtures. We optimize these Gaussian mixtures with the Expectation-Maximization (EM) algorithm. These estimated probabilities and the optimized Gaussian mixtures are regarded as a continuous Hidden Markov Model for the training sequences. The trained model is represented by a group of parameters,

$\Lambda = \{\Pi, A, \mu_{kr}, \Sigma_{kr}, c_{kr},\ k = 1, 2, \ldots, M,\ r = 1, 2, \ldots, R\}$,

where $\Pi$ stands for the initial probability of each phase and $A$ stands for the transition probabilities of the Markov model among phases. $\mu = \{\mu_{kr}\}$ are the means of the Gaussian mixtures, $\Sigma = \{\Sigma_{kr}\}$ are the corresponding covariance matrices, and $C = \{c_{kr}\}$ are the coefficients of the Gaussian mixtures. $M$ is the number of phases and $R$ is the number of Gaussian mixtures for each phase. Given training data, we optimize $\Lambda$ by the EM algorithm.

Suppose $X^t$ is the feature vector of cell $X$ in frame $t$. According to the continuous HMM, the probability that a cell $X^t$ belongs to phase $s_m$, i.e. $\theta_t = s_m$, should basically depend only on $X^t$ and $X^{t-1}$. Thus the probabilities we need for the categorization of cells can be denoted as $p(\theta_t = s_m \mid X^t, X^{t-1})$. We classify cell $X^t$ to phase $s_{m^*}$, i.e. $\theta_t = s_{m^*}$, if and only if

$s_{m^*} = \arg\max_{m = 1, 2, \ldots, M} \{ p(\theta_t = s_m \mid X^t, X^{t-1}) \}$.

From the biological point of view, anaphase cell nuclei in a sequence cannot last more than 3 frames. In addition, no cell nucleus can jump from inter phase to anaphase directly, or jump from inter phase to prophase and back without passing through metaphase or anaphase. We apply these rules after the HMM classifier to improve the classification performance. We call this approach HMM with RULE in this paper.

5. EXPERIMENTS

The data used in this paper are from a HeLa cell line. We imaged the cell nuclei every 15 minutes with a time-lapse fluorescence microscope. Our experiments were conducted on data collected over a period of 2 days. Both a MATLAB implementation and Windows based C++ software were developed. For an image with approximately 300 nuclei, the segmentation time of CellIQ was about 1.6 seconds on a Pentium IV 2.4 GHz computer, and less than 1 second was required to track a trace. Nuclei that left or disappeared from the field of view during the sequence were ignored. Figure 2 shows an intuitive example of segmentation and tracking.

5.1. Segmentation

To test the segmentation algorithm, 20 images were randomly chosen from the 192 images. This yields a test set of 5596 nuclei. Four approaches were tested: (1) Watershed, the simple watershed algorithm without fragment merging; (2) PDF merging; (3) the hybrid merging algorithm proposed in this paper; and (4) hybrid merging + context based correction. The results are shown in Figure 3.

Figure 3. Results of segmentation (vertical axis: 0.00% to 100.00%)

The ground truth was obtained by manual analysis. Without fragment merging, the watershed algorithm correctly segments only 87.79% of the nuclei. The widely used PDF merging correctly segments 92.94% of the cell nuclei, while our proposed hybrid merging algorithm correctly segments 96.62% of the nuclei. The accuracy is further improved to 98.12% after the context based correction.

5.2. Tracking

We tracked all the possible sequences in our dataset. In total, 585 traces were detected, and no cell nuclei were missed in the 585 sequences. In this paper we only considered the sequences of length 192 (full length sequences), of which there are 426 altogether. 95.77% of them were tracked entirely correctly.

5.3. Cell Phase Identification

We selected 50 full length sequences as training data, and all of the 192 × 50 = 9600 cell nuclei were labeled manually. Among the 9600 cell nuclei, 9154 were in inter phase, 58 in prophase, 267 in metaphase and 121 in anaphase. Another 50 sequences were used as test data to evaluate the performance of the continuous HMM for phase identification. Two approaches were used as baselines: a Maximum Likelihood (ML) classifier and a K Nearest Neighbor (KNN, K = 5, 7) classifier. Figure 4 shows the results of cell phase identification.
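Manually labeled sequences such as those described here are what the Markov chain part of the model in Section 4 is calculated from, and the biological rules of Section 4 are checked on predicted sequences. The sketch below illustrates both under stated assumptions: the initial and transition probabilities are taken to be simple relative frequencies, the rule check only flags violating frames (how violations are corrected is not specified in the text), and the phase names, function names and toy sequences are hypothetical. The Gaussian mixture emissions and EM training are not covered.

```python
from collections import Counter

PHASES = ["inter", "pro", "meta", "ana"]


def estimate_markov_params(sequences):
    """Relative-frequency estimates of initial and transition probabilities
    from manually labeled phase sequences."""
    first = Counter(seq[0] for seq in sequences)
    counts = {a: Counter() for a in PHASES}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    pi = {s: first[s] / len(sequences) for s in PHASES}
    trans = {}
    for a in PHASES:
        total = sum(counts[a].values())
        trans[a] = {b: (counts[a][b] / total if total else 0.0) for b in PHASES}
    return pi, trans


def rule_violations(seq, max_ana=3):
    """Return (frame index, reason) pairs for frames that break the rules."""
    issues = []
    run = 0  # length of the current run of anaphase frames
    for t, phase in enumerate(seq):
        run = run + 1 if phase == "ana" else 0
        if t > 0 and seq[t - 1] == "inter" and phase == "ana":
            issues.append((t, "direct jump from inter phase to anaphase"))
        if run == max_ana + 1:
            issues.append((t, "anaphase lasts more than %d frames" % max_ana))
        if t > 0 and seq[t - 1] == "pro" and phase == "inter":
            issues.append((t, "prophase back to inter phase without metaphase or anaphase"))
    return issues


# Two toy labeled training sequences (hypothetical).
training = [
    ["inter", "inter", "pro", "meta", "ana", "inter"],
    ["inter", "pro", "meta", "meta", "ana", "ana"],
]
pi, trans = estimate_markov_params(training)
print(pi["inter"], round(trans["inter"]["pro"], 3))

# A toy predicted sequence that breaks all three rules.
predicted = ["inter", "ana", "ana", "ana", "ana", "inter", "pro", "inter"]
for t, reason in rule_violations(predicted):
    print(t, reason)
```

Flagging rather than rewriting keeps the sketch neutral about the correction strategy; a rule-based post-processing step could, for example, relabel the flagged frames, but that choice is outside what the text specifies.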

Figure 4. Results of cell phase identification

It can be seen that the HMM outperforms all the other approaches considered in this paper, especially for the identification of prophase.

6. CONCLUSION AND FUTURE WORK

In this paper, an automated high-content screening system for dynamic cellular image analysis was proposed. We used the watershed algorithm for cell nuclei segmentation, and then combined a roughness score with a PDF score for fragment merging. Segmentation is followed by a distance and size based tracking procedure, and the fragment merging results were further improved using the context information from tracking. The proposed approach achieved 98.12% segmentation accuracy. Also based on the context information, the cell nuclei were accurately classified into different phases by a Hidden Markov Model, and the context based cell phase identification outperformed the ML and KNN baselines. The next step of this work is to extract attributes from vast volumes of time-lapse images of cancer cell lines under different drug perturbation conditions into a large cellular imaging database. We will then use our system to study the influence of various drug compounds on the mitotic process of cancer cells in order to identify effective lead candidates for anti-mitotic cancer drugs.

7. ACKNOWLEDGEMENT

This research is funded by the HCNR Center for Bioinformatics Research Grant, Harvard Medical School, and NIH grant R01 LM008696 to S.T.C. Wong.

8. REFERENCES

[1] F. Meyer, “Iterative Image Transformations for an Automatic Screening of Cervical Cancer.” Histochemistry and Cytochemistry, vol. 27, pp. 128-135, 1979.
[2] G. Gallardo, F. Lanzini, M. A. Mackey, Sonka, and F. Yang, “Mitotic Cell Recognition with Hidden Markov Models.” Medical Imaging, vol. 5367, pp. 661-668, 2004.
[3] G. Lin, U. Adiga, K. Olson, J. F. Guzowski, C. A. Barnes, and B. Roysam, “A Hybrid 3-D Watershed Algorithm Incorporating Gradient Cues & Object Models for Automatic Segmentation of Nuclei in Confocal Image Stacks.” Cytometry Part A, vol. 56A, pp. 23-36, 2003.
[4] X. Chen, X. Zhou, and S. T. C. Wong, “Automated Segmentation, Classification and Tracking of Cancer Cell Nuclei in Time-lapse Microscopy.” To appear in IEEE Transactions on Biomedical Engineering, 2006.
[5] N. Malpica, C. O. d. Solorzano, J. J. Vaquero, A. s. Santos, I. Vallcorba, J. M. Garcia-Sagredo, and F. d. Poze, “Applying Watershed Algorithms to the Segmentation of Clustered Nuclei.” Cytometry, vol. 28, pp. 289-297, 1997.
[6] O. Debeir, I. Camby, R. Kiss, P. V. Ham, and C. Decaestecker, “A Model-Based Approach for Automated In Vivo Cell Tracking and Chemotaxis Analyses.” Cytometry, vol. 60A, pp. 29-40, 2004.
[7] P.K. Sahoo, S. Soltani, A.K.C. Wong, and Y. Chen, “A Survey of Thresholding Techniques.” Computer Vision Graphics Image Processing, vol. 41, pp. 233-260, 1998.
[8] X. Zhou and X. Wang, “Optimization of Gaussian Mixture Model for Satellite Image Classification.” Department of Electrical Engineering, Texas A&M University, 1998.

LMAM, School of Mathematical Science, Peking University, Beijing, P.R. China, 100871 2 HCNR-Center for Bioinformatics, Harvard Medical School, Boston, MA 02115 3 Molecular Imaging Center, Brigham & Women’s Hospital/HMS, Boston, MA 02115 4 Microsoft Research Asia, 49 Zhichun Road, Beijing, P.R. China, 100871 ABSTRACT

The lacking of automatic screen systems that can deal with large volume of time-lapse optical microscopy imaging is a bottleneck of modern bio-imaging research. In this paper, we propose an effective automated analytic system that can be used to acquire, track and analyze cell-cycle behaviors of a large population of cells. We use traditional watershed algorithm for cell nuclei segmentation and then a novel hybrid merging method is proposed for fragments merging. After a distance and size based tracking procedure, the performance of fragments merging is improved again by the sequence context information. At last, the cell nuclei can be classified into different phases accurately in a continuous Hidden Markov Model (HMM). Experimental results show the proposed system is very effective for cell sequence segmentation, tracking and cell phase identification.

problem, we propose a novel hybrid fragments merging algorithm based on the traditional watershed segmentation [5]. Our hybrid merging algorithm classifies the features of cells into two classes: features with trend and features with no trend. And then we propose to compute scores of fragments based on the two classes of features respectively. At last, it combines these two kinds of scores to make the final decision. We also improved the segmentation results in the tracking procedure by contextual information. After distance and size based tracking, the cell phase identification is conducted based on a continuous Hidden Markov Model (HMM) [2, 8] which utilizes both feature information and contextual information. Pre-processing Segmentation

1. INTRODUCTION Time-lapse fluorescence microscopy is attracting more and more attention nowadays. Its significance surges mainly because of its potential in achieving new and high throughput ways to conduct drug discovery and quantitative cellular studies [3, 4]. However, a bottleneck of time-lapse fluorescence microscopy research and application is the lacking of automated high content screen system for optical microscopy time-lapse imaging analysis. In this paper, we propose a novel automated analytic screen system that can be used to acquire [5], track [6] and analyze [4] cell-cycle behaviors of a population of cells effectively. In the sequence of inputting time-lapse fluorescence microscopy images, our system includes four key steps: image preprocessing; cell nuclei segmentation and fragments merging; cell nuclei tracking; and finally cell phase identification. Fig. 1 shows a flow chart of our proposed system in this paper. In image preprocessing procedure, image enhancement, adaptive threshold, morphological filtering and distance transformation are conducted. Regarding the segmentation

Hybrid merging Tracking Yes Ambiguous?

Merging correction

No Phase identification Figure 1. System overview The rest of this paper is organized as follows. In section 2, we give a summary of image preprocessing and introduce the novel nuclei segmentation approach. In section 3, after a short introduction of distance and size based tracking, we discuss how to improve the segmentation accuracy by using contextual information. In section 4, the continuous HMM is used for the context based cell phase identification. In section 5, we show our experimental results. In Section 6, we list our conclusion and future works.

2. IMAGE PREPRECESSING AND SEGMENTATION Before segmentation we preprocess images to remove noise, discard undesirable features, and correct illumination artifacts. We firstly enhance the images by using the algorithm proposed in [1] with a radius 4 disk-shaped flat structuring element. And then adaptive threshold [7] is used in our system to achieve the binary images. After that, images were processed with a morphological opening filter to eliminate small islands and holes. At last, a distance transformation was conducted for better segmentation.

cell should be close to 1. However, the single feature based approaches ignores the information of other effective features. In this paper, we propose to classify the features into two classes: (1) features with trend such as the “roughness” and “f” in previous example. These features imply some assumptions; (2) features with no trend. If we can not tell a feature has trend or not, we classify this feature to features with no trend. We then compute the PDF scores of fragments based features with no trend and compute a score based on a feature with trend. At last, we combine the two scores based on the two different classes of features. We select roughness to compute a score as the feature with trend in this paper. In other words, our hybrid merging algorithm combines the PDF score on features with no trend and

After preprocessing, the common used watershed segmentation algorithm [5] is conducted. However, it introduces serious over-segmentation problem. To merge the fragments generated by watershed algorithm, we transformed all the fragments of an image into a vector space. 12 features are involved in this paper [4]. We represent all the fragments in frame t by a n × d matrix X, where n is the number of fragments in frame t and d is the number of features.

roughness score. Suppose 0 ≤ S Ri = exp{− ( S Ri − 1)} ≤ 1

Probability Density Function (PDF) merging is one of the most popularly used merging approaches in feature space. It computes a score S pi = ϕ ( xi , µ , Σ ) for each fragment xi ,

(c) shows the result of hybrid merging.

where ϕ is a Gaussian kernel, µ and Σ stand for the mean and covariance matrix of training data respectively. It selects the fragments with small scores, i.e. fragments far away from training data, to merge. However, it is not always true that a real cell should be close to the training data in some specific features. As an example, to the feature f = perimeter/perimeter of convex hull , it is obvious that we should assume that if a fragment i is a cell,

f i should be

close to 1. However, the feature f of training data are always larger than 1. Then mistakes are introduced by PDF merging algorithm. For instance, in our experiments the mean value of the training data in feature f is 1.2, if the f features of two fragments are 1.05 and 1.15 respectively, which one has a larger probability to be a cell? If we must choose one to merge, the traditional PDF merging will choose the first one and intuitively we should choose the second one to merge. In contrast to the PDF merging which utilizes all the features to compute scores of fragments, another category of approaches such as roughness merging [4] are single feature based algorithms. For instance, roughness merging computes a score S Ri = p / 4π Ai for each fragment xi 2 i

. As a criterion,

we merge fragment i and j if S Rij < S Ri and S Rij < S Rj , where S Rij is the roughness score of a fragment consisted of fragment i and fragment j. Most of the single feature based algorithms are based on the fact that the features imply assumptions. For example, the roughness merging assumes that the cell should be as circular as possible, i.e. S Ri of a real

,

S pi = S pi /( S pi + S pj + S pij ) , our hybrid score is represented as S i = w1 S Ri + w2 S pi , where w1 and w2 are weights which could

be computed from training data by regression. We merged cell i and cell j if and only if S Rij > S Ri and S Rij > S Rj . Figure 2

3. TRACKING After the watershed segmentation and the hybrid merging procedure, we match the correspondence between nuclei at time t and t+1 by computing the distance between them and the size of them. By our experiments, the matching results are not sensitive to the definition of distance. We use the Euclidean distance of the centroid as the distance between cells. Once we get the distance matrix, we can match the cell nuclei in frame t and t+1 based on the distance and their sizes. Suppose x (t ) is the i i

th

cell in frame t, we scan the

adjacent matrix to find all possible candidates to be its daughter cells and the candidates at frame t+1 are added one by one according to their distance to the nucleus at time t. A nucleus with the smallest distance is added first and the largest one last. Each time when a nucleus is added, the sum size is compared with the size of x (t ) . If the sum size is i

more than 10% larger than the size of x (t ) , we stop and i

discard the latest one. We choose 10% as the threshold since the area of a cell cannot increase more than 10% in the next frame in accordance to our observation. We ignore all the cells close to the border of an image. If a many to one correspondence is found, it should be overlap of cells, we record the position and size information of them. If a one to many correspondence is found, if the “many” cells in frame t+1 share borders and the sum of their size is close to the size of their parent cell, they may lead by over segmentation and we merge them; if their ancestor have many to one case, this one to many correspondence should be led by the

overlap of their ancestor, we match them by the recorded position and size information; otherwise we recognize this case as a cell splitting.

the categorization of cells could be denoted as p (θt = sm | X t , X t −1 ) .we classify cell X to phase sm* , i.e.

4. CELL PHASE IDENTIFICATION

sm* = arg max{ p (θ t = sm | X t , X t −1 )} .

t

θt

= sm* if and only if, m =1,2 ,

The performance of traditional feature based cell phase identification approaches highly depends on the extracted features and the design of classifiers. In this paper, we propose to improve cell phase identification performance by utilizing both the information of features and the contextual information in continuous Hidden Markov Model (HMM). Figure 2 shows four phases of our studied data.

M

From the biological point of view, the anaphase cell nuclei in a sequence cannot last more than 3 frames. In addition, no cell nucleus can jump from inter phase to anaphase directly or jump to prophase from inter phase and jump back without metaphase or anaphase. We apply this rule after the HMM classifier to improve the classification performance. We call it HMM with RULE in this paper. 5. EXPERIMENTS

(a) (b) (c) (d) Fig 2. A sampled cell cycle (cell with blue star). (a), inter phase; (b), prophase; (c), metaphase; (d), anaphase. The occurrence of phases in a sequence of cells can be regarded as a stochastic process; hence, the cell sequence can be represented as a Markov chain where phases are in hidden states. The occurrence of the first phase in the sequence is characterized by the initial probability of the Markov chain and the occurrence of the other phases, which was given by the occurrence of its previous phase, is characterized by the transition probability. Given a set of training cell sequences, the initial and the transition probabilities for Markov chains representing the cell sequences are calculated. In addition, we assume each hidden state can generate a group of continuous visible states which can be described by R Gaussian Mixtures. We optimize these Gaussian mixtures by ExpectationMaximization (EM) algorithm. Those initialized probabilities and the optimized Gaussian mixtures are regarded as a continuous Hidden Markov Model for the training sequences. And then the trained model is represented by a group of parameters, Λ = {Π , A, µ kr , Σ kr , ckr , k = 1, 2, M , r = 1, 2, R} where Π stands for the initial probability of each phase, A stands for the transition probability of Markov model among phases. µ = {µ kr } represent the mean of Gaussian mixtures and

={

kr

} are the corresponding covariance

matrix. C = {C kr } are coefficients of Gaussian mixtures. M is the number of Gaussian Mixtures for each phase. Given training data, we optimize Λ by EM algorithm. Suppose X is the feature vector of cell X in frame t

The data used in this paper are HeLa cells line. We took pictures of the cell nuclei every 15 minutes with a time-lapse fluorescence microscopy. Our experiments were conducted on the data over a period of 2 days. Both matlab platform and Windows based C++ software are developed. For an image with approximately 300 nuclei, the segmentation time by CellIQ was about 1.6 seconds on a Pentium IV 2.4GHz computer, note that less than 1 second was required to track a trace. An exception is that nuclei that left or disappeared in the field of view during the whole sequence were ignored. Figure 2 shows an intuitive example of segmentation and tracking. 5.1. Segmentation

To test the segmentation algorithm, 20 images were randomly chosen from the 192 images. This generates a test set consisting of 5596 nuclei. Four approaches are tested, (1) Watershed, simple watershed algorithm without fragments merging, (2) PDF merging, (3) Hybrid merging algorithm proposed in this paper and (4) Hybrid merging + Context based correction. The results are shown in figure 3.

100.00% 90.00% 80.00% 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% 0.00%

t

According to continuous HMM, the probability of a cell X

t

belongs to phase sm , i.e. θ = sm , should only basically t

depend on X and X . Thus the probabilities we need for t

t −1

Figure 3. Results of segmentation

The ground truth is given by manual analysis. It can be seen that without fragment merging, the watershed algorithm can only correctly segment 87.79% of the nuclei. The popularly used PDF merging can correctly segment 92.94% of cell nuclei. On the other hand, our proposed hybrid merging algorithm can correctly segment 96.62% of the nuclei. In addition, the accuracy could be improved to 98.12% after the context based correction. 5.2. Tracking

We tracked all the possible sequences in our dataset. In total, 585 traces were detected, and no cell nuclei were missed in these 585 sequences. In this paper we consider only the sequences of length 192 (full-length sequences), of which there are 426. Of these, 95.77% were tracked correctly throughout.

5.3. Cell Phase Identification

We selected 50 full-length sequences as training data, and all 192 × 50 = 9600 cell nuclei in them were labeled manually. Among the 9600 cell nuclei, 9154 were in interphase, 58 in prophase, 267 in metaphase, and 121 in anaphase. Another 50 sequences were used as test data to evaluate the continuous HMM for phase identification. Two baseline classifiers were used: a Maximum Likelihood (ML) classifier and a K-Nearest-Neighbor classifier (KNN, K = 5, 7). Figure 4 shows the results of cell phase identification.
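To illustrate how sequence context can change a per-frame decision, the sketch below runs standard Viterbi decoding over the four phases and compares it with a frame-by-frame maximum-likelihood choice. It is a generic HMM decoder under stated assumptions, not the authors' implementation: the transition, initial, and emission values are made-up toy numbers, and the per-frame emission likelihoods stand in for the Gaussian mixture densities of the continuous HMM.

```python
import math

PHASES = ["interphase", "prophase", "metaphase", "anaphase"]


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(log_emis, log_trans, log_init):
    """Most likely phase index per frame.

    log_emis[t][m]  : log-likelihood of frame t's features under phase m
    log_trans[k][m] : log-probability of moving from phase k to phase m
    log_init[m]     : log-probability of starting in phase m
    """
    n_states = len(log_init)
    score = [log_init[m] + log_emis[0][m] for m in range(n_states)]
    back = []  # back[t-1][m] = best predecessor of phase m at frame t
    for t in range(1, len(log_emis)):
        new_score, ptr = [], []
        for m in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda k: score[k] + log_trans[k][m])
            new_score.append(score[best_prev] + log_trans[best_prev][m]
                             + log_emis[t][m])
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)
    # Trace the best final state back to the first frame.
    state = max(range(n_states), key=lambda m: score[m])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy parameters (hypothetical): phases may only stay or advance in cycle order.
trans = [[0.9, 0.1, 0.0, 0.0],
         [0.0, 0.5, 0.5, 0.0],
         [0.0, 0.0, 0.5, 0.5],
         [0.5, 0.0, 0.0, 0.5]]
init = [0.97, 0.01, 0.01, 0.01]
# Toy emission likelihoods for 4 frames; frame 1 slightly favors interphase.
emis = [[0.90, 0.05, 0.03, 0.02],
        [0.45, 0.35, 0.10, 0.10],
        [0.10, 0.10, 0.70, 0.10],
        [0.10, 0.10, 0.10, 0.70]]

log_trans = [[safe_log(p) for p in row] for row in trans]
log_init = [safe_log(p) for p in init]
log_emis = [[safe_log(p) for p in row] for row in emis]

path = viterbi(log_emis, log_trans, log_init)
framewise = [max(range(len(PHASES)), key=lambda m: row[m]) for row in emis]
print("context-free:", [PHASES[i] for i in framewise])
print("with context:", [PHASES[i] for i in path])
```

In this toy example the frame-by-frame choice labels the second frame as interphase, while the decoder that uses the transition structure labels it as prophase, because metaphase in the following frame can only be reached through prophase. This kind of context effect is consistent with the gain reported for prophase, the rarest class in the training data.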

Figure 4. Results of cell phase identification

The HMM outperforms all the other approaches considered in this paper, especially in the identification of prophase.

6. CONCLUSION AND FUTURE WORKS

In this paper, an automated high-content screening system for dynamic cellular image analysis was proposed. We used the watershed algorithm for cell nuclei segmentation, and then combined a roughness score with a PDF score for fragment merging. The proposed approach achieved 98.12% segmentation accuracy. Segmentation is followed by a distance- and size-based tracking procedure, and the performance of fragment merging was further improved by the context information from tracking. Also based on the context information, the cell nuclei were classified into different phases accurately by a Hidden Markov Model. The context-based cell phase identification showed superior performance. The next step of this work is to extract attributes from large volumes of time-lapse images of cancer cell lines under different drug perturbation conditions into a large cellular imaging database. We will then use our system to study the influence of various drug compounds on the mitotic process of cancer cells, in order to identify effective lead candidates for anti-mitotic cancer drugs.

7. ACKNOWLEDGEMENT

This research is funded by the HCNR Center for Bioinformatics Research Grant, Harvard Medical School, and an NIH R01 LM008696 grant to S. T. C. Wong.

8. REFERENCES

[1] F. Meyer, “Iterative Image Transformations for an Automatic Screening of Cervical Cancer.” Histochemistry and Cytochemistry, vol. 27, pp. 128-135, 1979.

[2] G. Gallardo, F. Lanzini, M. A. Mackey, Sonka, and F. Yang, “Mitotic Cell Recognition with Hidden Markov Models.” Medical Imaging, vol. 5367, pp. 661-668, 2004.

[3] G. Lin, U. Adiga, K. Olson, J. F. Guzowski, C. A. Barnes, and B. Roysam, “A Hybrid 3-D Watershed Algorithm Incorporating Gradient Cues & Object Models for Automatic Segmentation of Nuclei in Confocal Image Stacks.” Cytometry Part A, vol. 56A, pp. 23-36, 2003.

[4] X. Chen, X. Zhou, and S. T. C. Wong, “Automated Segmentation, Classification and Tracking of Cancer Cell Nuclei in Time-lapse Microscopy.” To appear in IEEE Transactions on Biomedical Engineering, 2006.

[5] N. Malpica, C. O. d. Solorzano, J. J. Vaquero, A. s. Santos, I. Vallcorba, J. M. Garcia-Sagredo, and F. d. Poze, “Applying Watershed Algorithms to the Segmentation of Clustered Nuclei.” Cytometry, vol. 28, pp. 289-297, 1997.

[6] O. Debeir, I. Camby, R. Kiss, P. V. Ham, and C. Decaestecker, “A Model-Based Approach for Automated In Vivo Cell Tracking and Chemotaxis Analyses.” Cytometry, vol. 60A, pp. 29-40, 2004.

[7] P. K. Sahoo, S. Soltani, A. K. C. Wong, and Y. Chen, “A Survey of Thresholding Techniques.” Computer Vision, Graphics, and Image Processing, vol. 41, pp. 233-260, 1988.

[8] X. Zhou and X. Wang, “Optimization of Gaussian Mixture Model for Satellite Image Classification.” Department of Electrical Engineering, Texas A&M University, 1998.