Genome Informatics 11: 257–259 (2000)


A Novel Method for Identification of Genes Contributing to the Pathological Classification Using cDNA Microarray

Koji Kadota (1,3), Yasushi Okazaki (1,2), Shugo Nakamura (3), Hiroshi Shimada (4), Kentaro Shimizu (3), Yoshihide Hayashizaki (1,2)

(1) Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
(2) Genome Science Laboratory, RIKEN Tsukuba Institute, 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan
(3) Department of Biotechnology, University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
(4) School of Medicine, Surgery-II, Yokohama City University, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa 236-0004, Japan

Keywords: cDNA microarray, histological classification, gene selection, colon cancer

1 Introduction

There are several pathological differences among colon cancers, such as (1) liver metastasis (+, −), (2) lymph node metastasis (+, −), and (3) depth of invasion, and these differences are not always recognizable by conventional histological testing. Genes whose expression differs between such pathological classes could be used to assign tumors to separate classes and to guide the choice and dosage of anticancer drugs. However, there is as yet no general approach for selecting such genes [1]. Here, we propose a method for retrieving such genes from cDNA microarray experiments. We applied this method to a gene expression matrix of 21,168 genes × 22 specimens to identify genes contributing to the three pathological differences mentioned above. As a result, we identified tens to hundreds of genes whose expression patterns clearly separate the specimens into two groups by hierarchical clustering.
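The final step above, separating specimens into two groups by hierarchical clustering on the selected genes, can be sketched in Python with SciPy. This is a minimal illustration under assumptions the text does not state: specimens are compared by 1 − Pearson correlation (in line with the correlation scale described for Figure 1) and merged by average linkage. The function `cluster_specimens` and the example data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_specimens(expr, n_clusters=2):
    """Hierarchically cluster the specimens (columns) of a genes x specimens matrix.

    Assumptions: distance = 1 - Pearson r between specimen profiles, merged by
    average linkage. Returns a cluster id in 1..n_clusters for each specimen.
    """
    # linkage() treats rows as observations, so pass specimens as rows.
    tree = linkage(expr.T, method="average", metric="correlation")
    # Cut the tree into (at most) n_clusters top-level branches.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical data: 50 selected genes, 4 "slight" and 3 "serious" specimens.
rng = np.random.default_rng(0)
pattern = rng.normal(size=50)
slight = pattern[:, None] + 0.3 * rng.normal(size=(50, 4))
serious = -pattern[:, None] + 0.3 * rng.normal(size=(50, 3))
expr = np.hstack([slight, serious])
labels = cluster_specimens(expr)
print(labels)
```

Which branch receives id 1 or 2 is arbitrary; only the grouping matters.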

2 Materials and Methods

Materials: The 21,168 human cDNAs used in this study were obtained from Research Genetics. Colon cancer samples of various kinds were labeled with Cy-3 dye, and a mixture of normal colon tissues from 3 patients was used as the reference and labeled with Cy-5. To obtain reproducible data, we performed the experiment twice under identical conditions. The values of the gene expression matrix were preprocessed with the PRIM filtration method [2].

Methods: Our method for gene identification consists of two steps: (1) the first selection and (2) the second selection. The first selection retrieves genes for which the average correlation coefficient over all pairs of specimens in the same group (Rs) exceeds the average correlation coefficient over all pairs of specimens in different groups (Ra) by more than 0.3 (Rs − Ra > 0.3). In the second selection, we focused on a specimen that was unexpectedly clustered with specimens of the other group. We then extracted genes whose omission gave a lower average R value within the same group and a higher average R value with the other group.
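The two selection steps leave some details open, notably how a correlation coefficient is obtained for an individual gene. The Python sketch below adopts one possible reading, stated here as an assumption: R is the Pearson correlation between two specimens' expression profiles over a given gene set, Rs and Ra average R over same-group and different-group specimen pairs, the 0.3 threshold is checked on a candidate gene set, and the second selection scores each gene by how its omission changes a flagged specimen's average R with its own group and with the other group. All function names and data are hypothetical.

```python
from itertools import combinations

import numpy as np


def group_correlations(expr, labels):
    """Return (Rs, Ra) for a genes x specimens matrix.

    Assumption: R is the Pearson correlation between two specimens' profiles
    across the genes (rows) of expr; Rs averages same-group pairs and Ra
    averages different-group pairs.
    """
    r = np.corrcoef(expr, rowvar=False)  # specimens x specimens
    same, diff = [], []
    for i, j in combinations(range(len(labels)), 2):
        (same if labels[i] == labels[j] else diff).append(r[i, j])
    return float(np.mean(same)), float(np.mean(diff))


def passes_first_selection(expr, labels, threshold=0.3):
    """Criterion Rs - Ra > threshold, evaluated on a candidate gene set."""
    rs, ra = group_correlations(expr, labels)
    return rs - ra > threshold


def second_selection(expr, labels, target):
    """Genes whose omission lowers the target specimen's mean R with its own
    group and raises its mean R with the other group (one possible reading)."""
    labels = np.asarray(labels)
    n_genes = expr.shape[0]
    not_target = np.arange(len(labels)) != target
    own = not_target & (labels == labels[target])
    other = not_target & (labels != labels[target])

    def mean_r(rows):
        # Correlations of the target specimen with every specimen, using only `rows`.
        r = np.corrcoef(expr[rows], rowvar=False)[target]
        return r[own].mean(), r[other].mean()

    base_own, base_other = mean_r(np.ones(n_genes, dtype=bool))
    picked = []
    for g in range(n_genes):
        keep = np.ones(n_genes, dtype=bool)
        keep[g] = False  # omit gene g
        own_r, other_r = mean_r(keep)
        if own_r < base_own and other_r > base_other:
            picked.append(g)
    return picked


# Hypothetical data: specimens 0-2 in group 0, 3-5 in group 1, and specimen 6
# labelled group 0 but resembling it only through gene 0.
rng = np.random.default_rng(1)
pattern = rng.normal(size=40)
pattern[0] = 8.0
group_a = pattern[:, None] + 0.2 * rng.normal(size=(40, 3))
group_b = -pattern[:, None] + 0.2 * rng.normal(size=(40, 3))
flagged = rng.normal(size=40)
flagged[0] = 8.0
expr = np.hstack([group_a, group_b, flagged[:, None]])
labels = [0, 0, 0, 1, 1, 1, 0]
print(passes_first_selection(expr[:, :6], labels[:6]))
print(second_selection(expr, labels, target=6))
```

Because gene 0 is the only gene linking specimen 6 to its group in this toy data, omitting it should lower that specimen's within-group R and raise its R with the other group, so gene 0 is expected among the picked genes.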


3 Results and Discussion

We applied the procedure to identify candidate genes in three cases: (1) liver metastasis (+, 5 specimens; −, 17 specimens), (2) lymph node metastasis (+, 5; −, 7), and (3) depth of invasion (slight, 9; serious, 5). We identified tens to hundreds of genes in cases (2) and (3). Hierarchical clustering with the selected genes split the specimens into two large dendrogram branches, supporting the feasibility of the method; for depth of invasion, for instance, one large cluster was derived from 'serious' specimens and the other from 'slight' specimens. As shown in Figure 1, where the specimens were clustered using genes selected for depth of invasion, one 'serious' specimen remained in the other cluster after the first selection. After the second selection, however, the two branches separated the specimens correctly.

In Figure 2, where the specimens were clustered using genes selected for liver metastasis, cancers 6, 12, 13 and 14 clustered together with liver 1–5. This may indicate that cancers 6, 12, 13 and 14 have undetected liver metastasis or may later be diagnosed with liver metastasis; these possibilities remain to be tested. Such diagnoses are very important when deciding which anticancer drugs to prescribe. The feasibility of the method for identifying genes that contribute to the classification of lymph node metastasis was also tested using the jack-knife test. The more accurately we can diagnose, the greater the benefit [3]. This method has the potential to be widely used for identifying genes that contribute to classifying samples by specific phenotypes.
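The jack-knife test is not described in detail here. One common form is leave-one-out: each specimen is held out in turn and assigned to the group whose remaining specimens it correlates with best on average over the selected genes. The Python sketch below assumes that form and keeps the gene set fixed; a stricter test would repeat the gene selection without the held-out specimen. The function `jackknife_accuracy` and the data are hypothetical.

```python
import numpy as np


def jackknife_accuracy(expr, labels):
    """Leave-one-out accuracy of assigning each specimen to the group with the
    highest mean Pearson correlation over the given (fixed) gene set.

    expr: genes x specimens matrix; labels: one group label per specimen.
    Every group needs at least two specimens so that a held-out specimen
    still has group members to compare with.
    """
    labels = np.asarray(labels)
    r = np.corrcoef(expr, rowvar=False)  # specimen-by-specimen correlations
    idx = np.arange(len(labels))
    correct = 0
    for i in idx:
        rest = idx != i  # hold out specimen i
        scores = {g: r[i, rest & (labels == g)].mean() for g in np.unique(labels)}
        predicted = max(scores, key=scores.get)
        correct += int(predicted == labels[i])
    return correct / len(labels)


# Hypothetical data shaped like case (2): 5 lymph-node-positive and
# 7 negative specimens over 60 selected genes.
rng = np.random.default_rng(2)
pattern = rng.normal(size=60)
positive = pattern[:, None] + 0.5 * rng.normal(size=(60, 5))
negative = -pattern[:, None] + 0.5 * rng.normal(size=(60, 7))
expr = np.hstack([positive, negative])
labels = ["+"] * 5 + ["-"] * 7
print(jackknife_accuracy(expr, labels))
```

Precomputing the correlation matrix is safe here because the correlation between two specimens does not depend on the held-out one; any information leak would come from gene selection, which this sketch does not repeat.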

Figure 1: Clustering results of selected genes for depth of invasion: (a) after the first selection and (b) after the second selection. The scale to the right of the tree depicts the correlation coefficient, represented by the length of the dendrogram branches connecting pairs of nodes.

Figure 2: Clustering results of selected genes for liver metastasis: (a) after the first selection and (b) after the second selection.

References

[1] Alizadeh, A. A., Eisen, M. B., Davis, R. E., et al., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403:503–511, 2000.


[2] Kadota, K., Miki, R., Bono, H., Shimizu, K., Okazaki, Y., and Hayashizaki, Y., PRIM (Preprocessing Implementation for Microarray): an efficient method for processing cDNA microarray data, Physiological Genomics, (in press).

[3] Young, R. A., Biomedical discovery with DNA arrays, Cell, 102:9–15, 2000.