04-3155 Research Article

Protein Expression Profiling Identifies Subclasses of Breast Cancer and Predicts Prognosis

Jocelyne Jacquemier (1, 2), Christophe Ginestier (1), Jacques Rougemont (6), Valérie-Jeanne Bardou (3), Emmanuelle Charafe-Jauffret (1, 2, 7), Jeannine Geneix (1), José Adélaïde (1), Alane Koki (8), Gilles Houvenaeghel (4), Jacques Hassoun (2, 7), Dominique Maraninchi (5, 7), Patrice Viens (5, 7), Daniel Birnbaum (1), and François Bertucci (1, 5, 7)


(1) Institut de Cancérologie de Marseille, Département d'Oncologie Moléculaire, (2) BioPathologie, (3) BioStatistiques, (4) Chirurgie, and (5) Oncologie Médicale et Investigation Clinique, Institut Paoli-Calmettes and UMR599 Institut National de la Santé et de la Recherche Médicale; (6) ERM206 Institut National de la Santé et de la Recherche Médicale; (7) Université de la Méditerranée, UFR de Médecine; and (8) Ipsogen S.A., Marseille, France

Abstract

Breast cancer is a heterogeneous disease whose evolution is difficult to predict using classic histoclinical prognostic factors. Prognostic classification can benefit from molecular analyses such as large-scale expression profiling. Using immunohistochemistry on tissue microarrays, we have monitored the expression of 26 selected proteins in more than 1,600 cancer samples from 552 consecutive patients with early breast cancer. Both an unsupervised approach and a new supervised method were used to analyze these profiles. Hierarchical clustering identified relevant clusters of coexpressed proteins and clusters of tumors. We delineated protein clusters associated with the estrogen receptor and with proliferation. Tumor clusters correlated with several histoclinical features of samples, including 5-year metastasis-free survival (MFS), and with the recently proposed pathophysiologic taxonomy of the disease. The supervised method identified a set of 21 proteins whose combined expression significantly correlated with MFS in a learning set of 368 patients (P < 0.0001) and in a validation set of 184 patients (P < 0.0001). Among the 552 patients, the 5-year MFS was 90% for patients classified in the ‘‘good-prognosis class’’ and 61% for those classified in the ‘‘poor-prognosis class’’ (P < 0.0001). This difference remained significant when patients were stratified by lymph node or estrogen receptor status, or by type of adjuvant systemic therapy. In multivariate analysis, the 21-protein set was the strongest independent predictor of clinical outcome. These results show that protein expression profiling may be a clinically useful approach to assess breast cancer heterogeneity and prognosis in stage I, II, or III disease. (Cancer Res 2005; 65(3): 1-13)

Introduction

Adjuvant systemic therapy has a favorable impact on survival in patients with breast cancer (1, 2). Despite the establishment of standardized histoclinical criteria (consensus conferences of NIH and St-Gallen; refs. 3, 4), decisions on whether to give adjuvant chemotherapy to patients with node-negative cancer are currently made with scant information on the risk of metastatic relapse. In addition, it remains difficult to identify, among patients who receive chemotherapy, those who will and those who will not benefit from standard anthracycline-based protocols. Large-scale molecular techniques such as DNA microarrays contribute to understanding the molecular complexity of breast cancer (5). Several studies have shown the potential clinical utility of gene expression profiles, including the identification of prognostic subclasses (6–15). Their clinical impact must subsequently be evaluated in larger studies, followed by the development of gene expression–based diagnostics adapted to the clinical setting. The cost, complexity, and interpretation of DNA microarrays currently make them unsuitable for routine use in standard clinical settings. Their sensitivity, specificity, reproducibility, and technical feasibility outside large academic centers have to be addressed, experimental conditions have to be standardized, and data have to be compared in multicenter clinical trials.

Alternative high-throughput approaches such as tissue microarrays (TMA; refs. 16–19) provide additional opportunities to identify and/or validate molecular signatures. The technique can be coupled to immunohistochemistry to study hundreds of specimens simultaneously. Immunohistochemistry is applicable to paraffin-embedded samples, avoiding the requirement for frozen specimens, and it is relatively inexpensive, straightforward, and well established in standard clinical pathology laboratories. Thus, immunohistochemistry on TMA may be a practical approach both in validation studies and in routine testing. However, analytic methods to efficiently process multiple-target immunohistochemistry data have not been previously developed. Most such studies have applied unsupervised hierarchical clustering (20–26), and only one has addressed prognosis in breast cancer (27). Supervised analysis based on the Cox regression model was recently applied to other cancers (28, 29).

Using immunohistochemistry and TMA, we have analyzed the expression of 26 proteins, selected for their relevance in breast cancer and the availability of a corresponding antibody, in a retrospective panel of more than 1,600 cancer samples from 552 patients with early breast cancer. Samples were first classified from this multidimensional data set using classic hierarchical clustering. We then developed a supervised method that further improved the prognostic classification.


Note: J. Jacquemier and C. Ginestier contributed equally to this work and should be considered as first authors. Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/). Requests for reprints: Daniel Birnbaum, UMR599 Institut National de la Santé et de la Recherche Médicale, 27 Boulevard Leï Roure, 13009 Marseille, France. Phone: 33-4-9175-84-07; Fax: 33-4-91-26-03-64; E-mail: [email protected]. ©2005 American Association for Cancer Research.

www.aacrjournals.org


Cancer Res 2005; 65: (3). February 1, 2005

Cancer Research

Materials and Methods

Patients and Samples

More than 1,600 breast tumor specimens were studied using TMAs. They represented invasive adenocarcinomas from 552 consecutive patients with early (stage I, II, or III) breast cancer treated at our institution between October 1987 and December 1999 and with sufficient tissue available for TMA. The stage of disease was defined according to the tumor-node-metastasis classification (Union Internationale Contre le Cancer, 5th edition). Histologic types included ductal (70%), lobular (13%), mixed (4%), tubular (8%), medullary (1%), and other (4%) carcinomas. Patient characteristics are summarized in Supplementary Table 1. The median age of patients was 60 years (range, 25 to 94). Women were treated according to our institutional guidelines: all had primary surgery that included complete resection of the tumor (modified radical mastectomy, 28%; lumpectomy, 72%) and axillary lymph node dissection; 96% (including all patients treated with breast-conserving surgery) received adjuvant locoregional radiotherapy; 47% received adjuvant chemotherapy (an anthracycline-based regimen in most cases), and 42% received adjuvant hormone therapy (tamoxifen in most cases). After completion of treatment, patients were evaluated at least twice per year for the first 5 years and at least annually thereafter. The median follow-up was 57 months (range, 2 to 182) after diagnosis for the 450 patients who did not experience metastatic relapse as a first event and 37 months (range, 4 to 151) for the 102 patients with metastasis as a first event. The 5-year metastasis-free survival (MFS) rate was 80% [95% confidence interval (CI), 76.2-83.7]. This study was approved by, and conducted in compliance with, our institutional review board.

Tissue Microarrays

TMAs were prepared as previously described (30). For each case, three representative tumor areas were carefully selected from a hematoxylin-eosin-saffron–stained section of a donor block. Core cylinders (0.6 mm in diameter) were punched from each area and deposited into three separate recipient paraffin blocks using a specific arraying device (Beecher Instruments, Silver Spring, MD). In addition to tumors, the recipient blocks also received internal controls, including 10 normal breast tissue samples from 10 healthy women who underwent reduction mammaplasty, and pellets from cell lines. Five-micrometer sections of the resulting TMA blocks were cut, transferred onto glass slides, and used for immunohistochemistry. We previously showed the reproducibility of the method, notably between multiple interpreters, and its reliability by comparison with standard immunohistochemistry on full sections (κ test ≈0.95; ref. 30). This high degree of concordance was in the same range as in published studies reporting that TMAs constructed with three cores per sample are representative of the whole specimen (17, 31).

Immunohistochemistry

The 26 proteins were selected on the basis of their known or putative importance in breast cancer as prognostic/predictive markers and the availability and suitability of a corresponding antibody for paraffin-embedded tissues (Table 1). They included hormone receptors [estrogen receptor (ER), progesterone receptor (PR)], subclass markers (CK5/6, CK8/18), oncogenes and proliferation proteins (EGFR, ERBB2, ERBB3, ERBB4, BCL2, CCND1, CCNE, Ki-67, FGFR1, Aurora A/STK6, TACC1, TACC2, TACC3), tumor suppressors (P53, FHIT), adhesion molecules (CDH1, CDH3, CTNNA1, CTNNB1, Afadin/AF-6), proteins from amplified genomic regions (ERBB2, CCND1, STK6), and markers identified in previous studies (GATA3, MUC1). Twelve of these proteins were recurrent among the discriminator genes identified in the RNA expression profiling studies that addressed prognosis in breast cancer (5–14). Immunohistochemistry was done on 5-μm sections of tissue fixed in alcohol formalin for 24 hours and embedded in paraffin as previously described (30), using the LSAB2 kit on an automated immunostainer (Dako Autostainer). Details are given in Table 1. The dilution of each antibody was established using negative and positive controls and staining with a range of dilutions. For each antibody, the selected titer was in the linear range and allowed extinction of the negative control and persistence of the positive control (Supplementary Fig. 1). In addition, the dilution took into account the expected topography of the immunostaining (nucleus, cell membrane, or cytoplasm). If the signal-to-background ratio was not acceptable at that dilution, the pretreatment or experimental conditions were readjusted. After staining, slides were evaluated by two pathologists (J.J. and E.C.J.). Results were scored by the quick score (Q) as previously done (30), except for ERBB2 status, which was evaluated with the Dako scale (HercepTest kit scoring guidelines). For each tumor, the mean score of a minimum of two core biopsies was calculated. Discrepancies were resolved under the multiheaded microscope. For methodologic reasons, quick scores (range, 0 to 300) were converted into a binary positive-negative score suitable for both unsupervised (21) and supervised analyses. We chose a uniform and clear cutoff value of Q > 0 for all antibodies, except for CCND1 (Q > 10; ref. 32), MIB1/Ki-67 (Q > 20, low and high; ref. 33), and ERBB2 (low, 0/1+; high, 2+/3+), to facilitate inter- and intralaboratory reproducibility of the results and to take into account the classic prognostic cutoffs.

Data Analysis

Expression profiles were analyzed by both unsupervised and supervised methods. First, we applied hierarchical clustering. Data were reformatted as follows: −2 designated negative staining and 2 positive staining; missing data were left blank in the scored table. We used the Cluster program (average linkage, Pearson correlation), and results were displayed with TreeView (34). Second, we performed a supervised analysis to identify the protein set that best distinguished between two classes of samples with different survival. The classifier was derived on a subset of samples (two thirds of the population, learning set) and then validated on the remaining subset (one third of the population, validation set). The assignment of samples to each set was random but preserved the ratio between tumors with and without metastatic relapse. There was no significant difference between the learning and validation sets for any histoclinical parameter, treatment, or follow-up (data not shown). All combinations of 1 to 5 proteins, as well as the complementary combinations of 21 to 25 proteins, were systematically tested for their ability to classify tumors into two classes (‘‘poor prognosis’’ and ‘‘good prognosis’’) in agreement with their clinical outcome. An oriented random search through all protein combinations was also done, and each combination encountered was tested in the same way (see Supplementary Material for more details). Using the protein expression scores of each combination, we defined a ‘‘metastasis score’’ that assigned to each tumor a probability of belonging to the poor-prognosis or the good-prognosis class (see Supplementary Material for details). The best classifier protein set was the one with the minimal rate of misclassified tumors. Once identified on the learning set, the prognostic power of the classifier was tested on the validation set by classifying the tumors with the same approach.

For each tumor set, the prognostic impact was estimated by univariate analyses that compared the rate of metastatic relapse between the two molecularly defined classes of tumors (Fisher's exact test).


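The scoring rules described under Immunohistochemistry and the set assignment described under Data Analysis can be sketched as a few small Python helpers. This is a minimal illustration, not the authors' code: all function names are hypothetical, the encoding assumes negative staining is coded −2 and positive staining +2, requiring two evaluable cores per tumor follows the text, and the way two thirds of each outcome group is rounded is an assumption (the text states only that the split was random and preserved the relapse ratio).

```python
import random
from typing import List, Optional, Sequence, Tuple

# Cutoffs stated in the text: "positive"/"high" means Q > cutoff; Q > 0 by default.
QUICK_SCORE_CUTOFFS = {"CCND1": 10, "Ki-67": 20}


def tumor_quick_score(core_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Mean quick score over evaluable cores; None if fewer than two cores were scored."""
    valid = [q for q in core_scores if q is not None]
    if len(valid) < 2:
        return None
    return sum(valid) / len(valid)


def binarize_quick_score(protein: str, q: Optional[float]) -> Optional[int]:
    """1 = positive/high, 0 = negative/low, None = missing."""
    if q is None:
        return None
    if not 0 <= q <= 300:
        raise ValueError(f"quick score must lie in 0-300, got {q}")
    return int(q > QUICK_SCORE_CUTOFFS.get(protein, 0))


def binarize_erbb2(dako_score: Optional[int]) -> Optional[int]:
    """ERBB2 uses the Dako 0 to 3+ scale: 0/1+ is low (0), 2+/3+ is high (1)."""
    if dako_score is None:
        return None
    return int(dako_score >= 2)


def encode_for_clustering(status: Optional[int]) -> Optional[int]:
    """Assumed -2/+2 encoding for clustering; missing values stay None (blank cell)."""
    if status is None:
        return None
    return 2 if status else -2


def stratified_split(relapse: Sequence[bool], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly put about 2/3 of each outcome group in the learning set, the rest in validation."""
    rng = random.Random(seed)
    learning: List[int] = []
    validation: List[int] = []
    for outcome in (True, False):
        group = [i for i, r in enumerate(relapse) if r == outcome]
        rng.shuffle(group)
        n_learn = (2 * len(group) + 1) // 3  # nearest integer to 2/3 of the group
        learning.extend(group[:n_learn])
        validation.extend(group[n_learn:])
    return sorted(learning), sorted(validation)
```

With the 102 relapsing and 450 relapse-free patients described under Patients and Samples, this rounding gives learning and validation sets of 368 and 184 patients (68 and 34 relapses), the set sizes reported in Results.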
Statistical Methods

Distributions of molecular markers and other categorical variables were compared using either the χ2 test or Fisher's exact test. Follow-up was calculated from the date of diagnosis to the time of metastasis as a first event or to the time of last follow-up for censored patients. The end point was MFS, calculated from the date of diagnosis, with first distant metastasis scored as an event. All other patients were censored at the time of last follow-up, death, recurrence of local or regional disease, or development of a second primary cancer. Survival curves were derived from Kaplan-Meier estimates (35) and compared by the log-rank test. The influence of molecular grouping, adjusted for other factors, was assessed in multivariate analysis using Cox proportional hazards models (36). Survival rates and odds ratios (OR) are presented with their 95% CIs. Statistical tests were two-sided at the 5% level of significance. All statistical tests were done using SAS version 8.02.
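The two survival tools named above, the Kaplan-Meier product-limit estimator and the two-group log-rank test, can be sketched in a few lines of plain Python. This is an illustrative sketch only (the study itself used SAS 8.02); the function names are hypothetical, and inputs are assumed to be follow-up times paired with an event flag that is True for a first distant metastasis and False for a censored patient.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct event time.

    events[i] is True for a distant metastasis, False for a censored patient.
    Patients censored at an event time are still counted as at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]  # True counts as 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def log_rank(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic with 1 df, two-sided P)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)  # at risk in both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)  # events at time t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected in group a
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # P(chi2_1 > x) = erfc(sqrt(x / 2))
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])` steps down to about 0.8, 0.6, and 0.3 at times 1, 2, and 3; the patient censored at time 2 still counts in the risk set at that time.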


Tissue Microarrays and Prognosis of Breast Cancer

Results

Protein Expression Profiling

The expression of 26 proteins was studied by immunohistochemistry on TMAs containing more than 1,600 cancer specimens from 552 patients with breast cancer, together with controls (Fig. 1A). As expected, staining for all antibodies was homogeneous among the 10 normal breast samples but more heterogeneous among tumors. Compared with normal samples, 16 proteins were underexpressed in cancerous tissues, in 6% (CK8/18) to 60% (Aurora A) of cases, and 10 were overexpressed, in 11% (Ki-67/MIB1) to 66% (ERBB4) of cases (Table 1). Examples of staining are shown in Fig. 1.

Unsupervised Hierarchical Classification

The overall expression patterns of the 552 samples were analyzed with hierarchical clustering. The algorithm orders proteins on the horizontal axis and samples on the vertical axis based on the similarity of their expression profiles. Despite heterogeneous expression, this analysis and color display highlighted groups of correlated proteins across correlated samples (Fig. 2A). Figure 2B displays the dendrogram of proteins. As expected, the two interpretations of ER staining made independently by two pathologists were highly correlated (R2 = 0.90) and clustered together, and there was a high degree of concordance between immunohistochemistry on full sections and on TMA (P < 0.0001, χ2 test; Fig. 2C, top to bottom). Four major protein clusters were identified (Fig. 2B), including a cluster of ER-associated proteins (PR, BCL2, GATA3), designated ‘‘ER cluster,’’ and a ‘‘differentiation cluster’’ (E-cadherin, α1-catenin, afadin). We (37) and others (38) have shown that Aurora A (STK6) and the Taxins (TACC1-3) are interacting partners involved in cell division; accordingly, they formed a cluster designated ‘‘mitosis cluster.’’ The fourth cluster, designated ‘‘proliferation cluster,’’ contained the Ki-67/MIB1 marker and other proteins preferentially overexpressed in highly proliferating tumors (EGFR, ERBB2, P53, CCNE).

The combined protein expression patterns defined two major tumor clusters, designated A (n = 471) and B (n = 81) in Fig. 2A. Cluster A was subdivided into two subclusters, A1 (n = 409) and A2 (n = 62). Overall, A1 tumors displayed strong expression of the ER cluster and the differentiation cluster and, in most cases, low expression of the proliferation cluster, whereas the mitosis cluster was strongly expressed in ≈50% of samples. B tumors displayed overall low expression of the ER cluster but strong expression of the other protein clusters. A2 tumors displayed an intermediate profile, characterized overall by strong expression of the differentiation cluster, low expression of the proliferation and mitosis clusters, and low to strong expression of the ER cluster.

We identified correlations between tumor clusters and biopathologic data. In each cluster, the most frequent histologic type was ductal; however, 18% of samples were of the lobular type in cluster A1, compared with 12% in cluster A2 and 7% in cluster B (difference not significant, P = 0.06; χ2 test). Figure 2C (middle) shows, in cluster A1, a subcluster of 22 tumors that includes 18 lobular carcinomas with, as expected (39), low expression of E-cadherin. A1 samples were more likely to be ER positive (96% of cases) than samples in cluster A2 (39%) and cluster B (7%; P < 0.0001, χ2 test). However, ER-positive and ER-negative cases were scattered across all three clusters, suggesting further heterogeneity within each class. For example, the ER-positive samples from clusters A2 (n = 24) and B (n = 6) were distinguished from ER-positive A1 samples by low expression of the other proteins included in the ER cluster but strong expression of some proteins included in the proliferation cluster. This discrimination also existed at the biological level: the ER-positive A2 and B samples were more frequently PR negative (P = 0.008; Fisher's exact test) and ERBB2 positive (P = 0.001; Fisher's exact test) than ER-positive A1 samples. Similarly, the ER-negative samples from clusters A2 and B differed by a stronger expression of the mitosis and proliferation clusters, including CK5/6, in B cases. Correlation also existed with grade: in cluster A1, 40% of cases were grade 1 and 16% were grade 3, compared with 21% and 45% in cluster A2 and 9% and 59% in cluster B, respectively (P < 0.0001; χ2 test). Finally, B samples were more likely to be ERBB2 positive (35%) than samples in clusters A1 (9%) and A2 (13%; P < 0.0001, χ2 test). No correlation existed with age, pathologic size, axillary lymph node status, or peritumoral vascular invasion.

Importantly, the tumor clusters correlated with survival. The 5-year MFS differed significantly (P < 0.0001, log-rank test) between A1 (86%; 95% CI, 82.2-89.7), A2 (62%; 95% CI, 48.7-75.3), and B (64%; 95% CI, 51.2-76.7; data not shown). MFS also differed significantly between the ER-positive samples from cluster A1 and those from the merged A2-B clusters (86% versus 52%, P = 0.001, log-rank test). A similar trend was observed between the ER-negative samples from cluster A2 and those from cluster B, but it was not significant (64% versus 66%, P = 0.67, log-rank test).


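The clustering step can be illustrated with a naive pure-Python sketch of agglomerative clustering using average linkage and a Pearson-correlation distance (1 − r), the settings reported for the Cluster program. It is not the Cluster program itself: the function names are hypothetical, computing the correlation only over proteins scored in both samples is an assumption about how blank cells are handled, and returning a distance of 1 when two samples share too few scored proteins is a hypothetical convention.

```python
import math
from typing import List, Optional, Sequence, Tuple

Profile = Sequence[Optional[float]]  # e.g. -2 / +2 / None per protein


def pearson_distance(x: Profile, y: Profile) -> float:
    """1 - Pearson r, computed over positions scored in both profiles."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 1.0  # hypothetical convention: no usable overlap -> uncorrelated
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) * (a - mx) for a, _ in pairs)
    syy = sum((b - my) * (b - my) for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 1.0  # constant profile: correlation undefined
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage(profiles: Sequence[Profile]) -> List[Tuple[int, int, float]]:
    """Merge history as (cluster_a, cluster_b, distance).

    Leaves are numbered 0..n-1; the k-th merge creates cluster n + k.
    """
    n = len(profiles)
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}
    merges: List[Tuple[int, int, float]] = []
    while len(members) > 1:
        ids = sorted(members)
        best: Optional[Tuple[int, int, float]] = None
        for k, a in enumerate(ids):
            for b in ids[k + 1:]:
                # average linkage: mean distance over all cross-cluster leaf pairs
                total = sum(dist[min(p, q), max(p, q)]
                            for p in members[a] for q in members[b])
                d = total / (len(members[a]) * len(members[b]))
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        members[n + len(merges)] = members.pop(a) + members.pop(b)
        merges.append((a, b, d))
    return merges
```

This brute-force version is meant for small toy inputs. For 552 complete profiles, SciPy's `scipy.cluster.hierarchy.linkage(X, method="average", metric="correlation")` is a practical alternative, although it does not accept missing values.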
Supervised Analysis

We developed a supervised analysis method to search for smaller sets of discriminator proteins that might improve our prognostic classification. Analysis was conducted using two equivalent but independent tumor sets (learning and validation). Identification and Validation of a Prognostic Protein Signature. The learning set (n = 368) allowed the identification of a protein expression signature that correlated with MFS. The number of proteins in the signature was optimized by systematically testing all combinations of 1 to 5 proteins and the complementary combinations, and by assessing their ability to classify samples correctly using a metastasis score. The optimal combination contained 21 proteins (Fig. 3C). Samples were ordered by metastasis score and sorted into two classes (poor-prognosis class, positive scores; good-prognosis class, negative scores). As shown in Fig. 3A, this classifier predicted clinical outcome well: 47 (37%) of 128 patients with a positive score experienced metastatic relapse, compared with 21 (9%) of 240 patients with a negative score (OR, 6.1; 95% CI, 3.3-11.3; P < 0.0001, Fisher's exact test). We then tested the prognostic impact of this multiprotein signature in the validation set (n = 184). The same metastasis-score threshold identified two classes that strongly correlated with survival, with metastatic relapse in 21 (34%) of the 61 patients in the poor-prognosis class and in 13 (11%) of the 123 patients in the good-prognosis class (OR, 4.4; 95% CI, 1.9-10.5; P = 0.0001, Fisher's exact test; Fig. 3B). Interestingly, the best combinations identified by two alternative algorithms did not improve the discrimination. The signatures identified by the bottom-up (13 proteins) and top-down (15 proteins) procedures included 10 and 14 proteins of the 21-protein signature, respectively, but performed less well in the validation set.
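The size of the exhaustive part of this search follows from the binomial coefficients for 26 proteins. The short derivation below is added for orientation only; it counts the systematically tested combinations and does not cover the additional oriented random search.

```latex
\[
\sum_{k=1}^{5}\binom{26}{k} = 26 + 325 + 2\,600 + 14\,950 + 65\,780 = 83\,681,
\qquad
\sum_{k=21}^{25}\binom{26}{k} = \sum_{j=1}^{5}\binom{26}{j} = 83\,681,
\]
```

Because \(\binom{26}{k} = \binom{26}{26-k}\), the complementary sets of 21 to 25 proteins are as numerous as the sets of 1 to 5 proteins, giving 167,362 candidate protein sets in total.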
Altogether, these results validated the predictive capacity of our 21-protein signature. Examples of staining for these proteins are shown (Fig. 1B).
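The odds ratios and Fisher's exact tests quoted for the learning and validation sets come from 2×2 tables of prognostic class versus metastatic relapse, which also yield positive and negative predictive values for the poor-prognosis class. The sketch below recomputes them in plain Python. It is illustrative, not the authors' SAS analysis: the function names are hypothetical, the two-sided P sums all tables no more probable than the observed one (a common convention), and pooling the learning and validation counts to obtain predictive values for all 552 patients is an assumption about how overall values would be derived.

```python
from math import comb
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Sample odds ratio and two-sided Fisher's exact P for the 2x2 table

                       relapse   no relapse
        poor class        a          b
        good class        c          d
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # hypergeometric probability of x relapses in the poor class, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # two-sided P: total probability of tables no more likely than the observed one
    p_value = sum(p for p in map(prob, support) if p <= p_observed * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


def predictive_values(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """PPV = relapses / poor-class size; NPV = relapse-free / good-class size."""
    return a / (a + b), d / (c + d)


# Learning set: 47 of 128 poor-prognosis and 21 of 240 good-prognosis patients relapsed.
learning = (47, 128 - 47, 21, 240 - 21)
# Validation set: 21 of 61 and 13 of 123.
validation = (21, 61 - 21, 13, 123 - 13)

for name, table in (("learning", learning), ("validation", validation)):
    odds_ratio, p = fisher_exact_2x2(*table)
    print(f"{name}: OR = {odds_ratio:.1f}, P = {p:.2g}")

pooled = tuple(x + y for x, y in zip(learning, validation))
ppv, npv = predictive_values(*pooled)
print(f"pooled: PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

Run as is, it reproduces the reported odds ratios of 6.1 and 4.4, and the pooled counts (68 relapses among 189 poor-prognosis patients, 34 among 363 good-prognosis patients) give a PPV of 36% and an NPV of 91%.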


Table 1. Proteins tested by immunohistochemistry: antibodies, experimental conditions, controls, results in 552 early breast cancers deposited on TMAs and Kaplan-Meier analysis of the MFS

Protein (acronym) | Antibody | Origin* | Clone | Pretreatment | Titer
Adhesion molecule Afadin (AF6) | Mmab | Transduction Laboratory | 35 | DTRS (40 min, 98°C) | 1/50
Aurora A kinase (STK6/STK15) | Mmab | C. Prigent, Rennes | / | DTRS (40 min, 98°C) | 1/25
Catenin, α1 (CTNNA1) | Mmab | Zymed Laboratory | αCAT-7A4 | Citrate buffer (40 min, 98°C) | 1/200
Catenin, β1 (CTNNB1) | Mmab | Transduction Laboratory | 14 | Citrate buffer (40 min, 98°C) | 1/2,500
Anti-apoptotic BCL2 | Mmab | Dako Corporation | 124 | Citrate buffer (40 min, 98°C) | 1/100
Cyclin D1 (CCND1) | Mmab | Zymed Laboratory | AM29 | Citrate buffer (40 min, 98°C) | 1/200
Cyclin E (CCNE) | Mmab | Novocastra Laboratory | 13A3 | Citrate buffer (40 min, 98°C) | 1/50
Cytokeratins 5 and 6 (CK5/6) | Mmab | Dako Corporation | D5/16B4 | DTRS (40 min, 98°C) | 1/10
Cytokeratins 8 and 18 (CK8/18) | Mmab | Zymed Laboratory | Zym5.2 | DTRS (40 min, 98°C) | 1/200
E-cadherin (CDH1) | Mmab | Transduction Laboratory | 36 | Citrate buffer (40 min, 98°C) | 1/2,000
Epidermal growth factor receptor (EGFR) | Mmab | Zymed Laboratory | 31G7 | Pepsin (30 min, 37°C) | 1/20
Tyrosine kinase receptor ERBB2 | Mmab | Novocastra Laboratory | CB11 | Citrate buffer (40 min, 98°C) | 1/500
Tyrosine kinase receptor ERBB3 | Mmab | NeoMarkers | SGP1 | None | 1/40
Tyrosine kinase receptor ERBB4 | Mmab | NeoMarkers | HFR-1 | None | 1/50
Estrogen receptor (ER) | Mmab | Novocastra Laboratory | 6F11 | Citrate buffer (40 min, 98°C) | 1/60
Fibroblast growth factor receptor 1 (FGFR1) | Rpab | Santa Cruz Biotechnology | Sc-121 | DTRS (40 min, 98°C) | 1/200


[Table 1 (continued): internal controls; tumor IHC status; no. of patients; 5-year MFS (95% CI); P]

2 cm, ER and PR negative, SBR grade 2-3, or age 1cm for NIH). The molecular classification compared favorably in terms of positive (PPV) and negative (NPV) predictive values for metastatic relapse: the respective rates were 36%, 21%, and 20% for PPV and 91%, 96%, and 91% for NPV. Sensitivity was 73% and specificity 67% [receiver operating characteristic (ROC) curve in Fig. 4G]. The protein expression signature kept its prognostic value in different subgroups of patients. It classified the 255 node-positive patients into two classes that correlated with survival: in the good-prognosis class, 27 of 161 patients experienced metastatic relapse, compared with 44 of 94 patients in the poor-prognosis class (OR, 4.4; 95% CI, 2.4-8.1; P < 0.0001, Fisher's exact test; Fig. 4B). The same was true for the 292 node-negative patients: the OR for metastasis was 9.7 (95% CI, 3.8-27.7; P < 0.0001, Fisher's exact test) for the 92 women in the poor-prognosis class compared with the 200 women in the good-prognosis class (Fig. 4B). Interestingly, the rate of metastasis did not differ between the 161 node-positive patients in the good-prognosis class and the 92 node-negative patients in the poor-prognosis class (P = 0.10, Fisher's exact test). When compared with St-Gallen and NIH classification


[Table 1 (cont'd): internal controls; type of alteration in tumor samples, frequency of alteration, and cell sublocalization; tumor IHC status; no. of patients; 5-year MFS (95% CI); P]

20), grade (SBR 1, 2, 3), ER status (negative, positive), PR status (negative, positive), peritumoral vascular invasion (negative, positive), chemotherapy (delivered or not), hormone therapy (delivered or not), and each of the proteins (negative, positive) significantly associated with survival in univariate analyses. Results are shown in Table 2. Independent prognostic factors included the 21-protein signature, pathologic tumor size, axillary lymph node status (when dichotomized, ≤3 versus >3), and Ki-67/MIB1 status. However, the 21-protein signature was the strongest predictor with a

Discussion Protein Expression Profiling Identifies Subclasses of Breast Cancer Four recent studies analyzed by unsupervised hierarchical clustering the expression of 15 proteins in 166 breast tumors (23), 13 on 107 (20), 7 on 97 (26), and 31 on 438 (27). Several of

Figure 2. Hierarchical clustering analysis of global protein expression profiles in breast cancer as measured by immunohistochemistry on TMA. A, graphical representation of hierarchical clustering results based on expression profiles of 26 proteins in 552 early breast cancer samples. Rows, samples; columns, proteins. Protein expression scores are depicted according to a color scale: red, positive staining; green, negative staining; gray, missing data. Dendrograms of samples (to the left of matrix ) and proteins (above matrix ) represent overall similarities in expression profiles. In the dendrogram, the length of branch between two elements reflects their degree of relatedness. Three major clusters of tumors (A1, A2, and B) are shown. Colored bars to the right and colored branches in the dendrogram indicate the locations of three sample clusters of interest, zoomed in C . B, dendrogram of proteins. Four major protein clusters are identified and designated Proliferation , Mitosis , Differentiation , and ER-related , respectively. ER-1 and ER-2 represent two interpretations of ER staining made independently by two pathologists. C, Expanded view of selected sample clusters showing a partial grouping of tumors with similar ER status (positive, red bar; negative, orange bar ) and similar histologic type (LOB, lobular; DUC, ductal; blue bar ).

www.aacrjournals.org

9

Cancer Res 2005; 65: (3). February 1, 2005

Cancer Research

Figure 3. Classification of 552 breast cancer samples based on the expression of the 21-protein discriminator set identified by supervised analysis. A and B, correlations between the molecular grouping based on the combined expression of the 21 proteins and the occurrence of metastatic relapse in the learning (A) and the validation (B) set of samples. C, supervised classification of all 552 samples using the 21-protein expression signature. Data matrix (left ); rows, samples; columns, proteins. Immunostaining results are depicted according to the color scale used in Fig. 1. The 21 proteins, listed above the matrix, are ordered from left to right according to decreasing DP . ER*, means of two independent ER analyses. DP, difference between the probability of positive staining and the probability of negative staining in nonmetastatic samples. Tumor samples are numbered from 1 to 552 and are ordered from top to bottom according to their increasing metastasis score (right ). Orange dashed line, threshold 0 that separates the two classes of samples, poor prognosis (under the line ), and good prognosis (above the line ). Middle, occurrence (n) or not (5) of metastatic relapse for each patient.

these markers were included in the present work, allowing for comparison of results. In our analysis, clustering identified four major coherent protein clusters. Some coexpressed proteins were previously reported in expression profiling studies. For example, ER, PR, BCL2, and GATA3 clustered together (7–9). This ER cluster was negatively correlated with the mitosis and proliferation clusters, in agreement with the higher proliferation index in ER-negative tumors (40) and the known proliferation-differentiation balance. The ER cluster was close to the differentiation cluster, which included other markers previously shown to correlate positively with ER expression, such as FHIT (41), CK8/18 (20, 23), and MUC1 (7). The proliferation cluster was very similar to those identified by others, which contained P53, Ki-67, CCNE, ERBB2, and CK5/6 (20) or CCNE, ERBB2, EGFR, and CK5/6 (23). The clustering sorted tumors into three clusters that correlated with histoclinical data, including grade, ER, and ERBB2 status, in close agreement with their expression profiles. For example, the high number of grade 3 tumors in cluster B, as well as the high number of ERBB2-positive samples, agreed with the frequent strong expression of the proliferation cluster (which included ERBB2) and the mitosis cluster. Conversely, 99% of cluster A1 samples were ER positive and showed frequent strong expression of the ER cluster and low expression of the proliferation cluster (40). Although ER expression is a key factor in our classification, ER-positive and ER-negative samples displayed heterogeneous expression profiles, with at least two subgroups identified in each category, as recently reported in large-scale expression studies (7, 9, 20, 26). The two ER-positive categories probably represent two distinct groups with different outcomes, and the same was true for the ER-negative samples. Thus, grouping tumors based on the expression of multiple proteins (including ER) was more powerful than ER status alone for capturing the heterogeneity of the disease.

The tumor clusters correlated with a recently proposed phenotypic classification (23, 42, 43). “Basal” cells (including progenitors) express keratins CK5/6, whereas differentiated “luminal” cells express keratins CK8/18. Gene expression analyses using DNA microarrays have identified subtypes of breast tumors corresponding to this phenotypic classification (8–10). In our study, cluster A1 may approximate a cluster of luminal cell–like tumors, with frequent strong expression of ER and CK8/18. Cluster B may consist of tumors with basal/progenitor, ER-negative characteristics, namely, strong expression of CK5/6, CDH3, and proliferation markers (9, 20). A2 tumors, with an intermediate profile, may represent a transitory “basoluminal” stage or tumors that have lost ER function. The significant differences in survival observed between these three clusters are consistent with this model (8–10). In addition, we show that lobular carcinomas are luminal-like tumors. Thus, clustering based on the expression of multiple proteins identifies relevant subtypes of the disease.

Cancer Res 2005; 65: (3). February 1, 2005

www.aacrjournals.org

Tissue Microarrays and Prognosis of Breast Cancer
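The protein clusters discussed above (for example, the ER cluster being anticorrelated with the proliferation cluster) come from grouping markers by how their staining co-varies across tumors. The sketch below assumes average-linkage agglomerative clustering on a 1 − Pearson r distance, a common choice for expression data; this passage does not state the exact metric or linkage the authors used. The function names, marker names, and scores are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two staining-score profiles across the same tumors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treated as uncorrelated (an assumption)
    return sxy / math.sqrt(sxx * syy)


def cluster_proteins(profiles: dict[str, Sequence[float]], n_clusters: int) -> list[set[str]]:
    """Average-linkage agglomerative clustering with distance 1 - r.

    Proteins whose staining rises and falls together across tumors end up in
    the same cluster; anticorrelated proteins (r near -1) are maximally far apart.
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [frozenset([name]) for name in names]

    def linkage(c1: frozenset, c2: frozenset) -> float:
        # Average linkage: mean pairwise distance between members of the two clusters.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    # Repeatedly merge the closest pair of clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [set(c) for c in clusters]


# Toy immunostaining scores (0-3) for four hypothetical markers over six tumors.
profiles = {
    "marker_a": [3, 3, 2, 0, 0, 1],
    "marker_b": [3, 2, 3, 0, 1, 0],
    "marker_c": [0, 0, 1, 3, 3, 2],
    "marker_d": [0, 1, 0, 2, 3, 3],
}
print(cluster_proteins(profiles, n_clusters=2))
```

Using 1 − r rather than |r| keeps anticorrelated markers apart, which is what lets an ER-like group and a proliferation-like group separate instead of merging.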

Figure 4. Kaplan-Meier analysis of metastasis-free survival and ROC curves for predicting metastatic relapse in patients with breast cancer, according to the molecular classification based on the 21-protein expression signature or to the St-Gallen and NIH consensus criteria. Patients (pts) were classified into the good-prognosis or poor-prognosis class using the 21-protein signature identified by supervised analysis (A, B, E, F), or into the low-risk or high-risk class using the St-Gallen (C) and NIH consensus criteria (D). P values were calculated using the log-rank test. A, survival of all 552 patients. B, survival of 292 patients with node-negative cancer (N−) and 255 patients with node-positive cancer (N+). Survival differs significantly between the good-prognosis and poor-prognosis classes for node-negative patients as well as for node-positive patients. In contrast, survival does not differ significantly between node-positive patients in the good-prognosis class and node-negative patients in the poor-prognosis class. C, survival of the 292 node-negative patients (N−) according to the St-Gallen criteria. D, survival of the 292 node-negative patients (N−) according to the NIH criteria. E, survival of 186 patients who received neither adjuvant chemotherapy (CT) nor hormone therapy (HT). F, survival of 133 patients who received adjuvant chemotherapy (CT) without hormone therapy (HT). G and H, ROC curves showing the sensitivity and specificity of the 21-protein tumor classification for predicting metastatic relapse in all 552 patients (G) and in the 292 node-negative patients (H).
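The survival curves and log-rank P values in Fig. 4 rest on two standard calculations. A minimal stdlib sketch of both follows, assuming each patient is reduced to a follow-up time and a relapse flag (censored if relapse-free at last follow-up). The function names and toy data are hypothetical, and this is not the authors' statistical software.

```python
import math
from typing import Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) steps at each relapse time.

    events[i] is True if patient i had a metastatic relapse at times[i] and
    False if follow-up was censored at times[i] without relapse.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps: list[tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = censored = 0
        # Group all records tied at time t; censored patients still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                relapses += 1
            else:
                censored += 1
            i += 1
        if relapses:
            survival *= 1 - relapses / at_risk
            steps.append((t, survival))
        at_risk -= relapses + censored
    return steps


def log_rank(times_a, events_a, times_b, events_b) -> tuple[float, float]:
    """Two-group log-rank test: returns (chi-square statistic, P value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({tt for tt, e, _ in pooled if e}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)               # at risk, both groups
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)  # at risk, group A
        d = sum(1 for tt, e, _ in pooled if tt == t and e)         # relapses, both groups
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A relapse count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy follow-up times in months (hypothetical, not study data).
good_t, good_e = [12, 30, 45, 60, 60, 72], [False, True, False, False, False, False]
poor_t, poor_e = [6, 10, 15, 24, 36, 50], [True, True, False, True, True, False]
print(kaplan_meier(poor_t, poor_e))
print(log_rank(good_t, good_e, poor_t, poor_e))
```

Censored patients stay in the risk set at their censoring time and leave afterwards, which is the usual Kaplan-Meier convention for ties between relapse and censoring.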


Cancer Research

with the St-Gallen and NIH classifications in terms of PPV and NPV for metastatic relapse, both for all 552 patients and for node-negative patients. This finding is of particular significance because approximately 75% of node-negative patients who are candidates for adjuvant chemotherapy under the St-Gallen/NIH criteria are currently thought to be overtreated. Our predictor assigned fewer patients to the poor-prognosis class, and their clinical outcome was more frequently unfavorable than that of patients assigned to the high-risk class by the St-Gallen/NIH criteria. Our predictor also performed well irrespective of ER status, suggesting that it provides more accurate clinical information than ER status alone, possibly reflecting functional differences in the ER or interacting pathways. Our classification retained its predictive value regardless of adjuvant therapy. The results obtained in the group not exposed to systemic therapy suggest a true prognostic value, whereas those obtained in the chemotherapy-treated group might reflect prognosis and/or response to therapy. Thus, the 21-protein signature might facilitate the selection of appropriate treatment options. It may be an important clinical tool to avoid unnecessary, toxic, and costly treatment of node-negative patients, and it may help select, among patients who need adjuvant chemotherapy, those who might benefit from standard protocols and those who would be candidates for other therapies. Both hypotheses will require additional retrospective and prospective studies.
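The comparison above turns on PPV (the fraction of patients flagged as poor prognosis or high risk who actually relapse) and NPV (the fraction called good prognosis or low risk who stay relapse-free), alongside the sensitivity and specificity plotted in the ROC curves of Fig. 4. A minimal sketch of these quantities from a 2 × 2 table follows; the class and function names are hypothetical, and the counts are illustrative only, not the study's results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    """2x2 table of a risk classifier against observed metastatic relapse."""
    tp: int  # classified poor prognosis / high risk, relapsed
    fp: int  # classified poor prognosis / high risk, relapse-free
    fn: int  # classified good prognosis / low risk, relapsed
    tn: int  # classified good prognosis / low risk, relapse-free

    @property
    def ppv(self) -> float:
        # Fraction of patients flagged as poor prognosis who actually relapse.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Fraction of patients called good prognosis who stay relapse-free.
        return self.tn / (self.tn + self.fn)

    @property
    def sensitivity(self) -> float:
        # Fraction of relapsing patients that the classifier flags.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of relapse-free patients that the classifier leaves unflagged.
        return self.tn / (self.tn + self.fp)


def tabulate(predicted_poor: Sequence[bool], relapsed: Sequence[bool]) -> Confusion:
    """Build the 2x2 table from per-patient calls and observed outcomes."""
    pairs = list(zip(predicted_poor, relapsed))
    return Confusion(
        tp=sum(1 for p, r in pairs if p and r),
        fp=sum(1 for p, r in pairs if p and not r),
        fn=sum(1 for p, r in pairs if not p and r),
        tn=sum(1 for p, r in pairs if not p and not r),
    )


# Hypothetical counts for illustration only.
table = Confusion(tp=30, fp=20, fn=10, tn=140)
print(table.ppv, table.npv, table.sensitivity, table.specificity)
```

A classifier that flags fewer patients while keeping most relapses in the flagged group raises PPV, which is the property the text attributes to the 21-protein predictor relative to the St-Gallen/NIH criteria.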

Table 2. Cox proportional hazards multivariate analyses in metastasis-free survival (n = 552)

Variable                                      Hazard ratio (95% CI)
Molecular classification (21-protein set)
    Good-prognosis class                      1
    Poor-prognosis class                      2.96 (1.77-4.97)
Tumor size (mm)
    ≤20                                       1
    >20                                       2.87 (1.61-5.13)
Axillary lymph node metastasis
    ≤3                                        1
    >3                                        2.08 (1.24-3.51)
MIB1/Ki-67 status
    Negative                                  1
    Positive                                  2.16 (1.20-3.88)

P value