
Editorial

Irrational Exuberance in Clinical Proteomics

Simon M. Lin and Warren Alden Kibbe

Is the siren call of "omics"-based tests luring researchers, physicians, patients, and entrepreneurs onto the rocks of unproven and unapprovable diagnostic technology, or will the winds of technological change enable them to reach the beachhead safely? In the marketplace, patients and caregivers welcome minimally invasive tests that enable improved decision making and better outcomes. Thanks to rapid developments in high-throughput technologies, a single blood draw can be used to assess single nucleotide polymorphisms, build mRNA expression profiles, and profile protein and metabolite levels simultaneously, at ever-decreasing cost. The hunt for diagnostic biomarkers has likewise moved quickly from laborious hypothesis-driven techniques to high-throughput technologies, turning biomarker development into an exercise in data organization and mining. However, the lure of lucrative financial returns, coupled with "early to market" strategies, encourages rapid and risky investment at an early phase in the maturity of these technologies. This high-octane environment is very similar to previous booms in designing drugs through rational computation, identifying novel drug targets from genomic sequences, and validating prostate-specific antigen as a biomarker for prostate cancer. The first two approaches have been remarkably unsuccessful in producing Investigational New Drug Applications. Although prostate-specific antigen has been approved by the Food and Drug Administration as a biomarker for prostate cancer, the debate over its value continues. The accompanying study by Hayashida et al. (1) represents a recent effort to evaluate the utility of proteomics in a clinical setting.

History repeats itself in many different ways. By reflecting on recent lessons learned with DNA microarray technologies and their application in the clinic, we can anticipate some of the problems that will arise in proteomics and perhaps avoid some pitfalls. Issues of measurement reproducibility, detection limits, small sample size versus high dimensionality, standardization of data representation, and experimental design and analysis are quite similar between DNA microarray and proteomic experiments, and many of the solutions from microarray experiments are being successfully applied to proteomics.

To commercialize a clinical test, the measurement must be reproducible across laboratories, and the results must be directly comparable regardless of instrumentation and personnel. Early microarray studies showed shockingly little concordance among measurements taken at different locations and on different platforms, even when sample handling was highly controlled.
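
As a minimal illustration of how interlaboratory concordance can be quantified, the sketch below (our own illustration with simulated numbers, not data from the studies cited here) computes Lin's concordance correlation coefficient between two laboratories' log-intensity measurements of the same features; the noise levels and systematic shift are assumptions chosen only to show how the coefficient penalizes both scatter and bias.

```python
# Illustrative sketch: quantify agreement between two labs measuring the same samples.
# All numbers are simulated; nothing here is drawn from the studies cited in this editorial.
import numpy as np

def lins_ccc(x: np.ndarray, y: np.ndarray) -> float:
    """Lin's concordance correlation coefficient between two measurement vectors."""
    cov_xy = np.cov(x, y, bias=True)[0, 1]
    return 2.0 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

rng = np.random.default_rng(1)
truth = rng.normal(8.0, 2.0, size=1000)            # hypothetical "true" log2 abundances
lab_a = truth + rng.normal(0.0, 0.5, size=1000)    # lab A: modest technical noise
lab_b = truth + rng.normal(0.8, 1.5, size=1000)    # lab B: noisier, plus a systematic shift

print(f"Pearson r:         {np.corrcoef(lab_a, lab_b)[0, 1]:.2f}")
print(f"concordance (CCC): {lins_ccc(lab_a, lab_b):.2f}")
```

Unlike a plain correlation coefficient, the concordance coefficient drops when one laboratory's measurements are systematically shifted, which is why it is a common choice for cross-site agreement.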

Authors' Affiliation: Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, Illinois
Received 8/10/05; accepted 8/10/05.
Requests for reprints: Warren Alden Kibbe, Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, IL 60611. Phone: 312-695-1334; Fax: 312-695-1352; E-mail: wakibbe@northwestern.edu.
© 2005 American Association for Cancer Research. doi:10.1158/1078-0432.CCR-05-1744


After many years of technical improvements, a recent study has shown dramatically improved intralaboratory and interlaboratory reproducibility of microarray measurements (2). A similar reproducibility study has also been conducted for the surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) platform at multiple laboratories (3). As with the early microarray results, variability attributable to instrumentation has been confounded by differences in data processing and analysis methods (4). Different algorithms for baseline subtraction, calibration, denoising, and peak finding dramatically affect the interpretation of the raw instrument data.

Given the current state of measurement reproducibility and the lack of standardization of calibration and peak calling across instrument vendors, many researchers use proteomics in a discovery mode. These researchers view proteomics as a rapid screening tool for generating new hypotheses; candidate proteins are then selected for further evaluation using more traditional, lower-throughput techniques, such as ELISA assays (5). However, purifying and identifying proteins for further characterization from the peaks suggested by SELDI-TOF can be tricky, as discussed by Hayashida et al. (1). Moreover, once a protein has been identified, there is no universal mechanism for quickly validating protein abundance data equivalent to reverse transcription-PCR for measuring mRNA abundance in the DNA microarray world.

Another important, unresolved issue in the application of mass spectrometry-based analysis of proteins is the quantification of the detection limit in a standardized, machine-independent manner. Again taking the example from microarrays, it was the Latin-square spike-in data set from Affymetrix that was pivotal in the development of reproducible analysis algorithms and in the establishment of confidence intervals on the reproducibility and reliability of measurements from Affymetrix chips. Results from studying this data set indicate that the detection of differentially expressed genes at low abundance (0.125-0.25 pmol) is challenging and defines the current detection limit (4). Thus far, we have not seen a similar, publicly available, Latin-square design experiment to characterize a protein mass spectrometry profiling instrument. Establishing the detection limit is of great interest, particularly in serum proteomics, where the informative biomarkers may be circulating at very low abundance.

As in the early days of microarray studies, the size of the cohort in proteomics studies is usually small, owing to the cost of the measurements and/or the difficulty of procuring appropriate patient samples. A typical proteomics study may involve a few dozen samples yet measure tens of thousands or even millions of variables. This ratio of samples to variables runs contrary to the traditional application of multivariate statistics, where the ratio of sample size to number of variables is suggested to be larger than 30:1. With such a small sample size and a huge variable search space, the probability of finding associations by random chance is quite high, even when the data are analyzed under what are traditionally statistically stringent conditions.
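
To make the scale of this problem concrete, the brief sketch below, which is our own illustration rather than an analysis from any of the studies cited, simulates a cohort of pure noise with dimensions typical of a serum profiling study; the sample size, the number of peaks, and the scikit-learn-based pipeline are all assumptions chosen only for illustration.

```python
# Illustrative sketch: with a few dozen samples and thousands of features of pure noise,
# selecting "informative" peaks on the full data set before cross-validating yields an
# optimistic accuracy estimate; selecting peaks inside each fold does not.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples, n_features = 34, 15000               # hypothetical: 34 sera, 15,000 m/z peaks
X = rng.normal(size=(n_samples, n_features))    # pure noise, no real signal
y = np.repeat([0, 1], n_samples // 2)           # two arbitrary "outcome" groups

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Pitfall: choose the 20 "best" peaks using ALL samples, then cross-validate.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
optimistic = cross_val_score(clf, X_selected, y, cv=cv).mean()

# Remedy: embed the peak selection inside the cross-validation loop.
pipeline = make_pipeline(SelectKBest(f_classif, k=20), clf)
honest = cross_val_score(pipeline, X, y, cv=cv).mean()

print(f"peaks selected before cross-validation: {optimistic:.2f}")
print(f"peaks selected inside each fold:        {honest:.2f}")
```

On data containing no real signal, the first estimate typically lands well above the 50% accuracy expected by chance, whereas embedding the selection step inside each fold returns an estimate near chance.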


Appropriate cross-validation methods are necessary to reduce false positives and the false optimism induced by overly optimistic prediction estimates (6). Hayashida et al. (1) followed this paradigm, keeping 15 cases as an independent validation set apart from the 27 cases of the training set. However, cross-validation alone is not a panacea for small sample size: the variance of the cross-validated error rate can be large enough to challenge its usefulness (7), and small studies are subject to other biases that are not understood or characterized in the individuals studied. There is a correlation between the sample size of a study and the reproducibility of its findings in follow-up validation studies (7, 8). With the continued and rapid drop in the cost of microarray-based experiments, we are now seeing microarray studies involving hundreds of patients. We expect to see similar trends in proteomics experiments, ameliorating this particular concern.

Given the issues above, it is popular to conduct reanalyses or meta-analyses using raw data from other groups. The desire to share experimental data between research groups has resulted in the adoption of data standards such as MIAME (9) and MIAPE (10). Unfortunately, these standards do not extend into clinical experiments. CDISC and HL7 both have clinical data working groups with existing and proposed standards for clinical data elements, but these efforts are not widely known in the omics research community. Moreover, CDISC and HL7 are focused on data representation, not on defining a "minimum useful data set" in the way that MIAME and MIAPE have done. Consequently, other than a verbal description in the Materials and Methods section or a table in the Results section of articles, no consistently detailed clinical data are captured and reported for clinical proteomics experiments, limiting the ability of investigators to independently verify or combine data from multiple experiments. For example, a later clarification (11) of a recent breast cancer microarray study pointed out a potentially biasing effect of patient tumor size (size is widely regarded as a primary characteristic of the tumor in cancer studies) that had not been addressed in the original publication.
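
One simple safeguard, sketched below with hypothetical counts (none of the numbers come from the studies discussed here), is to tabulate each recorded clinical covariate against the outcome groups and test for imbalance before trusting any classifier built on those groups.

```python
# Illustrative sketch: screen recorded clinical covariates for imbalance between
# outcome groups. The covariate names and 2x2 counts are hypothetical.
from scipy.stats import fisher_exact

# Rows = risk factor present / absent, columns = responder / non-responder.
covariates = {
    "smoking history":  [[11, 3], [4, 9]],
    "H. pylori status": [[7, 6], [8, 6]],
}

for name, table in covariates.items():
    _, p_value = fisher_exact(table)
    verdict = "potential confounder" if p_value < 0.05 else "roughly balanced"
    print(f"{name:17s} Fisher exact p = {p_value:.3f} ({verdict})")
```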

Similarly, in the Hayashida study (1), samples are annotated with gender, age, tumor location, and stage; however, no information is available on smoking history, Helicobacter pylori infection, or dietary factors, which are known risk factors in esophageal cancer (12). Asymmetrical distribution of these risk factors among the small patient population in that study could result in a remarkably different set of results and confound attempts to generalize the findings of small studies to broader populations.

Omics studies generate data on a scale unprecedented in the traditional domain of biostatistics. Physician scientists have therefore turned to computer scientists to handle these large data sets and to seek interesting patterns with data mining methods. These teams have quickly found that building a reasonable classifier is not sufficient to interpret the data: the effects of study design, patient selection criteria, selection bias, and clinical utility/benefit must all be evaluated. Addressing this complexity warrants a joint effort by physician scientists, mass spectrometrists, clinical epidemiologists, biostatisticians, computer scientists, and bioinformaticists.

Inflated expectations fueled irrational exuberance in the financial market for Internet companies circa 2000 and were followed by the bursting of the "Internet bubble." In spite of this, the Internet has continued to change the way we conduct research and do business. Similarly, controversial news releases and debates over publications on clinical proteomics are challenges to reevaluate both the skepticism and the hype surrounding proteomics and to arrive at a better assessment of its clinical utility. As reflected in the "Possible Prediction" portion of its title, the article by Hayashida (1) is a timely report and a rational step toward that better assessment, given the many limitations of a small, single-institution study. These and similar studies warrant a larger, multi-institutional study to examine the reproducibility and robustness of the predictions using the current state of proteomic instrumentation and techniques.

References

1. Hayashida Y. Possible prediction of chemoradiosensitivity of esophageal cancer by serum protein profiling. Clin Cancer Res, this issue; 2005;11:8042-7.
2. Irizarry RA. Multiple Lab Comparison of Microarray Platforms. Johns Hopkins University, Dept. of Biostatistics Working Paper 71; 2004. http://www.bepress.com/jhubiostat/paper71.
3. Semmes OJ, Feng Z, Adam BL, et al. Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem 2005;51:102-12.
4. Cope LM, Irizarry RA, Jaffee HA, Wu Z, Speed TP. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 2004;20:323-31.
5. Howard BA, Wang MZ, Campa MJ, Corro C, Fitzgerald MC, Patz EF Jr. Identification and validation of a potential lung cancer serum biomarker detected by matrix-assisted laser desorption/ionization-time of flight spectra analysis. Proteomics 2003;3:1720-4.
6. Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst 2003;95:14-8.
7. Ransohoff DF. Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 2004;4:309-14.
8. Ntzani EE, Ioannidis JP. Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment. Lancet 2003;362:1439-44.
9. Ball C, Brazma A, Causton H, et al. An open letter on microarray data from the MGED Society. Microbiology 2004;150:3522-4.
10. Orchard S, Hermjakob H, Julian RK Jr, et al. Common interchange standards for proteomics data: public availability of tools and schema. Proteomics 2004;4:490-1.
11. Kopans DB. Gene-expression signatures in breast cancer. N Engl J Med 2003;348:1715-7; author reply 1715-7.
12. Lagergren J. Adenocarcinoma of oesophagus: what exactly is the size of the problem and who is at risk? Gut 2005;54 Suppl 1:i1-5.
