Bioinformatics: Does the U.S. System Lead to Missed Opportunities in Emerging Fields? A Case Study

Paula E. Stephan* and Grant Black
Department of Economics, School of Policy Studies, Georgia State University

Published in Science and Public Policy, Vol. 26, No. 6, December 1999, pp. 382-392. An earlier version of this paper was prepared and presented at a workshop on the Role of Human Capital in Capitalizing on Research, sponsored by the National Academy of Engineering and the National Research Council’s Committee on Science, Engineering, and Public Policy, The Beckman Center, January 20-21, 1998, Irvine, California.

*Phone: 404-651-3988; e-mail: [email protected]; School of Policy Studies, Georgia State University, Atlanta, Ga. 30303.

The authors have benefitted from comments by participants of the workshop as well as those of Mary Frank Fox and Bill Amis. We have also benefitted from the comments of William Zumeta and Charlotte Kuh. We wish to express our appreciation to Bill Agresti, Jim Brown, Sean Eddy, Warren Ewens, Dan Gusfield, Gene Myers, Gerald Selzer, and Judy Willis for their ready willingness to speak with us while we were writing this paper.

I. Introduction

By all accounts the field of bioinformatics/computational biology is booming. The scientific press stresses the high salaries paid to new hires ($65,000 for persons with top master's training; $90,000 or more for Ph.D.s) and the intensity with which headhunters seek out possible candidates (Marshall 1996a, 1996b; Gershon 1997). Universities complain that their students are “grabbed” before they are able to complete their degrees and that their faculty and students are lured to industry, creating the concern that the bioinformatics field is “eating its seed” (Marshall 1996b; Wickware 1997).1

It is difficult to get a firm fix on the number of new positions in the field, in part because the small number of university producers means that many firms make direct contact with key departments rather than place an ad. Moreover, the area is so new -- and at this time, so small -- that it is not included as a field in any of the surveys of the scientific workforce. But evidence collected from position advertisements in Science suggests that the reports that demand exists and is growing should be taken seriously.2 In 1996, 209 positions were advertised; in 1997 this had increased by 69% to 354.3 These counts include two special advertising supplements

1 For example, the bioinformatics staff at Johns Hopkins University’s Genome Data Base fell from 35 to 20 during Fall 1997 due to corporate recruitment (Kaiser 1998).

2 Science and Nature are the two scientific journals that consistently publish employment ads related to computational biology. Our index was computed by examining job advertisements in every issue of Science for the years 1996 and 1997. A position was counted if the ad specifically asked for a computational biologist or a bioinformatist or the position announcement explicitly mentioned experience in computational biology or bioinformatics. Counts are lower bounds of actual position announcements in Science because some advertisements do not state the specific number of position openings but instead indicate more than some specified number. In such instances the lower bound was recorded. Within each calendar year every effort was made not to count repeated ads for the same position.

3 This finding of growth is consistent with Yee’s report (1996) that from 1995-96 the number of ads related to bioinformatics tripled.

Figure 1. Job Openings in Bioinformatics and Computational Biology from Science Ads, 1996 & 1997. (Vertical axis: number of openings, 0-80; horizontal axis: month, January-December; series: 1996 and 1997.)

Table 1. Number of Distinct Entities Advertising Positions in Science.

Columns: sector; number in 1996 (share, %); number in 1997 (share, %); number of distinct position announcements in ‘96 and ‘97; growth between ‘96 and ‘97 (%).

Sector                    1996 (share)   1997 (share)   Distinct, ‘96 and ‘97   Growth (%)
Firms                     44 (62.8)      75 (63.6)      90                      70.4
Not-for-profit:
  Universities            17 (24.3)      22 (18.6)      36                      29.4
  Other not-for-profit     9 (12.9)      21 (17.8)      27                      133
Total                     70             118            153                     68.6

focused on biotechnology, one in June of 1996, the other in July of 1997.4 Both supplements were dominated by ads from SmithKline.5 Figure 1 charts the number of new ads by month. In a typical month (ignoring special supplements) the journal averaged 12 position announcements in 1996. This had more than doubled by 1997, rising to 25. Table 1 organizes the information by the type of entity placing the ad, rather than by the number of position announcements. Three categories are listed: firms, universities and other not-for-profits,

4 There was also a special supplement in July of 1996, apparently an addendum to the June supplement.

5 In the 1996 supplement SmithKline said that they wanted to about double their staff of 30. In the 1997 supplement they again said they wanted to about double their staff, this time reported at 40, suggesting that SmithKline has plans to grow and is experiencing difficulty filling positions in bioinformatics/computational biology.

including government.6 We see that the number of entities placing ads grew from 70 to 118 between 1996 and 1997, representing a growth of 68.6%. In both years the majority--about 63% of entities placing ads--were firms, and the number of firms placing ads grew by 70%.7 The 90 distinct firms placing ads are listed in Appendix A. In addition to large firms, such as Bristol-Myers Squibb, Eli Lilly, SmithKline-Beecham, Pfizer, Merck, Abbott, Bayer, and Monsanto, a number of smaller biotech firms, such as Regeneron, Immunex, and Zenez, have position announcements. A substantial number of firms--29 to be exact--placed ads in both 1996 and 1997. In contrast, only 36 universities ran position announcements during the period, and the growth rate for university ads was slightly less than 30%. Universities placing ads included UCLA, UC Irvine, UCSF, University of Pennsylvania, USC, and California Institute of Technology. Only three universities placed ads in both years. A number of not-for-profit entities (such as the Centers for Disease Control) also ran position announcements, and the number of ads from this sector more than doubled.

Based on the position announcements,8 jobs in computational biology range from entry-level data analysts and programmers to senior-level scientists and research directors. Lower-level positions that are more directly computer-oriented call for as little as an undergraduate science degree, and some state no degree requirements. The majority of positions call for a doctoral degree in either a science (preferably molecular biology) or computer science with considerable programming or bioinformatics experience, although a number of positions explicitly advertise for individuals with a master's degree.

The recognition of computational biology as a distinct field is borne out by its growing impact on the scientific literature.
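
As a quick arithmetic check on the entity counts reported in Table 1, the following Python sketch recomputes the growth rates and uses inclusion-exclusion to recover how many entities advertised in both years. The counts are transcribed from Table 1; the names `table1`, `growth_pct`, `both_years`, and `totals` are illustrative choices made here and are not part of the original analysis. The overlap it derives for the other not-for-profit sector is not stated in the text.

```python
# Counts transcribed from Table 1. "distinct" is the number of distinct
# entities that advertised in 1996, 1997, or both.
table1 = {
    "Firms": {"y1996": 44, "y1997": 75, "distinct": 90},
    "Universities": {"y1996": 17, "y1997": 22, "distinct": 36},
    "Other not-for-profit": {"y1996": 9, "y1997": 21, "distinct": 27},
}


def growth_pct(before, after):
    """Percentage growth from `before` to `after`."""
    return 100.0 * (after - before) / before


def both_years(row):
    """Entities advertising in both years, by inclusion-exclusion."""
    return row["y1996"] + row["y1997"] - row["distinct"]


def totals(table):
    """Sum each column across sectors."""
    return {key: sum(row[key] for row in table.values())
            for key in ("y1996", "y1997", "distinct")}


if __name__ == "__main__":
    for sector, row in table1.items():
        print(f"{sector}: growth {growth_pct(row['y1996'], row['y1997']):.1f}%, "
              f"advertised in both years: {both_years(row)}")
    total = totals(table1)
    print(f"Total: growth {growth_pct(total['y1996'], total['y1997']):.1f}%")
    # Position counts (not entities) quoted in the introduction: 209 and 354.
    print(f"Positions: growth {growth_pct(209, 354):.1f}%")
```

Run as a script, this prints one line per sector. The overlaps it derives for firms and universities agree with the 29 firms and three universities that the text reports as advertising in both years.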
A field that was nonexistent twenty years ago, computational biology has become a key component of a wide spectrum of biological research. Research articles involving computational biology are regularly published in more than ten journals, ranging from the Journal of Molecular Biology to Nature Genetics. The Journal of Computational Biology was launched in

6 Note that institutes that are affiliated with universities, such as Whitehead, are here counted as “other not-for-profit.”

7 Note: these are firm, not enterprise counts.

8 This analysis is based on 1996 ads only.

1994 by a group of researchers who felt the need to create a forum to specifically address scientific issues in the emerging field. Moreover, a number of annual conferences have evolved that focus specifically on computational biology, including the International Conference on Computational Molecular Biology, which held its second meeting in New York in March of 1998. The International Society for Computational Biology was formed to unite members in the new field, giving further recognition to its distinction from other fields.

The demand for computational biologists stems in large part from the immense amount of genetic data that is emerging as a consequence of the mapping of the human genome. Mapping and sequencing the human genome, a linear chain of about 3 billion pairs of nucleotides, is the greatest research endeavor to date in the biological sciences. Begun in October 1990, the Human Genome Project is producing an exponentially increasing amount of biomolecular data. It is estimated that from 1996-2000 the maximum sequence output per day will increase by 500 percent (Yap et al. 1996).9 Moreover, informatics is needed to analyze other types of data which are becoming available in the life sciences. For example, the field of neuroscience faces the daunting task of analyzing ten trillion brain neurons. The tremendous amount of life science data that are available, and the knowledge that this is but the tip of the data iceberg, mean that biological researchers must increasingly rely on an interdisciplinary approach not only to succeed but merely to proceed with their research agendas. One result is that a large number of companies, both large and small, have begun to place heavy emphasis on the use of computational science and see modeling as a vehicle for drug development. Chris Rawlings, UK director for bioinformatics at SmithKline Beecham, states that bioinformatics is a necessary component of the company’s drug development future (Gavaghan 1997).
This frenzy of hiring in computational biology comes at a time when the prospects of young life scientists look less than promising. Despite the “hot” reputation of the life sciences (and in part because

9 A skilled scientist using manual equipment can sequence on the order of kilobases per day, but with the aid of an automated high-speed sequencing system output increases to the order of megabases per day.

of its hot reputation), a large and increasing number of early-career life scientists are unable to find the type of job that will permit them to become independent researchers and establish their own lab. This is readily seen from Figure 2, which gives the proportion of biomedical Ph.D.s from U.S. institutions who report holding a postdoctoral position for selected years. Three distinct cohorts are examined: those who have been out 1-2 years; 3-4 years; and 5-6 years. The results indicate that while the proportion of those with a postdoc who have been out 1-2 years has changed little in recent years, approximately 30 percent of those who have been out 3-4 years currently hold postdoctoral positions, compared to approximately 7 percent 22 years ago, and an alarming 15% of those who graduated 5-6 years ago now hold postdoctoral positions--a rate that is more than five times higher than the rate in 1973, when the data first began to be collected.

Not only are individuals training in the biomedical sciences staying longer in postdoctoral positions; if and when they move out of these positions, they are considerably less likely to hold a tenure-track position (at either a Ph.D.- or non-Ph.D.-granting institution) and increasingly more likely to hold non-tenure-track positions in academe. This is shown in Figure 3, which gives the sector of employment for Ph.D.s in the biomedical life sciences five to six years after receipt of degree for the period 1973-1995. Six categories are reported: tenure track, Ph.D.-granting institution; tenure track, non-Ph.D.-granting institution; non-tenure track (“other” academic position and postdoctoral position); industry; federal labs and other government; and other (working out of science, “other jobs,” unemployed, and seeking and part-time employed).

The changes over time are dramatic. In 1973 over 45% of Ph.D.s who had been out five to six years held tenure-track positions at Ph.D.-granting institutions; 13% held tenure-track positions at non-Ph.D.-granting institutions and 6% held non-tenure-track positions. By 1995, only 21% of the five-to-six-year cohort held tenure-track positions at Ph.D.-granting institutions, 9% held tenure-track positions at other institutions and 28% held non-tenure-track positions.

Sufficient concern over the career outcomes of young life scientists exists to have warranted the establishment by the National Research Council in 1995 of the Committee on Dimensions, Causes, and Implications of Recent Trends in the Careers of Life Scientists. The Committee issued its report in September 1998, stating that “The imbalance between the number of life-science Ph.D.s being produced and the availability of positions that permit them to become independent investigators

concerns the committee” (National Research Council 1998, p. 4). The committee concludes that “Intense competition for jobs has created a ‘crisis of expectation’ among young life scientists” (p. 4). Much of this imbalance is the result of the professional structure of the life sciences research enterprise where “the important work of conducting experiments rests almost entirely on the shoulders of graduate students and postdoctoral fellows” (p. 4). Major recommendations of the committee include: (1) restraint of the rate of growth of the number of graduate students in the life sciences; (2) dissemination of accurate information on the career prospects of young life scientists by departments; (3) improvement of the educational experience of graduate students, facilitated by an increased number of training grants; and (4) the enhancement of opportunities for independence of postdoctoral fellows. Furthermore, the committee recommends that graduate programs work to identify specific fields of the life sciences for which “masters degree training is more appropriate” (p. 87).

Is it not contradictory that career opportunities in the biomedical sciences appear modest, yet the area of computational biology is booming? More to the point, why are there but nine doctoral programs in the United States in computational biology,10 while there are approximately 194 programs in biochemistry and molecular biology and over 100 in molecular and general genetics (Goldberger et al. 1995)? More generally, the contrast of the two fields leads one to ask if the structure of the U.S. science enterprise leads to missed opportunities in emerging fields, particularly when the demand is heavily centered in industry.

10 The nine programs (as of December 1997) are at Baylor College of Medicine, Carnegie Mellon University, George Mason University, Rice University, Rutgers University, University of Houston, University of Pennsylvania, University of Pittsburgh, and Washington University.

Figure 2. Fraction of Biomedical Life Science Doctorates in Postdoctoral Appointments in Academe, Industry, and Government (1973-1995). (Vertical axis: percent, 0-60; horizontal axis: 1-2, 3-4, and 5-6 years since Ph.D.; series: interview dates 1973, 1977, 1981, 1985, 1989, 1993, and 1995.) Source: National Research Council (1998, pp. 168-169).

Figure 3. Percent of Biomedical Life Science Ph.D.s in Selected Sectors (5-6 Years Since Degree). (Vertical axis: percent, 0-50; horizontal axis: employment sector, namely tenure-track faculty position at Ph.D. institutions, tenure-track faculty position at other institutions, non-tenure track, industry, federal labs and other government, and other; series: 1973, 1977, 1981, 1985, 1989, 1993, and 1995.) Source: National Research Council (1998, p. 169).

This paper explores these topics. We are particularly interested in why universities appear slow to start programs in bioinformatics or computational biology and whether changes are needed in the incentive structure to encourage institutions to be more responsive to changing opportunities in the future. We examine four interrelated explanations of what appears to be a sluggish response by academe: (1) Individual faculty have no incentive to establish such programs. (2) The educational system responds differently when demand is driven by industry as opposed to universities and research labs. (3) The interdisciplinary nature of the field creates disincentives to the establishment of programs. (4) The quick fix--turning life scientists into computational biologists--is not possible, given the skills and quantitative abilities of individuals in the life sciences, nor is the incentive present for computer scientists to opt for additional training in the life sciences.

II. Lack of Incentive

The research structure that has evolved in academe in the U.S. means that faculty are extremely responsive to research funding opportunities, since it is external grants that provide resources to purchase equipment and support graduate students and postdocs--the collaborators who are absolutely essential to the lab of the principal investigator (PI). Furthermore, at many medical schools in the U.S. the funding not only supports the collaborators--it also supports the PI, which means that the PI can only retain his or her academic appointment as long as the PI has funding to cover the cost of the lab and the PI’s salary.11 This suggests that an effective way to alter the educational mix of graduate students is to alter the amount of research funds directed to an area and thus provide the incentive for faculty to recruit students into the field.

To what degree has this occurred in computational biology/bioinformatics? The evidence, which is difficult to assemble due to its fragmented nature, is presented in Table 2. It is organized according to funding source, starting with federal agencies, and is subdivided into research support and training support. The table indicates that research grants are available through NSF and NIH. NSF targets computational biology research in part through its Computational Biology Activities (CBA) program within the Division of Biological Infrastructure. CBA typically has $7-8 million annually allocated for research support, part of which funds doctoral and postdoctoral positions. The CBA program primarily encourages inter-disciplinary collaboration for research, workshops and

11 At Baylor College of Medicine, for example, 100 percent of the faculty in the biomedical department receive 80 percent of their salary from grants.

training in the field, with emphasis on improving computational methods and tools. NIH has funded some investigator-initiated projects in the area and Howard Hughes Medical Institute supports several investigators in computational biology. In only one instance, however, are the funds targeted to the field. Moreover, by the science-funding yardstick, the sums directed by the public and private sector to academic research in the area are a pittance. For example, at a time when NIH has supported more than 25,000 active research grants (RAGS) a year, only 96 R01 grants listed the key words “computational and biology” and only 11 R29s listed these key words.12 A similar statement can be made with regard to NSF CAREER grants. Of the approximately 400 active grants in 1996, only six appear to be directly related to the area of computational biology.

Federal and not-for-profit agencies also provide training grants to universities. Such grants, as the title indicates, are designed to provide training opportunities to individual students rather than to support the research of faculty. Table 2 indicates that a number of training grants exist to provide support for study at the undergraduate, graduate and postgraduate level, but only the Sloan and W. M. Keck initiatives are targeted directly to the training of individuals in this field. In the other instances, training funds can be used for computational biology but are not targeted directly to individuals in the area. Universities that have particularly benefitted from these grants--and which to date account for most of the viable programs in computational biology--are Boston University, Carnegie Mellon University, University of Houston, Rice University, the University of California-Santa Cruz, the University of Pennsylvania, the University of Pittsburgh, and the University of Washington. Two multiuniversity centers have

12 The R01 grant is the long-standing, principal research project grant awarded by NIH for extramural research performed by principal investigators from academic or research institutions. The R29 grant was introduced in 1986 as the First Independent Research Support and Transition (FIRST) Award to support the extramural research of early-career and newly independent researchers who had no formal connections to established principal investigators; the R29 was discontinued in 1998.

Table 2. Extramural Funding Sources for Computational Biology

For each entity, the table's two columns (Research and Training) are given as labeled entries; "(note 1)" and "(note 2)" refer to the notes at the end of the table.

Federal Agencies

National Institutes of Health, including the National Library of Medicine (NLM) and its National Center for Biotechnology Information (NCBI)
Research: Grants of varying amounts available through the P01, R01, and R29 programs (note 1). (During the period 1995-97, 96 active R01 grants and 11 active R29 grants in computational biology.)
Training: Fellowships of $19,608-32,300 available through NIH institutes and centers to individuals and institutions for postdoc support in biomedical related research; postdoctoral fellowships of $19,608-32,300 per year from the NLM for any informatics field (3 awarded in 1997); grants available from the NCBI for the research involvement and training of visiting scientists (note 1).

National Human Genome Research Institute
Research: Grants of varying amounts available for research, some of which were in computational biology (note 1).
Training: Grants for up to five years available to U.S. universities for developing training programs; grants available for predoc, postdoc, and senior (up to 50% of current salary) training fellowships; grants available for minority student and postdoc travel to meetings, workshops, or courses relevant to genomic science; support for attendance at a genomics issues course for minority-institution faculty (note 1).

National Science Foundation

Analysis of Biological Systems Program
Research: Grants available for research related to computational biology (note 1).

CAREER Program
Research: Grants available to junior academic faculty (in 1996, approximately 1.5% of all CAREER grants were in computational biology) (note 1).

Computational Biology Activities (CBA)
Research: Grants (approximately 30 awards per year of about $100,000 each).
Training: Funds available for support of students involved in research (part of CBA’s $7-8 million annual research budget).

Computational Neuroscience Activities (CNA)
Research: Grants available for research related to computational biology (since 1989, CNA has awarded 123 grants, some of which are in computational biology) (note 1).

Computer and Information Science and Engineering Directorate
Training: Annually, 15 two-year awards of $33,200-46,200 for postdoc research in computational science (note 1).

Database Activities in the Biological Science (DABS)
Research: Grants available for research related to computational biology (since 1989, DABS has awarded 54 grants) (note 1).

Grand Challenge Program
Research: Grants available for research related to computational biology (note 1).

Grant Opportunities for Academic Liaison with Industry (GOALI)
Research: Grants of $25,000-50,000 for up to one year available for faculty visits to industry or industry presence on campus for research; grants of unspecified amounts available for university-industry research collaboration (in 1997, around $25 million was awarded for all GOALI grants) (note 1).
Training: Grants of up to $42,000 per year for up to two years available for postdoc research involvement in industry; grants of $20,000-25,000 per year available for graduate student research involvement in industry; grants of up to $75,000 per year available for a group of graduate students to be involved in industry research (note 1).

Integrative Graduate Education and Research Training Program (IGERT)
Training: Approximately 20 grants of up to $500,000 per year for up to five years available to U.S. academic institutions for training graduate students; stipends of up to $200,000 for the first year are available for instrumentation and materials enhancement (IGERT replaced Research Training Grants; IGERT awards will begin in 1998) (note 1).

Research Training Group
Training: Grants (discontinued program replaced by IGERT in 1998) (note 1).

U.S. Department of Defense/Office of Naval Research
Research: Grants for research related to computational biology through the Bioengineering Program (note 1).

U.S. Department of Energy
See Alfred P. Sloan Foundation.

Non-government

Alfred P. Sloan Foundation
Training: Approximately 10 postdoc fellowships of $100,000 for up to 2 years specified for computational molecular biology (10 were awarded in 1996 and only 7 in 1997).

Burroughs Wellcome Fund
Training: 5-6 institutional grants of up to $2.5 million over 5 years available for inter-disciplinary graduate and postdoc programs (notes 1 and 2).

Howard Hughes Medical Institute
Research: Investigator appointments (currently fewer than 5 in computational biology).
Training: From 1992-97, 80 predoctoral fellowships in biological sciences of up to $150,000 over 5 years (6 awards were for mathematical and computational biology).

National Center for Genome Resources
Research: 5 year, $8.5 million partnership to develop a biotechnology information facility at New Mexico State University (NMSU).
Training: Offers training to unspecified number of graduate students and postdoctoral fellows at NMSU.

Pfizer Corporation
Research: Grants of $50,000-100,000 available for research related to complex data-set analysis (note 1).

W. M. Keck Foundation
Training: 6 one-year fellowships for either predoctoral or postdoctoral training at the W. M. Keck Center in Pittsburgh.

Notes to Table 2:
1 Not specifically targeted at computational biology.
2 The 1996 award recipients were California Institute of Technology, La Jolla Interdisciplinary Training Program Consortium (University of California-San Diego, Scripps Research Institute, Salk Institute for Biological Studies, and the San Diego Supercomputer Center), the Program in Mathematics and Molecular Biology (a national research and training consortium composed of 17 laboratories), and Rockefeller University.

been financed by the W.M. Keck Foundation. One is located in Houston, the other in Pittsburgh.13

Although the evidence is somewhat sketchy, it suggests that federal and private agencies have placed their computational eggs in the training basket instead of the research basket. Signals imbedded in training grants are not as likely to be heard by individual faculty members as are signals imbedded in research grants for computational biology. Unlike research grants, training grants cover neither the cost of equipment nor the salary of the PI. Moreover, they encourage the trainee to move from one faculty member’s lab to another, a condition that further blunts the incentive for individual faculty to respond to training grants. This suggests that for training grants to be effective, they must signal a need to some collective body. But it may be increasingly difficult for academic units, composed of competitive PIs focused on where their next grant will come from, to engage in the collective response required to succeed in creating new programs.

III. The Educational System Responds Differently When Demand Is Driven by Industry as Opposed to Universities and Research Labs

As demonstrated in the introduction, demand for computational biologists is substantially driven by industry, which sees genetic data as “the major driving force” in drug discovery (Marshall 1996b). SmithKline Beecham is a case in point. Their June 7, 1996, full-page ad in Science reports that they had 30 individuals working in the area with plans to double that number by 1997 (p. 1527). About a year later, in the 1997 supplement, they ran another ad in Science, again saying that they had plans to double their staff, this

13 The Keck Center in Pittsburgh is a collaboration between the University of Pittsburgh, Carnegie Mellon University, and the Pittsburgh Supercomputing Center, while the Center in Houston joins Baylor College of Medicine, Rice University, and the University of Houston. In addition to the Keck Foundation, NSF’s research training program has also provided Rice University with support estimated to exceed $2 million between 1993 and 1998. The University of California-Santa Cruz supports its computational biology group in part through funds from several federal agencies, including NSF and the Department of Energy. Boston University’s BioMolecular Engineering Research Center (MERC), a computational biology center within the College of Engineering, was established in 1985 under a five-year grant from the National Institutes of Health (NIH); NIH support for the program has since continued. (The BioMolecular Engineering Research Center began as the Molecular Biology Computer Research Resource at Harvard University under the direction of Dr. Temple Smith before relocating to Boston University’s MERC under Dr. Charles deLisi.)

time reported to be at 40. Moreover, SmithKline has aggressively hired established researchers from academe and the non-profit research sector. In 1995 they succeeded in attracting David Searls away from the University of Pennsylvania, and shortly after Searls’s arrival they hired James Fickett from the Los Alamos National Laboratory, Randall Smith from Baylor College of Medicine, and Chris Rawlings from the Imperial Cancer Research Fund in London (Marshall 1996a).

Does it make a difference that the demand is industry-driven, as opposed to driven by academe? We are inclined to say yes for two reasons. First, every time industry hires a faculty member it means that there is one fewer professor to train future computational biologists. Thus, while the practice of recruiting faculty from academe provides a ready source of knowledge, and hence spillovers from academe to industry, the practice--where replacement is difficult--impairs academe’s capacity to continue the training initiatives it has already begun.14 The Baylor program reportedly experienced difficulty when Randall Smith left to join SmithKline, and, while the program at the University of Pennsylvania survived despite Searls’s departure, the remaining faculty were stretched as a result.

Second, academic departments in the life sciences are arguably not as responsive to demand driven by industry as are departments in engineering and computer sciences, which have long had a tradition of placing a sizeable number of their graduates in industry. Few life science Ph.D.s head

14 Industry arguably knows that, despite the fact that bioinformatics is likely to be a foundation for the next generation of pharmaceuticals, it is eating its own seed. This raises the question of why industry is not doing more to replenish the crop. The “winner-take-all” nature of competition in pharmaceuticals and the rapid pace of discoveries in the pharmaceutical industry undoubtedly lead industry to offer high premiums for the seed to abandon universities to take jobs in industry. But the answer as to why industry is investing so little in training future researchers in the area may rest, not on the intensity of competition, but instead on habit. The large number of research grants that have flowed into biomedical research, and the ability of researchers to support postdocs and graduate students on these grants, has made for a steady supply of individuals entering the life sciences in the past ten years or more. To the extent industry had a problem during this period it was in convincing individuals to abandon their hopes of becoming independent investigators in academe, not in locating individuals trained in the biomedical sciences. Training, except for postdoc positions within their firms, has thus not been anything that the pharmaceutical industry has felt that it needed to foster.

directly to industry upon completing their Ph.D.s.15 The reason stems from the fact that it is research funding --much more than the availability of jobs for graduates-- that drives the size of academic programs in the life sciences. This is because research funding provides ready support for Ph.D. students in the life sciences and funding for the post doctorate positions that recent Ph.D.s (and not-sorecent Ph.D.s) hold with such proclivity. In the biotech world of the late twentieth century, life science departments have found a ready supply of aspiring students who are willing to commit eight to ten years of their life to becoming life scientists so that they can have a shot at becoming a PI to continue working on the frontiers of knowledge. And, while it is viewed as both honorable and profitable for established faculty to work with industry, the profession would appear to still stigmatize the individual whose early career goal is to work in industry.

IV. The Interdisciplinary Nature of Computational Biology Creates Disincentives to the Establishment of Programs

Computational biology requires training in computer and information science, mathematics, and the life sciences. Coordination among these three fields can often be an institutional nightmare since it involves cooperation not only across department lines but also across colleges. The department of computer science is often located in a college of engineering while mathematics and life science departments are generally located in the college of arts and sciences. The situation is further complicated by the fact that universities that have medical schools often have an additional department of life sciences in the medical school. The problems in working across department lines are difficult enough when departments are within the same college. They are compounded when departments are in different colleges or universities. For example, how are students to be advised? How are courses to be numbered and shared? How are contributions to be valued across college/university lines? And these are the simple questions. The harder questions concern which department/college will get “credit” for the new field. How will resources be shared? Who will get the new positions if individuals trained in computational biology are hired? Who will evaluate individuals for promotion and tenure?16

The fields also differ in terms of career goals and opportunities for students. Michael Ashburner, director of research at the European Bioinformatics Institute, argues that more resources should be funnelled into masters programs to provide uniform, specialized training since almost all those involved in bioinformatics come from another field (Gavaghan 1997). Yet terminal masters programs have historically been unpopular in the life sciences, in part because the ready supply of Ph.D. students and postdocs provided the needed assistance in the lab and in part because the field often stigmatized those with a terminal masters degree.17 This stands in marked contrast to the fields of engineering and computer science, where a masters education is looked upon favorably and employment is found (and encouraged) in industry.18

15

In 1996, for example, the percentage of Ph.D. recipients with definite postgraduation plans for U.S. industry employment was 48.5 percent in engineering, 43.4 percent in computer science and a mere 4.7 percent in the biological sciences (National Science Foundation 1997b).

V. Is there a Quick Fix?

A plausible fix to the “shortage” of individuals in computational biology is to take young life scientists and turn them into computational biologists--or to take those with degrees in mathematics or computer information systems and augment their skills. Indeed, without a proactive strategy, this is what an economist would predict would occur. The number of postdoctoral grants offered in the area by Sloan, NSF and Burroughs Wellcome suggests that they have adopted such a strategy.

16

The fragmentation of fields within institutions not only creates problems in meeting the demand to train students in new areas. It also has negative consequences for the productivity of science and the ability of an institution to respond to changes in science over time. Studies (Hicks and Katz 1996; Hollingsworth 1995; Katz et al. 1995) indicate that breakthrough research significantly benefits from intense interdisciplinary activities across fields.

17

Furthermore, it is commonly believed that Ph.D.s and postdocs provide new ideas to the lab and that to replace them with permanent masters-level technicians would rob the lab of this important source of ideas.

18

Andy Brass, director of a masters-level bioinformatics course at the University of Manchester (UK), maintains that there is a wage premium for a masters in bioinformatics compared to molecular biology (Gavaghan 1997).

How effective is such a strategy/response? Several reasons lead us to suspect that it is less effective than it might appear at face value. First, at the doctoral level the market for individuals trained in computer science appears to be sufficiently strong to retain computer scientists in that field. Table 3 reports the 1995 median annual salary of recent Ph.D.s employed full time in the broad areas of computer and information sciences and life and related sciences. The sub-category of biological and health sciences is also included. Three cohorts are identified: 1993-94 graduates, 1990-92 graduates and 1985-89 graduates. The large difference between salaries in computer and information sciences and salaries in life science for those who have been out for one to two years reflects the fact that a majority of individuals in the life sciences hold postdoctoral positions upon graduating. The difference narrows as the life scientists move out of these positions, but a substantial differential of 25% exists for those who have been out six to ten years. This suggests that job prospects in computer science are sufficiently strong to preclude computer scientists from seeking additional formal training in the biological sciences.

The quick-fix strategy is more attractive to those trained in the biological sciences where the market, as we have already indicated, is considerably weaker. Table 3 (as well as figures 2 and 3) suggests that life scientists may have the incentive to seek additional training to become computational biologists. Do they have the background and aptitude to transform themselves into computational biologists? The response to the University of Pennsylvania’s training initiative in computational biology was remarkable. Over 200 individuals applied for the two postdoctoral positions. Yet, according to the faculty member who directs the program at the University of Pennsylvania, fewer than a handful qualified for the program precisely because the applicants had so little background in mathematics/statistics. There is reason to believe that this lack of quantitative background is generic to those with Ph.D.s in biology--not specific to the applicants to the University of Pennsylvania program. An examination of the requirements of five highly rated biology departments demonstrates that none have formal mathematical requirements for entry into their graduate programs; only a handful of graduate courses have a mathematics prerequisite up to introductory calculus.19

19

The five institutions reviewed are Harvard University, Johns Hopkins University, Massachusetts Institute of Technology, Stanford University, and the University of California-Berkeley. Three of the institutions state that they expect entering students to possess some mathematics knowledge, preferably at least introductory calculus.

Table 3
Median Annual Salary of FTE Recent Ph.D. Graduates, 1995

Field                                1993-94 Graduates   1990-92 Graduates   1985-89 Graduates
Computer and Information Sciences    $54,000             $61,000             $65,000
Life and Related Sciences            $30,400             $40,000             $52,000
  Biological and Health Sciences     $30,000             $38,600             $52,000

Source: NSF/SRS, Characteristics of Doctoral Scientists and Engineers in the United States: 1995
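
As a quick check on the salary differentials discussed in the text, the following minimal Python sketch recomputes, from the Table 3 medians, the percentage by which salaries in computer and information sciences exceed salaries in life and related sciences for each cohort. The dictionary layout and the function name premium_over are illustrative assumptions; they are not part of the original analysis.

```python
# Median 1995 salaries transcribed from Table 3 (U.S. dollars).
TABLE_3 = {
    "Computer and Information Sciences": {
        "1993-94": 54000, "1990-92": 61000, "1985-89": 65000,
    },
    "Life and Related Sciences": {
        "1993-94": 30400, "1990-92": 40000, "1985-89": 52000,
    },
    "Biological and Health Sciences": {
        "1993-94": 30000, "1990-92": 38600, "1985-89": 52000,
    },
}


def premium_over(base_field, other_field, table=TABLE_3):
    """Percent by which other_field's median exceeds base_field's, per cohort."""
    return {
        cohort: 100.0 * (table[other_field][cohort] - base) / base
        for cohort, base in table[base_field].items()
    }


# Hypothetical usage: premium of computer science over life science salaries.
premiums = premium_over(
    "Life and Related Sciences", "Computer and Information Sciences"
)
for cohort, pct in premiums.items():
    print(f"{cohort} graduates: {pct:.1f}% premium")
```

For the 1985-89 cohort this computation gives 25 percent, the differential cited in the text; the premiums for the two more recent cohorts are considerably larger.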

It is not just that life scientists lack training in math and statistics. A credible argument can be made that the typical life scientist lacks interest and exceptional aptitude in these areas. This is somewhat borne out by data from the Graduate Record Examination (GRE). Table 4 presents data on the scores of GRE test-takers from 1993-96 by their intended field of graduate study. The data indicate that individuals intending to pursue graduate study in the biological and health sciences test substantially lower in the quantitative area than those intending to study computer and mathematical sciences.20 While the mean quantitative score for biological sciences was 595, the score for computer sciences was 672--a difference of 77 points. Moreover, less than 21 percent of test-takers intending to enter the biological sciences achieved a score above 700 compared to 52 percent in computer and information sciences.

Table 4
GRE Scores by Intended Field of Graduate Study, for Seniors and Nonenrolled College Graduates, 1993-96

Intended Graduate                    Section        Mean Score   Percent of Test-takers   Percent of Test-takers
Field of Study                                                   with Score above 700     with Score of 800
Biological Sciences                  Verbal         501          3.6                      0.1
                                     Quantitative   595          20.7                     1.1
Health and Medical Sciences          Verbal         449          0.7                      0.0
                                     Quantitative   515          5.8                      0.1
Computer and Information Sciences    Verbal         483          5.4                      0.2
                                     Quantitative   672          52.2                     5.6
Mathematical Sciences                Verbal         502          6.5                      0.2
                                     Quantitative   698          60.6                     8.8

Source: 1997-98 Guide to the Use of Scores, Educational Testing Service, 1997

20

It should be noted that this is for all test-takers intending to pursue graduate education, not those actually in graduate programs. Many of these test-takers will not receive admission into graduate programs, let alone leading programs in their intended field of study. A discussant suggested that the large differential may be due to the “Asian factor.” Specifically, Asian students score extremely well on the quantitative portion of the test and the fields of computer and information sciences and mathematical sciences attract a disproportionate number of Asian students compared to the life sciences. This Asian factor could lead to a lower mean test score among individuals intending to study in the life sciences compared to the other fields. This is undoubtedly true but, we suspect, does not explain away the differential. Although data are not reported for GRE scores by country of origin and intended field of study, some indication of the magnitude of the “Asian factor” is given by examining the scores by ethnicity and intended field for U.S. citizens. According to the GRE Board, the mean test score of Asian/Pacific American U.S. citizens intending to do graduate work in the life sciences was 590 compared to 694 for Asian/Pacific American citizens intending to do graduate work in engineering and 665 for those intending to do graduate work in the physical sciences (Educational Testing Service 1997b, p. 16). Making the heroic assumption that the test scores of the U.S. Asian population reflect test scores of Asians who are non-citizens, these numbers suggest that the lower quantitative scores in the life sciences are due at least in part to the fact that Asian students who seek out training in the life sciences have lower quantitative scores than Asian students who go into engineering or the physical sciences. The same thing can be said for whites. Educational Testing Service reports that white U.S. citizens who intend to study in the life sciences have a mean quantitative score of 537--117 points lower than white citizens who plan to enter the physical sciences, and 152 points lower than white citizens who plan to enter engineering (p. 16). (The broad fields of engineering, life sciences and physical sciences are used in this note because scores are not reported by ethnicity for the narrower fields given in Table 4.)

VI. Conclusion

This paper explores four reasons why the current educational system appears to be sluggish in responding to the increased demand for individuals trained in computational biology. The first and second reasons are interrelated. Specifically, we argue that the size and direction of Ph.D. programs in the life sciences are more responsive to signals embedded in funding opportunities for faculty research than to the signals provided by the job market for graduates. While this may appear perverse, it is the logical consequence of a research regime that places great emphasis on having doctoral students and postdoctoral students in the lab and can persist as long as there is an adequate supply of applicants. Such a supply has been forthcoming in the U.S. in recent years because of (1) the “hot” reputation of biotechnology; (2) the availability of immigrant scientists; and (3) the ready supply of postdoctoral positions which permit graduate schools to provide placement for graduates.

The third reason for the sluggish response relates to the interdisciplinary nature of computational biology. Given the fields involved (mathematics, computer science, and biology), collaboration typically requires working across college lines within a university. While this is not impossible, the bureaucracy and incentive structure of academe act to discourage cooperation across disciplines.

Finally, we have argued that there is no “quick fix.” Individuals trained in computer science have few economic incentives to change their stripes by acquiring additional training in biology. And, if they did, the response would be far from quick since they would require a substantial amount of training in biology. In contrast, the dismal state of affairs for young life scientists means that the incentive is there for life scientists to augment their skills and become computational biologists. But for life scientists the path may be difficult since many lack both the mathematical training and the inclination to become successful computational biologists.
Perhaps it is not surprising that the University of Pennsylvania has decided that the best way to meet the demand for computational biologists is to “grow their own,” offering undergraduate and master’s programs in computational biology in order to attract the “right” kind of mind and integrate the curriculum at an early stage.21 This response may also say something about which level of education in the United States is most responsive to changes in the demand for its graduates: the undergraduate/masters level or the doctoral level.

As is characteristic of exploratory research, we leave a number of questions unanswered. First and foremost is the question of whether bioinformatics/computational biology is really a field. Second, and related, is the question of whether this is but a flash in the pan or a sustained shift in demand for individuals with these skills. While the analysis of position announcements suggests it is more than a flash, as with question number one, it is perhaps too soon to know. Third, how does the situation in bioinformatics/computational biology differ from other fields, such as bioengineering, which is interdisciplinary, draws from the life sciences, and has had a remarkable amount of success in defining itself as a field?22 Without such benchmarks, it is difficult to know whether, by the academic yardstick, academe is responding sluggishly. Fourth, and perhaps most important, is why the research funding apparatus of the U.S. has been slow to encourage PIs to do research in this area and hence has not encouraged training future students.

21

Rensselaer Polytechnic Institute joined the sparse ranks of institutions offering undergraduate training, starting an undergraduate degree program in bioinformatics and molecular biology in the Fall of 1998 that is funded in large part by a $1.2 million grant from Howard Hughes Medical Institute for undergraduate education in the life sciences.

22

The National Research Council ranking of Ph.D. programs lists 32 doctoral programs in bioengineering (National Research Council 1995) and the Survey of Doctorate Recipients indicates that, by 1995, there were 2165 individuals trained in bioengineering in the U.S., compared to 173 in 1979, the first year that it was noted as a field in the survey.

Appendix A

Sources of Employment Ads in Science in 1996-97

Private, for-profit

1996

1997

Ceres

Abbott Laboratory

X

X

CIBA

Acacia Biosciences

X

Clontech

X

Acadia Pharmaceuticals

X

Corixa

X

Aeiveos

X

Affymetrix

X

X X

CuraGen

X

X

X

DEKALB Genetics

X

X

Alcon Laboratories

X

Digital Gene Technologies

X

American Type Culture

X

DNAX Research Institute

X

Dupont Merck

X

X

Eisai Research Institute of Boston

X

Applied Biosciences

X

Eli Lilly

X

X

Ariad

X

Exelixis

X

X

Collection Amgen

X

Arris

X

Astra

X

Genaissance Pharmaceuticals

X

X

gene/Networks

X

Astra, Arcus

X

GeneLogic

X

Astra, Bioinformatics Center

X

GeneMedicine

X

Astra, Boston

X

Genetech

Astra, Canada

X

Genetics Computer Group

X X

Astra, Draco

X

Genetics Institute

X

X

Astra, Hassle

X

Genome Therapeutics

X

X

Avigen

X

Genomed

X

Barlex

X

Glaxo Wellcome

X

X

Base4

X

X

Hoechst Marion Roussel

X

Bayer

X

X

Horst-Ariad Genomics Center

X

Bios Laboratory

X

Boehringer Ingelheim

Human Genome Sciences

X

X

Human Genome Services

X

X

Bristol-Myers Squibb

X

X

Immunex

X

X

Cadus Pharmaceuticals

X

X

Incyte Pharmaceuticals

X

X

Canadian Genomic Biotech

X

Jackson Laboratory

X

X


LeukoSite Lexicon

X

Searle

X

X

Sequana Therapeutics

X

X

X

X

LifeSpan

X

SmithKline-Beecham

Mendel Biotechnology

X

Structural Bioinformatics

X

Mercator Genetics

X

Synteni

X

Merck KgaA

X

Texas Biotechnology

X

Merck Sharpe and Dohme MetaXen

X

X

Wyeth Ayerst

X

X

X

Zeneca

X

X

X

ZymoGenetic

X

X

X

Microside Pharmaceuticals Millennium Pharmaceuticals

Versicor

X

Molecular Informatics

X

Molecular Simulations

X

Not-for-profit/Academic

X

Battelle

Nema Pharmaceuticals

X

Baylor College

X

Novartis

X

Biomedical Research Institute

X

Novo Nordisk Biotech

X

Boston University

X

Ontogeny

X

California Institute of

Onyx Pharmaceuticals

X

Parke-Davis

X

Centers for Disease Control

PE Applied Biosciences

X

Chinese University in Hong

Monsanto

Pfizer

X

X

X

QBI Enterprises

X

Qiagen

X

Technology X X

Kong

X

Pharmacia and Upjohn

X

City of Hope/National Medical

X

Center Clark Atlanta University

X

X

Regeneron Pharmaceuticals

X

Columbia University

X

Rhone-Poulenc Rorer

X

Cornell University

X

Ribozyme Pharmaceuticals

X

Ernest Orlando Lawrence

X

Roche

X

RW Johnson Pharmaceutical

Berkeley National Laboratory European Bioinformatics Institute

X

Research Institute Schering-Plough Research

X

X

Institute Scios Scriptgen

X

X

Florida State University

X

Fox Chase Cancer Center

X

Geneva Biomedical Research

X

Institute

X

George Mason University

X


X

Georgia Institute of Technology

X

University of California, Los

Harvard Medical School

X

Indiana State University

X

Iowa State University

X

Johns Hopkins University

Angeles University of California, San

X

University of Florida

X X

X

University of Houston

Los Alamos National Laboratory

X

University of Massachusetts,

Michigan State University

X

University of Michigan Medical

Molecular Research Institute

X

National Biotechnology

X

X

X

University of New Mexico

X

X

University of Pennsylvania

X

X

X

University of Southern California

X

University of Sydney

Biotechnology Information National Center for Genome

X

School

Information Facility

National Center for

X

Amherst

X

National Cancer Institute

X

Francisco

Karolinska Institute

Missouri Botanical Gardens

X

X

X X

University of Texas, Austin

X

University of Texas, Southwest

X

Medical Center

Research National Human Genome

X

Institute National Institutes of Health

X

University of Virginia

X

University of Washington

X

Vanderbilt University

X

National Science Foundation

X

W. M. Keck Center

Oak Ridge National Laboratory

X

Washington University

X

Pomona College

X

Whitehead Institute

X

Rochester Institute of

X

Technology Sandia National Laboratory

X

Santa Fe Institute

X

Scripps Research Institute

X

South African National

X

Bioinformatics Institute Stanford University

X

Stine-Haskell Research Center

X

The Ohio State University

X

University of California, Irvine

X


X

X

X

References

Carmichael, H. Lorne. “Efficiency Wage Models of Unemployment: A Survey,” Economic Inquiry, 1990.

Educational Testing Service. 1997-98 Guide to the Use of Scores, New Jersey: Educational Testing Service, 1997a.

_____. Sex, Race, Ethnicity, and Performance on the GRE General Test. Educational Testing Service, 1997b.

Gavaghan, Helen. “Running to Catch Up in Europe,” Nature, Vol. 389, September 25, 1997, pp. 420-22.

Gershon, Diane. “Bioinformatics in a Post-genomics Age,” Nature, Vol. 389, September 25, 1997, pp. 417-18.

Goldberger, Marvin L., Brendan A. Maher, and Pamela E. Flattau (eds.). Research-Doctorate Programs in the United States: Continuity and Change, Washington, D.C.: National Academy Press, 1995.

Hicks, D. and J.S. Katz. “Science Policy for a Highly Collaborative Science System,” Science and Public Policy, 23, 1996, pp. 39-44.

Hollingsworth, Rogers. “Major Discoveries and Biomedical Research Organizations: Perspectives on Interdisciplinarity, Nurturing Leadership, and Integrated Structure and Cultures,” prepared exclusively for Interdisciplinarity Project and to be published by University of Toronto Press, 1998.


Holmstrom, Bengt. “Contractual Models of the Labor Market,” American Economic Review, Papers and Proceedings, 61, May 1981, pp. 308-13.

Kaiser, Jocelyn (ed.). “Hopkins’s Genetic Database to Close,” Science, vol. 279, January 30, 1998, p. 645.

Katz, J.S., D. Hicks, M. Sharp, and B. R. Martin. The Changing Shape of British Science. Brighton: Science Policy Research Unit at the University of Sussex, 1995.

Lazear, Edward. “Why is There Mandatory Retirement?,” Journal of Political Economy, 87, December 1979, pp. 1261-84.

_____. “Agency Earnings Profiles, Productivity and Hours Restrictions,” American Economic Review, 71, September 1981, pp. 606-20.

Marshall, Eliot. “Hot Property: Biologists Who Compute,” Science, June 21, 1996, pp. 1730-32.

_____. “Demand Outstrips Supply,” Science, June 21, 1996, p. 1731.

National Research Council. Survey of Doctorate Recipients. 1973-95.

_____. Research Doctorate Programs in the United States: Continuity and Change. Eds.: M. Goldberger, B. Maher, and P. Ebert. Washington, DC: National Academy Press, 1995.

_____. Trends in the Early Careers of Life Scientists. Committee on Dimensions, Causes, and Implications of Recent Trends in the Careers of Life Scientists. Washington, DC: National Academy Press, 1998.


National Science Foundation, Division of Science Resources Studies. Characteristics of Recent Science and Engineering Graduates: 1995, Detailed Statistical Tables, 1997a.

_____, Division of Science Resources Studies. Science and Engineering Doctorate Awards: 1996, Detailed Statistical Tables, 1997b.

Sobral, Bruno W. S. “Common Language of Bioinformatics,” Nature, Vol. 389, September 25, 1997, p. 418.

Stephan, Paula and Sharon Levin. “The Importance of Implicit Contracts in Collaborative Scientific Research,” presented at the Conference on the ‘Need for a New Economics of Science,’ University of Notre Dame, March 13-16, 1997a.

_____. “The Critical Importance of Careers in Collaborative Scientific Research,” forthcoming in Revue d’Economie Industrielle, no. 70, First Trimester, 1997b, pp. 45-61.

Wickware, Potter. “Choices and Challenges,” Nature, Vol. 389, September 25, 1997, p. 420.

Yap, Ting K., Frieder Ophir, and Robert L. Mantino. High Performance Computational Methods for Biological Sequence Analysis, Boston: Kluwer Academic Publishers, 1996.

Yee, Wendy. “The Top Five Career Trends of 1996: Informatics Anything,” http://www.nextwave.org/server-java/SAM/pastloop/trend2.htm.
