Research Evaluation, 19(1), March 2010, pages 2–18 DOI: 10.3152/095820210X492477; http://www.ingentaconnect.com/content/beech/rev

How to use indicators to measure scientific performance: a balanced approach

Ulrich Schmoch, Torben Schubert, Dorothea Jansen, Richard Heidler and Regina von Görtz

Ulrich Schmoch (corresponding author) and Torben Schubert are at the Fraunhofer Institute for Systems and Innovation Research, Breslauer Strasse 48, D-76139 Karlsruhe, Germany; Email: [email protected]; Tel +49 721 6809114. Torben Schubert is also at the Institute for Economic Policy Research, Karlsruhe University, Germany. Dorothea Jansen, Richard Heidler and Regina von Görtz are at the German Research Institute for Public Administration, Speyer, Germany. Dorothea Jansen is also at the German University of Administrative Sciences, Speyer, Germany.

Scientific performance should not be measured by a one-dimensional metric such as publications, since it is a multi-dimensional phenomenon. A quantitative analysis of the activities of research groups in three scientific fields demonstrates in particular the importance of sufficient numbers of PhD graduates and of contributions to the infrastructure of the scientific community, in terms of editorships, memberships of boards, etc. The results of the quantitative analysis are largely confirmed by a parallel qualitative investigation; the two approaches complement each other by highlighting different aspects. For example, the qualitative approach conveys explicitly the demand structure for intermediary and final outputs that interlinks the activities of different research units. The results show that it is important for science policy to set appropriate incentives for all dimensions of scientific activity, not for publication output exclusively, as the latter entails a considerable hazard of distortion, endangering the sustainability of scientific research.

In the last decades, the situation in advanced industrial countries has been characterised by a strong trend towards knowledge-based industries, on the one hand, and an increasing scarcity of public funds for research, on the other. Building a knowledge-based economy obviously requires a strong science base, so public policy has to consider appropriate ways to provide this base despite financial restrictions. The most obvious solution to this dilemma is to improve the efficiency of scientific research, but the methods for achieving this aim are debatable (Schubert, 2009). Consequently, many governments have launched various activities linked to what is known as new public management, which comprises various structural changes to the governance models imposed on the public science sector, among them the collection of indicators on scientific research performance. To this end, the responsible policy and university institutions often use data which are already available for other purposes or which can be collected easily, while the appropriateness of these data is hardly examined. As a first step, it is necessary to define the dimensions relevant to scientific performance and then, in a second step, to look for data or indicators which best describe these dimensions. However, the discourse on the different aspects of scientific performance is still ongoing, and the debate on adequate indicators is more controversial than ever. In this article, we present the results of an analysis of a data set of public research institutions, in particular universities, in Germany. In a survey, we collected a broad set of indicators on the different dimensions of scientific research suggested in the academic discourse on this topic, in order to validate their appropriateness. The second section starts with a brief general consideration of the dimensions of scientific performance, the collection of corresponding indicators and potential impacts on the resource allocation of scientific groups. In the third section, after a short


introduction to our dataset, we present various quantitative analyses. In particular, we deal with the suitability of third-party funding (external funding) as a performance indicator, the multi-dimensionality of performance and the typical output profiles of scientific institutions. In the fourth section, we compare the grouping resulting from the cluster analysis of the quantitative questionnaire data with the self-concepts of the research groups that emerge from qualitative content analyses of complementary qualitative interviews. In addition, we address the question whether the groups' specialisation patterns are reflected in their collaboration strategies resulting from complementarities of academic work. The fifth section concludes.

Scientific production and resource allocation

The scientific production process has a complex structure which is shaped by technical and social influences. In the following, we attempt to describe the main features of this process. We start with four theses, which are explained subsequently:

T1. Production in research has multi-input, multi-output characteristics.
T2. The production process is (a) vertically and (b) horizontally integrated.
T3. The returns on the outputs are not fully appropriable by the research groups and thus are partly externalised.
T4. Specialisation advantages arise because of inherent abilities and learning effects.

Thesis T1 is, at least implicitly, supported by many authors (Rousseau and Rousseau, 1997; Nagpaul and Roy, 2003; Warning, 2004; Johnes, 2006). Scientific production is a process in which manifold inputs (e.g. capital equipment, trained scientists) are transformed into various outputs (e.g. publications, patents, knowledge transfer). Yet, beyond this widely recognised fact, scientific output generation must also be seen as a vertically integrated, or even partly self-dependent, production process (T2a). This can be illustrated by the training of new scientists, who are an input to the scientific production process. At the same time, however, they are an intermediary output, as they are trained inside the system with the means of other input factors. So types of output from different stages of the production process are combined. Horizontal integration (T2b) occurs because new scientific knowledge is a final output, on the one hand, and an input for other research efforts, on the other. Further examples may be found in activities such as editorships, which could be described as maintaining the scientific infrastructure. Here, types of output from the same stage of the production process are combined.


Besides the final output of research — new knowledge — numerous intermediate outputs are also created which serve as inputs in later stages. The system becomes self-dependent, because it relies to a large extent on inputs that have been created inside the system. Therefore, the efficiency of the system as a whole is based on its ability to provide an optimal mix of the activities driving the production agenda. These considerations direct attention to the incentives for the different activities inside the system and especially to their appropriateness. Thesis T3 states that the returns on the produced output may not be appropriated completely by the producing groups. The correctness of this claim clearly depends on the output considered. For instance, it is well known in sociology that a very important reward for scientists is reputation (Merton, 1957, 1973; Luhmann, 1990). Meanwhile this view is also widely shared in other disciplines (see e.g. Dasgupta and David, 1994; Van Raan, 1997). In any case, reputation can best be achieved by conducting extraordinary research; it is usually not gained for the production of intermediary outputs such as the training of young scientists, or only to a lesser extent. Additionally, as young scientists move freely between different research groups, most of the returns in the form of capable scientists may be appropriated by others. Similar arguments hold for activities linked to academic self-organisation (e.g. working as dean, editing journals). Therefore, problems of incomplete appropriation of the returns are likely to be more severe in the case of intermediate and infrastructural outputs, while they are less important for genuine research. This statement must be seen in the context of the social shaping of scientific production, especially the effects of the de facto self-deployment of scientists to specific tasks; that is, scientists can generally choose individually which tasks they want to perform (Krohn and Küppers, 1989). This freedom of choice, in combination with the disincentives for the production of intermediate outputs, may lead to an undersupply of such intermediary goods because of their positive externalities. From a basic economic perspective, it might be most effective if research groups, like firms, specialised in those activities in which they excel; some groups would mainly conduct research, some would primarily train young researchers and others might focus on infrastructure activities such as editing journals or organising scientific associations (T4). Disincentives for intermediary outputs would imply a non-optimal specialisation pattern: groups which are more effective in producing intermediary outputs may shift resources to research in order to gain some of the reputation-based rewards. At this point the incentives provided by indicator systems enter the scene. A well-balanced indicator system could potentially serve as a corrective for the unintended external effects linked to scientific production. If an indicator-based system does not


provide appropriate incentives for intermediate outputs, it will reinforce the latent trend towards their undersupply. Consequently, resource allocation based on indicators should provide incentives for all types of outputs, including intermediate ones, thus imposing positive effects in terms of financial resources and reputation. In the following section — after a short introduction to the data and design of our study — we deal in depth with one specific indicator often used, third-party funding (TPF), and present an alternative (preliminary) model to measure performance by a multi-dimensional indicator system.

Empirical discussion

The data set

Three disciplinary subfields were chosen for this study according to a most-dissimilar-case design. The first divide is along the dimension of natural sciences (astrophysics, nanotechnology) vs. social sciences (economics). The second divide is along the dimension of basic (astrophysics, economics) vs. strategic research (nanotechnology) (see also Schmoch and Schubert, 2008). A bibliometric search provided a list of all German institutions which published at least one article in the selected fields in 2002 or 2003, as documented in the databases Web of Science (WoS) by Thomson Reuters and ECONLIT by the American Economic Association respectively. These institutions had to be disaggregated to the level of research groups with the help of directories and other information available on the web. We selected the research group as our unit of analysis, defined as the smallest unit in an organisation which pursues a longer-term research programme; in universities, this generally coincides with a chair. Our web search resulted in 122 astrophysics groups, 225 nanotechnology groups, and 56 microeconomics groups. After a validation with the help of experts from academia and funding institutions, samples were drawn for each field.1 Expert interviews with the leaders of these research groups were conducted in 2004 and 2005. All in all, 77 groups were interviewed (astrophysics: 25, nanotechnology: 27, economics: 25). Twenty of the participating groups were (public) non-university research groups affiliated to institutions such as the Max Planck Society or the Fraunhofer Society; the remaining 57 were located in universities. The interviews consisted of a semi-structured qualitative part and a network inventory. In addition, the interviewees were asked to fill in a standardised questionnaire on their groups' input and output data (Wald, 2005; Franke et al, 2006). Bibliometric data on publications, citations and co-publications for the period 1998–2003 were collected at the level of the members of each research group and aggregated to a


bibliometric profile of each group. The WoS database was used for astrophysics and nanotechnology, and SCOPUS, offered by Elsevier, for microeconomics, as it has wider journal coverage for this field. Comparing the coverage of the two databases for the disciplines, we find that in economics and related disciplines (such as management and business economics) there are 350 journals in WoS and about 700 in Scopus. Scopus includes many German policy-oriented journals (e.g. Wirtschaftsdienst, Perspektiven der Wirtschaftspolitik) not indexed by WoS. The coverage of astrophysics and astronomy is better in WoS (about 50 titles) than in Scopus (about 40 titles). No such comparison is possible for nanotechnology, because there is no separate classification code in Scopus, whereas there is one for WoS with 52 journals.2 The variables from the questionnaire and the bibliometric analysis are described in Table 1. Without going into detail, there is considerable heterogeneity in this data set. For instance, the largest research group comprises 150 scientists,3 while the smallest one contains only two. Some groups conduct 100% third-party research while others do none. A second observation concerns the heterogeneity across the disciplines. It became apparent that the heterogeneity in the data is captured to a large extent by the disciplines (see Jansen et al, 2007, 2010). This is very important for the following analyses, since we have to deal with this condition suitably, either by introducing field dummies or by working with sensible normalisations (see below for both).

Constructing a set of indicators for scientific performance

Third-party funding

In many countries, indicators based on the TPF received by a research unit are used in science evaluation schemes; in Australia and Germany, for example, TPF-related indicators have received weights of 70% or above. The main justification is that TPF is suspected to reflect an implicit evaluation of the funded projects (Hornbostel, 2001: 524; Wissenschaftsrat, 2005). In addition, their use is encouraged because TPF is quite easy to measure. In this context, Elton (1987) ironically stated that 'whatever is easily measurable becomes a PI [performance indicator].'4 Unfortunately, the suitability of this indicator has not been proven comprehensively. For instance, Hornbostel (2001) finds a positive correlation between the number of publications and TPF; however, he does not control for the trivial size effect of TPF implying larger research units. An earlier study by Stolte-Heiskanen (1979) found no obvious relationship. Beyond some theoretical arguments, we will give empirical evidence on the inappropriateness of TPF as a measure of performance. For this purpose, it is helpful to develop a theory of minimal requirements for performance indicators.


Table 1. Input and output variables used in this study

| Variable | Self-reported | Time period | Scale/unit | Mean | S.D. | Min | Max |
|---|---|---|---|---|---|---|---|
| Research outputs: reputation-linked | | | | | | | |
| WoS/Scopus publications | No | 1998–2003 | Count | 81.67 | 150.00 | 0 | 900 |
| Conference papers | Yes | 2001–2002 | Count | 29.57 | 47.31 | 0 | 300 |
| Received citations per WoS/Scopus publication | No | 1999–2001 | Rational | 4.52 | 5.80 | 0 | 34 |
| Fraction of internationally co-authored WoS/Scopus papers | No | 1998–2003 | Percent | 42.22 | 34.19 | 0 | 100 |
| Received professional job offers | Yes | 2001–2002 | Count | 0.51 | 0.81 | 0 | 3 |
| Research outputs: transfer-linked | | | | | | | |
| Advisory services for companies | Yes | 2001–2002 | Count | 0.25 | 0.86 | 0 | 5 |
| Cooperation with companies | Yes | 2001–2002 | Count | 1.94 | 3.47 | 0 | 17 |
| Memberships in scientific advisory boards | Yes | 2001–2002 | Count | 3.84 | 4.86 | 0 | 20 |
| Research outputs: maintenance-linked | | | | | | | |
| Doctoral theses | Yes | 2001–2002 | Count | 3.10 | 3.36 | 0 | 15 |
| State doctoral theses | Yes | 2001–2002 | Count | 0.49 | 0.79 | 0 | 3 |
| Editorships | Yes | 2001–2002 | Count | 0.58 | 1.44 | 0 | 10 |
| Conferred scholarships for research group members | Yes | 2001–2002 | Count | 2.29 | 3.24 | 0 | 16 |
| Research inputs and structural variables | | | | | | | |
| Scientists in full-time equivalents | Yes | 2001–2002 | Real | 13.00 | 20.17 | 2 | 150 |
| Research time spent on third-party projects | Yes | 2001–2002 | Percent | 50.76 | 34.75 | 0 | 100 |

Decision theory gives some advice on this topic. It should be clear that high TPF is not a fundamental objective of research, but rather — if at all — an instrumental objective. Thus third-party funds may only be regarded as a desirable goal inasmuch as they contribute positively to the production of knowledge, which is the ultimate objective. Now, unless the instrumental goal is included for the sake of measurability (Winterfeldt and Edwards, 1986), decision theory argues that instrumental goals should be removed from objective hierarchies for mainly three reasons (Eisenführ and Weber, 2004; for further reading see also Keeney, 1992):

1. The instrumental goal may affect one fundamental goal positively, but another one negatively.
2. If both the fundamental goal and its corresponding instrumental goal are included in the evaluation, this implies a 'double count' of a performance dimension.
3. The relation between the instrumental and the fundamental goal may be statistically unstable or non-monotonic.

Most criticisms concerning the use of TPF found in the literature correspond to one of the first two points. For example, Laudel (2005) argues that including both TPF and publications induces double counting, if some of the publications were made possible only by the external funds received. It is also possible that TPF emphasises applied work and therefore improves applied research, but is detrimental to basic research. In a different context, Mayntz (1998: 789) wrote that the tendency 'to derive research tasks increasingly from actual production problems hinders science in discovering new chances of technological innovation.'


Our empirical goal is to show that indicators based on TPF fail the third condition, that is, their relationship to at least one fundamental goal is non-monotonic. In practice, other arguments against TPF indicators are also raised. For instance, Geuna (1997, 1999) warns that incomplete markets — in particular monopolistic markets — for external funds may emerge and cause severe problems, such as high entry barriers for new participants.5 As explained above, the objective of this section is to show that the functional relationship linking the instrumental goal with the fundamental one (in fact there may be several of them) is not monotonic. As a major final objective of research is to generate new knowledge, it seems adequate to measure this by the number of publications. Therefore, the empirical approach is to determine the relationship between a research unit's TPF and its publications. At first sight, a regression-based approach seems appropriate; however, there are at least two technical problems associated with this. First, publications are a count variable. Second, we cannot expect TPF to be exogenous in the regression on publications: although the number of publications is (partly) driven by TPF, the TPF is also driven by publications. This is a likely hypothesis, because funding agencies (e.g. the Research Councils in Great Britain or the German Research Foundation (DFG) in Germany) give funds especially to groups which have successfully published in the past. We therefore have a severe simultaneity problem, leading to inconsistent regression estimates. The first problem is easily solved by estimating a count regression model; specifically, we use the quasi-Poisson model developed by Breslow (1984). The second problem can be tackled as well, but it is harder to cope with conceptually, because we have to introduce instrumental variables into a non-linear maximum-likelihood approach.6


To define an appropriate model, we assume that, besides the level of TPF, the disciplinary background and the group size affect the publication output, and that the effect of third-party funding is non-linear.7 In summary, we employed the following structural model:

$$E(\mathit{PUBS} \mid X) = \exp(\alpha_0\,\mathit{ASTRO} + \alpha_1\,\mathit{NANO} + \alpha_2\,\mathit{MICRO} + \alpha_3\,\mathit{ASTRO}\cdot\mathit{TPF} + \alpha_4\,\mathit{NANO}\cdot\mathit{TPF} + \alpha_5\,\mathit{MICRO}\cdot\mathit{TPF} + \alpha_6\,\mathit{ASTRO}\cdot\mathit{TPF}^2 + \alpha_7\,\mathit{NANO}\cdot\mathit{TPF}^2 + \alpha_8\,\mathit{MICRO}\cdot\mathit{TPF}^2 + \alpha_9\,\mathit{SCT}) \quad (1)$$

wherein ASTRO, NANO and MICRO are dummies which are 1 if the group belongs to astrophysics, nanotechnology or microeconomics respectively, and zero otherwise. TPF is the fraction of research time spent on third-party research and TPF² the corresponding squared term allowing for non-linearities, while SCT is the number of scientists belonging to the group.8 We already argued that TPF is probably endogenous, which naturally applies to its squared terms as well. Additionally, the number of researchers might be endogenous too, since group size is likely to depend on bargaining processes between the research group and the management level, and the outcome of these processes might be affected by past publication output, because researchers with high research reputations are likely to have high bargaining power as well. In total, then, we have seven endogenous variables, for which we have to estimate reduced-form equations. Therefore we needed to determine quite a large number of instruments. First, it is plausible to assume that variables increasing the reputation of a group also increase the chances of a successful application for TPF, as well as the bargaining power; such variables are the number of editorships of scientific journals and the citation rates of a group. Second, we think that the institutional setting influences both the TPF and the number of researchers to some degree; therefore we also used dummies for Max Planck groups and other non-university groups as instruments. Third, the field dummies are considered to be exogenous and are included in the structural model, since they are likely to influence the endogenous variables. Finally, because TPF enters in squared terms, so might some of our instruments; thus we included a number of squared instruments and cross-products.
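To make the estimation step concrete, the following is a minimal sketch of a quasi-Poisson regression corresponding to equation (1). It is not the authors' code: the input file and column names are illustrative assumptions, and the instrumental-variable treatment of TPF, TPF² and SCT described above is omitted for brevity.

```python
# Minimal sketch of the quasi-Poisson regression in equation (1).
# Assumption: 'research_groups.csv' (hypothetical) holds one row per
# group with columns PUBS, ASTRO, NANO, MICRO, TPF and SCT. The IV
# step for the endogenous regressors is omitted here.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("research_groups.csv")
df["TPF2"] = df["TPF"] ** 2

formula = ("PUBS ~ 0 + ASTRO + NANO + MICRO"
           " + ASTRO:TPF + NANO:TPF + MICRO:TPF"
           " + ASTRO:TPF2 + NANO:TPF2 + MICRO:TPF2 + SCT")

# Poisson ML coefficients with quasi-Poisson inference: scale='X2'
# estimates the dispersion from the Pearson chi-square statistic,
# in the spirit of Breslow (1984).
result = smf.glm(formula, data=df,
                 family=sm.families.Poisson()).fit(scale="X2")
print(result.summary())
```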


Table 2. Regression results for influence of third-party funds on publication activity

Dependent variable: number of SCI (astro/nano) and Scopus (micro) publications 1998–2003

| Variable | Coefficient | Std. error |
|---|---|---|
| Field dummies | | |
| ASTRO | 0.3503 | 2.4701 |
| NANO | −1.2770 | 2.5123 |
| MICRO | 3.1201 | 6.0664 |
| Third-party funding effects | | |
| ASTRO·TPF | 0.1799* | 0.0935 |
| NANO·TPF | 0.1670** | 0.0668 |
| MICRO·TPF | 0.3135 | 0.3336 |
| ASTRO·TPF² | −0.0018** | 0.0008 |
| NANO·TPF² | −0.0120*** | 0.0004 |
| MICRO·TPF² | −0.0083 | 0.0097 |
| Size effect | | |
| SCT | 0.0586* | 0.0319 |

Null deviance (66 df): 558.95; residual deviance (49 df): 50.31; AIC: 108.47; nonlinear R²: 0.8247; N = 70
Note: *** significant at 1% level; ** significant at 5% level; * significant at 10% level

The results of the estimation of the structural equation (1) are shown in Table 2.9 First of all, the overall fit of the model is very good: a version of the R² which also works for non-linear models and varies between 0 and 1 (compare Hayfield and Racine, 2008) is very high. As a remarkable outcome of this estimation, both the linear and the squared effect are highly significant for astrophysics as well as nanotechnology. Additionally, the squared terms have negative coefficients, while the linear ones are positive. This means that publication counts increase with higher fractions of third-party research for low values and decrease (also absolutely) beyond a field-specific critical value. In consequence, TPF at a moderate level increases publication activity, while too high a share of third-party funds tends to decrease it. This effect cannot be shown for microeconomics, where both terms are statistically insignificant. The effect may occur for several reasons. The positive influence might result from higher resources, allowing for a critical level of research; furthermore, some third-party funds, such as those from the DFG, are targeted towards performing high-level research. The negative effect might be due to higher transaction costs in growing groups as well as high acquisition costs that divert resources from research. In any case, it would be useful to know where this critical value actually lies. Straightforward calculus shows that the critical level of TPF is a simple function of the parameters of the model: from equation (1), the optimal values for astrophysics and nanotechnology are $-\alpha_3/(2\alpha_6)$ and $-\alpha_4/(2\alpha_7)$ respectively.
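As an illustration of this calculation, the turning point and its delta-method standard error can be computed as follows. This is a minimal sketch: the covariance matrix below is a made-up placeholder, so plugging in the rounded coefficients from Table 2 will not reproduce Table 3 exactly.

```python
# Sketch: turning point of a_lin*TPF + a_sq*TPF^2 and its
# delta-method standard error.
import numpy as np

def critical_tpf(a_lin, a_sq, cov):
    """Critical TPF level -a_lin/(2*a_sq) and its delta-method s.e."""
    g = -a_lin / (2.0 * a_sq)
    # Gradient of g with respect to (a_lin, a_sq):
    grad = np.array([-1.0 / (2.0 * a_sq),
                     a_lin / (2.0 * a_sq ** 2)])
    se = np.sqrt(grad @ cov @ grad)   # first-order variance approximation
    return g, se

# Illustrative values only: astrophysics point estimates from Table 2,
# with a placeholder (diagonal) covariance matrix of the two estimates.
cov = np.array([[0.0935 ** 2, 0.0],
                [0.0, 0.0008 ** 2]])
g, se = critical_tpf(0.1799, -0.0018, cov)
print(f"critical TPF = {g:.2f}%, s.e. = {se:.2f}, "
      f"95% CI = [{g - 1.96 * se:.2f}, {g + 1.96 * se:.2f}]")
```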

For microeconomics any level is equally desirable because there is no stable statistical relationship. The question of estimation precision is somewhat more difficult to address. However, we provide the derivation of asymptotic normality-based confidence intervals as an application of the delta-method in Appendix 1. We present the results in Table 3.


Table 3. Optimal levels of third-party funds

| Field | Est. optimal TPF | S.E. | Lower conf. (95%) | Upper conf. (95%) |
|---|---|---|---|---|
| Astrophysics | 48.92% | 5.69% | 37.77% | 60.07% |
| Nanotechnology | 67.02% | 12.44% | 42.63% | 91.41% |
| Microeconomics | Any | NA | NA | NA |

According to this analysis, the optimal share of third-party research is quite high for nanotechnology (67%) and somewhat lower for astrophysics (49%). For both fields, the confidence intervals do not contain the border solutions of 100% or 0%. Although the confidence intervals are quite wide, the existence of a field-specific optimal level of third-party funding in astrophysics and nanotechnology can be confirmed. With regard to the use of third-party funds as a performance indicator, these analyses show their unsuitability: TPF — as an instrumental indicator — has a non-monotonic effect on at least one of the fundamental indicators (publication counts). Nonetheless, it may serve as a structural indicator, describing the financial and budgetary constraints of the research groups. Against the background of scarce public funds and, linked to that, decreasing basic funding, TPF acquisition might be perverted into a fundamental goal, but this should not be confused with scientific performance.

Determining the output dimensions

In the previous subsection, we showed that one of the most prominent indicators in research evaluation, TPF, might be flawed. However, the question of appropriate approaches to describing scientific performance is not answered yet. As discussed in the second section, the performance of a research group is closely linked to its specialisation. However, specialisation in science is generally not explicit, as it is the outcome of a decentralised decision process. Therefore we can identify and describe specialisation empirically, but not by analysing formal business reports or reading door-plates. In this context, Laredo (1999) distinguished four major types of activities in laboratories in human genetics, that is, research training, academic activities, industrial activities and clinical activities, and clustered the labs along their profiles. On this basis, he identified labs with 'no marked' involvement in specific activities (22%), 'all embracing' labs (33%), 'socio-economic only' labs (22%), and 'scientific only' labs (23%). So Laredo found labs without a clear profile, labs with a clear focus either on academic/scientific or on application-oriented human genetics research, and only a limited share of labs which are strong in all types of activities.10 As a basic conclusion, scientific performance is multi-dimensional and cannot be described by a single indicator. Some of the dimensions primarily


refer to final outputs, in particular the academic and industrial ones, whereas research training may be characterised as an intermediate output, as discussed in the second section above. Against this background, we collected the variety of indicators documented in Table 1. In addition to Laredo's training dimension, we cover a broader range of activities concerning the maintenance of the infrastructure of the scientific community. All in all, we collected 12 output measures. Five refer to the knowledge- and reputation-generating dimension: WoS publications (astrophysics, nanotechnology) or Scopus publications (microeconomics) per researcher, citations per publication, conference articles per researcher, the fraction of international co-publications, and professorial job offers per researcher. Three indicators refer to interaction with business and governmental bodies (expert reports for companies per researcher, cooperation with companies per researcher, memberships in advisory boards per researcher), and the remaining four refer to the maintenance dimension (doctoral theses per researcher, state doctoral theses per researcher, editorships per researcher, and scholarships per researcher). With these output measures we were able to determine empirically which research groups are similar in their activity profiles. If research groups do specialise, we would expect them to have clearly distinct output profiles and to be classifiable into typical groups or clusters. The empirical methodology followed two steps. First, in order to reduce dimensionality, we extracted five factors out of the 12 output variables.11,12 Fortunately, these factors were easily interpretable and linked to typical dimensions of research performance. The first one can be described by high publication activity; the second is characterised by high impact of the publications. These two factors mainly reflect dimensions which can be 'measured' by bibliometric indicators. The third factor correlates highly with graduate teaching. The fourth corresponds to keeping up the scientific infrastructure, as measured by activities such as editorships and memberships in scientific advisory boards. Finally, the fifth is characterised by a focus on cooperation with industry. In the second step we clustered the research groups along these five dimensions,13 and the resulting specialisation profiles are depicted in Figure 1.
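The two-step procedure can be sketched as follows. This is a minimal illustration under the assumption that `outputs` is a groups-by-12 matrix of the normalised output measures; the original analysis may differ in extraction method, rotation and clustering algorithm.

```python
# Sketch of the two-step procedure: factor extraction to five
# dimensions, then clustering of the factor scores.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Hypothetical input: one row per research group, 12 output measures.
outputs = np.loadtxt("output_measures.csv", delimiter=",")

# Step 1: reduce the 12 output variables to 5 interpretable factors.
scores = FactorAnalysis(n_components=5, random_state=0).fit_transform(
    StandardScaler().fit_transform(outputs))

# Step 2: cluster the groups along the 5 factor dimensions.
labels = KMeans(n_clusters=4, n_init=25, random_state=0).fit_predict(scores)
for k in range(4):
    print(f"cluster {k}: n = {(labels == k).sum()}, "
          f"mean profile = {scores[labels == k].mean(axis=0).round(2)}")
```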


[Figure 1. Activity profiles of research groups: mean factor scores on five dimensions (publications and conference papers; impact: citation rates, international co-publications; graduate teaching: doctoral theses; scientific infrastructure: editorships, memberships of advisory boards; business cooperation, incl. advisory reports) for the four clusters: networkers (n = 20), graduate teachers (n = 20), frequently publishing scientists (n = 10) and high-impact publishing scientists (n = 16).]

As a major result of this analysis, it is possible to identify distinct profiles (Figure 1). Thus there is specialisation with regard to different academic activities. However, specialisation is never ideal in the sense of exclusive activity in one performance dimension. As a first important finding, the dimension of knowledge production in terms of publications is associated with two clusters, which may be labelled frequently publishing scientists (n = 10) and high-impact publishing scientists (n = 16). In the cluster of the frequently publishing scientists, the high specialisation on publications is dominant, and cooperation with business is moderately above average; in contrast, the level of citations is slightly below average. In the case of the high-impact publishing scientists, all other output dimensions are less prominent, in particular the engagement with the scientific infrastructure. As a simplified characterisation, the frequently publishing scientists could be associated with younger scientists who need a high number of publications for their academic career, for whom quality or impact is less relevant. The typical high-impact publishing scientist is obviously a person with a strong focus on his or her research, for whom all other performance dimensions are less relevant, thus representing the ideal picture of the introverted academic researcher. The dimension of external interaction is primarily fulfilled by the groups of another cluster, which we called networkers (n = 20). It is striking that, in this cluster, the dimensions of both scientific infrastructure and business cooperation exhibit a distinctly above-average specialisation. The people in this cluster can obviously be described as extroverted


with good social relations to all types of partners, not just academics. The remaining cluster, the graduate teachers (n = 20), shows a clear focus on graduate teaching and distinctly below-average activities in business cooperation and publication impact. Although no cluster follows only one performance dimension, the specialisation in primarily one dimension is obvious, so the labels of the clusters correspond to those of the performance dimensions. The major exception is the networkers' profile, with strong activities in scientific infrastructure as well as business cooperation. It is further remarkable that only about 40% of the groups belong to the frequently or high-impact publishing scientists. Thus only 40% are specialised in the final output, that is, the production of new knowledge. For the construction of an indicator system this finding implies that neglecting intermediate outputs would result in an almost complete devaluation of the outputs of the 60% of groups which specialise in the production of such intermediaries. However, the production of these intermediate outputs is crucial for the mid- and long-term maintenance of the scientific infrastructure and thus for the production of relevant final outputs in the long run. As the production of intermediaries rarely brings an increase in reputation, which would provide incentives in its own right, an indicator set should cover these intermediaries to generate new incentives in material (additional funds) as well as symbolic (reputation) terms. As a follow-up to the present study, we examined a broader set of 473 research groups including the groups analysed in this article. An analogous factor and cluster analysis led to largely similar results


(Schmoch and Schubert, 2009). The main difference is the identification of a large cluster — almost half of the sample — with flat profiles, that is, with no marked specialisation. This difference can primarily be explained by the selection procedure of the present sample, in which the advice of experts was sought, implying a bias towards better-performing groups. So in a broader set, the share of groups without clear specialisation is relevant, and it is necessary to reflect on the assessment of these groups with reference to the specialised ones in more detail.14

Performance profiles of research groups, self-concepts and exchange of intermediary and final outputs

In this part of the article, we ask whether research groups characterised by different activity profiles are aware of these patterns of specialisation. For one thing, we ask whether they perceive themselves as networkers, frequently publishing scientists, high-impact publishing scientists or graduate teachers. Next, we expect specialisation to be followed by a typical pattern of exchange and collaboration. Here we have to differentiate between types of exchange. The demand for complementary resources can be satisfied either by an ex ante or by an ex post type of coordination, where by ex ante coordination we mean coordination that takes place before the actual research work, and by ex post coordination that which takes place afterwards. An example of ex ante coordination is a research leader who asks colleagues for a well-trained post-doc for a specific project (see also Thesis T2a). Ex post coordination takes place when journal editors ask reviewers for reports on manuscripts; they usually receive papers after the scientific production is done (Gläser, 2006: 295–298). What can be measured by semi-structured interviews is ex ante coordination as well as strategic networking in search of attractive partners for conferences, publication activities or the placement of post-docs. We expect to find here the typical pattern of homophily, that is, strong in-group relations. In particular, research groups with high reputations, that is, with a strong publication/citation profile, tend to look for partners from other highly reputed groups. In addition, a centre–periphery pattern is predicted, where highly reputed groups attract many collaboration offers from less reputed groups (Crane, 1969; Mullins et al, 1977; Hargens et al, 1980; Shrum and Mullins, 1988; Stokes and Hartley, 1989; Wagner and Leydesdorff, 2005). Most relevant for our question of the long-term maintenance of research systems is the supply of intermediate goods. If high-impact or frequently publishing scientists do not carefully train graduate students themselves, they have to hire post-docs from outside for their research programmes. Thus they would exchange reputation (giving placement success to the graduate teachers) for well-trained


junior researchers. If high-impact publishing scientists are reluctant to organise conferences and do not engage in journal editing, they have to rely on colleagues who may ask them to contribute to a conference, and they will have to count on editors to find knowledgeable referees who agree to review their papers. If these goods are not properly honoured and competition for funds increases further, research systems run the risk of losing human resources and collective infrastructure, since researchers are likely to respond strategically to evaluation schemes. This might be true especially because we believe that researchers are aware of their profiles, which is stated in the next thesis:

T5. Leaders of research groups with a specific specialisation profile are aware of their profile.

Concerning the question of exchange patterns, we have two conflicting tendencies: the first is an incentive for the exchange of complementary resources induced by complementarities and specialisation in research; the second is the phenomenon of homophily triggering in-group relations. The latter tends to be particularly strong for the publication-oriented clusters. From this we derive another two theses:

T6a. Groups with a specialised profile display a typical demand-and-exchange pattern. In particular, we expect exchanges of intermediate goods between the publication-oriented profiles and the two other profiles, networkers and graduate teachers.

T6b. Ex ante coordination and awareness of the need for coordination are expected for explicit collaboration looking for highly reputed partners. Demand for partners here tends to follow a pattern of homophily within the publication-oriented groups. In addition, a stratification effect is predicted, that is, high-impact publishing scientists receive many offers from all other profiles.

Data and operationalisation of the concepts

The data we present here are based on a qualitative content analysis of 77 semi-structured interviews with the leaders of the research groups. Next to their network strategies, specialisation in research and the changing governance of research at the micro level, including the management of resources, were discussed in the interviews. Unfortunately, we did not specifically inquire about ways of recruiting personnel; thus the focus of the respondents was on their research strategies and on their network strategies with respect to research. Another important point is that we did not find indicators for a self-concept marked by the task of technology transfer. In the cluster analysis of the


output factors, the profile of 'networkers' included the dimension of technology transfer (business cooperation). In the interviews, however, interviewees rarely mentioned technology transfer as an objective of their work. They described it as a demand of funding agencies or of the management of the university, and not as a vital role of the research group. Hence, it is not reflected in their self-concepts and has not been coded as such. When technology transfer was mentioned, the interviewees referred to their science–industry collaborations and the problems connected with them (for details, see Wald, 2007; Jansen et al, 2010). This observation from the interviews agrees with the statistical findings, which show a very low level of cooperation with enterprises. On average, the number of cooperations per scientist was lowest in astrophysics, with a value of 0.001, and highest in nanotechnology, with 0.18.15 Technology transfer is never dominant compared to other activities. However, the fact of business cooperation, even at a low level, may be a distinctive factor for a research group compared to others, so it was identified as relevant in the factor analysis. The awareness of the four different profiles was operationalised by anchor text sequences indicating a specific self-image. Table 4 gives the definitions of the categories of the content analysis and the anchor examples.

The coding of the categories was done in parallel by two coders; divergent codes were discussed and resolved. Each interview was scanned for the categories, which are not exclusive. Per interview/research group, we counted the number of times a category was mentioned, as an indication of a specific profile, in order to measure its relevance in comparison to other tasks. These data were combined with the quantitative data file. In the same way, we defined categories for a qualitative content analysis of the exchange patterns of research groups with respect to five specialised intermediary or final outputs identified in the cluster analysis. With respect to networking, we found two different types of articulation of demand for networkers' output: the first type relies on ex ante coordination in a face-to-face way; the other is a more market-like coordination where demand is raised ex post by going to conferences or being a member of an association. Table 5 displays the definitions of the categories and gives some anchor examples.
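As an illustration of this step, the code frequencies can be aggregated and merged with the quantitative file roughly as follows. This is a minimal sketch, not the authors' procedure: the files and column names are hypothetical.

```python
# Sketch: count coded category mentions per interview/research group
# and merge them with the quantitative questionnaire data.
import pandas as pd

codes = pd.read_csv("codes.csv")          # one row per coded passage:
                                          # columns group_id, category
quant = pd.read_csv("questionnaire.csv")  # one row per research group

# Frequency of each category per group (0 where never mentioned).
counts = codes.pivot_table(index="group_id", columns="category",
                           aggfunc="size", fill_value=0)

# Combine the code frequencies with the quantitative data file.
merged = quant.merge(counts, left_on="group_id", right_index=True,
                     how="left").fillna(0)
print(merged.head())
```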

Analysis of self-concepts and their relation to disciplinary origin and specialisation profile

In Figure 2 we compare the three disciplinary fields studied with respect to differences in self-concepts, where a preliminary word of caution is needed.

Table 4. Self-concepts of research groups

Networkers
Definition of category:
• Organisation of conferences
• Editor of journals or books
• Reviewer for third-party funding
Anchor examples:
• 'I feel obliged to support colleagues from the field, in order to give those from institutions less well endowed the chances to do good research'
• 'And of course there is much coordination work to be done at the chair. Relations are important to acquire resources, get along with administration'

Graduate teachers16
Definition of category:
• Doctoral students play important role in research
• Organises funding for doctoral students
• Organises 'blue sky' for doctoral students
• Mentoring of doctoral students, career planning
Anchor examples:
• 'I voluntarily engage in a colloquium for doctoral students even from colleagues, which is not honoured by the university. But these are sometimes magic moments for me, while teaching in general often can be boring'
• 'An individual project is really an individual project; it takes a doctorate student a maximum of three years, that's important to me. After three years they are finished. Some have even handed in their theses after two and a half years'

Frequently publishing scientists
Definition of category:
• Publications and conference papers as important goal of research
• Publications as main goal of external collaborations
• Maximises number of publications per project
• Numbers more important than quality
Anchor examples:
• 'Yes, we indeed write papers because there is a chance for publication. Although I am not always convinced that this is the best use of my time'
• 'Every single probe results in a publication. Whether this is really relevant may be questionable. But you got a publication'

High-impact publishing scientists
Definition of category:
• Respect themselves as number one in the world
• International orientation/international reference group
• Research awards
• Publication in top journals
• Orientation towards WoS
• Role of principal investigator, coordinator in larger projects
Anchor examples:
• 'In such matters we are an important reference group. We are asked to do the computing for others'
• 'Yeah, my group has really a high standing in publications and citations. This year we even had a paper in Nature'


Table 5. Coordination of exchange of resources

Demand for networkers' output, ex ante
Definition of category:
• Search for colleagues who organise conferences and group/network meetings
• Search for colleagues who take on administrative tasks
Anchor examples:
• 'In 10 years I've never written a paper with them but … I would regard them as cooperation partners. We organise conferences together and things like that'
• 'We have a management structure of sorts; that is, one person who manages everything, who helps us organise meetings and conferences, for example'

Demand for networkers' output, ex post
Definition of category:
• Emphasising the importance of going to conferences, fairs, etc.
• Relying on referees and editors, on representatives of scientific associations, etc.
Anchor examples:
• 'I met her at a meeting at the Astronomical Society ... You know, we met in front of a poster and she said: "Oh, yes I'm building a small telescope and I need to test it" and I said: "Bring it to [our lab]." So really you make most of these contacts personally in conferences'
• 'Conferences are extremely important for meeting new people'
• 'We meet at conferences regularly. Myself, I do not organise conferences. I am happy that others do that'

Demand for graduate students/post-docs teaching, ex ante
Definition of category:
• Search for colleagues who organise research stays or further qualification opportunities for graduates
• Joint supervision of graduates, involvement in graduate schools
• Search for colleagues who supervise own junior researchers (from non-university institutes)
Anchor examples:
• 'N is my relevant partner at (UK university). One of my graduate students is over there. Actually I send all my graduates to him currently'
• 'When a real partnership emerges that is very valuable since you can easily exchange students and graduates then'
• 'Not everyone is successful all the time. If you want to stay at the top you need to look for strong partners who have access to very good junior researchers. If you got a partner in "place in Poland" your chance to find good junior researchers is worse than when you have a partner at Oxford'
• 'We need the students from the universities. At the same time our contacts with our colleagues at the university help us with state doctoral theses and such things'

Demand for frequently publishing scientists, ex ante
Definition of category:
• Search for productive co-authors
Anchor examples:
• Answer to the question what is the most important advantage of collaboration: 'Joint publications'
• 'My research naturally leads to exchange of people and international publications … This is not really a strategy. You just try to find partners who collaborate well. Successful collaboration results in joint publications'

Demand for high-impact publishing scientists, ex ante
Definition of category:
• Search for collaboration with highly reputed/ranked researchers characterised by high-quality publications in top journals
• Search for people who coordinate larger (EU) projects
Anchor examples:
• Answer to the question of why offers to collaborate are rejected: 'Quality, mainly quality'
• 'My field is not really broad but a very specific problem. I look here for the best one in the world to collaborate with, the most knowledgeable one'
• 'It is usually a leading scientist … taking the lead in a project. This sort of principal investigator … assembles all the work … when it comes to publication … ready to publish data and disseminate all the information. So basically the PI is a sort of team leader'

When we talk about self-concepts here, we mean what persons claim to think about themselves, not necessarily what they really think about themselves. Furthermore, even where the self-concept and the expressed self-concept agree, a self-concept does not necessarily reflect reality, since people tend to overestimate their performance. In consequence, it is hard to state a priori whether the self-concept or the specialisation profile is more reliable, because the former may be subject to perception and expression biases, while the latter is potentially subject to measurement biases.


Looking at the agreement between the two approaches, it is striking that, in accordance with our discussion in the second section above, the training of junior researchers is indeed seen as a core academic task for the maintenance of the system. In all three fields graduate teaching is the task most often mentioned in the interviews. Astrophysicists are particularly strong in graduate teaching, and this is in line with the proportion of research groups attributed to this cluster (41%) (Figure 3). There is less agreement between self-concepts and objective data for the other two fields. The difference is particularly strong for


[Figure 2. Self-concepts by academic field: mean frequency of code for the four self-images (networker, graduate teacher, frequently publishing scientist, high-impact publishing scientist) in astrophysics, nanotechnology and microeconomics.]

nanotechnologists who consider themselves strong in graduate teaching, while actually only 14% of the cases are attributed to this cluster. The difference is smaller for microeconomics. According to the cluster analysis, 36% belong to the profile graduate teaching, while the strongest cluster is the networker profile (41%).17 Their image of themselves is the exact opposite compared to astrophysics and nanotechnology.

Overall, the ranking of the two bibliometric dimensions in the self-images of the three fields corresponds to the results for cluster affiliation. Obviously, there is some blurring between the measurement of the self-concepts of frequently publishing scientists and of high-impact publishing scientists. Microeconomists correctly perceive themselves least often as high-impact publishing scientists and more strongly as frequent publishers. Indeed, the share of

[Figure 3. Profile clusters by academic field: percentage of research groups in the four clusters (networkers, graduate teachers, frequently publishing scientists, high-impact publishing scientists) in astrophysics, nanotechnology and microeconomics.]


frequently publishing scientists is even larger (18%) than in either nanotechnology or astrophysics (both 14%). Nanotechnologists most often belong to the high-impact publishing cluster (45%) and verbalise this role accordingly. The large difference in the share of high-impact publishing scientists in astrophysics (23%) and nanotechnology (45%) is probably due to the status of nanotechnology as a 'new science' (Bonaccorsi and Thoma, 2007), with almost exponential growth in mostly internationally co-authored papers and many new and divergent research subfields. In their self-perception, high-impact publishing is their most important activity next to graduate teaching. Figure 4 presents the cross-tabulation of self-concepts by cluster affiliation. Again, we see that training young researchers is a universal value of the scientific community to which all researchers are indebted, at least in their self-concepts. Consequently, the clarity of specialisation that we saw in Figure 2 is much more blurred in Figure 4. Only the cluster of graduate teachers is clearly characterised by this self-concept. If we disregard the dominance of graduate teaching for the other clusters, there is also a correspondence of self-concept and cluster affiliation for the networkers with respect to the second-largest bars in Figure 4. At the same time, they also regard themselves as high-impact publishing scientists, with a little less intensity. Like all groups, both bibliometrically defined groups perceive themselves as stronger in impact publishing than in frequent publishing. The cluster of the frequently publishing scientists does not define itself only as frequently publishing, but as high-

impact publishing scientists. This group even displays the lowest value for frequent publishing among all four clusters. To publish seems to be so self-evidently the genuine task of an academic that it is rarely mentioned in an interview; if publishing is talked about, it is with respect to high-quality publications. The dominance of this self-concept is not surprising, as it is not only self-serving, but objective impact is also largely unobservable for a specific researcher unless he or she has access to bibliometric databases. Researchers of all clusters perceive themselves as graduate teachers, so this self-concept does not differentiate between the clusters and is likely to be due to overly positive self-presentation. Leaving this self-concept aside, we see that, in the cluster of networkers, the networking dimension is the most important one; that publication-related self-concepts are most important in the cluster of frequently publishing scientists; and that — even taking graduate teaching into account — the cluster of impact publishing scientists actually perceives itself as such. The dominant self-concept of the cluster of graduate teachers is graduate teaching itself. With respect to Thesis T5, we conclude that the degree of awareness differs by dimension. The agreement between objective measurement and subjective perception is most pronounced in the clusters of graduate teachers and high-impact publishing scientists. However, it is also present in the other two cases, at least 'below the surface'. To investigate the overlap between self-concepts and the actual output indicators (the output indicators were reasonably normalised to account for the

[Figure 4. Self-concepts and affiliation to profile clusters: mean frequency of code for the four self-images (networker, graduate teacher, frequently publishing scientist, high-impact publishing scientist) within each of the four clusters.]


disciplines and differences in group size), we performed a correlation analysis. Ideally, we would expect self-concepts to be highly correlated with the output measures forming the bases of the corresponding cluster groups (see Figure 1). Conversely, we should not see high correlations with outputs which are constituent for other clusters, since such correlations would indicate a systematic misinterpretation of one's own role. To state the result upfront: such a misconception was not observable in the data. However, the self-concepts of networkers and of frequently publishing scientists do not correlate with any of the corresponding output measures: we find no significant correlation of networkers with advisory services for companies or with editorships, nor are scientists who perceive themselves as frequently publishing scientists characterised by above-average numbers of publications. However, there are significant correlations between the self-image of high-impact publishing scientists and co-publications (r = 0.29, p < 0.05) as well as citations per publication (r = 0.34, p < 0.01). In addition, those with a high output of publications understand themselves as high-impact publishers (r = 0.28, p = 0.05). Graduate teachers by self-concept do indeed correlate with a high output of PhD theses (r = 0.27, p < 0.05).18 Thus, even though the fit between self-understanding and measurable outputs is not perfect, there are still some reasonable parallels. In the second part of the qualitative analysis, we examine the relationship between specialisation profile and the demand for complementary resources. While self-concepts can only measure those interdependencies that researchers are aware of, the tabulation by cluster profiles measures the ex post and ex ante coordination demands arising from specialisation and the differentiation of the tasks of the profiles.
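Such a correlation analysis can be sketched as follows. This is a minimal sketch: the merged file and all column names are illustrative assumptions, not the authors' data.

```python
# Sketch: Pearson correlations between self-concept code frequencies
# and normalised output measures, with p-values.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("groups_with_codes.csv")  # hypothetical merged file

pairs = [("self_high_impact", "intl_copublication_share"),
         ("self_high_impact", "citations_per_publication"),
         ("self_grad_teacher", "phd_theses_per_researcher")]

for concept, output in pairs:
    r, p = pearsonr(df[concept], df[output])
    print(f"{concept} vs {output}: r = {r:.2f}, p = {p:.3f}")
```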

In the second part of the qualitative analysis we look at the relationship between specialisation profile and the demand for complementary resources. While self-concepts can only measure those interdependencies that researchers are aware of, the tabulation by cluster profiles measures the ex post and ex ante coordination demands arising from specialisation and from the differentiation of the tasks of the profiles.

Figure 5 shows the mean frequency of mentions of the demand for five types of intermediary and final outputs produced by different specialisations, for each self-concept. The self-concept type was measured by the number of remarks related to this self-concept in the interview. Here we again use a dichotomisation of this variable (mentioned at least twice versus mentioned once or not at all).19 As a consequence of this coding procedure, the subjective self-concepts are not exclusive, unlike the cluster profiles; that is, researchers may have more than one self-concept (the sketch below illustrates this coding step). All self-concept groups have a considerable demand for intermediary infrastructural outputs, such as journal editing, peer reviews or conference organisation (ex post coordinating information). In addition, there is a high demand for graduate students and post-docs from both publication-oriented profiles. Thus the general thesis of a co-specialisation can be corroborated. The level of use of the information infrastructure is higher in the two publication-oriented self-concepts than among graduate teachers. The ex ante demand for infrastructural outputs, such as volunteers for advisory committees or for positions in academic associations, is much lower, except among the networkers themselves. Networkers also have a high demand for informational input of the ex post type. Obviously, networkers know each other and exchange with one another on a face-to-face basis. The degree of differentiation of demand is lowest for the networkers. They are also interested in graduate students and particularly in high-impact publishing scientists. This is what one would expect, since they should have typical brokerage roles in the system. The exchanges between the publication-oriented self-concepts partly follow a pattern of homophily, partly a centre–periphery pattern.
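The following minimal sketch illustrates the dichotomisation and the tabulation of demand mentions that underlie a figure such as Figure 5. All names and numbers are invented; it illustrates only the coding logic, not our analysis code.

```python
# Sketch (toy data): a researcher holds a self-concept if it was coded at
# least twice, and may therefore hold several self-concepts at once; mean
# demand mentions are then tabulated per (non-exclusive) self-concept group.
import pandas as pd

df = pd.DataFrame({
    "sc_networker":             [2, 0, 3, 1],  # coded remarks per interview
    "sc_frequent_publisher":    [0, 2, 2, 0],
    "demand_graduate_teachers": [1, 3, 2, 0],  # demand mentions per interview
})

CUTOFF = 2  # note 19: results are robust to a cut-off value of 1
for sc in ["sc_networker", "sc_frequent_publisher"]:
    group = df[df[sc] >= CUTOFF]               # non-exclusive group membership
    mean_demand = group["demand_graduate_teachers"].mean()
    print(f"{sc}: n = {len(group)}, mean demand = {mean_demand:.2f}")
```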

[Figure 5. Self-concepts and demand for intermediary and final outputs. Bar chart of the average number of times each demand (for networkers ex ante, networkers ex post, graduate teachers, frequently publishing scientists and high-impact publishing scientists) was mentioned, by self-concept group: 'networker' (n = 13), 'graduate teacher' (n = 42), 'frequent publisher' (n = 20) and 'impact publisher' (n = 25); y-axis: average number of times mentioned, 0.00–1.40. A self-concept is counted if mentioned at least twice during an interview.]


Frequently publishing scientists quite often look for other frequently publishing scientists, but even more often for high-impact publishing scientists. High-impact publishing scientists look for other high-impact publishing scientists and much less for frequently publishing scientists. Stratification can also be seen in the demand structure of the networkers: high-impact publishing scientists rank clearly ahead of frequently publishing scientists here. This ranking can be found, at different levels, for all profiles. Next to the high-impact publishing scientists, graduate teachers are characterised by the dominance of in-group choices in their demand pattern. As mentioned in the footnote to Table 4, respondents characterised by a graduate teacher self-concept very often mention training and the exchange of graduates in the same breath. Graduate teachers also relatively often mention demands addressed to the information infrastructure and exchanges with high-impact publishing scientists. The latter may be important for organising research stays for graduates or for finding adequate placements for them. With respect to our hypotheses, the qualitative data corroborate Thesis T6a. Publication-oriented profiles have high demands for graduates and use the information infrastructure intensively. The latter is also true for graduate teachers. The pattern of collaboration among and with the publication-oriented profiles shows a strong stratification. Those specialised in intermediary outputs address collaboration offers more often to the highly reputed impact publishing scientists than to frequently publishing scientists. A clear pattern of homophily is displayed only by the top group of high-impact publishing scientists, not by the frequently publishing scientists, who also address high-impact publishing scientists more often than their own group. Contrary to our hypotheses, we find the pattern of exchanges of like with like for the producers of intermediaries, too.

Networkers coordinate with each other by face-to-face interaction. They use this type of ex ante coordination much more often than all other groups. In addition, the graduate teachers are strongly connected to one another, collaborate in graduate and summer schools and exchange their doctoral students. Overall, when we compare subjective data and output clusters based on objective data, we find a quite similar demand structure (Figure 6). An important difference between objectively and subjectively defined profiles is the higher degree of homophily in the clusters defined by the output structure. The largest differences can be found for the networkers. When we compare the numbers in the four profiles, defined either by self-concepts or by the cluster analysis of output data, we find the largest divergence here. Thirteen cases were attributed to the networking profile according to the intensity of this self-concept type (Figure 5); this is 13% of 100 codings of self-concepts overall (multiple codings possible). Twenty cases were attributed to this cluster exclusively, that is, 30% of 66 cases (Figure 6). While the self-concept networkers prefer high-impact publishing scientists to frequently publishing scientists, it is the other way round for the cluster groups. This is also true for the demand of the frequently publishing scientists: the cluster group does not prefer high-impact publishing scientists over its own in-group, while the self-concept group does. Thus, while three groups follow a homophily pattern in Figure 6, only two did in Figure 5. As a consequence, the degree of stratification is lower. An explanation for this difference is that, in the qualitative data on self-concepts, there is a considerable overlap between the networkers' self-concept and that of the high-impact publishing scientists. This is also reflected in the shares of the respective self-conceptualisations in the corresponding clusters based on the objective output structure.
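Since the two shares just compared are computed on different bases (codings of self-concepts versus exclusively assigned cases), the arithmetic is worth making explicit:

$$\frac{13}{100} = 13.0\% \;\;\text{(share of self-concept codings)}, \qquad \frac{20}{66} \approx 30.3\% \;\;\text{(share of exclusively assigned cases)}.$$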

[Figure 6. Profile clusters and demand for intermediary and final outputs. Bar chart of the average number of times each demand (for networkers ex ante, networkers ex post, graduate teachers, frequently publishing scientists and high-impact publishing scientists) was mentioned, by profile cluster: networkers (n = 20), graduate teachers (n = 20), frequent publishers (n = 10) and impact publishers (n = 10); y-axis: average number of times mentioned, 0–1.]


Those with a self-concept of networkers most often belong to this cluster (45.5%), but a relevant share belongs to the high-impact publishing cluster (36.4%).20 For the self-concept of high-impact publishing scientists, we also find a trend towards strength in networking outputs (30.0%).

Conclusions

The quantitative examination of a broad set of research groups with regard to different dimensions of performance implies various conclusions which are highly relevant for science policy. First, the data show that the use of third-party funding as a main indicator of scientific performance gives rise to distorting incentives, because above a certain threshold the effect of additional TPF is even negative. Beyond the debate on TPF as an instrumental or a fundamental goal, we recommend qualifying the weight of TPF by introducing a variety of other performance indicators. Second, the analysis shows that a strict focus of evaluation on research performance and scientific excellence implies misleading incentives, as other important dimensions of performance are neglected. More than half of the groups analysed pursue activities which are relevant for the infrastructure of the scientific community, and thus for the sustainability of scientific performance in the long run. In consequence, a broader set of indicators is necessary to arrive at an adequate evaluation process.

The qualitative analysis of the science system's functionality with respect to the four identified dimensions of performance makes clear that the teaching of doctoral students in particular is a task that all disciplines and all specialised clusters consider to be central. But it also shows that in fields characterised by the strong dynamics of new sciences, such as nanotechnology, the lack of incentives to engage in graduate teaching, compared with the strong incentives connected to final outputs such as publications and patents, may result in a shortage of post-docs in the long run. If we set aside the strong lip service paid by all specialised clusters to graduate teaching, the content analysis of the interviews corroborates our thesis that researchers are quite aware of their own specialisation profiles. We therefore have to take into account that they will probably react strategically to evaluation schemes that do or do not set incentives for intermediary and final outputs of research. In addition, the analysis shows that, while there is a functional specialisation in output structure and self-concepts, other complementary tasks are still part of the self-concepts of the cluster groups to a high degree. Differences between the two approaches emerge mostly with respect to the profile of networkers.


The analysis of the qualitative data yields a strong interaction and a blurring of boundaries between the profiles of the networkers and the high-impact publishing scientists, since networkers may be frequent co-publishing scientists too; this may be due to the inclusion of international co-publications as an indicator in the definition of high-impact publishing scientists in the statistical analysis. This result of the factor analysis stems from the fact that international co-publications are cited above average, but for the present purpose the inclusion of this indicator in the high-impact publishing profile may be misleading. This might be concluded from a recent analysis of data on the relation of citation and co-publication rates in four disciplines (Schmoch and Schubert, 2008), showing above-average citations for individual international co-publications, but not for the corresponding groups. All in all, the qualitative and quantitative approaches complement each other very well by highlighting different aspects and by pointing to shortcomings of the other approach.

The analysis of the demand-and-exchange pattern of specialised groups, defined either by the cluster analysis or by their subjective self-concepts, highlights the dependence of the science system on intermediary and final outputs. In line with Thesis T6a, we find a large demand for intermediary outputs by the two publication-oriented clusters and self-concepts. This is directed towards the information infrastructure and towards junior scientists. By contrast, networkers often mention their collaboration with high-impact publishing scientists, as do graduate teachers looking for interesting outlets for their students. High-impact publishing scientists are highly valued by all clusters and self-concepts. This effect is even stronger for the subjective data. The observed stratification in the system is driven very much by recent evaluation and funding schemes, and it may even increase further. In order to secure a balance between different specialisation profiles, science policy should take better account of the special production structure of the science system. Effective incentives must address all specialised groups: in particular, graduate teachers and networkers need increased attention and esteem, which should be reflected in evaluation and funding schemes.

Notes
1. As we were able to show in a follow-up study with larger samples, choosing research units with the help of experts implied a certain selection bias with a preference for more productive and specialised groups.
2. It should be noted that the figures presented serve to give an intuition of WoS and Scopus coverage. They do not reflect our bibliometric strategy, where we did not make use of classification codes. The papers attributed to a research group were identified on the basis of a list of their employees' names. Thus a publication for an economic research group would have been counted as such, even if it had been classified as, say, sociology.
3. In full-time equivalents.
4. Cited according to Ball and Wilkinson (1994: 418).
5. For a more encompassing discussion, compare Jansen et al (2007).
6. Instead of giving too many technical details, we refer the interested reader to Wooldridge (2002, ch. 17), who describes non-linear IV estimation (NLIV) in count models.


7. So we added a quadratic TPF term.
8. Measured in full-time equivalents.
9. According to the NLIV approach, we also included the seven residual terms from the estimation of the reduced-form equations. These were jointly significant, so endogeneity of the TPF and of the number of researchers was confirmed. For the sake of simplicity, they are not depicted in Table ?. The identification requirements in all seven reduced-form equations were satisfied as well.
10. See also Larédo and Mustar (2000).
11. The observations were pooled because of the small sample size. In order to avoid the trivial cluster shape (astrophysics, nanotechnology, microeconomics) which might have resulted from pooling, we standardised the output values along the discipline-specific means and standard deviations.
12. This is consistent with the rule of letting the number of extracted factors equal the number of eigenvalues greater than 1.
13. Clustering was performed using the Ward method.
14. This reflection is beyond the scope of this article. The implications are discussed in Schubert (2008).
15. For comparison: the number of publications per scientist was 2.4 in nanotechnology.
16. The joint supervision of graduates and an engagement in graduate schools were also coded as part of the self-concept 'graduate teachers'. Frequently the two were mentioned together when talking about graduates and graduate teaching. A typical example was: 'A field thrives on the input from young people. So, naturally, it is vital for us to work with the universities and to train graduates ourselves, to have doctorate students. That is a really important point and we exchange the young people. To give an example, we currently have a young lady here from the University of Turin, she is writing her thesis here … on the other hand, one of our former doctorates has gone to … and is working as a post-doc there.'


17. Be aware that the clusters were defined on the basis of discipline-specifically standardised output data. A potential explanation for the divergence of cluster affiliation and self-concept in graduate teaching may be that respondents were thinking about absolute numbers in graduate teaching rather than about their relative standing. Here, indeed, the absolute output of nanotechnology in doctorates is higher than in astrophysics (4.39 compared to 3.88). Another difference is, of course, that clusters are defined in an exclusive way, while self-concepts are defined as more or less intense. A further reason for differences may be that, while all cases were subjected to the content analysis, the whole set of performance indicators, including the bibliometric data, could only be collected for a sub-sample of n = 66.
18. We also ran the correlation analysis with self-concept variables normalised for disciplinary differences (z-transformation). The results are quite similar, except that the correlations between the self-concept 'high-impact publishing' and the numbers of publications and co-publications lose significance.
19. The picture does not change with a cut-off value of 1. The only difference is a change in the average number of self-concepts each research group is associated with (cut-off value of 2: 1.3 self-concepts; cut-off value of 1: 1.8 self-concepts).


20. One explanation for the high share of groups with the self-assessment 'networker' in the cluster 'high-impact publishing scientists' may be that this cluster is partly defined by the number of international co-publications, as the latter are cited above average. However, international co-publications imply a relevant level of networking.

Appendix 1

We can show by simple calculus in equation (1) that the optimal level of the TPF variable for some discipline $j$ is given by

$$TPF^{opt} \equiv -\alpha_j / \left(2\alpha_j^{sq}\right),$$

where $\alpha_j$ is the coefficient of the linear effect of TPF and $\alpha_j^{sq}$ is that of the squared effect. By the invariance principle for ML estimates: if $g:\mathbb{R}^k \to \mathbb{R}^l$ is a measurable function and $\alpha^*$ is a vector of the ML estimates, then the ML estimate of $g(\alpha)$ is $g(\alpha^*)$. In our case $g(\alpha) \equiv TPF^{opt}:\mathbb{R}^2 \to [0,100]$ is clearly measurable. Thus, the ML estimate is $TPF^{opt*} = -\alpha_j^* / \left(2\alpha_j^{sq*}\right)$. Further, it is clear that under the usual regularity conditions the ML theory, including asymptotic normality, applies. In fact, one of these conditions is of special importance: $TPF^{opt}$ must be in the interior of the parameter set $\Theta = [0,100]$, i.e. $TPF^{opt} \in (0,100)$. We can then derive an expression for the variance using a mean value expansion and write:

$$TPF^{opt*} = TPF^{opt} + \frac{\partial TPF^{opt}}{\partial \alpha'}\,\Delta\alpha^* + o_p(1) \qquad (A1)$$

where

$$\frac{\partial TPF^{opt}}{\partial \alpha'} = \left(-\frac{1}{2\alpha_j^{sq*}},\; \frac{\alpha_j^*}{2\left(\alpha_j^{sq*}\right)^2}\right)$$

is the gradient and $\Delta\alpha^* = \left(\alpha_j^* - \alpha_j,\; \alpha_j^{sq*} - \alpha_j^{sq}\right)'$.

It is straightforward to conclude from equation (A1) and the asymptotic normality of the ML estimate that:

$$\sqrt{n}\left(TPF^{opt*} - TPF^{opt}\right) \xrightarrow{d} N\!\left(0,\; \frac{\partial TPF^{opt}}{\partial \alpha^T}\, I_1(\alpha)^{-1} \left(\frac{\partial TPF^{opt}}{\partial \alpha^T}\right)^T\right) \qquad (A2)$$

where $I_1(\alpha)$ is the Fisher information. Thus we can treat $TPF^{opt*}$ as being approximately normally distributed. A valid asymptotic two-tailed $(1-\alpha)$-confidence interval is given by:

$$TPF^{opt*} \pm u_{1-\alpha/2}\,\sqrt{\frac{\partial TPF^{opt*}}{\partial \alpha^T}\, \frac{I_1(\alpha^*)^{-1}}{n} \left(\frac{\partial TPF^{opt*}}{\partial \alpha^T}\right)^T} \qquad (A3)$$

which is exactly the confidence interval given in Table 2.
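To make the appendix formulas concrete, the following sketch evaluates the point estimate and the delta-method confidence interval of equations (A1) to (A3) for invented coefficient estimates and an invented covariance matrix; it only illustrates the computation, not the underlying ML estimation.

```python
# Sketch (invented numbers): point estimate and delta-method 95% CI for the
# optimal TPF share, following equations (A1)-(A3) of Appendix 1.
import numpy as np
from scipy.stats import norm

a_lin, a_sq = 1.2, -0.015         # assumed ML estimates: linear and squared TPF
cov = np.array([[0.04, -0.0004],  # assumed covariance of (a_lin, a_sq),
                [-0.0004, 1e-5]]) # i.e. I_1(a)^{-1} / n

tpf_opt = -a_lin / (2 * a_sq)     # = 40.0, interior of [0, 100] as required

# gradient of tpf_opt with respect to (a_lin, a_sq)
grad = np.array([-1 / (2 * a_sq), a_lin / (2 * a_sq**2)])
se = np.sqrt(grad @ cov @ grad)   # delta-method standard error

u = norm.ppf(0.975)               # two-tailed 95% quantile
print(f"TPF_opt = {tpf_opt:.1f}, "
      f"95% CI = [{tpf_opt - u * se:.1f}, {tpf_opt + u * se:.1f}]")
```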

References

Ball, R and R Wilkinson 1994. The use and abuse of performance indicators in UK higher education. Higher Education, 27, 417–427.
Bonaccorsi, A and G Thoma 2007. Institutional complementarity and inventive performance in nano science and technology. Research Policy, 36(6), 813–831.
Breslow, N 1984. Extra-Poisson variation in log-linear models. Applied Statistics, 33, 38–44.
Crane, D 1969. Social structure in a group of scientists: test of invisible college hypothesis. American Sociological Review, 34, 335–352.
Dasgupta, P and P David 1994. Towards a new economics of science. Research Policy, 23, 487–507.
Eisenführ, F and M Weber 2004. Rationales Entscheiden, 4th edn. Berlin: Springer.
Elton, L 1987. Warning signs. Times Higher Education Supplement.
Franke, K, A Wald and K Bartl 2006. Die Wirkung von Reformen im deutschen Forschungssystem. Eine Studie in den Feldern Astrophysik, Nanotechnologie und Mikroökonomie. Speyer Forschungsberichte, 245, Speyer.
Geuna, A 1997. Allocation of funds and research output: the case of UK universities. Revue d'Economie Industrielle, 79, 143–162.
Geuna, A 1999. The Economics of Knowledge Production: Funding and the Structure of University Research, New Horizons in the Economics of Innovation. Cheltenham, UK and Northampton, MA: Elgar.
Gläser, J 2006. Wissenschaftliche Produktionsgemeinschaften – Die soziale Ordnung der Forschung. Frankfurt am Main and New York: Campus.
Hargens, L, L Mullins and N C Hecht 1980. Research areas and stratification processes in science. Social Studies of Science, 10, 55–74.
Hayfield, T and J Racine 2008. Nonparametric econometrics: the np package. Journal of Statistical Software, 27(5). Available at , last accessed 1 March 2010.
Hornbostel, S 2001. Third party funding of German universities: an indicator of research activity? Scientometrics, 50, 523–537.
Jansen, D, A Wald, K Franke, U Schmoch and T Schubert 2007. Third party research funding and performance in research: on the effects of institutional conditions on research performance of teams. Kölner Zeitschrift für Soziologie und Sozialpsychologie, 59, 125–149.
Jansen, D, R von Görtz and R Heidler 2010. Is nanoscience a Mode-2 field? Disciplinary differences in modes of knowledge production. In Governance and Performance in the German Public Research Sector: Disciplinary Differences, ed. D Jansen, pp. 59–85. Dordrecht: Springer (forthcoming).
Johnes, J 2006. Data envelopment analysis and its application to the measurement of efficiency in higher education. Economics of Education Review, 25(3), 273–288.
Keeney, R L 1992. Value-focused Thinking: a Path to Creative Decisionmaking. Cambridge, MA: Harvard University Press.
Krohn, W and G Küppers 1989. Die Selbstorganisation der Wissenschaft. Frankfurt am Main: Suhrkamp.
Larédo, P 1999. Changing Structure, Organisation and Nature of Public Sector Research: the Development of a Reproducible Method for the Characterisation of a Large Set of Research Collectives: a Test on Human Genetics Research in Europe. Armines: CSI.


Larédo, P and P Mustar 2000. Laboratory activity profiles: an exploratory approach. Scientometrics, 47, 515–539.
Laudel, G 2005. Is external research funding a valid indicator for research performance? Research Evaluation, 14(1), April, 27–34.
Luhmann, N 1990. Die Wissenschaft der Gesellschaft. Frankfurt am Main: Suhrkamp.
Mayntz, R 1998. Socialist academies of sciences: the enforced orientation of basic research at user needs. Research Policy, 27, 781–791.
Merton, R 1957. Priorities in scientific discovery: a chapter in the sociology of science. American Sociological Review, 22(6), 635–659.
Merton, R 1973. The Sociology of Science. Chicago: Chicago University Press.
Mullins, N C, L L Hargens, P K Hecht and E L Kick 1977. The group structure of cocitation clusters: a comparative study. American Sociological Review, 42, 552–562.
Nagpaul, P S and S Roy 2003. Constructing a multi-objective measure of research performance. Scientometrics, 56(3), 383–402.
Rousseau, S and R Rousseau 1997. Data envelopment analysis as a tool for constructing scientometric indicators. Scientometrics, 40(1), 45–56.
Schmoch, U and T Schubert 2008. Are international co-publications an indicator for quality of scientific research? Scientometrics, 74(3), 361–377.
Schmoch, U and T Schubert 2009. Sustainability of incentives for excellent research – the German case. Scientometrics, 81, 195–218.
Schubert, T 2008. Should Scientific Research Groups Specialise in the Production of Scientific Goods? DIME-BRICK Workshop, 14–15 July 2008, Torino. Available at , last accessed 1 March 2010.
Schubert, T 2009. Empirical observations on new public management to increase efficiency in public research – boon or bane? Research Policy, 38, 1225–1234.
Shrum, W and N Mullins 1988. Network analysis in the study of science and technology. In Handbook of Quantitative Studies of Science and Technology, ed. A F J van Raan, pp. 107–133. Amsterdam: North-Holland.
Stokes, T D and J A Hartley 1989. Coauthorship, social structure and influence within specialties. Social Studies of Science, 19, 101–125.
Stolte-Heiskanen, V 1979. Externally determined resources and the effectiveness of research groups. In Scientific Productivity, ed. F M Andrews, pp. 121–153. Cambridge: Cambridge University Press.
Van Raan, A F J 1997. The future of the quality assurance system: its impact on the social and professional recognition of scientists in the era of electronic publishing. Journal of Information Science, 23(6), 445–450.
Wagner, C and L Leydesdorff 2005. Network structure, self-organization and the growth of international research collaboration in science. Research Policy, 34(10), 1608–1618.
Wald, A 2005. Zur Messung von Input und Output wissenschaftlicher Produktion. FÖV Discussion Papers, 20, Speyer.
Wald, A 2007. Effects of 'Mode 2'-related policy on the research process: the case of publicly funded German nanotechnology. Science Studies, 20(1), 26–51.
Warning, J 2004. Performance differences in German higher education: empirical analysis of strategic groups. Review of Industrial Organization, 24, 393–408.
Winterfeldt, D V and W Edwards 1986. Decision Analysis and Behavioural Research. Cambridge: Cambridge University Press.
Wissenschaftsrat 2005. Stellungnahme zu Leistungsfähigkeit, Ressourcen und Größe universitätsmedizinischer Einrichtungen. Bremen.
Wooldridge, J 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.
