Towards an objective approach to key comparison reference values∗

M G Cox and P M Harris

Abstract

Key comparisons of measurement standards are carried out in order to assess the comparative capabilities of national metrology laboratories. This paper is concerned with the development of an objective approach to the determination of a key comparison reference value, a quantity that is central to many key comparisons involving a scalar measurand. Some of the relevant statistical tools are considered, but the emphasis is on the choice of tool being made on the basis of assumptions and assertions that relate to the key comparison of concern. The use of simple modelling concepts is central to this consideration. The extension to a vector measurand using the concept of a key comparison reference curve is indicated.

1 INTRODUCTION

The Mutual Recognition Arrangement (MRA) [1] provides a basis for undertaking key comparisons (Section 2). To implement the MRA for a key comparison it is typically required to determine a key comparison reference value (KCRV). Individual national measurement laboratories can be assessed in terms of their degrees of equivalence, viz., the deviations of their measurement standards from the KCRV and the uncertainties of these deviations. The degrees of equivalence of pairs of standards constitute a further basis for comparison. A KCRV can be formed in many ways—mean, weighted mean, median, etc. (Section 3)—but for valid results account needs to be taken of the nature of the particular key comparison and the manner in which it is conducted. Accordingly, attention must be paid to the various assertions and assumptions that are relevant to the key comparison (Section 4).

The derivation of a KCRV is objectively carried out by building a model (Section 5) of the key comparison. The model provides the framework that encompasses the key comparison data and the assumptions made. The solution to the model (Section 6) yields a defensible approach to the problem. A sensible model also provides a basis for evaluating an uncertainty associated with the KCRV (Section 7). A flexible model can be expanded to take account of additional information (Section 8), such as artefact drift and correlation effects arising from laboratories taking traceability from a common source.

In several areas of measurement the measurand is vector- rather than scalar-valued. There is a case for handling key comparisons in these areas using the concept of key comparison reference curves (KCRCs) rather than KCRVs (Section 9).

∗ Work supported by the National Measurement System Policy Unit of the UK Department of Trade and Industry as part of its NMS Software Support for Metrology programme. Presented at the CIE Expert Symposium on Uncertainty Evaluation, Method for analysis of uncertainties in optical radiation measurement, Vienna, Austria, 22–24 January 2001.

2 THE MRA AND KEY COMPARISON REFERENCE VALUES

The technical basis of the MRA [1] is the set of results obtained through key comparisons carried out by the Consultative Committees of the CIPM, the BIPM and the regional metrology organizations (RMOs). RMO key comparisons are linked to the corresponding CIPM key comparisons by means of joint participants. The results of the RMO key comparisons are linked to key comparison reference values established by CIPM key comparisons by the common participation of some laboratories in both CIPM and RMO comparisons. The results of the CIPM and the RMO key comparisons, the key comparison reference values, and the degrees of equivalence, together with other information needed for their interpretation, are published by the BIPM and entered into its key comparison database.1

The following interpretation of the MRA [1] is used in this paper. Several quantities are obtained in key comparisons:

1. A key comparison reference value, a reference value determined as part of a CIPM key comparison.

2. The degree of equivalence of each national measurement standard, viz., the degree to which a standard is consistent with the key comparison reference value.

3. The degree of equivalence between each pair of national measurement standards.

The following definitions are given in the technical supplement to the MRA [1]:

1. The degree of equivalence of each national measurement standard is expressed quantitatively by two terms:

(a) Its deviation from the key comparison reference value.

(b) The uncertainty of this deviation at the 95% level of confidence.

2. The degree of equivalence between pairs of national measurement standards is expressed quantitatively by two terms:

(a) The difference of their deviations from the key comparison reference value.

(b) The uncertainty of this difference at the 95% level of confidence.

1 The Web address of the BIPM database is http://kcdb.bipm.fr/BIPM-KCDB/.

3 STATISTICAL TOOLS

Suppose there are N participating national laboratories in a key comparison, with a single (scalar) national measurement standard provided by each participating laboratory. Laboratory i provides a value xi and its uncertainty u(xi). (The u(xi) are taken as standard uncertainties [2] for the purpose of this paper.) There may also be covariances due, e.g., to some of the laboratories taking traceability from a common source.

The KCRV can usually be regarded as some measure of centrality for the standards, taking account of the specified uncertainties and, where appropriate, prescribed covariances. There are exceptions. For example, in gas comparisons the standards can be prepared, using gravimetric methods, to have much smaller uncertainties than those of the analytical measurements of the standards subsequently made by national laboratories. It is appropriate in this instance to define KCRVs directly as the prepared standards. The intention here is to investigate how a KCRV can be assigned objectively in the absence of such information.

A number of statistical tools for providing such a measure constitute candidate approaches.

The so-called location estimators in statistical analysis can be used to provide measures of centrality of a sample of data. Together with estimators such as measures of dispersion, location estimators provide summary statistics for data. There are many location estimators, some widely and some little used. Examples of such estimators are the mean, median, trimmed mean, winsorised mean, mid-range and linearly weighted mean. Each estimator has its own characterizing properties—robustness, efficiency, etc.—that are appropriate for certain classes of problems. Each has an associated uncertainty that measures the reliance that can be placed on it, and a coverage interval at a prescribed level of confidence. Desirable properties of an estimator are that it is efficient and robust [9, pp. 273, 363]. It is important, however, not to allow the data alone to dictate the solution: the choice of estimator should follow from assertions and assumptions about the key comparison (Section 4).
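As an illustration, the following sketch computes several of these location estimators for a hypothetical sample; the mean and median are available directly in the Python standard library, while the trimmed mean, winsorised mean and mid-range are coded by hand.

```python
import statistics

# Hypothetical standards from N = 7 laboratories, sorted into ascending order;
# the value 10.20 plays the role of an outlier.
x = sorted([10.01, 9.99, 10.02, 9.97, 10.00, 10.20, 10.04])

mean = statistics.mean(x)        # sensitive to the outlier
median = statistics.median(x)    # robust to the outlier
mid_range = (x[0] + x[-1]) / 2   # highly sensitive to the extremes

k = 1                            # number of values trimmed at each end
trimmed_mean = statistics.mean(x[k:-k])

# Winsorised mean: the k extreme values at each end are replaced by their
# nearest retained neighbours before averaging.
core = x[k:-k]
winsorised_mean = statistics.mean([core[0]] * k + core + [core[-1]] * k)
```

Here the outlier 10.20 shifts the mean to about 10.033 and the mid-range to 10.085, while the median (10.01) and trimmed mean (10.012) are barely affected, which is one sense in which the latter are robust.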

4 ASSERTIONS AND ASSUMPTIONS

For all key comparisons it will be necessary to state the assumptions and assertions that are judged to apply. They should be agreed, minimal and explicit. They should relate to the metrology area of concern and not be chosen on the basis of statistical considerations alone. Examples are

1. The participating laboratories provide biased values. Such an assumption is inevitable. It is central to the concept of a key comparison. Each provided standard will be biased or offset with respect to a reference value, however that value is determined. In general, some biases can be expected to be significant in a statistical sense. The manner in which the bias is treated constitutes the assumption. One such assumption might be that the bias of each laboratory is equally likely to be positive or negative. A stronger assumption would be that on average (i.e., over the participating laboratories) the bias is zero. These assumptions would only be reasonable if there were no mutual dependence (below) of the measurement standards.

2. The values provided are mutually independent. Such an assertion would be appropriate when there is no or negligible commonality in the measurement processes employed by the participating laboratories. In practice there will often be mutual dependence among a known subset (or known subsets) of the participating laboratories, as a consequence of common traceability for example.

3. The provided uncertainties are not all credible. A particular laboratory might provide a value for its measurement uncertainty that, judged against other laboratories and taking into account the measurement procedures it uses, is inappropriately small. If this value, together with other such values, were used without qualification, it might unduly influence the resulting key comparison reference value, degrees of equivalence, etc. Equally, an inappropriately large value could give rise to an over-favourable impression in terms of comparative degrees of equivalence. Such instances would need to be handled in a manner that did not unduly affect the outcome of processing the complete set of national measurement standards.

Approaches are required that can take considerations such as these into account.2

There are advantages in building a simple model of a key comparison that encompasses the key comparison data and these assertions and assumptions. The solution to this model will yield a value for the KCRV and for its uncertainty. The resulting solution will be more objective and defensible than the use of an arbitrarily chosen statistical estimator (mean, median, etc.). The ideal situation would be to have available a generic class of models that can readily be tailored to context.

2 Further possible assertions and assumptions have been enumerated [4].

5 BUILDING THE MODELS

The "raw data" from which the quantities specified in the MRA are to be computed are the values (measurement standards) x1, ..., xN and the standard uncertainties u(x1), ..., u(xN) of these standards, together with, as appropriate, the covariances cov(xi, xj) of the standards xi and xj, for relevant values of i = 1, ..., N and j = 1, ..., N. This data, supported by the agreed assertions and assumptions, provides the basis for the model of the key comparison. The solution to this model leads to a KCRV, xKCRV, and its standard uncertainty u(xKCRV). Once this information is available, it is then straightforward to determine degrees of equivalence.

See Section 7.2 for a discussion on the interpretation of the standard uncertainties of the measurement standards as used here.

Evidently, once xKCRV and its uncertainty have been determined, the degrees of equivalence and their uncertainties can be formed in terms of this information. Thus, the KCRV and its uncertainty have a pivotal role. For the moment only the simplest situation where the national measurement standards are mutually independent is considered.

The degree of equivalence of national measurement standard i constitutes the deviation

di = xi − xKCRV (1)

and the uncertainty U(di) of this deviation at the 95% level of confidence. Formally, the standard uncertainty u(di) of di is given by

u²(di) = u²(xi) + u²(xKCRV) − 2 cov(xi, xKCRV). (2)

Then U(di) is given by

U(di) = k u(di),

where k is a coverage factor at the 95% level of confidence derived from the probability distribution that is assigned to di. If a Gaussian distribution is assigned, k = 1.96.

The degree of equivalence di,j between national measurement standards i and j constitutes the difference of their deviations from the key comparison reference value and the uncertainty U(di,j) of this difference at the 95% level of confidence. Thus,

di,j = di − dj = (xi − xKCRV) − (xj − xKCRV) = xi − xj.

The standard uncertainty u(di,j) of di,j is therefore given by

u²(di,j) = u²(xi) + u²(xj),

there being no covariance term.

Note that in cases where the KCRV can a priori be expressed as a linear combination of the measurement standards, Formula (1) for the deviation needed in the degree of equivalence of national measurement standard i can be expressed just in terms of a linear combination of the standards. Thus, the uncertainty of di can be written directly in terms of the standard uncertainties u(xj), j = 1, ..., N. Through this step it is unnecessary to use Formula (2), which involves the calculation of the covariance of laboratory standard i and the KCRV. Both approaches of course deliver the same result. Since the deviation di,j does not need the value of the KCRV, there is no covariance consideration of this type in that case.

At the heart of the modelling approach considered is the generic model

Data value = True value + Error,

the relationship between a measured (data) value, the true value (that would be obtained in the absence of measurement error) and the error of measurement [3]. In practice, the true value is replaced by a model value, under the assumption that the model is sufficiently realistic [3]. Then,

Data value = Model value + Error.
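The degree-of-equivalence formulas above can be sketched numerically. In this sketch the KCRV is taken, purely for illustration, as the weighted mean with weights wi proportional to u⁻²(xi); for that choice, and mutually independent standards, cov(xi, xKCRV) = wi u²(xi). All data values are invented.

```python
import math

# Hypothetical standards and standard uncertainties for N = 4 laboratories.
x = [10.003, 9.998, 10.001, 9.995]
u = [0.002, 0.004, 0.003, 0.005]

# Weighted-mean KCRV (one candidate estimator; see Section 6.2).
inv_var = [ui ** -2 for ui in u]
w = [iv / sum(inv_var) for iv in inv_var]
x_kcrv = sum(wi * xi for wi, xi in zip(w, x))
u2_kcrv = 1 / sum(inv_var)

# Degrees of equivalence, Formula (1), with standard uncertainty from
# Formula (2); cov(x_i, x_KCRV) = w_i u^2(x_i) for this estimator.
k = 1.96  # coverage factor for the 95% level of confidence, Gaussian case
d = [xi - x_kcrv for xi in x]
U = [k * math.sqrt(ui ** 2 + u2_kcrv - 2 * wi * ui ** 2)
     for ui, wi in zip(u, w)]

# Pairwise degree of equivalence between laboratories 1 and 2:
# d_12 = x_1 - x_2, with u^2(d_12) = u^2(x_1) + u^2(x_2).
d_12 = x[0] - x[1]
U_12 = k * math.sqrt(u[0] ** 2 + u[1] ** 2)
```

For this estimator the covariance term makes u(di) smaller than u(xi), since u²(di) reduces to u²(xi) − u²(xKCRV).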

An interpretation, when applying this concept to the set of N provided measurement standards, is the set of model equations

xi = xKCRV + bi + ei, i = 1, ..., N, (3)

where bi represents the bias or offset of Laboratory i's measurement with respect to the KCRV, and ei the error in xi that is attributable to the random reproducibility effect of Laboratory i's measurements. In this formulation the KCRV is interpreted as an estimate of the quantity being measured.

The requirement is to use the N model equations (3) to establish values for xKCRV (and the bi). These equations are inadequate to provide (unique) solution values for these quantities because there are N + 1 unknown values in all, a number greater than the number of equations. However, the incorporation into the model of additional information in the form of the assertions and the assumptions will permit a solution to be obtained.

The set of measurement standards in a key comparison cannot be considered to be a random sample from a large homogeneous population. Indeed, these measurement standards form a large fraction of the "population" of national measurement standards! Moreover, their nature is such that there is a significant degree of heterogeneity: their respective uncertainties are often very different from each other. In modelling the key comparison it is therefore necessary to reflect these aspects. Because of the inevitable lack of distributional knowledge, appeal can be made to maximum entropy considerations [10]: a Gaussian distribution can be assigned to the variable of which each measurement standard is a realisation.3 Then, the method of maximum likelihood [9, pp. 253-272] yields a solution that is given by minimizing with respect to the offsets bi and the value of X the least-squares measure

Σ_{i=1}^{N} {ωi (xi − bi − X)}². (4)

In (4), the ωi are "weights" that reflect the uncertainties associated with the laboratories' reproducibility effects. If the provided u(xi), which represent the laboratories' estimates of the combined standard uncertainty of their measurement standards, were dominated by this effect, ωi would be taken as 1/u(xi). See Section 7.2 for a further discussion.

To understand the nature of the mentioned non-uniqueness, it is evident that adding a constant to all the bi and subtracting that constant from X will leave the measure unchanged. Thus, let

ci = bi + X, i = 1, ..., N. (5)

Then, (4) becomes

Σ_{i=1}^{N} ωi² (xi − ci)²,

which is minimized when

ci = xi, i = 1, ..., N, (6)

giving zero value for the measure (4), regardless of the assertions and assumptions. So, from (5) and (6) the laboratory offsets are simply

bi = xi − X, i = 1, ..., N, (7)

and the assignment (of KCRV) made to X will fix their values.

3 Independent Gaussian distributions are assigned because the simplest situation, in which the variables are mutually independent, is considered in this section.

6 SOLVING THE MODELS

As indicated in Section 5 the N model equations cannot be "solved" (uniquely) as they stand because they contain N + 1 unknown values, viz., the KCRV xKCRV and the offsets bi, i = 1, ..., N.

A sensible set of assertions and assumptions, as discussed in Section 4, enables this deficiency to be overcome, and a unique solution to the model to be provided that reflects their content. Further, the uncertainty of the KCRV will also be derivable from the model.

A number of possible sets of assertions and assumptions are considered and the solutions to which they lead examined. It is assumed throughout this section, as in the previous section, that the measurement standards are mutually independent.
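The non-uniqueness argument of Formulas (4)–(7) is easy to verify numerically. The sketch below, with invented data and equal weights, checks that the measure (4) vanishes for any choice of X once bi = xi − X, and that imposing the condition mean(bi) = 0 then fixes X as the arithmetic mean.

```python
# Hypothetical standards; equal weights correspond to uncertainties that
# are not regarded as credible (the situation of Section 6.1).
x = [10.02, 9.98, 10.01, 9.99]
omega = [1.0] * len(x)

def measure(X, b):
    """Least-squares measure (4)."""
    return sum((wi * (xi - bi - X)) ** 2
               for wi, xi, bi in zip(omega, x, b))

# With b_i = x_i - X (Formula (7)) the measure is (numerically) zero
# whatever value is assigned to X: the model alone cannot fix X.
for X in (9.9, 10.0, 10.1):
    assert measure(X, [xi - X for xi in x]) < 1e-24

# The assertion mean(b_i) = 0 removes the indeterminacy, giving X equal
# to the arithmetic mean of the x_i.
X = sum(x) / len(x)
b = [xi - X for xi in x]
```

The offsets b then sum to zero, and X = 10.0 for this hypothetical data set.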

6.1 Zero mean bias and non-credible uncertainties

An example of a minimal set of assumptions is

1. The mean offset for the N laboratories is zero.

2. The provided uncertainties are not regarded as credible.

The interpretation of 1 is that the scatter of the laboratories' standards is such that on average they deliver the required reference value. The interpretation of 2 is that there is sufficient concern about accepting the provided uncertainties that it is judged preferable to regard all standards as having comparable uncertainty.4

The assumptions imply the following formulation. Determine X satisfying (7) and

mean(bi) = 0. (8)

Immediately from these two conditions mean(xi − X) = 0. Thus, the model solution is

xKCRV = arithmetic mean({x1, ..., xN}),

i.e., the KCRV is the arithmetic mean of the measurement standards, and bi = xi − xKCRV.5

This is the same solution that would be obtained by assigning equal weights ωi = 1/N and minimizing (4) with respect to the bi and X under Condition (8).

4 An alternative assumption that would have the same consequence is that the uncertainties are taken as sufficiently similar that they can be regarded as equal.

5 Note that the bi are the deviations of the measurement standards from the KCRV. Thus, together with their uncertainties, they constitute the degrees of equivalence of the national measurement standards [1].

6.2 Zero mean bias and credible uncertainties

A different set of assertions and assumptions will generally lead to a different result. For instance, suppose that the second assumption above is changed in that the provided uncertainties are now regarded as credible. Rather than the weights ωi being taken as mutually identical they are now taken so that they are proportional to 1/u(xi). It follows that the model solution is now

xKCRV = weighted mean({x1, ..., xN}) = Σ_{i=1}^{N} wi xi,

where

wi = u⁻²(xi) / (u⁻²(x1) + · · · + u⁻²(xN)).

6.3 Zero mean bias and adjusted uncertainties

Consider a case intermediate to the two cases above. Again, Assumption 2 is addressed. Suppose that the provided uncertainties are to be regarded as credible if their values are no smaller than a threshold "state-of-the-art" value. Any uncertainty that is smaller than the threshold is replaced by the threshold value. The resulting uncertainties are then regarded as a credible set. The approach of Section 6.2 could then be applied.

The use of thresholding for the uncertainties can be regarded as a counterpart of winsorising,6 a procedure sometimes applied before taking the mean of a sample.

The Consultative Committee for Photometry and Radiometry adopts such an approach, using the weighted mean as above, based on the threshold-modified uncertainties. The solution has its critics, but has the advantage that expert judgement is used to provide a solution that "modifies" the solution that would otherwise be obtained in a direction that is dictated by "state-of-the-art" uncertainties.

6 Winsorising is the process of replacing values in a sample that fall below a minimum threshold by that threshold value (and similarly for values falling above a maximum threshold).
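A sketch of the Section 6.2 weighted mean, combined with the Section 6.3 threshold adjustment: any provided uncertainty smaller than an assumed "state-of-the-art" value u_min is raised to u_min before the weights are formed. The data values and threshold below are hypothetical.

```python
# Hypothetical standards and uncertainties for N = 5 laboratories; the
# first uncertainty is implausibly small and is raised to the threshold.
x = [10.004, 9.997, 10.001, 9.999, 10.012]
u = [0.0005, 0.003, 0.002, 0.004, 0.003]
u_min = 0.002                       # assumed state-of-the-art threshold

u_adj = [max(ui, u_min) for ui in u]

# Weighted mean with w_i = u^-2(x_i) / sum_j u^-2(x_j), as in Section 6.2.
inv_var = [ui ** -2 for ui in u_adj]
w = [iv / sum(inv_var) for iv in inv_var]
x_kcrv = sum(wi * xi for wi, xi in zip(w, x))
u_kcrv = sum(inv_var) ** -0.5       # since u^-2(KCRV) = sum_i u^-2(x_i)
```

For these invented figures, without the adjustment laboratory 1 would receive a weight of about 0.88; after thresholding its weight falls to about 0.32, so a single possibly over-optimistic uncertainty no longer dominates the KCRV.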

6.4 Equi-probable positive and negative bias and non-credible uncertainties

Consider the situation as in Section 6.1, but with the first assumption replaced by the following: each offset is equally likely to be positive or negative. This is a weaker assumption than that in Section 6.1, viz., that on average the bias is zero.

The modification to the treatment in Section 6.1 is that instead of (8) the condition is

median(bi) = 0.

It then follows from (7) that

median(xi − X) = 0.

Since adjusting a set of values by a constant simply adjusts their median by that constant, it follows that the model solution is

xKCRV = median({x1, ..., xN}).

7 UNCERTAINTY CONSIDERATIONS

7.1 Uncertainties for the computed KCRVs

The above solutions can be supported by uncertainty statements. For instance, it is well known that the weighted mean of Section 6.2 has an uncertainty u(xKCRV) given by

1/u²(xKCRV) = 1/u²(x1) + · · · + 1/u²(xN).

For any estimator that can be expressed a priori as a linear combination of the measurement standards the uncertainty of the KCRV is readily available. If the multipliers in the linear combination are w1, ..., wN,

u²(xKCRV) = Σ_{i=1}^{N} wi² u²(xi). (9)

This result, and that immediately above, which is a special case of it, is obtained using standard statistical principles or by following the GUM [2]. An example is the arithmetic mean. In this case, wi = 1/N for all i, giving

u²(xKCRV) = Σ_{i=1}^{N} u²(xi)/N².

If, as in Section 6.1, the uncertainties of the measurement standards are taken as equal, to u0, say, this result reduces to

u(xKCRV) = u0/√N,

a well-known statement in statistics concerning the "standard error of the mean".

Some estimators cannot be expressed a priori as a linear combination of the measurement standards. Different treatment is required.

An instance is the median. One estimate of its uncertainty is given by the median absolute deviation (MAD) from the median. The product of MAD and an appropriate factor provides an estimate of the standard deviation [7]. The multiplication factor is that which would apply were the underlying distribution Gaussian. If this assumption did not hold (and this aspect is difficult to test for small N), the resulting value might be unreliable. It can, however, be expected to be more robust than the standard deviation determined directly from the data.

An alternative approach, that yields directly coverage intervals for the median, is as follows. Since the assumption is that the offsets bi are independent and equally likely to be positive or negative, the numerical signs of the biases follow a binomial distribution with probability p = 1/2 [4], [9, p. 364]. Denote by {x(1), ..., x(N)} the measurement standards {x1, ..., xN} arranged in non-decreasing order. A coverage interval, at the level of confidence 1 − α (at least), for the population median is (x(k), x(N−k+1)). Here k is obtained from the binomial distribution. It is the unique value satisfying the inequality [4], [9, p. 364]

Σ_{j=0}^{k−1} B(j; N, 1/2) < α/2 ≤ Σ_{j=0}^{k} B(j; N, 1/2).

B(j; N, 1/2) is the probability of exactly j values in the sample of N having value less than the median, where B(j; N, p) is the binomial probability NCj p^j (1 − p)^(N−j).7 Formulae and tables for k are available [6]. As an example, for N = 24, the sample median is (x(12) + x(13))/2 and a 95% coverage interval for the population median is (x(7), x(18)).

7 For any given α there will be values of N that are too small for a coverage interval to be determined. For a coverage probability of 95% (α = 0.05), e.g., N must be at least six.

7.2 Other uncertainty aspects

Substitution of the solution values of xKCRV and the laboratory offsets bi into the model equations (3) would provide zero as estimates of the ei. (Recall that the ei are errors and, unlike uncertainties, can take zero values.) That ei is estimated by zero should not be regarded as unreasonable, since the fact that there is only one "observation" per laboratory means there is no redundancy of information. In these circumstances, zero is the best available estimate. See below.

An important point must be made in relation to obtaining a statistically sound result. For several key comparisons each participating laboratory has provided only a combined (or expanded) uncertainty that relates to its provided measurement standard. This uncertainty would be intended to accommodate both the uncertainty due to reproducibility and that due to bias. Were the former uncertainty to be dominant the approach here would be valid. If this were not the case, it would be necessary to extract the reproducibility uncertainty component from a detailed "uncertainty budget" provided by each of the participating laboratories. In the future, laboratories participating in key comparisons will provide these budgets, and so this information can be expected to be available.

There is a further consideration [11] that would facilitate the task of the participating laboratories, and also strengthen the analysis of the standards. Instead of providing a detailed uncertainty budget, each laboratory would be asked to provide a set of representative measurements of its standard, obtained under (full) reproducibility conditions. If laboratory i provided Mi repeat measurements, the model would become

xi,j = xKCRV + bi + ei,j, j = 1, ..., Mi, i = 1, ..., N,

where the notation is as before except that now xi,j denotes the jth measurement from Laboratory i and ei,j the corresponding error.

It would then again be necessary to apply appropriate assumptions to decide how the model would be solved. For example, suppose the first assumption is as before, but that for Laboratory i the uncertainty to be assigned to the values xi,j, j = 1, ..., Mi, is obtained by a Type A evaluation [2], viz., estimated by the standard deviation si, say, of these values.

In these circumstances, the model solution is given by minimizing with respect to X the quantity

Σ_{i=1}^{N} Σ_{j=1}^{Mi} {(xi,j − bi − X)/si}²,

again subject to the condition

mean(bi) = 0.

The resulting KCRV is the weighted mean

xKCRV = (Σ_{i=1}^{N} Σ_{j=1}^{Mi} xi,j/si²) / (Σ_{i=1}^{N} Mi/si²).

Its uncertainty can be evaluated using the fact that again the KCRV is obtained as a linear combination of the standards.

The approach has value in that rather than each laboratory being asked to provide a measurement standard and an estimate of its uncertainty, it is asked instead to provide a number of values of the standard, under reproducibility conditions. These repeated observations permit the reproducibility uncertainty to be estimated (for each laboratory). These uncertainties are used in the analysis. The uncertainties estimated by the laboratories are not used directly.
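The median-based analysis of Sections 6.4 and 7.1 can be sketched as follows, with hypothetical data: the median as KCRV, a MAD-based standard uncertainty (scaled by the Gaussian-consistency factor, approximately 1.4826), and the order-statistic coverage interval (x(k), x(N−k+1)) found from the binomial condition of Section 7.1.

```python
import math
import statistics

# Hypothetical measurement standards from N = 14 laboratories.
x = sorted([9.97, 9.99, 10.00, 10.01, 10.02, 10.04, 10.20,
            9.98, 10.03, 9.96, 10.05, 10.00, 10.02, 9.99])
N = len(x)

x_kcrv = statistics.median(x)
mad = statistics.median([abs(xi - x_kcrv) for xi in x])
u_mad = 1.4826 * mad    # Gaussian-consistency scaling of the MAD

# k is the unique value with
#   sum_{j=0}^{k-1} B(j; N, 1/2) < alpha/2 <= sum_{j=0}^{k} B(j; N, 1/2).
alpha = 0.05

def binom_cdf(m, n, p=0.5):
    """Cumulative binomial probability sum_{j=0}^{m} B(j; n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(m + 1))

k = next(k for k in range(1, N)
         if binom_cdf(k - 1, N) < alpha / 2 <= binom_cdf(k, N))

# Coverage interval (x_(k), x_(N-k+1)) in the 1-based notation of the text.
interval = (x[k - 1], x[N - k])
```

For these 14 values k = 3, so the interval is (x(3), x(12)) = (9.98, 10.04); the same routine gives k = 7 for N = 24, matching the example in the text.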

A validation step would be possible, assuming, as is conventional, the laboratories also provided estimates of the uncertainties of their standards. The aggregation, for each laboratory, of the reproducibility uncertainty and the calculated offset could be compared with the laboratory's estimated uncertainty.

8 TAKING ACCOUNT OF ADDITIONAL INFORMATION

The incorporation of additional information into a model of a key comparison is now considered. Two aspects are discussed. The first aspect concerns a key comparison where certain pairs of the measurement standards have mutual uncertainties, i.e., nonzero covariances. As indicated, an important circumstance arises when some laboratories take traceability from a common source. The second aspect concerns the influence of key comparison design, incorporating effects such as unstable artefact behaviour due to drift.

8.1 Mutually dependent measurement standards

In general, the uncertainty information associated with the measurement standards can be summarised within a covariance matrix. As before let there be N participating laboratories and let xi denote the measurement standard for participant number i. It is possible to consider a statistical framework for treating this data, together with the associated uncertainties and covariance effects. Denote by

x = (x1, ..., xN)^T

the (column) vector of these N standards. Let Vx denote the covariance matrix for x. Vx contains as its diagonal elements the variances, viz., the values of u²(xi). All other elements of Vx are zero unless xi and xj are mutually dependent (e.g., because laboratories i and j take their traceability from a common source), in which case element (i, j) contains the covariance cov(xi, xj) of xi and xj.

Note that cov(xi, xi) = u²(xi) and hence that the diagonal elements could also be written in the same notation as for the off-diagonals.

In general, by suitable numbering of the participating laboratories, Vx is a block-banded matrix.

Suppose that there are N = 10 participants, that laboratories 2 and 3 take traceability from one source and laboratories 6, 7 and 8 take traceability from another source (that is independent of the first source). Suppose that otherwise the laboratories are mutually independent. Then, the structure of Vx is

×
   ×  ×
   ×  ×
         ×
            ×
               ×  ×  ×
               ×  ×  ×
               ×  ×  ×
                        ×
                           ×

where × denotes a nonzero value and a blank represents a zero value. For instance, element (1, 2), that in row 1 and column 2, is zero, since the measurement standards of laboratories 1 and 2 are mutually independent, whereas element (2, 3) is nonzero, since those of laboratories 2 and 3 are mutually dependent.

There can of course be major practical difficulties in deciding the extent of the covariance effects.

Consider, as in Section 7.1, a KCRV formed from a linear combination of the measurement standards, i.e.,

xKCRV = w^T x = Σ_{i=1}^{N} wi xi,

where

w = (w1, ..., wN)^T.

Then, standard statistical theory gives

u²(xKCRV) = w^T Vx w.

In cases where Vx is diagonal, because there is no mutual uncertainty, this formula reduces to the standard one, viz., (9).
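The propagation formula u²(xKCRV) = wᵀVxw can be sketched directly, using a small hypothetical covariance matrix in which laboratories 2 and 3 share traceability (all numbers invented).

```python
# Hypothetical uncertainties for N = 4 laboratories; labs 2 and 3 (indices
# 1 and 2) are assumed to share traceability, giving a nonzero covariance.
N = 4
u = [0.002, 0.003, 0.003, 0.004]

V = [[0.0] * N for _ in range(N)]
for i in range(N):
    V[i][i] = u[i] ** 2            # diagonal: variances u^2(x_i)
cov_23 = 0.5 * u[1] * u[2]         # assumed covariance for labs 2 and 3
V[1][2] = V[2][1] = cov_23

w = [1.0 / N] * N                  # arithmetic-mean weights, w_i = 1/N

# u^2(KCRV) = w^T V_x w
u2_kcrv = sum(w[i] * V[i][j] * w[j]
              for i in range(N) for j in range(N))

# With V_x diagonal the expression reduces to Formula (9).
u2_formula9 = sum(w[i] ** 2 * u[i] ** 2 for i in range(N))
```

The difference between the two results is exactly 2 w2 w3 cov(x2, x3), the contribution of the single off-diagonal pair.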

8.2  Influence of key comparison design

It is appropriate where possible to model the design of the key comparison. A travelling artefact would be measured in turn at each laboratory, in which case the model should include a parametrised time-dependent effect. A linear drift [8] might be assumed in the absence of other knowledge. An artefact returned periodically to the lead laboratory for re-measurement would permit more sophisticated modelling, through the additional measurements taken.

In general the fundamental model equations (3) can be augmented by terms [8] that represent such effects. Other influences can be included if relevant information is available.

The use of modelling has a further advantage. A model can be constructed algebraically, as here, even before the measurements are available. Simulations can then be carried out by a lead laboratory to estimate, in advance of the actual key comparison, the effects of various sequences and loops of involvement by the participating laboratories. As a consequence, the influence of different strategies can be compared, and the one whose properties are in some sense regarded as most desirable selected for the actual key comparison. The chosen strategy could be used to assist the lead laboratory in the preparation of its measurement protocol.

9  VECTOR MEASURANDS AND KEY COMPARISON REFERENCE CURVES

At the National Physical Laboratory work is under way to investigate key comparison reference curves (KCRCs). KCRCs would provide a continuous counterpart of a KCRV. A KCRV is currently computed and used essentially independently for each stipulated spectral point.⁸ KCRCs have the potential to provide a smoother realisation of KCRVs across the required spectral range and thus avoid the confusing and anomalously erratic sequences of values, uncertainties and degrees of equivalence that can currently be obtained.

The existing situation with key comparisons in photometry and radiometry is as follows. Each participating laboratory, as part of a prescribed protocol, provides a value of a quantity and its uncertainty at each of a number of stipulated wavelengths. At each wavelength the provided values are aggregated to give a KCRV and its uncertainty at the 95% level of confidence, in accordance with the GUM, and degrees of equivalence, in accordance with the MRA. The values are aggregated using an appropriate estimator, such as those considered in this paper. For instance, for some of its key comparisons, the CCPR makes use of the weighted mean (Section 6.2). For some photometric key comparisons, at least, a limit or cut-off value obtained from state-of-the-art measurement capability is imposed on the weights in order to render them more credible where necessary (as in Section 6.3).

As a natural consequence of the inevitable variation in the participants' data, the resulting sequence of KCRVs does not necessarily behave smoothly with respect to wavelength, nor do the corresponding sequences of uncertainty values and degrees of equivalence. It is reasonable to expect that the measurand should behave smoothly with respect to wavelength.⁹ It is also reasonable to expect that the width of the associated uncertainty "swathe" (see below) would largely change smoothly with wavelength.

In order to overcome some of the mentioned difficulties and to help realise a solution that is closer to the above expectations, in the NPL work the participants' data would be processed not separately at each wavelength, but spectrally. Thus, as opposed to obtaining a KCRV for each stipulated wavelength, a key comparison reference curve (KCRC) would be determined instead. Just as the KCRV is accompanied by a coverage interval at the 95% level of confidence, a counterpart would be provided for the KCRC. The latter would take the form of lower and upper coverage curves, representing respectively the lower and upper spectrally-varying endpoints of a 95% confidence swathe.

Features of the approach are:

1. The current difficulties associated with calculating a KCRV separately at each wavelength would largely be avoided. In particular,

   (a) "bumpiness" in the spectral sequence of KCRVs and in their uncertainties would be reduced;

   (b) confusing variations in the stated degrees of equivalence from one wavelength to the next would be ameliorated;

   (c) in future measurement protocols, participants would be permitted to measure at other than the stipulated wavelengths.

2. The following curves or functions would be obtained:

   (a) a parametrised KCRV, as a function of wavelength;

   (b) a corresponding parametrised 95% uncertainty swathe;

   (c) a corresponding parametrised degree of equivalence, for each participating laboratory.

Central to obtaining a satisfactory KCRC would be the choice of an appropriate family of regression models, together with sound numerical and statistical concepts to generate candidate KCRCs. Suitable model-validation tools would be used to select a KCRC from the family. Relevant technology is available [5].

⁸ The discussion here is couched in terms of the spectral dependence of the measurand and is thus applicable to photometric and radiometric key comparisons. Other areas in which KCRCs might prove beneficial are electrical metrology (measurements with respect to frequency), time metrology (transfer-time measurements with respect to epochs), temperature metrology (resistance ratio with respect to temperature) and chemical metrology (various).

⁹ There will be circumstances where the spectrum could be subdivided into regions, with smooth behaviour expected within regions but relatively abrupt changes across the boundaries of the regions. Such changes might be due to the instrumentation undergoing a designed change in functional behaviour.

10  CONCLUSION

Approaches that are as objective and defensible as pragmatically possible are being investigated at the National Physical Laboratory for the analysis of data from key comparisons. According to the MRA [1], the concept of a key comparison reference value (KCRV) and its accompanying uncertainty are normally essential in determining degrees of equivalence. Initial attention has therefore concentrated on approaches to determining KCRVs and immediately-related quantities. These approaches are based on the use of a simple model of key comparisons. Other aspects are also under consideration and will be reported elsewhere.

The main thesis of the work is that an approach for determining a KCRV should be based on agreed, minimal and explicit assumptions. In particular, it is regarded as essential that an approach be driven by the metrology area rather than by abstract statistical considerations such as the assumption that Gaussian statistics always apply. The approach may well be different for different disciplines, and indeed for different key comparisons within a discipline. Nevertheless, there are advantages in seeking approaches having generic properties, and considering their tailoring to specific instances. The assumptions can be incorporated in the model, and the solution to the model will therefore be consistent with those assumptions.

Generally, the stronger the assumptions, the shorter the coverage interval that is obtained. It is necessary to establish a balance between belief in the assumptions and the quality of the results obtained.

The determination of the KCRV and its uncertainty is a fundamental step in determining degrees of equivalence of national measurement standards.
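To make this step concrete, here is a minimal sketch (with invented values, not data from any actual comparison) of deriving degrees of equivalence from a weighted-mean KCRV. The variance relation u²(di) = u²(xi) − u²(xKCRV) is an assumption appropriate to that estimator with mutually independent laboratory values; it is not prescribed by the text above.

```python
import numpy as np

# Hedged sketch: degrees of equivalence d_i = x_i - x_KCRV for a
# weighted-mean KCRV, assuming mutually independent laboratory values.
# All numbers are invented for illustration.
x = np.array([10.02, 9.98, 10.05, 10.00])   # laboratory values
u = np.array([0.03, 0.02, 0.05, 0.04])      # standard uncertainties

w = (1.0 / u**2) / np.sum(1.0 / u**2)       # weighted-mean weights
x_kcrv = w @ x
u2_kcrv = 1.0 / np.sum(1.0 / u**2)          # u^2(x_KCRV), independent case

d = x - x_kcrv                               # degrees of equivalence
# Because each x_i is correlated with a weighted mean to which it
# contributes, u^2(d_i) = u^2(x_i) - u^2(x_KCRV) for this estimator.
u_d = np.sqrt(u**2 - u2_kcrv)
U_d = 2.0 * u_d                              # expanded (approx. 95%) uncertainty

for xi, di, Ui in zip(x, d, U_d):
    print(f"x = {xi:.3f}: d = {di:+.4f}, U(d) = {Ui:.4f}")
```

For a laboratory excluded from the computation of the KCRV, the variances would instead add, u²(di) = u²(xi) + u²(xKCRV).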


The generalisation of the concept of KCRVs to key comparison reference curves (KCRCs) will provide a basis for treating vector measurands. KCRCs will help to reduce some of the spurious behaviour that arises as a consequence of natural statistical variation in measurement standards with respect to a parameter such as wavelength or frequency. Sensible modelling provides a useful approach for this purpose.
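As a closing illustration of how such a reference curve and its 95% swathe might be generated, the following sketch fits a weighted quadratic across stipulated wavelengths. The polynomial family, the data and the plain least-squares machinery are all illustrative assumptions; a real analysis would choose and validate the regression family as discussed in Section 9 and [5].

```python
import numpy as np

# Illustrative KCRC sketch: weighted quadratic fit to invented spectral
# data, with an approximate 95% swathe from the parameter covariance.
lam = np.array([450.0, 500.0, 550.0, 600.0, 650.0, 700.0])  # wavelengths / nm
y = np.array([0.92, 0.95, 0.99, 1.00, 0.98, 0.93])          # aggregated values
u = np.array([0.02, 0.015, 0.01, 0.01, 0.015, 0.02])        # uncertainties

t = (lam - lam.mean()) / lam.std()      # centre and scale for conditioning
A = np.vander(t, 3, increasing=True)    # quadratic model: 1, t, t^2

# Weighted least squares: minimise sum_i ((y_i - (A beta)_i) / u_i)^2
Aw, yw = A / u[:, None], y / u
beta, *_ = np.linalg.lstsq(Aw, yw, rcond=None)
C = np.linalg.inv(Aw.T @ Aw)            # parameter covariance matrix

# Evaluate the KCRC and its 95% swathe on a fine wavelength grid
grid = np.linspace(lam.min(), lam.max(), 101)
tg = (grid - lam.mean()) / lam.std()
Ag = np.vander(tg, 3, increasing=True)
kcrc = Ag @ beta
swathe = 2.0 * np.sqrt(np.sum((Ag @ C) * Ag, axis=1))  # half-width

lower, upper = kcrc - swathe, kcrc + swathe             # coverage curves
```

A polynomial is only one candidate; splines or other parametrised families, selected by model-validation tools, may be more appropriate in practice.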

This work is supported by the National Measurement System Policy Unit of the UK Department of Trade and Industry as part of its NMS Software Support for Metrology programme.

References

[1] BIPM. Mutual recognition of national measurement standards and of calibration and measurement certificates issued by national metrology institutes. Technical report, Bureau International des Poids et Mesures, Sèvres, France, 1999. http://www.bipm.org/pdf/mra.pdf.

[2] BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, and OIML. Guide to the Expression of Uncertainty in Measurement, 1995. ISBN 92-67-10188-9, Second Edition.

[3] M. G. Cox. Constructing and solving mathematical models of measurement. In P. Ciarlini, M. G. Cox, F. Pavese, and D. Richter, editors, Advanced Mathematical Tools in Metrology II, pages 7–21, Singapore, 1996. World Scientific.

[4] M. G. Cox. A discussion of approaches for determining a reference value in the analysis of key-comparison data. Technical Report CISE 42/99, National Physical Laboratory, Teddington, UK, 1999.

[5] M. G. Cox, A. B. Forbes, and P. M. Harris. Best Practice Guide No. 4: Discrete modelling. Technical report, National Physical Laboratory, Teddington, UK, 2000.

[6] ISO. ISO/FDIS 16269-7:2000(E). Statistical interpretation of data – Part 7: Median: Estimation and confidence intervals, 2000.

[7] J. W. Müller. Possible advantages of a robust evaluation of comparisons. J. Res. Natl. Inst. Stand. Technol., 105:551, 2000.

[8] L. Nielsen. Evaluation of measurement intercomparisons by the method of least squares. Technical Report CCEM/WGKC 0013, CCEM Working Group on Key Comparisons, 2000.

[9] J. R. Rice. Mathematical Statistics and Data Analysis. Duxbury Press, Belmont, CA, USA, second edition, 1995.

[10] K. Weise and W. Wöger. A Bayesian theory of measurement uncertainty. Measurement Sci. Technol., 3:1–11, 1992.

[11] D. R. White. On the choice of comparison reference values for pair-wise comparison of laboratories. In CPEM 2000, pages 325–326, 2000.

Maurice Cox, Peter Harris
National Physical Laboratory
Teddington, Middlesex, TW11 0LW
United Kingdom
[email protected], [email protected]