A Concordance Correlation Coefficient to Evaluate Reproducibility

Author(s): Lawrence I-Kuei Lin. Source: Biometrics, Vol. 45, No. 1 (Mar., 1989), pp. 255-268. Published by: International Biometric Society. Stable URL: http://www.jstor.org/stable/2532051


BIOMETRICS 45, 255-268, March 1989

Lawrence I-Kuei Lin
Baxter Healthcare Corporation, Route 120 and Wilson Road, Round Lake, Illinois 60073, U.S.A.

SUMMARY

A new reproducibility index is developed and studied. This index is the correlation between the two readings that fall on the 45° line through the origin. It is simple to use and possesses desirable properties. The statistical properties of this estimate can be satisfactorily evaluated using an inverse hyperbolic tangent transformation. A Monte Carlo experiment with 5,000 runs was performed to confirm the estimate's validity. An application using actual data is given.

1. Introduction

In an assay validation or an instrument validation process, the reproducibility of the measurements from trial to trial is of interest. Also, when a new assay or instrument is developed, it is of interest to evaluate whether the new assay can reproduce the results based on a traditional gold-standard assay (Westgard and Hunt, 1973; Bauer and Kennedy, 1981). Such validation processes are often evaluated by using the Pearson correlation coefficient, the paired t-test, the least squares analysis of slope (= 1) and intercept (= 0), the coefficient of variation, or the intraclass correlation coefficient. There are drawbacks to all of these, however, in that none alone can fully assess the desired reproducibility characteristics.

For example, to evaluate the blood cell counter for hematology analysis in a laboratory, it is desirable to have duplicates of the same blood sample measurement by the counter at different times (usually at most 1 day apart) yield results as close together as possible. If we plot the first measurement against the second measurement of the red blood cell counts for all blood samples available, we would like to see, within a tolerable error, that the measurements fall on a 45° line through the origin (the 45° line).

The Pearson correlation coefficient measures a linear relationship but fails to detect any departure from the 45° line (see Figure 1). The paired t-test fails (see Figure 2) to detect poor agreement in pairs of data such as (1, 3), (2, 3), (3, 3), (4, 3), and (5, 3). Combining the above two methods cannot detect poor agreement in pairs of data such as (1, 2.8), (2, 2.9), (3, 3.0), (4, 3.1), and (5, 3.2). The least squares approach fails to detect departure from intercept equal to 0 and slope equal to 1 if data are very scattered (see Figure 3, lower plot). In other words, the more the data are scattered (nonreproducible), the less chance one has of rejecting the hypothesis. Conversely, the least squares approach can reject a highly reproducible assay due to very small residual error (see Figure 3, upper plot). This is also true if the paired t-test is used (see Figure 2, lower plot).

The coefficient of variation and the intraclass correlation coefficient allow duplicate readings to be interchangeable. In other words, these methods consider duplicate readings as replicates (random) rather than two distinct readings. Two

Key words: Accuracy; Asymptotic normality; Concordance correlation coefficient; 45° line through the origin; Precision; Z-transformation.
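The two small data sets quoted in the Introduction can be checked numerically. The following is a minimal sketch, not the author's code: it assumes the sample concordance correlation coefficient given in the Appendix, with variances and covariance computed using divisor n, and the function names (`concordance_cc`, `pearson_r`, `paired_t`) are hypothetical helpers introduced only for illustration.

```python
import math


def _moments(x, y):
    """Return means, variances and covariance, all with divisor n (assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def concordance_cc(x, y):
    """Sample concordance correlation: 2*S12 / (S1^2 + S2^2 + (mean1 - mean2)^2)."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def pearson_r(x, y):
    """Pearson correlation; None when either variable has zero variance."""
    _, _, sxx, syy, sxy = _moments(x, y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic for the hypothesis of zero mean difference."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    md = sum(d) / n
    sd = math.sqrt(sum((v - md) ** 2 for v in d) / (n - 1))
    return md / (sd / math.sqrt(n))


first = [1, 2, 3, 4, 5]
flat = [3, 3, 3, 3, 3]               # pairs (1, 3), (2, 3), ..., (5, 3)
shallow = [2.8, 2.9, 3.0, 3.1, 3.2]  # pairs (1, 2.8), (2, 2.9), ..., (5, 3.2)

for label, second in (("flat", flat), ("shallow", shallow)):
    print(label, pearson_r(first, second), paired_t(first, second),
          concordance_cc(first, second))
```

For the second data set the Pearson correlation is 1 and the paired t statistic is 0, yet the concordance correlation is only about 0.198, which reflects the poor agreement that the other two statistics miss.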

[Figure: plot drawn against the 45° line; one panel is labelled "location shift".]

This index may also be a good statistic for use in goodness-of-fit tests. For example, to test for normality, one can measure the agreement between the cumulative density function and the cumulative normal density function through this index, rather than taking the maximum deviation, as in the Kolmogorov test. This index can also be applied when $Y_1$ is random and $Y_2$ is fixed, in which case the standard error has a simpler structure. The index can be used to characterize agreement between the observed measurements and the theoretical (expected) values. These possibilities are under investigation.

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to the editor and referees for their valuable comments, to Laurene Strauch for her assistance in preparing this manuscript, and to my fellow statisticians at Baxter Healthcare Corp. for their support and comments.

RÉSUMÉ

A new index of reproducibility is presented and studied. This index is the correlation between the two readings that fall on the first bisector (the 45-degree line). It is simple to use and has the desired properties. The statistical properties of this estimate can be satisfactorily evaluated using the inverse hyperbolic tangent transformation. A Monte Carlo simulation with 5,000 draws was carried out to confirm the validity of the estimate. An application using real data is given.

REFERENCES

Bauer, S. and Kennedy, J. W. (1981). Applied statistics for the clinical laboratory: II. Within-run imprecision. The Journal of Clinical Laboratory Automation 1, 197-201.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1-26.

Johnson, H. J., Northup, S. J., Seagraves, P. A., Atallah, M., Garvin, P. J., Lin, L., and Darby, T. D. (1985). Biocompatibility test procedure for materials evaluation in vitro. II. Objective methods of toxicity assessment. Journal of Biomedical Materials Research 19, 489-508.

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.


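Westgard, J. O. and Hunt, M. R. (1973). Use and interpretation of common statistical tests on method-comparison studies. Clinical Chemistry 19, 49-57.

Received February 1986; revised June and September 1987.

APPENDIX

Let $(Y_{11}, Y_{12}), \ldots, (Y_{n1}, Y_{n2})$ be independent observations on a bivariate normal distribution. The concordance correlation coefficient is

$$\rho_c = \frac{2\sigma_{12}}{\sigma_1^2 + \sigma_2^2 + (\mu_1 - \mu_2)^2}.$$

The Z-transformation of $\rho_c$ is

$$Z = \tfrac{1}{2}\ln\left[\frac{1 + \rho_c}{1 - \rho_c}\right] = \tanh^{-1}\rho_c.$$

The sample analogues are

$$\hat{\rho}_c = \frac{2S_{12}}{S_1^2 + S_2^2 + (\bar{Y}_1 - \bar{Y}_2)^2}$$

and $\hat{Z} = \tanh^{-1}\hat{\rho}_c$.

We first consider the asymptotic normality of $\hat{Z}$. It will then be straightforward to obtain the asymptotic normality of $\hat{\rho}_c$ by simply applying the hyperbolic tangent transformation of $\hat{Z}$. The asymptotic normality of $\hat{Z}$ is algebraically less cumbersome. The transformation may be expressed as $\hat{Z} = g(v)$, where

$$v = (v_1, v_2, v_3, v_4, v_5)' = \frac{1}{n}\left(\sum_{i} Y_{i1},\ \sum_{i} Y_{i2},\ \sum_{i} Y_{i1}^2,\ \sum_{i} Y_{i2}^2,\ \sum_{i} Y_{i1}Y_{i2}\right)'$$

and

$$g(v_1, v_2, v_3, v_4, v_5) = \tfrac{1}{2}\ln\left[1 + \frac{4(v_5 - v_1 v_2)}{v_3 + v_4 - 2v_5}\right].$$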
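The identity between the Z-transformed sample coefficient and the moment form $g(v)$ can be checked numerically. The sketch below is illustrative only and rests on two assumptions that are not spelled out in this part of the text: that the sample variances and covariance use divisor n, and that $g$ has the form $\tfrac{1}{2}\ln[1 + 4(v_5 - v_1 v_2)/(v_3 + v_4 - 2v_5)]$, which follows algebraically under that divisor. The data values and the function names (`sample_ccc`, `moment_vector`, `g`) are hypothetical.

```python
import math


def sample_ccc(y1, y2):
    """Sample concordance correlation with divisor-n moments (assumed)."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    s11 = sum((a - m1) ** 2 for a in y1) / n
    s22 = sum((b - m2) ** 2 for b in y2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n
    return 2 * s12 / (s11 + s22 + (m1 - m2) ** 2)


def moment_vector(y1, y2):
    """v = (mean Y1, mean Y2, mean Y1^2, mean Y2^2, mean Y1*Y2)."""
    n = len(y1)
    return (
        sum(y1) / n,
        sum(y2) / n,
        sum(a * a for a in y1) / n,
        sum(b * b for b in y2) / n,
        sum(a * b for a, b in zip(y1, y2)) / n,
    )


def g(v):
    """Z expressed through raw sample moments (form assumed, see lead-in)."""
    v1, v2, v3, v4, v5 = v
    return 0.5 * math.log(1 + 4 * (v5 - v1 * v2) / (v3 + v4 - 2 * v5))


# Hypothetical duplicate readings, used only to exercise the identity.
y1 = [4.1, 5.0, 6.2, 7.1, 8.3, 9.0]
y2 = [4.4, 4.8, 6.5, 6.9, 8.0, 9.4]

z_direct = math.atanh(sample_ccc(y1, y2))   # tanh^{-1} of the sample coefficient
z_moments = g(moment_vector(y1, y2))        # same quantity through g(v)
print(z_direct, z_moments)
```

Note that the denominator $v_3 + v_4 - 2v_5$ is the mean squared difference between the paired readings, so $g$ is undefined only when the two readings agree exactly in every pair.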

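The vector $v$ is expressed as functions of sample moments and has asymptotic 5-variate normality with mean $E(v) = (\mu_1,\ \mu_2,\ \sigma_1^2 + \mu_1^2,\ \sigma_2^2 + \mu_2^2,\ \sigma_{12} + \mu_1\mu_2)'$ and variance $n^{-1}\Sigma$, where $\Sigma = (w_{ij})$ is symmetric with entries

$$
\begin{aligned}
w_{12} &= w_{21} = \sigma_{12}, \\
w_{13} &= w_{31} = 2\mu_1\sigma_1^2, \\
w_{14} &= w_{41} = 2\mu_2\sigma_{12}, \\
w_{15} &= w_{51} = \mu_2\sigma_1^2 + \mu_1\sigma_{12}, \\
w_{24} &= w_{42} = 2\mu_2\sigma_2^2, \\
w_{25} &= w_{52} = \mu_2\sigma_{12} + \mu_1\sigma_2^2, \\
w_{33} &= 2\sigma_1^4 + 4\mu_1^2\sigma_1^2, \\
w_{34} &= w_{43} = 2\sigma_{12}^2 + 4\mu_1\mu_2\sigma_{12}, \\
w_{35} &= w_{53} = 2\sigma_1^2\sigma_{12} + 2\mu_1\mu_2\sigma_1^2 + 2\mu_1^2\sigma_{12}, \\
w_{45} &= w_{54} = 2\sigma_2^2\sigma_{12} + 2\mu_1\mu_2\sigma_2^2 + 2\mu_2^2\sigma_{12}, \\
w_{55} &= \sigma_1^2\sigma_2^2 + \sigma_{12}^2 + \mu_1^2\sigma_2^2 + \mu_2^2\sigma_1^2 + 2\mu_1\mu_2\sigma_{12}
\end{aligned}
$$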