CHEMICAL ACCURACY AND PRECISION IN STRUCTURAL REFINEMENTS FROM POWDER DIFFRACTION DATA

James A. Kaduk
Amoco Corporation, P.O. Box 3011 MC F-9, Naperville IL 60566

INTRODUCTION

One of the most practically useful features of a crystallographic least-squares refinement is that what we loosely describe as error estimates are provided automatically. Before we can discuss the significance of these error estimates, we need to be precise and accurate about what we mean by accuracy and precision. A convenient guide to improving both our thinking and terminology has been prepared by the IUCr Commission on Crystallographic Nomenclature, and is available as Statistical Descriptors in Crystallography on the World Wide Web at http://www.unige.ch/crystal/astat/preface.html. This document is also accessible from the IUCr home page, and was made by merging two separate reports commissioned by the IUCr [1,2].

By "chemical accuracy and precision" I mean how well we know bond distances, angles, and other structural parameters from a Rietveld refinement. To assess the accuracy and precision of refined and derived structural parameters, we need to make sure that we are talking a common language.

BASIC NOTIONS

There are two main interpretations of probability. In the frequentist point of view, the probability of an event is taken to equal the relative frequency of that event with respect to all possible events, as the number of trials approaches infinity. The Bayesian approach extends the interpretation of probability to include degrees of belief or knowledge. Most physical scientists generally use frequentist language, but implicitly use a more subjective or Bayesian approach in interpreting their experimental data. "What you find depends on what you expect."

The uncertainty of a measurement, or the uncertainty of the measured value of the specific quantity subject to measurement (the measurand), expresses doubt about the value of the measurand. A measurement provides only an estimate of the value of a measurand.
Since the value of a measurand is an unknown quantity, the deviation of the measurement from the true value (error, or accuracy) is also unknowable. The uncertainty reflects both random and systematic effects, including deficiencies in our model. An estimate of uncertainty is based on known sources of uncertainty; unknown sources cannot be taken into account. The precision of an estimate is the closeness of agreement obtained by applying a strictly identical experimental procedure a number of times. While uncertainty is a general concept, its quantitative measure is called standard uncertainty. The term standard uncertainty is synonymous with, and should replace, the familiar term estimated standard deviation.

Copyright (C) JCPDS-International Centre for Diffraction Data 1997

The standard deviation is the positive square root of the variance of the probability distribution of the possible values of the measurand. The variance is the second moment about the mean $\mu$ of a probability density function $p(x)$, and is normally denoted by $\sigma^2$:

$$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 \, p(x) \, dx$$
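As a rough numerical illustration of the second-moment definition, the sketch below integrates $(x-\mu)^2 p(x)$ over a finite window for the two densities discussed in this section, the gaussian and the cauchy (lorentzian). The function names, the midpoint-rule integrator, and the window widths are assumptions chosen for this example; they do not come from any crystallographic package.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Cauchy (lorentzian) density with location x0 and half-width gamma."""
    return gamma / (math.pi * (gamma ** 2 + (x - x0) ** 2))

def second_moment(pdf, center, half_width, n=100_000):
    """Midpoint-rule estimate of the integral of (x - center)^2 * pdf(x)
    over the finite window [center - half_width, center + half_width]."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        x = center - half_width + (k + 0.5) * h
        total += (x - center) ** 2 * pdf(x) * h
    return total

if __name__ == "__main__":
    # Gaussian: the truncated integral settles at sigma^2 once the window
    # covers the tails (here mu = 1, sigma = 2, window of +/- 20 sigma).
    print(second_moment(lambda x: gaussian_pdf(x, 1.0, 2.0), 1.0, 40.0))
    # Cauchy: the truncated integral keeps growing with the window width,
    # which is the numerical signature of an infinite second moment.
    for half_width in (10.0, 100.0, 1000.0):
        print(half_width, second_moment(cauchy_pdf, 0.0, half_width))
```

For the cauchy density with unit half-width, the truncated integral over $[-L, L]$ has the closed form $(2/\pi)(L - \arctan L)$, so it grows without bound as $L$ increases. No choice of window yields a finite variance.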

The normal (gaussian) probability density function has:

$$p(x) = \sigma^{-1} (2\pi)^{-1/2} \exp\{-\tfrac{1}{2}[(x-\mu)/\sigma]^2\}$$

We tend to assume a gaussian probability density function, even when we have no knowledge about the true probability distribution function. There are good theoretical reasons for this assumption. Not all pdfs have variances; a good example is the cauchy (lorentzian) distribution, for which the second moment is infinite. Thus, there are assumptions built into the very foundations of our expressions.

Uncertainty components may be classified based on their method of evaluation into two categories, known as type A and type B. Any estimate of uncertainty based on statistical analysis of experimental data is of type A. Uncertainties of the results of a structure determination from a least-squares refinement are of type A. A type B standard uncertainty reflects evaluation by scientific judgement using all relevant information on the possible variability of an observation. Rather than random or systematic errors, the type A and type B standard uncertainties reflect frequentist and Bayesian contributions to the combined standard uncertainty.

LEAST SQUARES

Among crystallographers, the colloquial term for a type A standard uncertainty is the estimated standard deviation, obtained from a least-squares refinement of a model. If we are to properly appreciate the significance of these values, we need to have a clear understanding of how they come about. My summary of the mathematics of least squares follows the classic treatment of Prince [3].

Consider a set of $N$ observations $y_i$ ($1 \le i \le N$) that have been measured experimentally, each subject to some random error from the measurement process. In our refinement, we assume that $y_i = M_i(x) + e_i$, where $M(x)$ represents a model function ($x$ being a $P$-dimensional vector), and the $e_i$ are the errors distributed according to some probability density function. In most cases, the raw data are assumed to be uncorrelated, so that their joint distribution is the product of their individual marginal distributions.
(Parenthetically, this may represent an optimistic assumption for a powder diffraction pattern, as illustrated in Figure 1. The measurements of individual points across an isolated peak profile are not uncorrelated (since they depend on the underlying $F^2$ of the reflection), even though the errors are uncorrelated. We might then expect that the standard uncertainties derived from a Rietveld refinement may be optimistic.)
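The observation model $y_i = M_i(x) + e_i$ with independent errors can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the straight-line model, the parameter values, the error standard deviations, and all function names are hypothetical and stand in for a real structural model. It builds the joint log-density as a sum of gaussian marginal log-densities (the logarithm of the product of marginals) and compares it with the weighted sum of squared residuals, using weights equal to the reciprocal variances.

```python
import math
import random

def model(x, t):
    """Hypothetical two-parameter model M_i(x) = x[0] + x[1] * t_i."""
    return x[0] + x[1] * t

def simulate(x_true, ts, sigmas, rng):
    """Observations y_i = M_i(x_true) + e_i with independent gaussian errors."""
    return [model(x_true, t) + rng.gauss(0.0, s) for t, s in zip(ts, sigmas)]

def weighted_ssq(x, ts, ys, sigmas):
    """Sum of squared residuals, each weighted by 1 / sigma_i^2."""
    return sum(((y - model(x, t)) / s) ** 2 for t, y, s in zip(ts, ys, sigmas))

def joint_log_density(x, ts, ys, sigmas):
    """Log of the joint density: for uncorrelated data this is the sum of
    the logs of the individual gaussian marginal densities."""
    return sum(
        -0.5 * ((y - model(x, t)) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
        for t, y, s in zip(ts, ys, sigmas)
    )

def fit_weighted(ts, ys, sigmas):
    """Closed-form weighted least squares for the straight-line model,
    obtained by solving the 2x2 normal equations."""
    w = [1.0 / s ** 2 for s in sigmas]
    s0 = sum(w)
    st = sum(wi * t for wi, t in zip(w, ts))
    stt = sum(wi * t * t for wi, t in zip(w, ts))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sty = sum(wi * t * y for wi, t, y in zip(w, ts, ys))
    det = s0 * stt - st * st
    return [(stt * sy - st * sty) / det, (s0 * sty - st * sy) / det]

if __name__ == "__main__":
    rng = random.Random(1)
    ts = [float(i) for i in range(20)]
    sigmas = [0.5 + 0.1 * (i % 3) for i in range(20)]
    ys = simulate([1.0, 2.0], ts, sigmas, rng)
    x_hat = fit_weighted(ts, ys, sigmas)
    print("least-squares estimate:", x_hat)
    # The two quantities below differ only by a constant independent of x.
    print(joint_log_density(x_hat, ts, ys, sigmas) + 0.5 * weighted_ssq(x_hat, ts, ys, sigmas))
    print(joint_log_density([0.0, 0.0], ts, ys, sigmas) + 0.5 * weighted_ssq([0.0, 0.0], ts, ys, sigmas))
```

Because the joint log-density equals minus one half of the weighted sum of squares plus a term that does not depend on $x$, the parameters that minimize the weighted sum of squares also maximize the joint density in this gaussian, uncorrelated case. The sketch says nothing about correlated observations, which is the concern raised in the parenthetical remark.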

[Figure 1 plot: horizontal axis 2-Theta, deg, from 18.0 to 19.0]

Fig. 1 The 100 reflection of cobalt acetate tetrahydrate, measured on a laboratory diffractometer. The crosses represent the observed data points, and the solid line the calculated peak profile. Although these observations are in principle independent, the measured intensities are related by the underlying value of $F^2$ and the profile function.

It can be shown that if the error distributions are gaussian, and the observations are weighted by the reciprocals of their variances, the method of least squares gives the maximum likelihood estimate of the parameters $x_j$ (1