Quant. Struct.-Act. Relat. 10, 191 - 193 (1991)


Guest Editorial

Validation of QSAR’s

Svante Wold, Institute of Chemistry, Umeå University, Sweden

What is the purpose of developing a QSAR for a given problem, say one of drug optimization? The obvious answer is that the QSAR gives us information on how changes in the structure of the actual compounds influence their biological activity. This, in turn, allows us to (a) modify the structure to improve the drug potency, decrease toxicity, etc., and (b) improve our understanding of the actual biological mechanism. Because it is impossible to gain a complete knowledge of all details of a biological system, we use a mathematical model of the system, here called a QSAR. The present discussion and methods apply to any model, be it regression, PLS, molecular modelling, expert system, or whatever, but we use a simple regression model as the illustration. In a simple case the QSAR model may be, for instance, a regression model with one response:

$$ y = b_0 + b_1 \pi + b_2 \sigma + b_3\, MR + b_4 \pi^2 + \varepsilon \qquad (1) $$

After measuring the biological activity ($y$) of some pertinent compounds and applying multiple regression analysis, we may arrive at the model parameter values: $b_0 = 2.16$, $b_1 = 0.73$, $b_2 = -0.64$, $b_3 = 0.11$, and $b_4 = -0.23$.

If the model (1) is valid as discussed below, we may use the model with its parameter values to predict what would happen when the factors ($\pi$, $\sigma$, and MR) are changed. Above, a moderate increase of $\pi$ and MR would increase $y$. If this is desirable, we would go into the lab and make a compound with a larger and more lipophilic substituent than we had before. When the pharmacological tests return a higher value of $y$ than seen before, we are happy and feel that our understanding and control of the situation is good. This use of mathematical models as representatives of the real problem allows us to study (simulate) the behaviour of a system without making large numbers of experiments. However, this rests on one crucial assumption, namely that the model is valid, i.e. that the model is an adequate representation, or approximation, of the real system in the factor intervals of interest, say, $-2 < \pi < 2$; $-0.2 < \sigma < 0.9$; $0.9 < MR < 25$.
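As a small numerical illustration of using model (1) for prediction, the sketch below evaluates the fitted equation for two substituent settings. It assumes the parameter values quoted above and reads the factors as $\pi$, $\sigma$, and MR with the validity intervals $-2 < \pi < 2$, $-0.2 < \sigma < 0.9$, and $0.9 < MR < 25$. The two example compounds and all function names are hypothetical and are not taken from any data set.

```python
# Parameter values quoted in the text for model (1): b0 ... b4.
B0, B1, B2, B3, B4 = 2.16, 0.73, -0.64, 0.11, -0.23


def predict_activity(pi, sigma, mr):
    """Predicted y from model (1); the residual term is omitted."""
    return B0 + B1 * pi + B2 * sigma + B3 * mr + B4 * pi ** 2


def in_valid_range(pi, sigma, mr):
    """True if all factors lie inside the intervals where the model is assumed valid."""
    return -2 < pi < 2 and -0.2 < sigma < 0.9 and 0.9 < mr < 25


# Hypothetical substituent constants, chosen only for illustration.
parent = dict(pi=0.5, sigma=0.2, mr=5.0)
analogue = dict(pi=1.0, sigma=0.2, mr=8.0)  # larger and more lipophilic

for name, factors in (("parent", parent), ("analogue", analogue)):
    print(name, predict_activity(**factors), in_valid_range(**factors))
```

Because $b_4$ is negative, the predicted gain from raising $\pi$ levels off and eventually reverses, which is one reason to check the factor intervals before trusting a prediction.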

So, how do we assure ourselves that a model, a QSAR, is valid? The model is usually derived from biological activity values measured on a “training set” of $n$ compounds, for which we also know the values of our factors (e.g. $\pi$, $\sigma$, and MR). Using regression or something similar (e.g. PLS), we calculate values of the parameters $b_0$, $b_1$, etc., in such a way that the residuals ($\varepsilon$), the deviations between data and model, are small. A measure of the size of the residuals is given by $s$, the residual standard deviation (RSD). Equivalently, $R^2$, the squared multiple correlation coefficient, often also called the goodness of fit, measures the “explained” y-variance.

© VCH Verlagsgesellschaft mbH, D-6940 Weinheim

So, the first necessary condition for model validity is that $R^2$ is close to 1.0 ($R^2 > 0.90$, $r > 0.95$) and $s$ is small, say, smaller than 0.3, if $y = \log(1/C)$. However, a large $R^2$ and a small $s$ are not sufficient for model validity. This is because of an unfortunate property of regression models (and most other models): they give a closer fit (smaller $s$, larger $R^2$, etc.) the larger the number of parameters and terms in the model. And, what is even worse, if we have sufficiently many structure descriptor variables to select from, we can make a model fit data very closely even with few terms, provided that they are selected according to their apparent contribution to the fit. And this holds even if the variables we choose from are completely random and have nothing whatsoever to do with the current problem! This is one reason why stepwise regression is impractical with data sets containing many and collinear predictor variables ($X$). Other ways of model fitting, such as PLS [9], should then be used, but this is not the subject of the present discussion. This risk for “chance correlations” with variable selection has been pointed out by Topliss, Wold and Dunn, and others [7, 8], but seems to be insufficiently recognized by the chemical and biological communities. The big problem with chance correlations is that predictions for new compounds from such models are very poor: the model fits the “training set” data well but is useless for prediction and understanding.

If now a high $R^2$ and a small RSD ($s$) are not sufficient, what should we do? The best is, of course, a fairly large and representative validation set of compounds, for which the predicted activity values can be compared with the actual values. For obvious reasons of cost and time, however, adequate validation sets are rare.
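The chance-correlation effect can be reproduced with a short simulation. The sketch below fits purely random "activities" to purely random "descriptors" and compares the $R^2$ obtained with and without variable selection. It uses a simple greedy forward selection as a stand-in for step-wise regression, and the sizes (20 compounds, 50 candidate variables, 3 selected terms, 30 repetitions) are arbitrary assumptions made for illustration only.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def forward_select(X, y, k):
    """Greedily pick k columns, each time adding the one that raises R^2 most."""
    chosen = []
    for _ in range(k):
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        best = max(rest, key=lambda j: r_squared(X[:, chosen + [j]], y))
        chosen.append(best)
    return chosen


rng = np.random.default_rng(0)
n, m, k, trials = 20, 50, 3, 30  # illustrative sizes, not from the text

r2_selected, r2_fixed = [], []
for _ in range(trials):
    X = rng.normal(size=(n, m))  # random descriptors, unrelated to y
    y = rng.normal(size=n)       # random "activity"
    cols = forward_select(X, y, k)
    r2_selected.append(r_squared(X[:, cols], y))  # terms picked by fit
    r2_fixed.append(r_squared(X[:, :k], y))       # terms fixed in advance

print("mean R^2 with selection:   ", np.mean(r2_selected))
print("mean R^2 without selection:", np.mean(r2_fixed))
```

Both models have the same number of terms; only the freedom to choose them from a large pool differs, which is why reporting the size of the candidate pool matters.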

Bootstrapping and Cross-Validation

Without a real validation set, a simulated one may be better than nothing. Recent developments in statistics provide us with a new and interesting set of measures of validity that are based on simulating the predictive power of a model. These tools, bootstrapping and cross-validation [1 - 6], operate by creating a number of slight modifications of the original data set, estimating parameters from each of these modified data sets, and then calculating the variability of the predictions by each of the resulting models.

Cross-validation, which is simplest to apply, creates a number of modified data sets by taking away one, or a small group of, observations (here compounds) from the data in such a way that each observation is taken away once and once only. Then one model is developed for each reduced data set, and the response values ($y$) of the deleted observations are predicted from the model. The squared differences between predicted and actual values are added to the Predictive REsidual Sum of Squares (PRESS; other interpretations of this acronym include PRediction Error Sum of Squares). In the end, PRESS will contain one contribution from each observation. PRESS is now a good estimate of the real prediction error of the model, provided that the observations (compounds) were independent (see below). If PRESS is smaller than the sum of squares of the response values (SSY), the model predicts better than chance and can be considered “statistically significant”. The ratio PRESS/SSY can also be used to calculate approximate confidence intervals of predictions of new observations (compounds). To be a reasonable QSAR model, PRESS/SSY should be smaller than 0.4, and a value of this ratio smaller than 0.1 indicates an excellent model.
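The leave-one-out variant of this procedure can be sketched as follows. The example refits a regression of the form of model (1) once per deleted compound, accumulates PRESS, and forms PRESS/SSY. It also checks the result against the hat-matrix shortcut of Eqs. (2) and (3), which applies to ordinary least squares. The data are synthetic, SSY is taken here as the sum of squares about the mean of $y$ (an assumption, since the text does not say whether SSY is mean-centred), and all names are hypothetical.

```python
import numpy as np


def press_loo(X, y):
    """PRESS by explicit leave-one-out refitting (X already holds all model terms)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                       # delete observation i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += (y[i] - X[i] @ coef) ** 2             # squared prediction error
    return press


def press_hat(X, y):
    """PRESS from the closed form: residuals scaled by (1 - h_ii)."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T               # hat matrix, Eq. (3)
    resid = y - H @ y                                  # ordinary residuals
    return np.sum((resid / (1.0 - np.diag(H))) ** 2)   # Eq. (2)


# Synthetic training set, generated only for illustration.
rng = np.random.default_rng(1)
n = 20
pi = rng.uniform(-2, 2, n)
sigma = rng.uniform(-0.2, 0.9, n)
mr = rng.uniform(0.9, 25, n)
X = np.column_stack([np.ones(n), pi, sigma, mr, pi ** 2])
b_true = np.array([2.16, 0.73, -0.64, 0.11, -0.23])
y = X @ b_true + rng.normal(scale=0.2, size=n)

ssy = np.sum((y - y.mean()) ** 2)   # assumed mean-centred sum of squares
press = press_loo(X, y)
print("PRESS/SSY:", press / ssy)
print("closed form agrees:", np.isclose(press, press_hat(X, y)))
```

The explicit loop generalizes directly to deleting small groups of compounds or to non-regression models, whereas the closed form is specific to least-squares regression.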

For regression models, PRESS has a simple closed form and is easy to calculate [5]:

$$ \mathrm{PRESS} = \sum_i \frac{(\hat{y}_i - y_i)^2}{(1 - h_{ii})^2} \qquad (2) $$

Here $y_i$ and $\hat{y}_i$ are the response (activity) values of observation $i$ ($i = 1, 2, \ldots, n$), observed and predicted by the model, respectively. The diagonal elements of the “hat” matrix, $H$ in Eq. (3), are denoted by $h_{ii}$. $X$ is the ($n \times p$) data matrix containing one column for each of the $p$ terms of the model, i.e. 1.0 for the constant $b_0$, $\pi_i$ for the linear lipophilicity term, etc., for model (1) above.

$$ H = X (X'X)^{-1} X' \qquad (3) $$

Most regression programs nowadays give both $H$ and PRESS among their results.

When Does Cross-Validation Work?

There are two situations when cross-validation does not work well. The first is when the observations (compounds) are strongly grouped and hence not independent. With QSARs this often happens when two or more different types of compounds are put in the same model, and the activity differs between the groups as with, for example, a set of carboxylic acids with both R-COOH and Ar-COOH (R = aliphatic and Ar = aromatic). Any model will then mainly explain the difference in activity between the two groups, and cross-validation (as well as any other statistical test) will apparently confirm the consistency of this trivial difference.

The second situation occurs when cross-validation is applied after variable selection in step-wise multiple regression (or other variable selection methods). As mentioned above, variable selection should usually be avoided, but we here discuss how to check the validity of the resulting model. The problem now is that the final model is based on variables that have been selected according to their correlation with the activity ($y$), and the resulting apparent correlations are stable also in cross-validation. This is the problem discussed by Topliss [7]; selecting a few variables from many can be done in so many ways that the results very often look good in retrospect according to any criterion, including PRESS. To apply cross-validation correctly in this situation, one has to delete some observations before the variable selection, apply the step-wise procedure to the reduced data set, then predict the y-values for the deleted observations from their x-values and the resulting model, and finally calculate the squared differences between these predictions and the actual y-values. This whole procedure, including variable selection, is then repeated with a second group of observations deleted, etc., until each observation (compound) has been deleted once and once only. The resulting PRESS is a real measure of the predictive power of the model developed by step-wise variable selection.

Final Words

From a reader’s (or referee’s) perspective, cross-validation and bootstrapping provide tools to judge the validity of QSAR models, provided that these have been honestly developed, particularly with an explicit recognition of the initial number of candidate variables available as predictors. To summarize, you should believe in a model only when:
1. PRESS has been calculated and PRESS/SSY < 0.4 or, alternatively, a bootstrapping has been made;
2. there are plots of the data;
3. a clear description is given of the candidate set of variables, and of the procedure of variable selection if such has been applied.
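Conditions 1 and 3 interact when variables were chosen step-wise: PRESS is only meaningful if the selection is repeated inside every deletion round, as described for the second situation in which cross-validation fails. The sketch below contrasts that procedure with the misleading shortcut of selecting variables once on all data, using random data where no real relationship exists. Greedy forward selection again stands in for step-wise regression, SSY is assumed to be mean-centred, and the sizes and names are hypothetical.

```python
import numpy as np


def fit(X, y):
    """Least-squares coefficients for a model with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def rss(X, y):
    resid = y - predict(fit(X, y), X)
    return resid @ resid


def forward_select(X, y, k):
    """Greedily pick k columns, each time adding the one that lowers RSS most."""
    chosen = []
    for _ in range(k):
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        chosen.append(min(rest, key=lambda j: rss(X[:, chosen + [j]], y)))
    return chosen


def press_ratio(X, y, k, reselect):
    """Leave-one-out PRESS/SSY; reselect=True repeats selection in every round."""
    n = len(y)
    cols_all = None if reselect else forward_select(X, y, k)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        cols = forward_select(X[keep], y[keep], k) if reselect else cols_all
        coef = fit(X[keep][:, cols], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1, cols])[0]) ** 2
    return press / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(2)
n, m, k, trials = 20, 30, 3, 10  # illustrative sizes
naive, honest = [], []
for _ in range(trials):
    X = rng.normal(size=(n, m))  # random descriptors
    y = rng.normal(size=n)       # random "activity"
    naive.append(press_ratio(X, y, k, reselect=False))
    honest.append(press_ratio(X, y, k, reselect=True))

print("mean PRESS/SSY, selection outside the loop:", np.mean(naive))
print("mean PRESS/SSY, selection inside the loop: ", np.mean(honest))
```

With selection outside the loop the deleted compound has already influenced which variables were chosen, so the ratio is optimistically low; repeating the selection in each round removes that leak.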

To some this may seem to be an overly cautious attitude, but in science that is better than being overly gullible. The reader may object that the purpose of developing a QSAR is to achieve a better understanding, not prediction or optimization. However, a model that cannot predict better than chance is a poor basis for understanding chemical-biological interactions.

References

[1] Allen, D.M., Mean square error of prediction as a criterion for selecting variables, Technometrics 13, 469 - 475 (1971).
[2] Diaconis, P. and Efron, B., Computer intensive methods in statistics, Scientific American, 96 - 108 (1983).
[3] Efron, B., Better bootstrap confidence intervals, J. Amer. Statist. Assoc. 82, 171 - 200 (1987).
[4] Efron, B., Estimating the error rate of a prediction rule: Improvement on cross-validation, J. Amer. Statist. Assoc. 78, 316 - 331 (1983).
[5] Myers, R.H., Classical and modern regression with applications, Duxbury Press, Boston, 1986.
[6] Rawlings, J.O., Applied regression analysis, Wadsworth & Brooks/Cole, Pacific Grove, CA (1988).
[7] Topliss, J.G. and Edwards, R.P., Chance factors in studies of quantitative structure-activity relationships, J. Med. Chem. 22, 1238 - 1244 (1979).
[8] Wold, S. and Dunn, W.J. III, Multivariate quantitative structure-activity relationships (QSAR): conditions for their applicability, J. Chem. Inf. Comput. Sci. 23, 6 (1983).
[9a] Dunn, W.J. III, Wold, S., Edlund, U., Hellberg, S. and Gasteiger, J., Multivariate structure-activity relationships between data from a battery of biological tests and an ensemble of structure descriptors. The PLS method, QSAR 3, 131 - 137 (1984).
[9b] Höskuldsson, A., PLS Regression Methods, J. Chemometrics 2, 211 - 228 (1988).
[9c] Martens, H. and Naes, T., Multivariate Calibration, Wiley, 1989.

Quant. Struct.-Act. Relat. 10, 193 - 197 (1991)

Theoretical Determination of the Putative Receptor-bound Conformations of 5-HT2 Receptor Agonists*

Hans-Dieter Höltje** and Hans Briem

Institute of Pharmacy, Free University of Berlin, Königin-Luise-Strasse 2+4, D-1000 Berlin 33, Germany

* Dedicated to Prof. Dr. Dr. E. Mutschler on the occasion of his 60th birthday.
** To receive all correspondence.

For a series of compounds with agonistic activity at the 5-HT2 receptor a pharmacophoric model was established. It has been shown by means of conformational analyses and molecular electrostatic potential calculations that the pharmacophores of all active compounds can adopt common positions at the receptor site. Besides, our model offers an explanation for the stereoselectivity of the chiral compounds as well as for the complete loss of activity of a phenylethylamine compound containing an α,α-dimethyl side chain.

Key words: 5-HT2 receptor agonists, phenylethylamine, indolylethylamines, conformational analysis, stereospecific pharmacophore, molecular electrostatic potential

1 Introduction

During the last decade, research in the field of serotonin (5-hydroxytryptamine, 5-HT) binding sites has increased enormously, caused particularly by the discovery of highly specific radioligands. Today the existence of at least six 5-HT receptor subtypes is generally accepted (see [1] for a recent review), although the functional correlates of these binding sites still remain far from being clear.

Probably the best defined 5-HT receptor subtype is the so-called 5-HT2 receptor, which has been found in high concentrations in the frontal part of the cortex, as well as in peripheral smooth muscle tissues (e.g. vascular, uterine, bronchial and intestinal) and in blood platelets.

Selective antagonists at the 5-HT2 receptor site have become of therapeutic interest for use in hypertension and schizophrenia.
