Quantitative structure–activity relationships for ... - Springer Link

2 downloads 0 Views 355KB Size Report
Sep 19, 2012 - for organophosphates binding to acetylcholinesterase. Christopher D. Ruark • C. Eric Hack •. Peter J. Robinson • Paul E. Anderson •. Jeffery M.
Arch Toxicol (2013) 87:281–289 DOI 10.1007/s00204-012-0934-z

MOLECULAR TOXICOLOGY

Quantitative structure–activity relationships for organophosphates binding to acetylcholinesterase Christopher D. Ruark • C. Eric Hack • Peter J. Robinson • Paul E. Anderson • Jeffery M. Gearhart

Received: 25 April 2012 / Accepted: 28 August 2012 / Published online: 19 September 2012  Springer-Verlag 2012

Abstract Organophosphates are a group of pesticides and chemical warfare nerve agents that inhibit acetylcholinesterase, the enzyme responsible for hydrolysis of the excitatory neurotransmitter acetylcholine. Numerous structural variants exist for this chemical class, and data regarding their toxicity can be difficult to obtain in a timely fashion. At the same time, their use as pesticides and military weapons is widespread, which presents a major concern and challenge in evaluating human toxicity. To address this concern, a quantitative structure–activity relationship (QSAR) was developed to predict pentavalent organophosphate oxon human acetylcholinesterase bimolecular rate constants. A database of 278 three-dimensional structures and their bimolecular rates was developed from 15

Electronic supplementary material The online version of this article (doi:10.1007/s00204-012-0934-z) contains supplementary material, which is available to authorized users. C. D. Ruark (&)  C. E. Hack  P. J. Robinson  J. M. Gearhart Henry M. Jackson Foundation for the Advancement of Military Medicine, Molecular Bioeffects Branch, AFRL/711 HPW/RHDJ Bldg. 837, 2729 R Street, Wright-Patterson AFB, Greene, OH 45433-5707, USA e-mail: [email protected] C. E. Hack e-mail: [email protected] P. J. Robinson e-mail: [email protected]

peer-reviewed publications. A database of simplified molecular input line entry notations and their respective acetylcholinesterase bimolecular rate constants are listed in Supplementary Material, Table I. The database was quite diverse, spanning 7 log units of activity. In order to describe their structure, 675 molecular descriptors were calculated using AMPAC 8.0 and CODESSA 2.7.10. Orthogonal projection to latent structures regression, bootstrap leaverandom-many-out cross-validation and y-randomization were used to develop an externally validated consensus QSAR model. The domain of applicability was assessed by the William’s plot. Six external compounds were outside the warning leverage indicating potential model extrapolation. A number of compounds had residuals [2 or \-2, indicating potential outliers or activity cliffs. The results show that the HOMO–LUMO energy gap contributed most significantly to the binding affinity. A mean training R2 of 0.80, a mean test set R2 of 0.76 and a consensus external test set R2 of 0.66 were achieved using the QSAR. The training and external test set RMSE values were found to be 0.76 and 0.88. The results suggest that this QSAR model can be used in physiologically based pharmacokinetic/pharmacodynamic models of organophosphate toxicity to determine the rate of acetylcholinesterase inhibition. Keywords QSAR  Organophosphate  PBPK/PD  Chemical warfare nerve agent  Acetylcholinesterase  Bimolecular

J. M. Gearhart e-mail: [email protected]

Introduction

P. E. Anderson Computer Science Department, College of Charleston, 66 George Street, Charleston, SC 29494, USA e-mail: [email protected]

Organophosphate (OP) pesticides and chemical warfare nerve agents (CWNAs) act primarily by inhibiting acetylcholinesterase (AChE, EC 3.1.1.7). Much of the human

123

282

Arch Toxicol (2013) 87:281–289

population is exposed to OP pesticides because they comprise more than 50 % of all pesticides used worldwide (Grigoryan et al. 2009). CWNAs have been used by Iraq against Kurdish civilians and Iranian military personnel (Munro et al. 1994). The cult Aum Shinrikyo have even used them in a variety of terrorist activities (Vale 2007). To better improve human risk assessment of these agents, physiologically based pharmacokinetic/pharmacodynamic (PBPK/PD) models that quantitatively describe the metabolism and disposition in both experimental animals and humans have been developed. Historically, these models have not been applicable to high-throughput risk screening and ranking due to a lack of experimental measurements for key model parameters, such as reaction rates or affinities between AChE and the OP or CWNA compounds. However, quantitative structure–activity relationship (QSAR) models can be developed to predict missing parameter values based on a compound’s structure alone (Eriksson et al. 2003). QSAR models developed on a training set of compounds allow for the prediction of biological activity of related chemicals. They typically take the form Yi = Fi(X1, X2,…,Xn) where Yi are biological activities of molecules represented by some function Fi of experimental or calculated structural properties or descriptors (X1, X2,…,Xn) (Golbraikh and Tropsha 2002). Specifically, OP and CWNA PBPK/PD models require quantitative estimates of their bimolecular (k1), regeneration (k-1) and aging (k2) rate constants describing the binding reaction kinetics between the OP and AChE: k1

AChE þ OPX !AChE - OPX ! k2 AChE - OP þ X, k1

where AChE and OPX are the reactants, AChE-OPX is the unaged product, AChE-OP is the aged product and X is the leaving group. k1 has units of M-1 min-1, while the firstorder rate constants k-1 and k2 have units of min-1. A number of studies measuring or predicting human AChE bimolecular rate constants for pentavalent OP oxons are published in the literature (Brestkin and Godovikov 1978; Fukuto 1990; Kabachnik et al. 1970; Raevskii et al. 1990). This extensive experimental database permits the development of a QSAR model that predicts k1 for new OP oxon and CWNA compounds of interest. Physiochemical descriptors, or structural properties, can be used to represent OP oxon and CWNA chemical structures in multidimensional space. Many types of physiochemical descriptors are available for use, ranging from classical descriptors that are known or easily measured (e.g., number of carbon atoms, molecular weight, etc.) to state-of-the-art quantum chemical descriptors that are computationally intensive. Commercially available software, such as AMPAC 8.0 and CODESSA 2.7.10

123

(SemiChem, Inc., Shawnee Mission, KS), has the ability to calculate both classical and quantum chemical descriptors. A number of publications have shown the efficacy of AMPAC and CODESSA in predicting PBPK/PD-specific parameters (Katritzky et al. 1998, 2005; Ruark et al. 2011). Multivariate regression techniques have been used in the development of successful QSAR models (Dudek et al. 2006). The most commonly used method is the projected to latent structures (PLS, often referred to as partial least squares) which projects predicted variables and observed variables to a new space. A modified PLS method that uses the non-iterative partial least squares (NIPALS) method, called orthogonal projection to latent structures (O-PLS), is a supervised, multivariate modeling technique used to determine the variation within QSAR descriptor space that is correlated with observation space. A second variation within the descriptor space that has nothing to do with the observation space (e.g., noise) is filtered out, resulting in a single latent vector (LV), analogous to a principal component. O-PLS can reduce QSAR model complexity while simultaneously preserving the prediction ability of the model by effectively removing non-correlated variation in the descriptors. In addition to providing a predictive model, O-PLS performs best in the presence of multi-colinearity (Trygg and Wold 2002). Development of a QSAR model can be fraught with difficulties, pitfalls and limitations (Cronin and Schultz 2003). Current issues that reduce the models’ predictive capabilities, for example, include the lack of invariance of chemical space and the presence of ‘‘activity cliffs’’ (Johnson 2008; Maggiora 2006). To alleviate some of these issues, researchers have developed internal and external validation strategies (Gramatica 2007), ways to identify activity cliffs (Guha & Van Drie 2008), y-randomization procedures to prevent chance correlations (Tropsha et al. 2003) and the use of applicability domains (Jaworska et al. 2005) to identify appropriate model extrapolation. These strategies are necessary to detect mispredictions even in cases where predictivity is high (Maggiora 2006). While uncertainty exists, QSAR models are still becoming ever increasingly important in toxicology. As a best approximation, many experts have combined individual models to form consensus predictions (Votano et al. 2004; Tropsha 2010). Individual QSAR models may overemphasize, underestimate or ignore certain physiochemical features. Thus, it seems reasonable that a consensus QSAR model, derived from averages of predictions from individual models, may provide better statistical fit and predictive ability as compared to the individual models. Previous studies have shown a superiority of consensus approaches over the use of individual models, while others have found it not warranted (Hewitt et al. 2007).

Arch Toxicol (2013) 87:281–289

Herein, an O-PLS consensus QSAR model is developed to predict human AChE k1 for pentavalent OP oxons and CWNAs using descriptors calculated from AMPAC and CODESSA. The QSAR model is externally validated, and the William’s plot is used to define the model applicability domain. Furthermore, a y-randomization procedure is used to prevent chance correlations. In the future, this model can be incorporated into a combined QSAR-PBPK/PD approach to predict pentavalent OP oxon AChE inhibition time course in laboratory animals and humans. This should fill several data gaps related to their toxicodynamics and contribute to an enhanced in silico testing strategy for more rapid prioritization of pentavalent OP oxons in relation to traditional toxicity testing. It will also result in a valuable contribution to human health risk assessment as conventional toxicity testing strategies are becoming impractical due to the immense number of chemicals involved and the limited scientific resources to test them (Yang et al. 1998).

Methods General QSAR workflow A summary of the QSAR approach can be found in Fig. 1. A database is developed from the literature, and descriptors are calculated using AMPAC 8.0 and CODESSA 2.7.10 (SemiChem, Inc., Shawnee Mission, KS). Twenty percent of the database is initially left out at random for final validation as an external test set. The remaining compounds are repeatedly bootstrapped, using leave-randommany-out cross-validation, into multiple training sets and multiple internal test sets (10–30 % of the remaining data set). The training set is used to develop each QSAR model. Chance correlation is assessed using a y-randomization procedure. Once the QSAR models have been tested against the internal and external test sets, statistical exclusion criteria are used to select the best QSAR models. Acceptable models (R2 [ 0.6 for training, internal and external test sets), as a consensus, are then used to predict the activity of novel compounds that fall within the defined applicability domain. Compound selection and data transformation The experimental values for the human AChE bimolecular rate constants of a diverse set of pentavalent OP oxon and CWNA compounds in their active phosphoryl-oxygen (P=O) form were collected from 15 literature sources (Agabekyan et al. 1974, 1977; Berkhamov et al. 1981; Brestkin and Godovikov 1978; Kuzamyshev et al. 1977; Lee and Metcalf 1973; Makhaeva et al. 1998, 2009; Mastryukova et al. 1978; Ol’khovaya et al. 1976a, b;

283

Fig. 1 General QSAR model development flowchart for predicting OP oxon pesticide and CWNA bimolecular rate constants

Plyamovatyi et al. 1997; Raveh et al. 1993; Sadykov et al. 1983; Volkova et al. 1982). The colorimetric determination of activity (Ellman et al. 1961) was used to quantify the bimolecular rate constants for all sources. Inter-laboratory differences were present and accounted for much of the variability in the data set. Duplicate measurements from different laboratories were averaged in the database. To eliminate temperature variability, a single most cited temperature of 25 C was chosen. Furthermore, since the purpose was to develop a human risk assessment, human bimolecular rate constants were selected over those of other species. The data were then transformed to a log10 scale to approximate a normal distribution, improving the interpretability and predictivity of the QSAR model. See Supplemental Material, Table I, for the chemical structures in SMILES notation, the bimolecular rate constants and references used in the development of the database. Descriptor calculations The chemical structures were drawn, three-dimensionally optimized and saved as .MOL files by using ChemSketch (Advanced Chemistry Developments, Inc., Toronto, Ontario, Canada). Three-dimensional (3D) optimization was based on the CHARMM force field parameterization (Brooks et al. 1983), and the stereo bonds on the 3D structure were ambiguous. Each .MOL file contained chemical information necessary to compute CODESSA topological, topochemical and geometric descriptors, while the output files (.OUT) carried the quantum chemical, electrostatic and thermodynamic descriptors computed in AMPAC. To generate the AMPAC.OUT files, each .MOL file was loaded into AMPAC’s graphic user interface (AGUI). The CODESSA output scheme (Job Type: Opt ? Fre, Minimize Energy, Using TRUST, IR Frequencies and Thermodynamic Properties; Method: Model

123

284

AM1, Wavefunction Restricted HF; Properties: Calculate bond orders, Calculate ESP charges, Generate output for CODESSA; Solvent: Solvation None; General: Default; Comment: default settings; Title: default settings) was used in the AMPAC calculation setup, and each .MOL file was submitted and saved as a .OUT file. The .OUT and .MOL files were then loaded into the CODESSA program to calculate the molecular descriptors. See Ivanciuc (1997) for a review of these descriptors and the software itself. The descriptors were then exported to MATLAB 2012A (The MathWorks, Inc., Natick, MA) where they were normalized and centered. Mean centering and unit variance (autoscaling) of the descriptor data were performed to facilitate model interpretation and to ensure that all the descriptors had the same chance to influence the regression. Without any scaling, parameters with a large numerical range may otherwise dominate other variables with small numerical ranges. Orthogonal projection to latent structures (O-PLS) The O-PLS algorithm used in developing this consensus QSAR model was written in MATLAB 2012A (The MathWorks, Inc., Natick, MA), and the algorithm was taken from Trygg and Wold (2002). Each O-PLS model was evaluated for its predictive ability, using the coefficient of determination (R2) metric and the root mean squared error (RMSE). R2 and RMSE were calculated as follows: _ Pn SSE ðy  y i Þ2 i ¼ 1  Pi¼1 R2 ¼ 1  ; n SST Þ2 i¼1 ðyi  y qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi _ RMSE ¼ ðyi  yi Þ2 ;

where SSE is the residual sum of squares, SST is the total _ sum of squares, y is the y mean across all samples, y i are the predicted yi values and yi is the observed y value for sample i (Consonni et al. 2009). A higher R2, approaching 1, indicates more predictive capability, while smaller values, approaching 0, indicate poor predictive capability.

Arch Toxicol (2013) 87:281–289

each descriptor was independently and randomly permuted to remove any correlation between it and the dependent descriptor. The actual O-PLS model loading was then compared with its null distribution of loadings, and if it was greater than or less than 0.5 % (a = 0.01) of the null distribution, it was found to contribute significantly to the model (Fig. 2). The permutation was initially repeated 100 times for each descriptor, and those falling within ±5 % of the border were selected for further evaluation. The permutation was then repeated with the identified borderline descriptors 300 times in order to maximize the probability that they would be placed in the appropriate bin. A number of those descriptors identified as significant were also found to be highly inter-correlated (R2 [ 0.85). Therefore, those descriptors were excluded from the model. Model validation Twenty percent of the database was randomly selected as an external test set. The external test set was then used for an evaluation of the final model’s performance against a set of compounds it has never seen. To select the internal test sets, a leave-random-many-out cross-validation procedure was performed 1,000 times by bootstrapping, randomly selecting with replacement, between 10 and 30 % of the internal database. The remaining training sets were used to train the O-PLS regression models. A tenfold cross-validation procedure was employed on each training set in order to identify the appropriate number of orthogonal components (latent vectors) in the O-PLS model that maximized each tenfold cross-validation test set R2. Each step then resulted in regressing the O-PLS model on the training set and predicting the activity of its internal and external test sets. In order to form consensus predictions, each QSAR model was filtered by selecting acceptable

Descriptor selection An approach to improving QSAR models is the removal or suppression of uninformative descriptors. Often, between 20 and 30 % of QSAR model descriptors have less information than noise (Eriksson et al. 2003); thus, removal can improve the accuracy and robustness of the model. Herein, each descriptor’s loadings, commonly referred to as coefficients, were compared with its null distribution in order to select for significance. The null distribution was determined by refitting the O-PLS model to data sets in which

123

Fig. 2 Example of a randomized probability distribution to test for model descriptor significance. Model coefficients whose p* values fall inside the randomized probability distribution are insignificant, and those falling outside the tail [p* [ or \ ±0.5 % (a = 0.01)] are significant descriptor loadings

Arch Toxicol (2013) 87:281–289 Table 1 Chemical property statistics of OP oxon and CWNA database

285

Molecular property

Max

Min

Number of atoms

74

17

47.77

12.23

Number of carbon atoms

25

3

14.73

4.26

Number of oxygen atoms

6

1

2.77

0.9

Number of nitrogen atoms

2

0

0.54

0.66

Number of sulfur atoms

2

0

1.21

0.65

Number of fluorine atoms

6

0

0.18

0.78

Number of chlorine atoms

2

0

0.14

0.4

Number of bonds

76

16

47.76

12.67

Number of single bonds

73

14

43.4

13.57

Number of double bonds

4

1

Number of triple bonds

Mean

1.46

SD

0.89

1

0

0.004

0.06

24

0

2.89

3.91

Number of rings

4

0

0.99

0.85

Number of benzene rings

4

0

0.48

0.65

Molecular weight (g/mole)

485.64

140.09

334.16

59.72

Number of aromatic bonds

models with an R2 [ 0.6 for training, internal test and external test sets. y-Randomization/permutation The experimental observations were randomly assigned to the set of chemical descriptors in an attempt to observe the action of chance in fitting an O-PLS model to the given data. A y-randomization procedure was repeated 1,000 times to provide a randomized R2 distribution. This distribution was compared with the non-permuted training, internal and external test set R2 distributions to validate the model’s significance. For more information on QSAR model validation, see Eriksson et al. (2003).

constants were collected from 15 peer-reviewed literature sources. A list of some chemical property statistics from the database can be located in Table 1. A chi-square goodnessof-fit test determined that the database was not log-transformed to approximate normality with the estimated mean and variance at the 5 % level of significance. However, the normal probability plot suggests that most of the database is normally distributed with exceptions found at the k1 extremes (Fig. 3). The log10-transformed data set had a mean and standard deviation of 5.20 ± 1.65 M-1 min-1, a range of 7.46 M-1 min-1 and an inter-quartile range of 2.36 M-1 min-1. The mode was found to be 4.08 M-1 min-1, the minimum was found to be 1.48 M-1 min-1, and the maximum was found to be 8.94 M-1 min-1 (Fig. 4).

Applicability domain The leverage that warns of potential extrapolation, called the warning leverage, is typically set at 3*p/n, where n is the number of observations used to fit the model and p is the number of parameters in the model. An AD plot is then commonly used to establish the AD inside a squared area within ±2.0 standard residual deviations (experimental - predicted activity) and the warning leverage (William’s Plot; Gramatica 2007).

Results Compound selection and data transformation A total of 278 unique chemical structures of measured pentavalent OP oxon and CWNA AChE bimolecular rate

Fig. 3 Normal probability plot for the OP bimolecular rate constant library. Black ‘‘o’’ symbols identify experimental probability, while the black line indicates the expected normal probability. Deviations from the black line indicate that the experimental data deviate from normality

123

286

Arch Toxicol (2013) 87:281–289

Fig. 4 OP library box plot. Second and third quartiles are found in the upper and lower boxes. First and fourth quartiles are found within the upper and lower dashed lines. No extreme values are identifiable

Descriptor calculation and selection AMPAC and CODESSA calculated a total of 675 molecular descriptors for each compound. These descriptors were then filtered in MATLAB 2012A and selected for inclusion when high significance and low inter-correlation was present. The O-PLS loading permutation procedure identified the cutoff between significant and insignificant model descriptors. A total of 232 descriptors were found to be significant, while 443 were found to be insignificant. After removing the least significant descriptors that were colinear, a total of 62 descriptors remained. The highest occupied molecular orbital (HOMO)–lowest unoccupied molecular orbital (LUMO) energy gap was found to be the most significant descriptor followed by the HOMO energy and the 1X GAMMA polarizability (DIP).

Fig. 5 Experimental versus predicted log10 AChE-OP bimolecular rate constants (M-1 min-1) using the consensus QSAR model against the external test set

and the warning leverage (0.85) (Fig. 6). Six external compounds were found to be outside the warning leverage that indicates potential model extrapolation. A number of compounds had residuals [2 or \-2, indicating potential outliers or activity cliffs.

Discussion Bimolecular rate constant database Database selection is an extremely important first step in QSAR model development. Those compounds and types of

Model validation Bootstrapping was used to estimate the distribution of training, internal test and external test set R2, as well as the y-randomization R2. Of the 1,000 bootstrap procedures performed, 521 were found to be acceptable (R2 [ 0.6 for training, internal and external test sets). From these, the training, internal test, external test and y-randomized test set R2 correlations were found to be, on average, 0.80, 0.76, 0.66 and -0.31, respectively. The training and external test set RMSE values were also found to be 0.76 and 0.88, respectively. See Fig. 5 for a plot of the external test set experimental versus consensus predicted results. The domain of applicability The AD was estimated using a William’s plot. This plot placed the AD within ±2.0 standard residual deviations

123

Fig. 6 Applicability domain for QSAR model. The warning leverage (hat diagonal cutoff) was set at 3 9 the number of descriptors/the number of compounds (0.85). The external database is represented by solid circles, and the internal database is represented by open circles. Six external compounds are shown to have high leverage

Arch Toxicol (2013) 87:281–289

measurements included in the final database drive the applicability domain and ultimately the usefulness of the model. It is well established that OP and CWNA bimolecular rate constants vary with temperature (Aldridge and Reiner 1972) and species (Wang and Murphy 1982). To eliminate this variability, a single most cited temperature of 25 C was chosen. Furthermore, since the purpose was to develop a human risk assessment, human bimolecular rate constants were selected over those of other species. While the ideal approach may be to use data from one laboratory to minimize inter-laboratory measurement variability, in this case data had to be acquired from multiple peerreviewed literature sources. Care was taken to find an appropriate applicability domain size while simultaneously retaining high-quality data that minimized uncertainties. The database was quite diverse, spanning 7 log units of activity and containing both OP pesticides and CWNAs in their oxon form. Furthermore, the molecular weights ranged from 140.09 to 485.64 g/mole, the maximum number of rings ranged from 0 to 4, and maximum alkyl chain length ranged from 0 to 9 (Table 1). Supplemental Material, Table I, contains the chemical structures in SMILES notation, the bimolecular rate constants and references used in the development of the database. Descriptor selection The QSAR descriptors were calculated from AMPAC and CODESSA that are publicly available from SemiChem, Inc. (Shawnee Mission, KS). A number of papers have already been published showing the efficacy of AMPAC and CODESSA in predicting PBPK-/PD-specific chemical parameters (Katritzky et al. 1998, 2005; Ruark et al. 2011). Here, these descriptors were sufficient to develop an externally validated consensus QSAR model for pentavalent OP oxon pesticides and CWNAs, using an O-PLS algorithm (Trygg and Wold 2002) coded in MATLAB 2012A (The MathWorks, Inc., Natick, MA). The HOMO–LUMO energy gap, the HOMO energy and the 1X GAMMA polarizability (DIP) were found to be statistically significant descriptors. However, it was not possible to interpret them in terms of toxicological significance or biological relevance because correlation does not imply causation. Latent vector regression models, such as the O-PLS regression used herein, project the descriptor space to a new space, reducing the dimensionality but making it difficult to interpret a potential biological mechanism. A model where a mechanistic interpretation is possible could improve overall performance and lead to a better understanding of the biological process. To fully implement a mechanism-based model would go beyond the scope of traditional QSAR analysis.

287

Model validation The results suggest that the O-PLS consensus QSAR model is predictive of the external test set, as shown by a high external R2 (0.66). This indicates that the model performs well on compounds that it has never before seen. However, there are cases in QSAR where mispredictions of activity arise even when there is high model predictivity (Maggiora 2006). This could be due to a rough structure–activity optimization surface or because the QSAR model is misused. These issues, for example, could be explored using a structure–activity landscape index (SALI; Guha and Van Drie 2008) and by paying careful attention to modern QSAR pitfalls and caveats. As long as the QSAR model is not misused, it can be a valuable tool to fill data gaps in the absence of experimental measurements. y-Randomization/permutation While the R2 and RMSE statistics can be used to quantify the model’s performance in the external test set, it is insufficient to statistically validate the model. Therefore, this is often supplemented with a y-randomization procedure, whereby the R2 is compared with those of models built for permuted or randomly shuffled responses (Rucker et al. 2007; Tropsha et al. 2003). In this test, the bimolecular rates were randomly shuffled, and a new QSAR model was developed using the original independent-variable matrix. If the QSAR model obtained in the y-randomization test had high R2 values, it implied that an acceptable QSAR model cannot be obtained for the given data set by the current modeling technique. However, in the present case, the result of the y-randomization indicated that the probability of a chance correlation is very low. Therefore, our QSAR model is identifying relationships with independent variables that are not by statistical chance. The domain of applicability The Organization for Economic Cooperation and Development (OECD) QSAR Validation Principles recommend that a model only be used within its applicability domain (AD; OECD 2007). This can be characterized as the physiochemical, structural or biological space on which the training set of the model has been developed (Jaworska et al. 2005). The AD can help identify the range of new compounds for which the QSAR can be used to make predictions. One way of defining the AD of a QSAR model is according to the leverage of a compound. A compound’s leverage measures its influence on the model and can be calculated for training sets, test sets and novel compounds. Therefore, they are useful in finding compounds that

123

288

influence model parameters of a training set or for checking the AD of the model when predicting novel compounds. The present consensus QSAR model predicted the bimolecular reaction rate, in units of M-1 min-1, between pentavalent OP oxon pesticides and CWNAs and that of the enzyme AChE. All of the compounds selected were in their active oxon form. The consensus QSAR model can only be valid within the domain of its training set and cannot be applied to extrapolation outside of this domain. The AD was quantified with the William’s plot providing assurance of the external test set predictions and to state whether the models assumptions were met. Any new predictions with this consensus QSAR model should only be performed once the proposed molecule is found to reside within the applicability domain.

Arch Toxicol (2013) 87:281–289

reduce experimental animal use. Ultimately, a future in silico QSAR-PBPK/PD toolkit containing QSAR models for bimolecular inhibition, aging, regeneration and thiooxon conversion rate constants will lead to improved animal and human population toxicity assessments for pentavalent OP or CWNA exposures. Acknowledgments This work was supported by the Defense Threat Reduction Agency—Joint Science and Technology Office, Basic and Supporting Sciences Division [2.G 806 08 AHB C]. Conflict of interest of interest.

The authors declare that they have no conflict

References Toxicological significance OP oxon human health risk assessments are difficult due to the necessary integration of toxicological, pathological, absorption, distribution, metabolism and elimination data into predictive PBPK/PD models. These models, as a whole, allow toxicological predictions to be made. However, the models require very specific types of experimental data to fully take advantage of their power, and confidence in model predictions is significantly reduced in cases where parameter values are missing. This work fulfilled one of these data gaps through QSAR prediction of OP-AChE bimolecular rate constants which are the initial driver toward AChE inhibition. Once the active site of AChE is inhibited, it can be further modulated by the OP oxon aging and regeneration processes. In order to adequately assess OP oxon toxicity with a PBPK/PD model, these additional rates would have to be subjected to further QSAR analysis, which is beyond the scope of this work.

Conclusions A consensus QSAR model for pentavalent OP oxons and CWNAs in their active form has been constructed to predict their inhibitory bimolecular rates against human AChE. The model demonstrates a high degree of correlation between the predicted and experimental data and has a high degree of predictive power as shown by a y-randomization test and the internal and external validation strategies. The model applicability domain was also recognized which identifies those compounds that are appropriate for the QSAR model. This QSAR model can be used to complement PBPK/PD modeling by predicting pentavalent OP oxon and CWNA AChE bimolecular rate constants for structures lacking experimental data. It also has the potential to alleviate experimental laboratory costs and

123

Agabekyan RS, Berkhamov MK, Godovikov NN, Kabachnik MI, Ol’khovaya GG (1974) Synthesis and anticholinesterase properties of O-ethyl-S-(Beta-alkylsulfoxyethyl)methylthiophosphonates. Russ Chem Bull 23:1288–1290 Agabekyan RS, Berkhamov MK, Kuzamyshev VM, Muzykantova VN, Bekanov MKh, Kabachnik MI (1977) Reaction of O-ethylS-(Beta-alkylmercaptoethyl)benzylthiophosphonates and their iodomethylates with cholinesterases. Russ Chem Bull 26:1730– 1734 Aldridge WN, Reiner E (1972) Enzyme inhibitors as substrates: interactions of Esterases with esters of organophosphorus and carbamic acids, vol 1. American Elsevier Publishing Co, New York, pp 1–328 Berkhamov MK, Zakharova LM, Grineva LG, Kuzamyshev VM, Ol’khovaya GG, Bekanov MK et al (1981) Synthesis and anticholinesterase activity of S-Beta-aryl-and S-Beta-Benzylmercaptoethyl esters of thioacids of phosphorus. Russ Chem Bull 30:658–663 Brestkin AP, Godovikov NN (1978) Combined inhibition of Cholinesterases by organophosphorus compounds. Russ Chem Rev 47:859–869 Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217 Consonni V, Ballabio D, Todeschini R (2009) Comments on the definition of the Q2 parameter for QSAR validation. J Chem Inf Model 49:1669–1678 Cronin MTD, Schultz TW (2003) Pitfalls in QSAR. J Mol Struct (Theochem) 622:39–51 Dudek AZ, Arodz T, Galvez J (2006) Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Comb Chem High T Scr 9:213–228 Ellman GL, Courtney KD, Andres V, Feather-Stone RM (1961) A new and rapid colorimetric determination of acetylcholinesterase activity. Biochem Pharmacol 7:88–95 Eriksson L, Jaworska J, Worth AP, Cronin MTD, McDowell RM, Gramatica P (2003) Methods for reliability and uncertainty assessment and for the applicability evaluations of classificationand regression-based QSARs. Environ Health Perspect 111: 1361–1375 Fukuto TR (1990) Mechanism of action of organophosphorus and carbamate insecticides. Environ Health Perspect 87:245–254 Golbraikh A, Tropsha A (2002) Beware of q2! J Mol Graph Model 20:269–276

Arch Toxicol (2013) 87:281–289 Gramatica P (2007) Principles of QSAR model validation: internal and external. QSAR Comb Sci 26:694–701 Grigoryan H, Schopfer LM, Peeples ES, Duysen EG, Grigoryan M, Thompson CM, Lockridge O (2009) Mass spectrometry identifies multiple organophosphorylated sites on tubulin. Toxicol Appl Pharm 240:149–158 Guha R, Van Drie JH (2008) Structure-activity landscape index: identifying and quantifying activity cliffs. J Chem Inf Model 48:646–658 Hewitt M, Cronin MTD, Madden JC, Rowe PH, Johnson C, Obi A, Enoch SJ (2007) Consensus QSAR models: do the benefits outweigh the complexity? J Chem Inf Model 47:1460–1468 Ivanciuc O (1997) CODESSA Version 2.13 for Windows. J Chem Inf Comput Sci 37:405–406 Jaworska J, Nikolova-Jeliazkova N, Aldenberg T (2005) QSAR applicability domain estimation by projection of the training set in descriptor space: a review. ATLA 33:445–459 Johnson SR (2008) The trouble with QSAR (or how I learned to stop worrying and embrace fallacy). J Chem Inf Model 48:25–26 Kabachnik MI, Brestkin AP, Godovikov NN, Michelson MJ, Rozengart EV, Rozengart VI (1970) Hydrophobic areas on the active surface of cholinesterases. Pharmacol Rev 22:355–388 Katritzky AR, Wang Y, Sild S, Tamm T, Karelson M (1998) QSPR studies on vapor pressure, aqueous solubility, and the prediction of water-air partition coefficients. J Chem Inf Comput Sci 38:720–725 Katritzky AR, Kuanar M, Fara DC, Karelson M, Acree WE, Solov’ev VP et al (2005) QSAR modeling of blood:air and tissue:air partition coefficients using theoretical descriptors. Bioorg Med Chem 13:6450–6463 Kuzamyshev VM, Berkhamov MK, Godovikov NN, Agabekyan RS, Kireeva EG, Pegova ZK et al (1977) Reaction of O-ethyl-S(ß-alkylmercaptoethl)-cyclohexylthiophosphonates and their iodomethylates with choline esterases. Russ Chem Bull 26:1471–1476 Lee AH, Metcalf RL (1973) In vitro inhibition of acetylcholinesterase by O, O-dimethyl-S-aryl phosphorothioates. Pesticide Biochem Physiol 2:408–417 Maggiora GM (2006) On outliers and activity cliffs-Why QSAR often disappoints. J Chem Inf Model 46:1535 Makhaeva GF, Filonenko IV, Yankovskaya VL, Fomicheva SB, Malygin VV (1998) Comparative studies of O, O-dialkyl-Ochloromethylchloroformimino phosphates: interaction with neuropathy target esterase and acetylcholinesterase. Neurotoxicology 19:623–628 Makhaeva GF, Aksinenko AY, Sokolov VB, Serebryakova OG, Richardson RJ (2009) Synthesis of organophosphates with fluorine-containing leaving groups as serine esterase inhibitors with potential for Alzheimer disease therapeutics. Bioorgan Med Chem Lett 19:5528–5530 Mastryukova TA, Agabekyan RS, Uryupin AB, Kabachnik MI (1978) Synthesis and anticholinesterase properties of some S-benzhydryl esters of phosphorus monothio acids. Russ Chem Bull 26:2153–2157 Munro NB, Ambrose KR, Watson AP (1994) Toxicity of the organophosphate chemical warfare agents GA, GB, and VX: implications for public protection. Environ Health Perspect 102:18–38

289 OECD Environment Health and Safety Publications. Series on Testing and Assessment No. 69 (2007) Guidance document on the validation of (Quantitative) structure-activity relationship [(Q)SAR] models. Organisation for Economic Co-Operation and Development, Paris Ol’khovaya GG, Agabekyan RS, Berkhamov MK, Godovikov NN, Kabachnik MI (1976a) Synthesis and reaction of O-ethyl S-alkyl phenylthiophosphonates with cholinesterase. Russ Chem Bull 24:1719–1721 Ol’khovaya GG, Agabekyan RS, Berkhamov MK, Godovikov NN, Kabachnik MI (1976b) Synthesis and reaction of O, O-dibutyl S-alkyl thiophosphates with cholinesterase. Russ Chem Bull 24:1722–1724 Plyamovatyi AK, Vandyukova II, Shagidullin RR, Makhaeva GF, Malygin VV, Gorbunov SM (1997) Study of the relationship between spacial structure and anticholinesterse activity of O-phosphorylate oximes. Pharm Chem J 31:199–204 Raevskii OA, Chistiakov VV, Agabekian RS, Sapegin AM, Zefirov NS (1990) Formation of models of the interaction between organophosphate compound structure and their ability to inhibit cholinesterase. Bioorg Khim 16:1509–1522 Raveh L, Grunwald J, Marcus D, Papier Y, Cohen E, Ashani Y (1993) Human butyrylcholinesterase as a general prophylactic antidote for nerve agent toxicity. In vitro and in vivo quantitative characterization. Biochem Pharmacol 45:2465–2474 Ruark CD, Hack CE, Robinson PJ, Gearhart JM (2011) Quantitative structure-activity relationships for organophosphates binding to trypsin and chymotrypsin. J Toxicol Environ Health 74:1–18 Rucker C, Rucker G, Meringer M (2007) Y-randomization and its variants in QSPR/QSAR. J Chem Inf Model 47:2345–2357 Sadykov AS, Dalimov DN, Godovikov NN (1983) Phosphorylated derivatives of alkaloids and nitrogen-containing heterocyclesCholinesterase inhibitors. Russian Chem Rev 52:918–930 Tropsha A (2010) Best practices for QSAR model development, validation, and exploitation. Mol Inf 29:476–488 Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb Sci 22:69–77 Trygg J, Wold S (2002) Orthogonal projections to latent structures (O-PLS). J Chemometr 16:119–128 Vale JA (2007) Nerve agents: why they are so toxic and can poisoning from these agents be treated? Toxicology 240:141– 142 Volkova RI, Brestkin AP, Kochetova LM (1982) Stereospecificity of the active site of acylcholinesterases. Biochemistry 47:669–675 Votano JR, Parham M, Hall LH, Kier LB, Oloff S, Tropsha A, Xie Q, Tong W (2004) Three new consensus QSAR models for the prediction of Ames genotoxicity. Mutagenesis 19:365–377 Wang C, Murphy SD (1982) Kinetic analysis of species difference in acetylcholinesterase sensitivity to organophosphate insecticides. Toxicol Appl Pharmacol 66:409–419 Yang RS, Thomas RS, Gustafson DL, Campain J, Benjamin SA, Verhaar HJ et al (1998) Approaches to developing alternative and predictive toxicology based on PBPK/PD and QSAR modeling. Environ Health Perspect 106:1385–1393

123