
Future Medicinal Chemistry

Research Article

Docking-based classification models for exploratory toxicology studies on high-quality estrogenic experimental data

Background: The ethical and practical limitations of animal testing have recently promoted computational methods for the fast screening of huge collections of chemicals. Results: The authors derived 24 reliable docking-based classification models able to predict the estrogenic potential of a large collection of chemicals provided by the US Environmental Protection Agency. Model performances were challenged by considering AUC, EF1% (EFmax = 7.1), -LR (at sensitivity = 0.75), +LR (at sensitivity = 0.25) and 37 reference compounds comprised within the training set. Moreover, external predictions were made successfully on ten representative known estrogenic chemicals and on a set consisting of >32,000 chemicals. Conclusion: The authors demonstrate that structure-based methods, widely applied to drug discovery programs, can be fairly adapted to exploratory toxicology studies.

In the last few years, computational toxicology has attracted growing interest as a way to meet the increasing demand for rapid and cost-effective screening of potentially hazardous chemicals [1–3]. Many important steps forward have been made to fill knowledge gaps when little or no experimental data are available [4,5]. In December 2006, the European Commission (EC) issued the Registration, Evaluation, Authorization and Restriction of Chemicals (REACH) regulation [6], which promotes the use of alternative in silico methods [3]. Recent legislation in this context is focused strongly on human health endpoints, as these represent a large and preeminent area of the hazard assessment of chemicals [7]. In this respect, predicting the endocrine disruptor potential of chemicals and, more specifically, their ability to interfere with the estrogen receptors (ERs) is of utmost relevance for the protection of human health and the environment [8]. Predictive models of the endocrine potential of congeneric molecular series of known EDCs already have been developed [9–12]. However, most of these models were derived using typical quantitative structure–activity relationship (QSAR) strategies requiring

10.4155/fmc.15.103 © 2015 Future Science Ltd

structural similarity to known EDCs. In this respect, such approaches are a double-edged sword, ensuring reliable predictions only for chemicals similar to those already known. The current availability of x-ray-solved target structures can be employed to tackle this drawback. Developed for and mainly used in drug discovery programs, molecular docking approaches can add critical information that enables rapid assessment of the endocrine disruption potential of structurally unrelated substances and elucidation of the molecular recognition patterns behind their actions [13]. Molecular docking is a well-known computational method mainly used to predict both molecular poses and rank scores for large numbers of chemicals. Molecular poses are driven by a preferred conformation adopted for binding, whereas the scoring represents a rough energetic estimation of the strength of the established interactions [14]. The increased power of computer systems is making molecular docking a viable option to analyze large numbers of chemicals. For instance, the virtual screening of large pharmaceutical chemical libraries has been accomplished successfully using molecular docking [15,16]. Recent approaches in drug discovery, such as those

Future Med. Chem. (Epub ahead of print)

Daniela Trisciuzzi1, Domenico Alberga2, Kamel Mansouri3, Richard Judson3, Saverio Cellamare1, Marco Catto1, Angelo Carotti1, Emilio Benfenati4, Ettore Novellino5, Giuseppe Felice Mangiatordi*,1 & Orazio Nicolotti**,1
1 Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari ‘Aldo Moro,’ Via E. Orabona, 4, Bari I-70126, Italy
2 Dipartimento Interateneo di Fisica ‘M. Merlin,’ Università degli Studi di Bari ‘Aldo Moro,’ INFN, Via E. Orabona, 4, Bari I-70126, Italy
3 National Center for Computational Toxicology, US Environmental Protection Agency, 109 T.W. Alexander Drive, Research Triangle Park, NC 27711, USA
4 IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, Via Privata Giuseppe La Masa, 19, 20156 Milano, Italy
5 Dipartimento di Farmacia - Università degli Studi di Napoli ‘Federico II’ Corso Umberto I, 40 – 80138 Napoli, Italy
*Author for correspondence: Tel.: +39 080 544 2551/2765; Fax: +39 080 5442230; [email protected]
**Author for correspondence: Tel.: +39 080 544 2551/2765; Fax: +39 080 5442230; [email protected]


ISSN 1756-8919

Research Article  Trisciuzzi, Alberga, Mansouri et al.

Key terms
Computational toxicology: A relevant research area whose ultimate mission is that of protecting human health and the environment from risks posed by chemicals.
Docking-based classification models: Predictive models encoding the wealth of physicochemical information contained in the native protein structures.
CERAPP: Collaborative Estrogen Receptor Activity Prediction Project supervised by the US Environmental Protection Agency.
Sensitivity (SE): Proportion of chemicals, experimentally proved to be binders, which are correctly classified.

based on scaffold hopping and drug repurposing, are derived mostly from the smart application of molecular docking. Despite successful use of molecular docking models in the pharmaceutical area, this technique has been used infrequently for regulatory purposes where screening large collections of chemicals for possible biological activity is high priority [3] . Inspired by the recent work of Kolšek et al.  [17] , the authors adapted molecular docking to the needs of computational toxicology by deriving ad hoc docking-based classification models to discern potential estrogenic from nonestrogenic activity. The objective was to illustrate how experts should evaluate docking performances for toxicological and regulatory purposes, with the goal of minimizing the number of FNs (i.e., estrogenic substances incorrectly classified as nonestrogenic). Perturbation of ER signaling may lead to altered sexual function, fertility and adverse developmental effects [18–20] . An implicit advantage of using docking-based models is that physicochemical information on the binding pocket of the biological target can be employed to enlarge the applicability domain beyond the narrower boundaries of classical QSAR-like models [21–23] . Importantly, this paper describes a set of models developed within the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP) supervised by the US Environmental Protection Agency (US EPA) [Mansouri et al. CERAPP: collaborative estrogen receptor activity prediction (2015), Submitted] ; [Judson et al. Integrated model of chemical perturbations of a biological pathway using 18 in vitro high-throughput screening assays for the estrogen receptor (2015), Submitted] in collaboration with 17 groups

in the USA and Europe. CERAPP is a large-scale modeling initiative aimed at demonstrating the use of predictive computational models in high-throughput screening (HTS) studies. In this respect, CERAPP aimed to derive a consensus through all the models submitted by the participants. The models were developed using a curated collection [24] of high-quality ER signaling data for 1677 chemicals from the US


EPA’s ToxCast program as a training set. The authors evaluated and compared the performances of docking-based classification models derived using grid-based ligand docking with energetics (GLIDE) [25] and genetic optimization for ligand docking (GOLD) [26]. Particular attention was paid to describing the appropriate use of statistical parameters and the precautions needed to assess the goodness of docking-based classifications for exploratory toxicology studies [27].

Materials & methods

Training set

The 3D training dataset (hereafter referred to as EPA-ERDB) consists of 1677 chemical structures shared by US EPA. The list with the IUPAC InChI (International Chemical Identifier) codes and the binding class of each compound of the training set is enclosed as Supplementary Material. In addition, the profile relative to a number of relevant physicochemical properties (i.e., molecular weight, log P, hydrogen bond acceptor atoms, hydrogen bond donor atoms, number of rotatable bonds and number of rings) is shown in Supplementary Figure 1. For each chemical, the estrogenic/nonestrogenic binding was quantified based on concentration–response data from 18 high-throughput assays exploring multiple sites in the mammalian ER pathway. These assays were a combination of biochemical and cell-based in vitro assays and probe perturbations of the ER pathway at multiple sites: receptor binding, receptor dimerization, chromatin binding of the mature transcription factor, gene transcription and changes in ER-induced cell growth kinetics [Judson et al. Integrated model of chemical perturbations of a biological pathway using 18 in vitro high-throughput screening assays for the estrogen receptor (2015), Submitted].

A score of 0.01, roughly analogous to an IC50 of 100 μM, was selected to discriminate the 237 estrogenic chemicals (14.13%) from the 1440 nonestrogenic chemicals (85.87%) having very low or no ER bioactivity (i.e., a score ≤0.01). In this respect, if a chemical was an active agonist or antagonist, it was also considered an active binder; conversely, if a chemical was neither an agonist nor an antagonist, it was considered a nonbinder. The authors aimed to derive reliable classification models able to categorize estrogenic and nonestrogenic substances irrespective of their agonist/antagonist action.

Molecular docking

Eight ER crystal structures were retrieved from the Protein Data Bank (PDB) for docking simulations. All four possible ER classes were considered: ER-α bound to agonist, ER-α bound to antagonist, ER-β bound to agonist and ER-β bound to antagonist. For each of

future science group


these, two crystal structures were retrieved from PDB: one previously considered by Zhang et al. [28] for the identification of new EDs based on docking simulations, and the other recently selected by Kolšek et al. [17] based on its capability to provide highly predictive docking-based classification models. The choice of eight different x-ray solved crystals (comprising ER-α and ER-β complexes for agonists and antagonists) was made to cover a broader spectrum of possible biological actions of the compounds comprised within the EPA training dataset. Indeed, it is acknowledged that the distinct biological activities (ER-α agonist, ER-α antagonist, ER-β agonist, ER-β antagonist) rely on the different receptor conformation assumed upon ligand binding, the latter being crucial for the interactions with co-activators [29]. Building on these criteria, the following structures were considered as targets for docking studies:
• Complex of ER-α and agonist (R,R)-5,11-cis-diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol (PDB ID: 1L2I, resolution 1.95 Å) [30];
• Complex of ER-α and agonist estradiol (PDB ID: 1A52, resolution 2.80 Å) [31];
• Complex of ER-α and antagonist GW368 (PDB ID: 3DT3, resolution 2.40 Å) [32];
• Complex of ER-α and antagonist (2S,3R)-2-(4-(2-(piperidin-1-yl)-ethoxy)phenyl)-2,3-dihydro-3-(4-hydroxyphenyl)-benzo[b][1,4]oxathiin-6-ol (PDB ID: 1SJ0, resolution 1.90 Å) [33];
• Complex of ER-β and agonist estradiol (PDB ID: 3OLS, resolution 2.20 Å) [34];
• Complex of ER-β and agonist 4-(4-hydroxyphenyl)-1-naphthaldehyde oxime (PDB ID: 2NV7, resolution 2.10 Å) [35];
• Complex of ER-β and antagonist raloxifene (PDB ID: 1QKN, resolution 2.25 Å) [36];
• Complex of ER-β and antagonist (R,R)-5,11-cis-diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol (PDB ID: 1L2J, resolution 2.95 Å) [30].
For the sake of completeness, the chemical structures of the x-ray solved cognate ligands are reported in Supplementary Table 1.
The 3D conformations of the 1677 chemicals in the training dataset were subjected to LigPrep (Schrodinger Suite) [37] to properly generate all the tautomers and ionization states at a pH value of 7.0 ± 2.0. Docking studies were performed on the 2019 structures obtained for all eight ER crystal structures. Protein structures were prepared using the Protein


Preparation Wizard [38] available from Schrodinger Suite v2014–4. The obtained files were used for docking simulations performed by both GLIDE v.6.5 [25], which is part of the Schrodinger Suite, and GOLD v.5.2 [26]. During the docking process, the receptor protein was held fixed, whereas full conformational flexibility was allowed for the ligands. For GLIDE simulations, the default force field OPLS_2005 [39], all the default settings for standard precision and a cubic grid having an edge of 16 Å centered on the center of mass of the cognate ligands were used. For GOLD simulations, a spherical grid having a radius of 8 Å centered on the center of mass of the cognate ligands was used; the configuration file template for Nuclear Hormone Receptors, available in GOLD, was used, which implements ChemScore as fitness function [40]. The reliability of our simulation protocols was preliminarily challenged by computing the root mean square deviation (RMSD) values (shown in Supplementary Table 2). In this respect, for each selected x-ray complex, the Cartesian coordinates of corresponding heavy atoms in the docked pose and in the x-ray structure were compared.

Performance evaluation

The ability of the selected docking protocols to discern binders from nonbinders was assessed using a statistical confusion matrix (see Supplementary Table 3), which includes information about experimental and predicted matches and mismatches returned for each classification system. In particular, the confusion matrix accounts for the number of true positives (TPs), the experimental positive cases that were correctly identified; true negatives (TNs), the experimental negative cases that were classified correctly; false positives (FPs), the experimental negative cases that were incorrectly classified as positive; and false negatives (FNs), the experimental positive cases that were incorrectly classified as negative. Minimizing the occurrence of FNs is important for regulatory purposes, where the prediction of dangerous substances as safe should be avoided. It is thus possible to define the following parameters:

$$\mathrm{SE} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$

and

$$\mathrm{SP} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$$
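As a hedged illustration of the two definitions above, the following minimal Python sketch computes SE and SP from confusion-matrix counts. The class name `ConfusionMatrix` and the example counts are hypothetical: the counts were chosen only so that the totals match the 237 binders and 1440 nonbinders of the training set, and they are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of experimental versus predicted classes for one classifier."""

    tp: int  # binders predicted as binders
    fn: int  # binders predicted as nonbinders
    tn: int  # nonbinders predicted as nonbinders
    fp: int  # nonbinders predicted as binders

    @property
    def sensitivity(self) -> float:
        # SE = TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # SP = TN / (TN + FP)
        return self.tn / (self.tn + self.fp)


# Hypothetical counts (assumption): 59 + 178 = 237 binders, 1400 + 40 = 1440 nonbinders.
cm = ConfusionMatrix(tp=59, fn=178, tn=1400, fp=40)
print(f"SE = {cm.sensitivity:.3f}, SP = {cm.specificity:.3f}")
```

Keeping the four counts together in one immutable record makes it harder to mix up FN and FP, which matters here because the two errors have very different regulatory consequences.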

Sensitivity (SE) ranges from 0–1 and estimates the proportion of chemicals, experimentally proved to be binders that are identified correctly. Once a given

www.future-science.com


Key terms
Estrogenic potential: Ability of chemicals to interfere with the estrogen receptors.
Chemical prioritization: Designating compounds for additional experimental testing.

SE threshold is chosen by the user, data exceeding that value are considered as TPs if properly classified, otherwise as FPs. An SE threshold = 0.25 can be set to capture early occurrence of positive cases [17]. Specificity (SP) measures the proportion of chemicals experimentally proved not to be binders that are identified correctly (under a given SP threshold). Also, SP ranges from 0–1, but here, the lower the threshold, the smaller the SP value. SE represents the TP rate, whereas the term 1-SP represents the FP rate. The 1-SP and SE values are the coordinates used to graph the receiver operating characteristic (ROC) curve and, thus, to calculate the corresponding AUC. The latter normally is used to provide, at first glance, an idea of the classification performance. Indeed, the computed curve is placed between two straight lines, one passing through the points (0,0) and (1,1), and the other through the points (0,1) and (1,1). The former indicates a random (AUC = 0.5) classifier, and the latter indicates an ideal (AUC = 1) classifier. In other words, the AUC value stands for the probability that a classification model will rank a randomly chosen positive instance (a binder chemical) higher than a randomly chosen negative one (a nonbinder chemical), thus providing an evaluation of the classifier quality independent from a particular threshold. Next, docking performance usually is evaluated using the enrichment factor (EF). It refers to the percentage of known binders found at a given percentage of the ranked database and can be computed with the following equation:

$$\mathrm{EF} = \frac{H_{SCR}}{H_{TOT}} \times \frac{D_{TOT}}{D_{SCR}}$$

where $H_{SCR}$ is the number of binders recovered at a specific percentage level of the ranked binders/nonbinders database, $H_{TOT}$ is the total number of binders for a given target, $D_{SCR}$ is the number of chemicals screened at a specific percentage level of the database and $D_{TOT}$ is the total number of chemicals in the database. Notably, we computed the EF at the early 1% of the ranked dataset (i.e., EF1%). The ideal value of the EF (EFmax) is obtained by dividing the total number of chemicals by the total number of binders. In the case of the EPA-ERDB, the theoretical value of EFmax = 7.1. Predictive docking-based classification models are expected to return EF1% and EFmax values as close as possible. Although the EF1% threshold is by far the most important [17], for the sake of completeness we also reported the percentage of binders at 20%, which represents a well-known threshold to check late-stage database screening [41]. As shown in Supplementary Table 4, such values are in agreement with the average value of 52.9% of actives at the top 20% of the database screened in the case of the benchmark data, described in the milestone paper of Shoichet and Irwin [42], relative to 40 properly selected pharmaceutical targets. The thresholds for defining the classes can be set on the basis of the desired SE values. In the present study, two SE values equal to 0.25 and to 0.75 were set as thresholds to define, for each ER crystal, three probability binding classes as follows:
• SE ≤0.25, the class with high probability of binding (i.e., binder molecules).
• SE >0.75, the class with low probability of binding (i.e., nonbinder molecules).
• 0.25 < SE ≤0.75, the class with medium probability of binding (i.e., suspicious molecules).
Such classes are expected to designate the estrogenic potential or nonestrogenic potential of a given substance, while further considerations about a likely agonist/antagonist action are beyond the scope of the present investigation. The prediction classes were generated automatically by using an in-house-developed Python script. Similarly, the energetic docking scores better approximating the SE-based thresholds equal to 0.25 and 0.75 were calculated. Once the SE-based thresholds have been established, the goodness of the classification can be measured using the following parameters:

$$\mathrm{NPV} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FN}}$$

and

$$\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$


At a given threshold, PPV is related to the probability that a chemical predicted as a binder (over-threshold) is actually a binder, whereas NPV is related to the probability that a chemical predicted as a nonbinder (under-threshold) is actually a nonbinder. In our studies, NPV and PPV values were computed for two SE thresholds, namely SE equal to 0.25 and to 0.75. It is worth noting that the EPA-ERDB training set included a relatively small number of chemicals (237, 14.13%) experimentally designated as binders. Thus, the strong asymmetry of the data prompted us to compute the positive (+LR) and the negative likelihood ratio (-LR) for each of the considered SE thresholds. These parameters can be calculated as follows:

$$+\mathrm{LR} = \frac{\mathrm{SE}}{1 - \mathrm{SP}}$$

and

$$-\mathrm{LR} = \frac{1 - \mathrm{SE}}{\mathrm{SP}}$$

The greater the +LR is at a given threshold, the better the performance of the classification model. For example, a +LR value = 6 would indicate a sixfold increase in the probability of a chemical being an ER binder with respect to the initial condition; similarly, a -LR = 0.6 indicates that, for an under-threshold chemical, the probability of its being a binder is equal to 6/10 of that at the initial condition. In other words, the larger the +LR value (or the lower the -LR), the more informative is the classification model. Note that likelihood ratios are independent from the data distribution within the training set.

Results & discussion

Model evaluation

All the results herein discussed were obtained by subjecting to molecular docking a 3D training dataset consisting of 1677 chemicals provided by US EPA. The uneven distribution of the biological scores of the 1677 chemicals, with 237 (14.13%) chemicals having ER binding potential and 1440 (85.87%) nonbinding chemicals, is shown in Figure 1.

Figure 1. Histogram showing the uneven distribution of chemicals’ biological score in the training set database. Binders (237 chemicals with biological score >0.01) and nonbinders (1440 chemicals with biological score ≤0.01) are depicted by gray and red bars, respectively. [Plot not reproduced; x-axis: biological score; y-axis: number of molecules.]

Two popular software packages for drug discovery, namely GLIDE v.6.5 [25] and GOLD v.5.2 [26], were used. Note that, unless differently specified, binders are assumed to be those chemicals that bind to the ER irrespective of their agonist and/or antagonist activity. The entire EPA-ERDB was docked into the binding site of eight crystal structures (four ER-α and four ER-β). Using the GLIDE and GOLD software, 16 different docking-based models were obtained. In addition, eight consensus models also were derived by sorting all the docked EPA-ERDB using a top-rank approach. As a whole, we derived 24 docking-based models whose goodness was first evaluated by calculating AUC and EF1%. For ease of representation, all the graphs are shown in Figure 2, while details about EF1% values and AUC are reported in Table 1. Interestingly, the AUC values range from 0.63 (crystal 1QKN, software GLIDE) to 0.72 (crystal

3OLS, consensus model). In this respect, it should be noted that GOLD returns AUC values slightly higher than those of GLIDE. Such a trend is confirmed for all the considered ER crystal structures. As expected, the use of a consensus strategy returns higher AUC values, although the gain is not always appreciable. An opposite trend is observed if the EF1% values are considered. In all cases, GLIDE performs better than GOLD. Notably, the former achieves values ranging from 4.6 (1QKN) to 6.2 (1L2I), whereas the latter achieves values ranging from 2.5 (1QKN) to 5.4 (3OLS). In this respect, the proximity of EF1% to EFmax is an indirect proof of the predictive power of the obtained classification models. Unequivocally, GLIDE detects a higher number of binders in the earliest fraction of the rank despite the lower AUC values. The consensus strategy is ineffective at improving EF1%, irrespective of the considered crystals. For each ER crystal, the values of docking scores approximating the SE thresholds of 0.25 and 0.75 were thus calculated and reported in Table 2. Considering each ER crystal, chemicals with docking scores higher than that best approximating an SE of 0.25 are, according to the model, binders; chemicals with docking scores lower than that best approximating an SE of 0.75 represent nonbinders; the class comprising all other chemicals designates the substances that are difficult to predict and could be the object of chemical prioritization for additional experimental testing. Note that, for the sake of clarity, the threshold referring to the docking score best approximating SE = 0.25 and that referring to the docking score best approximating SE = 0.75 will be referred


[Figure 2 plots not reproduced: one ROC panel per crystal structure (1L2I, 1A52, 3DT3, 1SJ0, 3OLS, 2NV7, 1QKN and 1L2J), each showing GLIDE, GOLD and consensus curves; x-axis: 1 - specificity; y-axis: sensitivity.]

Figure 2. ROC curves. ROC curves derived from ER-α (PDB entries: 1L2I, 1A52, 3DT3 and 1SJ0) and ER-β structures (PDB entries: 3OLS, 2NV7, 1QKN and 1L2J) are shown on the left and right hand side, respectively. For the sake of clarity, additional details about AUC and EF1% are reported in Table 1. EF: Enrichment factor; ER: Estrogen receptor; GLIDE: Grid-based ligand docking with energetics; GOLD: Genetic optimization for ligand docking; PDB: Protein Data Bank; ROC: Receiver operating characteristic. 
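The probabilistic reading of the AUC given in the text (the chance that a randomly chosen binder is ranked above a randomly chosen nonbinder) can be sketched directly as a pairwise comparison. This is only an illustrative sketch: the function name `roc_auc` and the toy data are hypothetical, and the code assumes that a higher score means "more likely a binder", so docking scores for which more negative values are more favorable would have to be negated first.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (binder, nonbinder) pairs ranked correctly.

    Assumes higher score = more likely binder; ties count as half a win.
    """
    binders = [s for s, y in zip(scores, labels) if y == 1]
    nonbinders = [s for s, y in zip(scores, labels) if y == 0]
    if not binders or not nonbinders:
        raise ValueError("both classes must be present")
    wins = 0.0
    for b in binders:
        for n in nonbinders:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binders) * len(nonbinders))


# Toy data (assumption): six chemicals, three binders (1) and three nonbinders (0).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 7 of the 9 pairs are ranked correctly
```

The quadratic pairwise loop is chosen for clarity rather than speed; for a dataset of 237 binders and 1440 nonbinders it still involves only about 341,000 comparisons.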


Table 1. EF1% and AUC values relative to the Protein Data Bank entries 1L2I, 1A52, 3DT3, 1SJ0, 3OLS, 2NV7, 1QKN and 1L2J obtained using grid-based ligand docking with energetics (GLIDE), genetic optimization for ligand docking (GOLD) and consensus models.

PDB code   GLIDE           GOLD            Consensus
           EF1%    AUC     EF1%    AUC     EF1%    AUC
1L2I       6.2     0.69    4.1     0.71    5.2     0.72
1A52       6.0     0.68    5.0     0.70    6.0     0.71
3DT3       6.0     0.68    3.7     0.69    5.2     0.70
1SJ0       6.1     0.65    3.7     0.68    3.9     0.68
3OLS       5.8     0.69    5.4     0.70    5.8     0.72
2NV7       5.6     0.67    5.0     0.69    5.4     0.71
1QKN       4.6     0.63    2.5     0.65    3.5     0.67
1L2J       4.7     0.67    2.9     0.69    4.6     0.69

AUC: Area under curve; EF: Enrichment factor; GLIDE: Grid-based ligand docking with energetics; GOLD: Genetic optimization for ligand docking; PDB: Protein Data Bank.
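To make the EF1% and EFmax figures concrete, the enrichment factor defined in the Methods can be sketched in a few lines of Python. The function names and the ranked list below are hypothetical; in particular, rounding 1% of 1677 chemicals to 17 is an assumption, since the text does not state how the 1% cutoff was rounded.

```python
from typing import Sequence


def enrichment_factor(ranked_labels: Sequence[int], fraction: float) -> float:
    """EF = (H_SCR / H_TOT) * (D_TOT / D_SCR) for the top `fraction` of a ranked list.

    `ranked_labels` holds 1 (binder) or 0 (nonbinder), best-scored chemical first.
    """
    d_tot = len(ranked_labels)
    h_tot = sum(ranked_labels)
    d_scr = max(1, round(d_tot * fraction))  # assumption: nearest-integer cutoff
    h_scr = sum(ranked_labels[:d_scr])
    return (h_scr / h_tot) * (d_tot / d_scr)


def ef_max(d_tot: int, h_tot: int) -> float:
    """Ideal EF: total number of chemicals divided by total number of binders."""
    return d_tot / h_tot


# Dataset sizes from the text: 1677 chemicals, 237 of them binders.
ideal = ef_max(1677, 237)  # about 7.08, reported as 7.1 in the text

# Hypothetical ranking (assumption): 15 binders among the top 17 chemicals.
ranking = [1] * 15 + [0] * 2 + [1] * 222 + [0] * 1438
ef_1pct = enrichment_factor(ranking, 0.01)
```

With this definition, a ranking whose whole top 1% consists of binders returns exactly EFmax, which is why the closeness of EF1% to 7.1 is read in the text as a sign of good early recognition.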

to as threshold SE = 0.25 and threshold SE = 0.75, respectively, from now on. As far as the first threshold (SE = 0.25) is concerned, PPV ranges from 25.8 (1QKN, GOLD) to 63.9 (3OLS, consensus). For all ER crystal structures, the ability to minimize FPs is higher with GLIDE than with GOLD, in agreement with the already discussed EF1% values. The obtained results are shown in Table 3. Importantly, an opposite trend can be detected if the second threshold (SE = 0.75) is considered: GOLD returns PPV values higher than GLIDE. In other words, GLIDE ensures better performance in terms of ability to minimize FPs only if we limit our interest to the upper part of the ranking. Note that the use of a consensus strategy causes a PPV improvement only in a few cases. NPV values range from 87.7 (1QKN, GOLD) to 88.7 (3DT3 and 2NV7, GLIDE) for the first threshold (SE = 0.25) and from 89.7 (1SJ0, GLIDE) to 94.2 (1L2I, GOLD) for the second

threshold (SE = 0.75). Note that the closeness of NPV values (at least, compared with the corresponding PPV values characterized by larger spreads) does not necessarily mean that the performances of the different models are comparable. Instead, this apparent discrepancy between computed PPV and NPV might be because of the uneven data distribution. Because of the small proportion of binders in the training set, NPV values may be similar even when comparing models with very different performances. On the contrary, models having very similar performances may result in significantly different PPV values. To overcome this intrinsic limitation of the unbalanced dataset, positive (+LR) and negative (-LR) likelihood ratios were also computed for each threshold. The obtained results are shown in Table 4. In this respect, the -LR values, unlike NPV, unveil that the differences among models are not negligible in terms of their ability to place nonbinders in the right class. Regarding

Table 2. Docking scores related to the applied SE-based thresholds (docking scores of GOLD and GLIDE are expressed as kcal/mol and kJ/mol, respectively).

PDB code   SE = 0.25            SE = 0.75
           GLIDE     GOLD       GLIDE     GOLD
1L2I       -7.52     -30.42     -5.64     -22.25
1A52       -7.75     -30.44     -5.33     -22.15
3DT3       -8.34     -31.68     -5.74     -22.92
1SJ0       -7.59     -30.65     -4.62     -22.76
3OLS       -7.68     -30.97     -5.55     -21.40
2NV7       -7.69     -31.18     -5.70     -21.46
1QKN       -7.53     -29.91     -5.37     -22.08
1L2J       -7.71     -30.23     -5.89     -22.35

GLIDE: Grid-based ligand docking with energetics; GOLD: Genetic optimization for ligand docking; PDB: Protein Data Bank; SE: Sensitivity.
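The way the cutoffs in Table 2 are derived and then used to assign the three binding classes can be sketched as follows. This is not the authors' in-house script: the function names and the toy training data are hypothetical, and the code assumes that more negative docking scores are more favorable, a reading suggested by Table 2, where the SE = 0.25 cutoffs are more negative than the SE = 0.75 ones. Only the two GLIDE cutoffs for 1L2I (-7.52 and -5.64) are taken from the table.

```python
from typing import Sequence


def score_at_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target_se: float
) -> float:
    """Return the score cutoff whose sensitivity is closest to `target_se`.

    A chemical counts as over-threshold when its score is <= cutoff
    (assumption: more negative = more favorable).
    """
    n_binders = sum(labels)
    best_cutoff, best_gap = None, float("inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= cutoff)
        gap = abs(tp / n_binders - target_se)
        if gap < best_gap:  # keep the most stringent cutoff on ties
            best_cutoff, best_gap = cutoff, gap
    return best_cutoff


def binding_class(score: float, cutoff_se025: float, cutoff_se075: float) -> str:
    """Map a docking score onto the three probability classes of the text."""
    if score <= cutoff_se025:
        return "binder"  # SE <= 0.25
    if score > cutoff_se075:
        return "nonbinder"  # SE > 0.75
    return "suspicious"  # 0.25 < SE <= 0.75


# Toy training data (assumption): eight scores, four binders.
toy_scores = [-9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_cut_025 = score_at_sensitivity(toy_scores, toy_labels, 0.25)
toy_cut_075 = score_at_sensitivity(toy_scores, toy_labels, 0.75)

# GLIDE cutoffs for 1L2I from Table 2; the three query scores are hypothetical.
classes = [binding_class(s, -7.52, -5.64) for s in (-8.10, -6.30, -4.00)]
```

A user with a docking score for a new chemical would only need the second function together with the pair of cutoffs from Table 2 for the chosen crystal and software.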


Table 3. Negative predictive values and positive predictive values computed for two thresholds (SE = 0.25 and SE = 0.75).

PDB code   SE = 0.25                                     SE = 0.75
           GLIDE         GOLD          Consensus         GLIDE         GOLD          Consensus
           PPV    NPV    PPV    NPV    PPV    NPV        PPV    NPV    PPV    NPV    PPV    NPV
1L2I       57.3   88.1   43.5   88.4   56.3   88.0       22.7   92.7   27.4   94.2   27.9   93.8
1A52       54.6   88.4   41.4   88.4   52.2   88.4       18.5   91.3   24.9   93.8   25.2   93.7
3DT3       62.5   88.7   34.7   88.2   60.0   88.6       19.0   91.9   23.6   93.6   23.1   93.4
1SJ0       46.2   88.5   28.3   87.8   40.4   88.3       16.2   89.7   24.5   93.8   21.6   93.0
3OLS       58.1   88.3   47.6   88.5   63.9   88.3       20.5   92.2   24.2   93.7   27.4   93.9
2NV7       49.0   88.7   41.1   88.4   47.5   88.5       18.7   92.1   21.9   93.1   23.7   93.7
1QKN       35.3   88.3   25.8   87.7   33.3   88.1       17.2   90.9   22.0   93.2   22.4   93.3
1L2J       40.7   88.1   33.3   88.1   37.5   87.9       21.3   92.7   24.1   93.7   24.7   93.5

GLIDE: Grid-based ligand docking with energetics; GOLD: Genetic optimization for ligand docking; NPV: Negative predictive values; PDB: Protein Data Bank; PPV: Positive predictive values; SE: Sensitivity.
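The observation that NPV values barely move while PPV values spread widely follows from the class imbalance of the training set, and a small numerical sketch may help. The function below derives PPV and NPV from SE, SP and the class sizes; the function name and the two SP values (0.97 and 0.90) are hypothetical and serve only to compare two imaginary models at the same SE = 0.25.

```python
def predictive_values(
    se: float, sp: float, n_binders: int = 237, n_nonbinders: int = 1440
) -> tuple[float, float]:
    """Return (PPV, NPV) implied by SE, SP and the class sizes.

    Default class sizes are those of the EPA-ERDB training set.
    """
    tp = se * n_binders
    fn = n_binders - tp
    tn = sp * n_nonbinders
    fp = n_nonbinders - tn
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Two hypothetical models (assumption) at the same sensitivity threshold.
ppv_a, npv_a = predictive_values(se=0.25, sp=0.97)
ppv_b, npv_b = predictive_values(se=0.25, sp=0.90)
print(f"model A: PPV = {ppv_a:.3f}, NPV = {npv_a:.3f}")
print(f"model B: PPV = {ppv_b:.3f}, NPV = {npv_b:.3f}")
```

In this toy comparison the two PPVs differ by more than 25 percentage points, whereas the two NPVs differ by less than one point, because the large pool of nonbinders dominates the NPV denominator.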

the threshold SE = 0.75, for instance, substantial differences can be found between GLIDE and GOLD docking-based models. An example is represented by -LR values computed using 1A52 as crystal structure, ranging from 0.56 (GLIDE) to 0.39 (GOLD). Such a trend is always confirmed for all the crystals. Again, in most cases, the use of the consensus strategy does not lead to any improvement in the performance of the model. This is true considering both -LR and +LR values. The results presented provide evidence that evaluation of classification models strongly depends on the purpose one wishes to pursue in the screening procedure. For high confidence in identification of new compounds with ER activity (few FPs), +LR values computed at the first threshold (SE = 0.25) are of utmost importance. For first tier toxicological

screening, where higher rates of FPs can be tolerated (few FNs), the -LR values at SE = 0.75 can be considered. In the former case, GLIDE may be more useful, and, in the latter case, GOLD may be more appropriate. In all cases, the use of a consensus strategy was ineffective. Importantly, a disagreement between +LR and -LR also can be found as a function of the different crystals considered. This is clearly shown in Figure 3, where all the crystals have been ranked according to their performance on the two considered trials, +LR (SE = 0.25) and -LR (SE = 0.75). These results revealed that docking-based models performing well in predicting putative new binders do not necessarily perform as well in classifying nonbinders. This is true considering both GLIDE- and GOLD-based models. Several important conclusions can be drawn from the current research

Table 4. Positive and negative likelihood ratios computed for two thresholds (SE = 0.25 and SE = 0.75).

          SE = 0.25                                 SE = 0.75
          GLIDE         GOLD          Consensus     GLIDE         GOLD          Consensus
PDB code  +LR    -LR    +LR    -LR    +LR    -LR    +LR    -LR    +LR    -LR    +LR    -LR
1L2I      7.69   0.77   4.56   0.79   7.56   0.77   1.66   0.44   2.28   0.36   1.99   0.39
1A52      7.44   0.77   4.44   0.79   7.14   0.77   1.37   0.56   1.99   0.39   2.04   0.39
3DT3      9.84   0.76   3.23   0.81   6.88   0.77   1.42   0.52   1.86   0.41   1.75   0.43
1SJ0      5.15   0.78   2.34   0.83   4.49   0.78   1.19   0.67   1.96   0.40   1.86   0.41
3OLS      8.11   0.77   5.92   0.79   9.46   0.76   1.49   0.50   1.92   0.40   2.54   0.35
2NV7      5.99   0.78   4.13   0.79   5.46   0.78   1.39   0.53   1.68   0.44   1.87   0.41
1QKN      3.25   0.81   2.05   0.85   3.01   0.81   1.26   0.61   1.70   0.44   1.73   0.43
1L2J      4.05   0.79   3.01   0.81   3.45   0.80   1.63   0.46   1.91   0.40   1.92   0.40

GLIDE: Grid-based ligand docking with energetics; GOLD: Genetic optimization for ligand docking; +LR: Positive likelihood ratios; -LR: Negative likelihood ratios; PDB: Protein Data Bank; SE: Sensitivity.
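The likelihood ratios in Table 4 follow the standard definitions +LR = SE / (1 - SP) and -LR = (1 - SE) / SP, where SP is the specificity. The short Python sketch below shows how such values could be computed from a confusion matrix and how a likelihood ratio updates a prior probability of activity. The function names, the example counts and the prior are illustrative assumptions and are not taken from the paper.

```python
def likelihood_ratios(tp, fn, fp, tn):
    """Return (SE, SP, +LR, -LR) from confusion-matrix counts.

    Assumes at least one false positive and one true negative,
    so that neither ratio divides by zero.
    """
    se = tp / (tp + fn)          # sensitivity: fraction of binders recovered
    sp = tn / (tn + fp)          # specificity: fraction of decoys rejected
    plr = se / (1.0 - sp)        # positive likelihood ratio
    nlr = (1.0 - se) / sp        # negative likelihood ratio
    return se, sp, plr, nlr


def post_test_probability(prior, lr):
    """Update a prior probability of activity with a likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * lr
    return post_odds / (1.0 + post_odds)


# Hypothetical counts at a score cutoff that recovers 25% of the binders.
se, sp, plr, nlr = likelihood_ratios(tp=25, fn=75, fp=20, tn=580)
print(f"SE={se:.2f} SP={sp:.3f} +LR={plr:.2f} -LR={nlr:.2f}")

# Hypothetical prior of 10% estrogenic chemicals in the screened library.
print(f"P(binder | flagged) = {post_test_probability(0.10, plr):.2f}")
```

Read this way, a +LR of about 7.5 at SE = 0.25 raises an assumed 10% prior to roughly 45%, which is why the +LR at the strict threshold is the quantity of interest when few false positives can be tolerated.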

10.4155/fmc.15.103

Future Med. Chem. (Epub ahead of print)

future science group


[Figure 3 image. Panels: GLIDE and GOLD. Horizontal axis: ranking position (0 to 16). Series: -LR (SE = 0.75) and +LR (SE = 0.25). Crystals shown: 1L2I, 1A52, 3DT3, 1SJ0, 3OLS, 2NV7, 1QKN and 1L2J.]

Figure 3. Summary of the crystal rankings for the two trials: +LR (SE = 0.25) and -LR (SE = 0.75). GLIDE-based (A) and GOLD-based (B) models. GLIDE: Grid-based ligand docking with energetics; GOLD: Genetic optimization for ligand docking; SE: Sensitivity.

findings. With either GOLD or GLIDE, the 1L2I docking-based model was best for correctly identifying nonbinding and binding chemicals (third position in the GLIDE ranking, second position in the GOLD ranking). Our data suggest the use of this crystal to screen an external set irrespective of the docking software (GOLD or GLIDE) available to the user. Building on these data, interested users or stakeholders can easily predict the estrogenic potential of any given chemical by comparing the calculated docking scores of their own chemicals with the values explicitly shown in Table 2 for each of our docking-based classification models. Importantly, the computational protocol described in this paper can help to make transparent and informed decisions. In fact, the docking scores reported in Table 2 give an index of the probability of estrogenic activity (+LR, SE = 0.25) or of the absence of such activity (-LR, SE = 0.75). In addition, the eight classification models could also be employed as independent multiple alerts to flag a potentially estrogenic chemical. For a given substance, a high concordance among different classification models increases the confidence in the prediction. In this regard, we counted the number of chemicals simultaneously recognized as binders (SE ≤0.25) and those simultaneously recognized as nonbinders (SE >0.75) by all eight classification models. In the former case, full concordance across the eight models was obtained for 78.26% and 40.00% of binders using GLIDE and GOLD, respectively. In the latter, full concordance across the eight models was obtained for 91.95% and 94.91% of decoys using GLIDE and GOLD, respectively. The interested reader is referred to Supplementary Table 5 for additional details on the same analysis restricted to the models based only on the agonist (i.e., 1L2I, 1A52, 3OLS, 2NV7) or antagonist (i.e., 3DT3, 1SJ0, 1QKN, 1L2J) targets. The high predictive power of the developed models was also supported by a further internal validation. Notably, the performance of each model was assessed on 37 reference compounds (binders and decoys) included in the training set. As shown in the traffic-light scheme reported in Table 5, a good match between experimental and calculated binding classes is observed for most of the reference compounds. In particular, the different colors show the level of concordance obtained by multiple models. Note that the predictions were made by comparing the calculated score with the score approximating the SE thresholds of the classification models reported in Table 2. Compounds having docking scores better than the score approximating SE = 0.25 are predicted as binders (in red in Table 5); substances having docking scores worse than the score approximating SE = 0.75 are predicted as nonbinders (in green in Table 5); all the others are considered suspicious (in yellow in Table 5). It is noteworthy that GLIDE and GOLD, although developed and widely used for drug design purposes, have proved herein to be effective for toxicological purposes as well. Indeed, in addition to the good performance of the models, another point should be noted: only a few chemicals were excluded by the software from the docking simulations (see Supplementary Table 6), despite the huge structural variability of the training set. As far as this important aspect is concerned, GOLD performs slightly better than GLIDE, with the number

www.future-science.com


Research Article  Trisciuzzi, Alberga, Mansouri et al.

Table 5. Traffic-light scheme illustrating the performances of the classification models with respect to 37 reference compounds within the training set.

Columns: Reference compound; Experimental; Software (GOLD, GLIDE); 1L2I; 1A52; 3DT3; 1SJ0; 3OLS; 2NV7; 1QKN; 1L2J.

Reference compounds (one GOLD row and one GLIDE row each): Meso-Hexestrol; 17β-Estradiol; Estrone; Genistein; Bisphenol B; Apigenin; Daidzein; 4-Cumylphenol; Kaempferol; o,p’-DDT; 17-Methyltestosterone; 5α-Dihydrotestosterone; 4-(1,1,3,3-Tetramethylbutyl)phenol; Ethylparaben; Methoxychlor; Butyl benzyl phthalate; Kepone; Chrysin; 4-Nonylphenol; Progesterone; p,p’-DDE; Corticosterone; Fenarimol; Dibutyl phthalate; Dicofol; Di(2-ethylhexyl) phthalate; Flutamide; Atrazine; Procymidone; Linuron; Reserpine; Spironolactone; Haloperidol; Hydroxyflutamide; Phenobarbital sodium; Ketoconazole; Cycloheximide.

[Color-coded cells not reproduced.]

Note that the use of different colors shows the level of concordance obtained by multiple models. Red: chemicals predicted as binders; Yellow: chemicals predicted as suspicious; Green: chemicals predicted as nonbinders. Slash: undocked chemicals. GLIDE: Grid-based ligand docking with energetics; GOLD: Genetic optimization for ligand docking.


of excluded compounds equal to 1.49%. Finally, since a classification model is meant to be used for screening large chemical libraries, its application must be as fast as possible. In this regard, it should be noted that the GOLD protocol used here (see the ‘Materials & methods’ section for methodological details) is much less time consuming.

Predictions of ten representative estrogenic substances & of the CERAPP collection

The goodness of the developed docking-based classification models was first challenged by predicting ten representative substances, taken as paradigms, with known estrogenic actions: five drugs (diethylstilbestrol, ethinylestradiol, tamoxifen, raloxifene and triclosan) and five variously employed chemicals (bisphenol-A, methylparaben, propylparaben, isobutylparaben and resveratrol). Diethylstilbestrol [43,44], ethinylestradiol [45], tamoxifen [46,47] and raloxifene [48] are drugs designed to act on the estrogen receptor. Triclosan, on the other hand, has been used mostly as an antibacterial since the early 1970s [49]. Bisphenol-A is used for the industrial preparation of polycarbonate plastics and epoxy resins, which are found in countless daily life applications [50]. Parabens are a series of para-hydroxybenzoates, that is, esters of para-hydroxybenzoic acid, largely used as preservatives in cosmetic and pharmaceutical products [51]. Resveratrol is a natural polyphenolic compound with multiple pharmacological activities, found in the skin of grapes, blueberries, raspberries and mulberries [52,53]. Each of these compounds was cross-docked into the binding pocket of our eight selected crystal structures. The predictions were then made by comparing the calculated score with the score approximating the SE thresholds of the classification models reported in Table 2, using a traffic-light scheme based on the criteria already described for Table 5. As far as the drugs are concerned, four out of five are predicted as binders by all the classification models (see Table 6). The sole substance predicted as suspicious is triclosan, whose docking scores approximate SE values in the range from 0.75 to 0.25. Drugs designed to target the ERs are, of course, expected to behave as binders. Triclosan, on the other hand, is flagged as suspicious by our models and should thus be prioritized for further experimental testing. In this respect, mounting scientific evidence documents its adverse health effects: endocrine disruption, skin irritation, and bacterial and antibiotic resistance [54]. The failure of GLIDE to dock tamoxifen and raloxifene within 3OLS, 2NV7 and 1L2I is because of the limited surface area of their binding pockets, ranging from 430,598 Å² to 772,244 Å², compared with those of 1QKN, 1A52, 3DT3, 1L2J and 1SJ0, ranging from 831,420 Å² to 3,228,945 Å². Bisphenol-A is a well-known endocrine disruptor able to interact with human ERs. As shown in Table 6, the majority of models assign bisphenol-A to the class of binders. Only a few models designate bisphenol-A as suspicious, although it should be noted that, in such cases, the docking scores are very close to the threshold of the class of binders. Parabens have been acknowledged as responsible for weak estrogenic activity, increasing with the length of the alkoxy ester group [55]. Consistently, docking scores improve (which means a potentially higher estrogenic toxicity) with the size of the alkyl group (isobutyl > propyl > methyl). Our classification models predict parabens as suspicious. This holds true also for resveratrol. The high structural similarity of resveratrol and diethylstilbestrol would imply a potential estrogenic action. In this respect, studies report that resveratrol binds both ER-β and ER-α with comparable affinity, although with 7000-fold lower affinity than estradiol [52]. It is worth emphasizing that only a few selected meaningful examples are discussed herein and that our docking-based approach was applied successfully to a blind test set from the CERAPP prediction set consisting of >32,000 chemicals [Mansouri et al. CERAPP: collaborative estrogen receptor activity prediction (2015), Submitted].
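The traffic-light criteria used for Tables 5 and 6, together with the idea of treating the individual models as independent alerts, can be sketched in a few lines of Python. This is only a minimal illustration: the per-crystal cutoffs and the example scores below are hypothetical placeholders (the actual thresholds are those reported in Table 2), and the code assumes that lower, more negative scores are better, as in the scores listed in Table 6.

```python
from collections import Counter

# Hypothetical (binder_cutoff, nonbinder_cutoff) pairs per crystal structure.
# binder_cutoff approximates SE = 0.25, nonbinder_cutoff approximates SE = 0.75.
THRESHOLDS = {
    "1L2I": (-8.0, -6.0),
    "1A52": (-8.5, -6.5),
    "3DT3": (-9.0, -7.0),
}


def traffic_light(score, binder_cutoff, nonbinder_cutoff):
    """Classify one docking score (lower is assumed to be better)."""
    if score <= binder_cutoff:
        return "binder"          # red
    if score > nonbinder_cutoff:
        return "nonbinder"       # green
    return "suspicious"          # yellow


def classify_all(scores):
    """Map {crystal: score or None} to {crystal: label}; None means undocked."""
    labels = {}
    for pdb, score in scores.items():
        if score is None:
            labels[pdb] = "undocked"   # shown as a slash in the tables
        else:
            labels[pdb] = traffic_light(score, *THRESHOLDS[pdb])
    return labels


def concordance(labels):
    """Return the most frequent label and the fraction of docked models agreeing."""
    docked = [label for label in labels.values() if label != "undocked"]
    if not docked:
        return None, 0.0
    label, count = Counter(docked).most_common(1)[0]
    return label, count / len(docked)


# Hypothetical scores for one chemical docked into three crystals.
labels = classify_all({"1L2I": -8.4, "1A52": -9.1, "3DT3": -9.5})
print(labels, concordance(labels))
```

A chemical flagged identically by every model that docked it (fraction 1.0) corresponds to the full-concordance cases counted in the text; lower fractions point to chemicals that deserve closer inspection.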

In this respect, the distribution of relevant physicochemical properties (i.e., molecular weight, log P, hydrogen bond acceptor atoms, hydrogen bond donor atoms, number of rotatable bonds and number of rings) is shown in Supplementary Figure 2. Explicitly encoding molecular structure-based information, our docking-based classification categorized 1658 substances as estrogenic (SE ≤0.25) and 13,520 substances as nonestrogenic (SE >0.75), returning the fourth-best prediction among the 21 models obtained within the CERAPP initiative. Satisfactorily, the consensus built by weighting the models according to their accuracies proved effective in overcoming the limitations of the single models provided by partners from the USA and Europe.

Conclusion

This article presents a protocol to derive docking-based classification models and, more importantly, to analyze their performance from the obtained results. Taken as a whole, this work demonstrates how the smart application of structure-based methods, such as molecular docking, can be a crucial complement to ligand-based approaches, not only to properly support experts in medicinal chemistry programs but also to


Table 6. Docking scores of the ten representative chemicals predicted by the eight docking-based classification models (docking scores of GOLD and GLIDE are expressed as kcal/mol and kJ/mol, respectively).

Chemical            Software  1L2I     1A52     3DT3     1SJ0     3OLS     2NV7     1QKN     1L2J
Diethylstilbestrol  GLIDE     -7.51    -8.00    -9.08    -7.74    -8.53    -6.83    -8.24    -8.40
                    GOLD      -30.79   -32.07   -33.52   -29.77   -33.44   -29.93   -31.96   -30.76
Tamoxifen           GLIDE     /        -9.26    -9.42    -10.06   /        /        -8.70    -8.34
                    GOLD      -32.25   -48.44   -49.85   -49.18   -24.64   -33.51   -42.57   -38.24
Raloxifene          GLIDE     /        -10.11   -12.27   -11.83   /        /        -12.00   -7.14
                    GOLD      -25.47   -38.80   -48.57   -45.52   -15.84   -15.75   -41.50   -27.50
Triclosan           GLIDE     -5.40    -6.40    -6.41    -6.93    -5.78    -6.43    -6.43    -6.13
                    GOLD      -26.41   -28.40   -28.62   -27.36   -26.56   -26.56   -25.50   -29.43
Bisphenol-A         GLIDE     -7.51    -8.00    -9.08    -7.74    -8.53    -6.83    -6.83    -8.40
                    GOLD      -30.79   -32.07   -33.52   -29.7    -33.44   -29.93   -31.96   -30.76
Methylparaben       GLIDE     -5.63    -6.17    -7.03    -5.87    -6.31    -6.39    -6.21    -6.15
                    GOLD      -20.20   -19.12   -21.76   -21.76   -21.81   -20.12   -19.96   -20.01
Propylparaben       GLIDE     -6.21    -6.39    -6.89    -6.12    -6.44    -6.54    -6.33    -6.00
                    GOLD      -23.22   -21.96   -23.34   -23.34   -23.51   -23.04   -22.67   -22.53
Isobutylparaben     GLIDE     -6.36    -6.85    -7.58    -6.50    -6.77    -7.10    -6.95    -6.49
                    GOLD      -23.56   -23.00   -25.90   -25.90   -25.06   -25.45   -24.01   -23.13
Resveratrol         GLIDE     -5.40    -8.14    -9.02    -8.43    -8.57    -9.05    -9.05    -8.23
                    GOLD      -29.92   -30.82   -30.24   -34.96   -28.70   -30.83   -25.50   -28.81

Ethinylestradiol (second row of the table; the crystal of the undocked entry cannot be identified from the extracted layout):
GLIDE: -8.43, -9.64, -10.70, -9.44, -8.90, -8.17, -9.19, plus one undocked entry (/).
GOLD: -37.76, -36.38, -43.62, -31.88, -40.85, -36.09, -35.20, plus -34.24 for the crystal left undocked by GLIDE.

The chemicals reported in the first column are diethylstilbestrol, ethinylestradiol, tamoxifen, raloxifene, triclosan, bisphenol-A, methylparaben, propylparaben, isobutylparaben and resveratrol. Note that the use of different colors shows the level of concordance obtained by multiple models. Red: Chemicals predicted as binders; Yellow: Chemicals predicted as suspicious; Green: Chemicals predicted as nonbinders. Slash: undocked chemicals. GLIDE: Grid-based ligand docking with energetics; GOLD: Genetic optimization for ligand docking.


meet the extremely restrictive conditions typical of the regulatory context. However, these are two sides of the same coin; the sole difference lies in an informed interpretation of the results. Importantly, the results suggest that the choice between GLIDE and GOLD depends on the pursued goals. As shown, there is no winning model; rather, a case-by-case evaluation should be made. Nevertheless, the in silico procedure presented here might be helpful for immediately obtaining a preliminary but valuable idea of the estrogenic activity by simply comparing the docking score of a given chemical with those reported herein at the different SE-based thresholds.

Future perspective

The protection of human health and the environment is a theme of public concern and, thus, of utmost relevance for Western world countries. New national and international legislation promotes the development of more sustainable chemicals. Regulatory authorities, such as the US EPA and the EU’s European Chemicals Agency, are strongly committed to launching programs to assess the safety of chemicals commonly used in our daily life. However, the large amount of data to be assessed has prompted the adoption of predictive computational methods as alternatives to expensive animal tests. In this respect, exploratory toxicology is a highly interdisciplinary research area trying to meet these newly emerging needs. In this article, the authors anticipate how molecular docking, largely used for lead optimization in drug discovery programs, can be adapted to work as a predictive classifier. Unlike classical QSAR models, they report how the wealth of physicochemical information contained in the native protein structures can be useful to screen large chemical collections provided with high-quality experimental data. Exploratory toxicology could represent one of the future directions of medicinal chemistry, in which skills and competencies coming from data and molecular modeling are properly brought to bear.

Supplementary data

To view the supplementary data that accompany this paper please visit the journal website at: www.future-science.com/doi/full/10.4155/fmc.15.103

Disclaimer

The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the US Environmental Protection Agency.

Financial & competing interests disclosure

O Nicolotti and GF Mangiatordi wish to acknowledge FIRB (Futuro in Ricerca 2012, RBFR12SJA8_003) and IDEA 2011 (GRBA11EB3G). E Benfenati would like to thank LIFE+ project EDESIA for funding. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed. No writing assistance was utilized in the production of this manuscript.

Executive summary

Model evaluation
• A large collection of chemical structures having high-quality experimental data, provided by the US Environmental Protection Agency (EPA), has been used as the training set.
• Twenty-four docking-based classification models have been derived to predict the estrogenic potential of chemicals.

Predictions of ten representative estrogenic substances and of the CERAPP collection
• Chemicals have been classified as binders, nonbinders or suspicious according to the more restrictive regulatory conditions.
• Docking score values have been provided explicitly as numerical thresholds to enable users to make their own predictions.

References

Papers of special note have been highlighted as: • of interest; •• of considerable interest

1 Merlot C. Computational toxicology – a tool for early safety evaluation. Drug Discov. Today 15(1–2), 16–22 (2010).

2 Kavlock R, Dix D. Computational toxicology as implemented by the U.S. EPA: providing high throughput decision support tools for screening and assessing chemical exposure, hazard and risk. J. Toxicol. Environ. Health Part B 13(2–4), 197–217 (2010).

3 Nicolotti O, Benfenati E, Carotti A et al. REACH and in silico methods: an attractive opportunity for medicinal chemists. Drug Discov. Today 19(11), 1757–1768 (2014).

•• Describes how medicinal chemistry can meet regulatory purposes.

4 Judson R, Richard A, Dix DJ et al. The toxicity data landscape for environmental chemicals. Environ. Health Perspect. 117(5), 685–695 (2009).

•• Survey on programs of chemical screening and prioritization.

5 Judson R, Richard A, Dix D et al. ACToR – aggregated computational toxicology resource. Toxicol. Appl. Pharmacol. 233(1), 7–13 (2008).

6 European Commission. Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorization and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC. J. Eur. Union Lett. 396, 1–849 (2006).

7 Van Heerden S. Recent developments in global regulatory framework in the chemical industry. Popul. Plast. Packag. 57, 46–50 (2012).

8 Diamanti-Kandarakis E, Bourguignon JP, Giudice LC et al. Endocrine-disrupting chemicals: an Endocrine Society scientific statement. Endocr. Rev. 30(4), 293–342 (2009).

9 Shi LM, Fang H, Tong W et al. QSAR models using a large diverse set of estrogens. J. Chem. Inf. Comput. Sci. 41(1), 186–195 (2000).

10 Devillers J, Marchand-Geneste N, Carpy A, Porcher JM. SAR and QSAR modeling of endocrine disruptors. SAR QSAR Environ. Res. 17(4), 393–412 (2006).

11 Jacobs MN. In silico tools to aid risk assessment of endocrine disrupting chemicals. Toxicology 205(1–2), 43–53 (2004).

12 Organization for Economic Cooperation and Development. www.qsartoolbox.org

13 Nicolotti O, Giangreco I, Miscioscia TF, Carotti A. Improving quantitative structure–activity relationships through multiobjective optimization. J. Chem. Inf. Model. 49(10), 2290–2302 (2009).

14 Nicolotti O, Miscioscia TF, Carotti A, Leonetti F, Carotti A. An integrated approach to ligand- and structure-based drug design: development and application to a series of serine protease inhibitors. J. Chem. Inf. Model. 48(6), 1211–1226 (2008).

15 Kitchen DB, Decornez H, Furr JR, Bajorath J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat. Rev. Drug Discov. 3(11), 935–949 (2004).

16 Shoichet BK. Virtual screening of chemical libraries. Nature 432(7019), 862–865 (2004).

17 Kolšek K, Mavri J, Sollner Dolenc M, Gobec S, Turk S. Endocrine disruptome – an open source prediction tool for assessing endocrine disruption potential through nuclear receptor binding. J. Chem. Inf. Model. 54(4), 1254–1267 (2014).

• Updated report on methods to screen chemicals toward a number of nuclear receptors.

18 Mueller SO, Korach KS. Estrogen receptors and endocrine diseases: lessons from estrogen receptor knockout mice. Curr. Opin. Pharmacol. 1(6), 613–619 (2001).

19 Shanle EK, Xu W. Endocrine disrupting chemicals targeting estrogen receptor signaling: identification and mechanisms of action. Chem. Res. Toxicol. 24(1), 6–19 (2011).

20 Colborn T. Environmental estrogens: Health implications for humans and wildlife. Environ. Health Perspect. 103(Suppl. 7), 135–136 (1995).

21 Muegge I, Oloff S. Advances in virtual screening. Drug Discov. Today Technol. 3(4), 405–411 (2006).

22 Ma XH, Zhu F, Liu X et al. Virtual screening methods as tools for drug lead discovery from large chemical libraries. Curr. Med. Chem. 19(32), 5562–5571 (2012).

23 Gissi A, Gadaleta D, Floris M et al. An alternative QSAR-based approach for predicting the bioconcentration factor for regulatory purposes. ALTEX 31(1), 23–36 (2014).

24 Fourches D, Muratov E, Tropsha A. Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J. Chem. Inf. Model. 50(7), 1189–1204 (2010).

25 Small-Molecule Drug Discovery Suite 2014–4: GLIDE, version 6.5, Schrödinger, LLC, New York, NY, 2014.

26 Jones G, Willett P, Glen RC, Leach AR, Taylor R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 267(3), 727–748 (1997).

27 Hornberg JJ, Laursen M, Brenden N et al. Exploratory toxicology as an integrated part of drug discovery. Part I: why and how. Drug Discov. Today 19(8), 1131–1136 (2014).

28 Zhang L, Sedykh A, Tripathi A et al. Identification of putative estrogen receptor-mediated endocrine disrupting chemicals using QSAR- and structure-based virtual screening approaches. Toxicol. Appl. Pharmacol. 272(1), 67–76 (2013).

29 Heldring N, Pike A, Andersson S et al. Estrogen receptors: how do they signal and what are their targets. Physiol. Rev. 87(3), 905–931 (2007).

30 Shiau AK, Barstad D, Radek JT et al. Structural characterization of a subtype-selective ligand reveals a novel mode of estrogen receptor antagonism. Nat. Struct. Mol. Biol. 9(5), 359–364 (2002).

31 Tanenbaum DM, Wang Y, Williams SP, Sigler PB. Crystallographic comparison of the estrogen and progesterone receptor’s ligand binding domains. Proc. Natl Acad. Sci. USA 95(11), 5998–6003 (1998).

32 Fang J, Akwabi-Ameyaw A, Britton JE et al. Synthesis of 3-alkyl naphthalenes as novel estrogen receptor ligands. Bioorg. Med. Chem. Lett. 18(18), 5075–5077 (2008).

33 Kim S, Wu JY, Birzin ET et al. Estrogen receptor ligands. II. Discovery of benzoxathiins as potent, selective estrogen receptor α modulators. J. Med. Chem. 47(9), 2171–2175 (2004).

34 Möcklinghoff S, Rose R, Carraz M, Visser A, Ottmann C, Brunsveld L. Synthesis and crystal structure of a phosphorylated estrogen receptor ligand binding domain. Chem. Biol. Chem. 11(16), 2251–2254 (2010).

35 Mewshaw RE, Bowen SM, Harris HA, Xu ZB, Manas ES, Cohn ST. ERbeta ligands. Part 5: synthesis and structure–activity relationships of a series of 4’-hydroxyphenyl-aryl-carbaldehyde oxime derivatives. Bioorg. Med. Chem. Lett. 17(4), 902–906 (2007).

36 Pike AC, Brzozowski AM, Hubbard RE et al. Structure of the ligand-binding domain of oestrogen receptor beta in the presence of a partial agonist and a full antagonist. EMBO J. 18(17), 4608–4618 (1999).

37 Schrödinger Release 2014–4: LigPrep, version 3.2, Schrödinger, LLC, New York, NY, 2014.

38 Schrödinger Release 2014–4: Schrödinger Suite 2014–4 Protein Preparation Wizard; Epik version 3.0, Schrödinger, LLC, New York, NY, 2014; Impact version 6.5, Schrödinger, LLC, New York, NY, 2014; Prime version 3.8, Schrödinger, LLC, New York, NY, 2014.

39 Banks JL, Beard HS, Cao Y et al. Integrated modeling program, applied chemical theory (IMPACT). J. Comput. Chem. 26(16), 1752–1780 (2005).

40 Schapira M, Abagyan R, Totrov M. Nuclear hormone receptor targeted virtual screening. J. Med. Chem. 46(14), 3045–3059 (2003).

41 Klebe G. Virtual ligand screening: strategies, perspectives and limitations. Drug Discov. Today 11(13–14), 580–594 (2006).

•• Describes benefits and limitations of virtual screening.

42 Huang N, Shoichet BK, Irwin JJ. Benchmarking sets for molecular docking. J. Med. Chem. 49(23), 6789–6801 (2006).

• Docking-based analysis of benchmark biological data.

43 Nam K, Marshall P, Wolf RM, Cornell W. Simulation of the different biological activities of diethylstilbestrol (DES) on estrogen receptor alpha and estrogen-related receptor gamma. Biopolymers 68(1), 130–138 (2003).

44 Walker VR, Jefferson WN, Couse JF, Korach KS. Estrogen receptor-α mediates diethylstilbestrol-induced feminization of the seminal vesicle in male mice. Environ. Health Perspect. 120(4), 560–565 (2012).

45 Sikoski P, Register TC, Lees CJ et al. Effects of two novel selective estrogen receptor modulators, raloxifene, tamoxifen, and ethinyl estradiol on the uterus, vagina and breast in ovariectomized cynomolgus monkeys (macaca fascicularis). Am. J. Obstet. Gynecol. 196(1), 75.e1–75.e7 (2007).

46 Shiau AK, Barstad D, Loria PM et al. The structural basis of estrogen receptor/coactivator recognition and the antagonism of this interaction by tamoxifen. Cell 95(7), 927–937 (1998).

47 Yin L, Hu Q. Drug discovery for breast cancer and co-instantaneous cardiovascular disease: what is the future? Future Med. Chem. 5(4), 359–362 (2013).

48 Barkhem T, Carlsson B, Nilsson Y, Enmark E, Gustafsson J, Nilsson S. Differential response of estrogen receptor alpha and estrogen receptor beta to partial estrogen agonists/antagonists. Mol. Pharmacol. 54(1), 105–112 (1998).

49 Dinwiddie MT, Terry PD, Chen J. Recent evidence regarding triclosan and cancer risk. Int. J. Environ. Res. Public Health. 11(2), 2209–2217 (2014).

50 Geens T, Aerts D, Berthot C et al. A review of dietary and non-dietary exposure to bisphenol-A. Food Chem. Toxicol. 50(10), 3725–3740 (2012).

51 Rastogi SC, Schouten A, de Kruijf N, Weijland JW. Contents of methyl-, ethyl-, propyl-, butyl- and benzylparaben in cosmetic products. Contact Dermatitis. 32(1), 28–30 (1995).

52 Bowers JL, Tyulmenkov VV, Jernigan SC, Klinge CM. Resveratrol acts as a mixed agonist/antagonist for estrogen receptors α and β. Endocrinology 141(10), 3657–3667 (2000).

53 Fuller HR, Humphrey EL, Morris GE. Naturally occurring plant polyphenols as potential therapies for inherited neuromuscular diseases. Future Med. Chem. 5(17), 2091–2101 (2013).

54 Huang H, Du G, Zhang W et al. The in vitro estrogenic activities of triclosan and triclocarban. J. Appl. Toxicol. 34(9), 1060–1067 (2014).

55 Okubo T, Yokoyama Y, Kano K, Kano I. ER-dependent estrogenic activity of parabens assessed by proliferation of human breast cancer MCF-7 cells and expression of ERα and PR. Food Chem. Toxicol. 39(12), 1225–1232 (2001).