QSAR Study of Curcumine Derivatives as HIV-1 Integrase Inhibitors

Send Orders of Reprints at [email protected] Current Computer-Aided Drug Design, 2013, 9, 141-150


QSAR Study of Curcumine Derivatives as HIV-1 Integrase Inhibitors Pawan Gupta1, Anju Sharma2, Prabha Garg*,1,2 and Nilanjan Roy3 1

Department of Pharmacoinformatics, 2Computer Centre, 3Department of Biotechnology, National Institute of Pharmaceutical Education and Research (NIPER), Sector-67, S.A.S. Nagar, 160062, Punjab, India

Abstract: A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. A statistically significant model was developed with a squared correlation coefficient (r2) of 0.891 and a cross-validated r2 (r2cv) of 0.825. The model revealed that electronic properties, shape, size, geometry, substituent information and hydrophilicity are important for determining the inhibitory activity of these molecules. The model also passed external validation (r2pred = 0.849) as well as Tropsha's tests for model predictability. Furthermore, a domain analysis was carried out to evaluate the prediction reliability for the external set molecules. The model was statistically robust and had good predictive power, so it can be used for screening new molecules.

Keywords: HIV-1 integrase, curcumine derivatives, descriptors, 2D-QSAR, applicability domain.

1. INTRODUCTION

Human immunodeficiency virus type 1 (HIV-1) integrase (IN) catalyzes the insertion of the viral DNA into the host cell genome in two steps, 3'-processing and strand transfer. During 3'-processing, IN selectively cleaves the last two nucleotides (GT) of the viral DNA to generate two CA-3'-hydroxyl recessed ends, which are the reactive intermediates essential for the next step. The enzyme, still bound to the 3'-processed viral DNA, translocates to the nucleus of the infected cell as part of the preintegration complex, wherein the terminal 3'-OH of the viral DNA attacks the host DNA in the strand transfer step. Intense research is being carried out on the HIV-1 IN protein; however, only 27 molecules have been approved as anti-HIV drugs in the last few decades [1]. Among them, only two drugs targeting HIV-1 IN, Raltegravir [2] (US FDA approved) and Elvitegravir [3] (phase III clinical trial), have been successfully tested for treatment of patients [4-6]. Different series of HIV-1 IN inhibitors have been designed and synthesized, such as diketo acid [7, 8], mercaptobenzene sulfonamide [9], caffeoyl anilide [10], styrylquinoline [11], quinolone carboxylic acid [12], tricyclic analogs [13], hydroxypyrimidine [14], chicoric acid [15], nitrogen-containing polyhydroxylated aromatics [16] and keto-salicylic acid [17] derivatives. In the last few decades, computer-aided rational design has contributed extensively to the discovery and optimization of many clinically used drugs. Various tools have been applied to the discovery of HIV-1 IN inhibitors, such as pharmacophore searching [18, 19] and Quantitative Structure-Activity Relationships (QSAR). Structurally diverse IN inhibitors have been identified with

the help of these tools [20, 21]. QSAR relates the structural properties of molecules to their biological activity. Cheng and co-workers performed QSAR modeling on carboxylic acid derivatives and identified polarizability and mass as important properties for defining the activity [22]. Ravichandran and co-workers found the valence connectivity index of order 1, the lowest unoccupied molecular orbital and the dielectric energy to be important properties in the QSAR model equation for naphthyridine derivatives [23]. In addition, atomic mass, electronegativity and atomic polarizability were found to be important descriptors in a QSAR study of tricyclic phthalimide analogs using linear and non-linear methodologies [24]. Moreover, several attempts have been made to build QSAR models on other series of molecules, such as styrylquinolines [25, 26] and benzyl amide keto-acids [27], to find the structural features important for the activity of these inhibitors against HIV IN.

In previous studies, only docking, Comparative Molecular Field Analysis (CoMFA) and pharmacophore mapping were performed for the curcumine series. Polar interactions (cation-π and metal ion interactions in docking, a high contribution of the electrostatic field in CoMFA, and hydrogen bond donor and hydrogen bond acceptor features in pharmacophore mapping) were found to be important for binding of these molecules in the active site [28]. The main purpose of the present work is to develop a QSAR model for predicting the inhibitory activity of curcumine derivatives and to better understand the structural features of these molecules. The results of this study may be helpful for designing new analogs with a better biological profile and for screening new molecules.

2. MATERIALS AND METHODS

2.1. QSAR

*Address correspondence to this author at the Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), Sector-67, S.A.S. Nagar, 160062, Punjab, India; Tel: +91 172 2214 682; Fax: +91 172 2214 692; E-mails: [email protected], [email protected] 1875-6697/13 $58.00+.00

© 2013 Bentham Science Publishers

2.1.1. Data Set

All the curcumine derivatives have shown inhibitory activity against the 3'-processing reaction of HIV-1 IN in

142 Current Computer-Aided Drug Design, 2013, Vol. 9, No. 1

enzyme based assay [29] (Table S1 in supporting information). These molecules were built using the SYBYL 7.1 molecular modeling package installed on a Silicon Graphics Fuel workstation running IRIX 6.5. The Tripos force field, Gasteiger-Hückel partial atomic charges and Powell's conjugate gradient method were used to minimize all molecules with an energy gradient convergence criterion of 0.05 kcal/mol, as also used for CoMFA model generation [28]. Based on the activity values, the data set was labeled into two classes, 'active' (pIC50 > 4) and 'inactive' (pIC50 = 4) (Table 1). The data set of 39 molecules was divided into training (29 molecules) and test (10 molecules) sets in a 3:1 ratio using the cross validation partition function in MATLAB. This function was used with stratification on the class information, so that the training and test sets have roughly the same class proportions.

2.1.2. Descriptor Selection and Model Building

The chemical structure of a molecule is usually represented by a variety of descriptors such as topological, constitutional, geometrical, charge, information, WHIM, GETAWAY, functional group, eigenvalue, connectivity and edge adjacency indices. All sets of descriptors were calculated using Dragon 5.3. After removing constant and near-constant descriptors, the remaining 1367 descriptors were scaled so that descriptors with larger numerical values would not dominate the data, thus allowing the underlying characteristics of the data sets to be compared. Scaling was carried out using the following equation:

$$x_{\mathrm{scaled}}(i) = \frac{x(i) - \mathrm{mean}(x)}{\mathrm{std.dev}(x)}$$

where x represents a particular descriptor's value.

The principle of Occam's razor (principle of parsimony) is stated as 'the variables which contain the information that is necessary for the modeling but nothing more' [30], whereby a stable and interpretable model can be generated. It is difficult to identify the optimal set of descriptors that appropriately correlates the structural properties of molecules with activity. Therefore, a correlation matrix was used to identify the descriptors that have high correlation with activity, as in previous work [31]. Descriptors having high correlation with each other as well as low correlation with biological activity were not used for this purpose. Highly correlated (multicollinear) descriptors do not actually bias the results, but they produce large standard errors in the developed model. The common interpretation of a regression parameter, namely the change in the expected value of the activity when the corresponding descriptor is varied while all other descriptors are held constant, is not fully applicable when a high degree of correlation exists, because it is difficult to attribute changes in the activity to one of two correlated descriptors. When the developed model is applied to new data that differ from the data that were fitted, it may introduce

Gupta et al.

large errors in predictions because the pattern of multicollinearity between the descriptors in the new data differs from that in the data used for estimation [32, 33]. Therefore, these descriptors were not considered for further studies. Moreover, Randic commented on the selection of optimum descriptors that descriptors should be defined by the structure-property-activity correlation, although statistical criteria have to be used for preliminary screening of descriptors taken from a large pool [34, 35]. Thus, QSAR models built from these descriptors should be interpretable in a physico-chemical and/or mechanistic sense [36]. Finally, 128 descriptors (top 10% of 1367 descriptors) that have high correlation with biological activity were selected. The number of descriptors was still too large to generate a reliable model using multiple linear regression (MLR) [36, 37]. Thus, further dimensionality reduction was needed to reduce the complexity of model interpretation [38]. Two different combinations of variable selection methods were used for this purpose in the WEKA program [39]: (i) the correlation-based feature selection method with best first and greedy stepwise search methods and (ii) Chi square-based feature selection with the ranker search method [40, 41].

The correlation-based feature selection method is an attribute subset evaluation method that ranks subsets of features according to a correlation-based evaluation function. The evaluation function is biased towards subsets that contain features which are highly correlated with the class and uncorrelated with each other. Irrelevant features are ignored because they have low correlation with the class. Redundant features are screened out because they are highly correlated with one or more of the remaining features. The acceptance of a feature depends on the extent to which it predicts classes in areas of the instance space not already predicted by other features. The best first and greedy stepwise search methods were used for searching and ranking the features. Best first searches the features by greedy hill climbing with backtracking. It can search forward from the empty set of attributes, backward from the full set, or start at an intermediate point (specified by a list of attribute indices) and search in both directions by considering all possible single attribute additions and deletions. Greedy stepwise uses the same algorithm as best first, but without backtracking. Like best first, it may progress forward from the empty set or backward from the full set. Unlike best first, it does not backtrack but terminates as soon as adding or deleting the best remaining attribute decreases the evaluation metric.

The Chi square evaluation method is a single attribute evaluator. The attributes are evaluated by computing the chi-square statistic with respect to the class. The rank search method was used along with this method to sort the attributes by their individual evaluations. The rank search method not only ranks attributes but also performs attribute selection by removing the lower ranking ones. It evaluates subsets of increasing size by adding attributes one by one (forward selection): the best attribute, the best attribute plus the next best one, and so on, reporting the best subset.
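The autoscaling and the correlation-based pre-screening described earlier in this section can be illustrated with a minimal Python sketch. It is not the authors' MATLAB workflow: the function names, the use of the sample standard deviation, the 0.90 inter-correlation cutoff and the toy data are all assumptions made for illustration, and constant descriptors are assumed to have been removed beforehand.

```python
import math

def zscore(column):
    """Autoscale one descriptor column: (x - mean) / standard deviation."""
    n = len(column)
    mean = sum(column) / n
    # Sample standard deviation (n - 1) is an assumption; the text does not specify it.
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]

def pearson(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def prescreen(descriptors, activity, keep_fraction=0.10, max_intercorr=0.90):
    """Keep the descriptors best correlated with activity, skipping any
    descriptor that is highly correlated with one already kept.
    Both thresholds are hypothetical parameters."""
    ranked = sorted(descriptors,
                    key=lambda name: abs(pearson(descriptors[name], activity)),
                    reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    kept = []
    for name in ranked:
        if len(kept) == n_keep:
            break
        if all(abs(pearson(descriptors[name], descriptors[k])) < max_intercorr
               for k in kept):
            kept.append(name)
    return kept

# Toy data, for illustration only: d2 is a rescaled copy of d1.
activity = [4.0, 4.0, 5.0, 6.0, 6.5]
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
scaled = {name: zscore(values) for name, values in descriptors.items()}
kept = prescreen(scaled, activity, keep_fraction=2 / 3)
```

In this toy case the redundant copy `d2` is skipped in favour of the weaker but independent `d3`, which mirrors the rationale given above for discarding inter-correlated descriptors.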

Table 1. Experimental and Predicted pIC50 Values, Leverage Values [h(x)] from the QSAR Model and Descriptor Values for Curcumine Molecules (h* = 0.621; h# = 0.414)

Mol ID  Actual pIC50 b  Pred. pIC50  h(x)   Mor22m   H1m      Q2       DELS     BIC5     Hy
1       4.00            3.76         0.26   -1.809   2.065    -0.674   -1.704   -0.478   -1.443
2       4.00            4.04         0.28   -0.675   1.884    -0.722   -1.812   -0.165   -1.170
3       4.00            4.02         0.26   -0.320   1.807    -0.688   -1.502   -0.450   -1.460
4       4.00            4.38         0.21   -0.791   1.161    -0.717   -1.633   -0.207   -1.198
5a      5.05            4.53         0.12   -0.844   1.241    -0.652   -1.210   0.975    -1.437
6       4.00            4.32         0.16   -1.338   1.298    -0.694   -1.373   -0.264   -1.174
7       4.00            3.82         0.26   -0.937   2.142    -0.681   -1.507   -0.535   -1.437
8a      4.00            3.94         0.26   -1.611   2.001    -0.713   -1.637   -0.264   -1.174
9a      4.00            4.03         0.27   0.308    0.349    2.032    0.653    -0.620   -0.381
10      4.00            4.03         0.25   -0.814   0.148    2.048    0.610    -0.620   -0.381
11a     6.30            5.79         0.20   -0.082   -0.880   -0.519   -0.619   -0.279   0.045
12      4.00            3.97         0.26   0.104    0.200    2.047    0.646    -0.606   -0.795
13      4.00            4.01         0.22   0.575    0.133    2.060    0.598    -0.606   -0.795
14a     6.30            5.83         0.17   0.360    -0.853   -0.518   -0.567   -0.136   0.023
15      4.00            4.08         0.21   1.366    0.329    1.946    0.697    -0.450   -0.807
16      4.00            4.12         0.26   -0.07    -0.125   2.068    0.648    -0.450   -0.807
17a     5.77            5.70         0.09   0.541    -0.530   -0.470   -0.435   0.035    -0.069
18      4.00            3.77         0.29   -1.867   0.632    1.979    0.864    -0.222   -0.859
19a     4.00            3.81         0.28   -1.425   0.307    2.221    0.812    -0.222   -0.859
20      6.52            6.14         0.37   -0.012   -0.761   -0.420   0.399    -1.333   1.631
21      6.40            6.16         0.15   1.064    -0.967   -0.423   0.392    -1.262   1.049
22      6.30            6.19         0.26   2.023    -0.845   -0.401   0.443    -1.048   1.018
23      6.15            6.05         0.30   2.070    -0.465   -0.351   0.607    -0.734   0.886
24      5.85            6.15         0.18   1.285    -0.974   -0.410   0.344    -1.333   1.060
25      5.22            6.03         0.14   0.273    -0.850   -0.365   0.564    -1.461   1.082
26      6.15            5.57         0.19   0.349    0.118    -0.398   0.338    -1.461   1.082
27      5.80            5.93         0.18   0.517    -0.691   -0.416   0.272    -1.589   1.094
28      6.52            6.08         0.39   -0.809   -1.180   -0.601   -1.010   1.801    -0.368
29      6.70            5.93         0.18   -0.355   -1.126   -0.443   -0.477   0.690    -0.339
30      5.52            6.15         0.14   -0.582   -0.691   -0.413   -0.230   1.175    0.862
31      6.00            5.98         0.13   0.296    -0.257   -0.479   -0.571   1.175    0.862
32a     6.70            6.03         0.24   -0.390   -0.649   -0.346   0.495    1.203    -0.015
33      5.92            6.14         0.15   0.029    -0.493   -0.321   0.554    1.274    0.511
34      6.70            6.40         0.40   0.273    -0.455   -0.235   1.499    1.274    0.953
35a     6.70            6.55         0.52   0.930    -0.316   -0.211   1.541    1.360    1.541
36      5.55            6.10         0.27   -0.512   -0.843   -0.332   0.465    1.232    -0.033
37      6.15            6.25         0.17   1.192    -0.612   -0.307   0.511    1.303    0.486
38a     5.59            6.44         0.37   0.721    -0.480   -0.250   1.476    1.303    0.928
39      6.70            6.60         0.47   1.273    -0.423   -0.199   1.509    1.374    1.508

a Test set for 2D QSAR; b 3'-processing activities evaluated from enzyme based assay and converted into pIC50 using the -log[IC50 in μM] formula [19].
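As a worked check of how the predicted column of Table 1 relates to the descriptor columns, the sketch below applies model equation-2 (reported in Section 3.1.1) to the scaled descriptor values of molecules 1 and 14. The dictionary layout and function name are illustrative choices, not part of the original study; because the published coefficients are rounded, the results should only be expected to agree with the tabulated predictions to about two decimal places.

```python
# Coefficients of model equation-2 as published (rounded values).
INTERCEPT = 5.28
COEFFS = {
    "Mor22m": 0.065,
    "H1m": -0.456,
    "Q2": -0.512,
    "DELS": 0.203,
    "BIC5": 0.169,
    "Hy": 0.267,
}

def predict_pic50(descriptors):
    """Linear MLR prediction: intercept + sum(coefficient * scaled descriptor)."""
    return INTERCEPT + sum(COEFFS[name] * descriptors[name] for name in COEFFS)

# Scaled descriptor values copied from Table 1.
mol1 = {"Mor22m": -1.809, "H1m": 2.065, "Q2": -0.674,
        "DELS": -1.704, "BIC5": -0.478, "Hy": -1.443}
mol14 = {"Mor22m": 0.360, "H1m": -0.853, "Q2": -0.518,
         "DELS": -0.567, "BIC5": -0.136, "Hy": 0.023}

pred1 = predict_pic50(mol1)    # Table 1 lists 3.76 for molecule 1
pred14 = predict_pic50(mol14)  # Table 1 lists 5.83 for molecule 14
```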

All these evaluations were done with 10-fold cross validation in the learning process; variable selection was performed for each cross validation fold. All the descriptors were ranked in each method. Only 8 descriptors were found to be common to these feature selection procedures, so all of them were selected for model building using MLR analysis.
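The correlation-based subset evaluation with a greedy stepwise (forward, no backtracking) search described in Section 2.1.2 can be sketched as follows. This is not the WEKA implementation: it assumes Pearson correlation as the correlation measure and the commonly cited merit function k·r_cf / sqrt(k + k(k-1)·r_ff), and all names and the toy data are hypothetical. Features are assumed to be non-constant.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def cfs_merit(subset, features, target):
    """Merit of a subset: high feature-target correlation (r_cf) is rewarded,
    high feature-feature correlation (r_ff) is penalized."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_stepwise(features, target):
    """Forward search from the empty set; stops as soon as the best remaining
    attribute no longer improves the merit (no backtracking)."""
    selected, best_merit = [], 0.0
    remaining = list(features)
    while remaining:
        step_name, step_merit = None, best_merit
        for name in remaining:
            merit = cfs_merit(selected + [name], features, target)
            if merit > step_merit:
                step_name, step_merit = name, merit
        if step_name is None:
            break
        selected.append(step_name)
        remaining.remove(step_name)
        best_merit = step_merit
    return selected, best_merit

# Toy data, for illustration only: "a_like" is nearly redundant with "a".
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 6.0],
    "a_like": [1.0, 2.0, 3.0, 4.0, 7.0],
    "noise": [1.0, -1.0, 1.0, -1.0, 1.0],
}
selected, merit = greedy_stepwise(features, target)
```

With these toy values only `a` is retained: adding the nearly redundant `a_like` or the irrelevant `noise` lowers the merit, which is the behaviour the subset evaluator is designed to produce.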


In many QSAR cases [36, 42, 43], the ratio of the number of training set molecules (N) to the number of descriptors in the model equation (k) should be 5:1. Considering this rule, the influence of each descriptor on r2 was studied by removing them one by one. The descriptors whose removal caused a small or negligible drop in r2 were eventually removed. Finally, a set of 6 descriptors was selected for MLR.

2.1.3. Quality of Fit and Predictive Ability of QSAR Model

The quality of fit of a QSAR model on the training data is the first indication of the success of the model. The most commonly used parameters for this purpose are the squared correlation coefficient (r2), the root mean square (RMS) error and the F-value statistic [44].

$$r^2 = 1 - \frac{\sum (Y - Y_{Pred})^2}{\sum (Y - \bar{Y}_{training})^2}$$

$$RMS = \sqrt{\frac{\sum (Y - Y_{Pred})^2}{N - k - 1}}$$

$$F = \frac{r^2 / k}{(1 - r^2) / (N - k - 1)}$$

where Y and YPred are the observed and predicted activity values, respectively, of the training set and Ȳtraining is the mean activity value of the training set. k is the number of independent variables in the model.

In addition to a high cross validated r2 (r2cv), a reliable model should also be characterized by a high squared correlation coefficient (r2) (0.9) [45]. On this basis, the goodness of fit (r2) and the goodness of prediction (r2cv) of the developed QSAR model can be established for the studied data set. Moreover, the predicted r2 (r2pred) was also considered to evaluate the predictive ability of the model for the test set molecules, which were excluded during model building. The r2cv and r2pred were calculated using the following formulas:

$$r^2_{cv} = 1 - \frac{\sum (Y_{Training} - Y_{Pred(Training)})^2}{\sum (Y_{Training} - \bar{Y}_{Training})^2}$$

$$r^2_{pred} = 1 - \frac{\sum (Y_{Pred(Test)} - Y_{Test})^2}{\sum (Y_{Test} - \bar{Y}_{Training})^2}$$

where YTraining and YPred(Training) are the observed and predicted activity values of the training set, respectively, YTest and YPred(Test) are the observed and predicted activity values, respectively, of the test set molecules, and ȲTraining is the mean activity value of the training set.

To further check the predictive ability of the developed model, a randomization test was done to evaluate the statistical significance of the relationship between the anti-HIV-1 IN activity and the molecular descriptors. 100 random samples were generated by shuffling the activities of the molecules without disturbing the descriptor columns. The generated samples were subjected to model building using MLR, and mean r2 was calculated for each random model. If the original QSAR model is statistically significant, its score should be significantly better than that obtained from the random data sets.

Tropsha's [45] tests of model predictivity were performed. The developed model is considered predictive if the following conditions are satisfied:

$$r^2_{pred} > 0.6$$

$$\frac{R^2 - R_0^2}{R^2} < 0.1$$

$$\frac{R^2 - R_0'^2}{R^2} < 0.1$$

$$0.85 \le k \le 1.15 \quad \text{or} \quad 0.85 \le k' \le 1.15$$

The mathematical definitions of R2, R02, R0'2, k and k' are based on regression of the observed activities against the predicted activities and vice versa (predicted against actual activities). In these conditions k and k' denote the slopes of the regression lines through the origin, not the number of descriptors. r2pred is the squared correlation coefficient for the test set molecules.

Roy et al. [46] suggested a metric Rm2 to evaluate the predictive ability of the model, which was calculated using the following equation:

$$R_m^2 = R^2 \left(1 - \sqrt{R^2 - R_0^2}\right)$$

where R2 is the squared correlation coefficient between observed and predicted values and R02 is the squared correlation coefficient between observed and predicted values without intercept. If there is a large difference between the predicted values and the corresponding observed values, the Rm2 statistic penalizes the model heavily. The Rm2 calculation does not affect the predictability of the model; rather, it is used to judge the quality of prediction. An Rm2 > 0.5 for a given model indicates good external predictability of the developed model.
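The external-validation statistics of this section (r2pred, the two Tropsha ratios, the slopes k and k', and Rm2) can be sketched in Python as below. This is one common reading of the definitions rather than the authors' MATLAB code: R02 is computed here as 1 - SSres/SStot for the regression forced through the origin, `abs()` is added inside the square root as a numerical guard, and the observed and predicted values are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def r2_pred(y_test, y_pred, y_train_mean):
    """1 - sum((pred - obs)^2) / sum((obs - training mean)^2)."""
    ss_res = sum((p - o) ** 2 for o, p in zip(y_test, y_pred))
    ss_tot = sum((o - y_train_mean) ** 2 for o in y_test)
    return 1.0 - ss_res / ss_tot

def squared_correlation(a, b):
    """Squared Pearson correlation coefficient (regression with intercept)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov ** 2 / (va * vb)

def through_origin(y, x):
    """Slope and R0^2 for the regression of y on x forced through the origin."""
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    my = mean(y)
    ss_res = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, 1.0 - ss_res / ss_tot

def external_checks(y_obs, y_pred, y_train_mean):
    """Return the individual statistics; combining them into a pass/fail
    decision is left to the caller."""
    r2 = squared_correlation(y_obs, y_pred)
    k, r0 = through_origin(y_obs, y_pred)              # observed vs predicted
    k_prime, r0_prime = through_origin(y_pred, y_obs)  # predicted vs observed
    return {
        "r2_pred": r2_pred(y_obs, y_pred, y_train_mean),
        "ratio": (r2 - r0) / r2,
        "ratio_prime": (r2 - r0_prime) / r2,
        "k": k,
        "k_prime": k_prime,
        # abs() guards against a slightly negative difference (assumption).
        "rm2": r2 * (1.0 - math.sqrt(abs(r2 - r0))),
    }

# Hypothetical observed and predicted pIC50 values, for illustration only.
obs = [4.0, 5.0, 6.0, 6.5]
pred = [4.2, 4.9, 5.8, 6.6]
checks = external_checks(obs, pred, y_train_mean=5.3)
```

Returning the statistics separately keeps the sketch neutral about how the individual conditions are combined into a final verdict.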

2.2. Defining Applicability Domain of Model

For prediction of test and newly screened molecules from a QSAR model, the applicability domain (AD) of the model must also be defined. This is required to check the prediction reliability of the model for these molecules. The prediction is considered reliable if the molecule falls into this domain; otherwise it is extrapolated outside the domain. The extent of extrapolation [47] is one of the simple approaches to define the AD of a model for new molecules. It is defined by calculating the leverage h(x) [48] of each molecule. The leverage h(x) of a molecule measures its influence on the model and is defined as:

$$h(x) = N^{-1} + x^{T} (X^{T} X)^{-1} x$$

where x represents the test molecule in centered descriptor space and X is the training data matrix whose N rows represent the training molecules in the centered descriptor space, T is transpose.


'Centered' means that the grand mean of the training data is taken as the origin of the descriptor space. h(x) can be calculated for training, test and newly screened molecules. For training set molecules, h(x) indicates those molecules that may have influenced the model parameters to a marked extent. For test and new molecules, it indicates whether they lie within the AD of the model. The warning leverage (h*) is defined as 3k/N and the high leverage (h#) as 2k/N, where N is the number of molecules in the training set and k is the number of descriptors in the model. If the h(x) value of a molecule is below h#, the prediction is considered reliable, as it is interpolated within the domain of the training set. Molecules with h(x) between h# and h* are not extrapolated to an extent that makes the prediction doubtful, so their predictions can be considered reliable as well. Conversely, test set molecules with h(x) > h* are structurally distant from the training chemicals and fall outside the AD of the model, hence their predictions are unreliable. Such predictions must be used with great care because of their increased uncertainty [36, 48].
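The two leverage thresholds and the resulting three-way reading of h(x) can be sketched as follows. The function names and category labels are illustrative, and assigning the boundary values to the lower category is an assumption; with N = 29 and k = 6 the thresholds reproduce the h# and h* values quoted in the caption of Table 1.

```python
def leverage_thresholds(n_train, n_descriptors):
    """Return (h#, h*) = (2k/N, 3k/N)."""
    h_high = 2 * n_descriptors / n_train
    h_warning = 3 * n_descriptors / n_train
    return h_high, h_warning

def classify_leverage(h, h_high, h_warning):
    """Three-way reading of a leverage value (labels are illustrative)."""
    if h <= h_high:
        return "interpolated, reliable"
    if h <= h_warning:
        return "mildly extrapolated, still considered reliable"
    return "outside applicability domain, unreliable"

h_high, h_warning = leverage_thresholds(n_train=29, n_descriptors=6)

# Leverage values copied from Table 1 (molecules 1 and 35).
status_mol1 = classify_leverage(0.26, h_high, h_warning)
status_mol35 = classify_leverage(0.52, h_high, h_warning)
```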

3. RESULTS AND DISCUSSIONS

3.1. QSAR

3.1.1. Descriptor calculation and model building

The descriptors for the complete data set were calculated. The normalization and dimensionality reduction were performed using the methods described in Section 2.1.2. Ideally, descriptors should have no inter-correlation with each other but good correlation with activity. Therefore, the correlation of the descriptors with each other and with activity was checked using a correlation matrix, and descriptors having high correlation with each other and low correlation with activity were discarded. The details of this method are described in Section 2.1.2. Finally, 128 descriptors which had high correlation with activity and low correlation with each other were selected. To reduce the dimensionality of the data set further, these descriptors were optimized using two different variable selection procedures: (i) the correlation-based feature selection method with best first and greedy stepwise search methods and (ii) Chi square based feature selection with the ranker search method. Eight descriptors were top ranked in both the correlation-based and the Chi square based feature selection methods, so these descriptors were used for QSAR model building. The complete data set of 39 molecules was used for 10-fold cross validation in the feature selection procedures. The data were partitioned into two statistically representative sets, training (29 molecules) and test (10 molecules), using the cross validation partition function. A statistically significant model was obtained using MLR in MATLAB with r2 = 0.898 and r2cv = 0.800 (Equation-1). The influence of each descriptor on r2 was studied by removing them one by one to reach the 5:1 ratio for the best QSAR model equation. Finally, 6 descriptors were selected, giving r2 = 0.891 for the training set of 29 molecules. The selected descriptors are Mor22m, H1m, Q2, DELS, BIC5 and Hy (Table 1) [Equation-2 for the final QSAR model].

Model equation-1

pIC50 = 5.328 + 0.059*Mor22m - 0.251*HVcpx - 0.503*H1m - 0.427*Q2 + 0.065*DELS - 0.293*Yindex + 0.326*Hy + 0.228*BIC5

N = 29; r2 = 0.898; RMS = 0.0754; r2cv = 0.800; F-value = 33.10

Model equation-2

pIC50 = 5.28 + 0.065*Mor22m - 0.456*H1m - 0.512*Q2 + 0.203*DELS + 0.169*BIC5 + 0.267*Hy

N = 29; r2 = 0.891; RMS = 0.077; r2cv = 0.825; r2cv5fold = 0.860; F-value = 34.097

Table 2 shows the selected descriptors, their definitions and classes. The data set is small, comprising 39 molecules. Splitting these molecules into test and training sets may result in the exclusion of some molecular features from model building. If the data set is already large, then making it larger will not make predictions much better. But if the data set is small, making it yet smaller will make predictions much worse [49]. To address this issue, r2cv was calculated at different levels [leave one out (LOO) and leave 5-out (5-fold)] for the training data and for the entire data set to check the model predictability and robustness even though the data set is small. The LOO r2cv and r2cv5fold for the entire data set were 0.820 and 0.816, respectively. Hence, the selected descriptors (using both methods, correlation and Chi square based) are optimal to establish the structure-activity relationship of this data set. Moreover, the LOO r2cv and r2cv5fold for the training set molecules were 0.825 and 0.860, respectively, which indicates good predictability of the developed model.

Table 2. Selected Descriptors in Final QSAR Model

Descriptor  Definition                                                      Descriptor Class
Mor22m      3D-MoRSE - signal 22 / weighted by atomic masses (m)            3D-MoRSE
H1m         H autocorrelation of lag 1 / weighted by atomic masses (m)      GETAWAY
Q2          Sum of the squares of the atomic charges                        Charge
DELS        Electro topological variation                                   Electro-Topological
BIC5        Bond information content - neighborhood symmetry of 5-order     Information
Hy          Hydrophilic factor                                              Molecular Properties

For evaluation of the external predictive power of the generated model, the model was applied for the prediction of


pIC50 values of the test set (10 molecules), which was not part of the training set during model development. r2pred and RMS for the test molecules were found to be 0.849 and 0.045, respectively. The calculated pIC50 values for training and test set molecules are summarized in Table 1. The actual activities of the training (a) and test (b) sets are plotted against the predicted activities in Fig. (1).

A brief explanation of the six descriptors that appear in the final model follows. The 3D-MoRSE (3D Molecule Representation of Structures based on Electron diffraction) [50] descriptors appearing in the model are important because they take into account the 3D arrangement of the atoms and do not depend on the molecular size. Thus, they are applicable to a large number of molecules with great structural variance and provide a characteristic common to all of them. This type of index is based on the idea of obtaining information from the 3D atomic coordinates by the transform used in electron diffraction studies for preparing theoretical scattering curves. A generalized scattering function, called the molecular transform, can be used as the functional dependence for deriving, from a known molecular structure, the specific analytic relationship of both X-ray and electron diffraction. In order to take into account the specific contributions of the atoms to the property being studied, different atomic properties can be employed as weighting schemes. The 3D-MoRSE codes have great potential for the representation of molecular structure. It is worth noting that they reflect the 3D arrangement of the atoms of a molecule and do not depend on chemical bonds. Thus, the 3D-MoRSE code can reveal the skeleton, size and substituent information of a molecule. Here, Mor22m corresponds to signal 22 weighted by atomic masses.
Fig. (1). Scatter plot for (a) training and (b) test set between experimental versus predicted activity values. (Axes: Pred pIC50 versus Actual pIC50.)

GETAWAY [51] (GEometry, Topology, and Atom-Weights AssemblY) descriptors are used for structure-property correlations and for molecular profile studies suitable for similarity/diversity analysis. Unlike the Moreau-Broto autocorrelations, GETAWAYs are geometrical descriptors encoding information on the effective position of substituents and fragments in the molecular space, thereby describing differences in congeneric series of molecules. These descriptors try to match the 3D molecular geometry provided by the Molecular Influence Matrix (H) and atom relatedness by topology with chemical information by using different atomic weighting schemes (unit weights, mass, polarizability, electronegativity). H1m corresponds to the autocorrelation of lag 1 weighted by atomic masses.

$$H = M (M^{T} M)^{-1} M^{T}$$

where M is the molecular matrix consisting of the centered Cartesian coordinates x, y, z of the atoms of the molecule (hydrogens included) in a chosen conformation, and the superscript T refers to the transposed matrix. This equation is closely similar to the leverage equation (h); H encodes atomic information and represents the "influence" of each atom in determining the whole shape of the molecule [51].

Total squared charge (Q2) is a measure of molecular polarity calculated by summing the squares of the atomic charges. This charge descriptor is an electronic descriptor defined in terms of atomic charges and is used to describe electronic aspects both of the whole molecule and of particular regions, such as atoms, bonds and molecular fragments.

DELS is the molecular electrotopological variation, an electrotopological descriptor derived from the hydrogen-depleted molecular graph representation of the molecule. It can be sensitive to one or more structural features of the molecule such as size, shape, symmetry, branching and cyclicity and can also encode chemical information concerning atom type and bond multiplicity. It is described by this formula:

$$DELS = \sum_i |\Delta I_i|$$

where ΔIi is the field effect on the ith atom due to the perturbation of all other atoms as defined by Kier and Hall [52]:

$$\Delta I_i = \sum_j \frac{I_i - I_j}{(d_{ij} + 1)^2}$$

where the sum runs over all the other atoms in the molecular graph, I is the atomic intrinsic state and d the topological distance between the two considered atoms. The intrinsic


state of an atom is calculated as the ratio between the Kier-Hall atomic electronegativity and the vertex degree, i.e. the number of bonds of the atom, encoding information related to both the partial charges of atoms and their topological position relative to the whole molecule. Therefore, DELS is simply the sum over all atoms of the intrinsic state differences and could be a measure of total charge transfer in the molecule.

Bond Information Content (BIC5; 5 stands for neighbourhood symmetry of order 5) is an information index describing the information content of molecules. Indices of neighbourhood symmetry (like BICk, k = 1 to 5) are topological information indices calculated for an H-included molecular graph and based on neighbour degrees and edge multiplicity [53]. They are calculated by partitioning graph vertices into equivalence classes; two vertices are topologically equivalent if their neighbourhoods of the kth order are the same. Different criteria are used for defining equivalence classes, i.e. the equivalency of atoms in a molecule, such as chemical identity, ways of bonding through space, molecular topology and symmetry.

Hy is the hydrophilicity index, which is a function of the number of hydrophilic groups in the molecule and the number of carbon atoms. It belongs to a set of heterogeneous molecular descriptors describing physicochemical and biological properties as well as some molecular characteristics obtained from literature models [54].

According to the QSAR model (Equation-2), the electronic properties (Q2, DELS) have great influence on activity. The parameter Q2 contributes negatively to the activity. Molecules 9-10, 12-13, 15-16 and 18-19 (-NO2 group) have high positive charge, whereas molecules 11, 14, 17 and 20-39 (-OH group) have high negative charge.
Because of these differences, these molecules may exhibit low (molecules 9-10, 12-13, 15-16 and 18-19) to high (molecules 11, 14, 17 and 20-39) inhibitory activity. The other electrotopological parameter, DELS, contributes positively to the model equation. Molecules 34-35 and 38-39 have three -OH groups, ester (-OCO-) and carboxylic acid (-COOH) groups in their structures, which may impart favourable topology and high electron density to the molecules (Table S1 in supporting information). Because these groups contain highly electronegative atoms, they can participate in metal ion coordination, resulting in enzyme inhibition (as reported in docking studies of molecules 29, 32, 34, 35 and 39 [28]). This may be the reason for the higher activity of these molecules. Molecules 1-8, in contrast, have -Cl groups in their structures. The presence of this group may cause a low value of DELS and hence contribute to low activity (Table 1). On the other hand, the active hydrogen of the -OH group participated in H-bonding interactions with active site residues, as found in docking studies of the highly active molecules 29, 32, 34, 35 and 39 [28]. Owing to these groups (-OH, -OCO- and -COOH), these molecules may exhibit better inhibitory activity than molecules 1-4 and 6-8 in in vitro studies. The high contribution of these electronic parameters in the developed model is in concordance with the high contribution of polar interactions [cation-π interactions, electrostatic field, H-bonding (HBA and HBD)] in docking, CoMFA,

Current Computer-Aided Drug Design, 2013, Vol. 9, No. 1

147

and pharmacophore mapping of these series as previously published [28]. The DELS positively contributed to shape, size, symmetry and branching of structures as BIC5. All the molecules have tricyclic ring system with different substitutions at different position that render to different geometry (shape and size) and topology of molecules, so exhibit varying inhibitory activity. BIC5 and Mor2m have positive contribution for defining the inhibitory activity in developed model. The molecules 32-39 had -OH substitutions at both side on phenyl rings and -OCO- or COOH groups in middle ring as not found in other molecules. Because of these substitution, these molecules may be represented by high value of BIC5 and Mor22m in Table 1. These properties may impart favourable shape and size to the molecules which directly related to steric property that contributes to high activity of these molecules (Table S1 in supporting information). However, electronic properties (Q2 and DELS) had been found high for these molecules (as described above). H1m have negative contribution in QSAR equation. Like high value of Q2 descriptors value for molecules 9-10, 1213, 15-16 and 18-19 (-NO2 group), H1m had shown similar effect for these molecules as well (described above). As described above (detail of H1m), this property was related to substitution effect [electron withdrawing –NO2 (molecules 9, 10, 12, 13, 15, 16, 18 and 19) and -Cl (molecules 1-8)] or donating -OH (most of the molecules) at meta and para position]. These substitution effect may play important role for defining their activity from low to high (Table 1 and Table S1 in supporting information). The higher activity of molecules having -OH groups may be imparted with the sufficient capability to take part into binding of molecules into the active site (high contribution in docking interactions). 
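For reference, the electrotopological quantities behind DELS, which recurs throughout the discussion above, can be written compactly. The formulas below follow the standard Kier-Hall definitions as summarized in the descriptor literature [44, 52]; they are offered as a hedged reminder, and whether the descriptor software used in this study implements them in exactly this form is an assumption.

$$I_i = \frac{(2/L_i)^2\,\delta_i^{v} + 1}{\delta_i}, \qquad \Delta I_i = \sum_{j \neq i} \frac{I_i - I_j}{(d_{ij} + 1)^2}, \qquad \mathrm{DELS} = \sum_i \left|\Delta I_i\right|$$

Here $L_i$ is the principal quantum number of atom $i$, $\delta_i^{v}$ and $\delta_i$ are its valence and simple vertex degrees, and $d_{ij}$ is the topological distance between atoms $i$ and $j$. Atoms that differ strongly in intrinsic state from their near neighbours (for example the oxygen atoms of -OH and -COOH groups) give large $|\Delta I_i|$ terms and therefore raise DELS.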
Similarly, molecules 11, 14, 17, 20-22, 24, 25, 28-30, 32 and 37, which have -OH substitution on both phenyl rings and high negative values of Q2, also had high negative values of H1m, leading to high activity. Both Q2 and H1m contributed equally to the activity of the series. However, the -OCO- and -COOH groups add steric bulk, as the size and mass of molecules 32-39 are increased. This may explain the high activity of these molecules compared with molecules lacking these groups (see Table S1 and Table 1). These properties are related to the topological descriptors (DELS and BIC5). This was supported by the CoMFA results, in which the steric field is also required for binding of the molecules, but to a lesser extent than the electrostatic field.

Moreover, the hydrophilicity descriptor Hy contributes positively to the inhibitory activity. An increase in hydrophilic groups [-OH (molecules 11, 14, 20-27, 30, 31, 33-35 and 37-39) and -COOH (molecules 33, 35, 37 and 39)] may lead to higher activity. Molecules 1-4 and 6-8 (-Cl) and 9-10, 12-13, 15-16 and 18-19 (-NO2) exhibited low activity, which can be explained by the lack of hydrophilic groups in their structures (Table S1 in supporting information). The hydrophilicity index also influences the other properties, such as Q2, DELS and H1m, as described above. If the number of such groups, e.g. -OH groups on both phenyl rings, is increased, the potency of the molecule may increase. Other properties such as Q2, DELS and H1m may also be affected in this case.

As a result of this analysis, the presence of -OH groups greatly affects the activity profile of these molecules, as it contributes directly to the Q2, DELS, H1m and Hy properties. In addition, the -OCO- and -COOH groups impart favourable shape, size and geometry, which are related to H1m, BIC5 and Mor22m. These properties also affect the steric (shape and size) properties of the molecules. Isosteric replacement [groups with similar size (steric) and electronic characteristics] of the -OH, -OCO- and -COOH substituents may increase the potency of the molecules, as the Q2, DELS, H1m and Hy properties may change accordingly.

No single property determined the activity profile of the series; rather, all of them acted together. One or two properties in the model equation were attenuated or augmented by other properties owing to structural changes. For example, the high values of Q2, H1m, DELS and BIC5 for molecule 32 contributed more to making it active than Mor22m and Hy (which had negative values, whereas their signs in the model equation are positive). In contrast, for molecule 16 the values of Q2, BIC5, Mor22m and Hy contributed much less to making it inactive than H1m (negative value) and DELS (positive value). The overall effect of all the descriptors in the equation on the activity of the two molecules followed the differences in their structures (see Table S1 in supporting information), making them inactive or active (described above). These properties were therefore found to be important for the HIV-1 IN inhibitory activity of the series during QSAR model analysis. Their combined application can be exploited for designing new inhibitors with good potency.
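The way individual descriptors augment or attenuate one another in a linear model can be made concrete by splitting a prediction into its additive terms. The sketch below is purely illustrative: only the signs of the coefficients follow the discussion above (Q2 and H1m negative; DELS, BIC5, Mor22m and Hy positive), while the magnitudes, the intercept, the descriptor values and the function names are hypothetical placeholders rather than the values of Equation-2 or Table 1.

```python
# Hypothetical coefficients: only the SIGNS follow the text; the magnitudes
# are placeholders and are NOT the coefficients of the published equation.
COEFFS = {"Q2": -1.0, "DELS": 1.0, "BIC5": 1.0, "Mor22m": 1.0, "H1m": -1.0, "Hy": 1.0}
INTERCEPT = 0.0  # placeholder intercept


def contributions(descriptors):
    """Return each descriptor's additive term (coefficient * value)."""
    return {name: COEFFS[name] * value for name, value in descriptors.items()}


def predict(descriptors):
    """Predicted activity as intercept plus the sum of all descriptor terms."""
    return INTERCEPT + sum(contributions(descriptors).values())


# Made-up, scaled descriptor values for an illustrative "active" molecule:
# negative Q2 and H1m values help (negative coefficients), while negative
# Mor22m and Hy values work against the positive coefficients.
example = {"Q2": -0.8, "DELS": 1.2, "BIC5": 0.5, "Mor22m": -0.3, "H1m": -0.6, "Hy": -0.2}
terms = contributions(example)
favourable = sorted(name for name, term in terms.items() if term > 0)
unfavourable = sorted(name for name, term in terms.items() if term < 0)
```

In this toy case four terms raise the prediction and two lower it, which mirrors the qualitative pattern described for molecule 32 without reproducing its actual numbers.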
No single property dominates the model equation in making a molecule active or inactive, because the descriptor selection method chooses an optimum set of descriptors (described in Section 2.1.2); instead, other properties may augment or attenuate the effect of one or more properties by contributing positively or negatively to the model equation, respectively. Some structural changes may dramatically change the values of other correlated properties. Generally, inactive molecules have more negative values for positively contributing descriptors, and vice versa, whereas active molecules have more positive values for positively contributing descriptors, and vice versa. Sometimes one or more descriptors act on the model equation in the opposite direction, which can make a molecule inactive or active. This depends on the structure and its correlated properties (descriptors). This hypothesis is used to describe the relation between each descriptor's values and activity.

3.1.2. Model Validation and Predictive Ability

The quality of the developed model is assessed by the high values of r2 (0.891), r2cv (0.825) and F (34.097) and the low RMS value (0.077). This indicates that the developed model is statistically significant. The model explains 89.10% of the variance of the total data set, which means that it fits the training data well. Moreover, the high r2pred (0.849) for the test set indicates good predictive ability. It is evident from this analysis that the developed model can also be used confidently to predict the anti-HIV-1 IN activity of similar molecules. The predicted values for the test set were found to be very close to their actual activity values, as shown in Table 1. The outcomes of the statistical analysis showed that the developed model was statistically robust. The developed model also passed Tropsha's recommended tests for predictive ability:

$$r^2_{pred} = 0.849 > 0.6$$

$$\frac{R^2 - R_0^2}{R^2} = 0.0059 < 0.1$$

$$\frac{R^2 - R_0'^2}{R^2} = 0.0082 < 0.1$$
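The external validation criteria above can be checked numerically from the observed and predicted test set activities. The following is a minimal Python sketch, not the authors' code: the function name and the example activities are hypothetical, and the regression-through-origin definitions of k, k', R0^2 and R'0^2 follow one common convention of the Golbraikh and Tropsha tests [45], which is assumed here. The sketch also returns the r2m metric of Roy and Roy [46], taken as r2 (1 - sqrt|r2 - r02|).

```python
import math


def tropsha_checks(y_obs, y_pred):
    """Golbraikh-Tropsha style statistics for an external test set."""
    n = len(y_obs)
    mean_y = sum(y_obs) / n
    mean_p = sum(y_pred) / n
    syy = sum((y - mean_y) ** 2 for y in y_obs)
    spp = sum((p - mean_p) ** 2 for p in y_pred)
    syp = sum((y - mean_y) * (p - mean_p) for y, p in zip(y_obs, y_pred))
    r2 = syp ** 2 / (syy * spp)  # squared Pearson correlation

    # Slopes of the regression lines forced through the origin.
    cross = sum(y * p for y, p in zip(y_obs, y_pred))
    k = cross / sum(p * p for p in y_pred)        # observed vs predicted
    k_prime = cross / sum(y * y for y in y_obs)   # predicted vs observed

    # Determination coefficients for the through-origin regressions.
    r02 = 1.0 - sum((y - k * p) ** 2 for y, p in zip(y_obs, y_pred)) / syy
    r02_prime = 1.0 - sum((p - k_prime * y) ** 2 for y, p in zip(y_obs, y_pred)) / spp

    return {
        "r2": r2,
        "k": k,
        "k_prime": k_prime,
        "ratio": (r2 - r02) / r2,
        "ratio_prime": (r2 - r02_prime) / r2,
        "r2m": r2 * (1.0 - math.sqrt(abs(r2 - r02))),
    }


# Hypothetical activities for illustration only (not taken from Table 1).
observed = [5.0, 5.5, 6.0, 6.5, 7.0]
predicted = [5.1, 5.4, 6.1, 6.4, 7.1]
stats = tropsha_checks(observed, predicted)
```

As a consistency check, and assuming that R2 equals the reported 0.849, the reported ratio of 0.0059 gives r2m = 0.849 × (1 − sqrt(0.0059 × 0.849)) ≈ 0.789, which agrees with the R2m value reported for the model.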

The slopes of the regression lines through the origin were k = 0.965 and k' = 1.03, both within the recommended range of 0.85 to 1.15.

In addition, the R2m metric was calculated to evaluate the predictive ability of the model; its value of 0.789 was within the acceptable range (>0.5), indicating good predictive ability. The developed model was further validated by a randomization test. The models built on randomly generated data scored much lower than the original QSAR model (r2mean = 0.0085 for 100 random samples). As evident from this test, the original QSAR model was not obtained by chance, which enhances confidence in its robustness. These results show that the MLR technique combined with a successful variable selection methodology was crucial for generating a QSAR model for modeling and predicting curcumine derivatives as HIV-1 IN inhibitors.

3.2. Applicability Domain Analysis

The developed model is statistically robust and significant. However, it cannot be expected to predict the activity of every new molecule reliably, so an applicability domain (AD) analysis was performed to assess the prediction reliability for each new molecule using the extent of extrapolation method [47]. The h* and h# limits for the developed model are 0.621 and 0.414, respectively. According to this method, only predictions for molecules whose leverages lie within the AD are reliable. The leverage values for the training and test sets are given in Table 1. The leverages of the test set molecules lay within the limit, so the predictions for the test set are considered reliable, as these molecules fall within the AD of the training set. However, molecules 35 (0.52) and 39 (0.47) had leverages greater than the high leverage h# but less than the warning leverage h*. These molecules were not far from the domain of reliable prediction and can be considered extrapolations; hence, their predictions were considered reliable. The rest of the studied molecules had leverages below both h* and h#, which means that all molecules lay within the AD of the developed model. This methodology thus strengthens the reliability of the model for predicting the HIV-1 IN inhibitory activity of molecules.

4. CONCLUSIONS

In this study, a QSAR model of curcumine derivatives was built for HIV-1 IN inhibitory activity. The developed model was significant and robust, with good internal (r2cv = 0.825) and external (r2pred = 0.849) prediction. The model explained 89.10% of the variance in the experimental activity with acceptable predictive power. An applicability domain analysis was also performed for the test set molecules, which were excluded during model generation, to check the reliability of their predictions. These molecules occupied the same domain as the training set molecules, and hence their predictions were considered reliable. The molecular descriptors in the QSAR equation encode information about the size, shape, symmetry, branching and cyclicity of the molecules. In addition, electronic effects, substituent effects and hydrophilicity were found to be important parameters for the inhibitory activity, in concordance with the published results of docking studies, CoMFA and pharmacophore mapping. These physicochemical parameters can serve as important footprints for designing novel and potent inhibitors. The developed model may also be used for screening new molecules to estimate their HIV-1 IN inhibitory activities in silico.
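The internal validation, the randomization test and the leverage-based applicability domain summarized above could be reproduced along the following lines. This is a minimal sketch in Python with NumPy, not the authors' workflow: the function names and the synthetic data are illustrative assumptions, and the leverage is computed on the raw descriptor matrix without an intercept column. The limits 3p/n and 2p/n are an inference; they reproduce the reported h* = 0.621 and h# = 0.414 for p = 6 descriptors and n = 29 training molecules, but neither the formula nor n is stated in this section.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares MLR fit with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predicted activities for the descriptor matrix X."""
    return coef[0] + X @ coef[1:]


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def loo_q2(X, y):
    """Leave-one-out cross-validated r2: refit without molecule i, then predict it."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def y_randomization(X, y, n_trials=100, seed=0):
    """Mean r2 of models refitted on randomly permuted activities."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        y_perm = rng.permutation(y)
        scores.append(r_squared(y_perm, predict_mlr(fit_mlr(X, y_perm), X)))
    return float(np.mean(scores))


def leverages(X_train, X_query):
    """h_i = x_i^T (X^T X)^-1 x_i for each row x_i of X_query."""
    xtx_inv = np.linalg.inv(X_train.T @ X_train)
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def leverage_limits(n_train, n_descriptors):
    """Warning (h*) and high (h#) leverage limits, assumed to be 3p/n and 2p/n."""
    return 3.0 * n_descriptors / n_train, 2.0 * n_descriptors / n_train


# Synthetic demonstration data (NOT the paper's descriptors or activities).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(29, 6))
y_demo = X_demo @ np.ones(6) + 0.1 * rng.normal(size=29)
h_star, h_sharp = leverage_limits(n_train=29, n_descriptors=6)
```

Under these assumptions, a query molecule with a leverage above h* would be treated as lying outside the domain, whereas one between h# and h*, as reported for molecules 35 and 39, would be treated as a mild extrapolation.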

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

ACKNOWLEDGEMENTS

Pawan Gupta acknowledges the NIPER for providing fellowship.

SUPPLEMENTARY MATERIAL

Supplementary material is available on the publisher's web site along with the published article.

REFERENCES

[1] Mehellou, Y.; De Clercq, E. Twenty-six years of anti-HIV drug discovery: where do we stand and where do we go? J. Med. Chem., 2010, 53, 521-538.
[2] Hicks, C.; Gulick, R.M. Raltegravir: the first HIV type 1 integrase inhibitor. Clin. Infect. Dis., 2009, 48, 931-939.
[3] Shimura, K.; Kodama, E.; Sakagami, Y.; Matsuzaki, Y.; Watanabe, W.; Yamataka, K.; Watanabe, Y.; Ohata, Y.; Doi, S.; Sato, M. Broad antiretroviral activity and resistance profile of the novel human immunodeficiency virus integrase inhibitor elvitegravir (JTK-303/GS-9137). J. Virol., 2008, 82, 764-774.
[4] Dayam, R.; Al-Mawsawi, L.Q.; Neamati, N. HIV-1 integrase inhibitors: an emerging clinical reality. Drugs R. D., 2007, 8, 155-168.
[5] De Clercq, E. Anti-HIV drugs: 25 compounds approved within 25 years after the discovery of HIV. Int. J. Antimicrob. Agents, 2009, 33, 307-320.
[6] McColl, D.J.; Chen, X. Strand transfer inhibitors of HIV-1 integrase: bringing IN a new era of antiretroviral therapy. Antiviral Res., 2010, 85, 101-118.
[7] Hu, L.; Zhang, S.; He, X.; Luo, Z.; Wang, X.; Liu, W.; Qin, X. Design and synthesis of novel β-diketo derivatives as HIV-1 integrase inhibitors. Bioorg. Med. Chem., 2012, 20, 177-182.
[8] Vandurm, P.; Guiguen, A.; Cauvin, C.; Georges, B.; Le Van, K.; Michaux, C.; Cardona, C.; Mbemba, G.; Mouscadet, J.F.; Hevesi, L. Synthesis, biological evaluation and molecular modeling studies of quinolonyl diketo acid derivatives: New structural insight into the HIV-1 integrase inhibition. Eur. J. Med. Chem., 2011, 46, 1749-1756.
[9] Brzozowski, Z.; Slawinski, J.; Saczewski, F.; Sanchez, T.; Neamati, N. Synthesis, anti-HIV-1 integrase, and cytotoxic activities of 4-chloro-N-(4-oxopyrimidin-2-yl)-2-mercaptobenzenesulfonamide derivatives. Eur. J. Med. Chem., 2008, 43, 1188-1198.
[10] Bodiwala, H.S.; Sabde, S.; Gupta, P.; Mukherjee, R.; Kumar, R.; Garg, P.; Bhutani, K.K.; Mitra, D.; Singh, I.P. Design and synthesis of caffeoyl-anilides as portmanteau inhibitors of HIV-1 integrase and CCR5. Bioorg. Med. Chem., 2010, 19, 1256-1263.
[11] Zouhiri, F.; Mouscadet, J.F.; Mekouar, K.; Desmaele, D.; Savoure, D.; Leh, H.; Subra, F.; Le Bret, M.; Auclair, C.; d'Angelo, J. Structure-activity relationships and binding mode of styrylquinolines as potent inhibitors of HIV-1 integrase and replication of HIV-1 in cell culture. J. Med. Chem., 2000, 43, 1533-1540.
[12] Sechi, M.; Rizzi, G.; Bacchi, A.; Carcelli, M.; Rogolino, D.; Pala, N.; Sanchez, T.W.; Taheri, L.; Dayam, R.; Neamati, N. Design and synthesis of novel dihydroquinoline-3-carboxylic acids as HIV-1 integrase inhibitors. Bioorg. Med. Chem., 2009, 17, 2925-2935.
[13] Jin, H.; Metobo, S.; Jabri, S.; Mish, M.; Lansdown, R.; Chen, X.; Tsiang, M.; Wright, M.; Kim, C.U. Tricyclic HIV integrase inhibitors V. SAR studies on the benzyl moiety. Bioorg. Med. Chem. Lett., 2009, 19, 2263-2265.
[14] Donghi, M.; Kinzel, O.D.; Summa, V. 3-Hydroxy-4-oxo-4H-pyrido[1,2-a]pyrimidine-2-carboxylates-A new class of HIV-1 integrase inhibitors. Bioorg. Med. Chem. Lett., 2009, 19, 1930-1934.
[15] Lin, Z.; Neamati, N.; Zhao, H.; Kiryu, Y.; Turpin, J.A.; Aberham, C.; Strebel, K.; Kohn, K.; Witvrouw, M.; Pannecouque, C. Chicoric acid analogs as HIV-1 integrase inhibitors. J. Med. Chem., 1999, 42, 1401-1414.
[16] Yu, S.; Zhang, L.; Yan, S.; Wang, P.; Sanchez, T.; Christ, F.; Debyser, Z.; Neamati, N.; Zhao, G. Nitrogen-containing polyhydroxylated aromatics as HIV-1 integrase inhibitors: synthesis, structure-activity relationship analysis, and biological activity. J. Enzyme Inhib. Med. Chem., 2012, 27, 628-640.
[17] Sharma, H.; Patil, S.; Sanchez, T.W.; Neamati, N.; Schinazi, R.F.; Buolamwini, J.K. Synthesis, biological evaluation and 3D-QSAR studies of 3-keto salicylic acid chalcones and related amides as novel HIV-1 integrase inhibitors. Bioorg. Med. Chem., 2011, 19, 2030-2045.
[18] De Luca, L.; Barreca, M.L.; Ferro, S.; Christ, F.; Iraci, N.; Gitto, R.; Monforte, A.M.; Debyser, Z.; Chimirri, A. Pharmacophore-based discovery of small-molecule inhibitors of protein-protein interactions between HIV-1 integrase and cellular cofactor LEDGF/p75. ChemMedChem, 2009, 4, 1311-1316.
[19] De Luca, L.; De Grazia, S.; Ferro, S.; Gitto, R.; Christ, F.; Debyser, Z.; Chimirri, A. HIV-1 integrase strand-transfer inhibitors: design, synthesis and molecular modeling investigation. Eur. J. Med. Chem., 2011, 46, 756-764.
[20] Liao, C.; Nicklaus, M.C. Computer tools in the discovery of HIV-1 integrase inhibitors. Future Med. Chem., 2011, 2, 1123-1140.
[21] Almerico, A.M.; Tutone, M.; Ippolito, M.; Lauria, A. Molecular modelling and QSAR in the discovery of HIV-1 integrase inhibitors. Curr. Comput. Aided Drug Des., 2007, 3, 214-233.
[22] Cheng, Z.; Zhang, Y.; Fu, W. QSAR study of carboxylic acid derivatives as HIV-1 Integrase inhibitors. Eur. J. Med. Chem., 2010, 45, 3970-3980.
[23] Ravichandran, V.; Shalini, S.; Sundram, K.; Sokkalingam, A.D. QSAR study of substituted 1,3,4-oxadiazole naphthyridines as HIV-1 integrase inhibitors. Eur. J. Med. Chem., 2010, 45, 2791-2797.
[24] Ghasemi, G.; Nirouei, M.; Shariati, S.; Abdolmaleki, P.; Rastgoo, Z. A quantitative structure-activity relationship study on HIV-1 integrase inhibitors using genetic algorithm, artificial neural networks and different statistical methods. Arab. J. Chem., 2011, http://dx.doi.org/10.1016/j.arabjc.2011.1003.1006.
[25] Leonard, J.T.; Roy, K. Exploring molecular shape analysis of styrylquinoline derivatives as HIV-1 integrase inhibitors. Eur. J. Med. Chem., 2008, 43, 81-92.
[26] Toropova, A.P.; Toropov, A.A.; Benfenati, E.; Gini, G. Simplified Molecular Input-Line Entry System and International Chemical Identifier in the QSAR Analysis of Styrylquinoline Derivatives as HIV-1 Integrase Inhibitors. Chem. Biol. Drug Des., 2011, 77, 343-360.
[27] Vengurlekar, S.; Sharma, R.; Trivedi, P. Two- and three-dimensional QSAR studies on benzyl amide-ketoacid inhibitors of HIV integrase and their reduced analogs. Med. Chem. Res., 2010, 19, 1106-1120.
[28] Gupta, P.; Garg, P.; Roy, N. Comparative docking and CoMFA analysis of curcumine derivatives as HIV-1 integrase inhibitors. Mol. Divers., 2011, 15, 733-750.
[29] Costi, R.; Santo, R.D.; Artico, M.; Massa, S.; Ragno, R.; Loddo, R.; La Colla, M.; Tramontano, E.; La Colla, P.; Pani, A. 2,6-Bis(3,4,5-trihydroxybenzylydene) derivatives of cyclohexanone: novel potent HIV-1 integrase inhibitors that prevent HIV-1 multiplication in cell-based assays. Bioorg. Med. Chem., 2004, 12, 199-215.
[30] Hawkins, D.M. The problem of overfitting. J. Chem. Inf. Comput. Sci., 2004, 44, 1-12.
[31] Gupta, P.; Garg, P.; Roy, N. Identification of novel HIV-1 Integrase inhibitors using shape-based screening, QSAR, and docking approach. Chem. Biol. Drug Des., 2012, 79, 835-849.
[32] Chernick, M.R.; Friis, R.H. Introductory biostatistics for the health sciences: Modern applications including the bootstrap, John Wiley & Sons: NJ, USA, 2003.
[33] Van Belle, G.; Fisher, L. Biostatistics: a methodology for the health sciences, John Wiley & Sons, Inc.: Hoboken, New Jersey, 2004.
[34] Randić, M. Comparative regression analysis. Regressions based on a single descriptor. Croat. Chem. Acta, 1993, 66, 289-289.
[35] Randić, M. Fitting of nonlinear regressions by orthogonalized power series. J. Comput. Chem., 1993, 14, 363-370.
[36] Tropsha, A.; Gramatica, P.; Gombar, V.K. The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci., 2003, 22, 69-77.
[37] Kiralj, R.; Ferreira, M.M.C. Basic validation procedures for regression models in QSAR and QSPR studies: theory and application. J. Braz. Chem. Soc., 2009, 20, 770-787.
[38] Aptula, A.O.; Jeliazkova, N.G.; Schultz, T.W.; Cronin, M.T.D. The better predictive model: High q2 for the training set or low root mean square error of prediction for the test set? QSAR Comb. Sci., 2005, 24, 385-396.
[39] Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 2009, 11, 10-18.
[40] Hall, M.A., Ph. D. Thesis, The University of Waikato, Hamilton, New Zealand, 1999.
[41] Witten, I.H.; Frank, E. Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann: San Francisco, CA, 2005.
[42] Todeschini, R.; Consonni, V.; Mauri, A.; Pavan, M. Detecting "bad" regression models: multicriteria fitness functions in Regression analysis. Anal. Chim. Acta, 2004, 515, 199-208.
[43] Afantitis, A.; Melagraki, G.; Sarimveis, H.; Igglessi-Markopoulou, O.; Kollias, G. A novel QSAR model for predicting the inhibition of CXCR3 receptor by 4-N-aryl-[1,4] diazepane ureas. Eur. J. Med. Chem., 2009, 44, 877-884.
[44] Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors, Wiley-VCH: Weinheim, 2000.
[45] Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model., 2002, 20, 269-276.
[46] Roy, P.P.; Roy, K. On some aspects of variable selection for partial least squares regression models. QSAR Comb. Sci., 2008, 27, 302-313.
[47] Atkinson, A.C. Plots, Transformations and Regression, Clarendon Press: Oxford, 1985.
[48] Stanforth, R.W.; Kolossov, E.; Mirkin, B. A measure of domain of applicability for QSAR modelling based on intelligent K means clustering. QSAR Comb. Sci., 2007, 26, 837-844.
[49] Hawkins, D.M.; Kraker, J. Deterministic fallacies and model validation. J. Chemometr., 2010, 24, 188-193.
[50] Schuur, J.H.; Selzer, P.; Gasteiger, J. The coding of the three-dimensional structure of molecules by molecular transforms and its application to structure-spectra correlations and studies of biological activity. J. Chem. Info. Compt. Sci., 1996, 36, 334-344.
[51] Consonni, V.; Todeschini, R.; Pavan, M. Structure/response correlations and similarity/diversity analysis by GETAWAY descriptors. 1. Theory of the novel 3D molecular descriptors. J. Chem. Info. Comp. Sci., 2002, 42, 682-692.
[52] Kier, L.B.; Hall, L.H.; Frazer, J.W. An index of electrotopological state for atoms in molecules. J. Med. Chem., 1991, 7, 229-241.
[53] Magnuson, V.R.; Harriss, D.K.; Basak, S.C. Topological Indices Based on Neighborhood Symmetry: Chemical and Biological Applications in Studies in Physical and Theoretical Chemistry, Elsevier: Amsterdam, Netherlands, 1983.
[54] Todeschini, R.; Vighi, M.; Finizio, A.; Gramatica, P. 3D-Modelling and prediction by WHIM descriptors. Part 8. Toxicity and physicochemical properties of environmental priority chemicals by 2D-TI and 3D-WHIM Descriptors. SAR QSAR Environ. Res., 1997, 7, 173-193.

Received: October 5, 2011

Revised: December 9, 2011

Accepted: November 16, 2012