Int. J. Peptide Protein Res. 32, 1988, 269-278

Amino acid side chain parameters for correlation studies in biology and pharmacology

JEAN-LUC FAUCHÈRE¹, MARVIN CHARTON², LEMONT B. KIER³, ARIE VERLOOP⁴ and VLADIMIR PLISKA⁵

¹Department of Biotechnology, Swiss Federal Institute of Technology, ETH, Zurich, Switzerland; ²Department of Chemistry, Pratt Institute, Brooklyn, New York; ³Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, Virginia, USA; ⁴1412 AR Naarden, The Netherlands; ⁵Institute of Animal Science, Swiss Federal Institute of Technology, ETH, Zurich, Switzerland

Received 25 February, accepted for publication April 1988

Fifteen physicochemical descriptors of side chains of the 20 natural and of 26 non-coded amino acids are compiled and simple methods for their evaluation described. The relevance of these parameters to account for hydrophobic, steric, and electric properties of the side chains is assessed and their intercorrelation analyzed. It is shown that three principal components, one steric, one bulk, and one electric (electronic), account for 66% of the total variance in the available set. These parameters may prove to be useful for correlation studies in series of bioactive peptide analogues.

Key words: amino acid side chain parameters; LFER parameters; QSAR in peptides; QSAR parameters

Amino acids are abbreviated according to the recommendations of the IUPAC-IUB Joint Commission on Biochemical Nomenclature (32). All amino acids have the L-configuration. QSAR, quantitative structure-activity relationships; LFER, linear free energy related; n.m.r., nuclear magnetic resonance.

One of the main limitations of correlation studies for bioactive peptides is the lack of reliable physicochemical amino acid side chain descriptors. This is mainly due to the difficulty in selecting those features which control the peptide-receptor interactions, and in finding conditions under which each property can be individually measured. In previous work, Sneath (1) quantitatively evaluated the similarity and dissimilarity of the 20 natural amino acids. He reasoned that small structural changes should bring about small changes in the biological activity, and concluded on examples in the oxytocin and angiotensin II series that the use of correlations between biological activity and these factors would be "better than chance," although not very reliable for predictive purposes. Neither the side chain properties defined in this work nor the extracted principal components (aliphaticity, hydrogenation, aromaticity, and hydroxythiolation) are convenient for use in QSAR studies, since they are not continuous parameters.

Darvas et al. (2) stated that conventional LFER parameters (3) alone cannot satisfactorily describe peptide-peptide or peptide-protein interactions, and introduced "peptide-tailored" physicochemical descriptors including a summation parameter supposed to represent nonspecific ("aspecific" (2)) side chain interactions which originate in hydrogen bonds, electrostatic and charge transfer effects. Since the summation index contains these effects as a whole, identification of the individual contributions to biological activity is no longer possible.

Recently, Kidera et al. (4) analyzed 188 properties of the naturally occurring amino acids in a remarkable work focused mainly on the prediction of the three-dimensional structure of proteins. In addition to bulk and hydrophobicity, these authors identified β-structure preference and α-helix or bend structure preference as the two main representative factors. These two factors, although of paramount importance in protein folding, are of stochastic nature and cannot be established by direct measurements. For the biological activity of peptide drugs, they can hardly be of greater relevance than features such as charge, aromaticity, or presence of hydrogen bond donors or acceptors. Finally, this study cannot be easily extended to non-natural synthetic amino acids currently used in peptide drug design.

Several examples have demonstrated the usefulness of QSAR studies of bioactive peptides in order to identify the factors effective in binding or proteolysis and to predict more potent, more stable, or more selective analogues (5-8). One convincing example is the study by Hellberg et al. (9) of bradykinin potentiating peptides in which correlations derived from a small number of derivatives modeled and predicted the activity of a large series of analogues. In the preceding studies, no consistent set of side chain parameters was used, each laboratory relying upon its own developed or measured descriptors for correlation or principal components analyses.
The aims of this communication were therefore: (1) to establish a list of selected hydrophobic, steric, electronic, and other parameters for amino acid side chains; (2) to extend the list by our own measurements or computation to a number of unnatural synthetic amino acids; (3) to indicate simple methods for the measurement or predictive calculation of the descriptors from the chemical structure of new side chains; (4) to establish the degree of separation or intercorrelation of the descriptors for the reported values; (5) to identify principal components among the side chain properties described by the parameters.

METHODS

The initial set of structural parameters was generally obtained experimentally by direct measurements of the given property of the amino acid or derivative. This was the case for the hydrophobic constant, the polarizability, the pKa of the corresponding carboxylic acid, and for all the steric constants derived from Taft's constants. The n.m.r. chemical shift of the C_α-carbon of several amino acids was measured here for the first time. The δH_c-values were obtained with the free amino acid in neutral D2O at 20° on a Varian XL300 spectrometer (Prof. J.F. Oth, ETH Zurich) with lock on D2O under proton decoupled conditions and elimination of the Overhauser effect. The reference was the sodium salt of 2,2-dimethyl-2-silapentane-5-sulfonate. Other parameters, such as υ_reg and v_v, could be measured on molecular CPK models (10). Finally, others were theoretically derived, such as the graph shape index. Constants for new amino acid side chains can generally be calculated by empirical rules or obtained from correlations with various molecular features. Details are given here for each individual case. Correlation analysis and search for principal components were performed by programs of the BMDP library (11).

RESULTS AND DISCUSSION

TABLE 1 Side chain parameters (16-parameter set) for natural amino acid side chains (except proline, n…

Amino acid       π       Ξ       υ   υ_reg       L     B_1     B_5       α     v_v    δH_c     σ_l  n_H n_nb  i_B  i_A    pKa
Ala           0.31    1.28    0.52    0.53    2.87    1.52    2.04   0.046     1.0     7.3   -0.01    0    0    0    0   4.76
Arg          -1.01    2.34    0.68    0.69    7.82    1.52    6.24   0.291    6.13    11.1    0.04    4    3    1    0   4.30
Asn          -0.60    1.60    0.76    0.58    4.58    1.52    4.37   0.134    2.95     8.0    0.06    2    3    0    0   3.64
Asp          -0.77    1.60    0.76    0.59    4.74    1.52    3.78   0.105    2.78     9.2    0.15    1    4    0    1   5.69
Cys           1.54    1.77    0.62    0.66    4.47    1.52    3.41   0.128    2.43    14.4    0.12    0    0    0    0   3.67
Gln          -0.22    1.56    0.68    0.71    6.11    1.52    3.53   0.180    3.95    10.6    0.05    2    3    0    0   4.54
Glu          -0.64    1.56    0.68    0.72    5.97    1.52    3.31   0.151    3.78    11.4    0.07    1    4    0    1   5.48
Gly           0.00    0.00    0.00    0.00    2.06    1.00    1.00    0.00    0.00    0.00    0.00    0    0    0    0   3.77
His           0.13    2.99    0.70    0.64    5.23    1.52    5.66   0.230    4.66    10.2    0.08    1    1    1    0   2.84
Ile           1.80    4.19    1.02    0.96    4.92    1.90    3.49   0.186    4.00    16.1   -0.01    0    0    0    0   4.81
Leu           1.70    2.59    0.98    0.92    4.92    1.52    4.45   0.186    4.00    10.1   -0.01    0    0    0    0   4.79
Lys          -0.99    1.89    0.68    0.78    6.89    1.52    4.87   0.219    4.77    10.9    0.00    2    1    1    0   4.27
Met           1.23    2.35    0.78    0.77    6.36    1.52    4.80   0.221    4.43    10.4    0.04    0    0    0    0   4.25
Phe           1.79    2.94    0.70    0.71    4.62    1.52    6.02   0.290    5.89    13.9    0.03    0    0    0    0   4.31
Pro         (0.72)    2.67       -       -  (4.11)  (1.52)  (4.31)       -  (2.72)    17.8       -    0    0    0    0      -
Ser          -0.04    1.31    0.53    0.55    3.97    1.52    2.70   0.062    1.60    13.1    0.11    1    2    0    0   3.83
Thr           0.26    3.03    0.50    0.63    4.11    1.73    3.17   0.108    2.60    16.7    0.04    1    2    0    0   3.87
Trp           2.25    3.21    0.70    0.84    7.68    1.52    5.90   0.409    8.08    13.2    0.00    1    0    0    0   4.75
Tyr           0.96    2.94    0.70    0.71    4.73    1.52    6.72   0.298    6.47    13.9    0.03    1    2    0    0   4.30
Val           1.22    3.67    0.76    0.89    4.11    1.90    3.17   0.140     3.0    17.2    0.01    0    0    0    0   4.86

Columns: π, hydrophobicity; Ξ, graph shape index; υ, upsilon steric parameter; υ_reg, smoothed upsilon steric parameter; L, B_1, B_5, STERIMOL length, minimum, and maximum width, respectively (torsion angles are 0° except for Phe and Tyr, where the torsion angle between phenyl and the adjacent group (CH2, NH) is 90°); α, polarizability; v_v, normalized van der Waals volume; δH_c, n.m.r. chemical shift of alpha-carbon; σ_l, localized electrical effect; n_H, number of hydrogen bond donors; n_nb, number of full nonbonding orbitals; i_B, indicator of presence or absence of positive charge in side chain; i_A, indicator of presence or absence of negative charge in side chain; pKa, negative log of the dissociation constant of the carboxylated side chain (RCOOH).

Hydrophobicity π

The π-values (Tables 1 and 2) express the hydrophobicity of the amino acid side chain according to the equation:

$$\pi(\text{side chain}) = \log P(\text{amino acid}) - \log P(\text{glycine})$$

in which P is the partition coefficient of the amino acid and of glycine in octanol/water (12). The fundamental π-values for natural amino acids (Table 1) have been obtained at pH 7.1 with amino acid derivatives protected and thus uncharged in the backbone residue, the N_α-acetyl-amino-acid amides (13), by partitioning in octanol/water and by using a similar equation for evaluation:

$$\pi(\text{side chain}) = \log P(\text{Ac-amino acid-NH}_2) - \log P(\text{Ac-Gly-NH}_2)$$

In spite of some controversial acceptance of this scale (14), the authors consider it highly reliable because P was directly estimated in octanol/water and because of the pertinence of the derivatives and the physiologically relevant pH used in measurements. The scale was used for the determination of atomic solvation parameters in proteins (15). The other π-values (Table 2) have been determined by thin-layer chromatography and the Rf-values converted to π-values in octanol/water by a method described earlier (12). All values reported here have been determined experimentally. New values can be obtained by thin-layer chromatography (12) as far as the L-form of the amino acid is available, or by calculation from the structure using either the fragment contributions without correcting factors as estimated in (12), or those of Hansch & Leo (16) with the appropriate correcting factors.
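The definition of π as a difference of two log P values can be sketched in a few lines of Python. This is only an illustration: the function name and the two log P inputs are hypothetical, chosen so that the difference is easy to follow, and are not measured data from this work.

```python
def pi_side_chain(log_p_derivative: float, log_p_glycine_derivative: float) -> float:
    """Side chain hydrophobicity as the difference of two octanol/water log P values."""
    return log_p_derivative - log_p_glycine_derivative


# Hypothetical log P values for the N-acetyl amides (illustration only, not measurements).
LOG_P_AC_GLY_NH2 = -2.00
LOG_P_AC_PHE_NH2 = -0.21

pi_phe = pi_side_chain(LOG_P_AC_PHE_NH2, LOG_P_AC_GLY_NH2)
pi_gly = pi_side_chain(LOG_P_AC_GLY_NH2, LOG_P_AC_GLY_NH2)

print(f"pi(Phe) = {pi_phe:.2f}")
print(f"pi(Gly) = {pi_gly:.2f}")
```

Because glycine is the reference, its π-value is zero by construction, which is why the glycine row of Table 1 reads 0.00.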

Graph shape index Ξ

This parameter (17) is a measure of the steric influence of a group which encodes the three attributes: complexity, branching, and symmetry of the group. It can be directly calculated from the molecular graph structure of the substituent, e.g. of the amino acid side chain. The index Ξ is free of inductive, resonance, or solvation effects. Although Ξ is theoretically derived, it was found to correlate with Taft's substituent index Es, according to the following equation:

$$-E_s = 0.40\,\Xi - 0.60$$

which allows one to predict additional Taft's values that are not attainable by the classical ester hydrolysis model (18). Values of Ξ are given in Tables 1 and 2 and the corresponding values for new amino acid side chains can easily be calculated by methods described in (19, 20). Ξ (and the corresponding calculated Es-values) appears as a valuable steric parameter since it is a measure of the directed spatial influence of the group, it is independent of all electrical and solution effects, and it can be calculated for all types of substituents.
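As a minimal sketch of how the regression above could be applied, the following Python snippet rearranges it to Es = 0.60 - 0.40 Ξ and evaluates it for a few Ξ-values taken from Table 1. The function name is hypothetical, and the outputs are regression estimates, not measured Taft constants.

```python
# Graph shape index values for a few side chains, copied from Table 1.
XI_TABLE_1 = {"Gly": 0.00, "Ala": 1.28, "Leu": 2.59, "Val": 3.67, "Ile": 4.19}


def taft_es_from_xi(xi: float) -> float:
    """Estimate Taft's Es from the graph shape index using -Es = 0.40*Xi - 0.60."""
    return 0.60 - 0.40 * xi


for name, xi in XI_TABLE_1.items():
    print(f"{name}: Xi = {xi:.2f}, estimated Es = {taft_es_from_xi(xi):+.2f}")
```

A larger Ξ gives a more negative estimated Es, i.e. a larger steric effect in Taft's convention.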

Upsilon steric parameter υ

This steric parameter υ (21) was derived from Taft's constant Es and expressed as a function of the minimal van der Waals radius. Values are available for a large number of substituents (21) and in particular for amino acid side chains (22). Unlike the original Taft's constant, υ is expressed in ångströms. In some cases it was necessary for its derivation to use effective values of υ obtained either from correlations of rate constants for acid catalyzed ester hydrolysis or from estimation equations. However, the upsilon parameter, since it is based on the Bondi/van der Waals radii, can be held as a most reliable measure of the steric effect.

The values of υ_reg, reported in the next column of Tables 1 and 2, are directly related, although not identical, to those of upsilon. A tight correlation was observed between υ and the minimal projection surface of the side chain (or of any substituent) taken perpendicular to the C_α-C_β bond (23). Using the parabolic correlation obtained (with at least 51 groups) and taking the υ_reg-values on the regression curve, a new set of smoothed steric parameters was obtained. In this set, a few unexplainable discrepancies are eliminated as, for example, the higher value in the original υ-set for the side chain of serine (υ = 0.53) compared to threonine (υ = 0.50). New values are attainable by estimating the projection surface of the relevant CPK molecular models of the uncommon side chain and using the same correlation equation (23). This set of υ_reg-values, which essentially describes (as do Es and υ) the steric effect as seen from the reaction center in the model compounds, has also proven useful in a number of correlation studies.

STERIMOL multidimensional steric parameters L, B_1, B_5

The STERIMOL constants characterize the steric bulk of a substituent by its dimensions in three different directions in space (24). We use here the revised version of these parameters, which contains the three quantities L, B_1, and B_5 (25). L represents the length of the side chain measured in the direction in which it is attached to the glycine backbone, and B_1 and B_5 are the minimum and maximum width, respectively, of the side chain, measured in directions perpendicular to L. The parameters are calculated by a computer software package directly from the structure of the side chain. The STERIMOL constants, which have been shown to be useful in a number of QSAR studies, are likely to help to investigate structure-activity relationships in peptides, too, especially in the cases where more than one side chain descriptor is required for steric bulk. Since they are easily derived from structure by calculation, their value can be predicted for new amino acid side chains even prior to synthesis.

Polarizability α

The polarizability α is related to the molar refractivity MR, which in turn is experimentally given by:

$$MR = \frac{M}{d}\,\frac{n^2 - 1}{n^2 + 2}$$

(n, index of refraction; M, molecular weight; d, density). Since MR is an additive-constitutive property of a molecule, it can be easily calculated for any substituent. From tabulated values of MR for common groups (16) and by the equation:

$$\alpha = \frac{3}{4\pi N}\,\frac{M}{d}\,\frac{n^2 - 1}{n^2 + 2}$$

the polarizabilities α can be obtained (cf. also (26)) for known as well as for new amino acid side chains (Tables 1 and 2). Clearly α is a function of the molecular volume M/d, and thus a bulk parameter which models dispersion forces. The α-values have been scaled to make the coefficient in the regression equation roughly comparable to those obtained for other parameters. New values are easily obtained by simple arithmetic.
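The two relations above can be checked numerically with a short Python sketch. It assumes N is Avogadro's number and uses approximate textbook values for liquid water (n ≈ 1.333, M ≈ 18.015 g/mol, d ≈ 0.998 g/cm³) purely as an illustration; water is not part of this study, and the result is an unscaled polarizability, not a value on the scaled α-axis of Tables 1 and 2.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1; N in the equation is assumed to be Avogadro's number


def molar_refractivity(n: float, molar_mass: float, density: float) -> float:
    """MR = (M/d) * (n^2 - 1) / (n^2 + 2), in cm^3/mol for M in g/mol and d in g/cm^3."""
    n2 = n * n
    return (molar_mass / density) * (n2 - 1.0) / (n2 + 2.0)


def polarizability_cm3(mr: float) -> float:
    """alpha = 3 * MR / (4 * pi * N), in cm^3 per molecule."""
    return 3.0 * mr / (4.0 * math.pi * AVOGADRO)


# Approximate values for liquid water near room temperature (illustration only).
mr_water = molar_refractivity(n=1.333, molar_mass=18.015, density=0.998)
alpha_water_A3 = polarizability_cm3(mr_water) * 1e24  # 1 cm^3 = 1e24 cubic angstroms

print(f"MR(water)    = {mr_water:.2f} cm^3/mol")
print(f"alpha(water) = {alpha_water_A3:.2f} cubic angstroms")
```

Since MR is additive, the same two functions could be applied to a side chain once its MR has been summed from tabulated group contributions.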

Normalized van der Waals volume v_v

This additional bulk parameter is the van der Waals volume of the amino acid side chain normalized according to the following equation (23):

$$v_v(\text{side chain}) = \frac{V(\text{side chain}) - V(\text{H})}{V(\text{CH}_2)}$$

where V is the measured van der Waals volume on CPK models for the side chain or the hydrogen atom, respectively. From this, v_v = 1 for the side chain of alanine and increases by one unit for each additional CH2 group. Side chains of amino acids such as neopentylglycine and adamantylalanine, which are characterized by very similar υ-values, are distinct when described by v_v. This parameter is easily measurable on CPK models for known as well as for new, not yet synthesized amino acids. As a bulk parameter, it models dispersion forces and is highly correlated to the polarizability α.
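The stated correlation between v_v and α can be examined directly on the Table 1 values. The sketch below computes a plain Pearson correlation coefficient in Python for the 19 natural side chains that have both values (proline has no α entry); the helper name is hypothetical, and the calculation covers only Table 1, not the full 45 side chain set used in the paper.

```python
import math

# (alpha, v_v) pairs copied from Table 1; proline is omitted because alpha is not given.
ALPHA_AND_VV = {
    "Ala": (0.046, 1.00), "Arg": (0.291, 6.13), "Asn": (0.134, 2.95),
    "Asp": (0.105, 2.78), "Cys": (0.128, 2.43), "Gln": (0.180, 3.95),
    "Glu": (0.151, 3.78), "Gly": (0.000, 0.00), "His": (0.230, 4.66),
    "Ile": (0.186, 4.00), "Leu": (0.186, 4.00), "Lys": (0.219, 4.77),
    "Met": (0.221, 4.43), "Phe": (0.290, 5.89), "Ser": (0.062, 1.60),
    "Thr": (0.108, 2.60), "Trp": (0.409, 8.08), "Tyr": (0.298, 6.47),
    "Val": (0.140, 3.00),
}


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


alphas = [pair[0] for pair in ALPHA_AND_VV.values()]
volumes = [pair[1] for pair in ALPHA_AND_VV.values()]
r = pearson_r(alphas, volumes)

print(f"r(alpha, v_v) = {r:.3f} for {len(alphas)} side chains")
```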

N.m.r. chemical shift of α-carbon δH_c

The n.m.r. chemical shift δH_c (H, magnetic field strength; δH, chemical shift; δH_c, 13C chemical shift) of the alpha-carbon in amino acids has been proposed (27) as a descriptor of the electronic properties of the side chain. When expressed in ppm from the 13C chemical shift of the α-carbon of glycine, it can be considered a pure substituent parameter. As a matter of fact, this parameter primarily reflects the shielding of the C_α-nucleus by the nearby electronic systems of the side chain and thus incorporates the classical inductive and mesomeric effects of the substituent (side chain). However, δH_c is not free of steric and hydrophobic contributions, as can be seen, for example, from a certain level of intercorrelation with π and Ξ (Fig. 1). Values have been measured in a number of cases for the free amino acid and for the amino acid incorporated in a short peptide (28). Several newly measured values are reported here for the first time. The parameter is lower by 1.5 ± 1.1 ppm when the side chain is contained in a peptide. For new amino acids, δH_c can be calculated using the empirical rules of Horsley et al. (29). We have tested these rules and observed that in their present form, they do not even permit the calculation of the chemical shift for all natural amino acids. However, they can be applied to new side chains according to the same scheme, using the additivity of the fragment contributions to δH_c. The constant δH_c has been successfully employed in several QSAR studies of bioactive peptides (8, 27).

FIGURE 1 Significance of the linear correlation coefficients (43 degrees of freedom). Significance levels shown: P > 99.9%, P > 99%, P > 95%.

Hydrogen bonding parameters n_H and n_nb

These integer parameters, expressing the number of OH and NH bonds and the number of full nonbonding orbitals on O and N atoms, respectively, have been proposed for amino acid side chains (31). They can be evaluated by simple inspection of the structural formula of the substituent. In QSAR studies of peptides they often play the role of indicator variables and can be of great help in detecting the involvement of hydrogen bonds in single side chains among large series of non-hydrogen-bonding side chains.

Charge parameters i_B and i_A

The presence or absence of charges in amino acid side chains can be accounted for by the parameters i_B and i_A for basic (positively charged) and acidic (negatively charged) groups, respectively. Each parameter takes the value 1 or 0 depending on whether such a charge is present or not, neglecting the fact that ionization may be incomplete at physiologically relevant pH.
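Counting the hydrogen bonding parameters by inspection can be written down as a small Python sketch. The class and field names are hypothetical, and the counting rule (two full nonbonding orbitals per oxygen atom, one per counted nitrogen lone pair) is an assumption that happens to reproduce the Table 1 entries for the four side chains shown; which nitrogen lone pairs are counted is left to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolarGroups:
    """Polar features read off the structural formula of a side chain."""
    oh_bonds: int       # number of O-H bonds
    nh_bonds: int       # number of N-H bonds
    o_atoms: int        # oxygen atoms, each assumed to carry two full nonbonding orbitals
    n_lone_pairs: int   # nitrogen lone pairs the user decides to count


def n_h(groups: PolarGroups) -> int:
    """Number of hydrogen bond donors: O-H plus N-H bonds."""
    return groups.oh_bonds + groups.nh_bonds


def n_nb(groups: PolarGroups) -> int:
    """Number of full nonbonding orbitals on O and N atoms (assumed counting rule)."""
    return 2 * groups.o_atoms + groups.n_lone_pairs


SIDE_CHAINS = {
    "Ser": PolarGroups(oh_bonds=1, nh_bonds=0, o_atoms=1, n_lone_pairs=0),  # -CH2OH
    "Asp": PolarGroups(oh_bonds=1, nh_bonds=0, o_atoms=2, n_lone_pairs=0),  # -CH2COOH
    "Asn": PolarGroups(oh_bonds=0, nh_bonds=2, o_atoms=1, n_lone_pairs=1),  # -CH2CONH2
    "Lys": PolarGroups(oh_bonds=0, nh_bonds=2, o_atoms=0, n_lone_pairs=1),  # -(CH2)4NH2
}

for name, groups in SIDE_CHAINS.items():
    print(f"{name}: n_H = {n_h(groups)}, n_nb = {n_nb(groups)}")
```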

Localized electrical effect parameter σ_l

This constant has been clearly defined and appropriately scaled (30) for any given substituent, and obtained for a number of amino acid side chains (26). Values for several uncommon side chains are reported here for the first time (Tables 1 and 2). The constant represents mainly inductive field effects and is well separated from delocalized resonance contributions.

The pKa's of the carboxylic acids R-COOH, in which R is an amino acid side chain, are also compiled for natural side chains in Table 1. However, in contrast to σ_l, the pKa reflects overlapping localized and delocalized electrical effects.

Principal component analysis

The data contained in Tables 1 and 2 describe properties of 45 amino acid side chains by means of 15 measurable parameters. The selection of these parameters is arbitrary and largely dictated by their availability. Therefore, both redundancy and missing properties cannot be fully excluded. We have investigated the matrix of correlation of the parameters (Fig. 1) and found a high level of significance of the correlation coefficients between certain pairs of parameters. Since it can be anticipated that no more than three to four distinct properties such as hydrophobicity, steric bulk, and electronic features are expressed by this 15-parameter set, we have investigated the system by principal component analysis. For the particular choice of 45 side chains, initial factor extraction showed that four factors were necessary to explain 75% of the total variance. Orthogonal factor rotation and sorting out of the factor loadings (those less than 0.25 being set to zero) led to the pattern given in Fig. 2. Three factors were retained and tentatively interpreted as side chain properties.

FIGURE 2 Composition of the first three principal components: factor loading matrix. The factors explain 66% of the total variance. Loadings less than 0.25 have been omitted.

The first factor was clearly related to the volume of the side chain, since its loadings were high for the polarizability α, the van der Waals volume v_v, and the two STERIMOL parameters L and B_5. An apparent inconsistency in this respect was the contribution of […], a parameter which should not be primarily related to steric bulk. However, this contribution (0.27) to the first factor was relatively small and near to the threshold value for rejection. The second factor, again, had steric character and was clearly of the Taft type. These steric parameters are vector quantities with both absolute value and direction. They are likely to be proportional to the projection surface of the group perpendicular to the glycine (or peptide) backbone, as are its constitutive steric parameters υ, υ_reg, Ξ (Es), and B_1. Factor 3 appeared to be related to electronic properties as given by the number of nonbonding π-orbitals, the number of possible hydrogen bonds, the localized electrical (inductive) effect σ_l, and the presence of charges. Most interesting was the fact that constants π and δH_c were almost evenly loaded in factors 1, 2, and 3 (π as a negative loading in the third factor). This observation does not imply that these parameters are not important for the description of certain side chain properties, but that they are, due to their correlation with other parameters, already sufficiently represented by the first three factors.

In conclusion, our results tend to suggest that three factors are sufficient to describe amino acid side chain properties in LFER correlations. Since the number of parameters and of side chains involved in this study was low, the factor composition cannot be considered generally valid and may vary from one set of side chains to another. However, a certain stability was observed; so, for example, omitting σ_l and/or joining i_B and i_A into a single charge parameter did not change the structure of the factors considerably. Furthermore, the parameters described in this study were closely related to one of the desired properties, such as hydrophobicity, bulkiness, or electronic configuration. There is, therefore, little point in employing these factors in QSAR studies of peptides at present and the original parameters should preferably be used in correlation analysis.
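To make the idea of a principal component concrete, the following Python sketch extracts the first, unrotated component from the correlation matrix of five Table 1 columns for the 20 natural side chains. It is not the BMDP procedure used in this work: there is no orthogonal rotation, only a subset of parameters and side chains is used, and the eigenvector is found by simple power iteration, so the numbers should not be expected to reproduce Fig. 2. All function names are hypothetical.

```python
import math

# Columns: pi, Xi, v_v, deltaH_c, n_H (Table 1; parenthesised proline values used as given).
DATA = {
    "Ala": (0.31, 1.28, 1.00, 7.3, 0), "Arg": (-1.01, 2.34, 6.13, 11.1, 4),
    "Asn": (-0.60, 1.60, 2.95, 8.0, 2), "Asp": (-0.77, 1.60, 2.78, 9.2, 1),
    "Cys": (1.54, 1.77, 2.43, 14.4, 0), "Gln": (-0.22, 1.56, 3.95, 10.6, 2),
    "Glu": (-0.64, 1.56, 3.78, 11.4, 1), "Gly": (0.00, 0.00, 0.00, 0.0, 0),
    "His": (0.13, 2.99, 4.66, 10.2, 1), "Ile": (1.80, 4.19, 4.00, 16.1, 0),
    "Leu": (1.70, 2.59, 4.00, 10.1, 0), "Lys": (-0.99, 1.89, 4.77, 10.9, 2),
    "Met": (1.23, 2.35, 4.43, 10.4, 0), "Phe": (1.79, 2.94, 5.89, 13.9, 0),
    "Pro": (0.72, 2.67, 2.72, 17.8, 0), "Ser": (-0.04, 1.31, 1.60, 13.1, 1),
    "Thr": (0.26, 3.03, 2.60, 16.7, 1), "Trp": (2.25, 3.21, 8.08, 13.2, 1),
    "Tyr": (0.96, 2.94, 6.47, 13.9, 1), "Val": (1.22, 3.67, 3.00, 17.2, 0),
}


def standardize(rows):
    """Scale every column to zero mean and unit sample standard deviation."""
    scaled_columns = []
    for column in zip(*rows):
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
        scaled_columns.append([(x - mean) / sd for x in column])
    return list(zip(*scaled_columns))


def correlation_matrix(z_rows):
    """Correlation matrix of standardized rows."""
    n, p = len(z_rows), len(z_rows[0])
    return [[sum(r[i] * r[j] for r in z_rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def dominant_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, mv)), v


corr = correlation_matrix(standardize(list(DATA.values())))
eigenvalue, component = dominant_eigenpair(corr)
explained = eigenvalue / len(corr)  # trace of a correlation matrix equals the column count

print(f"first component explains {100 * explained:.1f}% of the variance")
print("direction (pi, Xi, v_v, deltaH_c, n_H):", [round(c, 2) for c in component])
```

Working on the correlation matrix rather than on raw values keeps parameters with large numerical ranges, such as δH_c, from dominating the component.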

Cluster analysis

The small number of side chains and of parameters used in this study did not make it possible to reach the ultimate goal of cluster analysis: to order the substituents in groups in such a way that each member may be considered as a representative of the whole group. However, in the course of a preliminary analysis of the full series of available side chains (n = 45), a few clusters clearly appeared, while a number of miscellaneous side chains did not fall into clear-cut groups. The list of unambiguously formed clusters and the corresponding tree diagram are given in Fig. 3.

A first cluster (1) contained five aliphatic side chains of about the size of norvaline, to which the two uncharged aromatic side chains of Phe and Tyr (1') were the next to amalgamate. A second cluster contained relatively small side chains containing either a heteroatom or a polar bond: this cluster (2) may be represented by either Thr or propargylglycine. The third cluster (3) was constituted of bulky aromatic side chains containing at least one heteroatom, as in dihydroxyphenylalanine or in pyrazinylalanine. A fourth cluster (4) was made of the three primary amines Lys, ornithine and diaminobutyric acid. A cluster (5) was also formed by four β-branched aliphatic side chains, as in Ile or in cyclopentylglycine. A next cluster (6) contained bulky aromatic side chains, which, in contrast to those in cluster 3, were relatively hydrophobic (cyclohexylalanine also amalgamated to this cluster); one typical representative would be O-benzylserine. The side chains of glycine and alanine, as expected, did not amalgamate in the earlier clustering steps and behaved as singular species. This was also the case for several other side chains such as those of Bug, Asp and Glu, or Arg. Although a certain stability in clustering was observed (e.g. the amalgamation sequence was very similar for the subseries of the natural amino acids), further work on larger series will be required to establish more significant clusters.

FIGURE 3 Apparent clusters found by cluster analysis of cases based on a 15-parameter set. The diagram shows the order of amalgamation of individual clusters (in boxes) and the connections between them. The ordinate distances correspond to the Mahalanobis distances between the amalgamation points.

CONCLUSION

The collection of parameters compiled in Tables 1 and 2 contains a selection of substituent constants for amino acid side chains. As substituent constants they reflect the properties of the side chain more than those of the amino acid and, except for the STERIMOL parameters and the pKa, the reference value for glycine is zero. The list is by no means intended to be complete, but it should quantitatively represent features relevant for, say, investigations of peptide drug-receptor interactions. The constants cover hydrophobic, steric, electronic, hydrogen bond donor/acceptor, and charge properties. No observed conformational preferences have been included since they appear to be consequences of the side chain properties (cf. (26)) as is the biological activity of the corre…

ACKNOWLEDGMENT

This work was supported by the Swiss National Science…

16. Leo, A. (1988) Log P and related parameters database. Medicinal Chemistry Project, Pomona College, Claremont, California
17. Kier, L.B. (1987) Quant. Struct.-Act. Relat. 6, 117-122
18. Taft, R.W. (1956) in Steric Effects in Organic Chemistry (Newman, M.S., ed.), p. 556, Wiley, New York
19. Kier, L.B. (1986) Jugo. J. Pharm. 36, 171-188
20. Kier, L.B. (1987) Medicinal Res. Rev. 7, 417-470
21. Charton, M. (1977) in Design of Biopharmaceutical Properties…