Spectral properties of contact matrix: Application to ... - Springer Link

1 downloads 0 Views 371KB Size Report
Nov 15, 2005 - interesting to compare the protein modelled as a close packing of amino acids, with a random close packing of spheres. The main features of ...
Eur. Phys. J. E 18, 321–333 (2005) DOI: 10.1140/epje/e2005-00037-6

THE EUROPEAN PHYSICAL JOURNAL E

Spectral properties of contact matrix: Application to proteins J.F. Sadoc1,a Laboratoire de Physique des Solides, Universit´e Paris Sud (associ´e au CNRS), Bˆ at. 510, Centre d’Orsay, 91405 Orsay, France Received 3 September 2004 / Received in final form 21 March 2005 c EDP Sciences, Societ` Published online 15 November 2005 –  a Italiana di Fisica, Springer-Verlag 2005 Abstract. A protein can be modelled by a set of points representing its amino acids. Topologically, this set of points is entirely defined by its contact matrix (adjacency matrix in graph theory). The contact matrix characterizing the relation between neighboring amino acids is deduced from Voronoi or Laguerre decomposition. This method allows contact matrices to be defined without any arbitrary cut-off that could induce arbitrary effects. Eigenvalues of these matrices are related with elementary excitations in proteins. We present some spectral properties of these matrices that reflect global properties of proteins. The eigenvectors indicate participation of each amino acids to the excitation modes of the proteins. It is interesting to compare the protein modelled as a close packing of amino acids, with a random close packing of spheres. The main features of the protein are those of a packing, a result that confirms the importance of the dense packing model for proteins. Nevertheless there are some properties, specific to the hierarchical organization of the protein: the primary chain order, the secondary structures and the domain structures. PACS. 87.10.+e General theory and mathematical aspects

1 Introduction The properties of a protein are not only related to its structural stability, but also to its excitation states. A knowledge of it elementary excitations is therefore a fundamental step toward understanding the stability and the structural deformation. Excitations and motion in proteins have been extensively studied from different point of view. The accurate details of these motions can be modelled using molecular dynamics, at least if realistic potential are used [3]. The structure and the dynamics of a protein are characterized by the energy or conformation landscape. The energy landscape is described in 3M dimensions, where M is the number of atoms in the protein. However, it is difficult to avoid tedious numerical calculation if we want to investigate this huge space in order to have systematic information on proteins [1]. Nevertheless there are simplifications that dispense with the need for a precise model. For instance it has been shown by Tirion [2] that at the atomic scale a simple harmonic potential between neighboring atoms gives good results. It is certainly a property, that results from the compactness of proteins. Similarly, but less accurately, in the case of precise local displacements of atoms, are the coarse grained models in which calculation are based on the approximation of proteins to an elastic network connecting rigid components that a

e-mail: [email protected]

correspond to residues. Such models yield the main tendencies in the dynamics. For example in the Gaussian network model, the harmonic hamiltonian of an elastic connected network describes fluctuation dynamics [4,5]. These approaches are related to works on disordered systems and glasses [6,7]. The aim of the present paper is to extract from a very simplified representation of a protein, the relevant properties found in a dense random structure, and those that are specific to proteins. Even if a protein is a very complex object, it must follow the usual constraints. First of all it has to respect geometrical and topological rules, which govern its structure [8]. These rules are simple, but they are in competition for the influence at the level of different components of the protein and this generates the complexity of the protein world. The topology restricts strongly the local geometrical properties of complex objects like proteins. We therefore propose a simplified approach to proteins, in which we consider amino acids as rigid entities and reduce the dynamic to those of a simplified object; the latter is achieved by reducing interaction to just the incidence relationships between amino acids. In order to specify neighborhood and define the incidence relationship we recently presented an analysis of proteins, as a dense packing of entities representing the amino acid (AA). As presented below, incidence relations are defined by a Voronoi decomposition without the ambiguities resulting from the

322

The European Physical Journal E

choice of a cut-off. That said, Voronoi decomposition does not give sizes that correspond well to the real size of the packed objects. The Voronoi decomposition increases the size of small AA (e.g. Gly) and reduces the size of large AA (e.g. Phe). There is, however, a modified Voronoi method, known as Laguerre decomposition that takes into account the size of the packed objects [9]. We have used here the Laguerre decomposition.

2 Dynamical matrix and contact matrix 2.1 Dynamical equations in a simple case Consider a mechanical system made of N identical masses, connected by identical springs. Under these simple conditions, it mechanical excitations are related to the eigenvalues of a N 2 square matrix whose elements ai,j are −1 if the sites i and j are close neighbors and 0 otherwise, except along the diagonal where they are the number of faces of the cell of the mass i. This is simply the solution of the following system of dynamical equations for all masses i, where the xi are displacement of mass. The indices u runs on all the z neighbors of i.  z   ∂ 2 xi m 2 =c (xu − xi ) · ∂t u=1 We set the weight m = 1 and the spring constant c = 1. The system of differentials equations is then a linear system defined by the matrix. Here the displacements xi − xj is a scalar quantity. In a recent paper Atilgan et al. proposed a model taking account of the force balance for each site that takes vectorial displacement [10] into account. Such approaches are interesting in order to yield realistic fluctuations in proteins and to simulate experimental results as B-factors. Nevertheless the excited states behaviors can be obtained from the simplest model. This model supposes that forces are proportional to relative displacements everywhere with the same coefficient. This is a good approximation in densely packed structures; it is however, less accurate for sites close to the surface, where displacements orthogonal and parallel to the surface cause different responses. 2.2 Contact matrix for protein The arrangement of amino acids (AA) in a given protein can usefully be represented by a contact matrix. This is a 2 symmetric NAA matrix with element ai,j = 1 if AA i is in contact with the AA j, or ai,j = 0 otherwise. The contact matrix has already been introduced by several authors, which assert that two AAs are in contact if their distance is smaller than a given cut-off distance (see for instance [11,12] or [10]). Usually, the distance between Cα atoms are used, but, if one views proteins as a packing of AAs, it seems preferable to use the distance between the geometrical AA centres. One advantage of this choice is that it yields a uniform distribution of first neighbor distances, by

contrary selecting Cα positions introduces a peak in the distribution, associated with neighbors along the chain. The Voronoi and Laguerre decomposition define neighborhood unambiguously, without relying to any arbitrary cut-off distance. This is related to the possibility to define a dual structure, the Delaunay tetrahedrization, where space is partitioned into tetrahedra, joined face on faces. Methods defining neighbors through a cut-off distance do not allow defining a tetrahedral structure. This is important as a tetrahedral structure is mechanically over constrained in terms of its number of degrees of freedom. Then a contact matrix, obtained from Voronoi (or Laguerre) decomposition generates the network automatically. Two replica of the network built from the contact matrix will be very close, with only small variations of edge lengths. We suppose here that we can reset a structure which yield to the exact contact matrix, from an imperfect realization only by local change. In 3D structures, this is only approximatively possible, with departure controlled by the mean coordination number average compared to a statistical value close to 13.5 (for infinite structure or for bulk sites in finite structures). The approximation is usually good for structures with mean coordination number between 13 and 15, which is the case of all dense structures ([13]). For triangulated structures, from Euler and Gauss-Bonnet relations, the contact matrix defines entirely the topology: it is a theorem. In 2D the mean coordination number for bulk sites is exactly six. So, from a mathematical point of view, contact matrices obtained using Voronoi or Laguerre decompositions are 2D representations that define a unique topological graph. In graph theory such matrices are called adjacency matrices. In physical language, the entropy of all possible 3D structures described by one contact matrix is zero, or very small. Since topology and metric are highly correlated in a dense packing, the contact matrix encodes most of the geometry of a protein. Nevertheless we should add information on the chirality of proteins because two proteins with the same chemical composition but with opposite chiralities have the same contact matrices. This is why we built [15–17] the contact matrix by stating that two AAs are in contact if their Voronoi or Laguerre cells share a face. We show how to read foldings in contact matrices on the following examples. The first presented protein, is a casein kinase (pdb code [18]: 1csn) (Fig. 1). It is a large molecule with 293 AA organized in two domains, containing α-helices and β-sheets. This example shows, how domains appear as block sub-matrices on the diagonal. The following example is Myohemerythrin (2mhr) (Fig. 2). This is a globular protein (118 AA) made of four helices. The third one is the signal transduction protein chey (Fig. 3) from escherichia coli (3chy). This is a protein (126 AA) formed from a β-sheet sandwiched between five α -helices. The last one (Fig. 4) is a large globular enzyme (3pte) with 347 AA, which show interesting symmetries in its fold. It is clear from these matrices, shown in Figures 1–4 that proteins have a strong, well defined topological order.

J.F. Sadoc: Spectral properties of contact matrix

Fig. 1. Contact matrix (293 × 293) for the 1csn protein. It appears clearly that this protein can be divided in two blocks corresponding to two domains: two block diagonal sub-matrices. The choice of the boundary of these sub-matrices can be such they have a common part.

Fig. 2. Contact matrix (118 × 118) for the myohemerythrin (pdb code: 2mhr) protein. This protein has four long α-helices, with length between 13 and 23.

On these matrices, some secondary structures show up on the edge of the matrix diagonal, as a rather broad stripe for an α-helix and a narrow one for a β-strand. This can be understood since, in an α-helix, AA number i is always connected to the AAs i ± 1, i ± 3, i ± 4 and sometimes to the AAs i ± 2. In a β-strand, AA number i is connected to AAs i ± 1 and i ± 2. Coils are less noticeable. It is important to remark that other contacts are not at all randomly

323

Fig. 3. Contact matrix (126 × 126) for the 3chy protein.

Fig. 4. Contact matrix (347 × 347) for the 3pte protein. For this large protein, the two chain ends are close, as shown by the contacts in the top right corner of the matrix. There are pseudo-symmetries along the anti-diagonal. In this large protein, the sub-matrices that can be observed along the diagonal indicate domains.

spread within the matrix but concentrated along lines that are either perpendicular or parallel to the diagonal. A line perpendicular to the diagonal appears if two secondary structures are antiparallel. Consider the two α-helices in 2mhr. They run from Ala 41 to Ala 64 and from Val 71 to Ile 84. They are linked together by a coil between 67 and 70. As two helices are refolded one onto the other, it occurs very often that if AAs i and j are neighbors, then several AAs i + n and j − n are also neighbors leading to

324

The European Physical Journal E

non-zero matrix elements lying on a segment perpendicular to the diagonal. Moreover, if the chain folds again onto two already folded parts, this may lead to a concentration of contacts along a segment parallel to the diagonal. Many such segments can be seen in the case of 3chy. Another intriguing observation can be made: in the last three cases there is a concentration of contacts in the bottom-left and top-right corners. This means that the two ends of the protein chain are close together. This situation, which occurs very often in protein chains, is worth being investigated.

3 Dynamical matrices for proteins 3.1 Some simple examples Contact matrices and dynamical matrices are symmetrical. Consequently, their eigenvalues are real numbers. From a contact matrix Γ one can define a dynamical matrix by using a diagonal matrix V whose elements vi,i are the numbers of neighbors to the AA i, that is the number of faces of its Voronoi cell. Then, the dynamical matrix is ∆ = V − Γ . By construction the determinant of this matrix, and one of its eigenvalues are zero. The others eigenvalues of ∆ are positive numbers, which are the squares of the frequencies of the excitations. Recall again that the protein is simplified to a set of rigid AA connected by elastic bond to their neighbors define by the Voronoi construction, a topological scheme of the protein. The eigenvectors are the normal modes of excitation. For the four proteins, 1csn, 2mhr, 3chy and 3pte, we have plotted (Fig. 5) the NAA eigenvalues, in increasing order. The corresponding spectrum (Fig. 6) are also presented. In order to consider how the excitations are localized in a protein, it is interesting to consider amplitudes of the eigenvectors. In Figure 7, we have represented some modes of the small protein 2mhr: the eigen modes for the first excited level, for a more generic level and for a high energy level. We observed that the main topics are similar for the studied proteins: - A relatively large gap between the fundamental level at ω = 0, corresponding to a global translation of the protein, and the first level, corresponding roughly to an oscillation of one half of the protein relatively to the other half. - A bell shaped density of states. - A few high energy levels, corresponding to vibrations of isolated AA. 3.2 The role of cut-off It is interesting to discuss the effect of the use of matrices defined by using a cut-off and to compare with matrices deduced by using a rigorous definition of neighborhood. A well adjusted cut-off can give good results because the excitation spectrum of an adjacency matrix is not too sensitive to small changes in the matrix. In

Fig. 5. Frequencies of normal modes (square roots of eigenvalues) for 1csn, 2mhr, 3chy and 3pte proteins (arbitrary units).

Figure 8, we show the contact matrices for the 2mhr protein with a too small cut-off (7 ˚ A) and a too large one (10 ˚ A). By comparison with the matrix shown in Figure 2, obtained from a Laguerre decomposition taking the sizes of the amino acid into account, it appears that some rigorous Laguerre contacts, do not appear even with large cut-off (see for instance the (35, 84) contact). A too large cut-off tends to reduce the width of the matrix spectrum: for a cut-off larger than the protein, all sites are neighbors to each other, and the network is a simplex having one zero frequency, and a highly degenerate finite frequency. A cut-off too small reduces drastically the connectivity of some sites, and consequently adds artificial low frequency modes to the spectrum. It is important to take this

J.F. Sadoc: Spectral properties of contact matrix

Fig. 6. Excitation spectra for matrices corresponding to three proteins (using binned frequencies).

artifact into consideration in studies involving low frequency modes. The Table 1 shows eigenvalues of a dynamical matrix for seven vertices of a regular pentagonal bi-pyramid (five tetrahedra sharing an edge). We consider the vertices as the centers of seven identical spheres closely packed together. If we take as the unit length, the edges of the pentagonal ring, the two apices are at a distance slightly larger than the spheres diameter, so there is a distribution of the edge lengths of the five tetrahedra. Changing the cut-off, it is possible to explore contact matrices different from the ideal one corresponding to contacts along the edges of the tetrahedra. Different sets of eigenvalues correspond to different cut-off. It shows variation of the distribution of the eigenvalue. The problem of representing a graph in space is related to the cut-off discussion. The graph is defined by edges connecting points, then by a contact matrix. Some graphs are 1D (a chain), some others are 2D graphs, some others are 3D graphs (the case concerning us in this paper), but some others could be physically realized only in spaces of high dimensions. For instance the bi-pyramid with a too large cut-off (e) is a simplex in 6D. This shows that unrealistic conditions could rapidly happen. A

325

Fig. 7. Amplitudes for three modes of 2mhr: first excited state, a generic state, and the higher state. Points of the first graph appear as a regular curve: this show that the chain order play an important role. Compare the first state to the positions of secondary structures: h1 19-37, h2 41-64, h3 71-84, h4 93-113.

challenging question is to understand what are the properties of a matrix that is compatible with a 3D network. In other terms, it is very easy to generate a network from a contact matrix if it is a matrix compatible with a 3D tetrahedrization, but otherwise for instance with matrix obtained with a bad cut-off, ambiguities make rebuilding difficult without escaping in higher dimensions. It is like building an electronic circuit on a card with a connection scheme that is not in two dimensions! From a physical point of view, this problem concern models in which we suppose local interactions between objects with a spatial extension (atoms, molecules, sand grains...). The contact network: must be in tetrahedrized 3D space. This is absolutely nescessary if interactions are local, as two objects interact together directly only if they are in true contact. On the other hand, in the case of non-local interactions like unscreened electrostatic forces it is not clear that a graph is a good representation of interactions.

326

The European Physical Journal E

Table 1. Eigenvalues for dynamical matrix of a network defined by the edges of a pentagonal bi-pyramid (five tetrahedra sharing an edge). Neighborhood is defined using different cutoff. a) The cut-off is very small, so only the vertices on the pentagonal ring are connected; there are two additional zero values due to the two isolated apices. b) With a slightly larger cut-off, the two apices are now connected to pentagonal ring sites, but not together. c) All the edges of the five tetrahedra form the network. d) Some second neighbor distances appear also as contact. e) With a cut-off too large, all sites are connected to all other sites. Increasing the connectivity reduces the dispersion of the eigenvalues: from an uniform distribution (a), to a single degenerate non-zero value (e). a b c d e

0 0 0 0 0

0



9− 5 2√ 9− 5 2√ 9− 5 2

7

0



9− 5 2√ 9− 5 2√ 11− 5 2

7

√ 5− 5 2

5

√ 9+ 5 2√ 9+ 5 2

7

√ 5− 5 2√ 9+ 5 2√ 9+ 5 2√ 11+ 5 2

7

√ 5+ 5 2√ 9+ 5 2

7 7 7

√ 5+ 5 2

7 7 7 7

3.3 The role of the environment In order to perform the Voronoi or the Laguerre decomposition, the protein is embedded in an environment of external sites, a random close packing of spheres with the average AA diameter. Contact matrices concern only contacts between AA, and the dynamical matrix has its diagonal elements equal to the number of AA close to a given AA. The determinant of this matrix is zero, and so is its lowest eigenvalue. This mimics a protein free in space, since the ground state is zero, corresponding to free translations. But, if we put on the diagonal the number of all neighbors to AA including environmental sites (the number of faces per Voronoi or Laguerre cells), the dynamical matrix simulates a protein elastically coupled to a cage working as a rigid environment. Then, the ground state is positive, corresponding to the oscillation of the protein in this cage. Some tests (Fig. 9) with such diagonal elements show that the width of the density of state is thereby reduced.

4 Comparison with a close random packing In order to compare these properties to that of a classical random packing we give results obtained with a random packing of spheres obtained with an efficient algorithm [19] derived from the well known Jodrey-Tory algorithm [20]. In this large packing we have extract a cluster of 220 spheres and analyzed the Voronoi cells and the resulting contact matrix, as for proteins. As expected the contact matrix appears as a cloud of random ones and zeros. We give in Figure 10 frequencies ploted with increasing mode numbers. This is very similar to what is presented for proteins in Figure 5. So we can conclude that proteins excitations spectrum are very close to the spectrum of close random packing. This support, the now usual point of view considering that a driving force of the stability of globular proteins is their compactness.

Fig. 8. Contact matrices obtained from distance between geometrical center of AA using cut-off, for the 2mhr protein. Compare with the Figure 3 matrix obtained with a Laguerre decomposition. a) A cut-off of 7 ˚ A, which does not take a large number of contacts especially for contacts between AA far away along the chain. b) A cut-off of 10 ˚ A, which overload the matrix, and nevertheless miss some real contacts.

Notice, incidently that we compare with random close packed structures and not with crystalline fcc structures. Effectively fcc structures are topologically unstable: any infinitesimal displacement change the number of close neighbors from twelve to more than fourteen. For an infinite random packing, the density of state vary with ω 2 at low frequencies. Analyzing the density of states, for contact matrices, it is difficult to conclude how g(ω) vary with ω at low frequencies, as the states are discrete. Nevertheless there are indications [2,21] of a

J.F. Sadoc: Spectral properties of contact matrix

327

Fig. 9. Frequencies and density of state for the protein 2mhr surrounded by a rigid environment. Compare to Figures 5 and 6. In this case the fundamental level has a positive value.

variation in g(ω) ∼ ω rather than g(ω) ∼ ω 2 . It is interesting to notice that for a pure chain it is a constant, so the experimental behavior is in between the three dimensional and the linear one. There are other behaviors indicating that proteins have some properties which differ from those of random close packed structure not only from a structural point of view, but also for dynamics. In Figure 10b, we give the amplitude of displacement for the first exited state classified with sites number. It is interesting to compare to results in Figure 7: in the random close packing case, there is no correlation of the excitation components i,j and i,j+1 as a change in numbering, changes the incidence relation between site j and j + 1. This point could appears trivial, as the numbering of sites in the random packing is not specific, when oppositely, for a protein we have a natural labelling along the chain. Following the hypothesis of a close packing behavior for a protein, we conclude that a re-labelling will not change things and so expect only few correlations between amplitude of excitations at sites j and j ± 1. In a model reducing a protein to a pure topological graph, there is no reason to have a different rˆ ole for edges along the chain or not. The eigenvector shows in Figure 7 for the first mode prove that the chain keep it specificity, even in a topological graph. We still have observed that Voronoi cell faces corresponding to neighbors along the chain [15] have on average more than six side compare to the total mean value of 5.1. This fact goes in the same direction, but is extended here to spectral properties. This prove that the chain structure strongly

Fig. 10. a) Frequencies of normal modes for a close random packing of 220 spheres. b) Displacement of sphere for the first excited state: as expected there is no correlation with labelling, which is not the case for proteins.

govern the spectral properties. This was not obvious with this model, as we have exactly the same coupling constant for AA in contact along the chain or not. It proves the occurrence of a topological signature of the chain structure. 4.1 Gap and compactness The occurrence of a large separation between the fundamental level and the first excited state is an indication of the stability of the structure. We can call it a “gap” even if levels are discrete because the separation is greatly larger than ωmax /N where ωmax is the largest frequency and N , the number of AA. In dense random close packing of a finite number of spheres, there are similar “gaps”, so it appears that such “gaps” are related to the compactness of the structure. Separation δ of the first exited state from the fundamental level is in fact a consequence of the space dimension: in 1D space, this δ  ωmax /N ; then it increase with dimension (see comments in Tab. 1). A small separation for a 3D structure indicates that the structure is badly connected with some floppy domains. 4.2 From graph stability to protein stability The densities of states of the topological graph presented here rise several questions that must be consider in term of biology.

328

The European Physical Journal E

The most important concerns the stability of proteins slightly perturbed. In fact we suppose in all this paper that the Voronoi decomposition topology does not change with small displacement. This is usually the case with generic structures, but it could have exception, the well-known being the crystallographic face centered cubic close packed structure, which is topologically unstable. A small perturbation of the structure, corresponding to small displacement of the AA, will evolve following the time dependence of its contributing modes. It can be expanded on the normal modes and it energy related to the energy of these modes. A gap between the fundamental state and the first level, means that it is impossible to have small deformation at very low energy cost. This is a factor of structural stability. Notice again, that conformation change, as helix changed into strands, does not enter in this analysis. Displacements are too large and they involve connectivity changes that are not supposed here as the contact matrix remains the same. Folding and un-folding are out this range.

Fig. 11. Frequencies for a α-helix of 29 AA, in increasing order.

5 Helices density of state In order to follow the transition between properties of a linear chain and those of a 3D structure, it is interesting to understand how isolated helices behave. An helix is intermediate between a linear structure and a 3D structure.

5.1 Isolated α-helices An α-helix is a simple model of folded protein. In topological sense it is defined by it connection. Consider a chain of AA folded in α-helix with AA numbered by i. The site i is neighbor to sites i ± 1, i ± 3 and i ± 4. So the dynamical matrix for an α-helix with n sites supposing periodic boundary conditions (the AA number 1 is connected to the AA number n) take the form:  6 −1 0 −1 −1 0.. −1 −1 0 −1   −1 6 −1 0 −1 −1 0.. −1 −1  0 −1 6 −1 0 −1 −1 0.. −1 −1     −1 0 −1 6 −1 0 −1 −1 0.. −1         0.. −1 −1 0 −1 6 −1 0 −1 0  ·    −1 0.. −1 −1 0 −1 6 −1 0 −1     −1 −1 0.. −1 −1 0 −1 6 −1 0    0 −1 −1 0.. −1 −1 0 −1 6 −1 −1 0 −1 −1 0.. −1 −1 0 −1 6 

Eigenvalues obtained for a helix of 29 AA is given in Figure 11. It main features are close to those of proteins. A surprising result appears if we follow the separation between the fundamental level and the first excited state. Figure 12 display this behavior for helices between 9 and 39 AA. There is a maximum occurring for n = 13.

Fig. 12. Separation between the fundamental and the first excited level for α-helices of different length. There is a maximum for helices with 13 AA.

5.2 α-helices as linear chains There is another description of excitations in an α-helices which considers it as a linear chain of identical masses with identical elastic couplings between neighbors i ± 1, i±3 and i±4. Therefore excitations are propagating waves with wave vector k (k is expressed in inverse of the length separating two masses of the chain). The dispersion relation giving frequency as function of k, is: ω 2 (k) = 6 − 2 cos (k) − 2 cos (3k) − 2 cos (4k) . This result describe an infinitely long helix. For a finite helix boundary conditions, fix k values defined in k ∈ [−π, π]. If we suppose periodic boundary conditions, possible k are: for odd n or k = −π + 2mπ k = −π + (2m+1)π n n for even n. The index m running from 0 to n − 1 labels the possible modes. On the Figure 13 the function ω (k) is represented with possible values of k corresponding to the particular case of n = 13. In this particular case it appears that the energies (or frequencies) are highly degenerate as there √ √are only three possible values ω 2 = 0, 13 + 13 and 13 − 13, with multiplicity g = 1, 6, 6. This explain why the length n = 13 play a particular role and why this phenomena does not occur for other chain length.

J.F. Sadoc: Spectral properties of contact matrix

Fig. 13. Dispersion relation for α-helices. Dots correspond to the 13 AA helix.

The density of state of an infinite helix (Fig. 13) can be obtained, from the dispersion relation. It linear behavior for small k, leads to a constant density of states, then for larger k, states are condensed resulting in a non-linear behavior of the density of states. Periodic boundary conditions are a very efficient tool in excitations analysis, but they does not correspond to the physical reality of small finite structures. In this case, conditions on the two ends of the chain, supposed free are more realistic. The dispersion relation remains the same, but k ∈ [0, π] with k = mπ n , m running from 0 to n. Then, there is not any special degeneracies for n = 13. As the length of secondary structure in proteins is on average close to 13 it is interesting to well understand this effect. We discuss in a further paragraph this point considering two helices refolded one on the other in an anti-parallel way.

5.3 Other types of helices in proteins It is possible to extend this analysis to other helices found in proteins. These other helices are mainly, the Coxeter helix (known in biology as 3.10 helix), the π-helix and the Pauling helix (or γ-helix). The β-strand can also be considered as a particular degenerate helix. For the Coxeter, α, π and Pauling helices, an AA number i is neighbor with AA number i ± 1, i ± r, i ± (r + 1), with respectively r = 2, 3, 4, 5. For β-strand, i is simply neighbor with i ± 1 and i±2. It follows that dispersion relations are for helices ω 2 (k) = 6 − 2 cos (k) − 2 cos (rk) − 2 cos ((r + 1)k) and for β-strand ω 2 (k) = 4 − 2 cos (k) − 2 cos (2k) . Leading to generic results on density of states close to those observed with the α-helix, except for the specific degeneracy for n = 13, which is never observed with other helices, what ever will be n.

329

Fig. 14. Density of states for α-helices. At low frequencies the density of state is that of a linear structure: a constant. For larger values it increases like in a 3D structure. See the comment on density of state of proteins in the section on random close packing.

5.4 General properties of the density of states for helices The density of states of infinite α-helices is represented in Figure 14. There are divergences corresponding to oscillations of the √ dispersion relation. The state with the largest k has ω = 8 but the largest frequency is ω = 2.9550. √ These value are close to z where z = 6, is the coordination number of AA. This is also of this order for the highest frequency found with dynamical matrices of real proteins using z  8. A helix density of states is not at all, like a Gaussian. They have high values for highest frequencies and for an intermediate frequency (ω = 1.94 with the reduced units) due to maximums of the dispersion relation. Even for not isolated infinite helix there are features reminiscent of such density of states. May be the small irregularities observed in Figure 5 for ω close to 1.7 for the two proteins (2mhr, 3chy), which contains a large number of helices are traces of this. 5.5 Stability of the α-helix of 13 amino acids There is a large gap between the fundamental level and the first excited state of α-helix with length close to 13. But we have related this gap to a specific degeneracy occurring only with periodic boundary conditions. We described the case of two helices of n AA related by two loops of negligible length, the first AA of the first helix being neighbor to the last AA of the last helix. This is a model inspired from the “tightened end fragments” observation of Chomilier, Trifonov and Berezovsky [22,23], which conclude to an increase of the contact probability for two AA separated by 27 chain steps. This is a consequence of an average length of secondary structures of the order of 13. In order to obtain the spectrum of this system we built 2 a contact matrix (2n) built from four n2 block matrices. The two diagonal block are matrices with 0 everywhere except for elements ai,i±1 , ai,i±3 , ai,i±4 = 1 corresponding to the two helices. The two other blocks are matrices with

330

The European Physical Journal E

of 20 AA. The behavior is close to that of real proteins. Figure 15c gives the gap for n from 9 to 40. Here again there is a maximum for n = 13 though the chain as free boundary conditions. We can concluded from this behavior, that excitation waves travelling from a secondary structure to another favor α-helices containing approximatively 13 AA. As a large gap appears only with periodic boundary conditions in single chain, it is not the intrinsic length of the secondary structure, which is favored, but the occurrence of 13 AA embedded in other secondary structures, with excitations travelling between secondary structures. Are α-helices favored compared to other helices for structural reasons (like steric interactions) but also for dynamical reason? This is an open question needing a deep investigation.

6 Amplitude of the excitations The eigenvectors of an excitation matrix give the amplitude of the excitation at sites for each normal mode. Due to the symmetry of the contact (and excitation) matrix, these vectors are orthogonal with real components, so it is possible to expand all excitations on this base. We have seen that the primary structure of the chain has a clear signature in spectral properties. However the secondary structures play also an important role. For low frequencies, the protein oscillations occur between large blocks, very often related to secondary structures. The first excited level corresponds in general to few blocks oscillating in phase opposition. For instance this is observed for the first state of the 2mhr protein (Fig. 7), which is built with four helices. The amplitude of this mode shows that helices behave at low frequency, like rigid rotating objects, with only relative displacement between two secondary structures. Fig. 15. For two anti-parallel helices: (a) frequencies in increasing order for two 20 AA helices; (b) spectra with binned frequencies for these helices; (c) gap for helix with lengths from 9 to 29.

6.1 Participation ratio

0 everywhere except for few elements close to the antidiagonal corresponding to contact between two faces of the helices. These block matrices are of the form:   0 . 000010 0 . 0 0 0 0 0 1   . . . 000 . .   0 . 0 1 0 0 0 0 0 . 1 0 1 0 0 0·   0 . 0 1 0 0 0 0   1 0 . 0 0 0 0 0 01 . 00000

Bell and Dean [24] introduced the idea of the participation ratio, a quantity that tells how many sites a particular mode is associated with. This has been extensively applied to disordered systems like glasses [25]. For an ideally localized state, the participation ratio could approach 1, whereas for delocalized states, the participation ratio could approach the number of atoms or sites N . Suppose we have a normalized eigenvector i whose N components i,j are the excitation amplitudes for each AA of the normal mode i, defined at each site j of the chain. The inverse participation ratio is defined by:  4 1/pi = |i,j | .

Then a dynamical matrix is built changing 1 into −1 and adding diagonal elements equal to the number of 1 in a row. Eigenvalues of this matrix give frequencies of excitations as shown in Figure 15 for a system of two chains

The participation ratio for the 2mhr and the 1csn proteins are shown in Figure 16, plotted as function of the frequencies. Both they display close behaviors. The participation ratio of the compact cluster shows similar behavior

j

J.F. Sadoc: Spectral properties of contact matrix

331

Fig. 17. Participation ratio for the 220 spheres random close packing.

Fig. 16. Participation ratio for 2mhr and 1csn proteins plotted versus the frequency.

(Fig. 17). This indicates again that proteins have properties related to the dense random close structure. For matrices obtained with proteins, the zero frequency level have p = N , then there are few high p modes at low frequencies, with p decreasing rapidly. Then, there are one or two modes with low frequencies and low p. Other modes from low to high frequencies have small participation ratio of the order of 20 corresponding to localized states. Highest frequency modes have p close to 2, indicating that there are only few sites contributing to it. High p at low frequencies indicate collective displacements associated with global motion of secondary structures. The low p values at low frequencies indicate that some part of the structure have some softness. They correspond to dangling of ends or outer loops of the chain. May be this is more visible for a protein than for the random structure? The other participation ratio correspond to localized modes of high energy corresponding to the response to these frequencies of a structure of high stiffness. 6.2 Energy scaling Clearly, there are no units to scale energies and displacements in this approach, where AA mass equal one, elastic coupling constant equal one, ... It is topology and in topology there is no metric. Nevertheless it is possible to make some hypotheses in order to allow comparison with real proteins. In protein there is a temperature Tg ∼ 150 K reminding glass transition temperature observed in glasses [1]. It

is the frontier between small oscillations and more anharmonic displacements. It occurs when low frequency modes are strongly filled by excitations with a large Boltzmann or Bose-Einstein factor. In this case, amplitude for these modes are too large for an harmonic approximation and the notion of individual mode is loosed. Nevertheless localized modes of higher frequencies remain defined. It is a good hypothesis to consider that the cross-over in the behavior of the participation ratio with frequencies, which decrease rapidly then stay relatively constant, occurs at an energy ω ∼ kTg . This gives a scaling for the quantum of energy excitation and for the density of states. Then, with this choice, the largest frequency, is of the same order as the room temperature energy. In order to test this hypothesis, we calculated for the myohemerythrin (pdb code: 2mhr) protein, the quadratic displacement of AA at low and room temperatures.

2  ωi 2 |i (j)| . xj = βωi − 1 e i The quadratic displacement of AA are free of scale, but they are proportional to excitation energy localized on each AA. Recall that i,j are normalized eigenvectors. This description is similar to the Gaussian network model introduced recently [4,5] in the limit of not too low temperatures. the Gaussian model consist to suppose an exponential energy dependence for excitations and then make the calculation with quadratic mean displacement in place of energy or frequency, two proportional quantities. In the Gaussian model the statistic is done on modes (oscillators), since it is done on quanta of excitations is the present approach. It is known that both approaches are equivalent, and it is generally accepted that both give the same results for temperatures not too close to the absolute zero. Nevertheless these two methods are equivalent using the Boltzmann distribution in the first case, and using the Bose-Einstein distribution in the other case. In Figure 18 we have plot the same quantities obtained from the crystallographic Debye or B-factors of atoms in the protein, using an average quadratic displacement for each amino acid of the protein. We remark that the experimental curve is more contrasted, but the position of the

332

The European Physical Journal E

the spectrum of the associated dynamical matrix. It is the main reason to study such a spectrum, even if its relation with experimental results are more qualitative than quantitative.

Fig. 18. Quadratic displacement for the myohemerythrin protein (pdb code: 2mhr) derived from the contact matrix (grey points), and experimental results extracted from the pdb Bfactors for all atoms of AA (black points). The temperature is of the order of 150 K for the two cases, using a scaling of temperatures explained in the text. The vertical scale is arbitrary. As guide for eyes two corresponding lines slightly smoothed, are drawn (black and grey). On the inserted graph the quadratic displacement for the model is plotted against the B-factor’s, in order to estimate correlation between the model and the real protein. The oversimplified model gives reasonable results compare to experiments: peaks on the two curves corresponding to large displacements appear roughly at the same positions.

most fluctuating AA are the same. It is not the aim to predict with this model, numerical values, but to have main feature of protein excitations. Notice that the results for the model are not very sensitive to the temperature scaling (except for the amplitude), so reasonable change in the scaling does not change these conclusions.

7 Conclusions A contact matrix is a topological description of a protein, that is a very efficient summary of its structure as a packing of amino acids. For dense random packed structures, it is now usually accepted that the topology constraints so much the structure, that most of the information is in the contact matrix. It was even proved that the largest eigenvalue and it corresponding eigenvector are sufficient to obtain the whole matrix [26]. This reduces the topological information to the knowledge of N + 1 real positive numbers derived from N 2 binary integers. So contact matrices are certainly tools simplifying the description of proteins and could turn to be an interesting way to classify proteins [17]. There are two important factors governing protein stability: the topology of the fold, and the dynamic of it excitations. These two factors are in fact intrinsically related and it is unrealistic to consider each individually. If we accept the schematic representation of the structure of a protein by a contact matrix, its dynamic is given by

The main behavior of a globular protein structure can be explained as a dense packing of amino acids. This can be observed, using the Voronoi decomposition, from the geometrical and topological parameters characterizing Voronoi cells. The contact matrix of a protein, is obtained directly from a Voronoi decomposition, without any ambiguity from an arbitrary cut-off introduced in order to define neighbors. This matrix contains the topological information on the spectral properties of the protein. This information over simplifies the real dynamical properties, but gives the general features. It has been shown in previous papers [16] that structurally proteins are well described (to first approximation) as a densely packed sets of AA. The spectral properties confirm this point: dynamical matrices obtained from contact matrices have an eigenvalue distribution very similar to dense random structures. The most important characteristic is the large gap between the ground state and the first exited state. Nevertheless, even if a protein can be viewed as a dense packing, it has its own specific properties. For example, the chain loses it meaning when the structure is analyzed as a packing of spherical elastic objects. But some structural details show that this is not completely the case: for instance the number of edges of the faces separating the Voronoi cells corresponding to two consecutive AA along the chain, is significantly larger than the average value. The spectral analysis also confirms the importance of the underlying chain structure: the study of the first eigenvectors shows correlation of their amplitudes with the position along the chain. Low frequency excitations travel along the chain. It is certainly an important property, showing that the covalent bonding along the chain induces some small anisotropy of the Voronoi cells, which has a strong effect on the first eigenvectors. Notice that we have chosen as seeds for the Voronoi or Laguerre decomposition, the points defined by the geometrical center of AA, and not the C-α atoms as is usually done, in order to avoid anisotropy due to the chain. It is interesting to see that even in this case, the chain structure remains visible. A contact matrix shows patterns associated to the three levels of organization found in proteins. The primary structure is trivially given by covalent elements of the two diagonals (i, i ± 1). The secondary structures appear also along the matrix diagonal, and domains divide the matrix into different sub-matrices along the diagonal (Fig. 1). Are these properties also observable on the spectral properties of the matrices? Several remarks on the spectrum and the eigenvectors of the dynamical matrices indicate that this is so. Then the three level of organization in proteins, primary, secondary and even tertiary, have footprints in the excitation spectra. Are excitations the source of this hierarchical organization? This is still an open question that requires further investigation?

J.F. Sadoc: Spectral properties of contact matrix A lot of this work have benefited from helpful discussions with Isabelle Caillebaut, R´emi Jullien, Jean-Paul Mornon and Nicolas Rivier. I would like to thank them, together with their colleagues and students.

References 1. H. Frauenfelder, B. MacMahon, Proc. Natl. Acacad. Sci. USA 95, 4795 (1998) 2. Tirion M., Phys. Rev. Lett. 77, 1905 (1996) 3. A. Kitao, N. Go, Curr. Opin. Struct. Biol. 9, 164 (1999) 4. T. Haliloglu, I. Bahar, B. Erman, Phys. Rev. Lett. 79, 2733 (1997) 5. I. Bahar, A.R. Atilgan, M. Demirel, B. Erman, Phys. Rev. Lett. 80, 2733 (1998) 6. W. Garber, F.M. Tangerman, P.B. Allen, J.L. Feldman, Phil. Mag. Lett. 81, 433 (2001) 7. F. Wooten, K. Winer, D. Weaire, Phys. Rev. Lett. 54, 1392 (1985) 8. D. Baker, Nature 405, 39 (2000) 9. J.F. Sadoc, R. Jullien, N. Rivier, Eur. Phys. J. B 33, 355 (2003) 10. A.R. Atilgan, S.R. Durell, R.L. Jernigan, M.C. Demirel, O. Keskin, I. Bahar, Biophys. J. 80, 505 (2001) 11. M. Vendruscolo, N.V. Dokholyan, E. Paci, M. Karplus, Phys. Rev. E 65, 061910-1 (2002) 12. N. Saitˆ o, Y. Kobayashi, Int. J. Mod. Phys. B 13, 2431 (1999)

333

13. J.F. Sadoc, R. Mosseri, Geomet. Frustr. (Cambridge Univ. Press, 1999) 14. F. Dupuis, J.F. Sadoc, R. Jullien, B. Angelov, A. Soyer, J.-P. Mornon, Bioinformatics; advance access June 24, 2004. See also the software VORO3D on http://www.lmcp.jussieu.fr/ dupuis/ 15. B. Angelov, J.F. Sadoc, R. Jullien, A. Soyer, J.P. Mornon, J. Chomilier, Proteins: Struct. Funct. Genet. 49, 446 (2002) 16. A. Soyer, J. Chomilier, J.P. Mornon, R. Jullien, J.F. Sadoc, Phys. Rev. Lett. 85, 3532 (2000) 17. F. Dupuis, J.F. Sadoc, J.P. Mornon, Proteins: Struct. Funct. Genet. 55, 519 (2004) 18. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. weissig, I.N. Shindyalov, P.E. Bourne, Nucl. Acid Res. 28, 235 (2000); see also the Protein Data Bank on the web site http://www.rcsb.org/pdb/ 19. R. Jullien, F. Jund, D. Caprion, Phys. Rev. E 54, 6035 (1996) 20. W. Jodrey, E. Tory, Phys. Rev. A. 32, 2347 (1985) 21. M. Tirion, Phys. Rev. B 47, 14559 (1993) 22. M. Lamarine, J.P. Mornon, I.N. Berezovsky, J. Chomilier, Cell Mol. Life Sci. 58, 492 (2001) 23. I.N. Berezovsky, V.N. Kirzhner, A. Kirzhner, V.R. Rosenfeld, E.N. Trifonov, Proteins Eng. 15, 955 (2002) 24. R.J. Bell, P. Dean, Discuss. Faraday Soc. 50, 55 (1970) 25. P.B. Allen, J.L. Feldman, J. Fabian, F. Wooten, Phil. Mag. B 79, 1715 (1999) 26. M. Porto, U. Bastolla, H.E. Roman, M. Vendruscolo, Phys. Rev. Lett. 92, 218101 (2004)