Introduction to Molecular Dynamics with GROMACS - KTH

28 downloads 416 Views 2MB Size Report
Introduction to ... some simple analysis programs that are part of Gromacs. .... Te purpose of this tutorial is not to master all parts of Gromacs' simulation and ...
Introduction to Molecular Dynamics with GROMACS Molecular Modeling Course 2007

Simulation of Lysozyme in Water Erik Lindahl ([email protected])

Summary Molecular simulation is a very powerful toolbox in modern molecular modeling, and enables us to follow and understand structure and dynamics with extreme detail – literally on scales where motion of individual atoms can be tracked. This task will focus on the two most commonly used methods, namely energy minimization and molecular dynamics that respectively optimize structure and simulate the natural motion of biological macromolecules. we are first going to set up your Gromacs environments, have a look at the structure, prepare the input files necessary for simulation, solvate the structure in water, minimize & equilibrate it, and finally perform a short production simulation. After this we’ll test some simple analysis programs that are part of Gromacs. You should write a brief report about your work and describe your findings. In particular, try to think about why you are using particular algorithms/choices, and whether there are alternatives. Molecular motions occur over a wide range of scales in both time and space, and the choice of approach to study them depends on the question asked. Molecular simulation is far from the only available method, and when the aim e.g. is to predict the structure of a protein it is often more efficient to use bioinformatics instead of spending thousands or millions of CPU hours. Ideally, the timedependent Schrödinger equation should be able to predict all properties of any molecule with arbitrary precision ab initio. However, as soon as more than a handful of particles are involved it is necessary to introduce approximations. For most biomolecular systems we therefore choose to work with empirical parameterizations of models instead, for instance classical Coulomb interactions between pointlike atomic charges rather than a quantum description of the electrons. These models are not only orders of magnitude faster, but since they have been parameterized from experiments they also perform better when it comes to reproducing observations on microsecond scale (Fig. 1), rather than extrapolating quantum models 10 orders of magnitude. The first molecular dynamics simulation was performed as late as 1957, although it was not until the 1970’s that it was possible to simulate water and biomolecules.

Background & Theory Macroscopic properties measured in an experiment are not direct observations, but averages over billions of molecules representing a statistical mechanics ensemble. This has deep theoretical implications that are covered in great detail in the literature, but even from a practical point of view there are important consequences: (i) It is not sufficient to work with individual structures, but systems have to be expanded to generate a representative ensemble of structures at the given experimental conditions, e.g. temperature and pressure. (ii) Thermodynamic equilibrium properties related to free energy, such as binding constant, solubilities, and relative stability cannot be calculated directly from individual simulations, but require more elaborate techniques. (iii) For equilibrium properties (in contrast to kinetic) the aim is to examine structure ensembles, and not necessarily to reproduce individual atomic trajectories! The two most common ways to generate statistically faithful equilibrium ensembles are Monte Carlo and Molecular Dynamics simulations. Monte Carlo simulations rely on designing intelligent moves to generate new conformations, but since these are fairly difficult to invent most simulations tend to do classical dynamics with Newton’s equations of motion since this also has the advantage of accurately reproducing kinetics of non-equilibrium properties such as diffusion or folding times. When a starting configuration is very far from equilibrium, large forces can cause the simulation to crash or distort the system, and for this reason it is usually necessary to start with an Energy Minimization of the system prior to the molecular dynamics simulation. In addition, energy minimizations are commonly used to refine low-resolution experimental structures. All classical simulation methods rely on more or less empirical approximations called Force fields to calculate interactions and evaluate the potential energy of the system as a function of point-like atomic coordinates. A force field consists of both the set of equations used to calculate the potential energy and forces from particle coordinates, as well as a collection of parameters used in the equations.

lipid bond length lipid normal protein "biology" diffusion vibration rotation rotation folding rapid ribosome membrane transport in around bonds water synthesis protein fodling relaxation ion channel protein folding 10-15s

10-12s

10-9s

10-6s

10-3s

1s

103s

Accessible to atomic-detail simulation today

Fig. 1: Time scales if chemical/biological process and current simulation capabilities For most purposes these approximations work great, but they cannot reproduce quantum effects like bond formation or breaking. All common force fields subdivide potential functions in two classes. Bonded interactions cover covalent bond-stretching, angle-bending, torsion potentials when rotating around bonds, and out-of-plane “improper torsion” potentials, all which are normally fixed throughout a simulation – see Fig. 2. The remaining nonbonded interactions consist of Lennard-Jones repulsion and dispersion as well as electrostatics. These are computed from neighbor lists updated every 5-10 steps. Given the potential and force (negative gradient of potential) for all atoms, the coordinates are updated for the next step. For energy minimization, the steepest descent algorithm simply moves each atom a short distance in direction of decreasing energy, while molecular dynamics is performed by integrating Newton’s equations of motion: ∂V (r1,...,rN ) Fi = − ∂ri

∂ 2ri = Fi ∂t 2 The updated coordinates are then used to evaluate the potential energy again, as shown in the flowchart of Fig. 3. Typical biomolecular simulations € apply periodic boundary conditions to avoid surface artifacts, so that a water molecule that exits to the right reappears on the left; if the box is sufficiently large the molecules will not interact significantly with their periodic copies. This is intimately related to the nonbonded interactions, which ideally should be summed over all neighbors in the resulting infinite periodic system. Simple cut-offs can work for Lennard-Jones interactions that decay very rapidly, but for Coulomb interactions a sudden cut-off can lead to large errors. One alternative is to “switch off” the interaction before the cut-off as shown in Fig. 4, but a better option is to use Particle-Mesh-Ewald summation (PME) to calculate the infinite electrostatic interactions by splitting the summation into short- and long-range parts. For PME, the cut-off only determines the balance between the two parts, and the long-range part is treated by assigning charges to a grid that is solved in reciprocal space through Fourier transforms. Cut-offs and rounding errors can lead to drifts in energy, which will cause the system to heat up during the simulation. To control this, the system is normally coupled to a thermostat that scales velocities during the integration to maintain room temperature. Similarly, the total pressure in the system can be adjusted through scaling the simulation box size, either isotropically or separately in x/y/z dimensions. The single most demanding part of simulations is the computation of nonbonded interactions, since millions of pairs have to be evaluated for each time step. Extending the time step is thus an important way to improve simulation performance, but unfortunately errors are introduced in bond vibrations already at 1 fs. However, in most simulations the bond vibrations are not of interest per se, and can be The purpose of this tutorial is not to master all parts of Gromacs’ simulation and analysis tools in detail, but rather to give an overview and “feeling” for the typical steps used in practical simulations. Since the time available for this exercise is rather limited we will focus on a sample simulation system and perform some simplified analyses - in practice you would typically use one or several weeks for the production simulation. mi

Fig. 2: Typical molecular mechanics interactions used in Gromacs. removed entirely by introducing bond constraint algorithms (e.g. SHAKE or LINCS). Constraints make it possible to extend time steps to 2 fs, and in addition the fixed-length bonds are actually better approximations of the quantum mechanical grounds state than harmonic springs! In principle, the most basic system we could simulate would be water or even a gas like Argon, but to show some of the capabilities of the analysis tools we will work with a protein: Lysozyme. This is a fascinating enzyme that has ability to kill bacteria (kind of the body’s own antibiotic), and is present e.g. in tears, saliva, and egg white. It was discovered by Alexander Fleming in 1922, and one of the first protein X-ray structures to be determined (David Phillips, 1965). The two sidechains Glu35 and Asp52 are crucial for the enzyme functionality. It is fairly small by protein standards (although intermediate to large by simulation standards...) with 164 residues and a molecular weight of 14.4 kdalton - including hydrogens it consists of 2890 atoms (see illustration on front page), although these are not present in the PDB structure since they only scatter X-rays weakly (few electrons). In the tutorial, we are first going to set up your Gromacs environments (might already be done for you), have a look at the structure, prepare the input files necessary for simulation, solvate the structure in water, minimize & equilibrate it, and finally perform a short production simulation. After this we’ll test some simple analysis programs that are part of Gromacs.

Setting up Gromacs & other programs Before starting the actual work, we need a couple of programs that might already be installed on your system. Since the actual installation can depend on your system we will do this part interactively at the tutorial, although you can find more detailed instructions at http://www.gromacs.org. 1. You will need a Unix-type system with an X11 window environment (look at the install disks if you have a mac). You can use CygWin under windows, although this is a bit more cumbersome, and you will also need to install a separate X11 server. 2. Gromacs uses FFTs (fast fourier transforms) for some interactions, in particular PME. While Gromacs comes with the slower FFTPACK libraries built-in, it is a very good idea to install the freely available and faster FFT libraries from http://www.fftw.org. If you have an Intel mac you’re in luck, since the Gromacs package comes with this included. You can also find finished RPMs for x86 Linux on the Gromacs or FFTW sites. If you need to install this from source, unpack the distribution tarball with “tar -xzvf fftw-3.1.2.tar.gz”, go into the directory and issue “./configure --enable-float” (since Gromacs uses single precision by default) followed by “make” and finally become root and issue “make install”. If you do not have root access you can install it under your home directory, using the switch “--prefix=/ path/to/somewhere” when you run the configure script.

4. Install PyMol from the web site http://www.pymol.org, or alternatively VMD. 5. Install the 2D plotting program Grace/ xmgrace. Depending on your platform you might have to install the Lesstif (clone of Motif ) libraries first (www.lesstif.org), and then get the Grace program source code from the site http://plasma-gate.weizmann.ac.il/Grace/. 6. To create beautiful secondary structure analysis plots, you can use the program “DSSP” from http://swift.cmbi.ru.nl/gv/dssp/. This is freely available for academic use but requires a license, so we will use the Gromacs built-in alternative “my_dssp” for this tutorial. If you are following this exercise as part of a lab course, the installation might already have been done for you.

Initial input data: Interaction function V(r) - "force field" coordinates r, velocities v

Compute potential V(r) and forces Fi = iV(r) on atoms

Repeat for millions of steps

3. Install Gromacs from a package, or compile it the same way as FFTW.

Update coordinates & velocities according to equations of motion Collect statistics and write energy/coordinates to trajectory files

More steps?

Yes

No

The PDB Structure - think before you simulate!

Done!

Fig. 3: Simplified flowchart of a standard molecular dynamics simulation, e.g. as implemented in Gromacs.

The first thing you will need is a structure of the protein you want to simulate. Go to the Protein Data Bank at http://www.pdb.org, and search for “Lysozyme”. You should get lots of hits, many of which were determined with bound ligands, mutations, etc. One possible alternative is the PDB entry 1LYD, but feel free to test some other alternative too. It is a VERY good idea to look carefully at the structure before you start any simulations. Is the resolution high enough? Are there any missing residues? Are there any strange ligands? Check both the PDB web page and the header in the actual file. Finally, try to open it in a viewer like PyMOL, VMD, or RasMol and see what it looks like. In PyMOL you have some menus on the right where you can show (“S”) e.g. cartoon representations of the backbone!

Creating a Gromacs topology from the PDB file When you believe you have a good starting structure the next step is to prepare it for simulation. What do we need to prepare? Well, for instance the choice of force fields, water model, charge states of titratable residues, termini, add in the missing hydrogens, etc. You will fairly quickly realize that most of this stuff is static and independent of the protein coordinates, so it makes sense to put it in a separate file called a “topology” for the system. The good news is that Gromacs can create topologies automatically for you with the tool pdb2gmx. There are several places where you can find documentation for the Gromacs commands. You should always get some help with “pdb2gmx -h”, which first includes a brief description of what the program

does, then a list of (optional) input/output files, and finally a list of available flags. The same information is present with “man pdb2gmx” (if it was properly installed), and you can also use the www.gromacs.org website: Choose “documentation” in the top menu, and then “manuals” in the leftside menu. There is both an extensive book describing everything and a shorter online reference to the commands and file formats. For the first command we’ll help you. pdb2gmx should read its input from a coordinate file, and just for fun we’ll pick a specific water model and force field. Type pdb2gmx -f 1LYD.pdb -water tip3p The program will ask you for a force field - for this tutorial I suggest that you use “OPLS-AA/L”, and then write out lots of information. The important thing to look for is if there were any errors or warnings (it will say at the end). Otherwise you are probably fine. Read the help text of pdb2gmx (-h) and try to find out what the different files it just created do, and what options are available for pdb2gmx! The important output files are (find out yourself what they do) conf.gro

topol.top

posre.itp

If you issue the command several times, Gromacs will back up old files so their names start with a hash mark (#) - you can safely ignore (or erase) these for now.

Adding solvent water around the protein If you started a simulation now it would be in vacuo. This was actually the norm 25-30 years ago, since it is much faster, but since we want this simulation to be accurate you are going to add solvent. However, before adding solvent we have to determine how big the simulation box should be, and what shape to use for it. There should be at least half a cutoff between the molecule and the box side, and preferably more. Due to the limited time you be cheap here and only use 0.5nm margin between the protein and box. To further reduce the volume of the box you could explore the box type options and for instance test a “rhombic dodecahedron” box instead of a cubic one. The program you should use to alter the box is “editconf”. This time you will have to figure out the exact command(s) yourself, but the primary options to look at in the documentation are “–f”, “–d”, “–o”, and “–bt”. As an exercise, you could try a couple of different box sizes (the volume is written on the output!) and also other box shapes like “cubic” or “octahedron”. What effect does it have on the volume? The last step before the simulation is to add water in the box to solvate the protein. This is done by using a small pre-equilibrated system of water coordinates that is repeated over the box, and overlapping water molecules removed. The Lysozyme system will require roughly 6000 water molecules with 0.5nm to the box side, which increases the number of atoms significantly (from 2900 to over 20,000). GROMACS does not use a special pre-equilibrated system for TIP3P water since water coordinates can be used with any model – the actual parameters are stored in the topology and force field. For this reason the program only provides pre-equilibrated coordinates for a small 216-water system using a water model called “SPC” (simple point charge). The best program to add solvent water is “genbox”, i.e. you will generate a box of solvent. You can select the SPC water coordinates to add with the flag “–cs spc216.gro”, and for the rest of the choices you can use the documentation. Remember that your topology needs to be updated with the new water molecules! The easiest way to do that is to supply the topology to the genbox command. Before you continue it is once more a good idea to look at this system in PyMOL. However, PyMOL and most other programs cannot read the Gromacs gro file directly. There are two simple ways to create a PDB file for PyMOL:

1. Use the ability of all Gromacs programs to write output in alternative formats, e.g.: genbox –cp box.gro –cs spc216.gro –p topol.top –o solvated.pdb 2. Use the Gromacs editconf program to convert it (use -h to get help on the options): editconf -f solvated.gro -o solvated.pdb If you just use the commands like this, the resulting structure might look a bit strange, with water in a rectangular box even if the system is triclinic/octahedron/dodecahedron. This is actually fine, and depends on the algorithms used to add the water combined with periodic boundary conditions. However, to get a beautiful-looking unit cell you can try the commands (backslash means the entire command should be on a single line): trjconv -s solvated.gro -f solvated.gro -o solv_triclinic.pdb \ -pbc inbox -ur tric trjconv -s solvated.gro -f solvated.gro -o solv_compact.pdb \ -pbc inbox -ur compact We’re not going to tell you what they do - play around and use the -h option to learn more :-)

Running energy minimization The added hydrogens and broken hydrogen bond network in water would lead to quite large forces and structure distortion if molecular dynamics was started immediately. To remove these forces it is necessary to first run a short energy minimization. The aim is not to reach any local energy minimum, so e.g. 200 to 500 steps of steepest descent works very well as a stable rather than maximally efficient minimization. Nonbonded interactions and other settings are specified in a parameter file (em.mdp); it is only necessary to specify parameters where we deviate from the default value, for example: ------em.mdp-----integrator = steep nsteps = 200 nstlist = 10 rlist = 1.0 coulombtype = pme rcoulomb = 1.0 vdw-type = cut-off rvdw = 1.0 nstenergy = 10 -----------------Comments on the parameters used: We choose a standard cut-off of 1.0 nm, both for the neighborlist generation and the coulomb and Lennard-Jones interactions. nstlist=10 means it is updated at least every 10 steps, but for energy minimization it will usually be every step. Energies and other statistical data are stored every 10 steps (nstenergy), and we have chosen the more expensive Particle-Mesh-Ewald summation for electrostatic interactions. The treatment of nonbonded interactions is frequently bordering to religion. One camp advocates standard cutoffs are fine, another swears by switched-off interactions, while the third wouldn’t even consider anything but PME. One argument in this context is that ‘true’ interactions should conserve energy, which is violated by sharp cut-offs since the force is no longer the exact derivative of the potential. On the other hand, just because an interaction conserves energy doesn’t mean it describes nature accurately. In practice, the difference is most pronounced for systems that are very small or with large charges, but the key lesson is

really that it is a trade-off. PME is great, but also clearly slower than cut-offs. Longer cut-offs are always better than short ones (but slower), and while switched interactions improve energy conservation they introduce artificially large forces. Using PME is the safe option, but if that’s not fast enough it is worth investigating reaction-field or cut-off interactions. It is also a good idea to check and follow the recommended settings for the force field used. GROMACS uses a separate preprocessing program grompp to collect parameters, topology, and coordinates into a single run input file (e.g. em.tpr) from which the simulation is then started (this makes it easier to prepare a run on your workstation but run it on a separate supercomputer). As we just said, the command grompp prepares run input files. You will need to provide it with the parameter file above, the topology, a set of coordinates, and give the run input file some name (otherwise it will get the default topol.tpr name). The program to actually run the energy minimization is mdrun - this is the very heart of Gromacs. mdrun has a really smart shortcut flag called “deffnm” to use a string you provide (e.g. “em”) as the default name for all options. If you want to get some status update during the run it is also smart to turn on the verbosity flag (guess the letter, or consult the documentation). The minimization should finish in a couple of minutes.

Carefully equilibrating the water around the protein To avoid unnecessary distortion of the protein when the molecular dynamics simulation is started, we first perform an equilibration run where all heavy protein atoms are restrained to their starting positions (using the file posre.itp generated earlier) while the water is relaxing around the structure. This should really be 50-100ps, but if that appears to be too slow you can make do with 10ps (5000 steps) here. Bonds will be constrained to enable 2 fs time steps. Other settings are identical to energy minimization, but for molecular dynamics we also control the temperature and pressure with the Berendsen weak coupling algorithm. This is mostly done by extending the parameter file used above: ------pr.mdp-----integrator = md nsteps = 5000 dt = 0.002 nstlist = 10 rlist = 1.0 coulombtype = pme rcoulomb = 1.0 vdw-type = cut-off rvdw = 1.0 tcoupl = Berendsen tc-grps = protein non-protein tau-t = 0.1 0.1 ref-t = 298 298 Pcoupl = Berendsen tau-p = 1.0 compressibility = 5e-5 5e-5 5e-5 0 0 0 ref-p = 1.0 nstenergy = 100 define = -DPOSRES

-----------------For a small protein like Lysozyme it should be more than enough with 100 ps (50,000 steps) for the water to fully equilibrate around it, but in a large membrane system the slow lipid motions can require several nanoseconds of relaxation. The only way to know for certain is to watch the potential energy, and extend the equilibration until it has converged. To run this equilibration in GROMACS you should once more use the programs grompp and mdrun, but (i) apply the new parameters, and (ii) start from the final coordinates of the energy minimization.

Running the production simulation The difference between equilibration and production run will be minimal: the position restraints and pressure coupling are turned off, we decide how often to write output coordinates to analyze (say, every 100 steps), and start a significantly longer simulation. How long depends on what you are studying, and that should be decided before starting any simulations! For decent sampling the simulation should be at least 10 times longer than the phenomena you are studying, which unfortunately sometimes conflicts with reality and available computer resources. This will depend a lot on the performance of your computer, but one option is to first run a very short simulation so you have something to work with while learning the analysis tools, and in the meantime you can run a longer simulation in the background. Select the number of steps and output frequencies yourself, but for reasons you’ll see later it is recommended to write coordinates at least every picosecond, i.e. 500 steps. ------run.mdp-----integrator = md nsteps = < select yourself! > dt = 0.002 nstlist = 10 rlist = 1.0 coulombtype = pme rcoulomb = 1.0 vdw-type = cut-off rvdw = 1.0 tcoupl = Berendsen tc-grps = protein non-protein tau-t = 0.1 0.1 ref-t = 298 298 nstxout = < select yourself! > nstvout = < select yourself! > nstxtcout = < select yourself! > nstenergy = < select yourself! > -----------------Storing full precision coordinates/velocities (nstxout/nstvout) enables restart if runs crash (power outage, full disk, etc.). The analysis normally only uses the compressed coordinates (nstxtcout). Use the documentation to figure out the names of the corresponding files. Perform the production run with the commands grompp and mdrun (remember the verbosity flag to see how the run is progressing).

Analysis of the simulation

You don’t have to do exactly these analyses, but try one or a couple of them that sound interesting and then use the documentation website or Google to see what else you can do if you prefer that. At least for the longer ones you will get clearer results with a longer trajectory. Deviation from X-ray structure One of the most important fundamental properties to analyze is whether the protein is stable and close to the experimental structure. The standard way to measure this is the root-mean-square displacement (RMSD) of all heavy atoms with respect to the X-ray structure. GROMACS has a finished program to do this, called g_rms that takes a run input file and a trajectory to produce RMSD output. It is usually a good idea to use a reference structure (the tpr file) from the input before energy minimization, since that is the experimental one. The program will prompt both for a fit group, and the group to calculate RMSD for – choose “Protein-H” (protein except hydrogens) for both. The output will be written to rmsd.xvg, and if you installed the Grace program you will directly get a finished graph that you can display with xmgrace rmsd.xvg. The RMSD increases pretty rapidly in the first part of the simulation, but stabilizes around 0.l9 nm, roughly the resolution of the X-ray structure. The difference is partly caused by limitations in the force field, but also because atoms in the simulation are moving and vibrating around an equilibrium structure. A better measure can be obtained by first creating a running average structure from the simulation and comparing the running average to the X-ray structure, which would give a more realistic RMSD around 0.16 nm. Comparing fluctuations with temperature factors Vibrations around the equilibrium are not random, but depend on local structure flexibility. The root-mean-square-fluctuation (RMSF) of each residue is straightforward to calculate over the trajectory, but more important they can be converted to temperature factors that are also present for each atom in a PDB file. Once again there is a program that will do the entire job - g_rmsf. The output can be written to an xvg file you can view with xmgrace, but also as temperature factors directly in a PDB file (check the output file options for the program). When prompted, you can use the group “C-alpha” to get one value per residue. The overall agreement should be quite good (check the temperature factor field in this and the input PDB file), which hopefully lends further credibility to the accuracy and stability of the simulation - your fluctuations are similar to those seen in the X-ray experiment. Secondary structure (Requires DSSP) Another measure of stability is the protein secondary structure. This can be calculated for each frame with a program such as DSSP. If the DSSP program is installed and the environment variable DSSP points to the binary (do_dssp -h will give you help on that), the GROMACS program do_dssp can create time-resolved secondary structure plots. Since the program writes output in a special xpm (X pixmap) format you probably also need the GROMACS program xpm2ps to convert it to postscript. Both commands are reasonably well documented, and the end result will be an eps file that you can view e.g. with ghostview or some other standard previewer application. Use the group “protein” for the secondary structure calculation. The DSSP secondary structure definition is pretty tight, so it is quite normal for residues to fluctuate around the well-defined state, in particular at the ends of helices or sheets. For a (long) protein folding simulation, a DSSP plot would show how the secondary structures form during the simulation. If you do not have DSSP installed, you can compile the Gromacs built-in replacement “my_dssp” from the src/contrib directory, or hope that your system administrator already did so for you.

Distance and hydrogen bonds With basic properties accurately reproduced, we can use the simulation to analyze more specific details. As an example, Lysozyme appears to be stabilized by hydrogen bonds between the residues GLU22 and ARG137 (the numbers might be slightly different for structures other than 1LYD), so how much does this fluctuate in the simulation, and are the hydrogen bonds intact? Gromacs has some very powerful tools to analyze distances and hydrogen bonds, but to specify what to analyze you need an “index file” with the groups of atoms in question. To create such a file, use the command make_ndx –f run.gro At the prompt, create a group for GLU22 with “r 22”, ARG137 with “r 137” (for residue 22 and 137), and then “q” to save an index file as index.ndx. The distance and number of hydrogen bonds can now be calculated with the commands g_dist and g_hbond. Both use more or less the same input (including the index file you just created). For the first program you might e.g. want the distance as a function of time as output, and for the second the total number of hydrogen bonds vs. time. In both cases you should select the two groups you just created. Two hydrogen bonds should be present almost throughout the simulation, at least for 1LYD. Making a movie A normal movie uses roughly 30 frames/second, so a 10-second movie requires 300 simulation trajectory frames. To make a smooth movie the frames should not be more 1-2 ps apart, or it will just appear to shake nervously. To create a PDB-format trajectory (which can be read by PyMOL and VMD) with roughly 300 frames you can use the command trjconv. The option -e will set an end time (to get fewer frames), or you can use -skip e.g. to only use every fifth frame. Write the output in PDB format by using .pdb as extension, e.g. movie.pdb. Choose the protein group for output rather than the entire system (water move too fast). If you open this trajectory in PyMOL you can immediately play it using the VCR-style controls on the bottom right, adjust visual settings in the menus, and even use photorealistic ray-tracing for all images in the movie. With MacPyMOL you can directly save the movie as a quicktime file, and on Linux you can save it as a sequence of PNG images for assembly in another program. Rendering a movie only takes a few minutes, and if that doesn’t work you can still export PNG files. PyMOL can also create (rendered) PNG files with transparent background if you do: set ray_opaque_background, off

Conclusions & Report The main point of this exercise was to get your feet wet, and at least an idea of how to use MD in practice. There are more advanced analysis tools available, not to mention a wealth of simulation options. Write a brief report about your work and describe your findings. In particular, try to think about why you are using particular algorithms/choices, and whether there are alternatives! If you want to continue we’d like to recommend the Gromacs manual which is available from the website: It goes quite a bit beyond just being a program manual, and explains many of the algorithms and equations in detail! There are also a few videos & slides available online - Google is your friend!