Project 4 Molecular Dynamics Simulations of Biological Structures Due electronically by 5:00pm, Wednesday, 5/31/06
Note that programming projects should be completed individually. You may discuss algorithms with others, but the coding should be done alone.
Contents:
- Introduction
- Project description
a. The simplified energy function
b. Simulating the motion of atoms
c. Extra credit
- Data input and output
- Visualizing your simulations
- Deliverables
- Submission instructions
- Examples
Introduction
Molecular dynamics simulations (MD) allow us to investigate and manipulate the motion and behavior of biological molecules. It is used widely for studying protein stability, ligand binding, molecular interactions, conformational changes, and protein folding in addition to having applications in drug design and structure determination. Conceptually, MD is purely physics-based. Objects move if a force is acting on them, and forces can be derived from the energy associated with those objects. In MD, given initial/current conditions for the individual objects (XYZ coordinates, velocities, etc), we define an energy function to describe the energy of the structure and use forces derived from the energy function to determine the positions and velocities of each object at each subsequent time step. These values are saved periodically as "snapshots" or frames, and a series of frames can then be visualized using existing software such as VMD to produce short movies of the entire structure moving. For simulations of biological molecules such as proteins, the components of the protein structure are usually the atoms or residues.
MD can be implemented with different levels of granularity, from complex all-atom and even more complex quantum simulations to simpler models where groups of atoms (such as an amino acid or a helix) are treated as single objects. There is an obvious tradeoff between speed and resolution, but MD at most levels can provide important insights into structure folding, interaction, and behavior. This is especially useful for the study of proteins, which derive their function from structure.
In this project, you will be given input files with initial values for atom positions in a protein structure, and will be writing code to simulate the movement of those atoms using basic physics formulas and other information provided. In the end, you will create output monitoring the stability of your simulation, as well as output that can be opened with a molecular dynamics visualization program so you can see short clips of the protein moving around under different conditions.
<< back to top
Project description
THE SIMPLIFIED ENERGY FUNCTION
A given protein structure will have a certain total energy depending on the potential and kinetic energies taken over all of the atoms. As discussed in lecture, the potential energy is calculated using an empirically derived function or "force field" containing several terms corresponding to the different contributions to potential energy: bond, angle, torsion, and nonbonded interactions. Deviations in these terms away from their reference values will result in higher energy (and thus greater instability). For this project, we will be using a simplified energy function containing only two terms, one for bond interactions and one for nonbonded interactions (see the lecture notes for a description of the standard 4 term force field). Bond interactions are due to direct chemical connections between atoms, and nonbonded interactions occur between atoms with no direct chemical bond.
Our energy function has the following functional form:

The first term captures the interaction between chemically connected or bonded atoms, which is modeled as a physical spring (if you remember from physics, the potential energy in a spring is generally E = (1/2)kx^2 where x is the distance from the center of the spring and k is the spring constant). The term thus sums the potential energy between all pairs of bonded atoms using the deviation in bond lengths away from their reference lengths. Note that the potential energy for this term is zero when all bonds are at their reference bond distance. Low or zero energy is favorable for the stability of a structure.
The second term approximates all of the nonbonded interactions in the system and we will also use a spring-type equation to model this (this is a big simplification - see the lecture notes for a more canonical expression modeling nonbonded interactions). This term will sum over all pairs of atoms that are NOT bonded to each other, but are within a certain distance from each other. We will use an input parameter to specify the distance cutoff for nonbonded interactions. In reality, we would need to account for charge, electrostatic and van der Waals interactions, but for the purposes of this project we will be modeling all nonbonded interactions with one simple term. Specifically, we will use the deviation in the distances between these pairs of atoms from their reference distance in a known structure.
SIMULATING THE MOTIONS OF ATOMS
In order to make the molecule "move", we must know how the positions of the atoms change over time. We get this by calculating the forces on each atom, using these forces to calculate the velocities of each atom, and then applying these velocities to obtain the new position of each atom for each time step. Force is the derivative of energy, and so you will need to take the derivative of the energy function given above to get an expression for the force on each pair of atoms. Note also that for every action there is an equal and opposite reaction, so for a force calculated between a pair of atoms, you should apply the force positively to one atom in the pair and negatively to the other. This also means that you only need to calculate the energy for each pair of atoms (bonded or nonbonded) once, e.g. if atom A is bonded to atom B, you need to calculate the bond energy A-B or B-A, but not both.
Deriving the forces from the energy function
You should be applying the force individually in each dimension to each appropriate pair of bonded or nonbonded atoms. Since the energy function consists of two additive parts, the derived force equation can also be broken up into two corresponding parts. Note that the two terms of the energy function are essentially identical - this fact should help you simplify part of your code.
Recall that force is a vector and thus has a magnitude as well as a direction. We need to find the component of the force vector in each of the three dimensions (x, y, and z) so that we can apply the force separately in each dimension (this is for practical reasons, since our atom positions are given in 3 dimensional Cartesian coordinates). Note that in general, you should store your data so that you can access and update the x, y, and z values separately for anything that has 3 dimensions.
 Imagine a pair of atoms as a pair of points given by the vectors i and j. The force (bonded or nonbonded) between them can be represented as a vector F of a certain magnitude in the direction of the vector joining i and j. Conceptually, the individual dimensional components of the force can be found by projecting F onto each of the three axes. This is done by multiplying the magnitude of the force by the unit vector along that axis. Since we already have our values separated into 3 dimensions, we just need to calculate the magnitude of that force component for each dimension. This is simply the magnitude of the force multiplied by the magnitude of the unit vector.
The magnitude of the force, F, is the derivative of the energy term with respect to the distance between the two atoms (either b or r, depending on which term you are calculating). The component of the unit vector for each axis is simply the vector between the two atoms in that dimension divided by the length of that vector. Use these two facts to derive an expression for the force acting on a pair of atoms. This expression should be the same for both bonded and nonbonded interactions. Remember that the force is applied equally and oppositely to each member of the atom pair.
Performing a time step
Previously, we asked you to calculate forces, update velocities, and update positions, in that order. This corresponds to the Euler integration method for calculating trajectories. We are now asking you to implement the more commonly used and more stable Verlet integration method, which should not require significant changes to your code.
In Euler integration, velocities and positions are calculated at the end of a time step, after all of the forces have been computed. This is sometimes called a "leapfrog" algorithm, because it essentially jumps an entire time step between updates. Verlet integration is more stable because it calculates positions in the middle of a time step rather than at the end. Using Verlet, one time step consists of first updating the velocities of each atom using the current forces on each atom, then updating the positions of each atom, then calculating the new forces, and finally updating the velocities again. Velocities of each atom are based on the sum of the forces acting on it (note that this means you should be keeping track of the total force on every atom during each time step). You should calculate velocities using the basic physics equation for velocity:

where dt in this case actually is half the given dt value. a is the acceleration, and is equivalent to F/m (Newton's second law of motion). Don't forget to update velocities in three dimensions (Hint: if you already have the force components for each dimension, you can use these to calculate the individual velocity components for each dimension). We will provide you with a constant atom mass for each atom.
The distance each atom traveled during the time step can be calculated from the velocities and the time step using the following equation:
s = v dt
where s is the new relative position, or distance traveled (you will need to add this to the previous position). dt in this case is the full time step value. After calculating new forces acting on each atom, you should then update the velocities again using the same velocity equation. Thus, by the end of a time step, we have updated the velocities by the full time step value, but have updated the positions by the values of the velocities at the halfway point. Remember, each atom is affected by numerous other (bonded and non-bonded) atoms, so you have to track all of these to compute the total force. You can easily compute the dimensional components of position by using the dimensional components of velocity in the above equation. Note that the energies and forces should be reinitialized to zero before each time step.
Note that the pairs of nonbonded atoms technically may change between time steps as a result of atoms moving within or beyond the distance cutoff. For our purposes, you should only find the nonbonded pairs once - at the beginning of the simulation using the input files - and use these as a constant set throughout the simulation.
An important detail in MD is the length of the time step. If you imagine an atom moving in a continuous trajectory, then our simulation is essentially a linear approximation of that trajectory derived by calculating the velocity and position of the atom at regular time intervals. We make the assumption that the force stays constant throughout the length of the time step, when in actuality it is changing constantly, and thus the velocities are changing constantly as well. The accuracy of our approximation depends on the length of the time step - we need to choose a time step that allows us to reproduce the motion of the atom realistically while staying within the limits of our computational resources. Too large of a time step will result in completely inaccurate calculations of velocity and position, and will cause the simulation to become unstable. In your implementation, you should allow the time step ( dt) to be a variable parameter, and we will test the effects of different size time steps on the simulation.
Keeping track of energy
You may have noticed that you never actually need to calculate any energy terms to simulate time steps (the force is all you really need to know, and you should have derived an expression for it). The total energy of the molecule, however, is a good indicator of its stability. It is also a good check on whether your code is working correctly - a proper molecular dynamics simulation with reasonable parameters should have essentially constant total energy (give or take 1 or 2 kiloJoules, in our case). You should keep track of the energy to monitor how your simulation is going and output the energy values at EACH time step. Recall that once the atoms start to move, they will have kinetic energy in addition to the bonded and nonbonded energy. You will need to include all three terms in your total energy calculation and write out all three terms in your output. You do not need to consider kinetic energy when calculating the forces on each atom.
Calculating velocities from temperature
In order for you to understand the effect of the temperature parameter in simulations, we have created four total versions of the input file (50 K, 300 K, 1000 K, and 4500 K, corresponding to cool, normal, hot, and blazing) that will run the same default simulation, but with different initial velocities corresponding to the different temperatures.
The temperature of the simulation primarily affects the initial distribution of atomic velocities. This is discussed at http://www.ch.embnet.org/MD_tutorial/pages/MD.Part1.html#Verlet (and other places).
EXTRA CREDIT: create code that accepts a starting temperature and creates a set of velocities that reflect this temperature, and which have zero initial momentum. The equation used to calculate these initial velocities will be some function of mass and temperature, and will involve the Gas constant and a probability distribution. Correctly implementing the extra credit will add 5% to your overall grade.
SUMMARY OF STEPS FOR CODING THE SIMULATION
- Parse an input file containing atom, coordinate, initial velocity (except where the extra credit applies), connectivity, and parameter information.
- Iterate through 1000 time steps, and for each time step do the following:
a. Update the velocities on each atom (the forces should be initialized to zero when you do this in the first time step, so your velocities won't change yet)
b. Update the positions of each atom
c. Make sure forces and energy are zero (if you are keeping global variables for these)
d. For each interaction pair (bonded and nonbonded):
i. Calculate the potential energy and update the total potential energy
ii. Calculate the force and update the total forces in each dimension for each atom
e. Update the velocities on each atom and calculate the kinetic energy
- Write the appropriate output every 10 time steps.
<< back to top
Data input and output
We will be giving you input "RVC" files (.rvc, r = position, v = velocity, c = connectivity) in tab-delimited format containing (one line per atom) the atom number, x coordinate, y coordinate, z coordinate, initial x velocity, initial y velocity, initial z velocity, and then up to 4 atom numbers corresponding to the atoms "connected", or bonded, to that atom. The first line of the file will be a comment containing the default parameters for the simulation (kB and kN constants, nbCutoff = the distance cutoff for nonbonded atoms, dt = time step, mass, T = temperature,). For example:
# 6pti.pdb: kB=40000.0 kN=400.0 nbCutoff=0.50 dt=0.0010 mass=12.0 T=300.0
atomNum X Y Z Vx Vy Vz connect1 connect2 connect3 ...
...
[etc]
You will use the connectivity information to determine the bonded pairs of atoms, and you will need to calculate inter-atom distances from the initial configuration to find all the nonbonded pairs of atoms. These nonbond reference distances (ro) are fixed through the simulation, and they do not change as the molecule moves, since they represent the "forces" that hold together the initial structural configuration. You should also calculate your reference bond lengths from the input file. Note that we are using a constant mass across all atom types (this is a simplification - in reality, every type of atom has a different atomic mass).
(For those of you doing the extra credit, use any one of the input files for connectivity, coordinate, and parameter information, but ignore the temperature parameter and initial velocity values. You will be specifying the temperature on the command line, and calculating initial velocities for that temperature.)
During simulation, you should output a similar RVC file every 10 time steps, concatenated together into one RVC file, like this example file. You should include the original input RVC file and you should use 1000 as the default number of total steps (feel free to perform longer simulations if you have the time and inclination), so you should have about 100 frames in your output file. You should also write a .out file containing the energies for each time step, similar to this example energy file.
You should then use the following Python script to convert your RVC files into AMBER coordinate files (.crd), which will be needed to visualize your simulations: convertRVCtoCRD.py
To run this script on a machine with Python installed, simply type:
> python convertRVCtoCRD.py input.rvc output.crd
AMBER stands for "Assisted Model Building with Energy Refinement", and refers both to a set of force fields commonly used in MD simulations of biomolecules, as well as to a popular package of MD simulation programs. The visualization program we are using is capable of loading the AMBER file format.
While you are not responsible for keeping track of units or converting between units, be aware that the units for mass, time, temperature, energy, and distance used in the RVC files are atomic mass units (AMUs), picoseconds, Kelvin, kilojoules, and nanometers, respectively. The units for distance used in most other structure files (e.g. PDB and CRD files) are in angstroms, and we have taken care of the unit conversions with the provided Python script.
You will be running your simulation code on BPTI, bovine pancreatic trypsin inhibitor (PDB ID: 6PTI). Download the 4 input files (6pti_*.rvc) from the Project 4 Files directory. Also download our version of the 6PTI PDB file (which has all the hydrogen atoms removed to reduce the number of atoms you'll be simulating). Your code will also need to take in parameters for atom mass, dt, kB, kN, nbCutoff (the distance cutoff for nonbonded pairs), and initial temperature (if you're doing the extra credit).
Perform 8 separate sets of simulations of BPTI with the following parameters (assume default values for unspecified parameters):*
- Default parameters (6pti_300K.rvc, other parameter values are in the header)
- Temperature = 50.0, 1000.0, 4500.0 (use the 3 other input files)
- kB = [100.0, 10000.0]
- kN = [1000.0, 10000.0]
- Atom mass = [2.0, 50.0]
- kB = 100.0 and kN = 10000.0
- time step (dt) = [0.005, 0.01, 0.05, 0.1]
- nbCutoff = [0.25, 0.75]
- EXTRA CREDIT: kB = 100.0 and T = 3000.0
* Be aware that some of these parameter sets (part of set 7 for certain) will result in very large values and may cause overflow errors. You might want to catch those errors when they occur, and should understand that this is one manifestation of the simulation becoming "unstable" - i.e. the molecule is behaving outside its physical bounds. We will be asking you to determine the time step value at which this seems to occur.
These 8 parameter sets will give you 17 separate simulations and should result in 17 RVC and 17 energy output files (18 if you do the extra credit). Convert each RVC file to a CRD file using the provided Python script. You MUST use the following file-naming scheme for each set of simulations (you will lose points if you don't!):
- default.rvc, default.crd, default.out
- T50.rvc, T50.crd, T50.out
- T1000.rvc, etc...
- T4500.rvc, etc...
- kB100.rvc, etc...
- kB10000.rvc, etc...
- kN1000.rvc, etc...
- kN10000.rvc, etc...
- M2.rvc, etc...
- M50.rvc, etc...
- kB100_kN10000.rvc, etc...
- t005.rvc, etc...
- t01.rvc, etc...
- t05.rvc, etc...
- t1.rvc, etc...
- nB25.rvc, etc...
- nB75.rvc, etc...
- Extra credit: kB100_T3000.rvc, etc...
<< back to top
Visualizing your simulations
You will now view movies of your simulations to see how your molecules behave under the different conditions. To do this, you will need to download and install VMD ( http://www.ks.uiuc.edu/Research/vmd/).
After installing, open up VMD. You should see two windows, a graphics window and a "VMD main" control window. In the control window, go to File... New Molecule. This will open up a Molecule File Browser. In the browser window, clock "Browse" and choose the PDB file corresponding to the molecule you want to view (e.g. 6PTI - make sure you download our version here, or else the simulation won't work). Click "Load" (the molecule should appear in line form). Then click "Browse" again and choose the CRD file corresponding to the simulation you want to view. Click "Load" again. You should see the molecule move as the frames load.
Now to play the movie, you press the play button in the bottom right corner of the control window. Pressing that button again will pause the movie. You can adjust the speed of the movie by adjusting the slider bar next to "speed".
When you want to view a different simulation, you need to clear the old one. Select the molecule in the control window, then right click on it. If you are simulating the same molecule (e.g. 6PTI) but want to use a different CRD file, choose "Delete Frames..." and click "Delete". Then go to the Molecule File Browser and Browse and Load the new CRD file. If you want to start over from scratch, choose "Delete Molecule", then go to the Molecule File Browser, Browse and Load the PDB file and Browse and Load the CRD file as before.
Jot down descriptions of how the molecule moves under each parameter setting, and note differences between different simulations. You might want to look over Part II of the quiz at the same time.
For fun (and one of the quiz questions), you may want to play around with different graphical representations. In the control window, go to Graphics... Representations... This should bring up the Graphical Representations window. You can choose different ways to color the molecule and different ways to draw the structure (Warning: "Surf" and other surface drawing methods involve a fair amount of calculation in VMD and so will take a very long time to load. We don't recommend choosing those drawing methods!).
<< back to top
Deliverables
- Commented, working code. (30%)
- README file containing, at the very least, instructions for running your code, full pseudocode, and whether or not you did the extra credit. (20%)
- 17 (or 18, if you did the extra credit) .rvc, .crd, and .out files corresponding to the simulations we specified above. (20%)
- Completed Project 4 quiz: doc | PDF (25%)
- Completed project survey: doc | PDF (5%)
- This is the first year this project has been included in the course, so the survey is for us to get your detailed feedback on it. Please take the time to think about the questions and provide reasonably thoughtful responses (you will get the full 5% if you make an effort at this). Thanks in advance!
<< back to top
Submission instructions
Log in to an elaine machine:
ssh sunetid@elaine.stanford.edu
Place all of your files from 1-5 above into one directory, e.g.
~/biomedin214/p4/submit
Please only include those listed above. Delete executable binaries if you have them (e.g. if you used Java/C/C++); we will compile your source code.
cd into your submit folder:
cd ~/biomedin214/p4/submit
Run the class submission script:
/usr/class/biomedin214/bin/submit p4
Note the space before p4. Be sure to run the script from within your submission folder (where all your files to be submitted are)!
You may submit multiple times; simply re-run the script. Each new submission overwrites the previous submissions. Your submission date will be the final submission received, and late days will be charged accordingly. Don't forget the quiz and the survey!
<< back to top
Examples
To help you debug your code, we are providing some test output RVC files along with their parameters. You should run your code with the same parameters and check that your output is similar to the test output. Numbers may be slightly off - just make sure that you are reasonably close. We will be grading your actual output by the same criteria; i.e. as long as things "look right", you should receive full credit. Also, since we are simulating 1000 time steps, expect your code to take a little while to run. Speedy code might be able to do one simulation in a minute or two, less efficient but working code should still take less than 10 minutes.
Example 1 (kB=1234.0): .rvc, .out Example 2 (mass=16.0): .rvc, .out
Example 3 (nbCutoff=0.65): .rvc, .out
For those doing the extra credit, you can check whether you're calculating reasonable initial velocities by comparing to one of the input files we've given you. Simply use the same temperature (e.g. 50, 300, 1000, or 4500 K), and check that the velocities in your output RVC for the first frame are similar to the given input RVC.
<< back to top
|