folding is one of the most fundamental and fascinating biological processes.
Moreover, it remains a long-standing problem to understand the detailed
mechanisms of protein folding from primary sequence to the native
three-dimensional structures. A number of small proteins with 10 to 100 amino
acid residues fold on microsecond to sub-millisecond timescales. These
“fast-folding” proteins serve as excellent model systems for studying protein
folding. In this review, I discuss two papers that study one of these
“fast-folding” proteins, chignolin, in different ways. One paper studies
chignolin folding through molecular dynamics simulations [1], a standard
approach, while the other studies chignolin folding through enhanced sampling [2].
Proteins, which are essential to all living organisms, are linear polymers
composed of amino acids. Amino acid polymers may also be referred to as
polypeptides, although scientists do not customarily use these terms
interchangeably. The term “protein” generally refers to naturally occurring
molecules having a particular sequence and a defined 3-dimensional (3D)
structure, whereas “polypeptide” can refer to any polymers of amino acids,
regardless of length, sequence, and structure. “Peptide” is generally reserved
for a short oligomer that often lacks a stable conformation. In this review,
the protein in question is chignolin, a synthetically designed protein
consisting of only 10 amino acids: Gly-Tyr-Asp-Pro-Glu-Thr-Gly-Thr-Trp-Gly. It
is of interest for two reasons: it is the smallest β-hairpin protein known to be
stable in solution [3], and it plays a major role in investigating the
mechanisms behind protein folding. Chignolin’s simple structure, two β-strands
in a hairpin, makes it an ideal model system for investigating the
characteristic behaviours of protein folding. It also provides information
about the early events of protein folding, analogous to examining the cosmic
background radiation to obtain information about the early events of the
universe.
The β-hairpin structure of chignolin is an advantage when it comes to
simulation. Hairpins are among the simplest models for protein-folding studies
because they exhibit much of the characteristic behaviour typically seen in the
folding dynamics of proteins. In recent years, the folding dynamics and
energetics of a variety of small fast-folding systems similar to chignolin,
such as CLN025 [4], Trp-cage [1] and villin [1], have been well studied
experimentally and theoretically. These studies report that folding occurs on a
microsecond timescale, which makes it accessible both to fast time-resolved
experiments and to computer simulations that use the latest high-performance
computers to follow the entire folding process. Thus, a complete picture of the
folding mechanism can be obtained by combining the experimental and theoretical
findings.
Describing the methods
The theoretical underpinnings of molecular dynamics (MD) simulations are
simple: mimic the physical movements that the atoms and molecules in the sample
would make in real life. This allows the study of biological and chemical
systems at the atomistic level on very short timescales. MD complements
experiments while also offering a way to follow processes that are difficult to
discern with experimental techniques. MD simulations have increased
dramatically in size, complexity, and simulation timescale over the years,
while the questions being answered with these methods have also diversified. An
MD simulation primarily requires the definition of a potential function, that
is, a description of the terms by which the particles in the simulation are
assumed to interact. This is referred to as a force field, as the force on each
particle is the negative derivative of this potential with respect to position.
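To make the link between a potential function and the resulting motion
concrete, the following is a minimal sketch that is not taken from either
paper. It assumes a single particle in a one-dimensional harmonic potential (a
hypothetical toy system) and uses the velocity Verlet integrator, which is a
common choice in MD codes. All function names and parameter values are my own.

```python
import numpy as np


def harmonic_potential(x, k=1.0, x0=0.0):
    """Toy potential U(x) = 0.5 * k * (x - x0)^2 (hypothetical, for illustration)."""
    return 0.5 * k * (x - x0) ** 2


def force(x, k=1.0, x0=0.0):
    """Force is the negative derivative of the potential: F = -dU/dx."""
    return -k * (x - x0)


def velocity_verlet(x, v, mass, dt, n_steps):
    """Propagate one particle and return its positions and final velocity."""
    traj = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # first half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        traj.append(x)
    return np.array(traj), v


if __name__ == "__main__":
    traj, v_end = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
    e_start = harmonic_potential(1.0)
    e_end = harmonic_potential(traj[-1]) + 0.5 * v_end ** 2
    print(f"energy drift after 1000 steps: {abs(e_end - e_start):.2e}")
```

The total energy should stay very nearly constant over the run, which is a
quick sanity check that the forces and the integrator agree with the potential.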
The MD simulation performed in the first paper was an all-atom classical MD
simulation of chignolin using the Assisted Model Building with Energy
Refinement (AMBER) 12 simulation package. In this type of simulation, the
system has periodic boundary conditions, meaning that molecules that exit one
side of the simulation box re-enter from the opposite side. The specific AMBER
force field used is known as ff99SB [1], which is parameterised for proteins.
In general, the functional form of AMBER force fields is as follows [5]:
$$V = \sum_{\text{bonds}} K_r (r - r_{eq})^2 + \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2 + \sum_{\text{dihedrals}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right] + \sum_{i<j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\epsilon R_{ij}} \right] \tag{1}$$
where the first term describes the energy between covalently bonded atoms. The
second term describes the bond-angle energy, which arises from the geometry of
the electron orbitals involved in that type of bonding. The third term
describes the energy for twisting a bond between the atoms. The final term
describes the energy of all the non-bonded atom pairs: first the van der Waals
energies, then the electrostatic energies.
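As an illustration of the four kinds of terms described above, the sketch below
evaluates each contribution for a single bond, angle, dihedral and atom pair.
It assumes the commonly quoted AMBER-style functional forms (harmonic bonds and
angles, a cosine torsion term, and a 12-6 Lennard-Jones plus Coulomb non-bonded
term). All parameter names and values are hypothetical and are not ff99SB
parameters.

```python
import math


def bond_energy(r, k_r, r_eq):
    """Harmonic bond stretching: K_r * (r - r_eq)^2."""
    return k_r * (r - r_eq) ** 2


def angle_energy(theta, k_theta, theta_eq):
    """Harmonic angle bending: K_theta * (theta - theta_eq)^2 (angles in radians)."""
    return k_theta * (theta - theta_eq) ** 2


def dihedral_energy(phi, v_n, n, gamma):
    """Torsion term: (V_n / 2) * (1 + cos(n * phi - gamma))."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def nonbonded_energy(r_ij, a_ij, b_ij, q_i, q_j, eps=1.0):
    """Van der Waals (12-6 Lennard-Jones) plus electrostatic (Coulomb) energy."""
    lennard_jones = a_ij / r_ij ** 12 - b_ij / r_ij ** 6
    coulomb = q_i * q_j / (eps * r_ij)
    return lennard_jones + coulomb


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise each term once.
    total = (
        bond_energy(1.10, k_r=300.0, r_eq=1.09)
        + angle_energy(1.95, k_theta=50.0, theta_eq=1.91)
        + dihedral_energy(math.pi / 3, v_n=2.0, n=3, gamma=0.0)
        + nonbonded_energy(3.5, a_ij=1.0e6, b_ij=6.0e2, q_i=0.3, q_j=-0.3)
    )
    print(f"toy total energy: {total:.4f}")
```

In a real force field these contributions are summed over every bond, angle,
dihedral and non-bonded pair in the system.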
The theoretical underpinnings of the method used in the second paper are not as
simple. In this paper, a newly proposed enhanced conformational sampling
method, the TaBoo SeArch algorithm (TBSA), was applied to chignolin. General
conformational sampling is a computationally intensive task and is normally
defined as the process of modelling the target conformational space by
compiling a set of representative geometries [6]. In the case of protein
folding, conformational sampling is performed over the space of possible side
chain and backbone geometries until the sampling converges on the target
configuration, in other words the lowest-energy minimum.
TBSA is an iterative method built on a set of moves that the side chain and/or
backbone atoms can make with respect to their neighbours. Each move can
correspond to a change in bond length, bond angle or dihedral angle, and all
moves aim to reach the final configuration, a fully folded chignolin protein.
If a move does not lead towards the final product (folded chignolin), it is
known as a tabu move and is recorded on the tabu list, a list of forbidden
moves.
This list can, however, be updated as the simulation goes on. When a tabu move
has a sufficiently attractive evaluation, such that it would result in a
solution better than any visited so far, its tabu classification is removed.
This condition is known as the aspiration criterion. The iterative process is
repeated until the desired final configuration is reached.
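The tabu list and aspiration criterion described above can be sketched with a
generic tabu search on a toy one-dimensional energy landscape. This is not the
TBSA implementation from the paper: the landscape, the neighbour function and
the tabu list length are all hypothetical, and the tabu list here stores
recently visited states rather than molecular moves.

```python
from collections import deque

# Hypothetical rugged landscape: local minimum at index 2, global minimum at index 9.
LANDSCAPE = [5, 4, 3, 4, 5, 6, 4, 2, 1, 0, 1, 2, 3]


def energy(state):
    """Energy of a discrete state (an index into the toy landscape)."""
    return LANDSCAPE[state]


def neighbours(state):
    """States reachable in one move: one step left or right, inside the landscape."""
    return [s for s in (state - 1, state + 1) if 0 <= s < len(LANDSCAPE)]


def tabu_search(energy, start, neighbours, n_iter=100, tabu_size=5):
    """Return (best_state, best_energy) found by a basic tabu search."""
    current = start
    best, best_e = start, energy(start)
    tabu = deque([start], maxlen=tabu_size)  # short-term memory of forbidden states
    for _ in range(n_iter):
        candidates = []
        for cand in neighbours(current):
            e = energy(cand)
            # Aspiration criterion: a tabu state is allowed if it beats the best so far.
            if cand not in tabu or e < best_e:
                candidates.append((e, cand))
        if not candidates:
            break  # every neighbour is forbidden, so stop
        e, current = min(candidates)  # take the best allowed move, even if uphill
        tabu.append(current)
        if e < best_e:
            best, best_e = current, e
    return best, best_e


if __name__ == "__main__":
    print(tabu_search(energy, start=0, neighbours=neighbours))
```

Because recently visited states are forbidden, the search is pushed uphill out
of the local minimum at index 2 instead of stepping back, and it goes on to
find the global minimum at index 9.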
In the paper, the starting point for the iterative conformational sampling
process was an inverse energy histogram. This was used to select the most
rarely occurring states, or “seeds”, for the conformational resampling. Seeds
that lead to states with high frequencies were inhibited, while states with low
frequencies were efficiently sampled to explore the unvisited conformational
space. Each seed used for the conformational resampling was assigned an initial
velocity, and an MD simulation was then started from it. If the trajectory
started from that particular seed did not converge, the inverse histogram was
updated and a new seed was selected to start the process again. This is
visualised in Figure 1.
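The seed selection described above can be sketched as follows. This is a
minimal illustration under my own assumptions, not the paper's code: the
inverse histogram is taken to be the maximum bin count minus the bin count, and
seeds are drawn at random with probabilities proportional to it. The function
names and the binning are hypothetical.

```python
import numpy as np


def inverse_histogram_weights(energies, n_bins=10):
    """Per-snapshot selection probabilities that favour rarely visited energy bins."""
    energies = np.asarray(energies, dtype=float)
    edges = np.histogram_bin_edges(energies, bins=n_bins)
    # Bin index of every snapshot; clip so the maximum energy falls in the last bin.
    bin_idx = np.clip(np.digitize(energies, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(bin_idx, minlength=n_bins)
    inverse = counts.max() - counts  # assumed inverse histogram: H_max - H(E)
    # Share each bin's inverse weight equally among the snapshots in that bin.
    weights = inverse[bin_idx] / counts[bin_idx]
    total = weights.sum()
    if total == 0:  # flat histogram: fall back to uniform selection
        return np.full(len(energies), 1.0 / len(energies))
    return weights / total


def select_seeds(energies, n_seeds, rng, n_bins=10):
    """Draw snapshot indices to use as seeds, without replacement."""
    p = inverse_histogram_weights(energies, n_bins)
    return rng.choice(len(p), size=n_seeds, replace=False, p=p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical energies: a heavily sampled basin plus a few rare states.
    energies = np.concatenate([rng.normal(0.0, 0.1, 900), rng.normal(5.0, 0.1, 10)])
    print(select_seeds(energies, n_seeds=5, rng=rng))
```

Snapshots in the most populated bin receive zero weight in this sketch, so the
number of seeds requested must not exceed the number of snapshots outside that
bin.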
As mentioned before, chignolin was simulated using the AMBER 12 software
package with the ff99SB force field. The simulations used what is known as the
“SPFP precision model”. The protocol is otherwise similar to most MD
simulations: an energy minimization is performed to ‘relax’ the system and
remove steric clashes or distorted geometry, and equilibration is then run to
equilibrate the solvent around the protein. This is followed by obtaining RMSD
and radius of gyration values to ensure the system is stable. The difference
with the SPFP precision model is that it replaces double-precision arithmetic
with fixed-point integer arithmetic for the accumulation of force components.
This significantly boosts performance on modern GPU hardware without
sacrificing numerical accuracy [7] and therefore saves computational time.
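The benefit of fixed-point accumulation can be illustrated with a small sketch.
It is not the AMBER implementation: the scale factor of 2^40 and the function
names are assumptions chosen for illustration, and Python's unbounded integers
stand in for the 64-bit integers a GPU would use. Because integer addition is
exact, the accumulated sum does not depend on the order in which the
contributions arrive, which is not guaranteed for floating-point sums.

```python
import math
import random

SCALE = 1 << 40  # assumed fixed-point scale: 40 fractional bits


def to_fixed(x):
    """Convert a float to a fixed-point integer (rounded to the nearest unit)."""
    return int(round(x * SCALE))


def accumulate_fixed(values):
    """Sum values as fixed-point integers, then convert back to a float."""
    total = 0
    for v in values:
        total += to_fixed(v)  # exact integer addition, independent of order
    return total / SCALE


def accumulate_float(values):
    """Naive floating-point accumulation, whose rounding depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    forces = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]
    shuffled = forces[:]
    rng.shuffle(shuffled)
    print("fixed-point sums identical:",
          accumulate_fixed(forces) == accumulate_fixed(shuffled))
    print("float sums identical:      ",
          accumulate_float(forces) == accumulate_float(shuffled))
    print("fixed-point error vs fsum: ",
          abs(accumulate_fixed(forces) - math.fsum(forces)))
```

Order independence matters on a GPU because many threads add their force
contributions to the same atom in no fixed order.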
Another difference between this paper’s MD simulation and a general one is that
it uses accelerated molecular dynamics (aMD). This type of MD simulation
enhances sampling by adding a non-negative boost potential to the system when
the system potential is lower than a pre-determined reference energy. The boost
potential is given by
$$\Delta V(r) = \frac{\left( E - V(r) \right)^2}{\alpha + E - V(r)}, \quad V(r) < E \tag{2}$$
where $V(r)$ is the original potential, $E$ is the reference energy and
$\alpha$ is the acceleration factor.
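The behaviour of the boost potential can be checked numerically. The sketch
below assumes the commonly used aMD expression, ΔV = (E − V)² / (α + E − V) for
V < E and zero otherwise. The function and argument names are my own.

```python
def boost_potential(v, e_ref, alpha):
    """aMD boost for potential v, reference energy e_ref and acceleration factor alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if v >= e_ref:
        return 0.0  # no boost at or above the reference energy
    diff = e_ref - v
    return diff * diff / (alpha + diff)


if __name__ == "__main__":
    e_ref, alpha = 10.0, 10.0  # hypothetical values in arbitrary energy units
    for v in (-20.0, 0.0, 5.0, 9.0, 12.0):
        dv = boost_potential(v, e_ref, alpha)
        print(f"V = {v:6.1f}  boost = {dv:7.3f}  modified V = {v + dv:7.3f}")
```

Deep minima receive the largest boost, yet the modified potential always stays
below the reference energy, so basins are raised without becoming barriers.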
The root-mean-square deviation (RMSD) values and radius of gyration were
calculated using VMD for the Cα atoms of residues Tyr2-Trp9 of chignolin. The
results obtained through the aMD simulations were cross-checked against the
Protein Data Bank (PDB), a database of 3D structures of biological molecules
[8], to ensure that the values obtained were correct. The authors performed
three independent 300 ns aMD simulations of chignolin and compared the observed
folded protein with the structure from the PDB. Their main findings are
presented in Figure 2.
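For reference, both quantities are straightforward to compute from atomic
coordinates. The sketch below is not the VMD implementation: it assumes NumPy
arrays of Cα coordinates, equal atomic masses for the radius of gyration, and
an optimal superposition by the Kabsch algorithm before the RMSD is taken. The
function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)  # remove translation by centring both sets
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c           # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rot.T       # rotate p onto q
    return float(np.sqrt(np.mean(np.sum((p_rot - q_c) ** 2, axis=1))))


def radius_of_gyration(coords):
    """Root-mean-square distance of the atoms from their centroid (equal masses)."""
    coords = np.asarray(coords, dtype=float)
    centred = coords - coords.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(centred ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(8, 3))  # stand-in for eight C-alpha atoms
    frame = reference + rng.normal(scale=0.1, size=(8, 3))
    print(f"RMSD = {kabsch_rmsd(frame, reference):.3f}")
    print(f"Rg   = {radius_of_gyration(frame):.3f}")
```

A low, steady RMSD to the reference structure together with a compact, steady
radius of gyration is what indicates a stably folded state.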
In the second paper, TBSA was a newly proposed method, and the paper aimed to
show whether it is a viable enhanced sampling method. As a reference for the
final folded state (native structure), the authors took the 18 model structures
deposited from previous nuclear magnetic resonance (NMR) experiments [9]. These
show that the protein forms a stable β-hairpin structure in aqueous solution,
which makes chignolin suitable for examining the ability of TBSA. With this
established, the actual simulations began.
As the initial structure for the folding of chignolin (the structure from which
the original histogram was generated), an extended structure was modelled from
the amino acid sequence using the LEaP module of the AMBER 11 software [10].
This process is not explained here, as it is a standard MD setup using a
slightly different version of the AMBER software. The TBSA conformational
sampling could then proceed; however, to remove the dependence on the initial
structure, the authors performed several trials of TBSA. They performed 10
trials, and in each one different initial velocities were assigned to the
initial structure from the Maxwell-Boltzmann distribution when starting
short (100 ps) MD simulations under NVT conditions, where T was set to 200 K. Each
trial had 10 repeated cycles, and in each cycle the seeds were randomly
selected from the accumulated snapshots belonging to each energy bin, E. The
number of seeds taken from each bin was defined in terms of the original
histogram, the value of its highest peak, and a normalization factor for
adjusting the total number of seeds.
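Assigning different initial velocities from the Maxwell-Boltzmann distribution
can be sketched as below. This is an illustration under stated assumptions
rather than the AMBER procedure: each Cartesian velocity component is drawn
from a normal distribution with standard deviation sqrt(kB T / m), masses are
in atomic mass units, and the Boltzmann constant is in kcal/(mol K), so the
velocities come out in units of sqrt(kcal/mol/amu). A different random seed per
trial gives a different set of velocities.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def maxwell_boltzmann_velocities(masses, temperature, rng):
    """Draw an (N, 3) array of velocities for atoms with the given masses."""
    masses = np.asarray(masses, dtype=float)
    sigma = np.sqrt(K_B * temperature / masses)  # per-atom standard deviation
    return rng.normal(size=(len(masses), 3)) * sigma[:, None]


def kinetic_temperature(masses, velocities):
    """Instantaneous temperature from the kinetic energy, T = 2 KE / (3 N kB)."""
    masses = np.asarray(masses, dtype=float)
    kinetic = 0.5 * np.sum(masses[:, None] * velocities ** 2)
    return 2.0 * kinetic / (3.0 * len(masses) * K_B)


if __name__ == "__main__":
    masses = np.full(5000, 12.0)  # hypothetical system of carbon-mass atoms
    for trial in range(3):
        rng = np.random.default_rng(trial)  # a different seed for each trial
        v = maxwell_boltzmann_velocities(masses, temperature=200.0, rng=rng)
        print(f"trial {trial}: T = {kinetic_temperature(masses, v):.1f} K")
```

Each trial starts from the same structure but with different velocities, so
the trajectories diverge and the results depend less on the starting point.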
Estimated free energy landscapes were created by selecting 200 random snapshots
from the trajectories generated by the TBSA trials. These snapshots were used
as references to create histograms of all-atom RMSDs (excluding the flexible N-
and C-terminal residues). To check whether the sampling had converged to the
stable native configuration, the sampled structures were compared with the
structure obtained in the previous NMR experiments.
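One common way to turn a histogram of a coordinate such as RMSD into a free
energy profile is Boltzmann inversion, F = −kB T ln P. The sketch below assumes
this approach and equally weighted snapshots. Whether the paper applies any
additional reweighting is not covered here, and the function name, temperature
and bin count are hypothetical.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def free_energy_profile(values, temperature, n_bins=20):
    """Return bin centres and free energies (kcal/mol) relative to the minimum."""
    counts, edges = np.histogram(values, bins=n_bins)
    prob = counts / counts.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        free = -K_B * temperature * np.log(prob)  # empty bins become +inf
    free -= free[np.isfinite(free)].min()         # shift the lowest bin to zero
    return centers, free


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical RMSD samples: a dominant native-like basin and a minor one.
    rmsd = np.concatenate([rng.normal(1.0, 0.2, 4000), rng.normal(4.0, 0.5, 1000)])
    centers, free = free_energy_profile(rmsd, temperature=300.0)
    for c, f in zip(centers, free):
        print(f"{c:5.2f}  {f:6.2f}")
```

The most populated bin defines the zero of free energy, and less populated
regions appear at higher free energy.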
The main advantage of using aMD simulations over a general MD simulation is the
significant speed-up, thanks to the added boost potential shown in equation 2.
As shown in Figure 2, their simulation appears to have been a success, with the
structure they obtained (blue) matching the PDB structure (red) quite well. The
time evolution graph further supports their findings: it is predominantly
orange and yellow, corresponding to ‘β-sheet’ and ‘turn’, which is
characteristic of β-hairpin formation.
In the first paper, they did not study chignolin exclusively but also three
other proteins: Trp-cage, villin and WW domain. They obtained similar results
for those proteins using the same simulation method, in that they got close to
the desired final protein structure. The fact that this simulation method also
worked for other proteins shows that aMD simulations are useful for other
protein-folding studies, not just for chignolin.
As seen from Figure 3, the brown and green structures very nearly overlap,
confirming that TBSA sampled the desired native structure with high precision.
A solid advantage of the conformational sampling method used in the second
paper (TBSA) is that it is versatile enough to be used effectively in many
types of MD simulation, including coarse-grained, implicit and explicit
all-atom models. By comparison, a similar enhanced sampling method designed by
the same authors, Multiscale Free Energy Landscape (MSFEL), was only useful in
coarse-grained models and required very large amounts of computational time for
all-atom models. TBSA efficiently reduces the computational time required for
the calculation of the free energy landscapes [2].
Both aMD simulations and the TBSA sampling method reduce computational time,
but in different ways. A combination of the two could therefore be
complementary: aMD would enhance the conformational sampling of the biomolecule
through its boost potential, and TBSA would reduce the computational time
required for the calculation of the free energy landscapes. Together they would
greatly reduce the overall computational time, saving time and effort and
allowing chignolin and other protein structures to be understood more quickly.
[1] Miao, Y., Feixas, F., Eun, C. and McCammon, J. (2016). Accelerated Molecular
Dynamics Simulations of Protein Folding. Journal of Computational Chemistry,
37(6).
[2] Harada, R., Takano, Y. and Shigeta, Y. (2015). Enhanced conformational
sampling method for proteins based on the TaBoo SeArch algorithm: Application
to the folding of a mini-protein, chignolin. Journal of Computational
Chemistry, 36(10).
[3] Suenaga, A., Narumi, T., Futatsugi, N., Yanai, R., Ohno, Y., Okimoto, N. and
Taiji, M. (2007). Folding Dynamics of 10-Residue β-Hairpin Peptide Chignolin.
Chemistry – An Asian Journal, 2(5), pp.591-598.
[4] Honda, S., Akiba, T., Kato, Y., Sawada, Y., Sekijima, M., Ishimura, M.,
Ooishi, A., Watanabe, H., Odahara, T. and Harata, K. (2008). Crystal Structure
of a Ten-Amino Acid Protein. Journal of the American Chemical Society, 130(46).
[5] Cornell, W., Cieplak, P., Bayly, C., Gould, I., Merz, K., Ferguson, D.,
Spellmeyer, D., Fox, T., Caldwell, J. and Kollman, P. (1995). A Second
Generation Force Field for the Simulation of Proteins, Nucleic Acids, and
Organic Molecules. Journal of the American Chemical Society, 117(19),
pp.5179-5197.
[6] S. (2014). Conformational sampling in protein structure prediction. Ph.D.
University
[7] Le Grand, S., Götz, A. and Walker, R. (2013). SPFP: Speed without
compromise—A mixed precision model for GPU accelerated molecular dynamics
simulations. Computer Physics Communications, 184(2), pp.374-380.
[8] (2018). Protein Data Bank. [online] Available at:
https://en.wikipedia.org/wiki/Protein_Data_Bank [Accessed 12 Jan. 2018].
[9] Honda, S., Yamasaki, K., Sawada, Y. and Morii, H. (2004). 10 Residue Folded
Peptide Designed by Segment Statistics. Structure, 12(8), pp.1507-1518.
[10] D. A. Case, T. A. Darden, T. E. Cheatham, III, C. L. Simmerling, J. Wang,
R. E. Duke, R. Luo, R. C. Walker, W. Zhang, K. M. Merz, B. Roberts, B. Wang, S.
Hayik, A. Roitberg, G. Seabra, I. Kolossvary, K. F. Wong, F. Paesani, J.
Vanicek, J. Liu, X. Wu, S. R. Brozell, T. Steinbrecher, H. Gohlke, Q. Cai, X.
Ye, J. Wang, M.-J. Hsieh, G. Cui, D. R. Roe, D. H. Mathews, M. G. Seetin, C.
Sagui, V. Babin, T. Luchko, S. Gusarov, A. Kovalenko, P. A. Kollman, AMBER 11,
University of California, San Francisco, 2010.