Protein folding is one of the most fundamental and fascinating biological processes. Understanding the detailed mechanism by which a protein folds from its primary sequence into its native three-dimensional structure remains a long-standing problem. A number of small proteins with 10 to 100 amino acid residues fold on microsecond to sub-millisecond timescales. These are known as “fast-folding” proteins, and they serve as excellent model systems for studying protein folding.
In this review, I discuss two papers that study one of these “fast-folding” proteins, chignolin, in different ways. The first studies chignolin folding through molecular dynamics simulations [1], a standard approach, while the second studies it through an enhanced sampling method [2].

Describing Chignolin

Proteins, which are essential to all living organisms, are linear polymers composed of amino acids. Amino acid polymers may also be referred to as polypeptides, although scientists do not customarily use the terms interchangeably. The term “protein” generally refers to naturally occurring molecules with a particular sequence and a defined three-dimensional (3D) structure, whereas “polypeptide” can refer to any polymer of amino acids, regardless of length, sequence, and structure.
“Peptide” is generally reserved for a short oligomer that often lacks a stable conformation. The protein considered in this review is chignolin, a synthetically designed protein of only 10 amino acids: Gly-Tyr-Asp-Pro-Glu-Thr-Gly-Thr-Trp-Gly.
Chignolin is of interest for two reasons: it is the smallest β-hairpin protein known to be stable in solution [3], and it plays a major role in investigating the mechanisms of protein folding. Its simple structure, two β-strands joined in a hairpin, makes it an ideal model system for studying the characteristic behaviour of folding processes. It also provides information about the early events of protein folding, much as the cosmic background radiation provides information about the early events of the universe. The β-hairpin structure gives chignolin a further advantage in simulation: hairpins are the simplest models for protein-folding studies, because they exhibit much of the characteristic behaviour seen in the folding dynamics of larger proteins.
In recent years, the folding dynamics and energetics of a variety of fast-folding systems comparable to chignolin, such as CLN025 [4], Trp-cage [1] and villin [1], have been well studied experimentally and theoretically. These studies report that folding occurs on a microsecond timescale, which makes it accessible both to fast time-resolved experiments and to computer simulations that use the latest high-performance computers to follow the entire folding process. A complete picture of the folding mechanism can therefore be obtained by combining experimental and theoretical findings.

Describing the methods

The idea behind molecular dynamics (MD) simulation is simple: mimic the physical movements that the atoms and molecules in the sample would make in real life. This allows biological and chemical systems to be studied at the atomistic level on very short timescales. MD complements experiments while also offering a way to follow processes that are difficult to discern with experimental techniques. MD simulations have grown dramatically in size, complexity, and simulated timescale over the years, and the questions they are used to answer have diversified accordingly.
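
As a rough illustration of what mimicking the physical movements of atoms means in practice, the sketch below integrates Newton's equations of motion for a single particle with the velocity Verlet scheme, a commonly used MD integrator. It is a minimal sketch and not how AMBER is implemented: the one-dimensional harmonic potential, the reduced units, and the names harmonic_force, velocity_verlet and total_energy are all assumptions made for illustration.

```python
import numpy as np


def harmonic_force(x, k=1.0, x_eq=0.0):
    """Force from the assumed potential V = 0.5*k*(x - x_eq)**2, i.e. F = -dV/dx."""
    return -k * (x - x_eq)


def velocity_verlet(x0, v0, force, mass=1.0, dt=0.01, n_steps=1000):
    """Propagate one particle in 1D and return arrays of positions and velocities."""
    xs = np.empty(n_steps + 1)
    vs = np.empty(n_steps + 1)
    xs[0], vs[0] = x0, v0
    f = force(x0)
    for i in range(n_steps):
        v_half = vs[i] + 0.5 * dt * f / mass      # half-step velocity update
        xs[i + 1] = xs[i] + dt * v_half           # full-step position update
        f = force(xs[i + 1])                      # force at the new position
        vs[i + 1] = v_half + 0.5 * dt * f / mass  # complete the velocity update
    return xs, vs


def total_energy(x, v, k=1.0, mass=1.0):
    """Kinetic plus potential energy for the harmonic test system (x_eq = 0)."""
    return 0.5 * mass * v**2 + 0.5 * k * x**2


if __name__ == "__main__":
    xs, vs = velocity_verlet(1.0, 0.0, harmonic_force)
    drift = np.max(np.abs(total_energy(xs, vs) - total_energy(xs[0], vs[0])))
    print(f"largest energy deviation over the run: {drift:.2e}")
```

A production MD code applies the same update to every atom in three dimensions, with the force obtained from the force field rather than from a single spring. Monitoring the total energy, as in the usage lines, is a simple check that the time step is small enough.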
An MD simulation first requires a potential function, a description of the terms by which the particles in the simulation are assumed to interact. This is referred to as a force field, because the force on each atom is the negative derivative of the potential with respect to its position. The first paper performed an all-atom classical MD simulation of chignolin using the Assisted Model Building with Energy Refinement (AMBER) 12 simulation package. The system has periodic boundary conditions, meaning that molecules that exit one side of the simulation box re-enter from the opposite side. The specific AMBER force field used is ff99SB [1], which is parameterized for proteins. In general, the functional form of AMBER force fields is as follows [5]:

\[
V(r) = \sum_{\text{bonds}} K_r (r - r_{eq})^2 + \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2 + \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] + \sum_{i<j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\epsilon R_{ij}} \right] \tag{1}
\]

The first term is the energy of stretching covalent bonds. The second term is the energy of bending bond angles, which arises from the geometry of the electron orbitals involved in the bonding. The third term is the energy of twisting about a bond. The final term is the energy of all non-bonded atom pairs: first the van der Waals energy, then the electrostatic energy.

The theory behind the method used in the second paper is less simple. That paper applied a newly proposed enhanced conformational sampling method, the TaBoo SeArch algorithm (TBSA), to chignolin. Conformational sampling in general is a computationally intensive task, normally defined as the process of modelling the target conformational space by compiling a set of representative geometries [6].
In the case of protein folding, conformational sampling is performed over the space of possible side-chain and backbone geometries until it converges on the target configuration, that is, the lowest-energy minimum.

TBSA is an iterative method built on a set of moves, each of which displaces side-chain and/or backbone atoms relative to their neighbours. Each move can correspond to a change in bond length, bond angle or dihedral angle, and all moves aim towards the final configuration, a fully folded chignolin. If a move does not lead towards the folded state, it is called a tabu move and is recorded on the tabu list, a list of forbidden moves. The list can be updated as the simulation proceeds: when a tabu move has a sufficiently attractive evaluation, meaning that it would give a better solution than any move visited so far, its tabu classification is removed. This condition is known as the aspiration criterion. The iterative process is repeated until the desired final configuration is reached.

In the paper, the starting point for the iterative conformational sampling was an inverse energy histogram. This was used to select the most rarely occurring states, the “seeds”, for conformational resampling.
Seeds leading to states with high frequencies were inhibited, while states with low frequencies were sampled preferentially in order to explore the unvisited conformational space. Each seed selected for conformational resampling was assigned an initial velocity, and an MD simulation was then run from it. If the trajectory from a particular seed did not converge, the inverse histogram was updated and a new seed was selected to start the process again. This process is visualised in Figure 1.
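
The seed-selection idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the weighting used here (peak count minus bin count) is an assumed stand-in for the paper's inverse histogram, and the function name select_seeds and its parameters are hypothetical.

```python
import numpy as np


def select_seeds(energies, n_bins=10, n_seeds=20, rng=None):
    """Return snapshot indices chosen so that rarely visited energy bins are favoured."""
    rng = np.random.default_rng() if rng is None else rng
    energies = np.asarray(energies, dtype=float)

    # Original histogram: how many snapshots fall in each energy bin.
    hist, edges = np.histogram(energies, bins=n_bins)
    # Bin index of every snapshot, using the interior bin edges.
    bin_of = np.digitize(energies, edges[1:-1])

    # Assumed inverse histogram: the most populated bin gets weight zero
    # (inhibited) and sparsely populated bins get the largest weights.
    weights = hist.max() - hist
    # Share each bin's weight equally among the snapshots it contains.
    per_snapshot = weights[bin_of] / hist[bin_of]

    total = per_snapshot.sum()
    if total == 0:
        # Flat histogram: no bin is rarer than another, so pick uniformly.
        prob = np.full(energies.size, 1.0 / energies.size)
    else:
        prob = per_snapshot / total

    # Sampling with replacement: a snapshot picked twice would be restarted
    # with different initial velocities.
    return rng.choice(energies.size, size=n_seeds, replace=True, p=prob)


if __name__ == "__main__":
    # 95 snapshots in a well-visited low-energy region, 5 in a rare region.
    demo = np.concatenate([np.linspace(-10.0, -9.0, 95), np.linspace(0.0, 1.0, 5)])
    print(select_seeds(demo, n_seeds=5, rng=np.random.default_rng(0)))
```

In a full TBSA cycle, each selected snapshot would be restarted with new velocities, the resulting short trajectories would be added to the pool of snapshots, and the histogram would be rebuilt before the next round of selection.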

Describing the papers

As mentioned above, the first paper simulated chignolin using the AMBER 12 software package with the ff99SB force field, under the SPFP precision model. The workflow is similar to most MD simulations: an energy minimization is performed first to relax the system and remove steric clashes or distorted geometry, and an equilibration is then run to equilibrate the solvent around the protein. RMSD and radius of gyration values are then monitored to confirm that the system is stable. What distinguishes the SPFP precision model is that it replaces double-precision arithmetic with fixed-point integer arithmetic for the accumulation of force components. This significantly boosts performance on modern GPU hardware without sacrificing numerical accuracy [7], and therefore saves computational time.

The other difference from a conventional MD simulation is that the paper used accelerated molecular dynamics (aMD). aMD enhances sampling by adding a non-negative boost potential to the system whenever the system potential is lower than a pre-determined reference energy. The boost potential is given by:

\[
\Delta V(r) = \frac{\left(E - V(r)\right)^2}{\alpha + E - V(r)}, \qquad V(r) < E \tag{2}
\]

where V(r) is the original potential, E is the reference energy and α is the acceleration factor.

The root-mean-square deviation (RMSD) and radius of gyration were calculated using VMD for the Cα atoms of residues Tyr2 to Trp9 of chignolin. The aMD results were cross-checked against the Protein Data Bank (PDB), a crystallographic database for 3D biological molecules [8], to confirm that the values obtained were correct. The authors performed three independent 300 ns aMD simulations of chignolin and compared the observed folded structure with the one from the PDB. Their main findings are presented in Figure 2.

In the second paper, TBSA was newly proposed, and the paper aimed to show whether it is a viable enhanced sampling method. As a reference, the authors took the 18 model structures of the final folded state (native structure) deposited from previous nuclear magnetic resonance (NMR) experiments [9].
They noted that the protein forms a stable β-hairpin structure in aqueous solution, which makes chignolin suitable for testing the ability of TBSA, and the production simulations began from there.

As the initial structure for the folding of chignolin (the source of the original histogram), an extended structure was modelled from the amino acid sequence using the LEaP module of the AMBER 11 software [10]. This step is not explained further here, as it is a standard MD setup that simply uses a slightly different version of AMBER.
TBSA conformational sampling could then proceed. To remove the dependence on the initial structure, the authors performed 10 trials of TBSA. In each trial, different initial velocities were assigned to the initial structure from the Maxwell-Boltzmann distribution, starting short (100 ps) MD simulations in the NVT ensemble with T set to 200 K. Each trial consisted of 10 repeated cycles. In each cycle, seeds were selected at random from the accumulated snapshots belonging to each energy bin E, with the number of seeds per bin given by the inverse histogram. The inverse histogram is defined in terms of the original histogram, its peak (maximum) value, and a normalization factor that adjusts the total number of seeds.

Estimated free energy landscapes were constructed by selecting 200 random snapshots from the trajectories generated by the TBSA trials. These snapshots were used as references to create histograms of all-atom RMSDs (excluding the flexible N- and C-terminal residues).
To check whether the sampled ensembles converged to the stable native configuration, the histograms were compared against the native structure obtained in the previous NMR experiments.

Discussion

The main advantage of aMD over conventional MD is the significant speed-up, thanks to the added boost potential of equation 2. As shown in Figure 2, the simulation appears to have been a success, with the structure obtained (blue) matching the PDB structure (red) quite well. The time-evolution graph supports this finding: it is predominantly orange and yellow, corresponding to “β-sheet” and “turn”, which is characteristic of β-hairpin formation. The first paper did not study chignolin exclusively but also three other proteins: Trp-cage, villin and the WW domain. Similar results were obtained for those proteins with the same simulation method, in that the simulations came close to the desired final structure. That the method also worked for other proteins shows that aMD simulations are useful for protein-folding studies in general, not just for chignolin.
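
To make the role of the boost potential concrete, the sketch below evaluates it for a few energies. It assumes the standard aMD form, ΔV = (E − V)² / (α + E − V) when V is below the reference energy E and zero otherwise; the function name boost_potential and the numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np


def boost_potential(v, e_ref, alpha):
    """aMD boost: (E - V)^2 / (alpha + E - V) where V < E, and 0 elsewhere.

    v     : original potential energy (scalar or array)
    e_ref : reference energy E
    alpha : acceleration factor, assumed positive
    """
    v = np.asarray(v, dtype=float)
    # Energy gap below the reference; clipped to zero at or above E so that
    # no boost is applied there.
    gap = np.clip(e_ref - v, 0.0, None)
    return gap**2 / (alpha + gap)


if __name__ == "__main__":
    v = np.array([-10.0, -5.0, -1.0, 2.0])   # illustrative energies only
    dv = boost_potential(v, e_ref=0.0, alpha=5.0)
    for original, boosted in zip(v, v + dv):
        print(f"V = {original:6.2f}  ->  V* = {boosted:6.2f}")
```

Deep minima receive the largest boost, so the modified surface is flatter and barrier crossings become more frequent. A smaller acceleration factor flattens the surface further, at the cost of distorting it more.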
As seen in Figure 3, the brown and green structures very nearly overlap, confirming that TBSA sampled the desired native structure with high precision. A solid advantage of TBSA, the conformational sampling method used in the second paper, is that it is versatile enough to be used effectively in many types of MD simulation, including coarse-grained models and implicit- and explicit-solvent all-atom models. The same authors had previously designed a similar enhanced sampling method, Multiscale Free Energy Landscape (MSFEL), but found that it was only useful for coarse-grained models and required very large amounts of computational time for all-atom models. TBSA, by comparison, efficiently reduces the computational time required to calculate free energy landscapes [2].

Both aMD and TBSA reduce computational time, but in different ways, so a combination of the two could be complementary. aMD would enhance the conformational sampling of the biomolecule through its boost potential, while TBSA would reduce the computational time required to calculate the free energy landscape. Together they could reduce the computational cost substantially, saving time and effort and allowing the structure of chignolin and other proteins to be understood more quickly.

References

[1] Miao, Y., Feixas, F., Eun, C. and McCammon, J. (2016). Accelerated Molecular Dynamics Simulations of Protein Folding. Journal of Computational Chemistry, 37(6), p. n/a-n/a.
[2] Harada, R., Takano, Y. and Shigeta, Y. (2015). Enhanced conformational sampling method for proteins based on the TaBoo SeArch algorithm: Application to the folding of a mini-protein, chignolin. Journal of Computational Chemistry, 36(10), pp. 763-772.
[3] Suenaga, A., Narumi, T., Futatsugi, N., Yanai, R., Ohno, Y., Okimoto, N. and Taiji, M. (2007). Folding Dynamics of 10-Residue β-Hairpin Peptide Chignolin. Chemistry – An Asian Journal, 2(5), pp. 591-598.
[4] Honda, S., Akiba, T., Kato, Y., Sawada, Y., Sekijima, M., Ishimura, M., Ooishi, A., Watanabe, H., Odahara, T. and Harata, K. (2008). Crystal Structure of a Ten-Amino Acid Protein. Journal of the American Chemical Society, 130(46), pp. 15327-15331.
[5] Cornell, W., Cieplak, P., Bayly, C., Gould, I., Merz, K., Ferguson, D., Spellmeyer, D., Fox, T., Caldwell, J. and Kollman, P. (1995). A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. Journal of the American Chemical Society, 117(19), pp. 5179-5197.
[6] Subramaniam, S. (2014). Conformational sampling in protein structure prediction. Ph.D. University of Wisconsin–Madison.
[7] Le Grand, S., Götz, A. and Walker, R. (2013). SPFP: Speed without compromise - A mixed precision model for GPU accelerated molecular dynamics simulations. Computer Physics Communications, 184(2), pp. 374-380.
[8] En.wikipedia.org. (2018). Protein Data Bank. [online] Available at: https://en.wikipedia.org/wiki/Protein_Data_Bank [Accessed 12 Jan. 2018].
[9] Honda, S., Yamasaki, K., Sawada, Y. and Morii, H. (2004). 10 Residue Folded Peptide Designed by Segment Statistics. Structure, 12(8), pp. 1507-1518.
[10] D. A. Case, T. A. Darden, T. E. Cheatham, III, C. L. Simmerling, J. Wang, R. E. Duke, R. Luo, R. C. Walker, W. Zhang, K. M. Merz, B. Roberts, B. Wang, S. Hayik, A. Roitberg, G. Seabra, I. Kolossvary, K. F. Wong, F. Paesani, J. Vanicek, J. Liu, X. Wu, S. R. Brozell, T. Steinbrecher, H. Gohlke, Q. Cai, X. Ye, J. Wang, M.-J. Hsieh, G. Cui, D. R. Roe, D. H. Mathews, M. G. Seetin, C. Sagui, V. Babin, T. Luchko, S. Gusarov, A. Kovalenko, P. A. Kollman, AMBER 11, University of California, San Francisco, 2010.