Sampling Algorithm Development: Given an accurate force field, the design problem becomes finding the lowest-energy sequence for a given challenge. Since 20 amino acids are possible at each position, this may require searching through 20 x 20 x 20 x … = 20^Nres sequences for a new designed protein with Nres residues. Because the optimal structure for a given challenge is generally not known in advance, alternative backbone conformations must be searched as well. This is also a challenge: even with the conservative estimate of three states per residue, there are ~3^Nres conformations for a protein with Nres residues. IPD researchers are developing algorithms for efficiently searching these vast sequence and structure spaces to find very low-energy solutions to the specified design challenge.
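
To give a sense of scale for the counts above, the sketch below computes the size of the sequence space (20^Nres) and the conformation space (3^Nres) on a log10 scale, and then runs a generic Metropolis Monte Carlo search over sequences. It illustrates the general idea of stochastic search only and is not the IPD's algorithm. The function names, the toy_energy scoring function (mismatches against a hidden target sequence, standing in for a real force field), and the temperature and step count are all assumptions made for this example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def log10_search_space(n_res, states_per_position):
    """Return log10 of states_per_position ** n_res (the search-space size)."""
    return n_res * math.log10(states_per_position)


def toy_energy(sequence, target):
    """Hypothetical stand-in for a force field: count mismatches to a target."""
    return sum(a != b for a, b in zip(sequence, target))


def metropolis_search(target, n_steps=20000, temperature=0.5, seed=0):
    """Generic Metropolis Monte Carlo over sequences (illustrative only).

    Returns the lowest-energy sequence seen and its toy energy.
    """
    rng = random.Random(seed)
    n_res = len(target)
    current = [rng.choice(AMINO_ACIDS) for _ in range(n_res)]
    current_e = toy_energy(current, target)
    best, best_e = list(current), current_e

    for _ in range(n_steps):
        # Propose a single-residue substitution at a random position.
        pos = rng.randrange(n_res)
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)
        new_e = toy_energy(current, target)
        delta = new_e - current_e

        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_e = new_e
            if current_e < best_e:
                best, best_e = list(current), current_e
        else:
            current[pos] = old  # reject: restore the previous residue

    return "".join(best), best_e


if __name__ == "__main__":
    n_res = 100
    print(f"sequences:     ~10^{log10_search_space(n_res, 20):.1f}")
    print(f"conformations: ~10^{log10_search_space(n_res, 3):.1f}")

    sequence, energy = metropolis_search("MKTAYIAKQRQISFVKSHFS")
    print(sequence, energy)
```

For a 100-residue protein, the sequence space is about 10^130 and the three-state conformation space is about 10^48, so exhaustive enumeration is out of the question. The Metropolis loop evaluates only n_steps candidates, which is why stochastic or otherwise guided search is the practical route through spaces of this size.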

Sampling Methods

Novel methods for conformational sampling and refinement in protein structure prediction are a major research focus at the IPD. Our researchers have developed tools that improve both the sampling methods and the potential functions used in structure prediction. These tools include a novel way of aggregating information from multiple evolutionarily related homologues through explicit template recombination (1). Furthermore, I have shown that including backbone bond angle flexibility in structure refinement yields better protein structure discrimination than conventional methods (2). Both approaches were successfully employed in the 10th Critical Assessment of Structure Prediction (CASP) competition (3), where the combined approach was one of the top automated methods worldwide.


1. Y. Song*, F. DiMaio*, R. Y.-R. Wang, D. Kim, C. Miles, T.J. Brunette, J. Thompson and D. Baker (2013). High resolution comparative modeling with RosettaCM. Structure. 21:1735-42.

2. P. Conway*, M. Tyka*, F. DiMaio*, D. Konerding and D. Baker (2013). Relaxation of backbone bond geometry improves protein energy landscape modeling. Protein Sci. 23:47-55.

3. D. Kim*, F. DiMaio*, R. Y.-R. Wang, Y. Song, D. Baker (2013). One contact for every twelve residues is sufficient for accurate topology-level protein structure modeling. Proteins. 82:208-18.