The Matrix of Protein Design

The Matrix movie (1999) depicts a future in which the reality perceived by most humans is actually a computer-simulated reality called “the Matrix”. In a paper published today in Science, the Baker lab and collaborators report on a new kind of Matrix: a new reality for large-scale computational protein design, which can achieve massive data-driven improvements in our ability to design highly stable, small proteins from scratch.

Illustration by Gabe Rocklin


Following the White Rabbit, postdoctoral fellow Dr. Gabe Rocklin led a group of scientists to design and test over 15,000 new mini-proteins (which do not exist in nature) to see whether they form stable folded structures. Even major protein design studies in the past few years have generally examined only 50 to 100 designs. Synthetic DNA technology and high-throughput screening permitted the group to test the structural stability of multitudes of computationally designed proteins at large scale. In turn, this allowed them to perform a “global analysis of protein folding using massively parallel design, synthesis and testing”.

Through iterative improvements in the design process, the group arrived at 2,788 stable mini-protein structures, at least 50-fold more than have ever been characterized from natural sources for similarly sized proteins. Their small size and stability may be advantageous for treating diseases when a drug needs to avoid the immune system and reach the inside of a cell.

The publication's abstract is a step into the Matrix, as Morpheus might explain:

Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Though these forces are “encoded” in the thousands of known protein structures, “decoding” them is challenging due to the complexity of natural proteins that have evolved for function, not stability. Here we combine computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for over 15,000 de novo designed miniproteins, 1,000 natural proteins, 10,000 point-mutants, and 30,000 negative control sequences, identifying over 2,500 new stable designed proteins in four basic folds. This scale — three orders of magnitude greater than that of previous studies of design or folding—enabled us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and experiment, and promises to transform computational protein design into a data-driven science.

The research has been recognized by opinion leaders and media outlets as a major step for computational protein design.  See articles in Science, UW Newsbeat, Science Daily, Chemistry, and GEN.