Model, design, and analyze protein structures
The Rosetta software suite includes algorithms for computational modeling and analysis of protein structures. Rosetta has enabled notable scientific advances in computational biology, including de novo protein design, enzyme design, ligand docking, and structure prediction of biological macromolecules and macromolecular complexes.
Rosetta began in the laboratory of Dr. David Baker at the University of Washington as a structure prediction tool but has since been adapted to solve common problems in computational macromolecular modeling. Development of Rosetta now happens among the members of RosettaCommons, which include government laboratories, institutes, research centers, and partner corporations.
Rosetta is available to all non-commercial users for free and to commercial users for a fee. Visit rosettacommons.org to get started.
Predict protein structures from amino acid sequences
RoseTTAFold is a software tool that uses deep learning to quickly and accurately predict protein structures based on amino acid sequences alone. Without the aid of such software, it can take years of laboratory work to determine the structure of just one protein. With RoseTTAFold, a protein structure can be computed in as little as ten minutes on a single gaming computer.
RoseTTAFold is a three-track neural network, meaning it simultaneously considers patterns in protein sequences, how a protein’s amino acids interact with one another, and a protein’s possible three-dimensional structure. In this architecture, one-, two-, and three-dimensional information flows back and forth, allowing the network to collectively reason about the relationship between a protein’s chemical parts and its folded structure.
As reported in Science, our team has used RoseTTAFold to compute hundreds of new protein structures, including many poorly understood proteins from the human genome. We also generated structures directly relevant to human health, including for proteins associated with problematic lipid metabolism, inflammation disorders, and cancer cell growth. And we have shown that RoseTTAFold can be used to build models of complex biological assemblies in a fraction of the time previously required.
A generative model for protein design
RoseTTAFold Diffusion (RF Diffusion) is a guided diffusion model for generating new proteins. It outperforms existing protein design methods across a broad range of problems. It has been used to generate ultra-high-affinity binders through pure computation, as well as a series of novel symmetric assemblies that we have experimentally validated via electron microscopy.
With prior design methods, tens of thousands of protein designs may have to be tested before finding a single one that performs as intended. Using RF Diffusion, we find that as few as one design needs to be tested.
Image generation tools like DALL-E use a diffusion model, a machine-learning algorithm that specializes in adding and removing noise, to produce high-quality images that have never existed before. Diffusion models begin with grainy bits of static and gradually remove noise until a clear picture is formed. Additional software guides this de-noising process. RF Diffusion uses this strategy to turn clouds of disconnected atoms into coherent protein structures.
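The de-noising idea can be illustrated with a toy sketch. This is not RF Diffusion's algorithm or code: the helix target, the `toy_denoiser` function (which simply returns the known answer in place of a trained network), and the noise schedule are all assumptions made for illustration. It shows only the general shape of the process: start from a random cloud of points, repeatedly ask a denoiser for its guess of the clean structure, step toward that guess, and add less noise each time.

```python
import math
import random


def ideal_helix(n_points):
    """Hypothetical target: points along a simple helix, standing in for a backbone trace."""
    return [(math.cos(i), math.sin(i), 0.5 * i) for i in range(n_points)]


def toy_denoiser(noisy_points, target):
    """Stand-in for a trained network.

    A real model would predict the clean structure from the noisy input alone;
    this toy version ignores the input and returns the known target.
    """
    return target


def rmsd(a, b):
    """Root-mean-square distance between two equal-length lists of 3D points."""
    total = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(total / len(a))


def reverse_diffusion(n_points=20, n_steps=50, seed=0):
    """Turn a random point cloud into the target shape by gradual de-noising."""
    rng = random.Random(seed)
    target = ideal_helix(n_points)

    # Begin with pure noise: a cloud of disconnected points.
    points = [tuple(rng.gauss(0.0, 5.0) for _ in range(3)) for _ in range(n_points)]
    start_error = rmsd(points, target)

    for step in range(n_steps):
        guess = toy_denoiser(points, target)
        blend = 1.0 / (n_steps - step)             # fraction of the way to move toward the guess
        noise_scale = 1.0 - (step + 1) / n_steps   # injected noise shrinks to zero at the last step
        points = [
            tuple(
                x + blend * (g - x) + rng.gauss(0.0, 0.1 * noise_scale)
                for x, g in zip(p, q)
            )
            for p, q in zip(points, guess)
        ]

    return start_error, rmsd(points, target)


if __name__ == "__main__":
    before, after = reverse_diffusion()
    print(f"distance from target before: {before:.3f}, after: {after:.6f}")
```

In a real diffusion model the denoiser is learned from data and never sees the answer, and guidance terms steer each step toward a desired property, such as binding a chosen target.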
Rapid sequence design
ProteinMPNN is a powerful tool for protein sequence design. It takes a protein structure as input and quickly identifies new amino acid sequences that are likely to fold into that backbone. Combined with structure prediction tools like RoseTTAFold or AlphaFold, it can be used to iteratively create stable proteins with novel structures and sequences.
Computers are smart, but they sometimes miss important things. The same goes for researchers. This is where Foldit comes in: everyday people playing Foldit can help discover better protein designs through their unique creativity and ingenuity.