Just as convincing images of cats can be created using artificial intelligence, new proteins can now be made using similar tools. In a new report in Nature, we describe the development of a neural network that “hallucinates” proteins with new, stable structures.
“For this project, we made up completely random protein sequences and introduced mutations into them until our neural network predicted that they would fold into stable structures,” said co-lead author Ivan Anishchenko, PhD, an acting instructor in the Baker lab at the Institute for Protein Design. “At no point did we guide the software toward a particular outcome — these new proteins are just what a computer dreams up.”
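The loop Anishchenko describes (start from a random sequence, mutate it, keep the changes the predictor favors) can be sketched in a few lines of Python. This is a toy illustration only, not the team's software. The scoring function `toy_score` is a made-up placeholder standing in for the neural network's confidence. The acceptance rule, the `temperature` value, and the names `hallucinate` and `score_fn` are likewise assumptions chosen for the example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_score(seq):
    """Hypothetical stand-in for a network's confidence that a sequence folds.

    NOT a real predictor: it simply rewards a hydrophobic residue at every
    third position and a non-hydrophobic residue elsewhere.
    """
    hydrophobic = set("AILMFVW")
    hits = sum(1 for i, aa in enumerate(seq) if (aa in hydrophobic) == (i % 3 == 0))
    return hits / len(seq)


def hallucinate(length=60, steps=5000, temperature=0.005, score_fn=toy_score, seed=0):
    """Mutate a random sequence, keeping changes that the scorer favors."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]  # fully random start
    score = score_fn(seq)
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a single-residue mutation
        new_score = score_fn(seq)
        # Assumed acceptance rule: always keep improvements, and occasionally
        # keep a worse sequence so the search does not get stuck.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            score = new_score
        else:
            seq[pos] = old  # reject: undo the mutation
    return "".join(seq), score
```

Calling `hallucinate()` returns the final sequence and its score. In a real setting, `score_fn` would be replaced by a call to a structure-prediction model, which is where all of the actual scientific content lives.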
In the future, it should be possible to steer the artificial intelligence so that it generates new proteins with useful features. “We’d like to use deep learning to design proteins with function, including protein-based drugs, enzymes, you name it,” said co-lead author Sam Pellock, a postdoctoral scholar in the Baker lab.
The research team, which included scientists from UW Medicine, Harvard University, and Rensselaer Polytechnic Institute (RPI), generated two thousand new protein sequences that were predicted to fold. Over 100 of these were produced in the laboratory and studied. Detailed analysis of three such proteins confirmed that the shapes predicted by the computer were indeed realized in the lab.
“Our NMR studies, along with X-ray crystal structures determined by the University of Washington team, demonstrate the remarkable accuracy of protein designs created by the hallucination approach,” said co-author Theresa Ramelot, a senior research scientist at RPI in Troy, New York.
Gaetano Montelione, a co-author and professor of chemistry and chemical biology at RPI, notes, “The hallucination approach builds on observations we made together with the Baker lab revealing that protein structure prediction with deep learning can be quite accurate even for a single protein sequence with no natural relatives. The potential to hallucinate brand new proteins that bind particular biomolecules or form desired enzymatic active sites is very exciting.”
“This approach greatly simplifies protein design,” said senior author David Baker. “Before, to create a new protein with a particular shape, people first carefully studied related structures in nature to come up with a set of rules that were then applied in the design process. New sets of rules were needed for each new type of fold. Here, by using a deep-learning network that already captures general principles of protein structure, we eliminate the need for fold-specific rules and open up the possibility of focusing on just the functional parts of a protein directly.”
“Exploring how to best use this strategy for specific applications is now an active area of research, and this is where I expect the next breakthroughs,” said Baker.
Funding was provided by the National Science Foundation (1937533, MCB2032259), National Institutes of Health (DP5OD026389, GM120574, P30GM124165, S10OD021527), Department of Energy (DE-AC02-06CH11357), Open Philanthropy, Eric and Wendy Schmidt by recommendation of the Schmidt Futures program, Audacious Project, Washington Research Foundation, Novo Nordisk Foundation, and Howard Hughes Medical Institute. The authors also acknowledge computing resources from the University of Washington and Rosetta@Home volunteers.