The 15th BBVA Foundation Frontiers of Knowledge Award in Biology and Biomedicine has gone to David Baker, Demis Hassabis and John Jumper “for their contributions to the use of artificial intelligence for the accurate prediction of the three-dimensional structure of proteins.”
From the BBVA Foundation:
Baker – a Professor of Biochemistry at the University of Washington and a Howard Hughes Medical Institute Investigator – developed the RoseTTAFold program, while Hassabis and Jumper – CEO and senior research scientist respectively at AI company DeepMind – are the creators of AlphaFold2. “Both computing methods,” the committee explains, “rely on a sophisticated machine-learning technique known as deep learning to predict the shape of proteins with unprecedented accuracy, similar to that of experimentally-determined structures, and with exceptional speed.”
“This breakthrough,” it concludes, “is revolutionizing our understanding of how the amino acid sequence of proteins leads to uniquely ordered three-dimensional structures. Scientists are now using these new methods to predict protein conformations, design entirely new proteins and identify novel drug targets.”
“Until now,” said committee secretary Óscar Marín, “it took years of arduous lab work to predict the structure of even a single protein, but with the advances achieved by the three awardees we now need just a few minutes on the computer.” For Marín, Director of the Medical Research Council Centre for Neurodevelopmental Disorders at King’s College London, thanks to the work done by Baker, Hassabis and Jumper “we are going to make far faster progress in future in developing treatments for multiple diseases.”
A technological “shortcut” to predict the structure of proteins
The DNA of our cells contains all the instructions we need to develop, survive and reproduce. But proteins are the workhorses that keep all these functions going, and it is their three-dimensional structure that determines their exact mission.
To know the specific role a protein fulfils, it is not enough to know the DNA sequence encoding it, or even to identify the amino acid sequence into which the genetic information is translated. The key to understanding how a protein will act lies in the arrangement in space it adopts through folding, but deciphering this in the lab is a slow and rather scattergun process. And predicting its function from its chemical composition is likewise a complex and uncertain task.
“Scientists always assumed that it was just too hard to understand how proteins fold. To try and deduce it from the underlying physical principles, you need a vast quantity of computing resources to even guess at their most stable form,” explained Dario Alessi, a committee member and head of the MRC Protein Phosphorylation and Ubiquitylation Unit at Dundee University (United Kingdom), shortly after the decision was reached. “But the awardees have come up with an AI-driven shortcut using a deep-learning technique.”
“I believe AlphaFold represents really the first powerful example of how deep learning is able to capture the complexity of biological systems and really develop mathematical understandings of extraordinarily complex things,” declared Jumper in an interview granted after hearing of the award. “It is very, very difficult to handle the extraordinary complexity that you see in a living cell, but I think with this technology we can really capture that complexity.”
“AlphaFold has already made a huge impact on biological research in quite a short space of time,” adds fellow laureate Demis Hassabis. “We know that over a million researchers have used the structures predicted by AlphaFold in their research, and pretty much every pharma company in the world has been using AlphaFold in their drug discovery programs.”
“De novo” proteins to block viruses and cancer cells
As well as predicting how naturally-occurring proteins will fold, the RoseTTAFold program led by David Baker has also proved able to design completely new proteins based on a simple description of their target functions. The program can thus obtain proteins to block not only flu virus or COVID-19 proteins, but also cancer cells, and its results have been successfully tested in the lab.
“New proteins can be improved medicines, so there are many new and exciting medical applications, for example, creating new vaccines or new cancer treating medications,” Baker explains. Some decades back, this American biochemist and computational biologist began exploring ways to deduce the structure of proteins guided by the principles of physics, and wrote his findings into an algorithm known by the name Rosetta. The new method performed fairly well with small proteins but demanded large computational resources and expert knowledge to get it working properly.
In parallel, Demis Hassabis and John Jumper decided to use artificial intelligence to solve the problem in a quicker, more accessible way. Jumper led a team using available deep-learning tools and vast quantities of data on the sequences and structures of known proteins, and set to work training the neural network.
This first iteration, which they called AlphaFold, was launched in 2018. “We had the best system in the world at the time,” says Jumper, “but it was still far, far off from what we knew was the kind of accuracy needed to be really experimentally relevant.”
They accordingly set to work to design a better system. Starting from scratch, they decided to take all the knowledge they possessed on how proteins fold and feed it into the neural network. So as well as the information provided by known proteins, the network also had some knowledge about the folding mechanism built into its design.
“This enabled the network to learn dramatically more efficiently from the existing data,” Jumper affirms. In December 2020 they entered the new tool, AlphaFold2, in an international challenge where it would have to prove itself against competing systems. Its resounding success went far beyond the researchers’ expectations. AlphaFold2 achieved in a few short days what would have taken years of work in the lab.
When announcing AlphaFold2, Jumper had outlined some of its underlying concepts, and Baker was quick to take note. “We started having meetings every week in my group,” he recalls, “and we started to systematically go through different ideas and experimenting, and that ultimately led to RoseTTAFold.”
The product was launched a few months later. Its level of accuracy was comparable to that of AlphaFold2, and it came with an added functionality. Not only could it reliably predict a protein’s structure from its amino acid sequence in hours or even minutes, it could also run the process in reverse, determining the amino acid sequence that corresponds to a protein of a given shape.
Open source tools for the biomedical research community
Nowadays both RoseTTAFold and AlphaFold2 are freely available to the scientific community, and recent upgrades have practically equalized the computing times required by each.
Although these AI tools have not entirely supplanted experimental methods, they have made a strong appearance at their side, revolutionizing the whole of biology. So much so that Dario Alessi describes them as “the first real demonstration of how artificial intelligence will transform the field.”
He recalls that his own laboratory had spent three years unraveling the structure of the PPM1H protein through experimental techniques when AlphaFold came along. “We had the structure and were just about to publish it when AlphaFold appeared. Out of curiosity we compared the structures and they were totally identical, not a single significant difference in 547 amino acids,” he relates, still astounded at the program accomplishing in minutes what had taken years of work.
Thanks to these tools, almost all documented proteins – not only human but those of animals, plants and even bacteria – have yielded up their structural secrets. And this knowledge will find immediate application in the creation of new drugs and vaccines.
“We have already seen AlphaFold being applied to a huge range of problems,” says Hassabis. “Some of the things we’re most excited about it being used for are drug discovery, for example, to combat antibiotic resistance, or to try and find cures for diseases like malaria.”
Jumper, in fact, has collaborated with a University of Oxford research group working on a malaria vaccine. Most vaccines contain fragments of the protein of the infectious agent, but to decide which fragment is best, you need to know the structure of the candidate protein. The Oxford team, says Jumper, “were unsure about the structure of the protein they needed, and this was stopping them from figuring out the right construct. They used AlphaFold to predict the structure, so were able to understand which fragments might work and how to make a vaccine from them.”
Computational biologist Gonzalo Jiménez-Osés, Principal Investigator at CIC bioGUNE in Bilbao and one of the nominators of the new laureates, explains one of the most promising facets of this contribution in the biomedicine area: “Among AlphaFold’s successes has been to integrate the vast amount of genetic and structural information contributed by scientists over the decades to open access databanks into an advanced neural network together with a sophisticated machine-learning algorithm, and one immediate byproduct will be in new drug design. In classic drug development, we will certainly discover novel therapeutic targets, but, more important still, we will rapidly arrive at a more precise understanding of the network of protein interactions occurring in diseases such as cancer and immune system disorders, and this will lead to new treatments, because computer simulations of these complex processes will be far more reliable.”
The revolution in purpose-designed proteins for more sophisticated medications
For the moment, the biggest impact for new vaccine and drug creation lies in the design of proteins à la carte. The latest RoseTTAFold version even allows us to create proteins from simple descriptions. “It’s like DALL-E but for proteins,” Baker explains, referring to the AI system where users can generate images from simple text prompts. “So for example, you can tell RoseTTAFold: design a protein which blocks this flu virus protein, or design a protein which will block these cancer cells. RoseTTAFold will then make those proteins. We’ve made them in the lab, and we find that they have exactly those functions.”
An anti-coronavirus vaccine created with RoseTTAFold is now being used in South Korea. And new purpose-designed anti-cancer medicines are being tested in human clinical trials. There are even plans to develop a nasal spray that protects against COVID and other respiratory viruses.
“We believe that almost all of medicine will be transformed by the protein design revolution,” says Baker. “Most medicines today are made by making small modifications to the proteins which already exist in nature. Now that we can design completely new proteins, we can develop much more improved, more sophisticated medicines that, for example, can treat cancer without the side effects, be made very quickly upon the outbreak of a new pandemic, and in general will be more precise and more robust.”
David Baker, Demis Hassabis and John Jumper were nominated by two institutions: on behalf of the Spanish Society for Biochemistry and Molecular Biology (SEBBM) by its President Isabel Varela Nieto; and on behalf of CIC bioGUNE (Center for Cooperative Research in Biosciences) by José M. Mato, its General Director; Jesús Jiménez Barbero, Scientific Director; Gonzalo Jiménez-Osés, Principal Investigator in the Computational Chemistry Lab; and Óscar Millet, Principal Investigator in the Precision Medicine and Metabolism Lab.
Laureate bio notes
David Baker (Seattle, Washington, United States, 1962), with a PhD in Biochemistry from the University of California, Berkeley, is currently the Director of the Institute for Protein Design, a Howard Hughes Medical Institute Investigator, the Henrietta and Aubrey Davis Endowed Professor in Biochemistry, and an adjunct professor of genome sciences, bioengineering, chemical engineering, computer science, and physics at the University of Washington. Author of more than 570 research papers – with over 142,000 citations and an h-index of 201 – he holds more than 100 patents, has co-founded 11 firms and is the Director of Rosetta Commons, a consortium of labs and researchers that develop biomolecular structure prediction and design software.
Demis Hassabis (London, United Kingdom, 1976) was a chess master at 13 and completed his secondary school studies with top grades two years early, prompting the University of Cambridge to ask him to wait a year before enrolling. During this gap year, at age 17, he was lead programmer on the Theme Park video game, which sold 10 million copies. After graduating from Cambridge in computer science, he founded games developer Elixir Studios, which he sold in 2005, going on to earn a PhD in Cognitive Science at University College London (UCL) in 2009. He then pursued postdoctoral research in artificial intelligence at MIT, Harvard and back at UCL. In 2010 he co-founded DeepMind, staying on there as CEO after it was acquired by Google in 2014. He is also founder and CEO of Isomorphic Labs. Author of over 120 published papers, he has over 98,000 citations and an h-index of 78 on Google Scholar.
John Jumper (Little Rock, Arkansas, United States, 1985) completed a science degree at Vanderbilt University, then started on a doctorate in Theoretical Condensed Matter Physics at the University of Cambridge. Realizing the discipline was not for him, he left with a master’s degree and took up a post at the firm D. E. Shaw Research, working on the computer simulation of proteins. Three years later, in 2011, he began doctoral studies in theoretical chemistry at the University of Chicago, applying machine-learning techniques to the study of protein dynamics. In October 2017, ten months after earning his PhD, he joined DeepMind where he is currently Senior Staff Research Scientist. Included by Time magazine in its “100 Next” list for 2021, he has published some 50 papers with over 16,000 citations and an h-index of 20 on Google Scholar.
Biology and Biomedicine committee and evaluation support panel
The jury in this category was chaired by Angelika Schnieke, Chair of Animal Biotechnology, Emerita at the Technical University of Munich (Germany). The secretary was Óscar Marín, Professor of Neuroscience and Director of the MRC Centre for Neurodevelopmental Disorders at King’s College London (United Kingdom). Remaining members were Dario Alessi, Director of the MRC Protein Phosphorylation and Ubiquitylation Unit at Dundee University (United Kingdom); Lélia Delamarre, Director and Distinguished Scientist in the Department of Cancer Immunology at Genentech (United States); Robin Lovell-Badge, Senior Group Leader and Head of the Laboratory of Stem Cell Biology and Developmental Genetics at the Francis Crick Institute (United Kingdom); Ursula Ravens, Guest Scientist in the Institute of Experimental Cardiovascular Medicine of the University of Freiburg (Germany); Ali Shilatifard, Robert Francis Furchgott Professor of Biochemistry and Pediatrics at Northwestern University Feinberg School of Medicine (United States); and Bruce Whitelaw, Director of the Roslin Institute and Professor of Animal Biotechnology in the Royal (Dick) School of Veterinary Studies (RDSVS) of the University of Edinburgh (United Kingdom).
The evaluation support panel was coordinated by José M. Mato, General Director of CIC bioGUNE and CIC biomaGUNE, and formed by Edurne Berra, CIC bioGUNE Associate Principal Investigator in the Hypoxia Area; Jerónimo Bravo Sicilia, tenured researcher and Director of the Institute of Biomedicine of Valencia (IBV, CSIC); Arkaitz Carracedo, CIC bioGUNE Principal Investigator in the Cancer Area; Óscar Millet, CIC bioGUNE Principal Investigator in the Precision Medicine and Metabolism Area; Liset M. de la Prida, research scientist in the Cajal Institute (IC, CSIC); James D. Sutherland, CIC bioGUNE Associate Principal Investigator in the Developmental Biology Area; and Isabel Varela Nieto, research professor at the Alberto Sols Biomedical Research Institute (IIBM, CSIC-UAM).
About the BBVA Foundation Frontiers of Knowledge Awards
The BBVA Foundation centers its activity on the promotion of world-class scientific research and cultural creation, and the recognition of talent.
The BBVA Foundation Frontiers of Knowledge Awards, funded with 400,000 euros in each of their eight categories, recognize and reward contributions of singular impact in science, technology, the humanities and music, privileging those that significantly enlarge the stock of knowledge in a discipline, open up new fields, or build bridges between disciplinary areas. The goal of the awards, established in 2008, is to celebrate and promote the value of knowledge as a public good without frontiers, the best instrument at our command to take on the great global challenges of our time and expand the worldviews of individuals for the benefit of all humanity. Their eight categories address the knowledge map of the 21st century, from basic knowledge to fields devoted to understanding and interrelating the natural environment by way of closely connected domains such as biology and medicine or economics, information technologies, social sciences and the humanities, and the universal art of music.
The BBVA Foundation has been aided in the evaluation of nominees for the Frontiers Award in Biology and Biomedicine by the Spanish National Research Council (CSIC), the country’s premier public research organization. CSIC appoints members to the evaluation support panels made up of leading experts in the corresponding knowledge area, who are charged with undertaking an initial assessment of the candidates proposed by numerous institutions across the world, and drawing up a reasoned shortlist for the consideration of the award committees. CSIC is also responsible for designating each committee’s chair and participates in the selection of its members, thus helping to ensure objectivity in the recognition of innovation and scientific excellence.