A diffusion model for protein design

A team led by Baker Lab scientists Joseph Watson, David Juergens, Nate Bennett, Brian Trippe, and Jason Yim has created a powerful new way to design proteins by combining structure prediction networks and generative diffusion models. The team demonstrated extremely high success rates in computational tests and then tested hundreds of A.I.-generated proteins in the lab, finding that many may be useful as medications, vaccines, or even new nanomaterials. This research is available as a preprint on bioRxiv titled “Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models.”

The software tool DALL-E uses a diffusion model to produce high-quality images that have never existed before. A diffusion model is a machine-learning algorithm that specializes in adding and removing noise. Diffusion models for image generation begin with random static and gradually remove noise until a clear picture forms. Additional software guides this denoising process so that the final image matches what was asked for.
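To make the add-and-remove-noise idea concrete, here is a toy sketch in Python. It is not RF diffusion and not how DALL-E works internally; it is a minimal illustration under stated assumptions. The "data" are two 2-D points rather than images or protein structures. The step count `T`, the linear noise schedule `alpha_bar`, and the functions `add_noise`, `denoise_estimate`, and `sample` are all hypothetical names chosen for this example. The trained neural network a real model would use is replaced by an exact formula, which is only possible because the data set is tiny. The reverse loop uses a deterministic DDIM-style update, and the optional `condition` argument is a crude stand-in for guidance: it simply restricts which data points the denoiser may aim for.

```python
import math
import random

T = 50  # number of noise levels; hypothetical choice for this toy


def alpha_bar(t: int) -> float:
    """Fraction of the clean signal kept after t noising steps.

    Hypothetical linear schedule: exactly 1.0 at t=0, a small positive floor at t=T.
    """
    return 1.0 - (1.0 - 1e-4) * t / T


def add_noise(x0, t, rng):
    """Forward process: blend a clean point with Gaussian static."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


def denoise_estimate(xt, t, data):
    """Best guess of the clean point given a noisy one (posterior mean E[x0 | xt]).

    Computed exactly because `data` is a short list of points; a real model
    would use a trained neural network here instead.
    """
    a = alpha_bar(t)
    # Log-likelihood of the noisy point under each candidate clean point.
    logw = [
        -sum((x - math.sqrt(a) * c) ** 2 for x, c in zip(xt, p)) / (2.0 * (1.0 - a))
        for p in data
    ]
    top = max(logw)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    return [sum(wi * p[d] for wi, p in zip(w, data)) / total for d in range(len(xt))]


def sample(data, rng, condition=None):
    """Reverse process: start from pure static and remove noise step by step.

    `condition` (optional, hypothetical) keeps only data points that satisfy a
    requirement, standing in for guiding the output toward what was asked for.
    """
    if condition is not None:
        data = [p for p in data if condition(p)]
    xt = [rng.gauss(0.0, 1.0) for _ in data[0]]  # pure static
    for t in range(T, 0, -1):
        a_t, a_prev = alpha_bar(t), alpha_bar(t - 1)
        x0_hat = denoise_estimate(xt, t, data)
        # Noise implied by the current guess of the clean point.
        eps_hat = [(x - math.sqrt(a_t) * c) / math.sqrt(1.0 - a_t) for x, c in zip(xt, x0_hat)]
        # Deterministic DDIM-style step to the next, less noisy level.
        xt = [math.sqrt(a_prev) * c + math.sqrt(1.0 - a_prev) * e for c, e in zip(x0_hat, eps_hat)]
    return xt


if __name__ == "__main__":
    data = [(2.0, 2.0), (-2.0, -2.0)]
    print(sample(data, random.Random(0)))
    print(sample(data, random.Random(0), condition=lambda p: p[0] > 0))
```

Run without a condition, `sample` lands on one of the two data points depending on the starting static; with `condition=lambda p: p[0] > 0` it is steered to `(2.0, 2.0)`. In a real diffusion model the point of training is to learn `denoise_estimate` for data far too large to enumerate.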

The team developed a guided diffusion model for generating new proteins. With prior design methods, tens of thousands of molecules may have to be tested before finding a single one that performs as intended. Using the new design method, dubbed RF diffusion, the team had to test as few as one design per design challenge. RF diffusion outperforms existing protein design methods across a broad range of problems. Highlights include a picomolar binder generated through pure computation and a series of novel symmetric assemblies experimentally confirmed by electron microscopy.

“These works reveal just how powerful diffusion models can be for protein design,” says Watson. “It’s extremely exciting,” added Juergens, “and it’s really just the beginning.”

RF diffusion can generate novel proteins that bind to molecular targets.
RF diffusion can be configured to produce symmetric or asymmetric oligomers.