After AlphaFold: A Protein Folding Competition Seeks Next Big Breakthrough
“In a sense, the problem is solved,” computational biologist John Moult declared in late 2020. London-based DeepMind had just swept a biennial competition co-founded by Moult that tests teams’ abilities to predict protein structures, one of biology’s grand challenges, with its revolutionary AI tool AlphaFold.
Two years later, Moult’s competition, the Critical Assessment of Structure Prediction (CASP), is still running in AlphaFold’s long shadow. Results for this year’s edition (CASP15), revealed over the weekend at a conference in Antalya, Turkey, show that the most successful methods for predicting protein structures from their amino acid sequences involved AlphaFold, which relies on an AI approach called deep learning. “Everyone uses AlphaFold,” says Yang Zhang, a computational biologist at the University of Michigan in Ann Arbor.
However, AlphaFold’s progress has opened the door to new challenges in protein structure prediction, some of which are included in this year’s CASP, that may require new approaches and more time to fully address. “The low-hanging fruit has been picked,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City. “Some of the next problems will be more difficult.”
The CASP program began in 1994 with the goal of improving the accuracy of protein structure prediction, an advance that would accelerate efforts to understand the building blocks of cells and speed up drug discovery. In each competition year, teams are tasked with using computational tools to predict the structures of proteins that have been determined using experimental methods such as X-ray crystallography and cryo-electron microscopy, but have not yet been released.
Entries are evaluated on how well predictions for entire proteins, or independently folding subunits called domains, agree with experimental structures. Some of AlphaFold’s predictions in CASP14 were more or less indistinguishable from experimental models – the first time such accuracy had been achieved.
Since its unveiling in CASP14, AlphaFold has become ubiquitous in life sciences research. DeepMind released the underlying code for the program in 2021 so that anyone could run it, and the AlphaFold database, updated this year, contains predicted structures of varying quality for proteins from nearly all the organisms represented in genome databases, more than 200 million proteins in total.
AlphaFold’s success and newfound ubiquity presented a challenge for Moult, who works at the University of Maryland in Rockville, and his colleagues as they planned this year’s CASP. “People say, ‘Oh, we don’t need CASP anymore, problem solved.’ I think this is exactly the wrong way,” Moult says.
In CASP15, the most successful teams were those that adapted and built on AlphaFold in various ways, resulting in modest gains in predicting the shape of individual proteins and domains. “The accuracy is already so high that it’s hard to get much better,” Moult says.
To keep the competition relevant in a post-AlphaFold world, Moult and his team have added new challenges and modified some of the existing ones. New tests include determining how proteins interact with other molecules such as drugs, and predicting the multiple forms some proteins can take. Over the past decade, CASPs have included complexes of multiple interacting proteins, Moult says, but accurate prediction of the structure of these complexes has received additional focus this year.
“It’s the right thing to do,” Zhang says, because prediction of the structures of individual proteins, or domains — the bread and butter of previous CASPs — has been largely solved by AlphaFold. Determining the shape of protein complexes, in particular, presents an important new challenge for the field, because there is much room for improvement, says Arne Elofsson, a protein bioinformatics scientist at Stockholm University.
AlphaFold was initially designed to predict the shape of individual proteins. But within days of its public release, other scientists showed that the software could be “hacked” to model how multiple proteins interact. In the months that followed, researchers came up with myriad ways to improve AlphaFold’s ability to handle complexes. DeepMind also released an update called AlphaFold-Multimer with this goal in mind.
These efforts seem to have paid off: CASP15 saw a marked increase in the accuracy of predicted protein assemblies compared with previous competitions, mainly thanks to methods adapted from AlphaFold. “It’s a new game for us to be close to experimental precision with complexes,” Moult says. “We’ve had some failures, too.”
For example, teams made astonishingly accurate predictions for a viral molecule of unknown function made up of two intertwined proteins. This type of shape confounded pre-AlphaFold tools, says Ezgi Karaca, a computational structural biologist at the Izmir Biomedicine and Genome Center in Turkey, who evaluated the protein-complex predictions. Karaca adds that the standard version of AlphaFold failed to accurately model the shape of a giant 20-chain bacterial enzyme, but some teams predicted the protein’s structure by applying additional hacks to the network.
Meanwhile, teams have struggled to predict complexes involving immune molecules called antibodies, including several bound to a SARS-CoV-2 protein, and related molecules called nanobodies. But there have been glimpses of success in some teams’ predictions, Karaca says, suggesting that further modifications to AlphaFold could be useful for predicting the shape of these medically important molecules.
This year’s CASP was also notable for the absence of DeepMind. The company didn’t say why it wasn’t participating, but it did release a short statement during CASP15 congratulating the teams that took part. (At the same time, it rolled out an update to AlphaFold to help researchers measure their progress against the network.)
Other researchers say that the competition represents a significant time commitment, which the company may have felt was better spent on other challenges. “It would have been nice if they had participated,” Moult says. But, he adds, “because the methods are so good, they won’t be able to make another big leap.”
The researchers say that making significant improvements to AlphaFold will take time, and may require new innovations in machine learning and protein structure prediction. One area under development is the application of “language models”, such as those used in predictive text tools, to predict protein structures. But these methods, including one developed by the social networking giant Meta, didn’t perform nearly as well in CASP15 as the AlphaFold-based tools did.
However, such tools may be useful for predicting how mutations alter protein structure, one of several major challenges in protein structure prediction that have emerged as a result of AlphaFold’s success. Thanks to this, the field is no longer focused on a single goal, says AlQuraishi. “There are a large number of these problems.”