This week, AlphaFold, an artificial intelligence (AI) program developed by Google’s DeepMind, solved a decades-old problem in biology: determining a protein’s 3D structure based only on its amino acid sequence.
The results were announced at the 14th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP14), where AlphaFold beat 100 other participating teams.
“Building on the work of hundreds of researchers across the globe, an AI program called AlphaFold, created by London-based AI lab DeepMind, has proved capable of determining the shape of many proteins. It has done so to a level of accuracy comparable to that achieved with expensive and time-consuming lab experiments,” wrote the organizers in a statement.
The protein folding problem
Proteins are the building blocks of life, working as intricate machines that control every process within our cells and bodies, from warding off infection, as antibodies do, to regulating blood sugar. Their precise function is determined by their unique 3D structures, which assemble spontaneously and are held together by attractive and repulsive forces dictated by their linear amino acid sequence.
“Even tiny rearrangements of these vital molecules can have catastrophic effects on our health, so one of the most efficient ways to understand disease and find new treatments is to study the proteins involved,” said Dr. John Moult, a computational biologist at the University of Maryland, who co-founded CASP in 1994.
Ever since Christian Anfinsen was awarded the Nobel Prize in 1972 for showing that it should be possible to determine the shape of a protein from its sequence of amino acids, scientists have been searching for an efficient way to map a linear string of amino acids onto the intricate loops, folds, and pleats of a protein’s final functional form.
While research in recent years has brought us ever closer, the current gold-standard techniques for solving protein structures, such as nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography, can be difficult, expensive, and time-consuming. Structures have been solved for only a small percentage of the 200 million known proteins, and with a growing number of new proteins added to the database every year, our current methods will not allow us to keep up.
“There are tens of thousands of human proteins and many billions in other species, including bacteria and viruses, but working out the shape of just one requires expensive equipment and can take years,” said Moult.