Duke researcher helps develop large language model to predict antibody structures, support disease prevention

A research team led by a Duke professor recently developed a novel computational model to predict antibody structures, a significant breakthrough for disease prevention efforts.

A study published in December by Rohit Singh, assistant professor of biostatistics and bioinformatics, and his team proposes a new computational model called antibody mutagenesis-augmented processing, or “AbMAP,” that can predict antibody structures and binding strengths based on amino acid sequences.

“Antibodies are incredibly important proteins … from a basic science perspective, as we try to understand our immune system,” Singh said, noting that understanding their structure is essential in developing therapeutics and disease prevention methods.

This research breakthrough may allow scientists to analyze millions of potential antibodies to find the key few that could be used to treat the viruses that cause COVID-19 and other infectious diseases.

Previous researchers have developed protein language models, a type of artificial intelligence model that predicts a protein’s structure from its amino acid sequence. However, these models are less successful at predicting antibody structures because, unlike most other proteins, antibodies are extremely variable.

“The body makes billions of different antibodies by mixing together a few genes in some key regions,” essentially generating a large number of possible antibodies until one works, Singh said.

Antibodies are Y-shaped in structure, with their hypervariable regions — the “key regions” that Singh mentioned — at the tips of the Y, where they bind to pathogens. While this property is extremely useful to the human body in protecting against a wide range of pathogens, it makes antibody structures incredibly difficult to predict. Singh’s research aims to address this challenge.

To design the model, the researchers built modules on top of existing protein language models. One was trained on hypervariable sequences from 3,000 antibody structures to learn which sequences generate specific structures. Another was trained on the binding strengths between various antibodies and three different antigens.

Their discovery was made through a general strategy known as “transfer learning.”

“You can take a model that is designed for one task, and then you figure out how to tweak it just enough so you can adapt it for another task,” Singh said. “The insight you have is that the two tasks are similar, but not identical and so the challenge you try to address is, which parts of the model should you change? How should you adapt it?”
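The core idea Singh describes — reusing a model trained for one task and adapting only part of it for another — can be sketched in a few lines. The example below is a generic illustration of transfer learning, not the AbMAP code itself: a stand-in “pretrained” feature extractor is kept frozen while a small new head is fit for a downstream task.

```python
# Minimal, illustrative transfer-learning sketch (not the AbMAP implementation).
# A frozen "pretrained" embedding plays the role of an existing protein
# language model; only a new linear head is fit for the downstream task.
import numpy as np

rng = np.random.default_rng(0)

# Pretend this fixed projection is a pretrained model's learned embedding.
W_pretrained = rng.normal(size=(16, 8))

def embed(x):
    """Frozen 'pretrained' embedding: inputs (n, 16) -> features (n, 8)."""
    return np.tanh(x @ W_pretrained)

# Hypothetical downstream task data: inputs and a toy regression target.
X = rng.normal(size=(100, 16))
y = X.sum(axis=1)

# Adapt only the new head: fit a linear layer on top of the frozen features.
features = embed(X)
head, *_ = np.linalg.lstsq(features, y, rcond=None)

predictions = features @ head
mse = np.mean((predictions - y) ** 2)
print(f"training MSE of the adapted head: {mse:.3f}")
```

Here only `head` is learned; the pretrained weights are never touched, which mirrors the “tweak it just enough” adaptation Singh describes.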

To adapt the model to antibodies, the team utilized a strategy called contrastive learning. They gave the model the hypervariable region and a random sequence, asking the model to identify and analyze the hypervariable region. Then, the researchers scored how well the model could generate a representation for hypervariable regions of known antibody structures, testing its accuracy and training it to improve.

“By carefully feeding the model contrastive examples and comparing them and saying, ‘okay, finesse that so that you start capturing what is happening in structure,’ we are able to get the model to capture the hypervariable regions in a transfer learning setting,” Singh said.
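The contrastive setup the team describes — pairing a true hypervariable region against a random sequence and rewarding the model for telling them apart — follows a standard pattern. The sketch below shows a generic InfoNCE-style contrastive loss on made-up embeddings; the function name, dimensions, and data are all hypothetical, not taken from the study.

```python
# Illustrative contrastive objective (a generic InfoNCE-style loss, not the
# actual AbMAP training code). The loss is low when an anchor embedding
# scores its true match higher than mismatched ("decoy") embeddings.
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive loss: -log softmax of the anchor-positive similarity."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    scores = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    scores -= scores.max()  # numerical stability before exponentiation
    return -np.log(np.exp(scores[0]) / np.exp(scores).sum())

rng = np.random.default_rng(1)
anchor = rng.normal(size=32)
aligned = anchor + 0.05 * rng.normal(size=32)     # embedding of the true pair
decoys = [rng.normal(size=32) for _ in range(5)]  # random-sequence embeddings

# Training to minimize this loss pushes true pairs together and decoys apart:
good = info_nce(anchor, aligned, decoys)
bad = info_nce(anchor, decoys[0], [aligned] + decoys[1:])
print(f"loss with true pair: {good:.3f}  vs shuffled pair: {bad:.3f}")
```

Minimizing such a loss during training is what “finesses” the model toward representations that separate hypervariable regions from random sequences.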

AbMAP has the power to revolutionize antibody therapeutics. The new technology has numerous applications for disease prevention work, including investigating immune responses to specific pathogens and understanding the binding of antibodies and antigens.

For experimental screening studies in particular, the model assists in understanding the structure of antibodies’ hypervariable regions, how tightly they bind and how well they neutralize antigens. Determining these structural and functional aspects of antigen and antibody binding can allow researchers to pinpoint more powerful options for disease-fighting therapies.

“If you have … tried maybe one or two point mutations, then [you] can use them and learn from them and propose more extensive mutations that might give you even more strength,” Singh said.

Regarding his team’s future plans, Singh noted that they are “definitely interested in making better antibodies.” He added that there is also “a very robust immunology and vaccine design effort between the Duke Human Vaccine Institute and others,” pointing to a recent $5 million grant the University received from the National Institutes of Health to “study immune responses.”

“There are folks who say, ‘let me just take a standard language model … [and try to] make it work better on antibodies,’ and that is great. There are other folks who say, ‘the antibodies are so different that we need to make something very custom for that,’ and more power to them as well,” Singh said. “The way we have approached it is that, ‘can we have the best of both worlds?’”


Srilakshmi Venkatesan

Srilakshmi Venkatesan is a Trinity first-year and a staff reporter for the news department.
