- An AI language model has created proteins that work as well as those refined over millions of years of evolution.
- Salesforce’s ProGen has designed sequences based on the “sentences” of biological proteins.
- Scientists are investigating whether AI could identify a cure for disorders like rheumatoid arthritis and multiple sclerosis.
Artificial intelligence is a master of imitation. Whenever scientists design an AI, whether it’s to mimic human language or master a game like chess, it either matches or far exceeds the capabilities of its biological creators. Now AI has proven that it can even master the art of biology itself.
Researchers from the University of California, San Francisco, the University of California, Berkeley, and Salesforce Research, a science arm of the SF-based software company, have developed an AI that can copy evolution itself. This doesn’t mean that the AI created some kind of evolutionarily superior superhuman (yet). Instead, the AI designed sequences of the 20 amino acids that make up proteins. Compared to the handiwork of nature, some of the sequences worked as well as those generated over millions of years of evolution. The researchers published their findings in the journal Nature Biotechnology.
Interestingly, the scientists didn’t design an AI from scratch, but rather repurposed one from an unlikely domain: a language model. The researchers took the natural language processing approach behind Salesforce’s ProGen and applied it to biological protein “sentences,” which are essentially written in a language of amino acids.
“In the same way that words are strung together one by one to form text sentences, amino acids are strung together one by one to form proteins,” Nikhil Naik, director of AI research at Salesforce Research, told Motherboard. “Building on this insight, we apply neural language modeling to proteins to generate realistic yet novel protein sequences.”
ProGen was trained on 280 million proteins and, according to the paper, was “iteratively optimized by learning to predict the probability of the next amino acid given previous amino acids in a raw sequence.” The team ultimately zeroed in on five specific artificial proteins and compared them to hen egg white lysozyme, an enzyme found in chicken eggs. Two of the AI-generated proteins compared favorably.
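To make the "predict the next amino acid" idea concrete, here is a minimal sketch in Python. It is not ProGen and not the researchers' method: it swaps the neural language model for a simple count-based bigram model, which only looks at the single previous amino acid. The training sequences are made up for illustration, and the function names `train_bigram` and `log_likelihood` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_bigram(sequences, alpha=1.0):
    """Estimate P(next amino acid | previous amino acid) from counts.

    Add-alpha smoothing keeps unseen pairs from getting zero probability.
    This is a toy stand-in for a neural language model (an assumption).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    probs = {}
    for prev in AMINO_ACIDS:
        total = sum(counts[prev].values()) + alpha * len(AMINO_ACIDS)
        probs[prev] = {
            aa: (counts[prev][aa] + alpha) / total for aa in AMINO_ACIDS
        }
    return probs


def log_likelihood(probs, seq):
    """Sum of log P(next | previous) over every adjacent pair in seq."""
    return sum(math.log(probs[prev][nxt]) for prev, nxt in zip(seq, seq[1:]))


# Made-up toy "proteins", far shorter than real ones.
toy_sequences = ["MKVLAA", "MKVLGA", "MKILAA"]
model = train_bigram(toy_sequences)

# A sequence resembling the training data should score higher
# than one the model has never seen anything like.
familiar = log_likelihood(model, "MKVLAA")
unfamiliar = log_likelihood(model, "WWWWWW")
print(familiar > unfamiliar)
```

A real protein language model conditions on the whole preceding sequence rather than one residue, but the training signal is the same kind of quantity: the probability assigned to each next amino acid.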
Overall, Salesforce estimates that 73% of ProGen’s proteins might work, compared to 59% of natural proteins, and found that the AI was also able to detect evolutionary patterns (although it wasn’t specifically designed for this). AI has engineered proteins before, but this is the first time an AI language model has pulled off the feat.
But the team doesn’t just want to answer the question of whether AI language models can design proteins. Because proteins are the basis of many diseases, the Salesforce AI Research team is already investigating how ProGen could identify treatments for disorders such as rheumatoid arthritis and multiple sclerosis.
So while some AIs are trained to beat humans at their own game (literally), language models like ProGen could one day one-up evolution itself and help humans combat some of the world’s most debilitating health issues.
Darren lives in Portland, has a cat, and writes/edits about science fiction and how our world works. You can find his previous stuff at Gizmodo and Paste if you look hard enough.