AI model from Google DeepMind reads recipe for life in our DNA

An artificial intelligence model developed by Google’s DeepMind, named AlphaGenome, is poised to revolutionize our comprehension of DNA, the intricate blueprint that governs the construction and operation of the human body. This groundbreaking AI holds the potential to unlock new avenues for disease understanding and the discovery of novel medicines, according to researchers involved in its development. AlphaGenome is designed to assist scientists in deciphering why subtle variations within an individual’s DNA can predispose them to a range of conditions, including high blood pressure, dementia, and obesity. Furthermore, it promises to significantly expedite the scientific community’s grasp of genetic disorders and the complexities of cancer. While the creators acknowledge that the model is not infallible, experts have lauded it as an "incredible feat" and a "major milestone" in the field. Natasha Latysheva, a research engineer at DeepMind, articulated the vision for AlphaGenome, stating, "We see AlphaGenome as a tool for understanding what the functional elements in the genome do, which we hope will accelerate our fundamental understanding of the code of life."

The human genome comprises approximately three billion DNA letters, each standing for one of four chemical bases: adenine (A), cytosine (C), guanine (G), and thymine (T). Only around 2% of this genetic material consists of genes, which provide the instructions for making all the proteins the body needs to grow and function. The remaining 98%, often referred to as the "dark genome," is less well understood but plays a critical role in orchestrating how genes are used throughout the body. It is in this largely uncharted territory that many disease-linked mutations are found. AlphaGenome can analyze one million DNA letters at a time, offering new insight into the dark genome. Beyond identifying where genes are located, the AI can predict the dark genome’s influence on biological processes such as gene expression (whether a gene is highly active or suppressed) and gene splicing, the mechanism by which the body can generate different proteins from a single gene. Crucially, AlphaGenome can predict the impact of altering even a single DNA letter, a capability with significant implications for genetic research.
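To put those figures in proportion, the arithmetic can be sketched in a few lines of Python. This is only an illustration built on the article's rounded numbers; the function name and the idea of splitting the genome into non-overlapping one-million-letter windows are assumptions made for the example, not a description of how AlphaGenome actually processes DNA.

```python
# Rounded figures quoted in the article; real values differ slightly.
GENOME_LETTERS = 3_000_000_000   # approximate number of DNA letters
GENE_PERCENT = 2                 # share of the genome made up of genes
WINDOW_LETTERS = 1_000_000       # letters analyzed at a time


def genome_summary(total=GENOME_LETTERS, gene_percent=GENE_PERCENT,
                   window=WINDOW_LETTERS):
    """Return rough letter counts for genes, the dark genome, and windows.

    Hypothetical helper for illustration only. It assumes simple,
    non-overlapping windows, which the article does not state.
    """
    gene_letters = total * gene_percent // 100
    dark_letters = total - gene_letters
    windows = -(-total // window)  # ceiling division
    return {
        "gene_letters": gene_letters,
        "dark_letters": dark_letters,
        "windows": windows,
    }


summary = genome_summary()
print(summary)
```

On these rounded numbers, genes account for about 60 million letters, the dark genome for about 2.94 billion, and it would take roughly 3,000 one-million-letter windows to cover the genome once.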

Natasha Latysheva expressed considerable enthusiasm for the AI model’s capacity to show which mutations contribute to disease and to pinpoint the underlying causes of rare genetic disorders. She added that AlphaGenome could become a valuable part of the drug discovery pipeline, ultimately helping in the development of new treatments. The AI could also be used in synthetic biology and in the design of novel DNA sequences, which could help advance gene therapies. The research behind AlphaGenome has been published in the journal Nature. The model has been available for non-commercial use since last year and has already been used by approximately 3,000 scientists worldwide.

Dr. Gareth Hawkes, affiliated with the University of Exeter, is leveraging AlphaGenome to investigate how genetic mutations might influence an individual’s susceptibility to obesity and diabetes. Large-scale studies that have sequenced the complete genetic code of tens of thousands of individuals have identified DNA variants associated with these conditions; however, these variants are frequently located within the dark genome. Dr. Hawkes commented on the challenge, stating, "They’re directly impacting some important piece of biology that we don’t really understand." AlphaGenome enables researchers to rapidly generate predictions about the function of these variants, thereby streamlining their subsequent laboratory testing. "Those predictions will help to inform which biological processes those genetic variants might be impacting, and potentially lead to drug developments," Dr. Hawkes explained. He further conveyed his optimism, noting, "I wouldn’t say the dark side of the genome is solved by AlphaGenome, but it’s a big leap. I’m really excited."

The field of cancer research stands to benefit significantly from the accelerated insights provided by AlphaGenome. The AI model has demonstrated its ability to differentiate between mutations that drive cancer progression and are thus potential therapeutic targets, and those that are merely incidental. Dr. Robert Goldstone, head of genomics at the Francis Crick Institute, hailed AlphaGenome as a "major milestone in the field of genomic AI," characterizing the breakthrough as an "incredible technical feat" due to its "ability to predict gene expression from DNA sequence alone." Professor Ben Lehner, who leads generative and synthetic genomics at the Wellcome Sanger Institute, reported that AlphaGenome has been rigorously tested across more than half a million experiments, consistently demonstrating strong performance. Despite these impressive results, he cautioned that the model is "far from perfect" and that substantial work remains. Professor Lehner emphasized the current era as a period of immense scientific progress, where the synergistic combination of the UK’s global leadership in genomics, biomedical research, and artificial intelligence is driving transformative advancements in biology and medicine.

It is worth noting that Demis Hassabis and John Jumper of Google DeepMind shared the 2024 Nobel Prize in Chemistry for their work on AlphaFold, an AI system that predicts the three-dimensional structures of proteins. Pushmeet Kohli, vice president of science and strategic initiatives at Google DeepMind, expressed a broad vision for the future, stating, "I think we are at the start of a new era of scientific progress, and AI is going to enable a number of different breakthroughs."

AlphaGenome works differently from large language models like ChatGPT, which are designed to predict the next word in a sequence. Instead, AlphaGenome is a "sequence-to-function model" that examines how changes in the DNA sequence affect the biological outcome. The model was trained on extensive, publicly accessible databases derived from experiments on human and mouse cells. While experts broadly agree that the model performs well, it still needs refinement. Its accuracy is limited in certain areas, such as predicting how genes are regulated across distances of more than 100,000 DNA letters. The development team also aims to improve the model’s precision across different tissue types. For instance, a neuron in the brain and a cardiac muscle cell share the same genetic code, but they function differently because each cell type uses those genetic instructions differently. Capturing this regulation of gene expression across cell types is a key area for future improvement.
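The general idea of a sequence-to-function comparison can be sketched with a toy example. Everything here is hypothetical: `toy_activity` is a deliberately crude stand-in that merely counts a made-up motif, and none of the names correspond to AlphaGenome's actual interface or method. The sketch only shows the shape of the calculation, which is to score the original sequence, score the sequence with one letter changed, and compare the two.

```python
MOTIF = "TATA"                 # hypothetical motif, chosen only for illustration
BASES = ("A", "C", "G", "T")


def toy_activity(seq):
    """Toy stand-in for a sequence-to-function model.

    Counts (possibly overlapping) occurrences of MOTIF. A real model
    would predict measurements such as gene expression instead.
    """
    width = len(MOTIF)
    return float(sum(seq[i:i + width] == MOTIF
                     for i in range(len(seq) - width + 1)))


def apply_substitution(seq, pos, alt):
    """Return a copy of seq with the letter at 0-based pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside the sequence")
    if alt not in BASES:
        raise ValueError("alt must be one of A, C, G, T")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(seq, pos, alt, model=toy_activity):
    """Score a single-letter change as model(altered) - model(reference)."""
    return model(apply_substitution(seq, pos, alt)) - model(seq)


reference = "GGTATAGG"
print(variant_effect(reference, 3, "C"))  # breaks the motif: -1.0
print(variant_effect(reference, 0, "A"))  # leaves the motif intact: 0.0
```

Passing the model in as a parameter keeps the comparison logic separate from whatever does the scoring, so the toy counter could in principle be swapped for a real predictor without changing `variant_effect`.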