AI Model Offers Map of How Genes Work Together in Different Cellular Contexts

May 22, 2026 - 04:00

Scientists at the Icahn School of Medicine at Mount Sinai have created a new artificial intelligence (AI) model that helps reveal how genes function together inside human cells, offering a powerful new way to understand biology and disease. Their study, headed by Avi Ma’ayan, PhD, professor of pharmacological sciences and director of the Mount Sinai Center for Bioinformatics at the Icahn School of Medicine at Mount Sinai, introduces a gene set foundation model (GSFM) designed to learn patterns in how genes are grouped and function across thousands of biological contexts.

The work draws inspiration from advances in large language models (LLMs) such as ChatGPT, which learn how words gain meaning depending on their context. In a similar way, a GSFM learns how genes behave differently depending on their cellular “context.”

The model provides a new way to understand the structural and functional organization of genes and their products inside human cells. This improved understanding could eventually support the development of better diagnostics, biomarkers, and therapies. By mapping how genes relate to one another across many biological situations, the GSFM creates a reference framework that can help scientists interpret complex multiomics datasets more effectively, say the investigators. “The organization of genes within cells remains one of the major unsolved questions in biology,” Ma’ayan noted. “The GSFM helps address this by learning from millions of gene groupings derived from published research and gene expression datasets.”

Ma’ayan is senior corresponding author of the team’s published paper in Patterns, titled “GSFM: A gene set foundation model pre-trained on a massive collection of diverse gene sets.”

In their paper the authors explained, “Genes are a bit like words, and gene sets are a bit like sentences, because words are reused in different contexts to express unique meanings, and cells reuse genes to carry out different biological functions.”

“Genes rarely act alone,” Ma’ayan further noted. “Instead, they participate in multiple biological processes, forming different molecular groupings depending on where and when they are active in the cell. A single gene can play different roles in different settings, much like a word can have different meanings in different sentences. Just as modern language models learn the meaning of words from context, we asked whether AI could learn the ‘meaning’ of genes in the same way. Our GSFM was designed to do exactly that.”

To build the model, the researchers compiled millions of gene sets from published scientific studies and gene expression datasets. In total, the system learned from hundreds of thousands of independent research efforts.

The AI model was trained in a way similar to solving a puzzle: it was given part of a gene set and asked to predict the missing pieces. Over time, it learned underlying patterns that describe how genes are grouped and interact.
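The puzzle-style training described above can be illustrated with a minimal sketch. It only shows how one gene set might be split into a visible part and a held-out part; the function name, the masking fraction, and the example genes are assumptions made for illustration, not details taken from the GSFM paper or its code.

```python
import random


def make_masked_example(gene_set, mask_fraction=0.2, seed=None):
    """Split a gene set into visible genes (model input) and hidden genes (targets).

    Hypothetical helper: the real GSFM training pipeline may mask genes differently.
    """
    rng = random.Random(seed)
    genes = sorted(set(gene_set))  # de-duplicate and fix an order for reproducibility
    n_masked = max(1, int(len(genes) * mask_fraction))  # always hide at least one gene
    masked = set(rng.sample(genes, n_masked))
    visible = [g for g in genes if g not in masked]
    return visible, sorted(masked)


# Toy gene set chosen only for illustration.
example_set = ["TP53", "MDM2", "CDKN1A", "BAX", "ATM"]
visible, hidden = make_masked_example(example_set, mask_fraction=0.4, seed=0)
print("input genes:", visible)
print("genes to predict:", hidden)
```

A model trained on many such pairs would be scored on how highly it ranks the hidden genes given the visible ones.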

The AI model was then benchmarked against other approaches and demonstrated strong performance, including the ability to identify gene-gene and gene-function relationships before they were confirmed experimentally. To evaluate this, the model was trained using gene sets from publications up to a defined cutoff date, and then tested on whether it could predict discoveries reported in studies published after that cutoff date.
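The cutoff-date evaluation described above can be sketched as a time-based split followed by a ranking metric. Everything in this example is an assumption for illustration: the record layout, the toy gene sets, the hard-coded ranking standing in for model output, and the choice of recall at k as the metric are not taken from the paper.

```python
from datetime import date


def split_by_cutoff(gene_sets, cutoff):
    """Separate gene sets published on or before the cutoff from later ones."""
    train = [s for s in gene_sets if s["published"] <= cutoff]
    test = [s for s in gene_sets if s["published"] > cutoff]
    return train, test


def recall_at_k(ranked_genes, true_genes, k):
    """Fraction of the true genes that appear among the top-k ranked predictions."""
    true_genes = set(true_genes)
    if not true_genes:
        return 0.0
    return len(set(ranked_genes[:k]) & true_genes) / len(true_genes)


# Toy records; dates and memberships are invented for the example.
toy_sets = [
    {"published": date(2019, 5, 1), "genes": {"TP53", "MDM2"}},
    {"published": date(2021, 3, 9), "genes": {"TP53", "ATM"}},
    {"published": date(2023, 8, 2), "genes": {"TP53", "CHEK2"}},
]
train, test = split_by_cutoff(toy_sets, cutoff=date(2022, 1, 1))

# Genes that only appear after the cutoff count as "new discoveries".
known = set().union(*(s["genes"] for s in train))
novel = test[0]["genes"] - known

# Hypothetical ranking, standing in for a model trained on `train` only.
ranked = ["ATM", "CHEK2", "BAX", "MDM2"]
score = recall_at_k(ranked, novel, k=2)
print("recall@2 on post-cutoff genes:", score)
```

Keeping every post-cutoff gene set out of training is what allows a high score to be read as prediction rather than recall of already-published results.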

“Unlike previous biological AI models that primarily rely on gene expression data, our GSFM is uniquely trained on gene sets, a different and largely underused type of biological information,” Ma’ayan stated. “This approach allows the model to integrate diverse data from many diseases, experimental methods, and research conditions, creating a unified representation of gene relationships across biology.”

The team’s studies showed that the new model can help identify the function of poorly understood genes without immediate laboratory experiments, highlight genes involved in disease processes, and suggest potential new drug targets and biomarkers. The model offers a reusable knowledge system for many types of biomedical research data analysis tasks—for example, improved gene set enrichment analysis. In essence, the researchers suggested, GSFM offers a new “map” of how genes work together in different contexts. “Unlike prior methods that are mainly based on similarity of all genes to annotated genes, GSFM’s architecture can capture the more complex non-linear and multi-modal relationships between genes and the gene modules these genes constitute,” the investigators wrote. “GSFM’s ability to predict genes held out from known gene sets can be useful for many applications in computational systems biology.”

GSFMs could enhance existing bioinformatics tools and improve the interpretation of data collected with omics technologies. One immediate application is in gene set enrichment analysis, a widely used method in molecular biology research. By improving how scientists interpret gene groupings, the model may help uncover new biological insights from both existing and future datasets.
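For readers unfamiliar with gene set enrichment analysis, the sketch below shows one common classical form of it, an over-representation test based on the hypergeometric distribution. It is a baseline illustration only and is not the GSFM-based approach; the function name and the toy numbers are assumptions.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_query, n_overlap):
    """Probability of seeing at least n_overlap annotated genes in a random query set.

    n_background: number of genes in the background
    n_annotated:  background genes carrying the annotation (e.g. a pathway)
    n_query:      size of the query gene set
    n_overlap:    query genes that carry the annotation
    """
    upper = min(n_annotated, n_query)
    # Sum exact integer counts first, then divide once, to limit rounding error.
    favourable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_query - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_background, n_query)


# Toy numbers: 20,000 background genes, a 100-gene pathway,
# and a 50-gene query set sharing 5 genes with that pathway.
p = enrichment_p_value(20000, 100, 50, 5)
print(f"p-value: {p:.3e}")
```

A small p-value suggests the overlap is larger than chance would explain. In practice the test is repeated over many annotation terms, so a multiple-testing correction is normally applied before ranking the terms.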

“Like the way LLMs predict the next word in a sentence, GSFM guesses the next missing gene when presented with a gene set,” the scientists stated. “With this power, GSFM can be used to reliably assign the most likely functions to understudied genes, and make gene set enrichment analysis more precise, ranking the most relevant enriched terms when presented with any query gene set.”

The research team plans to expand the system by combining GSFM with other AI foundation models. One goal is to integrate it with language-based models to generate natural-language explanations of gene functions. Another future direction is combining GSFM with drug-focused AI models, with the long-term aim of predicting how drugs interact with cells and supporting the design of new therapeutics.

“In summary, GSFM’s ability to distil knowledge from large amounts of unlabeled gene sets automatically, and to do so successfully across multiple sources of knowledge, can be translated into many ‘low-hanging fruit’ hypotheses that could be tested in wet lab experiments to rapidly advance knowledge in biomedical research,” the investigators concluded.

The gene pages and the GSFM model are accessible at https://gsfm.maayanlab.cloud and https://github.com/MaayanLab/gsfm.

The post AI Model Offers Map of How Genes Work Together in Different Cellular Contexts appeared first on GEN - Genetic Engineering and Biotechnology News.
