Why AI Alone Isn’t Enough for Oligonucleotide Discovery

May 14, 2026 - 05:10

AI is reshaping drug discovery, and nucleic acid–based medicines, including mRNAs, gene therapies, and oligonucleotide therapeutics, are no exception. By optimizing sequences and chemical modifications ahead of experimental testing, AI accelerates discovery timelines. This is particularly critical for oligonucleotide therapeutics, a modality central to treating n = 1 rare diseases, which mostly afflict young patients and therefore carry additional urgency.

However, a familiar caveat remains: AI is only as powerful as the data from which it learns. How can we provide enough high-quality input data to fuel this engine and design next-generation precision medicine?

A typical workflow for developing an AI-powered oligo predictive model begins with collecting experimental outcomes of oligo sequences, with each sequence annotated using a defined set of features. This data is then used to train AI models that identify patterns associated with improved activity and safety.

However, as is often the case with pioneering technologies such as oligonucleotides, data scarcity is a major problem. To overcome this limitation, scientists trawl public resources such as publications and patents for experimental results. ASOptimizer, OligoAI, and eSkip-Finder are examples of recent oligonucleotide-prediction AI models trained on publicly available data.

While these models are advancing in the right direction, relying primarily on this data comes with several disadvantages, such as:

  • inconsistent experimental conditions between the datasets,
  • limited diversity in sequences and chemistries,
  • lack of negative data, and
  • insufficient coverage of critical information such as toxicity and off-target effects.

Furthermore, because sourcing and annotating these data often relies on automated, AI-powered tools, there is a risk of mislabeling and misinterpretation. As a result, the reported correlations between predicted and experimental values for these models remain modest, generally between 0.4 and 0.7 [1–3].

Building the data foundation for AI drug discovery

The most valuable training data is:

  • designed to span broad chemical space and probe critical safety features,
  • produced under controlled conditions,
  • consistently processed and annotated, and
  • generated in a controlled environment, ideally in-house.

Large-scale screening campaigns are essential in that context, as they provide the dense, reliable datasets required to train AI models and extract meaningful insights for sequence and chemistry prediction.


Brett Monia, CEO of Ionis Pharmaceuticals, describes this reality as “hard, brutal screening: screening a lot of oligonucleotides with different decorations, different amounts of chemistries, different sequences. We have plenty of (design) rules, but we still don’t have enough.” [4]

One way to address this challenge is through intentional screening design: deliberately varying sequence motifs and positional chemistries within screening libraries to systematically explore chemical landscapes and expand the empirical foundation on which both rules and AI models are built.
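As a simple illustration of intentional design, the sketch below enumerates a full-factorial library that crosses target sites with gapmer wing chemistries and wing lengths, recording the chemistry at every position. Everything here is hypothetical: the sequences are made up, and 2'-MOE and cEt wings stand in for whatever chemistries a real campaign would probe. Real libraries are far larger and may use fractional or adaptive designs rather than a full factorial.

```python
from itertools import product

# Hypothetical design factors; the sequences are made up for illustration
TARGET_SITES = ["ACGTTGCAAGCTTGACCGTA", "TGCATCCAGGTTACGAATCG"]
WING_CHEMISTRIES = ["MOE", "cEt"]  # 2'-sugar modifications used in gapmer wings
WING_LENGTHS = [3, 5]              # modified nucleotides per wing; the central gap is DNA


def positional_chemistry(length: int, wing: int, chem: str) -> list[str]:
    """Per-position sugar chemistry for a symmetric gapmer: wing, DNA gap, wing."""
    if 2 * wing >= length:
        raise ValueError("wings leave no DNA gap")
    return [chem] * wing + ["DNA"] * (length - 2 * wing) + [chem] * wing


def design_library() -> list[dict]:
    """Full-factorial library: every target site x wing chemistry x wing length."""
    library = []
    for seq, chem, wing in product(TARGET_SITES, WING_CHEMISTRIES, WING_LENGTHS):
        gap = len(seq) - 2 * wing
        library.append({
            "sequence": seq,
            "design": f"{wing}-{gap}-{wing} {chem} gapmer",
            "chemistry": positional_chemistry(len(seq), wing, chem),
        })
    return library


library = design_library()
for entry in library:
    print(entry["sequence"], entry["design"])
```

Because every factor combination is present, differences in activity can later be attributed to a specific factor (site, chemistry, or wing length) rather than being confounded, which is what makes such data useful for both design rules and model training.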

With the advent of faster and more affordable transcriptomic technologies, high-throughput RNA-seq can now be incorporated into oligonucleotide screening workflows. This enables systematic detection of off-target effects, including those that arise through mechanisms beyond straightforward sequence complementarity [5,6].
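Once RNA-seq reads have been run through a differential-expression tool, a first-pass off-target screen can be as simple as filtering genes by effect size and adjusted p-value. The sketch below assumes per-gene log2 fold changes and adjusted p-values are already available; the gene names, values, and cutoffs are hypothetical. It is a plain threshold filter, not the dose-response model described in reference 5.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float  # treated vs. control, from a differential-expression tool
    padj: float              # multiple-testing-adjusted p-value


def flag_off_targets(results, intended_target, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return sorted names of non-target genes whose expression changed significantly.

    Both directions are flagged (|log2FC| >= lfc_cutoff), since indirect effects
    may raise or lower expression. Cutoff defaults are hypothetical.
    """
    return sorted(
        r.gene
        for r in results
        if r.gene != intended_target
        and abs(r.log2_fold_change) >= lfc_cutoff
        and r.padj < padj_cutoff
    )


# Hypothetical differential-expression output for one oligo vs. control
results = [
    GeneResult("TARGET1", -3.2, 1e-20),  # intended target: excluded from the flag list
    GeneResult("GENE_A", -1.5, 0.001),
    GeneResult("GENE_B", 0.2, 0.9),
    GeneResult("GENE_C", 1.4, 0.01),
    GeneResult("GENE_D", -1.8, 0.2),     # large change but not significant
]
print(flag_off_targets(results, intended_target="TARGET1"))
```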

While these approaches generate large and complex datasets, they represent a critical investment—one that lays the foundation for a faster, more efficient, and ultimately more cost‑effective future of oligo discovery.

Digital infrastructure: engineering AI-ready data at scale

While generating large datasets may be hard and brutal, managing, curating, and analyzing them doesn’t need to be. For data to be truly reliable and trustworthy, quality must be engineered from the start. Important aspects to consider include:

  • A single source of truth: a centralized FAIR (findable, accessible, interoperable, reusable) data repository where all data are systematically stored and governed for controlled access and use;
  • Comprehensive metadata capture, including protocols, batch numbers, and reagent references to ensure results can be interpreted correctly and are not driven by experimental artifacts;
  • Automated quality control and analysis for large‑scale screens, ensuring consistent, efficient, and reproducible data processing; and
  • Consistent ontology and nomenclature for oligo sequences and their chemistries, as exemplified by Roche’s open-source tool (HelmShaker) for translating molecules into HELM notation.
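The metadata and automated-QC points above can be combined in a small sketch: each plate run carries its protocol and reagent-batch references, and an automated gate scores assay quality with the Z'-factor, Z' = 1 − 3(σp + σn) / |μp − μn|, computed from positive and negative controls. The field names, control values, and the 0.5 pass threshold (a common rule of thumb for screening assays) are assumptions for illustration, not a description of any particular platform.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class PlateRun:
    plate_id: str
    protocol_id: str                      # versioned protocol reference
    reagent_batch: str                    # lot/batch of key reagents
    oligo_ids: tuple[str, ...]            # registered oligo identifiers on the plate
    positive_controls: tuple[float, ...]  # e.g. % target mRNA remaining with a potent control oligo
    negative_controls: tuple[float, ...]  # e.g. % target mRNA remaining in untreated wells


def z_prime(pos, neg) -> float:
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(pos) - mean(neg))
    if separation == 0:
        raise ValueError("positive and negative control means are identical")
    return 1 - 3 * (stdev(pos) + stdev(neg)) / separation


def qc_gate(run: PlateRun, threshold: float = 0.5) -> dict:
    """Score a plate automatically and keep its metadata attached to the verdict."""
    score = z_prime(run.positive_controls, run.negative_controls)
    return {
        "plate_id": run.plate_id,
        "protocol_id": run.protocol_id,
        "reagent_batch": run.reagent_batch,
        "z_prime": score,
        "passed": score >= threshold,
    }


# Hypothetical plate; identifiers and control readouts are made up
run = PlateRun(
    plate_id="PLATE-0001",
    protocol_id="KD-ASSAY-v3",
    reagent_batch="LOT-42",
    oligo_ids=("OLG-000001", "OLG-000002"),
    positive_controls=(10.0, 12.0, 11.0),
    negative_controls=(90.0, 92.0, 91.0),
)
report = qc_gate(run)
print(report)
```

Keeping the protocol and batch references in the QC output means a failed plate can later be traced to, say, a specific reagent lot rather than being silently mixed into training data.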

In practice, these principles are implemented through integrated digital infrastructures that combine molecular registration systems with automated analytics across diverse experimental modalities such as high‑throughput screening, next‑generation sequencing, mass spectrometry, and chromatography.

Such approaches are increasingly used across the pharmaceutical and biotechnology sectors to manage oligonucleotide ADME, process development, and screening data, thus helping teams maintain data integrity and continuity throughout the oligo discovery and development lifecycle.
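The molecular-registration side of such infrastructure can be sketched as assigning one stable identifier per unique structure, so that every result recorded against that molecule shares a key. The toy registry below assumes its inputs are already canonical HELM strings (canonicalization itself is out of scope); the class, ID prefix, and example strings are hypothetical and only illustrate the idea.

```python
import hashlib


class OligoRegistry:
    """Toy molecular registry: one stable ID per unique (canonical) HELM string."""

    def __init__(self, prefix: str = "OLG"):
        self.prefix = prefix                      # hypothetical ID prefix
        self._by_structure: dict[str, str] = {}  # structure key -> registry ID

    @staticmethod
    def _structure_key(helm: str) -> str:
        # Assumes `helm` is already canonical; only surrounding whitespace is stripped
        return hashlib.sha256(helm.strip().encode("utf-8")).hexdigest()

    def register(self, helm: str) -> tuple[str, bool]:
        """Return (registry_id, is_new). Re-registering a known structure reuses its ID."""
        key = self._structure_key(helm)
        if key in self._by_structure:
            return self._by_structure[key], False
        new_id = f"{self.prefix}-{len(self._by_structure) + 1:06d}"
        self._by_structure[key] = new_id
        return new_id, True


# Illustrative HELM-style strings
registry = OligoRegistry()
first = registry.register("RNA1{R(A)P.R(C)P.R(G)}$$$$V2.0")
again = registry.register("  RNA1{R(A)P.R(C)P.R(G)}$$$$V2.0  ")
other = registry.register("RNA1{R(G)P.R(C)P.R(A)}$$$$V2.0")
print(first, again, other)
```

Hashing gives a fixed-length lookup key; the essential design choice is that identity is decided by structure, not by who submitted the molecule or when.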

AI promises to redefine what is possible in oligo discovery, and the field is already beginning to see its impact. But AI alone is not the breakthrough—data is. Only large, high‑quality experimental datasets, generated intentionally and prospectively, can unlock AI’s full predictive power.

Organizations that invest early in both systematic data generation and robust data infrastructure will be best positioned to lead the next wave of oligonucleotide discovery. This shift is especially urgent for n = 1 rare diseases, where speed, precision, and learning from every experiment can make the difference between possibility and progress.

Ming Wang, PhD, is scientific business manager at Genedata.

References:

1. Hwang, G., Kwon, M., Seo, D., Kim, DH., Lee, D., Lee, K., Kim, E., Kang, M., Ryu, JH. ASOptimizer: Optimizing antisense oligonucleotides through deep learning for IDO1 gene regulation. Mol Ther Nucleic Acids. 2024 Apr 6;35(2):102186. doi: 10.1016/j.omtn.2024.102186

2. Chiba, S., Lim, KRQ., Sheri, N., Anwar, S., Erkut, E., Shah, MNA., Aslesh, T., Woo, S., Sheikh, O., Maruyama, R., Takano, H., Kunitake, K., Duddy, W., Okuno, Y., Aoki, Y., Yokota, T. eSkip-Finder: a machine learning-based web application and database to identify the optimal sequences of antisense oligonucleotides for exon skipping. Nucleic Acids Res. 2021 Jul 2;49(W1):W193-W198. doi: 10.1093/nar/gkab442

3. Hill, B., Jaques, MR., Nair, RR., Whiffin, N., Wood, MJA., Sanders, SJ., Oliver, PL., Hill, AC., Rinaldi, C. Accurately modelling RNase H-mediated antisense oligonucleotide efficacy. bioRxiv. 2025 Oct 30. https://doi.org/10.1101/2025.10.29.685292

4. Accelerating Oligonucleotide Therapeutics. Evotec eBook.

5. Pekker, D., Kuntz, S., McArthur, M., Nicholson-Shaw, T., Yanke, S., Mukhopadhyay, S. A Dose-Response Model for Accurate Detection and Quantification of Transcriptome-Wide Gene Knockdown for Oligonucleotide-Based Medicines. bioRxiv. 2024 May 29. https://www.biorxiv.org/content/10.1101/2024.05.28.596270v1.full.pdf

6. In-silico siRNA Off-Target Predictions: What Should We Be Looking For? Oligonucleotide Therapeutics Society (OTS), webinar, 2024 May 2.


The post Why AI Alone Isn’t Enough for Oligonucleotide Discovery appeared first on GEN - Genetic Engineering and Biotechnology News.
