
OmniGenome: A comprehensive toolkit for genome analysis.


📦 Installation · 🚀 Getting Started · 🧬 Model Support · 📊 Benchmarks · 🧪 Application Tutorials · 📚 Paper

🔍 What Can You Do with OmniGenBench?

  • 🧬 Benchmark effortlessly — Run automated and reproducible evaluations for genomic foundation models
  • 🧠 Understand your models — Explore interpretability across diverse tasks and species
  • ⚙️ Run tutorials instantly — Use click-to-run guides for genomic sequence modeling
  • 🚀 Fine-tune and infer efficiently — Accelerated workflows for fine-tuning and inference with genomic foundation models (GFMs) on downstream tasks

Installation

Requirements

Before installing OmniGenBench, ensure you have the following:

  • Python: 3.10 or higher (3.12 recommended for best compatibility)
  • PyTorch: 2.6.0 or higher (with CUDA support for GPU acceleration)
  • Transformers: 4.46.0 or higher (HuggingFace library)
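A quick way to confirm these requirements are met before installing (a minimal sketch using only the standard version attributes of Python, PyTorch, and Transformers):

# Check interpreter and library versions required by OmniGenBench
import sys
import torch
import transformers

print(f"Python        : {sys.version.split()[0]}")    # expect 3.10+
print(f"PyTorch       : {torch.__version__}")          # expect 2.6.0+
print(f"Transformers  : {transformers.__version__}")   # expect 4.46.0+
print(f"CUDA available: {torch.cuda.is_available()}")  # True if GPU acceleration is usable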

PyPI Installation (Recommended)

Install the latest stable release from PyPI:

# Create dedicated conda environment (recommended)
conda create -n omnigen_env python=3.12
conda activate omnigen_env

# Install OmniGenBench
pip install omnigenbench -U

Source Installation (For Development)

Clone the repository and install in editable mode for development:

git clone https://github.com/yangheng95/OmniGenBench.git
cd OmniGenBench
pip install -e .

Note: For RNA structure prediction and design features, ViennaRNA is required. Install via conda: conda install -c bioconda viennarna
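After installation, ViennaRNA can be sanity-checked by folding a short sequence (a minimal sketch using the RNA Python bindings shipped with the viennarna package):

# Quick check of the ViennaRNA Python bindings
import RNA

structure, mfe = RNA.fold("GCGCUUCGAGCGC")  # fold a short test sequence
print(structure, mfe)                       # prints a dot-bracket structure and its MFE in kcal/mol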

Quick Start

OmniGenBench provides unified interfaces for model inference, automated benchmarking, and fine-tuning across 30+ genomic foundation models and 80+ standardized tasks.

Auto-inference via CLI

Run inference with fine-tuned models on genomic sequences:

# Single sequence inference (TF binding prediction)
ogb autoinfer \
    --model yangheng/ogb_tfb_finetuned \
    --sequence "ATCGATCGATCGATCG" \
    --output-file predictions.json

# Batch inference from file (translation efficiency prediction)
ogb autoinfer \
    --model yangheng/ogb_te_finetuned \
    --input-file sequences.json \
    --batch-size 64 \
    --output-file results.json
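The batch command above expects a sequence file; the snippet below is one way to prepare it, assuming a plain JSON list of sequences (the exact schema accepted by ogb autoinfer is documented in the AutoInfer Examples):

# Hypothetical preparation of sequences.json for batch inference.
# Assumption: a flat JSON list of sequences; check the AutoInfer Examples for the exact schema.
import json

sequences = [
    "ATCGATCGATCGATCG",
    "GGGCCCAAATTTGGGC",
]
with open("sequences.json", "w") as f:
    json.dump(sequences, f, indent=2)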

Auto-inference via Python API

Programmatic inference with a three-line core workflow:

from omnigenbench import ModelHub

# Load fine-tuned model from HuggingFace Hub
model = ModelHub.load("yangheng/ogb_tfb_finetuned")

# Predict transcription factor binding (919 TFs, multi-label classification)
outputs = model.inference("ATCGATCGATCGATCGATCGATCGATCGATCG" * 10)
print(outputs)
# {'predictions': array([1, 0, 1, ...]), 
#  'probabilities': array([0.92, 0.15, 0.87, ...])}

# Interpret results
import numpy as np
binding_sites = np.where(outputs['predictions'] == 1)[0]
print(f"Predicted binding: {len(binding_sites)}/919 transcription factors")

More Examples: See the Getting Started Guide and AutoInfer Examples for advanced usage patterns.

Auto-benchmark via CLI

Automated benchmarking with statistical rigor (multi-seed evaluation):

# Evaluate model on RGB benchmark (12 RNA tasks) with 3 random seeds
ogb autobench \
    --model yangheng/OmniGenome-186M \
    --benchmark RGB \
    --seeds 0 1 2 \
    --trainer accelerate

# Legacy command (still supported for backward compatibility)
# autobench --config_or_model "yangheng/OmniGenome-186M" --benchmark "RGB"

Output: Results include mean ± standard deviation for each metric (e.g., MCC: 0.742 ± 0.015, F1: 0.863 ± 0.009)
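The reported mean ± standard deviation is a plain aggregation over the per-seed runs; the sketch below shows the arithmetic on hypothetical per-seed MCC values:

# Aggregate a metric across seeds (per-seed values are hypothetical, for illustration only)
import numpy as np

mcc_per_seed = [0.729, 0.742, 0.755]  # one MCC value per seed (0, 1, 2)
print(f"MCC: {np.mean(mcc_per_seed):.3f} ± {np.std(mcc_per_seed):.3f}")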

Visualization: See the AutoBench GIF for a workflow demonstration.

Auto-benchmark via Python API

Programmatic benchmarking with flexible configuration:

from omnigenbench import AutoBench

# Initialize benchmark
gfm = 'LongSafari/hyenadna-medium-160k-seqlen-hf'
benchmark = "RGB"  # Options: RGB, BEACON, PGB, GUE, GB
bench_size = 8
seeds = [0, 1, 2, 3, 4]  # Multi-seed for statistical rigor

# Run automated evaluation
bench = AutoBench(
    benchmark=benchmark,
    config_or_model=gfm,
    overwrite=False  # Skip completed tasks
)
bench.run(autocast=False, batch_size=bench_size, seeds=seeds)

Advanced Usage: See Benchmarking with LoRA for parameter-efficient fine-tuning during evaluation.

Supported Models

OmniGenBench provides plug-and-play evaluation for 30+ genomic foundation models, covering both RNA and DNA modalities across multiple species. All models integrate seamlessly with the framework's automated benchmarking and fine-tuning workflows.

Representative Models

| Model | Params | Pre-training Corpus | Key Features |
| --- | --- | --- | --- |
| OmniGenome | 186M | 54B plant RNA+DNA tokens | Multi-modal encoder, structure-aware, plant-specialized |
| Agro-NT-1B | 985M | 48 edible-plant genomes | Billion-scale DNA LM with NT-V2 k-mer vocabulary |
| RiNALMo | 651M | 36M ncRNA sequences | Largest public RNA LM with FlashAttention-2 |
| DNABERT-2 | 117M | 32B DNA tokens, 136 species (BPE) | Second-generation DNA BERT with byte-pair encoding |
| RNA-FM | 96M | 23M ncRNA sequences | High performance on RNA structure prediction tasks |
| RNA-MSM | 96M | Multi-sequence alignments | MSA-based evolutionary modeling for RNA |
| NT-V2 | 96M | 300B DNA tokens (850 species) | Hybrid k-mer vocabulary, cross-species |
| HyenaDNA | 47M | Human reference genome | Long-context (160k-1M tokens) autoregressive model |
| SpliceBERT | 19M | 2M pre-mRNA sequences | Fine-grained splice-site recognition |
| Caduceus | 1.9M | Human chromosomes | Ultra-compact reverse-complement equivariant DNA LM |
| RNA-BERT | 0.5M | 4,000+ ncRNA families (Rfam) | Compact RNA BERT with nucleotide-level masking |

Complete Model List: See Appendix E of the paper for all 30+ supported models, including PlantRNA-FM, UTR-LM, MP-RNA, CALM, and more.

Model Access: All models are available on HuggingFace Hub and can be loaded with ModelHub.load("model-name").
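For example, any identifier from the table can be passed to ModelHub.load, mirroring the inference workflow shown in the Quick Start (a minimal sketch; the exact Hub repository IDs may differ slightly from the short names listed above):

from omnigenbench import ModelHub

# Load a representative genomic foundation model by its Hub identifier
model = ModelHub.load("yangheng/OmniGenome-186M")
print(type(model))  # inspect the loaded model wrapper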

Benchmarks

OmniGenBench supports five curated benchmark suites covering both sequence-level and structure-level genomics tasks across species. All benchmarks are automatically downloaded from HuggingFace Hub on first use.

| Suite | Focus | #Tasks / Datasets | Representative Tasks |
| --- | --- | --- | --- |
| RGB | RNA structure + function | 12 tasks (SN-level) | Secondary structure, solvent accessibility, degradation |
| BEACON | RNA (multi-domain) | 13 tasks | Base pairing, mRNA design, RNA contact prediction |
| PGB | Plant long-range DNA | 7 categories | PolyA signal, enhancer, chromatin, splice site (up to 50kb context) |
| GUE | DNA general understanding | 36 datasets (9 tasks) | TF binding, core promoter, enhancer, epigenetics |
| GB | Classic DNA classification | 9 datasets | Human/mouse enhancers, promoter variant classification |

Evaluation Protocol: All benchmarks follow standardized protocols with multi-seed evaluation (typically 3-5 runs) for statistical rigor. Results report mean ± standard deviation for each metric.

Accessing Benchmarks: Use AutoBench(benchmark="RGB") or ogb autobench --benchmark RGB to automatically download and evaluate on any suite.
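Because all five suites share the same interface, sweeping one model across every suite is a short loop (a sketch reusing the AutoBench API shown in the Quick Start; running all suites end-to-end is compute-intensive):

from omnigenbench import AutoBench

# Evaluate one model across all five benchmark suites
for suite in ["RGB", "BEACON", "PGB", "GUE", "GB"]:
    bench = AutoBench(
        benchmark=suite,
        config_or_model="yangheng/OmniGenome-186M",
        overwrite=False,  # skip tasks that already have results
    )
    bench.run(batch_size=8, seeds=[0, 1, 2])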

Tutorials

RNA Design

RNA design is the inverse problem of RNA structure prediction: given a target secondary structure (in dot-bracket notation), design RNA sequences that fold into that structure. OmniGenBench provides both CLI and Python API for RNA sequence design using genetic algorithms enhanced with masked language modeling.
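In dot-bracket notation, each base pair is written as a matched '(' and ')' and each unpaired base as '.', so a target structure is only meaningful if its brackets balance. The check below is plain Python, independent of OmniGenBench, and can catch malformed targets before a design run:

# Validate a dot-bracket target structure (balanced brackets, allowed characters only)
def is_valid_dot_bracket(structure: str) -> bool:
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:    # closing bracket without an opener
                return False
        elif ch != ".":      # only '(', ')' and '.' are allowed
            return False
    return depth == 0        # every opener must be closed

print(is_valid_dot_bracket("(((...)))"))  # True
print(is_valid_dot_bracket("((...)"))     # False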

CLI Usage

# Basic RNA design for a simple hairpin structure
ogb rna_design --structure "(((...)))"

# Design with custom parameters for better results
ogb rna_design \
    --structure "(((...)))" \
    --model yangheng/OmniGenome-186M \
    --mutation-ratio 0.3 \
    --num-population 200 \
    --num-generation 150 \
    --output-file results.json

# Design complex structure (stem-loop-stem)
ogb rna_design \
    --structure "(((..(((...)))..)))" \
    --num-population 300 \
    --num-generation 200 \
    --output-file complex_design.json

Note: RNA design is now available through the unified ogb command interface.

Python API Usage

from omnigenbench import OmniModelForRNADesign

# Initialize model
model = OmniModelForRNADesign(model="yangheng/OmniGenome-186M")

# Design sequences for target structure
sequences = model.design(
    structure="(((...)))",      # Target structure in dot-bracket notation
    mutation_ratio=0.5,          # Mutation rate for genetic algorithm
    num_population=100,          # Population size
    num_generation=100           # Number of generations
)

print(f"Designed {len(sequences)} sequences:")
for seq in sequences[:5]:
    print(f"  {seq}")

Key Features:

  • 🧬 Multi-objective genetic algorithm with MLM-guided mutations
  • ⚡ Automatic GPU acceleration for large populations
  • 📊 Real-time progress tracking with early termination
  • 🎯 Returns multiple optimal solutions (up to 25 sequences)
  • 💾 JSON output format for downstream analysis

Common Structure Patterns:

  • Simple hairpin: "(((...)))"
  • Stem-loop-stem: "(((..(((...)))..)))"
  • Multi-loop: "(((...(((...)))..(((...))).)))"
  • Long stem: "((((((((....))))))))"

Comprehensive RNA design tutorials can be found in the examples directory of the repository.

You can find a visual demo of RNA Design here.

RNA Secondary Structure Prediction

RNA secondary structure prediction is a fundamental problem in computational biology: given an RNA sequence, predict its secondary structure. This demo shows how to use OmniGenBench to predict the secondary structure of RNA sequences with a pre-trained model. The tutorial is available in Secondary_Structure_Prediction_Tutorial.ipynb (examples/rna_secondary_structure_prediction/00.ipynb).

You can find a visual example of RNA Secondary Structure Prediction here.
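As a minimal sketch of the inference workflow (the checkpoint below is a placeholder; substitute the fine-tuned secondary-structure model referenced in the tutorial notebook):

from omnigenbench import ModelHub

# Placeholder model ID: replace with the fine-tuned secondary-structure checkpoint
# referenced in the tutorial notebook.
ssp_model = ModelHub.load("yangheng/OmniGenome-186M")
result = ssp_model.inference("GGGAAAUCCCAGGGAAAUCCC")
print(result)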

More Tutorials

Please find more usage tutorials in the examples directory.

Citation

@article{yang2025omnigenbench,
      title={OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking},
      author={Heng Yang and Jack Cole and Yuan Li and Renzhi Chen and Geyong Min and Ke Li},
      year={2025},
      eprint={2505.14402},
      archivePrefix={arXiv},
      primaryClass={q-bio.GN},
      url={https://arxiv.org/abs/2505.14402},
}

License

OmniGenBench is licensed under the Apache License 2.0. See the LICENSE file for more information.

Contribution

We welcome contributions to OmniGenBench! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request on GitHub.
