
AlphaGenome PyTorch

A PyTorch port of AlphaGenome, the DNA sequence model from Google DeepMind that predicts hundreds of genomic tracks at single base-pair resolution from sequences up to 1 Mb long.

We strive for an accessible, readable, and hackable implementation that can be integrated into existing PyTorch pipelines, fine-tuned on custom datasets, and built upon.

Installation

Installation from PyPI:

pip install alphagenome-pytorch

Installation from repo:

pip install git+https://github.com/genomicsxai/alphagenome-pytorch

For fine-tuning (incl. BigWig data loading):

pip install alphagenome-pytorch[finetuning]  # adds pyBigWig, pyfaidx

Quick Start

import numpy as np
import torch

from alphagenome_pytorch import AlphaGenome

# Load pretrained model
model = AlphaGenome.from_pretrained('alphagenome.pt', device='cuda')

# Create one-hot encoded DNA sequence in NLC format (batch=1, length=131072, channels=4)
# Channels: A=0, C=1, G=2, T=3
sequence = np.random.randint(0, 4, size=(1, 131072))
dna_onehot = torch.tensor(np.eye(4)[sequence], dtype=torch.float32)

# Inference (handles dtype casting, returns float32 outputs)
outputs = model.predict(dna_onehot, organism_index=0)  # organism: 0=human, 1=mouse

# outputs['atac'][1]   -> (B, 131072, 256) ATAC at 1bp
# outputs['atac'][128] -> (B, 1024, 256) ATAC at 128bp
# outputs['contact_maps'] -> (B, 28, 64, 64) 3D contact
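To encode a real DNA string rather than random indices, a small helper can build the one-hot array (a sketch, not part of the package API; channel order A, C, G, T as in the snippet above):

```python
import numpy as np

def one_hot_dna(seq: str) -> np.ndarray:
    """One-hot encode a DNA string to (L, 4) with channel order A, C, G, T.

    Unknown bases (e.g. N) map to an all-zero row.
    """
    lookup = np.zeros((256, 4), dtype=np.float32)
    for i, base in enumerate("ACGT"):
        lookup[ord(base), i] = 1.0
        lookup[ord(base.lower()), i] = 1.0  # tolerate lowercase input
    idx = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    return lookup[idx]

onehot = one_hot_dna("ACGTN")  # shape (5, 4); last row is all zeros
```

Add a leading batch dimension (e.g. `onehot[None]`) before converting to a tensor and passing it to `model.predict`.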

The weights for this port are available on Hugging Face.

Extracting Embeddings

Use model.encode() to get embeddings without running prediction heads — useful for building custom heads or analyzing representations:

# Get embeddings (128bp only for efficiency)
emb = model.encode(dna_onehot, organism_index=0, resolutions=(128,))
emb['embeddings_128bp']  # (B, 1024, 3072) at 128bp
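The 128bp embeddings can then be pooled over a genomic window of interest, e.g. to featurize a peak or a gene body for a downstream classifier. A minimal numpy sketch (the helper and bin arithmetic are illustrative, not part of the package):

```python
import numpy as np

def pool_region(emb: np.ndarray, start_bp: int, end_bp: int, bin_bp: int = 128) -> np.ndarray:
    """Mean-pool per-bin embeddings over the bins covering [start_bp, end_bp)."""
    lo = start_bp // bin_bp
    hi = -(-end_bp // bin_bp)  # ceiling division for the end coordinate
    return emb[lo:hi].mean(axis=0)

# stand-in for emb['embeddings_128bp'][0], with a small feature dim instead of 3072
emb_128 = np.arange(1024 * 4, dtype=np.float32).reshape(1024, 4)
region_vec = pool_region(emb_128, 0, 256)  # average of bins 0 and 1
```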

Fine-tuning

Train a new head on your data with frozen trunk (linear probing) or with LoRA adapters:

from alphagenome_pytorch import AlphaGenome, TransferConfig, load_trunk, prepare_for_transfer

# Load trunk, freeze, add custom heads
model = AlphaGenome()
model = load_trunk(model, 'alphagenome.pt')
model = prepare_for_transfer(model, TransferConfig(
    mode='lora',
    new_heads={'atac': {'modality': 'atac', 'num_tracks': 1}},
    lora_rank=8,
))
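For reference, LoRA keeps each pretrained weight frozen and learns a low-rank additive update, y = W x + (alpha / r) * B A x, with B zero-initialized so training starts from the pretrained behavior. A standalone numpy sketch of the idea (shapes are arbitrary, unrelated to the model's actual layer sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 16, 32, 8, 16

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, zero-initialized

x = rng.normal(size=(d_in,))
y = W @ x + (alpha / rank) * (B @ (A @ x))  # adapted forward pass
```

Because B starts at zero, the adapted output initially equals the frozen output; only the small A and B matrices receive gradients.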

The easiest way to get started with fine-tuning is scripts/finetune.py, which implements a flexible CLI:

# LoRA fine-tuning
python scripts/finetune.py --mode lora --lora-rank 8 \
    --genome hg38.fa --modality atac --bigwig *.bw \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights alphagenome.pt

# Multi-GPU
torchrun --nproc_per_node=4 scripts/finetune.py --mode lora ...

See examples/notebooks/finetuning_gm12878_demo.ipynb for an example of linear probing on ATAC-seq data.
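Conceptually, linear probing fits a per-bin linear map from frozen trunk embeddings to the target track. A minimal self-contained numpy sketch of that idea on synthetic data (shapes are stand-ins; the notebook uses the package's own heads and training loop):

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, d_emb, n_tracks = 1024, 64, 1  # d_emb stands in for the real 3072

emb = rng.normal(size=(n_bins, d_emb))        # frozen trunk embeddings
target = rng.normal(size=(n_bins, n_tracks))  # e.g. log1p ATAC coverage per bin

# closed-form least-squares fit of the probe weights
W, *_ = np.linalg.lstsq(emb, target, rcond=None)
pred = emb @ W
```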

Numerical Parity with JAX

This port is validated against the original JAX model, including per-head and full forward pass output comparisons as well as loss values and gradients.

See ARCHITECTURE_COMPARISON.md for the technical details.

Model Outputs

| Head | Tracks | Resolutions | Description |
|------|--------|-------------|-------------|
| atac | 256 | 1bp, 128bp | Chromatin accessibility |
| dnase | 384 | 1bp, 128bp | DNase-seq |
| procap | 128 | 1bp, 128bp | Transcription initiation |
| cage | 640 | 1bp, 128bp | 5' cap RNA |
| rnaseq | 768 | 1bp, 128bp | RNA expression |
| chip_tf | 1664 | 128bp | TF binding |
| chip_histone | 1152 | 128bp | Histone modifications |
| contact_maps | 28 | 64×64 | 3D chromatin contacts |
| splice_sites | 4 | 1bp | Splice site classification (D+, A+, D−, A−) |
| splice_junctions | 734 | pairwise | Junction read counts (367 tissues × 2 strands) |
| splice_site_usage | 734 | 1bp | Fraction of transcripts using splice site |
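As a sanity check on the shapes above, the number of output bins is just the input length divided by the track resolution (the 2048bp contact-map bin size below is inferred from the 64×64 output for a 131072bp input, not stated in the table):

```python
def n_bins(seq_len_bp: int, resolution_bp: int) -> int:
    """Number of output bins for a given input length and track resolution."""
    assert seq_len_bp % resolution_bp == 0
    return seq_len_bp // resolution_bp

assert n_bins(131072, 1) == 131072   # 1bp tracks, e.g. atac
assert n_bins(131072, 128) == 1024   # 128bp tracks, e.g. chip_tf
assert n_bins(131072, 2048) == 64    # 64x64 contact map implies 2048bp bins
```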

See more information about model outputs in the official AlphaGenome documentation.

Example Notebooks

Citation

@article{avsec2026alphagenome,
  title={Advancing regulatory variant effect prediction with AlphaGenome},
  author={Avsec, {\v{Z}}iga and Latysheva, Natasha and Cheng, Jun and Novati, Guido and Taylor, Kyle R and Ward, Tom and Bycroft, Clare and Nicolaisen, Lauren and Arvaniti, Eirini and Pan, Joshua and others},
  journal={Nature},
  volume={649},
  number={8099},
  pages={1206--1218},
  year={2026},
  publisher={Nature Publishing Group UK London}
}
bioRxiv preprint:

@article{avsec2025alphagenome,
  title={AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model},
  author={Avsec, {\v{Z}}iga and Latysheva, Natasha and Cheng, Jun and ...},
  year={2025},
  journal={bioRxiv},
  doi={10.1101/2025.06.25.661532}
}

Acknowledgements

We acknowledge Phil Wang, Miquel Anglada-Girotto, and Xinming Tu as developers of an older AlphaGenome PyTorch port unrelated to this repo. Note that the PyPI namespace is now linked to this repo.

License

This project is a port of the google-deepmind/alphagenome_research repository licensed under the Apache License, Version 2.0:

Copyright 2026 Google LLC

The weights are subject to the model terms.

This port is licensed under the Apache License, Version 2.0 (Apache 2.0):

Copyright 2026 Danila Bredikhin, Martin Kjellberg, Christopher Zou, Alejandro Buendia, Xinming Tu, Anshul Kundaje

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
