PyTorch implementation of AlphaGenome

AlphaGenome PyTorch

A PyTorch port of AlphaGenome, the DNA sequence model from Google DeepMind that predicts hundreds of genomic tracks at single base-pair resolution from sequences up to 1M bp.

We aim to keep the implementation accessible, readable, and hackable, so that it is easy to integrate into existing PyTorch pipelines, fine-tune on custom datasets, and build on.

Installation

Installation from PyPI:

pip install alphagenome-pytorch

Installation from repo:

pip install git+https://github.com/genomicsxai/alphagenome-pytorch

For fine-tuning (incl. BigWig data loading):

pip install "alphagenome-pytorch[finetuning]"  # adds pyBigWig, pyfaidx; quotes keep shells such as zsh from globbing the brackets

Quick Start

import torch
import numpy as np
from alphagenome_pytorch import AlphaGenome

# Load pretrained model
model = AlphaGenome.from_pretrained('alphagenome.pt', device='cuda')

# Create one-hot encoded DNA sequence in NLC format (batch=1, length=131072, channels=4)
# Channels: A=0, C=1, G=2, T=3
sequence = np.random.randint(0, 4, size=(1, 131072))
dna_onehot = torch.tensor(np.eye(4)[sequence], dtype=torch.float32).cuda()

# Inference (handles dtype casting, returns float32 outputs)
outputs = model.predict(dna_onehot, organism_index=0)  # organism: 0=human, 1=mouse

The weights for this port are available on Hugging Face.
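
The quick start above feeds the model a random sequence. For a real DNA string, the one-hot layout can be built with NumPy alone. The sketch below follows the channel order from the quick-start comment (A=0, C=1, G=2, T=3); the helper name `one_hot_encode` and the choice to map any other character (such as N) to an all-zero row are assumptions for illustration, not part of the library.

```python
import numpy as np

# Channel order from the quick-start comment: A=0, C=1, G=2, T=3.
BASE_TO_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

# Lookup table from ASCII code to channel index; -1 marks "no channel".
_LOOKUP = np.full(256, -1, dtype=np.int64)
for _base, _index in BASE_TO_INDEX.items():
    _LOOKUP[ord(_base)] = _index
    _LOOKUP[ord(_base.lower())] = _index


def one_hot_encode(seq: str) -> np.ndarray:
    """Return a (1, len(seq), 4) float32 array in NLC layout.

    Assumption: characters outside ACGT (e.g. N) become all-zero rows.
    """
    codes = _LOOKUP[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]
    onehot = np.zeros((len(seq), 4), dtype=np.float32)
    valid = codes >= 0
    onehot[np.flatnonzero(valid), codes[valid]] = 1.0
    return onehot[None]  # add the batch dimension


example = one_hot_encode("ACGTN")
print(example.shape)
```

The resulting array can be converted with `torch.from_numpy(...)` and moved to the GPU before being passed to `model.predict`, in the same way as `dna_onehot` in the quick start.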

Output structure

Each genomic-track head returns a dict mapping resolution → tensor:

outputs['atac'][1]           # (1, 131072, 256)   ATAC-seq  at 1 bp
outputs['atac'][128]         # (1, 1024,   256)   ATAC-seq  at 128 bp
outputs['dnase'][1]          # (1, 131072, 384)   DNase     at 1 bp
outputs['cage'][128]         # (1, 1024,   640)   CAGE      at 128 bp
outputs['chip_histone'][128] # (1, 1024,   1152)  ChIP-hist at 128 bp only

Contact maps are returned as a single tensor (no resolution dict):

outputs['contact_maps']      # (1, 64, 64, 28)   3D chromatin contacts

Splice heads return dicts of tensors:

outputs['splice_sites']['probs']  # (1, 131072, 5)  splice site classes
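
The shapes listed above follow a simple pattern: a track at resolution r has `length // r` positions. The sketch below reproduces that arithmetic; the helper names are hypothetical, and the 2048 bp contact-map bin size is inferred from the 64×64 map shown for a 131072 bp input rather than stated in this README.

```python
def track_shape(batch: int, seq_len: int, resolution: int, channels: int) -> tuple:
    """Expected shape of a genomic-track output at a given resolution (in bp)."""
    if seq_len % resolution != 0:
        raise ValueError("seq_len must be a multiple of the resolution")
    return (batch, seq_len // resolution, channels)


def contact_map_shape(batch: int, seq_len: int, channels: int = 28,
                      bin_size: int = 2048) -> tuple:
    """Expected contact-map shape.

    Assumption: bin_size=2048 is inferred from 131072 / 64, not documented here.
    """
    bins = seq_len // bin_size
    return (batch, bins, bins, channels)


print(track_shape(1, 131072, 1, 256))
print(track_shape(1, 131072, 128, 256))
print(contact_map_shape(1, 131072))
```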

Padding

Track dimensions are padded (e.g. ATAC has 167 real human tracks but the tensor has 256 channels). Real tracks come first; the rest are zeros. Use named_outputs=True to auto-strip padding:

from alphagenome_pytorch.named_outputs import NamedOutputs, TrackMetadataCatalog
catalog = TrackMetadataCatalog.load_builtin(organism=0)
model.set_track_metadata_catalog(catalog)

named = model.predict(dna_onehot, organism_index=0, named_outputs=True)
named.atac[1].shape                  # (1, 131072, 167)  — padding removed
named.atac[1].tracks[-1].track_name  # 'UBERON:0015143 ATAC-seq'

# Filter by metadata
named.rna_seq[128].select(strand='+')
named.chip_tf[128].select(transcription_factor='CTCF')
named.atac[1].select(biosample_type='tissue', ontology_curie='UBERON:0015143')
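
If you work with the raw output dicts instead, padding can also be removed by hand, because the real tracks come first. The sketch below slices the channel axis using the human track counts from the Model Outputs table; `strip_padding` is a hypothetical helper, and a NumPy array stands in for a model output (the same slicing applies to a torch tensor).

```python
import numpy as np

# Real human track counts, copied from the Model Outputs table.
HUMAN_TRACK_COUNTS = {
    "atac": 167,
    "dnase": 305,
    "procap": 12,
    "cage": 546,
    "rna_seq": 667,
    "chip_tf": 1617,
    "chip_histone": 1116,
}


def strip_padding(track_output, head: str):
    """Keep only the leading real tracks on the last (channel) axis."""
    return track_output[..., :HUMAN_TRACK_COUNTS[head]]


# Stand-in for outputs['atac'][128]: 256 channels, of which 167 are real.
padded = np.zeros((1, 1024, 256), dtype=np.float32)
padded[..., :167] = 1.0

trimmed = strip_padding(padded, "atac")
print(trimmed.shape)  # (1, 1024, 167)
```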

Extracting Embeddings

Use model.encode() to get embeddings without running prediction heads — useful for building custom heads or analyzing representations:

# Get embeddings (128bp only for efficiency)
emb = model.encode(dna_onehot, organism_index=0, resolutions=(128,))
emb['embeddings_128bp']  # (B, 1024, 3072) at 128bp
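
As an illustration of a custom head, the sketch below maps 128 bp embeddings of shape (B, 1024, 3072) to a few per-bin tracks. `LinearTrackHead` is a hypothetical module, not something the library provides, and the softplus output activation is an assumption chosen to keep predictions non-negative. A random tensor stands in for `emb['embeddings_128bp']`.

```python
import torch
from torch import nn
import torch.nn.functional as F


class LinearTrackHead(nn.Module):
    """Hypothetical per-position linear head on top of trunk embeddings."""

    def __init__(self, embed_dim: int = 3072, num_tracks: int = 1):
        super().__init__()
        self.proj = nn.Linear(embed_dim, num_tracks)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # (B, L, embed_dim) -> (B, L, num_tracks); softplus keeps outputs >= 0.
        return F.softplus(self.proj(embeddings))


head = LinearTrackHead(embed_dim=3072, num_tracks=4)
dummy_embeddings = torch.randn(2, 1024, 3072)  # stand-in for emb['embeddings_128bp']
predictions = head(dummy_embeddings)
print(predictions.shape)
```

In practice the embeddings would be computed under `torch.no_grad()` when the trunk is frozen, so that only the head's parameters receive gradients.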

Fine-tuning

Train a new head on your data with a frozen trunk (linear probing) or with LoRA adapters:

from alphagenome_pytorch import AlphaGenome, TransferConfig, load_trunk, prepare_for_transfer

# Load trunk, freeze, add custom heads
model = AlphaGenome()
model = load_trunk(model, 'alphagenome.pt')
model = prepare_for_transfer(model, TransferConfig(
    mode='lora',
    new_heads={'atac': {'modality': 'atac', 'num_tracks': 1}},
    lora_rank=8,
))
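
For readers new to LoRA, the sketch below shows the general idea behind `mode='lora'` and `lora_rank`: the pretrained weight is frozen and a trainable low-rank update is added to it. This is a generic illustration of the technique, not the code that `prepare_for_transfer` runs; the class name `LoRALinear` and the `alpha` scaling parameter are assumptions.

```python
import math

import torch
from torch import nn


class LoRALinear(nn.Module):
    """Generic LoRA wrapper: y = base(x) + (alpha / rank) * x A^T B^T."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad_(False)  # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.empty(rank, base.in_features))
        # B starts at zero, so the wrapped layer initially matches the base layer.
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = (x @ self.lora_a.T) @ self.lora_b.T
        return self.base(x) + self.scale * update


layer = LoRALinear(nn.Linear(64, 32), rank=8)
x = torch.randn(4, 64)
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
print(trainable)
```

With rank 8, this layer trains 8 × (64 + 32) parameters instead of the 64 × 32 + 32 in the frozen base layer, which is why LoRA is a common middle ground between linear probing and full fine-tuning.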

The easiest way to start fine-tuning is scripts/finetune.py, which provides a flexible CLI:

# LoRA fine-tuning
python scripts/finetune.py --mode lora --lora-rank 8 \
    --genome hg38.fa --modality atac --bigwig *.bw \
    --train-bed train.bed --val-bed val.bed \
    --pretrained-weights alphagenome.pt

# Multi-GPU
torchrun --nproc_per_node=4 scripts/finetune.py --mode lora ...

See examples/notebooks/finetune_linear_probe.ipynb for an example of linear probing on ATAC-seq data.

Numerical Parity with JAX

This port is validated against the original JAX model, including per-head and full forward pass output comparisons as well as loss values and gradients.
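
A parity check of this kind usually reduces to comparing two arrays under a tolerance. The sketch below is a generic example of such a comparison and is not the repository's validation code; the function name `compare_outputs` and the default tolerances are assumptions. Both inputs are expected as NumPy arrays (for example a JAX output converted with `np.asarray` and a PyTorch output converted with `.cpu().numpy()`).

```python
import numpy as np


def compare_outputs(reference, candidate, rtol: float = 1e-3, atol: float = 1e-5) -> dict:
    """Summarize how closely `candidate` matches `reference`.

    The tolerances are illustrative defaults, not values used by this project.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    diff = np.abs(reference - candidate)
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "pearson_r": float(np.corrcoef(reference.ravel(), candidate.ravel())[0, 1]),
        "allclose": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
    }


rng = np.random.default_rng(0)
ref = rng.random((1, 1024, 8))       # stand-in for a reference head output
report = compare_outputs(ref, ref + 1e-7)
print(report)
```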

See ARCHITECTURE_COMPARISON.md for technical details.

Model Outputs

Head	Tracks (human)	Dimension (padded)	Resolutions	Description
atac	167	256	1bp, 128bp	Chromatin accessibility
dnase	305	384	1bp, 128bp	DNase-seq
procap	12	128	1bp, 128bp	Transcription initiation
cage	546	640	1bp, 128bp	5' cap RNA
rna_seq	667	768	1bp, 128bp	RNA expression
chip_tf	1617	1664	128bp	TF binding
chip_histone	1116	1152	128bp	Histone modifications
contact_maps	28	28	64×64	3D chromatin contacts
splice_sites	5	5	1bp	Splice site classification (D+, A+, D−, A−, none)
splice_junctions	734	734	pairwise	Junction read counts (367 tissues × 2 strands)
splice_site_usage	734	734	1bp	Fraction of transcripts using splice site

The Tracks column shows the number of real human tracks (without padding). Dimension is the raw output tensor size; padding fills the gap. With named_outputs=True, padding is stripped by default. See the named outputs guide for details.

For more information about model outputs, see the official AlphaGenome documentation.

Example Notebooks

Demo: Basic inference and JAX comparison
Variant Scoring: Effect prediction
In Silico Mutagenesis: ISM analysis
TAL1 Mutation Example: TAL1 variant effect and ISM (Figure 6 from AlphaGenome)
Fine-tuning: ATAC-seq linear probing
Fine-tuning: MPRA (encoder-only)

Citation

@article{avsec2026alphagenome,
  title={Advancing regulatory variant effect prediction with AlphaGenome},
  author={Avsec, {\v{Z}}iga and Latysheva, Natasha and Cheng, Jun and Novati, Guido and Taylor, Kyle R and Ward, Tom and Bycroft, Clare and Nicolaisen, Lauren and Arvaniti, Eirini and Pan, Joshua and others},
  journal={Nature},
  volume={649},
  number={8099},
  pages={1206--1218},
  year={2026},
  publisher={Nature Publishing Group UK London}
}

bioRxiv preprint

@article{avsec2025alphagenome,
    title = {AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model},
    author = {Avsec, {\v Z}iga and Latysheva, Natasha and Cheng, Jun and ...},
    year = {2025},
    journal = {bioRxiv},
    doi = {10.1101/2025.06.25.661532}
}

Acknowledgements

We acknowledge Phil Wang, Miquel Anglada-Girotto, and Xinming Tu as developers of an older AlphaGenome PyTorch port unrelated to this repo. Note that the PyPI namespace is now linked to this repo.

License

This project is a port of the google-deepmind/alphagenome_research repository, which is licensed under the Apache License, Version 2.0:

Copyright 2026 Google LLC

The model parameters, output, and any derivatives thereof remain subject to Google DeepMind’s AlphaGenome Model Terms.

This port is licensed under the Apache License, Version 2.0 (Apache 2.0):

Copyright 2026 Danila Bredikhin, Martin Kjellberg, Christopher Zou, Alejandro Buendia, Xinming Tu, Anshul Kundaje

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this except in compliance with the License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
