An implementation of AlphaGenome in PyTorch.

AlphaGenome_PyTorch

*Image generated by Nano Banana 2*

Docs

The docs directory contains instructions on environment setup, explanations of the data structure and model architecture, and example scripts. Reading the model.md and data.md markdown files in the */AlphaGenome_PyTorch/docs/guides directory before running the examples is strongly recommended, since they explain why the metadata and dummy data are structured the way they are.

Quick Start

A minimal quick start that produces embeddings from a random DNA sequence:

import random

import torch
from alphagenome_pt import AlphaGenome, AlphaGenomeConfig, DataBatch, SequenceEncoder

S = 2048  # sequence length in base pairs
metadata = {'organisms': ['human', 'mouse']}
model_cfg = AlphaGenomeConfig(max_seq_len=S, num_channels=96, metadata=metadata)
model = AlphaGenome(model_cfg)

# Encode a random DNA sequence of length S
seq_encoder = SequenceEncoder()
dna_sequence = seq_encoder.encode("".join(random.choices("ACGT", k=S)))

# organism_index presumably indexes into metadata['organisms'] (0 -> 'human')
data = DataBatch(dna_sequence=dna_sequence, organism_index=torch.tensor([0]))
predictions, embeddings = model(data)
print(embeddings.embeddings_1bp.shape, embeddings.embeddings_pair.shape)
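
The quick start hands a DNA string to SequenceEncoder and gets a tensor back. How the package encodes sequences is covered in the guides, not here; purely as an illustration of the general idea, the sketch below one-hot encodes a DNA string in plain PyTorch. The helper name one_hot_dna, the A/C/G/T channel order, and the all-zero row for unknown bases are assumptions made for this example and are not taken from alphagenome_pt.

```python
import torch
import torch.nn.functional as F

# Hypothetical helper, NOT part of alphagenome_pt.
# Assumed channel order: A, C, G, T.
BASES = "ACGT"


def one_hot_dna(seq: str) -> torch.Tensor:
    """Return a float tensor of shape [len(seq), 4]."""
    lookup = {base: i for i, base in enumerate(BASES)}
    # Characters outside ACGT (e.g. "N") are marked with -1.
    idx = torch.tensor([lookup.get(ch, -1) for ch in seq.upper()], dtype=torch.long)
    # F.one_hot needs non-negative indices, so clamp first, then zero out unknowns.
    one_hot = F.one_hot(idx.clamp(min=0), num_classes=len(BASES)).float()
    one_hot[idx < 0] = 0.0
    return one_hot


x = one_hot_dna("ACGTN")
print(x.shape)
```

Use SequenceEncoder for anything that is fed to the model, since the model expects whatever layout the package itself produces.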

Environment

See */AlphaGenome_PyTorch/docs/environment for instructions on how to set up a UV environment to run AlphaGenome_PyTorch.

Guides

See */AlphaGenome_PyTorch/docs/guides for explanations of the AlphaGenome model and its data structure (very helpful for understanding the examples).

Examples

See */AlphaGenome_PyTorch/docs/examples for examples of:

  • Masked Language Modeling (MLM) training (train_mlm.py)
  • Training on Downstream Tasks (RNA-Seq, CAGE, ATAC, Splice Sites Classification/Usage/Junction) (train_downstream.py)
  • MLM Pretraining --> Training on Downstream Tasks (train_downstream_from_pretrained.py)
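
For readers new to MLM on DNA, the objective behind the first script can be sketched in a few lines. This is a generic BERT-style masking routine, not the logic in train_mlm.py: the helper name mask_tokens, the 15% mask rate, the use of token id 4 as the mask token, and the random logits standing in for model output are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def mask_tokens(tokens, mask_prob=0.15, mask_id=4, generator=None):
    """Hypothetical helper: mask a random subset of base tokens.

    tokens: long tensor [B, S] with values 0..3 (A, C, G, T).
    Returns (inputs, labels, is_masked).
    """
    is_masked = torch.rand(tokens.shape, generator=generator) < mask_prob
    inputs = tokens.clone()
    inputs[is_masked] = mask_id      # replace selected positions with the mask id
    labels = tokens.clone()
    labels[~is_masked] = -100        # -100 is ignored by cross_entropy below
    return inputs, labels, is_masked


gen = torch.Generator().manual_seed(0)
tokens = torch.randint(0, 4, (2, 64), generator=gen)
inputs, labels, is_masked = mask_tokens(tokens, generator=gen)

# Stand-in for model output: per-position logits over the 4 bases.
logits = torch.randn(2, 64, 4, generator=gen)
loss = F.cross_entropy(logits.reshape(-1, 4), labels.reshape(-1), ignore_index=-100)
print(loss.item())
```

The loss is computed only at masked positions, so the model is scored on reconstructing bases it could not see.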

Acknowledgements

This repository is a reimplementation of the AlphaGenome model in PyTorch, with an added option for Masked Language Modeling (MLM).

Within the alphagenome_pt directory, some components are direct ports of the released AlphaGenome code Link1 Link2 (licensed under Apache License 2.0), some are reimplementations based on pseudocode from the bioRxiv paper, and others are original additions (e.g., the MLM head). Attribution is made clear at the top of each file in the alphagenome_pt directory. The docs and tests directories are original work (with LLM coding assistance).

Developing this project used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

Intended Audience

The intended audience of this repo is model trainers: those who want to take the AlphaGenome architecture and train it with some flexibility over hyperparameters, and/or train in PyTorch rather than JAX. If you can prepare a batch of tensor data and set up a train/val/test loop, but don't want the hassle of making sure every linear layer and norm is placed correctly while replicating the architecture, then this repo is for you. The added MLM pretraining head is also a plus.

Other Implementations

There is another AlphaGenome PyTorch implementation by Phillip Wang (a.k.a. LucidRains), which is quite good. The GitHub page is down as of March 2nd, 2026, but the PyPI package remains. The main advantages of that implementation (as of version 0.2.8) are in evaluation: loading the published weights and running variant scoring. The main advantage of this implementation is research training: an MLM head and track masks that can vary by batch during training. This implementation also provides a .loss() function on the model that computes multi-resolution losses for you, and uses one head per task with a dense weight tensor of shape [O, D, T] rather than a separate weight tensor of shape [D, T] for each [organism x task] pair. The two layouts are mathematically equivalent, but the dense one is more in line with the original AlphaGenome implementation.
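
The equivalence between the two head layouts can be checked numerically. The sketch below is a toy illustration, not code from either implementation; it reads O, D, and T as number of organisms, embedding dimension, and number of tracks (an assumed interpretation), omits biases, and uses made-up sizes.

```python
import torch

torch.manual_seed(0)
O, D, T = 2, 8, 3        # organisms, embedding dim, tracks (toy sizes)
B, S = 4, 16             # batch size, sequence length

x = torch.randn(B, S, D)                       # per-position embeddings
organism_index = torch.tensor([0, 1, 1, 0])    # one organism id per batch element

# Dense layout: a single weight tensor of shape [O, D, T].
w_dense = torch.randn(O, D, T)
w_per_example = w_dense[organism_index]        # gather -> [B, D, T]
y_dense = torch.einsum("bsd,bdt->bst", x, w_per_example)

# Separate layout: one [D, T] weight matrix per organism.
w_separate = [w_dense[o].clone() for o in range(O)]
y_separate = torch.stack(
    [x[b] @ w_separate[int(organism_index[b])] for b in range(B)]
)

print(y_dense.shape, torch.allclose(y_dense, y_separate, atol=1e-5))
```

The dense layout selects each example's weights with one indexing operation, which avoids a Python-level loop over organisms when a batch mixes them.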

Reaching Out

Want a new feature or found a bug? Feel free to open an issue on the GitHub repository.
