Unified interface for genomic foundation model embeddings

genebeddings

Unified interface for extracting embeddings from genomic foundation models.

Overview

genebeddings provides:

  • Standardized wrappers for 16 genomic foundation models (transformers, CNNs, state-space models, track predictors)
  • Geometric analysis tools for single-variant and epistasis embeddings
  • Embedding storage in a SQLite-backed key-value store
  • Benchmarking utilities for pathogenicity prediction
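
To illustrate the storage idea, here is a minimal sketch of a SQLite-backed key-value store for embedding vectors, using only the Python standard library. It is not the genebeddings storage API: the `EmbeddingStore` class, the table layout, and the variant-style key are all hypothetical, chosen only to show how a vector can round-trip through a BLOB column.

```python
import sqlite3
from array import array


class EmbeddingStore:
    """Hypothetical key-value store: one row per key, vector stored as a float32 BLOB."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS embeddings "
            "(key TEXT PRIMARY KEY, vec BLOB NOT NULL)"
        )

    def put(self, key, vector):
        # Pack the floats as 32-bit values in native byte order.
        blob = array("f", vector).tobytes()
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO embeddings (key, vec) VALUES (?, ?)",
                (key, blob),
            )

    def get(self, key):
        row = self.conn.execute(
            "SELECT vec FROM embeddings WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        out = array("f")
        out.frombytes(row[0])
        return out.tolist()


store = EmbeddingStore()  # in-memory database for the example
store.put("chr1:12345:A>G", [0.5, -1.25, 2.0])
restored = store.get("chr1:12345:A>G")
```

Storing vectors as raw float32 bytes keeps rows compact, at the cost of losing shape and dtype metadata; a real store would likely record those alongside the BLOB.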

Installation

From a local checkout of the source, install in editable mode:

pip install -e .

Install with model-specific dependencies:

pip install -e ".[nt]"          # Nucleotide Transformer
pip install -e ".[borzoi]"      # Borzoi
pip install -e ".[alphagenome]" # AlphaGenome (requires JAX + GPU)
pip install -e ".[all]"         # Common models

Quick Start

Embeddings

from genebeddings.wrappers import NTWrapper

model = NTWrapper()
embedding = model.embed("ACGTACGT" * 100, pool="mean")  # (hidden_dim,) numpy array
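
As a rough picture of what mean pooling does, the sketch below averages a list of per-token vectors into a single vector. It assumes that `pool="mean"` averages hidden states over the sequence axis, which the example above suggests but does not confirm; `mean_pool` is a hypothetical helper written in plain Python, not part of genebeddings.

```python
def mean_pool(token_embeddings):
    """Average per-token vectors (a list of equal-length lists) into one vector."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    # Average each hidden dimension across all tokens.
    return [sum(tok[d] for tok in token_embeddings) / n_tokens for d in range(dim)]


# Three tokens with a hidden dimension of 2.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_pool(tokens)  # one vector of length 2
```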

Nucleotide Predictions

probs = model.predict_nucleotides("ACGTNACGT", positions=[4])
# Returns one probability dict per requested position, e.g.:
# [{'A': 0.1, 'C': 0.2, 'G': 0.3, 'T': 0.4}]
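
One way to use such per-position distributions is to pick the most likely base or to score a substitution by its log-odds. The sketch below works on the example output shown above; `most_likely_base` and `log_odds` are hypothetical helpers, not genebeddings functions, and the choice of `T` as reference and `A` as alternate is purely illustrative.

```python
import math


def most_likely_base(dist):
    """Return the nucleotide with the highest predicted probability."""
    return max(dist, key=dist.get)


def log_odds(dist, ref, alt):
    """Natural-log ratio P(alt) / P(ref); negative when the model prefers ref."""
    return math.log(dist[alt] / dist[ref])


# Example output copied from the snippet above.
probs = [{"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}]
best = most_likely_base(probs[0])
score = log_odds(probs[0], ref="T", alt="A")
```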

Track Predictions

from genebeddings.wrappers import BorzoiWrapper

borzoi = BorzoiWrapper()
tracks = borzoi.predict_tracks("ACGT" * 131_072)  # (num_tracks, length) numpy array

Variant Geometry

from genebeddings import SingleVariantGeometry

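# wt_embedding and mut_embedding are the embeddings of the wild-type and
# mutant sequences, e.g. obtained with model.embed(...) as shown under Embeddings.
geom = SingleVariantGeometry(wt_embedding, mut_embedding)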
print(geom.cosine_distance, geom.euclidean_distance)
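
For reference, the two quantities printed above have standard textbook definitions, sketched here in plain Python. This assumes `SingleVariantGeometry` follows those definitions (cosine distance as one minus cosine similarity); the library's actual implementation is not shown here and may differ, for example in how it handles zero vectors.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v); 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy stand-ins for wild-type and mutant embeddings.
wt = [1.0, 0.0]
mut = [0.0, 1.0]
cos_d = cosine_distance(wt, mut)
euc_d = euclidean_distance(wt, mut)
```

Cosine distance ignores vector magnitude and captures only the change in direction, while Euclidean distance is sensitive to both, which is why the two are often reported together.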

Supported Models

| Wrapper | Architecture | Max Input | Capabilities |
|---|---|---|---|
| AlphaGenomeWrapper | Encoder-Transformer-Decoder (JAX) | 1M bp | embed, tracks, variants |
| BorzoiWrapper | CNN (PyTorch) | 524K bp | embed, tracks |
| CaduceusWrapper | Bidirectional SSM | ~131K tokens | embed, nucleotides |
| DNABERTWrapper | Transformer (BPE) | Model-dependent | embed, nucleotides |
| Evo2Wrapper | SSM | Very long | embed, nucleotides, generate |
| GPNMSAWrapper | Transformer + MSA | Model-dependent | embed (MSA), nucleotides |
| HyenaDNAWrapper | Hyena SSM | Up to 1M bp | embed |
| NTWrapper | Transformer (k-mer) | Long | embed, nucleotides |
| RiNALMoWrapper | Transformer (RNA) | Model-dependent | embed, nucleotides |
| SpliceAIWrapper | CNN | Model-dependent | embed, splice sites |
| SpliceBertWrapper | Transformer | Model-dependent | embed, nucleotides |

See wrappers/summary.md for full details.
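
When choosing a wrapper for a task, the capabilities column of the table above can be treated as a simple lookup. The sketch below transcribes that column into a dictionary and filters it; `WRAPPER_CAPABILITIES` and `wrappers_with` are hypothetical names for illustration, not something genebeddings exports, and the "(MSA)" qualifier on GPNMSAWrapper's `embed` entry is dropped for simplicity.

```python
# Capabilities transcribed from the Supported Models table.
WRAPPER_CAPABILITIES = {
    "AlphaGenomeWrapper": {"embed", "tracks", "variants"},
    "BorzoiWrapper": {"embed", "tracks"},
    "CaduceusWrapper": {"embed", "nucleotides"},
    "DNABERTWrapper": {"embed", "nucleotides"},
    "Evo2Wrapper": {"embed", "nucleotides", "generate"},
    "GPNMSAWrapper": {"embed", "nucleotides"},
    "HyenaDNAWrapper": {"embed"},
    "NTWrapper": {"embed", "nucleotides"},
    "RiNALMoWrapper": {"embed", "nucleotides"},
    "SpliceAIWrapper": {"embed", "splice sites"},
    "SpliceBertWrapper": {"embed", "nucleotides"},
}


def wrappers_with(capability):
    """Return the wrapper names whose capability set includes `capability`, sorted."""
    return sorted(
        name for name, caps in WRAPPER_CAPABILITIES.items() if capability in caps
    )


track_models = wrappers_with("tracks")
generative_models = wrappers_with("generate")
```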

Testing

python quick_test.py
