Unified interface for genomic foundation model embeddings
genebeddings
Unified interface for extracting embeddings from genomic foundation models.
Overview
genebeddings provides:
- Standardized wrappers for 16 genomic foundation models (transformers, CNNs, state-space models, track predictors)
- Geometric analysis tools for single-variant and epistasis embeddings
- Embedding storage via SQLite key-value store
- Benchmarking utilities for pathogenicity prediction
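The storage layer's actual schema and API are not shown in this README. As a rough illustration of the idea only, a key-value embedding store on SQLite can be sketched with the standard library. The table layout, the key format, and the helper names `open_store`, `put`, and `get` are hypothetical and are not genebeddings' API.

```python
import sqlite3
from array import array


def open_store(path=":memory:"):
    """Open (or create) a small key-value table. The schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings "
        "(key TEXT PRIMARY KEY, dim INTEGER, vec BLOB)"
    )
    return conn


def put(conn, key, vector):
    """Serialize a vector as float32 bytes and insert or overwrite it under `key`."""
    buf = array("f", vector)
    conn.execute(
        "INSERT OR REPLACE INTO embeddings (key, dim, vec) VALUES (?, ?, ?)",
        (key, len(buf), buf.tobytes()),
    )
    conn.commit()


def get(conn, key):
    """Return the stored vector as a list of floats, or None if the key is absent."""
    row = conn.execute(
        "SELECT vec FROM embeddings WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        return None
    out = array("f")
    out.frombytes(row[0])
    return list(out)


conn = open_store()
put(conn, "chr1:12345:A>G", [0.5, -1.25, 2.0])  # hypothetical variant key
print(get(conn, "chr1:12345:A>G"))  # [0.5, -1.25, 2.0]
```

With numpy arrays, the same blob could be written with `arr.astype("float32").tobytes()` and read back with `np.frombuffer(blob, dtype=np.float32)`.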
Installation
# Editable install, run from the root of a local clone of the repository
pip install -e .
Install with model-specific dependencies:
pip install -e ".[nt]" # Nucleotide Transformer
pip install -e ".[borzoi]" # Borzoi
pip install -e ".[alphagenome]" # AlphaGenome (requires JAX + GPU)
pip install -e ".[all]" # Common models
Quick Start
Embeddings
from genebeddings.wrappers import NTWrapper
model = NTWrapper()
embedding = model.embed("ACGTACGT" * 100, pool="mean") # (hidden_dim,) numpy array
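For intuition, `pool="mean"` presumably averages the per-token hidden states over the sequence axis to give one `(hidden_dim,)` vector. The README does not confirm how the wrapper implements this, so the following is only a sketch of mean pooling in plain Python; `mean_pool` is a hypothetical helper, not part of genebeddings.

```python
def mean_pool(token_embeddings):
    """Average a (num_tokens, hidden_dim) nested list over the token axis."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    num_tokens = len(token_embeddings)
    # zip(*rows) walks the matrix column by column, one hidden dimension at a time.
    return [sum(column) / num_tokens for column in zip(*token_embeddings)]


# Two tokens, hidden_dim = 3
tokens = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]
print(mean_pool(tokens))  # [2.0, 3.0, 4.0]
```

On a numpy array of shape `(num_tokens, hidden_dim)` the equivalent is `token_embeddings.mean(axis=0)`.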
Nucleotide Predictions
probs = model.predict_nucleotides("ACGTNACGT", positions=[4])
# One dict of base probabilities per requested position, e.g.
# [{'A': 0.1, 'C': 0.2, 'G': 0.3, 'T': 0.4}]
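One way to use this output is sketched below: pick the most probable base at a site and compare two alleles by log-odds. The list-of-dicts shape is taken from the example above; the helpers `most_likely_base` and `log_odds` are hypothetical and not part of genebeddings.

```python
import math


def most_likely_base(site_probs):
    """Return the base with the highest predicted probability."""
    return max(site_probs, key=site_probs.get)


def log_odds(site_probs, ref, alt):
    """Natural-log ratio of alt to ref probability; positive means alt is favored."""
    return math.log(site_probs[alt] / site_probs[ref])


# Same illustrative values as in the example output above
probs = [{'A': 0.1, 'C': 0.2, 'G': 0.3, 'T': 0.4}]
site = probs[0]
print(most_likely_base(site))            # T
print(log_odds(site, ref="A", alt="T"))  # log(0.4 / 0.1)
```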
Track Predictions
from genebeddings.wrappers import BorzoiWrapper
borzoi = BorzoiWrapper()
tracks = borzoi.predict_tracks("ACGT" * 131_072) # (num_tracks, length) numpy array
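The example input is exactly 4 × 131,072 = 524,288 bp, matching Borzoi's 524K bp entry in the table of supported models. Whether `BorzoiWrapper` pads or crops other lengths internally is not stated here. If you need to fit a sequence to a fixed window yourself, one common approach is to center-crop or pad with `N`; `fit_to_window` below is a hypothetical helper, not part of genebeddings.

```python
def fit_to_window(seq, window, pad_char="N"):
    """Center-crop or symmetrically pad `seq` so it is exactly `window` bases long."""
    n = len(seq)
    if n >= window:
        start = (n - window) // 2
        return seq[start:start + window]
    left = (window - n) // 2
    right = window - n - left  # any odd leftover base goes on the right
    return pad_char * left + seq + pad_char * right


BORZOI_WINDOW = 4 * 131_072  # 524,288 bp, the length used in the example above

print(fit_to_window("ACGT", 8))      # NNACGTNN
print(fit_to_window("AACCGGTT", 4))  # CCGG
print(len(fit_to_window("ACGT" * 10, BORZOI_WINDOW)))  # 524288
```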
Variant Geometry
from genebeddings import SingleVariantGeometry
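# wt_embedding and mut_embedding are embeddings of the reference and variant
# sequences, e.g. each obtained from model.embed(...)
geom = SingleVariantGeometry(wt_embedding, mut_embedding)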
print(geom.cosine_distance, geom.euclidean_distance)
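For reference, the two distances printed above have standard definitions, sketched here in plain Python. That `SingleVariantGeometry` computes them exactly this way is an assumption; the function names below are illustrative and not the library's API.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v): 0 for same direction, 2 for opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.dist(u, v)


# Toy wild-type and mutant embeddings that are orthogonal unit vectors
wt = [1.0, 0.0]
mut = [0.0, 1.0]
print(cosine_distance(wt, mut))     # 1.0
print(euclidean_distance(wt, mut))  # sqrt(2)
```

Cosine distance ignores vector magnitude and only reflects a change in direction, while Euclidean distance also grows with magnitude, which is one reason to report both.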
Supported Models
| Wrapper | Architecture | Max Input | Capabilities |
|---|---|---|---|
| AlphaGenomeWrapper | Encoder-Transformer-Decoder (JAX) | 1M bp | embed, tracks, variants |
| BorzoiWrapper | CNN (PyTorch) | 524K bp | embed, tracks |
| CaduceusWrapper | Bidirectional SSM | ~131K tokens | embed, nucleotides |
| DNABERTWrapper | Transformer (BPE) | Model-dep. | embed, nucleotides |
| Evo2Wrapper | SSM | Very long | embed, nucleotides, generate |
| GPNMSAWrapper | Transformer + MSA | Model-dep. | embed (MSA), nucleotides |
| HyenaDNAWrapper | Hyena SSM | Up to 1M bp | embed |
| NTWrapper | Transformer (k-mer) | Long | embed, nucleotides |
| RiNALMoWrapper | Transformer (RNA) | Model-dep. | embed, nucleotides |
| SpliceAIWrapper | CNN | Model-dep. | embed, splice sites |
| SpliceBertWrapper | Transformer | Model-dep. | embed, nucleotides |
The table above lists 11 wrappers. See wrappers/summary.md for full details.
Testing
python quick_test.py