A package for biological embeddings in the perturbation experimental space
embpy
embpy is a Python toolkit for generating biological embeddings with one unified API.
Use it to embed genes, proteins, small molecules, morphology perturbations, and single cells; annotate the resulting objects; and compare embeddings with scverse-friendly plotting and analysis utilities.
What embpy Does
- Embeds biological entities through BioEmbedder.embed(...).
- Resolves biological identifiers into model-ready inputs, such as gene sequences, protein sequences, SMILES strings, and morphology images.
- Returns AnnData, tables, or payloads with provenance and canonical IDs.
- Stores generated embeddings outside .X, using .obsm, .varm, or .uns according to the entity type.
- Adds real metadata annotations for genes, proteins, molecules, and cell lines.
- Provides plotting and comparison helpers for embedding quality checks.
Install
Pixi is recommended for development and GPU work:
pixi install -e default
pixi run -e default verify
For a pip install:
pip install embpy
For optional GPU/model extras, see the technical guide.
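To confirm that the install landed in the active environment, a small standard-library check can help. This is a minimal sketch that only queries package metadata; the helper name installed_version is illustrative and not part of embpy.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("embpy")
if version is None:
    print("embpy is not installed in this environment")
else:
    print(f"embpy version: {version}")
```

The check reads distribution metadata only, so it does not import embpy or load any model weights.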
Quick Start
from embpy import BioEmbedder
embedder = BioEmbedder(device="auto", organism="human")
Embed genes with multiple model families:
gene_adata = embedder.embed(
    ["TP53", "EGFR", "MYC"],
    entity_type="gene",
    id_type="symbol",
    model=["hyenadna_tiny_1k", "esm2_8M", "minilm_l6_v2"],
    output="anndata",
)
gene_adata.varm.keys()
gene_adata.uns["embeddings"].keys()
Embed gene perturbation labels as row-aligned action embeddings:
# pert_adata.obs["perturbation"] contains symbols such as TP53/MYC.
pert_adata = embedder.embed(
    pert_adata,
    entity_type="gene",
    obs_column="perturbation",
    id_type="symbol",
    model="esm2_650M",
    output="anndata",
    is_perturbation=True,
    key="X_pert_esm2_650M",
)
pert_adata.obsm["X_pert_esm2_650M"]
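To make "row-aligned" concrete: each observation (cell) receives the embedding of the perturbation named in its label column, so the resulting matrix has one row per cell rather than one row per unique gene. The following is a plain-Python sketch of that idea with toy vectors. The helper name row_align and the zero-vector fallback for labels without an embedding are assumptions for illustration; embpy's own handling may differ.

```python
from typing import Dict, List, Sequence


def row_align(
    labels: Sequence[str],
    entity_embeddings: Dict[str, List[float]],
    dim: int,
) -> List[List[float]]:
    """Build one embedding row per observation by looking up its label.

    Labels with no known embedding (for example controls) get a zero
    vector here; that fallback is an assumption of this sketch.
    """
    zero = [0.0] * dim
    return [list(entity_embeddings.get(label, zero)) for label in labels]


# Entity-aligned toy vectors: one vector per unique perturbation.
entity_embeddings = {"TP53": [0.1, 0.9], "MYC": [0.7, 0.2]}

# One perturbation label per cell, as in an obs column.
labels = ["TP53", "MYC", "TP53", "control"]

rows = row_align(labels, entity_embeddings, dim=2)
for label, row in zip(labels, rows):
    print(label, row)
```

Cells that share a perturbation share the same row values, which is what distinguishes this layout from an entity-aligned table with one row per gene.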
Embed proteins:
protein_adata = embedder.embed(
    ["TP53", "EGFR", "BRCA1"],
    entity_type="protein",
    id_type="symbol",
    model="esm2_8M",
    output="anndata",
)
Embed small molecules:
smiles = [
    "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin
    "Cn1cnc2c1c(=O)n(C)c(=O)n2C",  # caffeine
]
molecule_adata = embedder.embed(
    smiles,
    entity_type="molecule",
    id_type="smiles",
    model="morgan_fp",
    output="anndata",
    key="X_morgan_fp",
)
Embed cells from AnnData with model-aware preprocessing:
cell_adata = embedder.embed(
    adata,
    entity_type="cell",
    model="pca",
    preprocessing="auto",
    output="anndata",
    key="X_pca",
)
cell_adata.obsm["X_pca"]
cell_adata.uns["embpy_cell_embeddings"]
Annotate and plot:
from embpy import tl, pl
molecule_adata.obs["smiles"] = molecule_adata.obs_names
molecule_adata = tl.annotate_molecules(
    molecule_adata,
    column="smiles",
    sources=["structural", "bioactivity", "ontology"],
)
pl.plot_embedding_space(
    molecule_adata,
    obsm_key="X_morgan_fp",
    method="pca",
    color="mol_logp",
)
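Alongside a plot, a quick numeric comparison of two embedding vectors can serve as a sanity check. Cosine similarity is one common choice. The sketch below is a generic standard-library implementation with toy vectors; it is not embpy's comparison API, and the function name is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero norm, which is a convention
    chosen for this sketch.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two rows of an embedding matrix.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Values near 1 indicate vectors pointing in similar directions, and values near 0 indicate unrelated directions.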
Tutorials
The tutorials are organized by biological entity. Each notebook uses real BioEmbedder.embed(...) calls, real annotation APIs, and embpy plotting/comparison utilities.
Model Families
embpy supports models across:
- DNA and regulatory sequence models
- protein language and structure models
- small-molecule fingerprints and chemical language models
- single-cell foundation models and classical baselines
- morphology models for HPA and JUMP-style images
- text models for biological descriptions
Use:
embedder.list_available_models()
for the model keys available in your environment.
Output Contract
BioEmbedder.embed(...) follows a scverse-friendly output contract:
- genes are feature-like and live in .varm by default
- gene perturbation labels use is_perturbation=True and live in .obsm
- proteins are feature-like and live in .varm
- molecules, text, sequences, and cells are observation-like and live in .obsm
- perturbation/action embeddings can be kept entity-aligned in .uns
- .X remains expression/count-like data or a sparse placeholder
See the technical guide for the full contract.
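The default placements listed above can be summarized as a small lookup. This is an illustrative sketch of the stated rules, not embpy code: the names DEFAULT_SLOT and default_slot are hypothetical, and the entity type strings "text" and "sequence" are assumed spellings that may not match the values embed(...) accepts.

```python
# Default AnnData slot per entity type, as described in the contract above.
DEFAULT_SLOT = {
    "gene": "varm",      # feature-like
    "protein": "varm",   # feature-like
    "molecule": "obsm",  # observation-like
    "text": "obsm",      # assumed entity type string
    "sequence": "obsm",  # assumed entity type string
    "cell": "obsm",      # observation-like
}


def default_slot(entity_type: str, is_perturbation: bool = False) -> str:
    """Return the AnnData slot where an embedding lands by default.

    Gene perturbation labels are row-aligned and therefore go to obsm.
    Entity-aligned perturbation embeddings kept in uns are an optional
    alternative that this lookup does not model.
    """
    if entity_type == "gene" and is_perturbation:
        return "obsm"
    if entity_type not in DEFAULT_SLOT:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return DEFAULT_SLOT[entity_type]


for entity in ("gene", "protein", "molecule", "cell"):
    print(entity, "->", default_slot(entity))
print("gene perturbation ->", default_slot("gene", is_perturbation=True))
```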
Documentation
- API reference: per-function reference generated from docstrings
- Technical guide: output contract, install matrix, package layout, and developer notes
- Contributing
- Changelog
Citation
If you use embpy in your work, please cite the repository for now. A formal citation will be added when the package is released.
Contact
For questions, issues, or feature requests, open a GitHub issue or contact the maintainers listed in the package metadata.