Fast, light, accurate library for biological sequence embeddings (proteins, DNA, RNA)
fastembed-bio
Fast, lightweight biological sequence embeddings using ONNX. Built on FastEmbed.
Why fastembed-bio?
- Light: No GPU or PyTorch required, only ONNX Runtime. Well suited to serverless and resource-constrained environments.
- Fast: ONNX Runtime inference is typically faster than PyTorch inference, and batch processing and parallelism are built in.
- Simple: Same interface patterns as FastEmbed. If you've used FastEmbed for text, you already know how to use this.
Installation
pip install fastembed-bio
Quickstart
from fastembed.bio import ProteinEmbedding
sequences = [
"MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
"GKGDPKKPRGKMSSYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAK",
]
model = ProteinEmbedding("facebook/esm2_t12_35M_UR50D")
embeddings = list(model.embed(sequences))
# [
# array([-0.0055, -0.0144, 0.0355, -0.0049, ...], dtype=float32),
# array([ 0.0114, 0.0020, -0.0247, 0.0060, ...], dtype=float32)
# ]
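A common next step is comparing two embeddings. The following is a minimal sketch, not part of fastembed-bio: `cosine_similarity` is a hypothetical helper written in plain Python, and the toy vectors stand in for real embeddings so the snippet runs without downloading a model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) * float(x) for x in a))
    norm_b = math.sqrt(sum(float(y) * float(y) for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings; real vectors from
# facebook/esm2_t12_35M_UR50D have 480 dimensions.
toy_a = [1.0, 0.0, 1.0]
toy_b = [1.0, 0.0, 0.0]
score = cosine_similarity(toy_a, toy_b)
print(round(score, 4))
```

With the quickstart above, the same helper should accept the returned arrays directly, for example `cosine_similarity(embeddings[0], embeddings[1])`.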
Supported Models
Protein Embeddings
| Model | Parameters | Dimensions | Description |
|---|---|---|---|
| facebook/esm2_t12_35M_UR50D | 35M | 480 | ESM-2 protein language model |
from fastembed.bio import ProteinEmbedding
model = ProteinEmbedding("facebook/esm2_t12_35M_UR50D")
embeddings = list(model.embed(["MKTVRQERLKS", "GKGDPKKPRGK"]))
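Input sequences often arrive with mixed case or stray whitespace, so it can help to normalize them before embedding. The sketch below is an illustration only: `clean_protein_sequence` and `STANDARD_AMINO_ACIDS` are hypothetical names, and restricting input to the 20 standard one-letter codes is an assumption, since the model's tokenizer may accept additional symbols.

```python
# Assumption: only the 20 standard amino-acid one-letter codes are allowed.
# The model's tokenizer may accept more symbols (for example X for unknown).
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_sequence(raw: str) -> str:
    """Uppercase a sequence, drop whitespace, and reject non-standard residues."""
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    unexpected = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError(f"unexpected residue codes: {''.join(unexpected)}")
    return seq


# Lowercase input and embedded whitespace are both normalized.
cleaned = [clean_protein_sequence(s) for s in ["mktvrqerlks", "GKGDP KKPRGK\n"]]
print(cleaned)
```

The resulting `cleaned` list can then be passed to `model.embed(...)` in place of the raw strings.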
DNA Embeddings (Coming Soon)
DNABERT and similar models for DNA sequence embeddings.
RNA Embeddings (Coming Soon)
RNA foundation models for RNA sequence embeddings.
GPU Support
from fastembed.bio import ProteinEmbedding
model = ProteinEmbedding(
"facebook/esm2_t12_35M_UR50D",
providers=["CUDAExecutionProvider"]
)
Requires onnxruntime-gpu instead of onnxruntime.
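Requesting the CUDA provider on a machine without it may fail or fall back, so one option is to check what ONNX Runtime reports first. This is a rough sketch under stated assumptions: `pick_providers` is a hypothetical helper, and only `onnxruntime.get_available_providers()` is an actual ONNX Runtime call.

```python
from typing import List, Sequence


def pick_providers(available: Sequence[str]) -> List[str]:
    """Prefer CUDA when present, keeping the CPU provider as a fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort  # provided by onnxruntime or onnxruntime-gpu

    available = ort.get_available_providers()
except ImportError:
    # Assumption for illustration: with no ONNX Runtime installed, plan for CPU.
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
print(providers)
```

The resulting list could then be passed as `providers=providers` when constructing `ProteinEmbedding`, assuming the argument is forwarded to ONNX Runtime unchanged.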
Relationship to FastEmbed
This project is a community-driven fork of FastEmbed focused on biological sequence embeddings. It uses the same core infrastructure (ONNX models, model management, etc.) but is specialized for proteins, DNA, and RNA.
The goal is to make biological embeddings as accessible and efficient as text embeddings.
Contributing
Contributions welcome! Areas of interest:
- Additional ESM-2 model sizes
- DNABERT and other DNA models
- RNA foundation models
- Performance optimizations
License
Apache 2.0