Fast, light, accurate library for biological sequence embeddings (proteins, DNA, RNA)

fastembed-bio

Fast, lightweight biological sequence embeddings using ONNX. Built on FastEmbed.

Why fastembed-bio?

  1. Light: Runs on ONNX Runtime alone, with no GPU and no PyTorch required. Well suited to serverless and resource-constrained environments.

  2. Fast: ONNX Runtime inference is typically faster than PyTorch inference, and batch processing and parallelism are built in.

  3. Simple: Same interface patterns as FastEmbed. If you've used FastEmbed for text, you already know how to use this.

Installation

pip install fastembed-bio

Quickstart

from fastembed.bio import ProteinEmbedding

sequences = [
    "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
    "GKGDPKKPRGKMSSYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAK",
]

model = ProteinEmbedding("facebook/esm2_t12_35M_UR50D")
embeddings = list(model.embed(sequences))

# Each sequence yields one float32 vector (480 dimensions for this model; values truncated):
# [
#   array([-0.0055, -0.0144,  0.0355, -0.0049, ...], dtype=float32),
#   array([ 0.0114,  0.0020, -0.0247,  0.0060, ...], dtype=float32)
# ]
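
A common next step is to compare two embeddings. The following is a minimal sketch of cosine similarity using NumPy. The `cosine_similarity` helper and the short placeholder vectors are illustrative assumptions, not part of fastembed-bio; in practice the two entries of `embeddings` from the Quickstart would take their place.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Return the cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Placeholder vectors standing in for embeddings[0] and embeddings[1].
emb_a = np.array([0.1, 0.3, -0.2], dtype=np.float32)
emb_b = np.array([0.2, 0.1, -0.4], dtype=np.float32)

score = cosine_similarity(emb_a, emb_b)
print(f"cosine similarity: {score:.3f}")
```

Values close to 1 indicate vectors pointing in nearly the same direction; how well that tracks biological similarity depends on the model and the task.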

Supported Models

Protein Embeddings

Model                        Parameters  Dimensions  Description
facebook/esm2_t12_35M_UR50D  35M         480         ESM-2 protein language model

from fastembed.bio import ProteinEmbedding

model = ProteinEmbedding("facebook/esm2_t12_35M_UR50D")
embeddings = list(model.embed(["MKTVRQERLKS", "GKGDPKKPRGK"]))

DNA Embeddings (Coming Soon)

DNABERT and similar models for DNA sequence embeddings.

RNA Embeddings (Coming Soon)

RNA foundation models for RNA sequence embeddings.

GPU Support

from fastembed.bio import ProteinEmbedding

model = ProteinEmbedding(
    "facebook/esm2_t12_35M_UR50D",
    providers=["CUDAExecutionProvider"]
)

GPU inference requires installing onnxruntime-gpu in place of onnxruntime.
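
When it is unclear whether a CUDA-enabled ONNX Runtime build is installed, the available execution providers can be checked before constructing the model. The sketch below relies on `onnxruntime.get_available_providers()`, which is part of ONNX Runtime. The `choose_providers` helper is a hypothetical name introduced here, and passing an ordered list with a CPU fallback to `ProteinEmbedding` is an assumption based on the `providers` argument shown above.

```python
def choose_providers(available):
    """Prefer CUDA when present, always keeping the CPU provider as a fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [name for name in preferred if name in available]
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # ONNX Runtime is not installed; fall back to an empty list.
    available = []

providers = choose_providers(available)
print(providers)
```

The resulting list could then be passed as `providers=providers` when creating the model.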

Relationship to FastEmbed

This project is a community-driven fork of FastEmbed focused on biological sequence embeddings. It uses the same core infrastructure (ONNX models, model management, etc.) but is specialized for proteins, DNA, and RNA.

The goal is to make biological embeddings as accessible and efficient as text embeddings.

Contributing

Contributions welcome! Areas of interest:

  • Additional ESM-2 model sizes
  • DNABERT and other DNA models
  • RNA foundation models
  • Performance optimizations

License

Apache 2.0

Download files

Source Distribution

fastembed_bio-0.1.2.tar.gz (21.2 kB)

Uploaded Source

Built Distribution

fastembed_bio-0.1.2-py3-none-any.whl (25.3 kB)

Uploaded Python 3

File details

Details for the file fastembed_bio-0.1.2.tar.gz.

File metadata

  • Download URL: fastembed_bio-0.1.2.tar.gz
  • Upload date:
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastembed_bio-0.1.2.tar.gz

Algorithm    Hash digest
SHA256       ba55af6c74dd2017903addd23174df1537f3c1b5393d5221212bb685c2711910
MD5          0b6dd34c10bba0e0318aa2185638b08d
BLAKE2b-256  b27a462b7ca3c6524f0721e15fdb2704e707c846a90db7e0046a298eab417cf7
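
A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's standard `hashlib`. The `sha256_of_file` helper and the local file path are assumptions made for illustration; the expected digest is the one published for the source distribution.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for fastembed_bio-0.1.2.tar.gz.
EXPECTED_SHA256 = "ba55af6c74dd2017903addd23174df1537f3c1b5393d5221212bb685c2711910"

# Assumed location of the downloaded archive; adjust as needed.
archive = Path("fastembed_bio-0.1.2.tar.gz")

if archive.exists():
    actual = sha256_of_file(archive)
    print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
else:
    print(f"{archive} not found; download it first")
```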

Provenance

The following attestation bundles were made for fastembed_bio-0.1.2.tar.gz:

Publisher: python-publish.yml on nleroy917/fastembed-bio

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fastembed_bio-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: fastembed_bio-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 25.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastembed_bio-0.1.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       dfeed3ba8627ee75ada289ac974ddd6c1ebbe2d6a291a4c539a2273042f1e952
MD5          c6095f6153954d4e0fd3ea811dceca9d
BLAKE2b-256  82eead55f2fb5102990609d0f164668bc82cff5c3ad1390856b4c851544c1fa9

Provenance

The following attestation bundles were made for fastembed_bio-0.1.2-py3-none-any.whl:

Publisher: python-publish.yml on nleroy917/fastembed-bio

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
