Fast, light, accurate library for biological sequence embeddings (proteins, DNA, RNA)

fastembed-bio

Fast, lightweight biological sequence embeddings using ONNX. Built on FastEmbed.

Why fastembed-bio?

  1. Light: No GPU required. No PyTorch. Just ONNX Runtime. Perfect for serverless and resource-constrained environments.

  2. Fast: ONNX Runtime inference is typically faster than PyTorch inference on CPU. Batch processing and parallelism are built in.

  3. Simple: Same interface patterns as FastEmbed. If you've used FastEmbed for text, you already know how to use this.
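
The library is described above as batching internally. If you also want to bound memory yourself when working through a very large collection of sequences, one option is to feed the model in chunks. The following is a minimal sketch under that assumption; batched is a hypothetical helper written for illustration and is not part of fastembed-bio.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Toy input standing in for a long list of protein sequences.
chunks = list(batched(["MKTV", "GKGD", "AAAA", "CCCC", "DDDD"], 2))
```

Each chunk could then be passed to model.embed(chunk) and the results written out before the next chunk is loaded.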

Installation

pip install fastembed-bio

Quickstart

from fastembed.bio import ProteinEmbedding

sequences = [
    "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
    "GKGDPKKPRGKMSSYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAK",
]

model = ProteinEmbedding("facebook/esm2_t12_35M_UR50D")
embeddings = list(model.embed(sequences))

# [
#   array([-0.0055, -0.0144,  0.0355, -0.0049, ...], dtype=float32),
#   array([ 0.0114,  0.0020, -0.0247,  0.0060, ...], dtype=float32)
# ]
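
A common next step is comparing two embeddings. The sketch below computes cosine similarity using only the standard library, so it runs without extra dependencies; it accepts any sequence of floats, which should include the float32 arrays shown above. cosine_similarity is a hypothetical helper, not a fastembed-bio function, and the short vectors are placeholders rather than real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for embeddings[0] and embeddings[1].
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 0.0]
score = cosine_similarity(vec_a, vec_b)
```

With real output you would call cosine_similarity(embeddings[0], embeddings[1]); if NumPy is already in use, the same quantity can be computed with np.dot and np.linalg.norm.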

Supported Models

Protein Embeddings

Model                        Parameters  Dimensions  Description
facebook/esm2_t12_35M_UR50D  35M         480         ESM-2 protein language model

from fastembed.bio import ProteinEmbedding

model = ProteinEmbedding("facebook/esm2_t12_35M_UR50D")
embeddings = list(model.embed(["MKTVRQERLKS", "GKGDPKKPRGK"]))
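
Protein sequences are often stored in FASTA files rather than typed inline. The following is a minimal sketch of turning FASTA text into the list of strings that model.embed takes in the example above. read_fasta is a hypothetical helper, not part of fastembed-bio, and it handles only plain FASTA (header lines starting with ">", sequence lines possibly wrapped).

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


fasta_text = """>seq1 example
MKTVRQERLKS
IVRILERSKEP
>seq2
GKGDPKKPRGK
"""
records = list(read_fasta(fasta_text.splitlines()))
sequences = [seq for _, seq in records]
```

The resulting sequences list can then be passed to model.embed(sequences); for a file on disk, an open file object can be given to read_fasta directly.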

DNA Embeddings (Coming Soon)

DNABERT and similar models for DNA sequence embeddings.

RNA Embeddings (Coming Soon)

RNA foundation models for RNA sequence embeddings.

GPU Support

from fastembed.bio import ProteinEmbedding

model = ProteinEmbedding(
    "facebook/esm2_t12_35M_UR50D",
    providers=["CUDAExecutionProvider"]
)

Requires onnxruntime-gpu instead of onnxruntime.
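
If the same script has to run on machines with and without a GPU, the provider list can be chosen at runtime. The sketch below assumes the providers argument accepts an ordered list, as in the example above; pick_providers is a hypothetical helper, while onnxruntime.get_available_providers() is the ONNX Runtime call that reports what the installed build supports.

```python
def pick_providers(available, preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are available, in preference order."""
    chosen = [name for name in preferred if name in available]
    # Fall back to CPU if none of the preferred providers is reported.
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed here; assume CPU only for the sketch.
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
```

The result can be passed as ProteinEmbedding("facebook/esm2_t12_35M_UR50D", providers=providers).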

Relationship to FastEmbed

This project is a community-driven fork of FastEmbed focused on biological sequence embeddings. It uses the same core infrastructure (ONNX models, model management, etc.) but is specialized for proteins, DNA, and RNA.

The goal is to make biological embeddings as accessible and efficient as text embeddings.

Contributing

Contributions welcome! Areas of interest:

  • Additional ESM-2 model sizes
  • DNABERT and other DNA models
  • RNA foundation models
  • Performance optimizations

License

Apache 2.0

Source Distribution

fastembed_bio-0.1.1.tar.gz (21.6 kB)

Uploaded Source

Built Distribution

fastembed_bio-0.1.1-py3-none-any.whl (25.8 kB)

Uploaded Python 3

File details

Details for the file fastembed_bio-0.1.1.tar.gz.

File metadata

  • Download URL: fastembed_bio-0.1.1.tar.gz
  • Upload date:
  • Size: 21.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastembed_bio-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       725110716bb6c509de2b521dcdd590ecd16954d067854f1faedfff267ea8d025
MD5          eee54a68cd716b1914410e8a64aedf99
BLAKE2b-256  563ea6cfecebfb1005d8e3b5b6b2b1a0c0f8bd06d2fee2aa0f81eb6f2d58dca0

Provenance

The following attestation bundles were made for fastembed_bio-0.1.1.tar.gz:

Publisher: python-publish.yml on nleroy917/fastembed-bio

File details

Details for the file fastembed_bio-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: fastembed_bio-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fastembed_bio-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       efb4bb9dccd51540effe90ea2add4c5116b1d190b6e483b5e0c91c606bd0cd87
MD5          e4f09968edd28a1bb448b063a6c8169c
BLAKE2b-256  f83f8078f6b5bd7304899f798318226a5bcaf4fc818a05da322ddc4a9071d709

Provenance

The following attestation bundles were made for fastembed_bio-0.1.1-py3-none-any.whl:

Publisher: python-publish.yml on nleroy917/fastembed-bio
