Static sentence embedding via Rust + Candle wrapped for Python

static_embed

static_embed is an educational (but fully operational) library and repository that shows how to use static embeddings with Rust. It is written in Rust (Candle + tokenizers) and exported to Python via PyO3 and maturin.
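
To make the idea concrete: a static embedding model keeps one fixed vector per token and pools those vectors (typically by averaging) into a sentence vector, with no attention layers involved. The sketch below illustrates this with a toy whitespace tokenizer and a made-up three-dimensional table. The names `TOY_TABLE` and `embed_sentence` are hypothetical and are not taken from the crate, which uses a real tokenizer and the downloaded weights.

```python
from typing import Dict, List

# Hypothetical toy lookup table: one fixed vector per token.
TOY_TABLE: Dict[str, List[float]] = {
    "hello": [1.0, 0.0, 2.0],
    "world": [3.0, 2.0, 0.0],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for out-of-vocabulary tokens


def embed_sentence(text: str) -> List[float]:
    """Mean-pool the static vectors of the whitespace-separated tokens."""
    tokens = text.lower().split()
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOY_TABLE.get(tok, UNKNOWN) for tok in tokens]
    dim = len(UNKNOWN)
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


print(embed_sentence("Hello world"))  # [2.0, 1.0, 1.0]
```

Because each sentence costs only a table lookup and an average, this style of model runs comfortably on a CPU.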

Features

  • Pure-Rust embedding implementation (no Python runtime dependencies at inference time)
  • High performance embeddings
  • CPU-only, self-contained model weights (downloaded on first use)
  • Python bindings that expose an easy-to-use Embedder class

Installation (for users)

Prerequisites

  • Python ≥ 3.8 (CPython)
  • A Rust toolchain (stable) – install with rustup

Quick install into the current virtual env

pip install maturin  # once
# From the repository root
maturin develop --release

This builds the Rust crate as a Python extension and installs it into the environment.
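
After the build finishes, a quick check that the extension is importable can save debugging time. This is a minimal sketch using only the standard library; the helper name `is_importable` is illustrative and not part of static_embed.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("static_embed"):
    print("static_embed is installed in this environment")
else:
    print("static_embed not found; re-run `maturin develop --release`")
```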


Usage

from static_embed import Embedder

# 1. Use the default public model (no args)
embedder = Embedder()

# 2. OR specify your own base-URL that hosts the weights/tokeniser
#    (must contain the same two files: ``model.safetensors`` & ``tokenizer.json``)
# custom_url = "https://my-cdn.example.com/static-retrieval-mrl-en-v1"
# embedder = Embedder(custom_url)

texts = ["Hello world!", "Rust + Python via PyO3"]
embeddings = embedder.embed(texts)

print(len(embeddings), "embeddings", "dimension", len(embeddings[0]))
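
A common next step is comparing embeddings. The sketch below computes cosine similarity in plain Python, assuming `embed` returns a list of equal-length float vectors (the `len(embeddings[0])` call above suggests this, but the exact return type is an assumption). The sample vectors stand in for real embeddings so the snippet runs without the model; `cosine_similarity` is a hypothetical helper, not a library function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Stand-in vectors; with the real library these would come from embedder.embed(texts).
sample = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
print(cosine_similarity(sample[0], sample[1]))  # 0.0
print(cosine_similarity(sample[0], sample[2]))  # 1.0
```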

Development workflow

  1. Set up a virtual environment
python -m venv .venv && source .venv/bin/activate
pip install maturin pytest
  2. Build the extension in editable mode
maturin develop                      # or `--release` for optimised builds
  3. Run the Python example
python python_example.py
  4. Run the test-suite
pytest -q
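
The contents of test_embed.py are not reproduced in this README. As an illustration of the kind of check such a test might make, the hypothetical helper below validates the shape of an embedding batch. It runs on stand-in data here, so it does not need the compiled extension.

```python
import math
from typing import Sequence


def check_embedding_batch(embeddings: Sequence[Sequence[float]], expected_count: int) -> int:
    """Validate a batch of embeddings and return their shared dimension."""
    assert len(embeddings) == expected_count, "one embedding per input text"
    if expected_count == 0:
        return 0
    dim = len(embeddings[0])
    assert dim > 0, "embeddings must not be empty"
    for vec in embeddings:
        assert len(vec) == dim, "all embeddings share one dimension"
        assert all(math.isfinite(x) for x in vec), "values must be finite"
    return dim


# Stand-in data; in a real test this would be Embedder().embed(texts).
fake = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(check_embedding_batch(fake, expected_count=2))  # 3
```

Inside a pytest function, the same helper could be called as `check_embedding_batch(Embedder().embed(texts), len(texts))`.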

Project layout

.
├── Cargo.toml         # Rust crate manifest
├── src/               # Rust source (embedder logic + PyO3 bindings)
├── models/            # (Auto-downloaded) model weights live here
├── python_example.py  # Minimal demo script
├── test_embed.py      # Pytest verifying the binding
└── pyproject.toml     # Build configuration for maturin
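
The layout notes that the weights under models/ are downloaded automatically. How the crate implements this is not shown in this README; the sketch below illustrates a typical download-on-first-use pattern in Python. The function name `ensure_cached` and the URL joining scheme are assumptions for illustration only; the two file names come from the usage notes above.

```python
import urllib.request
from pathlib import Path
from typing import List

# File names taken from the usage notes; everything else is illustrative.
REQUIRED_FILES = ("model.safetensors", "tokenizer.json")


def ensure_cached(base_url: str, cache_dir: Path) -> List[Path]:
    """Download each required file into cache_dir unless it is already there."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in REQUIRED_FILES:
        target = cache_dir / name
        if not target.exists():
            # Assumed layout: files sit directly under the base URL.
            url = base_url.rstrip("/") + "/" + name
            urllib.request.urlretrieve(url, str(target))
        paths.append(target)
    return paths
```

On a second call with the same directory, both files already exist, so no network request is made.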

Publishing to PyPI

(Requires an API token in $MATURIN_PYPI_TOKEN or credentials in ~/.pypirc)

maturin build --release --skip-auditwheel  # wheels in target/wheels/
maturin publish --skip-existing           # upload

License

MIT © Your Name

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

  • static_embed-0.1.1-cp38-abi3-win_amd64.whl (2.6 MB): CPython 3.8+, Windows x86-64
  • static_embed-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.0 MB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
  • static_embed-0.1.1-cp38-abi3-macosx_11_0_arm64.whl (2.5 MB): CPython 3.8+, macOS 11.0+, ARM64

File hashes

Hashes for static_embed-0.1.1-cp38-abi3-win_amd64.whl

Algorithm    Hash digest
SHA256       81edc1d420143d10f3767941624d551142397e63ec2c5c2484e11d19ca00131c
MD5          cbc869f931c75a0f0e0fa75164143abf
BLAKE2b-256  2b42199c81bee92509e5c567cb06747921ce67f66f2234c60b0faed863aa9ca2

Hashes for static_embed-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm    Hash digest
SHA256       e3517494c2bc5cb077c1dfd747d7b257a0ed75b4607d9b013f9e4583d3ba8fed
MD5          8f220984aa00c531709c8735e5217e0d
BLAKE2b-256  d49b1fa1ee3c02860f302982d9d53492953b1397ff58b4b7208a1c355c8146aa

Hashes for static_embed-0.1.1-cp38-abi3-macosx_11_0_arm64.whl

Algorithm    Hash digest
SHA256       694f8600a595cf3de195f6f9225be2d30e1555123738948e85ffd4bc47ac1d5b
MD5          20b3503b9f54434a06f6ae5add6a9842
BLAKE2b-256  bdaf4279674f78af16e9dd90ca75e2ebf33d01697d975e94fe4fff2fc1ae8833
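
The SHA256 digests above can be used to confirm that a downloaded wheel is intact. This is a minimal sketch using hashlib; the helper name `sha256_of` and the local file path are assumptions, and the expected digest is the one listed above for the Windows wheel.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "81edc1d420143d10f3767941624d551142397e63ec2c5c2484e11d19ca00131c"
wheel = Path("static_embed-0.1.1-cp38-abi3-win_amd64.whl")  # assumed local path
if wheel.exists():
    print("OK" if sha256_of(wheel) == EXPECTED else "MISMATCH")
```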
