
rapid fuzzy string matching

Project description



Warning: 🚧 Under Heavy Construction

This library is actively being developed and APIs may change between releases. We're shipping fast — expect frequent updates, new features, and occasional breaking changes. Pin your version if stability matters to you: pip install rustfuzz==0.1.12


🤖 This project was built entirely by AI.

The idea was simple: could an AI agent beat RapidFuzz — one of the fastest fuzzy matching libraries in the world — by writing a Rust-backed Python library from scratch, guided only by benchmarks?

The development loop was: Research → Build → Benchmark → Repeat.


rustfuzz is a blazing-fast fuzzy string matching library for Python — implemented entirely in Rust. 🚀

Zero Python overhead. Memory safe. Pre-compiled wheels for every major platform.

Features

Blazing Fast: Core algorithms written in Rust, with no Python overhead and no GIL bottlenecks
🧠 Smart Matching: Ratio, partial ratio, token sort/set, Levenshtein, Jaro-Winkler, and more
🔒 Memory Safe: Backed by Rust's borrow-checker guarantees, so no segfaults and no buffer overflows
🐍 Pythonic API: Clean, typed Python interface. Import and go
📦 Zero Build Step: Pre-compiled wheels on PyPI for Python 3.10–3.14 on all major platforms
🏔️ Big Data Ready: Excels in 1 Billion Row Challenge benchmarks, crushing high-throughput tasks
🔍 3-Way Hybrid Search: BM25 + Fuzzy + Dense embeddings fused via RRF (Reciprocal Rank Fusion); 25ms at 1M docs, all in Rust
🔎 Filter & Sort: Meilisearch-style filtering and sorting with Rust-level performance
📄 Document Objects: First-class Document(content, metadata) + LangChain compatibility
🧩 Ecosystem Integrations: BM25, Hybrid Search, and LangChain Retrievers for Vector DBs (Qdrant, LanceDB, FAISS, etc.)

Installation

pip install rustfuzz
# or, with uv (recommended — much faster):
uv pip install rustfuzz

Quick Start

import rustfuzz.fuzz as fuzz
from rustfuzz.distance import Levenshtein

# Fuzzy ratio
print(fuzz.ratio("hello world", "hello wrold"))          # ~90.9 (2 * LCS / total length = 20 / 22)

# Partial ratio (substring match)
print(fuzz.partial_ratio("hello", "say hello world"))    # 100.0

# Token-order-insensitive match
print(fuzz.token_sort_ratio("fuzzy wuzzy", "wuzzy fuzzy")) # 100.0

# Levenshtein distance
print(Levenshtein.distance("kitten", "sitting"))         # 3

# Normalised similarity [0.0 – 1.0]
print(Levenshtein.normalized_similarity("kitten", "kitten")) # 1.0
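
For readers who want to see what these scores measure, here is a minimal pure-Python sketch of two of the metrics. It assumes the conventional definitions (Levenshtein edit distance, and a ratio of 2 * LCS / combined length scaled to 0–100, as in RapidFuzz); the function names are illustrative and are not part of the rustfuzz API, and rustfuzz's Rust implementation may differ in details.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Similarity on a 0-100 scale: 100 * 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * 2 * lcs_length(a, b) / total


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(indel_ratio("abcd", "abxd"))       # 75.0
```

Both functions keep only two rows of the dynamic-programming table, so memory stays linear in the length of the second string.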

Batch extraction

from rustfuzz import process

choices = ["New York", "New Orleans", "Newark", "Los Angeles"]
print(process.extractOne("new york", choices))
# ('New York', 100.0, 0)

print(process.extract("new", choices, limit=3))
# [('Newark', ...), ('New York', ...), ('New Orleans', ...)]
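
Conceptually, extraction scores the query against every choice and keeps the best results as (choice, score, index) tuples. The sketch below illustrates that idea in plain Python. The `extract_sketch` name, the difflib-based stand-in scorer, and the lowercasing step are all assumptions made for illustration; rustfuzz's own scorers, defaults, and preprocessing may differ.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (difflib, not a rustfuzz scorer)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def extract_sketch(query, choices, scorer=simple_ratio, processor=str.lower, limit=5):
    """Score every choice and return the `limit` best as (choice, score, index)."""
    q = processor(query)
    scored = [
        (choice, scorer(q, processor(choice)), idx)
        for idx, choice in enumerate(choices)
    ]
    # Highest score first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


choices = ["New York", "New Orleans", "Newark", "Los Angeles"]
print(extract_sketch("new york", choices, limit=1))
# [('New York', 100.0, 0)]
```

The exact match only reaches 100.0 here because both sides are lowercased first; without a processor, a case-sensitive scorer would rank "New York" lower.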

3-Way Hybrid Search (BM25 + Fuzzy + Dense)

from rustfuzz.search import Document, HybridSearch

# Create documents with metadata
docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple", "price": 1199}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    Document("Google Pixel 8 Pro", {"brand": "Google", "price": 699}),
]

# Optional: add dense embeddings for semantic search
embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

hs = HybridSearch(docs, embeddings=embeddings)

# Handles typos via fuzzy, keywords via BM25, meaning via dense — all in Rust
results = hs.search("appel iphon", query_embedding=[1.0, 0.0, 0.0], n=1)
text, score, meta = results[0]
print(f"{text} — ${meta['price']}")
# Apple iPhone 15 Pro Max 256GB — $1199

Also works with LangChain Document objects — no dependency required, auto-detected via duck-typing!
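
The fusion step named in the feature list, RRF (Reciprocal Rank Fusion), can be sketched in a few lines. This is a generic illustration of the technique rather than rustfuzz's Rust code: the function name is hypothetical, and the constant k=60 is the value commonly used in the literature, which rustfuzz may or may not share.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears
    in, where rank starts at 1 for the best hit in each list.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-retriever rankings over three document indices.
bm25_rank = [0, 2, 1]
fuzzy_rank = [0, 1, 2]
dense_rank = [1, 0, 2]

fused = reciprocal_rank_fusion([bm25_rank, fuzzy_rank, dense_rank])
print([doc_id for doc_id, _ in fused])  # [0, 1, 2]
```

Because RRF uses only ranks, the three retrievers do not need comparable score scales, which is why it suits mixing BM25, fuzzy, and dense results.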

With Real Embeddings (FastEmbed)

Use FastEmbed for lightweight, local, ONNX-based embeddings — no GPU needed:

from fastembed import TextEmbedding
from rustfuzz.search import Document, HybridSearch

model = TextEmbedding("BAAI/bge-small-en-v1.5")  # ~33 MB, CPU-only

docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple"}),
    Document("Samsung Galaxy S24 Ultra",      {"brand": "Samsung"}),
    Document("Sony WH-1000XM5 Headphones",    {"brand": "Sony"}),
]

embeddings = [e.tolist() for e in model.embed([d.content for d in docs])]
hs = HybridSearch(docs, embeddings=embeddings)

query = "wireless noise cancelling headset"
query_emb = list(model.embed([query]))[0].tolist()

results = hs.search(query, query_embedding=query_emb, n=1)
text, score, meta = results[0]
print(f"{text} - {meta['brand']}")
# Sony WH-1000XM5 Headphones - Sony
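
To make the dense leg less of a black box, the sketch below ranks documents by cosine similarity between the query embedding and each document embedding. Cosine similarity is an assumption here, since the README does not state which metric rustfuzz uses, and the helper names are illustrative only. The vectors are the toy embeddings from the first hybrid search example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dense_ranking(query_embedding, embeddings):
    """Return document indices ordered by descending cosine similarity."""
    sims = [cosine_similarity(query_embedding, e) for e in embeddings]
    return sorted(range(len(embeddings)), key=lambda i: sims[i], reverse=True)


embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
print(dense_ranking([1.0, 0.0, 0.0], embeddings))  # [0, 1, 2]
```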

With Rust-Native Embeddings (EmbedAnything)

Use EmbedAnything for Rust-native embeddings via Candle — no PyTorch, no ONNX:

import embed_anything
from embed_anything import EmbeddingModel
from rustfuzz.search import Document, HybridSearch

model = EmbeddingModel.from_pretrained_hf(
    model_id="sentence-transformers/all-MiniLM-L6-v2",
)

docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple"}),
    Document("Samsung Galaxy S24 Ultra",      {"brand": "Samsung"}),
    Document("Sony WH-1000XM5 Headphones",    {"brand": "Sony"}),
]

# Embed corpus with EmbedAnything
embed_data = embed_anything.embed_query([d.content for d in docs], embedder=model)
embeddings = [item.embedding for item in embed_data]

hs = HybridSearch(docs, embeddings=embeddings)

query = "wireless noise cancelling headset"
query_emb = embed_anything.embed_query([query], embedder=model)[0].embedding

text, score, meta = hs.search(query, query_embedding=query_emb, n=1)[0]
print(f"{text} - {meta['brand']}")
# Sony WH-1000XM5 Headphones - Sony

Or use the callback pattern for fully automatic query embedding:

def embed_fn(texts: list[str]) -> list[list[float]]:
    return [r.embedding for r in embed_anything.embed_query(texts, embedder=model)]

hs = HybridSearch(docs, embeddings=embed_fn)
results = hs.search("wireless headset", n=1)  # query auto-embedded!
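
When no embedding model is available (for example in unit tests), a deterministic stand-in with the same `list[str] -> list[list[float]]` shape as `embed_fn` can be handy. The sketch below is a hypothetical helper, not part of rustfuzz or EmbedAnything: it hashes whitespace tokens into a fixed-size vector, so it captures token overlap only and carries no semantic meaning.

```python
import math
import zlib


def toy_embed_fn(texts: list[str], dim: int = 64) -> list[list[float]]:
    """Hashed bag-of-words vectors, L2-normalised; deterministic across runs."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # crc32 is stable between processes, unlike the built-in hash().
            vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm else vec)
    return vectors


vectors = toy_embed_fn(["wireless headset", "Sony WH-1000XM5 Headphones"])
print(len(vectors), len(vectors[0]))  # 2 64
```

Assuming the callback contract shown above, such a function could be passed as `embeddings=toy_embed_fn` to exercise the pipeline without downloading a model.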

Filtering & Sorting (Meilisearch-style)

from rustfuzz import Document
from rustfuzz.search import BM25

docs = [
    Document("Apple iPhone 15 Pro Max",  {"brand": "Apple",   "category": "phone",  "price": 1199, "in_stock": True}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "category": "phone",  "price": 1299, "in_stock": True}),
    Document("Google Pixel 8 Pro",       {"brand": "Google",  "category": "phone",  "price": 699,  "in_stock": False}),
    Document("Apple MacBook Pro M3",     {"brand": "Apple",   "category": "laptop", "price": 2499, "in_stock": True}),
]

bm25 = BM25(docs)

# Fluent builder: filter → sort → match (executes immediately)
results = (
    bm25
    .filter('brand = "Apple" AND price > 500')
    .sort("price:asc")
    .match("pro", n=10)
)

for text, score, meta in results:
    print(f"  {text} — ${meta['price']}")

# Supports: =, !=, >, <, >=, <=, TO (range), IN, EXISTS, IS NULL, AND, OR, NOT
# Works with BM25, BM25L, BM25Plus, BM25T, and HybridSearch
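
As a mental model for the filter and sort steps, the expression brand = "Apple" AND price > 500 followed by price:asc is roughly equivalent to the plain-Python predicate and sort below. This sketch uses (text, metadata) tuples in place of Document objects and skips the text-matching step entirely; it says nothing about how rustfuzz parses or evaluates filters internally.

```python
products = [
    ("Apple iPhone 15 Pro Max",  {"brand": "Apple",   "category": "phone",  "price": 1199, "in_stock": True}),
    ("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "category": "phone",  "price": 1299, "in_stock": True}),
    ("Google Pixel 8 Pro",       {"brand": "Google",  "category": "phone",  "price": 699,  "in_stock": False}),
    ("Apple MacBook Pro M3",     {"brand": "Apple",   "category": "laptop", "price": 2499, "in_stock": True}),
]


def matches(meta: dict) -> bool:
    """Plain-Python equivalent of: brand = "Apple" AND price > 500."""
    return meta.get("brand") == "Apple" and meta.get("price", 0) > 500


# Filter first, then sort ascending by price (the "price:asc" step).
selected = sorted(
    (item for item in products if matches(item[1])),
    key=lambda item: item[1]["price"],
)

for text, meta in selected:
    print(f"{text}: ${meta['price']}")
# Apple iPhone 15 Pro Max: $1199
# Apple MacBook Pro M3: $2499
```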

Filter and sort also work with HybridSearch (BM25 + Fuzzy + Dense):

from rustfuzz import Document
from rustfuzz.search import HybridSearch

docs = [
    Document("Apple iPhone 15 Pro Max", {"brand": "Apple", "price": 1199}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    Document("Google Pixel 8 Pro",       {"brand": "Google", "price": 699}),
]

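# `embeddings` (one vector per document) and `query_emb` are assumed to be
# computed beforehand, e.g. as in the FastEmbed or EmbedAnything examples above.
hs = HybridSearch(docs, embeddings=embeddings)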

# Filter + sort + semantic search
results = (
    hs
    .filter('brand = "Apple"')
    .sort("price:asc")
    .match("iphone pro", n=5, query_embedding=query_emb)
)

Supported Algorithms

Module: Algorithms
rustfuzz.fuzz: ratio, partial_ratio, token_sort_ratio, token_set_ratio, token_ratio, WRatio, QRatio, partial_token_*
rustfuzz.distance: Levenshtein, Hamming, Indel, Jaro, JaroWinkler, LCSseq, OSA, DamerauLevenshtein, Prefix, Postfix
rustfuzz.process: extract, extractOne, extract_iter, cdist
rustfuzz.search: BM25, BM25L, BM25Plus, BM25T, HybridSearch, Document
rustfuzz.filter: Meilisearch-style filter parser & evaluator
rustfuzz.sort: Multi-key sort with dot notation
rustfuzz.query: Fluent SearchQuery builder (.filter().sort().search().collect())
rustfuzz.utils: default_process

The BM25 Search Engines

rustfuzz.search implements several variants of the BM25 text-retrieval ranking function. The core differences:

  • BM25 (Okapi): The industry standard. Combines term-frequency saturation (each additional occurrence of a term adds less to the score, controlled by $k_1$) with document-length normalization (controlled by $b$).
  • BM25L: Corrects BM25's tendency to over-penalize long documents. Adds a constant shift delta ($\delta$) to the length-normalized term frequency before saturation, so a matching term still contributes a baseline score in very long documents where normalization would otherwise suppress it.
  • BM25Plus: Also lower-bounds the contribution of any matching term, but adds the shift $\delta$ after term saturation. Often a reasonable default for corpora with highly mixed document lengths.
  • BM25T: Estimates the saturation parameter $k_1$ separately for each term, using an information-gain criterion, instead of relying on one global value. rustfuzz precomputes these per-term values inside the inverted index.
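
The difference between the first three variants is easiest to see in code. The following is a minimal pure-Python sketch, not rustfuzz's implementation. It makes several simplifying assumptions: whitespace tokenization, one shared IDF formula for all variants (the published BM25L and BM25+ formulas use slightly different IDF terms), and a single `delta` default of 1.0. BM25T is omitted because its per-term $k_1$ estimation is more involved.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, variant="okapi", k1=1.5, b=0.75, delta=1.0):
    """Score every document in `corpus` against `query`.

    variant: "okapi" (classic BM25), "l" (BM25L) or "plus" (BM25+).
    """
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        norm = 1.0 - b + b * len(d) / avgdl  # length-normalisation factor
        total = 0.0
        for term in query.lower().split():
            f = tf[term]
            if f == 0:
                continue  # non-matching terms contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            if variant == "okapi":
                part = f * (k1 + 1) / (f + k1 * norm)
            elif variant == "l":
                c = f / norm + delta  # shift applied before saturation
                part = (k1 + 1) * c / (k1 + c)
            elif variant == "plus":
                part = f * (k1 + 1) / (f + k1 * norm) + delta  # shift after
            else:
                raise ValueError(f"unknown variant: {variant}")
            total += idf * part
        scores.append(total)
    return scores


corpus = ["rust fuzzy matching", "python " * 50 + "rust", "go concurrency"]
for variant in ("okapi", "l", "plus"):
    print(variant, [round(s, 3) for s in bm25_scores("rust", corpus, variant)])
```

In this toy corpus the second document is long and mentions "rust" once, so classic BM25 scores it well below the short first document, while the BM25L and BM25+ shifts raise its score.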

An end-to-end benchmark comparing these algorithms on the BEIR SciFact dataset is available in examples/bench_retrieval.py.

Documentation

Full cookbook with interactive examples and benchmark results: 👉 bmsuisse.github.io/rustfuzz

License

MIT © BM Suisse


