
rapid fuzzy string matching

Project description



Warning: 🚧 Under Heavy Construction

This library is under active development, and APIs may change between releases. Expect frequent updates, new features, and occasional breaking changes. If stability matters to you, pin the version you have tested, for example: pip install rustfuzz==0.1.38


🤖 This project was built entirely by AI.

The idea was simple: could an AI agent beat RapidFuzz — one of the fastest fuzzy matching libraries in the world — by writing a Rust-backed Python library from scratch, guided only by benchmarks?

The development loop was: Research → Build → Benchmark → Repeat.


rustfuzz is a blazing-fast fuzzy string matching library for Python — implemented entirely in Rust. 🚀

Zero Python overhead. Memory safe. Pre-compiled wheels for every major platform.

Features

Blazing Fast: Core algorithms written in Rust, with no Python overhead and no GIL bottlenecks
🧠 Smart Matching: Ratio, partial ratio, token sort/set, Levenshtein, Jaro-Winkler, and more
🔒 Memory Safe: Rust's borrow checker guarantees no segfaults and no buffer overflows
🐍 Pythonic API: Clean, typed Python interface. Import and go
📦 Zero Build Step: Pre-compiled wheels on PyPI for Python 3.10–3.14 on all major platforms
🏔️ Big Data Ready: Excels in 1 Billion Row Challenge benchmarks, crushing high-throughput tasks
🔍 3-Way Hybrid Search: BM25 + Fuzzy + Dense embeddings via RRF; 25ms at 1M docs, all in Rust
🔎 Filter & Sort: Meilisearch-style filtering and sorting with Rust-level performance
📄 Document Objects: First-class Document(content, metadata) + LangChain compatibility
🧩 Ecosystem Integrations: BM25, Hybrid Search, and LangChain Retrievers for Vector DBs (Qdrant, LanceDB, FAISS, etc.)
🎯 Retriever: Batteries-included SOTA search that auto-selects BM25, embeddings (OpenAI/Cohere/HF), and reranker

Installation

pip install rustfuzz
# or, with uv (recommended — much faster):
uv pip install rustfuzz

Quick Start

import rustfuzz.fuzz as fuzz
from rustfuzz.distance import Levenshtein

# Fuzzy ratio
print(fuzz.ratio("hello world", "hello wrold"))          # ~90.9 if ratio is 2 * LCS / total length (20 / 22)

# Partial ratio (substring match)
print(fuzz.partial_ratio("hello", "say hello world"))    # 100.0

# Token-order-insensitive match
print(fuzz.token_sort_ratio("fuzzy wuzzy", "wuzzy fuzzy")) # 100.0

# Levenshtein distance
print(Levenshtein.distance("kitten", "sitting"))         # 3

# Normalised similarity [0.0 – 1.0]
print(Levenshtein.normalized_similarity("kitten", "kitten")) # 1.0
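
To make the scores above easier to interpret, here is a minimal pure-Python sketch of the underlying metrics. It assumes that ratio follows the RapidFuzz definition (a normalized Indel similarity, 2 * LCS / total length, scaled to 0–100) and that token_sort_ratio sorts whitespace-separated tokens before comparing. The helper names (lcs_length, indel_ratio, levenshtein, token_sort_ratio_sketch) are illustrative and not part of rustfuzz, whose Rust implementation may differ in detail.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Similarity in [0, 100]: 100 * 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * 2 * lcs_length(a, b) / total


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def token_sort_ratio_sketch(a: str, b: str) -> float:
    """Sort whitespace-separated tokens, then compare with indel_ratio."""
    return indel_ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


print(levenshtein("kitten", "sitting"))                         # 3
print(round(indel_ratio("kitten", "sitting"), 1))               # 61.5
print(token_sort_ratio_sketch("fuzzy wuzzy", "wuzzy fuzzy"))    # 100.0
```

The two-row dynamic programme keeps memory at O(len(b)); production libraries typically use bit-parallel variants of the same recurrences.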

Batch extraction

from rustfuzz import process

choices = ["New York", "New Orleans", "Newark", "Los Angeles"]
print(process.extractOne("new york", choices))
# ('New York', 100.0, 0)

print(process.extract("new", choices, limit=3))
# [('Newark', ...), ('New York', ...), ('New Orleans', ...)]
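
The result tuples above have the shape (choice, score, index). As a rough mental model only, the sketch below reproduces that shape in pure Python. It uses difflib.SequenceMatcher from the standard library as a stand-in scorer and lowercases both strings before comparing; both choices are assumptions made for illustration, and the function names extract_sketch and extract_one_sketch are hypothetical. rustfuzz's own scorer and preprocessing may rank candidates differently.

```python
from difflib import SequenceMatcher


def score(query: str, choice: str) -> float:
    """Stand-in scorer in [0, 100]; NOT the scorer rustfuzz uses."""
    return 100.0 * SequenceMatcher(None, query.lower(), choice.lower()).ratio()


def extract_sketch(query: str, choices: list[str], limit: int = 5):
    """Score every choice, then return the best `limit` as (choice, score, index)."""
    scored = [(choice, score(query, choice), idx) for idx, choice in enumerate(choices)]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep input order
    return scored[:limit]


def extract_one_sketch(query: str, choices: list[str]):
    """Best single match, or None when there are no choices."""
    results = extract_sketch(query, choices, limit=1)
    return results[0] if results else None


choices = ["New York", "New Orleans", "Newark", "Los Angeles"]
print(extract_one_sketch("new york", choices))
# ('New York', 100.0, 0)
```

Returning the index alongside the choice lets callers map a match back to a row in the original data without a second lookup.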

3-Way Hybrid Search (BM25 + Fuzzy + Dense)

from rustfuzz.search import Document, HybridSearch

# Create documents with metadata
docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple", "price": 1199}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    Document("Google Pixel 8 Pro", {"brand": "Google", "price": 699}),
]

# Optional: add dense embeddings for semantic search
embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

hs = HybridSearch(docs, embeddings=embeddings)

# Handles typos via fuzzy, keywords via BM25, meaning via dense — all in Rust
results = hs.search("appel iphon", query_embedding=[1.0, 0.0, 0.0], n=1)
text, score, meta = results[0]
print(f"{text} — ${meta['price']}")
# Apple iPhone 15 Pro Max 256GB — $1199

Also works with LangChain Document objects — no dependency required, auto-detected via duck-typing!
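
The feature list says the three signals are combined via RRF (reciprocal rank fusion). A minimal sketch of that technique follows: each ranker contributes 1 / (k + rank) for every document it returns, and the sums decide the final order. The constant k = 60 is a conventional choice for RRF and an assumption here, as are the function name and the toy rankings; the values rustfuzz uses internally are not stated in this document.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids into one list sorted by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy rankings from three hypothetical retrievers (best match first).
bm25_rank = ["iphone", "pixel", "galaxy"]
fuzzy_rank = ["iphone", "galaxy", "pixel"]
dense_rank = ["galaxy", "iphone", "pixel"]

fused = reciprocal_rank_fusion([bm25_rank, fuzzy_rank, dense_rank])
print([doc_id for doc_id, _ in fused])
# ['iphone', 'galaxy', 'pixel']
```

Because RRF only looks at ranks, it needs no score normalisation across BM25, fuzzy, and cosine scores, which live on different scales.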

Custom BM25 variants via fluent builder

You can seamlessly construct a HybridSearch model using any of the advanced BM25 variants (BM25L, BM25Plus, BM25T) via the .to_hybrid() builder method:

from rustfuzz.search import BM25L

results = (
    BM25L(docs, delta=0.5, b=0.8)
    .to_hybrid(embeddings=embeddings)
    .filter('brand = "Apple"')
    .match("iphone pro", n=10)
)

With Rust-Native Embeddings (EmbedAnything)

Use EmbedAnything for Rust-native embeddings via Candle — no PyTorch, no ONNX:

import embed_anything
from embed_anything import EmbeddingModel
from rustfuzz.search import Document, HybridSearch

model = EmbeddingModel.from_pretrained_hf(
    model_id="sentence-transformers/all-MiniLM-L6-v2",
)

docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple"}),
    Document("Samsung Galaxy S24 Ultra",      {"brand": "Samsung"}),
    Document("Sony WH-1000XM5 Headphones",    {"brand": "Sony"}),
]

# Embed corpus with EmbedAnything
embed_data = embed_anything.embed_query([d.content for d in docs], embedder=model)
embeddings = [item.embedding for item in embed_data]

hs = HybridSearch(docs, embeddings=embeddings)

query = "wireless noise cancelling headset"
query_emb = embed_anything.embed_query([query], embedder=model)[0].embedding

text, score, meta = hs.search(query, query_embedding=query_emb, n=1)[0]
print(f"{text} - {meta['brand']}")
# Sony WH-1000XM5 Headphones - Sony

Or use the callback pattern for fully automatic query embedding:

def embed_fn(texts: list[str]) -> list[list[float]]:
    return [r.embedding for r in embed_anything.embed_query(texts, embedder=model)]

hs = HybridSearch(docs, embeddings=embed_fn)
results = hs.search("wireless headset", n=1)  # query auto-embedded!
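
The callback only has to satisfy one contract: take a list of strings and return one vector per string. The sketch below illustrates that contract with a dependency-free toy embedder (hashed bag-of-words, useful only for testing wiring, not for semantic quality) and a hypothetical resolve_embeddings helper showing how a single parameter could accept either precomputed vectors or a callable. Both names are assumptions for illustration; this is not how HybridSearch is implemented.

```python
import math
import zlib
from typing import Callable, Sequence, Union

Vector = list[float]
EmbedFn = Callable[[list[str]], list[Vector]]


def toy_embed(texts: list[str], dim: int = 8) -> list[Vector]:
    """Hash each token into one of `dim` buckets and L2-normalise the counts."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # crc32 is stable across runs, unlike the built-in hash() for str
            vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def resolve_embeddings(
    texts: list[str],
    embeddings: Union[Sequence[Sequence[float]], EmbedFn],
) -> list[Vector]:
    """Return one vector per text, calling `embeddings` if it is a function."""
    if callable(embeddings):
        return embeddings(texts)
    return [list(vec) for vec in embeddings]


corpus = ["wireless headset", "phone case"]
print(len(resolve_embeddings(corpus, toy_embed)))                   # 2
print(resolve_embeddings(corpus, [[1.0, 0.0], [0.0, 1.0]]))         # [[1.0, 0.0], [0.0, 1.0]]
```

Keeping the same callable for corpus and query guarantees that both live in the same vector space, which is what makes automatic query embedding safe.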

Retriever — Easy API

The Retriever class auto-selects the best pipeline — no manual wiring needed:

from rustfuzz import Retriever

# Simplest — BM25+ with fuzzy matching
r = Retriever(docs)
results = r.search("wireless headphones", n=5)

# Auto-embed with OpenAI (or "cohere", "azure-openai", True for local HF)
r = Retriever(docs, embeddings="openai", api_key="sk-...")

# Full SOTA: BM25 + Fuzzy + Embeddings + Reranker
# (cross_encoder_model is a reranker you have already loaded; it is not defined in this snippet)
r = Retriever(docs, embeddings="openai", reranker=cross_encoder_model)
results = r.search("wireless headphones", n=10)

# Config dataclass for reusable settings
from rustfuzz import RetrieverConfig
cfg = RetrieverConfig(algorithm="bm25l", k1=1.2, b=0.8)
r = Retriever(docs, config=cfg, embeddings="cohere")

Supported embedding providers: openai, openai-large, cohere, cohere-multilingual, azure-openai, azure-cohere, any "org/model" HuggingFace name, or True for the default local model.

Filtering & Sorting (Meilisearch-style)

from rustfuzz import Document
from rustfuzz.search import BM25

docs = [
    Document("Apple iPhone 15 Pro Max",  {"brand": "Apple",   "category": "phone",  "price": 1199, "in_stock": True}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "category": "phone",  "price": 1299, "in_stock": True}),
    Document("Google Pixel 8 Pro",       {"brand": "Google",  "category": "phone",  "price": 699,  "in_stock": False}),
    Document("Apple MacBook Pro M3",     {"brand": "Apple",   "category": "laptop", "price": 2499, "in_stock": True}),
]

bm25 = BM25(docs)

# Fluent builder: filter → sort → match (executes immediately)
results = (
    bm25
    .filter('brand = "Apple" AND price > 500')
    .sort("price:asc")
    .match("pro", n=10)
)

for text, score, meta in results:
    print(f"  {text} — ${meta['price']}")

# Supports: =, !=, >, <, >=, <=, TO (range), IN, EXISTS, IS NULL, AND, OR, NOT
# Works with BM25, BM25L, BM25Plus, BM25T, and HybridSearch
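
To make the semantics of the filter and sort strings concrete, the sketch below applies the same conditions to plain (text, metadata) tuples in pure Python. It is an illustration of what 'brand = "Apple" AND price > 500' followed by "price:asc" is expected to select, not the rustfuzz parser. The helpers get_path and sort_by are hypothetical, and the dot-notation lookup and the "missing values sort last in ascending order" rule are assumptions.

```python
def get_path(meta: dict, dotted: str):
    """Resolve a dotted key such as 'specs.weight' in nested dicts; None if absent."""
    value = meta
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def sort_by(rows: list[tuple[str, dict]], spec: str) -> list[tuple[str, dict]]:
    """Sort rows by a 'field:asc' or 'field:desc' spec."""
    field, _, direction = spec.partition(":")

    def key(row):
        value = get_path(row[1], field)
        return (value is None, value)  # the flag avoids comparing None with numbers

    return sorted(rows, key=key, reverse=(direction == "desc"))


products = [
    ("Apple iPhone 15 Pro Max", {"brand": "Apple", "price": 1199}),
    ("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    ("Google Pixel 8 Pro", {"brand": "Google", "price": 699}),
    ("Apple MacBook Pro M3", {"brand": "Apple", "price": 2499}),
]

# Plain-Python equivalent of: brand = "Apple" AND price > 500
selected = [row for row in products if row[1]["brand"] == "Apple" and row[1]["price"] > 500]

for text, meta in sort_by(selected, "price:asc"):
    print(f"{text}: ${meta['price']}")
# Apple iPhone 15 Pro Max: $1199
# Apple MacBook Pro M3: $2499
```

Filtering before ranking keeps the candidate set small, so the text match only has to score documents that can appear in the result.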

Filter and sort also work with HybridSearch (BM25 + Fuzzy + Dense):

from rustfuzz import Document
from rustfuzz.search import HybridSearch

docs = [
    Document("Apple iPhone 15 Pro Max", {"brand": "Apple", "price": 1199}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    Document("Google Pixel 8 Pro",       {"brand": "Google", "price": 699}),
]

hs = HybridSearch(docs, embeddings=embeddings)

# Filter + sort + semantic search
results = (
    hs
    .filter('brand = "Apple"')
    .sort("price:asc")
    .match("iphone pro", n=5, query_embedding=query_emb)
)

Supported Algorithms

Module             Algorithms
rustfuzz.fuzz      ratio, partial_ratio, token_sort_ratio, token_set_ratio, token_ratio, WRatio, QRatio, partial_token_*
rustfuzz.distance  Levenshtein, Hamming, Indel, Jaro, JaroWinkler, LCSseq, OSA, DamerauLevenshtein, Prefix, Postfix
rustfuzz.process   extract, extractOne, extract_iter, cdist
rustfuzz.search    BM25, BM25L, BM25Plus, BM25T, HybridSearch, Document
rustfuzz.engine    Retriever, RetrieverConfig (batteries-included easy API)
rustfuzz.filter    Meilisearch-style filter parser & evaluator
rustfuzz.sort      Multi-key sort with dot notation
rustfuzz.query     Fluent SearchQuery builder (.filter().sort().search().collect())
rustfuzz.utils     default_process

The BM25 Search Engines

rustfuzz.search implements several BM25 ranking variants for text retrieval. They differ mainly in how they treat term frequency and document length:

  • BM25 (Okapi): The industry standard. Combines term-frequency saturation (each extra occurrence of a term adds less, and the contribution levels off as the count grows) with document length normalization.
  • BM25L: Corrects BM25's over-penalization of long documents. Adds a constant shift delta to the length-normalized term frequency before saturation, so a matching term keeps a minimum baseline score even in very long documents where normalisation would otherwise suppress it.
  • BM25Plus: Also sets a lower bound on the score of any matching term, but adds the shift after term saturation. Often recommended as a default for corpora with highly mixed document lengths.
  • BM25T: Uses an information-gain criterion to compute the saturation parameter $k_1$ per term instead of using a single global value. rustfuzz pre-computes these per-term values inside the inverted index.
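
The difference between the first three variants is easiest to see in the per-term weight, before it is multiplied by the IDF. The sketch below uses the textbook formulas; the function names and the parameter values (k1 = 1.5, b = 0.75, delta = 0.5 for BM25L, delta = 1.0 for BM25Plus) are illustrative assumptions, and rustfuzz's defaults and IDF formulation may differ. BM25T is omitted because its per-term $k_1$ has to be estimated from corpus statistics.

```python
def bm25_term(tf: float, dl: float, avgdl: float, k1: float = 1.5, b: float = 0.75) -> float:
    """Okapi BM25 term weight: saturates towards k1 + 1 as tf grows."""
    norm = 1.0 - b + b * dl / avgdl          # document length normalisation
    return tf * (k1 + 1.0) / (tf + k1 * norm)


def bm25l_term(tf: float, dl: float, avgdl: float,
               k1: float = 1.5, b: float = 0.75, delta: float = 0.5) -> float:
    """BM25L: shift the length-normalised tf by delta BEFORE saturation."""
    if tf == 0:
        return 0.0
    ctd = tf / (1.0 - b + b * dl / avgdl)
    return (k1 + 1.0) * (ctd + delta) / (k1 + ctd + delta)


def bm25plus_term(tf: float, dl: float, avgdl: float,
                  k1: float = 1.5, b: float = 0.75, delta: float = 1.0) -> float:
    """BM25Plus: add delta AFTER saturation, for matching terms only."""
    if tf == 0:
        return 0.0
    return bm25_term(tf, dl, avgdl, k1, b) + delta


# One occurrence of a term in a document 100x longer than average:
for name, fn in [("BM25", bm25_term), ("BM25L", bm25l_term), ("BM25Plus", bm25plus_term)]:
    print(f"{name:9s}{fn(1, 100.0, 1.0):.3f}")
# BM25     0.022
# BM25L    0.637
# BM25Plus 1.022
```

Plain BM25 almost erases the match in the very long document, while both BM25L and BM25Plus keep it clearly above zero; that lower bound is the practical difference the list above describes.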

An end-to-end benchmark comparison of these algorithms on the BEIR SciFact dataset is available in examples/bench_retrieval.py.

Documentation

Full cookbook with interactive examples and benchmark results: 👉 bmsuisse.github.io/rustfuzz

License

MIT © BM Suisse

Download files

Source Distribution

rustfuzz-0.1.38.tar.gz (28.7 MB): Source

Built Distributions

rustfuzz-0.1.38-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): PyPy, manylinux (glibc 2.17+), x86-64
rustfuzz-0.1.38-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): PyPy, manylinux (glibc 2.17+), ARM64
rustfuzz-0.1.38-cp310-abi3-win_amd64.whl (997.7 kB): CPython 3.10+, Windows, x86-64
rustfuzz-0.1.38-cp310-abi3-musllinux_1_2_x86_64.whl (1.5 MB): CPython 3.10+, musllinux (musl 1.2+), x86-64
rustfuzz-0.1.38-cp310-abi3-musllinux_1_2_aarch64.whl (1.3 MB): CPython 3.10+, musllinux (musl 1.2+), ARM64
rustfuzz-0.1.38-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.10+, manylinux (glibc 2.17+), x86-64
rustfuzz-0.1.38-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): CPython 3.10+, manylinux (glibc 2.17+), ARM64
rustfuzz-0.1.38-cp310-abi3-macosx_11_0_arm64.whl (1.0 MB): CPython 3.10+, macOS 11.0+, ARM64
rustfuzz-0.1.38-cp310-abi3-macosx_10_12_x86_64.whl (1.2 MB): CPython 3.10+, macOS 10.12+, x86-64
