rapid fuzzy string matching


Warning: 🚧 Under Heavy Construction

This library is under active development, and APIs may change between releases. Expect frequent updates, new features, and occasional breaking changes. Pin your version if stability matters to you: pip install rustfuzz==0.1.21


🤖 This project was built entirely by AI.

The idea was simple: could an AI agent beat RapidFuzz — one of the fastest fuzzy matching libraries in the world — by writing a Rust-backed Python library from scratch, guided only by benchmarks?

The development loop was: Research → Build → Benchmark → Repeat.


rustfuzz is a blazing-fast fuzzy string matching library for Python — implemented entirely in Rust. 🚀

The core routines run in compiled Rust. Memory safe. Pre-compiled wheels for Linux (glibc and musl), macOS, and Windows.

Features

  • Blazing Fast: Core algorithms written in Rust; no Python overhead, no GIL bottlenecks
  • 🧠 Smart Matching: Ratio, partial ratio, token sort/set, Levenshtein, Jaro-Winkler, and more
  • 🔒 Memory Safe: Rust's borrow checker guarantees; no segfaults, no buffer overflows
  • 🐍 Pythonic API: Clean, typed Python interface. Import and go
  • 📦 Zero Build Step: Pre-compiled wheels on PyPI for Python 3.10–3.14 on all major platforms
  • 🏔️ Big Data Ready: Excels in 1 Billion Row Challenge benchmarks and other high-throughput tasks
  • 🔍 3-Way Hybrid Search: BM25 + Fuzzy + Dense embeddings via RRF; 25ms at 1M docs, all in Rust
  • 🔎 Filter & Sort: Meilisearch-style filtering and sorting with Rust-level performance
  • 📄 Document Objects: First-class Document(content, metadata) + LangChain compatibility
  • 🧩 Ecosystem Integrations: BM25, Hybrid Search, and LangChain Retrievers for Vector DBs (Qdrant, LanceDB, FAISS, etc.)

Installation

pip install rustfuzz
# or, with uv (recommended — much faster):
uv pip install rustfuzz

Quick Start

import rustfuzz.fuzz as fuzz
from rustfuzz.distance import Levenshtein

# Fuzzy ratio
print(fuzz.ratio("hello world", "hello wrold"))          # ~90.9

# Partial ratio (substring match)
print(fuzz.partial_ratio("hello", "say hello world"))    # 100.0

# Token-order-insensitive match
print(fuzz.token_sort_ratio("fuzzy wuzzy", "wuzzy fuzzy")) # 100.0

# Levenshtein distance
print(Levenshtein.distance("kitten", "sitting"))         # 3

# Normalised similarity [0.0 – 1.0]
print(Levenshtein.normalized_similarity("kitten", "kitten")) # 1.0
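
To make the two Levenshtein numbers above concrete, here is a minimal pure-Python sketch of the classic dynamic-programming edit distance. It is an illustration only, not rustfuzz's Rust implementation. The function names are hypothetical, and normalizing by the longer string's length is an assumption about how the similarity is scaled.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop a character from a
                current[j - 1] + 1,      # add a character to a
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed scaling: 1 - distance / max(len), so identical strings give 1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))           # 3
print(normalized_similarity("kitten", "kitten"))  # 1.0
```

The sketch keeps only two rows of the DP table, so memory grows with the shorter string rather than with the product of both lengths.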

Batch extraction

from rustfuzz import process

choices = ["New York", "New Orleans", "Newark", "Los Angeles"]
print(process.extractOne("new york", choices))
# ('New York', 100.0, 0)

print(process.extract("new", choices, limit=3))
# [('Newark', ...), ('New York', ...), ('New Orleans', ...)]
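
The shape of these results, a (choice, score, index) tuple per match, can be sketched in plain Python. This is a rough stand-in rather than rustfuzz's code: the scorer is difflib.SequenceMatcher from the standard library, extract_sketch is a hypothetical name, and lower-casing both strings before scoring is an assumption made for the illustration.

```python
from difflib import SequenceMatcher


def extract_sketch(query, choices, limit=5):
    """Score every choice, then return the top `limit` as (choice, score, index)."""
    scored = [
        (
            choice,
            100.0 * SequenceMatcher(None, query.lower(), choice.lower()).ratio(),
            index,
        )
        for index, choice in enumerate(choices)
    ]
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


choices = ["New York", "New Orleans", "Newark", "Los Angeles"]
best = extract_sketch("new york", choices, limit=1)[0]
print(best)  # ('New York', 100.0, 0)
```

The index in each tuple points back into the original choices list, which is useful when the choices are keys into another data structure.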

3-Way Hybrid Search (BM25 + Fuzzy + Dense)

from rustfuzz.search import Document, HybridSearch

# Create documents with metadata
docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple", "price": 1199}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    Document("Google Pixel 8 Pro", {"brand": "Google", "price": 699}),
]

# Optional: add dense embeddings for semantic search
embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

hs = HybridSearch(docs, embeddings=embeddings)

# Handles typos via fuzzy, keywords via BM25, meaning via dense — all in Rust
results = hs.search("appel iphon", query_embedding=[1.0, 0.0, 0.0], n=1)
text, score, meta = results[0]
print(f"{text} — ${meta['price']}")
# Apple iPhone 15 Pro Max 256GB — $1199

Also works with LangChain Document objects — no dependency required, auto-detected via duck-typing!
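
As a rough picture of how three rankings can be merged, the sketch below ranks the toy embeddings from the example by cosine similarity and then combines that ranking with two others using Reciprocal Rank Fusion (RRF). Everything here is an assumption for illustration: the BM25 and fuzzy rankings are made up, k=60 is the constant commonly used with RRF rather than a documented rustfuzz setting, and whether rustfuzz uses cosine similarity for the dense signal is not stated above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
query_embedding = [1.0, 0.0, 0.0]

# Dense ranking: doc ids ordered by cosine similarity to the query.
dense_ranking = sorted(
    range(len(embeddings)),
    key=lambda i: cosine(query_embedding, embeddings[i]),
    reverse=True,
)

# Hypothetical keyword and fuzzy rankings, invented for this illustration.
bm25_ranking = [0, 2, 1]
fuzzy_ranking = [0, 1, 2]

print(dense_ranking)                                      # [0, 1, 2]
print(rrf([bm25_ranking, fuzzy_ranking, dense_ranking]))  # [0, 1, 2]
```

RRF only looks at ranks, not raw scores, so the three signals do not need to be on a comparable scale before they are fused.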

With Real Embeddings (FastEmbed)

Use FastEmbed for lightweight, local, ONNX-based embeddings — no GPU needed:

from fastembed import TextEmbedding
from rustfuzz.search import Document, HybridSearch

model = TextEmbedding("BAAI/bge-small-en-v1.5")  # ~33 MB, CPU-only

docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple"}),
    Document("Samsung Galaxy S24 Ultra",      {"brand": "Samsung"}),
    Document("Sony WH-1000XM5 Headphones",    {"brand": "Sony"}),
]

embeddings = [e.tolist() for e in model.embed([d.content for d in docs])]
hs = HybridSearch(docs, embeddings=embeddings)

query = "wireless noise cancelling headset"
query_emb = list(model.embed([query]))[0].tolist()

results = hs.search(query, query_embedding=query_emb, n=1)
text, score, meta = results[0]
print(f"{text} - {meta['brand']}")
# Sony WH-1000XM5 Headphones - Sony

With Rust-Native Embeddings (EmbedAnything)

Use EmbedAnything for Rust-native embeddings via Candle — no PyTorch, no ONNX:

import embed_anything
from embed_anything import EmbeddingModel
from rustfuzz.search import Document, HybridSearch

model = EmbeddingModel.from_pretrained_hf(
    model_id="sentence-transformers/all-MiniLM-L6-v2",
)

docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple"}),
    Document("Samsung Galaxy S24 Ultra",      {"brand": "Samsung"}),
    Document("Sony WH-1000XM5 Headphones",    {"brand": "Sony"}),
]

# Embed corpus with EmbedAnything
embed_data = embed_anything.embed_query([d.content for d in docs], embedder=model)
embeddings = [item.embedding for item in embed_data]

hs = HybridSearch(docs, embeddings=embeddings)

query = "wireless noise cancelling headset"
query_emb = embed_anything.embed_query([query], embedder=model)[0].embedding

text, score, meta = hs.search(query, query_embedding=query_emb, n=1)[0]
print(f"{text} - {meta['brand']}")
# Sony WH-1000XM5 Headphones - Sony

Or use the callback pattern for fully automatic query embedding:

def embed_fn(texts: list[str]) -> list[list[float]]:
    return [r.embedding for r in embed_anything.embed_query(texts, embedder=model)]

hs = HybridSearch(docs, embeddings=embed_fn)
results = hs.search("wireless headset", n=1)  # query auto-embedded!

Filtering & Sorting (Meilisearch-style)

from rustfuzz import Document
from rustfuzz.search import BM25

docs = [
    Document("Apple iPhone 15 Pro Max",  {"brand": "Apple",   "category": "phone",  "price": 1199, "in_stock": True}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "category": "phone",  "price": 1299, "in_stock": True}),
    Document("Google Pixel 8 Pro",       {"brand": "Google",  "category": "phone",  "price": 699,  "in_stock": False}),
    Document("Apple MacBook Pro M3",     {"brand": "Apple",   "category": "laptop", "price": 2499, "in_stock": True}),
]

bm25 = BM25(docs)

# Fluent builder: filter → sort → match (executes immediately)
results = (
    bm25
    .filter('brand = "Apple" AND price > 500')
    .sort("price:asc")
    .match("pro", n=10)
)

for text, score, meta in results:
    print(f"  {text} — ${meta['price']}")

# Supports: =, !=, >, <, >=, <=, TO (range), IN, EXISTS, IS NULL, AND, OR, NOT
# Works with BM25, BM25L, BM25Plus, BM25T, and HybridSearch
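
For readers new to Meilisearch-style filters, the filter and sort steps in the example are roughly equivalent to the plain-Python list operations below. This sketch only illustrates the semantics of 'brand = "Apple" AND price > 500' followed by "price:asc". It leaves out the text-matching step, represents documents as plain (text, metadata) tuples, and does not reflect how rustfuzz parses or evaluates the expression.

```python
docs = [
    ("Apple iPhone 15 Pro Max",  {"brand": "Apple",   "price": 1199}),
    ("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    ("Google Pixel 8 Pro",       {"brand": "Google",  "price": 699}),
    ("Apple MacBook Pro M3",     {"brand": "Apple",   "price": 2499}),
]

# filter: brand = "Apple" AND price > 500
kept = [
    (text, meta)
    for text, meta in docs
    if meta.get("brand") == "Apple" and meta.get("price", 0) > 500
]

# sort: price:asc
kept.sort(key=lambda item: item[1]["price"])

for text, meta in kept:
    print(f"{text}: ${meta['price']}")
# Apple iPhone 15 Pro Max: $1199
# Apple MacBook Pro M3: $2499
```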

Filter and sort also work with HybridSearch (BM25 + Fuzzy + Dense):

from rustfuzz import Document
from rustfuzz.search import HybridSearch

docs = [
    Document("Apple iPhone 15 Pro Max", {"brand": "Apple", "price": 1199}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    Document("Google Pixel 8 Pro",       {"brand": "Google", "price": 699}),
]

# Reuses `embeddings` (one vector per document) and `query_emb` from the examples above
hs = HybridSearch(docs, embeddings=embeddings)

# Filter + sort + semantic search
results = (
    hs
    .filter('brand = "Apple"')
    .sort("price:asc")
    .match("iphone pro", n=5, query_embedding=query_emb)
)

Supported Algorithms

  • rustfuzz.fuzz: ratio, partial_ratio, token_sort_ratio, token_set_ratio, token_ratio, WRatio, QRatio, partial_token_*
  • rustfuzz.distance: Levenshtein, Hamming, Indel, Jaro, JaroWinkler, LCSseq, OSA, DamerauLevenshtein, Prefix, Postfix
  • rustfuzz.process: extract, extractOne, extract_iter, cdist
  • rustfuzz.search: BM25, BM25L, BM25Plus, BM25T, HybridSearch, Document
  • rustfuzz.filter: Meilisearch-style filter parser & evaluator
  • rustfuzz.sort: Multi-key sort with dot notation
  • rustfuzz.query: Fluent SearchQuery builder (.filter().sort().search().collect())
  • rustfuzz.utils: default_process

The BM25 Search Engines

rustfuzz.search implements several variants of the BM25 ranking function. The core differences:

  • BM25 (Okapi): The industry standard. Combines term-frequency saturation (each repeat of a term adds less than the one before, controlled by $k_1$) with document length normalization (controlled by $b$).
  • BM25L: Corrects BM25's over-penalization of long documents. Adds a constant shift delta to the length-normalized term frequency before saturation, so a matching term keeps a minimum baseline score even in very long documents where normalization would otherwise suppress it.
  • BM25Plus: Also sets a lower bound for any matching term, but adds the shift after term saturation. Often a good default for corpora with highly mixed document lengths.
  • BM25T: Uses an information-gain-based estimate to compute the saturation parameter $k_1$ per term instead of one global value. rustfuzz pre-computes these per-term values inside the inverted index.
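
The difference between the first three variants is easiest to see in the per-term score. The sketch below follows the commonly published formulas for BM25, BM25L, and BM25+. It is not taken from rustfuzz's source: the function names, the default values (k1=1.5, b=0.75, delta=0.5 and 1.0), and passing idf in as a plain number are all assumptions for illustration. BM25T is left out because its per-term $k_1$ estimation does not reduce to a one-line formula.

```python
def length_norm(doc_len, avg_len, b=0.75):
    """Document length normalization factor shared by all three variants."""
    return 1.0 - b + b * doc_len / avg_len


def bm25_term(tf, doc_len, avg_len, idf, k1=1.5, b=0.75):
    """Okapi BM25: saturating tf, damped by document length."""
    norm = length_norm(doc_len, avg_len, b)
    return idf * tf * (k1 + 1) / (tf + k1 * norm)


def bm25l_term(tf, doc_len, avg_len, idf, k1=1.5, b=0.75, delta=0.5):
    """BM25L: shift the length-normalized tf by delta before saturation."""
    c = tf / length_norm(doc_len, avg_len, b)
    return idf * (k1 + 1) * (c + delta) / (k1 + c + delta)


def bm25plus_term(tf, doc_len, avg_len, idf, k1=1.5, b=0.75, delta=1.0):
    """BM25+: add idf * delta after saturation, for matching terms only."""
    return bm25_term(tf, doc_len, avg_len, idf, k1, b) + idf * delta


# One occurrence of a term in a document 1000x longer than average.
args = dict(tf=1, doc_len=100_000, avg_len=100, idf=1.0)
for fn in (bm25_term, bm25l_term, bm25plus_term):
    print(fn.__name__, round(fn(**args), 4))
```

With these inputs the plain BM25 contribution is close to zero, while BM25L and BM25+ stay above their respective lower bounds. That is the long-document behaviour the list above describes.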

You can see an end-to-end benchmark comparison of these algorithms on the BEIR SciFact dataset in examples/bench_retrieval.py.

Documentation

Full cookbook with interactive examples and benchmark results: 👉 bmsuisse.github.io/rustfuzz

License

MIT © BM Suisse

Download files

Source Distribution

  • rustfuzz-0.1.21.tar.gz (31.1 MB): source

Built Distributions

  • rustfuzz-0.1.21-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (70.8 MB): PyPy, manylinux (glibc 2.17+), x86-64
  • rustfuzz-0.1.21-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (70.6 MB): PyPy, manylinux (glibc 2.17+), ARM64
  • rustfuzz-0.1.21-cp310-abi3-win_amd64.whl (70.4 MB): CPython 3.10+, Windows x86-64
  • rustfuzz-0.1.21-cp310-abi3-musllinux_1_2_x86_64.whl (71.0 MB): CPython 3.10+, musllinux (musl 1.2+), x86-64
  • rustfuzz-0.1.21-cp310-abi3-musllinux_1_2_aarch64.whl (70.8 MB): CPython 3.10+, musllinux (musl 1.2+), ARM64
  • rustfuzz-0.1.21-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (70.8 MB): CPython 3.10+, manylinux (glibc 2.17+), x86-64
  • rustfuzz-0.1.21-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (70.6 MB): CPython 3.10+, manylinux (glibc 2.17+), ARM64
  • rustfuzz-0.1.21-cp310-abi3-macosx_11_0_arm64.whl (71.0 MB): CPython 3.10+, macOS 11.0+ ARM64
  • rustfuzz-0.1.21-cp310-abi3-macosx_10_12_x86_64.whl (70.6 MB): CPython 3.10+, macOS 10.12+ x86-64
