rapid fuzzy string matching

Project description


[!WARNING] 🚧 Under Heavy Construction

This library is actively being developed and APIs may change between releases. We're shipping fast — expect frequent updates, new features, and occasional breaking changes. Pin your version if stability matters to you: pip install rustfuzz==0.1.12


🤖 This project was built entirely by AI.

The idea was simple: could an AI agent beat RapidFuzz — one of the fastest fuzzy matching libraries in the world — by writing a Rust-backed Python library from scratch, guided only by benchmarks?

The development loop was: Research → Build → Benchmark → Repeat.


rustfuzz is a blazing-fast fuzzy string matching library for Python — implemented entirely in Rust. 🚀

Zero Python overhead. Memory safe. Pre-compiled wheels for every major platform.

The Challenge: Beat RapidFuzz

flowchart LR
    R["🔍 Research<br>Profiler output<br>& algorithm gaps"]
    B["🦀 Build<br>Rust implementation<br>via PyO3"]
    T["✅ Test<br>All tests must pass<br>before proceeding"]
    BM["📊 Benchmark<br>vs RapidFuzz<br>Numbers don't lie"]
    RP["🔁 Repeat<br>Find the next<br>bottleneck"]

    R --> B --> T --> BM --> RP --> R

    style R fill:#6366f1,color:#fff,stroke:none
    style B fill:#a855f7,color:#fff,stroke:none
    style T fill:#ef4444,color:#fff,stroke:none
    style BM fill:#22c55e,color:#fff,stroke:none
    style RP fill:#f59e0b,color:#fff,stroke:none

The goal: match or exceed RapidFuzz's throughput on ratio, partial_ratio, token_sort_ratio, and process.extract — all from Python. Each iteration starts with profiling, identifies the hottest path, and rewrites it deeper into Rust.

The Results: RustFuzz is Faster 🏆

We benchmarked process.extract on a 1,000,000 row corpus. Using Rayon parallelization, lock-free global threshold shrinking (AtomicU64), and native query token caching, rustfuzz finishes about 5% faster than rapidfuzz on ratio and about 10% faster on WRatio.

| Benchmark (1M rows) | RapidFuzz | RustFuzz (Parallel) |
| --- | --- | --- |
| Raw Characters (ratio) | 5506 ms | 5253 ms |
| Complex Tokens (WRatio) | 3032 ms | 2716 ms |

But that's not all. With the built-in BM25 Hybrid Pipeline, rustfuzz completes the same extraction task in 97 ms (a ~30x speedup over the WRatio timings above).
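The threshold shrinking mentioned above can be pictured with a single-threaded sketch: once the top-k heap is full, its weakest score becomes a cutoff, and any candidate whose cheap upper bound cannot beat the cutoff is skipped without running the full scorer. This is an illustration only, not rustfuzz's implementation. It uses Python's difflib as a stand-in scorer, and `top_k_with_cutoff` is a hypothetical helper name.

```python
import heapq
from difflib import SequenceMatcher


def top_k_with_cutoff(query, choices, k=3):
    """Return (matches, skipped): the k best (score, index, choice) tuples, best first.

    Hypothetical helper for illustration; difflib stands in for the real scorer.
    """
    heap = []      # min-heap of (score, -index, choice); heap[0] is the weakest kept match
    cutoff = 0.0   # rises as better matches are found
    skipped = 0    # candidates rejected by the cheap upper bound alone
    for index, choice in enumerate(choices):
        matcher = SequenceMatcher(None, query, choice)
        # real_quick_ratio() is a cheap upper bound on ratio(), so a candidate
        # whose bound cannot exceed the cutoff can never enter the heap.
        if len(heap) == k and matcher.real_quick_ratio() <= cutoff:
            skipped += 1
            continue
        score = matcher.ratio()
        if len(heap) < k:
            heapq.heappush(heap, (score, -index, choice))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, -index, choice))
        if len(heap) == k:
            cutoff = heap[0][0]
    ranked = sorted(heap, reverse=True)
    return [(score, -neg_index, choice) for score, neg_index, choice in ranked], skipped


choices = ["New York", "New Orleans", "Newark", "Los Angeles"]
matches, skipped = top_k_with_cutoff("New York", choices, k=2)
print(matches, skipped)
```

In a parallel version the cutoff would have to be shared between worker threads, which is presumably where the atomic value named above comes in.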

Features

| Feature | Description |
| --- | --- |
| Blazing Fast | Core algorithms written in Rust: no Python overhead, no GIL bottlenecks |
| 🧠 Smart Matching | Ratio, partial ratio, token sort/set, Levenshtein, Jaro-Winkler, and more |
| 🔒 Memory Safe | Rust's borrow checker guarantees: no segfaults, no buffer overflows |
| 🐍 Pythonic API | Clean, typed Python interface. Import and go |
| 📦 Zero Build Step | Pre-compiled wheels on PyPI for Python 3.10–3.14 on all major platforms |
| 🏔️ Big Data Ready | Excels in 1 Billion Row Challenge benchmarks, crushing high-throughput tasks |
| 🔍 3-Way Hybrid Search | BM25 + Fuzzy + Dense embeddings via RRF; 25 ms at 1M docs, all in Rust |
| 📄 Document Objects | First-class Document(content, metadata) + LangChain compatibility |
| 🧩 Ecosystem Integrations | BM25, Hybrid Search, and LangChain Retrievers for Vector DBs (Qdrant, LanceDB, FAISS, etc.) |

Installation

pip install rustfuzz
# or, with uv (recommended — much faster):
uv pip install rustfuzz

Quick Start

import rustfuzz.fuzz as fuzz
from rustfuzz.distance import Levenshtein

# Fuzzy ratio
print(fuzz.ratio("hello world", "hello wrold"))          # ~96.0

# Partial ratio (substring match)
print(fuzz.partial_ratio("hello", "say hello world"))    # 100.0

# Token-order-insensitive match
print(fuzz.token_sort_ratio("fuzzy wuzzy", "wuzzy fuzzy")) # 100.0

# Levenshtein distance
print(Levenshtein.distance("kitten", "sitting"))         # 3

# Normalised similarity [0.0 – 1.0]
print(Levenshtein.normalized_similarity("kitten", "kitten")) # 1.0
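
For reference, the distance of 3 between "kitten" and "sitting" can be reproduced with the textbook dynamic-programming recurrence. The sketch below is plain Python for illustration, not how rustfuzz computes it. `levenshtein` and `normalized_similarity` are hypothetical helper names, and dividing by the longer string's length is an assumption borrowed from the usual uniform-weight convention.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when every position must change (assumed convention)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))            # 3
print(normalized_similarity("kitten", "kitten"))   # 1.0
```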

Batch extraction

from rustfuzz import process

choices = ["New York", "New Orleans", "Newark", "Los Angeles"]
print(process.extractOne("new york", choices))
# ('New York', 100.0, 0)

print(process.extract("new", choices, limit=3))
# [('Newark', ...), ('New York', ...), ('New Orleans', ...)]

3-Way Hybrid Search (BM25 + Fuzzy + Dense)

from rustfuzz.search import Document, HybridSearch

# Create documents with metadata
docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple", "price": 1199}),
    Document("Samsung Galaxy S24 Ultra", {"brand": "Samsung", "price": 1299}),
    Document("Google Pixel 8 Pro", {"brand": "Google", "price": 699}),
]

# Optional: add dense embeddings for semantic search
embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

hs = HybridSearch(docs, embeddings=embeddings)

# Handles typos via fuzzy, keywords via BM25, meaning via dense — all in Rust
results = hs.search("appel iphon", query_embedding=[1.0, 0.0, 0.0], n=1)
text, score, meta = results[0]
print(f"{text} — ${meta['price']}")
# Apple iPhone 15 Pro Max 256GB — $1199

Also works with LangChain Document objects — no dependency required, auto-detected via duck-typing!
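
Reciprocal Rank Fusion (RRF), the method the feature list names for combining the three signals, can be sketched in a few lines: each ranker contributes 1 / (k + rank) for every document it returns, and the summed contributions decide the final order. The function name `rrf_fuse`, the constant k = 60 (a common default), and the example rankings are assumptions for illustration; rustfuzz's internal weighting may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several best-first rankings into one list of (doc_id, score) pairs.

    Hypothetical helper: k=60 is a commonly used constant, not a rustfuzz setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up rankings from three rankers over the same three documents.
bm25_ranking = ["iphone", "pixel", "galaxy"]
fuzzy_ranking = ["iphone", "galaxy", "pixel"]
dense_ranking = ["galaxy", "iphone", "pixel"]

fused = rrf_fuse([bm25_ranking, fuzzy_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])  # ['iphone', 'galaxy', 'pixel']
```

Because only ranks are used, the three scorers never need to be put on a common scale.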

With Real Embeddings (FastEmbed)

Use FastEmbed for lightweight, local, ONNX-based embeddings — no GPU needed:

from fastembed import TextEmbedding
from rustfuzz.search import Document, HybridSearch

model = TextEmbedding("BAAI/bge-small-en-v1.5")  # ~33 MB, CPU-only

docs = [
    Document("Apple iPhone 15 Pro Max 256GB", {"brand": "Apple"}),
    Document("Samsung Galaxy S24 Ultra",      {"brand": "Samsung"}),
    Document("Sony WH-1000XM5 Headphones",    {"brand": "Sony"}),
]

embeddings = [e.tolist() for e in model.embed([d.content for d in docs])]
hs = HybridSearch(docs, embeddings=embeddings)

query = "wireless noise cancelling headset"
query_emb = list(model.embed([query]))[0].tolist()

results = hs.search(query, query_embedding=query_emb, n=1)
text, score, meta = results[0]
print(f"{text} — {meta['brand']}")
# Sony WH-1000XM5 Headphones — Sony

Supported Algorithms

| Module | Algorithms |
| --- | --- |
| rustfuzz.fuzz | ratio, partial_ratio, token_sort_ratio, token_set_ratio, token_ratio, WRatio, QRatio, partial_token_* |
| rustfuzz.distance | Levenshtein, Hamming, Indel, Jaro, JaroWinkler, LCSseq, OSA, DamerauLevenshtein, Prefix, Postfix |
| rustfuzz.process | extract, extractOne, extract_iter, cdist |
| rustfuzz.search | BM25, BM25L, BM25Plus, BM25T, HybridSearch, Document |
| rustfuzz.utils | default_process |

The BM25 Search Engines

rustfuzz.search implements several variants of the BM25 ranking function. The core differences:

  • BM25 (Okapi): The industry standard. Applies term frequency saturation (each additional occurrence of a term adds less to the score, up to a ceiling set by $k_1$) and document length normalization.
  • BM25L: Corrects BM25's over-penalization of long documents. Adds a constant shift delta to the length-normalized term frequency before saturation, so a matching term keeps a minimum baseline score even in very long documents where normalization would otherwise suppress it.
  • BM25Plus: Also sets a lower bound for every matching term, but adds the shift after term saturation. Widely considered a good default for corpora with highly mixed document lengths.
  • BM25T: Uses Information Gain adjustments to compute the saturation limit $k_1$ separately for each term instead of using one global value. rustfuzz pre-computes these per-term limits inside the inverted index.
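
The difference between the first three variants comes down to where the shift delta enters the per-term weight. The sketch below follows the commonly published formulas rather than rustfuzz's source. The function names and the defaults k1 = 1.5, b = 0.75, delta = 0.5 (BM25L) and delta = 1.0 (BM25Plus) are assumptions for illustration, and the IDF factor that multiplies each weight is omitted.

```python
def length_norm(doc_len, avg_len, b=0.75):
    """Length normalization factor: 1.0 for an average-length document."""
    return 1.0 - b + b * doc_len / avg_len


def bm25_tf(tf, doc_len, avg_len, k1=1.5, b=0.75):
    """Okapi BM25 term weight: saturates towards k1 + 1 as tf grows."""
    return tf * (k1 + 1.0) / (tf + k1 * length_norm(doc_len, avg_len, b))


def bm25l_tf(tf, doc_len, avg_len, k1=1.5, b=0.75, delta=0.5):
    """BM25L: delta is added to the length-normalized tf *before* saturation."""
    c = tf / length_norm(doc_len, avg_len, b)
    return (k1 + 1.0) * (c + delta) / (k1 + c + delta)


def bm25plus_tf(tf, doc_len, avg_len, k1=1.5, b=0.75, delta=1.0):
    """BM25Plus: delta is added *after* saturation."""
    return bm25_tf(tf, doc_len, avg_len, k1, b) + delta


# Weights apply only to terms that occur in the document (tf > 0).
# Compare one occurrence in an average document and in a very long one.
for doc_len in (100, 100_000):
    weights = (
        bm25_tf(1, doc_len, 100),
        bm25l_tf(1, doc_len, 100),
        bm25plus_tf(1, doc_len, 100),
    )
    print(doc_len, [round(w, 4) for w in weights])
```

In the very long document the plain BM25 weight collapses towards zero, while BM25L and BM25Plus stay above their respective floors, which is the lower-bound behaviour described in the list.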

An end-to-end benchmark comparing these algorithms on the BEIR SciFact dataset is available in examples/bench_retrieval.py.

Documentation

Full cookbook with interactive examples and benchmark results: 👉 bmsuisse.github.io/rustfuzz

License

MIT © BM Suisse


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rustfuzz-0.1.13.tar.gz (31.0 MB)

Uploaded: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

rustfuzz-0.1.13-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (70.7 MB)

Uploaded: PyPy, manylinux: glibc 2.17+, x86-64

rustfuzz-0.1.13-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (70.5 MB)

Uploaded: PyPy, manylinux: glibc 2.17+, ARM64

rustfuzz-0.1.13-cp310-abi3-win_amd64.whl (70.3 MB)

Uploaded: CPython 3.10+, Windows x86-64

rustfuzz-0.1.13-cp310-abi3-musllinux_1_2_x86_64.whl (70.9 MB)

Uploaded: CPython 3.10+, musllinux: musl 1.2+, x86-64

rustfuzz-0.1.13-cp310-abi3-musllinux_1_2_aarch64.whl (70.7 MB)

Uploaded: CPython 3.10+, musllinux: musl 1.2+, ARM64

rustfuzz-0.1.13-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (70.7 MB)

Uploaded: CPython 3.10+, manylinux: glibc 2.17+, x86-64

rustfuzz-0.1.13-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (70.5 MB)

Uploaded: CPython 3.10+, manylinux: glibc 2.17+, ARM64

rustfuzz-0.1.13-cp310-abi3-macosx_11_0_arm64.whl (70.9 MB)

Uploaded: CPython 3.10+, macOS 11.0+, ARM64

rustfuzz-0.1.13-cp310-abi3-macosx_10_12_x86_64.whl (70.5 MB)

Uploaded: CPython 3.10+, macOS 10.12+, x86-64

File details

Details for the file rustfuzz-0.1.13.tar.gz.

File metadata

  • Download URL: rustfuzz-0.1.13.tar.gz
  • Upload date:
  • Size: 31.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rustfuzz-0.1.13.tar.gz
Algorithm Hash digest
SHA256 926d0740c762fa01ef2b198eb10fdb2796ec9e0659c98eda565b1a8a45b3b878
MD5 3e5c4f61c5990cf5408f966c0fd8c10e
BLAKE2b-256 b1cf76eba075bce548e96e4c9a1940fcb0e8a9f57a0490462bae633b72d33e7c

See more details on using hashes here.
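
A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch: `sha256_of` and `matches_expected` are hypothetical helpers, and the local path is assumed to be wherever the archive was saved.

```python
import hashlib

# SHA256 digest listed above for rustfuzz-0.1.13.tar.gz
EXPECTED_SHA256 = "926d0740c762fa01ef2b198eb10fdb2796ec9e0659c98eda565b1a8a45b3b878"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("rustfuzz-0.1.13.tar.gz")` should return True for an intact copy saved under that name in the current directory.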

Provenance

The following attestation bundles were made for rustfuzz-0.1.13.tar.gz:

Publisher: ci.yml on bmsuisse/rustfuzz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rustfuzz-0.1.13-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for rustfuzz-0.1.13-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5ae9fde0c8790a164827a043282a0d9a31385b9b60fed786cdd4a7cd1d533606
MD5 d541041ad7ce9ac98de53c15660c589c
BLAKE2b-256 6bca7d3f23ceefdac91e5a9325288b3cbb36e1463c3b494931c89cf0b360d19e

Provenance

The following attestation bundles were made for rustfuzz-0.1.13-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on bmsuisse/rustfuzz

File details

Details for the file rustfuzz-0.1.13-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for rustfuzz-0.1.13-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 6a6493feca66e2403441cd46eb6882f787e547ef3a589449af7483660a2f05a1
MD5 f22adb9701f85ec47857f6ab2af4788c
BLAKE2b-256 f56e446af0bed77a084cf6532459b05f12056aef9508ae9f818a9da262a78ff5

Provenance

The following attestation bundles were made for rustfuzz-0.1.13-pp311-pypy311_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: ci.yml on bmsuisse/rustfuzz

File details

Details for the file rustfuzz-0.1.13-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: rustfuzz-0.1.13-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 70.3 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rustfuzz-0.1.13-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 c7d4342db63c36ca82ec38a578447ef7a27b6d252ad0892ef35f64c5bbf3dab6
MD5 be89de37b1c55afe4d5d07a54e1bc73b
BLAKE2b-256 0a700165094cb529517327dde884a2260ab525e628d2cc38ec4d99047eb80ffc

Provenance

The following attestation bundles were made for rustfuzz-0.1.13-cp310-abi3-win_amd64.whl:

Publisher: ci.yml on bmsuisse/rustfuzz

File details

Details for the file rustfuzz-0.1.13-cp310-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for rustfuzz-0.1.13-cp310-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 8cdfcb4afdb2383d35a7a28aa2a431a6b00b6fa1380c17044d9a2cf0feb832f9
MD5 a238d818682c3b2873c77e130e2ffc3c
BLAKE2b-256 9f6ca30f612e697e47a98ae9d9026ba81c2f4450a422ab40cf6d7405a4ed5846

Provenance

The following attestation bundles were made for rustfuzz-0.1.13-cp310-abi3-musllinux_1_2_x86_64.whl:

Publisher: ci.yml on bmsuisse/rustfuzz

File details

Details for the file rustfuzz-0.1.13-cp310-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for rustfuzz-0.1.13-cp310-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 5182ebfa58627f094504bea10ed3efadfc2fb05d5020bb4bc3e45ccde2e38418
MD5 5dbf28c02e0f6fba8622a0f7677e06e8
BLAKE2b-256 f6d13cbbeaf9867c8d0a43bb75118ee1493d3720c4f71914f3b2b4ee3c7f4677

Provenance

The following attestation bundles were made for rustfuzz-0.1.13-cp310-abi3-musllinux_1_2_aarch64.whl:

Publisher: ci.yml on bmsuisse/rustfuzz

File details

Details for the file rustfuzz-0.1.13-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for rustfuzz-0.1.13-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 145eddfdfe47851df306386a71ce2d2d22b1cb8fc37679f1c5a8018d168aa3be
MD5 c3edfdd0d7789f5bdcdf0fa019b22b8e
BLAKE2b-256 9fc75bf2425e5982d3ab83ec12bf171c6006c5fa16fd7639ddb3591ef3fb96f1

Provenance

The following attestation bundles were made for rustfuzz-0.1.13-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: ci.yml on bmsuisse/rustfuzz

File details

Details for the file rustfuzz-0.1.13-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for rustfuzz-0.1.13-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 d860aaca58ebc2d1971540651e2ce2efbceaa347767b515accc012a5140e64b0
MD5 bbe2df5b10280269d5ebb679f72a6e30
BLAKE2b-256 44b379bfd542b86d6508d24add1fcf62b47de38b0884ef1e3b671a49e101b1fd

Provenance

The following attestation bundles were made for rustfuzz-0.1.13-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: ci.yml on bmsuisse/rustfuzz

File details

Details for the file rustfuzz-0.1.13-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for rustfuzz-0.1.13-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fba6acd3d739a5649d9a1c83eac3241dd6f8fdd5aac393ba9d57410bb15ab878
MD5 d67f133640ab6ae47409447466b30236
BLAKE2b-256 0637c232c79b83f69f6c5fb3e3bc04929d3dfa8dc36f92ed29b668969fde9d62

Provenance

The following attestation bundles were made for rustfuzz-0.1.13-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: ci.yml on bmsuisse/rustfuzz

File details

Details for the file rustfuzz-0.1.13-cp310-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for rustfuzz-0.1.13-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 25e97225d15a0c5c1f44190581bcdf9d360a9d6b49d28071fe074003155ae5c6
MD5 78c76c13fefd8ec97b04465fdf455ce9
BLAKE2b-256 95b0ea4e3bf6fd12e6deb031dabee488f45578294b66f920c162e1e4c923cdb7

Provenance

The following attestation bundles were made for rustfuzz-0.1.13-cp310-abi3-macosx_10_12_x86_64.whl:

Publisher: ci.yml on bmsuisse/rustfuzz
