Collection facades with built-in fuzzy lookup powered by RapidFuzz
rapidfuzz-collections
rapidfuzz-collections provides collection facades that keep Python's builtin collection behavior while adding fuzzy lookup powered by RapidFuzz.
Use it when your data naturally belongs in a list, tuple, set, or dict, but you also need typo-tolerant lookup over the stored values or mapping keys.
Contents
- What This Library Adds
- Installation
- Quick Start
- Choosing a Collection
- Practical Examples
- Runnable Examples
- Lookup Model
- Index Strategies
- Configuration
- Batch Lookup and cdist
- Mutation and Rebuilds
- Result Objects
- Public Method Reference
- Performance Guidance
- Advanced Index APIs
- Design Boundaries
- Development Checks
- Third-party Example Datasets
- License
For index-strategy rationale, historical benchmark investigations, and reproducible numbers behind the guidance in this README, see benchmarks/DESIGN.md.
What This Library Adds
RapidFuzz already provides the fuzzy matching algorithms. This library does not replace RapidFuzz and does not reimplement its scorers.
RapidFuzz provides:
- string similarity and distance scorers, such as WRatio, ratio, and Levenshtein distance;
- high-performance extraction utilities such as process.extractOne;
- matrix scoring utilities such as process.cdist;
- scorer-specific behavior, score cutoffs, score hints, and scorer kwargs.
rapidfuzz-collections adds:
- builtin-like collections that store your original values unchanged;
- cached normalized lookup choices for repeated fuzzy searches;
- exact-value registries and deterministic equal-score tie-breaking;
- mutation-aware index maintenance for mutable collections;
- consistent result objects for value and mapping lookups.
The boundary is deliberate:
- Choose RapidFuzz directly when you need raw string metrics, custom matrix scoring, or one-off matching between plain sequences.
- Choose rapidfuzz-collections when your data has collection semantics and fuzzy lookup is a repeated operation over that collection.
Official RapidFuzz resources:
- Documentation: https://rapidfuzz.github.io/RapidFuzz/
- GitHub: https://github.com/rapidfuzz/RapidFuzz
Installation
pip install rapidfuzz-collections
Python 3.14 or later is required. RapidFuzz is installed as the runtime fuzzy matching dependency.
Install the optional cdist extra only if you plan to use the explicit bounded matrix batch methods:
pip install "rapidfuzz-collections[cdist]"
The cdist extra installs NumPy for the opt-in *_batch_cdist methods. The ordinary fuzzy lookup methods do not require NumPy.
Quick Start
from rapidfuzz_collections import FuzzyDict, FuzzyList
products = FuzzyList(["Alpha Phone", "Beta Tablet", "Gamma Watch"])
match = products.fuzzy_find_one("Alpa Phone")
print(match.value) # "Alpha Phone"
catalog = FuzzyDict({"Alpha Phone": 499, "Beta Tablet": 799})
price = catalog.fuzzy_get("beta tablt")
print(price) # 799
Misses return None, or a default for direct fuzzy_get helpers:
print(products.fuzzy_find_one("Coffee Grinder")) # None
print(catalog.fuzzy_get("Coffee Grinder", default=0)) # 0
Choosing a Collection
| Need | Collection |
|---|---|
| Ordered values, duplicates allowed, mutable | FuzzyList |
| Ordered values, duplicates allowed, immutable | FuzzyTuple |
| Unique values, mutable | FuzzySet |
| Unique values, immutable and hashable | FrozenFuzzySet |
| Key to value mapping, mutable | FuzzyDict |
| Key to value mapping, immutable | FrozenFuzzyDict |
Choose by data model first:
- Use a sequence when order, duplicates, or positions matter.
- Use a set when values are unique and the value itself is the result.
- Use a dict when a fuzzy-matched key should retrieve a payload.
- Use frozen collections for read-many reference data.
- Use mutable collections when values are changed after construction.
Then choose an index strategy only for dict/set facades. See Index Strategies.
Practical Examples
Command palette
A command palette is an ordered list of unique or repeated labels. The result value is the command label, and the position can still be useful for UI state.
from rapidfuzz_collections import FuzzyList
commands = FuzzyList([
    "Open Settings",
    "Open Recent File",
    "Toggle Sidebar",
    "Format Document",
])
match = commands.fuzzy_find_one("format doc")
if match is not None:
    print(match.value)
Product catalog lookup
A catalog often needs fuzzy lookup by product name while returning a price, identifier, or metadata record.
from rapidfuzz_collections import FuzzyDict
catalog = FuzzyDict({
    "Alpha Phone 128GB": {"sku": "AP-128", "price": 499},
    "Beta Tablet 11 inch": {"sku": "BT-11", "price": 799},
})
item = catalog.fuzzy_find_item("beta tab 11")
if item is not None:
    print(item.key, item.value["sku"])
Tag validation
A set is useful when the matched value itself is enough. Duplicate tags are ignored, and fuzzy containment is explicit.
from rapidfuzz_collections import FuzzySet
allowed_tags = FuzzySet(["python", "machine learning", "data science"])
if allowed_tags.fuzzy_contains("machne learnig"):
    print(allowed_tags.fuzzy_get("machne learnig"))
Immutable reference table
Frozen collections are appropriate for data that is loaded once and queried many times, such as a country lookup table or a command alias map.
from rapidfuzz_collections import FrozenFuzzyDict
countries = FrozenFuzzyDict({
    "United States": "US",
    "United Kingdom": "GB",
    "Georgia": "GE",
})
print(countries.fuzzy_get("Unted Stats")) # "US"
Custom normalizer for structured values
The collection stores original objects unchanged. The normalizer controls only what text is searched.
from rapidfuzz_collections import FuzzyList
def person_normalizer(person):
    if isinstance(person, dict):
        return f"{person['first']} {person['last']}".casefold()
    return None
people = FuzzyList(
    [
        {"first": "Ada", "last": "Lovelace", "id": 1},
        {"first": "Grace", "last": "Hopper", "id": 2},
    ],
    normalizer=person_normalizer,
)
match = people.fuzzy_find_one("grace hoppr")
print(match.value["id"]) # 2
Runnable Examples
The examples/ directory contains runnable scripts that demonstrate the library against real or inline data. Each script covers one collection class and one usage pattern.
Install the package in editable mode before running any script:
pip install -e .
python examples/<script>.py
See examples/README.md for the full list of scripts, setup instructions, and dataset licensing notes.
Lookup Model
Every top-one fuzzy lookup follows the same broad sequence:
- Normalize the query. If the normalizer returns None, return no fuzzy match.
- Score searchable candidates and apply the configured score cutoff.
- Rank accepted candidates by scorer quality: highest similarity or lowest distance.
- Among candidates with the same score, prefer a hashable stored value or key equal to the original query.
- Resolve any remaining tie by source position or insertion order.
Consequently, fuzzy_find_one(query) selects the same candidate as the first result of fuzzy_find_many(query, limit=1). The same rule is used by top-one retrieval and mutation methods such as fuzzy_get, fuzzy_discard, and fuzzy_remove.
For a compatible native RapidFuzz scorer, an exact candidate may return immediately when scorer metadata proves that its score is optimal. This is an optimization only; it does not change the ranking contract. Custom scorers without compatible metadata evaluate every searchable candidate needed to determine the best score.
from rapidfuzz_collections import FuzzyList
names = FuzzyList(["ALPHA", "alpha"], score_cutoff=0)
print(names.fuzzy_find_one("alpha").value) # "alpha"
print(names.fuzzy_find_many("alpha", limit=1)[0].value) # "alpha"
Both stored strings normalize to the same text and receive the same default score. The exact original value wins that score tie even though it appears later. For unhashable sequence values, no exact-value registry is available, so equal-score ties are resolved by source position.
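The ranking rule can be pictured with a small standalone sketch. It is not the library's implementation: toy_score is a hypothetical stand-in built on difflib rather than a RapidFuzz scorer, and find_one is an illustrative helper name. It only shows how a sort key of score, then exact equality, then position reproduces the tie-breaking described above.

```python
from difflib import SequenceMatcher


def toy_score(query: str, choice: str) -> float:
    """Stand-in similarity scorer (0-100); not a RapidFuzz scorer."""
    return 100.0 * SequenceMatcher(None, query, choice).ratio()


def find_one(values, query, score_cutoff=80.0):
    """Pick the best candidate: score first, exact equality second, position last."""
    normalized_query = query.strip().casefold()
    best_key = None
    best_value = None
    for position, value in enumerate(values):
        score = toy_score(normalized_query, value.strip().casefold())
        if score < score_cutoff:
            continue  # rejected by the cutoff
        # Higher score wins; on a tie an exact match wins; then the earlier position.
        key = (-score, value != query, position)
        if best_key is None or key < best_key:
            best_key, best_value = key, value
    return best_value


print(find_one(["ALPHA", "alpha"], "alpha"))  # alpha
print(find_one(["ALPHA", "Alpha"], "alpha"))  # ALPHA
```

In the second call neither stored value equals the query, so the tie falls through to source position.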
The original collection data is not replaced by normalized data. Normalized choices are cached beside the collection to avoid repeating normalization work on every query.
The lookup domain depends on collection type:
| Collection type | Fuzzy search domain |
|---|---|
| FuzzyList, FuzzyTuple | stored values |
| FuzzySet, FrozenFuzzySet | stored values |
| FuzzyDict, FrozenFuzzyDict | mapping keys |
Mapping value lookup is intentionally key-based. To fuzzy-search mapping values, store those values in a value collection or create a separate mapping whose keys are the searchable values.
Index Strategies
FuzzyDict, FuzzySet, FrozenFuzzyDict, and FrozenFuzzySet accept a strategy parameter:
from rapidfuzz_collections import FuzzyDict, IndexStrategy
catalog = FuzzyDict(
    {"Alpha Phone": 499, "Beta Tablet": 799},
    strategy=IndexStrategy.SEQUENCE,
)
IndexStrategy.SEQUENCE
SEQUENCE stores normalized choices in sequence order. It is the default because it is the safest general read-heavy baseline in current benchmarks.
Use it when:
- you do not know the workload shape yet;
- point lookups and ordinary batch lookups dominate;
- you need the explicit *_batch_cdist methods;
- you prefer the most predictable default.
IndexStrategy.KEYED
KEYED stores normalized choices keyed by each unique hashable value or key. It can reduce build cost or selected bulk mutation costs for dict/set domains, especially when normalized collisions are common. It also stores a canonical exact-value mapping so an equal-but-not-identical query returns the object actually held by the collection.
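The idea of a canonical exact-value mapping can be illustrated with a plain dict. This is only a sketch of the concept, assuming nothing about how the library stores it: the names stored and canonical are hypothetical.

```python
stored = [1, 2, 3]  # objects actually held by the collection
canonical = {value: value for value in stored}

query = 1.0  # equal to 1, but a different object of a different type
held = canonical[query]

print(held, type(held).__name__)  # 1 int
```

Because 1.0 == 1 and both hash alike, the lookup succeeds, and the mapping hands back the stored int rather than the float that was used as the query.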
Try it when:
- the collection is a dict or set facade;
- keys or values are unique and hashable;
- build cost or selected mutation paths matter;
- fuzzy discard/retain operations are common;
- local benchmarks show a keyed win for your data.
KEYED is not a universally faster mode. Keep SEQUENCE as the baseline for large read-heavy collections unless your own measurements say otherwise. For mutable collections, KEYED generally uses more memory than SEQUENCE. Frozen KEYED collections can still reduce both build cost and memory.
Both strategies return the same public result classes. Dict and set facades are position-free: Match.index and MappingMatch.index are always None for those facades.
Neither strategy is a universal winner across all workloads. For the benchmark rows and reasoning behind the guidance above, see Why SEQUENCE is the default strategy and Why KEYED still exists in benchmarks/DESIGN.md.
Configuration
All collection facades accept these keyword-only options:
normalizer = None
scorer = WRatio
scorer_kwargs = None
scorer_type = ScorerType.SIMILARITY
score_cutoff = 80
score_hint = None
strategy = IndexStrategy.SEQUENCE # dict/set facades only
Normalization
The normalizer converts a stored object or query into searchable text. Return None to exclude a value or query from every fuzzy method, including contains, find-one, find-many, count, and fuzzy mutation methods. The original value remains available through ordinary exact collection operations such as in, indexing, or mapping lookup.
The default normalizer:
- accepts strings only;
- strips leading and trailing whitespace;
- applies casefold();
- excludes strings shorter than three characters.
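As a rough picture of these rules, the following function mirrors the four bullets in plain Python. It is a hedged approximation, not the library's built-in callable: the name default_like_normalizer is hypothetical, and checking the length after stripping is an assumption.

```python
def default_like_normalizer(value):
    """Approximate the documented default rules; not the library's own callable."""
    if not isinstance(value, str):
        return None  # only strings are searchable
    text = value.strip().casefold()
    # Assumption: the three-character minimum is checked after stripping.
    if len(text) < 3:
        return None
    return text


print(default_like_normalizer("  Alpha Phone "))  # alpha phone
print(default_like_normalizer("ab"))              # None
print(default_like_normalizer(42))                # None
```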
When normalizer=None, indexes use an optimized built-in callable with behavior equivalent to Normalizer.default(). A Normalizer instance can be supplied directly when a custom pipeline is needed. Treat any supplied normalizer as immutable after constructing a collection or index: stored choices are normalized and cached during index maintenance, so later mutation of the callable could make query normalization inconsistent with those cached values.
Normalizer builder methods mutate the instance by appending operations. Do not keep configuring an instance after passing it to a collection:
from rapidfuzz_collections import FuzzyList, Normalizer
normalizer = Normalizer().isinstance_str().strip()
products = FuzzyList([" Keyboard ", " Mouse "], normalizer=normalizer)
# Unsafe: cached choices used the old pipeline, while later queries use the
# mutated pipeline.
normalizer.casefold()
Instead, complete the pipeline first and then treat the callable as immutable:
normalizer = Normalizer().isinstance_str().strip().casefold()
products = FuzzyList([" Keyboard ", " Mouse "], normalizer=normalizer)
# Do not mutate normalizer after this point.
The same rule applies to any custom mutable or stateful callable. The caller is responsible for keeping its behavior stable for the lifetime of the collection or index. To change normalization rules, construct a new configured collection, for example with with_config(normalizer=...), instead of mutating the callable already in use.
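The hazard is easy to reproduce without the library. In this sketch, Pipeline is a hypothetical mutable callable and an exact dict lookup stands in for fuzzy scoring; the point is only that cached choices and later queries stop agreeing once the callable changes.

```python
class Pipeline:
    """Hypothetical mutable normalizer; steps can be appended at any time."""

    def __init__(self):
        self.steps = []

    def add(self, step):
        self.steps.append(step)
        return self

    def __call__(self, text):
        for step in self.steps:
            text = step(text)
        return text


normalize = Pipeline().add(str.strip)

# Stored choices are normalized once and cached.
cached = {normalize(value): value for value in [" Keyboard ", " Mouse "]}

before = cached.get(normalize("Keyboard"))  # query and cache agree
normalize.add(str.casefold)                 # pipeline mutated after caching
after = cached.get(normalize("Keyboard"))   # query is now "keyboard"

print(repr(before))  # ' Keyboard '
print(repr(after))   # None
```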
Scorers
Native RapidFuzz scorers use its optimized process path. Custom scorers are also supported and are called directly, so they do not need to accept RapidFuzz's internal keyword arguments. ScorerType determines their ordering and cutoff semantics. Use ScorerType.SIMILARITY when higher scores are better, and ScorerType.DISTANCE when lower scores are better. Pass the enum member itself; raw values such as 0, 1, or "distance" are rejected instead of being interpreted implicitly.
An exact candidate returns immediately only when native RapidFuzz metadata confirms both the configured scorer direction and that candidate's optimal score. Otherwise, the lookup evaluates the candidates required to determine the best score. Exact equality then breaks equal-score ties; it never replaces a better scorer result. A custom scorer without compatible metadata evaluates every searchable candidate, so a non-exact value with a better custom score still wins.
from rapidfuzz.distance import Levenshtein
from rapidfuzz_collections import FuzzyList, ScorerType
words = FuzzyList(
    ["kitten", "sitting", "mitten"],
    scorer=Levenshtein.distance,
    scorer_type=ScorerType.DISTANCE,
    score_cutoff=2,
)
Score cutoffs
score_cutoff controls which candidates are accepted:
- for similarity scorers, candidates below the cutoff are rejected;
- for distance scorers, candidates above the cutoff are rejected;
- None disables cutoff filtering.
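The three rules reduce to a single predicate. The helper below is an illustrative sketch with a hypothetical name and signature; it treats a score equal to the cutoff as accepted, which follows from the wording that only candidates below (or above) the cutoff are rejected.

```python
def passes_cutoff(score, score_cutoff, higher_is_better):
    """Return True when a candidate score survives cutoff filtering."""
    if score_cutoff is None:
        return True  # filtering disabled
    if higher_is_better:
        return score >= score_cutoff  # similarity: reject scores below the cutoff
    return score <= score_cutoff      # distance: reject scores above the cutoff


print(passes_cutoff(85, 80, higher_is_better=True))     # True
print(passes_cutoff(3, 2, higher_is_better=False))      # False
print(passes_cutoff(12, None, higher_is_better=False))  # True
```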
Score hints
score_hint is forwarded to RapidFuzz as an expected score. It can help RapidFuzz choose an internal implementation path, but it does not change the semantic result. Leave it as None unless you have measured your workload.
Scorer kwargs
Use scorer_kwargs for scorer-specific options:
from rapidfuzz.distance import Levenshtein
from rapidfuzz_collections import FuzzyList, ScorerType
values = FuzzyList(
    ["kitten", "sitting"],
    scorer=Levenshtein.distance,
    scorer_type=ScorerType.DISTANCE,
    scorer_kwargs={"weights": (1, 1, 2)},
    score_cutoff=None,
)
with_config
with_config(...) returns a new collection over the same logical data with selected fuzzy options changed. The source collection is not mutated.
strict = FuzzyList(["Alpha Phone", "Beta Tablet"], score_cutoff=95)
permissive = strict.with_config(score_cutoff=60)
Per-query overrides
Every fuzzy lookup method also accepts scorer, scorer_kwargs, scorer_type, score_cutoff, and score_hint as keyword-only arguments. Passing one of them overrides the collection's default for that single call only; the collection itself is not changed, and the collection's own defaults still apply to every other call:
products = FuzzyList(["Alpha Phone", "Beta Tablet"], score_cutoff=90)
products.fuzzy_find_one("Alpa Phone") # uses score_cutoff=90
products.fuzzy_find_one("Alpa Phone", score_cutoff=60) # uses score_cutoff=60, just for this call
This is a lighter-weight alternative to with_config(...) when only a single query needs different matching behavior, since it avoids building a second collection. Omit an argument to keep using the collection's default; passing None for scorer_kwargs or score_cutoff is a meaningful value (no extra scorer kwargs / no cutoff filtering), not the same as omitting it.
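One common way to distinguish "omitted" from "explicitly None" is a private sentinel default. The sketch below shows that pattern only; how the library detects an omitted argument is not stated here, and _UNSET and effective_cutoff are hypothetical names.

```python
_UNSET = object()  # hypothetical sentinel meaning "argument was not passed"


def effective_cutoff(collection_default, score_cutoff=_UNSET):
    """Return the cutoff that applies to one call."""
    if score_cutoff is _UNSET:
        return collection_default  # omitted: keep the collection's default
    return score_cutoff            # passed explicitly, including None


print(effective_cutoff(90))                     # 90
print(effective_cutoff(90, score_cutoff=60))    # 60
print(effective_cutoff(90, score_cutoff=None))  # None
```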
When scorer is overridden without scorer_type, the score direction is inferred from compatible RapidFuzz metadata. Custom scorers without that metadata must provide scorer_type explicitly; otherwise the query raises ValueError instead of risking reversed ranking or cutoff semantics.
normalizer and strategy cannot be overridden per query because they affect how the index itself is built and searched. They remain fixed for the lifetime of a collection. Change them through with_config(...), which builds a new index, or by constructing a new collection.
Batch Lookup and cdist
Use ordinary batch methods first:
products = FuzzyList(["Alpha Phone", "Beta Tablet", "Gamma Watch"])
matches = products.fuzzy_find_one_batch([
    "Alpa Phone",
    "Bta Tablet",
    "Missing",
])
Batch methods preserve query order. Top-one and direct retrieval methods return one result per query; multi-match methods return one result list per query. For each query, the same ranking order is applied: scorer quality, exact equality, then collection order.
Ordinary batch methods
| Collection family | Top-one batch method | Many-match batch method |
|---|---|---|
| Sequence | fuzzy_find_one_batch | fuzzy_find_many_batch |
| Set | fuzzy_find_one_batch | fuzzy_find_many_batch |
| Mapping | fuzzy_find_key_batch, fuzzy_find_item_batch | fuzzy_find_keys_batch, fuzzy_find_items_batch |
Explicit cdist methods
The *_batch_cdist methods are advanced opt-in methods. They compute bounded query-by-choice matrix chunks using RapidFuzz process.cdist, immediately reduce each query to its top-one result, and return the same semantic result as the ordinary top-one batch methods.
They are useful only after measurement on your workload. The ordinary batch methods are the default because RapidFuzz extractOne can prune candidate scoring as it finds strong matches, while cdist computes all pairs in each matrix chunk.
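The shape of the matrix approach can be sketched in pure Python. This is a conceptual illustration, not the library's code and not a call to process.cdist: toy_score is a difflib-based stand-in scorer, and top_one_via_matrix and chunk_size are hypothetical names. Every query/choice pair in a chunk is scored before the row is reduced to its best entry, which is the cost the paragraph above contrasts with early pruning.

```python
from difflib import SequenceMatcher


def toy_score(query: str, choice: str) -> float:
    """Stand-in similarity scorer (0-100); not a RapidFuzz scorer."""
    return 100.0 * SequenceMatcher(None, query, choice).ratio()


def top_one_via_matrix(queries, choices, score_cutoff=80.0, chunk_size=2):
    """Score all pairs chunk by chunk, then keep only the best choice per query.

    Assumes choices is non-empty.
    """
    results = []
    for start in range(0, len(queries), chunk_size):
        chunk = queries[start:start + chunk_size]
        # One bounded matrix: len(chunk) rows by len(choices) columns.
        matrix = [[toy_score(q, c) for c in choices] for q in chunk]
        for row in matrix:
            # max() returns the first maximum, so earlier choices win ties.
            best = max(range(len(row)), key=row.__getitem__)
            results.append(choices[best] if row[best] >= score_cutoff else None)
    return results


choices = ["alpha phone", "beta tablet", "gamma watch"]
queries = ["alpa phone", "bta tablet", "missing"]
print(top_one_via_matrix(queries, choices))
# ['alpha phone', 'beta tablet', None]
```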
| Collection family | cdist method | Result meaning |
|---|---|---|
| Sequence | fuzzy_find_one_batch_cdist | best value match per query |
| Set | fuzzy_find_one_batch_cdist | best value match per query |
| Mapping | fuzzy_find_key_batch_cdist | best key match per query |
| Mapping | fuzzy_find_item_batch_cdist | best key/value match per query |
| Standalone sequence indexes | find_one_batch_cdist | best indexed value match per query |
Requirements and limits:
- install rapidfuzz-collections[cdist];
- use IndexStrategy.SEQUENCE for dict/set facades; IndexStrategy.KEYED raises NotImplementedError;
- custom scorers are adapted for RapidFuzz matrix calls while receiving only the explicitly configured scorer_kwargs;
- use RapidFuzz directly if you need the full score matrix.
Mutation and Rebuilds
Mutable collections keep an internal fuzzy index synchronized with exact collection storage.
Top-one fuzzy mutations use the same deterministic resolver as top-one reads: scorer quality is primary, exact equality breaks equal-score ties, and source or insertion order breaks remaining ties. Thus fuzzy_discard(query) removes the value that fuzzy_find_one(query) would return, while fuzzy_discard_all and fuzzy_retain_all operate on every candidate that passes the score cutoff.
Some mutations can update the index incrementally. Other mutations mark the index dirty, and the next fuzzy query rebuilds derived lookup state once.
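The incremental-versus-dirty distinction can be pictured with a toy class. LazyIndexedList is a hypothetical illustration, not the library's index: appends extend the derived cache in place, while positional inserts only set a flag, and the next read rebuilds the cache once regardless of how many inserts preceded it.

```python
class LazyIndexedList:
    """Toy list with a derived lookup cache; not the library's implementation."""

    def __init__(self, values=()):
        self._values = list(values)
        self._normalized = [v.casefold() for v in self._values]
        self._dirty = False
        self.rebuilds = 0  # counts full rebuilds, for illustration only

    def append(self, value):
        self._values.append(value)
        if not self._dirty:
            self._normalized.append(value.casefold())  # incremental update

    def insert(self, position, value):
        self._values.insert(position, value)
        self._dirty = True  # positions shifted: defer the work

    def normalized_choices(self):
        if self._dirty:
            self._normalized = [v.casefold() for v in self._values]
            self._dirty = False
            self.rebuilds += 1
        return self._normalized


items = LazyIndexedList(["Alpha"])
items.append("Beta")       # cheap: cache extended in place
items.insert(0, "Gamma")   # marks the cache dirty
items.insert(1, "Delta")   # still dirty, no extra work yet
print(items.normalized_choices())  # ['gamma', 'delta', 'alpha', 'beta']
print(items.rebuilds)              # 1
```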
Practical rules:
- Appending to FuzzyList is cheap.
- Adding to FuzzySet is cheap when the value is new.
- Updating an existing FuzzyDict value does not change the fuzzy key index.
- Positional insertions, replacements, and large complex deletions are more likely to require a rebuild.
For measured rebuild cost after incremental deletion, see Exact shortcuts after incremental deletion in benchmarks/DESIGN.md.
Result Objects
Match[T]
Returned by value collections, set collections, mapping key methods, and standalone sequence indexes.
| Field | Meaning |
|---|---|
| value | original matched value or key |
| score | RapidFuzz scorer output |
| index | source position, or None for dict/set facades |
| query | original query object |
| normalized_query | normalized query text |
| normalized_value | normalized matched value/key text |
MappingMatch[K, V]
Returned by mapping item methods.
| Field | Meaning |
|---|---|
| key | original matched mapping key |
| value | payload stored under that key |
| score | RapidFuzz scorer output |
| index | always None for mapping facades |
| query | original query object |
| normalized_query | normalized query text |
| normalized_key | normalized matched key text |
Scores are scorer-dependent. For ScorerType.SIMILARITY, higher is better. For ScorerType.DISTANCE, lower is better.
ValueMatch[T] and KeyValueMatch[K, V]
Returned by the standalone ImmutableFuzzyKeyedIndex/MutableFuzzyKeyedIndex classes described in Advanced Index APIs. Keyed indexes do not track sequence positions, so these result types have no index field at all, rather than index=None. Collection facades built on a keyed index adapt these results to Match/MappingMatch with index=None.
ValueMatch[T]:
| Field | Meaning |
|---|---|
| value | original collection value |
| score | RapidFuzz scorer output |
| query | original query object |
| normalized_query | normalized query text |
| normalized_value | normalized form of matched value |
KeyValueMatch[K, V]:
| Field | Meaning |
|---|---|
| key | original matched mapping key |
| value | payload stored under that key |
| score | RapidFuzz scorer output |
| query | original query object |
| normalized_query | normalized query text |
| normalized_key | normalized form of matched key |
Public Method Reference
| Method | FuzzyList | FuzzyTuple | FuzzySet | FrozenFuzzySet | FuzzyDict | FrozenFuzzyDict | Notes |
|---|---|---|---|---|---|---|---|
| fuzzy_find_one(query) | yes | yes | yes | yes | no | no | best value match |
| fuzzy_find_many(query, limit=5) | yes | yes | yes | yes | no | no | best value matches |
| fuzzy_find_index(query) | yes | yes | no | no | no | no | source index of the best value match |
| fuzzy_count(query) | yes | yes | no | no | no | no | number of matching sequence values |
| fuzzy_get(query, default=None) | yes | yes | yes | yes | yes | yes | direct value/payload retrieval |
| fuzzy_get_batch(queries, default=None) | yes | yes | yes | yes | yes | yes | direct batch value/payload retrieval |
| fuzzy_contains(query) | yes | yes | yes | yes | no | no | fuzzy value containment |
| fuzzy_contains_key(query) | no | no | no | no | yes | yes | fuzzy key containment |
| fuzzy_find_key(query) | no | no | no | no | yes | yes | best key match |
| fuzzy_find_item(query) | no | no | no | no | yes | yes | best key/value match |
| fuzzy_find_keys(query, limit=5) | no | no | no | no | yes | yes | best key matches |
| fuzzy_find_items(query, limit=5) | no | no | no | no | yes | yes | best key/value matches |
| fuzzy_find_one_batch(queries) | yes | yes | yes | yes | no | no | ordinary top-one batch |
| fuzzy_find_many_batch(queries, limit=5) | yes | yes | yes | yes | no | no | ordinary many-match batch |
| fuzzy_find_key_batch(queries) | no | no | no | no | yes | yes | ordinary key batch |
| fuzzy_find_item_batch(queries) | no | no | no | no | yes | yes | ordinary item batch |
| fuzzy_find_keys_batch(queries, limit=5) | no | no | no | no | yes | yes | ordinary many-key batch |
| fuzzy_find_items_batch(queries, limit=5) | no | no | no | no | yes | yes | ordinary many-item batch |
| fuzzy_find_one_batch_cdist(queries) | yes | yes | yes | yes | no | no | advanced top-one batch, sequence strategy only |
| fuzzy_find_key_batch_cdist(queries) | no | no | no | no | yes | yes | advanced key batch, sequence strategy only |
| fuzzy_find_item_batch_cdist(queries) | no | no | no | no | yes | yes | advanced item batch, sequence strategy only |
| fuzzy_score_all(query) | yes | yes | yes | yes | yes | yes | one score slot per stored item/key |
| fuzzy_iter_scores(query) | yes | yes | yes | yes | yes | yes | streaming score slots |
| fuzzy_discard(query) | yes | no | yes | no | yes | no | remove best fuzzy match |
| fuzzy_remove(query) | yes | no | no | no | no | no | list-only remove with error on miss |
| fuzzy_discard_all(query) | yes | no | yes | no | yes | no | remove all fuzzy matches |
| fuzzy_retain_all(query) | yes | no | yes | no | yes | no | keep only fuzzy matches |
| with_config(**overrides) | yes | yes | yes | yes | yes | yes | return reconfigured collection |
| fromkeys(keys, value=None, **config) | no | no | no | no | yes | yes | mapping factory |
Performance Guidance
Start with the defaults:
- default normalizer;
- WRatio scorer;
- score_cutoff=80;
- IndexStrategy.SEQUENCE;
- ordinary batch methods instead of cdist.
Measure before switching:
- Try IndexStrategy.KEYED for dict/set workloads dominated by construction cost, normalized collisions, or selected bulk fuzzy mutations. Treat lower memory as a possible frozen-collection benefit, not a general KEYED property.
- Try *_batch_cdist only for large batch workloads where scorer choice and query distribution make full matrix scoring worthwhile.
- Try score_hint only when a specific scorer/data distribution benefits from it.
Do not assume that lower-level RapidFuzz APIs are always faster through the facades. Collection lookup includes exact-value tie resolution, normalization caches, result adaptation, and mutation state.
For the benchmark data and reasoning behind this guidance, see Practical strategy rules in benchmarks/DESIGN.md.
Advanced Index APIs
Most users should use collection facades. Standalone indexes are available for advanced users who already manage storage separately:
- FuzzySequenceIndex
- MutableFuzzySequenceIndex
- ImmutableFuzzyKeyedIndex
- MutableFuzzyKeyedIndex
Use standalone indexes when:
- you do not need a collection facade;
- you can keep exact storage and index storage synchronized yourself;
- you need lower-level access to index lookup behavior.
Do not expose a collection's internal index and mutate it separately. That would desynchronize the collection's exact storage from its fuzzy lookup state.
The keyed index classes return ValueMatch/KeyValueMatch results; see ValueMatch[T] and KeyValueMatch[K, V].
Design Boundaries
This library is intentionally not:
- a full-text search engine;
- a database index;
- a sublinear approximate nearest-neighbor index;
- a replacement for RapidFuzz scorers and distance functions;
- a general matrix-scoring wrapper around process.cdist;
- a compatibility layer for historical APIs.
It is a collection-oriented layer over RapidFuzz:
- store original data in familiar Python collection shapes;
- cache normalized lookup data;
- keep fuzzy lookup state synchronized with mutations;
- expose predictable fuzzy result objects.
When in doubt, first choose the collection that matches your data model. Then choose configuration and strategy based on measured workload behavior.
Development Checks
pip install -e ".[dev]"
python -m ruff format --check .
python -m ruff check .
python -m pytest -q
See tests/README.md for the test-suite structure and local coverage commands.
Third-party Example Datasets
The repository includes third-party example datasets under examples/data/ for documentation, examples, and local experimentation.
These datasets are not part of the Python package distribution and are not covered by this project's source code license. See examples/data/NOTICE.md for dataset sources, licenses, attribution, and modification notes.
License
This project is licensed under the MIT License. See LICENSE.
For the design rationale, benchmark methodology, and historical investigation findings behind the index-strategy and performance guidance in this README, see benchmarks/DESIGN.md.