skimmatch

skimmatch is an in-process fzf/skim-style fuzzy finder for Python, implemented in Rust.

It is designed for ranked abbreviation matching over a fixed list of candidate strings. You give it strings such as filenames, references, titles, symbols, or command labels; users type short abbreviation-style queries; skimmatch returns the best candidates, scores, and optional highlight positions.

from skimmatch import Matcher

candidates = [
    "Follmer and Schied, Stochastic Finance, 2011",
    "Mildenhall and Major, Pricing Insurance Risk",
    "Wang distortion risk measures",
    "Archive reference catalogue",
]

matcher = Matcher(candidates)

for result in matcher.search("wang distortion", limit=3):
    print(result)

Example result:

{
    "index": 2,
    "score": 260,
    "text": "Wang distortion risk measures",
    "matches": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
}

Scores come from the selected backend, and higher is better. Treat the numeric value as ranking information only, not as a metric that is stable across versions or backends.

What This Is

skimmatch solves the same broad problem as interactive fuzzy finders such as fzf and skim: finding good abbreviation matches quickly.

For example, a query like:

fs sf 2011

can match:

Follmer and Schied, Stochastic Finance, 2011

because the characters of each query token appear in the candidate in order, and at useful positions such as word starts.
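The matching condition can be illustrated with a deliberately simplified model: a token matches when its characters occur in the candidate in order (ignoring case, for simplicity), and a query matches when every whitespace-separated token does. This is only a sketch of the yes/no condition, not the scoring that the skimmatch backends perform, and both function names are hypothetical.

```python
def is_subsequence(token: str, candidate: str) -> bool:
    """Return True if token's characters occur in candidate in order (case-insensitive)."""
    remaining = iter(candidate.lower())
    # `ch in remaining` consumes the iterator up to the first hit,
    # so each character must be found after the previous one.
    return all(ch in remaining for ch in token.lower())


def matches_all_tokens(query: str, candidate: str) -> bool:
    """Return True if every whitespace-separated token is a subsequence of candidate."""
    return all(is_subsequence(token, candidate) for token in query.split())


candidate = "Follmer and Schied, Stochastic Finance, 2011"
print(matches_all_tokens("fs sf 2011", candidate))  # True
print(matches_all_tokens("fs sf 2012", candidate))  # False
```

A real backend goes further and scores each alignment, which is what lets it rank several matching candidates against each other.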

This is different from edit-distance fuzzy matching. Libraries such as RapidFuzz, Levenshtein, or token-ratio matchers are excellent for typo correction, deduplication, OCR cleanup, and record linkage. skimmatch is aimed at fast candidate selection, interactive search, and highlightable abbreviation matching.

Features

  • In-process Python extension: no external fzf executable required.
  • Rust matching backends using SkimMatcherV2, nucleo-matcher, and frizbee.
  • Preloaded candidate lists for fast repeated queries.
  • Single-token and multi-token search modes.
  • Optional highlight indices for UI rendering.
  • Legacy tuple-returning APIs for compatibility with the earlier rustfuzz shape.
  • Structured Matcher.search(...) API for new code.
  • A backend argument, so further backends can be added without changing the public matcher classes.

Installation

From PyPI:

pip install skimmatch

From a local checkout:

uv pip install -e .

or build with maturin:

uv run maturin develop

The current package metadata targets Python 3.13 or newer.

Quick Start

Use Matcher for new code.

from skimmatch import Matcher

candidates = [
    "Buhlmann, Mathematical Methods in Risk Theory",
    "Cramer, Collective Risk Theory",
    "Mildenhall and Major, Pricing Insurance Risk",
    "Kaas, Goovaerts, Dhaene, and Denuit, Modern Actuarial Risk Theory",
]

matcher = Matcher(candidates)
results = matcher.search("risk theory", limit=5)

for result in results:
    print(result["index"], result["score"], result["text"])

By default, search:

  • splits the query on whitespace;
  • requires every query token to match;
  • returns up to 20 results;
  • includes candidate text;
  • includes highlight positions.

Structured API

matcher = Matcher(candidates, backend="nucleo")  # or "skim" or "frizbee"
results = matcher.search(
    query,
    limit=20,
    highlights=True,
    include_text=True,
    multi=True,
)

Each result is a dictionary containing:

{
    "index": 0,         # original candidate index
    "score": 123,       # backend score, higher is better
    "text": "...",      # included when include_text=True
    "matches": [0, 3],  # included when highlights=True
}
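For type-checked code, the result shape could be described with a TypedDict. This is a sketch based only on the keys listed here: SearchResult is a hypothetical name that skimmatch itself may not export, and the int score type is an assumption drawn from the example values.

```python
from typing import TypedDict


class _RequiredFields(TypedDict):
    index: int  # original candidate index
    score: int  # backend score, higher is better (int is an assumption)


class SearchResult(_RequiredFields, total=False):
    text: str           # present when include_text=True
    matches: list[int]  # present when highlights=True


# A result as it might look with include_text=True and highlights=False.
result: SearchResult = {"index": 2, "score": 260, "text": "Wang distortion risk measures"}
print(result["index"], result.get("matches", []))
```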

Parameters

query

The search string. In multi-token mode, whitespace-separated tokens are matched independently and every token must match the candidate.

limit

The maximum number of results to return. limit=0 returns an empty list.

highlights

When true, results include matches, a sorted and deduplicated list of matched positions in the candidate string. With backend="frizbee" the list is currently always empty. Turn this off when you only need ranking; score-only matching does less work.
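A UI typically turns the position list into marked-up text. The sketch below wraps each run of consecutive matched positions in brackets. It assumes positions are character indices into the Python string, which is not stated explicitly here, and render_highlights is a hypothetical helper rather than part of skimmatch.

```python
def render_highlights(text: str, positions: list[int],
                      open_mark: str = "[", close_mark: str = "]") -> str:
    """Wrap each run of consecutive matched positions in open_mark/close_mark."""
    matched = set(positions)
    out = []
    inside = False  # True while we are within a highlighted run
    for i, ch in enumerate(text):
        if i in matched and not inside:
            out.append(open_mark)
            inside = True
        elif i not in matched and inside:
            out.append(close_mark)
            inside = False
        out.append(ch)
    if inside:  # close a run that reaches the end of the text
        out.append(close_mark)
    return "".join(out)


print(render_highlights("Pricing Insurance Risk", [0, 1, 2, 8, 9]))
# [Pri]cing [In]surance Risk
```

An empty position list returns the text unchanged, so the same helper can be used when highlights are unavailable.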

include_text

When true, each result includes the original candidate string. Turn this off if you already have the candidate list and want smaller result objects.
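With include_text=False, the index field is the link back to your own list. A minimal sketch, assuming results have the dictionary shape described above; the results list here is hand-written stand-in data, not real skimmatch output, and with_text is a hypothetical helper name.

```python
def with_text(results, candidates):
    """Pair each result's score with the candidate text looked up by index."""
    return [(r["score"], candidates[r["index"]]) for r in results]


candidates = ["Cramer, Collective Risk Theory", "Wang distortion risk measures"]
# Stand-in for matcher.search(query, include_text=False, highlights=False).
results = [{"index": 1, "score": 200}, {"index": 0, "score": 150}]

for score, text in with_text(results, candidates):
    print(score, text)
```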

multi

When true, the query is split on whitespace and all tokens are required. When false, the whole query is sent to the matcher as one pattern.

Legacy APIs

The package also exports compatibility classes with tuple return shapes:

from skimmatch import FuzzyMatcher, FuzzyMatcherMulti, FuzzyMatcherMultiHi

FuzzyMatcher

Treats the whole query as one pattern.

matcher = FuzzyMatcher(candidates)
indices, scores = matcher.query("sf", top_k=10)

FuzzyMatcherMulti

Splits the query on whitespace. Every token must match.

matcher = FuzzyMatcherMulti(candidates)
indices, scores = matcher.query("pricing insurance", top_k=10)

FuzzyMatcherMultiHi

Like FuzzyMatcherMulti, but also returns highlight positions.

matcher = FuzzyMatcherMultiHi(candidates)
indices, scores, highlights = matcher.query("pricing insurance", top_k=10)
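Code that is moving from the tuple-returning classes to the structured shape can convert one into the other. The sketch below assumes that indices and scores are parallel sequences of equal length and that highlights holds one list of positions per result; those details are assumptions, and legacy_to_results is a hypothetical helper.

```python
def legacy_to_results(indices, scores, highlights=None, candidates=None):
    """Convert parallel legacy sequences into a list of structured result dicts."""
    results = []
    for pos, (index, score) in enumerate(zip(indices, scores)):
        result = {"index": index, "score": score}
        if candidates is not None:
            result["text"] = candidates[index]
        if highlights is not None:
            # Assumed layout: highlights[pos] lists the positions for result `pos`.
            result["matches"] = list(highlights[pos])
        results.append(result)
    return results


candidates = ["Pricing Insurance Risk", "Collective Risk Theory"]
# Hand-written stand-in for FuzzyMatcherMultiHi(...).query(...) output.
indices, scores, highlights = [0], [180], [[0, 1, 2, 8, 9]]
print(legacy_to_results(indices, scores, highlights, candidates))
```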

Matching Behavior

The available backends are:

backend="skim"
backend="nucleo"
backend="frizbee"

backend="skim" uses SkimMatcherV2 from the Rust fuzzy-matcher crate and is kept for compatibility.

backend="nucleo", the default, uses nucleo-matcher, the lower-level matcher from the nucleo ecosystem. It is a modern fzf-like matcher and may rank candidates differently from skim. Scores are backend-specific and should not be compared between backends.

backend="frizbee" uses frizbee, a SIMD matcher with typo-resistant matching support. skimmatch currently runs it with typo tolerance disabled for a closer comparison with the other fzf-style backends. It matches against bytes, so highlight lists are intentionally empty for this backend until Unicode offset semantics are defined.

Scoring tends to favor:

  • characters appearing in order;
  • compact alignments;
  • word-boundary matches;
  • punctuation-separated and camel-case transitions;
  • early matches;
  • consecutive query-character matches;
  • candidates that match every query token in multi-token mode.

skimmatch returns candidates sorted by descending score. Ties are ordered by the original candidate index for deterministic output.
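The ordering rule can be written down as a sort key. This sketch reproduces the rule in plain Python over (index, score) pairs. It illustrates the documented ordering only and is not the Rust implementation; top_results is a hypothetical name.

```python
import heapq


def top_results(scored, limit):
    """Return the best `limit` (index, score) pairs: score descending, index ascending."""
    return heapq.nsmallest(limit, scored, key=lambda pair: (-pair[1], pair[0]))


scored = [(0, 90), (1, 120), (2, 90), (3, 120), (4, 40)]
print(top_results(scored, 3))  # [(1, 120), (3, 120), (0, 90)]
```

Because the index breaks ties, the same candidate list and query always produce the same order.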

When To Use It

skimmatch is a good fit for:

  • command palettes;
  • file pickers;
  • bibliography and reference search;
  • symbol search;
  • autocomplete over known labels;
  • terminal or web UI candidate selection;
  • fast repeated queries over a preloaded list.

It is probably not the right tool for:

  • typo correction;
  • deduplication;
  • record linkage;
  • token-sort similarity;
  • OCR cleanup;
  • semantic search;
  • embedding-based retrieval.

Those are useful problems, but they are different from fzf/skim-style abbreviation matching.

Performance Notes

Candidate strings are copied into Rust once when the matcher is constructed. Repeated calls to query or search scan that Rust-owned list and return only the final top results to Python.

For best performance:

  • construct one matcher and reuse it across queries;
  • set highlights=False when you only need indices and scores;
  • set include_text=False when you already have the candidate strings;
  • use limit to keep returned result objects small.
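To check how much these settings matter for your own data, a small timing harness is enough. The sketch below works with any object that has a search(query, **kwargs) method, so it can be pointed at a reused Matcher. mean_query_seconds is a hypothetical helper, and the usage lines in the comments assume skimmatch is installed.

```python
import time


def mean_query_seconds(matcher, queries, *, repeat=5, **search_kwargs):
    """Average wall-clock seconds per search call over `repeat` passes."""
    queries = list(queries)
    if not queries:
        raise ValueError("queries must not be empty")
    start = time.perf_counter()
    for _ in range(repeat):
        for query in queries:
            matcher.search(query, **search_kwargs)
    elapsed = time.perf_counter() - start
    return elapsed / (repeat * len(queries))


# Usage, assuming skimmatch is installed and `candidates` is your list:
#   from skimmatch import Matcher
#   matcher = Matcher(candidates)  # build once, reuse for every query
#   full = mean_query_seconds(matcher, ["risk theory"], limit=10)
#   lean = mean_query_seconds(matcher, ["risk theory"], limit=10,
#                             highlights=False, include_text=False)
```

Comparing the two numbers on a realistic candidate list shows whether turning off highlights and text is worth it in your setting.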

Development

This project is a Python package with a Rust extension built by maturin.

Run the tests:

uv run pytest tests/test_skimmatch.py -q

Check Rust formatting:

cargo fmt --check

Important files:

  • src/lib.rs: Rust/PyO3 extension implementation.
  • python/skimmatch/__init__.py: Python re-exports.
  • tests/test_skimmatch.py: API and behavior tests.
  • pyproject.toml: Python packaging and maturin configuration.
  • Cargo.toml: Rust crate configuration.

Backend Roadmap

The public API accepts a backend argument. Today "skim", "nucleo", and "frizbee" are implemented. frizbee is experimental and currently exposes score/ranking behavior without highlight positions.

Unknown backend names currently raise ValueError.
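Because an unknown backend name raises ValueError, calling code can try a preferred backend and fall back to another. A minimal sketch: build_with_fallback is a hypothetical helper, and factory stands for any callable with the Matcher(candidates, backend=...) signature shown earlier.

```python
def build_with_fallback(factory, candidates, backends=("nucleo", "skim")):
    """Try each backend name in order; return (matcher, backend_name)."""
    last_error = None
    for name in backends:
        try:
            return factory(candidates, backend=name), name
        except ValueError as exc:  # raised for unknown backend names
            last_error = exc
    raise ValueError(f"none of the backends {backends!r} are available") from last_error


# Usage, assuming skimmatch is installed:
#   from skimmatch import Matcher
#   matcher, backend = build_with_fallback(Matcher, candidates, ("frizbee", "nucleo"))
```

Returning the chosen name alongside the matcher lets the caller know, for example, whether highlight positions will be available.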

License

MIT.

