
Composable Python search with per-field match strategies and a Q expression DSL


Srxy

Smart, composable search for Python — and your filesystem.

Pass any list of objects (dicts, dataclasses, Pydantic models) and find what you mean, not just what you typed. Fuzzy, phonetic, and composite matching out of the box. Search files by name or content from Python or the terminal.

pip install srxy

Why Srxy?

  • Magic search: one function call. Auto-discovers fields, blends matchers, ranks by score.
  • Field search + AND/OR: per-field strategies with a fluent Q DSL; combine conditions with & and |.
  • File search + CLI: search paths by file name and/or content. Same smart matching, plus a srxy command.

Magic search

The fastest path to good results. magic_search auto-discovers fields from your items, runs composite matching on each, and keeps the best score (OR semantics). Typos, phonetic near-misses, and partial matches are handled for you.

from srxy import magic_search

items = [
    {"name": "salt"},
    {"name": "salty"},
    {"name": "salad"},
]

# Match across specific fields
results = magic_search(items, "salat", fields=["name"])
print(results[0].item["name"])  # salad
print(results[0].score)

# Or search every discoverable field (default)
results = magic_search(items, "salat")

Works with dicts, dataclasses, and Pydantic models. Default threshold is 0.25; tune it when you need stricter or looser matches.
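
To make the "best score per item, then threshold" idea concrete, here is a minimal standalone sketch. It does not use Srxy: difflib.SequenceMatcher stands in for the composite matcher, and the names best_field_score and rank are hypothetical helpers invented for this illustration, so the scores will differ from what magic_search returns.

```python
from difflib import SequenceMatcher


def best_field_score(item, query, fields):
    """Score each field against the query and keep the best one (OR semantics)."""
    scores = [
        SequenceMatcher(None, query.lower(), str(item.get(field, "")).lower()).ratio()
        for field in fields
    ]
    return max(scores, default=0.0)


def rank(items, query, fields, threshold=0.25):
    """Return (score, item) pairs at or above the threshold, best first."""
    scored = [(best_field_score(item, query, fields), item) for item in items]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Illustrative data, not taken from the library's documentation
people = [
    {"name": "Alice", "role": "engineer"},
    {"name": "Bob", "role": "designer"},
]

ranked = rank(people, "enginer", fields=["name", "role"])
for score, person in ranked:
    print(f"{score:.2f}  {person['name']}")
```

Raising the threshold trims weaker matches from the tail of the list; lowering it lets more near-misses through.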


Field search with AND / OR

When you need precision, use search with the Q expression DSL. Pick a match strategy per field, then wire them together with boolean logic.

from srxy import search, Q, FieldConfig, MatchType

# OR — match if any field scores well
search(items, "salat", where=Q.composite("name") | Q.contains("tags"))

# AND — every branch must clear the threshold
search(items, "spatial", where=Q.all(Q.composite("name"), Q.exact("status")))

# Nested — (sku OR barcode) AND label
search(
    items,
    "ABC-123",
    where=Q.any(Q.exact("sku"), Q.exact("barcode")) & Q.exact("label"),
)

Boolean scoring: OR uses max(child scores), AND uses min(child scores).
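
The max/min rule can be sketched without the library. The classes below (Node, Leaf, AnyOf, AllOf) are hypothetical stand-ins written for this example, not Srxy's Q implementation; leaf scores are supplied by hand so that only the combination logic is shown.

```python
class Node:
    """Base class that gives every expression the & and | operators."""

    def __and__(self, other):
        return AllOf(self, other)

    def __or__(self, other):
        return AnyOf(self, other)


class Leaf(Node):
    """A single field condition with a precomputed score in [0, 1]."""

    def __init__(self, score):
        self.score = score

    def evaluate(self):
        return self.score


class AnyOf(Node):
    """OR: the branch with the best score wins."""

    def __init__(self, *children):
        self.children = children

    def evaluate(self):
        return max(child.evaluate() for child in self.children)


class AllOf(Node):
    """AND: the weakest branch decides the score."""

    def __init__(self, *children):
        self.children = children

    def evaluate(self):
        return min(child.evaluate() for child in self.children)


# (sku OR barcode) AND label, with hand-picked scores
sku, barcode, label = Leaf(0.0), Leaf(1.0), Leaf(0.6)
expression = (sku | barcode) & label
print(expression.evaluate())  # 0.6 = min(max(0.0, 1.0), 0.6)
```

Because AND takes the minimum, a single weak branch pulls the whole expression below the threshold, which is why every branch must clear it.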

Prefer explicit config over the DSL? Pass a list of FieldConfig instead:

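from srxy import search, FieldConfig, MatchType

# Illustrative data; any list of dicts, dataclasses, or Pydantic models works
people = [
    {"name": "Ada", "role": "engineer"},
    {"name": "Grace", "role": "designer"},
]

results = search(
    people,
    "engineer",
    fields=[
        FieldConfig("role", MatchType.EXACT, weight=2.0),
        FieldConfig("name", MatchType.CONTAINS, weight=1.0),
    ],
    threshold=0.5,
)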

File search

Search filesystem paths by file name, file content, or both — no ML required. Directories are walked recursively. By default, dot-prefixed hidden entries and noise folders (__pycache__, node_modules) are skipped. Content search scores each line and returns matching line numbers.

Supported content formats: plain text, .pdf, .docx, .xlsx, and .pptx (text extracted automatically).

from pathlib import Path
from srxy import magic_file_search

results = magic_file_search(Path("./src"), "registry", threshold=0.3)
for result in results:
    print(result.path, result.score, result.breakdown)
    for line in result.lines:
        print(f"  line {line.line_number}: {line.text}")

# Include hidden directories and files (e.g. .git)
results = magic_file_search(Path("."), "token", skip_hidden_folders=False)

# Include noise directories (e.g. __pycache__, node_modules)
results = magic_file_search(Path("."), "token", skip_noise_folders=False)

# Search everywhere — disable both skip flags
results = magic_file_search(
    Path("."),
    "token",
    skip_hidden_folders=False,
    skip_noise_folders=False,
)
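
For readers curious how the default skip rules and per-line matching might fit together, here is a rough standalone sketch. It is not Srxy's implementation: find_lines and NOISE_DIRS are hypothetical names, matching is a plain case-insensitive substring test rather than scored matching, and only UTF-8 text files are read.

```python
import os
from pathlib import Path

# Only the two noise folders named in the text above; the real list may be longer.
NOISE_DIRS = {"__pycache__", "node_modules"}


def find_lines(root, query, skip_hidden=True, skip_noise=True):
    """Yield (path, line_number, text) for every line containing the query."""
    needle = query.lower()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [
            d for d in dirnames
            if not (skip_hidden and d.startswith("."))
            and not (skip_noise and d in NOISE_DIRS)
        ]
        for name in filenames:
            if skip_hidden and name.startswith("."):
                continue
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: skip it
            for number, line in enumerate(text.splitlines(), start=1):
                if needle in line.lower():
                    yield path, number, line.strip()
```

Calling list(find_lines("./src", "registry")) collects every hit; passing skip_hidden=False and skip_noise=False mirrors the "search everywhere" call above.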

CLI

Search from the terminal after install:

# Search names and contents (grouped output)
srxy registry ./src

# Content only — shows line numbers
srxy revenue ./docs --content-only

# Flat, pipe-friendly output
srxy token ./src --format flat

# JSON for scripting
srxy budget . --json

# Search hidden directories and files (e.g. .git)
srxy token . --include-hidden

# Search noise directories (e.g. __pycache__, node_modules)
srxy token . --include-noise

# Search everywhere
srxy token . --include-hidden --include-noise

Options: --names-only, --content-only, --format, --json, --include-hidden, --include-noise, --threshold, --max-file-size, --max-line-matches, --semantic (opt-in ML).

Exit codes: 0 matches found, 1 no matches, 2 usage/path error.
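
The exit codes make the command easy to drive from a script. The sketch below assumes the srxy command is on PATH and uses only the flags and exit codes listed above; run_srxy and describe_exit are hypothetical helper names, and the shape of the command's output is deliberately not parsed because it is not documented here.

```python
import subprocess

# Exit codes as documented above
EXIT_MEANINGS = {
    0: "matches found",
    1: "no matches",
    2: "usage or path error",
}


def describe_exit(code):
    """Translate an srxy exit code into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_srxy(*args):
    """Run the srxy command and return (exit code, captured stdout)."""
    completed = subprocess.run(
        ["srxy", *args],
        capture_output=True,
        text=True,
        check=False,  # exit code 1 means "no matches", not a failure to raise on
    )
    return completed.returncode, completed.stdout
```

For example, code, output = run_srxy("token", "./src", "--format", "flat") followed by print(describe_exit(code)) reports whether anything matched before the output is processed further.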


Match types

Type        Behavior
EXACT       Case-insensitive full string equality
CONTAINS    Substring match
PARTIAL     Prefix or suffix match
FUZZY       Character-level similarity (rapidfuzz)
PHONETIC    Sounds-alike (metaphone, soundex, NYSIIS with graduated scoring)
SEMANTIC    Meaning similarity (optional; see below)
COMPOSITE   Weighted blend of available atomic matchers (default smart mode)
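
As a rough illustration of how the three simplest types differ, the functions below implement the table's descriptions in plain Python. They are a sketch, not Srxy's code: the all-or-nothing 1.0/0.0 scores and the case-insensitive handling of CONTAINS and PARTIAL are assumptions made for this example.

```python
def exact(query, value):
    """EXACT: case-insensitive equality of the full strings."""
    return 1.0 if query.casefold() == value.casefold() else 0.0


def contains(query, value):
    """CONTAINS: the query appears anywhere inside the value."""
    return 1.0 if query.casefold() in value.casefold() else 0.0


def partial(query, value):
    """PARTIAL: the value starts or ends with the query."""
    q, v = query.casefold(), value.casefold()
    return 1.0 if v.startswith(q) or v.endswith(q) else 0.0


for name, matcher in [("exact", exact), ("contains", contains), ("partial", partial)]:
    print(name, matcher("ala", "salad"), matcher("sal", "salad"), matcher("Salad", "salad"))
```

The query "ala" sits in the middle of "salad", so it satisfies only the substring test, while "sal" is a prefix and satisfies both the substring and the prefix-or-suffix tests.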

Default composite weights: fuzzy 35%, semantic 20%, partial 15%, phonetic 12%, contains 10%, exact 8%. When semantic is disabled, composite skips it and renormalizes the remaining weights. Override per field via composite_weights on Q.composite(...) or FieldConfig.
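
The renormalization step is simple arithmetic: drop the disabled matcher and divide each remaining weight by the new total. The sketch below uses the default weights quoted above; renormalize and blend are hypothetical helper names, and the weighted-sum blend is an assumption about how the composite score is formed.

```python
DEFAULT_WEIGHTS = {
    "fuzzy": 0.35,
    "semantic": 0.20,
    "partial": 0.15,
    "phonetic": 0.12,
    "contains": 0.10,
    "exact": 0.08,
}


def renormalize(weights, disabled=()):
    """Drop disabled matchers and rescale the rest so they sum to 1."""
    active = {name: w for name, w in weights.items() if name not in disabled}
    total = sum(active.values())
    if total <= 0:
        raise ValueError("at least one matcher must keep a positive weight")
    return {name: w / total for name, w in active.items()}


def blend(scores, weights):
    """Weighted sum of per-matcher scores; missing matchers count as 0."""
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())


without_semantic = renormalize(DEFAULT_WEIGHTS, disabled={"semantic"})
for name, weight in without_semantic.items():
    print(f"{name}: {weight:.4f}")
```

With semantic removed the remaining weights total 0.80, so fuzzy rises from 0.35 to 0.35 / 0.80 = 0.4375, partial to 0.1875, phonetic to 0.15, contains to 0.125, and exact to 0.10.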


Semantic matching (optional)

Semantic search is off by default. Opt in when you need meaning-based similarity:

export SRXY_SEMANTIC=1
pip install 'srxy[semantic]'

With SRXY_SEMANTIC=1, composite matching includes semantic similarity. Explicit Q.semantic(...) or MatchType.SEMANTIC raises a clear error if semantic is not enabled.
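
An opt-in gate of this kind is commonly implemented as a small environment check. The following is a sketch of that pattern only; SemanticDisabledError, semantic_enabled, and require_semantic are hypothetical names, and the error text is invented rather than copied from Srxy.

```python
import os


class SemanticDisabledError(RuntimeError):
    """Raised when semantic matching is requested but not enabled."""


def semantic_enabled(environ=None):
    """Return True when SRXY_SEMANTIC is set to 1."""
    env = os.environ if environ is None else environ
    return env.get("SRXY_SEMANTIC") == "1"


def require_semantic(environ=None):
    """Fail loudly, with setup instructions, if semantic matching is off."""
    if not semantic_enabled(environ):
        raise SemanticDisabledError(
            "Semantic matching is disabled. "
            "Run pip install 'srxy[semantic]' and set SRXY_SEMANTIC=1."
        )
```

Accepting the environment as a parameter keeps the check easy to test without mutating os.environ.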

Default model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (downloaded from Hugging Face on first use). For a local cache:

./scripts/download_semantic_model.sh
export SRXY_SEMANTIC_MODEL_PATH=~/.cache/srxy/semantic-model

Core dependencies (always installed): rapidfuzz and jellyfish (phonetic matching).


Development

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,semantic]"
./scripts/quality/checks.sh --fix
./scripts/quality/checks.sh

Quality gate: Ruff → ShellCheck/shfmt → basedpyright → pip-audit → build → pytest.

  • Local (./scripts/quality/checks.sh): runs all tests (unit + integration).
  • CI: runs only pytest -m unit (fast tests; no semantic model required).

Integration tests require pip install -e ".[semantic]" and SRXY_SEMANTIC=1 (set automatically in tests/integration/conftest.py):

pytest -m integration

Integration tests load a curated news-style corpus from tests/fixtures/search_corpus.json and measure top-k hit rates.
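
A top-k hit rate is the fraction of queries whose expected item appears among the first k results. The sketch below shows one way to compute it; hit_rate_at_k and the sample identifiers are hypothetical, and the actual schema of search_corpus.json is not assumed.

```python
def hit_rate_at_k(ranked_by_query, expected_by_query, k):
    """Fraction of queries whose expected id is within the top k results."""
    if not expected_by_query:
        raise ValueError("need at least one query with an expected result")
    hits = sum(
        1
        for query, expected in expected_by_query.items()
        if expected in ranked_by_query.get(query, [])[:k]
    )
    return hits / len(expected_by_query)


# Illustrative rankings: result ids in the order a search returned them
ranked = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
expected = {"q1": "b", "q2": "z"}

for k in (1, 2, 3):
    print(k, hit_rate_at_k(ranked, expected, k))
```

Here neither expected id is ranked first, one is within the top two, and both are within the top three, so the rates are 0.0, 0.5, and 1.0.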
