Ultra-fast file search and processing tool

qry

Ultra-fast file search and metadata extraction tool.

Features

  • Fast filesystem search with optional depth, date, and size filters
  • Optimized performance: caching, parallel processing, date-based directory pruning
  • CLI modes: search, interactive, batch, version
  • HTTP API (FastAPI) for JSON and HTML search responses
  • Metadata extraction for matched files (size, timestamps, content type)
  • Streaming results: Ctrl+C stops the search midway and outputs what was found so far
  • Smart directory exclusions: .git, .venv, __pycache__, dist, node_modules skipped by default
  • YAML, JSON, and paths output: machine-readable output for piping into other tools
  • Python API: import qry; qry.search(...) for use in other applications
  • Regex search: --regex flag for pattern matching in filenames and content
  • Size filtering: --min-size / --max-size with human-readable units (1k, 10MB, 1G)
  • Sort results: --sort name|size|date
  • Content preview: --preview shows the matching line with context for content search
  • Date filtering: --last-days, --after-date, --before-date for time-based searches
  • Parallel search: -w/--workers for multi-threaded directory processing
  • Priority-based search: searches important directories first (src/, tests/), then lower-priority ones (cache/, .git/)
  • Incremental results: shows results as they're found, with timeout-based priority fallback

Installation

Poetry (recommended)

poetry install --with dev

pip (minimal)

pip install -r requirements.txt

Quick start

# search in current directory (filename match, YAML output)
poetry run qry "invoice"

# search file contents with preview snippet
poetry run qry "def search" -c -P --path ./qry

# regex search, sorted by name
poetry run qry "\.py$" -r --sort name --path .

# pipe-friendly output for shell pipelines
poetry run qry "TODO" -c -o paths | xargs grep -n "FIXME"

# show version and engines
poetry run qry version

CLI usage

qry search [query ...] [-f] [-c] [-r] [-P] [--type EXT1,EXT2] [--scope PATH | --path PATH]
           [--depth N] [--limit N] [--workers N]
           [--last-days N] [--after-date YYYY-MM-DD] [--before-date YYYY-MM-DD]
           [--min-size SIZE] [--max-size SIZE]
           [--sort name|size|date] [--exclude DIR] [--no-exclude]
           [--output yaml|json|paths]

qry interactive
qry batch <input_file> [--output-file FILE] [--format text|json|csv] [--workers N]
qry version

Search mode flags

Flag    Long form    Searches
(none)               filename (default)
-f      --filename   filename only
-c      --content    file contents
-r      --regex      treat query as regular expression

Filtering flags

Flag                       Description
-t EXT                     Filter by file type (comma-separated)
-d N                       Max directory depth
-l N                       Limit results (0 = unlimited, default)
--last-days N              Files modified in last N days
--after-date YYYY-MM-DD    Files modified after date
--before-date YYYY-MM-DD   Files modified before date
-w N                       Workers for parallel search (default: 4)
--min-size SIZE            Minimum file size (e.g. 1k, 10MB, 1G)
--max-size SIZE            Maximum file size (e.g. 100k, 5MB)
-e DIR                     Exclude extra directory (repeatable, comma-separated)
--no-exclude               Disable all default exclusions
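
The SIZE values accepted by --min-size and --max-size are human-readable strings. As a rough illustration of how such strings can be turned into byte counts, here is a minimal sketch. The function name parse_size and the choice of binary multipliers (1k = 1024 bytes) are assumptions made for this example; qry's own parser may use different rules.

```python
import re

# Assumed multipliers: binary units (1k = 1024 bytes).
# qry itself may interpret units differently.
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
}

_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*$")


def parse_size(text: str) -> int:
    """Convert a size string such as '10MB' into a byte count (hypothetical helper)."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown size unit: {unit!r}") from None
    return int(float(number) * factor)
```

The resulting integers are in bytes, which is the unit the min_size and max_size parameters of the Python API expect.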

Output flags

Flag            Description
-o yaml         YAML output (default)
-o json         JSON output
-o paths        One path per line, pipe-friendly
-P, --preview   Show matching line with context (with -c)
--sort          Sort results by name, size, or date

Default excluded directories: .git .venv __pycache__ dist node_modules .tox .mypy_cache
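
To show what skipping these directories means in practice, the sketch below prunes them during a standard os.walk traversal. It is an illustration only, not qry's implementation: walk_files and DEFAULT_EXCLUDES are hypothetical names, and the set simply copies the list above.

```python
import os
from typing import Iterable, Iterator

# Copied from the default exclusion list above.
DEFAULT_EXCLUDES = frozenset(
    {".git", ".venv", "__pycache__", "dist", "node_modules", ".tox", ".mypy_cache"}
)


def walk_files(root: str, exclude_dirs: Iterable[str] = DEFAULT_EXCLUDES) -> Iterator[str]:
    """Yield file paths under root, never descending into excluded directory names."""
    excluded = set(exclude_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to enter those directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Passing an empty collection as exclude_dirs walks everything, which is roughly the effect --no-exclude is described as having.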

Priority-based search

When priority mode is enabled, directories are searched in order of importance:

Priority    Value   Directories
SOURCE      100     src/, source/, lib/, code/
PROJECT     90      tests/, test/, docs/, scripts/, examples/
CONFIG      80      config/, .config/, settings/
MAIN        70      main/, app/, core/, server/, client/
MODULES     60      modules/, module/, components/, packages/, plugins/
UTILS       50      utils/, helpers/, tools/
BUILD       40      build/, dist/, out/, target/, release/
CACHE       30      cache/, __pycache__/, node_modules/, .pytest_cache/
TEMP        20      temp/, tmp/, .tmp/
GENERATED   10      generated/, compiled/, bin/, obj/
EXCLUDED    0       .git/, .svn/, .venv/, venv/, .idea/, .vscode/

Directories with higher values are searched first, so source code is scanned before build output, caches, and temporary or generated files; directories in the EXCLUDED tier have the lowest priority.
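
The ordering can be pictured as a sort on the priority value of each directory name. The sketch below uses a subset of the table above. The names order_by_priority and DEFAULT_PRIORITY, and the fallback value for unlisted directories, are assumptions for illustration and not taken from qry.

```python
from pathlib import PurePath

# Subset of the priority table above, keyed by directory name.
PRIORITIES = {
    "src": 100, "source": 100, "lib": 100, "code": 100,
    "tests": 90, "test": 90, "docs": 90, "scripts": 90, "examples": 90,
    "config": 80, ".config": 80, "settings": 80,
    "build": 40, "dist": 40,
    "cache": 30, "__pycache__": 30, "node_modules": 30,
    "temp": 20, "tmp": 20,
    ".git": 0, ".venv": 0,
}

# Hypothetical fallback for names that are not in the table.
DEFAULT_PRIORITY = 50


def order_by_priority(directories):
    """Return directories sorted from highest to lowest priority, ties broken by name."""
    def key(path):
        name = PurePath(path).name
        return (-PRIORITIES.get(name, DEFAULT_PRIORITY), name)

    return sorted(directories, key=key)
```

For example, order_by_priority(["tmp", ".git", "tests", "src", "misc"]) returns ["src", "tests", "misc", "tmp", ".git"] under these assumptions.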

Examples

# search by filename (default)
poetry run qry "invoice"

# search inside file contents — press Ctrl+C to stop early
poetry run qry "def search" -c
poetry run qry "TODO OR FIXME" -c --type py --path ./src

# regex search for Python files
poetry run qry "\.py$" -r --sort name -s qry/

# content search with preview snippet
poetry run qry "search" -c -P --sort name -s qry/ -d 2

# filter by file size
poetry run qry "" --min-size 10k --max-size 1MB --sort size

# JSON output for piping
poetry run qry "invoice" -o json | jq '.results[]'

# pipe-friendly: one path per line
poetry run qry "TODO" -c -o paths | xargs grep -n "FIXME"
poetry run qry "invoice" -o paths | xargs -I{} cp {} /backup/

# exclude extra directories
poetry run qry "config" -e build -e ".cache"

# disable all exclusions (search everything)
poetry run qry "config" --no-exclude

# combine scope/depth/type/date
poetry run qry "invoice OR faktura" --scope /data/docs --depth 3
poetry run qry search "report" --type pdf,docx --last-days 7
poetry run qry batch queries.txt --format json --output-file results.json

# date filtering examples
poetry run qry "report" --last-days 30          # files modified in last 30 days
poetry run qry "invoice" --after-date 2026-01-01    # files after Jan 1, 2026
poetry run qry "invoice" --before-date 2025-12-31  # files before Dec 31, 2025
poetry run qry "report" --after-date 2026-01-01 --before-date 2026-02-01  # date range

Python API

Use qry directly from Python — no subprocess needed:

import qry

# Return all matching file paths as a list
files = qry.search("invoice", scope="/data/docs", mode="content", depth=3)

# Stream results one at a time (memory-efficient, supports Ctrl+C)
for path in qry.search_iter("TODO", scope="./src", mode="content"):
    print(path)

# Regex search with sorting
py_files = qry.search(r"test_.*\.py$", scope=".", regex=True, sort_by="name")

# Size filtering — find large files
big = qry.search("", scope=".", min_size=1024*1024, sort_by="size")

# Custom exclusions
files = qry.search("config", exclude_dirs=[".git", "build", ".venv"])
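
When consuming a streaming iterator in a script, it can be useful to keep whatever was found before Ctrl+C was pressed. The helper below is a generic sketch that works with any iterable; collect_until_interrupt is a hypothetical name, not part of qry, and how qry.search_iter itself reacts to Ctrl+C is not assumed here.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def collect_until_interrupt(results: Iterable[T]) -> List[T]:
    """Gather items from an iterable, returning the partial list if Ctrl+C is pressed."""
    found: List[T] = []
    try:
        for item in results:
            found.append(item)
    except KeyboardInterrupt:
        # Stop consuming and keep what has been collected so far.
        pass
    return found
```

A call such as collect_until_interrupt(qry.search_iter("TODO", scope="./src", mode="content")) would then return the paths found up to the interruption.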

Parameters for both qry.search() and qry.search_iter():

Parameter      Type         Default      Description
query_text     str          (required)   Text to search for
scope          str          "."          Directory to search
mode           str          "filename"   "filename", "content", or "both"
depth          int|None     None         Max directory depth
file_types     list|None    None         Extensions to include, e.g. ["py", "txt"]
exclude_dirs   list|None    None         Directory names to skip (None = use defaults)
max_results    int          unlimited    Hard cap on results
min_size       int|None     None         Minimum file size in bytes
max_size       int|None     None         Maximum file size in bytes
regex          bool         False        Treat query as a regular expression
sort_by        str|None     None         Sort by "name", "size", or "date"
date_range     tuple|None   None         (start_date, end_date) tuple of datetime objects

Example with date filtering:

from datetime import datetime, timedelta
import qry

# Files modified in last 30 days
end_date = datetime.now()
start_date = end_date - timedelta(days=30)
files = qry.search("invoice", date_range=(start_date, end_date))

# Files modified in 2026 (end bound set to the last second of Dec 31)
files = qry.search("report", date_range=(datetime(2026, 1, 1), datetime(2026, 12, 31, 23, 59, 59)))
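
Building the (start_date, end_date) tuples by hand gets repetitive, so small helpers can be handy. The two functions below are hypothetical conveniences, not part of qry. They only construct datetime tuples and make no assumption about whether qry treats the bounds as inclusive.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional, Tuple

DateRange = Tuple[datetime, datetime]


def last_days_range(days: int, now: Optional[datetime] = None) -> DateRange:
    """Range covering the last `days` days, ending at `now` (defaults to the current time)."""
    if days < 0:
        raise ValueError("days must be non-negative")
    end = now if now is not None else datetime.now()
    return (end - timedelta(days=days), end)


def month_range(year: int, month: int) -> DateRange:
    """Range from the first instant of a month to the last second of its final day."""
    last_day = calendar.monthrange(year, month)[1]
    return (datetime(year, month, 1), datetime(year, month, last_day, 23, 59, 59))
```

The returned tuple can be passed directly as date_range, for example date_range=month_range(2026, 2).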

HTTP API usage

Run server:

poetry run qry-api --host 127.0.0.1 --port 8000

Main endpoints:

  • GET /api/search
  • GET /api/search/html
  • GET /api/engines
  • GET /api/health
  • OpenAPI docs: GET /api/docs
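
Once the server is running, a quick reachability check from Python can be done with the standard library alone. The sketch below only touches GET /api/health from the list above. The helper names are hypothetical, and treating HTTP 200 as "healthy" is an assumption, since the response format is not described here.

```python
import urllib.error
import urllib.request


def endpoint_url(path: str, host: str = "127.0.0.1", port: int = 8000) -> str:
    """Build a URL for an endpoint on a locally running qry-api server."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def is_healthy(host: str = "127.0.0.1", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if GET /api/health answers with HTTP 200 (assumed success signal)."""
    url = endpoint_url("/api/health", host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Server not running, connection refused, timeout, or a non-2xx status.
        return False
```

The query parameters accepted by /api/search are not listed in this document; the OpenAPI page at /api/docs is the place to look them up.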

Development

Run tests

poetry run pytest -q

Useful make targets

make install
make test
make lint
make type-check
make run-api

Project structure

  • qry/cli/ – CLI commands and interactive mode
  • qry/api/ – FastAPI application and routes
  • qry/core/ – core data models
  • qry/engines/ – search engine implementations
  • qry/web/ – HTML renderer/templates integration
  • tests/ – test suite

License

Apache License 2.0 - see LICENSE for details.

Author

Created by Tom Sapletta - tom@sapletta.com
