qry

Ultra-fast file search and metadata extraction tool.

Features

  • Fast filesystem search with optional depth, date, and size filters
  • Optimized performance — caching, parallel processing, date-based directory pruning
  • CLI modes: search, interactive, batch, version
  • HTTP API (FastAPI) for JSON and HTML search responses
  • Metadata extraction for matched files (size, timestamps, content type)
  • Streaming results — Ctrl+C stops search mid-way and outputs what was found so far
  • Smart directory exclusions: .git, .venv, __pycache__, dist, node_modules skipped by default
  • YAML, JSON, and paths output — machine-readable output for piping into other tools
  • Python API: import qry; qry.search(...) for use in other applications
  • Regex search: --regex flag for pattern matching in filenames and content
  • Size filtering: --min-size / --max-size with human-readable units (1k, 10MB, 1G)
  • Sort results: --sort name|size|date
  • Content preview: --preview shows the matching line with context for content search
  • Date filtering: --last-days, --after-date, --before-date for time-based searches
  • Parallel search: -w/--workers for multi-threaded directory processing
  • Priority-based search — searches important directories first (src/, tests/), then lower priority (cache/, .git/)
  • Incremental results — shows results as they're found with timeout-based priority fallback

Installation

Poetry (recommended)

poetry install --with dev

pip (minimal)

pip install -r requirements.txt

Quick start

# search in current directory (filename match, YAML output)
poetry run qry "invoice"

# search file contents with preview snippet
poetry run qry "def search" -c -P --path ./qry

# regex search, sorted by name
poetry run qry "\.py$" -r --sort name --path .

# pipe-friendly output for shell pipelines
poetry run qry "TODO" -c -o paths | xargs grep -n "FIXME"

# show version and engines
poetry run qry version

CLI usage

qry search [query ...] [-f] [-c] [-r] [-P] [--type EXT1,EXT2] [--scope PATH | --path PATH]
           [--depth N] [--limit N] [--workers N]
           [--last-days N] [--after-date YYYY-MM-DD] [--before-date YYYY-MM-DD]
           [--min-size SIZE] [--max-size SIZE]
           [--sort name|size|date] [--exclude DIR] [--no-exclude]
           [--output yaml|json|paths]

qry interactive
qry batch <input_file> [--output-file FILE] [--format text|json|csv] [--workers N]
qry version

Search mode flags

Flag      Long form     Searches
(none)                  filename (default)
-f        --filename    filename only
-c        --content     file contents
-r        --regex       treat query as regular expression
Filtering flags

Flag                        Description
-t EXT                      Filter by file type (comma-separated)
-d N                        Max directory depth
-l N                        Limit results (0 = unlimited, default)
--last-days N               Files modified in last N days
--after-date YYYY-MM-DD     Files modified after date
--before-date YYYY-MM-DD    Files modified before date
-w N                        Workers for parallel search (default: 4)
--min-size SIZE             Minimum file size (e.g. 1k, 10MB, 1G)
--max-size SIZE             Maximum file size (e.g. 100k, 5MB)
-e DIR                      Exclude extra directory (repeatable, comma-separated)
--no-exclude                Disable all default exclusions
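
The human-readable SIZE values accepted by --min-size and --max-size can be pictured with a small parser. This is only a sketch, not qry's own code: the function name parse_size is hypothetical, and the 1024-based multipliers are an assumption (the tool may use 1000-based units instead).

```python
import re

# ASSUMPTION: binary multipliers (1k = 1024 bytes). qry's real parser may differ.
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
}

_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*$")


def parse_size(text: str) -> int:
    """Convert a string such as '1k', '10MB' or '1G' to a byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown size unit: {unit!r}") from None
    return int(float(number) * factor)
```

A value parsed this way could be passed as min_size or max_size to the Python API, which takes plain byte counts.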

Output flags

Flag             Description
-o yaml          YAML output (default)
-o json          JSON output
-o paths         One path per line (pipe-friendly)
-P, --preview    Show matching line with context (with -c)
--sort           Sort results by name, size, or date

Default excluded directories: .git .venv __pycache__ dist node_modules .tox .mypy_cache
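
To illustrate what skipping these directories means in practice, here is a minimal standard-library sketch of directory pruning during a walk. It is not qry's implementation; walk_files and its parameters are hypothetical names, with extra_excludes playing the role of -e and use_defaults=False the role of --no-exclude.

```python
import os
from typing import Iterable, Iterator

# Copied from the default exclusion list above.
DEFAULT_EXCLUDES = frozenset(
    {".git", ".venv", "__pycache__", "dist", "node_modules", ".tox", ".mypy_cache"}
)


def walk_files(
    root: str,
    extra_excludes: Iterable[str] = (),
    use_defaults: bool = True,
) -> Iterator[str]:
    """Yield file paths under root, never descending into excluded directories."""
    skip = set(extra_excludes)
    if use_defaults:
        skip |= DEFAULT_EXCLUDES
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to enter the removed entries.
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Pruning at the directory level means an excluded tree such as node_modules is never listed at all, which is usually cheaper than filtering its files afterwards.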

Priority-based search

When priority mode is enabled, directories are searched in order of importance:

Priority     Value    Directories
SOURCE       100      src/, source/, lib/, code/
PROJECT      90       tests/, test/, docs/, scripts/, examples/
CONFIG       80       config/, .config/, settings/
MAIN         70       main/, app/, core/, server/, client/
MODULES      60       modules/, module/, components/, packages/, plugins/
UTILS        50       utils/, helpers/, tools/
BUILD        40       build/, dist/, out/, target/, release/
CACHE        30       cache/, __pycache__/, node_modules/, .pytest_cache/
TEMP         20       temp/, tmp/, .tmp/
GENERATED    10       generated/, compiled/, bin/, obj/
EXCLUDED     0        .git/, .svn/, .venv/, venv/, .idea/, .vscode/

This ensures that important directories (source code) are searched first, while cache and temporary directories are searched last.
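
The ordering in the table can be sketched as a lookup plus a sort. This is an illustration rather than qry's actual code: order_dirs is a hypothetical helper, and the value given to directories that are not in the table (UNLISTED) is an assumption.

```python
from typing import Iterable, List

# Priority values and directory names taken from the table above.
_TABLE = [
    (100, "src source lib code"),
    (90, "tests test docs scripts examples"),
    (80, "config .config settings"),
    (70, "main app core server client"),
    (60, "modules module components packages plugins"),
    (50, "utils helpers tools"),
    (40, "build dist out target release"),
    (30, "cache __pycache__ node_modules .pytest_cache"),
    (20, "temp tmp .tmp"),
    (10, "generated compiled bin obj"),
    (0, ".git .svn .venv venv .idea .vscode"),
]
PRIORITY = {name: value for value, names in _TABLE for name in names.split()}

# ASSUMPTION: directories missing from the table get a middle priority.
UNLISTED = 50


def order_dirs(dirnames: Iterable[str]) -> List[str]:
    """Return directory names sorted by descending priority, then by name."""
    return sorted(
        dirnames,
        key=lambda d: (-PRIORITY.get(d.rstrip("/"), UNLISTED), d),
    )
```

For example, order_dirs([".git", "node_modules", "tests", "src", "tmp"]) places src and tests first and .git last.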

Examples

# search by filename (default)
poetry run qry "invoice"

# search inside file contents — press Ctrl+C to stop early
poetry run qry "def search" -c
poetry run qry "TODO OR FIXME" -c --type py --path ./src

# regex search for Python files
poetry run qry "\.py$" -r --sort name --scope qry/

# content search with preview snippet
poetry run qry "search" -c -P --sort name --scope qry/ -d 2

# filter by file size
poetry run qry "" --min-size 10k --max-size 1MB --sort size

# JSON output for piping
poetry run qry "invoice" -o json | jq '.results[]'

# pipe-friendly: one path per line
poetry run qry "TODO" -c -o paths | xargs grep -n "FIXME"
poetry run qry "invoice" -o paths | xargs -I{} cp {} /backup/

# exclude extra directories
poetry run qry "config" -e build -e ".cache"

# disable all exclusions (search everything)
poetry run qry "config" --no-exclude

# combine scope/depth/type/date
poetry run qry "invoice OR faktura" --scope /data/docs --depth 3
poetry run qry search "report" --type pdf,docx --last-days 7
poetry run qry batch queries.txt --format json --output-file results.json

# date filtering examples
poetry run qry "report" --last-days 30          # files modified in last 30 days
poetry run qry "invoice" --after-date 2026-01-01    # files after Jan 1, 2026
poetry run qry "invoice" --before-date 2025-12-31  # files before Dec 31, 2025
poetry run qry "report" --after-date 2026-01-01 --before-date 2026-02-01  # date range
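
One way to think about the three date flags is as a single modification-time window. The sketch below is an assumption about how they might combine, not a description of qry's internals: date_window and in_window are hypothetical helpers, bounds are treated as inclusive, and dates are read as midnight at the start of the given day.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Window = Tuple[Optional[datetime], Optional[datetime]]


def date_window(
    last_days: Optional[int] = None,
    after_date: Optional[str] = None,
    before_date: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Window:
    """Turn --last-days / --after-date / --before-date style values into (start, end)."""
    now = now or datetime.now()
    start = datetime.strptime(after_date, "%Y-%m-%d") if after_date else None
    end = datetime.strptime(before_date, "%Y-%m-%d") if before_date else None
    if last_days is not None:
        cutoff = now - timedelta(days=last_days)
        # When both are given, the later (stricter) lower bound wins.
        start = cutoff if start is None else max(start, cutoff)
    return start, end


def in_window(modified: datetime, window: Window) -> bool:
    """Check a modification time against an open-ended or closed window."""
    start, end = window
    return (start is None or modified >= start) and (end is None or modified <= end)
```

A file's modification time can be obtained with datetime.fromtimestamp(os.path.getmtime(path)) and checked with in_window.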

Python API

Use qry directly from Python — no subprocess needed:

import qry

# Return all matching file paths as a list
files = qry.search("invoice", scope="/data/docs", mode="content", depth=3)

# Stream results one at a time (memory-efficient, supports Ctrl+C)
for path in qry.search_iter("TODO", scope="./src", mode="content"):
    print(path)

# Regex search with sorting
py_files = qry.search(r"test_.*\.py$", scope=".", regex=True, sort_by="name")

# Size filtering — find large files
big = qry.search("", scope=".", min_size=1024*1024, sort_by="size")

# Custom exclusions
files = qry.search("config", exclude_dirs=[".git", "build", ".venv"])

Parameters for both qry.search() and qry.search_iter():

Parameter       Type          Default       Description
query_text      str           -             Text to search for
scope           str           "."           Directory to search
mode            str           "filename"    "filename", "content", or "both"
depth           int|None      None          Max directory depth
file_types      list|None     None          Extensions to include, e.g. ["py","txt"]
exclude_dirs    list|None     None          Dir names to skip (None = use defaults)
max_results     int           unlimited     Hard cap on results
min_size        int|None      None          Minimum file size in bytes
max_size        int|None      None          Maximum file size in bytes
regex           bool          False         Treat query as regular expression
sort_by         str|None      None          Sort by "name", "size", or "date"
date_range      tuple|None    None          Date range as (start_date, end_date) datetime tuples

Example with date filtering:

from datetime import datetime, timedelta
import qry

# Files modified in last 30 days
end_date = datetime.now()
start_date = end_date - timedelta(days=30)
files = qry.search("invoice", date_range=(start_date, end_date))

# Files modified in 2026
files = qry.search("report", date_range=(datetime(2026, 1, 1), datetime(2026, 12, 31, 23, 59, 59)))
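
When consuming a streaming iterator such as qry.search_iter(), the "stop with Ctrl+C and keep what was found" behaviour can be reproduced on the caller's side. The helper below is a generic sketch that works with any iterable; collect_until_interrupt is a hypothetical name and not part of qry.

```python
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def collect_until_interrupt(results: Iterable[T], limit: Optional[int] = None) -> List[T]:
    """Collect items until the iterable ends, the limit is hit, or Ctrl+C is pressed."""
    found: List[T] = []
    try:
        for item in results:
            found.append(item)
            if limit is not None and len(found) >= limit:
                break
    except KeyboardInterrupt:
        # Ctrl+C: return the partial results instead of propagating the interrupt.
        pass
    return found
```

Assuming qry is installed, it could be used as collect_until_interrupt(qry.search_iter("TODO", scope="./src", mode="content")).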

HTTP API usage

Run server:

poetry run qry-api --host 127.0.0.1 --port 8000

Main endpoints:

  • GET /api/search
  • GET /api/search/html
  • GET /api/engines
  • GET /api/health
  • OpenAPI docs: GET /api/docs
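
A client for these endpoints can be sketched with the standard library alone. The helper names api_url and fetch_json are hypothetical, and the search parameter name q used in the usage note is an assumption; the real parameter names are listed in the OpenAPI docs at /api/docs.

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def api_url(path: str, host: str = "127.0.0.1", port: int = 8000, **params: str) -> str:
    """Build a URL for one of the endpoints listed above."""
    return urlunsplit(("http", f"{host}:{port}", path, urlencode(params), ""))


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON body (requires a running qry-api server)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server started as shown above, fetch_json(api_url("/api/health")) would query the health endpoint, and api_url("/api/search", q="invoice") would build a search URL under the assumed parameter name.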

Development

Run tests

poetry run pytest -q

Useful make targets

make install
make test
make lint
make type-check
make run-api

Project structure

  • qry/cli/ – CLI commands and interactive mode
  • qry/api/ – FastAPI application and routes
  • qry/core/ – core data models
  • qry/engines/ – search engine implementations
  • qry/web/ – HTML renderer/templates integration
  • tests/ – test suite

License

Apache License 2.0 - see LICENSE for details.

Author

Created by Tom Sapletta - tom@sapletta.com
