
Lightweight recursive search for Python, CLIs, and coding agents — globs, content grep, stat/dup helpers, JSON-stable outputs, typed agent API, optional MCP server.


   ____   ____   _       _     
  |  _ \ / ___| | | ___ | |__  
  | |_) | |  _  | |/ _ \| '_ \ 
  |  _ <| |_| | | | (_) | |_) |
  |_| \_\\____| |_|\___/|_.__/ 
            /  /  /  /  / 

Lightweight recursive search for Python, CLIs, and coding agents.



Why use this?

Modern Python already has pathlib.Path.rglob, and external tools like fd and ripgrep are very fast. rglob sits between the two: it is smaller and easier to embed than an external binary, and it goes beyond pathlib with filtered filename globbing, content grep, count/stat helpers, stable JSON schemas, a typed rglob.agent namespace, and an optional MCP server.

It is designed to be a default recursive-search dependency for coding agents that need predictable outputs, bounded result shapes, and read-only behavior.

Installation

pip install rglob

Quick Start (Python API)

import rglob

# Modern API — yields pathlib.Path, with filter kwargs
for p in rglob.find("/path/to/project", "*.py", exclude=".venv"):
    print(p)

# Eager variant
paths = rglob.find_all("/repo", "**/*.py", hidden=False, max_depth=4)

# OS-aware case sensitivity (`None` = OS default — case-sensitive on Linux,
# case-insensitive on macOS/Windows; pass True/False to force)
paths = rglob.find_all(".", "*.PY", case_sensitive=False)

# Legacy API still works (returns list[Path] as of 2.0; see the migration guide)
files = rglob.rglob("/path/to/project", "*.py")          # → list[Path]
files_cwd = rglob.rglob_("*.py")                          # → list[Path]

# Count non-empty, non-comment lines across matching files
non_empty_non_comment = rglob.lcount(
    "/path/to/project",
    "*.py",
    lambda line: bool(line.strip()) and not line.lstrip().startswith("#"),
)

# Total size of all JPGs in megabytes (use provided unit helpers)
total_mb = rglob.tsize("/path/to/photos", "*.jpg", rglob.megabytes)
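For intuition, here is a stdlib-only sketch of what a predicate-based count like `lcount` computes, assuming it simply applies the predicate to every line of every matched file. `count_matching_lines` and `is_code_line` are hypothetical names for illustration, not part of rglob's API.

```python
from collections.abc import Callable, Iterable
from pathlib import Path


def count_matching_lines(
    files: Iterable[Path],
    predicate: Callable[[str], bool],
    encoding: str = "utf-8",
) -> int:
    """Count lines across all files for which predicate(line) is true (hypothetical helper)."""
    total = 0
    for path in files:
        # errors="replace" keeps a mis-encoded file from aborting the whole count
        with path.open(encoding=encoding, errors="replace") as fh:
            total += sum(1 for line in fh if predicate(line))
    return total


def is_code_line(line: str) -> bool:
    """Same rule as the lambda above: non-empty and not a '#' comment."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")
```

Pairing it with `Path("/path/to/project").rglob("*.py")` approximates the `lcount` call above, minus rglob's exclude and depth filters.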

Agent integrations should import from the stable rglob.agent namespace:

from pathlib import Path

from rglob.agent import GrepOptions, WalkOptions, grep_all, search_all

files = search_all(WalkOptions(patterns=["*.py"], base=Path("src")))
todos = grep_all(GrepOptions(pattern="TODO", paths=["*.py"], base=Path("src")))

Paths are sorted by default for deterministic output. Pass sort=False for raw scandir order. Recursive ** globs work (rglob.find_all("src", "**/*.py")); symlink loops are detected and terminated automatically.

Upgrading from 1.x? rglob() now returns list[Path] instead of list[str]. See migrating to 2.0 for the one-line migration.
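One plausible form of that one-line migration, assuming the calling code genuinely needs strings; `paths` below is a stand-in for a 2.0 `rglob.rglob(...)` result, not a real call.

```python
from pathlib import Path

# Stand-in for `rglob.rglob("/path/to/project", "*.py")`, which returns list[Path] in 2.0.
paths: list[Path] = [Path("src") / "a.py", Path("src") / "b.py"]

# Restore the 1.x list[str] shape for code that still expects strings.
legacy: list[str] = [str(p) for p in paths]
```

Often no change is needed at all: `open()` and most `os` and `shutil` functions accept `Path` objects directly.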

Quick Start (CLI)

# Find files
rglob find "*.py"

# Multiple patterns are OR'd
rglob find "*.py" "*.pyx"

# Filter flags
rglob find "*.py" --base ./src --exclude .venv -d 3 --hidden

# Output formats
rglob find "*.py" --json | jq '.results[] | .path'
rglob find "*.py" --jsonl
rglob find "*.py" -0 | xargs -0 wc -l       # NUL-separated for xargs

# Mini-template formatter
rglob find "*.py" --format "{name}: {size_mb:.2f} MiB"

# Count lines, skipping empties and comment lines
rglob lcount "*.py" --no-empty --no-comments

# Grep content and count structured stats
rglob grep TODO "*.py" --context 2 --json
rglob count "*.py" --no-empty --no-comments --json

# Sum total size in MB
rglob tsize "*.py" --unit mb

# Machine discovery for agents
rglob describe find
rglob schema grep
rglob schema --all
rglob capabilities --json
rglob agent-version       # locked SemVer of the agent contract (see ADR-0009)

# MCP server (stdio). Exposes `find_files`, `grep_content`, `count_lines`,
# `find_duplicate_files`, and `describe_subcommand` with read-only,
# bounded defaults. Full setup in docs/agents/mcp-setup.md.
pip install "rglob[mcp]"
rglob mcp

# Shell completion (one-time setup)
rglob --install-completion bash    # or zsh / fish / powershell
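The `--json` envelope can also be consumed from Python instead of jq. This sketch assumes only what the jq example above shows, a top-level `results` array whose items carry a `path` field; no other fields are assumed. `paths_from_find_json` and `run_find` are hypothetical helpers.

```python
import json
import subprocess
from pathlib import Path


def paths_from_find_json(payload: str) -> list[Path]:
    """Extract result paths from `rglob find ... --json` output."""
    data = json.loads(payload)
    return [Path(item["path"]) for item in data["results"]]


def run_find(pattern: str, base: str = ".") -> list[Path]:
    # Passing argv as a list (no shell=True) means no shell sees the pattern,
    # so "*.py" reaches rglob unexpanded without any extra quoting.
    proc = subprocess.run(
        ["rglob", "find", pattern, "--base", base, "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return paths_from_find_json(proc.stdout)
```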

Quote your patterns! Otherwise your shell expands them before Python runs. Use rglob find "*.py", not rglob find *.py. If rglob find receives multiple unquoted positional patterns, it warns you on stderr.
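How can a CLI notice that the shell already expanded a pattern? One heuristic, sketched here as an assumption rather than rglob's actual check: if several positional patterns arrive and none contains a glob metacharacter, they are probably filenames produced by the shell. `looks_shell_expanded` and `warn_if_expanded` are hypothetical helpers.

```python
import re
import sys

_GLOB_CHARS = re.compile(r"[*?\[]")


def looks_shell_expanded(patterns: list[str]) -> bool:
    """Heuristic: 2+ positional patterns and none contains *, ? or [.

    e.g. `rglob find *.py` arriving as ["a.py", "b.py"].
    """
    return len(patterns) > 1 and not any(_GLOB_CHARS.search(p) for p in patterns)


def warn_if_expanded(patterns: list[str]) -> None:
    if looks_shell_expanded(patterns):
        print('warning: patterns look shell-expanded; quote them, e.g. "*.py"', file=sys.stderr)
```

A single unquoted glob that matches exactly one file is indistinguishable from a literal filename, which is why a check like this can only catch the multi-pattern case.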

Fun features

# Summary table: file count, total size, extension breakdown
rglob stats "*.py" --base ./src

# Unicode tree of matches (depth 3 by default)
rglob tree "*.py" --base ./src

# Top 10 largest files
rglob top "*" --base ~/Downloads

# Find duplicate files (size → 4-KiB hash → full hash)
rglob dupes "*" --base ~/Downloads --min-size 1M

# Respect .gitignore (requires `pip install "rglob[gitignore]"`)
rglob find "*" --gitignore

# Filter by kind / size / mtime
rglob find "*" -t f --min-size 1M --newer-than 7d
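The `stats` extension breakdown and the `top` ranking map naturally onto a counter and a bounded heap. A stdlib sketch, assuming both work from `stat().st_size`; `extension_breakdown` and `largest_files` are hypothetical helpers, not rglob functions.

```python
import heapq
from collections import Counter
from collections.abc import Iterable
from pathlib import Path


def extension_breakdown(files: Iterable[Path]) -> Counter[str]:
    """Count files per suffix; files without a suffix are grouped under ''."""
    return Counter(p.suffix for p in files)


def largest_files(files: Iterable[Path], n: int = 10) -> list[tuple[int, Path]]:
    """Return the n largest files as (size_in_bytes, path), biggest first."""
    sized = ((p.stat().st_size, p) for p in files)
    # nlargest keeps only n candidates in memory instead of sorting every file
    return heapq.nlargest(n, sized, key=lambda t: t[0])
```

`heapq.nlargest` keeps only `n` entries in memory, so ranking a large downloads folder does not require sorting every file.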

Duplicate detection uses xxhash.xxh3_64 when the optional [ext] extra is installed and falls back to the standard library's BLAKE2b otherwise; both are fast enough that the difference rarely matters in practice.

Compatibility

Python    Status
3.11+     Supported (rglob 2.0+)
3.10      Pin rglob<2; support dropped at 2.0 (Python 3.10 reaches EOL in October 2026)
3.6–3.9   Not supported
2.7       Final supported release is 1.4 (PyPI history: https://pypi.org/project/rglob/1.4/)

Documentation

Full docs (API reference, CLI reference, architecture diagrams, ADRs) live in docs/ and are published as a MkDocs Material site at https://chris-piekarski.github.io/python-rglob/.

  • Agent integration: CLI JSON, Python API, MCP, safety, and stability guidance for coding agents.
  • Modernization roadmap: the six-phase plan that delivered 2.0.
  • Migrating to 2.0: the list[str] → list[Path] return-type flip.
  • Architecture: package layout, walker call graph, CLI command hierarchy, dupes pipeline, and the 2.0 public-API class diagram.
  • Decisions: ADRs for the locked-in design choices.

Development

git clone https://github.com/chris-piekarski/python-rglob.git
cd python-rglob
python -m venv .venv && source .venv/bin/activate
make dev-setup     # installs [dev,bdd,docs,gitignore] + pre-commit hooks
make test          # pytest + behave, gated at 100% local coverage
make lint          # ruff + mypy --strict
make docs          # live MkDocs preview at :8000

The 2.0 release replaced pylint with Ruff as the primary linter and added mypy --strict. make lint runs both; pylint src/rglob features still works if you want a second opinion.

License

Apache 2.0 — see LICENSE.


Download files

Source Distribution

rglob-2.0.0.tar.gz (127.9 kB)

Built Distribution

rglob-2.0.0-py3-none-any.whl (47.4 kB)

File details

Details for the file rglob-2.0.0.tar.gz.

File metadata

  • Download URL: rglob-2.0.0.tar.gz
  • Size: 127.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for rglob-2.0.0.tar.gz
Algorithm Hash digest
SHA256 af3df7f61297e7b9991d647abe9b651f574ff3495aeeb77e041bc7cf95749c80
MD5 f2f550fdf138f2566aa146a49a1c3f7f
BLAKE2b-256 5a234e5ca0b606884fac4a7124a3913f3f2fd0f495be8f5f2f21b5397de8cefd

File details

Details for the file rglob-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: rglob-2.0.0-py3-none-any.whl
  • Size: 47.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for rglob-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 94c4d3417499cd7ac768aa80b726bd3fa684363d7ab312a599d02585b98f231c
MD5 aee5eaf88d43444ce83ebd7171f51637
BLAKE2b-256 2870a25944d12f00aed2e06d4aeea138e6bbbef0db37443c156185e5810eac2b
