Lightweight recursive search for Python, CLIs, and coding agents — globs, content grep, stat/dup helpers, JSON-stable outputs, typed agent API, optional MCP server.
Why use this?
Modern Python has pathlib.Path.rglob, and external tools like fd and
ripgrep are extremely fast. rglob targets a different niche: a small,
embeddable library that bundles filename globbing, content grep, count/stat
helpers, stable JSON schemas, a typed rglob.agent namespace, and an optional
MCP server.
It is designed to be a default recursive-search dependency for coding agents that need predictable outputs, bounded result shapes, and read-only behavior.
Installation
pip install rglob
Quick Start (Python API)
import rglob
# Modern API — yields pathlib.Path, with filter kwargs
for p in rglob.find("/path/to/project", "*.py", exclude=".venv"):
    print(p)
# Eager variant
paths = rglob.find_all("/repo", "**/*.py", hidden=False, max_depth=4)
# OS-aware case sensitivity (`None` = OS default — case-sensitive on Linux,
# case-insensitive on macOS/Windows; pass True/False to force)
paths = rglob.find_all(".", "*.PY", case_sensitive=False)
# Legacy API still works (now returns list[Path] at 2.0 — see migration guide)
files = rglob.rglob("/path/to/project", "*.py") # → list[Path]
files_cwd = rglob.rglob_("*.py") # → list[Path]
# Count non-empty, non-comment lines across matching files
non_empty_non_comment = rglob.lcount(
    "/path/to/project",
    "*.py",
    lambda line: bool(line.strip()) and not line.lstrip().startswith("#"),
)
# Total size of all JPGs in megabytes (use provided unit helpers)
total_mb = rglob.tsize("/path/to/photos", "*.jpg", rglob.megabytes)
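To make the line-count predicate concrete, here is a minimal stdlib-only sketch of what such a count computes. It is not rglob's implementation: `count_lines` and `is_code_line` are hypothetical names used only for illustration, and the walk uses pathlib.Path.rglob.

```python
import tempfile
from pathlib import Path
from typing import Callable


def count_lines(base: Path, pattern: str, keep: Callable[[str], bool]) -> int:
    """Count lines in files under `base` matching `pattern` where keep(line) is true."""
    total = 0
    for path in sorted(base.rglob(pattern)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if keep(line))
    return total


def is_code_line(line: str) -> bool:
    # Same test as the lambda in the quick start: non-empty and not a `#` comment.
    return bool(line.strip()) and not line.lstrip().startswith("#")


# Small self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "pkg" / "a.py").write_text(
        "# header\n\nx = 1\n  # note\ny = 2\n", encoding="utf-8"
    )
    (root / "b.py").write_text("print('hi')\n", encoding="utf-8")
    demo_count = count_lines(root, "*.py", is_code_line)
    print(demo_count)
```

The predicate receives each raw line, including its trailing newline, which is why the check strips whitespace before testing for emptiness.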
Agent integrations should import from the stable rglob.agent namespace:
from pathlib import Path
from rglob.agent import GrepOptions, WalkOptions, grep_all, search_all
files = search_all(WalkOptions(patterns=["*.py"], base=Path("src")))
todos = grep_all(GrepOptions(pattern="TODO", paths=["*.py"], base=Path("src")))
Paths are sorted by default for deterministic output. Pass sort=False for
raw scandir order. Recursive ** globs work
(rglob.find_all("src", "**/*.py")); symlink loops are detected and
terminated automatically.
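The following sketch illustrates one common way to get deterministic ordering and symlink-loop protection with the standard library alone. It is an assumption-laden illustration, not rglob's actual walker: `walk_sorted` is a hypothetical helper, ordering is by name within each directory, and loops are cut by remembering each directory's resolved real path.

```python
import os
import tempfile
from fnmatch import fnmatch
from pathlib import Path
from typing import Iterator


def walk_sorted(base: Path, pattern: str) -> Iterator[Path]:
    """Yield files matching `pattern`, visiting each real directory at most once."""
    seen: set[str] = set()

    def visit(directory: Path) -> Iterator[Path]:
        real = os.path.realpath(directory)
        if real in seen:  # reached again through a symlink: stop here
            return
        seen.add(real)
        with os.scandir(directory) as it:
            entries = sorted(it, key=lambda entry: entry.name)
        for entry in entries:
            if entry.is_dir(follow_symlinks=True):
                yield from visit(Path(entry.path))
            elif fnmatch(entry.name, pattern):
                yield Path(entry.path)

    yield from visit(base)


# Demo: src/sub/loop points back at src, which would recurse forever if followed blindly.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "src"
    (root / "sub").mkdir(parents=True)
    (root / "a.py").write_text("", encoding="utf-8")
    (root / "sub" / "b.py").write_text("", encoding="utf-8")
    try:
        os.symlink(root, root / "sub" / "loop", target_is_directory=True)
    except OSError:
        pass  # symlinks may be unavailable (e.g. unprivileged Windows)
    found = [p.relative_to(root).as_posix() for p in walk_sorted(root, "*.py")]
    print(found)
```

Sorting the scandir entries trades a little speed for reproducible output, which is the same trade-off the sort=False option exposes.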
Upgrading from 1.x?
rglob()now returnslist[Path]instead oflist[str]. See migrating to 2.0 for the one-line migration.
Quick Start (CLI)
# Find files
rglob find "*.py"
# Multiple patterns are OR'd
rglob find "*.py" "*.pyx"
# Filter flags
rglob find "*.py" --base ./src --exclude .venv -d 3 --hidden
# Output formats
rglob find "*.py" --json | jq '.results[] | .path'
rglob find "*.py" --jsonl
rglob find "*.py" -0 | xargs -0 wc -l # NUL-separated for xargs
# Mini-template formatter
rglob find "*.py" --format "{name}: {size_mb:.2f} MiB"
# Count lines, skipping empties and comment lines
rglob lcount "*.py" --no-empty --no-comments
# Grep content and count structured stats
rglob grep TODO "*.py" --context 2 --json
rglob count "*.py" --no-empty --no-comments --json
# Sum total size in MB
rglob tsize "*.py" --unit mb
# Machine discovery for agents
rglob describe find
rglob schema grep
rglob schema --all
rglob capabilities --json
rglob agent-version # locked SemVer of the agent contract (see ADR-0009)
# MCP server (stdio). Exposes `find_files`, `grep_content`, `count_lines`,
# `find_duplicate_files`, and `describe_subcommand` with read-only,
# bounded defaults. Full setup in docs/agents/mcp-setup.md.
pip install "rglob[mcp]"
rglob mcp
# Shell completion (one-time setup)
rglob --install-completion bash # or zsh / fish / powershell
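For callers that consume the machine-readable output from Python rather than jq, a minimal parsing sketch follows. The sample payload is hypothetical: only the `results[].path` shape is inferred from the jq example above, and the real output may carry additional fields. `paths_from_json` and `paths_from_nul` are illustrative helper names, not part of rglob.

```python
import json

# Hypothetical sample payload; only the `results[].path` shape comes from
# the jq example, everything else about the schema is an assumption.
sample_output = '{"results": [{"path": "src/app.py"}, {"path": "src/util.py"}]}'


def paths_from_json(payload: str) -> list[str]:
    """Extract the path of every entry in a `--json` style document."""
    document = json.loads(payload)
    return [item["path"] for item in document.get("results", [])]


def paths_from_nul(payload: bytes) -> list[str]:
    """Split `-0` style output, where entries are separated by NUL bytes."""
    return [chunk.decode() for chunk in payload.split(b"\0") if chunk]


json_paths = paths_from_json(sample_output)
nul_paths = paths_from_nul(b"src/app.py\0src/util.py\0")
print(json_paths)
print(nul_paths)
```

In practice the payload would come from capturing the CLI's stdout, for example with subprocess.run and capture_output=True.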
Quote your patterns! Otherwise your shell pre-expands them before Python runs. Use
rglob find "*.py", not rglob find *.py. If rglob find receives multiple unquoted positional patterns it will warn you on stderr.
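The difference between the quoted and unquoted forms can be shown without running a shell. This sketch uses shlex to tokenize the quoted command line and glob to emulate, roughly, what a POSIX shell does to an unquoted pattern; the file names are made up for the demo.

```python
import glob
import os
import shlex
import tempfile

# Quoted: the program receives the pattern itself as a single argument.
# shlex only tokenizes; it performs no glob expansion.
quoted_argv = shlex.split('rglob find "*.py"')

with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.py", "b.py"):
        open(os.path.join(tmp, name), "w").close()
    # Unquoted: the shell replaces *.py with the matching file names
    # before the program starts, so the pattern never arrives.
    expanded = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*.py"))
    )
    unquoted_argv = ["rglob", "find", *expanded]

print(quoted_argv)
print(unquoted_argv)
```

With the unquoted form the tool sees two literal file names instead of one pattern, so it would only search for files named exactly a.py or b.py.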
Fun features
# Summary table: file count, total size, extension breakdown
rglob stats "*.py" --base ./src
# Unicode tree of matches (depth 3 by default)
rglob tree "*.py" --base ./src
# Top 10 largest files
rglob top "*" --base ~/Downloads
# Find duplicate files (size → 4-KiB hash → full hash)
rglob dupes "*" --base ~/Downloads --min-size 1M
# Respect .gitignore (requires `pip install rglob[gitignore]`)
rglob find "*" --gitignore
# Filter by kind / size / mtime
rglob find "*" -t f --min-size 1M --newer-than 7d
The duplicate detection uses xxhash.xxh3_64 when the optional [ext]
extra is installed; it falls back to stdlib BLAKE2b otherwise — both are
fast enough that the difference rarely matters in 2026.
Compatibility
| Python | Status |
|---|---|
| 3.11+ | Supported (rglob 2.0+) |
| 3.10 | Pin rglob<2 — dropped at 2.0 (Python 3.10 EOL is October 2026) |
| 3.6–3.9 | Not supported |
| 2.7 | Final supported release is 1.4 (PyPI history: https://pypi.org/project/rglob/1.4/) |
Documentation
Full docs (API reference, CLI reference, architecture diagrams, ADRs) live in
docs/ and are published as a MkDocs Material site at
https://chris-piekarski.github.io/python-rglob/.
- Agent integration — CLI JSON, Python API, MCP, safety, and stability guidance for coding agents.
- Modernization roadmap — the six-phase plan that delivered 2.0.
- Migrating to 2.0 — the list[str] → list[Path] return-type flip.
- Architecture — package layout, walker call-graph, CLI command hierarchy, dupes pipeline, and the 2.0 public-API class diagram.
- Decisions — ADRs for the locked-in design choices.
Development
git clone https://github.com/chris-piekarski/python-rglob.git
cd python-rglob
python -m venv .venv && source .venv/bin/activate
make dev-setup # installs [dev,bdd,docs,gitignore] + pre-commit hooks
make test # pytest + behave, gated at 100% local coverage
make lint # ruff + mypy --strict
make docs # live MkDocs preview at :8000
The 2.0 release replaced pylint with Ruff as the primary linter and added
mypy --strict. make lint runs both; pylint src/rglob features still
works if you want a second opinion.
License
Apache 2.0 — see LICENSE.
File details
Details for the file rglob-2.0.0.tar.gz.
File metadata
- Download URL: rglob-2.0.0.tar.gz
- Upload date:
- Size: 127.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af3df7f61297e7b9991d647abe9b651f574ff3495aeeb77e041bc7cf95749c80 |
| MD5 | f2f550fdf138f2566aa146a49a1c3f7f |
| BLAKE2b-256 | 5a234e5ca0b606884fac4a7124a3913f3f2fd0f495be8f5f2f21b5397de8cefd |
File details
Details for the file rglob-2.0.0-py3-none-any.whl.
File metadata
- Download URL: rglob-2.0.0-py3-none-any.whl
- Upload date:
- Size: 47.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 94c4d3417499cd7ac768aa80b726bd3fa684363d7ab312a599d02585b98f231c |
| MD5 | aee5eaf88d43444ce83ebd7171f51637 |
| BLAKE2b-256 | 2870a25944d12f00aed2e06d4aeea138e6bbbef0db37443c156185e5810eac2b |