sourcecrumb
Tree-sitter repository map in TOON format for LLM consumption.
What it does
sourcecrumb parses a codebase with tree-sitter, extracts symbols (classes, functions, methods, imports), builds a file-to-file dependency graph, and ranks files by PageRank. The output is a compact TOON-formatted map designed to fit in an LLM context window.
The goal: give an LLM agent a high-level map of a codebase so it can explore more effectively — knowing which files matter most, what symbols they define, and how they depend on each other.
Installation
Requires Python >= 3.13.
From PyPI
pip install sourcecrumb
Or with uv:
uv pip install sourcecrumb
Run without installing
uvx sourcecrumb .
From source
git clone https://github.com/phobologic/sourcecrumb.git
cd sourcecrumb
uv sync
Usage
sourcecrumb [ROOT] [OPTIONS]
| Option | Description |
|---|---|
| `ROOT` | Repository root directory (default: `.`) |
| `--max-files`, `-n` | Limit output to top N files by PageRank (min: 1) |
| `--language`, `-l` | Restrict to a specific language (e.g., python) |
| `--cache` | Cache file path; reuses if newer than all source files |
| `--max-file-size` | Skip files larger than this many bytes (default: 1MB) |
| `--fast` | Experimental: parse files in parallel for faster processing |
Example
$ sourcecrumb . -n 3
repo: sourcecrumb
root: sourcecrumb
files[3]{path,language,rank}:
  sourcecrumb/models.py,python,0.2615
  sourcecrumb/languages.py,python,0.1155
  sourcecrumb/discovery.py,python,0.0590
symbols[17]{file,name,kind,line,signature}:
  sourcecrumb/models.py,TagKind,class,10,TagKind(enum.Enum)
  sourcecrumb/models.py,SymbolKind,class,17,SymbolKind(enum.Enum)
  sourcecrumb/models.py,Tag,class,27,Tag
  sourcecrumb/models.py,FileInfo,class,39,FileInfo
  ...
dependencies[1]{source,target,symbols}:
  sourcecrumb/discovery.py,sourcecrumb/languages.py,language_for_extension
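If a script rather than an LLM needs to read this output, the tabular sections can be picked apart with a few lines of Python. The following is a minimal sketch, not part of sourcecrumb: `parse_tables` is a hypothetical helper, and it assumes each table header has the form `name[count]{columns}:` and that each row is a single CSV line, as in the example above. It expects complete output, not the elided (`...`) listing shown here.

```python
import csv
import re

# Header of a tabular section, e.g. "files[3]{path,language,rank}:"
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def parse_tables(text: str) -> dict[str, list[dict[str, str]]]:
    """Parse the tabular sections of a repo map into lists of row dicts."""
    tables: dict[str, list[dict[str, str]]] = {}
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        match = HEADER.match(lines[i])
        i += 1
        if not match:
            continue  # scalar field such as "repo: sourcecrumb"
        name, count, columns = match[1], int(match[2]), match[3].split(",")
        rows = []
        # Read at most `count` rows, stopping early at the next table header.
        while i < len(lines) and len(rows) < count and not HEADER.match(lines[i]):
            values = next(csv.reader([lines[i]]))  # handles double-quoted values
            rows.append(dict(zip(columns, values)))
            i += 1
        tables[name] = rows
    return tables
```

All values come back as strings; converting `rank` or `line` to numbers is left to the caller.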
Claude Code integration
The primary use case is running sourcecrumb as a Claude Code hook so every subagent automatically gets a repo map injected into its context.
Add this to .claude/settings.json:
{
  "hooks": {
    "SubagentStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uvx sourcecrumb \"$CLAUDE_PROJECT_DIR\" --cache \"$CLAUDE_PROJECT_DIR/.cache/sourcecrumb.toon\""
          }
        ]
      }
    ]
  }
}
If you have sourcecrumb installed globally, you can use sourcecrumb directly instead of going through uvx.
The SubagentStart hook fires when any subagent launches. sourcecrumb's stdout is injected into the subagent's context, giving it an instant overview of the codebase.
--cache avoids re-parsing on every agent launch — the cache file is reused as long as no source files have changed. Add .cache/ to your .gitignore.
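The freshness rule can be pictured as a modification-time comparison. The sketch below is an illustration only: `cache_is_fresh` is a hypothetical name, and it assumes "newer than all source files" means the cache file's mtime is strictly later than the mtime of every discovered source file. The actual check inside sourcecrumb may differ.

```python
from pathlib import Path


def cache_is_fresh(cache_path: Path, source_files: list[Path]) -> bool:
    """Return True if the cache exists and is newer than every source file."""
    if not cache_path.is_file():
        return False  # no cache yet, so the repo must be parsed
    cache_mtime = cache_path.stat().st_mtime
    # Assumption: any source file modified at or after the cache invalidates it.
    return all(src.stat().st_mtime < cache_mtime for src in source_files)
```

Under this rule, touching a single tracked file forces a full re-parse on the next agent launch, which keeps the logic simple at the cost of some repeated work.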
TOON format
The output uses TOON (Token-Oriented Object Notation), a compact format designed for LLM consumption:
- Scalar fields: `key: value`
- Tabular arrays: `name[count]{col1,col2,...}:` followed by indented CSV rows
- Quoting: values containing special characters are double-quoted; numbers and plain strings are bare
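To make the tabular-array rule concrete, here is a small encoder sketch. It is not sourcecrumb's serializer: `quote` and `encode_table` are hypothetical names, and the set of "special characters" (comma, colon, double quote, newline, surrounding whitespace), the backslash escaping, and the two-space row indent are assumptions chosen for illustration.

```python
def quote(value) -> str:
    """Double-quote values containing special characters; leave others bare."""
    text = str(value)
    # Assumed set of characters that force quoting.
    needs_quotes = (
        text == ""
        or text != text.strip()
        or any(ch in text for ch in ',:"\n')
    )
    if needs_quotes:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return text


def encode_table(name: str, columns: list[str], rows: list[dict]) -> str:
    """Encode rows as `name[count]{col1,col2,...}:` plus indented CSV rows."""
    header = f"{name}[{len(rows)}]{{{','.join(columns)}}}:"
    body = ["  " + ",".join(quote(row[col]) for col in columns) for row in rows]
    return "\n".join([header, *body])
```

Because the column names appear once in the header instead of once per row, a table of many files or symbols costs far fewer tokens than the equivalent JSON array of objects.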
How it works
- Discover files: uses `git ls-files` when available, falls back to `.gitignore`-based filtering
- Parse with tree-sitter: extracts classes, functions, methods, and imports from each file
- Build dependency graph: creates file-to-file edges based on shared symbols (imports that resolve to definitions in other files)
- Rank with PageRank: scores files by importance in the dependency graph
- Select top N: when `--max-files` is set, keeps only the highest-ranked files
- Encode to TOON: serializes the repo map into the compact output format
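The graph, ranking, and selection steps can be sketched in plain Python. This is an illustration under stated assumptions, not sourcecrumb's code: the function names are hypothetical, edges are unweighted and point from the importing file to the defining file, the damping factor of 0.85 and fixed iteration count are conventional defaults, and a symbol defined in several files simply resolves to the last one seen.

```python
from collections import defaultdict


def build_edges(definitions: dict[str, set[str]], imports: dict[str, set[str]]):
    """Return {source: {target: symbols}} for imports defined in another file."""
    owner = {sym: file for file, syms in definitions.items() for sym in syms}
    edges = defaultdict(lambda: defaultdict(set))
    for source, names in imports.items():
        for name in names:
            target = owner.get(name)
            if target is not None and target != source:
                edges[source][target].add(name)
    return edges


def pagerank(files: list[str], edges, damping: float = 0.85, iterations: int = 50):
    """Power-iteration PageRank over the file-to-file dependency graph."""
    n = len(files)
    rank = {f: 1.0 / n for f in files}
    for _ in range(iterations):
        new = {f: (1.0 - damping) / n for f in files}
        for source in files:
            targets = edges.get(source, {})
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Files that import nothing spread their rank evenly.
                for f in files:
                    new[f] += damping * rank[source] / n
        rank = new
    return rank


def top_files(rank: dict[str, float], n: int) -> list[str]:
    """Keep the n highest-ranked files, as --max-files would."""
    return sorted(rank, key=rank.get, reverse=True)[:n]
```

With edges pointing at the defining file, a module that many others import accumulates rank, which matches the example output where `sourcecrumb/models.py` scores highest.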
Supported languages
Python. Extensible by adding a .scm query file to sourcecrumb/queries/ and registering the language in sourcecrumb/languages.py.
Development
uv run pytest # run tests
uv run ruff check sourcecrumb/ tests/ # lint
uv run ruff format sourcecrumb/ tests/ # format