sourcecrumb

Tree-sitter repository map in TOON format for LLM consumption.

What it does

sourcecrumb parses a codebase with tree-sitter, extracts symbols (classes, functions, methods, imports), builds a file-to-file dependency graph, and ranks files by PageRank. The output is a compact TOON-formatted map designed to fit in an LLM context window.

The goal: give an LLM agent a high-level map of a codebase so it can explore more effectively — knowing which files matter most, what symbols they define, and how they depend on each other.

Installation

Requires Python >= 3.13.

From PyPI

pip install sourcecrumb

Or with uv:

uv pip install sourcecrumb

Run without installing

uvx sourcecrumb .

From source

git clone https://github.com/phobologic/sourcecrumb.git
cd sourcecrumb
uv sync

Usage

sourcecrumb [ROOT] [OPTIONS]
Option             Description
ROOT               Repository root directory (default: .)
--max-files, -n    Limit output to top N files by PageRank (min: 1)
--language, -l     Restrict to a specific language (e.g., python)
--cache            Cache file path; reuses if newer than all source files
--max-file-size    Skip files larger than this many bytes (default: 1MB)
--fast             Experimental: parse files in parallel for faster processing

Example

$ sourcecrumb . -n 3
repo: sourcecrumb
root: sourcecrumb
files[3]{path,language,rank}:
  sourcecrumb/models.py,python,0.2615
  sourcecrumb/languages.py,python,0.1155
  sourcecrumb/discovery.py,python,0.0590
symbols[17]{file,name,kind,line,signature}:
  sourcecrumb/models.py,TagKind,class,10,TagKind(enum.Enum)
  sourcecrumb/models.py,SymbolKind,class,17,SymbolKind(enum.Enum)
  sourcecrumb/models.py,Tag,class,27,Tag
  sourcecrumb/models.py,FileInfo,class,39,FileInfo
  ...
dependencies[1]{source,target,symbols}:
  sourcecrumb/discovery.py,sourcecrumb/languages.py,language_for_extension
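
If you want to read the map back in a script, the tabular sections can be parsed with a few lines of Python. This is a minimal sketch, not part of sourcecrumb: parse_toon_tables is a hypothetical helper, and it assumes the header shape shown above (name[count]{col1,col2,...}:) and rows that contain no quoted commas except possibly in the last column.

```python
import re

# Header of a tabular section, e.g. "files[3]{path,language,rank}:"
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def parse_toon_tables(text: str) -> dict[str, list[dict[str, str]]]:
    """Parse tabular sections into lists of row dicts (hypothetical helper)."""
    tables: dict[str, list[dict[str, str]]] = {}
    columns: list[str] = []
    current: list[dict[str, str]] | None = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            name, _count, cols = match.groups()
            columns = cols.split(",")
            current = tables.setdefault(name, [])
        elif line.startswith("  ") and current is not None:
            # Assumption: no quoted commas, so a plain split is enough.
            # Extra commas are left in the last column.
            values = line.strip().split(",", len(columns) - 1)
            current.append(dict(zip(columns, values)))
        else:
            current = None  # a scalar field or blank line ends the table
    return tables


sample = """repo: sourcecrumb
files[3]{path,language,rank}:
  sourcecrumb/models.py,python,0.2615
  sourcecrumb/languages.py,python,0.1155
  sourcecrumb/discovery.py,python,0.0590
"""
top = parse_toon_tables(sample)["files"][0]
print(top["path"], top["rank"])
```

All values come back as strings; convert rank with float() if you need to compare scores.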

Claude Code integration

The primary use case is running sourcecrumb as a Claude Code hook so every subagent automatically gets a repo map injected into its context.

Add this to .claude/settings.json:

{
  "hooks": {
    "SubagentStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uvx sourcecrumb \"$CLAUDE_PROJECT_DIR\" --cache \"$CLAUDE_PROJECT_DIR/.cache/sourcecrumb.toon\""
          }
        ]
      }
    ]
  }
}

If you have sourcecrumb installed globally, you can use sourcecrumb directly instead of going through uvx.

The SubagentStart hook fires when any subagent launches. sourcecrumb's stdout is injected into the subagent's context, giving it an instant overview of the codebase.

--cache avoids re-parsing on every agent launch — the cache file is reused as long as no source files have changed. Add .cache/ to your .gitignore.
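
The freshness rule ("reused if newer than all source files") can be pictured as a modification-time comparison. The sketch below is an illustration under that assumption, not sourcecrumb's actual code: cache_is_fresh is a hypothetical name, and treating equal timestamps as stale is a conservative choice made here.

```python
from pathlib import Path


def cache_is_fresh(cache_path: Path, source_files: list[Path]) -> bool:
    """Return True if the cache exists and is strictly newer than every source file."""
    if not cache_path.is_file():
        return False
    cache_mtime = cache_path.stat().st_mtime
    # A single source file modified at or after the cache makes it stale.
    return all(src.stat().st_mtime < cache_mtime for src in source_files)
```

When the check fails, the repository would be parsed again and the cache file rewritten.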

TOON format

The output uses TOON (Token-Oriented Object Notation), a compact format designed for LLM consumption:

  • Scalar fields: written as key: value
  • Tabular arrays: name[count]{col1,col2,...}: followed by indented CSV rows
  • Quoting: values containing special characters are double-quoted; numbers and plain strings are bare
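
The three rules above can be sketched as a small encoder. This is an illustration only: quote and encode_table are hypothetical names, and the set of "special characters" (comma, colon, double quote, newline, surrounding whitespace) is an assumption, since the exact set is not listed here.

```python
def quote(value: object) -> str:
    """Render one value: bare when plain, double-quoted when it has special characters."""
    text = str(value)
    # Assumption: these characters, an empty string, or surrounding
    # whitespace force quoting.
    if text == "" or text != text.strip() or any(ch in text for ch in ',:"\n'):
        escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    return text


def encode_table(name: str, columns: list[str], rows: list[dict[str, object]]) -> str:
    """Emit a name[count]{columns}: header followed by indented CSV rows."""
    lines = [f"{name}[{len(rows)}]{{{','.join(columns)}}}:"]
    for row in rows:
        lines.append("  " + ",".join(quote(row[col]) for col in columns))
    return "\n".join(lines)


print("repo: " + quote("sourcecrumb"))
print(encode_table("symbols", ["name", "signature"], [
    {"name": "load", "signature": "load(path, mode)"},
]))
```

Declaring the column names once in the header is what keeps the rows compact: each row carries only values, so a long symbol table costs far fewer tokens than the same data as a list of JSON objects.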

How it works

  1. Discover files — uses git ls-files when available, falls back to .gitignore-based filtering
  2. Parse with tree-sitter — extracts classes, functions, methods, and imports from each file
  3. Build dependency graph — creates file-to-file edges based on shared symbols (imports that resolve to definitions in other files)
  4. Rank with PageRank — scores files by importance in the dependency graph
  5. Select top N — when --max-files is set, keeps only the highest-ranked files
  6. Encode to TOON — serializes the repo map into the compact output format
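
Steps 3 to 5 can be sketched in plain Python. This is a simplified illustration, not sourcecrumb's implementation: the function names, the damping factor of 0.85, the fixed iteration count, and the edge direction (importing file points to defining file) are all assumptions, and the file and symbol names in the example are made up.

```python
def build_edges(defines: dict[str, str], imports: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Create one (importer, definer) edge per import that resolves to another file."""
    edges = []
    for importer, symbols in imports.items():
        for symbol in symbols:
            definer = defines.get(symbol)
            if definer is not None and definer != importer:
                edges.append((importer, definer))
    return edges


def pagerank(edges: list[tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Score files by power iteration over the dependency edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links: dict[str, list[str]] = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by files with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes) if nodes else 0.0
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


def top_files(rank: dict[str, float], n: int) -> list[str]:
    """Keep the n highest-ranked files, as --max-files would."""
    return sorted(rank, key=rank.get, reverse=True)[:n]


# Hypothetical symbol table: which file defines each symbol, and what each file imports.
defines = {"helper": "utils.py", "User": "models.py", "Base": "models.py"}
imports = {"app.py": ["helper", "User"], "utils.py": ["Base"]}
ranks = pagerank(build_edges(defines, imports))
print(top_files(ranks, 2))
```

With edges pointing from importer to definer, files that many others depend on accumulate rank, which is why a shared module such as a models file tends to land at the top of the map.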

Supported languages

Python is the only language supported. To add another, place a .scm query file in sourcecrumb/queries/ and register the language in sourcecrumb/languages.py.

Development

uv run pytest                            # run tests
uv run ruff check sourcecrumb/ tests/    # lint
uv run ruff format sourcecrumb/ tests/   # format
