sourcecrumb

Tree-sitter repository map in TOON format for LLM consumption.

What it does

sourcecrumb parses a codebase with tree-sitter, extracts symbols (classes, functions, methods, imports), builds a file-to-file dependency graph, and ranks files by PageRank. The output is a compact TOON-formatted map designed to fit in an LLM context window.

The goal: give an LLM agent a high-level map of a codebase so it can explore more effectively — knowing which files matter most, what symbols they define, and how they depend on each other.

Installation

Requires Python >= 3.13.

From PyPI

pip install sourcecrumb

Or with uv:

uv pip install sourcecrumb

Run without installing

uvx sourcecrumb .

The short alias scrumb also works everywhere sourcecrumb does.

From source

git clone https://github.com/phobologic/sourcecrumb.git
cd sourcecrumb
uv sync

Usage

scrumb [ROOT] [OPTIONS]

Option            Description
ROOT              Repository root directory (default: .)
--max-files, -n   Limit output to the top N files by PageRank (min: 1)
--language, -l    Restrict to a specific language (e.g., python)
--cache           Cache file path; the cache is reused if it is newer than all source files

Example

$ scrumb . -n 3
repo: sourcecrumb
root: sourcecrumb
files[3]{path,language,rank}:
  sourcecrumb/models.py,python,0.2615
  sourcecrumb/languages.py,python,0.1155
  sourcecrumb/discovery.py,python,0.0590
symbols[17]{file,name,kind,line,signature}:
  sourcecrumb/models.py,TagKind,class,10,TagKind(enum.Enum)
  sourcecrumb/models.py,SymbolKind,class,17,SymbolKind(enum.Enum)
  sourcecrumb/models.py,Tag,class,27,Tag
  sourcecrumb/models.py,FileInfo,class,39,FileInfo
  ...
dependencies[1]{source,target,symbols}:
  sourcecrumb/discovery.py,sourcecrumb/languages.py,language_for_extension
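
For tooling that needs to read the map back, the tabular sections in the output above can be parsed with a few lines of Python. This is a rough sketch rather than part of sourcecrumb: the parse_tables name is hypothetical, scalar lines such as repo: are ignored, and quoted values are assumed to follow CSV-style double quoting.

```python
import csv
import re

# Matches a table header such as: files[3]{path,language,rank}:
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def parse_tables(text):
    """Return {table_name: [row_dict, ...]} for every tabular section in text."""
    tables = {}
    current = None  # rows of the table being read, or None outside a table
    columns = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            name, _count, cols = match.groups()
            columns = cols.split(",")
            current = tables.setdefault(name, [])
        elif current is not None and line.startswith("  "):
            # Indented lines under a header are CSV rows.
            values = next(csv.reader([line.strip()]))
            current.append(dict(zip(columns, values)))
        else:
            # Scalar fields and blank lines end the current table.
            current = None
    return tables
```

Calling parse_tables on the output of scrumb would give a dictionary keyed by files, symbols, and dependencies, with every value kept as a string.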

Claude Code integration

The primary use case is running sourcecrumb as a Claude Code hook so every subagent automatically gets a repo map injected into its context.

Add this to .claude/settings.json:

{
  "hooks": {
    "SubagentStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uvx sourcecrumb \"$CLAUDE_PROJECT_DIR\" --cache \"$CLAUDE_PROJECT_DIR/.cache/sourcecrumb.toon\""
          }
        ]
      }
    ]
  }
}

If you have sourcecrumb installed globally, you can use scrumb directly instead of going through uvx.

The SubagentStart hook fires when any subagent launches. sourcecrumb's stdout is injected into the subagent's context, giving it an instant overview of the codebase.

--cache avoids re-parsing on every agent launch — the cache file is reused as long as no source files have changed. Add .cache/ to your .gitignore.
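
The freshness rule behind --cache can be pictured as a modification-time comparison. The following is a minimal sketch under that assumption, not sourcecrumb's actual implementation; the cache_is_fresh name is hypothetical.

```python
from pathlib import Path


def cache_is_fresh(cache_path: Path, source_files: list[Path]) -> bool:
    """Return True if the cache exists and is newer than every source file."""
    if not cache_path.is_file():
        return False
    cache_mtime = cache_path.stat().st_mtime
    # Assumption: "newer" means a strictly later modification time.
    return all(src.stat().st_mtime < cache_mtime for src in source_files)
```

With a check like this, a tool could print the cached map when it is fresh and otherwise re-parse the repository and rewrite the cache file.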

TOON format

The output uses TOON (Token-Oriented Object Notation), a compact format designed for LLM consumption:

  • Scalar fields are written as key: value
  • Tabular arrays are written as a name[count]{col1,col2,...}: header followed by indented CSV rows
  • Values containing special characters are double-quoted; numbers and plain strings are bare
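
To make the tabular layout concrete, here is a small sketch of an encoder for one table. It is illustrative only and not sourcecrumb's encoder: the function names are hypothetical, and the set of characters that triggers quoting is an assumption.

```python
def encode_value(value) -> str:
    """Render one cell: numbers and plain strings bare, everything else quoted."""
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    # Assumption: commas, quotes, colons, newlines, and padding need quoting.
    needs_quotes = (
        text == ""
        or text != text.strip()
        or any(ch in text for ch in ',":\n')
    )
    if not needs_quotes:
        return text
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def encode_table(name, columns, rows) -> str:
    """Render rows (a list of dicts) as name[count]{cols}: plus indented CSV rows."""
    header = name + "[" + str(len(rows)) + "]{" + ",".join(columns) + "}:"
    lines = [header]
    for row in rows:
        lines.append("  " + ",".join(encode_value(row[col]) for col in columns))
    return "\n".join(lines)


print(encode_table(
    "files",
    ["path", "language", "rank"],
    [{"path": "sourcecrumb/models.py", "language": "python", "rank": "0.2615"}],
))
```

Stating the column names once in the header, rather than repeating them on every row, is what keeps this layout compact compared with an equivalent JSON array of objects.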

How it works

  1. Discover files — uses git ls-files when available, falls back to .gitignore-based filtering
  2. Parse with tree-sitter — extracts classes, functions, methods, and imports from each file
  3. Build dependency graph — creates file-to-file edges based on shared symbols (imports that resolve to definitions in other files)
  4. Rank with PageRank — scores files by importance in the dependency graph
  5. Select top N — when --max-files is set, keeps only the highest-ranked files
  6. Encode to TOON — serializes the repo map into the compact output format
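
Steps 4 and 5 can be sketched with a plain power-iteration PageRank. This is an illustration under stated assumptions, not sourcecrumb's code: the function names, the damping factor of 0.85, and the fixed iteration count are all hypothetical. Edges are assumed to point from the importing file to the file that defines the symbol, so files that many others depend on rank highest.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Score nodes of a directed graph given (source, target) edge pairs."""
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by files with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


def top_files(rank, max_files):
    """Keep the max_files highest-ranked files, best first."""
    return sorted(rank, key=rank.get, reverse=True)[:max_files]
```

For example, with edges from discovery.py to languages.py, from languages.py to models.py, and from discovery.py to models.py, models.py receives rank from both other files and comes out on top.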

Supported languages

Python. Additional languages can be supported by adding a .scm query file to sourcecrumb/queries/ and registering the language in sourcecrumb/languages.py.

Development

uv run pytest                            # run tests
uv run ruff check sourcecrumb/ tests/    # lint
uv run ruff format sourcecrumb/ tests/   # format
