Skip to main content

Tree-sitter repository map in TOON format for LLM consumption

Project description

sourcecrumb

Tree-sitter repository map in TOON format for LLM consumption.

What it does

sourcecrumb parses a codebase with tree-sitter, extracts symbols (classes, functions, methods, imports), builds a file-to-file dependency graph, and ranks files by PageRank. The output is a compact TOON-formatted map designed to fit in an LLM context window.

The goal: give an LLM agent a high-level map of a codebase so it can explore more effectively — knowing which files matter most, what symbols they define, and how they depend on each other.

Installation

Requires Python >= 3.13.

git clone <repo-url>
cd sourcecrumb
uv sync

Usage

scrumb [ROOT] [OPTIONS]
Option Description
ROOT Repository root directory (default: .)
--max-files, -n Limit output to top N files by PageRank (min: 1)
--language, -l Restrict to a specific language (e.g., python)
--cache Cache file path; reuses if newer than all source files

Example

$ scrumb . -n 3
repo: sourcecrumb
root: sourcecrumb
files[3]{path,language,rank}:
  sourcecrumb/models.py,python,0.2615
  sourcecrumb/languages.py,python,0.1155
  sourcecrumb/discovery.py,python,0.0590
symbols[17]{file,name,kind,line,signature}:
  sourcecrumb/models.py,TagKind,class,10,TagKind(enum.Enum)
  sourcecrumb/models.py,SymbolKind,class,17,SymbolKind(enum.Enum)
  sourcecrumb/models.py,Tag,class,27,Tag
  sourcecrumb/models.py,FileInfo,class,39,FileInfo
  ...
dependencies[1]{source,target,symbols}:
  sourcecrumb/discovery.py,sourcecrumb/languages.py,language_for_extension

Claude Code integration

The primary use case is running sourcecrumb as a Claude Code hook so every subagent automatically gets a repo map injected into its context.

Add this to .claude/settings.json:

{
  "hooks": {
    "SubagentStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "scrumb \"$CLAUDE_PROJECT_DIR\" --cache \"$CLAUDE_PROJECT_DIR/.cache/sourcecrumb.toon\""
          }
        ]
      }
    ]
  }
}

The SubagentStart hook fires when any subagent launches. sourcecrumb's stdout is injected into the subagent's context, giving it an instant overview of the codebase.

--cache avoids re-parsing on every agent launch — the cache file is reused as long as no source files have changed. Add .cache/ to your .gitignore.

TOON format

The output uses TOON (Text Object Oriented Notation), a compact format designed for LLM consumption:

  • Scalar fieldskey: value
  • Tabular arraysname[count]{col1,col2,...}: followed by indented CSV rows
  • Quoting — values containing special characters are double-quoted; numbers and plain strings are bare

How it works

  1. Discover files — uses git ls-files when available, falls back to .gitignore-based filtering
  2. Parse with tree-sitter — extracts classes, functions, methods, and imports from each file
  3. Build dependency graph — creates file-to-file edges based on shared symbols (imports that resolve to definitions in other files)
  4. Rank with PageRank — scores files by importance in the dependency graph
  5. Select top N — when --max-files is set, keeps only the highest-ranked files
  6. Encode to TOON — serializes the repo map into the compact output format

Supported languages

Python. Extensible by adding a .scm query file to sourcecrumb/queries/ and registering the language in sourcecrumb/languages.py.

Development

uv run pytest                            # run tests
uv run ruff check sourcecrumb/ tests/    # lint
uv run ruff format sourcecrumb/ tests/   # format

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sourcecrumb-0.1.0.tar.gz (221.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sourcecrumb-0.1.0-py3-none-any.whl (13.6 kB view details)

Uploaded Python 3

File details

Details for the file sourcecrumb-0.1.0.tar.gz.

File metadata

  • Download URL: sourcecrumb-0.1.0.tar.gz
  • Upload date:
  • Size: 221.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.13

File hashes

Hashes for sourcecrumb-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2be2391b7fec6486f50e23977ea934dc69eaa3a148cd8a04727a788535d204fd
MD5 ebe7ad618bb3eed88e21c7646861846a
BLAKE2b-256 af1e3a6a0fe2c313e6f2dea97ac4be9b0dc4a2e42952e8afe1ef516ea8114ad2

See more details on using hashes here.

File details

Details for the file sourcecrumb-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sourcecrumb-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dacb40824b2b520b852ae76cacaf135a451d8748de52f4e758938b60a1840639
MD5 fbf20bafcaff725c3bd0ff1fbc9697e5
BLAKE2b-256 dd38e8a423e218736e331f15c681422d1fadb7efdaf91ef45e9db35e332b592d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page