Tree-sitter repository map in TOON format for LLM consumption

sourcecrumb

What it does

sourcecrumb parses a codebase with tree-sitter, extracts symbols (classes, functions, methods, imports), builds a file-to-file dependency graph, and ranks files by PageRank. The output is a compact TOON-formatted map designed to fit in an LLM context window.

The goal: give an LLM agent a high-level map of a codebase so it can explore more effectively — knowing which files matter most, what symbols they define, and how they depend on each other.

Installation

Requires Python >= 3.13.

From PyPI

pip install sourcecrumb

Or with uv:

uv pip install sourcecrumb

Run without installing

uvx sourcecrumb .

From source

git clone https://github.com/phobologic/sourcecrumb.git
cd sourcecrumb
uv sync

Usage

sourcecrumb [ROOT] [OPTIONS]
Option            Description
ROOT              Repository root directory (default: .)
--max-files, -n   Limit output to top N files by PageRank (min: 1)
--language, -l    Restrict to a specific language (e.g., python)
--cache           Cache file path; the cache is reused if it is newer than all source files
--max-file-size   Skip files larger than this many bytes (default: 1MB)
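
If you want to call the CLI from a script rather than a shell, the options above map directly onto an argument list. The following is a minimal sketch, not part of sourcecrumb itself: build_command and repo_map are hypothetical helper names, and the sketch assumes the sourcecrumb executable is on PATH and uses only the flags listed in the table.

```python
import shutil
import subprocess


def build_command(root=".", max_files=None, language=None, cache=None):
    """Assemble a sourcecrumb argv list from the documented options.

    Hypothetical helper: only flags from the usage table are emitted.
    """
    cmd = ["sourcecrumb", str(root)]
    if max_files is not None:
        if max_files < 1:
            raise ValueError("--max-files must be at least 1")
        cmd += ["--max-files", str(max_files)]
    if language is not None:
        cmd += ["--language", language]
    if cache is not None:
        cmd += ["--cache", str(cache)]
    return cmd


def repo_map(root=".", **options):
    """Run sourcecrumb and return its stdout (the TOON map) as text."""
    if shutil.which("sourcecrumb") is None:
        raise RuntimeError("sourcecrumb is not on PATH")
    result = subprocess.run(
        build_command(root, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(build_command(".", max_files=3, language="python"))
# ['sourcecrumb', '.', '--max-files', '3', '--language', 'python']
```

Keeping the argument assembly separate from the subprocess call makes the command easy to inspect or log before anything is executed.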

Example

$ sourcecrumb . -n 3
repo: sourcecrumb
root: sourcecrumb
files[3]{path,language,rank}:
  sourcecrumb/models.py,python,0.2615
  sourcecrumb/languages.py,python,0.1155
  sourcecrumb/discovery.py,python,0.0590
symbols[17]{file,name,kind,line,signature}:
  sourcecrumb/models.py,TagKind,class,10,TagKind(enum.Enum)
  sourcecrumb/models.py,SymbolKind,class,17,SymbolKind(enum.Enum)
  sourcecrumb/models.py,Tag,class,27,Tag
  sourcecrumb/models.py,FileInfo,class,39,FileInfo
  ...
dependencies[1]{source,target,symbols}:
  sourcecrumb/discovery.py,sourcecrumb/languages.py,language_for_extension
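
Output in this shape is straightforward to read back programmatically. The sketch below is an illustration only, not sourcecrumb's own reader: parse_repo_map is a hypothetical function, and it assumes rows can be split with Python's csv module, which handles simple double-quoted values but not every escaping rule a full TOON parser would need.

```python
import csv
import re

# Matches a tabular header such as: files[3]{path,language,rank}:
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:\s*$")


def parse_repo_map(text):
    """Split a sourcecrumb-style map into scalar fields and tables.

    Returns (scalars, tables): scalars maps key -> value, tables maps an
    array name -> list of row dicts keyed by the declared column names.
    """
    scalars, tables = {}, {}
    current = None  # (name, columns) of the table currently being read
    for line in text.splitlines():
        if not line.strip():
            continue
        header = HEADER.match(line)
        if header:
            name, _count, cols = header.groups()
            current = (name, cols.split(","))
            tables[name] = []
        elif line.startswith(" ") and current:
            name, columns = current
            values = next(csv.reader([line.strip()]))
            tables[name].append(dict(zip(columns, values)))
        else:
            key, _, value = line.partition(":")
            scalars[key.strip()] = value.strip()
            current = None
    return scalars, tables


sample = """\
repo: sourcecrumb
root: sourcecrumb
files[3]{path,language,rank}:
  sourcecrumb/models.py,python,0.2615
  sourcecrumb/languages.py,python,0.1155
  sourcecrumb/discovery.py,python,0.0590
dependencies[1]{source,target,symbols}:
  sourcecrumb/discovery.py,sourcecrumb/languages.py,language_for_extension
"""

scalars, tables = parse_repo_map(sample)
print(scalars["repo"])              # sourcecrumb
print(tables["files"][0]["path"])   # sourcecrumb/models.py
```

All values come back as strings; converting rank to a float or line to an int is left to the caller.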

Claude Code integration

The primary use case is running sourcecrumb as a Claude Code hook so every subagent automatically gets a repo map injected into its context.

Add this to .claude/settings.json:

{
  "hooks": {
    "SubagentStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uvx sourcecrumb \"$CLAUDE_PROJECT_DIR\" --cache \"$CLAUDE_PROJECT_DIR/.cache/sourcecrumb.toon\""
          }
        ]
      }
    ]
  }
}

If you have sourcecrumb installed globally, you can use sourcecrumb directly instead of going through uvx.

The SubagentStart hook fires when any subagent launches. sourcecrumb's stdout is injected into the subagent's context, giving it an instant overview of the codebase.

--cache avoids re-parsing on every agent launch — the cache file is reused as long as no source files have changed. Add .cache/ to your .gitignore.
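
The freshness rule described here (reuse the cache only if it is newer than every source file) can be expressed as a modification-time comparison. This is a sketch of that rule under the assumption that file mtimes are the signal; cache_is_fresh is a hypothetical name, and sourcecrumb's actual check may differ in detail.

```python
import os
import tempfile
from pathlib import Path


def cache_is_fresh(cache_path, source_files):
    """Return True if the cache exists and is newer than every source file."""
    cache = Path(cache_path)
    if not cache.is_file():
        return False
    cache_mtime = cache.stat().st_mtime
    # Any source file modified at or after the cache invalidates it.
    return all(Path(f).stat().st_mtime < cache_mtime for f in source_files)


# Usage: set explicit mtimes so the outcome does not depend on clock resolution.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "module.py"
    cache = Path(tmp) / "sourcecrumb.toon"
    source.write_text("x = 1\n")
    cache.write_text("repo: demo\n")
    os.utime(source, (1000, 1000))
    os.utime(cache, (2000, 2000))
    print(cache_is_fresh(cache, [source]))  # True: cache is newer
    os.utime(source, (3000, 3000))
    print(cache_is_fresh(cache, [source]))  # False: source changed later
```

A check like this only has to stat each file, which is far cheaper than re-parsing the repository on every subagent launch.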

TOON format

The output uses TOON (Token-Oriented Object Notation), a compact format designed for LLM consumption:

  • Scalar fields: written as key: value
  • Tabular arrays: a name[count]{col1,col2,...}: header followed by indented CSV rows
  • Quoting: values containing special characters are double-quoted; numbers and plain strings are bare
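
To make the three rules concrete, here is a small encoder sketch for scalar fields and tabular arrays. It is an illustration under assumptions, not sourcecrumb's encoder: encode_value, encode_scalar, and encode_table are hypothetical names, and the set of characters that trigger quoting is a guess at what "special characters" covers.

```python
import re

# Assumed quoting triggers: delimiters, quotes, backslashes, newlines,
# leading/trailing whitespace, and the empty string.
_NEEDS_QUOTES = re.compile(r'[,:"\n\[\]{}\\]|^\s|\s$|^$')


def encode_value(value):
    """Render one value: numbers bare, strings quoted only when needed."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    if _NEEDS_QUOTES.search(text):
        escaped = (
            text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        return f'"{escaped}"'
    return text


def encode_scalar(key, value):
    """Render a scalar field as `key: value`."""
    return f"{key}: {encode_value(value)}"


def encode_table(name, columns, rows):
    """Render a tabular array: a header line plus one indented row per item."""
    lines = [f"{name}[{len(rows)}]{{{','.join(columns)}}}:"]
    for row in rows:
        lines.append("  " + ",".join(encode_value(row[c]) for c in columns))
    return "\n".join(lines)


rows = [
    {"path": "a.py", "language": "python", "rank": 0.5},
    {"path": "dir, odd/b.py", "language": "python", "rank": 0.25},
]
print(encode_scalar("repo", "demo"))
print(encode_table("files", ["path", "language", "rank"], rows))
# repo: demo
# files[2]{path,language,rank}:
#   a.py,python,0.5
#   "dir, odd/b.py",python,0.25
```

Declaring the column names once in the header is what keeps the format compact: each row carries only values, so repeated keys cost no tokens.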

How it works

  1. Discover files — uses git ls-files when available, falls back to .gitignore-based filtering
  2. Parse with tree-sitter — extracts classes, functions, methods, and imports from each file
  3. Build dependency graph — creates file-to-file edges based on shared symbols (imports that resolve to definitions in other files)
  4. Rank with PageRank — scores files by importance in the dependency graph
  5. Select top N — when --max-files is set, keeps only the highest-ranked files
  6. Encode to TOON — serializes the repo map into the compact output format
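
Steps 3 to 5 can be sketched in a few lines of plain Python. This is a simplified illustration, not sourcecrumb's implementation: build_edges, pagerank, and top_files are hypothetical names, the damping factor and iteration count are conventional defaults chosen for the example, and edges are assumed to point from the importing file to the file that defines the symbol.

```python
from collections import defaultdict


def build_edges(definitions, imports):
    """Create file-to-file edges from imports that resolve to definitions.

    definitions: {symbol: defining_file}
    imports:     {importing_file: [symbol, ...]}
    """
    edges = defaultdict(set)
    for source, symbols in imports.items():
        for symbol in symbols:
            target = definitions.get(symbol)
            if target and target != source:
                edges[source].add(target)
    return edges


def pagerank(files, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; files with no outgoing edges spread rank evenly."""
    n = len(files)
    rank = {f: 1.0 / n for f in files}
    for _ in range(iterations):
        dangling = sum(rank[f] for f in files if not edges.get(f))
        new = {f: (1 - damping) / n + damping * dangling / n for f in files}
        for source in files:
            targets = edges.get(source, ())
            for target in targets:
                new[target] += damping * rank[source] / len(targets)
        rank = new
    return rank


def top_files(rank, max_files=None):
    """Order files by descending rank, optionally keeping only the top N."""
    ordered = sorted(rank, key=rank.get, reverse=True)
    return ordered if max_files is None else ordered[:max_files]


files = ["discovery.py", "languages.py", "models.py"]
definitions = {
    "language_for_extension": "languages.py",
    "FileInfo": "models.py",
    "Tag": "models.py",
}
imports = {
    "discovery.py": ["language_for_extension", "FileInfo"],
    "languages.py": ["Tag"],
}
rank = pagerank(files, build_edges(definitions, imports))
print(top_files(rank, max_files=2))  # ['models.py', 'languages.py']
```

Because edges point at the file being imported, files that many others depend on accumulate rank, which is why a shared models module tends to land at the top of the map.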

Supported languages

Python. Extensible by adding a .scm query file to sourcecrumb/queries/ and registering the language in sourcecrumb/languages.py.

Development

uv run pytest                            # run tests
uv run ruff check sourcecrumb/ tests/    # lint
uv run ruff format sourcecrumb/ tests/   # format
