Distill Python repositories into compact review bundles for LLMs and structured IR for agents.
distillrepo
distillrepo distills Python repositories into compact review bundles for LLMs and structured IR for agents.
Default outputs:
- distilled.<package>.<MMMDDYYYY>.py: a single-file bundle for LLM review
- <package_root>/.distillrepo/: a structured Intermediate Representation (IR) for agents and downstream tooling
Project page: https://github.com/raamana/distillrepo
This software was developed with the help of Codex model GPT 5.4.
Install:
pip install distillrepo
Run:
distillrepo path/to/package
Why use distillrepo
Large repos are awkward to review with an LLM if you only have two bad options:
- paste raw source and waste context
- paste a vague summary and lose important detail
distillrepo sits in the middle:
- preserves real code for the most relevant parts
- compresses lower-priority areas into summaries or signatures
- keeps a structured IR for retrieval, ranking, and follow-up analysis
Example Demo Outputs
These are real runs on open-source repositories. They show the kind of compression distillrepo can achieve, but they should be interpreted together with root coverage and review quality, not as standalone scoreboards.
| Repo | Shape | Review mode | Files | Symbols | Distilled size (est. tokens) | Saved | Compression |
|---|---|---|---|---|---|---|---|
| openai-agents-python | Agent SDK | review | 163 | 1701 | 100,082 | 79.2% | 4.8x |
| networkx | Large API library | review | 288 | 1973 | 62,074 | 93.8% | 16.3x |
| networkx | Large API library | budgeted | 288 | 1973 | 41,872 | 95.8% | 24.1x |
| rich | Medium utility library | review | 100 | 833 | 15,964 | 94.6% | 18.5x |
Please note that distillrepo uses heuristics for root inference, reachability, hotspot ranking, and unused-code detection. Those are useful review aids, but they are not ground truth.
openai-agents-python
Good demo for an agent-native audience: handoffs, tools, tracing, memory, model adapters, and runtime orchestration all live in one package.
- 163 files, 1701 symbols, 163 modules
- 482,255 estimated original tokens -> 100,082 distilled tokens
- 79.2% saved, 4.8x compression
- 96 modules reached from the inferred root set
Why it is useful: the LLM bundle keeps the core agent runtime and API surface reviewable in one file, while .distillrepo/ gives agents a reusable symbol and relationship map for follow-up inspection.
networkx
Good demo for large API-heavy libraries: many modules, broad public surface, and enough internal structure that selective compression matters.
review mode:
- 288 files, 1973 symbols, 288 modules
- 1,008,946 estimated original tokens -> 62,074 distilled tokens
- 93.8% saved, 16.3x compression
- 222 modules reached from the inferred root set
budgeted mode:
- 1,008,946 estimated original tokens -> 41,872 distilled tokens
- 95.8% saved, 24.1x compression
Why it is useful: review preserves more structural and code detail for general inspection; budgeted shows how much further the bundle can shrink when you mainly want a compact triage artifact.
rich
Good demo for a medium-sized, recognizable library with many modules and a clear internal architecture.
- 100 files, 833 symbols, 100 modules
- 295,361 estimated original tokens -> 15,964 distilled tokens
- 94.6% saved, 18.5x compression
- 66 modules reached from the inferred root set
Why it is useful: the repo is large enough to make manual copy-paste review awkward, but still small enough that the bundle and IR outputs are intuitive to inspect.
Installation
pip install distillrepo
For richer static analysis, install the optional analyzers too:
pip install "distillrepo[analysis]"
Quick Start
Analyze a package directory:
distillrepo path/to/package
This writes:
- path/to/package/distilled.<package>.<MMMDDYYYY>.py
- path/to/package/.distillrepo/
Show help:
distillrepo --help
Print the generated bundle to stdout as well:
distillrepo path/to/package --stdout
What it Produces
1. LLM Bundle
The single-file bundle is optimized for copy-paste review in an LLM. Depending on mode, it can include:
- repo summary and review guidance
- inferred roots and top-level structure
- hotspot and cycle summaries
- selected full source
- summarized modules
- signature-only modules
Default output name:
distilled.<package>.<MMMDDYYYY>.py
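Because the date part of the file name changes from run to run, a script that consumes the bundle may prefer to locate it by pattern instead of hard-coding the name. The following is a minimal sketch, assuming the bundle is written directly into the package directory as described above. `find_bundles` is a hypothetical helper and is not part of distillrepo.

```python
from pathlib import Path


def find_bundles(package_dir: str, package: str) -> list[Path]:
    """Return bundle files for `package`, most recently modified first.

    Hypothetical helper: it matches the documented naming pattern
    distilled.<package>.<MMMDDYYYY>.py without parsing the date part.
    """
    pattern = f"distilled.{package}.*.py"
    matches = Path(package_dir).glob(pattern)
    # Sort by modification time so the newest run comes first.
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)
```

Sorting by modification time sidesteps any assumption about how the date string is formatted or ordered.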
2. IR Directory
The .distillrepo/ directory is the structured output for agents and tooling. It includes artifacts such as:
- manifest.json
- repo_summary.md
- modules.json
- symbols.json
- relationships.json
- entrypoints.json
- chunks.json
- hotspots.json
- unused_candidates.json
Use the IR when you want deterministic machine-readable structure instead of one monolithic bundle.
How to Use the Outputs
For LLM review:
- start with distilled.<package>.<date>.py
- use review mode first unless you have a specific need
- if the bundle still feels too large, try architecture or budgeted
- if you need nearly raw source, use concat or plain_concat
For agents or scripts:
- read .distillrepo/manifest.json first
- use modules.json, symbols.json, and relationships.json to find relevant code
- use chunks.json and hotspots.json to prioritize what to inspect
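As a starting point for scripts, the sketch below parses whichever of the JSON artifacts named above are present. The internal schema of these files is not documented in this README, so the code deliberately makes no assumptions about their keys. `load_ir` and `IR_JSON_FILES` are hypothetical names chosen for illustration.

```python
import json
from pathlib import Path

# Artifact names taken from this README. Their contents are treated as
# opaque parsed JSON because the schema is not described here.
IR_JSON_FILES = [
    "manifest.json",
    "modules.json",
    "symbols.json",
    "relationships.json",
    "chunks.json",
    "hotspots.json",
]


def load_ir(package_root: str) -> dict[str, object]:
    """Parse the IR JSON artifacts that exist under .distillrepo/."""
    ir_dir = Path(package_root) / ".distillrepo"
    loaded: dict[str, object] = {}
    for name in IR_JSON_FILES:
        path = ir_dir / name
        if path.is_file():  # skip artifacts a given run did not produce
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
    return loaded
```

A caller would typically inspect the type and keys of `load_ir("path/to/package")["manifest.json"]` before relying on any particular structure.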
For manual follow-up:
- use the bundle and IR as navigation aids, then verify important conclusions against the original source
Review Modes
Recommended order:
- review: Best default. Balanced mix of analysis, selected full source, summaries, and signatures.
- architecture: Better when you want a high-level map of a repo before drilling into code.
- hotspots: Better when you care most about complex or risky logic.
- entrypath: Better when you want code closest to inferred runtime or review roots.
- budgeted: More aggressive compression. Useful when context is tight and you still want a structured overview.
- concat: Cleaned source concatenation with lightweight headers from static analysis. Useful when you want near-source input with basic structure preserved.
- plain_concat: Cleaned source concatenation only. No added headers or analysis sections.
- full: Largest review bundle. Includes the analysis sections plus broad full-source inclusion. Useful for debugging the tool or getting an almost-verbatim review artifact, not for tight context budgets.
Common Scenarios
First pass on an unfamiliar repo
distillrepo path/to/package
This uses review mode, which is the recommended default.
Architecture walkthrough
distillrepo path/to/package --review-mode architecture
Use this when you want a compact map of the repo before asking the LLM deeper questions.
Focus on risky or complex code
distillrepo path/to/package --review-mode hotspots
Useful for audit-style passes and targeted review.
Near-source bundle with lightweight file markers
distillrepo path/to/package --review-mode concat
Useful when you want to preserve source fidelity but still keep file boundaries obvious.
Source only, no added headers
distillrepo path/to/package --review-mode plain_concat
Useful when you want a cleaned source dump and nothing else.
Override entry inference
distillrepo path/to/package \
--entry-point-module cli.py \
--entry-point-function main
Useful when the inferred root or entry surface is not the one you want reviewed.
Tighten scope
distillrepo path/to/package \
--exclude-dir tests \
--exclude-glob "docs/*"
Useful when the repo has too much non-essential code for the task at hand.
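When these scenarios are driven from a script, it can help to assemble the command line in one place. The sketch below uses only the flags shown in this section. `build_command` is a hypothetical helper, and passing a flag such as --exclude-dir more than once is an assumption that has not been checked against the CLI.

```python
from typing import Optional, Sequence


def build_command(
    package: str,
    review_mode: Optional[str] = None,
    entry_point_module: Optional[str] = None,
    entry_point_function: Optional[str] = None,
    exclude_dirs: Sequence[str] = (),
    exclude_globs: Sequence[str] = (),
    stdout: bool = False,
) -> list[str]:
    """Build an argv list for the distillrepo CLI (hypothetical helper)."""
    cmd = ["distillrepo", package]
    if review_mode:
        cmd += ["--review-mode", review_mode]
    if entry_point_module:
        cmd += ["--entry-point-module", entry_point_module]
    if entry_point_function:
        cmd += ["--entry-point-function", entry_point_function]
    # Assumption: each exclusion is passed as its own flag occurrence.
    for directory in exclude_dirs:
        cmd += ["--exclude-dir", directory]
    for pattern in exclude_globs:
        cmd += ["--exclude-glob", pattern]
    if stdout:
        cmd.append("--stdout")
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Building a list instead of a shell string avoids quoting problems with glob patterns such as docs/*.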
Stdout Summary
Each run prints a short summary so the user gets immediate value even before opening the outputs:
- files, symbols, and modules analyzed
- analysis kind
- roots analyzed
- reached vs not reached
- cycles
- possible unused symbol count
- top hotspot
- original vs distilled estimated tokens
- saved tokens, retained percentage, and compression ratio
- output paths
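The token figures in the summary are related by simple arithmetic. The sketch below reproduces the openai-agents-python numbers quoted earlier, assuming that "saved" means the share of original tokens removed, "retained" means the share kept, and values are rounded to one decimal place. `compression_stats` is a hypothetical helper, not distillrepo's own code.

```python
def compression_stats(original_tokens: int, distilled_tokens: int) -> dict[str, float]:
    """Derive saved/retained percentages and the compression ratio."""
    saved = original_tokens - distilled_tokens
    return {
        "saved_tokens": saved,
        "saved_pct": round(100 * saved / original_tokens, 1),
        "retained_pct": round(100 * distilled_tokens / original_tokens, 1),
        "compression": round(original_tokens / distilled_tokens, 1),
    }


# openai-agents-python in review mode, using the figures from this README.
print(compression_stats(482_255, 100_082))
# {'saved_tokens': 382173, 'saved_pct': 79.2, 'retained_pct': 20.8, 'compression': 4.8}
```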
What To Trust
distillrepo separates directly extracted facts from heuristic judgments.
High-confidence facts:
- file paths and module paths
- line spans and signatures
- declared symbols
- static imports
- directly resolved relationships when extraction succeeds
Heuristics:
- hotspot rankings
- importance scores
- root selection and pooled root coverage
- "not reached from roots" conclusions
- unused-code candidates
- source inclusion and compression decisions
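One reason static imports count as facts while reachability stays heuristic is that only imports written as statements appear in the syntax tree. The sketch below illustrates this general limitation with the standard library's ast module. It is not distillrepo's extractor, and `static_imports` is a hypothetical name.

```python
import ast

SOURCE = '''
import os
from pathlib import Path
import importlib

plugin = importlib.import_module("js" + "on")  # resolved only at runtime
'''


def static_imports(source: str) -> set[str]:
    """Collect module names from import statements found in the AST."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


print(sorted(static_imports(SOURCE)))
# ['importlib', 'os', 'pathlib']
```

The json module is loaded when SOURCE runs, yet it never shows up in the result, so a reachability pass built on such data would not count it as reached.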
"Not reached from roots" does not mean dead code. Dynamic imports, lazy exports, plugin registration, reflection, and runtime dispatch may be underrepresented.
How It Chooses
distillrepo builds a small set of review roots, analyzes each root, then pools the results:
- application-style repos bias toward package root plus runnable entry surfaces
- library-style repos bias toward package and public subpackage roots
- shared-across-roots modules are ranked higher for review
The .distillrepo/ IR keeps the fuller pooled analysis. The single-file distilled.<package>.<date>.py bundle is the review-oriented derived artifact.
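The idea of analyzing several roots and pooling the results can be illustrated with a toy import graph. This is a simplified sketch of the general technique (breadth-first reachability per root, then ranking modules by how many roots reach them). It is not distillrepo's actual algorithm, and the module names and helper functions are made up for the example.

```python
from collections import Counter, deque


def reachable(graph: dict[str, list[str]], root: str) -> set[str]:
    """Breadth-first search over import edges starting at one root."""
    seen = {root}
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for dep in graph.get(module, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def pool_roots(graph: dict[str, list[str]], roots: list[str]):
    """Rank reached modules by root count and list modules never reached."""
    counts = Counter(m for root in roots for m in reachable(graph, root))
    all_modules = set(graph) | {d for deps in graph.values() for d in deps}
    ranked = sorted(counts, key=lambda m: (-counts[m], m))
    return ranked, sorted(all_modules - set(counts))


# Toy graph: module -> modules it imports (hypothetical names).
graph = {
    "pkg.cli": ["pkg.core", "pkg.utils"],
    "pkg.api": ["pkg.core"],
    "pkg.core": ["pkg.utils"],
    "pkg.legacy": ["pkg.utils"],
}
ranked, not_reached = pool_roots(graph, ["pkg.cli", "pkg.api"])
print(ranked)       # ['pkg.core', 'pkg.utils', 'pkg.api', 'pkg.cli']
print(not_reached)  # ['pkg.legacy']
```

Modules reached from both roots sort first, which mirrors the bias toward shared-across-roots modules described above.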
Compression Notes
The reported token counts are estimates based on text length. They are useful for comparing runs and spotting extreme compression, but they are not model-specific tokenizer counts.
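To make "based on text length" concrete, here is one possible length-based estimator. The ratio of four characters per token is a common rule of thumb for English-like text and is purely an assumption here. This README does not state the formula distillrepo uses, and `estimate_tokens` is a hypothetical name.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count.

    Assumption: about 4 characters per token. Real tokenizers differ by
    model and by content, so treat the result as a relative measure.
    """
    return math.ceil(len(text) / chars_per_token)
```

An estimator like this is consistent from run to run, which is what makes it usable for comparing bundles even though it will not match any specific model's tokenizer.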
There is not yet a universal compression threshold that guarantees trustworthy review quality across repos. Treat compression as an observed outcome, not the main objective. The main objective is retaining enough review-relevant structure and source to support a useful LLM review.