
distillrepo

distillrepo distills Python repositories into compact review bundles for LLMs and structured IR for agents.

Default outputs:

  • distilled.<package>.<MMMDDYYYY>.py: a single-file bundle for LLM review
  • <package_root>/.distillrepo/: a structured Intermediate Representation (IR) for agents and downstream tooling

Project page: https://github.com/raamana/distillrepo

Install:

pip install distillrepo

Run:

distillrepo path/to/package

Why use distillrepo

Large repos are awkward to review with an LLM if you only have two bad options:

  • paste raw source and burn through the context window
  • paste a vague summary and lose important detail

distillrepo sits in the middle:

  • preserves real source code for the most relevant parts
  • compresses lower-priority areas into summaries or signatures
  • keeps a structured IR for retrieval, ranking, and follow-up analysis

Installation

pip install distillrepo

For richer static analysis, install the optional analyzers too:

pip install "distillrepo[analysis]"

Quick Start

Analyze a package directory:

distillrepo path/to/package

This writes:

  • path/to/package/distilled.<package>.<MMMDDYYYY>.py
  • path/to/package/.distillrepo/
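If a script needs to locate these outputs, the paths can be predicted from the naming pattern. This is a minimal sketch under two assumptions not confirmed here: that <package> is the package directory's name, and that <MMMDDYYYY> means a three-letter English month abbreviation, a zero-padded day, and a four-digit year. expected_outputs is a hypothetical helper, not part of distillrepo.

```python
from datetime import date
from pathlib import Path
from typing import Optional, Tuple

# Assumed meaning of MMM: fixed English abbreviations (avoids locale-dependent strftime).
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def expected_outputs(package_dir: str, when: Optional[date] = None) -> Tuple[Path, Path]:
    """Return (bundle_path, ir_dir) where distillrepo is expected to write.

    Hypothetical helper: assumes <package> is the directory name and
    <MMMDDYYYY> is month abbreviation + zero-padded day + four-digit year.
    """
    pkg = Path(package_dir)
    when = when or date.today()
    stamp = f"{_MONTHS[when.month - 1]}{when.day:02d}{when.year:04d}"
    bundle = pkg / f"distilled.{pkg.name}.{stamp}.py"
    ir_dir = pkg / ".distillrepo"
    return bundle, ir_dir


bundle, ir_dir = expected_outputs("path/to/package", date(2025, 3, 5))
print(bundle.as_posix())  # path/to/package/distilled.package.Mar052025.py
print(ir_dir.as_posix())  # path/to/package/.distillrepo
```

If the prediction misses, globbing for distilled.*.py in the package directory is a more forgiving fallback.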

Show help:

distillrepo --help

Print the generated bundle to stdout as well:

distillrepo path/to/package --stdout

What it Produces

1. LLM Bundle

The single-file bundle is optimized for copy-paste review in an LLM. Depending on mode, it can include:

  • repo summary and review guidance
  • inferred roots and top-level structure
  • hotspot and cycle summaries
  • selected full source
  • summarized modules
  • signature-only modules

Default output name:

distilled.<package>.<MMMDDYYYY>.py

2. IR Directory

The .distillrepo/ directory is the structured output for agents and tooling. It includes artifacts such as:

  • manifest.json
  • repo_summary.md
  • modules.json
  • symbols.json
  • relationships.json
  • entrypoints.json
  • chunks.json
  • hotspots.json
  • unused_candidates.json

Use the IR when you want deterministic machine-readable structure instead of one monolithic bundle.
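Because the IR is meant to be machine-readable, one way to see what changed between two runs is to fingerprint each JSON artifact in a canonical form. This is a minimal sketch: ir_fingerprint and changed_artifacts are hypothetical helpers, not part of distillrepo, and they only look at *.json files (so repo_summary.md is ignored).

```python
import hashlib
import json
from pathlib import Path


def ir_fingerprint(ir_dir: str) -> dict:
    """Map each JSON artifact in an IR directory to a SHA-256 of its canonical form.

    Re-serializing with sort_keys=True means key order and whitespace
    differences do not count as changes; list order still does.
    """
    digests = {}
    for path in sorted(Path(ir_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
        digests[path.name] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digests


def changed_artifacts(old: dict, new: dict) -> list:
    """Names of artifacts that were added, removed, or changed between two runs."""
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

Typical use: snapshot ir_fingerprint(".../.distillrepo") before rerunning distillrepo, snapshot again afterwards, and pass both to changed_artifacts.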

How to Use the Outputs

For LLM review:

  • start with distilled.<package>.<date>.py
  • use review mode first unless you have a specific need
  • if the bundle is still too large, try the architecture or budgeted mode
  • if you need nearly raw source, use concat or plain_concat

For agents or scripts:

  • read .distillrepo/manifest.json first
  • use modules.json, symbols.json, and relationships.json to find relevant code
  • use chunks.json and hotspots.json to prioritize what to inspect
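The reading order above can be sketched as a small loader: check for the manifest first, then load the navigation and prioritization files that exist. load_ir is a hypothetical helper, and each file's contents are kept as opaque parsed JSON because the artifact schemas are not documented here.

```python
import json
from pathlib import Path

# Order follows the guidance above: manifest first, then navigation files,
# then the files used to prioritize what to inspect.
READING_ORDER = (
    "manifest.json",
    "modules.json",
    "symbols.json",
    "relationships.json",
    "chunks.json",
    "hotspots.json",
)


def load_ir(package_dir: str) -> dict:
    """Load the IR artifacts that exist, in reading order, keyed by file name."""
    ir_dir = Path(package_dir) / ".distillrepo"
    manifest_path = ir_dir / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(f"no manifest at {manifest_path}; run distillrepo first")
    loaded = {}
    for name in READING_ORDER:
        path = ir_dir / name
        if path.is_file():  # skip artifacts this run did not produce
            loaded[name] = json.loads(path.read_text(encoding="utf-8"))
    return loaded
```

Since dicts keep insertion order, iterating over the result visits artifacts in the recommended order.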

For manual follow-up:

  • use the bundle and IR as navigation aids, then verify important conclusions against the original source

Review Modes

Recommended order:

  • review: Best default. Balanced mix of analysis, selected full source, summaries, and signatures.

  • architecture: Better when you want a high-level map of a repo before drilling into code.

  • hotspots: Better when you care most about complex or risky logic.

  • entrypath: Better when you want code closest to inferred runtime or review roots.

  • budgeted: More aggressive compression. Useful when context is tight and you still want a structured overview.

  • concat: Cleaned source concatenation with lightweight headers from static analysis. Useful when you want near-source input with basic structure preserved.

  • plain_concat: Cleaned source concatenation only. No added headers or analysis sections.

  • full: Largest review bundle. Includes the analysis sections plus broad full-source inclusion. Useful for debugging the tool or getting an almost-verbatim review artifact, not for tight context budgets.

Common Scenarios

First pass on an unfamiliar repo

distillrepo path/to/package

This uses review mode, which is the recommended default.

Architecture walkthrough

distillrepo path/to/package --review-mode architecture

Use this when you want a compact map of the repo before asking the LLM deeper questions.

Focus on risky or complex code

distillrepo path/to/package --review-mode hotspots

Useful for audit-style passes and targeted review.

Near-source bundle with lightweight file markers

distillrepo path/to/package --review-mode concat

Useful when you want to preserve source fidelity but still keep file boundaries obvious.

Source only, no added headers

distillrepo path/to/package --review-mode plain_concat

Useful when you want a cleaned source dump and nothing else.

Override entry inference

distillrepo path/to/package \
  --entry-point-module cli.py \
  --entry-point-function main

Useful when the inferred root or entry surface is not the one you want reviewed.

Tighten scope

distillrepo path/to/package \
  --exclude-dir tests \
  --exclude-glob "docs/*"

Useful when the repo has too much non-essential code for the task at hand.
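To drive these scenarios from a script, one approach is to assemble the argument list from the flags shown above and call the CLI with subprocess. build_command and run_distillrepo are hypothetical helpers. The flag names come from this README, but two details are assumptions: that passing --review-mode review explicitly behaves like the default, and that --exclude-dir and --exclude-glob may be repeated.

```python
import subprocess
from typing import Optional, Sequence

# Mode names as listed under "Review Modes".
REVIEW_MODES = {"review", "architecture", "hotspots", "entrypath",
                "budgeted", "concat", "plain_concat", "full"}


def build_command(package_dir: str, review_mode: str = "review",
                  entry_module: Optional[str] = None,
                  entry_function: Optional[str] = None,
                  exclude_dirs: Sequence[str] = (),
                  exclude_globs: Sequence[str] = (),
                  stdout: bool = False) -> list:
    """Assemble a distillrepo argv from the flags documented in this README."""
    if review_mode not in REVIEW_MODES:
        raise ValueError(f"unknown review mode: {review_mode!r}")
    cmd = ["distillrepo", package_dir, "--review-mode", review_mode]
    if entry_module:
        cmd += ["--entry-point-module", entry_module]
    if entry_function:
        cmd += ["--entry-point-function", entry_function]
    for d in exclude_dirs:  # assumption: the flag may be repeated
        cmd += ["--exclude-dir", d]
    for g in exclude_globs:  # passed literally; no shell expands "docs/*"
        cmd += ["--exclude-glob", g]
    if stdout:
        cmd.append("--stdout")
    return cmd


def run_distillrepo(package_dir: str, **options) -> str:
    """Run the CLI and return what it printed (requires distillrepo on PATH)."""
    result = subprocess.run(build_command(package_dir, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Because no shell is involved, a glob such as "docs/*" reaches distillrepo unexpanded, matching the quoted form in the shell example above.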

Stdout Summary

Each run prints a short summary, so you get useful feedback before opening any of the outputs:

  • files, symbols, and modules analyzed
  • analysis kind
  • roots analyzed
  • reached vs not reached
  • cycles
  • possible unused symbol count
  • top hotspot
  • original vs distilled estimated tokens
  • saved tokens, retained percentage, and compression ratio
  • output paths
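The savings figures in the summary can be derived from the two token estimates. The formulas here are assumed definitions, not taken from distillrepo's source: saved = original - distilled, retained percentage = distilled / original * 100, and compression ratio = original / distilled. The numbers in the usage line are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionStats:
    saved_tokens: int
    retained_pct: float
    compression_ratio: float


def compression_stats(original_tokens: int, distilled_tokens: int) -> CompressionStats:
    """Derive savings figures from two token estimates (assumed definitions)."""
    if original_tokens <= 0 or distilled_tokens <= 0:
        raise ValueError("token counts must be positive")
    return CompressionStats(
        saved_tokens=original_tokens - distilled_tokens,
        retained_pct=100.0 * distilled_tokens / original_tokens,
        compression_ratio=original_tokens / distilled_tokens,
    )


print(compression_stats(120_000, 30_000))
# CompressionStats(saved_tokens=90000, retained_pct=25.0, compression_ratio=4.0)
```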

What To Trust

distillrepo separates directly extracted facts from heuristic judgments.

High-confidence facts:

  • file paths and module paths
  • line spans and signatures
  • declared symbols
  • static imports
  • directly resolved relationships when extraction succeeds

Heuristics:

  • hotspot rankings
  • importance scores
  • root selection and pooled root coverage
  • "not reached from roots" conclusions
  • unused-code candidates
  • source inclusion and compression decisions
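Static imports sit among the high-confidence facts because they are visible in the syntax tree, while runtime imports are not. The sketch below uses Python's standard ast module to illustrate that gap; it is not distillrepo's extractor, and static_imports is a hypothetical helper.

```python
import ast

SOURCE = '''
import os
from json import loads
import importlib

def load_plugin(name):
    return importlib.import_module(f"plugins.{name}")
'''


def static_imports(source: str) -> set:
    """Module names named in import statements; dynamic imports are invisible."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports with no module name ("from . import x") are skipped.
            found.add(node.module)
    return found


print(sorted(static_imports(SOURCE)))  # ['importlib', 'json', 'os']
```

The plugins.* module loaded inside load_plugin never appears, which is the kind of edge a static graph misses.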

"Not reached from roots" does not mean dead code. Dynamic imports, lazy exports, plugin registration, reflection, and runtime dispatch may be underrepresented.

How It Chooses

distillrepo builds a small set of review roots, analyzes each root, then pools the results:

  • application-style repos favor the package root plus runnable entry surfaces
  • library-style repos favor the package root and public subpackage roots
  • shared-across-roots modules are ranked higher for review
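The pooling idea can be illustrated with a toy import graph: find what each root reaches, count how many roots reach each module, and report modules that no root reaches. This is a sketch of the general technique under stated assumptions (a hand-written graph with hypothetical module names, plain breadth-first search, ranking by root count); this README does not describe distillrepo's actual scoring.

```python
from collections import Counter, deque

# Hypothetical import graph: module -> modules it imports statically.
GRAPH = {
    "pkg": ["pkg.core", "pkg.utils"],
    "pkg.cli": ["pkg.core", "pkg.io"],
    "pkg.core": ["pkg.utils"],
    "pkg.io": [],
    "pkg.utils": [],
    "pkg.legacy": ["pkg.utils"],
}


def reachable(graph: dict, root: str) -> set:
    """Breadth-first search: every module reachable from root, root included."""
    seen = {root}
    queue = deque([root])
    while queue:
        for dep in graph.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def pool_roots(graph: dict, roots: list):
    """Rank modules by how many roots reach them; list modules no root reaches."""
    hits = Counter()
    for root in roots:
        hits.update(reachable(graph, root))  # each root counts a module once
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    unreached = sorted(set(graph) - set(hits))
    return ranked, unreached


ranked, unreached = pool_roots(GRAPH, ["pkg", "pkg.cli"])
print(ranked[:2])  # [('pkg.core', 2), ('pkg.utils', 2)]
print(unreached)   # ['pkg.legacy']
```

Here pkg.core and pkg.utils are shared by both roots and rank first, while pkg.legacy is unreached, which by itself still does not prove it is dead code.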

The .distillrepo/ IR keeps the fuller pooled analysis. The single-file distilled.<package>.<date>.py bundle is the review-oriented derived artifact.

Compression Notes

The reported token counts are estimates based on text length. They are useful for comparing runs and spotting extreme compression, but they are not model-specific tokenizer counts.
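A length-based estimate can be as simple as dividing the character count by an assumed average. The divisor distillrepo uses is not stated; 4 characters per token is a common rule of thumb and is an assumption here, as is the estimate_tokens name.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Length-based token estimate; chars_per_token=4 is an assumed rule of thumb."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


sample = "def add(a, b):\n    return a + b\n"
print(estimate_tokens(sample))  # 8 (32 characters / 4)
```

An estimate like this is consistent across runs, which is what makes it useful for comparing bundles, but a real tokenizer for a specific model can differ noticeably.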

There is not yet a universal compression threshold that guarantees trustworthy review quality across repos. Treat compression as an observed outcome, not the main objective. The main objective is retaining enough review-relevant structure and source to support a useful LLM review.




Download files


Source Distribution

distillrepo-0.1.tar.gz (28.7 kB)

Built Distribution


distillrepo-0.1-py3-none-any.whl (35.2 kB)

File details

Details for the file distillrepo-0.1.tar.gz.

File metadata

  • Download URL: distillrepo-0.1.tar.gz
  • Upload date:
  • Size: 28.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.2 cpython/3.13.1 HTTPX/0.28.1

File hashes

Hashes for distillrepo-0.1.tar.gz
Algorithm Hash digest
SHA256 01a292735b5080f34ccb2903a56d94e3d1b56eaa72cc14ebb00e5ee1e23e4795
MD5 be3b27964885915251fa09a240bb7802
BLAKE2b-256 d18f244b6aa971168b75c602f6d63d5899de785eebed3056c7e0e597330ecfae


File details

Details for the file distillrepo-0.1-py3-none-any.whl.

File metadata

  • Download URL: distillrepo-0.1-py3-none-any.whl
  • Upload date:
  • Size: 35.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: Hatch/1.16.2 cpython/3.13.1 HTTPX/0.28.1

File hashes

Hashes for distillrepo-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0173364aa5e039e37a79edd2e02a335ba6f42c01771afbee9524da2a19831d1f
MD5 fb115ab5794a6def0a66cf5272aa5637
BLAKE2b-256 81ee32629f570329ef23f3f1f402e504f696fc3c14f57b61a98d60e58a2b8700

