Pack Python codebases into Markdown optimized for LLM context delivery (pack/unpack/patch/apply)
codecrate
codecrate turns a Python repository into a Markdown "context pack" optimized for LLM consumption, with full round-trip support:
- pack: repo → context.md
- unpack: context.md → reconstructed files
- patch: old context.md + current repo → diff-only patch.md
- apply: patch.md → apply changes to repo
Features
- Markdown-native output: Generates self-contained Markdown files with syntax highlighting
- Symbol index: Quick navigation to functions and classes
- Deduplication: Optionally deduplicate identical function bodies to save tokens
- Two layout modes:
  - stubs: Compact file stubs with function bodies in a separate "Function Library"
  - full: Complete file contents (no stubbing)
- Round-trip support: Reconstruct original files exactly from Markdown packs
- Diff generation: Create minimal patch Markdown files showing only changed code
- Gitignore support: Respect .gitignore when scanning files
- Tool ignore support: Respect .codecrateignore (always)
- Targeted packing: Optional --stdin mode to pack an explicit file list
- Token diagnostics: Optional CLI token reports (encoding, tree, top files)
Installation
pip install -e .
Or for development:
pip install -e ".[dev]"
Quick Start
Pack a Repository
Pack your current directory into context.md:
codecrate pack .
Pack with a specific output file:
codecrate pack . -o my_project.md
Unpack to Reconstruct Files
Reconstruct files from a packed Markdown:
codecrate unpack context.md -o reconstructed/
Generate and Apply Patches
- Pack your repository as a baseline:
codecrate pack . -o baseline.md
- Make changes to your code
- Generate a patch:
codecrate patch baseline.md . -o changes.md
- Apply the patch:
codecrate apply changes.md .
Configuration
Codecrate reads config from the repository root with this precedence:
- CLI flags
- .codecrate.toml / codecrate.toml
- pyproject.toml under [tool.codecrate]
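The precedence list can be pictured as a layered lookup. The following is a minimal sketch, not codecrate's implementation: it reads the list as highest priority first and assumes settings are merged key by key, which the text above does not confirm. `resolve_settings` and its parameter names are hypothetical.

```python
from collections import ChainMap


def resolve_settings(cli_flags, dotfile, pyproject_table, defaults):
    """Return effective settings; earlier layers win over later ones.

    Hypothetical helper. Layers follow the documented order:
    CLI flags, then .codecrate.toml / codecrate.toml, then
    pyproject.toml [tool.codecrate], with built-in defaults last.
    """
    # Drop CLI flags that were not given (None) so they do not mask file values.
    cli = {key: value for key, value in cli_flags.items() if value is not None}
    return dict(ChainMap(cli, dotfile, pyproject_table, defaults))


settings = resolve_settings(
    cli_flags={"dedupe": None, "layout": "full"},
    dotfile={"dedupe": True, "layout": "stubs"},
    pyproject_table={"dedupe": False, "split_max_chars": 50000},
    defaults={"dedupe": False, "layout": "auto", "split_max_chars": 0},
)
print(settings)
```

Here `layout` comes from the CLI layer, `dedupe` from the dotfile layer, and `split_max_chars` from the pyproject layer, because each key is taken from the first layer that defines it.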
Create a .codecrate.toml or codecrate.toml file in your repository root:
[codecrate]
# File patterns to include (default: ["**/*.py"])
include = ["**/*.py"]
# File patterns to exclude
exclude = ["**/test_*.py", "**/tests/**"]
# Deduplicate identical function bodies (default: false)
dedupe = true
# Keep docstrings in stubbed file view (default: true)
keep_docstrings = true
# Respect .gitignore when scanning (default: true)
respect_gitignore = true
# Always respected when present (separate file, gitignore syntax):
# .codecrateignore
# Output layout: "auto", "stubs", or "full" (default: "auto")
# - auto: use stubs only if dedupe collapses something
# - stubs: always use stubs + Function Library
# - full: emit complete file contents
layout = "auto"
# Navigation density: "auto", "compact", or "full"
# - auto: compact for unsplit packs, full when split outputs are requested
nav_mode = "auto"
# Optional non-Python symbol extraction backend: auto|python|tree-sitter|none
# (Python files always use built-in AST parsing)
symbol_backend = "auto"
# Sensitive file filtering
security_check = true
security_content_sniff = false
# Split output into multiple files if char count exceeds this (0 = no split)
split_max_chars = 0
# Token diagnostics (CLI stderr output only; not written into context.md)
token_count_encoding = "o200k_base"
token_count_tree = false
token_count_tree_threshold = 0
top_files_len = 5
file_summary = true
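The two "auto" settings in the config comments can be read as small decision rules. The sketch below is an illustration under stated assumptions: it assumes that layout "auto" falls back to "full" when dedupe collapses nothing, and that "split outputs are requested" means split_max_chars is greater than 0. The function names are hypothetical and not part of codecrate.

```python
LAYOUTS = {"auto", "stubs", "full"}
NAV_MODES = {"auto", "compact", "full"}


def resolve_layout(layout: str, dedupe_collapsed_any: bool) -> str:
    """Map the configured layout to the layout that would be emitted."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout: {layout!r}")
    if layout == "auto":
        # auto: use stubs only if dedupe collapses something
        # (assumption: otherwise the full layout is used).
        return "stubs" if dedupe_collapsed_any else "full"
    return layout


def resolve_nav_mode(nav_mode: str, split_max_chars: int) -> str:
    """Map the configured navigation density to the one that would be used."""
    if nav_mode not in NAV_MODES:
        raise ValueError(f"unknown nav_mode: {nav_mode!r}")
    if nav_mode == "auto":
        # auto: compact for unsplit packs, full when split outputs are requested
        # (assumption: a positive split_max_chars counts as a split request).
        return "full" if split_max_chars > 0 else "compact"
    return nav_mode


print(resolve_layout("auto", dedupe_collapsed_any=True))
print(resolve_nav_mode("auto", split_max_chars=0))
```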
Command Reference
pack - Pack Repository to Markdown
codecrate pack <root> [OPTIONS]
Options:
- -o, --output PATH: Output markdown path (default: context.md)
- --dedupe: Deduplicate identical function bodies
- --layout {auto,stubs,full}: Output layout mode
- --nav-mode {auto,compact,full}: Navigation density mode
- --symbol-backend {auto,python,tree-sitter,none}: Non-Python symbol backend
- --keep-docstrings / --no-keep-docstrings: Keep docstrings in stubs
- --respect-gitignore / --no-respect-gitignore: Respect .gitignore
- --security-check / --no-security-check: Scan for sensitive files (set --no-security-check to skip scanning for sensitive data like API keys and passwords)
- --security-content-sniff / --no-security-content-sniff: Optional content sniffing for key/token patterns
- --include GLOB: Include glob pattern (repeatable)
- --exclude GLOB: Exclude glob pattern (repeatable)
- --stdin: Read file paths from stdin (one per line)
- --split-max-chars N: Split output into .partN.md files
- --token-count-tree [threshold]: Show file tree with token counts; optional threshold shows only files with >=N tokens (for example, --token-count-tree 100)
- --top-files-len N: Show top N largest files by token count
- --token-count-encoding NAME: Tokenizer encoding for token counting
- --file-summary / --no-file-summary: Enable or disable pack summary output
When --stdin is used, only stdin-listed files are considered. Include globs are
not applied, but exclude patterns and ignore files still apply.
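One way to feed --stdin is to generate the file list with a short script. This is a sketch only: `paths_for_stdin` is a hypothetical helper, and emitting paths relative to the pack root is an assumption, since the text above only says "one per line".

```python
import tempfile
from pathlib import Path


def paths_for_stdin(root: Path, pattern: str = "*.py") -> str:
    """Return matching files under root as newline-separated relative paths."""
    files = sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob(pattern)
        if path.is_file()
    )
    return "\n".join(files) + ("\n" if files else "")


# Demo on a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "pkg").mkdir()
    (demo_root / "pkg" / "core.py").write_text("x = 1\n")
    (demo_root / "app.py").write_text("y = 2\n")
    (demo_root / "notes.txt").write_text("not Python\n")
    listing = paths_for_stdin(demo_root)

print(listing, end="")
```

A script that prints such a listing for the real repository could then be piped into the documented command, for example: python list_files.py | codecrate pack . --stdin (list_files.py is a placeholder name).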
By default, codecrate prints a compact pack summary (total files, total tokens,
total chars, output path). Disable it with --no-file-summary or
file_summary = false in config.
Code fences are automatically widened when file content contains backticks, so generated markdown remains parsable.
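The usual way to widen a fence is to make it one backtick longer than the longest backtick run in the content, with a minimum of three. The sketch below shows that general technique; it is not taken from codecrate's source, and `fence_for` and `fenced_block` are hypothetical names.

```python
import re


def fence_for(content: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run in content."""
    runs = re.findall(r"`+", content)
    longest = max((len(run) for run in runs), default=0)
    return "`" * max(minimum, longest + 1)


def fenced_block(content: str, lang: str = "") -> str:
    """Wrap content in a fence that the content itself cannot close."""
    fence = fence_for(content)
    body = content if content.endswith("\n") else content + "\n"
    return f"{fence}{lang}\n{body}{fence}\n"


TICKS = "`" * 3  # built programmatically so this example embeds cleanly
sample = "Usage:\n" + TICKS + "\nx = 1\n" + TICKS + "\n"
block = fenced_block(sample, "python")
print(block)
```

Because the sample already contains a three-backtick run, the wrapper uses four backticks, so a Markdown parser does not end the block early.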
unpack - Reconstruct Files from Markdown
codecrate unpack <markdown> -o <out-dir>
Options:
- -o, --out-dir PATH: Output directory for reconstructed files (required)
For combined packs (multiple # Repository: ... sections), files are unpacked to
<out-dir>/<repo-slug>/... per repository section.
patch - Generate Diff-Only Patch
codecrate patch <old_markdown> <root> [--repo <label-or-slug>] [OPTIONS]
Options:
- --repo <label-or-slug>: Required when <old_markdown> contains multiple # Repository: sections; selects which repository baseline to diff against
- -o, --output PATH: Output patch markdown (default: patch.md)
apply - Apply Patch to Repository
codecrate apply <patch_markdown> <root> [--repo <label-or-slug>]
When <patch_markdown> contains multiple # Repository: sections, --repo is
required to select one section.
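To check in advance whether --repo will be needed, a script can count the # Repository: headings in a Markdown file. The sketch below is an assumption about how such a check might look, not codecrate's own detection logic; `repository_labels` and `needs_repo_flag` are hypothetical helpers. It skips headings that appear inside fenced code blocks.

```python
import re

# A fence line: up to three spaces, then three or more backticks.
FENCE_RE = re.compile(r"^\s{0,3}(`{3,})(.*)$")
HEADING = "# Repository:"


def repository_labels(markdown: str) -> list:
    """List the labels of '# Repository: ...' headings outside code fences."""
    labels = []
    open_len = 0  # 0 means "not inside a fenced block"
    for line in markdown.splitlines():
        match = FENCE_RE.match(line)
        if match:
            ticks, rest = len(match.group(1)), match.group(2).strip()
            if open_len == 0:
                open_len = ticks  # opening fence
            elif ticks >= open_len and not rest:
                open_len = 0  # closing fence
            continue
        if open_len == 0 and line.startswith(HEADING):
            labels.append(line[len(HEADING):].strip())
    return labels


def needs_repo_flag(markdown: str) -> bool:
    """True when more than one repository section is present."""
    return len(repository_labels(markdown)) > 1


FENCE = "`" * 3
combined = "\n".join([
    "# Repository: backend",
    FENCE + "python",
    "# Repository: not-a-heading",
    FENCE,
    "# Repository: frontend",
])
print(repository_labels(combined), needs_repo_flag(combined))
```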
validate-pack - Validate Pack
codecrate validate-pack <markdown> [--root PATH]
Options:
- --root PATH: Optional repo root to compare reconstructed files against
For combined packs, validation runs per repository section and reports scope-aware errors/warnings. Cross-repo anchor collisions are reported as errors.
Layout Modes
Stubs Mode (Default for auto when dedupe is effective)
Creates compact file stubs with function bodies replaced by markers:
def f(x):
    ... # ↪ FUNC:0F203CE2

class C:
    def m(self):
        ... # ↪ FUNC:6F8ECF73
Function bodies are stored in a separate "Function Library" section:
## Function Library
### 0F203CE2 — `a.f` (a.py:L1–L2)
```python
def f(x):
    return x + 1
```
### 6F8ECF73 — `a.C.m` (a.py:L5–L6)
```python
def m(self):
    return 42
```
This is ideal for:
- LLMs with limited context windows
- Repositories with duplicate code (when using `--dedupe`)
- Code review and analysis workflows
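The stub and library views shown above can be approximated with Python's ast module. This is a rough sketch under several assumptions: the 8-character id is taken from a SHA256 of the function text (the real id scheme is not described here), docstrings are not preserved, and only top-level functions and methods of top-level classes are handled. `stub_source` is a hypothetical name.

```python
import ast
import hashlib
import textwrap

SAMPLE = '''\
def f(x):
    return x + 1

class C:
    def m(self):
        return 42
'''

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def stub_source(source: str):
    """Return (stubbed_source, library); library maps id -> function text."""
    original = source.splitlines()
    lines = list(original)
    funcs = []
    for node in ast.parse(source).body:
        if isinstance(node, FUNC_TYPES):
            funcs.append(node)
        elif isinstance(node, ast.ClassDef):
            funcs.extend(n for n in node.body if isinstance(n, FUNC_TYPES))
    library = {}
    # Splice bottom-up so earlier line numbers stay valid after each edit.
    for fn in sorted(funcs, key=lambda n: n.lineno, reverse=True):
        first = fn.body[0]
        if first.lineno == fn.lineno:
            continue  # one-line "def f(): ..." has no separate body lines
        text = textwrap.dedent("\n".join(original[fn.lineno - 1 : fn.end_lineno]))
        # Assumption: id = first 8 hex chars of SHA256, upper-cased.
        func_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:8].upper()
        library[func_id] = text
        marker = " " * first.col_offset + f"... # ↪ FUNC:{func_id}"
        lines[first.lineno - 1 : fn.end_lineno] = [marker]
    return "\n".join(lines) + "\n", library


stubbed, library = stub_source(SAMPLE)
print(stubbed)
for func_id, text in library.items():
    print(func_id, "->", text.splitlines()[0])
```

The stubbed output stays valid Python because each body is replaced by a single `...` expression, and the library keeps the dedented function text so the original can be restored from the marker id.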
Full Mode
Emits complete file contents without stubbing:
```python
def f(x):
    return x + 1

class C:
    def m(self):
        return 42
```
This is ideal for:
- Repositories without much duplicate code
- When you need complete context in one place
- When token limits are not a concern
Workflow Example
Initial Pack
# Create a baseline pack of your repository
codecrate pack . -o baseline.md
# Send baseline.md to an LLM for analysis
# LLM can navigate using the Symbol Index
# and read full code in the Files section
Iterate with LLM
# After the LLM suggests changes, generate a patch
codecrate patch baseline.md . -o iteration1.md
# Send iteration1.md to the LLM (much smaller than full pack)
# Apply the LLM's changes
codecrate apply iteration1.md .
# Create new baseline for next iteration
codecrate pack . -o baseline.md
Advanced Usage
Packing Multiple Projects
# Pack different directories separately
codecrate pack src/backend -o backend.md
codecrate pack src/frontend -o frontend.md
# Or pack with specific include patterns
codecrate pack . --include "**/*.py" --exclude "**/migrations/**"
Handling Large Contexts
# Split into multiple files to fit context windows
codecrate pack . --split-max-chars 50000
# This creates context.md, context.part1.md, context.part2.md, etc.
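The part naming in the comment above, plus one plausible way to stay under a character budget, can be sketched as follows. How codecrate actually chooses split boundaries, and what context.md itself contains after a split, is not described here; the greedy grouping and both function names are assumptions for illustration.

```python
from pathlib import Path


def part_paths(output: str, n_parts: int) -> list:
    """Names following the documented pattern: context.md, context.part1.md, ..."""
    out = Path(output)
    parts = [
        str(out.with_name(f"{out.stem}.part{i}{out.suffix}"))
        for i in range(1, n_parts + 1)
    ]
    return [str(out)] + parts


def group_sections(sections, max_chars: int) -> list:
    """Greedily group sections so each group stays within max_chars.

    A single section longer than max_chars still gets a group of its own.
    """
    if max_chars <= 0:
        return [list(sections)]  # 0 = no split
    groups, current, size = [], [], 0
    for section in sections:
        if current and size + len(section) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(section)
        size += len(section)
    if current:
        groups.append(current)
    return groups


sections = ["a" * 30, "b" * 30, "c" * 50]
groups = group_sections(sections, max_chars=60)
print([sum(len(s) for s in group) for group in groups])
print(part_paths("context.md", len(groups)))
```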
Deduplication
# Enable deduplication to save tokens on duplicate code
codecrate pack . --dedupe
# Deduplication is most effective when you have:
# - Copy-pasted functions
# - Boilerplate code
# - Similar utility functions across modules
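To estimate whether --dedupe is likely to help, duplicate functions can be found by hashing their source text. This sketch assumes "identical" means byte-identical function text after removing common indentation; codecrate's actual comparison rule is not stated above, and `find_duplicates` is a hypothetical helper.

```python
import ast
import hashlib
import textwrap
from collections import defaultdict

SAMPLE = '''\
class A:
    def area(self):
        return self.w * self.h

class B:
    def area(self):
        return self.w * self.h

def area(self):
    return self.w * self.h

def perimeter(self):
    return 2 * (self.w + self.h)
'''


def find_duplicates(source: str) -> list:
    """Return groups of (name, line) for functions with identical text."""
    lines = source.splitlines()
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Dedent so a method and a top-level function can compare equal.
            text = textwrap.dedent("\n".join(lines[node.lineno - 1 : node.end_lineno]))
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            groups[digest].append((node.name, node.lineno))
    return [
        sorted(members, key=lambda member: member[1])
        for members in groups.values()
        if len(members) > 1
    ]


duplicates = find_duplicates(SAMPLE)
print(duplicates)
```

In the sample, the three `area` definitions collapse into one group while `perimeter` stays unique, which is the kind of repetition the comments above describe.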
How It Works
- Discovery: Scans files according to include/exclude patterns
- Parsing: Extracts symbol information (functions, classes) using Python's AST
- Packing: Creates a structured manifest and canonical function definitions
- Rendering: Generates Markdown with directory tree, symbol index, and file contents
- Validation: Ensures round-trip consistency with SHA256 checksums
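The parsing step can be illustrated with the standard ast module, which records start and end lines for each definition. This is a minimal sketch of symbol extraction, not codecrate's parser; `symbol_index` is a hypothetical name, and definitions nested inside if or try blocks are not visited.

```python
import ast

SAMPLE = '''\
def f(x):
    return x + 1

class C:
    def m(self):
        return 42
'''

DEF_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def symbol_index(source: str, module: str) -> list:
    """Return (qualified_name, kind, start_line, end_line) per definition."""
    entries = []

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, DEF_TYPES):
                name = f"{prefix}.{child.name}"
                kind = "class" if isinstance(child, ast.ClassDef) else "function"
                entries.append((name, kind, child.lineno, child.end_lineno))
                visit(child, name)  # descend to pick up methods

    visit(ast.parse(source), module)
    return entries


index = symbol_index(SAMPLE, "a")
for name, kind, start, end in index:
    print(f"{name} ({kind}) a.py:L{start}-L{end}")
```

The line ranges for `a.f` and `a.C.m` match the ones shown in the Function Library example (L1 to L2 and L5 to L6).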
The Markdown format is designed to be:
- Self-contained: All necessary information in one file
- Navigable: Symbol index with jump links
- Reversible: Can reconstruct original files exactly
- Diff-friendly: Easy to generate minimal patches
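The "reversible" property can be checked by comparing SHA256 digests of the original and reconstructed files. The sketch below shows the idea with in-memory bytes; the manifest shape and the names `build_manifest` and `verify` are assumptions, not codecrate's actual format or API.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA256 digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """Map each path to the SHA256 of its content."""
    return {path: sha256_hex(content) for path, content in files.items()}


def verify(manifest: dict, reconstructed: dict) -> list:
    """Return sorted paths that are missing or whose digest differs."""
    return sorted(
        path
        for path, digest in manifest.items()
        if path not in reconstructed or sha256_hex(reconstructed[path]) != digest
    )


original = {"a.py": b"def f(x):\n    return x + 1\n", "b.py": b"y = 2\n"}
manifest = build_manifest(original)

exact = dict(original)
drifted = {"a.py": b"def f(x):\n    return x + 1 \n"}  # trailing space, b.py missing

print(verify(manifest, exact))
print(verify(manifest, drifted))
```

Even a one-character whitespace change alters the digest, so a clean result means the reconstructed bytes match exactly.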
License
MIT