Pack Python codebases into Markdown optimized for LLM context delivery (pack/unpack/patch/apply)

codecrate

codecrate turns a Python repository into a Markdown "context pack" optimized for LLM consumption, with full round-trip support:

  • pack: repo → context.md
  • unpack: context.md → reconstructed files
  • patch: old context.md + current repo → diff-only patch.md
  • apply: patch.md → apply changes to repo

Features

  • Markdown-native output: Generates self-contained Markdown files with syntax highlighting
  • Symbol index: Quick navigation to functions and classes
  • Deduplication: Optionally deduplicate identical function bodies to save tokens
  • Two layout modes:
    • stubs: Compact file stubs with function bodies in a separate "Function Library"
    • full: Complete file contents (no stubbing)
  • Round-trip support: Reconstruct original files exactly from Markdown packs
  • Diff generation: Create minimal patch Markdown files showing only changed code
  • Gitignore support: Respect .gitignore when scanning files

Installation

Install from a local checkout of the repository:

pip install -e .

Or for development:

pip install -e ".[dev]"

Quick Start

Pack a Repository

Pack your current directory into context.md:

codecrate pack .

Pack with a specific output file:

codecrate pack . -o my_project.md

Unpack to Reconstruct Files

Reconstruct files from a packed Markdown:

codecrate unpack context.md -o reconstructed/

Generate and Apply Patches

  1. Pack your repository as a baseline:

codecrate pack . -o baseline.md

  2. Make changes to your code.

  3. Generate a patch:

codecrate patch baseline.md . -o changes.md

  4. Apply the patch:

codecrate apply changes.md .
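
Conceptually, the patch step compares each file stored in the baseline pack with its current version and keeps only the differences. The sketch below illustrates that idea with Python's standard difflib. It is not codecrate's actual implementation or patch format; the function name `file_diff` and the sample sources are hypothetical.

```python
import difflib


def file_diff(old: str, new: str, path: str) -> str:
    """Return a unified diff between two versions of one file (illustrative only)."""
    diff_lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)


# Hypothetical baseline and current contents of a.py
baseline = "def f(x):\n    return x + 1\n"
current = "def f(x):\n    return x + 2\n"

patch = file_diff(baseline, current, "a.py")
print(patch)
```

An unchanged file yields an empty diff, which is why a patch of this kind can be much smaller than a full pack.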

Configuration

Create a codecrate.toml file in your repository root:

[codecrate]
# File patterns to include (default: ["**/*.py"])
include = ["**/*.py"]

# File patterns to exclude
exclude = ["**/test_*.py", "**/tests/**"]

# Deduplicate identical function bodies (default: false)
dedupe = true

# Keep docstrings in stubbed file view (default: true)
keep_docstrings = true

# Respect .gitignore when scanning (default: true)
respect_gitignore = true

# Output layout: "auto", "stubs", or "full" (default: "auto")
# - auto: use stubs only if dedupe collapses something
# - stubs: always use stubs + Function Library
# - full: emit complete file contents
layout = "auto"

# Split output into multiple files if char count exceeds this (0 = no split)
split_max_chars = 0
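
The way the "auto" layout is resolved can be sketched in a few lines of Python. This is a minimal illustration of the rule in the comments above, not codecrate's own code: the `PackConfig` class and `resolve_layout` function are hypothetical names, and the empty default for `exclude` is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PackConfig:
    """Hypothetical mirror of the [codecrate] table, using the documented defaults."""
    include: List[str] = field(default_factory=lambda: ["**/*.py"])
    exclude: List[str] = field(default_factory=list)  # assumed default
    dedupe: bool = False
    keep_docstrings: bool = True
    respect_gitignore: bool = True
    layout: str = "auto"
    split_max_chars: int = 0  # 0 = no split


def resolve_layout(cfg: PackConfig, dedupe_collapsed: bool) -> str:
    """Pick the effective layout; "auto" uses stubs only if dedupe collapsed something."""
    if cfg.layout not in {"auto", "stubs", "full"}:
        raise ValueError(f"unknown layout: {cfg.layout!r}")
    if cfg.layout != "auto":
        return cfg.layout
    return "stubs" if dedupe_collapsed else "full"


print(resolve_layout(PackConfig(), dedupe_collapsed=False))
print(resolve_layout(PackConfig(dedupe=True), dedupe_collapsed=True))
```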

Command Reference

pack - Pack Repository to Markdown

codecrate pack <root> [OPTIONS]

Options:

  • -o, --output PATH: Output Markdown path (default: context.md)
  • --dedupe: Deduplicate identical function bodies
  • --layout {auto,stubs,full}: Output layout mode
  • --keep-docstrings / --no-keep-docstrings: Keep docstrings in stubs
  • --respect-gitignore / --no-respect-gitignore: Respect .gitignore
  • --include GLOB: Include glob pattern (repeatable)
  • --exclude GLOB: Exclude glob pattern (repeatable)
  • --split-max-chars N: Split output into .partN.md files
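
To show how include and exclude globs might interact, here is a small sketch built on the standard fnmatch module. It only approximates the matching: codecrate's real glob semantics are not documented here, and in fnmatch a `*` also matches `/`. The helper names `_glob_match` and `select_files` are hypothetical.

```python
from fnmatch import fnmatchcase


def _glob_match(path: str, pattern: str) -> bool:
    """Match a relative POSIX path against a glob (approximation, see caveats above)."""
    if fnmatchcase(path, pattern):
        return True
    # Assumption: a leading "**/" may also match zero directories.
    return pattern.startswith("**/") and fnmatchcase(path, pattern[3:])


def select_files(paths, include, exclude):
    """Keep paths that match any include glob and no exclude glob."""
    selected = []
    for path in paths:
        included = any(_glob_match(path, p) for p in include)
        excluded = any(_glob_match(path, p) for p in exclude)
        if included and not excluded:
            selected.append(path)
    return selected


paths = ["a.py", "pkg/b.py", "pkg/test_b.py", "tests/test_a.py", "README.md"]
chosen = select_files(
    paths,
    include=["**/*.py"],
    exclude=["**/test_*.py", "**/tests/**"],
)
print(chosen)  # ['a.py', 'pkg/b.py']
```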

unpack - Reconstruct Files from Markdown

codecrate unpack <markdown> -o <out-dir>

Options:

  • -o, --out-dir PATH: Output directory for reconstructed files (required)

patch - Generate Diff-Only Patch

codecrate patch <old_markdown> <root> [OPTIONS]

Options:

  • -o, --output PATH: Output patch Markdown path (default: patch.md)

apply - Apply Patch to Repository

codecrate apply <patch_markdown> <root>

validate-pack - Validate Pack

codecrate validate-pack <markdown> [--root PATH]

Options:

  • --root PATH: Optional repo root to compare reconstructed files against

Layout Modes

Stubs Mode (Default for auto when dedupe is effective)

Creates compact file stubs with function bodies replaced by markers:

def f(x):
    ...  # ↪ FUNC:0F203CE2

class C:
    def m(self):
        ...  # ↪ FUNC:6F8ECF73
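
A stub view like the one above can be approximated with Python's ast module. The following is a rough sketch under stated assumptions, not codecrate's implementation: the 8-character ID is derived here from a truncated SHA-1 of the function text (the real ID scheme is not documented in this README), docstrings are not preserved, and `stub_functions` is a hypothetical name.

```python
import ast
import hashlib


def _outer_functions(node):
    """Yield function nodes that are not nested inside another function."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield child
        else:
            yield from _outer_functions(child)


def stub_functions(source: str):
    """Return (stubbed_source, library) where library maps ID -> original function text."""
    lines = source.splitlines()
    library = {}
    # Work bottom-up so that earlier line numbers remain valid while editing.
    funcs = sorted(_outer_functions(ast.parse(source)), key=lambda n: n.lineno, reverse=True)
    for func in funcs:
        first = func.body[0]
        # Include decorators of a nested def/class that opens the body.
        body_start = min([first.lineno] + [d.lineno for d in getattr(first, "decorator_list", [])])
        if body_start == func.lineno:
            continue  # one-line function such as "def f(): return 1"; left untouched
        text = "\n".join(lines[func.lineno - 1 : func.end_lineno])
        # Assumed ID scheme: first 8 hex digits of SHA-1, upper-cased.
        func_id = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8].upper()
        library[func_id] = text
        body_line = lines[body_start - 1]
        indent = body_line[: len(body_line) - len(body_line.lstrip())]
        lines[body_start - 1 : func.end_lineno] = [f"{indent}...  # ↪ FUNC:{func_id}"]
    return "\n".join(lines) + "\n", library


source = "def f(x):\n    return x + 1\n\nclass C:\n    def m(self):\n        return 42\n"
stubbed, library = stub_functions(source)
print(stubbed)
```

The returned `library` plays the role of the lookup table that lets each stub be expanded back into its original body.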

Function bodies are stored in a separate "Function Library" section:

## Function Library

### 0F203CE2 — `a.f` (a.py:L1–L2)

```python
def f(x):
    return x + 1
```

### 6F8ECF73 — `a.C.m` (a.py:L5–L6)

```python
    def m(self):
        return 42
```

This is ideal for:

  • LLMs with limited context windows
  • Repositories with duplicate code (when using --dedupe)
  • Code review and analysis workflows

Full Mode

Emits complete file contents without stubbing:

def f(x):
    return x + 1

class C:
    def m(self):
        return 42

This is ideal for:

  • Repositories without much duplicate code
  • When you need complete context in one place
  • When token limits are not a concern

Workflow Example

Initial Pack

# Create a baseline pack of your repository
codecrate pack . -o baseline.md

# Send baseline.md to an LLM for analysis
# LLM can navigate using the Symbol Index
# and read full code in the Files section

Iterate with LLM

# After the LLM suggests changes, generate a patch
codecrate patch baseline.md . -o iteration1.md

# Send iteration1.md to the LLM (much smaller than full pack)
# Apply the LLM's changes
codecrate apply iteration1.md .

# Create new baseline for next iteration
codecrate pack . -o baseline.md

Advanced Usage

Packing Multiple Projects

# Pack different directories separately
codecrate pack src/backend -o backend.md
codecrate pack src/frontend -o frontend.md

# Or pack with specific include patterns
codecrate pack . --include "**/*.py" --exclude "**/migrations/**"

Handling Large Contexts

# Split into multiple files to fit context windows
codecrate pack . --split-max-chars 50000

# This creates context.md, context.part1.md, context.part2.md, etc.
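
Splitting by character count can be pictured as follows. This is a simplified sketch that breaks only on line boundaries; where codecrate actually places the split points is an assumption not covered by this README, and `split_markdown` is a hypothetical name.

```python
def split_markdown(text: str, max_chars: int) -> list:
    """Split text into parts of at most max_chars characters, breaking between lines.

    A single line longer than max_chars is kept whole in its own part.
    """
    if max_chars <= 0 or len(text) <= max_chars:
        return [text]  # 0 means "no split"
    parts, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            parts.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        parts.append("".join(current))
    return parts


text = "".join(f"line {i}\n" for i in range(100))
parts = split_markdown(text, 200)
print(len(parts), max(len(p) for p in parts))
```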

Deduplication

# Enable deduplication to save tokens on duplicate code
codecrate pack . --dedupe

# Deduplication is most effective when you have:
# - Copy-pasted functions
# - Boilerplate code
# - Similar utility functions across modules
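
To see which functions would be candidates for deduplication, one option is to group them by a hash of their body. The sketch below compares bodies structurally via `ast.dump`, which ignores comments and formatting. That is an assumption: codecrate may compare bodies differently. The name `group_identical_bodies` and the sample modules are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


def group_identical_bodies(sources: dict) -> dict:
    """Map body hash -> names of functions sharing that body (groups of 2+ only)."""
    groups = defaultdict(list)
    for module, source in sources.items():
        for node in ast.walk(ast.parse(source)):
            if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            # Structural fingerprint of the body; the signature is not included.
            fingerprint = "".join(ast.dump(stmt) for stmt in node.body)
            digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()
            groups[digest].append(f"{module}.{node.name}")
    return {h: names for h, names in groups.items() if len(names) > 1}


sources = {
    "a": "def clamp(v):\n    return max(0, min(v, 1))\n",
    "b": "def limit(v):\n    return max(0, min(v, 1))\n\ndef other(v):\n    return v\n",
}
duplicates = group_identical_bodies(sources)
print(list(duplicates.values()))  # [['a.clamp', 'b.limit']]
```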

How It Works

  1. Discovery: Scans files according to include/exclude patterns
  2. Parsing: Extracts symbol information (functions, classes) using Python's AST
  3. Packing: Creates a structured manifest and canonical function definitions
  4. Rendering: Generates Markdown with directory tree, symbol index, and file contents
  5. Validation: Ensures round-trip consistency with SHA256 checksums
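
The parsing step can be illustrated with the standard ast module. This is a minimal sketch of collecting functions and classes with their line ranges, assuming a simple tuple format; the function name `symbol_index` and the module name "a" are hypothetical, and codecrate's own manifest structure is not shown here.

```python
import ast


def symbol_index(source: str, module: str) -> list:
    """Return (qualified_name, kind, start_line, end_line) for classes and functions."""
    symbols = []

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                kind = "class"
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "function"
            else:
                continue
            name = f"{prefix}.{child.name}"
            symbols.append((name, kind, child.lineno, child.end_lineno))
            visit(child, name)  # pick up methods and nested definitions

    visit(ast.parse(source), module)
    return symbols


source = "def f(x):\n    return x + 1\n\nclass C:\n    def m(self):\n        return 42\n"
index = symbol_index(source, "a")
for entry in index:
    print(entry)
# ('a.f', 'function', 1, 2)
# ('a.C', 'class', 4, 6)
# ('a.C.m', 'function', 5, 6)
```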

The Markdown format is designed to be:

  • Self-contained: All necessary information in one file
  • Navigable: Symbol index with jump links
  • Reversible: Can reconstruct original files exactly
  • Diff-friendly: Easy to generate minimal patches
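
Exact reversibility can be checked by comparing SHA256 digests of the original and reconstructed file contents. The sketch below shows the idea on in-memory strings; how codecrate stores and compares its checksums is an assumption, and `sha256_text` and `verify_roundtrip` are hypothetical names.

```python
import hashlib


def sha256_text(text: str) -> str:
    """Hex SHA256 digest of a file's text content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_roundtrip(original: dict, reconstructed: dict) -> list:
    """Return the sorted paths whose reconstructed content is missing or differs."""
    mismatches = []
    for path, text in original.items():
        expected = sha256_text(text)
        actual = sha256_text(reconstructed[path]) if path in reconstructed else None
        if actual != expected:
            mismatches.append(path)
    return sorted(mismatches)


original = {"a.py": "def f(x):\n    return x + 1\n", "b.py": "X = 1\n"}
reconstructed = {"a.py": "def f(x):\n    return x + 1\n", "b.py": "X = 2\n"}
print(verify_roundtrip(original, reconstructed))  # ['b.py']
```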

License

MIT

