Pack Python codebases into Markdown optimized for LLM context delivery (pack/unpack/patch/apply)


codecrate

codecrate turns a Python repository into a Markdown "context pack" optimized for LLM consumption, with full round-trip support:

pack: repo → context.md
unpack: context.md → reconstructed files
patch: old context.md + current repo → diff-only patch.md
apply: patch.md → apply changes to repo

Features

Markdown-native output: Generates self-contained Markdown files with syntax highlighting
Symbol index: Quick navigation to functions and classes
Deduplication: Optionally deduplicate identical function bodies to save tokens
Two layout modes:
- stubs: Compact file stubs with function bodies in a separate "Function Library"
- full: Complete file contents (no stubbing)
Round-trip support: Reconstruct original files exactly from Markdown packs
Diff generation: Create minimal patch Markdown files showing only changed code
Gitignore support: Respect .gitignore when scanning files
Tool ignore support: Respect .codecrateignore (always)
Targeted packing: Optional --stdin / --stdin0 mode to pack an explicit file list
Debug visibility: Optional --print-files and --print-skipped diagnostics
Token diagnostics: Optional CLI token reports (encoding, tree, top files)
Scale controls: Per-file skip budgets and hard total budgets (bytes/tokens)
Machine header: Compact checksum block for fast manifest validation
Tooling manifests: Optional JSON manifest sidecar output (--manifest-json)
Safety controls: Configurable path/content scanning rules, optional redaction, optional safety report
Environment diagnostics: codecrate doctor reports config precedence, ignore files, and backend availability

Installation

pip install codecrate

Or install from a source checkout in editable mode:

pip install -e .

Or with the development extras:

pip install -e ".[dev]"

Quick Start

Pack a Repository

Pack your current directory into context.md:

codecrate pack .

Pack with specific output file:

codecrate pack . -o my_project.md

Unpack to Reconstruct Files

Reconstruct files from a packed Markdown:

codecrate unpack context.md -o reconstructed/

Generate and Apply Patches

Pack your repository as a baseline:

codecrate pack . -o baseline.md

Make changes to your code
Generate a patch:

codecrate patch baseline.md . -o changes.md

Apply the patch:

codecrate apply changes.md .
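To make the patch step concrete, the sketch below builds a line-based diff between a baseline version of a file and its current version using Python's standard `difflib`. The source does not state which diff format codecrate emits, so the unified-diff format and the helper name `diff_file` are assumptions for illustration only.

```python
import difflib


def diff_file(path: str, old_text: str, new_text: str) -> str:
    """Return a unified diff for one file, or "" if nothing changed."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    hunks = difflib.unified_diff(
        old_lines, new_lines, fromfile=f"a/{path}", tofile=f"b/{path}"
    )
    return "".join(hunks)


# Hypothetical baseline and current contents of a single file
baseline = "def f(x):\n    return x + 1\n"
current = "def f(x):\n    return x + 2\n"
patch_text = diff_file("a.py", baseline, current)
print(patch_text)
```

Only the changed lines and a little surrounding context appear in the result, which is why a patch is usually much smaller than a full pack.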

Configuration

codecrate reads configuration from the repository root. Sources are listed from highest to lowest precedence:

CLI flags
.codecrate.toml, then codecrate.toml
pyproject.toml under [tool.codecrate]
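One way to picture this precedence is a chain of lookups where the first source that defines a key wins. The sketch below uses `collections.ChainMap` with made-up values. Whether codecrate merges settings key by key or simply picks one file is not stated in this README, so per-key fallback is an assumption here.

```python
from collections import ChainMap

# Hypothetical values; only the key names come from the configuration reference.
defaults = {"dedupe": False, "keep_docstrings": True, "layout": "auto"}
pyproject_cfg = {"dedupe": False, "keep_docstrings": False}  # [tool.codecrate]
dotfile_cfg = {"dedupe": True, "layout": "stubs"}            # .codecrate.toml
cli_flags = {"layout": "full"}                               # e.g. --layout full

# ChainMap searches its maps left to right, so list the highest precedence first.
effective = ChainMap(cli_flags, dotfile_cfg, pyproject_cfg, defaults)

print(dict(effective))
```

In this sketch `layout` comes from the CLI, `dedupe` from `.codecrate.toml`, and `keep_docstrings` from `pyproject.toml`.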

Create a .codecrate.toml or codecrate.toml file in your repository root:

[codecrate]
# File patterns to include (default: ["**/*.py"])
include = ["**/*.py"]

# File patterns to exclude
exclude = ["**/test_*.py", "**/tests/**"]

# Deduplicate identical function bodies (default: false)
dedupe = true

# Keep docstrings in stubbed file view (default: true)
keep_docstrings = true

# Respect .gitignore when scanning (default: true)
respect_gitignore = true

# Always respected when present (separate file, gitignore syntax):
# .codecrateignore

# Output layout: "auto", "stubs", or "full" (default: "auto")
# - auto: use stubs only if dedupe collapses something
# - stubs: always use stubs + Function Library
# - full: emit complete file contents
layout = "auto"

# Navigation density: "auto", "compact", or "full"
# - auto: compact for unsplit packs, full when split outputs are requested
nav_mode = "auto"

# Optional non-Python symbol extraction backend: auto|python|tree-sitter|none
# (Python files always use built-in AST parsing)
symbol_backend = "auto"

# Sensitive file filtering
security_check = true
security_content_sniff = false
security_redaction = false
safety_report = false
security_path_patterns = [".env", "*.pem", "*secrets*"]
security_content_patterns = [
  "private-key=(?i)-----BEGIN\\s+[A-Z ]*PRIVATE KEY-----",
  "aws-access-key-id=\\b(?:AKIA|ASIA)[0-9A-Z]{16}\\b",
]

# Split output into multiple files if char count exceeds this (0 = no split)
split_max_chars = 0

# Token diagnostics (CLI stderr output only; not written into context.md)
token_count_encoding = "o200k_base"
token_count_tree = false
token_count_tree_threshold = 0
top_files_len = 5

# Scale / performance controls
# - per-file limits skip files with a warning
# - total limits fail the run when exceeded
max_file_bytes = 0
max_total_bytes = 0
max_file_tokens = 0
max_total_tokens = 0

# Worker threads for IO/parsing/token counting (0 = auto)
max_workers = 0
file_summary = true
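The `security_content_patterns` entries above use a `name=regex` form. The sketch below shows one way such rules could be split and applied to file content. The splitting heuristic (a leading identifier followed by `=`) and the helper names `parse_rule` and `scan` are assumptions, not codecrate's actual implementation.

```python
import re

# Assumed rule syntax: a leading identifier followed by "=" names the rule;
# anything else is treated as a bare regex.
_RULE_NAME = re.compile(r"^([A-Za-z0-9_-]+)=(.+)$", re.DOTALL)


def parse_rule(rule: str) -> tuple[str, re.Pattern]:
    """Split a content rule into (name, compiled regex)."""
    match = _RULE_NAME.match(rule)
    if match:
        return match.group(1), re.compile(match.group(2))
    return "unnamed", re.compile(rule)


def scan(text: str, rules: list[str]) -> list[str]:
    """Return the names of all rules whose regex matches the text."""
    hits = []
    for rule in rules:
        name, pattern = parse_rule(rule)
        if pattern.search(text):
            hits.append(name)
    return hits


# The two example rules from the configuration, written as Python raw strings
rules = [
    r"private-key=(?i)-----BEGIN\s+[A-Z ]*PRIVATE KEY-----",
    r"aws-access-key-id=\b(?:AKIA|ASIA)[0-9A-Z]{16}\b",
]
sample = "aws_key = 'AKIAIOSFODNN7EXAMPLE'"
print(scan(sample, rules))
```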

Command Reference

`pack` - Pack Repository to Markdown

codecrate pack <root> [OPTIONS]

Options:

-o, --output PATH: Output markdown path (default: context.md)
--dedupe: Deduplicate identical function bodies
--layout {auto,stubs,full}: Output layout mode
--nav-mode {auto,compact,full}: Navigation density mode
--symbol-backend {auto,python,tree-sitter,none}: Non-Python symbol backend
--keep-docstrings / --no-keep-docstrings: Keep docstrings in stubs
--respect-gitignore / --no-respect-gitignore: Respect .gitignore
--security-check / --no-security-check: Scan for sensitive files and data such as API keys and passwords (pass --no-security-check to skip the scan)
--security-content-sniff / --no-security-content-sniff: Optional content sniffing for key/token patterns
--security-redaction / --no-security-redaction: Redact flagged files instead of skipping them
--safety-report / --no-safety-report: Include Safety Report section in output
--security-path-pattern PATTERN: Override path rule set (repeatable)
--security-content-pattern RULE: Override content rule set (repeatable; name=regex or regex)
--include GLOB: Include glob pattern (repeatable)
--exclude GLOB: Exclude glob pattern (repeatable)
--stdin: Read file paths from stdin (one per line)
--stdin0: Read file paths from stdin as NUL-separated entries
--print-files: Debug-print selected files after filtering
--print-skipped: Debug-print skipped files and reasons
--split-max-chars N: Split output into .partN.md files
--token-count-tree [threshold]: Show file tree with token counts; optional threshold shows only files with >=N tokens (for example, --token-count-tree 100)
--top-files-len N: Show top N largest files by token count
--token-count-encoding NAME: Tokenizer encoding for token counting
--file-summary / --no-file-summary: Enable or disable pack summary output
--max-file-bytes N: Skip files above this byte limit
--max-total-bytes N: Fail if included files exceed this byte limit
--max-file-tokens N: Skip files above this token limit
--max-total-tokens N: Fail if included files exceed this token limit
--max-workers N: Max worker threads for IO/parsing/token counting
--manifest-json [PATH]: Write manifest JSON for tooling

When --stdin/--stdin0 is used, only explicitly listed files are considered. Include globs are not applied, but exclude patterns and ignore files still apply. With --print-skipped, explicit-file filtering also reports reasons such as not-a-file, outside-root, duplicate, ignored, and excluded.
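NUL-separated input is useful when file names may contain spaces or newlines. The sketch below shows how a caller might frame an explicit file list for `--stdin0`. The helper names are hypothetical, and terminating every entry with a NUL byte is an assumption modeled on common tools that emit NUL-delimited output.

```python
import os


def encode_nul_list(paths: list[str]) -> bytes:
    """Encode paths as NUL-terminated entries."""
    return b"".join(os.fsencode(p) + b"\0" for p in paths)


def decode_nul_list(data: bytes) -> list[str]:
    """Split NUL-separated bytes back into paths, ignoring empty entries."""
    return [os.fsdecode(chunk) for chunk in data.split(b"\0") if chunk]


# The second (hypothetical) name contains a newline, which would break
# one-path-per-line input but survives NUL framing.
paths = ["src/app.py", "src/name with\nnewline.py"]
payload = encode_nul_list(paths)
print(decode_nul_list(payload) == paths)
```

A payload like this could then be written to the standard input of `codecrate pack . --stdin0`, for example through the `input=` argument of `subprocess.run`.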

By default, codecrate prints a compact pack summary (total files, total tokens, total chars, output path). Disable it with --no-file-summary or file_summary = false in config.

If tokenization backend initialization fails, codecrate falls back to heuristic token counting and still prints top-N largest file summaries.

Code fences are automatically widened when file content contains backticks, so generated markdown remains parsable.
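Fence widening can be sketched as follows: find the longest run of backticks in the content and use a fence that is at least one backtick longer. This is a minimal illustration of the general technique with hypothetical helper names, not codecrate's own code.

```python
import re


def fence_for(content: str, minimum: int = 3) -> str:
    """Return a backtick fence longer than any backtick run in content."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    return "`" * max(minimum, longest + 1)


def fenced(content: str, lang: str = "python") -> str:
    """Wrap content in a fence that its own text cannot close early."""
    fence = fence_for(content)
    return f"{fence}{lang}\n{content}\n{fence}"


print(fenced("x = 1"))
```

Content without backticks gets the usual three-backtick fence; content that itself contains a three-backtick run gets a four-backtick fence.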

When redaction is enabled, flagged files are kept in the pack with masked content. Use --safety-report to include file-level actions/reasons (skipped/redacted).

`unpack` - Reconstruct Files from Markdown

codecrate unpack <markdown> -o <out-dir>

Options:

-o, --out-dir PATH: Output directory for reconstructed files (required)
--strict: Fail when marker-based reconstruction cannot be fully resolved

For combined packs (multiple # Repository: ... sections), files are unpacked to <out-dir>/<repo-slug>/... per repository section.

`patch` - Generate Diff-Only Patch

codecrate patch <old_markdown> <root> [--repo <label-or-slug>] [OPTIONS]

Options:

--repo <label-or-slug>: Required when <old_markdown> contains multiple # Repository: sections; selects which repository baseline to diff against
-o, --output PATH: Output patch markdown (default: patch.md)

`apply` - Apply Patch to Repository

codecrate apply <patch_markdown> <root> [--repo <label-or-slug>] [--dry-run]

When <patch_markdown> contains multiple # Repository: sections, --repo is required to select one section.

Use --dry-run to parse and validate hunks without writing files.

`validate-pack` - Validate Pack

codecrate validate-pack <markdown> [--root PATH] [--strict]

Options:

--root PATH: Optional repo root to compare reconstructed files against
--strict: Treat unresolved marker mapping as validation errors

For combined packs, validation runs per repository section and reports scope-aware errors/warnings grouped by section, with short reproduction hints. Cross-repo anchor collisions are reported as errors.

If a pack was created with --no-manifest, machine operations (unpack, patch, validate-pack) fail with a consistent hint to re-pack with manifest enabled.

`doctor` - Environment Diagnostics

codecrate doctor [root]

Reports:

config discovery and precedence (.codecrate.toml > codecrate.toml > pyproject.toml)
detected ignore files (.gitignore, .codecrateignore)
token backend availability and encoding probe
optional parsing backend availability (tree-sitter)

Layout Modes

Stubs Mode (Default for `auto` when dedupe is effective)

Creates compact file stubs with function bodies replaced by markers:

def f(x):
    ...  # ↪ FUNC:v1:0F203CE2

class C:
    def m(self):
        ...  # ↪ FUNC:v1:6F8ECF73

Function bodies are stored in a separate "Function Library" section:

## Function Library

### 0F203CE2 — `a.f` (a.py:L1–L2)

```python
def f(x):
    return x + 1
```

### 6F8ECF73 — `a.C.m` (a.py:L5–L6)

```python
    def m(self):
        return 42
```

This is ideal for:
- LLMs with limited context windows
- Repositories with duplicate code (when using `--dedupe`)
- Code review and analysis workflows
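The link between a stub and the Function Library is the ID in each marker. The sketch below extracts those IDs from a stubbed file and checks that each one has a library entry. The marker shape is taken from the example above; treating IDs as exactly 8 uppercase hex digits, and the `library` dictionary itself, are assumptions for illustration.

```python
import re

# Marker shape taken from the stub example: "FUNC:v1:" plus 8 hex digits.
MARKER = re.compile(r"FUNC:v1:([0-9A-F]{8})")


def marker_ids(stub_source: str) -> list[str]:
    """Return the function IDs referenced by a stubbed file, in order."""
    return MARKER.findall(stub_source)


stub = (
    "def f(x):\n"
    "    ...  # ↪ FUNC:v1:0F203CE2\n"
    "\n"
    "class C:\n"
    "    def m(self):\n"
    "        ...  # ↪ FUNC:v1:6F8ECF73\n"
)
# Hypothetical in-memory form of the Function Library section
library = {
    "0F203CE2": "def f(x):\n    return x + 1\n",
    "6F8ECF73": "    def m(self):\n        return 42\n",
}
missing = [fid for fid in marker_ids(stub) if fid not in library]
print(marker_ids(stub), missing)
```

An unresolved ID in `missing` corresponds to the situation that `--strict` is described as turning into a failure.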

Full Mode

Emits complete file contents without stubbing:

```python
def f(x):
    return x + 1

class C:
    def m(self):
        return 42
```

This is ideal for:

Repositories without much duplicate code
When you need complete context in one place
When token limits are not a concern

Workflow Example

Initial Pack

# Create a baseline pack of your repository
codecrate pack . -o baseline.md

# Send baseline.md to an LLM for analysis
# LLM can navigate using the Symbol Index
# and read full code in the Files section

Iterate with LLM

# After the LLM suggests changes, generate a patch
codecrate patch baseline.md . -o iteration1.md

# Send iteration1.md to the LLM (much smaller than full pack)
# Apply the LLM's changes
codecrate apply iteration1.md .

# Create new baseline for next iteration
codecrate pack . -o baseline.md

Advanced Usage

Packing Multiple Projects

# Pack different directories separately
codecrate pack src/backend -o backend.md
codecrate pack src/frontend -o frontend.md

# Or pack with specific include patterns
codecrate pack . --include "**/*.py" --exclude "**/migrations/**"

Handling Large Contexts

# Split into multiple files to fit context windows
codecrate pack . --split-max-chars 50000

# This creates context.md, context.part1.md, context.part2.md, etc.

# Skip single huge files, but fail if remaining total is still too large
codecrate pack . --max-file-bytes 200000 --max-total-bytes 4000000

# Same idea for token budgets
codecrate pack . --max-file-tokens 5000 --max-total-tokens 120000
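The two kinds of limits behave differently: per-file limits drop individual files, while total limits abort the run. The sketch below models that behavior on a dictionary of file sizes. The function name is hypothetical, and treating a limit of 0 as "no limit" is an assumption based on the configuration defaults.

```python
def apply_budgets(sizes: dict[str, int], max_file: int = 0, max_total: int = 0):
    """Skip oversized files, then fail if the remainder exceeds the total budget.

    A limit of 0 is treated as "no limit" (an assumption, see the lead-in).
    """
    kept, skipped = {}, []
    for path, size in sizes.items():
        if max_file and size > max_file:
            skipped.append(path)  # per-file limit: skip this file only
        else:
            kept[path] = size
    total = sum(kept.values())
    if max_total and total > max_total:
        # total limit: the whole run fails
        raise ValueError(f"total {total} exceeds budget {max_total}")
    return kept, skipped


# Hypothetical file sizes in bytes
sizes = {"a.py": 100, "big.py": 500_000, "b.py": 300}
kept, skipped = apply_budgets(sizes, max_file=200_000, max_total=4_000_000)
print(kept, skipped)
```

Note that the total is computed after oversized files are skipped, matching the comment "fail if remaining total is still too large".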

Deduplication

# Enable deduplication to save tokens on duplicate code
codecrate pack . --dedupe

# Deduplication is most effective when you have:
# - Copy-pasted functions
# - Boilerplate code
# - Similar utility functions across modules
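To estimate how much deduplication might help before packing, identical function bodies can be detected by hashing each body. The sketch below does this with the standard `ast` and `hashlib` modules. It is an illustration under assumptions: the source does not describe how codecrate compares bodies or derives its IDs, and the 8-character uppercase digest here only mimics the look of the IDs in the examples.

```python
import ast
import hashlib
from collections import defaultdict


def duplicate_functions(source: str) -> dict[str, list[str]]:
    """Group function names by a hash of their body; keep groups of 2+."""
    tree = ast.parse(source)
    groups = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Dump only the body so the function's own name does not affect the hash
            body = "\n".join(ast.dump(stmt) for stmt in node.body)
            digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:8].upper()
            groups[digest].append(node.name)
    return {h: names for h, names in groups.items() if len(names) > 1}


source = (
    "def inc(x):\n    return x + 1\n\n"
    "def bump(x):\n    return x + 1\n\n"
    "def double(x):\n    return x * 2\n"
)
dupes = duplicate_functions(source)
print(dupes)
```

Because the comparison works on the parsed body, differences in whitespace or comments do not prevent a match in this sketch, while any difference in the statements themselves does.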

How It Works

Discovery: Scans files according to include/exclude patterns
Parsing: Extracts symbol information (functions, classes) using Python's AST
Packing: Creates a structured manifest and canonical function definitions
Rendering: Generates Markdown with directory tree, symbol index, and file contents
Validation: Ensures round-trip consistency with SHA256 checksums
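The validation step can be pictured as comparing a recorded SHA256 digest per file with the digest of the reconstructed content. The sketch below is a minimal version of that check. The manifest shape (a mapping from path to hex digest) and the function names are assumptions, not codecrate's actual manifest format.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def changed_files(manifest: dict[str, str], files: dict[str, bytes]) -> list[str]:
    """Return paths that are missing or whose content no longer matches."""
    return sorted(
        path
        for path, expected in manifest.items()
        if path not in files or sha256_hex(files[path]) != expected
    )


# Hypothetical original content and the manifest recorded at pack time
original = {"a.py": b"def f(x):\n    return x + 1\n"}
manifest = {path: sha256_hex(data) for path, data in original.items()}

reconstructed = dict(original)
print(changed_files(manifest, reconstructed))
```

An empty result means every reconstructed file is byte-for-byte identical to what was packed.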

The Markdown format is designed to be:

Self-contained: All necessary information in one file
Navigable: Symbol index with jump links
Reversible: Can reconstruct original files exactly
Diff-friendly: Easy to generate minimal patches

License

MIT


Release history

0.4.4: Apr 30, 2026
0.4.3: Apr 30, 2026
0.4.2: Apr 18, 2026
0.4.1: Apr 12, 2026
0.4.0: Apr 12, 2026
0.3.4: Feb 16, 2026
0.3.3: Feb 7, 2026
0.3.2: Feb 7, 2026
0.3.1 (this version): Feb 7, 2026
0.3.0: Feb 7, 2026
0.2.0: Feb 6, 2026
0.1.3: Feb 3, 2026
0.1.2: Feb 3, 2026
0.1.1: Feb 1, 2026
0.1.0: Jan 23, 2026

Download files

Source Distribution

codecrate-0.3.1.tar.gz (91.4 kB), uploaded Feb 7, 2026, Source

Built Distribution

codecrate-0.3.1-py3-none-any.whl (60.7 kB), uploaded Feb 7, 2026, Python 3

File details

Details for the file codecrate-0.3.1.tar.gz.

File metadata

Download URL: codecrate-0.3.1.tar.gz
Upload date: Feb 7, 2026
Size: 91.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for codecrate-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`33d18fe04ce881f52bae8e8e75cc40a8fb652a623c2ff1e99a9a3fd71af2fdc1`
MD5	`6004a0f28faf63bd5e7e3e201d0ee4a2`
BLAKE2b-256	`f0bdfb837133836ec239db6f83cf767bb10ec20446e502fe6d8df30804dfd783`


File details

Details for the file codecrate-0.3.1-py3-none-any.whl.

File metadata

Download URL: codecrate-0.3.1-py3-none-any.whl
Upload date: Feb 7, 2026
Size: 60.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for codecrate-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`31ff2ce891144895c719ac51d27c973c378eb4cd7e0a8bbd5d57beb60b87997a`
MD5	`914a20599db9131e6fff087315c23fe6`
BLAKE2b-256	`d7f5ebd76a5ddca01dfa3b9e08e34b6fa868a34f4ddcbbfd5120a82c6fa2bdc3`

