Auto-generated code comprehension for AI-assisted development: documentation as a development cadence.
CodeLedger
Auto-generated code comprehension for AI-assisted development
Documentation as a development cadence, not an afterthought.
Why CodeLedger?
AI-assisted ("vibe") coding moves fast. Code evolves through rapid iteration with LLMs, and traditional documentation can't keep up. Six months later, you (or a new team member) open the project and have no idea why anything was built the way it was.
CodeLedger generates structured documentation at configurable intervals during development, capturing architecture decisions, component logic, and integration patterns while your project evolves, not after.
Who is this for?
- Solo devs who vibe-code with AI and want to remember what they built
- Teams onboarding new members to AI-iterated codebases
- Anyone tired of writing docs after the fact (so... everyone?)
How It Works
Your Code → Scan → Parse → Classify → Compress → Generate → Doc
↓
Multiple Docs → Merge → Final Documentation
- Scan: Walks your project, respects .gitignore, builds a file manifest
- Snapshot: Hash-based change detection (no git required)
- Parse: AST analysis (Python) or regex extraction (JS/TS, Java, Go, Rust)
- Classify: Determines session scope (trivial → minor → standard → major → refactor)
- Compress: Token-efficient representation within your model's budget
- Generate: Sends structured prompt to Anthropic, OpenAI, or Ollama
- Merge: Combines multiple doc snapshots into a single conceptualized document
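To make the Parse step concrete, here is a minimal sketch of AST-based extraction using Python's standard-library `ast` module. It is not CodeLedger's parser: the `summarize_python` helper and the shape of its result are hypothetical, chosen only to show what can be read from source code without executing it.

```python
import ast


def summarize_python(source: str) -> dict[str, list[str]]:
    """Collect top-level imports, classes, and functions from Python source.

    Hypothetical helper: illustrates AST extraction, not CodeLedger's API.
    """
    tree = ast.parse(source)
    summary: dict[str, list[str]] = {"imports": [], "classes": [], "functions": []}
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"
            summary["imports"].append(node.module or ".")
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
    return summary


SAMPLE = '''
import os
from pathlib import Path

class Engine:
    def run(self):
        pass

async def main():
    pass
'''

print(summarize_python(SAMPLE))
```

Because the source is parsed rather than imported, this kind of analysis is safe to run on code that has side effects at import time.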
Quick Start
Install
pip install codeledger
Initialize
cd your-project
codeledger init --preset python_api
This creates .codeledger/config.yaml with sensible defaults for your project type.
Generate Documentation
codeledger generate
CodeLedger scans your code, detects what changed, classifies the session, and generates a structured doc snapshot.
Merge Into Final Docs
codeledger merge
Combines all generated snapshots into a single DOCUMENTATION.md.
Other Commands
codeledger status # Show project status and doc history
codeledger diff # Show changes since last snapshot
codeledger explain pd_001 # Display a specific doc by ID
codeledger version # Show version
Configuration
After codeledger init, edit .codeledger/config.yaml:
project:
  name: my-project
  language: python
  type: api
cadence:
  n_value: 5 # Generate every N interactions
  trigger: manual # manual | file_watch | time_interval
model:
  tier: api
  provider: anthropic
  model_name: claude-sonnet-4-20250514
  api_key_env: ANTHROPIC_API_KEY
  max_input_tokens: 3000
  max_output_tokens: 5000
focus:
  include_patterns:
    - "**/*.py"
  exclude_patterns:
    - "tests/**"
    - "__pycache__/**"
  highlight:
    - "src/core/engine.py" # Pay extra attention to these files
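As a rough illustration of how `include_patterns` and `exclude_patterns` might narrow the file set, the sketch below filters relative paths with the standard-library `fnmatch` module. The exact glob semantics CodeLedger applies are not documented here, so the `matches` and `select_files` helpers and the handling of a leading `**/` are assumptions.

```python
from fnmatch import fnmatchcase


def matches(path: str, pattern: str) -> bool:
    """Glob-match a POSIX-style relative path.

    Note: in fnmatch, "*" also matches "/", so "tests/**" covers nested files.
    Assumption: a leading "**/" may match zero directories, so "**/*.py"
    accepts "main.py" as well as "src/main.py".
    """
    if fnmatchcase(path, pattern):
        return True
    return pattern.startswith("**/") and fnmatchcase(path, pattern[3:])


def select_files(paths, include_patterns, exclude_patterns):
    """Keep paths that match any include pattern and no exclude pattern."""
    return [
        p for p in paths
        if any(matches(p, pat) for pat in include_patterns)
        and not any(matches(p, pat) for pat in exclude_patterns)
    ]


paths = [
    "main.py",
    "src/core/engine.py",
    "tests/test_engine.py",
    "__pycache__/main.cpython-312.pyc",
    "README.md",
]
selected = select_files(paths, ["**/*.py"], ["tests/**", "__pycache__/**"])
print(selected)  # ['main.py', 'src/core/engine.py']
```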
Presets
Start fast with a preset that matches your project:
| Preset | Use Case |
|---|---|
| python_api | Python REST/GraphQL APIs |
| react_frontend | React/Next.js frontends |
| fullstack | Full-stack applications |
| data_pipeline | ETL and data processing |
| ml_research | ML/AI research projects |
| cli_tool | Command-line tools |
| minimal | Bare minimum setup |
codeledger init --preset fullstack --name my-app
Model Support
| Provider | Tier | Setup |
|---|---|---|
| Anthropic | Cloud | Set ANTHROPIC_API_KEY env var |
| OpenAI | Cloud | Set OPENAI_API_KEY env var |
| Ollama | Local (free) | Install Ollama, pull a model |
Using Ollama (free, runs locally)
model:
  tier: local
  provider: ollama
  model_name: llama3.1
  max_output_tokens: 5000
No API key needed: it runs entirely on your machine.
Smart Session Classification
CodeLedger doesn't waste tokens on tiny changes. It classifies each session to calibrate documentation depth:
| Type | When | Token Budget | What Happens |
|---|---|---|---|
| Trivial | <2 files, <30 lines | 0 | Deferred and batched |
| Minor | <5 files, <150 lines | ~1.5K | Micro-doc generated |
| Standard | <15 files, <500 lines | ~5K | Full documentation |
| Major | 15+ files, 500+ lines | ~8K | Comprehensive deep-dive |
| Refactor | Many deletes + creates | ~3K | Refactor-focused analysis |
Trivial sessions are automatically deferred and batched until they accumulate enough significance, so you only pay for docs when they matter.
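The thresholds in the table can be read as a small rule-based function. The sketch below is one possible reading, not CodeLedger's classifier: it assumes a session lands in the first tier where both the file and line limits hold, and since "many deletes + creates" is not quantified, the refactor cutoffs are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    files_changed: int
    lines_changed: int
    files_created: int = 0
    files_deleted: int = 0


# (name, exclusive file limit, exclusive line limit), taken from the table.
TIERS = [("trivial", 2, 30), ("minor", 5, 150), ("standard", 15, 500)]

# Assumption: placeholder cutoffs, the README does not define "many".
REFACTOR_MIN_CREATES = 3
REFACTOR_MIN_DELETES = 3


def classify(stats: SessionStats) -> str:
    """Return the session type for a set of change statistics."""
    if (stats.files_created >= REFACTOR_MIN_CREATES
            and stats.files_deleted >= REFACTOR_MIN_DELETES):
        return "refactor"
    for name, max_files, max_lines in TIERS:
        # Both limits must hold for a session to fit a tier.
        if stats.files_changed < max_files and stats.lines_changed < max_lines:
            return name
    return "major"


print(classify(SessionStats(files_changed=1, lines_changed=10)))    # trivial
print(classify(SessionStats(files_changed=3, lines_changed=100)))   # minor
print(classify(SessionStats(files_changed=20, lines_changed=900)))  # major
```

Under this reading, a single file with 200 changed lines is classified as standard, because it exceeds the line limits of the two smaller tiers.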
No Git Required
CodeLedger uses its own Snapshot Engine: SHA-256 hashing of file contents for change detection. Git is completely optional.
This means it works for:
- Projects without version control
- Quick prototypes and experiments
- Environments where git isn't available
- Vibe coding sessions where you just want to build
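Hash-based change detection of this kind can be sketched with the standard library alone. The `snapshot` and `diff_snapshots` functions below are hypothetical names, not CodeLedger's Snapshot Engine, and for brevity the sketch ignores `.gitignore` rules and reads each file fully into memory.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's relative POSIX path to the SHA-256 digest of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and report created, deleted, and modified paths."""
    return {
        "created": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


# Demo in a throwaway directory: edit one file, delete one, add one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("x = 1\n")
    (root / "b.py").write_text("y = 2\n")
    before = snapshot(root)

    (root / "a.py").write_text("x = 42\n")
    (root / "b.py").unlink()
    (root / "c.py").write_text("z = 3\n")
    changes = diff_snapshots(before, snapshot(root))

print(changes)
```

Comparing digests instead of modification times means a file that is touched but not changed is not reported as modified.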
What Gets Documented
Each generated doc includes up to 9 configurable sections:
| Section | What It Captures |
|---|---|
| Phase Execution Summary | What was built and current status |
| Code Architecture | File tree and structural overview |
| Decision Rationale | Why things were built this way |
| Component Logic | How non-obvious parts work |
| Integration & Data Flow | How components connect |
| Edge Cases & Error Handling | Boundary conditions and failure modes |
| Interview & Learning Notes | Q&A format insights |
| Technical Debt | Known issues and future work |
| Quick Reference | Common commands and entry points |
Sections are prioritized (P1/P2/P3) and automatically trimmed to fit your token budget.
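One way to picture priority-based trimming is a greedy pass that keeps sections in priority order until the budget runs out. This is a sketch under stated assumptions, not CodeLedger's compressor: the `Section` type, the four-characters-per-token estimate, and the priorities assigned to the sample sections are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    priority: int  # 1 = P1 (kept first), 3 = P3 (dropped first)
    text: str


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def trim_to_budget(sections: list[Section], budget: int) -> list[Section]:
    """Keep sections in priority order while they fit the budget,
    then return the survivors in their original document order."""
    kept: set[int] = set()
    remaining = budget
    # sorted() is stable, so equal priorities keep document order.
    for index, section in sorted(enumerate(sections), key=lambda pair: pair[1].priority):
        cost = estimate_tokens(section.text)
        if cost <= remaining:
            kept.add(index)
            remaining -= cost
    return [s for i, s in enumerate(sections) if i in kept]


sections = [
    Section("Phase Execution Summary", 1, "a" * 400),     # ~100 tokens
    Section("Interview & Learning Notes", 3, "b" * 400),  # ~100 tokens
    Section("Decision Rationale", 1, "c" * 200),          # ~50 tokens
    Section("Technical Debt", 2, "d" * 400),              # ~100 tokens
]
kept_titles = [s.title for s in trim_to_budget(sections, budget=250)]
print(kept_titles)
```

With a budget of 250 estimated tokens, the two P1 sections and the P2 section fit, and the P3 section is dropped.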
Architecture
src/codeledger/
├── config/       # Pydantic schema, YAML loader, 7 presets
├── scanner/      # File scanner, snapshot engine, dependency resolver, Change DAG
├── parser/       # Python AST parser + regex fallback for JS/TS, Java, Go, Rust
├── classifier/   # Rule-based session classification with deferred batching
├── compressor/   # Token compression and budget-aware scope trimming
├── generator/    # Prompt builder, model router, API + local clients
├── postprocess/  # Output validation, formatting, file management
├── merge/        # Multi-doc extraction, deduplication, merge engine
├── templates/    # 4 prompt templates + 2 Jinja2 output templates
└── cli.py        # Typer CLI entry point
Key design decisions:
- No git dependency: Snapshot engine uses SHA-256 hashing
- Change DAG: Dependency graph propagation for token-efficient scoping
- Budget-aware pipeline: Every stage respects the configured token limit
- Validation layer: Catches hallucinated file paths, checks section coverage
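Dependency propagation of the kind the Change DAG describes can be sketched as a breadth-first walk over reverse import edges. The `affected_files` function and the shape of its `imports` argument are assumptions made for illustration; CodeLedger's actual graph representation is not shown in this README.

```python
from collections import deque


def affected_files(changed: set[str], imports: dict[str, set[str]]) -> set[str]:
    """Return the changed files plus everything that depends on them, transitively.

    `imports` maps a file to the files it imports (hypothetical input shape).
    """
    # Invert the graph: for each file, record which files import it.
    dependents: dict[str, set[str]] = {}
    for importer, targets in imports.items():
        for target in targets:
            dependents.setdefault(target, set()).add(importer)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:  # also guards against import cycles
                seen.add(dependent)
                queue.append(dependent)
    return seen


imports = {
    "api.py": {"service.py"},
    "cli.py": {"service.py"},
    "service.py": {"db.py"},
    "utils.py": set(),
}
scope = affected_files({"db.py"}, imports)
print(sorted(scope))  # ['api.py', 'cli.py', 'db.py', 'service.py']
```

Only the files reachable from the change enter the documentation scope, which is what keeps untouched modules such as `utils.py` out of the prompt.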
Development
git clone https://github.com/codeledger/codeledger.git
cd codeledger
pip install -e ".[dev]"
pytest # Run tests
ruff check src/ tests/ # Lint
ruff format src/ tests/ # Format
mypy src/codeledger/ # Type check
Roadmap
- Core pipeline (scan → parse → classify → compress → generate)
- Snapshot engine (git-free change detection)
- Change DAG with dependency propagation
- Session classifier with deferred batching
- Multi-model support (Anthropic, OpenAI, Ollama)
- Merge engine with deduplication
- File watcher mode for automatic triggers
- Tree-sitter parsers for deeper JS/TS and Java analysis
- MkDocs documentation site
- VS Code extension
License
MIT. Use it however you want.