
Auto-generated code comprehension for AI-assisted development: documentation as a development cadence.


📄 CodeLedger

Auto-generated code comprehension for AI-assisted development


Documentation as a development cadence, not an afterthought.


Why CodeLedger?

AI-assisted ("vibe") coding moves fast. Code evolves through rapid iteration with LLMs, and traditional documentation can't keep up. Six months later, you (or a new team member) open the project with no idea why anything was built the way it was.

CodeLedger generates structured documentation at configurable intervals during development, capturing architecture decisions, component logic, and integration patterns while your project evolves, not after.

Who is this for?

  • Solo devs who vibe-code with AI and want to remember what they built
  • Teams onboarding new members to AI-iterated codebases
  • Anyone tired of writing docs after the fact (so... everyone?)

How It Works

Your Code → Scan → Parse → Classify → Compress → Generate → Doc
                                                     ↓
                              Multiple Docs → Merge → Final Documentation

  1. Scan: Walks your project, respects .gitignore, and builds a file manifest
  2. Snapshot: Hash-based change detection (no git required)
  3. Parse: AST analysis (Python) or regex extraction (JS/TS, Java, Go, Rust)
  4. Classify: Assigns each session a scope: trivial, minor, standard, major, or refactor
  5. Compress: Builds a token-efficient representation that fits your model's budget
  6. Generate: Sends a structured prompt to Anthropic, OpenAI, or Ollama
  7. Merge: Combines multiple doc snapshots into a single consolidated document
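For Python, the Parse step relies on AST analysis. The sketch below shows the kind of structural summary the standard-library `ast` module can pull out of a source file: top-level functions, classes with their methods, and imported modules. It is an illustration under that assumption, not CodeLedger's parser; the `summarize_module` helper and the shape of its result are hypothetical.

```python
import ast


def summarize_module(source: str) -> dict[str, list[str]]:
    """Hypothetical helper: list top-level functions, classes (with methods) and imports."""
    tree = ast.parse(source)
    summary: dict[str, list[str]] = {"functions": [], "classes": [], "imports": []}
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            methods = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            summary["classes"].append(f"{node.name}({', '.join(methods)})")
        elif isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x"
            summary["imports"].append("." * node.level + (node.module or ""))
    return summary


example = '''
import os
from pathlib import Path

class Engine:
    def run(self): ...
    async def stop(self): ...

def main(): ...
'''
print(summarize_module(example))
# {'functions': ['main'], 'classes': ['Engine(run, stop)'], 'imports': ['os', 'pathlib']}
```

A summary like this is far smaller than the raw source, which is what makes the later Compress step possible.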

Quick Start

Install

pip install codeledger

Initialize

cd your-project
codeledger init --preset python_api

This creates .codeledger/config.yaml with sensible defaults for your project type.

Generate Documentation

codeledger generate

CodeLedger scans your code, detects what changed, classifies the session, and generates a structured doc snapshot.

Merge Into Final Docs

codeledger merge

Combines all generated snapshots into a single DOCUMENTATION.md.
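Merging is described as combining snapshots with deduplication. A minimal sketch, assuming each snapshot is a markdown string whose sections start with `## ` headings: sections with the same heading are concatenated in snapshot order (oldest first), and a paragraph that repeats within a section after whitespace and case normalization is kept once. The heading convention, the `merge_snapshots` and `split_sections` names, and the exact-match rule are assumptions; CodeLedger's merge engine may deduplicate differently.

```python
import re


def split_sections(doc: str) -> dict[str, list[str]]:
    """Map each '## ' heading to the paragraphs beneath it, in document order."""
    sections: dict[str, list[str]] = {}
    current = "Preamble"  # bucket for any text before the first heading
    for block in re.split(r"\n\s*\n", doc.strip()):
        block = block.strip()
        if block.startswith("## "):
            heading, _, rest = block.partition("\n")
            current = heading[3:].strip()
            sections.setdefault(current, [])
            block = rest.strip()
        if block:
            sections.setdefault(current, []).append(block)
    return sections


def merge_snapshots(snapshots: list[str]) -> str:
    """Merge snapshots oldest-first; repeated paragraphs within a section are kept once."""
    merged: dict[str, list[str]] = {}
    seen: set[tuple[str, str]] = set()
    for doc in snapshots:
        for heading, paragraphs in split_sections(doc).items():
            bucket = merged.setdefault(heading, [])
            for para in paragraphs:
                key = (heading, " ".join(para.split()).lower())  # ignore whitespace and case
                if key not in seen:
                    seen.add(key)
                    bucket.append(para)
    return "\n\n".join(
        f"## {heading}\n\n" + "\n\n".join(paras)
        for heading, paras in merged.items()
        if paras
    )


older = "## Decision Rationale\n\nUse SQLite for storage.\n\n## Technical Debt\n\nNo retries yet."
newer = "## Decision Rationale\n\nUse  SQLite  for storage.\n\nAdded a read cache."
print(merge_snapshots([older, newer]))
```

Exact-match deduplication only catches verbatim repeats; paraphrased duplicates would need fuzzier comparison.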

Other Commands

codeledger status       # Show project status and doc history
codeledger diff         # Show changes since last snapshot
codeledger explain pd_001  # Display a specific doc by ID
codeledger version      # Show version

Configuration

After codeledger init, edit .codeledger/config.yaml:

project:
  name: my-project
  language: python
  type: api

cadence:
  n_value: 5          # Generate every N interactions
  trigger: manual     # manual | file_watch | time_interval

model:
  tier: api
  provider: anthropic
  model_name: claude-sonnet-4-20250514
  api_key_env: ANTHROPIC_API_KEY
  max_input_tokens: 3000
  max_output_tokens: 5000

focus:
  include_patterns:
    - "**/*.py"
  exclude_patterns:
    - "tests/**"
    - "__pycache__/**"
  highlight:
    - "src/core/engine.py"   # Pay extra attention to these files
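The `focus` patterns decide which files reach the pipeline. The sketch below is one plausible reading of them, not CodeLedger's actual matcher: patterns match POSIX-style paths relative to the project root, `**/` matches zero or more directories, `*` and `?` stay within one path segment, and an exclude match always wins over an include match. The `glob_to_regex` and `is_in_scope` helpers are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a focus pattern: '**/' = zero or more directories, '*' and '?' stay in one segment."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_in_scope(path: str, include: list[str], exclude: list[str]) -> bool:
    """Paths are POSIX-style and relative to the project root; excludes win over includes."""

    def hits(patterns: list[str]) -> bool:
        return any(glob_to_regex(p).fullmatch(path) for p in patterns)

    return hits(include) and not hits(exclude)


include = ["**/*.py"]
exclude = ["tests/**", "__pycache__/**"]
for path in ["setup.py", "src/core/engine.py", "tests/test_engine.py", "README.md"]:
    print(path, is_in_scope(path, include, exclude))
```

Under this reading, `__pycache__/**` excludes only a top-level `__pycache__` directory; a pattern such as `**/__pycache__/**` would also catch nested ones.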

Presets

Start fast with a preset that matches your project:

| Preset | Use Case |
|---|---|
| python_api | Python REST/GraphQL APIs |
| react_frontend | React/Next.js frontends |
| fullstack | Full-stack applications |
| data_pipeline | ETL and data processing |
| ml_research | ML/AI research projects |
| cli_tool | Command-line tools |
| minimal | Bare minimum setup |

codeledger init --preset fullstack --name my-app

Model Support

| Provider | Tier | Setup |
|---|---|---|
| Anthropic | Cloud | Set ANTHROPIC_API_KEY env var |
| OpenAI | Cloud | Set OPENAI_API_KEY env var |
| Ollama | Local (free) | Install Ollama, pull a model |

Using Ollama (free, runs locally)

model:
  tier: local
  provider: ollama
  model_name: llama3.1
  max_output_tokens: 5000

No API key needed: it runs entirely on your machine.
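The provider table implies a simple credential rule: cloud providers read a key from an environment variable, local Ollama needs none. A minimal sketch of that rule, assuming the config's `api_key_env` field overrides the per-provider default; `resolve_api_key` and `DEFAULT_KEY_ENV` are hypothetical names, not part of CodeLedger's model router.

```python
import os
from typing import Mapping, Optional

# Default key variable per provider, from the Model Support table (None = local, no key).
DEFAULT_KEY_ENV: dict[str, Optional[str]] = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "ollama": None,
}


def resolve_api_key(
    provider: str,
    api_key_env: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the API key for a cloud provider, or None for a local provider."""
    if provider not in DEFAULT_KEY_ENV:
        raise ValueError(f"unsupported provider: {provider!r}")
    default_var = DEFAULT_KEY_ENV[provider]
    if default_var is None:
        return None  # local model: nothing to look up
    var = api_key_env or default_var  # the config's api_key_env overrides the default
    key = env.get(var)
    if not key:
        raise RuntimeError(f"{provider} needs an API key: set the {var} environment variable")
    return key
```

Failing early with the variable name in the message is friendlier than letting the first API call return an authentication error.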

Smart Session Classification

CodeLedger doesn't waste tokens on tiny changes. It classifies each session to calibrate documentation depth:

| Type | When | Token Budget | What Happens |
|---|---|---|---|
| Trivial | <2 files, <30 lines | 0 | Deferred and batched |
| Minor | <5 files, <150 lines | ~1.5K | Micro-doc generated |
| Standard | <15 files, <500 lines | ~5K | Full documentation |
| Major | 15+ files, 500+ lines | ~8K | Comprehensive deep-dive |
| Refactor | Many deletes + creates | ~3K | Refactor-focused analysis |

Trivial sessions are automatically deferred and batched until they accumulate enough significance, so you only pay for docs when they matter.
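The table reads naturally as a first-match rule set. A minimal sketch, assuming a tier applies only when both its file and line limits hold, that the refactor check runs first, that anything beyond the standard limits is major, and that "many deletes + creates" means at least three deleted and three created files. That threshold, the `classify` function, and the `DeferredBatch` class are hypothetical; only the limits and budgets come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    files_changed: int
    lines_changed: int
    files_created: int = 0
    files_deleted: int = 0


# (type, files must be below, lines must be below, approximate token budget from the table)
TIERS = [
    ("trivial", 2, 30, 0),
    ("minor", 5, 150, 1_500),
    ("standard", 15, 500, 5_000),
]
REFACTOR_MIN = 3  # hypothetical reading of "many deletes + creates"


def classify(s: Session) -> tuple[str, int]:
    """Return (session type, token budget); the first matching rule wins."""
    if s.files_deleted >= REFACTOR_MIN and s.files_created >= REFACTOR_MIN:
        return "refactor", 3_000
    for name, max_files, max_lines, budget in TIERS:
        if s.files_changed < max_files and s.lines_changed < max_lines:
            return name, budget
    return "major", 8_000  # beyond the standard limits


class DeferredBatch:
    """Pools trivial sessions until their combined size is no longer trivial."""

    def __init__(self) -> None:
        self.pending: list[Session] = []

    def add(self, s: Session) -> Optional[Session]:
        self.pending.append(s)
        # Summing over-counts files touched in several sessions; acceptable for a sketch.
        combined = Session(
            files_changed=sum(p.files_changed for p in self.pending),
            lines_changed=sum(p.lines_changed for p in self.pending),
            files_created=sum(p.files_created for p in self.pending),
            files_deleted=sum(p.files_deleted for p in self.pending),
        )
        if classify(combined)[0] == "trivial":
            return None  # still deferred
        self.pending.clear()
        return combined  # ready to document as one session
```

In this sketch, two one-file, ten-line sessions are held back individually but released together as a minor session.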

No Git Required

CodeLedger uses its own Snapshot Engine: SHA-256 hashing of file contents for change detection. Git is completely optional.

This means it works for:

  • Projects without version control
  • Quick prototypes and experiments
  • Environments where git isn't available
  • Vibe coding sessions where you just want to build
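Hash-based change detection needs only a stored map from path to content hash. A minimal sketch using `hashlib.sha256`, assuming a snapshot is a plain `dict` of relative paths to hex digests (CodeLedger's on-disk snapshot format is not described here): comparing two maps yields added, deleted, and modified files without consulting git. `take_snapshot` and `diff_snapshots` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def take_snapshot(root: Path, paths: list[str]) -> dict[str, str]:
    """Map each relative path (e.g. from the scan manifest) to the SHA-256 of its bytes."""
    return {p: hashlib.sha256((root / p).read_bytes()).hexdigest() for p in paths}


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, deleted, or modified by comparing content hashes."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in new.keys() & old.keys() if new[p] != old[p]),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("x = 1\n")
    (root / "b.py").write_text("y = 2\n")
    before = take_snapshot(root, ["a.py", "b.py"])

    (root / "a.py").write_text("x = 42\n")  # edit
    (root / "b.py").unlink()  # delete
    (root / "c.py").write_text("z = 3\n")  # create
    after = take_snapshot(root, ["a.py", "c.py"])

changes = diff_snapshots(before, after)
print(changes)  # {'added': ['c.py'], 'deleted': ['b.py'], 'modified': ['a.py']}
```

Because the comparison is on content rather than modification time, saving a file without changing it does not register as a change.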

What Gets Documented

Each generated doc includes up to 9 configurable sections:

| Section | What It Captures |
|---|---|
| Phase Execution Summary | What was built and current status |
| Code Architecture | File tree and structural overview |
| Decision Rationale | Why things were built this way |
| Component Logic | How non-obvious parts work |
| Integration & Data Flow | How components connect |
| Edge Cases & Error Handling | Boundary conditions and failure modes |
| Interview & Learning Notes | Insights in Q&A format |
| Technical Debt | Known issues and future work |
| Quick Reference | Common commands and entry points |

Sections are prioritized (P1/P2/P3) and automatically trimmed to fit your token budget.
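Priority-based trimming can be as simple as a greedy pass. A sketch, assuming each section carries a priority (1 for P1) and an estimated token cost; the `Section` and `trim_to_budget` names are hypothetical, and the priorities and sizes in the example are made up rather than CodeLedger's real assignments. Sections are admitted in priority order while they fit, and survivors are returned in their original document order.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    priority: int  # 1 = P1 (kept first) ... 3 = P3 (trimmed first)
    est_tokens: int


def trim_to_budget(sections: list[Section], budget: int) -> list[Section]:
    """Greedily admit sections by priority, then return survivors in original order."""
    ranked = sorted(enumerate(sections), key=lambda pair: (pair[1].priority, pair[0]))
    kept: set[int] = set()
    used = 0
    for index, section in ranked:
        if used + section.est_tokens <= budget:  # skip what doesn't fit, keep trying smaller ones
            kept.add(index)
            used += section.est_tokens
    return [s for i, s in enumerate(sections) if i in kept]


plan = [  # priorities and sizes are made up for illustration
    Section("Phase Execution Summary", 1, 400),
    Section("Code Architecture", 1, 900),
    Section("Decision Rationale", 1, 700),
    Section("Interview & Learning Notes", 3, 800),
    Section("Technical Debt", 2, 300),
    Section("Quick Reference", 3, 200),
]
print([s.title for s in trim_to_budget(plan, 2500)])
# ['Phase Execution Summary', 'Code Architecture', 'Decision Rationale', 'Technical Debt', 'Quick Reference']
```

Here the 800-token P3 section is dropped, but the smaller P3 section still fits in the remaining 200 tokens.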

Architecture

src/codeledger/
├── config/          # Pydantic schema, YAML loader, 7 presets
├── scanner/         # File scanner, snapshot engine, dependency resolver, Change DAG
├── parser/          # Python AST parser + regex fallback for JS/TS, Java, Go, Rust
├── classifier/      # Rule-based session classification with deferred batching
├── compressor/      # Token compression and budget-aware scope trimming
├── generator/       # Prompt builder, model router, API + local clients
├── postprocess/     # Output validation, formatting, file management
├── merge/           # Multi-doc extraction, deduplication, merge engine
├── templates/       # 4 prompt templates + 2 Jinja2 output templates
└── cli.py           # Typer CLI entry point
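One output-validation check the postprocess stage can perform is catching hallucinated file paths: a generated doc that cites a file the scanner never saw. A minimal sketch, assuming cited paths appear in backticks and end in a known extension; the regex, the extension list, and `find_unknown_paths` are hypothetical, not CodeLedger's validator.

```python
import re

# Backticked paths ending in a known extension (the extension list is an assumption).
PATH_RE = re.compile(r"`([\w./-]+\.(?:py|js|jsx|ts|tsx|java|go|rs|yaml|yml|toml|md))`")


def find_unknown_paths(doc: str, manifest: set[str]) -> list[str]:
    """Return file paths cited in a generated doc that are missing from the scan manifest."""
    cited = {m.group(1).removeprefix("./") for m in PATH_RE.finditer(doc)}
    return sorted(cited - manifest)


manifest = {"src/core/engine.py", "src/cli.py"}
doc = "Run `src/cli.py`, which calls `src/core/engine.py` and `src/core/scheduler.py`."
print(find_unknown_paths(doc, manifest))  # ['src/core/scheduler.py']
```

Flagged paths can then be stripped or sent back to the model for correction.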

Key design decisions:

  • No git dependency: Snapshot engine uses SHA-256 hashing
  • Change DAG: Dependency graph propagation for token-efficient scoping
  • Budget-aware pipeline: Every stage respects the configured token limit
  • Validation layer: Catches hallucinated file paths, checks section coverage
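Dependency propagation over a Change DAG can be sketched as a breadth-first walk over reversed import edges: if `parser` changed, every module that imports it, directly or transitively, may need its docs refreshed. The graph shape, module names, `affected_modules` helper, and optional depth limit below are assumptions for illustration, not CodeLedger's resolver.

```python
from collections import deque
from typing import Optional


def affected_modules(
    imports: dict[str, set[str]],
    changed: set[str],
    max_depth: Optional[int] = None,
) -> set[str]:
    """Changed modules plus everything that imports them, directly or transitively."""
    # Reverse the edges: module -> modules that import it.
    dependents: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    affected = set(changed)
    queue = deque((module, 0) for module in changed)
    while queue:  # breadth-first, so depth counts import hops from a changed module
        module, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # stop widening the scope past the depth limit
        for parent in dependents.get(module, set()):
            if parent not in affected:  # visited check also guards against import cycles
                affected.add(parent)
                queue.append((parent, depth + 1))
    return affected


imports = {  # module -> modules it imports (hypothetical project)
    "cli": {"engine", "config"},
    "engine": {"parser", "config"},
    "parser": set(),
    "config": set(),
    "report": {"cli"},
}
print(sorted(affected_modules(imports, {"parser"})))  # ['cli', 'engine', 'parser', 'report']
print(sorted(affected_modules(imports, {"parser"}, max_depth=1)))  # ['engine', 'parser']
```

A depth limit is one way to trade completeness for tokens: distant dependents are less likely to need new documentation.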

Development

git clone https://github.com/codeledger/codeledger.git
cd codeledger
pip install -e ".[dev]"

pytest                          # Run tests
ruff check src/ tests/          # Lint
ruff format src/ tests/         # Format
mypy src/codeledger/             # Type check

Roadmap

  • Core pipeline (scan → parse → classify → compress → generate)
  • Snapshot engine (git-free change detection)
  • Change DAG with dependency propagation
  • Session classifier with deferred batching
  • Multi-model support (Anthropic, OpenAI, Ollama)
  • Merge engine with deduplication
  • File watcher mode for automatic triggers
  • Tree-sitter parsers for deeper JS/TS and Java analysis
  • MkDocs documentation site
  • VS Code extension

License

MIT: use it however you want.



Download files


Source Distribution

codeledger-0.1.5.tar.gz (49.1 kB)

Built Distribution


codeledger-0.1.5-py3-none-any.whl (62.1 kB)

File details

Details for the file codeledger-0.1.5.tar.gz.

File metadata

  • Download URL: codeledger-0.1.5.tar.gz
  • Upload date:
  • Size: 49.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for codeledger-0.1.5.tar.gz
Algorithm Hash digest
SHA256 84c966fe986b400a002f1470ea2ef1571ea81c43e02540c5994b3633119161ed
MD5 583f79dd37ec1c3416db3ce337c62a40
BLAKE2b-256 784e4d2c9d361442b5138716e3e3f64dda8006457a7196097e2117e495578f4b


Provenance

The following attestation bundles were made for codeledger-0.1.5.tar.gz:

Publisher: release.yml on Parth-Vyas000/CodeLedger

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file codeledger-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: codeledger-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 62.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for codeledger-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 55d8ebfa70cfabbd2316fde0b8f4df24eab664cd1e619761728031983c6de8e4
MD5 22f15ecf6ea76d8d7c2bd5157ff36c90
BLAKE2b-256 0b1bcbeaf31517ebde185c7a39c81a6322eb31f257678bf342f25daa741a5154


Provenance

The following attestation bundles were made for codeledger-0.1.5-py3-none-any.whl:

Publisher: release.yml on Parth-Vyas000/CodeLedger

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
