
Codebase Cortex


Automatically keep your engineering documentation in sync with code.

Codebase Cortex is a local-first, multi-agent documentation engine that watches your codebase for changes and updates markdown documentation automatically. It uses LangGraph to orchestrate nine pipeline nodes that analyze code, route updates to specific sections, write docs, validate accuracy, generate indexes, create tasks, and produce sprint reports. Docs live as plain markdown files in your repo — no cloud dependency required. Optional sync to Notion is available via the DocBackend protocol.

New in v0.3: MCP server mode — coding agents (Claude Code, Cursor, Windsurf) can use Cortex's documentation tools directly. No LLM API key needed.

graph LR
    A[Git Commit] --> B[CodeAnalyzer]
    B --> C[SemanticFinder]
    C --> D[SectionRouter]
    D --> E[DocWriter]
    E --> F[DocValidator]
    F --> G[TOCGenerator]
    G --> H[TaskCreator]
    H --> I[SprintReporter]
    I --> J[OutputRouter]
    J --> K[docs/]
    J --> L[Notion]

Features

  • MCP server mode — Coding agents (Claude Code, Cursor, Windsurf) get 11 documentation tools via MCP. No LLM API key needed — the agent's own LLM does the thinking
  • Local-first documentation — Docs are plain markdown in your repo's docs/ directory. No cloud dependency required
  • Section-level updates — Only changed sections are rewritten, preserving human edits
  • Human-edit preservation — MetaIndex tracks section hashes and detects manual edits, which the pipeline protects
  • Semantic search — FAISS embeddings with TreeSitter-aware AST chunking find related code across your entire codebase
  • Incremental indexing — Only re-embeds changed files, with .cortexignore for custom exclusions
  • Draft banners — New pages are marked as drafts until reviewed; remove with cortex accept
  • Multi-backend output — DocBackend protocol with LocalMarkdownBackend (default) and NotionBackend
  • Output modes — apply writes directly, propose stages for review, dry-run previews only
  • Sprint reports — Weekly summaries generated from commit activity with run metrics
  • Task tracking — Automatically identifies undocumented areas and creates tasks
  • Cost tracking — Run metrics aggregate token usage, wall-clock time, and cost per pipeline run
  • CI/CD integration — cortex ci command outputs structured JSON for GitHub Actions and GitLab CI pipelines (PR impact analysis, post-merge doc updates)
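The human-edit preservation feature above rests on a simple idea: record a hash of each section when the tool writes it, and treat any mismatch at the next run as a manual edit. A minimal sketch of that idea in Python — function names here are illustrative, not Cortex's actual MetaIndex API:

```python
import hashlib

def section_hash(text: str) -> str:
    """Stable fingerprint of a doc section's content."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()[:16]

def is_human_edited(current_text: str, recorded_hash: str) -> bool:
    """A section whose on-disk hash no longer matches the hash recorded
    at the last tool-driven write was changed by a human, not the tool."""
    return section_hash(current_text) != recorded_hash

# Record at write time, compare at the next run:
recorded = section_hash("## Setup\nRun `cortex init`.")
assert not is_human_edited("## Setup\nRun `cortex init`.", recorded)
assert is_human_edited("## Setup\nRun `cortex init` first.", recorded)
```

A protected section is then skipped by the pipeline rather than overwritten.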

Quick Start

Prerequisites

  • Python 3.11+
  • uv package manager
  • An LLM — cloud API key (Gemini, Anthropic, OpenRouter, OpenAI) or a local model (Ollama, vLLM, LM Studio). Not required for MCP server mode — coding agents use their own LLM

Install

# Install from PyPI
pip install codebase-cortex

# Or with uv
uv tool install codebase-cortex

Both cortex and codebase-cortex commands are available after installation. If cortex conflicts with another package on your system, use codebase-cortex instead.

Install from source
git clone https://github.com/sarupurisailalith/codebase-cortex.git
cd codebase-cortex
uv sync
uv tool install .

Initialize in your project

cd /path/to/your-project

# Interactive setup — configures LLM, creates docs/ directory
cortex init

# Quick setup with defaults
cortex init --quick

# Run the pipeline
cortex run --once

The init wizard will:

  1. Ask for your LLM provider and API key
  2. Create a .cortex/ config directory and docs/ output directory
  3. Optionally connect to Notion via OAuth
  4. Optionally install a post-commit git hook

CLI Commands

Command — Description
cortex init [--quick] — Interactive setup wizard
cortex run --once [--full] [--dry-run] — Run the full pipeline once
cortex status — Show connection and config status
cortex analyze — One-shot diff analysis (no doc writes)
cortex embed — Rebuild the FAISS embedding index
cortex config show — Display current configuration
cortex config set KEY VALUE — Update a config value
cortex diff — Show proposed documentation changes
cortex apply — Apply proposed changes to docs/
cortex discard — Discard proposed changes
cortex accept — Remove draft banners after review
cortex resolve — Resolve merge conflicts in docs/
cortex check — Check documentation freshness
cortex sync --target notion — Sync local docs to Notion (OAuth flow)
cortex ci [--on-pr] [--on-merge] — CI/CD mode (JSON output for pipelines)
cortex map — Generate knowledge map from FAISS clusters
cortex mcp serve — Start MCP server for coding agents (stdio)

How It Works

Cortex creates a .cortex/ directory (gitignored) in your project repo that stores configuration, OAuth tokens, and the FAISS vector index. Documentation is written as markdown files to docs/. When you run the pipeline, nine nodes work in sequence:

graph TD
    START([Start]) --> CA[CodeAnalyzer]
    CA -->|Has analysis?| SF[SemanticFinder]
    CA -->|No changes| END1([End])
    SF --> SR[SectionRouter]
    SR --> DW[DocWriter]
    DW --> DV[DocValidator]
    DV --> TG[TOCGenerator]
    TG --> TC[TaskCreator]
    TC -->|Has updates?| SPR[SprintReporter]
    TC -->|Nothing to report| END2([End])
    SPR --> OR[OutputRouter]
    OR -->|apply| DOCS[docs/]
    OR -->|propose| STAGED[.cortex/proposed/]
    OR -->|dry-run| END3([End])

    style CA fill:#4A90D9,color:#fff
    style SF fill:#7B68EE,color:#fff
    style SR fill:#9B59B6,color:#fff
    style DW fill:#50C878,color:#fff
    style DV fill:#2ECC71,color:#fff
    style TG fill:#1ABC9C,color:#fff
    style TC fill:#FFB347,color:#fff
    style SPR fill:#FF6B6B,color:#fff
    style OR fill:#E74C3C,color:#fff
  1. CodeAnalyzer — Parses git diffs (or scans the full codebase) and produces a structured analysis of what changed
  2. SemanticFinder — Incrementally embeds changed files using TreeSitter AST chunking and searches the FAISS index for semantically related code
  3. SectionRouter — Reads INDEX.md and .cortex-meta.json to triage which sections in which pages need updating, respecting human-edited sections
  4. DocWriter — Reads only targeted sections by line range, generates updated content, and writes via the DocBackend protocol
  5. DocValidator — Compares generated docs against actual code for factual accuracy, flagging low-confidence sections for human review
  6. TOCGenerator — Regenerates TOC markers in updated files, refreshes INDEX.md and .cortex-meta.json, records run metrics
  7. TaskCreator — Identifies documentation gaps and creates task entries
  8. SprintReporter — Synthesizes all activity into a weekly sprint summary with run metrics
  9. OutputRouter — Applies the configured output mode (apply, propose, or dry-run)

Per-Repo Configuration

your-project/
├── .cortex/                    # Created by cortex init (gitignored)
│   ├── .env                    # LLM model, API keys, doc settings
│   ├── .gitignore              # Ignores everything in .cortex/
│   ├── .cortexignore           # User-defined FAISS indexing exclusions
│   ├── notion_tokens.json      # OAuth tokens (if Notion connected)
│   ├── page_cache.json         # Tracked Notion pages (if connected)
│   ├── proposed/               # Staged changes (propose mode)
│   └── faiss_index/            # Vector embeddings
│       ├── index.faiss
│       ├── chunks.json
│       ├── id_map.json
│       └── file_hashes.json
├── docs/                       # Generated documentation (local backend)
│   ├── INDEX.md                # Auto-generated index with heading tree
│   ├── .cortex-meta.json       # Section hashes, human-edit tracking
│   └── *.md                    # Documentation pages
└── src/
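The file_hashes.json entry above is what makes indexing incremental: only files whose content hash changed since the last build get re-embedded. A minimal sketch of the pattern, with hypothetical names (the real SemanticFinder works on files and a FAISS index, not in-memory dicts):

```python
import hashlib

def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def files_to_reembed(files: dict[str, bytes],
                     hash_store: dict[str, str]) -> list[str]:
    """Return only the files whose content hash differs from the one
    stored after the last index build, updating the store; everything
    else keeps its existing embeddings."""
    stale = []
    for name, content in files.items():
        d = digest(content)
        if hash_store.get(name) != d:
            hash_store[name] = d
            stale.append(name)
    return stale

store: dict[str, str] = {}
repo = {"a.py": b"print('a')", "b.py": b"print('b')"}
assert files_to_reembed(repo, store) == ["a.py", "b.py"]  # first run: embed all
repo["b.py"] = b"print('B')"
assert files_to_reembed(repo, store) == ["b.py"]          # later: only changed file
```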

Supported LLM Providers

Cortex uses LiteLLM as a unified LLM interface. Any LiteLLM-compatible model works — cloud APIs, local models, or self-hosted endpoints. LiteLLM supports 100+ providers.

Provider — Example Model — Config
Google Gemini — gemini/gemini-2.5-flash-lite — GOOGLE_API_KEY
Anthropic — anthropic/claude-sonnet-4-20250514 — ANTHROPIC_API_KEY
OpenRouter — openrouter/google/gemini-2.5-flash-lite — OPENROUTER_API_KEY
Ollama (local) — ollama/llama3 — No key needed (runs locally)
vLLM (local) — hosted_vllm/model-name — LLM_API_BASE=http://localhost:8000
LM Studio — lm_studio/model-name — LLM_API_BASE=http://localhost:1234/v1
OpenAI — gpt-4o — OPENAI_API_KEY
Azure OpenAI — azure/gpt-4o — AZURE_API_KEY + AZURE_API_BASE

For local models, set the model name and optionally LLM_API_BASE in .cortex/.env:

LLM_MODEL=ollama/llama3
# or
LLM_MODEL=hosted_vllm/my-model
LLM_API_BASE=http://localhost:8000

Documentation

Document — Description
Architecture — System design, data flow, pipeline nodes
CLI Reference — All commands, options, and examples
Agents — How each pipeline node works
Configuration — Setup, LLM providers, environment variables
Notion Integration — OAuth flow, sync protocol, page management
Embeddings & Search — FAISS index, TreeSitter chunking, semantic search
CI/CD Integration — GitHub Actions, GitLab CI, branch strategies
MCP Server — MCP server mode for coding agents (Claude Code, Cursor, Windsurf)
Contributing — Development setup, testing, project structure

Changelog

0.3.0

  • MCP server mode — Coding agents (Claude Code, Cursor, Windsurf) get 11 documentation tools via MCP. No LLM API key needed — the agent's own LLM does the thinking
  • MCP tools — cortex_search_related_docs, cortex_read_section, cortex_write_section, cortex_list_docs, cortex_check_freshness, cortex_get_doc_status, cortex_rebuild_index, cortex_accept_drafts, cortex_create_page, cortex_knowledge_map, cortex_sync
  • Human-edit protection — cortex_write_section checks MetaIndex for human edits, requires [force] suffix to overwrite
  • File locking — Concurrent access safety between MCP server and standalone pipeline via fcntl.flock
  • FAISS staleness detection — MCP server auto-reloads index when standalone pipeline rebuilds it
  • Init flow — cortex init now offers MCP-only, standalone, or hybrid mode selection
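The file-locking entry above refers to POSIX advisory locks. A minimal sketch of the pattern (illustrative function name, not Cortex's actual code; fcntl is unavailable on Windows):

```python
import fcntl

def locked_write(path: str, text: str) -> None:
    """Take an exclusive advisory lock before rewriting the file, so an
    MCP server and a standalone pipeline run never interleave writes."""
    with open(path, "a+", encoding="utf-8") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            f.seek(0)
            f.truncate()
            f.write(text)
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

Advisory locks only coordinate processes that both take the lock; they do not stop an unrelated editor from writing the file.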

0.2.0

  • Redesign: Local-first multi-backend documentation engine — docs live as markdown in docs/, no cloud dependency required
  • Pipeline: 9-node LangGraph pipeline with conditional routing (added SectionRouter, DocValidator, TOCGenerator, OutputRouter)
  • LLM: LiteLLM unified interface replaces all langchain LLM providers
  • Backends: DocBackend protocol with LocalMarkdownBackend (default) and NotionBackend
  • Embeddings: TreeSitter AST-aware code chunking with regex fallback; incremental FAISS index rebuild (only re-embeds changed files)
  • MetaIndex: .cortex-meta.json tracks section hashes and detects human edits for preservation
  • Exclusions: .cortexignore for user-defined FAISS indexing exclusions
  • CLI: 11 new commands (config, diff, apply, discard, accept, resolve, check, sync, map, and more)
  • Output modes: apply (default), propose (staged for review), dry-run (preview only)
  • Draft banners: New pages marked as drafts until reviewed with cortex accept
  • Metrics: Run metrics aggregation (token usage, cost, timing) via LangGraph state reducer
  • Branches: Branch strategy enforcement (main-only or branch-aware)
  • OAuth: Service connection pattern for Notion OAuth integration

0.1.4

  • Fix: Resolved duplicate child pages caused by emoji title mismatch between Notion and local cache
  • Fix: DocWriter now uses normalized title matching for section-level merges (prevents creating duplicates when LLM returns titles with/without emoji)
  • Fix: Parent page creation now warns user to verify page location in Notion workspace

0.1.3

  • Fix: API key input is now masked during cortex init
  • Fix: Sprint Log uses replace_content instead of appending on every run

0.1.2

  • Fix: Dynamic parent page title (uses repo directory name instead of hardcoded "Codebase Cortex")
  • Fix: Child page bootstrap only checks local cache, no longer adopts unrelated workspace pages

0.1.1

  • Initial public release

License

MIT
