The universal project brain — scan any codebase, generate context files for every AI coding tool.

codebase-md

The universal project brain that works with every AI coding tool.

Requires Python 3.11+. License: MIT.

One command scans your codebase and generates context files for Claude Code, Cursor, Codex, Windsurf, and more — auto-detected conventions, dependency health, architecture maps, and smart context routing. Stays fresh via git hooks.


Why?

Every AI coding tool needs project context to work well. But each tool has its own format:

  • Claude Code wants CLAUDE.md
  • Cursor wants .cursorrules
  • Codex wants codex.md
  • Windsurf wants .windsurfrules

Writing and maintaining these manually is tedious. codebase-md scans your project once and generates all of them from a single source of truth.

Features

  • Universal output — generates 6 formats from one scan (CLAUDE.md, .cursorrules, AGENTS.md, codex.md, .windsurfrules, PROJECT_CONTEXT.md)
  • Auto-detected conventions — naming style, import patterns, file organization, design patterns (powered by tree-sitter AST)
  • Dependency intelligence — health scores, version diffs, breaking change detection, migration plans with code impact
  • Architecture mapping — detects monolith/monorepo/microservice/library/CLI patterns, entry points, modules
  • Smart context routing — query-based context retrieval with TF-IDF relevance scoring
  • Git integration — hooks for auto-regeneration on commit, contributor analysis, file hotspots
  • Multi-language — Python, JavaScript, and TypeScript via tree-sitter AST; Go and Rust via heuristics (50+ file extensions recognized)

v0.1.0 — Current Status

Alpha release — this is the first public release of codebase-md. Core functionality is working and tested, but APIs and output formats may change between minor versions. Please pin your version (pip install codebase-md==0.1.0) and report issues.

What Works Well

  • Single-command scan — codebase scan . analyzes your entire project in seconds
  • 6 output formats — CLAUDE.md, AGENTS.md, .cursorrules, codex.md, .windsurfrules, and the generic PROJECT_CONTEXT.md
  • Language detection — Python, TypeScript, JavaScript, Go, Rust and 50+ file extensions
  • Dependency parsing — requirements.txt, pyproject.toml, package.json, go.mod, Cargo.toml, Gemfile
  • Convention inference — naming style, import patterns, file organization, design patterns (via tree-sitter AST)
  • Architecture detection — monolith, monorepo, microservice, library, CLI tool
  • Git insights — commit history, contributor analysis, file change hotspots
  • Dependency health — live registry queries (PyPI, npm) with health scoring and breaking change detection
  • Smart context routing — TF-IDF relevance scoring for query-based context retrieval

Known Limitations

  • AST grammars — tree-sitter support is limited to Python, JavaScript, and TypeScript; Go and Rust are parsed via heuristics
  • No incremental mode — every scan re-analyzes the full project (no watch/diff mode yet)
  • Large monorepos — projects with >10,000 files may experience slower scan times
  • Network dependency — DepShift registry queries (PyPI/npm health checks) require network access; use --offline to skip
  • No Windows CI — tested on Linux and macOS; Windows should work but is not yet part of CI

Tested Against

The test suite (354 tests) validates against these project archetypes:

Fixture          Type        Languages
Python CLI       CLI tool    Python
FastAPI App      Web API     Python
Next.js App      Full-stack  TypeScript, JavaScript
Go CLI           CLI tool    Go
Rust CLI         CLI tool    Rust
Mixed Language   Multi-lang  Python, JS, Go
Monorepo         Monorepo    Multiple
Empty Repo       Edge case

Integration tests also run against real-world repositories (see test_real_repos.py).


Installation

From PyPI

pip install codebase-md

With AST support (recommended)

pip install "codebase-md[ast]"

From GitHub (latest dev)

pip install git+https://github.com/sauravanand542/codebase-md.git

For development

git clone https://github.com/sauravanand542/codebase-md.git
cd codebase-md
pip install -e ".[dev,ast]"

Quick Start

# Initialize config in your project
cd your-project/
codebase init

# Scan your codebase (builds internal project model)
codebase scan .

# Generate context files for all AI tools
codebase generate .

That's it. You now have CLAUDE.md, .cursorrules, AGENTS.md, codex.md, .windsurfrules, and PROJECT_CONTEXT.md in your project root.


Commands

codebase scan

Scans your project and builds a complete model: languages, architecture, dependencies, conventions, modules, git history.

codebase scan .                    # Scan current directory
codebase scan /path/to/project     # Scan a specific project
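
The git history part of a scan can be pictured as counting how often each file appears in the commit log. The sketch below is one minimal way to compute file change hotspots with the standard `git log --name-only` command. The helper names (`count_file_changes`, `file_hotspots`) are hypothetical, and this is not necessarily how codebase-md's git_analyzer.py works.

```python
import subprocess
from collections import Counter


def count_file_changes(log_output: str) -> Counter:
    """Count occurrences of each path in `git log --name-only` output."""
    # With an empty --pretty format, every non-blank line is a changed path.
    return Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )


def file_hotspots(repo: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` most frequently changed files in a git repository."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails (e.g. not a repository)
    )
    return count_file_changes(result.stdout).most_common(top)
```

Calling `file_hotspots(".")` inside a repository returns (path, change count) pairs, most frequently changed first.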

codebase generate

Generates context files from the last scan.

codebase generate .                # Generate all formats
codebase generate . --format claude  # Generate only CLAUDE.md

codebase deps

Dependency health dashboard — checks versions against registries, computes health scores.

codebase deps .                    # Health dashboard (queries PyPI/npm)
codebase deps . --offline          # Offline mode (no network)
codebase deps . --upgrade typer    # Migration plan for a specific package
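
As a rough illustration of what a version diff involves, the sketch below classifies an upgrade as major, minor, or patch under semantic-versioning assumptions and treats a major bump as potentially breaking. The function names are hypothetical and are not codebase-md's API; the actual DepShift engine may weigh other signals.

```python
import re


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH'; missing parts default to 0, suffixes are ignored."""
    match = re.match(r"^v?(\d+)(?:\.(\d+))?(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(part) if part else 0 for part in match.groups())
    return major, minor, patch


def classify_upgrade(current: str, latest: str) -> str:
    """Return 'major', 'minor', 'patch', or 'up-to-date'."""
    cur, new = parse_version(current), parse_version(latest)
    if new <= cur:
        return "up-to-date"
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"


def is_potentially_breaking(current: str, latest: str) -> bool:
    # Assumption: only major bumps are flagged. Under semver, 0.x minor
    # bumps may also break, which this sketch does not model.
    return classify_upgrade(current, latest) == "major"
```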

codebase context

Query relevant project context with smart ranking.

codebase context "architecture"              # Find architecture info
codebase context "dependencies" --max 3      # Top 3 relevant chunks
codebase context "how to test" --compact     # Content-only output
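
To make the ranking idea concrete, here is a minimal TF-IDF sketch that scores named text chunks against a query. It covers only the TF-IDF part of the ranking. The chunk names, the `rank_chunks` function, and the smoothing formula are assumptions for illustration, not the tool's actual ranker.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def rank_chunks(
    query: str, chunks: dict[str, str], max_results: int = 3
) -> list[tuple[str, float]]:
    """Rank chunks by the summed TF-IDF weight of the query terms."""
    term_counts = {name: Counter(tokenize(body)) for name, body in chunks.items()}
    n_docs = len(term_counts)

    # Document frequency: in how many chunks does each term occur?
    doc_freq: Counter = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    scores: dict[str, float] = {}
    for name, counts in term_counts.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                tf = counts[term] / total
                # Smoothed IDF so terms present in every chunk still count a little.
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += tf * idf
        scores[name] = score

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0][:max_results]
```

Chunks that share no term with the query are dropped rather than returned with a zero score, which mirrors the intent of a `--max` style cutoff.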

codebase hooks

Install git hooks for automatic regeneration.

codebase hooks install .           # Install post-commit hooks
codebase hooks status .            # Show installed hooks
codebase hooks remove .            # Remove hooks
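
For readers curious what installing a post-commit hook amounts to, the sketch below writes an executable script into .git/hooks. The hook body (re-running `codebase scan .` and `codebase generate .`) and the function name are assumptions; the hook that `codebase hooks install` actually writes may differ.

```python
import stat
from pathlib import Path

# Hypothetical hook body: re-scan and regenerate after each commit.
HOOK_SCRIPT = """#!/bin/sh
codebase scan . && codebase generate .
"""


def install_post_commit_hook(repo_root: str) -> Path:
    """Write a post-commit hook into <repo_root>/.git/hooks and mark it executable."""
    hooks_dir = Path(repo_root) / ".git" / "hooks"
    if not hooks_dir.is_dir():
        raise FileNotFoundError(f"{hooks_dir} not found; is this a git repository?")

    hook_path = hooks_dir / "post-commit"
    hook_path.write_text(HOOK_SCRIPT)  # note: overwrites any existing hook

    # Git only runs hooks that have the executable bit set.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```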

codebase init

Initialize .codebase/ configuration directory.

codebase init                      # Creates .codebase/config.yaml

Output Formats

Format    File                AI Tool      Description
claude    CLAUDE.md           Claude Code  Structured markdown with project summary, architecture, conventions
cursor    .cursorrules        Cursor       Coding rules, language-specific guidance, tech stack
agents    AGENTS.md           Multi-agent  Compact entry points, commands, architecture flow
codex     codex.md            Codex CLI    Overview, setup, project structure, conventions
windsurf  .windsurfrules      Windsurf     Rules-based format with architecture and file map
generic   PROJECT_CONTEXT.md  Any tool     Complete markdown with all sections + metadata

What Gets Detected

Languages & Frameworks

More than 50 file extensions are recognized. Framework detection covers Python (Django, FastAPI, Flask) and JavaScript/TypeScript (React, Next.js, Express, Vue).
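
Extension-based language detection can be sketched as a lookup table plus a counter. The table below is a small illustrative subset, not the tool's actual extension list, and `detect_languages` is a hypothetical name.

```python
from collections import Counter
from pathlib import PurePath

# Illustrative subset only; the real table is said to cover 50+ extensions.
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".js": "JavaScript",
    ".jsx": "JavaScript",
    ".ts": "TypeScript",
    ".tsx": "TypeScript",
    ".go": "Go",
    ".rs": "Rust",
}


def detect_languages(paths: list[str]) -> list[tuple[str, int]]:
    """Count files per language, most common first; unknown extensions are skipped."""
    counts: Counter = Counter()
    for path in paths:
        language = EXTENSION_LANGUAGES.get(PurePath(path).suffix.lower())
        if language is not None:
            counts[language] += 1
    return counts.most_common()
```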

Architecture Patterns

Monolith, monorepo, microservice, library, CLI tool — detected from folder structure, entry points, and package layout.

Conventions

  • Naming: snake_case, camelCase, PascalCase, kebab-case
  • Imports: absolute, relative, mixed
  • File organization: modular, layer-based, feature-based, flat
  • Design patterns: model, view, controller, service, repository, etc.

Dependencies

Parses package.json, requirements.txt, pyproject.toml, go.mod, Cargo.toml, Gemfile. Health scoring via live registry queries (PyPI, npm).


Project Structure

src/codebase_md/
├── cli.py                  # Typer CLI — all commands
├── model/                  # Pydantic v2 data models (frozen, validated)
├── scanner/                # Codebase analysis engine
│   ├── engine.py           # Orchestrates all scanners
│   ├── language_detector.py
│   ├── structure_analyzer.py
│   ├── dependency_parser.py
│   ├── convention_inferrer.py  # tree-sitter powered
│   ├── ast_analyzer.py        # tree-sitter AST
│   └── git_analyzer.py
├── generators/             # Output format generators (plugin-style)
├── depshift/               # Dependency intelligence engine
│   ├── analyzer.py         # Health scoring
│   ├── version_differ.py   # Breaking change detection
│   ├── usage_mapper.py     # Import → source location mapping
│   └── registries/         # PyPI + npm clients
├── context/                # Smart context routing
│   ├── chunker.py          # 12 topic-based chunks
│   ├── ranker.py           # 6-signal TF-IDF scoring
│   └── router.py           # Query pipeline
├── persistence/            # .codebase/ state management
└── integrations/           # Git hooks, GitHub Actions

Configuration

After codebase init, edit .codebase/config.yaml:

version: 1
generators:
  - claude
  - cursor
  - agents
  - codex
  - windsurf
  - generic
scan:
  exclude:
    - node_modules
    - .venv
    - dist
    - build
hooks:
  post_commit: true
  pre_push: false
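
One plausible reading of `scan.exclude` is that any directory with a listed name is skipped during the walk. The sketch below shows that behavior with `os.walk`. Matching by bare directory name (rather than by path or glob) is an assumption, and `iter_source_files` is a hypothetical helper.

```python
import os
from collections.abc import Iterator

# Mirrors the exclude list in the example config above.
DEFAULT_EXCLUDES = frozenset({"node_modules", ".venv", "dist", "build"})


def iter_source_files(
    root: str, exclude: frozenset[str] = DEFAULT_EXCLUDES
) -> Iterator[str]:
    """Yield file paths relative to `root`, skipping excluded directory names."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(name for name in dirnames if name not in exclude)
        for filename in sorted(filenames):
            yield os.path.relpath(os.path.join(dirpath, filename), root)
```

Pruning `dirnames` in place is what keeps large directories such as node_modules from being traversed at all, instead of being filtered out after the fact.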

Contributing

See CONTRIBUTING.md for development setup, coding conventions, and PR guidelines.

License

MIT
