DeepDoc

Auto-generate deep engineering documentation from real codebases using AI.

DeepDoc scans your repo, builds a bucket-based documentation plan, generates rich MDX pages with Mermaid diagrams, and builds a local-first Fumadocs site with Orama search.


Features

  • Bucket-Based Documentation Architecture — Docs are planned as system, feature, endpoint, endpoint reference, integration, and database buckets instead of noisy one-file-per-page output.
  • Five-Phase Pipeline — Scan, plan, generate, API reference, build. Planning and generation are separated so large repos and large files are handled more cleanly.
  • Multi-Step AI Planner — The planner classifies the repo, proposes buckets, then assigns files, symbols, artifacts, and dependencies into the final doc structure.
  • Giant-File Handling — Large files are decomposed into feature-aligned clusters so giant controllers or service files can feed multiple doc pages.
  • Endpoint-Family + Per-Endpoint Docs — High-level endpoint family pages are AI-planned, and individual endpoint_ref pages are derived from scan data and generated separately.
  • Integration Discovery — Third-party systems like payment gateways, delivery providers, warehouse systems, and webhook integrations can be grouped into integration docs.
  • Incremental Updates — deepdoc update uses persisted plan and ledger data to regenerate only stale or structurally affected docs.
  • Full Refresh and Clean Rebuild Modes — generate --force fully refreshes DeepDoc-managed docs and removes stale generated pages; generate --clean --yes wipes output and rebuilds from scratch.
  • Safe Existing-Docs Behavior — Plain generate refuses to run over an existing DeepDoc-managed docs set and will not silently mix into a non-DeepDoc docs/ folder.
  • Multi-Language Support — JavaScript/TypeScript, Python, Go, PHP/Laravel, and Vue with tree-sitter AST parsing and regex fallback.
  • Configurable LLM — Works with Anthropic, OpenAI, Azure OpenAI, Ollama, and other LiteLLM-compatible providers.
  • Mermaid Diagrams — Generated pages can include architecture, flow, and request-sequence diagrams.
  • OpenAPI-Aware API Docs — Auto-detects OpenAPI/Swagger specs and stages canonical interactive /api/* pages in the generated site.
  • Local-First Fumadocs Site — Generates a site/ Next.js app with Fumadocs UI, Mermaid rendering, and built-in Orama search.
  • Static Export — deepdoc deploy exports a static site to site/out/ for any static host.

Installation

From PyPI (recommended)

pip install deepdoc

If you want DeepDoc's chatbot features, install the chatbot extra:

pip install "deepdoc[chatbot]"

The base install does not include chatbot dependencies.
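You can check whether the extra is present; fastapi ships with the chatbot extra (see the verification step below):

python -c "import fastapi" 2>/dev/null && echo "chatbot extras installed" || echo "base install only"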

From source (recommended during development)

git clone <your-repo-url>
cd deepdoc
pip install -e .

If you want chatbot features during development:

pip install -e ".[chatbot]"

If the full install is slow due to tree-sitter compilation, install core deps first:

pip install click litellm gitpython rich pyyaml jinja2
pip install -e . --no-deps

Verify installation

deepdoc --version
deepdoc --help
python -m deepdoc --help

If you installed the chatbot extra, you can verify those dependencies with:

pip show faiss-cpu fastapi uvicorn

Quick Start

# 1. Go to your project
cd /path/to/your-project

# 2. Initialize DeepDoc
deepdoc init

# 3. Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

# 4. Generate docs
deepdoc generate

# 5. Preview locally
deepdoc serve
# → Open http://localhost:3000

Commands

Every command supports --help, including nested config commands:

deepdoc --help
deepdoc generate --help
deepdoc config --help
deepdoc config set --help

deepdoc init

Initializes DeepDoc in the current directory by creating a .deepdoc.yaml config file.

deepdoc init
deepdoc init --provider openai --model gpt-4o
deepdoc init --provider ollama --model ollama/llama3.2
deepdoc init --provider azure --model azure/gpt-4o
deepdoc init --output-dir documentation

Options:

| Flag | Default | Description |
| --- | --- | --- |
| --name | directory name | Project name |
| --description | empty | Short project description |
| --provider | anthropic | LLM provider: anthropic, openai, ollama, azure |
| --model | provider default | Model name |
| --output-dir | docs | Where generated docs are written |

deepdoc generate

Full documentation generation. This is the first-run or explicit full-refresh command.

deepdoc generate
deepdoc generate --force           # Full refresh of DeepDoc-managed docs
deepdoc generate --clean --yes     # Wipe output + state and rebuild from scratch
deepdoc generate --deploy          # Generate + export the static site
deepdoc generate --batch-size 3    # Smaller batches for rate-limited APIs
deepdoc generate --include "src/**" --include "lib/**"
deepdoc generate --exclude "tests/**"

Current behavior:

  • deepdoc generate
    • intended for the first run
    • refuses to run if DeepDoc docs/state already exist
    • refuses to write into a non-DeepDoc docs/ folder unless you explicitly clean it
  • deepdoc generate --force
    • re-runs the full pipeline
    • regenerates all DeepDoc-managed pages even if they are not stale
    • removes stale generated pages that no longer belong in the new plan
    • preserves non-DeepDoc files
  • deepdoc generate --clean --yes
    • deletes the output dir and DeepDoc state
    • rebuilds everything from scratch

What happens under the hood (5-phase pipeline):

  1. Phase 1: Scan — Walk the repo, parse supported languages, detect endpoints, config/setup artifacts, integration signals, and OpenAPI specs.
  2. Phase 2: Plan — Run the multi-step bucket planner. It classifies the repo, proposes bucket candidates, and assigns files/symbols/artifacts to the final doc structure.
  3. Phase 3: Generate — Generate bucket pages in batches with parallel workers. High-level buckets are AI-planned; per-endpoint reference pages are derived from scan data and generated individually.
  4. Phase 4: API Ref — Stage OpenAPI assets for the generated Fumadocs /api/* pages when a spec exists.
  5. Phase 5: Build — Write the generated site/ Fumadocs scaffold, page tree, search route, and static assets from the generated plan.

Options:

| Flag | Default | Description |
| --- | --- | --- |
| --force | off | Full refresh of DeepDoc-managed docs and cleanup of stale generated pages |
| --clean | off | Delete output dir and DeepDoc state, then regenerate from scratch |
| --yes | off | Skip destructive confirmation for --clean |
| --include | all files | Glob patterns to include (can be repeated) |
| --exclude | see config | Additional glob patterns to exclude |
| --deploy | off | Build and export the static site after generation |
| --batch-size | 10 | Pages per batch before pausing (helps with rate limits) |

deepdoc update

Incrementally update docs when source files change. This is the normal command after the first successful generate.

deepdoc update                    # Normal ongoing refresh
deepdoc update --since HEAD~3     # Changes in last 3 commits
deepdoc update --since main       # All changes since branching from main
deepdoc update --replan           # Force a full replan
deepdoc update --deploy           # Update + deploy

How it works:

  1. Loads the saved plan and generation ledger from .deepdoc/.
  2. Detects changed, new, and deleted files.
  3. Chooses a strategy automatically:
    • incremental update
    • targeted replan
    • full replan
  4. Regenerates only the affected bucket pages when safe.
  5. Rebuilds site config and nav afterward.

If git is unavailable, it falls back to hash-based staleness detection.

Options:

| Flag | Default | Description |
| --- | --- | --- |
| --since | HEAD~1 | Git ref to diff against |
| --replan | off | Force a full replan even if the change set looks incremental |
| --deploy | off | Deploy after updating |

deepdoc status

Show how much documentation has been generated and whether any buckets are stale.

deepdoc status

This is useful after generate or update when you want a quick health check without opening the site.
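For example, a typical health-check loop after pulling new changes might be:

git pull
deepdoc update
deepdoc status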

deepdoc serve

Preview the generated docs locally with live reload using the generated Fumadocs app in site/.

deepdoc serve
deepdoc serve --port 8001

Requires Node.js >= 18 to be installed. Site dependencies are auto-installed into site/node_modules/ on first run.

deepdoc deploy

Build and export the generated Fumadocs site.

deepdoc deploy

This runs next build inside site/ and writes the static export to site/out/. You can deploy that directory to Vercel, Netlify, GitHub Pages, Cloudflare Pages, or any static host.
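If you want to sanity-check the export before uploading it, any static file server works; for example, with Python's built-in one:

python -m http.server 8000 --directory site/out
# → Open http://localhost:8000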

deepdoc config

View or update config values without editing YAML manually.

deepdoc config show                                    # Print all config
deepdoc config set llm.provider openai                 # Switch provider
deepdoc config set llm.model gpt-4o                    # Switch model
deepdoc config set llm.temperature 0.3                 # Adjust creativity
deepdoc config set output_dir documentation            # Change output dir
deepdoc config set llm.api_key_env AZURE_API_KEY       # Change API key env var

LLM Provider Setup

DeepDoc uses LiteLLM under the hood, which means it supports 100+ LLM providers. Here are the most common setups:

Anthropic (Claude) — Default

deepdoc init --provider anthropic
export ANTHROPIC_API_KEY=sk-ant-api03-...
deepdoc generate

Models: claude-3-5-sonnet-20241022, claude-3-opus-20240229, claude-3-haiku-20240307

OpenAI (GPT)

deepdoc init --provider openai --model gpt-4o
export OPENAI_API_KEY=sk-...
deepdoc generate

Models: gpt-4.1, gpt-4.1-mini, gpt-4o, gpt-4o-mini, gpt-4-turbo

Azure OpenAI

Azure requires a few more environment variables because deployments have custom names and endpoints.

# 1. Initialize with Azure
deepdoc init --provider azure --model azure/<your-deployment-name>

# 2. Set required environment variables
export AZURE_API_KEY=your-azure-api-key
export AZURE_API_BASE=https://<your-resource-name>.openai.azure.com
export AZURE_API_VERSION=2024-02-01

# 3. Update config to point to your deployment
deepdoc config set llm.model azure/<your-deployment-name>
deepdoc config set llm.base_url https://<your-resource-name>.openai.azure.com

# 4. Generate
deepdoc generate

Where to find these values in Azure Portal:

  1. Go to Azure Portal → Azure OpenAI resource.
  2. Click Keys and Endpoint in the sidebar → copy Key 1 (that's your AZURE_API_KEY) and the Endpoint (that's your AZURE_API_BASE).
  3. Go to Model deployments → Manage Deployments → note your deployment name (e.g., gpt-4o-deployment). Use this as azure/gpt-4o-deployment in the model field.
  4. API version: Use 2024-02-01 or the latest GA version shown in Azure docs.

Example .deepdoc.yaml for Azure:

project_name: my-project
output_dir: docs
llm:
  provider: azure
  model: azure/gpt-4o-deploy      # "azure/" prefix + your deployment name
  api_key_env: AZURE_API_KEY
  base_url: https://mycompany.openai.azure.com
  max_tokens: 4096
  temperature: 0.2

Azure AD / Managed Identity (token-based auth):

If you use Azure AD instead of API keys, set these variables:

export AZURE_AD_TOKEN=your-ad-token
export AZURE_API_BASE=https://<your-resource-name>.openai.azure.com
export AZURE_API_VERSION=2024-02-01

LiteLLM picks up AZURE_AD_TOKEN automatically when AZURE_API_KEY is not set.
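One way to obtain such a token is the Azure CLI (this is just one option, assuming az is installed and you are logged in; AD tokens are short-lived, typically around an hour):

# Request a token for the Cognitive Services resource and export it
export AZURE_AD_TOKEN=$(az account get-access-token \
  --resource https://cognitiveservices.azure.com \
  --query accessToken -o tsv)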

Ollama (Local / Free)

No API key needed. Just make sure Ollama is running locally.

# 1. Install and start Ollama (https://ollama.com)
ollama pull llama3.2

# 2. Initialize
deepdoc init --provider ollama --model ollama/llama3.2

# 3. Generate (no API key needed)
deepdoc generate

Other Ollama models: ollama/codellama, ollama/mistral, ollama/mixtral
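To confirm the local Ollama server is actually up before generating (it listens on port 11434 by default), you can hit its tags endpoint:

curl http://localhost:11434/api/tags   # lists pulled models if Ollama is running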

Any LiteLLM Provider

DeepDoc passes the model string directly to LiteLLM, so you can use any provider LiteLLM supports by using the correct prefix:

# Groq
deepdoc config set llm.model groq/llama3-70b-8192
export GROQ_API_KEY=...

# Together AI
deepdoc config set llm.model together_ai/meta-llama/Llama-3-70b-chat-hf
export TOGETHER_API_KEY=...

# AWS Bedrock
deepdoc config set llm.model bedrock/anthropic.claude-3-sonnet-20240229-v1:0
# (uses AWS credentials from environment)

See LiteLLM providers for the full list.


Configuration

The .deepdoc.yaml file in your repo root controls everything:

project_name: my-app
description: "A web application for managing tasks"
output_dir: docs
site_dir: site

llm:
  provider: anthropic
  model: claude-3-5-sonnet-20241022
  api_key_env: ANTHROPIC_API_KEY
  base_url: null                    # Set for Ollama/custom endpoints
  max_tokens: null                  # null = no cap (recommended); set a number to limit output
  temperature: 0.2

languages:
  - python
  - javascript
  - typescript
  - go
  - php

include: []                         # Empty = include everything
exclude:
  - node_modules
  - .git
  - __pycache__
  - "*.pyc"
  - vendor
  - dist
  - build
  - .env
  - "*.lock"
  - "*.sum"

generation_mode: feature_buckets

# Generation tuning
max_pages: 0                        # 0 = no cap; set a number to limit total pages
giant_file_lines: 2000              # Files above this get LLM-based feature clustering
source_context_budget: 200000       # Raw-source char budget before DeepDoc switches overflow files to compressed evidence cards
integration_detection: auto         # "auto" | "off"

# Page type toggles
include_endpoint_pages: true        # Generate endpoint documentation
include_integration_pages: true     # Generate integration documentation

# Parallelism — tune for your LLM provider's rate limits
max_parallel_workers: 6             # Concurrent LLM calls (increase for Azure PTU)
batch_size: 10                      # Pages per batch before rate-limit pause

github_pages:
  enabled: false
  branch: gh-pages
  remote: origin

site:
  repo_url: ""                      # e.g., https://github.com/you/your-repo
  favicon: ""
  logo: ""

Configuration Reference

| Key | Default | Description |
| --- | --- | --- |
| project_name | directory name | Project name used in site title |
| description | "" | Short project description |
| output_dir | docs | Where generated MDX pages are written |
| site_dir | site | Where the generated Fumadocs site is written |
| **LLM** | | |
| llm.provider | anthropic | anthropic, openai, azure, ollama, or any LiteLLM alias |
| llm.model | claude-3-5-sonnet-20241022 | Model name (use provider prefix for non-Anthropic, e.g. azure/gpt-4.1) |
| llm.api_key_env | ANTHROPIC_API_KEY | Environment variable that holds the API key |
| llm.base_url | null | Custom endpoint URL (required for Ollama, optional for Azure) |
| llm.max_tokens | null | Max output tokens per LLM call. null = no cap (recommended). Set explicitly if your provider requires it (e.g. some Azure deployments). Typical values: 4096 for shorter pages, 8192-16384 for detailed docs |
| llm.temperature | 0.2 | LLM sampling temperature |
| **Generation** | | |
| generation_mode | feature_buckets | Documentation generation mode |
| max_pages | 0 | Max pages to generate. 0 = no cap |
| giant_file_lines | 2000 | Files above this line count get LLM-based feature clustering |
| source_context_budget | 200000 | Raw-source char budget per page before overflow files are represented as compressed evidence cards |
| integration_detection | auto | Detect third-party integrations: auto or off |
| include_endpoint_pages | true | Generate endpoint documentation pages |
| include_integration_pages | true | Generate integration documentation pages |
| **Parallelism** | | |
| max_parallel_workers | 6 | Concurrent LLM calls. Increase for Azure PTU or high-TPM deployments |
| batch_size | 10 | Pages per batch before rate-limit pause |
| **File filters** | | |
| languages | [python, javascript, typescript, go, php, vue] | Languages to parse |
| include | [] | Glob patterns to include (empty = everything) |
| exclude | (see config) | Glob patterns to exclude (node_modules, .git, dist, etc.) |
| **GitHub Pages** | | |
| github_pages.branch | gh-pages | Branch for GitHub Pages deploy |
| github_pages.remote | origin | Git remote for deploy |
| **Site** | | |
| site.repo_url | "" | Repo URL shown in the generated Fumadocs navigation |
| site.favicon | "" | Path to favicon |
| site.logo | "" | Path to logo |
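The generation and parallelism keys can always be edited in .deepdoc.yaml directly. config set is only shown above with llm.* keys and output_dir; assuming it accepts the other top-level keys the same way, rate-limit tuning might look like:

deepdoc config set max_parallel_workers 2   # assumes top-level keys are settable, like output_dir
deepdoc config set batch_size 3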

Supported Languages & Frameworks

Parsing (tree-sitter AST + regex fallback):

| Language | Extensions | Extracts |
| --- | --- | --- |
| Python | .py | Functions, classes, decorators, imports |
| JavaScript | .js, .jsx, .mjs, .cjs | Functions, classes, arrow functions, imports |
| TypeScript | .ts, .tsx | Same as JS + interfaces, type aliases |
| Go | .go | Functions, methods, structs, interfaces |
| PHP | .php | Functions, classes, methods, namespaces |
| Vue | .vue | SFC script symbols, props/emits/slots, router/store usage |

High-confidence framework support (fixture-backed):

| Framework | Language | Proven patterns |
| --- | --- | --- |
| FastAPI | Python | @app.get(), @router.post(), docstrings, response_model |
| Flask | Python | @app.route() with method expansion |
| Laravel | PHP | Route::get(), grouped prefixes, middleware, resource expansion |
| Django / DRF | Python | path(), re_path(), @api_view, as_view(), DRF routers, @action |
| Express | JS/TS | Mounted routers via app.use(), nested prefixes, chained route() calls |
| Fastify | JS/TS | Plugin register(..., { prefix }), shorthand methods, route({ ... }), schema hints |
| Vue | Vue SFC | Component detection, defineProps, defineEmits, defineModel, defineSlots, router/store signals |

Supported, but not yet at the same high confidence:

| Framework | Language | Current coverage |
| --- | --- | --- |
| NestJS | TS | @Controller + @Get/@Post decorators |
| Falcon | Python | app.add_route() + on_get/on_post responders |
| Gin / Echo / Fiber | Go | Common route helpers (GET, POST, HandleFunc) |
| Next.js / Nuxt | JS/TS | Repo-level framework detection and planning hints |

Architecture

The current system is bucket-based.

Planner bucket types:

| Type | Purpose |
| --- | --- |
| system | Architecture, setup, testing, deployment/ops, auth, shared middleware, observability |
| feature | Business workflows like checkout, refunds, order status, onboarding |
| endpoint | Endpoint-family or resource-level API docs |
| endpoint_ref | One generated page per concrete API endpoint |
| integration | Third-party systems like payment, warehouse, delivery, webhook providers |
| database | Cross-cutting database/schema/data-layer documentation |

Five implemented phases:

  1. Repository scan/indexing
    • Parse supported source files
    • Detect endpoints, config files, setup artifacts, OpenAPI specs
    • Record file sizes, symbols, imports, and raw scan summaries
  2. Multi-step planning
    • Classify repo artifacts
    • Propose system/feature/endpoint/integration/database buckets
    • Assign files, symbols, and artifacts into the final plan
  3. Generation engine
    • Build evidence packs for buckets
    • Generate pages in batches with parallel workers
    • Create nested endpoint reference pages under endpoint families
    • Validate output and degrade gracefully on failures
  4. Persistence
    • Persist plan, file map, scan cache, and generation ledger in .deepdoc/
    • Keep enough state for updates, staleness detection, and cleanup
  5. Smart update
    • Choose incremental update vs targeted replan vs full replan
    • Refresh only stale docs when safe
    • Rebuild affected docs after structural repo changes

Generated Files

After running deepdoc generate, you'll find:

your-repo/
├── .deepdoc.yaml              # Config
├── .deepdoc/                  # Canonical persisted state
│   ├── plan.json               # Bucket plan
│   ├── scan_cache.json         # Lightweight scan snapshot
│   ├── ledger.json             # Generated-page ledger
│   ├── file_map.json           # file → bucket/page mapping
│   └── state.json              # last synced commit + update status
├── .deepdoc_manifest.json     # Legacy source hash manifest
├── .deepdoc_plan.json         # Legacy compatibility plan file
├── .deepdoc_file_map.json     # Legacy compatibility file map
├── docs/                       # Generated MDX pages
│   ├── index.mdx
│   ├── architecture.mdx
│   ├── setup-and-configuration.mdx
│   ├── orders-api.mdx
│   ├── get-api-v1-orders.mdx
│   └── ...
└── site/                       # Generated Fumadocs app
    ├── app/
    ├── components/
    ├── lib/
    ├── openapi/                # Staged OpenAPI assets (when a spec exists)
    ├── public/
    └── out/                    # Static export after `deepdoc deploy`
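
The persisted state is plain JSON, so if you are curious what the planner decided you can pretty-print it (the exact schema is internal and may change between versions):

python -m json.tool .deepdoc/plan.json | head -40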

GitHub Actions CI/CD

Automate doc updates on every push to main:

# .github/workflows/docs.yml
name: Update Documentation

on:
  push:
    branches: [main]

jobs:
  update-docs:
    runs-on: ubuntu-latest
    permissions:
      contents: write       # Needed for gh-pages push

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0    # Full history needed for git diff

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - uses: actions/setup-node@v4
        with:
          node-version: "20"

      - name: Install dependencies
        run: |
          pip install deepdoc   # or: pip install "deepdoc[chatbot]" if you use chatbot features

      - name: Update and deploy docs
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          deepdoc update --deploy

Add your API key to repo Settings → Secrets → Actions → ANTHROPIC_API_KEY.
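If you prefer the command line, the GitHub CLI can set the same secret (assuming gh is installed and authenticated):

gh secret set ANTHROPIC_API_KEY --body "$ANTHROPIC_API_KEY"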


Releasing

DeepDoc now supports automated releases through GitHub Actions.

What happens automatically

When you push to main, the release workflow checks the version in pyproject.toml. If that version does not already have a matching Git tag like v0.1.1, GitHub Actions will:

  • build the package
  • publish it to PyPI
  • create the Git tag
  • create a GitHub Release and attach the built files

Your release flow

  1. Update version = "..." in pyproject.toml
  2. Commit your changes
  3. Push to main

That is it. You do not need to manually create tags or GitHub Releases anymore.
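Concretely, a release might look like this (0.1.3 is only an example version):

# bump version = "0.1.3" in pyproject.toml, then:
git add pyproject.toml
git commit -m "release: 0.1.3"
git push origin main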

One-time setup

  1. On PyPI, open the deepdoc project
  2. Go to Publishing
  3. Add a Trusted Publisher for GitHub Actions with:
    • owner: tss-pranavkumar
    • repository: deepdoc
    • workflow filename: release.yml
    • environment name: pypi
  4. In GitHub, open Settings → Actions → General
  5. Set Workflow permissions to Read and write permissions

After that, every new version pushed to main can publish without a PyPI token.


Typical Workflow

First time:

cd your-repo
deepdoc init --provider anthropic
export ANTHROPIC_API_KEY=sk-ant-...
deepdoc generate
deepdoc serve                      # Preview at localhost:3000
deepdoc deploy                     # Export a static site to site/out/

Every time you update code:

git add . && git commit -m "feat: new feature"
deepdoc update                     # Only regenerates affected pages
deepdoc deploy                     # Or use --deploy flag with update

Full refresh after planner / prompt / generator changes:

deepdoc generate --force

Wipe docs and rebuild from zero:

deepdoc generate --clean --yes

Switch LLM mid-project:

deepdoc config set llm.provider openai
deepdoc config set llm.model gpt-4o
export OPENAI_API_KEY=sk-...
deepdoc generate --force           # Full regen with new model

Requirements

  • Python 3.10+
  • Git (for deepdoc update and deepdoc deploy)
  • Node.js 18+ (for deepdoc serve and deepdoc deploy)
  • An LLM API key (or Ollama running locally)
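A quick preflight check for all three (the Node check matters only if you plan to run deepdoc serve or deepdoc deploy):

python --version   # 3.10 or newer
git --version
node --version     # v18 or newer, for serve/deploy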

License

MIT
