AI-powered tool to automatically generate documentation for developers and AI Agents
Project description
AI Documentation Writer (Preview Release)
⚠️ Preview Release: This is an early preview version (v0.1.1). While functional, expect breaking changes in future releases.
Automatically generate comprehensive documentation for any Git repository or local codebase using AI-powered analysis and structured document generation.
Features
- 🤖 AI-Powered Analysis: Multi-stage AI analysis using configurable models (Gemini 2.5 Pro/Flash by default)
- 📁 Universal Support: Works with Git repositories (HTTP/HTTPS/SSH) and local directories
- 🔄 Modular Pipeline: 3-stage resumable pipeline (Prepare Files → Generate Description → Document Codebase)
- 📊 Full Observability: Built-in Prefect orchestration with LMNR tracing support
- ⚙️ Flexible Configuration: FlowOptions system for model selection and batch processing control
- 📝 Comprehensive Output: Generates detailed file/directory summaries and project documentation
- 🚀 Production Ready: Async architecture, strong typing with Pydantic, and comprehensive error handling
- 🎯 Smart File Filtering: AI-powered file selection for relevant code documentation
Quick Start
Installation
pip install ai-documentation-writer
Prerequisites
- LiteLLM Proxy - Required for AI model abstraction:
# litellm-config.yml
model_list:
- model_name: gpt-5
litellm_params:
model: openai/gpt-4o
api_key: your-api-key
- model_name: gpt-5-mini
litellm_params:
model: openai/gpt-4o-mini
api_key: your-api-key
- model_name: gemini-2.5-pro
litellm_params:
model: gemini/gemini-2.0-pro-exp-0111
api_key: your-api-key
Run the proxy:
docker run -d --name litellm-proxy -p 4000:4000 \
-v $(pwd)/litellm-config.yml:/app/config.yaml \
ghcr.io/berriai/litellm:main --config /app/config.yaml
- Environment Configuration:
# .env file
OPENAI_BASE_URL=http://localhost:4000/v1
OPENAI_API_KEY=your-litellm-key
Usage
# Git repository
doc-writer https://github.com/user/repo ./output
# Local directory
doc-writer /path/to/project ./output
# With custom instructions
doc-writer /path/to/project ./output --instructions "Focus on API documentation"
# Resume from specific stage
doc-writer /path/to/project ./output --start 2
Pipeline Stages
Stage 1: Prepare Project Files
- Clones Git repository or copies local directory
- Intelligently selects relevant text files for documentation
- Filters out binary files, dependencies, and build artifacts
- Creates structured file tree with size information
Stage 2: Generate Initial Description
- Iteratively analyzes project files using AI
- Builds comprehensive understanding through multiple exploration rounds
- Extracts key architectural patterns and design decisions
- Produces detailed project overview with technical insights
Stage 3: Document Codebase
- Processes files in batches for efficient AI analysis
- Generates detailed summaries for each file and directory
- Creates hierarchical documentation structure
- Produces comprehensive codebase documentation with full context
Output Structure
output/
├── user_input_document/
│ └── user_input.json # Initial configuration
├── project_files_document/
│ └── project_files.json # Selected files with content
├── project_initial_description_document/
│ └── initial_description.md # AI-generated project overview
└── codebase_documentation_document/
└── codebase_documentation.json # Comprehensive file/directory summaries
Each document is a self-contained artifact that can be:
- Used as input for subsequent stages
- Exported for external processing
- Versioned and tracked in git
Development
Setup
git clone https://github.com/bbarwik/ai-documentation-writer.git
cd ai-documentation-writer
# Install with development dependencies
make install-dev # or: pip install -e ".[dev]" && pre-commit install
Testing
make test # Run all tests
make test-cov # With coverage
make lint # Run linting
make format # Auto-format code
make typecheck # Type checking
Architecture
The project implements a clean, layered architecture:
Core Components
-
Documents (
documents/flow/): Strongly-typed Pydantic models representing data at each stageUserInputDocument: User configuration and instructionsProjectFilesDocument: Selected project files with contentProjectInitialDescriptionDocument: AI-generated project overviewCodebaseDocumentationDocument: Complete file/directory documentation
-
Tasks (
tasks/): Atomic processing units that contain business logic- Each task has its own PromptManager for template management
- Tasks are pure async functions with full type hints
- Examples:
clone_repository_task,generate_initial_description_task
-
Flows (
flows/): Prefect-orchestrated pipelines- Compose tasks into larger workflows
- Handle document validation and persistence
- No direct PromptManager usage (separation of concerns)
-
FlowOptions: Configuration system for runtime parameters
- Model selection (core_model, small_model, supporting_models)
- Batch processing limits (batch_max_chars, batch_max_files)
- Feature flags (enable_file_filtering)
Design Principles
- Async-First: All I/O operations are async (no blocking calls)
- Type Safety: Complete type hints with Pydantic validation
- Minimal Code: Every line must justify its existence
- No Defensive Programming: Trust the types, fail fast
- Clear Boundaries: Strict separation between flows, tasks, and documents
Configuration
FlowOptions
Configure AI models and processing parameters:
from ai_documentation_writer.flow_options import FlowOptions
options = FlowOptions(
core_model="gemini-2.5-pro", # Primary model for complex analysis
small_model="gemini-2.5-flash", # Fast model for simple tasks
supporting_models=["gemini-2.5-flash"], # Additional models for planning
batch_max_chars=200_000, # Max characters per batch (default: 200K)
batch_max_files=50, # Max files per batch (default: 50)
enable_file_filtering=True # Enable AI-powered file selection
)
CLI Options
# Override default models
doc-writer /path/to/project ./output \
--core-model gpt-4o \
--small-model gpt-4o-mini \
--supporting-models claude-3-sonnet gemini-2.5-pro
# Adjust batch processing
doc-writer /path/to/project ./output \
--batch-max-chars 100000 \
--batch-max-files 25
# Resume from specific stage (1-3)
doc-writer /path/to/project ./output --start 2 --end 3
Implementation Details
Document System
All data flows through strongly-typed Document classes:
# Flow documents persist between stages
class ProjectFilesDocument(FlowDocument):
"""Contains selected project files with content."""
# Documents are versioned and self-describing
doc = ProjectFilesDocument.create_as_json(
name="project_files.json",
description="Selected text files from the repository",
data=ProjectFilesData(files={...})
)
Prompt Engineering
The project implements defensive prompt engineering patterns:
- Header Hierarchy: Instructions use
#, AI output uses##/### - File Separation: Files provided as separate messages to prevent injection
- Structured Output: Pydantic models for predictable AI responses
- Context Accumulation: Conversation history maintained across iterations
Batch Processing
Efficient handling of large codebases:
# Files are processed in configurable batches
batches = create_file_batches(
files=project_files,
max_chars=flow_options.batch_max_chars,
max_files=flow_options.batch_max_files
)
# Parallel processing with asyncio
results = await asyncio.gather(
*[process_batch(batch) for batch in batches]
)
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| Git not found | Install Git or use local directory path |
| Clone failed | Check network, verify URL, or download manually |
| Model errors | Ensure LiteLLM proxy is running with correct config |
| Out of memory | Reduce batch sizes with --batch-max-chars |
| Slow processing | Use faster model with --small-model gemini-2.5-flash |
Advanced Usage
Custom Instructions
Provide specific documentation requirements:
doc-writer /path/to/project ./output \
--instructions "Focus on API endpoints and data models. Include examples for each function."
Integration with CI/CD
# GitHub Actions example
- name: Generate Documentation
run: |
pip install ai-documentation-writer
doc-writer . ./docs --instructions "Update based on latest changes"
- name: Commit Documentation
run: |
git add docs/
git commit -m "docs: auto-generated documentation"
git push
Programmatic Usage
import asyncio
from pathlib import Path
from ai_documentation_writer.flows import prepare_project_files
from ai_documentation_writer.flow_options import FlowOptions
from ai_documentation_writer.documents.flow.user_input import UserInputDocument
from ai_pipeline_core.documents import DocumentList
async def generate_docs():
# Create input document
user_input = UserInputDocument.create_as_json(
name="user_input.json",
description="Configuration",
data={"target": "/path/to/project"}
)
# Run flow
result = await prepare_project_files(
project_name="my-project",
documents=DocumentList([user_input]),
flow_options=FlowOptions()
)
return result
asyncio.run(generate_docs())
License
MIT License - see LICENSE file for details.
Credits
Built on ai-pipeline-core for robust async AI pipeline orchestration.
Contributing
Contributions are welcome! Please ensure:
- All tests pass (
make test) - Code is formatted (
make format) - Type hints are complete (
make typecheck) - Coverage meets minimum requirements (
make test-cov)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ai_documentation_writer-0.1.1.tar.gz.
File metadata
- Download URL: ai_documentation_writer-0.1.1.tar.gz
- Upload date:
- Size: 23.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5126ee5488f1f3fe56ec9dc9885791172b67c5b8e834536617f1deb7dd38713e
|
|
| MD5 |
fc966d6c986fc52e87e579dfa1696cf4
|
|
| BLAKE2b-256 |
a68cd1d4adb9a350a1f7e81a1a2d418637a2f40e8c997b2e70c2485188dc1ade
|
Provenance
The following attestation bundles were made for ai_documentation_writer-0.1.1.tar.gz:
Publisher:
release.yml on bbarwik/ai-documentation-writer
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
ai_documentation_writer-0.1.1.tar.gz -
Subject digest:
5126ee5488f1f3fe56ec9dc9885791172b67c5b8e834536617f1deb7dd38713e - Sigstore transparency entry: 422167133
- Sigstore integration time:
-
Permalink:
bbarwik/ai-documentation-writer@e81b362122fd17e9082f4123ff4f3cfb63dfe5bb -
Branch / Tag:
refs/heads/main - Owner: https://github.com/bbarwik
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@e81b362122fd17e9082f4123ff4f3cfb63dfe5bb -
Trigger Event:
push
-
Statement type:
File details
Details for the file ai_documentation_writer-0.1.1-py3-none-any.whl.
File metadata
- Download URL: ai_documentation_writer-0.1.1-py3-none-any.whl
- Upload date:
- Size: 36.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9fb4afd8323fd165f2e61f7b9d7731283579ca3f48ad61d3df5ac3951a2d2903
|
|
| MD5 |
f10a64fde1aea2d0ab0a8aba92c0f33d
|
|
| BLAKE2b-256 |
113e0c43fd03fdbcd47d20a3f3006528392b1f214a288e6e1b79437da25930d1
|
Provenance
The following attestation bundles were made for ai_documentation_writer-0.1.1-py3-none-any.whl:
Publisher:
release.yml on bbarwik/ai-documentation-writer
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
ai_documentation_writer-0.1.1-py3-none-any.whl -
Subject digest:
9fb4afd8323fd165f2e61f7b9d7731283579ca3f48ad61d3df5ac3951a2d2903 - Sigstore transparency entry: 422167146
- Sigstore integration time:
-
Permalink:
bbarwik/ai-documentation-writer@e81b362122fd17e9082f4123ff4f3cfb63dfe5bb -
Branch / Tag:
refs/heads/main - Owner: https://github.com/bbarwik
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@e81b362122fd17e9082f4123ff4f3cfb63dfe5bb -
Trigger Event:
push
-
Statement type: