AI-powered tool to automatically generate documentation for developers and AI Agents

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

AI Documentation Writer (Preview Release)

⚠️ Preview Release: This is an early preview version (v0.1.0). While functional, expect breaking changes in future releases.

Automatically generate comprehensive documentation for any Git repository or local codebase using AI-powered analysis and structured document generation.

Features

🤖 AI-Powered Analysis: Multi-stage AI analysis using configurable models (Gemini 2.5 Pro/Flash by default)
📁 Universal Support: Works with Git repositories (HTTP/HTTPS/SSH) and local directories
🔄 Modular Pipeline: 3-stage resumable pipeline (Prepare Files → Generate Description → Document Codebase)
📊 Full Observability: Built-in Prefect orchestration with LMNR tracing support
⚙️ Flexible Configuration: FlowOptions system for model selection and batch processing control
📝 Comprehensive Output: Generates detailed file/directory summaries and project documentation
🚀 Production Ready: Async architecture, strong typing with Pydantic, and comprehensive error handling
🎯 Smart File Filtering: AI-powered file selection for relevant code documentation

Quick Start

Installation

pip install ai-documentation-writer

Prerequisites

LiteLLM Proxy - Required for AI model abstraction:

# litellm-config.yml
model_list:
  - model_name: gpt-5
    litellm_params:
      model: openai/gpt-4o
      api_key: your-api-key
  - model_name: gpt-5-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: your-api-key
  - model_name: gemini-2.5-pro
    litellm_params:
      model: gemini/gemini-2.0-pro-exp-0111
      api_key: your-api-key

Run the proxy:

docker run -d --name litellm-proxy -p 4000:4000 \
  -v $(pwd)/litellm-config.yml:/app/config.yaml \
  ghcr.io/berriai/litellm:main --config /app/config.yaml

Environment Configuration:

# .env file
OPENAI_BASE_URL=http://localhost:4000/v1
OPENAI_API_KEY=your-litellm-key

Usage

# Git repository
doc-writer https://github.com/user/repo ./output

# Local directory
doc-writer /path/to/project ./output

# With custom instructions
doc-writer /path/to/project ./output --instructions "Focus on API documentation"

# Resume from specific stage
doc-writer /path/to/project ./output --start 2

Pipeline Stages

Stage 1: Prepare Project Files

Clones Git repository or copies local directory
Intelligently selects relevant text files for documentation
Filters out binary files, dependencies, and build artifacts
Creates structured file tree with size information

Stage 2: Generate Initial Description

Iteratively analyzes project files using AI
Builds comprehensive understanding through multiple exploration rounds
Extracts key architectural patterns and design decisions
Produces detailed project overview with technical insights

Stage 3: Document Codebase

Processes files in batches for efficient AI analysis
Generates detailed summaries for each file and directory
Creates hierarchical documentation structure
Produces comprehensive codebase documentation with full context

Output Structure

output/
├── user_input_document/
│   └── user_input.json           # Initial configuration
├── project_files_document/
│   └── project_files.json         # Selected files with content
├── project_initial_description_document/
│   └── initial_description.md     # AI-generated project overview
└── codebase_documentation_document/
    └── codebase_documentation.json # Comprehensive file/directory summaries

Each document is a self-contained artifact that can be:

Used as input for subsequent stages
Exported for external processing
Versioned and tracked in git

Development

Setup

git clone https://github.com/bbarwik/ai-documentation-writer.git
cd ai-documentation-writer

# Install with development dependencies
make install-dev  # or: pip install -e ".[dev]" && pre-commit install

Testing

make test          # Run all tests
make test-cov      # With coverage
make lint          # Run linting
make format        # Auto-format code
make typecheck     # Type checking

Architecture

The project implements a clean, layered architecture:

Core Components

Documents (documents/flow/): Strongly-typed Pydantic models representing data at each stage
- UserInputDocument: User configuration and instructions
- ProjectFilesDocument: Selected project files with content
- ProjectInitialDescriptionDocument: AI-generated project overview
- CodebaseDocumentationDocument: Complete file/directory documentation
Tasks (tasks/): Atomic processing units that contain business logic
- Each task has its own PromptManager for template management
- Tasks are pure async functions with full type hints
- Examples: clone_repository_task, generate_initial_description_task
Flows (flows/): Prefect-orchestrated pipelines
- Compose tasks into larger workflows
- Handle document validation and persistence
- No direct PromptManager usage (separation of concerns)
FlowOptions: Configuration system for runtime parameters
- Model selection (core_model, small_model, supporting_models)
- Batch processing limits (batch_max_chars, batch_max_files)
- Feature flags (enable_file_filtering)

Design Principles

Async-First: All I/O operations are async (no blocking calls)
Type Safety: Complete type hints with Pydantic validation
Minimal Code: Every line must justify its existence
No Defensive Programming: Trust the types, fail fast
Clear Boundaries: Strict separation between flows, tasks, and documents

Configuration

FlowOptions

Configure AI models and processing parameters:

from ai_documentation_writer.flow_options import FlowOptions

options = FlowOptions(
    core_model="gemini-2.5-pro",         # Primary model for complex analysis
    small_model="gemini-2.5-flash",      # Fast model for simple tasks
    supporting_models=["gemini-2.5-flash"],  # Additional models for planning
    batch_max_chars=200_000,              # Max characters per batch (default: 200K)
    batch_max_files=50,                   # Max files per batch (default: 50)
    enable_file_filtering=True            # Enable AI-powered file selection
)

CLI Options

# Override default models
doc-writer /path/to/project ./output \
  --core-model gpt-4o \
  --small-model gpt-4o-mini \
  --supporting-models claude-3-sonnet gemini-2.5-pro

# Adjust batch processing
doc-writer /path/to/project ./output \
  --batch-max-chars 100000 \
  --batch-max-files 25

# Resume from specific stage (1-3)
doc-writer /path/to/project ./output --start 2 --end 3

Implementation Details

Document System

All data flows through strongly-typed Document classes:

# Flow documents persist between stages
class ProjectFilesDocument(FlowDocument):
    """Contains selected project files with content."""

# Documents are versioned and self-describing
doc = ProjectFilesDocument.create_as_json(
    name="project_files.json",
    description="Selected text files from the repository",
    data=ProjectFilesData(files={...})
)

Prompt Engineering

The project implements defensive prompt engineering patterns:

Header Hierarchy: Instructions use #, AI output uses ##/###
File Separation: Files provided as separate messages to prevent injection
Structured Output: Pydantic models for predictable AI responses
Context Accumulation: Conversation history maintained across iterations

Batch Processing

Efficient handling of large codebases:

# Files are processed in configurable batches
batches = create_file_batches(
    files=project_files,
    max_chars=flow_options.batch_max_chars,
    max_files=flow_options.batch_max_files
)

# Parallel processing with asyncio
results = await asyncio.gather(
    *[process_batch(batch) for batch in batches]
)

Troubleshooting

Common Issues

Issue	Solution
Git not found	Install Git or use local directory path
Clone failed	Check network, verify URL, or download manually
Model errors	Ensure LiteLLM proxy is running with correct config
Out of memory	Reduce batch sizes with `--batch-max-chars`
Slow processing	Use faster model with `--small-model gemini-2.5-flash`

Advanced Usage

Custom Instructions

Provide specific documentation requirements:

doc-writer /path/to/project ./output \
  --instructions "Focus on API endpoints and data models. Include examples for each function."

Integration with CI/CD

# GitHub Actions example
- name: Generate Documentation
  run: |
    pip install ai-documentation-writer
    doc-writer . ./docs --instructions "Update based on latest changes"

- name: Commit Documentation
  run: |
    git add docs/
    git commit -m "docs: auto-generated documentation"
    git push

Programmatic Usage

import asyncio
from pathlib import Path
from ai_documentation_writer.flows import prepare_project_files
from ai_documentation_writer.flow_options import FlowOptions
from ai_documentation_writer.documents.flow.user_input import UserInputDocument
from ai_pipeline_core.documents import DocumentList

async def generate_docs():
    # Create input document
    user_input = UserInputDocument.create_as_json(
        name="user_input.json",
        description="Configuration",
        data={"target": "/path/to/project"}
    )

    # Run flow
    result = await prepare_project_files(
        project_name="my-project",
        documents=DocumentList([user_input]),
        flow_options=FlowOptions()
    )

    return result

asyncio.run(generate_docs())

License

MIT License - see LICENSE file for details.

Credits

Built on ai-pipeline-core for robust async AI pipeline orchestration.

Contributing

Contributions are welcome! Please ensure:

All tests pass (make test)
Code is formatted (make format)
Type hints are complete (make typecheck)
Coverage meets minimum requirements (make test-cov)

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

bbarwik

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.1.1

Aug 22, 2025

This version

0.1.0

Aug 21, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ai_documentation_writer-0.1.0.tar.gz (24.6 kB view details)

Uploaded Aug 21, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ai_documentation_writer-0.1.0-py3-none-any.whl (38.3 kB view details)

Uploaded Aug 21, 2025 Python 3

File details

Details for the file ai_documentation_writer-0.1.0.tar.gz.

File metadata

Download URL: ai_documentation_writer-0.1.0.tar.gz
Upload date: Aug 21, 2025
Size: 24.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ai_documentation_writer-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`221b11ca39bf74f0e8c3fb1300447d34b30e62e1e232c9a07a7291060b40d737`
MD5	`65a19aa73567359b3922ec7a6082f650`
BLAKE2b-256	`3c26517b8ed078a8b783841d11ac28643c712870e03fdb76e19958058f492636`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ai_documentation_writer-0.1.0.tar.gz:

Publisher: release.yml on bbarwik/ai-documentation-writer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_documentation_writer-0.1.0.tar.gz
- Subject digest: 221b11ca39bf74f0e8c3fb1300447d34b30e62e1e232c9a07a7291060b40d737
- Sigstore transparency entry: 416476959
- Sigstore integration time: Aug 21, 2025
Source repository:
- Permalink: bbarwik/ai-documentation-writer@382fa9eb00459c033652a3f3cf05c4dda0eec845
- Branch / Tag: refs/heads/main
- Owner: https://github.com/bbarwik
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@382fa9eb00459c033652a3f3cf05c4dda0eec845
- Trigger Event: push

File details

Details for the file ai_documentation_writer-0.1.0-py3-none-any.whl.

File metadata

Download URL: ai_documentation_writer-0.1.0-py3-none-any.whl
Upload date: Aug 21, 2025
Size: 38.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ai_documentation_writer-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1562f07e043d2dca9b01304fb6b798fc49a5a410e8d5efaadd211264dfadd0ab`
MD5	`5e501c380170ae6c2877a45374d3f228`
BLAKE2b-256	`04eac3bd77b05efb86d972a0c759336728f95c8501d36e5be7276ffb289d3085`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ai_documentation_writer-0.1.0-py3-none-any.whl:

Publisher: release.yml on bbarwik/ai-documentation-writer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_documentation_writer-0.1.0-py3-none-any.whl
- Subject digest: 1562f07e043d2dca9b01304fb6b798fc49a5a410e8d5efaadd211264dfadd0ab
- Sigstore transparency entry: 416476975
- Sigstore integration time: Aug 21, 2025
Source repository:
- Permalink: bbarwik/ai-documentation-writer@382fa9eb00459c033652a3f3cf05c4dda0eec845
- Branch / Tag: refs/heads/main
- Owner: https://github.com/bbarwik
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@382fa9eb00459c033652a3f3cf05c4dda0eec845
- Trigger Event: push

ai-documentation-writer 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

AI Documentation Writer (Preview Release)

Features

Quick Start

Installation

Prerequisites

Usage

Pipeline Stages

Stage 1: Prepare Project Files

Stage 2: Generate Initial Description

Stage 3: Document Codebase

Output Structure

Development

Setup

Testing

Architecture

Core Components

Design Principles

Configuration

FlowOptions

CLI Options

Implementation Details

Document System

Prompt Engineering

Batch Processing

Troubleshooting

Common Issues

Advanced Usage

Custom Instructions

Integration with CI/CD

Programmatic Usage

License

Credits

Contributing

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance