DocOctopy
A language-agnostic docstyle compliance & remediation tool that scans code for docstring/docblock presence and style, reports findings, and can auto-propose LLM-based fixes.
Features
Comprehensive Scanning
- Python-first with extensible architecture for other languages
- Google-style docstring validation with detailed compliance checking
- Context-specific rules for test functions, public APIs, and exception handling
- AST-based analysis for accurate symbol and signature detection
- Smart caching with incremental scanning for large codebases
- Sophisticated complexity detection using cyclomatic complexity metrics
- Multi-style docstring support (Google, Sphinx, NumPy, Facebook)
- Content quality validation (TODO/placeholders, conflict markers, delimiter style)
Multiple Output Formats
- Pretty console output with Rich formatting
- JSON reports for CI/CD integration
- SARIF format for GitHub Code Scanning
- Configurable exit codes based on severity levels
LLM-Powered Remediation
- Automatic docstring generation for missing documentation
- Smart fixing of non-compliant docstrings
- Enhancement of existing docstrings with missing elements
- DSPy integration for reliable, structured LLM interactions
- Interactive review mode with diff preview and approval workflow
- Multiple LLM providers (OpenAI, Anthropic, Ollama)
Flexible Configuration
- pyproject.toml integration with rule enable/disable switches
- Per-path overrides for different project sections
- Gitignore-style exclusions with pathspec support
- Rule severity customization (error, warning, info, off)
- LLM provider configuration with interactive setup commands
- Configuration hierarchy (environment → user → project → CLI)
- Secure API key management with user-specific config files
Installation
Basic Installation
pip install dococtopy
With LLM Support
pip install dococtopy[llm]
Note for uv users: uv doesn't support the [extra] syntax yet. Use uv pip instead:
# This works with uv
uv pip install dococtopy[llm]
# This also works with uv (if you have a dev group in pyproject.toml)
uv pip install --group dev dococtopy[llm]
# This doesn't work with uv (known limitation)
uv add dococtopy[llm] --dev # Won't install DSPy correctly
Development Installation
git clone https://github.com/CrazyBonze/DocOctopy.git
cd DocOctopy
uv sync --group dev
Development Scripts
The scripts/ directory contains utility scripts for development and testing:
- comprehensive_compare_models.py - Compare LLM models for docstring generation quality and cost
- pre-commit.sh - Pre-commit hook for formatting and linting
- publish.sh - Publishing script for PyPI releases
- See scripts/README.md for detailed usage instructions
Quick Start
1. Scan Your Code
# Scan current directory
dococtopy scan .
# Scan specific paths
dococtopy scan src/ tests/
# Get JSON output
dococtopy scan . --format json --output-file report.json
# Use SARIF for GitHub Code Scanning
dococtopy scan . --format sarif --output-file report.sarif
2. Fix Issues with LLM Assistance
# Dry-run mode (safe, shows what would be fixed)
dococtopy fix . --dry-run
# Interactive mode (review each change)
dococtopy fix . --interactive
# Fix specific rules only
dococtopy fix . --rule DG101,DG202 --dry-run
# Use different LLM provider
dococtopy fix . --llm-provider anthropic --llm-model claude-haiku-3.5
# Use local Ollama server
dococtopy fix . --llm-provider ollama --llm-model codeqwen:latest --llm-base-url http://localhost:11434
3. Configure Your Project
Create a pyproject.toml file:
[tool.docguard]
exclude = ["**/.venv/**", "**/build/**", "**/node_modules/**"]
[tool.docguard.rules]
DG101 = "error" # Missing docstrings
DG201 = "error" # Google style parse errors
DG202 = "error" # Missing parameters
DG203 = "error" # Extra parameters
DG204 = "warning" # Returns section issues
DG205 = "info" # Raises validation
DG301 = "warning" # Summary style
DG302 = "warning" # Blank line after summary
DG211 = "info" # Yields section validation
DG212 = "info" # Attributes section validation
DG213 = "info" # Examples section validation
DG214 = "info" # Note section validation
DG215 = "info" # Private method docstring recommendation
DG216 = "info" # Dunder method docstring recommendation
DG303 = "warning" # Content quality validation
DG304 = "info" # Docstring delimiter style
DG401 = "warning" # Test function docstring style
DG402 = "warning" # Public API function documentation
DG403 = "warning" # Exception documentation completeness
# Per-path overrides
[[tool.docguard.overrides]]
patterns = ["tests/**"]
rules.DG101 = "off" # Disable missing docstrings in tests
# LLM configuration (optional)
[tool.docguard.llm]
default_provider = "openai"
default_model = "gpt-4o-mini"
[tool.docguard.llm.openai]
base_url = "https://api.openai.com/v1"
[tool.docguard.llm.ollama]
base_url = "http://localhost:11434"
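As a rough illustration of how a per-path override such as the `tests/**` entry above could be layered on top of the base rule severities, here is a minimal sketch. It is not DocOctopy's implementation: `effective_severity` is a hypothetical helper, `fnmatchcase` stands in for gitignore-style pathspec matching, and treating an unlisted rule as "off" is an assumption.

```python
from fnmatch import fnmatchcase

# Mirrors a small part of the pyproject.toml example above.
BASE_RULES = {"DG101": "error", "DG301": "warning"}
OVERRIDES = [{"patterns": ["tests/**"], "rules": {"DG101": "off"}}]


def effective_severity(rule_id, path, base=BASE_RULES, overrides=OVERRIDES):
    """Return the severity for rule_id at path after applying overrides.

    Hypothetical helper: later overrides win, and a rule that is not
    configured anywhere is assumed to be "off".
    """
    severity = base.get(rule_id, "off")
    for override in overrides:
        # fnmatchcase's "*" also matches "/", so "tests/**" covers nested paths.
        if any(fnmatchcase(path, pattern) for pattern in override["patterns"]):
            severity = override["rules"].get(rule_id, severity)
    return severity
```

With this sketch, DG101 resolves to "off" for anything under tests/ and stays "error" elsewhere, while rules the override does not mention keep their base severity.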
LLM Setup
DocOctopy supports multiple LLM providers with a flexible configuration system. You can configure providers using the built-in configuration commands or environment variables.
Quick Setup (Recommended)
Use the interactive configuration command to set up your preferred LLM provider:
dococtopy config setup
This will guide you through setting up OpenAI, Anthropic, or Ollama with your preferred models and settings.
Manual Configuration
Option A: Local Ollama (Recommended for Development)
- Install Ollama: Download from ollama.ai
- Pull a model:
  ollama pull codeqwen:latest
  # or
  ollama pull llama3.1:8b
- Configure DocOctopy:
  # Set up Ollama configuration
  dococtopy config setup
  # Select "ollama" as provider and configure base URL
- Test DocOctopy:
  dococtopy fix . --llm-provider ollama --llm-model codeqwen:latest --dry-run
Option B: OpenAI
- Get API key: OpenAI API Keys
- Configure DocOctopy:
  # Interactive setup
  dococtopy config setup
  # Select "openai" and enter your API key
  # Or set environment variable
  export OPENAI_API_KEY="your-api-key"
- Test DocOctopy:
  dococtopy fix . --llm-provider openai --llm-model gpt-4o-mini --dry-run
Option C: Anthropic
- Get API key: Anthropic Console
- Configure DocOctopy:
  # Interactive setup
  dococtopy config setup
  # Select "anthropic" and enter your API key
  # Or set environment variable
  export ANTHROPIC_API_KEY="your-api-key"
- Test DocOctopy:
  # Use best Anthropic model (claude-haiku-3.5) - Highest quality score
  dococtopy fix . --llm-provider anthropic --llm-model claude-haiku-3.5 --dry-run
  # Or use budget Anthropic option (claude-haiku-3) - Best value
  dococtopy fix . --llm-provider anthropic --llm-model claude-haiku-3 --dry-run
Configuration Management
DocOctopy provides several commands to manage your LLM configuration:
# Interactive setup (recommended)
dococtopy config setup
# Show current configuration
dococtopy config show
# Initialize project configuration
dococtopy config init
Configuration Hierarchy
DocOctopy uses a configuration hierarchy (highest to lowest priority):
- Environment variables (e.g., OPENAI_API_KEY)
- User configuration (~/.config/dococtopy/config.toml)
- Project configuration (pyproject.toml)
- Command-line arguments
This allows you to:
- Set global defaults in your user config
- Override per-project in pyproject.toml
- Use environment variables for CI/CD
- Override with command-line arguments for testing
Rules Reference
Basic Compliance Rules
- DG101: Missing docstring (functions and classes)
- DG301: Summary first line should end with period
- DG302: Blank line required after summary
Google Style Validation Rules
- DG201: Google style docstring parse error
- DG202: Parameter missing from docstring
- DG203: Extra parameter in docstring
- DG204: Returns section missing or mismatched
- DG205: Raises section validation
- DG206: Args section format validation
- DG207: Returns section format validation
- DG208: Raises section format validation
- DG209: Summary length validation
- DG210: Docstring indentation consistency
Advanced Google Style Rules
- DG211: Generator functions should have Yields section
- DG212: Classes with public attributes should have Attributes section
- DG213: Complex functions should have Examples section
- DG214: Functions with special behavior should have Note section
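DG213 depends on deciding which functions count as "complex", and the feature list mentions cyclomatic complexity as the metric. The sketch below shows one common way to approximate that metric with Python's `ast` module. The set of counted node types and the function name are assumptions for illustration; the threshold DocOctopy uses is not stated here, so none is shown.

```python
import ast

# Node types that each add one decision point (an illustrative choice).
BRANCH_NODES = (
    ast.If,
    ast.For,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
    ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for i in range(n):
        if i % 2:
            return "odd"
    return "other"
'''
```

For `SAMPLE`, the two `if` statements, the `for` loop, and the single `and` each add one to the base value of 1.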
Content Quality Rules
- DG303: Detect TODO/placeholder content and version control conflict markers
- DG304: Detect single quote delimiters vs double quotes (style consistency)
Recommendation Rules
- DG215: Private methods should have docstrings (recommendation)
- DG216: Standard dunder methods should have docstrings (recommendation)
Context-Specific Rules
- DG401: Test functions should have descriptive docstrings explaining what they test
- DG402: Public API functions should have comprehensive documentation (Args, Returns, Raises sections)
- DG403: Functions should document all exceptions they raise in the Raises section
Context-Specific Rules Details
DG401: Test Function Docstring Style
Ensures test functions have descriptive docstrings that explain what they're testing, improving test readability and debugging.
Examples:
# ✅ Good - Descriptive test docstring
def test_user_authentication_works_correctly():
    """Test that user authentication validates credentials properly."""
    # Test implementation...

# ❌ Bad - Generic or non-descriptive
def test_user_login():
    """Test."""  # Too short and generic
    # Test implementation...

def test_something():
    """Test function."""  # Generic pattern
    # Test implementation...
What it checks:
- Functions starting with test_ or containing "test" in the name
- Docstrings must be descriptive (>20 characters)
- Avoids generic patterns like "Test", "Test function", etc.
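The checks listed above can be sketched with the standard `ast` module. This is an illustrative approximation, not the rule's actual code: the function name, the exact set of generic phrases, and the decision to skip functions with no docstring at all (on the assumption that DG101 covers them) are all hypothetical.

```python
import ast

# Illustrative list of generic docstrings; the real rule's patterns may differ.
GENERIC_DOCSTRINGS = {"test", "test.", "test function", "test function."}
MIN_LENGTH = 20  # from "descriptive (>20 characters)" above


def find_weak_test_docstrings(source):
    """Return names of test functions whose docstring is too short or generic."""
    weak = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if "test" not in node.name.lower():
            continue
        doc = ast.get_docstring(node)
        if doc is None:
            continue  # assumed to be reported by DG101 instead
        doc = doc.strip()
        if len(doc) <= MIN_LENGTH or doc.lower() in GENERIC_DOCSTRINGS:
            weak.append(node.name)
    return weak


SAMPLE = '''
def test_user_authentication_works_correctly():
    """Test that user authentication validates credentials properly."""

def test_user_login():
    """Test."""
'''
```

Running `find_weak_test_docstrings(SAMPLE)` flags only `test_user_login`, matching the good and bad examples above.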
DG402: Public API Function Documentation
Requires public API functions to have comprehensive documentation with Args, Returns, and Raises sections.
Examples:
# ✅ Good - Complete public API documentation
def process_data(data, options=None):
    """Process the input data according to the given options.

    Args:
        data: The input data to process
        options: Optional configuration options

    Returns:
        Processed data result

    Raises:
        ValueError: If data format is invalid
    """
    # Implementation...

# ❌ Bad - Missing required sections
def process_data(data, options=None):
    """Process the input data."""  # Missing Args, Returns, Raises
    # Implementation...
What it checks:
- Public functions (non-private, non-test, non-dunder)
- Requires Args, Returns, and Raises sections
- Skips internal/helper functions automatically
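A minimal sketch of the section check, assuming section headers appear on their own line as in the examples above. The helper names `is_public` and `missing_sections` are hypothetical, and how DocOctopy actually identifies internal or helper functions is not described here, so the visibility test below is only a guess based on naming.

```python
import re

REQUIRED_SECTIONS = ("Args", "Returns", "Raises")


def is_public(name):
    """Treat names without a leading underscore or test prefix as public (assumption)."""
    return not name.startswith("_") and not name.startswith("test")


def missing_sections(docstring):
    """Return the required Google-style section headers absent from docstring."""
    return [
        section
        for section in REQUIRED_SECTIONS
        if not re.search(rf"^[ \t]*{section}:[ \t]*$", docstring, re.MULTILINE)
    ]


GOOD = """Process the input data according to the given options.

Args:
    data: The input data to process

Returns:
    Processed data result

Raises:
    ValueError: If data format is invalid
"""
```

A one-line docstring such as "Process the input data." would be reported as missing all three sections, while `GOOD` passes.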
DG403: Exception Documentation Completeness
Ensures functions document all exceptions they raise in the Raises section.
Examples:
# ✅ Good - All exceptions documented
def risky_function():
    """Do something risky.

    Raises:
        ValueError: If input is invalid
        RuntimeError: If operation fails
    """
    raise ValueError("test")
    raise RuntimeError("test")

# ❌ Bad - Undocumented exceptions
def risky_function():
    """Do something risky."""
    raise ValueError("test")  # Not documented
    raise RuntimeError("test")  # Not documented
What it checks:
- Uses AST analysis to detect raise statements
- Parses docstring Raises sections
- Flags undocumented exceptions with specific names
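The three steps above can be approximated as follows. This is a sketch under stated assumptions rather than the rule's real code: the function names are hypothetical, only `raise Name` and `raise Name(...)` forms are recognized, and the Raises section is parsed with a simple regular expression instead of a docstring parser.

```python
import ast
import re


def raised_exceptions(func):
    """Collect exception names from raise statements inside a function node."""
    names = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Raise) and node.exc is not None:
            target = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if isinstance(target, ast.Name):
                names.add(target.id)
    return names


def documented_exceptions(docstring):
    """Extract exception names listed under a Google-style Raises section."""
    match = re.search(
        r"^[ \t]*Raises:[ \t]*\n(.*?)(?:\n[ \t]*\n|\Z)",
        docstring,
        re.MULTILINE | re.DOTALL,
    )
    if not match:
        return set()
    return set(re.findall(r"^[ \t]*(\w+):", match.group(1), re.MULTILINE))


def undocumented_exceptions(source):
    """Map each function name to the sorted exceptions it raises but does not document."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            documented = documented_exceptions(ast.get_docstring(node) or "")
            missing = raised_exceptions(node) - documented
            if missing:
                result[node.name] = sorted(missing)
    return result


SAMPLE = '''
def risky(value):
    """Do something risky.

    Raises:
        ValueError: If input is invalid
    """
    if value < 0:
        raise ValueError("negative")
    raise RuntimeError("failed")
'''
```

For `SAMPLE`, only `RuntimeError` is reported, because `ValueError` already appears in the Raises section.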
Interactive Fix Mode
DocOctopy includes an interactive mode that lets you review and approve each proposed change:
dococtopy fix . --interactive
Interactive Features
- Diff preview: See exactly what will be changed
- Change-by-change review: Accept or reject each fix individually
- Rich formatting: Beautiful console output with colors
- Summary statistics: Track approved vs rejected changes
Example Interactive Session
Found 3 changes for src/main.py
Change: process_data (function)
Issues: DG101
Proposed docstring:
"""Process the input data and return results.
Args:
data: The input data to process.
options: Processing options.
Returns:
Processed data results.
"""
Show diff? [Y/n]: y
--- Original
+++ Proposed
@@ -15,6 +15,15 @@
def process_data(data, options):
+ """Process the input data and return results.
+
+ Args:
+ data: The input data to process.
+ options: Processing options.
+
+ Returns:
+ Processed data results.
+ """
result = []
for item in data:
result.append(transform(item, options))
Apply this change? [Y/n]: y
✓ Applied change for process_data
Summary:
- Total changes: 3
- Applied: 1
- Rejected: 1
- Skipped: 1
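A diff preview like the one in the session above can be produced with Python's standard `difflib` module. The sketch below only illustrates the idea; `docstring_diff` is a hypothetical helper, and whether DocOctopy itself uses `difflib` is an assumption.

```python
import difflib


def docstring_diff(original, proposed):
    """Return a unified diff between the original and proposed source text."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            proposed.splitlines(keepends=True),
            fromfile="Original",
            tofile="Proposed",
        )
    )


ORIGINAL = "def process_data(data, options):\n    result = []\n"
PROPOSED = (
    "def process_data(data, options):\n"
    '    """Process the input data and return results."""\n'
    "    result = []\n"
)

if __name__ == "__main__":
    print(docstring_diff(ORIGINAL, PROPOSED))
```

An interactive reviewer could print this text and then prompt for approval before writing the proposed version to disk.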
Recommended Models
DocOctopy supports multiple LLM providers with different models optimized for docstring generation. Based on comprehensive testing with real-world code:
OpenAI Models (Recommended)
| Model | Cost (per 1M tokens) | Quality Score | Quality per Dollar | Best For |
|---|---|---|---|---|
| gpt-5-nano | $0.45 | 39/50 | 39,796 | Default choice - Best value |
| gpt-5-mini | $2.25 | 41/50 | 8,367 | Premium choice - Enterprise quality |
| gpt-4.1-mini | $2.00 | 41/50 | 9,491 | Alternative option |
| gpt-4.1-nano | $0.50 | 46/50 | 42,593 | Budget option |
Key Findings:
- gpt-5-nano: 5x cheaper than GPT-5-mini with 95% of the quality - exceptional value
- gpt-5-mini: Comprehensive documentation with detailed business logic and examples
- gpt-4.1-mini: Solid alternative with good quality but higher cost than GPT-5-nano
- gpt-4.1-nano: Budget option with good quality-per-dollar ratio
Anthropic Models
| Model | Cost (per 1M tokens) | Quality Score | Quality per Dollar | Best For |
|---|---|---|---|---|
| claude-haiku-3.5 | $0.25 | 67/50 | 6,442 | Best Anthropic - Highest quality |
| claude-sonnet-4 | $3.00 | 41/50 | 1,051 | High performance option |
| claude-haiku-3 | $0.25 | 41/50 | 12,615 | Budget Anthropic option |
| claude-opus-4.1 | $15.00 | 41/50 | 210 | Premium option (expensive) |
Anthropic Highlights:
- claude-haiku-3.5: Highest quality score (67/50) with excellent cost efficiency
- claude-haiku-3: Best quality-per-dollar ratio among Anthropic models
- All Anthropic models provide reliable, consistent docstring generation
Model Selection Guide
| Use Case | Recommended Model | Reason |
|---|---|---|
| Development | gpt-5-nano | Best value, reliable quality |
| Testing/CI | gpt-5-nano | Cost-effective for automated runs |
| Production | gpt-5-mini | Maximum quality for end users |
| Enterprise | gpt-5-mini | Comprehensive documentation |
| Budget-Conscious | gpt-5-nano | Excellent quality at low cost |
| Privacy-First | Ollama codeqwen | Local processing |
| Anthropic Preference | claude-haiku-3.5 | Highest quality score |
| Free Tier | claude-haiku-3 | Good quality-per-dollar ratio |
Quick Start Commands
# Use default (gpt-5-nano) - Best value
dococtopy fix . --rule DG101
# Use premium (gpt-5-mini) - Maximum quality
dococtopy fix . --rule DG101 --llm-model gpt-5-mini
# Use best Anthropic (claude-haiku-3.5) - Highest quality score
dococtopy fix . --rule DG101 --llm-provider anthropic --llm-model claude-haiku-3.5
# Use local (Ollama) - Privacy-first
dococtopy fix . --rule DG101 --llm-provider ollama --llm-model codeqwen:latest
See detailed comparison results: docs/model-comparison/ - Compare actual generated docstrings side-by-side
Quick summary: docs/model-comparison/SUMMARY.md - Decision matrix and recommendations
CLI Reference
dococtopy scan
Scan paths for documentation compliance issues.
dococtopy scan [PATHS...] [OPTIONS]
Options:
--format {pretty,json,sarif,both} Output format [default: pretty]
--config PATH Config file path [default: pyproject.toml]
--fail-level {error,warning,info} Exit code threshold [default: error]
--no-cache Disable caching
--changed-only Only scan changed files
--stats Show cache statistics
--output-file PATH Write output to file
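The `--fail-level` option acts as a severity threshold for the exit code. A minimal sketch of that logic follows; the helper name and the specific exit code value of 1 are assumptions, since the actual codes are not documented here.

```python
# Severities ordered from least to most serious.
SEVERITY_ORDER = {"info": 0, "warning": 1, "error": 2}


def exit_code(finding_severities, fail_level="error"):
    """Return 1 if any finding is at or above fail_level, else 0 (assumed codes)."""
    threshold = SEVERITY_ORDER[fail_level]
    failed = any(SEVERITY_ORDER[s] >= threshold for s in finding_severities)
    return 1 if failed else 0
```

Under this reading, a scan that produces only warnings passes with the default `--fail-level error` but fails with `--fail-level warning`.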
dococtopy fix
Fix documentation issues using LLM assistance.
dococtopy fix [PATHS...] [OPTIONS]
Options:
--dry-run Show changes without applying [default: False]
--interactive Accept/reject each fix interactively
--rule TEXT Comma-separated rule IDs to fix
--max-changes INTEGER Maximum number of changes
--llm-provider {openai,anthropic,ollama} LLM provider [default: openai]
--llm-model TEXT LLM model to use [default: gpt-5-nano]
--llm-base-url TEXT Base URL for LLM provider (for Ollama, etc.)
--config PATH Config file path
dococtopy config
Manage DocOctopy configuration for LLM providers.
dococtopy config [OPTIONS]
Options:
--setup Interactive configuration setup
--show Show current configuration
--init Initialize project configuration
Examples:
# Interactive setup (recommended)
dococtopy config setup
# Show current configuration
dococtopy config show
# Initialize project configuration
dococtopy config init
CI/CD Integration
Create .github/workflows/docstring-check.yml:
name: Docstring Compliance
on: [push, pull_request]
jobs:
  docstring-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dococtopy
      - run: dococtopy scan . --format json --output-file report.json --fail-level error
      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: docstring-report
          path: report.json
Architecture
DocOctopy is built with a modular, extensible architecture:
dococtopy/
├── cli/ # Command-line interface
├── core/ # Core engine, discovery, caching
├── adapters/ # Language-specific adapters
├── rules/ # Compliance rules and registry
├── remediation/ # LLM-powered fixing
└── reporters/ # Output formatters
Key Components
- Discovery Engine: Finds files using gitignore-style patterns
- Language Adapters: Parse code and extract symbols/docstrings
- Rule Engine: Applies compliance rules with configurable severity
- Remediation Engine: Uses DSPy for structured LLM interactions
- Caching System: Incremental scanning with fingerprint-based invalidation
- Interactive Reviewer: Handles interactive fix workflows with diff preview
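To make "fingerprint-based invalidation" concrete, here is a small in-memory sketch: a file is rescanned only when the hash of its content differs from the one stored with its last results. The class and method names are hypothetical, and the choice of SHA-256 and an in-memory dict (rather than an on-disk cache) is an assumption.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable content hash used to detect changes."""
    return hashlib.sha256(content).hexdigest()


class ScanCache:
    """Illustrative cache mapping a path to (fingerprint, findings)."""

    def __init__(self):
        self._entries = {}

    def needs_scan(self, path, content):
        """True when the path is unknown or its content has changed."""
        entry = self._entries.get(path)
        return entry is None or entry[0] != fingerprint(content)

    def store(self, path, content, findings):
        """Record findings for the current content of path."""
        self._entries[path] = (fingerprint(content), findings)

    def cached_findings(self, path):
        """Return previously stored findings for path, or None."""
        entry = self._entries.get(path)
        return None if entry is None else entry[1]
```

A scanner built on this idea would call `needs_scan` for each discovered file and reuse `cached_findings` for everything that has not changed.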
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
git clone https://github.com/CrazyBonze/DocOctopy.git
cd DocOctopy
uv sync --group dev
uv run pytest
Development Workflow
We use pre-commit hooks to ensure code quality and prevent CI failures:
# Install pre-commit hooks (one-time setup)
uv run task pre-commit:install
# Run pre-commit checks manually
uv run task pre-commit:run
# Or use the convenience script
./scripts/pre-commit.sh
Pre-commit checks include:
- Black: Code formatting
- isort: Import sorting
- MyPy: Type checking
- Pytest: Fast test suite
Available tasks:
uv run task format # Format code
uv run task lint # Run linting
uv run task test:fast # Run fast tests
uv run task test:cov # Run tests with coverage
uv run task ci # Run full CI pipeline
Version Management
DocOctopy uses a centralized version management system to prevent version inconsistencies across the codebase.
Quick Commands
# Show current version
uv run task version:show
# Bump versions
uv run task version:bump:patch # Bug fixes (0.2.1 → 0.2.2)
uv run task version:bump:minor # New features (0.2.1 → 0.3.0)
uv run task version:bump:major # Breaking changes (0.2.1 → 1.0.0)
# Set specific version
uv run task version:set 1.0.0
How It Works
- Single Source of Truth: Version is defined only in src/dococtopy/_version.py
- Automatic Updates: All components import version dynamically from dococtopy.__version__
- Test Integration: Version tests are automatically updated when version changes
- Build Integration: Hatchling reads version from the centralized file
Benefits
- ✅ No Version Drift: Eliminates inconsistencies across files
- ✅ Automated Updates: All dependent files updated automatically
- ✅ Easy Management: Simple CLI commands for version operations
- ✅ Build Integration: Works seamlessly with packaging system
Adding New Rules
- Create rule class in src/dococtopy/rules/
- Implement check() method
- Register with register() function
- Add tests in tests/unit/
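The steps above follow a common registry pattern. The sketch below is self-contained and does not import DocOctopy, so every name and signature in it (`Finding`, `register`, `check(symbol, docstring)`, the made-up rule ID `DG999`) is a hypothetical stand-in; consult the existing rules in src/dococtopy/rules/ for the real interface.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical result record for one rule violation."""

    rule_id: str
    symbol: str
    message: str


_REGISTRY = {}


def register(rule):
    """Add a rule instance to the registry, keyed by its ID."""
    _REGISTRY[rule.rule_id] = rule
    return rule


class SummaryNotEmpty:
    """Example rule: the docstring must contain some non-blank text."""

    rule_id = "DG999"  # made-up ID, not a real DocOctopy rule

    def check(self, symbol, docstring):
        if docstring.strip():
            return []
        return [Finding(self.rule_id, symbol, "Docstring summary line is empty")]


register(SummaryNotEmpty())


def run_rules(symbol, docstring):
    """Apply every registered rule to one symbol's docstring."""
    findings = []
    for rule in _REGISTRY.values():
        findings.extend(rule.check(symbol, docstring))
    return findings
```

A unit test for such a rule would call `check()` with one passing and one failing docstring and assert on the returned findings.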
Adding New Languages
- Implement LanguageAdapter interface
- Create symbol extraction logic
- Add language-specific rules
- Update discovery patterns
Roadmap
MVP (Current)
- ✅ Python docstring compliance checking
- ✅ Google-style validation rules
- ✅ LLM-powered remediation
- ✅ Multiple output formats
- ✅ Configuration system
- ✅ Caching and incremental scanning
- ✅ Interactive fix workflows
- ✅ File writing capabilities
- ✅ Advanced Google style rules (DG211-DG214)
- ✅ Context-specific rules (DG401-DG403)
- ✅ Content quality rules (DG303-DG304)
- ✅ Recommendation rules (DG215-DG216)
- ✅ Sophisticated complexity detection with cyclomatic complexity
- ✅ Multi-style docstring support (Google, Sphinx, NumPy, Facebook)
- ✅ Comprehensive test suite (811 tests, 79% coverage)
- ✅ Enhanced AST utilities with full parameter support
V1 (Next)
- GitHub Action and pre-commit hooks
- Playground UI for prompt experimentation
- Additional Python rules (coverage thresholds, etc.)
- Batch processing for large codebases
Future
- JavaScript/TypeScript support
- Go documentation checking
- Rust documentation checking
- Language server integration
- Advanced prompt optimization
License
MIT License - see LICENSE file for details.
Acknowledgments
- Built with DSPy for reliable LLM interactions
- Uses docstring-parser for Google-style parsing
- Powered by Typer for CLI interface
- Styled with Rich for beautiful output