DocOctopy

A language-agnostic docstyle compliance & remediation tool that scans code for docstring/docblock presence and style, reports findings, and can auto-propose LLM-based fixes.

Features

🔍 Comprehensive Scanning

Python-first with extensible architecture for other languages
Google-style docstring validation with detailed compliance checking
Context-specific rules for test functions, public APIs, and exception handling
AST-based analysis for accurate symbol and signature detection
Smart caching with incremental scanning for large codebases
Sophisticated complexity detection using cyclomatic complexity metrics
Multi-style docstring support (Google, Sphinx, NumPy, Facebook)
Content quality validation (TODO/placeholders, conflict markers, delimiter style)
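
As a rough illustration of the cyclomatic complexity metric mentioned above, the sketch below counts decision points in a Python AST. The set of node types counted is an assumption of this example; DocOctopy's own metric may count branches differently.

```python
import ast

# Node types that each add one decision point. This list is an assumption
# of the sketch, not DocOctopy's actual definition.
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Return 1 plus the number of decision points found in source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))
```

A rule such as DG213 could then compare this number against a threshold to decide whether a function counts as "complex"; the threshold itself is not stated in this README.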

📊 Multiple Output Formats

Pretty console output with Rich formatting
JSON reports for CI/CD integration
SARIF format for GitHub Code Scanning
Configurable exit codes based on severity levels

🤖 LLM-Powered Remediation

Automatic docstring generation for missing documentation
Smart fixing of non-compliant docstrings
Enhancement of existing docstrings with missing elements
DSPy integration for reliable, structured LLM interactions
Interactive review mode with diff preview and approval workflow
Multiple LLM providers (OpenAI, Anthropic, Ollama)

⚙️ Flexible Configuration

pyproject.toml integration with rule enable/disable switches
Per-path overrides for different project sections
Gitignore-style exclusions with pathspec support
Rule severity customization (error, warning, info, off)

Installation

Basic Installation

pip install dococtopy

With LLM Support

pip install dococtopy[llm]

Note for uv users: uv add doesn't handle the [extra] syntax correctly yet. Use uv pip instead:

# This works with uv
uv pip install dococtopy[llm]

# This also works with uv (if you have a dev group in pyproject.toml)
uv pip install --group dev dococtopy[llm]

# This doesn't work with uv (known limitation)
uv add dococtopy[llm] --dev  # Won't install DSPy correctly


Development Installation

git clone https://github.com/CrazyBonze/DocOctopy.git
cd DocOctopy
uv sync --group dev

Development Scripts

The scripts/ directory contains utility scripts for development and testing:

comprehensive_compare_models.py - Compare LLM models for docstring generation quality and cost
pre-commit.sh - Pre-commit hook for formatting and linting
publish.sh - Publishing script for PyPI releases
See scripts/README.md for detailed usage instructions

Quick Start

1. Scan Your Code

# Scan current directory
dococtopy scan .

# Scan specific paths
dococtopy scan src/ tests/

# Get JSON output
dococtopy scan . --format json --output-file report.json

# Use SARIF for GitHub Code Scanning
dococtopy scan . --format sarif --output-file report.sarif

2. Fix Issues with LLM Assistance

# Dry-run mode (safe, shows what would be fixed)
dococtopy fix . --dry-run

# Interactive mode (review each change)
dococtopy fix . --interactive

# Fix specific rules only
dococtopy fix . --rule DG101,DG202 --dry-run

# Use different LLM provider
dococtopy fix . --llm-provider anthropic --llm-model claude-haiku-3.5

# Use local Ollama server
dococtopy fix . --llm-provider ollama --llm-model codeqwen:latest --llm-base-url http://localhost:11434

3. Configure Your Project

Create a pyproject.toml file:

[tool.docguard]
exclude = ["**/.venv/**", "**/build/**", "**/node_modules/**"]

[tool.docguard.rules]
DG101 = "error"    # Missing docstrings
DG201 = "error"    # Google style parse errors
DG202 = "error"    # Missing parameters
DG203 = "error"    # Extra parameters
DG204 = "warning"  # Returns section issues
DG205 = "info"     # Raises validation
DG301 = "warning"  # Summary style
DG302 = "warning"  # Blank line after summary
DG211 = "info"     # Yields section validation
DG212 = "info"     # Attributes section validation
DG213 = "info"     # Examples section validation
DG214 = "info"     # Note section validation
DG215 = "info"     # Private method docstring recommendation
DG216 = "info"     # Dunder method docstring recommendation
DG303 = "warning"  # Content quality validation
DG304 = "info"     # Docstring delimiter style
DG401 = "warning"  # Test function docstring style
DG402 = "warning"  # Public API function documentation
DG403 = "warning"  # Exception documentation completeness

# Per-path overrides
[[tool.docguard.overrides]]
patterns = ["tests/**"]
rules.DG101 = "off"  # Disable missing docstrings in tests
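
To make the interaction between base rules and per-path overrides concrete, here is a minimal sketch of how an effective severity could be resolved. It is not DocOctopy's implementation: the function name `effective_severity`, the "later overrides win" policy, and the "off" default for unknown rules are assumptions, and `fnmatch` only approximates gitignore-style matching.

```python
from fnmatch import fnmatch

# Plain-dict mirror of part of the pyproject.toml example above.
CONFIG = {
    "rules": {"DG101": "error", "DG204": "warning"},
    "overrides": [
        {"patterns": ["tests/**"], "rules": {"DG101": "off"}},
    ],
}


def effective_severity(config: dict, path: str, rule_id: str) -> str:
    """Return the severity for rule_id at path (hypothetical helper)."""
    # Assumption: a rule missing from the config is treated as "off".
    severity = config["rules"].get(rule_id, "off")
    for override in config.get("overrides", []):
        if any(fnmatch(path, pattern) for pattern in override["patterns"]):
            # Assumption: later matching overrides replace earlier values.
            severity = override["rules"].get(rule_id, severity)
    return severity


print(effective_severity(CONFIG, "tests/unit/test_a.py", "DG101"))
print(effective_severity(CONFIG, "src/app.py", "DG101"))
```

With this configuration, DG101 resolves to "off" under tests/ and stays "error" elsewhere, while rules not mentioned in the override keep their base severity.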

LLM Setup

Option A: Local Ollama (Recommended for Development)

Install Ollama: Download from ollama.ai

Pull a model:

ollama pull codeqwen:latest
# or
ollama pull llama3.1:8b

Test DocOctopy:

dococtopy fix . --llm-provider ollama --llm-model codeqwen:latest --llm-base-url http://localhost:11434 --dry-run

Option B: OpenAI

Get API key: OpenAI API Keys
Set environment variable:
```
export OPENAI_API_KEY="your-api-key"
```

Test DocOctopy:

dococtopy fix . --llm-provider openai --llm-model gpt-5-nano --dry-run

Option C: Anthropic

Get API key: Anthropic Console

Set environment variable:

export ANTHROPIC_API_KEY="your-api-key"

Test DocOctopy:

# Use best Anthropic model (claude-haiku-3.5) - Highest quality score
dococtopy fix . --llm-provider anthropic --llm-model claude-haiku-3.5 --dry-run

# Or use budget Anthropic option (claude-haiku-3) - Best value
dococtopy fix . --llm-provider anthropic --llm-model claude-haiku-3 --dry-run

Rules Reference

Basic Compliance Rules

DG101: Missing docstring (functions and classes)
DG301: Summary first line should end with period
DG302: Blank line required after summary

Google Style Validation Rules

DG201: Google style docstring parse error
DG202: Parameter missing from docstring
DG203: Extra parameter in docstring
DG204: Returns section missing or mismatched
DG205: Raises section validation
DG206: Args section format validation
DG207: Returns section format validation
DG208: Raises section format validation
DG209: Summary length validation
DG210: Docstring indentation consistency

Advanced Google Style Rules

DG211: Generator functions should have Yields section
DG212: Classes with public attributes should have Attributes section
DG213: Complex functions should have Examples section
DG214: Functions with special behavior should have Note section

Content Quality Rules

DG303: Detect TODO/placeholder content and version control conflict markers
DG304: Detect single quote delimiters vs double quotes (style consistency)

Recommendation Rules

DG215: Private methods should have docstrings (recommendation)
DG216: Standard dunder methods should have docstrings (recommendation)

Context-Specific Rules

DG401: Test functions should have descriptive docstrings explaining what they test
DG402: Public API functions should have comprehensive documentation (Args, Returns, Raises sections)
DG403: Functions should document all exceptions they raise in the Raises section

Context-Specific Rules Details

DG401: Test Function Docstring Style

Ensures test functions have descriptive docstrings that explain what they're testing, improving test readability and debugging.

Examples:

# ✅ Good - Descriptive test docstring
def test_user_authentication_works_correctly():
    """Test that user authentication validates credentials properly."""
    # Test implementation...

# ❌ Bad - Generic or non-descriptive
def test_user_login():
    """Test."""  # Too short and generic
    # Test implementation...

def test_something():
    """Test function."""  # Generic pattern
    # Test implementation...

What it checks:

Functions starting with test_ or containing "test" in the name
Docstrings must be descriptive (more than 20 characters)
Rejects generic docstrings such as "Test" or "Test function"

DG402: Public API Function Documentation

Requires public API functions to have comprehensive documentation with Args, Returns, and Raises sections.

Examples:

# ✅ Good - Complete public API documentation
def process_data(data, options=None):
    """Process the input data according to the given options.
    
    Args:
        data: The input data to process
        options: Optional configuration options
        
    Returns:
        Processed data result
        
    Raises:
        ValueError: If data format is invalid
    """
    # Implementation...

# ❌ Bad - Missing required sections
def process_data(data, options=None):
    """Process the input data."""  # Missing Args, Returns, Raises
    # Implementation...

What it checks:

Public functions (non-private, non-test, non-dunder)
Requires Args, Returns, and Raises sections
Skips internal/helper functions automatically
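
A simplified version of this check might look like the following. The helper names and the plain line-matching approach are assumptions for illustration; the actual rule presumably relies on a proper Google-style docstring parser.

```python
import ast

REQUIRED_SECTIONS = ("Args:", "Returns:", "Raises:")


def is_public_api(name: str) -> bool:
    """Treat non-private, non-dunder, non-test names as public (assumed policy)."""
    return not name.startswith("_") and not name.startswith("test_")


def missing_sections(source: str) -> dict[str, list[str]]:
    """Map each public top-level function to the sections its docstring lacks."""
    report = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef) and is_public_api(node.name):
            doc = ast.get_docstring(node) or ""
            lines = {line.strip() for line in doc.splitlines()}
            missing = [s for s in REQUIRED_SECTIONS if s not in lines]
            if missing:
                report[node.name] = missing
    return report


SAMPLE = '''
def process_data(data, options=None):
    """Process the input data."""

def _helper():
    """Internal."""
'''

print(missing_sections(SAMPLE))
```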

DG403: Exception Documentation Completeness

Ensures functions document all exceptions they raise in the Raises section.

Examples:

# ✅ Good - All exceptions documented
def risky_function(value):
    """Do something risky.

    Args:
        value: The input to validate

    Raises:
        ValueError: If input is invalid
        RuntimeError: If operation fails
    """
    if value is None:
        raise ValueError("invalid input")
    raise RuntimeError("operation failed")

# ❌ Bad - Undocumented exceptions
def risky_function(value):
    """Do something risky."""
    if value is None:
        raise ValueError("invalid input")  # Not documented
    raise RuntimeError("operation failed")  # Not documented

What it checks:

Uses AST analysis to detect raise statements
Parses docstring Raises sections
Flags undocumented exceptions with specific names
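
The three steps above (find raise statements, parse the Raises section, report the difference) can be sketched with the standard library alone. All function names here are hypothetical, and the docstring parsing is deliberately naive compared with a real Google-style parser.

```python
import ast
import re


def raised_names(func: ast.FunctionDef) -> set[str]:
    """Collect exception names from `raise Name(...)` and `raise Name`."""
    names = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Raise) and node.exc is not None:
            target = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
            if isinstance(target, ast.Name):
                names.add(target.id)
    return names


def documented_names(docstring: str) -> set[str]:
    """Collect names listed as `Name: description` under a Raises: header."""
    names, in_raises = set(), False
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and " " not in stripped:
            in_raises = stripped == "Raises:"  # entering or leaving a section
        elif in_raises:
            match = re.match(r"(\w+):", stripped)
            if match:
                names.add(match.group(1))
    return names


def undocumented_exceptions(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the exceptions it raises but omits."""
    report = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node) or ""
            missing = raised_names(node) - documented_names(doc)
            if missing:
                report[node.name] = missing
    return report
```

Bare `raise` statements and exceptions raised through attribute access (for example `errors.MyError`) are ignored in this sketch.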

Interactive Fix Mode

DocOctopy includes an interactive mode that lets you review and approve each proposed change:

dococtopy fix . --interactive

Interactive Features

Diff preview: See exactly what will be changed
Change-by-change review: Accept or reject each fix individually
Rich formatting: Beautiful console output with colors
Summary statistics: Track approved vs rejected changes

Example Interactive Session

Found 3 changes for src/main.py

Change: process_data (function)
Issues: DG101
Proposed docstring:
    """Process the input data and return results.

    Args:
        data: The input data to process.
        options: Processing options.

    Returns:
        Processed data results.
    """
Show diff? [Y/n]: y
--- Original
+++ Proposed
@@ -15,6 +15,15 @@
 def process_data(data, options):
+    """Process the input data and return results.
+
+    Args:
+        data: The input data to process.
+        options: Processing options.
+
+    Returns:
+        Processed data results.
+    """
     result = []
     for item in data:
         result.append(transform(item, options))
Apply this change? [Y/n]: y
✓ Applied change for process_data

Summary:
- Total changes: 3
- Applied: 1
- Rejected: 1
- Skipped: 1
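
A diff preview like the one in this session can be produced with Python's standard `difflib` module. The sketch below is only an illustration of that technique; the helper name `preview_diff` is hypothetical and the README does not say how DocOctopy renders its diffs.

```python
import difflib


def preview_diff(original: str, proposed: str) -> str:
    """Return a unified diff between two versions of a source snippet."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile="Original",
        tofile="Proposed",
    )
    return "".join(diff)


ORIGINAL = "def process_data(data, options):\n    result = []\n"
PROPOSED = (
    "def process_data(data, options):\n"
    '    """Process the input data and return results."""\n'
    "    result = []\n"
)

print(preview_diff(ORIGINAL, PROPOSED))
```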

Recommended Models

DocOctopy supports multiple LLM providers with different models optimized for docstring generation. Based on comprehensive testing with real-world code:

🏆 OpenAI Models (Recommended)

| Model | Cost (per 1M tokens) | Quality Score | Quality per Dollar | Best For |
| --- | --- | --- | --- | --- |
| gpt-5-nano | $0.45 | 39/50 | 39,796 | ✅ Default choice - Best value |
| gpt-5-mini | $2.25 | 41/50 | 8,367 | ✅ Premium choice - Enterprise quality |
| gpt-4.1-mini | $2.00 | 41/50 | 9,491 | Alternative option |
| gpt-4.1-nano | $0.50 | 46/50 | 42,593 | Budget option |

Key Findings:

gpt-5-nano: 5x cheaper than gpt-5-mini with 95% of the quality score
gpt-5-mini: Comprehensive documentation with detailed business logic and examples
gpt-4.1-mini: Solid alternative with good quality but higher cost than gpt-5-nano
gpt-4.1-nano: Budget option with good quality-per-dollar ratio

🤖 Anthropic Models

| Model | Cost (per 1M tokens) | Quality Score | Quality per Dollar | Best For |
| --- | --- | --- | --- | --- |
| claude-haiku-3.5 | $0.25 | 67/50 | 6,442 | ✅ Best Anthropic - Highest quality |
| claude-sonnet-4 | $3.00 | 41/50 | 1,051 | High performance option |
| claude-haiku-3 | $0.25 | 41/50 | 12,615 | Budget Anthropic option |
| claude-opus-4.1 | $15.00 | 41/50 | 210 | Premium option (expensive) |

Anthropic Highlights:

claude-haiku-3.5: Highest quality score (67/50) with excellent cost efficiency
claude-haiku-3: Best quality-per-dollar ratio among Anthropic models
All Anthropic models provide reliable, consistent docstring generation

💡 Model Selection Guide

| Use Case | Recommended Model | Reason |
| --- | --- | --- |
| Development | gpt-5-nano | Best value, reliable quality |
| Testing/CI | gpt-5-nano | Cost-effective for automated runs |
| Production | gpt-5-mini | Maximum quality for end users |
| Enterprise | gpt-5-mini | Comprehensive documentation |
| Budget-Conscious | gpt-5-nano | Excellent quality at low cost |
| Privacy-First | Ollama codeqwen | Local processing |
| Anthropic Preference | claude-haiku-3.5 | Highest quality score |
| Free Tier | claude-haiku-3 | Good quality-per-dollar ratio |

🚀 Quick Start Commands

# Use default (gpt-5-nano) - Best value
dococtopy fix . --rule DG101

# Use premium (gpt-5-mini) - Maximum quality
dococtopy fix . --rule DG101 --llm-model gpt-5-mini

# Use best Anthropic (claude-haiku-3.5) - Highest quality score
dococtopy fix . --rule DG101 --llm-provider anthropic --llm-model claude-haiku-3.5

# Use local (Ollama) - Privacy-first
dococtopy fix . --rule DG101 --llm-provider ollama --llm-model codeqwen:latest

📊 See detailed comparison results: docs/model-comparison/ - Compare actual generated docstrings side-by-side
📋 Quick summary: docs/model-comparison/SUMMARY.md - Decision matrix and recommendations

CLI Reference

`dococtopy scan`

Scan paths for documentation compliance issues.

dococtopy scan [PATHS...] [OPTIONS]

Options:
  --format {pretty,json,sarif,both}  Output format [default: pretty]
  --config PATH                      Config file path [default: pyproject.toml]
  --fail-level {error,warning,info}  Exit code threshold [default: error]
  --no-cache                        Disable caching
  --changed-only                    Only scan changed files
  --stats                           Show cache statistics
  --output-file PATH                Write output to file
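
The --fail-level option describes an exit code threshold. One plausible reading is sketched below; the severity ordering, the helper name `exit_code`, and the 0/1 exit values are assumptions of this example rather than documented behavior.

```python
# Assumed ordering from least to most severe.
SEVERITY_ORDER = {"info": 0, "warning": 1, "error": 2}


def exit_code(finding_levels: list[str], fail_level: str = "error") -> int:
    """Return 1 if any finding is at or above fail_level, else 0."""
    threshold = SEVERITY_ORDER[fail_level]
    return int(any(SEVERITY_ORDER[level] >= threshold for level in finding_levels))


print(exit_code(["warning", "info"]))             # default threshold: error
print(exit_code(["warning", "info"], "warning"))  # stricter threshold
```

Under this reading, lowering the fail level to warning makes a CI job fail on warnings as well as errors.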

`dococtopy fix`

Fix documentation issues using LLM assistance.

dococtopy fix [PATHS...] [OPTIONS]

Options:
  --dry-run                         Show changes without applying [default: False]
  --interactive                     Accept/reject each fix interactively
  --rule TEXT                       Comma-separated rule IDs to fix
  --max-changes INTEGER             Maximum number of changes
  --llm-provider {openai,anthropic,ollama}  LLM provider [default: openai]
  --llm-model TEXT                  LLM model to use [default: gpt-5-nano]
  --llm-base-url TEXT               Base URL for LLM provider (for Ollama, etc.)
  --config PATH                     Config file path

CI/CD Integration

Create .github/workflows/docstring-check.yml:

name: Docstring Compliance
on: [push, pull_request]

jobs:
  docstring-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dococtopy
      - run: dococtopy scan . --format json --output-file report.json --fail-level error
      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: docstring-report
          path: report.json

Architecture

DocOctopy is built with a modular, extensible architecture:

dococtopy/
├── cli/           # Command-line interface
├── core/          # Core engine, discovery, caching
├── adapters/      # Language-specific adapters
├── rules/         # Compliance rules and registry
├── remediation/   # LLM-powered fixing
└── reporters/     # Output formatters

Key Components

Discovery Engine: Finds files using gitignore-style patterns
Language Adapters: Parse code and extract symbols/docstrings
Rule Engine: Applies compliance rules with configurable severity
Remediation Engine: Uses DSPy for structured LLM interactions
Caching System: Incremental scanning with fingerprint-based invalidation
Interactive Reviewer: Handles interactive fix workflows with diff preview
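
To illustrate what fingerprint-based invalidation can mean for the caching system, here is a small in-memory sketch. The class name `ScanCache`, the use of SHA-256, and hashing the config alongside the file content are all assumptions; the README does not describe DocOctopy's real cache format.

```python
import hashlib


def fingerprint(content: bytes, config_repr: str = "") -> str:
    """Hash file content plus config so a change to either invalidates the entry."""
    digest = hashlib.sha256()
    digest.update(content)
    digest.update(config_repr.encode("utf-8"))
    return digest.hexdigest()


class ScanCache:
    """Hypothetical in-memory map of path -> (fingerprint, findings)."""

    def __init__(self):
        self._entries = {}

    def lookup(self, path, content, config_repr=""):
        """Return cached findings, or None if the file or config changed."""
        entry = self._entries.get(path)
        if entry and entry[0] == fingerprint(content, config_repr):
            return entry[1]
        return None

    def store(self, path, content, findings, config_repr=""):
        """Record findings for the current content and config."""
        self._entries[path] = (fingerprint(content, config_repr), findings)


cache = ScanCache()
cache.store("a.py", b"x = 1", ["DG101"])
print(cache.lookup("a.py", b"x = 1"))  # unchanged file: cached findings
print(cache.lookup("a.py", b"x = 2"))  # edited file: None, needs a rescan
```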

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

git clone https://github.com/CrazyBonze/DocOctopy.git
cd DocOctopy
uv sync --group dev
uv run pytest

Development Workflow

We use pre-commit hooks to ensure code quality and prevent CI failures:

# Install pre-commit hooks (one-time setup)
uv run task pre-commit:install

# Run pre-commit checks manually
uv run task pre-commit:run

# Or use the convenience script
./scripts/pre-commit.sh

Pre-commit checks include:

Black: Code formatting
isort: Import sorting
MyPy: Type checking
Pytest: Fast test suite

Available tasks:

uv run task format          # Format code
uv run task lint            # Run linting
uv run task test:fast       # Run fast tests
uv run task test:cov        # Run tests with coverage
uv run task ci              # Run full CI pipeline

Adding New Rules

Create a rule class in src/dococtopy/rules/
Implement its check() method
Register it with the register() function
Add tests in tests/unit/
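
The steps above might look roughly like the following. This is a self-contained toy, not DocOctopy's API: the README does not show the real signatures of check() or register(), so the `RULES` registry, the `Finding` dataclass, and every parameter name here are hypothetical.

```python
from dataclasses import dataclass

RULES = {}  # hypothetical registry: rule ID -> rule instance


def register(rule):
    """Add a rule instance to the registry under its ID (assumed behavior)."""
    RULES[rule.id] = rule
    return rule


@dataclass
class Finding:
    rule_id: str
    symbol: str
    message: str


class SummaryPeriodRule:
    """Toy version of DG301: the summary line should end with a period."""

    id = "DG301"

    def check(self, symbol_name, docstring):
        """Return a list of findings for one symbol's docstring."""
        summary = (docstring or "").strip().splitlines()[:1]
        if summary and not summary[0].rstrip().endswith("."):
            return [Finding(self.id, symbol_name, "Summary should end with a period")]
        return []


register(SummaryPeriodRule())

print(RULES["DG301"].check("process_data", "Process the input data"))
```

A matching unit test would call check() with a compliant and a non-compliant docstring and assert on the returned findings.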

Adding New Languages

Implement the LanguageAdapter interface
Create the symbol extraction logic
Add language-specific rules
Update the discovery patterns

Roadmap

MVP (Current)

✅ Python docstring compliance checking
✅ Google-style validation rules
✅ LLM-powered remediation
✅ Multiple output formats
✅ Configuration system
✅ Caching and incremental scanning
✅ Interactive fix workflows
✅ File writing capabilities
✅ Advanced Google style rules (DG211-DG214)
✅ Context-specific rules (DG401-DG403)
✅ Content quality rules (DG303-DG304)
✅ Recommendation rules (DG215-DG216)
✅ Sophisticated complexity detection with cyclomatic complexity
✅ Multi-style docstring support (Google, Sphinx, NumPy, Facebook)
✅ Comprehensive test suite (811 tests, 79% coverage)
✅ Enhanced AST utilities with full parameter support

V1 (Next)

🔄 GitHub Action and pre-commit hooks
🔄 Playground UI for prompt experimentation
🔄 Additional Python rules (coverage thresholds, etc.)
🔄 Batch processing for large codebases

Future

📋 JavaScript/TypeScript support
📋 Go documentation checking
📋 Rust documentation checking
📋 Language server integration
📋 Advanced prompt optimization

License

MIT License - see LICENSE file for details.

Acknowledgments

Built with DSPy for reliable LLM interactions
Uses docstring-parser for Google-style parsing
Powered by Typer for CLI interface
Styled with Rich for beautiful output

Release history

0.2.1: Sep 24, 2025
0.2.0: Sep 24, 2025 (this version)
0.1.4: Sep 23, 2025
0.1.3: Sep 23, 2025
0.1.2: Sep 18, 2025
0.1.1: Sep 16, 2025
0.1.0: Sep 16, 2025

