A tool for summarizing documents and code using AI
DocDog
Overview
DocDog is an AI-powered tool that automatically generates comprehensive README documentation for software projects. By analyzing the project's source code, configuration files, and existing documentation, DocDog can create a well-structured README file covering installation, usage, API documentation, examples, and more.
The tool aims to streamline the documentation process for developers, saving time and effort while ensuring accurate and up-to-date documentation that reflects the project's current state. With DocDog, you can focus on writing code while keeping your project's documentation in sync.
Features
- Automatic README Generation: DocDog analyzes your project's codebase, configuration files, and existing documentation to generate a comprehensive README file.
- Structured Documentation: The generated README follows a standardized structure, including sections for installation, usage, API documentation, examples, troubleshooting, and more.
- Code Analysis: DocDog examines your code to extract relevant information, such as function signatures, docstrings, and code comments, to include in the documentation.
- Configuration Options: Customize the documentation generation process by specifying configuration options, such as allowed file extensions, output directory, and more.
- Parallel Processing: Leverage parallel processing for efficient chunking and analysis of large codebases.
- Template Support: Use built-in or custom templates to control the structure and formatting of the generated README.
- Reasoning Documentation: Optionally include the reasoning behind the generated content in a separate file (reasoning.md) for transparency and understanding the AI's decision-making process.
Installation
# Clone the repository
git clone https://github.com/duriantaco/docdog.git
cd docdog
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# Install DocDog
pip install .
Quick Start Guide
To generate a README for your project, navigate to your project's root directory and run:
docdog
This will analyze your project's files and generate a README.md file in the current directory.
Usage
usage: docdog [-h] [-o OUTPUT] [-m MODEL] [--reasoning] [-p PROMPT_TEMPLATE]
              [--max-iterations MAX_ITERATIONS] [--workers WORKERS]
              [--cache-size CACHE_SIZE]

DocDog - AI Document & Code Summarizer

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output file path for the generated README (default: README.md)
  -m MODEL, --model MODEL
                        AI model to use for documentation generation (default: gpt-4o-mini)
  --reasoning           Include reasoning behind the generated content
  -p PROMPT_TEMPLATE, --prompt-template PROMPT_TEMPLATE
                        Path to a custom prompt template file
  --max-iterations MAX_ITERATIONS
                        Maximum number of iterations for the AI model (default: 15)
  --workers WORKERS, -w WORKERS
                        Number of worker threads (default: auto)
  --cache-size CACHE_SIZE
                        Size of the LRU cache (default: 128)
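The options listed in the help text map naturally onto a standard argparse parser. The following is a minimal sketch reconstructed from that help output, not DocDog's actual source. The function name build_parser is hypothetical, and using None as the default for --workers is an assumed stand-in for "auto".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="docdog", description="DocDog - AI Document & Code Summarizer"
    )
    parser.add_argument("-o", "--output", default="README.md",
                        help="Output file path for the generated README")
    parser.add_argument("-m", "--model", default="gpt-4o-mini",
                        help="AI model to use for documentation generation")
    parser.add_argument("--reasoning", action="store_true",
                        help="Include reasoning behind the generated content")
    parser.add_argument("-p", "--prompt-template",
                        help="Path to a custom prompt template file")
    parser.add_argument("--max-iterations", type=int, default=15,
                        help="Maximum number of iterations for the AI model")
    # Assumption: None stands in for automatic worker detection.
    parser.add_argument("-w", "--workers", type=int, default=None,
                        help="Number of worker threads")
    parser.add_argument("--cache-size", type=int, default=128,
                        help="Size of the LRU cache")
    return parser


args = build_parser().parse_args(["--output", "docs/PROJECT_README.md", "--reasoning"])
print(args.output, args.model, args.reasoning, args.max_iterations)
```

Parsing an explicit argument list, as in the last two lines, is a convenient way to check flag handling without invoking the command itself.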
API Documentation
MCPTools
The MCPTools class provides a set of tools for interacting with the project's codebase, such as listing files, reading file contents, and batch reading multiple files. The class supports caching for improved performance and parallel processing for batch operations.
__init__(project_root, max_workers=None, cache_size=128)
Initializes the MCPTools instance.
- project_root (str): The root directory of the project.
- max_workers (int, optional): The maximum number of worker threads for parallel processing. If None, the number of workers is determined automatically.
- cache_size (int, optional): The size of the LRU cache for caching file reads and listings. Default is 128.
list_files(directory)
Lists files in the specified directory within the project root, excluding ignored patterns.
- directory (str): The directory path relative to the project root.
- Returns: A string containing the list of files, with one file path per line.
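As an illustration of the listing behavior described here, the sketch below walks a directory, skips ignored patterns, and returns one project-relative path per line. It is a standalone approximation, not the MCPTools implementation: the IGNORE_PATTERNS list is hypothetical, since this README does not show DocDog's real ignore list, and the function takes project_root as an explicit argument instead of reading it from an instance.

```python
import fnmatch
import os

# Hypothetical ignore patterns; DocDog's real list is not shown in this README.
IGNORE_PATTERNS = [".git", "__pycache__", "venv", "*.pyc"]


def list_files(project_root: str, directory: str = ".") -> str:
    """Return project-relative file paths under `directory`, one per line."""
    root = os.path.realpath(project_root)
    start = os.path.realpath(os.path.join(root, directory))
    found = []
    for current, dirs, files in os.walk(start):
        # Prune ignored directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs
                   if not any(fnmatch.fnmatch(d, p) for p in IGNORE_PATTERNS)]
        for name in files:
            if any(fnmatch.fnmatch(name, p) for p in IGNORE_PATTERNS):
                continue
            found.append(os.path.relpath(os.path.join(current, name), root))
    return "\n".join(sorted(found))
```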
read_file(file_path)
Reads the content of a file within the project root.
- file_path (str): The file path relative to the project root.
- Returns: A string containing the file content. For Python files, it includes the content, docstrings, and comments.
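One way to combine "within the project root" with the LRU cache mentioned for MCPTools is sketched below. This is an assumption about how such a reader could be built, not DocDog's code: make_reader is a hypothetical helper, and the containment check and the ValueError are illustrative choices.

```python
from functools import lru_cache
from pathlib import Path


def make_reader(project_root: str, cache_size: int = 128):
    """Return a cached read function confined to `project_root` (hypothetical helper)."""
    root = Path(project_root).resolve()

    @lru_cache(maxsize=cache_size)
    def read_file(file_path: str) -> str:
        target = (root / file_path).resolve()
        # Reject paths such as "../secret" that escape the project root.
        if target != root and root not in target.parents:
            raise ValueError(f"{file_path} is outside the project root")
        return target.read_text(encoding="utf-8")

    return read_file
```

A cache like this trades freshness for speed: a file edited after its first read is served from the cache until the entry is evicted.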
batch_read_files(file_paths)
Reads the contents of multiple files within the project root in parallel.
- file_paths (list): A list of file paths relative to the project root.
- Returns: A JSON string containing a list of dictionaries, where each dictionary represents a file with its content or error message.
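The parallel read with per-file error reporting can be sketched with a thread pool. This is a standalone approximation under stated assumptions: the dictionary keys "file", "content", and "error" are invented for illustration, because the README does not specify the exact JSON shape, and project_root is passed explicitly.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def batch_read_files(project_root: str, file_paths: list, max_workers=None) -> str:
    """Read several files concurrently and return a JSON list of per-file results."""
    root = Path(project_root)

    def read_one(rel_path: str) -> dict:
        try:
            return {"file": rel_path,
                    "content": (root / rel_path).read_text(encoding="utf-8")}
        except (OSError, UnicodeDecodeError) as exc:
            # A failure on one file is reported in place and does not abort the batch.
            return {"file": rel_path, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the input paths.
        results = list(pool.map(read_one, file_paths))
    return json.dumps(results)
```

Threads suit this workload because file reads are I/O-bound, so the interpreter lock is released while each read waits on the disk.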
handle_tool_call(tool_name, tool_input)
Handles tool calls from the AI assistant, dispatching to the appropriate tool based on the tool_name.
- tool_name (str): The name of the tool to execute.
- tool_input (dict): The input parameters for the tool.
- Returns: The result of the tool execution.
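Dispatching on a tool name is commonly done with a lookup table. The sketch below shows that pattern with stand-in tools; it is not the MCPTools method itself. The name dispatch_tool_call, the explicit tools argument, and the error string for unknown tools are all assumptions made for illustration.

```python
def dispatch_tool_call(tools: dict, tool_name: str, tool_input: dict):
    """Look up `tool_name` in a table and call it with `tool_input` as keyword arguments."""
    handler = tools.get(tool_name)
    if handler is None:
        # Assumed behavior: report unknown tools as text instead of raising.
        return f"Error: unknown tool '{tool_name}'"
    return handler(**tool_input)


# Stand-in tools for illustration only.
tools = {
    "list_files": lambda directory: f"listing {directory}",
    "read_file": lambda file_path: f"contents of {file_path}",
}

print(dispatch_tool_call(tools, "read_file", {"file_path": "setup.py"}))
```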
Chunking
The chunking module provides functionality for splitting the project's files into chunks for efficient processing by the AI assistant.
chunk_project(project_root, output_dir="chunks", config=None)
Splits the project's files into smaller chunk files. Chunks are sized by token count, and the work can run in parallel.
- project_root (str): The root directory of the project.
- output_dir (str, optional): The directory to store the chunked files. Default is "chunks".
- config (dict, optional): A configuration dictionary containing chunking options. If None, default options are used.
- Returns: A list of file paths for the created chunk files.
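To make token-based chunking concrete, here is a minimal greedy packer. It illustrates the general technique only and is not chunk_project: the function name chunk_by_tokens and the token_budget parameter are hypothetical, tokens are approximated by whitespace-separated words instead of a real tokenizer, and the result is a list of path groups, not files written to disk.

```python
def chunk_by_tokens(files: dict, token_budget: int) -> list:
    """Greedily pack file paths into chunks of at most `token_budget` tokens.

    `files` maps path -> text. Tokens are approximated by whitespace-separated
    words; a single file larger than the budget gets a chunk of its own.
    """
    chunks, current, used = [], [], 0
    for path, text in files.items():
        size = len(text.split())
        # Start a new chunk when adding this file would exceed the budget.
        if current and used + size > token_budget:
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        chunks.append(current)
    return chunks


files = {"a.py": "x " * 3, "b.py": "y " * 3, "c.py": "z " * 5, "d.py": "w"}
print(chunk_by_tokens(files, token_budget=6))
```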
Other Modules
- sanitize_prompt: A utility function for sanitizing prompts to prevent Unicode obfuscation and prompt injection attacks.
- templates: Contains template files for the initial prompt, validation prompt, and reasoning instructions.
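The Unicode side of prompt sanitization can be sketched with the standard unicodedata module. This is a generic approach, not sanitize_prompt itself, whose rules are not documented here. The name strip_obfuscation is hypothetical, and the sketch only addresses character-level obfuscation; it does not detect injected instructions.

```python
import unicodedata


def strip_obfuscation(text: str) -> str:
    """Normalize to NFKC and drop invisible format and control characters."""
    normalized = unicodedata.normalize("NFKC", text)
    kept = []
    for ch in normalized:
        category = unicodedata.category(ch)
        # Cf = format characters (zero-width spaces, bidi overrides);
        # Cc = control characters, of which only newline and tab are kept.
        if category == "Cf" or (category == "Cc" and ch not in "\n\t"):
            continue
        kept.append(ch)
    return "".join(kept)
```

NFKC folds compatibility forms such as fullwidth letters into their plain equivalents, so visually similar spellings of the same word compare equal after cleaning.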
Configuration
DocDog can be configured using environment variables, command-line arguments, and a configuration file.
Environment Variables
ANTHROPIC_API_KEY: Your Anthropic API key. Required for DocDog to function.
Command-line Arguments
- --output: Specify the output file path for the generated README (default: README.md).
- --model: Set the AI model to use for documentation generation (default: gpt-4o-mini).
- --reasoning: Include the reasoning behind the generated content in a separate file (reasoning.md).
- --prompt-template: Path to a custom prompt template file.
- --max-iterations: Set the maximum number of iterations for the AI model (default: 15).
- --workers: Specify the number of worker threads for parallel processing (default: auto-detected).
- --cache-size: Set the size of the LRU cache for caching file reads and listings (default: 128).
Configuration File
DocDog supports a configuration file (config.json) for additional settings. The default configuration is:
{
"num_chunks": 5,
"model": "gpt-4o-mini",
"max_tokens": 5000,
"temperature": 0.7,
"verbose": false,
"allowed_extensions": [
".txt", ".md", ".py", ".pdf", ".sh", ".json", ".yaml", ".ipynb",
".js", ".tsx", ".ts", "jsx", ".html", ".css", ".csv", ".xml",
".yml", ".sql", ".java", ".php", ".rb", ".c", ".cpp", ".h",
".hpp", ".cs", ".go", ".rs", ".swift", ".kt", ".m", ".pl",
".r", ".lua", ".sh", ".bash", ".zsh", ".ps1", ".psm1", ".psd1",
".ps1xml", ".pssc", ".psc1", ".pssc", ".pss1", ".pssm", ".pssc", ".pss"
]
}
You can create a config.json file in your project's root directory to override these settings.
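Overriding defaults from a config.json file typically amounts to a dictionary merge. The sketch below assumes a shallow merge, so a list such as allowed_extensions would be replaced wholesale and not extended; the README does not state which behavior DocDog uses. load_config is a hypothetical name, and DEFAULTS is a shortened copy of the documented defaults.

```python
import json
from pathlib import Path

# Shortened copy of the documented defaults (allowed_extensions omitted for brevity).
DEFAULTS = {"num_chunks": 5, "model": "gpt-4o-mini", "max_tokens": 5000,
            "temperature": 0.7, "verbose": False}


def load_config(project_root: str) -> dict:
    """Return DEFAULTS overlaid with any keys found in <project_root>/config.json."""
    config = dict(DEFAULTS)  # copy so the module-level defaults stay untouched
    path = Path(project_root) / "config.json"
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```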
Examples and Use Cases
Basic Usage
docdog
This will generate a README.md file in the current directory, analyzing all files in the project with the default configuration.
Specifying an Output File
docdog --output docs/PROJECT_README.md
This will generate the README file as docs/PROJECT_README.md instead of the default README.md.
Including Reasoning
docdog --reasoning
This will generate a reasoning.md file alongside the README.md, explaining the reasoning behind the generated content.
Using a Custom Prompt Template
docdog --prompt-template custom_prompt.txt
This will use the custom_prompt.txt file as the prompt template for the AI model, allowing you to customize the structure and content of the generated README.
Adjusting Configuration
You can create a config.json file in your project's root directory to adjust settings like the number of chunks, AI model, temperature, and allowed file extensions.
{
"num_chunks": 10,
"model": "gpt-4",
"max_tokens": 6000,
"temperature": 0.8,
"allowed_extensions": [".py", ".md", ".txt", ".js"]
}
This configuration will create 10 chunks, use the gpt-4 model with a temperature of 0.8, limit the token count to 6000, and only analyze Python, Markdown, text, and JavaScript files.
Troubleshooting/FAQ
Error: ANTHROPIC_API_KEY not found in environment variables
Make sure you have set the ANTHROPIC_API_KEY environment variable with your valid Anthropic API key. You can set it temporarily in your shell session or add it to your shell configuration file (e.g., .bashrc, .zshrc).
export ANTHROPIC_API_KEY=your_api_key_here
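If you call DocDog from your own scripts, you can check for the key up front and fail with a clear message. This is a minimal sketch, assuming only that the key is read from the ANTHROPIC_API_KEY environment variable as described above; require_api_key is a hypothetical helper and not part of DocDog.

```python
import os


def require_api_key(env=os.environ) -> str:
    """Return the API key from `env`, or raise if it is missing or blank."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY not found in environment variables")
    return key
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.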
Incomplete or Missing Information
If the generated README is missing important information or sections, DocDog most likely could not find the relevant details in your project's files. Check that your source code and configuration files are up to date and well documented (e.g., with docstrings, comments, and descriptive variable and function names).
Unsatisfactory README Quality
If the generated README quality is not satisfactory, you can try the following:
- Increase the max_iterations option to allow the AI model more iterations for refining the output.
- Use a more capable AI model (e.g., gpt-4) by setting the --model option.
- Adjust the temperature setting in the config.json file to control the randomness and creativity of the generated text.
- Provide a custom prompt template with more specific instructions tailored to your project.
Contributing
Contributions are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository. See the CONTRIBUTING.md file for more details.
License
DocDog is released under the Apache 2.0 License.
Generated by DocDog on 2025-03-25