Project Overview: Auto Doc Generator

Project Title
Auto Doc Generator (ADG)
Project Goal
The Auto Doc Generator (ADG) is a modular and extensible software solution designed to automate the creation of structured, comprehensive, and up-to-date documentation for software projects. By leveraging AI models, modular components, and text processing techniques, ADG analyzes codebases, detects changes, and dynamically generates documentation. This tool is particularly suited for integration into CI/CD pipelines, ensuring that project documentation remains current with minimal manual intervention.
Core Logic & Principles
The Auto Doc Generator employs a layered architecture to ensure modularity, scalability, and maintainability. The system is divided into multiple components, each responsible for a specific aspect of the documentation generation process. Below is an overview of the core logic and principles:
1. Initialization
- The process begins with the `run_file.py` script, which serves as the main entry point.
- The `autodocconfig.yml` configuration file is parsed by the `config_reader.py` module to initialize project-specific settings, such as ignored files, language preferences, and metadata.
- API tokens and environment variables are managed by the `token_auth.py` module to enable secure communication with external services.
2. Change Detection
- The `check_git_status.py` module analyzes Git repository changes by comparing the current state with the last commit. If no changes are detected, the process terminates early, saving computational resources.
3. Documentation Generation
- The `Manager` class orchestrates the documentation generation lifecycle, which includes:
  - Codebase Analysis: The `CodeMix` module builds a structured representation of the repository's content, filtering out ignored files and directories.
  - Global Information Generation: The `compressor.py` and `spliter.py` modules process global project information, compressing and splitting data into manageable chunks.
  - Modular Documentation Creation: The `DocFactory` invokes a sequence of modular components (e.g., `CustomModule`, `IntroLinks`, `IntroText`) to generate specific sections of the documentation.
  - Embedding and Sorting: The `embedding.py` module generates semantic embeddings for document parts, while the `sorting.py` module organizes and reorders content for better readability.
4. Post-Processing
- Generated documentation undergoes enhancement and optimization through modules such as `custom_intro.py`, `embedding.py`, and `sorting.py`. These modules ensure the documentation is well-structured, concise, and easy to navigate.
5. Publishing
- The final documentation is saved locally in `.auto_doc_cache_file.json` and uploaded to a remote server using the `post_to_server.py` module. Logs are generated and saved to `agd_report.txt` for debugging and monitoring purposes.
6. Extensibility
- The system is designed for flexibility, allowing developers to add new AI models by extending the `ParentModel` class or to integrate new documentation components via the `DocFactory` and `BaseModule` interface.
Key Features
- Automated Documentation Updates: Automatically detects changes in the codebase and updates documentation accordingly.
- AI-Powered Content Generation: Utilizes advanced AI models (e.g., Azure, GPT-3, GPT-4) to generate high-quality documentation.
- Modular Design: Supports the addition of new documentation components and AI models with minimal effort.
- Preprocessing and Post-Processing: Includes utilities for compressing, splitting, embedding, and organizing documentation content.
- CI/CD Integration: Seamlessly integrates with GitHub Actions for automated documentation generation during the CI/CD pipeline.
- Customizable Configurations: Allows users to define project-specific settings, such as ignored files, language preferences, and metadata, via a YAML configuration file.
- Error Handling and Logging: Provides robust error handling with custom exceptions and detailed logging for debugging and monitoring.
- Semantic Search: Adds embedding layers to documentation for enhanced search capabilities.
Dependencies
To run the Auto Doc Generator, the following libraries and tools are required:
- Programming Language:
  - Python (version 3.11 or higher, per `requires-python` in `pyproject.toml`)
- Libraries:
  - AI model libraries (e.g., Azure, OpenAI, Groq APIs)
  - YAML parsing library (e.g., `PyYAML`)
  - Git interaction library (e.g., `GitPython`)
  - Logging utilities (e.g., the `logging` module)
  - JSON handling library (e.g., the `json` module)
  - The `subprocess` module for shell command execution
- External Tools:
  - Git: For change detection and repository analysis.
  - GitHub Actions: For CI/CD pipeline integration.
- Configuration Files:
  - `autodocconfig.yml`: Defines project-specific settings.
  - `.auto_doc_cache_file.json`: Stores cached documentation data.
- Environment Variables:
  - `ADG_API_TOKEN`: API key for authenticating with external services.
  - `DEFAULT_SERVER_URL`: Endpoint for uploading generated documentation.
  - `GITHUB_EVENT_NAME`: GitHub event that triggers the workflow.
Conclusion
The Auto Doc Generator is a cutting-edge tool that streamlines the documentation process for software projects. By leveraging AI-driven content generation, modular architecture, and CI/CD integration, ADG ensures that project documentation remains accurate, up-to-date, and easy to navigate. Its extensible design and robust error-handling mechanisms make it a reliable and scalable solution for modern software development teams.
Executive Navigation Tree
📂 Repository Structure
- Repo Structure
- Codemix Build Repo Content
- Content Description
- Pyproject.toml
- Install Script
- Install Workflow Scripts and API Key Setup
📄 Configuration & Settings
- Autodocconfig Options
- Git Status Check
- Check Git Status Result Schema Class
- Config Reader
- Config Module
- Projectsettings Class
- Cache Settings Class
- Token Auth
⚙️ Logging & Exceptions
📊 Progress Management
🤖 AI Models
- Azure Model
- GPT Model Class
- History Class
- Parentmodel Class
- Model Asyncmodel Classes
- Embedding Class
- Embedding Functions
🔍 Data Processing
- Sorting Functions
- Checker Parse Answer
- Checker Have to Change
- Codemix Should Ignore
- Compressor Functions
- Split Data Function
- Write Docs by Parts Function
- Gen Doc Parts Function
📄 Documentation Factory
🛠️ Modules
📚 Introduction & Postprocessing
🗂️ Manager
Repository Structure Overview
This section outlines the hierarchical structure of the Auto Doc Generator repository, detailing the organization of files and directories that support the CI/CD workflows and the core functionality of the documentation generation system.
Directory Tree
    .github/
        workflows/
            autodoc.yml
            main.yml
            reuseble_agd.yml
    agd_report.txt
    autodocconfig.yml
    autodocgenerator/
        __init__.py
        auto_runner/
            check_git_status.py
            config_reader.py
            post_to_server.py
            run_file.py
            token_auth.py
        config/
            config.py
        engine/
            __init__.py
            config/
                config.py
            exceptions.py
            models/
                azure_model.py
                gpt_model.py
                model.py
        factory/
            __init__.py
            base_factory.py
            modules/
                general_modules.py
                intro.py
        manage.py
        postprocessor/
            custom_intro.py
            embedding.py
            sorting.py
        preprocessor/
            checker.py
            code_mix.py
            compressor.py
            settings.py
            spliter.py
        schema/
            cache_settings.py
            doc_schema.py
        ui/
            __init__.py
            logging.py
            progress_base.py
    install.ps1
    install.sh
    poetry.lock
    pyproject.toml
Key Files and Directories
1. .github/workflows/
- Purpose: Contains GitHub Actions workflows for CI/CD automation and documentation generation.
- Files:
  - `autodoc.yml`: Triggers the AutoDoc workflow on `push` events to the `main` branch or manual dispatch.
  - `main.yml`: Handles CI/CD tasks such as dependency installation and publishing the library to PyPI.
  - `reuseble_agd.yml`: A reusable workflow for generating documentation, posting it to a server, and committing changes.
2. autodocconfig.yml
- Purpose: Configuration file for the Auto Doc Generator system. Defines settings such as ignored files, language preferences, and metadata for documentation generation.
3. autodocgenerator/
- Purpose: Core directory containing the implementation of the Auto Doc Generator system.
- Subdirectories:
  - `auto_runner/`: Handles execution workflows, including Git status checks, configuration parsing, API communication, and main entry points.
  - `config/`: Manages project-specific configurations.
  - `engine/`: Implements AI models and exception handling for documentation generation.
  - `factory/`: Provides modular components for generating specific documentation sections.
  - `postprocessor/`: Enhances and organizes documentation content.
  - `preprocessor/`: Validates and prepares data for documentation generation.
  - `schema/`: Defines schemas for caching and documentation structure.
  - `ui/`: Contains utilities for logging and progress tracking.
4. install.ps1 & install.sh
- Purpose: Scripts for installing dependencies and setting up the environment on Windows (`.ps1`) and Unix-based systems (`.sh`).
5. pyproject.toml & poetry.lock
- Purpose: Configuration files for managing Python dependencies using Poetry.
6. agd_report.txt
- Purpose: Log file generated during the documentation process, containing information about errors, warnings, and model usage.
Functional Flow of Workflows
Autodoc Workflow (autodoc.yml)
- Trigger: Push to the `main` branch or manual dispatch.
- Steps:
  - Run Job: Executes the reusable workflow defined in `reuseble_agd.yml`.
  - Secrets Management: Utilizes `ADG_API_TOKEN` for authentication.
Reusable Workflow (reuseble_agd.yml)
- Trigger: Workflow call with required secrets.
- Steps:
  - Checkout Code: Fetches repository code with full history.
  - Python Setup: Installs Python 3.12 and the `autodocgenerator` package.
  - API Key Retrieval: Runs `token_auth.py` to fetch API keys for external services.
  - Documentation Generation: Executes `run_file.py` to generate documentation.
  - Post to Server: Sends generated documentation to a remote server using `post_to_server.py`.
  - Output Handling: Copies generated documentation to `README.md` and logs to `agd_report.txt`.
  - Commit and Push: Updates the repository with new documentation and logs.
CI/CD Workflow (main.yml)
- Trigger: Push or pull request to the `main` branch affecting `pyproject.toml`.
- Steps:
- Checkout Code: Fetches repository code.
- Python Setup: Installs Python 3.12 and Poetry.
- Dependency Installation: Installs project dependencies using Poetry.
- Library Publishing: Publishes the library to PyPI using Poetry.
Key Context for Workflow Integration
| Entity | Type | Role | Notes |
|---|---|---|---|
| `ADG_API_TOKEN` | Secret | API key for server authentication | Required for posting documentation to the server. |
| `DEFAULT_SERVER_URL` | Environment Var | URL of the documentation server | Used by `post_to_server.py`. |
| `pyproject.toml` | File | Defines project dependencies | Triggers CI/CD workflow on changes. |
| `README.md` | File | Repository documentation | Updated with generated documentation. |
| `.auto_doc_cache_file.json` | File | Cache file for generated documentation | Contains structured documentation data. |
| `agd_report.txt` | File | Log file for documentation generation | Contains logs from the documentation generation process. |
Critical Notes
- The repository is structured to support modular development, allowing for easy integration of new features and components.
- GitHub Actions workflows are designed for seamless automation of documentation generation and library publishing.
- The Auto Doc Generator relies heavily on external API keys and AI models for content generation and processing.
Function: build_repo_content
Purpose
Generates a structured representation of the repository's content, including directory structure and file contents.
Technical Logic Flow
- Initializes a list with the header "Repository Structure:".
- Iterates through all files and directories in the root directory:
  - Skips paths that match ignore patterns using `should_ignore`.
  - Logs ignored paths using `InfoLog`.
  - Adds directory names with indentation based on depth.
  - Adds file names without indentation.
- Appends a separator line (`=`) to the content list.
- Iterates through all files:
  - Reads file contents unless the file matches ignore patterns.
  - Handles exceptions for unreadable files and logs errors.
- Returns the repository structure and file contents as a single string.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `root_dir` | `str` | Root Directory Path | Path of the repository to analyze. |
| `ignore_patterns` | `list[str]` | Ignore Patterns | Patterns to exclude from the analysis. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `content` | `str` | Repository Content | Structured representation of the repository. |
Critical Notes
- Error Handling: The `build_repo_content` function gracefully handles file read errors and logs them in the output.
- Ignore Patterns: The `should_ignore` function uses `fnmatch` for pattern matching, ensuring flexibility in specifying ignore rules.
- Logging Integration: Ignored paths and errors are logged using the `InfoLog` class for traceability.
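For reference, `fnmatch` performs shell-style wildcard matching, which is what makes patterns like `*.pyc` or `__pycache__` work. This is a standalone illustration of the standard-library behavior, not the project's own `should_ignore` code:

```python
from fnmatch import fnmatch

# Shell-style patterns: * matches any run of characters, ? a single one.
print(fnmatch("module.pyc", "*.pyc"))         # True
print(fnmatch("__pycache__", "__pycache__"))  # True
print(fnmatch("module.py", "*.pyc"))          # False
```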
Shared Observations
- AI Model Dependency: The `have_to_change` function relies on the `Model` class for decision-making based on code changes and global information.
- Structured Output: The `build_repo_content` function provides a detailed and hierarchical view of the repository, making it suitable for documentation generation.
- Extensibility: New documentation components can be integrated via the `DocFactory` and `BaseModule` interface.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `data` | `str` | Input Data | Raw documentation data containing HTML links. |
| `links` | `list[str]` | Extracted Links | List of HTML links extracted from the input data. |
| `model` | `Model` | AI Model | Used for generating introductions and descriptions. |
| `language` | `str` | Language Setting | Specifies the language for generated content. |
| `global_data` | `str` | Global Project Info | Contains reusable global documentation data. |
| `custom_description` | `str` | Custom Description Task | User-defined task for generating specific content. |
| `splited_data` | `list[str]` | Split Data Chunks | Smaller chunks of data for iterative processing. |
| `intro_links` | `str` | Generated Link-Based Introduction | Final output of the link-based introduction generation. |
| `intro` | `str` | Generated Global Introduction | Final output of the global introduction generation. |
| `result` | `str` | Generated Custom Description | Final output of the custom description generation process. |
Critical Notes
- Regex-Based Link Extraction: The `get_all_html_links` function relies on regex patterns to identify anchor links. Ensure that the input data adheres to the expected format for accurate extraction.
- AI Model Dependency: All introduction and description generation functions depend heavily on the `Model` class for processing prompts and generating content.
- Strict Formatting Rules: The `generete_custom_discription_without` function enforces strict rules for content formatting, ensuring consistency and adherence to predefined standards.
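As a rough illustration of regex-based anchor extraction: the actual pattern used by `get_all_html_links` is not shown in this document, so the one below is an assumption with the same intent.

```python
import re

def get_all_html_links(data: str) -> list[str]:
    """Extract <a ...>...</a> anchor tags from raw documentation text.

    Hypothetical pattern; the real module's regex may differ.
    """
    return re.findall(r"<a\s+[^>]*>.*?</a>", data, flags=re.DOTALL)

sample = 'Intro <a href="#setup">Setup</a> and <a href="#usage">Usage</a>.'
links = get_all_html_links(sample)
# links == ['<a href="#setup">Setup</a>', '<a href="#usage">Usage</a>']
```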
Shared Observations
- Logging Integration: All major functions use `BaseLogger` for logging progress and outputs, improving traceability during execution.
- Language Flexibility: Functions support multi-language outputs by accepting a `language` parameter, making the module adaptable for diverse documentation needs.
- Iterative Processing: The `generete_custom_discription` function processes data chunks iteratively, ensuring that valid descriptions are generated even if some chunks fail.
- Predefined Prompts: The module relies on predefined instructions (`BASE_INTRODACTION_CREATE_LINKS`, `BASE_INTRO_CREATE`, `BASE_CUSTOM_DISCRIPTIONS`) for consistent content generation across different tasks.
Python Project Configuration: pyproject.toml
Purpose
Defines the metadata, dependencies, and build settings for the Auto Doc Generator project.
Key Sections
1. Project Metadata
| Field | Value | Notes |
|---|---|---|
| `name` | `"autodocgenerator"` | Name of the project. |
| `version` | `"1.6.0.9"` | Current version of the project. |
| `description` | `"This Project helps you to create docs for your projects"` | Brief description of the project. |
| `authors` | `[{name = "dima-on", email = "sinica911@gmail.com"}]` | Author information. |
| `license` | `"MIT"` | License type. |
| `readme` | `"README.md"` | Path to the README file. |
| `requires-python` | `">=3.11,<4.0"` | Python version compatibility. |
2. Dependencies
Specifies the required Python packages for the project. Key dependencies include:
- AI and ML Libraries: `openai`, `azure-ai-inference`, `numpy`
- Data Handling: `pyyaml`, `fastjsonschema`, `pydantic`
- Utilities: `requests`, `tqdm`, `rich`
- Version Control: `dulwich`
- Caching: `CacheControl`
3. Build Settings
| Field | Value | Notes |
|---|---|---|
| `requires` | `["poetry-core>=2.0.0"]` | Specifies the build system requirements. |
| `build-backend` | `"poetry.core.masonry.api"` | Defines the backend for building the project. |
4. Poetry Exclusions
| Field | Value | Notes |
|---|---|---|
| `exclude` | `[".auto_doc_cache_file.json"]` | Excludes specific files from the build process. |
Critical Notes
- Dependency Management: Ensures all required libraries are installed for the project to function correctly.
- Python Compatibility: Restricts the project to Python versions `>=3.11` and `<4.0`.
- Build System: Uses Poetry for dependency management and packaging.
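Pieced together from the tables above, the relevant `pyproject.toml` fragments look roughly like this. Dependency version specifiers and the exact table layout (e.g., whether `exclude` lives under `[tool.poetry]`) are not given in this document, so treat the section names below as assumptions:

```toml
[project]
name = "autodocgenerator"
version = "1.6.0.9"
description = "This Project helps you to create docs for your projects"
authors = [{name = "dima-on", email = "sinica911@gmail.com"}]
license = "MIT"
readme = "README.md"
requires-python = ">=3.11,<4.0"

[build-system]
requires = ["poetry-core>=2.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry]
exclude = [".auto_doc_cache_file.json"]
```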
Summary
The install.sh script and pyproject.toml file together streamline the setup and configuration of the Auto Doc Generator project. The script automates the creation of essential configuration files, while the pyproject.toml file defines the project's dependencies, metadata, and build settings. This setup ensures a seamless and efficient initialization process for developers.
Bash Script: install.sh
Purpose
Automates the setup of the Auto Doc Generator project by creating required configuration files and directories.
Functionality
- Directory Creation:
  - Ensures the `.github/workflows` directory exists before generating the GitHub Actions workflow file.
- GitHub Workflow File Generation:
  - Creates the `autodoc.yml` file for the GitHub Actions workflow.
  - Configures the workflow to use the reusable workflow `reuseble_agd.yml` from the `Drag-GameStudio/ADG` repository.
  - Includes secret handling for `GROCK_API_KEY`.
- Project Configuration File Generation:
  - Generates the `autodocconfig.yml` file with project-specific settings:
    - Dynamically sets the project name based on the current folder name.
    - Configures language settings, ignored files, build settings, and structure settings.
Key Variables
| Variable | Type | Role | Notes |
|---|---|---|---|
| `$content` | string | GitHub Workflow Configuration | Contains the YAML configuration for the workflow. |
| `$currentFolderName` | string | Current Folder Name | Dynamically retrieves the name of the current directory. |
| `$configContent` | string | Project Configuration Content | Contains the YAML configuration for `autodocconfig.yml`. |
Generated Files
1. .github/workflows/autodoc.yml
    name: AutoDoc
    on: [workflow_dispatch]

    jobs:
      run:
        permissions:
          contents: write
        uses: Drag-GameStudio/ADG/.github/workflows/reuseble_agd.yml@main
        secrets:
          GROCK_API_KEY: ${{ secrets.GROCK_API_KEY }}

Note: The `$` symbol is escaped in the Bash script so that it is written literally into the generated file.
2. autodocconfig.yml
    project_name: "<current-folder-name>"
    language: "en"

    ignore_files:
      # Python bytecode and cache
      - "*.pyc"
      - "*.pyo"
      - "*.pyd"
      - "__pycache__"
      - ".ruff_cache"
      - ".mypy_cache"
      - ".auto_doc_cache"
      - ".auto_doc_cache_file.json"

      # Environments and IDE settings
      - "venv"
      - "env"
      - ".venv"
      - ".env"
      - ".vscode"
      - ".idea"
      - "*.iml"

      # Databases and binary data
      - "*.sqlite3"
      - "*.db"
      - "*.pkl"
      - "data"

      # Logs and coverage reports
      - "*.log"
      - ".coverage"
      - "htmlcov"

      # Version control and assets
      - ".git"
      - ".gitignore"
      - "migrations"
      - "static"
      - "staticfiles"

      # Miscellaneous
      - "*.pdb"
      - "*.md"

    build_settings:
      save_logs: false
      log_level: 2

    structure_settings:
      include_intro_links: true
      include_intro_text: true
      include_order: true
      use_global_file: true
      max_doc_part_size: 5000

Dynamic Project Name: The `project_name` is automatically set to the name of the current directory using `basename "$PWD"`.
Critical Notes
- Directory Creation: Ensures the `.github/workflows` directory exists before generating the workflow file.
- Dynamic Project Name: Automatically sets the project name in `autodocconfig.yml` based on the current folder name.
- Ignored Files: Includes a comprehensive list of file patterns to exclude from documentation generation.
- Extensibility: The script can be modified to include additional settings or configurations as needed.
The installation workflow involves using platform-specific scripts (install.ps1 for PowerShell and install.sh for Linux-based systems) to set up the required environment. Below is a detailed explanation:
Installation Process
- PowerShell Installation:
  - Execute the following command in PowerShell to run the `install.ps1` script:

        irm https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.ps1 | iex

    This command uses `Invoke-RestMethod` (`irm`) to fetch the script from the specified URL and pipes it to `Invoke-Expression` (`iex`) for execution.
- Linux-Based Systems Installation:
  - Run the following command in the terminal to execute the `install.sh` script:

        curl -sSL https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.sh | bash

    Here, `curl` fetches the script from the provided URL, and `bash` executes it directly.
- Run the following command in the terminal to execute the
GitHub Actions Configuration
To ensure the workflow operates correctly, you need to add a secret variable to your GitHub Actions configuration:
- Secret Variable Setup:
- Navigate to your GitHub repository settings.
- Under the "Secrets and variables" section, click on "Actions".
  - Add a new secret with the name `GROCK_API_KEY`.
  - Use the API key obtained from the Grock documentation as the value for this secret.
By following these steps, the installation scripts will be executed properly, and the required API key will be securely integrated into your GitHub Actions workflow.
The autodocconfig.yml file is used to configure the behavior of the Auto Doc Generator project. Below are the available options and their descriptions based on the provided context:
- `project_name`:
  - Specifies the name of the project.
  - Example: `"Auto Doc Generator"`
- `language`:
  - Defines the language of the documentation.
  - Example: `"en"`
- `ignore_files`:
  - A list of files or directories to be ignored during the documentation generation process.
  - Supports specific file names, file extensions (e.g., `*.pyc`), and directory names.
  - Example:

        ignore_files:
          - "dist"
          - "*.pyc"
          - "__pycache__"
          - "venv"
          - ".git"

- `build_settings`:
  - Configures settings related to the build process.
  - Options:
    - `save_logs`: Boolean value to enable or disable saving logs. Example: `false`
    - `log_level`: Integer value to set the verbosity of logs. Example: `2`
    - `threshold_changes`: Integer value to define the maximum number of changes before triggering specific actions. Example: `20000`
- `structure_settings`:
  - Configures the structure of the generated documentation.
  - Options:
    - `include_intro_links`: Boolean value to include introductory links in the documentation. Example: `true`
    - `include_intro_text`: Boolean value to include introductory text in the documentation. Example: `true`
    - `include_order`: Boolean value to maintain the order of included sections. Example: `true`
    - `use_global_file`: Boolean value to determine if a global file should be used. Example: `true`
    - `max_doc_part_size`: Integer value to set the maximum size of a documentation part (in characters). Example: `4000`
- `project_additional_info`:
  - Provides a description or additional information about the project.
  - Example:

        project_additional_info:
          global idea: "This project was created to help developers make documentations for them projects"

- `custom_descriptions`:
  - A list of custom descriptions or instructions to include in the documentation.
  - Example:

        custom_descriptions:
          - "explain how to write autodocconfig.yml file what options are available"
          - "explain how to use Manager class and what methods are available"
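The `max_doc_part_size` option caps how much text is handed to the model at once. A naive splitter with the same intent might look like this; it is only a sketch, since the project's actual `spliter.py` logic is not shown in this document and likely splits on more meaningful boundaries:

```python
def split_data(data: str, max_part_size: int = 4000) -> list[str]:
    """Split text into chunks of at most max_part_size characters."""
    return [data[i:i + max_part_size]
            for i in range(0, len(data), max_part_size)]

parts = split_data("x" * 9000, max_part_size=4000)
# Three chunks: 4000 + 4000 + 1000 characters.
```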
Git Status Check and Change Detection
This module is responsible for detecting changes in the Git repository to determine whether documentation updates are required. It analyzes the differences between the latest commit and the previous commit stored in the cache, and generates a detailed report of the changes.
Functional Role
The check_git_status module performs the following tasks:
- Retrieves the latest Git commit hash.
- Compares the current state of the repository with the previous commit.
- Generates a detailed report of added, deleted, and modified files.
- Determines whether documentation updates are necessary based on the detected changes.
Technical Logic Flow
1. Retrieve Latest Commit Hash:
   - The `get_git_revision_hash()` function uses the `git rev-parse HEAD` command to fetch the hash of the latest commit.
2. Calculate Diff:
   - The `get_diff_by_hash(target_hash)` function executes `git diff` to calculate the differences between the target commit and the current state of the repository. It excludes `.md` files from the comparison.
3. Detailed Diff Statistics:
   - The `get_detailed_diff_stats(target_hash)` function runs `git diff --numstat` to generate a detailed report of added, deleted, and modified files. It categorizes the changes as:
     - ADDED: Files added to the repository.
     - DELETED: Files removed from the repository.
     - MODIFIED: Files with both additions and deletions.
4. Change Detection:
   - The `check_git_status(manager)` function determines whether documentation updates are required based on the changes detected. If the GitHub event is `workflow_dispatch` or the cache does not contain a previous commit, it marks the documentation for regeneration.
Inputs, Outputs, and Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `target_hash` | `str` | Target commit hash for comparison | Used in `get_diff_by_hash` and `get_detailed_diff_stats`. |
| `manager.cache_settings` | `CacheSettings` | Stores cached settings and last commit | Contains the last processed commit hash. |
| `GITHUB_EVENT_NAME` | `str` | GitHub event triggering the workflow | Determines if the workflow was manually triggered (`workflow_dispatch`). |
| `changes` | `list[dict]` | List of file change details | Contains added, deleted, and modified file statistics. |
| `CheckGitStatusResultSchema` | Schema | Result schema for change detection | Indicates whether documentation updates are required. |
Function Breakdown
get_git_revision_hash()
- Purpose: Fetches the latest Git commit hash.
- Logic: Executes `git rev-parse HEAD` and decodes the result.
- Output: Returns the commit hash as a string.
get_diff_by_hash(target_hash)
- Purpose: Retrieves the diff between the target commit and the current state.
- Logic: Executes `git diff` with the `HEAD` and `target_hash` arguments, excluding `.md` files.
- Output: Returns the diff as a string.
get_detailed_diff_stats(target_hash)
- Purpose: Generates a detailed report of file changes.
- Logic:
  - Executes `git diff --numstat`.
  - Parses the output to extract added, deleted, and modified file statistics.
  - Categorizes changes into `ADDED`, `DELETED`, and `MODIFIED`.
- Output: Returns a list of dictionaries containing file change details.
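The `--numstat` output is tab-separated `added<TAB>deleted<TAB>path` lines, so the categorization step can be sketched like this. This is a simplified stand-in for `get_detailed_diff_stats`: the real function shells out to git first, and numstat alone cannot distinguish a brand-new file from pure additions to an existing one, so the status heuristic below is an assumption.

```python
def categorize_numstat(numstat_output: str) -> list[dict]:
    """Turn `git diff --numstat` text into ADDED/DELETED/MODIFIED records."""
    changes = []
    for line in numstat_output.strip().splitlines():
        added, deleted, path = line.split("\t")
        # Binary files report "-" for both counts; treat them as modified.
        added = 0 if added == "-" else int(added)
        deleted = 0 if deleted == "-" else int(deleted)
        if deleted == 0 and added > 0:
            status = "ADDED"
        elif added == 0 and deleted > 0:
            status = "DELETED"
        else:
            status = "MODIFIED"
        changes.append({"file": path, "added": added,
                        "deleted": deleted, "status": status})
    return changes

sample = "10\t0\tnew_module.py\n0\t25\told_module.py\n3\t4\tmanage.py"
changes = categorize_numstat(sample)
# Statuses: ADDED, DELETED, MODIFIED respectively.
```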
check_git_status(manager)
- Purpose: Determines if documentation updates are required.
- Logic:
  - Checks if the GitHub event is `workflow_dispatch` or if no previous commit is cached.
  - Fetches detailed diff statistics using `get_detailed_diff_stats`.
  - Calls `manager.check_sense_changes(changes)` to analyze the impact of changes.
- Output: Returns a `CheckGitStatusResultSchema` indicating whether updates are needed.
Critical Notes
- The module excludes `.md` files from the diff analysis to avoid unnecessary documentation regeneration.
- If the GitHub event is `workflow_dispatch`, the documentation is always marked for regeneration.
- File changes are categorized into `ADDED`, `DELETED`, and `MODIFIED` based on the number of lines added and deleted.
Class: CheckGitStatusResultSchema
Purpose
Represents the result of checking the Git status to determine if documentation updates are needed.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `need_to_remake` | `bool` | Update Flag | Indicates if documentation needs to be updated. |
| `remake_gl_file` | `bool` | Global File Update Flag | Indicates if the global documentation file needs to be updated. |
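A minimal stand-in for the schema, sketched here as a dataclass (the project may instead use pydantic, which is among its dependencies):

```python
from dataclasses import dataclass

@dataclass
class CheckGitStatusResultSchema:
    need_to_remake: bool   # regenerate documentation at all?
    remake_gl_file: bool   # regenerate the global documentation file too?

result = CheckGitStatusResultSchema(need_to_remake=True, remake_gl_file=False)
```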
Critical Notes
- Iterative Processing: The `gen_doc_parts` function processes chunks iteratively, ensuring scalability for large codebases.
- Caching: The `CacheSettings` class enables efficient caching of documentation and commit data, reducing redundant processing.
- Progress Tracking: The progress tracker provides real-time feedback during documentation generation, improving user experience.
config_reader.py: Configuration Parsing and Initialization
Functional Role
The config_reader.py module is responsible for parsing the autodocconfig.yml file and initializing the core configuration objects (Config, StructureSettings, and custom modules). It serves as the entry point for setting up the documentation generation workflow based on user-defined settings.
Inputs, Outputs, and Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `file_data` | `str` | YAML configuration file content | Contains user-defined settings for the documentation generation process. |
| `data` | `dict[str, Any]` | Parsed YAML data | Extracted from `file_data` using `yaml.safe_load`. |
| `Config` | `Config` | Centralized configuration object | Stores language, project name, ignored files, and additional info. |
| `StructureSettings` | `StructureSettings` | Defines structural rules for documentation | Includes settings like intro links, order, and global file usage. |
| `custom_modules` | `list[BaseModule]` | List of modular components for documentation | Generated based on `custom_descriptions` in the YAML file. |
Function Breakdown
read_config(file_data: str)
- Purpose: Parses the YAML configuration file and initializes the core configuration objects.
- Logic:
  - Loads the YAML data using `yaml.safe_load`.
  - Initializes a `Config` object and sets its properties:
    - Language (`language`)
    - Project name (`project_name`)
    - Ignored files (`ignore_files`)
    - Project additional information (`project_additional_info`)
    - Project build settings (`build_settings`) via `ProjectBuildConfig`.
  - Creates a list of `BaseModule` instances based on `custom_descriptions`:
    - If a description starts with `%`, a `CustomModuleWithOutContext` is created.
    - Otherwise, a `CustomModule` is created.
  - Initializes a `StructureSettings` object and loads its properties from `structure_settings`.
- Output: Returns a tuple containing:
  - A `Config` object
  - A list of `BaseModule` instances
  - A `StructureSettings` object
Class Breakdown
StructureSettings
- Purpose: Defines structural rules for documentation generation.
- Attributes:
  - `include_intro_links`: Whether to include introductory links in the documentation.
  - `include_order`: Whether to sort documentation sections in a specific order.
  - `use_global_file`: Whether to use a global file for reusable documentation.
  - `max_doc_part_size`: Maximum size of a documentation part (default: 5000 characters).
  - `include_intro_text`: Whether to include introductory text in the documentation.
- Methods:
  - `load_settings(data: dict[str, Any])`: Dynamically loads settings from a dictionary.
Critical Notes
- The `read_config` function is tightly coupled with the `autodocconfig.yml` file structure. Any changes to the YAML schema must be reflected in this function.
- The `custom_descriptions` field in the YAML file supports special syntax:
  - A `%` prefix indicates a `CustomModuleWithOutContext`.
  - Without `%`, a `CustomModule` is created.
- `StructureSettings` provides flexibility for customizing the structure and size of the generated documentation.
config.py: Project Configuration Management
Functional Role
The config.py module defines and manages project-specific configurations, including ignored files, language settings, project metadata, and build settings. It provides a centralized configuration object (Config) that can be customized and extended dynamically during runtime.
Inputs, Outputs, and Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `ignore_files` | `list[str]` | List of file patterns to ignore | Default patterns include common temporary files, directories, and caches. |
| `language` | `str` | Language for documentation generation | Default is `"en"`. |
| `project_name` | `str` | Name of the project | Can be dynamically set using `set_project_name()`. |
| `project_additional_info` | `dict[str, str]` | Additional metadata for the project | Key-value pairs added via `add_project_additional_info()`. |
| `pbc` | `ProjectBuildConfig` | Build configuration settings | Includes logging and change threshold settings. |
Class Breakdown
ProjectBuildConfig
- Purpose: Encapsulates build-specific settings for the project.
- Attributes:
  - `save_logs`: Boolean flag to enable or disable logging.
  - `log_level`: Integer representing the logging verbosity level.
  - `threshold_changes`: Integer threshold for detecting significant changes in the project.
- Methods:
  - `load_settings(data: dict[str, Any])`: Dynamically loads settings from a dictionary and assigns them to the corresponding attributes.
Config
- Purpose: Represents the main configuration object for the project.
- Attributes:
  - `ignore_files`: List of file patterns to exclude during processing.
  - `language`: Language setting for documentation generation.
  - `project_name`: Name of the project.
  - `project_additional_info`: Dictionary for storing additional project metadata.
  - `pbc`: Instance of `ProjectBuildConfig` for managing build settings.
- Methods:
  - `set_language(language: str)`: Updates the language setting.
  - `set_pcs(pcs: ProjectBuildConfig)`: Updates the project build configuration.
  - `set_project_name(name: str)`: Sets the project name.
  - `add_project_additional_info(key: str, value: str)`: Adds metadata to `project_additional_info`.
  - `add_ignore_file(pattern: str)`: Appends a new file pattern to the `ignore_files` list.
  - `get_project_settings()`: Creates and returns a `ProjectSettings` object populated with the current configuration.
Visible Interactions
- Integration with `ProjectSettings`:
  - The `get_project_settings()` method initializes a `ProjectSettings` object using the current configuration.
  - Additional project metadata is added to the `ProjectSettings` object dynamically.
- Interaction with Other Modules:
  - Imports `CustomModule` and `IntroLinks` from the `factory.modules` package, indicating potential use in modular documentation generation.
  - Imports `ProjectSettings` from the `preprocessor.settings` module, suggesting integration with preprocessing workflows.
  - Imports `DocFactory` from the `factory.base_factory` module, indicating its role in modular documentation generation.
Technical Logic Flow
- Initialization:
  - The `Config` class is instantiated with default values for `ignore_files`, `language`, and `project_additional_info`.
  - A `ProjectBuildConfig` instance is created and assigned to the `pbc` attribute.
- Dynamic Configuration:
  - Methods like `set_language()`, `set_project_name()`, and `add_project_additional_info()` allow dynamic customization of the configuration during runtime.
  - The `load_settings()` method in `ProjectBuildConfig` enables bulk updates to build-specific settings.
- Project Settings Generation:
  - The `get_project_settings()` method creates a `ProjectSettings` object using the current configuration.
  - Additional metadata from `project_additional_info` is added to the `ProjectSettings` object.
Critical Notes
- The `ignore_files` list includes patterns for common temporary files, directories, and caches. These can be extended dynamically using `add_ignore_file()`.
- The `ProjectBuildConfig` class provides flexibility for managing build-specific settings, such as logging and change thresholds.
- The `Config` class is designed to be extensible, allowing for dynamic updates to project settings during runtime.
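The dynamic-configuration flow above can be sketched as follows. Method names follow the text; the default ignore patterns and `ProjectBuildConfig` defaults are illustrative assumptions, not the real values.

```python
class ProjectBuildConfig:
    """Build-specific settings; the defaults here are assumptions."""
    def __init__(self) -> None:
        self.save_logs = False
        self.log_level = 1
        self.threshold_changes = 0

class Config:
    def __init__(self) -> None:
        # Illustrative default ignore patterns, not the real list.
        self.ignore_files = ["__pycache__", "*.pyc", ".git"]
        self.language = "en"
        self.project_name = ""
        self.project_additional_info: dict[str, str] = {}
        self.pbc = ProjectBuildConfig()

    def set_language(self, language: str) -> None:
        self.language = language

    def set_project_name(self, name: str) -> None:
        self.project_name = name

    def add_project_additional_info(self, key: str, value: str) -> None:
        self.project_additional_info[key] = value

    def add_ignore_file(self, pattern: str) -> None:
        self.ignore_files.append(pattern)

config = Config()
config.set_project_name("Auto Doc Generator")
config.add_ignore_file("*.log")
config.add_project_additional_info("license", "MIT")
print(config.ignore_files[-1], config.project_additional_info)
```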
Class: ProjectSettings
Purpose
Manages project-specific metadata and generates a prompt for AI model interactions.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `project_name` | `str` | Project Name | Name of the project. |
| `info` | `dict[str, str]` | Project Metadata | Key-value pairs of project-specific information. |
Methods
`add_info(key, value)`
Adds a key-value pair to the `info` dictionary.
`prompt` (property)
Generates a formatted string containing project metadata for use in AI model prompts.
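A minimal sketch of the dynamic `prompt` property. The exact wording of the generated prompt string is an assumption; only the mechanism (a property that formats `project_name` plus the `info` pairs) comes from the text.

```python
class ProjectSettings:
    def __init__(self, project_name: str) -> None:
        self.project_name = project_name
        self.info: dict[str, str] = {}

    def add_info(self, key: str, value: str) -> None:
        self.info[key] = value

    @property
    def prompt(self) -> str:
        # Assemble one metadata line per key-value pair (format assumed).
        lines = [f"Project: {self.project_name}"]
        lines += [f"- {key}: {value}" for key, value in self.info.items()]
        return "\n".join(lines)

ps = ProjectSettings("Auto Doc Generator")
ps.add_info("language", "en")
print(ps.prompt)
```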
Critical Notes
- Dynamic Prompt Generation: The `prompt` property dynamically constructs a string based on the project's metadata, ensuring flexibility for different projects.
- Extensibility: The `ProjectSettings` class can be extended to include additional attributes or methods as needed.
Class: CacheSettings
Purpose
Manages caching of the last commit and previously generated documentation.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `last_commit` | `str` | Last Commit Hash | Stores the hash of the last Git commit. |
| `doc` | `DocInfoSchema` | Documentation Schema | Stores the structure of the generated documentation. |
token_auth.py: API Token Authentication
Functional Role
The token_auth.py module retrieves API keys for external services (e.g., GitHub, Google Embedding) and writes them to the environment file (GITHUB_ENV) for subsequent workflow steps.
Inputs, Outputs, and Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `ADG_API_TOKEN` | `str` | API authentication token | Retrieved from environment variables. |
| `DEFAULT_SERVER_URL` | `str` | Base URL of the API server | Retrieved from environment variables. |
| `MODELS_API_KEYS` | `str` | GitHub token for AI models | Retrieved from the API response. |
| `GOOGLE_EMBEDDING_API_KEY` | `str` | Google token for embedding operations | Retrieved from the API response. |
| `TYPE_OF_MODEL` | `str` | Specifies the type of AI model | Default value is `git`. |
| `GITHUB_ENV` | `str` | Path to the GitHub environment file | Used to store retrieved API keys for subsequent workflow steps. |
Function Breakdown
main()
- Purpose: Retrieves API keys from the server and writes them to the environment file.
- Logic:
  - Retrieves `ADG_API_TOKEN` and `DEFAULT_SERVER_URL` from environment variables.
  - Constructs the API endpoint (`{DEFAULT_SERVER_URL}/github/get_api_keys`) and sends a GET request.
  - Validates the server response:
    - Checks for a successful status (`data["status"] == "success"`).
    - Extracts `github_token` and `google_token` from the response data.
  - Writes the retrieved keys to the environment file (`GITHUB_ENV`) for use in subsequent steps.
  - Prints success or error messages based on the outcome.
- Output: None (writes keys to `GITHUB_ENV` or prints them locally).
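The environment-file hand-off described above can be sketched as follows. The network request is replaced by a stubbed response shaped like the one in the text, the token values are dummy placeholders, and the file path is a temporary stand-in for the real `$GITHUB_ENV` path.

```python
import os
import tempfile

def write_keys_to_github_env(env_path: str, keys: dict[str, str]) -> None:
    # GitHub Actions exposes KEY=value lines appended to the $GITHUB_ENV file
    # as environment variables in subsequent workflow steps.
    with open(env_path, "a", encoding="utf-8") as env_file:
        for name, value in keys.items():
            env_file.write(f"{name}={value}\n")

# Simulated server response, shaped like the payload described above.
data = {"status": "success", "github_token": "gh-dummy", "google_token": "gg-dummy"}
if data["status"] == "success":
    keys = {
        "MODELS_API_KEYS": data["github_token"],
        "GOOGLE_EMBEDDING_API_KEY": data["google_token"],
        "TYPE_OF_MODEL": "git",  # default model type stated in the text
    }
    env_path = os.path.join(tempfile.mkdtemp(), "github_env")
    write_keys_to_github_env(env_path, keys)
    with open(env_path, encoding="utf-8") as fh:
        content = fh.read()
    print(content)
```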
Critical Notes
- The module assumes the presence of the environment variables `ADG_API_TOKEN`, `DEFAULT_SERVER_URL`, and `GITHUB_ENV`.
- Failure to retrieve API keys or write to `GITHUB_ENV` results in error messages and termination (`exit(1)`).
- `TYPE_OF_MODEL` is set to `git` by default but can be modified based on the retrieved keys.
exceptions.py: Custom Exception Handling
Functional Role
The exceptions.py module defines custom exceptions for handling specific error scenarios within the Auto Doc Generator system.
Class Breakdown
ModelExhaustedException
- Purpose: Raised when no AI models are available for use in the documentation generation process.
- Attributes: None.
- Methods: None.
Critical Notes
- `ModelExhaustedException` is intended to signal the exhaustion of available AI models, allowing for graceful error handling and fallback mechanisms.
- This exception can be used in workflows involving AI model selection, ensuring that the system does not proceed without a valid model.
Module: logging.py
Purpose
Provides logging utilities for debugging, monitoring, and tracking application events.
Classes
Class: BaseLog
Purpose
Represents a base log message with customizable log levels.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `message` | `str` | Log Message | Stores the log message. |
| `level` | `int` | Log Level | Indicates the severity of the log. |
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `format` | None | `str` | Formats the log message. | Returns the log message as a string. |
Class: ErrorLog
Purpose
Represents an error-level log message.
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `format` | None | `str` | Formats the error log message. | Prepends `[ERROR]` to the log message. |
Class: WarningLog
Purpose
Represents a warning-level log message.
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `format` | None | `str` | Formats the warning log message. | Prepends `[WARNING]` to the log message. |
Class: InfoLog
Purpose
Represents an info-level log message.
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `format` | None | `str` | Formats the info log message. | Prepends `[INFO]` to the log message. |
Class: BaseLoggerTemplate
Purpose
Provides a template for logging messages to various outputs.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `log_level` | `int` | Log Level Filter | Filters logs based on their severity. |
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `log` | `log: BaseLog` | `None` | Logs the message to the console. | Prints the formatted log message. |
| `global_log` | `log: BaseLog` | `None` | Logs the message globally. | Filters logs based on `log_level`. |
Class: FileLoggerTemplate
Purpose
Extends BaseLoggerTemplate to log messages to a file.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `file_path` | `str` | Log File Path | Specifies the file path for logging. |
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `log` | `log: BaseLog` | `None` | Logs the message to the specified file. | Appends the formatted log message to the file. |
Class: BaseLogger
Purpose
Provides a singleton logger instance for centralized logging.
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `set_logger` | `logger: BaseLoggerTemplate` | `None` | Sets the logger template for the instance. | Allows customization of logging behavior. |
| `log` | `log: BaseLog` | `None` | Logs the message using the configured template. | Delegates logging to the `logger_template`. |
Critical Notes
- Singleton Design: `BaseLogger` ensures a single logger instance throughout the application.
- Extensibility: `FileLoggerTemplate` and `BaseLoggerTemplate` allow flexible logging to different outputs.
- Log Level Filtering: Logs can be filtered based on severity using `log_level`.
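The singleton-plus-template design above can be sketched as follows. Class and method names follow the text; the internals (including the `records` list used here so the behavior is observable, and the `__new__`-based singleton) are assumptions. `WarningLog` and `InfoLog` would follow the same pattern as `ErrorLog`.

```python
class BaseLog:
    def __init__(self, message: str, level: int = 0) -> None:
        self.message = message
        self.level = level

    def format(self) -> str:
        return self.message

class ErrorLog(BaseLog):
    def format(self) -> str:
        return f"[ERROR] {self.message}"

class BaseLoggerTemplate:
    def __init__(self, log_level: int = 0) -> None:
        self.log_level = log_level
        self.records: list[str] = []  # kept for demonstration purposes

    def log(self, log: BaseLog) -> None:
        formatted = log.format()
        self.records.append(formatted)
        print(formatted)

    def global_log(self, log: BaseLog) -> None:
        # Only emit logs at or above the configured severity threshold.
        if log.level >= self.log_level:
            self.log(log)

class BaseLogger:
    _instance = None

    def __new__(cls):
        # Singleton: every call returns the same instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.logger_template = BaseLoggerTemplate()
        return cls._instance

    def set_logger(self, logger: BaseLoggerTemplate) -> None:
        self.logger_template = logger

    def log(self, log: BaseLog) -> None:
        self.logger_template.log(log)

logger = BaseLogger()
logger.log(ErrorLog("model unavailable", level=2))
print(BaseLogger() is logger)
```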
post_to_server.py: Documentation Publishing via API
Functional Role
The post_to_server.py module handles the final step of the documentation generation workflow by uploading the generated documentation to a remote server using an API. It reads the cached documentation file (.auto_doc_cache_file.json) and sends it to the server.
Inputs, Outputs, and Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `ADG_API_TOKEN` | `str` | API authentication token | Retrieved from environment variables. |
| `DEFAULT_SERVER_URL` | `str` | Base URL of the documentation server | Retrieved from environment variables. |
| `REPO_ID` | `str` | Repository identifier | Used to specify the target repository for documentation upload. |
| `.auto_doc_cache_file.json` | `str` | Cache file containing documentation | Contains the generated documentation in JSON format. |
Function Breakdown
main()
- Purpose: Uploads the generated documentation to the remote server.
- Logic:
  - Retrieves `ADG_API_TOKEN` and `DEFAULT_SERVER_URL` from environment variables.
  - Reads the cached documentation file (`.auto_doc_cache_file.json`).
  - Sends a POST request to the server API:
    - URL: `{DEFAULT_SERVER_URL}/docs/{REPO_ID}/push`
    - Headers: Includes `Authorization` with the API token.
    - Payload: Contains the cached documentation as JSON.
  - Raises an exception if the request fails (`result.raise_for_status()`).
  - Prints the response data from the server.
- Output: None (prints the server response to the console).
Critical Notes
- The module assumes the presence of the environment variables `ADG_API_TOKEN`, `DEFAULT_SERVER_URL`, and `REPO_ID`, and of the cache file `.auto_doc_cache_file.json`.
- The server API endpoint is dynamically constructed from `DEFAULT_SERVER_URL` and `REPO_ID`.
- Failure to authenticate or upload documentation raises an exception (`result.raise_for_status()`).
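Assembling the push request described above can be sketched as follows. The network call itself is omitted; the server URL, repository ID, token value, and `Authorization` header scheme are all hypothetical placeholders for illustration.

```python
import json
import os
import tempfile

def build_push_request(server_url: str, repo_id: str, token: str, cache_path: str):
    """Assemble the URL, headers, and JSON payload for the documentation push."""
    url = f"{server_url}/docs/{repo_id}/push"
    headers = {"Authorization": token}  # exact header scheme is an assumption
    with open(cache_path, encoding="utf-8") as fh:
        payload = json.load(fh)
    return url, headers, payload

# Simulate the cache file that the real module reads.
cache_path = os.path.join(tempfile.mkdtemp(), ".auto_doc_cache_file.json")
with open(cache_path, "w", encoding="utf-8") as fh:
    json.dump({"doc": "generated documentation"}, fh)

url, headers, payload = build_push_request(
    "https://example.invalid", "repo-123", "token-abc", cache_path
)
print(url)
```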
run_file.py: Main Entry Point for Documentation Generation
Functional Role
The run_file.py module serves as the primary entry point for the Auto Doc Generator system. It orchestrates the entire documentation generation workflow, including initialization, change detection, documentation creation, post-processing, and saving the final output.
Inputs, Outputs, and Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `project_path` | `str` | Path to the project directory | Specifies the root directory of the project to be documented. |
| `config` | `Config` | Configuration object | Parsed from `autodocconfig.yml` using the `read_config()` function. |
| `custom_modules` | `list[BaseModule]` | List of custom modules | Modules for generating specific sections of documentation. |
| `structure_settings` | `StructureSettings` | Documentation structure settings | Defines structural rules for documentation generation. |
| `MODELS_CONFIG` | `dict` | Mapping of model types to classes | Specifies available AI models (`GPT4oModel`, `AzureModel`, `GPTModel`). |
| `sync_model` | `Model` | AI model instance | Used for generating documentation content. |
| `embedding_model` | `Embedding` | Embedding model instance | Handles embedding operations for document search functionality. |
| `change_info` | `CheckGitStatusResultSchema` | Change detection result | Indicates whether documentation updates are required. |
| `.auto_doc_cache_file.json` | `str` | Cache file containing documentation | Stores the generated documentation for reuse or upload. |
Function Breakdown
gen_doc()
- Purpose: Generates structured documentation for the specified project.
- Logic:
  - Initialization:
    - Selects the AI model based on `TYPE_OF_MODEL` from `MODELS_CONFIG`.
    - Initializes the `Manager` with the project path, configuration, AI model, embedding model, and progress tracker.
  - Change Detection:
    - Calls `check_git_status(manager)` to determine if documentation updates are required.
    - If no changes are detected, loads cached documentation and terminates the workflow early.
  - Documentation Generation:
    - Calls `manager.generate_code_file()` to create base documentation files.
    - Optionally generates global reusable documentation (`manager.generate_global_info()`) based on `structure_settings`.
    - Splits documentation into manageable parts (`manager.generete_doc_parts()`).
    - Uses `DocFactory` with `custom_modules` to generate modular documentation sections.
  - Post-Processing:
    - Optionally adds introductory text and links using additional modules (`IntroText`, `IntroLinks`).
    - Reorders document content (`manager.order_doc()`).
    - Creates embedding layers for document search functionality (`manager.create_embedding_layer()`).
  - Finalization:
    - Clears the cache and saves the generated documentation (`manager.save()`).
- Output: Returns the full documentation content (`manager.doc_info.doc.get_full_doc()`).
Critical Notes
- The workflow terminates early if no changes are detected (`sys.exit(0)`).
- The AI model and embedding model are dynamically selected based on environment variables (`MODELS_API_KEYS`, `GOOGLE_EMBEDDING_API_KEY`).
- `structure_settings` controls optional features such as global files, introductory sections, and document ordering.
Security Considerations
- Ensure sensitive keys (`MODELS_API_KEYS`, `GOOGLE_EMBEDDING_API_KEY`) are securely retrieved and stored.
- Validate the integrity of `autodocconfig.yml` to avoid misconfigurations.
- Handle missing or invalid environment variables gracefully to prevent runtime errors.
Progress Management Classes
Class: BaseProgress
Purpose
Provides a base interface for managing progress tracking tasks. This class is designed to be extended by specific implementations for different progress tracking mechanisms.
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `create_new_subtask` | `name: str, total_len: int` | `None` | Creates a new subtask for tracking. | Abstract method; implementation required in subclasses. |
| `update_task` | None | `None` | Updates the progress of the current task. | Abstract method; implementation required in subclasses. |
| `remove_subtask` | None | `None` | Removes the current subtask. | Abstract method; implementation required in subclasses. |
Class: LibProgress
Purpose
Extends BaseProgress to provide progress tracking using the rich.progress library.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `progress` | `Progress` | Rich Progress Instance | Used to manage and display progress bars. |
| `_base_task` | `Task` | General Progress Task | Represents the overall progress of the operation. |
| `_cur_sub_task` | `Task` | Current Subtask | Tracks progress for the current subtask. |
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `create_new_subtask` | `name: str, total_len: int` | `None` | Creates a new subtask with a progress bar. | Adds a task to the `rich.progress` instance. |
| `update_task` | None | `None` | Updates the progress of the current task. | Advances the progress bar for either the base task or the current subtask. |
| `remove_subtask` | None | `None` | Removes the current subtask. | Resets the `_cur_sub_task` attribute to `None`. |
Class: ConsoleTask
Purpose
Provides a simple console-based progress tracker for tasks.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `name` | `str` | Task Name | Specifies the name of the task. |
| `total_len` | `int` | Total Length | Defines the total progress length. |
| `current_len` | `int` | Current Progress | Tracks the current progress. |
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `start_task` | None | `None` | Initializes the task progress. | Prints the task name and total length. |
| `progress` | None | `None` | Updates and displays the task progress. | Calculates and prints the percentage of completion. |
Class: ConsoleGtiHubProgress
Purpose
Extends BaseProgress to provide console-based progress tracking for GitHub workflows.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `curr_task` | `ConsoleTask` | Current Subtask | Tracks the current subtask progress. |
| `gen_task` | `ConsoleTask` | General Progress Task | Tracks overall progress. |
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `create_new_subtask` | `name: str, total_len: int` | `None` | Creates a new subtask for tracking. | Initializes a new `ConsoleTask` instance for the subtask. |
| `update_task` | None | `None` | Updates the progress of the current task. | Updates either the current subtask or the general task. |
| `remove_subtask` | None | `None` | Removes the current subtask. | Resets the `curr_task` attribute to `None`. |
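The `ConsoleTask` percentage arithmetic can be sketched as follows. The output format is an assumption, and `progress()` returns the computed percentage here so the example is checkable (the method described above returns `None`).

```python
class ConsoleTask:
    """Console progress tracker; the printed format is an assumption."""
    def __init__(self, name: str, total_len: int) -> None:
        self.name = name
        self.total_len = total_len
        self.current_len = 0

    def start_task(self) -> None:
        print(f"{self.name}: 0/{self.total_len}")

    def progress(self) -> int:
        # Advance by one step and report the completion percentage.
        self.current_len += 1
        percent = 100 * self.current_len // self.total_len
        print(f"{self.name}: {percent}%")
        return percent

task = ConsoleTask("generate docs", 4)
task.start_task()
task.progress()
task.progress()  # second of four steps -> 50%
```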
AzureModel: Azure AI Integration for Documentation Generation
Functional Role
The AzureModel class integrates Azure AI's ChatCompletionsClient to generate documentation content using AI models. It provides mechanisms for handling model selection, prompt parsing, and response cleaning, ensuring seamless interaction with Azure's AI inference services.
Class Breakdown
AzureModel
- Purpose: Implements AI-driven documentation generation using Azure's DeepSeek models.
- Attributes:
  - `client`: Instance of `ChatCompletionsClient` for interacting with Azure AI services.
  - `logger`: Instance of `BaseLogger` for logging operations and errors.
  - `api_key`: API key for authenticating with Azure AI.
  - `history`: Instance of `History` for managing conversation history.
  - `models_list`: List of available AI models for inference.
  - `use_random`: Boolean flag to determine if models should be selected randomly.
  - `current_key_index`: Tracks the current API key in use.
  - `current_model_index`: Tracks the current AI model in use.
  - `regen_models_name`: List of models available for regeneration.
Visible Interactions
- Integration with Azure AI:
  - The `ChatCompletionsClient` is initialized with an endpoint and credentials to interact with Azure AI's inference services.
  - The `complete()` method is used to generate responses based on user/system messages.
- Interaction with Logging:
  - Logs are generated using `BaseLogger` to track the progress of operations, errors, and warnings.
  - Log levels include `InfoLog`, `ErrorLog`, and `WarningLog`.
- Exception Handling:
  - Raises `ModelExhaustedException` when no models are available for use, ensuring graceful error handling.
Technical Logic Flow
Initialization
- The `AzureModel` constructor initializes:
  - The `ChatCompletionsClient` with the provided API key and endpoint.
  - A logger instance for tracking operations.
Prompt Parsing
- `_parse_prompt(data: list[dict[str, str]])`:
  - Converts a list of dictionaries (`data`) into a list of `UserMessage` or `SystemMessage` objects.
  - Differentiates between `system` and `user` roles based on the `role` attribute.
Response Cleaning
- `_clean_deepseek_response(text: str)`:
  - Removes `<think>...</think>` blocks and extra whitespace from the AI-generated response using regex.
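The regex cleaning step described above can be sketched as follows; the exact pattern used by the real method is an assumption, but the behavior (strip `<think>...</think>` blocks, then trim whitespace) matches the description.

```python
import re

def clean_deepseek_response(text: str) -> str:
    # Strip <think>...</think> reasoning blocks; DOTALL lets '.' match
    # newlines so multi-line blocks are removed, and '.*?' keeps it non-greedy.
    without_think = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Collapse leading/trailing whitespace left behind by the removal.
    return without_think.strip()

raw = "<think>plan the answer</think>\n\nFinal documentation text."
print(clean_deepseek_response(raw))
```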
Answer Generation
- `generate_answer(with_history: bool, prompt: list[dict[str, str]] | None)`:
  - Logs the start of the answer generation process.
  - Determines the source of messages (`history` or `prompt`).
  - Parses the messages into `UserMessage` or `SystemMessage` objects.
  - Iteratively attempts to generate a response using the available models:
    - If a model fails, logs a warning and switches to the next model.
    - Updates `current_key_index` and `current_model_index` to cycle through the available API keys and models.
  - Cleans the generated response using `_clean_deepseek_response`.
  - Logs the generated answer and returns it.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `api_key` | `str` | Authentication | API key for Azure AI services. |
| `history` | `History` | Conversation Management | Stores previous user/system messages for context. |
| `models_list` | `list[str]` | Model Selection | List of available AI models for inference. |
| `use_random` | `bool` | Model Selection Strategy | Determines if models are selected randomly. |
| `messages` | `list[dict[str, str]]` | Input Messages | User/system messages for generating responses. |
| `response` | `ChatRequestMessage` | AI Response | Generated response from the AI model. |
| `result` | `str` | Cleaned Response | Final processed response after cleaning. |
| `current_key_index` | `int` | API Key Index | Tracks the current API key in use. |
| `current_model_index` | `int` | Model Index | Tracks the current AI model in use. |
| `regen_models_name` | `list[str]` | Regeneration Models | List of models available for regeneration. |
Critical Notes
- Error Handling: If all models fail, the system raises a `ModelExhaustedException` to prevent further execution without a valid model.
- Dynamic Model Switching: The class cycles through available API keys and models to ensure continuity in response generation.
- Response Cleaning: The `_clean_deepseek_response` method ensures that unnecessary tags and whitespace are removed from the AI-generated content.
- Logging: Comprehensive logging is implemented to track operations, errors, and warnings, aiding in debugging and monitoring.
Visible Interactions with Other Modules
- `History`: Manages conversation history for generating context-aware responses.
- `ModelExhaustedException`: Ensures graceful error handling when no models are available.
- `BaseLogger`: Provides logging utilities for tracking operations, errors, and warnings.
Terminal Points
- The `generate_answer()` method returns the cleaned AI-generated response as a string.
- If no models are available, the process terminates with a `ModelExhaustedException`.
Critical Assumptions
- The Azure AI endpoint (`https://models.github.ai/inference`) is operational and accessible.
- The provided API keys and model names are valid and authorized for use.
- The `History` object is correctly populated with user/system messages for context-aware response generation.
GPTModel and GPT4oModel: AI Model Integration and Response Generation
Functional Role
The GPTModel and GPT4oModel classes are responsible for generating AI-driven responses using different sets of models. These classes implement the core logic for interacting with external AI services (OpenAI and Groq) and managing model selection, API key rotation, and error handling.
Technical Logic Flow
Initialization
Both classes inherit from the `Model` base class and initialize the following attributes:
- API Key Management:
  - `api_key`: List of API keys used for authentication with external AI services.
  - `current_key_index`: Tracks the index of the currently active API key.
- Model Selection:
  - `models_list`: List of available AI models for inference.
  - `regen_models_name`: Dynamically updated list of models available for regeneration.
  - `current_model_index`: Tracks the index of the currently active model.
  - `use_random`: Determines whether models are selected randomly.
- History:
  - `history`: Stores previous user/system messages for context-aware response generation.
- Client Initialization:
  - `GPT4oModel`: Uses OpenAI's `OpenAI` client for AI inference.
  - `GPTModel`: Uses Groq's `Groq` client for AI inference.
- Logging:
  - Both classes use `BaseLogger` for tracking operations, errors, and warnings.
Response Generation Workflow
generate_answer() Method
- Input Handling:
  - If `with_history` is `True`, retrieves the conversation history from the `History` object.
  - If `prompt` is provided, uses it as the input for response generation.
- Model Selection and API Key Rotation:
  - Iterates through `regen_models_name` and `api_keys` to find a working model and key.
  - If all models fail, raises a `ModelExhaustedException`.
- AI Inference:
  - Sends the `messages` input to the AI service (`OpenAI` or `Groq`) using the selected model.
  - Parameters for OpenAI:
    - `temperature`: Controls randomness in responses (set to 0.3).
    - `top_p`: Controls diversity in responses (set to 1.0).
    - `max_tokens`: Limits the response length (set to 16384).
  - Parameters for Groq:
    - The model name is passed directly without additional parameters.
- Error Handling:
  - Logs warnings for failed models and rotates to the next available model and API key.
  - If no models are available, logs an error and raises `ModelExhaustedException`.
- Response Cleaning:
  - Extracts the AI-generated response (`chat_completion.choices[0].message.content`).
  - Logs the generated response and the model used.
  - Returns the cleaned response or an empty string if the result is `None`.
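The rotation-with-exhaustion pattern above can be sketched as follows. The iteration order (models outer, keys inner) and the `call` callable standing in for the real inference request are assumptions; only the fallback-then-raise behavior comes from the text.

```python
class ModelExhaustedException(Exception):
    """Raised when no AI models are left to try."""

def generate_with_fallback(models, api_keys, call):
    """Try each (model, key) pair in turn; raise when all fail."""
    for model in models:
        for key in api_keys:
            try:
                return call(model, key)
            except Exception:
                continue  # real implementation logs a warning and rotates here
    raise ModelExhaustedException("no models available")

def fake_call(model, key):
    # Hypothetical inference stub: only one (model, key) pair succeeds.
    if model == "gpt-4o" and key == "key-2":
        return f"answer from {model}"
    raise RuntimeError("model failed")

print(generate_with_fallback(["bad-model", "gpt-4o"], ["key-1", "key-2"], fake_call))
```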
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `api_key` | `list[str]` | API Key Management | List of keys for authenticating with external AI services. |
| `current_key_index` | `int` | API Key Index | Tracks the current API key in use. |
| `models_list` | `list[str]` | Model Selection | List of available AI models for inference. |
| `regen_models_name` | `list[str]` | Regeneration Models | List of models available for regeneration. |
| `current_model_index` | `int` | Model Index | Tracks the current AI model in use. |
| `history` | `History` | Conversation Management | Stores previous user/system messages for context. |
| `messages` | `list[dict[str, str]]` | Input Messages | User/system messages for generating responses. |
| `chat_completion` | `dict` | AI Response Object | Contains the raw response from the AI model. |
| `result` | `str` | Cleaned Response | Final processed response after cleaning. |
| `logger` | `BaseLogger` | Logging Utility | Tracks operations, errors, and warnings. |
Critical Notes
- Error Handling: If all models fail, the system raises a `ModelExhaustedException` to prevent further execution without a valid model.
- Dynamic Model Switching: The classes cycle through available API keys and models to ensure continuity in response generation.
- Response Cleaning: Ensures that unnecessary tags and whitespace are removed from the AI-generated content.
- Logging: Comprehensive logging is implemented to track operations, errors, and warnings, aiding in debugging and monitoring.
Visible Interactions with Other Modules
- `History`: Manages conversation history for generating context-aware responses.
- `ModelExhaustedException`: Ensures graceful error handling when no models are available.
- `BaseLogger`: Provides logging utilities for tracking operations, errors, and warnings.
Terminal Points
- The `generate_answer()` method returns the cleaned AI-generated response as a string.
- If no models are available, the process terminates with a `ModelExhaustedException`.
History Class
Functional Role
The History class manages conversation history for AI model interactions. It stores a sequence of user and system messages, enabling context-aware responses by maintaining a structured dialogue history.
Technical Logic Flow
- Initialization:
  - The constructor (`__init__`) initializes the `history` attribute as an empty list.
  - If a `system_prompt` is provided, it is added to the history with the role `"system"`.
- Adding to History:
  - The `add_to_history(role, content)` method appends a dictionary containing the `role` and `content` to the `history` list.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `system_prompt` | `str` | System Initialization | Initial system message for context. |
| `history` | `list[dict[str, str]]` | Conversation History | Stores user/system messages. |
| `role` | `str` | Message Role | Specifies the sender (`user`, `system`, `assistant`). |
| `content` | `str` | Message Content | Actual message text. |
Critical Notes
- Stateful Context: The `History` class is essential for maintaining the context of conversations, which is critical for generating coherent responses from AI models.
- System Prompt: The initial system prompt sets the tone and context for the AI model's responses.
ParentModel Abstract Base Class
Functional Role
The ParentModel class serves as an abstract base for AI model implementations. It defines the interface for generating responses and managing API keys and model selection.
Technical Logic Flow
- Initialization:
  - Accepts `api_key`, `history`, and `models_list` as parameters.
  - Randomizes the order of `models_list` if `use_random` is enabled.
  - Tracks the current API key and model indices.
- Abstract Methods:
  - `generate_answer(with_history, prompt)`: Generates an AI response based on the provided prompt.
  - `get_answer_without_history(prompt)`: Generates a response without using conversation history.
  - `get_answer(prompt)`: Generates a response using conversation history.
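The abstract-base pattern above can be sketched as follows. Attribute names follow the text; the constructor internals and the toy `EchoModel` subclass are assumptions used only to exercise the interface.

```python
import random
from abc import ABC, abstractmethod

class ParentModel(ABC):
    """Abstract base; attribute names follow the text, internals are assumed."""
    def __init__(self, api_key, history, models_list, use_random=False):
        self.api_key = api_key
        self.history = history
        self.models_list = models_list
        self.regen_models_name = list(models_list)
        if use_random:
            random.shuffle(self.regen_models_name)  # randomized model order
        self.current_model_index = 0
        self.current_key_index = 0

    @abstractmethod
    def generate_answer(self, with_history, prompt=None):
        """Produce an AI response; must be implemented by subclasses."""

class EchoModel(ParentModel):
    # Toy subclass used only to satisfy the abstract interface.
    def generate_answer(self, with_history, prompt=None):
        return prompt[-1]["content"] if prompt else ""

model = EchoModel(["key-1"], None, ["model-a", "model-b"])
print(model.generate_answer(False, [{"role": "user", "content": "hi"}]))
```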
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `api_key` | `list[str]` | API Keys | List of API keys for authentication. |
| `history` | `History` | Conversation History | Manages context for AI responses. |
| `models_list` | `list[str]` | Model Selection | List of available AI models. |
| `regen_models_name` | `list[str]` | Regeneration Models | Randomized list of models for inference. |
| `current_model_index` | `int` | Model Index | Tracks the current model in use. |
| `current_key_index` | `int` | API Key Index | Tracks the current API key in use. |
Critical Notes
- Dynamic Model Switching: Randomized model selection ensures diverse AI responses.
- Error Handling: Abstract methods enforce implementation of response generation logic in subclasses.
Model and AsyncModel Classes
Functional Role
These classes implement the ParentModel abstract methods to provide synchronous and asynchronous AI response generation.
Technical Logic Flow
- `Model` Class:
  - Implements synchronous methods for generating responses (`generate_answer`, `get_answer_without_history`, `get_answer`).
  - Updates the `History` object with user and assistant messages.
- `AsyncModel` Class:
  - Implements asynchronous methods for generating responses (`generate_answer`, `get_answer_without_history`, `get_answer`).
  - Uses `await` for asynchronous response generation.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `generate_answer` | `str` or `Coroutine` | Response Generation | Generates AI responses based on prompts. |
| `get_answer_without_history` | `str` or `Coroutine` | History-Free Response | Generates responses without context. |
| `get_answer` | `str` or `Coroutine` | Context-Aware Response | Generates responses using conversation history. |
Critical Notes
- Stateful Context: Both classes rely on the `History` object for managing conversation context.
- Async Support: `AsyncModel` enables non-blocking response generation for scalable applications.
Embedding Class: Vector Generation and Management
The Embedding class is responsible for generating embedding vectors for textual content using the genai library. These embeddings are used for semantic analysis and content organization within the Auto Doc Generator system.
Class: Embedding
Purpose
The Embedding class interfaces with the genai library to generate high-dimensional embedding vectors for text input. These vectors are used for semantic operations such as sorting and similarity calculations.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `client` | `genai.Client` | API Client | Handles communication with the genai API. |
Methods
| Method | Input Parameters | Output Type | Role | Notes |
|---|---|---|---|---|
| `__init__` | `api_key: str` | `None` | Initializes the API client | Requires a valid API key for authentication. |
| `get_vector` | `prompt: str` | `list` | Generates embedding vector | Uses the genai API to generate a 768-dimensional embedding vector from the input prompt. |
Critical Notes
- API Dependency: The `Embedding` class relies on the `genai` library and the `gemini-embedding-2-preview` model for embedding generation. Ensure the API key is valid and the model is accessible.
- Error Handling: If embedding generation fails, an exception is raised with the message `"promblem with embedding"`. This should be handled appropriately in higher-level workflows.
Embedding Functions: Sorting and Distance Calculation
The following functions are utility methods for processing embedding vectors and organizing content based on semantic similarity.
Function: bubble_sort_by_dist
Purpose
Sorts a list of tuples based on the second element (distance) in ascending order using the bubble sort algorithm.
Technical Logic Flow
- Iterate over the list multiple times.
- Compare adjacent elements and swap them if they are out of order.
- Return the sorted list.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `arr` | `list` | Input List | List of tuples where the second element represents distance. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `sorted_arr` | `list` | Sorted List | List sorted by distance in ascending order. |
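The pass-and-swap procedure above translates directly into code. A rough sketch (whether the original sorts in place or on a copy is not specified; this version copies):

```python
def bubble_sort_by_dist(arr: list[tuple]) -> list[tuple]:
    arr = list(arr)  # sort a copy, leaving the input untouched
    for _ in range(len(arr)):
        for i in range(len(arr) - 1):
            # Swap adjacent tuples whose second element (the distance) is out of order.
            if arr[i][1] > arr[i + 1][1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
    return arr

print(bubble_sort_by_dist([("b", 2.0), ("a", 0.5), ("c", 1.2)]))
# [('a', 0.5), ('c', 1.2), ('b', 2.0)]
```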
Function: get_len_btw_vectors
Purpose
Calculates the Euclidean distance between two embedding vectors.
Technical Logic Flow
- Compute the Euclidean norm of the difference between the two vectors using `np.linalg.norm`.
- Return the distance as a float.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `vector1` | `np.ndarray` | First Vector | First embedding vector. |
| `vector2` | `np.ndarray` | Second Vector | Second embedding vector. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `distance` | `float` | Euclidean Distance | Distance between the two vectors. |
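With NumPy the calculation is a one-liner; a sketch consistent with the description above:

```python
import numpy as np

def get_len_btw_vectors(vector1: np.ndarray, vector2: np.ndarray) -> float:
    # Euclidean distance = L2 norm of the difference vector.
    return float(np.linalg.norm(vector1 - vector2))

print(get_len_btw_vectors(np.array([0.0, 0.0]), np.array([3.0, 4.0])))  # 5.0
```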
Function: sort_vectors
Purpose
Sorts a dictionary of vectors based on their semantic distance from a root vector.
Technical Logic Flow
- Iterate over the dictionary of vectors.
- Calculate the distance between the root vector and each vector using `get_len_btw_vectors`.
- Append the vector name and distance to a list.
- Sort the list using `bubble_sort_by_dist`.
- Extract and return the sorted vector names.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `root_vector` | `np.ndarray` | Reference Vector | Root vector used for distance comparison. |
| `other` | `dict[str, Any]` | Dictionary of Vectors | Contains vector names and their corresponding embeddings. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `sorted_names` | `list[str]` | Sorted Vector Names | List of vector names sorted by semantic distance. |
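Combining the two helpers above, the flow of sort_vectors can be sketched as follows. The built-in sort stands in for `bubble_sort_by_dist` (same resulting order); the function bodies are illustrative, not the project's code.

```python
import numpy as np

def get_len_btw_vectors(vector1: np.ndarray, vector2: np.ndarray) -> float:
    return float(np.linalg.norm(vector1 - vector2))

def sort_vectors(root_vector: np.ndarray, other: dict) -> list[str]:
    # Pair each name with its distance from the root vector...
    pairs = [(name, get_len_btw_vectors(root_vector, vec)) for name, vec in other.items()]
    # ...sort by ascending distance (stand-in for bubble_sort_by_dist)...
    pairs.sort(key=lambda p: p[1])
    # ...and keep only the names.
    return [name for name, _ in pairs]

root = np.array([0.0, 0.0])
vectors = {"far": np.array([10.0, 0.0]), "near": np.array([1.0, 0.0])}
print(sort_vectors(root, vectors))  # ['near', 'far']
```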
Sorting Functions: Anchor Extraction and Content Organization
The following functions are responsible for extracting HTML anchor links and organizing content based on semantic relationships.
Function: extract_links_from_start
Purpose
Extracts HTML anchor links from the start of text chunks.
Technical Logic Flow
- Use regex to identify anchor links in each chunk.
- Append valid links to a list.
- Return the list of links and a flag indicating whether the first chunk should be deleted.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `chunks` | `list[str]` | Text Chunks | List of text chunks to process. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `links` | `list[str]` | Extracted Links | List of valid anchor links. |
| `flag` | `bool` | Deletion Flag | Indicates if the first chunk should be deleted. |
Function: split_text_by_anchors
Purpose
Splits text into chunks based on HTML anchor tags and maps each chunk to its corresponding anchor link.
Technical Logic Flow
- Use regex to split text into chunks based on anchor tags.
- Extract links from the chunks using `extract_links_from_start`.
- Map each link to its corresponding chunk.
- Return the mapping as a dictionary.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `text` | `str` | Input Text | Raw text containing HTML anchor tags. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `mapping` | `dict[str, str]` | Anchor-Chunk Mapping | Dictionary mapping anchor links to text chunks. |
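A rough sketch of the anchor-to-chunk mapping described above. The exact regex, the `#` prefix on keys, and the handling of text before the first anchor are assumptions based on the `<a name="..."></a>` format documented here:

```python
import re

ANCHOR_RE = re.compile(r'(<a name="[^"]+"></a>)')

def split_text_by_anchors(text: str) -> dict[str, str]:
    mapping: dict[str, str] = {}
    current = None
    # re.split with a capturing group keeps the anchor tags in the result.
    for piece in ANCHOR_RE.split(text):
        m = re.fullmatch(r'<a name="([^"]+)"></a>', piece)
        if m:
            current = "#" + m.group(1)   # assumed "#"-prefixed key format
            mapping[current] = ""
        elif current is not None:
            mapping[current] += piece    # text before the first anchor is dropped
    return mapping

doc = '<a name="intro"></a>Intro text.<a name="usage"></a>Usage text.'
print(split_text_by_anchors(doc))
# {'#intro': 'Intro text.', '#usage': 'Usage text.'}
```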
Function: get_order
Purpose
Orders a list of chunk titles semantically using an AI model.
Technical Logic Flow
- Log the start of the ordering process.
- Generate a prompt for the AI model, instructing it to sort the titles semantically.
- Pass the prompt to the `Model.get_answer_without_history` method.
- Parse the AI model's response into a list of sorted titles.
- Log the sorted titles and return them.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `model` | `Model` | AI Model | Used for semantic sorting of titles. |
| `chanks` | `list[str]` | Chunk Titles | List of titles to be sorted. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `sorted_titles` | `list[str]` | Sorted Titles | List of titles sorted semantically. |
Critical Notes
- Regex-Based Extraction: Functions like `extract_links_from_start` and `split_text_by_anchors` rely on regex patterns for parsing HTML anchor tags. Ensure the input text adheres to the expected format for accurate processing.
- AI Model Dependency: The `get_order` function depends heavily on the `Model` class for semantic sorting. Ensure the model is properly configured and accessible.
- Error Handling: Functions like `split_text_by_anchors` raise exceptions if the number of links and chunks do not match, indicating potential issues with the input text.
Shared Observations
- Modular Design:
  - Functions are designed to be reusable and modular, allowing them to be integrated into larger workflows.
- Logging Integration:
  - The `get_order` function uses `BaseLogger` for logging progress and outputs, improving traceability during execution.
- Iterative Processing:
  - Functions like `sort_vectors` and `split_text_by_anchors` process data iteratively, ensuring robustness against input variations.
- Error Handling:
  - As noted above, `split_text_by_anchors` raises an exception when the number of links and chunks do not match.
Function: parse_answer
Purpose
Parses the AI model's response to determine whether documentation updates are required and whether global documentation files need to be regenerated.
Technical Logic Flow
- Splits the AI model's response string using the `|` delimiter.
- Evaluates the first segment (`splited[0]`) to determine if documentation updates are required (`change_doc`).
- Evaluates the second segment (`splited[1]`) to determine if global documentation files need to be regenerated (`change_global`).
- Returns a `CheckGitStatusResultSchema` object with the parsed results.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `answer` | `str` | AI Model Response | Raw response string from the AI model. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `CheckGitStatusResultSchema` | `CheckGitStatusResultSchema` | Parsed Response Object | Contains flags for documentation updates and global file regeneration. |
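The `|`-delimited parsing described above can be sketched as follows. The schema is replaced by a plain dataclass stand-in, and the exact true/false tokens the model returns are assumptions:

```python
from dataclasses import dataclass

@dataclass
class CheckGitStatusResultSchema:
    change_doc: bool      # documentation updates required?
    change_global: bool   # global files need regeneration?

def parse_answer(answer: str) -> CheckGitStatusResultSchema:
    splited = answer.split("|")
    # Assumed convention: each segment is a boolean-like token such as "True".
    change_doc = splited[0].strip().lower() == "true"
    change_global = splited[1].strip().lower() == "true"
    return CheckGitStatusResultSchema(change_doc, change_global)

result = parse_answer("True|False")
print(result.change_doc, result.change_global)  # True False
```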
Function: have_to_change
Purpose
Determines whether documentation updates are required based on code changes and global information by querying an AI model.
Technical Logic Flow
- Constructs a prompt for the AI model:
  - Includes a system message (`BASE_CHANGES_CHECK_PROMPT`).
  - Adds global information (`global_info`) if provided.
  - Includes a user message containing the code changes (`diff`).
- Sends the prompt to the AI model using `model.get_answer_without_history`.
- Parses the AI model's response using `parse_answer`.
- Returns the parsed response as a `CheckGitStatusResultSchema` object.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `model` | `Model` | AI Model | Used to query the AI for decision-making. |
| `diff` | `list[dict[str, str]]` | Code Changes | List of changes detected in the codebase. |
| `global_info` | `str \| None` | Global Documentation Info | Optional; may be `None`. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `CheckGitStatusResultSchema` | `CheckGitStatusResultSchema` | Parsed Response Object | Contains flags for documentation updates and global file regeneration. |
Function: should_ignore
Purpose
Determines whether a given file or directory path should be ignored based on the specified ignore patterns.
Technical Logic Flow
- Resolves the relative path of the file/directory against the root directory.
- Converts the relative path to a string for pattern matching.
- Iterates through the ignore patterns:
  - Matches the full relative path.
  - Matches the base name of the path.
  - Matches individual components of the path.
- Returns `True` if any pattern matches; otherwise, returns `False`.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `path` | `str` | File/Directory Path | Path to check against ignore patterns. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `ignored` | `bool` | Ignore Flag | Indicates whether the path should be ignored. |
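The three matching levels described above can be sketched with the standard library. The `root` and `ignore_patterns` parameters and the fnmatch-style glob syntax are assumptions; the real function may take them from configuration:

```python
import fnmatch
from pathlib import Path

def should_ignore(path: str, root: str, ignore_patterns: list[str]) -> bool:
    # Resolve the path relative to the project root.
    rel = Path(path).resolve().relative_to(Path(root).resolve())
    rel_str = str(rel)
    for pattern in ignore_patterns:
        if fnmatch.fnmatch(rel_str, pattern):      # full relative path
            return True
        if fnmatch.fnmatch(rel.name, pattern):     # base name
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in rel.parts):  # components
            return True
    return False

print(should_ignore("/repo/.git/config", "/repo", [".git"]))        # True
print(should_ignore("/repo/main.py", "/repo", [".git", "*.pyc"]))   # False
```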
Functions in compressor.py
Function: compress
Purpose
Compresses a given text string using an AI model and project-specific settings.
Technical Logic Flow
- Constructs a `prompt` consisting of:
  - A system message with project-specific settings (`project_settings.prompt`).
  - A system message with compression settings (`get_BASE_COMPRESS_TEXT`).
  - A user message containing the input data.
- Sends the `prompt` to the AI model (`model.get_answer_without_history`) for processing.
- Returns the compressed output.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `data` | `str` | Input Data | Text to be compressed. |
| `project_settings` | `ProjectSettings` | Project Settings | Contains project-specific metadata. |
| `model` | `Model` | AI Model | AI model used for text compression. |
| `compress_power` | `int` | Compression Power | Determines the strength of compression. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `answer` | `str` | Compressed Output | Compressed version of the input data. |
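The prompt assembly described above can be illustrated as a list of role-tagged messages. The helper name `build_compress_prompt` and the message-dict shape are assumptions for illustration; the actual prompt constants live in the project's code:

```python
def build_compress_prompt(project_prompt: str, compress_settings: str, data: str) -> list[dict[str, str]]:
    return [
        {"role": "system", "content": project_prompt},     # project-specific settings
        {"role": "system", "content": compress_settings},  # compression instructions
        {"role": "user", "content": data},                 # text to compress
    ]

prompt = build_compress_prompt("Project: ADG", "Compress this text strongly.", "Some long text...")
print([m["role"] for m in prompt])  # ['system', 'system', 'user']
```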
Function: compress_and_compare
Purpose
Compresses and combines multiple text strings into fewer, larger chunks, while tracking progress.
Technical Logic Flow
- Initializes an empty list to store compressed chunks.
- Creates a progress bar to track the compression process.
- Iterates through the input data:
  - Divides the input into chunks based on `compress_power`.
  - Compresses each chunk using the `compress` function.
  - Appends the compressed results to the corresponding index in the output list.
  - Updates the progress bar after processing each chunk.
- Removes the progress bar once processing is complete.
- Returns the list of compressed chunks.
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `data` | `list[str]` | Input Data | List of text strings to be compressed. |
| `model` | `Model` | AI Model | AI model used for text compression. |
| `project_settings` | `ProjectSettings` | Project Settings | Contains project-specific metadata. |
| `compress_power` | `int` | Compression Power | Determines the strength of compression. |
| `progress_bar` | `BaseProgress` | Progress Tracker | Tracks the progress of the compression task. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `compress_and_compare_data` | `list[str]` | Compressed Chunks | List of combined compressed text chunks. |
Function: compress_to_one
Purpose
Iteratively compresses a list of text strings into a single compressed output.
Technical Logic Flow
- Initializes a counter for the number of iterations.
- While the input data contains more than one item:
  - Adjusts `compress_power` based on the size of the input data.
  - Calls `compress_and_compare` to compress and combine the data.
  - Updates the input data with the compressed results.
  - Increments the iteration counter.
- Returns the final compressed output (a single string).
Parameters
| Entity | Type | Role | Notes |
|---|---|---|---|
| `data` | `list[str]` | Input Data | List of text strings to be compressed. |
| `model` | `Model` | AI Model | AI model used for text compression. |
| `project_settings` | `ProjectSettings` | Project Settings | Contains project-specific metadata. |
| `compress_power` | `int` | Compression Power | Determines the strength of compression. |
| `progress_bar` | `BaseProgress` | Progress Tracker | Tracks the progress of the compression task. |
Output
| Entity | Type | Role | Notes |
|---|---|---|---|
| `data[0]` | `str` | Final Compressed Output | Single compressed string. |
Critical Notes
- Error Handling: The `compress` function relies on the AI model's response and assumes valid input data.
- Iterative Compression: The `compress_to_one` function reduces multiple chunks into a single compressed output, making it ideal for large datasets.
- Progress Tracking: The `compress_and_compare` and `compress_to_one` functions integrate progress tracking via `BaseProgress`.
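The iterative reduction performed by compress_to_one can be sketched as follows. The function names mirror the documentation, but the bodies are stand-ins: a string join replaces the AI compression call, and the grouping scheme is an assumption.

```python
def compress_and_compare(data: list[str], compress_power: int) -> list[str]:
    merged = []
    # Merge groups of `compress_power` chunks into one (stand-in for compress()).
    for i in range(0, len(data), compress_power):
        merged.append(" ".join(data[i:i + compress_power]))
    return merged

def compress_to_one(data: list[str], compress_power: int = 2) -> tuple[str, int]:
    iterations = 0
    # Keep reducing until a single chunk remains.
    while len(data) > 1:
        data = compress_and_compare(data, compress_power)
        iterations += 1
    return data[0], iterations

result, rounds = compress_to_one(["a", "b", "c", "d"])
print(result, rounds)  # a b c d 2
```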
Function: split_data
Purpose
Splits a large text string into smaller chunks based on a maximum symbol limit, ensuring each chunk adheres to size constraints for further processing.
Technical Logic Flow
- Initialization:
  - A logger instance (`BaseLogger`) logs the start of the data splitting process.
  - The input data is split into smaller chunks (`splited_by_files`) based on file delimiters.
- Iterative Splitting:
  - The function iteratively checks each chunk's size against the `max_symbols` limit.
  - If a chunk exceeds the limit, it is split into two smaller parts:
    - The first part contains the initial half of the chunk.
    - The second part contains the remaining half.
  - The process repeats until all chunks are within the size limit.
- Chunk Aggregation:
  - The split chunks are aggregated into `split_objects`.
  - If the current chunk exceeds the size limit for the current object, a new object is created, and the chunk is added to it.
- Completion:
  - The logger logs the total number of parts generated and their adherence to the size limit.
  - The function returns the list of split objects.
Inputs
| Entity | Type | Role | Notes |
|---|---|---|---|
| `splited_by_files` | `list[str]` | Input Data | List of text strings to be split. |
| `max_symbols` | `int` | Maximum Symbols | Maximum allowed size for each chunk. |
Outputs
| Entity | Type | Role | Notes |
|---|---|---|---|
| `split_objects` | `list[str]` | Split Data | List of text chunks within size limits. |
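The halving step described above can be sketched compactly. This illustrative `split_chunks` covers only the iterative splitting stage (not the file-delimiter split or the aggregation into objects), and uses recursion where the original iterates:

```python
def split_chunks(chunks: list[str], max_symbols: int) -> list[str]:
    result = []
    for chunk in chunks:
        if len(chunk) <= max_symbols:
            result.append(chunk)
        else:
            # Split an oversized chunk in half and keep splitting until it fits.
            mid = len(chunk) // 2
            result.extend(split_chunks([chunk[:mid], chunk[mid:]], max_symbols))
    return result

parts = split_chunks(["x" * 10], max_symbols=4)
print([len(p) for p in parts])  # [2, 3, 2, 3] -- every part within the limit
```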
Function: write_docs_by_parts
Purpose
Generates documentation for a specific chunk of text using an AI model, incorporating project settings and global context.
Technical Logic Flow
- Initialization:
  - A logger instance logs the start of the documentation generation process.
  - A prompt is constructed using the project's metadata (`ProjectSettings.prompt`), global relations, and the input chunk.
- Prompt Construction:
  - The prompt includes:
    - Language settings.
    - Project-specific metadata.
    - Global relations (if available).
    - Previous documentation context (if available).
    - The current chunk of text.
- AI Interaction:
  - The AI model (`model.get_answer_without_history`) processes the prompt and generates documentation.
  - The generated documentation is cleaned by removing any markdown formatting (e.g., triple backticks).
- Completion:
  - The logger logs the length and content of the generated documentation.
  - The cleaned documentation is returned.
Inputs
| Entity | Type | Role | Notes |
|---|---|---|---|
| `part` | `str` | Input Chunk | Text chunk for documentation generation. |
| `model` | `Model` | AI Model | AI model used for generating documentation. |
| `project_settings` | `ProjectSettings` | Project Metadata | Contains project-specific metadata. |
| `prev_info` | `str \| None` | Previous Context | Optional; may be `None`. |
| `language` | `str` | Language Setting | Language for the generated documentation. |
| `global_info` | `str \| None` | Global Relations | Optional; may be `None`. |
Outputs
| Entity | Type | Role | Notes |
|---|---|---|---|
| `answer` | `str` | Generated Documentation | Documentation generated by the AI model. |
Function: gen_doc_parts
Purpose
Generates documentation for an entire codebase by splitting it into manageable chunks and processing each chunk iteratively.
Technical Logic Flow
- Data Splitting:
  - Calls `split_data` to divide the input codebase (`full_code_mix`) into smaller chunks based on `max_symbols`.
- Documentation Generation:
  - Initializes a logger and progress tracker (`BaseProgress`).
  - Iterates over the split chunks:
    - Calls `write_docs_by_parts` to generate documentation for each chunk.
    - Appends the generated documentation to the final result (`all_result`).
    - Updates the progress tracker after each chunk.
- Completion:
  - Logs the total length of the generated documentation.
  - Removes the progress tracker subtask.
  - Returns the combined documentation.
Inputs
| Entity | Type | Role | Notes |
|---|---|---|---|
| `full_code_mix` | `str` | Input Codebase | Full codebase to be documented. |
| `max_symbols` | `int` | Maximum Symbols | Maximum size for each chunk. |
| `model` | `Model` | AI Model | AI model used for generating documentation. |
| `project_settings` | `ProjectSettings` | Project Metadata | Contains project-specific metadata. |
| `language` | `str` | Language Setting | Language for the generated documentation. |
| `progress_bar` | `BaseProgress` | Progress Tracker | Tracks the progress of the task. |
| `global_info` | `str \| None` | Global Relations | Optional; may be `None`. |
Outputs
| Entity | Type | Role | Notes |
|---|---|---|---|
| `all_result` | `str` | Final Documentation | Combined documentation for the codebase. |
DocFactory Class
Functional Role
The DocFactory class orchestrates modular documentation generation by invoking BaseModule instances and optionally splitting results into parts.
Technical Logic Flow
- Initialization:
  - Accepts a list of `BaseModule` instances and a flag (`with_splited`) to enable splitting results into parts.
- Documentation Generation:
  - Iterates over the provided modules, invoking their `generate` method.
  - Splits module results into parts using `split_text_by_anchors` if `with_splited` is enabled.
  - Logs the output of each module and updates progress.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `modules` | `list[BaseModule]` | Modular Components | List of modules for generating documentation. |
| `logger` | `BaseLogger` | Logging Utility | Tracks operations, errors, and warnings. |
| `with_splited` | `bool` | Splitting Flag | Enables splitting results into parts. |
| `doc_head` | `DocHeadSchema` | Documentation Schema | Stores generated documentation parts. |
Critical Notes
- Modular Design: The factory pattern allows for easy addition of new modules.
- Splitting Results: Splits documentation into smaller parts for better organization and readability.
Class: DocContent
Purpose
Represents individual documentation content with optional embedding vectors for semantic analysis.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `content` | `str` | Documentation Content | Stores the textual content of the documentation. |
| `embedding_vector` | `list \| None` | Embedding Vector | Optional; `None` until `init_embedding` is called. |
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `init_embedding` | `embedding_model: Embedding` | `None` | Initializes the `embedding_vector` using the provided embedding model. | Calls `embedding_model.get_vector(content)` to generate embeddings. |
Module: doc_schema.py
Purpose
Defines the schema for structured documentation, including content management, embedding initialization, and hierarchical organization of documentation parts.
Classes
Class: DocHeadSchema
Purpose
Manages hierarchical organization of documentation parts and provides functionality to combine them into a full document.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `content_orders` | `list[str]` | Content Order | Maintains the order of documentation parts. |
| `parts` | `dict[str, DocContent]` | Documentation Parts | Stores individual documentation parts as `DocContent` objects. |
Methods
| Method | Parameters | Return Type | Role | Notes |
|---|---|---|---|---|
| `add_parts` | `name: str, content: DocContent` | `None` | Adds a new documentation part to the schema. | Ensures unique names by appending an index if a name conflict occurs. |
| `get_full_doc` | `split_el: str = "\n"` | `str` | Combines all documentation parts into a single document. | Concatenates content from all parts in `content_orders`. |
| `__add__` | `other: DocHeadSchema` | `DocHeadSchema` | Merges two `DocHeadSchema` objects. | Adds all parts from `other` to the current schema. |
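The part-management behaviour above can be sketched as follows. `DocContent` is simplified to a plain string, the `_1` suffix used for name conflicts is an assumption, and the real class also supports `__add__` merging:

```python
class DocHeadSchema:
    def __init__(self):
        self.content_orders: list[str] = []
        self.parts: dict[str, str] = {}  # DocContent simplified to str here

    def add_parts(self, name: str, content: str) -> None:
        # Ensure unique names by appending an index on conflict (assumed scheme).
        base, i = name, 1
        while name in self.parts:
            name = f"{base}_{i}"
            i += 1
        self.parts[name] = content
        self.content_orders.append(name)

    def get_full_doc(self, split_el: str = "\n") -> str:
        # Concatenate parts in the recorded order.
        return split_el.join(self.parts[n] for n in self.content_orders)

doc = DocHeadSchema()
doc.add_parts("intro", "Intro.")
doc.add_parts("intro", "Second intro.")  # conflicting name gets an index
print(doc.content_orders)  # ['intro', 'intro_1']
print(doc.get_full_doc())  # Intro. / Second intro. on separate lines
```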
Class: DocInfoSchema
Purpose
Represents the complete documentation schema, including global information, code mix, and hierarchical documentation structure.
Attributes
| Attribute | Type | Role | Notes |
|---|---|---|---|
| `global_info` | `str` | Global Information | Stores general information about the project. |
| `code_mix` | `str` | Codebase Representation | Contains the structured representation of the codebase. |
| `doc` | `DocHeadSchema` | Documentation Structure | Stores the hierarchical structure of the documentation. |
Interactions
DocContent
- Embedding Initialization:
  - The `init_embedding` method interacts with the `Embedding` class from `postprocessor.embedding`.
  - Generates semantic vectors based on the content for advanced search or analysis.
DocHeadSchema
- Part Management:
  - Adds parts using `add_parts` and ensures unique naming.
  - Combines parts into a single document using `get_full_doc`.
- Schema Merging:
  - Supports merging multiple schemas using the `__add__` method.
DocInfoSchema
- Global Information:
  - Stores project-wide metadata in `global_info`.
- Codebase Representation:
  - Maintains a structured representation of the codebase in `code_mix`.
- Documentation Structure:
  - Uses `DocHeadSchema` to organize documentation parts.
Critical Notes
- Embedding Integration: `DocContent` relies on the `Embedding` class for semantic vector generation, enabling advanced search and analysis capabilities.
- Hierarchical Organization: `DocHeadSchema` ensures proper ordering and management of documentation parts, supporting modular and extensible documentation workflows.
- Schema Extensibility: The `DocInfoSchema` class provides a centralized structure for managing global information, codebase representation, and documentation hierarchy.
BaseModule Abstract Base Class
Functional Role
Defines the interface for modular components used in the DocFactory. Each module must implement the generate(info, model) method.
Technical Logic Flow
- Initialization:
- Provides a base structure for modular components.
- Abstract Method:
  - `generate(info, model)`: Generates documentation based on the provided `info` and `model`.
Critical Notes
- Extensibility: New modules can be created by subclassing `BaseModule` and implementing the `generate` method.
Visible Interactions
- `DocFactory`:
  - Invokes `BaseModule.generate()` for each module during documentation generation.
- `split_text_by_anchors`:
  - Optionally splits module results into smaller parts.
Terminal Points
- The `DocFactory.generate_doc()` method returns a `DocHeadSchema` object containing all generated documentation parts.
CustomModule Class
Functional Role
The CustomModule class generates documentation with contextual descriptions by leveraging AI models and preprocessed code data.
Technical Logic Flow
- Initialization:
  - Accepts a `discription` parameter to define the contextual description for documentation generation.
- Documentation Generation:
  - Preprocesses the `code_mix` data using `split_data` to divide it into manageable chunks (default `max_symbols=5000`).
  - Calls `generete_custom_discription` to generate documentation using the AI model, contextual description, and language settings.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `discription` | `str` | Contextual Description | Defines the context for documentation generation. |
| `info["code_mix"]` | `dict` | Preprocessed Code Data | Contains structured code information. |
| `info["language"]` | `str` | Language Setting | Specifies the language for documentation. |
| `model` | `Model` | AI Model | Generates documentation content. |
| `result` | `str` | Generated Documentation | Final output of the module. |
Critical Notes
- Contextual Generation: The `discription` parameter is central to tailoring the generated documentation.
- Preprocessing Dependency: Relies on `split_data` for chunking large code data before processing.
CustomModuleWithOutContext Class
Functional Role
The CustomModuleWithOutContext class generates documentation without contextual descriptions, focusing solely on the provided discription.
Technical Logic Flow
- Initialization:
  - Accepts a `discription` parameter to define the content for documentation generation.
- Documentation Generation:
  - Calls `generete_custom_discription_without` to generate documentation using the AI model, `discription`, and language settings.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `discription` | `str` | Content Description | Defines the content for documentation generation. |
| `info["language"]` | `str` | Language Setting | Specifies the language for documentation. |
| `model` | `Model` | AI Model | Generates documentation content. |
| `result` | `str` | Generated Documentation | Final output of the module. |
Critical Notes
- Simplified Generation: Focuses on generating documentation without requiring contextual data.
- Language-Specific Output: Adapts the generated content based on the `language` parameter.
IntroLinks Class
Functional Role
The IntroLinks class extracts HTML links from the provided data and generates an introduction based on those links using AI models.
Technical Logic Flow
- Link Extraction:
  - Extracts all HTML links from `info["full_data"]` using `get_all_html_links`.
- Introduction Generation:
  - Calls `get_links_intro` to generate an introduction based on the extracted links, AI model, and language settings.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `info["full_data"]` | `str` | Full Project Data | Contains raw data including HTML links. |
| `info["language"]` | `str` | Language Setting | Specifies the language for documentation. |
| `model` | `Model` | AI Model | Generates introduction content. |
| `links` | `list[str]` | Extracted Links | List of HTML links extracted from the data. |
| `intro_links` | `str` | Generated Introduction | Final output of the module. |
Critical Notes
- Link-Based Introduction: Tailors the introduction based on the presence of HTML links in the data.
- Preprocessing Dependency: Relies on `get_all_html_links` for link extraction.
IntroText Class
Functional Role
The IntroText class generates an introductory section for documentation based on global project information.
Technical Logic Flow
- Introduction Generation:
  - Calls `get_introdaction` to generate an introduction using the global project information, AI model, and language settings.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `info["global_info"]` | `str` | Global Project Info | Contains reusable global documentation data. |
| `info["language"]` | `str` | Language Setting | Specifies the language for documentation. |
| `model` | `Model` | AI Model | Generates introduction content. |
| `intro` | `str` | Generated Introduction | Final output of the module. |
Critical Notes
- Global Context: Utilizes `global_info` to create a high-level introduction for the documentation.
- Language-Specific Output: Adapts the generated content based on the `language` parameter.
Shared Observations
- Modular Design:
  - All classes extend `BaseModule`, ensuring consistency and reusability within the `DocFactory`.
- AI Model Dependency:
  - Each module relies heavily on the `Model` class for generating content, making it a critical component of the system.
- Preprocessing and Postprocessing:
  - Modules lean on helpers such as `split_data` for preprocessing and `split_text_by_anchors` for postprocessing.
Postprocessor: Custom Intro
Functional Role
The custom_intro.py module is responsible for generating various types of introductory content for documentation. It uses AI models to create link-based introductions, global introductions, and custom descriptions based on the provided context.
Technical Logic Flow
1. HTML Link Extraction
- Function: `get_all_html_links(data: str) -> list[str]`
  - Uses regex to extract anchor links (`<a name="..."></a>`) from the input data.
  - Filters links based on length (> 5 characters) and prefixes them with `#`.
  - Logs the extraction process and outputs the list of links.
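The extraction and filtering steps above can be sketched directly; the exact regex is an assumption based on the documented `<a name="..."></a>` format, and the length filter is applied to the anchor name here:

```python
import re

def get_all_html_links(data: str) -> list[str]:
    # Extract the name attribute from every <a name="..."></a> anchor...
    names = re.findall(r'<a name="([^"]+)"></a>', data)
    # ...keep names longer than 5 characters, prefixed with "#".
    return ["#" + n for n in names if len(n) > 5]

html = '<a name="overview"></a> ... <a name="faq"></a>'
print(get_all_html_links(html))  # ['#overview'] -- "faq" is too short
```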
2. Link-Based Introduction
- Function: `get_links_intro(links: list[str], model: Model, language: str = "en") -> str`
  - Constructs a prompt using the extracted links and predefined instructions (`BASE_INTRODACTION_CREATE_LINKS`).
  - Sends the prompt to the AI model (`model.get_answer_without_history`) to generate an introduction.
  - Logs the generated introduction.
3. Global Introduction
- Function: `get_introdaction(global_data: str, model: Model, language: str = "en") -> str`
  - Constructs a prompt using global project data and predefined instructions (`BASE_INTRO_CREATE`).
  - Sends the prompt to the AI model to generate an introduction.
  - Returns the generated introduction.
4. Custom Description Generation
- Function: `generete_custom_discription(splited_data: str, model: Model, custom_description: str, language: str = "en") -> str`
  - Iterates over split data chunks (`splited_data`).
  - Constructs a detailed prompt using the chunk, predefined instructions (`BASE_CUSTOM_DISCRIPTIONS`), and the custom description task.
  - Sends the prompt to the AI model to generate a description.
  - Breaks the loop if a valid description is generated; otherwise, retries with the next chunk.
5. Custom Description Without Context
- Function: `generete_custom_discription_without(model: Model, custom_description: str, language: str = "en") -> str`
Manager Class Usage and Methods
The Manager class is instantiated and used in the gen_doc function. Based on the provided context, the Manager class is responsible for managing the documentation generation process. It interacts with various components such as configuration, models, embedding layers, and progress tracking.
How to Use the Manager Class
- Initialization:
  - The `Manager` class is initialized with the following parameters:
    - `project_path`: The path to the project for which documentation is being generated.
    - `config`: A `Config` object containing project configuration details.
    - `llm_model`: A language model instance (e.g., `GPT4oModel`, `AzureModel`, etc.).
    - `embedding_model`: An embedding model instance (e.g., `Embedding`).
    - `progress_bar`: A progress bar instance (e.g., `ConsoleGtiHubProgress`).
- Workflow:
  - The `Manager` class is used to perform various tasks such as loading project information, generating documentation, managing global files, creating embedding layers, and saving the final output.
Methods Used in the Context
The following methods of the Manager class are explicitly used in the provided context:
- `load_all_info()`: Loads all necessary information about the project.
- `save()`: Saves the current state of the manager, including generated documentation.
- `generate_code_file()`: Generates a code file as part of the documentation process.
- `generate_global_info(compress_power: int, is_reusable: bool)`: Generates global information for the documentation.
  - `compress_power`: Controls the level of compression for the global information.
  - `is_reusable`: Indicates whether the global information can be reused.
- `generete_doc_parts(max_symbols: int, with_global_file: bool)`: Generates parts of the documentation.
  - `max_symbols`: Maximum number of symbols allowed in each part.
  - `with_global_file`: Indicates whether to include global information in the documentation parts.
- `factory_generate_doc(factory: DocFactory, to_start: bool = False)`: Generates documentation using a factory pattern.
  - `factory`: An instance of `DocFactory` containing custom modules.
  - `to_start`: If `True`, the generated documentation is added to the start.
- `order_doc()`: Orders the documentation parts.
- `create_embedding_layer()`: Creates an embedding layer for the documentation.
- `clear_cache()`: Clears the cache used during the documentation generation process.
Code Example
Below is an example of how to use the Manager class based on the provided context:
```python
from autodocgenerator.manage import Manager
from autodocgenerator.factory.base_factory import DocFactory
from autodocgenerator.ui.progress_base import ConsoleGtiHubProgress
from autodocgenerator.engine.models.gpt_model import GPT4oModel
from autodocgenerator.postprocessor.embedding import Embedding

# Configuration and model setup
# (the original example omits the Config import; import it from the
# package's configuration module)
project_path = "path/to/your/project"
config = Config()  # Initialize Config object and set necessary configurations
llm_model = GPT4oModel(api_keys="your_api_keys", use_random=False)
embedding_model = Embedding(api_key="your_google_embedding_api_key")
progress_bar = ConsoleGtiHubProgress()

# Initialize Manager
manager = Manager(
    project_path=project_path,
    config=config,
    llm_model=llm_model,
    embedding_model=embedding_model,
    progress_bar=progress_bar,
)

# Load project information
manager.load_all_info()

# Generate code file
manager.generate_code_file()

# Generate global information
manager.generate_global_info(compress_power=4, is_reusable=True)

# Generate documentation parts
manager.generete_doc_parts(max_symbols=5000, with_global_file=True)

# Use a factory to generate documentation with custom modules
custom_modules = []  # List of custom modules
doc_factory = DocFactory(*custom_modules)
manager.factory_generate_doc(doc_factory)

# Order the documentation
manager.order_doc()

# Create embedding layer
manager.create_embedding_layer()

# Clear cache
manager.clear_cache()

# Save the documentation
manager.save()
```
This example demonstrates the initialization and usage of the Manager class, along with its key methods. Note that some details, such as the Config object setup and API keys, need to be provided based on your specific use case.
Project details
Release history
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file autodocgenerator-1.6.5.9.tar.gz.
File metadata
- Download URL: autodocgenerator-1.6.5.9.tar.gz
- Upload date:
- Size: 80.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.3 CPython/3.12.13 Linux/6.17.0-1010-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c21513c12c26a5e679c95c2ed1b3da2c9c427c687bc23bb0a276da5076d998a6` |
| MD5 | `5b56b9d4f4749529bef50d19f5f07c42` |
| BLAKE2b-256 | `a6a41b806c75a3531729f8a227e00aaa737c47120a892f1985661d5990014f59` |
File details
Details for the file autodocgenerator-1.6.5.9-py3-none-any.whl.
File metadata
- Download URL: autodocgenerator-1.6.5.9-py3-none-any.whl
- Upload date:
- Size: 63.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.3 CPython/3.12.13 Linux/6.17.0-1010-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `d644ba5fef12fa8cf2d5dc7b960502d05da8073cd830e4509d97c62f4f6f0db3` |
| MD5 | `479e5f728ad419c2796230ba60a4e2c3` |
| BLAKE2b-256 | `8d58c15e8e008e5a82e4650d0412c937b33cfe7c00580f4dd710e6a800f29bc3` |