This project helps you create documentation for your projects.
Auto‑Doc Generator
A layered, factory‑based, LLM‑driven Markdown documentation pipeline for any codebase.
1. Project Title
Auto‑Doc Generator – Layered + Factory + LLM‑Driven
2. Project Goal
To automatically produce a complete, readable README (or other Markdown artifacts) from the source code of a repository.
The tool parses the code, chunks it to stay within token limits, sends those fragments to a large‑language model (Groq or OpenAI), formats the generated text with reusable modules, and stitches the result into a single cohesive document. The solution is CI‑friendly and can be invoked from a local CLI or a GitHub Action.
3. Core Logic & Principles
3.1 Pipeline Overview
| Phase | Action | Key Components |
|---|---|---|
| Configuration | Read `autodocconfig.yml` → `Config`, `StructureSettings`, custom‑module lists | `auto_runner/config_reader.py` |
| Entry Point | `run_file.__main__` calls `gen_doc(project_path, …)` | `auto_runner/run_file.py` |
| Repository Walk | Scan files, split code into manageable chunks | `manage.py` (preprocessor `spliter.py`, `compressor.py`) |
| LLM Interaction | Submit chunks to `GPTModel` (rotating keys, history, logging) | `engine/models/gpt_model.py`, `engine/config/config.py` |
| Doc Construction | Each `BaseModule` (e.g., `IntroText`, `CustomModule`) processes LLM output → Markdown section | `factory/base_factory.py`, `factory/modules/*` |
| Post‑processing | Optional re‑ordering, anchor extraction, intro sections, cache clearance | `postprocessor/*` |
| Persistence | Write `README.md`, logs, and cache | `Manager.save()` in `manage.py` |
The pipeline is fully layered – each stage exposes a small, single‑purpose interface – and uses a factory pattern to iterate over a configurable list of doc modules.
3.2 LLM Wrapper
GPTModel (synchronous) / AsyncGPTModel (async) manage a pool of API keys and models.
- Model rotation – `ModelExhaustedException` is raised only when all configured keys/models are exhausted.
- History tracking – keeps the last 3 000 characters of context for subsequent prompts.
- Prompt assembly – pulls constants such as `BASE_SYSTEM_TEXT`, `BASE_INTRO_CREATE`, etc., from `engine/config/config.py`.
3.3 Pre‑processing
- Splitting – `split_data` respects a user‑defined `max_symbols` threshold and applies heuristics to stay below token limits.
- Compression – `compressor.py` can reduce large files into concise prompts before they hit the LLM.
- Discovery – `settings.py` controls file patterns to ignore, language, and metadata extraction.
3.4 Post‑processing
- Sorting & Ordering – `postprocessor.sorting` places sections in a logical sequence.
- Anchor Extraction – creates internal links for easy navigation.
- Intro Generation – optional introductory text or global sections.
3.5 Logging & UI
- Singleton Logger – `BaseLogger` funnels all logs, with optional file output via `FileLoggerTemplate`.
- Progress Feedback – `ConsoleGitHubProgress` shows real‑time status during CI runs, while `LibProgress` (Rich) can be used locally.
4. Key Features
- Zero‑setup Documentation – a single `autodocconfig.yml` configures patterns, languages, and module order.
- Modular Architecture – add or replace `BaseModule` implementations without touching core logic.
- LLM Flexibility – supports Groq, OpenAI, or any future LLM via the `gpt_model.py` abstraction.
- Token‑Aware Chunking – automatically splits files to stay within token limits while preserving context.
- Post‑processing Pipeline – reorder, add anchors, and create global intros automatically.
- CI‑Ready – bundled GitHub workflows, progress output for GitHub Actions.
- Cache & History Management – avoids redundant API calls and keeps conversational context.
- Extensible Prompt System – all AI instructions live in `engine/config/config.py`; modify tone or formatting with minimal code changes.
- Exception Handling – graceful fallback when API limits or key exhaustion occurs.
5. Dependencies
| Library / Tool | Purpose |
|---|---|
| Python 3.11+ | Core runtime (per `requires-python` in `pyproject.toml`) |
| rich | Optional console progress UI |
| pydantic | Schema validation (DocContent, DocInfoSchema, etc.) |
| groq / openai SDK | LLM client (client choice determined by GPTModel implementation) |
| PyYAML | Read autodocconfig.yml |
| GitHub Actions | CI workflows (see .github/workflows/*) |
| logging | Standard Python logger (wrapped by BaseLogger) |
All third‑party dependencies are listed in `requirements.txt` and are installable via `pip install -r requirements.txt`.
Auto‑Doc Generator delivers a maintainable, plug‑and‑play solution that turns raw source into polished, AI‑generated documentation while keeping the developer in full control of prompts, module composition, and CI integration.
Executive Navigation Tree
- 📖 Introduction
- 🔧 Utilities
- 📦 Modules
- ⚙️ Configuration
- 📦 Manager
- 🔩 Components
- 📄 Documentation Generation
  - get-all-html-links
  - data-contract
  - data-contract-gptmodel
  - code-mix-class
  - code-mix-generation
  - code-splitting
  - compressor-functions
  - compressor-module
  - compression-flow
  - reassembly
  - CONTENT_DESCRIPTION
  - doc-factory
  - doc-schemas
  - generete-custom-discription
  - generete-custom-discription-without
  - gen-doc-parts
  - global-info-generation
  - parts-generation
  - write-docs
  - factory-generate
- 🔗 Cross‑module Interaction
- 📁 Anchor & Path
- 🛠️ Exception & Cache
- 📑 Summary & Misc
- 📚 General
get_all_html_links(data: str) → list[str]
| Entity | Type | Role | Notes |
|---|---|---|---|
| `data` | `str` | Markdown source | Input document to search for `<a name>` anchors. |
| `links` | `list[str]` | Result | Returns `#anchor` for each anchor with >5 chars. |
| `logger` | `BaseLogger` | Logger | Logs extraction steps. |
| `pattern` | `re.Pattern` | Anchor regex | Matches `<a name="...">` or `<a name='...'>`. |
Logic Flow
- Compile the regex `r'<a name=["\']?(.*?)["\']?>'`.
- Iterate over all matches; extract `anchor_name`.
- If `len(anchor_name) > 5`, prepend `#` and append to `links`.
- Log the number of links and the list.
Result – Returns a list of markdown anchor links that will be used in table‑of‑contents sections.
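The flow above is small enough to sketch directly; this is a reconstruction from the described behaviour, not the project's source.

```python
import re

def get_all_html_links(data: str) -> list[str]:
    """Collect '#anchor' links for every <a name=...> tag longer than 5 chars."""
    pattern = re.compile(r'<a name=["\']?(.*?)["\']?>')
    links: list[str] = []
    for anchor_name in pattern.findall(data):
        if len(anchor_name) > 5:  # skip short/noise anchors, per the rule above
            links.append(f"#{anchor_name}")
    return links
```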
get_introdaction(global_data: str, model: Model, language: str = "en") → str
| Entity | Type | Role | Notes |
|---|---|---|---|
| `global_data` | `str` | Raw project description | Supplied by the calling pipeline. |
| `model` | `Model` | LLM wrapper | Calls `get_answer_without_history`. |
| `language` | `str` | Target language | Included in system instruction. |
Logic Flow
- Compose a 3‑message prompt using `BASE_INTRO_CREATE`.
- Pass it to the LLM and return the resulting intro text.
Result – The top‑level introduction for the README.
get_links_intro(links: list[str], model: Model, language: str = "en") → str
| Entity | Type | Role | Notes |
|---|---|---|---|
| `links` | `list[str]` | Anchor list | Obtained from `get_all_html_links`. |
| `model` | `Model` | LLM wrapper | Calls `get_answer_without_history`. |
| `language` | `str` | Target language | Sent as a system instruction. |
| `prompt` | `list[dict]` | LLM prompt | Contains system messages and `BASE_INTRODACTION_CREATE_LINKS`. |
Logic Flow
- Build a 3‑message prompt: language instruction, `BASE_INTRODACTION_CREATE_LINKS`, and the `links` string.
- Invoke `model.get_answer_without_history`.
- Return the generated introductory section.
Result – A Markdown fragment that introduces the generated documentation with clickable links.
Intro Modules – Automated Introduction Sections
```python
class IntroLinks(BaseModule):
    def generate(self, info: dict, model: Model):
        links = get_all_html_links(info.get("full_data"))
        return get_links_intro(links, model, info.get("language"))


class IntroText(BaseModule):
    def generate(self, info: dict, model: Model):
        return get_introdaction(info.get("global_info"), model, info.get("language"))
```
- Purpose – Build introductory material from the repository's metadata. `IntroLinks` collects all internal anchor links and asks the model to format a "Links" section. `IntroText` synthesizes a high‑level introduction based on `global_info`.
- Shared Mechanism – Both rely on `postprocessor.custom_intro` functions and the same `model`.
| Module | Input Key | Prompt Type | Output |
|---|---|---|---|
| `IntroLinks` | `full_data` | Link list | Markdown list of links |
| `IntroText` | `global_info` | Overview prompt | Markdown intro |
Note – The modules are pure functions; no side‑effects beyond returning strings.
Error Handling – Any exception in the underlying LLM call propagates to `DocFactory`; `ModelExhaustedException` is surfaced at the top level.
Welcome Message Display
The module executes a single helper, _print_welcome, that renders an ASCII banner and a status line when the package is imported.
```python
def _print_welcome():
    ...
    print(ascii_logo)
    print(f"{CYAN}ADG Library{RESET} | {BOLD}Status:{RESET} Ready to work V0.0.1")
    print(f"{'—' * 35}\n")
```
The routine uses ANSI escape sequences to colour the output. It is immediately invoked at import time, so any consumer of the library will see the banner in the console.
| Entity | Type | Role | Notes |
|---|---|---|---|
| `BLUE` | `str` | Colour escape for banner | `"\033[94m"` |
| `BOLD` | `str` | Bold text escape | `"\033[1m"` |
| `CYAN` | `str` | Colour escape for library name | `"\033[96m"` |
| `RESET` | `str` | Reset formatting | `"\033[0m"` |
| `ascii_logo` | `str` | Multi‑line ASCII art | Displayed on import |
| `_print_welcome()` | function | Side‑effect: prints banner | Executed automatically |
Notice: The function is not exported; it is a private helper for visual feedback only.
Logging Strategy
- `BaseLogger` is instantiated locally in each function; it is a singleton under the hood, guaranteeing a single output stream.
- `InfoLog` objects carry a `level` integer; higher levels produce quieter logs.
- All major stages emit a message, including lengths of generated text.
Logging Component – autodocgenerator/ui/logging.py
| Entity | Type | Role | Notes |
|---|---|---|---|
| `BaseLog` | Abstract class | Base for log objects | Stores message, level; generates timestamped prefix |
| `ErrorLog`, `WarningLog`, `InfoLog` | Sub‑classes | Emit formatted strings with severity tags | Override `format()` |
| `BaseLoggerTemplate` | Logger abstraction | Holds `log_level`; routes messages through `global_log` | `log()` writes directly (e.g., console) |
| `FileLoggerTemplate` | Concrete template | Appends formatted logs to a file | Uses `file_path` |
| `BaseLogger` | Singleton | Central logger instance | `set_logger()` attaches a concrete template; `log()` delegates to the template |
Implementation Detail – `BaseLogger.__new__` guarantees a single shared instance across the project.
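A minimal sketch of that guarantee, assuming a conventional `__new__`-based singleton (the `_instance` attribute name is an assumption):

```python
class BaseLogger:
    _instance = None  # assumed storage for the shared instance

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
```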
Logger Initialization
After printing the banner, the module sets up the global logger that the rest of the package relies on.
```python
from .ui.logging import BaseLogger, BaseLoggerTemplate, InfoLog, ErrorLog, WarningLog

logger = BaseLogger()
logger.set_logger(BaseLoggerTemplate())
```
The BaseLogger is a singleton that aggregates log handlers. BaseLoggerTemplate defines the output format and default level. The logger instance becomes available to any sub‑module that imports it from autodocgenerator.
| Symbol | Scope | Effect |
|---|---|---|
| `BaseLogger` | Module | Singleton logger class |
| `BaseLoggerTemplate` | Module | Log handler configuration |
| `logger` | Module | Global logger instance |
Assumption: The `ui.logging` module implements a standard logger that accepts a template via `set_logger`. No other configuration is performed in this file.
BaseModule – Contract for Documentation Builders
```python
class BaseModule(ABC):
    @abstractmethod
    def generate(self, info: dict, model: Model):
        ...
```
- Role – Declares the interface a concrete module must implement.
- Parameters – `info` is a dictionary of repo‑wide data; `model` is a `Model` instance that talks to Groq/ChatGPT.
- Output – A Markdown string produced by the module.
CustomModule – Context‑Rich Description
```python
class CustomModule(BaseModule):
    def generate(self, info: dict, model: Model):
        return generete_custom_discription(
            split_data(info.get("code_mix"), max_symbols=5000),
            model,
            self.discription,
            info.get("language"))
```
- Goal – Produce a documentation segment that incorporates a code snippet limited to 5 000 symbols.
- Dependencies – `split_data` (preprocessor), `generete_custom_discription` (postprocessor).
- Data Flow
  - `info["code_mix"]` → split to a manageable chunk.
  - Chunk + user `discription` + language sent to the model.
  - Result returned as Markdown.
CustomModuleWithOutContext – Self‑Contained Descriptions
```python
class CustomModuleWithOutContext(BaseModule):
    def generate(self, info: dict, model: Model):
        return generete_custom_discription_without(
            model, self.discription, info.get("language"))
```
- Use‑case – Generates a section that does not depend on any source fragment.
- Inputs – Only `model`, a static `discription`, and the language.
- Output – Plain Markdown paragraph.
Install PowerShell Script – install.ps1
| Step | Action | Outcome |
|---|---|---|
| Create `.github/workflows` dir | `New-Item -ItemType Directory -Force` | Directory ready for workflow file |
| Write workflow YAML | Here‑string `@' … '@` piped to `Out-File` | Generates `autodoc.yml` that calls the reusable workflow |
| Generate `autodocconfig.yml` | Here‑string with current folder name and settings | Provides ignore patterns, build and structure flags |
Notice – The script uses PowerShell variable interpolation and `Out-File -Encoding utf8` to ensure proper Unicode handling.
Installer Shell Script
Purpose
Creates the CI workflow and initial configuration for the Auto‑Doc Generator.
The script is run in the project root; it writes a GitHub Actions workflow
(.github/workflows/autodoc.yml) that re‑uses a shared reusable workflow
and an autodocconfig.yml file that stores project‑specific settings.
| Entity | Type | Role | Notes |
|---|---|---|---|
| `mkdir -p .github/workflows` | command | Ensure the workflow directory exists | No side‑effects beyond directory creation |
| `autodoc.yml` | YAML file | Defines the GitHub Actions job | Uses `GROCK_API_KEY` secret |
| `autodocconfig.yml` | YAML file | Holds project metadata and ignore patterns | Generated using shell `basename` and a heredoc |
| `echo "✅ Done!"` | console output | Feedback to the user | Non‑blocking |
Processing Flow
1. Create Workflow Directory – `mkdir -p .github/workflows` ensures the directory for GitHub Actions files exists.
2. Write `autodoc.yml` – The file declares a dispatchable workflow that invokes a reusable workflow stored at `Drag-GameStudio/ADG/.github/workflows/reuseble_agd.yml@main`. It passes the `GROCK_API_KEY` secret and grants write permission for repository contents.
3. Write `autodocconfig.yml` – Populates project metadata (`project_name`, `language`) and several sections: `ignore_files` (glob patterns for files to exclude), `build_settings` (logging options), and `structure_settings` (toggles for generated sections). The content is output via a here‑document, with the first `$` escaped (`\$`) so the key can be rendered correctly in GitHub secrets.
4. Final Notification – `echo "✅ Done! …"` confirms success.
Critical Assumption
The script expects to be executed in the repository root, where a GitHub Actions workflow can be committed and an `autodocconfig.yml` can be placed.
To set up the application you can run one of two bootstrap scripts directly from the repository.
- On Windows with PowerShell execute:
  `irm https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.ps1 | iex`
  The script pulls the latest installation package, configures required environment variables, and starts the service.
- On Linux‑based systems run the shell script:
  `curl -sSL https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.sh | bash`
  This performs the same sequence of download, setup, and launch steps in a POSIX environment.
If you intend to run the installer within a CI pipeline such as GitHub Actions, create a secret named GROCK_API_KEY in the repository’s secret store. The value must be the API key obtained from the Grock interface at https://grockdocs.com. The installer will automatically consume this secret to authenticate with the Grock service during deployment.
Manager – Repository‑level Orchestration
Purpose –
The Manager class is the central coordination hub for the Auto‑Doc Generator pipeline.
It loads and persists intermediate artefacts, triggers LLM‑based transformations, and accumulates final Markdown output.
Manager class
| Method | Purpose |
|---|---|
| `generate_code_file()` | Scans the project folder and creates a cache of Python source files for later use. |
| `generate_global_info(compress_power: int)` | Builds a compressed global information file if requested. |
| `generete_doc_parts(max_symbols: int, with_global_file: bool)` | Splits the cached code into document parts respecting the maximum symbol count. |
| `factory_generate_doc(factory, to_start: bool = False, with_splited: bool = True)` | Uses a `DocFactory` instance to create documentation pieces, optionally inserting them at the beginning and controlling whether the result is split into multiple sections. |
| `order_doc()` | Reorders the generated documentation sections according to a predefined sequence. |
| `clear_cache()` | Deletes temporary files and data used during the generation cycle. |
| `save()` | Persists the final documentation object to disk. |
| `doc_info.doc.get_full_doc()` | Retrieves the complete assembled documentation text. |
Typical usage
```python
from autodocgenerator.manage import Manager
from autodocgenerator.factory.base_factory import DocFactory
from autodocgenerator.factory.modules.general_modules import CustomModule
# Assumed import path for the intro modules used below:
from autodocgenerator.factory.modules.intro_modules import IntroText, IntroLinks
from autodocgenerator.ui.progress_base import ConsoleGtiHubProgress
from autodocgenerator.auto_runner.config_reader import read_config
from autodocgenerator.engine.models.gpt_model import GPTModel
from autodocgenerator.engine.config.config import API_KEYS

# Load configuration data from a file (context provided in the project)
with open("autodocconfig.yml", "r", encoding="utf-8") as f:
    cfg_data = f.read()
config_obj, custom_mods, struct_opts = read_config(cfg_data)

# Prepare the language model and progress indicator
llm = GPTModel(API_KEYS, use_random=False)

# Create the Manager
mgr = Manager(
    project_path=".",  # root of the target project
    config=config_obj,
    llm_model=llm,
    progress_bar=ConsoleGtiHubProgress()
)

# Execute the documentation pipeline
mgr.generate_code_file()
if struct_opts.use_global_file:
    mgr.generate_global_info(compress_power=4)
mgr.generete_doc_parts(
    max_symbols=struct_opts.max_doc_part_size,
    with_global_file=struct_opts.use_global_file
)
mgr.factory_generate_doc(DocFactory(*custom_mods))
if struct_opts.include_order:
    mgr.order_doc()

additional_modules = []
if struct_opts.include_intro_text:
    additional_modules.append(IntroText())
if struct_opts.include_intro_links:
    additional_modules.append(IntroLinks())
mgr.factory_generate_doc(
    DocFactory(*additional_modules, with_splited=False),
    to_start=True
)
mgr.clear_cache()
mgr.save()

# Retrieve the finished documentation
full_text = mgr.doc_info.doc.get_full_doc()
```
This sequence demonstrates how to instantiate the manager, run all generation steps, and obtain the final documentation string.
__init__ – Construction & Cache Preparation
| Parameter | Type | Role | Notes |
|---|---|---|---|
| `project_directory` | `str` | Root of target repository | Path used for all cache files |
| `config` | `Config` | Parsed `autodocconfig.yml` | Provides `pbc.log_level`, `ignore_files`, `language`, etc. |
| `llm_model` | `Model` | LLM client | Handles key rotation, request history |
| `progress_bar` | `BaseProgress` | UI progress | Default instance if not supplied |
Steps
- Initialise a new `DocInfoSchema` container.
- Store `config`, `project_directory`, `llm_model`, `progress_bar`.
- Initialise a singleton `BaseLogger` and attach a `FileLoggerTemplate` to a log file under the cache folder.
- Create a `.auto_doc_cache` folder if it does not exist.
Note – No network traffic is performed during construction.
Related Configuration
autodocgenerator.engine.exceptions.ModelExhaustedException is the only exception that propagates outside this module, signaling to Manager that the pipeline must terminate gracefully.
The API_KEYS list is sourced from autodocgenerator.config.config and typically contains Groq API keys.
The documentation above is a self‑contained, factual representation of the gpt_model.py and model.py fragments, aligned with the Auto‑Doc Generator’s pipeline and strictly based on the provided source.
Config Reader – Settings Loader
The autodocgenerator.auto_runner.config_reader module is responsible for translating the YAML configuration file (autodocconfig.yml) into runtime objects that drive the documentation pipeline.
StructureSettings
| Property | Type | Default | Notes |
|---|---|---|---|
| `include_intro_links` | `bool` | `True` | Whether to inject the `IntroLinks` module during the doc build. |
| `include_order` | `bool` | `True` | Enables post‑processing re‑ordering. |
| `use_global_file` | `bool` | `True` | Controls generation of a global‑information file. |
| `max_doc_part_size` | `int` | `5_000` | Maximum symbol (character) count per chunk. |
| `include_intro_text` | `bool` | `True` | Controls injection of a descriptive intro section. |
StructureSettings exposes a load_settings(dict) method that dynamically overwrites defaults from a user‑supplied dictionary.
Assumption: The module does not expose any public API beyond the `read_config` function and the `StructureSettings` class.
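A hedged sketch of how `load_settings` could overwrite the defaults listed above; the `setattr` loop and attribute storage are assumptions, only the defaults and the method name come from the source.

```python
class StructureSettings:
    def __init__(self):
        # defaults per the table above
        self.include_intro_links = True
        self.include_order = True
        self.use_global_file = True
        self.max_doc_part_size = 5_000
        self.include_intro_text = True

    def load_settings(self, data: dict) -> None:
        # overwrite only keys that already exist as attributes (assumed)
        for key, value in data.items():
            if hasattr(self, key):
                setattr(self, key, value)
```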
read_config
| Entity | Type | Role | Notes |
|---|---|---|---|
| `file_data` | `str` | YAML source | Raw file contents. |
| `Config` | `autodocgenerator.config.config.Config` | Holds project‑wide settings. | Instantiated and populated. |
| `ProjectBuildConfig` | `ProjectBuildConfig` | Holds build‑specific toggles. | Loaded from `build_settings`. |
| `CustomModule` / `CustomModuleWithOutContext` | `BaseModule` subclasses | Custom LLM prompts. | Created from `custom_descriptions`. |
| `StructureSettings` | `StructureSettings` | Layout flags. | Instantiated and overwritten. |
The function returns a tuple of:
`(config: Config, custom_modules: list[BaseModule], structure_settings: StructureSettings)`
Internally it:
- Parses `file_data` with `yaml.safe_load`.
- Constructs a `Config` and populates ignore patterns, language, and project metadata.
- Builds a `ProjectBuildConfig` from `build_settings`.
- Creates `CustomModule` instances based on the `%` marker logic.
- Loads any supplied `structure_settings`.
Missing: No public functions are exposed beyond `read_config`.
Module Summary
- Executes a banner on import.
- Instantiates and configures a global logger for the project.
- Exposes the `logger` instance for downstream components.
Information not present in the provided fragment: there are no public functions or classes beyond the internal banner routine; the module does not expose any API beyond the logger.
Engine Exceptions – LLM Availability Guard
autodocgenerator.engine.exceptions.ModelExhaustedException signals that all configured Groq/ChatGPT models have been exhausted, and no further requests can be made. This exception propagates up to the caller, typically resulting in a graceful termination of the documentation pipeline.
Summary
These modules collectively translate a YAML configuration into runtime objects, orchestrate the document generation pipeline via Manager and DocFactory, and expose a clean API for the rest of the system to consume. The design relies on explicit data contracts and avoids hidden state, ensuring that each component can be unit‑tested in isolation.
Supporting Module – model.py
History Class
```python
class History:
    def __init__(self, system_prompt: str = BASE_SYSTEM_TEXT):
        self.history: list[dict[str, str]] = []
        if system_prompt is not None:
            self.add_to_history("system", system_prompt)
    ...
```
- Initializes with the default system prompt.
- Provides `add_to_history(role, content)` for appending messages (a completed sketch follows).
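Completed as a hedged sketch; only `__init__` and `add_to_history` are documented, and the import path is assumed from the docs above.

```python
from autodocgenerator.engine.config.config import BASE_SYSTEM_TEXT  # assumed path

class History:
    def __init__(self, system_prompt: str = BASE_SYSTEM_TEXT):
        self.history: list[dict[str, str]] = []
        if system_prompt is not None:
            self.add_to_history("system", system_prompt)

    def add_to_history(self, role: str, content: str) -> None:
        # messages follow the {role, content} shape used throughout the docs
        self.history.append({"role": role, "content": content})
```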
Abstract Base
ParentModel abstracts key rotation and history management.
- `self.api_keys` holds the list of API keys.
- `self.regen_models_name` holds the shuffled list of model identifiers.
- `generate_answer`, `get_answer_without_history`, and `get_answer` are abstract and implemented by concrete subclasses.
Synchronous Implementation
GPTModel implements the actual HTTP request to Groq:
```python
chat_completion = self.client.chat.completions.create(
    messages=messages,
    model=model_name,
)
```
- The request is wrapped in a `while True` loop that continues until a success or all models are exhausted.
- On failure, it logs a warning, updates indices, and re‑instantiates the Groq client with a new key.
- `ModelExhaustedException` is raised when the model pool is empty.
Side Effect – All logs (`InfoLog`, `WarningLog`, `ErrorLog`) are routed through `BaseLogger`, ensuring uniform traceability across the system.
Run File – Orchestration Entry Point
autodocgenerator.auto_runner.run_file contains the primary driver that ties together all layers of the Auto‑Doc Generator. The core public method is gen_doc.
gen_doc
| Parameter | Type | Role | Notes |
|---|---|---|---|
| `project_path` | `str` | Root of the repository | Target for content discovery. |
| `config` | `Config` | Project configuration | From `config_reader`. |
| `custom_modules` | `list[BaseModule]` | Custom sections to inject | Provided by `read_config`. |
| `structure_settings` | `StructureSettings` | Layout toggles | Also from `read_config`. |
Flow
1. LLM Preparation

```python
sync_model = GPTModel(API_KEYS, use_random=False)
```

2. Manager Instantiation

```python
manager = Manager(
    project_path,
    config=config,
    llm_model=sync_model,
    progress_bar=ConsoleGtiHubProgress(),
)
```

3. Repository Walk – `manager.generate_code_file()` splits the codebase into manageable chunks.

4. Global Section (Optional)

```python
if structure_settings.use_global_file:
    manager.generate_global_info(compress_power=4)
```

5. Document Parts Generation

```python
manager.generete_doc_parts(
    max_symbols=structure_settings.max_doc_part_size,
    with_global_file=structure_settings.use_global_file
)
```

6. Custom Module Injection – `manager.factory_generate_doc(DocFactory(*custom_modules))`

7. Re‑ordering (Optional)

```python
if structure_settings.include_order:
    manager.order_doc()
```

8. Intro Modules (conditioned on flags)

```python
additionals_modules = []
if structure_settings.include_intro_text:
    additionals_modules.append(IntroText())
if structure_settings.include_intro_links:
    additionals_modules.append(IntroLinks())
manager.factory_generate_doc(
    DocFactory(*additionals_modules, with_splited=False),
    to_start=True
)
```

9. Cleanup & Persist – `manager.clear_cache()` followed by `manager.save()`.

10. Return Value – the assembled Markdown string:

```python
return manager.doc_info.doc.get_full_doc()
```

Return: `str` – the complete README content.
The __main__ block simply loads autodocconfig.yml, parses it with read_config, and calls gen_doc on the current directory.
Key Interactions
| Component | Interaction | Outcome |
|---|---|---|
| `Manager` | `generate_code_file` → `generate_global_info` | Pre‑processing pipeline that creates a cached, compressed representation of the code. |
| `DocFactory` | `factory_generate_doc` | Instantiates and processes each `BaseModule`, which in turn calls `GPTModel.generate_answer`. |
| `GPTModel` | LLM requests | Generates Markdown for each code chunk or module. |
| `ConsoleGtiHubProgress` | Progress callbacks | UI feedback during long operations. |
| `CustomModule` | Prompt injection | Allows users to embed arbitrary LLM prompts. |
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `project_path` | `str` | Input | Directory to scan. |
| `config.ignore_files` | `list[str]` | Filters | Files/directories excluded from the scan. |
| `config.language` | `str` | LLM context | Determines language of prompts. |
| `config.project_name` | `str` | Metadata | Populated into global information. |
| `structure_settings.max_doc_part_size` | `int` | Chunk size limit | Symbol cap for each LLM call. |
| `manager.doc_info.doc` | `DocHeadSchema` | Resulting document | Exposed via `get_full_doc()`. |
Project Metadata (pyproject.toml)
Purpose
Defines packaging metadata, runtime dependencies, and the build system
configuration for the Auto‑Doc Generator.
| Entity | Type | Role | Notes |
|---|---|---|---|
| `[project]` | section | PEP 621 metadata | Includes name, version, authors, license, readme, requires‑python |
| `dependencies` | list | Runtime requirements | Uses pinned versions for stability |
| `[build-system]` | section | Build backend | Uses poetry‑core to build a wheel |
Key Parameters
| Parameter | Value | Effect |
|---|---|---|
| `name` | `autodocgenerator` | Package name used on PyPI |
| `version` | `1.0.3.3` | Semantic versioning tag |
| `requires-python` | `>=3.11,<4.0` | Ensures compatibility with Python 3.11+ |
| `license.text` | `MIT` | Open‑source license |
| `readme` | `README.md` | Primary long‑description source |
| `dependencies` | extensive list | Includes rich, pyyaml, pydantic, groq, etc. |
| `build-system.requires` | `poetry-core>=2.0.0` | Specifies the backend required to build the project |
Side Effect
The `pyproject.toml` is consumed by Poetry (or compatible tools) during installation or packaging, automatically pulling the specified dependencies and ensuring the runtime environment matches the configuration.
ProjectSettings (preprocessor.settings)
Core Responsibility
Holds a per‑project prompt template used by compression and other LLM interactions. Allows arbitrary key/value metadata to be inserted into the prompt.
| Method | Role | Notes |
|---|---|---|
| `__init__(project_name)` | Initializes with the project name; starts with an empty info dict. | |
| `add_info(key, value)` | Stores custom metadata. | |
| `prompt` (property) | Builds a composite prompt string: base template + project name + all info key/value pairs. | Uses the `BASE_SETTINGS_PROMPT` constant from `engine.config.config`. |
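A hedged sketch of the class per the table above; the exact layout of the composed prompt string is an assumption.

```python
from autodocgenerator.engine.config.config import BASE_SETTINGS_PROMPT  # assumed path

class ProjectSettings:
    def __init__(self, project_name: str):
        self.project_name = project_name
        self.info: dict[str, str] = {}  # starts empty, per the table

    def add_info(self, key: str, value: str) -> None:
        self.info[key] = value

    @property
    def prompt(self) -> str:
        # base template + project name + all key/value pairs (layout assumed)
        lines = [BASE_SETTINGS_PROMPT, f"Project name: {self.project_name}"]
        lines += [f"{key}: {value}" for key, value in self.info.items()]
        return "\n".join(lines)
```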
GPTModel: LLM Request Handler
Role
Acts as the bridge between the Auto‑Doc Generator pipeline and an external Groq‑powered language model.
- Manages a rotating pool of API keys and model names.
- Provides a synchronous API (`generate_answer`) used by `DocFactory` and higher‑level orchestrators.
- Emits detailed logs via `BaseLogger`.
Key Interactions
| Called By | Outcome |
|---|---|
| `DocFactory.factory_generate_doc` | LLM generates Markdown for a code chunk or module. |
| `Manager.generate_global_info` | Requests auxiliary documentation pieces (e.g., project overview). |
| `ConsoleGitHubProgress` | Uses the logs generated by `GPTModel` for UI feedback (implicit through `BaseLogger`). |
Model Hierarchy & History Context
| Class | Base | Purpose |
|---|---|---|
| `History` | – | Stores a list of `{role, content}` objects representing the conversation. |
| `ParentModel` | `ABC` | Holds common state: `api_keys`, `history`, shuffling of `models_list`. |
| `Model` | `ParentModel` | Synchronous implementation of the LLM interface. |
| `AsyncModel` | `ParentModel` | Asynchronous counterpart (currently unimplemented). |
| `AsyncGPTModel` | `AsyncModel` | Stub for future async support. |
Critical Logic
- Constructor – Shuffles `models_list` if `use_random=True` and initializes indices.
- `generate_answer` – Attempts to call `client.chat.completions.create`.
- Error Recovery – On exception, rotates to the next API key and/or model until the pool is exhausted (a sketch of this loop follows).
- Result Handling – Extracts `choices[0].message.content`, logs success, and returns an empty string if `None`.
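The documented rotation strategy can be sketched as follows (shown as a standalone method; the index bookkeeping and client re-creation details are assumptions, only the overall strategy comes from the source):

```python
from groq import Groq

from autodocgenerator.engine.exceptions import ModelExhaustedException

def generate_answer(self, messages: list[dict[str, str]]) -> str:
    while True:
        model_name = self.regen_models_name[self.current_model_index]
        try:
            chat_completion = self.client.chat.completions.create(
                messages=messages,
                model=model_name,
            )
            content = chat_completion.choices[0].message.content
            return content if content is not None else ""
        except Exception:
            # drop the failing model; when the pool empties, move to the next key
            self.regen_models_name.pop(self.current_model_index)
            if not self.regen_models_name:
                self.current_key_index += 1
                if self.current_key_index >= len(self.api_keys):
                    raise ModelExhaustedException()
                self.client = Groq(api_key=self.api_keys[self.current_key_index])
                self.regen_models_name = list(self.models_list)
            self.current_model_index = 0
```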
Progress Interface – autodocgenerator/ui/progress_base.py
| Entity | Type | Role | Notes |
|---|---|---|---|
| `BaseProgress` | Interface | Defines the progress API | Methods are no‑ops or placeholders |
| `LibProgress` | Rich‑based implementation | Uses `rich.progress.Progress` to show a main task and optional subtasks | `create_new_subtask()`, `update_task()`, `remove_subtask()` |
| `ConsoleTask` | Simple console helper | Prints the percentage of a single task | Not thread‑safe, used by GitHub progress |
| `ConsoleGtiHubProgress` | GitHub‑friendly wrapper | Falls back to console output when Rich is absent | Delegates to `ConsoleTask` instances |
Key Logic Flow – `update_task()` checks for an active subtask; if none, it advances the base task, otherwise it advances the current sub‑task.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `data` (in `compress`) | `str` | Input text to be compressed. | Raw Markdown, code snippets, or other free text. |
| `project_settings` | `ProjectSettings` | Provides contextual prompt information. | Includes `BASE_SETTINGS_PROMPT` and user‑defined metadata. |
| `model` | `Model` | LLM wrapper exposing `get_answer_without_history`. | Must be an instance of `engine.models.gpt_model.GPTModel` or compatible. |
| `compress_power` | `int` | Compression granularity hint. | Influences prompt construction and bucket size in higher‑level functions. |
| `progress_bar` | `BaseProgress` | UI feedback. | Default‑constructed instance if omitted. |
Data Contract for GPTModel
| Entity | Type | Role | Notes |
|---|---|---|---|
| `api_key` | `list[str]` | Credentials for Groq client | Default: `API_KEYS` from `config.config`. |
| `history` | `History` | Conversation history | Contains the system prompt on initialization. |
| `models_list` | `list[str]` | Candidate LLM models | Defaults include `gpt-oss-120b`, `llama-3.3-70b-versatile`, `gpt-oss-safeguard-20b`. |
| `use_random` | `bool` | Shuffle model list | `True` by default. |
| `client` | `Groq` | Active Groq client instance | Re‑instantiated when key changes. |
| `messages` | `list[dict[str, str]]` | Chat messages | Either `history.history` or a provided prompt. |
| `regen_models_name` | `list[str]` | Remaining models to try | Updated during error handling. |
| `current_model_index` | `int` | Index in `regen_models_name` | Rotated after a failed request. |
| `current_key_index` | `int` | Index in `api_keys` | Rotated after a failed request. |
| `result` | `str` | LLM response | Returned to caller; logged at level 2. |
Repository Content Packing – CodeMix
| Entity | Type | Role | Notes |
|---|---|---|---|
| `root_dir` | `Path` | Base directory for traversal | Default `.` resolved to an absolute path. |
| `ignore_patterns` | `list[str]` | Patterns to skip | Used by `should_ignore`. |
| `logger` | `BaseLogger` | Logging helper | Emits ignored‑file messages. |
| `should_ignore(path)` | method | Determines if a file/directory should be excluded | Uses `fnmatch` against path parts and the basename. |
| `build_repo_content()` | method | Generates a Markdown representation of the repository | Returns a single string. |
Logic Flow
- Append a header "Repository Structure:".
- Walk the file tree (`rglob("*")`).
- For each path:
  - Skip if `should_ignore(path)` is `True` (log at level 1).
  - Calculate depth and indentation; add a line with either `<dir_name>/` or the file name.
- Append a separator of equal signs.
- Walk again, this time adding file contents:
  - For each file not ignored, write a `<file path="relative_path">` marker, the file text (UTF‑8, ignore errors), then a newline.
  - Catch read errors and include an error message line.
- Return the joined string.
Result – A consolidated Markdown block describing the repository layout and all source file contents.
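The ignore check from the table can be sketched with `fnmatch`, matching both individual path parts and the basename (the exact matching semantics are assumed):

```python
from fnmatch import fnmatch
from pathlib import Path

def should_ignore(path: Path, ignore_patterns: list[str]) -> bool:
    for pattern in ignore_patterns:
        # match any component of the path (e.g. '__pycache__', '.git')
        if any(fnmatch(part, pattern) for part in path.parts):
            return True
        # match the basename (e.g. '*.pyc')
        if fnmatch(path.name, pattern):
            return True
    return False
```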
generate_code_file
| Action | Description | Dependencies |
|---|---|---|
| Calls `CodeMix(project_directory, config.ignore_files)` | Builds a flattened source string | `preprocessor.code_mix.CodeMix` |
| Stores the result in `self.doc_info.code_mix` | Centralised repository of raw code | `DocInfoSchema.code_mix` |
| Updates progress | Signals step completion | `BaseProgress.update_task()` |
Side‑effect – Emits an `InfoLog` entry for start/finish.
Code Splitting & Chunking Logic
```python
while True:
    have_to_change = False
    for i, el in enumerate(splited_by_files):
        if len(el) > max_symbols * 1.5:
            # bisect the oversized fragment in place
            splited_by_files.insert(i + 1, el[int(max_symbols / 2):])
            splited_by_files[i] = el[:int(max_symbols / 2)]
            have_to_change = True
    if not have_to_change:
        break
```
- Purpose – Iteratively bisects any source fragment that exceeds 1.5 × the maximum allowed symbol count (`max_symbols`).
- Behaviour – The loop terminates once every element in `splited_by_files` is below the threshold.
- Side‑effects – Mutates `splited_by_files` in‑place and records progress via `BaseLogger`.
- Edge – If an element is exactly at the threshold it is not split, ensuring minimal churn.
Key Functions
| Function | Purpose | Parameters | Returns | Notes |
|---|---|---|---|---|
| `compress(data: str, project_settings: ProjectSettings, model: Model, compress_power) -> str` | Sends a single text block to an LLM for compression using a dynamic prompt. | `data`: text to compress; `project_settings`: contextual prompts; `model`: LLM interface (`Model`); `compress_power`: numeric hint for prompt length | Compressed text string returned by the LLM | Uses `get_BASE_COMPRESS_TEXT(len(data), compress_power)` to prepend token‑limit instructions. |
| `compress_and_compare(data: list, model: Model, project_settings: ProjectSettings, compress_power: int = 4, progress_bar: BaseProgress = BaseProgress()) -> list` | Aggregates multiple items into compressed buckets of size `compress_power`. | `data`: list of strings to compress; `model`, `project_settings`: as above; `compress_power`: how many items per bucket; `progress_bar`: progress feedback | List of compressed strings; each element represents a bucket | Logs progress via `BaseProgress`. |
| `compress_to_one(data: list, model: Model, project_settings: ProjectSettings, compress_power: int = 4, progress_bar: BaseProgress = BaseProgress()) -> str` | Iteratively merges the list returned by `compress_and_compare` until a single string remains. | Same as `compress_and_compare` | Final single compressed document | Uses a loop that halves the list size; `new_compress_power` is reduced to 2 when the remaining list is smaller than `compress_power + 1`. |
Compressor Module – autodocgenerator.preprocessor.compressor
Core Responsibility
The compressor module condenses raw source‑code or documentation fragments into a smaller representation suitable for LLM‑based processing. It orchestrates one‑to‑many compression passes, progressively merging chunks until a single compressed document remains.
Compression Logic Flow
1. Prompt Construction – System messages: the project‑specific prompt (`project_settings.prompt`) and a size‑aware compression directive (`get_BASE_COMPRESS_TEXT`). User message: the raw `data` string.
2. LLM Invocation – `model.get_answer_without_history(prompt)` is called synchronously; the response is a compressed string.
3. Batch Compression – `compress_and_compare` groups incoming items (the `data` list) into buckets of `compress_power`. Each bucket's aggregated content is compressed via `compress` and appended with a newline.
4. Recursive Reduction – `compress_to_one` repeatedly calls `compress_and_compare`, reducing the list size until a single string is produced. When the remaining list length is below `compress_power + 1`, the bucket size is lowered to `2` to ensure convergence (see the sketch after this list).
5. Result – A single Markdown string that encapsulates the entire repository or documentation section, ready for further post‑processing or file output.
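A minimal sketch of the reduction loop, assuming the signatures from the Key Functions table; `compress_and_compare` is the documented helper, and the simplified `progress_bar=None` default is an assumption.

```python
def compress_to_one(data: list, model, project_settings,
                    compress_power: int = 4, progress_bar=None) -> str:
    while len(data) > 1:
        new_compress_power = compress_power
        if len(data) < compress_power + 1:
            new_compress_power = 2  # smaller buckets guarantee convergence
        data = compress_and_compare(
            data, model, project_settings,
            compress_power=new_compress_power,
            progress_bar=progress_bar,
        )
    return data[0]
```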
Re‑assembly into Fixed‑Size Parts
```python
curr_index = 0
for el in splited_by_files:
    # open a new accumulator slot when needed
    if len(split_objects) - 1 < curr_index:
        split_objects.append("")
    # start a fresh chunk once the current one would exceed the budget
    if len(split_objects[curr_index]) + len(el) > max_symbols * 1.25:
        curr_index += 1
        split_objects.append(el)
        continue
    split_objects[curr_index] += "\n" + el
```
- Goal – Concatenate split fragments into `split_objects` such that each accumulated string stays within 1.25 × `max_symbols`.
- Result – Returns a list of strings, each a "clean" chunk ready for LLM consumption.
- Logging – `BaseLogger` reports the final chunk count.

Anchor‑Tagged Description Output (fragment)
- The prompt requires that generated section titles start with an `<a name>` tag and contain no file paths, extensions, or generic terms.
- Send to the LLM.
- Return the raw answer.

Result – A formatted, tag‑prefixed description suitable for insertion into documentation sections.
DocFactory – Orchestrating Module Execution
```python
class DocFactory:
    def generate_doc(self, info: dict, model: Model, progress: BaseProgress) -> DocHeadSchema:
        ...
```
- Purpose – Sequentially runs each `BaseModule`, splits the result on anchor markers, and aggregates a `DocHeadSchema`.
- Workflow
  - Create a sub‑task counter in `progress`.
  - For every module, call `module.generate(info, model)`.
  - If `with_splited` is `True`, split the returned string using `split_text_by_anchors` and add each fragment to `doc_head` with its key.
  - Log at level 1 (module finished) and level 2 (raw output).
  - Increment progress; remove the sub‑task after all modules finish.
| Entity | Type | Role | Notes |
|---|---|---|---|
| `info` | `dict` | Repository metadata | Provided by `Manager`; keys: `code_mix`, `full_data`, `global_info`, `language`, … |
| `model` | `Model` | LLM wrapper | Handles key rotation, history, and HTTP calls. |
| `progress` | `BaseProgress` | UI progress | Tracks sub‑task count and updates. |
| `doc_head` | `DocHeadSchema` | Result container | Holds named `DocContent` entries. |
Documentation Data Structures
```python
class DocContent(BaseModel):
    content: str
```

- Holds a raw markdown fragment.

```python
class DocHeadSchema(BaseModel):
    content_orders: list[str] = []
    parts: dict[str, DocContent] = {}

    def add_parts(self, name, content: DocContent):
        ...

    def get_full_doc(self, split_el: str = "\n") -> str:
        ...

    def __add__(self, other: "DocHeadSchema") -> "DocHeadSchema":
        ...
```

- Ordering – `content_orders` preserves insertion order for deterministic rendering.
- Merging – `__add__` concatenates two schemas, ensuring no key clashes by renaming.

```python
class DocInfoSchema(BaseModel):
    global_info: str = ""
    code_mix: str = ""
    doc: DocHeadSchema = Field(default_factory=DocHeadSchema)
```

- Aggregates the global metadata, source mix, and generated documentation.

Warning – All schemas derive from pydantic and are serialisable via `dict()`. No custom validation beyond field types.
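A hedged completion of the elided bodies, matching the two documented behaviours (ordered rendering and clash‑free merging); the renaming scheme on clashes is an assumption.

```python
from pydantic import BaseModel

class DocContent(BaseModel):
    content: str

class DocHeadSchema(BaseModel):
    content_orders: list[str] = []
    parts: dict[str, DocContent] = {}

    def add_parts(self, name: str, content: DocContent) -> None:
        self.content_orders.append(name)
        self.parts[name] = content

    def get_full_doc(self, split_el: str = "\n") -> str:
        # render parts in insertion order for deterministic output
        return split_el.join(self.parts[n].content for n in self.content_orders)

    def __add__(self, other: "DocHeadSchema") -> "DocHeadSchema":
        merged = DocHeadSchema()
        for n in self.content_orders:
            merged.add_parts(n, self.parts[n])
        for n in other.content_orders:
            new_name = n if n not in merged.parts else f"{n}_copy"  # assumed rename
            merged.add_parts(new_name, other.parts[n])
        return merged
```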
generete_custom_discription(splited_data: str, model: Model, custom_description: str, language: str = "en") → str
| Entity | Type | Role | Notes |
|---|---|---|---|
| `splited_data` | `str` | Iterable of code snippets | The function iterates over it, treating each element as a chunk (likely intended to be a list). |
| `model` | `Model` | LLM wrapper | Calls `get_answer_without_history`. |
| `custom_description` | `str` | Prompt text | Task description for the LLM. |
| `language` | `str` | Target language | System instruction. |
| `result` | `str` | Accumulated answer | Returned when a valid response is found. |
Logic Flow
- Loop over `splited_data`.
- For each `sp_data`, build a prompt with context, `BASE_CUSTOM_DISCRIPTIONS`, and the task.
- Query the LLM.
- If the answer does not contain `!noinfo` or "No information found", or if `!noinfo` occurs past 30 chars, break the loop.
- Return the first satisfactory `result`.
Result – A concise description generated for the supplied custom task, or an empty string if no info is available.
generete_custom_discription_without(model: Model, custom_description: str, language: str = "en") → str
| Entity | Type | Role | Notes |
|---|---|---|---|
| `model` | `Model` | LLM wrapper | Calls `get_answer_without_history`. |
| `custom_description` | `str` | Prompt text | Task description for the LLM. |
| `language` | `str` | Target language | System instruction. |
Logic Flow
- Build a prompt from the language instruction and the task description, query the LLM once (no repository context), and return the raw answer.
gen_doc_parts – Orchestrator
| Entity | Type | Role | Notes |
|---|---|---|---|
| `full_code_mix` | `str` | All source text | Input to be split. |
| `max_symbols` | `int` | Size threshold | Determines chunk boundaries. |
| `model` | `Model` | LLM wrapper | Same contract as above. |
| `project_settings` | `ProjectSettings` | Configuration holder | Provides `prompt`. |
| `language` | `str` | Target language | |
| `progress_bar` | `BaseProgress` | UI feedback | Can be a concrete implementation. |
| `global_info` | `str \| None` | Repository‑wide metadata | |
Workflow
- Split – `split_data(full_code_mix, max_symbols)` produces a list of chunks.
- Progress – `create_new_subtask` tracks the number of chunks.
- Iterate – For every chunk:
  - Call `write_docs_by_parts`.
  - Append the result to `all_result`.
  - Keep only the last 3 000 characters of the current result for context in the next call (`result = result[len(result) - 3000:]`).
  - Update progress.
- Finalize – Remove the subtask, log the total length, and return `all_result`.
Edge Cases
generate_global_info
| Parameter | Default | Role |
|---|---|---|
| `compress_power` | `4` | Compression aggressiveness |
| `max_symbols` | `10000` | Symbol size of the initial chunks |
Flow
- `split_data(full_code_mix, max_symbols)` – chunk the code mix.
- `compress_to_one` – feeds chunks to `llm_model` with project settings; returns a single‑string global doc fragment.
- Persist the fragment to `global_info.md`.
- Store it in `self.doc_info.global_info`.
- Log and advance progress.

Note – `compress_power` controls how many top‑level sections the compressor keeps.
generete_doc_parts
| Parameter | Default | Role |
|---|---|---|
| `max_symbols` | `5_000` | Max chunk size for part generation |
| `with_global_file` | `False` | Whether to prepend global content |
Sequence
- Read the cached `global_info` file (ignores the passed `with_global_file`).
- Call `gen_doc_parts` with:
  - `full_code_mix`
  - `max_symbols`
  - `llm_model`
  - project settings from `config`
  - language from `config`
  - the progress bar
  - the `global_info` payload.
- Persist raw output to `output_doc.md`.
- Split the output into sections by anchors (`split_text_by_anchors`).
- Store each section into `self.doc_info.doc` (`DocContent`).
Output – The fully stitched Markdown string, later refined by post‑processors.
write_docs_by_parts – One‑Chunk LLM Pass
| Entity | Type | Role | Notes |
|---|---|---|---|
| `part` | `str` | Raw code fragment | Must be a single chunk from `gen_doc_parts`. |
| `model` | `Model` | LLM wrapper | Exposes `get_answer_without_history`. |
| `project_settings` | `ProjectSettings` | Configuration holder | Supplies `prompt` and other constants. |
| `prev_info` | `str \| None` | Last LLM output | |
| `language` | `str` | Target language | e.g., `"en"`. |
| `global_info` | `str \| None` | Repository‑wide metadata | |
Prompt Assembly
- System messages – three entries:
  - Language directive (`"For the following task use language {language}"`).
  - Global project metadata (`project_settings.prompt`).
  - Pre‑defined completion template (`BASE_PART_COMPLITE_TEXT`).
- Optional system messages – appended when `global_info` or `prev_info` exist.
- User message – the actual `part` of source code.

Important – The LLM receives no history; each chunk is independent except for `prev_info`, which is supplied as a system prompt (the assembly is sketched below).
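The assembly can be sketched as a helper; the directive strings are paraphrased, the real ones live in `engine/config/config.py`, and `build_part_prompt` is a hypothetical name.

```python
def build_part_prompt(part: str, project_settings, language: str,
                      prev_info: str | None, global_info: str | None,
                      base_template: str) -> list[dict[str, str]]:
    prompt = [
        {"role": "system", "content": f"For the following task use language {language}"},
        {"role": "system", "content": project_settings.prompt},
        {"role": "system", "content": base_template},  # BASE_PART_COMPLITE_TEXT
    ]
    if global_info is not None:
        prompt.append({"role": "system", "content": global_info})
    if prev_info is not None:
        prompt.append({"role": "system", "content": prev_info})
    prompt.append({"role": "user", "content": part})
    return prompt
```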
Response Handling
```python
answer: str = model.get_answer_without_history(prompt=prompt)
temp_answer = answer.removeprefix("```")
```

- Removes leading triple backticks that some LLMs prepend.
- If the response is identical to `temp_answer`, the function returns it directly.
- Otherwise, trailing backtick fences are stripped and the cleaned string is returned.
factory_generate_doc
| Parameter | Type | Role |
|---|---|---|
| `doc_factory` | `DocFactory` | Provides a list of `BaseModule` instances |
| `to_start` | `bool` | Determines whether to prepend or append generated content |
Workflow
- Compose a context dictionary:

```python
info = {
    "language": config.language,
    "full_data": curr_doc,  # existing markdown
    "code_mix": self.doc_info.code_mix,
    "global_info": self.doc_info.global_info
}
```

- Log the module names and input keys.
- Invoke `doc_factory.generate_doc(info, llm_model, progress_bar)`.
- Merge the `result` with `self.doc_info.doc`, either at the start or at the end.
- Increment progress.
Key point – `DocFactory` internally iterates over its `BaseModule` children, each of which calls the LLM via `generete_custom_discription` or similar functions.
order_doc
| Action | Description |
|---|---|
| Calls `get_order` | Reorders `self.doc_info.doc.content_orders` according to LLM‑generated suggestions. |
Result – The final section ordering is stored in `doc_info.doc`.
Cross‑Module Interactions
- `sorting.py` imports `Model` from `engine.models.model` and logging classes from `ui.logging`.
- `extract_links_from_start` and `split_text_by_anchors` work together to convert a raw Markdown document into an anchor‑based mapping.
- `get_order` relies on a `Model` implementation that provides `get_answer_without_history`; no internal state is mutated.
- `CodeMix` uses `fnmatch` for ignore logic and logs via `BaseLogger`; it is independent of any LLM components.
- All modules expose purely functional logic; any persistence or higher‑level orchestration occurs elsewhere in the Auto‑Doc Generator pipeline.
Cross‑Module Interactions
| Component | Interaction |
|---|---|
| `compress` → `engine.models.model.Model` | Calls `get_answer_without_history(prompt)` to obtain LLM output. |
| `compress` → `engine.config.config.get_BASE_COMPRESS_TEXT` | Generates a system instruction based on input length and compression power. |
| `compress_and_compare` → `BaseProgress` | Creates a sub‑task and updates it per iteration. |
| `compress_to_one` → `compress_and_compare` | Re‑uses it to progressively merge data. |
| `ProjectSettings` → `compress` / `compress_and_compare` | Supplies `project_settings.prompt` for system messages. |
Note – All imports are explicit; no implicit external library usage beyond those declared.
Cross‑Module Interactions
- Uses `BASE_INTRODACTION_CREATE_LINKS`, `BASE_INTRO_CREATE`, and `BASE_CUSTOM_DISCRIPTIONS` from `engine.config.config`.
- Relies on `GPTModel` (a subclass of `Model`) to perform LLM calls.
- Logs via `BaseLogger` / `InfoLog`; no error handling is performed, so exceptions bubble to the caller.
- Functions are pure helpers; no state is held within the module.
Observations
- `generete_custom_discription` iterates over a `str`, which is likely an unintended bug if a list of strings is expected.
- All functions return raw LLM responses; downstream code is responsible for formatting and integration.
- Logging verbosity can be adjusted through the `InfoLog` level parameter.
Default Ignore List
| Pattern | Effect |
|---|---|
| `*.pyo`, `*.pyd`, `*.pdb`, `*.pkl`, `*.log`, `*.sqlite3`, `*.db` | Binary and log files |
| `venv`, `env`, `.venv`, `.env`, `.vscode`, `.idea`, `*.iml`, `.gitignore`, `.ruff_cache` | Virtual environments, IDE files, caching |
| `*.pyc`, `__pycache__`, `.git`, `.coverage`, `htmlcov`, `migrations`, `*.md`, `static`, `staticfiles`, `.mypy_cache` | Compiled Python, CI artifacts, markdown, static assets, type‑check cache |

Usage – Passed to the `CodeMix` constructor to exclude unwanted files from the generated content.
Semantic Ordering via LLM
| Entity | Type | Role | Notes |
|---|---|---|---|
| `model` | `Model` | LLM wrapper | Must expose `get_answer_without_history`. |
| `chanks` | `list[str]` | Section titles to reorder | Passed directly into the prompt. |
| `logger` | `BaseLogger` | Logging helper | Records start/end of ordering. |
| `prompt` | `list[dict]` | Messages sent to the LLM | Contains a user‑role prompt that instructs the model to return a comma‑separated list. |
| `result` | `str` | Raw LLM output | Returned by `get_answer_without_history`. |
| `new_result` | `list[str]` | Cleaned, ordered titles | Result of splitting and trimming `result`. |
Logic Flow
- Log the start of ordering and the titles to process.
- Build a single user message asking the LLM to sort the titles semantically, keeping `#` prefixes and not adding explanatory text.
- Call `model.get_answer_without_history(prompt)`.
- Split the returned string on commas, strip whitespace, and store the result in `new_result`.
- Log the final list and return it.
Result – A list of titles in a LLM‑determined semantic order.
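A sketch of the documented call; the prompt wording is paraphrased, not copied from the source.

```python
def get_order(chanks: list[str], model) -> list[str]:
    prompt = [{
        "role": "user",
        "content": (
            "Sort these section titles semantically. Leave the # in each "
            "title and return only a comma-separated list, no explanations:\n"
            + ", ".join(chanks)
        ),
    }]
    result = model.get_answer_without_history(prompt=prompt)
    return [title.strip() for title in result.split(",")]
```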
Anchor Extraction Logic
| Entity | Type | Role | Notes |
|---|---|---|---|
| `extract_links_from_start(chunks)` | function | Parses a list of Markdown‑section strings for a leading `<a name="..."></a>` tag. | Returns a list of anchor links (prefixed with `#`) and a boolean indicating whether the first chunk should be discarded. |
| `links` | `list[str]` | Collected anchor names | Each anchor name longer than 5 characters is considered valid. |
| `have_to_del_first` | `bool` | Flag for removal of the first chunk | If any chunk lacks a valid anchor, the first chunk is marked for deletion. |
Logic Flow
- Iterate over `chunks`.
- For each `chunk`, strip whitespace and search for the regex pattern `^<a name=["\']?(.*?)["\']?>`.
- If a match is found and the captured name exceeds five characters, append `#<name>` to `links`.
- If no match exists for a chunk, set `have_to_del_first` to `True`.
- Return `(links, have_to_del_first)`.
Result – A tuple used by `split_text_by_anchors` to identify and manage anchor boundaries.
Text Splitting by Anchors
| Entity | Type | Role | Notes |
|---|---|---|---|
| `text` | `str` | Raw README content | Expected to contain `<a name="..."></a>` anchors. |
| `chunks` | `list[str]` | Sub‑strings separated by the anchor regex | Derived via `re.split`. |
| `result_chanks` | `list[str]` | Cleaned, non‑empty chunks | Trims whitespace. |
| `all_links`, `have_to_del_first` | tuple | Result from `extract_links_from_start` | Determines if the first chunk must be removed. |
| `result` | `dict[str, str]` | Mapping of anchor link → section text | Returned by the function. |
Logic Flow
- Split `text` at every anchor point using `(?=<a name=...)` (look‑ahead).
- Trim whitespace from each resulting chunk.
- Invoke `extract_links_from_start` on the cleaned chunks.
- Detect whether the overall file starts with an anchor or the first chunk should be dropped; if so, `pop(0)`.
- Verify that the number of links matches the number of remaining chunks; otherwise raise an exception.
- Build a dictionary mapping each `#anchor` to its corresponding section text and return it.
Result – A deterministic mapping of anchors to their associated Markdown sections.
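A sketch of the documented split‑and‑map flow, assuming the `extract_links_from_start` helper described above; the exception message is an assumption.

```python
import re

def split_text_by_anchors(text: str) -> dict[str, str]:
    # split at every anchor using a look-ahead so the tag stays in its chunk
    chunks = re.split(r'(?=<a name=)', text)
    result_chanks = [c.strip() for c in chunks if c.strip()]
    all_links, have_to_del_first = extract_links_from_start(result_chanks)
    if have_to_del_first:
        result_chanks.pop(0)  # drop a leading chunk with no anchor
    if len(all_links) != len(result_chanks):
        raise Exception("anchor count does not match chunk count")
    return dict(zip(all_links, result_chanks))
```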
File Path Helpers
| Method | Purpose | Notes |
|---|---|---|
| `get_file_path(file_key)` | Resolve the absolute cache file path | Uses the `FILE_NAMES` mapping |
| `read_file_by_file_key(file_key)` | Load cached content | Returns `None` on failure |
clear_cache
If the configuration flag pbc.save_logs is False, the method deletes the cached log file.
save
Writes the fully assembled documentation (self.doc_info.doc.get_full_doc()) to output_doc.md in the cache directory.
Error Handling
- `read_file_by_file_key` swallows all exceptions and returns `None`.
- All LLM calls inside other methods propagate `ModelExhaustedException` if no key remains.
- No explicit `try/except` around network or file operations beyond the minimal wrapper, keeping responsibility at the caller level.
Summary of Manager Responsibilities
- Cache Management – Creates and cleans `.auto_doc_cache`.
- Source Aggregation – Produces a single `code_mix` string from the repo.
- Global Compression – Condenses the entire code mix into one Markdown snippet.
- Chunked Generation – Breaks the mix into manageable parts, queries the LLM, and stitches the results.
- Factory‑based Expansion – Allows plug‑in modules to add or modify sections.
- Ordering & Persistence – Orders sections and writes the final `README.md`.
All interactions are strictly local or via the provided Model and DocFactory interfaces; no external services are invoked outside of the LLM wrapper.
Edge Cases & Error Handling
| Scenario | Current Behavior | Potential Issue |
|---|---|---|
| `data` contains more than `compress_power` items | Processed in multiple passes | None |
| LLM returns an empty string | `compress` returns an empty string | Missing documentation content |
| `progress_bar` is the base class (no UI) | Operations run silently | UI feedback unavailable |
| `model.get_answer_without_history` raises an exception | Propagates upwards | No retry logic implemented |
autodocgenerator.postprocessor.custom_intro – Module Overview
This module provides helper utilities for enriching the generated Markdown with hyperlinks, introductory text, and custom‑section descriptions. All LLM interactions are delegated to a Model instance passed in as an argument. Logging is performed via the singleton BaseLogger.
Summary
This fragment delivers a logging singleton and progress reporting utilities, together with a PowerShell bootstrapper that scaffolds GitHub Actions and configuration files for the Auto‑Doc Generator. All classes are lightweight, rely only on the standard library (plus rich for CLI progress), and expose a consistent API for the rest of the pipeline.
Summary
The provided fragment implements the chunk‑based documentation pipeline:
- Chunking – Splits raw source into size‑bounded parts, ensuring no individual chunk exceeds a token‑like limit.
- LLM Pass – Each part is sent to a configured GPT model (`Model`) with a rich system prompt derived from `ProjectSettings` and optional contextual messages.
- Aggregation – Results are concatenated, trimmed, and returned as a single markdown string.
- Schema – Generated fragments are wrapped in `DocHeadSchema` / `DocInfoSchema` for later assembly.
All interactions are pure except for logger and progress updates, keeping the core logic deterministic and testable.
Example Call Sequence
```python
# Inside Manager.generate_doc_parts
gpt = GPTModel()
answer = gpt.generate_answer(
    with_history=True,
    prompt=[{"role": "user", "content": "Explain this function"}]
)
```

- `generate_answer` pulls the current conversation history.
- It iterates over `regen_models_name` and `api_keys`.
- On success, it returns a Markdown string; on total exhaustion, it propagates `ModelExhaustedException`.
Constraints & Observations
- Async Support – `AsyncGPTModel` is a placeholder; the asynchronous logic remains unimplemented.
- Error Handling – A generic `Exception` is caught; specific Groq errors are not distinguished.
- Logging Level – `Answer: {result}` is logged at level 2; consumers can tune verbosity via `BaseLogger`.
Observations & Edge Cases
- `extract_links_from_start` assumes the anchor appears at the start of a chunk; any deviation may lead to `have_to_del_first` being `True`.
- `split_text_by_anchors` raises a generic `Exception` if anchor–chunk counts mismatch. No recovery strategy is included.
- `get_order` expects the LLM to honor the instruction "leave # in title"; malformed output will be included as‑is.
- `CodeMix.build_repo_content` writes a literal `"\n\n"` after each file block; if a file contains this pattern, duplication may occur.
- All logging levels are set to `level=1` or default; higher granularity is not provided in the snippets.
The file defines several top‑level keys:
- project_name – the title of the documentation set.
- language – the language to use for generated text.
- ignore_files – a list of glob patterns that will be skipped by the generator. Typical values include cache directories, byte‑code, virtual‑env folders, database files, logs, git artefacts, IDE folders and markdown files.
- build_settings – controls the build process:
  - save_logs – Boolean to keep or discard the log file.
  - log_level – numeric verbosity (e.g., 2).
- structure_settings – governs the layout of the output:
  - include_intro_links – add hyperlinks to the introduction.
  - include_intro_text – add explanatory introductory text.
  - include_order – maintain a defined order for sections.
  - use_global_file – whether to pull content from a shared file.
  - max_doc_part_size – maximum character count per document chunk (here 5000).
- project_additional_info – free‑form fields, such as a project "global idea" description.
- custom_descriptions – a list of template strings that can contain placeholders and are inserted into the final documentation. The example items illustrate installing the generator, describing configuration options, and using the Manager class. An illustrative file is sketched below.
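An illustrative configuration matching the keys above; the values are examples, not the project's defaults, and the key names inside `project_additional_info` and `custom_descriptions` are assumptions.

```python
import yaml

EXAMPLE_CONFIG = """
project_name: MyProject
language: en
ignore_files:
  - "*.pyc"
  - "__pycache__"
  - ".git"
build_settings:
  save_logs: true
  log_level: 2
structure_settings:
  include_intro_links: true
  include_intro_text: true
  include_order: true
  use_global_file: true
  max_doc_part_size: 5000
project_additional_info:
  global_idea: Short description of what the project does
custom_descriptions:
  - "Describe how to install the generator"
"""

config = yaml.safe_load(EXAMPLE_CONFIG)
print(config["structure_settings"]["max_doc_part_size"])  # -> 5000
```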