This project helps you create documentation for your projects.
Project description
Auto Doc Generator – Project Overview
1. Project Title
Auto Doc Generator
2. Project Goal
The purpose of Auto Doc Generator is to relieve developers from the repetitive, manual work of writing project documentation.
Given a repository and a tiny autodocconfig.yml file, the tool automatically extracts the source code, creates a concise high‑level summary, splits the material into LLM‑friendly chunks, asks a large language model to produce markdown fragments, and finally assembles a polished README.md.
In short, it turns a raw codebase into a ready‑to‑publish documentation file with zero human‑written prose.
3. Core Logic & Principles
| Phase | What Happens | Main Classes / Modules |
|---|---|---|
| Configuration | The CLI reads `autodocconfig.yml`. The parser builds an immutable `Config` object, a collection of `CustomModule` definitions, and a `StructureSettings` object that governs chunk size, ordering, and intro sections. | `autodocgenerator.auto_runner.config_reader`, `autodocgenerator.config.config.Config` |
| Pre‑processing | 1. `CodeMix` walks the repository, respects ignore patterns, and writes a single repo‑mix file that contains the directory tree and raw source. 2. `Compressor` sends the mix to the LLM (via `GPTModel` / `AsyncGPTModel`) and receives a compact project‑wide summary. 3. `Spliter` breaks the summary (or the raw mix) into chunks that respect the `max_symbols` limit configured by the user. | `preprocessor.code_mix`, `preprocessor.compressor`, `preprocessor.spliter` |
| LLM Generation | For every chunk a prompt is built from the global `ProjectSettings.prompt` (which already embeds language, project name, etc.) and the chunk’s content. The prompt is sent to the LLM wrapper; the response is a markdown fragment. Custom modules defined in the config are also processed at this stage, allowing users to inject bespoke sections that are still rendered by the LLM. | `engine.models.GPTModel`, `engine.models.AsyncGPTModel`, `DocFactory`, `CustomModule`, `CustomModuleWithOutContext` |
| Post‑processing | The generated fragments are concatenated into a temporary `output_doc.md`. Anchor tags (`<a name="…"></a>`) are extracted, then a second LLM call determines the semantically optimal ordering of those sections (or respects a user‑provided order). Static intro fragments (`IntroLinks`, `IntroText`) are prepended, and the final markdown is written to `README.md`. | `postprocessor.sorting`, `postprocessor.custom_intro`, `IntroLinks`, `IntroText` |
| Orchestration & UI | `Manager` coordinates every step, keeping an internal cache (`.auto_doc_cache`) that stores intermediate files (code mix, global summary, per‑chunk docs). A progress bar (`ConsoleGtiHubProgress`) and a global logger (`ui.logging`) give real‑time feedback, especially useful in CI pipelines. | `autodocgenerator.auto_runner.run_file`, `Manager`, `ConsoleGtiHubProgress`, `BaseLogger` |
| Error handling | If the list of LLM models is exhausted, a `ModelExhaustedException` bubbles up to the CLI, which exits with a clear message. Shared `History` and `ParentModel` objects allow fallback to alternative models without losing context. | `ModelExhaustedException`, `History`, `ParentModel` |
Key Architectural Principles
- Pipeline‑first design – each stage receives a well‑defined artifact, transforms it, and passes it downstream.
- Configuration‑driven – all behaviour (ignore patterns, language, chunk size, custom sections) lives in a single YAML file; the code itself never hard‑codes project specifics.
- Stateless LLM wrappers – `GPTModel` and `AsyncGPTModel` expose a single method (`get_answer_without_history`) that receives a prompt and returns a response, keeping the model layer thin and replaceable.
- Cache‑based intermediate storage – the `.auto_doc_cache` directory guarantees that a failure in a later stage does not require re‑running the entire pipeline.
- Extensibility via Custom Modules – users can drop a Python file that implements a `process` method; the factory will call it, letting the LLM enrich the custom text.
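The custom‑module extension point can be illustrated with a minimal sketch (the class name `InstallationSection` and the exact `process` signature are illustrative assumptions, not the library's real API):

```python
# Hypothetical sketch of a user-defined custom module; the documented
# contract only requires a `process` method that the factory calls.
class InstallationSection:
    """Custom section whose text is later enriched by the LLM."""

    def __init__(self, description: str):
        self.description = description

    def process(self, context: str) -> str:
        # Combine the user-written prose with repository context so the
        # LLM can render it into a markdown fragment.
        return f"{self.description}\n\nContext:\n{context[:200]}"

section = InstallationSection("Install with pip install autodocgenerator.")
fragment = section.process("repo mix text ...")
```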
4. Key Features
- One‑command generation – `python -m autodocgenerator.auto_runner.run_file` launches the full pipeline.
- YAML‑based configuration – `autodocconfig.yml` defines ignore patterns, project language, chunk size, ordering preferences, and custom modules.
- Automatic code extraction – walks the repository, filters by patterns, and produces a unified source view (`code_mix.txt`).
- LLM‑powered summarisation – compresses the entire codebase into a concise global description.
- Chunked processing – splits large inputs into token‑safe pieces, guaranteeing that every LLM request stays within model limits.
- Customizable sections – users can inject arbitrary prose (e.g., “Installation”, “Contribution Guidelines”) that the LLM formats automatically.
- Semantic re‑ordering – after generation, anchors are extracted and a second LLM call decides the most logical section order.
- Progress reporting & logging – console‑based progress bar and structured logs help debug and monitor CI runs.
- Cache persistence – intermediate files (`code_mix.txt`, `global_info.md`, `report.txt`, `output_doc.md`) survive crashes, allowing a quick resume.
- Async support – `AsyncGPTModel` enables concurrent LLM calls for large projects, reducing overall runtime.
- Graceful fallback – if a model becomes unavailable, `ModelExhaustedException` triggers a clean shutdown with a helpful error message.
5. Dependencies
| Dependency | Purpose | Minimum Version |
|---|---|---|
| Python | Runtime language | 3.9+ |
| groq (or any Groq‑compatible client) | Communicates with the Groq LLM endpoint | 0.1.0 |
| PyYAML | Parses `autodocconfig.yml` | 6.0 |
| tqdm (or similar) | Displays progress bars in the console | 4.65 |
| rich (optional) | Fancy logging/pretty console output | 13.0 |
| aiohttp (optional) | Asynchronous HTTP calls for `AsyncGPTModel` | 3.9 |
| click (or built‑in argparse) | CLI argument handling (if used) | 8.1 |
| pathspec | Advanced file‑ignore pattern matching (git‑style) | 0.11 |
| pytest (dev) | Test suite | 7.0 |
| black / isort / flake8 (dev) | Code formatting and linting | – |
All external libraries are listed in requirements.txt and are installed via pip install -r requirements.txt.
In summary, Auto Doc Generator is a fully‑configurable, pipeline‑oriented Python tool that leverages LLMs (through the Groq API) to turn any code repository into a high‑quality README.md. Its modular design, clear separation of concerns, and rich extensibility make it suitable for both individual developers and automated CI/CD environments.
Executive Navigation Tree
- 📂 Setup
- ⚙️ Configuration
- 📄 Documentation
- 📂 Modules
- ⚙️ Execution
- 🤖 Models
- 📊 Data
- 🛠️ Logging & Progress
Overview
To set up the automated installation workflow you need to execute a PowerShell installer on Windows platforms and a Bash installer on Linux‑based platforms. The workflow also requires a secret named GROCK_API_KEY in the repository’s GitHub Actions settings, populated with the API key obtained from the Grock documentation site.
Steps for Windows (PowerShell)
1. Open PowerShell with administrative privileges.
2. Run the following one‑liner, which fetches the installer script directly from the repository and executes it in the current session:

```powershell
irm https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.ps1 | iex
```

   - `irm` (alias for `Invoke-RestMethod`) downloads the script.
   - The pipeline (`|`) passes the script content to `iex` (`Invoke-Expression`) for immediate execution.

3. Follow any prompts shown by the installer to complete the setup.
Steps for Linux/macOS (Bash)
1. Open a terminal.
2. Execute the following command, which streams the installer script from the repository into the Bash interpreter:

```bash
curl -sSL https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.sh | bash
```

   - `curl -sSL` silently follows redirects and outputs the script.
   - The pipe sends the script to `bash` for execution.

3. Respond to any interactive questions the script may ask.
Adding the Required Secret to GitHub Actions
- In your GitHub repository, navigate to Settings → Secrets and variables → Actions.
- Click New repository secret.
- Set the Name to `GROCK_API_KEY`.
- Retrieve your API key from the Grock documentation site at https://grockdocs.com.
- Paste the key into the Value field and save.
Result
With the installer executed on the appropriate platform and the GROCK_API_KEY secret stored, any GitHub Actions workflow that references this secret will be able to communicate with the Grock service and complete the automated deployment or build process.
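For illustration, a workflow step that consumes the secret might look like the following (the workflow file name, job layout, and run command are assumptions; adapt them to your pipeline):

```yaml
# .github/workflows/docs.yml (illustrative)
name: Generate docs
on: [push]
jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Auto Doc Generator
        env:
          GROCK_API_KEY: ${{ secrets.GROCK_API_KEY }}
        run: python -m autodocgenerator.auto_runner.run_file
```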
StructureSettings – Configuration Container
| Entity | Type | Role | Notes |
|---|---|---|---|
| `include_intro_links` | bool | Toggle inclusion of link section | Default `True` |
| `include_order` | bool | Enable semantic re‑ordering of doc parts | Default `True` |
| `use_global_file` | bool | Generate a global summary file | Default `True` |
| `max_doc_part_size` | int | Max characters per LLM chunk | Default `5000` |
| `include_intro_text` | bool | Add introductory text module | Default `True` |
| `load_settings` | method | Overwrites attributes from a dict | Mutates instance |
Note: Only keys present in the supplied dict are altered; missing keys retain defaults.
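The documented merge semantics can be sketched as a minimal reimplementation (for illustration only, not the library's actual code):

```python
class StructureSettings:
    """Minimal sketch mirroring the documented defaults."""

    def __init__(self):
        self.include_intro_links = True
        self.include_order = True
        self.use_global_file = True
        self.max_doc_part_size = 5000
        self.include_intro_text = True

    def load_settings(self, settings: dict) -> None:
        # Only keys present in the dict are applied; missing keys
        # retain their defaults (the method mutates the instance).
        for key, value in settings.items():
            if hasattr(self, key):
                setattr(self, key, value)

s = StructureSettings()
s.load_settings({"max_doc_part_size": 3000})
```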
read_config – YAML Parser
Purpose: Convert raw autodocconfig.yml content into three runtime objects: Config, a list of CustomModule instances, and a StructureSettings instance.
Logic Flow
1. `yaml.safe_load` → Python dict `data`.
2. Instantiate `Config()`.
3. Extract `ignore_files`, `language`, `project_name`, `project_additional_info`.
4. Build `ProjectBuildConfig` → `load_settings(project_settings)`.
5. Populate `Config` via fluent setters (`set_language`, `set_project_name`, `set_pcs`).
6. Append each ignore pattern to `Config.ignore_files`.
7. Populate additional project info via `add_project_additional_info`.
8. Translate `custom_descriptions` into module objects:
   - Prefix “%” → `CustomModuleWithOutContext(custom[1:])`
   - Otherwise → `CustomModule(custom)`
9. Create `StructureSettings()` and apply `load_settings(structure_settings)`.
10. Return `(config, custom_modules, structure_settings_object)`.
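Step 8’s prefix handling can be sketched as follows (the tuple representation stands in for the real `CustomModule` / `CustomModuleWithOutContext` classes):

```python
# Sketch of translating custom_descriptions into module objects.
# A "%" prefix selects the variant without code context.
def build_custom_modules(custom_descriptions: list[str]) -> list[tuple[str, str]]:
    modules = []
    for custom in custom_descriptions:
        if custom.startswith("%"):
            # "%" prefix -> module rendered without repository context
            modules.append(("CustomModuleWithOutContext", custom[1:]))
        else:
            modules.append(("CustomModule", custom))
    return modules

mods = build_custom_modules(["Explain install", "%License note"])
```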
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `file_data` | `str` | Raw YAML text | Must be UTF‑8 encoded |
| Return tuple | `(Config, list[CustomModule], StructureSettings)` | Packaged configuration | All objects are mutable after creation |
Config & ProjectBuildConfig – Core Settings Objects
| Entity | Type | Role | Notes |
|---|---|---|---|
| `ProjectBuildConfig` | class | Holds build‑time flags (`save_logs`, `log_level`) | Loaded via `load_settings`. |
| `Config` | class | Aggregates ignore patterns, language, project name, additional info, and a `ProjectBuildConfig` instance | Provides fluent setters and `get_project_settings()` for downstream use. |
All interactions are strictly defined by the code; no external library behavior is assumed beyond yaml.safe_load.
settings.py – Project Prompt Builder
Responsibility – Constructs the system prompt injected into every LLM call, combining the static base prompt with project‑specific key/value pairs.
Interactions
- Imported by `compressor.py` and `spliter.py`.
- Uses constant `BASE_SETTINGS_PROMPT` from `engine.config`.
Logic Flow
1. `ProjectSettings.__init__(project_name)` stores the name and an empty `info` dict.
2. `add_info(key, value)` populates `info`.
3. The `prompt` property concatenates `BASE_SETTINGS_PROMPT`, the project name line, and each `info` entry as `"key: value"` lines.
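The prompt‑building flow above can be sketched as follows (the `BASE_SETTINGS_PROMPT` value and the exact line layout are assumptions for illustration):

```python
BASE_SETTINGS_PROMPT = "You are a documentation assistant."  # placeholder value

class ProjectSettings:
    """Sketch of the documented prompt-building behaviour."""

    def __init__(self, project_name: str):
        self.project_name = project_name
        self.info: dict[str, str] = {}

    def add_info(self, key: str, value: str) -> None:
        self.info[key] = value

    @property
    def prompt(self) -> str:
        # Base prompt, then project name, then each info entry as "key: value".
        lines = [BASE_SETTINGS_PROMPT, f"project name: {self.project_name}"]
        lines += [f"{k}: {v}" for k, v in self.info.items()]
        return "\n".join(lines)

ps = ProjectSettings("Auto Doc Generator")
ps.add_info("language", "en")
```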
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `project_name` | `str` | Identifier of the target repository | Supplied by CLI config |
| `info` | `dict[str, str]` | Additional metadata (e.g., language, ignore patterns) | Filled via `add_info` |
| `prompt` | `str` (property) | Full system prompt for LLM | Combines base prompt and dynamic info |
pyproject.toml – Project Metadata & Build Configuration
Responsibility – Supplies static declarative configuration for the Auto Doc Generator package. The file is consumed by Poetry, pip, and runtime tools (e.g., importlib.metadata) to:
- Register the distribution (`name`, `version`, `description`).
- Declare authorship, licensing, and the README target.
- Constrain the supported Python interpreter (`requires-python`).
- List runtime dependencies required for code execution.
- Define the build system (`poetry-core`) used to generate a wheel.
Technical Logic Flow
1. Poetry parses the TOML document → builds an internal `Project` model.
2. The model populates metadata fields (used for `setup.cfg`‑like output).
3. Dependency strings are resolved against the current Python environment → lock file (`poetry.lock`).
4. During `pip install .` the same parser supplies the same values to `setuptools`‑compatible hooks.
5. At runtime `importlib.metadata.metadata("autodocgenerator")` reads the generated `METADATA` file, which mirrors the entries defined here.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `name` | `str` | Distribution identifier | `"autodocgenerator"` – must be unique on PyPI. |
| `version` | `str` | Semantic version | `"0.9.2.8"` – follows PEP 440. |
| `description` | `str` | Human‑readable short summary | Used by package indexes. |
| `authors` | `list[dict]` | Contributor contact data | Each dict contains `name` & `email`. |
| `license` | `dict` | SPDX‑compatible license info | `text = "MIT"`. |
| `readme` | `str` | Path to long description file | `"README.md"`. |
| `requires-python` | `str` | Interpreter constraint | `">=3.11,<4.0"`. |
| `dependencies` | `list[str]` | Runtime requirement specifications | Exact pins (e.g., `rich==14.2.0`). |
| `build-system.requires` | `list[str]` | Packages needed for building the wheel | `["poetry-core>=2.0.0"]`. |
| `build-system.build-backend` | `str` | Entry point for the build backend | `"poetry.core.masonry.api"` |
⚠️ No executable code lives in this file; it is pure data. Any change requires a new build/release to take effect.
Visible Interactions
- Package managers (`poetry`, `pip`) read the file to resolve the dependency graph.
- CI pipelines may parse `dependencies` to cache wheels.
- Runtime introspection (`importlib.metadata`) surfaces the declared metadata to the application (e.g., `__version__` helpers).

All other project modules reference this configuration indirectly via the packaging tools; the file itself holds no mutable state.
Assumption: All `Model` instances correctly implement `get_answer_without_history`; any failure propagates as an exception.
Writing autodocconfig.yml
Start the file with the project title:

```yaml
project_name: "Your Project Title"
```
Follow it with the programming language used:
```yaml
language: "en"
```
To exclude files and directories from documentation, list them under ignore_files:
```yaml
ignore_files:
  - "dist"
  - "*.pyc"
  - "__pycache__"
  - "venv"
  - ".git"
  - "*.md"
  # add any other patterns you want to skip
```
Control the generation process with build_settings. Available keys:
- `save_logs` – set to `true` to keep log files, `false` to discard them.
- `log_level` – numeric value (e.g., `1` for minimal, `2` for normal, `3` for verbose).
```yaml
build_settings:
  save_logs: false
  log_level: 2
```
Define the structure of the output using structure_settings. Options:
- `include_intro_links` – `true` to add navigation links at the start.
- `include_intro_text` – `true` to include an introductory paragraph.
- `include_order` – `true` to keep sections in the order they appear in the source.
- `use_global_file` – `true` to place shared information in a single section.
- `max_doc_part_size` – maximum characters per generated part (e.g., `5000`).
```yaml
structure_settings:
  include_intro_links: true
  include_intro_text: true
  include_order: true
  use_global_file: true
  max_doc_part_size: 5000
```
Add any project‑wide description under project_additional_info:
```yaml
project_additional_info:
  global idea: "Brief description of the project's purpose."
```
Finally, provide custom prompts for the generator in custom_descriptions. Each entry is a free‑form string describing a documentation task:
```yaml
custom_descriptions:
  - "Explain how to install workflow with install scripts for Windows and Linux."
  - "Explain how to write this YAML file and list available options."
  - "Explain how to use the Manager class with code examples."
```
Combine all sections in a single YAML document, respecting proper indentation, to guide the documentation generator.
BaseModule – Abstract LLM Module Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `generate` | `def(info: dict, model: Model) -> str` | Must return a markdown fragment produced by the supplied LLM model. | Abstract; concrete subclasses implement the call. |

Assumption: All subclasses treat `info` as a key‑value bag supplied by the pipeline and use `model` for any LLM request.
Custom Description Modules
| Class | Purpose | LLM Entry Point |
|---|---|---|
| `CustomModule` | Wraps a user‑provided description and processes the repository mix. | Calls `generete_custom_discription(split_data(...), model, self.discription, language)`. |
| `CustomModuleWithOutContext` | Same description but without code context. | Calls `generete_custom_discription_without(model, self.discription, language)`. |
Both receive info (expects keys code_mix, language) and forward the model to the post‑processor helpers, which issue the actual LLM request.
Introductory Modules
| Class | Purpose | LLM Entry Point |
|---|---|---|
| `IntroLinks` | Extracts HTML links from `full_data` and generates a link block. | Calls `get_links_intro(links, model, language)`. |
| `IntroText` | Generates an introductory paragraph from `global_data`. | Calls `get_introdaction(global_data, model, language)`. |
Each module reads specific keys from info (full_data, global_data, language) and passes them with the shared model to post‑processor functions that perform the LLM interaction.
Custom Intro Module – Link & Description Generation
Purpose – Produces markdown introductions and link lists for the documentation using LLM calls. All functions are pure utilities; they do not modify repository files directly.
Visible Interactions
autodocgenerator.__init__ – Startup Banner & Global Logger
| Entity | Type | Role | Notes |
|---|---|---|---|
| `_print_welcome` | function | Emits a coloured ASCII banner and status line when the package is imported. | Uses inline ANSI escape codes; no external dependencies. |
| `BLUE`, `BOLD`, `CYAN`, `RESET` | local `str` | Colour/formatting tokens for the banner. | Defined inside the function; scoped to `_print_welcome`. |
| `ascii_logo` | local `str` | Multiline string containing the project logo. | Interpolated with colour tokens. |
| `logger` | `BaseLogger` instance | Centralised logger for the library. | Instantiated after the banner; configured with `BaseLoggerTemplate`. |
| `BaseLogger`, `BaseLoggerTemplate`, `InfoLog`, `ErrorLog`, `WarningLog` | imports | Logging utilities re‑exported at package level. | Imported from `autodocgenerator.ui.logging`. |
Critical assumption: The banner is printed every time the package is imported; this side‑effect is intentional for user feedback.
Immediate Execution & Exported Symbols
- After defining `_print_welcome`, the module invokes it (`_print_welcome()`), ensuring the banner appears on import.
- The module then re‑exports logging classes and creates a module‑level logger:

```python
from .ui.logging import BaseLogger, BaseLoggerTemplate, InfoLog, ErrorLog, WarningLog

logger = BaseLogger()
logger.set_logger(BaseLoggerTemplate())
```

  This makes `logger` available to any sub‑module that imports `autodocgenerator`.

Side‑effects:

- Terminal output on import.
- Global `logger` instance ready for use throughout the package.
Warning: If the package is imported in a non‑interactive context (e.g., CI without a tty), the ANSI codes may appear as raw escape sequences. Adjust environment or suppress import side‑effects if undesirable.
compressor.py – LLM‑Based Text Compression
Responsibility – Reduces raw code‑mix fragments to compact summaries using the configured LLM model.
Interactions
- Receives `project_settings` (from `preprocessor.settings.ProjectSettings`).
- Calls `model.get_answer_without_history` (wrapper from `engine.models`).
- Updates a `BaseProgress` instance to report sub‑task progress.
Logic Flow
1. `compress` builds a three‑message prompt: the system prompt from `project_settings.prompt`, a dynamic system prompt from `get_BASE_COMPRESS_TEXT`, and the user payload `data`.
2. Sends the prompt to `model.get_answer_without_history`; returns the LLM answer.
3. `compress_and_compare` groups the input `data` list into blocks of `compress_power`. For each element it calls `compress`, concatenates results per block, and updates the progress bar.
4. `compress_to_one` repeatedly invokes `compress_and_compare` until a single compressed string remains, adjusting `compress_power` when the remaining list is short.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `data` | `str` / `list[str]` | Raw text or list of fragments to compress | Passed to `compress` / `compress_and_compare` |
| `project_settings` | `ProjectSettings` | Supplies system prompt and project metadata | Accessed via `.prompt` |
| `model` | `Model` | LLM interface (sync/async) | Uses `get_answer_without_history` |
| `compress_power` | `int` | Block size for grouping fragments | Default 4, may be reduced |
| `progress_bar` | `BaseProgress` | Visual progress reporter | Sub‑task created/updated/removed |
| Return | `str` | Fully compressed markdown | Output of `compress_to_one` |

Warning – If `compress_and_compare` receives an empty list, it returns a list of empty strings; subsequent loops may produce an empty final result.
spliter.py – Chunking for LLM Consumption
Responsibility – Splits a large markdown string into size‑constrained chunks (max_symbols) suitable for LLM prompts.
Interactions
- Consumes `ProjectSettings` for prompt construction (future steps not shown).
- Uses `BaseProgress` and logging utilities for runtime visibility.
Logic Flow (present portion)
1. `split_data(data, max_symbols)` initializes `split_objects`.
2. (Implementation truncated) – The function will later divide `data` at logical boundaries while respecting `max_symbols`.
Data Contract (extracted)
| Entity | Type | Role | Notes |
|---|---|---|---|
| `data` | `str` | Full markdown to be chunked | May contain anchor tags |
| `max_symbols` | `int` | Upper token/character limit per chunk | Drives split granularity |
| Return | `list[str]` | Ordered list of chunk strings | Consumed by downstream doc generation |

Note – Only the signature and initial variable setup are visible; further processing is not documented here.

Using the `Manager` class
```python
from autodocgenerator.manage import Manager
from autodocgenerator.engine.models.gpt_model import GPTModel, AsyncGPTModel
from autodocgenerator.ui.progress_base import ConsoleGtiHubProgress
from autodocgenerator.factory.base_factory import DocFactory
from autodocgenerator.factory.modules.general_modules import CustomModule, CustomModuleWithOutContext
from autodocgenerator.factory.modules.intro import IntroText, IntroLinks

# 1. Prepare required objects
project_path = "."   # path to the root of the project
config = ...         # an instance of Config (filled elsewhere)
sync_model = GPTModel(API_KEY, use_random=False)
async_model = AsyncGPTModel(API_KEY)
progress = ConsoleGtiHubProgress()

# 2. Create the manager
manager = Manager(
    project_path,
    config=config,
    sync_model=sync_model,
    async_model=async_model,
    progress_bar=progress,
)

# 3. Run the main generation steps
manager.generate_code_file()                    # scans the project and creates internal code representation
manager.generate_global_info(compress_power=4)  # optional: builds a global information file
manager.generete_doc_parts(                     # splits documentation into parts
    max_symbols=5000,                           # maximum size of each part
    with_global_file=True
)

# 4. Apply custom documentation modules (if any)
custom_modules = [
    CustomModule("...description..."),
    CustomModuleWithOutContext("...description without context...")
]
manager.factory_generate_doc(DocFactory(*custom_modules))

# 5. Optional ordering of the generated documentation
manager.order_doc()

# 6. Add introductory modules (e.g., intro text, links)
intro_modules = [IntroText(), IntroLinks()]
manager.factory_generate_doc(DocFactory(*intro_modules))

# 7. Clean up temporary data
manager.clear_cache()

# 8. Retrieve the final documentation
output = manager.read_file_by_file_key("output_doc")
print(output)
```
Key Manager methods
| Method | Purpose |
|---|---|
| `generate_code_file()` | Scans the project directory, respects ignore patterns, and builds an internal representation of source files. |
| `generate_global_info(compress_power: int)` | Creates a global information file; `compress_power` controls the level of compression. |
| `generete_doc_parts(max_symbols: int, with_global_file: bool)` | Splits the documentation into chunks limited by `max_symbols`. If `with_global_file` is `True`, the global file is included in each part. |
| `factory_generate_doc(factory: DocFactory)` | Generates documentation using a `DocFactory` built from provided modules. |
| `order_doc()` | Reorders the generated sections according to the configured order logic. |
| `clear_cache()` | Removes temporary files and cached data after generation. |
| `read_file_by_file_key(key: str) -> str` | Returns the content of a generated file identified by `key` (e.g., `"output_doc"`). |
These examples show a typical workflow: instantiate Manager, run the generation pipeline, optionally add custom or introductory modules, and finally retrieve the assembled documentation.
Manager Class – Orchestration Core
Responsibility – Central coordinator that drives the full documentation pipeline: code‑mix creation, global summary compression, chunked doc generation, factory‑based extensions, final ordering, and cache cleanup.
Visible Interactions
Technical Logic Flow
1. Init – stores `project_directory`, `Config`, models, logger; creates `.auto_doc_cache` if absent.
2. `generate_code_file` → instantiate `CodeMix`, call `build_repo_content`, write `code_mix.txt`, log & update progress.
3. `generate_global_info` → read `code_mix.txt`, split via `split_data`, compress with `compress_to_one` (sync LLM), write `global_info.md`.
4. `generete_doc_parts` → read `code_mix.txt` (+ optional global), invoke `gen_doc_parts` (sync LLM) with language & settings, write `output_doc.md`.
5. `factory_generate_doc` → load current doc & code mix, build `info` dict (`language`, `full_data`, `code_mix`), call `doc_factory.generate_doc`, prepend result to existing doc, write back.
6. `order_doc` → split current doc by anchors, request ordering via `get_order`, overwrite `output_doc.md`.
7. `clear_cache` → optionally delete `report.txt` based on `config.pbc.save_logs`.
Model Base – History & Model Rotation
Responsibility – Supplies shared history, API key, and model‑selection list for both sync and async wrappers.
| Entity | Type | Role | Notes |
|---|---|---|---|
| `api_key` | `str` | Auth token for Groq API | Defaults to global `API_KEY` |
| `history` | `History` | Stores system & user messages | Initialized with `BASE_SYSTEM_TEXT` |
| `regen_models_name` | `list[str]` | Candidate model identifiers | Shuffled if `use_random=True` |
| `current_model_index` | `int` | Index of model currently tried | Updated on failure |
Logic Flow
1. `ParentModel.__init__` copies `MODELS_NAME`.
2. If `use_random`, the list is shuffled.
3. `regen_models_name` holds the rotation order.
GPTModel – Synchronous LLM Wrapper
Responsibility – Sends a single request to Groq’s synchronous client and returns the generated text.
| Entity | Type | Role | Notes |
|---|---|---|---|
| `client` | `Groq` | API client for sync calls | Created with `api_key` |
| `logger` | `BaseLogger` | Emits `InfoLog`/`ErrorLog`/`WarningLog` | Logs start, model used, answer |
| `prompt` (method arg) | `str` | User‑supplied message when `with_history=False` | Otherwise uses `history.history` |
| Return | `str` | LLM‑generated answer | Extracted from `chat_completion.choices[0].message.content` |
Step‑by‑Step
- Log start.
- Choose `messages` = history or `prompt`.
- Loop: pick `model_name` from `regen_models_name[current_model_index]`.
- Call `client.chat.completions.create(model=model_name, messages=messages)`.
- On exception, log warning, advance index (wrap to 0).
- When a response arrives, log model and answer, then return content.
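The rotation loop can be sketched as follows (the attempts cap and `RuntimeError` are illustrative stand‑ins for the documented `ModelExhaustedException` behaviour; `call` abstracts the Groq client invocation):

```python
# Sketch of the model-rotation retry loop: on failure the index advances
# (wrapping to 0) and the next candidate model is tried.
def get_answer_with_rotation(models: list[str], call) -> str:
    index = 0
    attempts = 0
    while attempts < len(models):
        model_name = models[index]
        try:
            return call(model_name)
        except Exception:
            index = (index + 1) % len(models)  # rotate on failure
            attempts += 1
    # Analogue of ModelExhaustedException when every model failed.
    raise RuntimeError("all models exhausted")

def flaky(model_name: str) -> str:
    if model_name == "model-a":
        raise RuntimeError("rate limited")
    return f"answer from {model_name}"

answer = get_answer_with_rotation(["model-a", "model-b"], flaky)
```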
AsyncGPTModel – Asynchronous LLM Wrapper
Responsibility – Mirrors GPTModel but operates with await using Groq’s async client.
| Entity | Type | Role | Notes |
|---|---|---|---|
| `client` | `AsyncGroq` | Async API client | Created with `api_key` |
| `logger` | `BaseLogger` | Same logging behavior as sync | |
| `prompt` | `str` | Optional override when `with_history=False` | |
| Return | `str` | Generated answer (awaited) | |
Logic Flow (identical to sync version, prefixed with await):
- Log generation start.
- Determine `messages`.
- Loop through `regen_models_name` attempting async `chat.completions.create`.
- On failure, log warning and rotate index.
- Upon success, log model and answer, return the text.
Assumption – The code presumes `chat_completion.choices[0].message.content` is always present; no guard is added for empty choices.
These three classes constitute the LLM interaction layer used throughout the Auto‑Doc Generator pipeline.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `project_directory` | `str` | Root path of the target repo | Used for all file I/O |
| `config` | `Config` | Immutable settings (ignore patterns, language, logging) | Accessed via `config.get_project_settings()` |
| `sync_model` / `async_model` | `Model` / `AsyncModel` | LLM interface for all generation calls | Must implement `get_answer_without_history` |
| `full_code_mix` | `str` | Raw concatenated source files | Read from `code_mix.txt` |
| `global_result` | `str` | Compressed project summary | Written to `global_info.md` |
| `result` (doc parts) | `str` | Generated markdown fragments | Written to `output_doc.md` |
| `info` dict | `dict[str, str]` | Payload for factories (`language`, `full_data`, `code_mix`) | Size logged per key |
Assumption: All imported functions/classes behave as documented in the project knowledge base; no external side effects are introduced beyond file writes and LLM calls.
Sorting – Anchor Extraction & Ordering
Responsibility – Parses a markdown document for <a name="…"></a> anchors, builds a mapping of anchor → section text, and asks an LLM to return a semantically‑sorted list of titles.
Interactions –
- Receives raw markdown text from the Manager (post‑processor stage).
- Uses the Model (`Model.get_answer_without_history`) to obtain ordering.
- Returns a concatenated markdown string that the DocFactory writes to the final output file.
Logic Flow
1. `split_text_by_anchors(text)` → regex `(?=<a name=…)` splits the document at each anchor.
2. `extract_links_from_start(chunks)` → extracts leading anchors (`#anchor`) from each chunk, discarding those ≤ 5 chars.
3. Validates equal counts; otherwise returns `None`.
4. Builds `result` dict mapping each `#anchor` to its chunk.
5. `get_order(model, chanks)` logs start, composes a user prompt asking the LLM to “Sort the following titles semantically …”.
6. Parses the comma‑separated response, reassembles ordered sections, logs each addition, and returns the ordered markdown.
Warning – If the number of detected anchors does not match the number of chunks, the function aborts and yields `None`, causing downstream steps to skip ordering.
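The lookahead split described above can be sketched as follows (empty leading fragments are filtered out, which is an assumption about the real implementation):

```python
import re

# Sketch of split_text_by_anchors: a lookahead split keeps each
# <a name="…"> anchor at the start of its own chunk.
def split_text_by_anchors(text: str) -> list[str]:
    chunks = re.split(r'(?=<a name=)', text)
    return [c for c in chunks if c.strip()]

doc = '<a name="setup"></a>\nSetup...\n<a name="usage"></a>\nUsage...'
chunks = split_text_by_anchors(doc)
```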
Data Contract
CodeMix – Repository Content Builder
Responsibility – Walks the project tree, respects ignore patterns, writes a structural tree followed by each file’s raw content into a single text file (repomix-output.txt).
Interactions –
- Consumes `root_dir` and `ignore_patterns` supplied by config_reader.
- Emits the mixed repository file consumed later by preprocessor.compressor.
- Logs progress via `BaseLogger` (`InfoLog`).
Logic Flow
1. `should_ignore(path)` → normalises `path` relative to `root_dir` and checks it against each `ignore_patterns` entry using `fnmatch`.
2. `build_repo_content(output_file)` opens the output, writes a “Repository Structure” header.
3. Iterates over `root_dir.rglob("*")` (sorted):
   - If a directory, writes an indented line `dir/`.
   - If a file and not ignored, writes a `<file path="…">` tag, then the file’s raw UTF‑8 text, followed by two newlines.
   - Errors during file read are captured and written as `Error reading …`.
4. Logs each ignored path at level 1.
Assumption – All files are UTF‑8 decodable; unreadable files are recorded but do not halt execution.
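The `should_ignore` matching can be sketched as follows (the per‑path‑component check is an assumption about how bare directory patterns like `venv` are honoured):

```python
import fnmatch
from pathlib import Path

# Sketch of should_ignore: the path is normalised relative to root_dir
# and matched against each ignore pattern with fnmatch.
def should_ignore(path: Path, root_dir: Path, ignore_patterns: list[str]) -> bool:
    rel = path.relative_to(root_dir).as_posix()
    for pattern in ignore_patterns:
        # Match the whole relative path, or any single path component
        # (so a bare "venv" pattern ignores everything under venv/).
        if fnmatch.fnmatch(rel, pattern) or any(
            fnmatch.fnmatch(part, pattern) for part in Path(rel).parts
        ):
            return True
    return False

root = Path("/repo")
ignored = should_ignore(Path("/repo/venv/lib.py"), root, ["venv", "*.pyc"])
```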
Data Contract
split_data – Adaptive Chunk Rebalancing
Responsibility – Re‑splits a list of raw file fragments (splited_by_files) into size‑controlled split_objects so that each chunk respects max_symbols.
Visible Interactions – Uses BaseLogger for progress messages; no external state is mutated beyond returned list.
Logic Flow
- Initialise `split_objects = []`.
- Balancing loop – while any fragment exceeds `1.5 × max_symbols`, it is bisected at `max_symbols / 2` and re‑inserted, setting `have_to_change`. The loop repeats until all fragments fit the limit.
- Iterate `splited_by_files`, appending each piece to the current chunk; if adding would exceed `1.25 × max_symbols`, start a new chunk.
- Log the final count and return `split_objects`.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `splited_by_files` | `list[str]` | Raw fragments from the previous step | May contain oversized entries |
| `max_symbols` | `int` | Upper bound for chunk size | Drives both balancing & chunk creation |
| Return | `list[str]` | Ordered chunks respecting limits | Consumed by `gen_doc_parts` |
Warning – Over‑large fragments are split at a fixed half‑point; content boundaries (e.g., markdown headings) are not preserved.
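The balancing-and-packing behaviour described above can be sketched as follows. This is a reconstruction from the description, not the library's actual code; variable names mirror the prose:

```python
def split_data(splited_by_files: list[str], max_symbols: int) -> list[str]:
    # Phase 1: bisect any fragment larger than 1.5x the limit until all fit.
    fragments = list(splited_by_files)
    have_to_change = True
    while have_to_change:
        have_to_change = False
        rebalanced = []
        for frag in fragments:
            if len(frag) > 1.5 * max_symbols:
                mid = max_symbols // 2  # fixed half-point split, as noted in the warning
                rebalanced.extend([frag[:mid], frag[mid:]])
                have_to_change = True
            else:
                rebalanced.append(frag)
        fragments = rebalanced
    # Phase 2: greedily pack fragments into chunks, capped at 1.25x the limit.
    split_objects: list[str] = []
    current = ""
    for frag in fragments:
        if current and len(current) + len(frag) > 1.25 * max_symbols:
            split_objects.append(current)
            current = ""
        current += frag
    if current:
        split_objects.append(current)
    return split_objects
```

Note that a single fragment may still land in a chunk slightly above `1.25 × max_symbols`, since the balancing phase only guarantees fragments fit under `1.5 × max_symbols`.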
DocFactory – Generation Orchestrator
| Entity | Type | Role | Notes |
|---|---|---|---|
| `modules` | `list[BaseModule]` | Ordered collection of LLM‑driven processors | Provided at construction (`*modules`) |
| `info` | `dict` | Shared data bag (e.g., `code_mix`, `language`) | Passed unchanged to each module |
| `model` | `Model` | Synchronous LLM wrapper used by every module | Same instance reused throughout the run |
| `progress` | `BaseProgress` | Progress‑bar helper | Creates a sub‑task named “Generate parts” |
| `output` | `str` | Concatenated documentation fragments | Each fragment appended with a double newline |
Logic Flow
- Initialise a sub‑task (`progress.create_new_subtask`).
- Iterate over `self.modules`.
- Call `module.generate(info, model)`.
- Append the result to `output`.
- Log success via `BaseLogger`.
- Update progress (`progress.update_task`).
- After the loop, remove the sub‑task and return `output`.
Visible Interactions – Directly invokes each module’s generate; delegates LLM calls to those modules; writes logs; updates UI progress.
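The orchestration loop might look roughly like this. The `generate(info, model)` module signature and the double-newline join follow the description above, but everything else (constructor shape, method names on the factory itself) is assumed:

```python
class DocFactory:
    """Sketch of the generation orchestrator described above (details assumed)."""

    def __init__(self, *modules):
        self.modules = list(modules)

    def generate(self, info: dict, model, progress) -> str:
        progress.create_new_subtask("Generate parts", len(self.modules))
        output = ""
        for module in self.modules:
            fragment = module.generate(info, model)
            output += fragment + "\n\n"  # double newline between fragments
            progress.update_task()
        progress.remove_subtask()
        return output
```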
gen_doc – Orchestrator Entry Point
Purpose: Drive the full documentation pipeline using the objects produced by read_config.
Step‑by‑Step
- Initialise the LLM wrappers: `GPTModel` (sync) and `AsyncGPTModel`.
- Instantiate `Manager` with the project path, `Config`, models, and a `ConsoleGtiHubProgressbar`.
- `manager.generate_code_file()` – creates the repo‑mix.
- If `structure_settings.use_global_file` → `manager.generate_global_info(compress_power=4)`.
- `manager.generete_doc_parts(max_symbols=..., with_global_file=...)` – chunk‑splits and LLM‑generates per part.
- `manager.factory_generate_doc(DocFactory(*custom_modules))` – runs user‑defined modules.
- If `include_order` → `manager.order_doc()` – semantic re‑ordering.
- Append optional intro modules (`IntroText`, `IntroLinks`) based on settings and invoke another `factory_generate_doc`.
- `manager.clear_cache()` – removes temporary artifacts.
- Return the final markdown via `manager.read_file_by_file_key("output_doc")`.
write_docs_by_parts – LLM‑Driven Part Documentation
Responsibility – Build a system‑user prompt, invoke the LLM (model.get_answer_without_history), and return cleaned markdown for a single chunk.
Visible Interactions – Reads project_settings.prompt; optionally includes global_info and prev_info; logs via BaseLogger.
Logic Flow
- Initialise the logger.
- Assemble a `prompt` list with three mandatory system messages (language, global project info, `BASE_PART_COMPLITE_TEXT`).
- Append optional system messages for `global_info` and `prev_info`.
- Append the user message containing the chunk `part`.
- Call `model.get_answer_without_history(prompt)`.
- Strip leading/trailing Markdown fences (```` ``` ````).
- Return the cleaned answer.
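The final fence-stripping step could be implemented along these lines. The function name `strip_markdown_fences` is hypothetical, and handling of a language tag on the opening fence is an assumption:

```python
def strip_markdown_fences(answer: str) -> str:
    """Remove a leading/trailing code fence the LLM may wrap its markdown in."""
    text = answer.strip()
    if text.startswith("```"):
        # Drop the whole opening fence line (it may carry a tag, e.g. ```markdown).
        text = text.split("\n", 1)[1] if "\n" in text else ""
    if text.rstrip().endswith("```"):
        text = text.rstrip()[:-3]
    return text.strip()
```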
gen_doc_parts – End‑to‑End Part Generation
Responsibility – Split the full code‑mix, generate documentation for each chunk, and concatenate results.
Visible Interactions – Calls split_data, write_docs_by_parts, updates a BaseProgress sub‑task, and logs.
Logic Flow
- `splited_data = split_data(full_code_mix, max_symbols)`.
- Create a progress sub‑task sized to `len(splited_data)`.
- For each chunk `el`:
  - `result = write_docs_by_parts(el, …, prev=result, …)`.
  - Append `result` to `all_result`.
  - Keep a 3000‑character tail of `result` for the next iteration (`prev_info`).
  - Update progress.
- Remove the sub‑task, log the final length, and return `all_result`.
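The chunk loop with its rolling 3000-character context tail can be sketched as below; `generate` stands in for the LLM-backed `write_docs_by_parts`, and the progress/logging calls are omitted:

```python
def gen_doc_parts(chunks: list[str], generate, tail_len: int = 3000) -> str:
    """Generate docs chunk by chunk, feeding each call a tail of the previous output."""
    all_result = ""
    prev_info = ""
    for el in chunks:
        result = generate(el, prev_info)
        all_result += result
        prev_info = result[-tail_len:]  # rolling context window for the next chunk
    return all_result
```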
Logging Infrastructure – BaseLogger & Log Types
Responsibility – Provide a singleton logger that delegates to a configurable BaseLoggerTemplate (console or file).
Visible Interactions – All functions above instantiate BaseLogger() and call .log(InfoLog(...)).
Components
- `BaseLog` – base class with `message`, `level`, and formatted output.
- Sub‑classes `ErrorLog`, `WarningLog`, `InfoLog` prepend a timestamp and severity.
- `BaseLoggerTemplate` – filters by `log_level` and prints.
- `FileLoggerTemplate` – writes to a file.
- `BaseLogger` – singleton factory exposing `set_logger` and `log`.
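A minimal sketch of the singleton pattern described for `BaseLogger`. The capture-to-list template and the level numbering are illustrative assumptions, not the library's actual console/file behaviour:

```python
class InfoLog:
    level = 1

    def __init__(self, message: str):
        self.message = message

    def __str__(self) -> str:
        return f"[INFO] {self.message}"

class BaseLoggerTemplate:
    """Filters by log level; here it captures lines instead of printing."""

    def __init__(self, log_level: int = 1):
        self.log_level = log_level
        self.lines: list[str] = []

    def handle(self, log) -> None:
        if log.level >= self.log_level:
            self.lines.append(str(log))

class BaseLogger:
    _instance = None

    def __new__(cls):
        # Singleton: every call returns the same logger object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._template = BaseLoggerTemplate()
        return cls._instance

    def set_logger(self, template) -> None:
        self._template = template

    def log(self, log) -> None:
        self._template.handle(log)
```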
BaseProgress – Minimal Progress Interface
Responsibility – Defines the contract used by the pipeline to report incremental work. It exposes three methods that concrete progress reporters must implement: create_new_subtask(name, total_len), update_task(), and remove_subtask().
Visible Interactions – All manager‑level loops call BaseProgress.create_new_subtask before a batch of LLM requests, invoke update_task after each request, and finally remove_subtask. No state is stored in this class itself.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `name` | `str` | Sub‑task identifier | Human‑readable label |
| `total_len` | `int` | Expected iteration count | Drives progress‑bar limits |
| Return | `None` | Side‑effect only | Implementations update the UI or console |
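The contract can be expressed as an abstract base class. This is a sketch of the interface derived from the three methods named above, not the library's actual declaration:

```python
from abc import ABC, abstractmethod

class BaseProgress(ABC):
    """Contract the pipeline uses to report incremental work."""

    @abstractmethod
    def create_new_subtask(self, name: str, total_len: int) -> None: ...

    @abstractmethod
    def update_task(self) -> None: ...

    @abstractmethod
    def remove_subtask(self) -> None: ...
```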
LibProgress – Rich‑based UI Implementation
Responsibility – Provides a visual progress bar using rich.Progress while preserving the abstract API.
Logic Flow
- The constructor receives a `Progress` instance and creates a base task `"General progress"` with a configurable total (default 4).
- `create_new_subtask` registers a new task and stores its handle in `_cur_sub_task`.
- `update_task` advances the current sub‑task if present; otherwise it advances the base task.
- `remove_subtask` discards the current sub‑task handle, causing subsequent updates to target the base task again.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `progress` | `rich.progress.Progress` | Rendering engine | Provided by caller |
| `total` | `int` | Base task length | Defaults to 4 |
| `_base_task` | `int` | Rich task ID for the base bar | Internal |
| `_cur_sub_task` | `int \| None` | Active sub‑task ID | |

⚠️ The class does not implement error handling for a missing `Progress` object; callers must ensure a valid instance.
ConsoleGtiHubProgress – Simple Console Task Reporter
Responsibility – Supplies a lightweight, dependency‑free progress reporter that prints textual updates to stdout.
Logic Flow
- Instantiation creates a permanent General Progress `ConsoleTask` (`gen_task`).
- `create_new_subtask` spawns a fresh `ConsoleTask` for the named sub‑operation, stored in `curr_task`.
- `update_task` calls `curr_task.progress()` if a sub‑task exists; otherwise it updates `gen_task`.
- `remove_subtask` clears `curr_task`, causing future updates to fall back to the general task.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| `curr_task` | `ConsoleTask \| None` | Active sub‑task reporter | |
| `gen_task` | `ConsoleTask` | Persistent general progress reporter | Initialized in `__init__` |
| Return | `None` | Side‑effect: printed progress line | Uses `print` |
ConsoleTask – Helper class that tracks current_len, computes percentage, and emits a formatted line on each progress() call.
All progress reporters conform to the BaseProgress contract, enabling the manager to switch UI implementations without code changes.
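A sketch of how `ConsoleTask` might track `current_len` and emit its percentage line. The output format and the returned string are assumptions for illustration:

```python
class ConsoleTask:
    """Tracks completed steps and prints a percentage line on each progress() call."""

    def __init__(self, name: str, total_len: int):
        self.name = name
        self.total_len = total_len
        self.current_len = 0

    def progress(self) -> str:
        self.current_len += 1
        percent = round(100 * self.current_len / self.total_len)
        line = f"{self.name}: {percent}% ({self.current_len}/{self.total_len})"
        print(line)
        return line
```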
_print_welcome – Logic Flow
- Define ANSI colour/format strings (`BLUE`, `BOLD`, `CYAN`, `RESET`).
- Build `ascii_logo` with colour placeholders and the literal logo.
- Print `ascii_logo` to `stdout`.
- Print a status line, `"ADG Library | Status: Ready to work"`, coloured with `CYAN`.
- Print a separator line (`'—' * 35`).
The function has no parameters, returns None, and produces side‑effects (terminal output).
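The banner logic can be sketched as follows. The escape-code values and the placeholder logo are assumptions; the real `ascii_logo` content is not reproduced here:

```python
def print_welcome() -> str:
    """Print a welcome banner built from ANSI escape codes (sketch)."""
    CYAN = "\033[96m"
    BOLD = "\033[1m"
    RESET = "\033[0m"
    ascii_logo = f"{BOLD}ADG{RESET}"  # placeholder for the real ASCII-art logo
    status = f"{CYAN}ADG Library | Status: Ready to work{RESET}"
    separator = "—" * 35
    banner = "\n".join([ascii_logo, status, separator])
    print(banner)
    return banner
```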