This project helps you create documentation for your projects.
Project description
Executive Navigation Tree

- 📂 Core Engine
- 📂 Documentation Generation
- 📂 Compression Utilities
- 📂 Progress & Interaction
- 📂 Extensibility & Integration
- 📂 Manager Utilities
- 📂 Testing & Assumptions
- 📂 Overview & Intro
The Manager class is instantiated with the following parameters (as shown in autodocgenerator/auto_runner/run_file.py):
| Parameter | Description (inferred from usage) |
|---|---|
| `project_path` | Path to the root of the project (e.g., `"."`). |
| `project_settings` | An instance of `ProjectSettings` containing project metadata. |
| `sync_model` | An instance of `GPTModel` (synchronous model). |
| `async_model` | An instance of `AsyncGPTModel` (asynchronous model). |
| `ignore_files` | List of file patterns to ignore during documentation generation. |
| `progress_bar` | An object implementing progress reporting, e.g., `LibProgress(progress)`. |
| `language` | Language code for the documentation (e.g., `"en"`). |
Full example of usage
```python
# Example: Using the Manager class to generate documentation
from autodocgenerator.manage import Manager
from autodocgenerator.engine.models.gpt_model import GPTModel, AsyncGPTModel
from autodocgenerator.engine.config.config import API_KEY
from autodocgenerator.preprocessor.settings import ProjectSettings
from autodocgenerator.ui.progress_base import LibProgress
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn

# 1. Prepare project settings (could be read from autodocconfig.yml)
project_settings = ProjectSettings("MyProject")
project_settings.add_info("global idea", "Example project for documentation generation")

# 2. Define ignore patterns (same as default or custom)
ignore_list = [
    "*.pyo", "*.pyd", "*.pdb", "*.pkl", "*.log", "*.sqlite3", "*.db", "data",
    "venv", "env", ".venv", ".env", ".vscode", ".idea", "*.iml", ".gitignore",
    ".ruff_cache", ".auto_doc_cache", "*.pyc", "__pycache__", ".git",
    ".coverage", "htmlcov", "migrations", "*.md", "static", "staticfiles",
    ".mypy_cache",
]

# 3. Initialize GPT models (API key is taken from the config)
sync_model = GPTModel(API_KEY)
async_model = AsyncGPTModel(API_KEY)

# 4. Set up a Rich progress bar
with Progress(
    SpinnerColumn(),
    TextColumn("[progress.description]{task.description}"),
    BarColumn(),
    TaskProgressColumn(),
) as progress:
    progress_bar = LibProgress(progress)

    # 5. Create the Manager instance
    manager = Manager(
        project_path=".",  # path to the project root
        project_settings=project_settings,
        sync_model=sync_model,
        async_model=async_model,
        ignore_files=ignore_list,
        progress_bar=progress_bar,
        language="en",  # documentation language
    )

    # 6. Run the generation steps (as in run_file.py)
    manager.generate_code_file()
    manager.generate_global_info_file(use_async=False, max_symbols=8000)
    manager.generete_doc_parts(use_async=False, max_symbols=4000)

    # Example: generate documentation using a factory (doc_factory must be created elsewhere)
    # manager.factory_generate_doc(doc_factory)

# Retrieve the final documentation content
output = manager.read_file_by_file_key("output_doc")
print(output)
```
This example mirrors the workflow used in autodocgenerator/auto_runner/run_file.py, showing all required parameters and a typical sequence of method calls on the Manager instance.
Example 1 – Using custom description modules
```python
from autodocgenerator.factory.base_factory import DocFactory
from autodocgenerator.factory.modules.general_modules import CustomModule

# Create custom modules from description strings
mod1 = CustomModule("how to use Manager class what parameters i need to give. give full example of usage")
mod2 = CustomModule("give me examples of usage for DocFactory with different modules")
mod3 = CustomModule("explain how to write autodocconfig.yml file what options are available")

# Initialise DocFactory with the custom modules
custom_doc_factory = DocFactory(mod1, mod2, mod3)
```
Example 2 – Using built‑in introductory modules
```python
from autodocgenerator.factory.base_factory import DocFactory
from autodocgenerator.factory.modules.intro import IntroLinks, IntroText

# Initialise DocFactory with the standard intro modules
intro_factory = DocFactory(
    IntroLinks(),
    IntroText(),
)
```
Example 3 – Combining both custom and intro modules
```python
from autodocgenerator.factory.base_factory import DocFactory
from autodocgenerator.factory.modules.general_modules import CustomModule
from autodocgenerator.factory.modules.intro import IntroLinks, IntroText

custom = CustomModule("custom description for a specific feature")
intro_links = IntroLinks()
intro_text = IntroText()

# DocFactory can receive any mix of modules
mixed_factory = DocFactory(custom, intro_links, intro_text)
```
Typical usage in the generation pipeline
```python
from autodocgenerator.auto_runner.run_file import gen_doc
from autodocgenerator.auto_runner.config_reader import read_config

# Load configuration (autodocconfig.yml)
with open("autodocconfig.yml", "r", encoding="utf-8") as f:
    cfg = read_config(f.read())

project_settings = cfg.get_project_settings()
doc_factory, intro_factory = cfg.get_doc_factory()

# Generate documentation
output = gen_doc(
    project_settings,
    cfg.ignore_files,
    ".",            # project root
    doc_factory,    # custom content
    intro_factory,  # introductory content
)
```
The autodocconfig.yml file is a YAML configuration used by autodocgenerator.
Based on the repository code (autodocgenerator/auto_runner/config_reader.py) the following top‑level options are recognized:
- `project_name` (string) – The name of the project.
- `language` (string, default `"en"`) – Language for the generated documentation.
- `ignore_files` (list of string patterns, optional) – File-name patterns that will be excluded from the documentation process (e.g., `*.pyc`, `__pycache__`, `venv`, etc.).
- `project_additional_info` (mapping) – Arbitrary key-value pairs that are added to the project settings; each key and each value is a string.
- `custom_descriptions` (list of strings) – Descriptions that are turned into `CustomModule` objects and incorporated into the documentation generation pipeline.
Only these options are parsed by read_config; any other fields are ignored. An example configuration from the repository:
project_name: "Auto Doc Generator"
language: "en"
project_additional_info:
global idea: "This project was created to help developers make documentations for them projects"
custom_descriptions:
- "how to use Manager class what parameters i need to give. give full example of usage"
- "give me examples of usage for DocFactory with different modules"
- "explain how to write autodocconfig.yml file what options are available"
autodocgenerator/\_\_init\_\_.py
Overview
autodocgenerator/__init__.py is the package initializer for the Auto Doc Generator (ADG) library.
Its sole purpose is to emit a short banner ("ADG") when the package is imported.
Although minimal, this file plays a key role in the module discovery performed by the CI/CD pipelines and the documentation‑generation runner (autodocgenerator.auto_runner.run_file).
Responsibility
- Side‑effect notification – prints a recognizable string (`"ADG"`) to standard output the first time the package is imported.
- Package marker – signals to Python that `autodocgenerator` is a proper package, allowing relative imports such as `from .engine import config` throughout the codebase.
Interaction with the System
| Component | Interaction |
|---|---|
| GitHub Actions (docs.yml) | Executes `python -m autodocgenerator.auto_runner.run_file`. Importing `autodocgenerator` triggers this `__init__` file, so the banner appears in the CI logs (useful for quick sanity checks). |
| `autodocgenerator.auto_runner.run_file` | Imports the top-level package (`import autodocgenerator`). The banner confirms that the import succeeded before the runner proceeds to read configuration, load factories, and generate documentation. |
| Developers / End-users | Running `python -m autodocgenerator` or importing any sub-module prints the `"ADG"` output, confirming that the correct package version is being used. |
Key Logic Flow
```python
# autodocgenerator/__init__.py
print("ADG")
```

- Module import – Python evaluates the package's `__init__` file.
- `print` statement – sends the literal string `"ADG"` to stdout.
- Import completes – control returns to the caller (e.g., the runner or user script).
Assumptions
- The environment’s standard output is not redirected or suppressed; otherwise the banner may be invisible.
- No other side effects (e.g., logging configuration) are required at import time. The simplicity is intentional to keep import overhead negligible.
Inputs & Outputs
| Aspect | Description |
|---|---|
| Input | Implicit import mechanism; no explicit arguments. |
| Output | A single line printed to stdout: ADG. No return value, no raised exceptions. |
| Side Effects | The only side effect is the console output; no file I/O, network calls, or state mutation. |
Extensibility & Best Practices
- Do not add heavy logic here. Heavy initialisation should live in dedicated modules (e.g., `engine/config/config.py`) to avoid slowing down imports.
- If future versions need richer startup information (version, environment), consider replacing the plain `print` with a structured logger:

```python
import logging

logger = logging.getLogger(__name__)
logger.info("Auto Doc Generator (ADG) initialized")
```
- Keep the banner consistent with CI logs and documentation generation output to aid debugging.
Example Usage
```
$ python -c "import autodocgenerator"
ADG
```
Or via the documentation runner:
```
$ python -m autodocgenerator.auto_runner.run_file
# CI log will contain:
# ADG
# ... (subsequent runner output)
```
Summary
autodocgenerator/__init__.py is a lightweight entry point that confirms successful package import by printing "ADG". It ensures the package is recognized by Python’s import system and provides a quick visual cue in CI pipelines and interactive sessions. Its design intentionally avoids any heavy computation, delegating all functional responsibilities to the sub‑packages under autodocgenerator.
autodocgenerator/auto_runner/config_reader.py
Purpose
config_reader.py translates a user‑provided YAML configuration file into a runtime Config object that the documentation‑generation pipeline can consume. It centralises all static settings (ignore patterns, language, project metadata, custom modules) and supplies ready‑to‑use factories for the documentation engine.
Core Class – Config
| Attribute | Meaning | Default |
|---|---|---|
| `ignore_files` | Glob patterns that the Manager will skip while scanning the project tree. | A comprehensive list covering compiled artefacts, virtual-env folders, IDE caches, etc. |
| `language` | Target language for generated docs. | `"en"` |
| `project_name` | Human-readable name of the analysed project. | `""` (must be supplied by the user) |
| `project_additional_info` | Arbitrary key/value pairs that are injected into `ProjectSettings`. | `{}` |
| `custom_modules` | Instances of `CustomModule` that extend the documentation generation (e.g., extra sections, specialised parsers). | `[]` |
Fluent API
All mutators (`set_language`, `set_project_name`, `add_*`) return `self`, enabling a builder-style configuration:

```python
cfg = Config().set_language("fr").add_ignore_file("*.tmp")
```
Helper Methods
- `get_project_settings()` – builds a `ProjectSettings` object (from `autodocgenerator.preprocessor.settings`) populated with `project_name` and any additional info.
- `get_doc_factory()` – creates two `DocFactory` instances:
  - `doc_factory` – contains all user-defined `custom_modules`.
  - `intro_factory` – always includes the built-in intro modules (`IntroLinks`, `IntroText`).
These factories are later passed to the Manager to render the final documentation.
read_config(file_data: str) -> Config
1. Parse YAML – `yaml.safe_load` converts the raw string into a Python dict.
2. Instantiate `Config` – starts from the defaults defined in `__init__`.
3. Populate fields –
   - `ignore_files` → `add_ignore_file` (preserves defaults).
   - `language` → `set_language`.
   - `project_name` → `set_project_name`.
   - `project_additional_info` → `add_project_additional_info`.
   - `custom_descriptions` → each entry wrapped in `CustomModule` and added via `add_custom_module`.
4. Return the fully-initialised `Config` object (see the sketch below).
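For readers who want the mapping spelled out, here is a minimal sketch of `read_config` following the steps above. The method names come from this document; the exact bodies and signatures (notably `add_project_additional_info`) in `config_reader.py` may differ.

```python
import yaml

from autodocgenerator.factory.modules.general_modules import CustomModule


def read_config(file_data: str) -> "Config":
    raw = yaml.safe_load(file_data) or {}
    cfg = Config()  # Config is defined in the same module
    for pattern in raw.get("ignore_files", []):
        cfg.add_ignore_file(pattern)  # preserves the default patterns
    if "language" in raw:
        cfg.set_language(raw["language"])
    if "project_name" in raw:
        cfg.set_project_name(raw["project_name"])
    for key, value in raw.get("project_additional_info", {}).items():
        cfg.add_project_additional_info(key, value)  # (key, value) signature assumed
    for description in raw.get("custom_descriptions", []):
        cfg.add_custom_module(CustomModule(description))
    return cfg
```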
Assumptions & Side‑effects
- The YAML file is well-formed; malformed content will raise `yaml.YAMLError`.
- No I/O is performed here – the caller supplies the file contents.
- All fields are optional; missing keys fall back to sensible defaults.
autodocgenerator/auto_runner/run_file.py
Purpose
run_file.py is the entry point for the command‑line execution of the Auto‑Doc Generator (ADG). It wires together configuration, LLM models, progress UI, and the core Manager to produce a single markdown (or similar) document representing the analysed project.
Main Function – gen_doc
```python
def gen_doc(
    project_settings: ProjectSettings,
    ignore_list: list[str],
    project_path: str,
    doc_factory: DocFactory,
    intro_factory: DocFactory,
) -> str:
```
Workflow
1. Progress UI – a `rich.Progress` bar with spinner, description, and bar columns visualises the long-running steps.
2. LLM Clients –
   - `sync_model = GPTModel(API_KEY)` – synchronous OpenAI-compatible client.
   - `async_model = AsyncGPTModel(API_KEY)` – asynchronous counterpart (currently used synchronously).
3. Manager Construction – `Manager` receives:
   - `project_path` – root directory to scan.
   - `project_settings` – metadata from `Config`.
   - LLM clients, ignore patterns, progress wrapper (`LibProgress`), and language.
4. Generation Steps (executed sequentially):
   - `generate_code_file()` – extracts source-code snippets.
   - `generate_global_info_file()` – creates a high-level overview (max 8000 symbols).
   - `generete_doc_parts()` – splits the work into manageable chunks (max 4000 symbols).
   - `factory_generate_doc(doc_factory)` – runs user-defined custom modules.
   - `factory_generate_doc(intro_factory)` – adds the standard intro sections.
5. Result Retrieval – `manager.read_file_by_file_key("output_doc")` returns the final assembled document as a string.
Return Value
A single string containing the complete generated documentation.
Script Execution (`if __name__ == "__main__":`)

1. Reads `autodocconfig.yml` from the current working directory.
2. Calls `read_config` (from `config_reader.py`) to obtain a `Config` instance.
3. Extracts `ProjectSettings` and the two `DocFactory` objects.
4. Invokes `gen_doc` with the current directory (`"."`) as the project root.
5. Stores the resulting document in `output_doc` (the script does not automatically write it to disk; callers can add that step).
Interaction with the Rest of the System
| Component | Role in the Flow |
|---|---|
| `autodocgenerator.manage.Manager` | Orchestrates file discovery, LLM calls, and assembly of documentation parts. |
| `autodocgenerator.engine.models.gpt_model` | Provides the LLM API wrappers used by Manager. |
| `autodocgenerator.ui.progress_base` | Supplies `LibProgress`, a thin adapter that lets Manager report progress to the rich bar. |
| `autodocgenerator.factory.*` | Supplies modular document generators (custom or built-in intro). |
| `autodocgenerator.preprocessor.settings.ProjectSettings` | Holds project-level metadata consumed by the factories. |
Assumptions & Constraints
- `API_KEY` is defined in `autodocgenerator.engine.config.config` and is a valid OpenAI (or compatible) key.
- The environment has network access for LLM calls.
- The progress bar is displayed on a terminal that supports ANSI escape codes.
- All factories supplied are stateless or safely reusable across a single run.
Extensibility Tips
- Async Generation – `Manager` already supports async calls; switch `use_async=True` and adjust the `max_symbols` parameters to leverage concurrency.
- Additional Intro Modules – extend `IntroLinks`/`IntroText` or replace them by providing a custom `DocFactory` via the YAML `custom_descriptions` field.
- Custom Progress UI – implement another `BaseProgress` subclass and pass it to `Manager` if richer UI is required.
Example Command‑Line Use
```
$ python -m autodocgenerator.auto_runner.run_file
# Reads autodocconfig.yml, shows a progress bar, and prints the final doc string.
```
Or programmatically:
```python
from autodocgenerator.auto_runner.run_file import gen_doc
from autodocgenerator.auto_runner.config_reader import read_config, Config

with open("autodocconfig.yml", "r", encoding="utf-8") as f:
    cfg = read_config(f.read())

proj_settings = cfg.get_project_settings()
doc_factory, intro_factory = cfg.get_doc_factory()

doc = gen_doc(
    proj_settings,
    cfg.ignore_files,
    project_path=".",
    doc_factory=doc_factory,
    intro_factory=intro_factory,
)
print(doc)
```
Summary
config_reader.py converts a YAML description into a structured Config object, while run_file.py consumes that object to drive the full documentation generation pipeline. Together they form the bootstrap layer of the Auto‑Doc Generator, handling configuration, progress reporting, LLM initialisation, and final document assembly without embedding any heavy business logic—those responsibilities reside in the Manager and the various DocFactory modules.
Engine Models Overview
The autodocgenerator.engine.models package provides thin wrappers around the Groq LLM API.
These wrappers are the only components that know how to talk to the remote model; all higher‑level logic (file discovery, prompt construction, document assembly) lives in the Manager and the various DocFactory modules.
History
- Purpose – holds the conversation history that is sent to the LLM.
- Key data – `self.history`, a list of dictionaries `{role, content}`.
- Behaviour – on construction a system message containing `BASE_SYSTEM_TEXT` (from `config.config`) is added automatically, unless the caller passes `None`.
- Side-effects – `add_to_history` mutates the internal list; the same `History` instance is shared by a model and its callers, so every `get_answer` call appends a user and an assistant entry.
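A minimal sketch of `History` as described above, assuming `BASE_SYSTEM_TEXT` is importable from `engine.config.config`; the real constructor may differ in details:

```python
from autodocgenerator.engine.config.config import BASE_SYSTEM_TEXT


class History:
    def __init__(self, system_text=BASE_SYSTEM_TEXT):
        self.history: list[dict] = []
        if system_text is not None:
            # A system message is added automatically on construction.
            self.history.append({"role": "system", "content": system_text})

    def add_to_history(self, role: str, content: str) -> None:
        # Mutates the shared list; get_answer appends user/assistant entries here.
        self.history.append({"role": role, "content": content})
```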
ParentModel
- Responsibility – stores common configuration for concrete model classes: API key, a `History` object, and a shuffled list of model names (`MODELS_NAME`).
- Model rotation – `self.regen_models_name` is a random permutation of the configured model identifiers. When a request fails, the wrapper advances `self.current_model_index` and retries with the next model.
Model (synchronous)
- Base class for `GPTModel`.
- Public helpers:
  - `get_answer(prompt: str) -> str` – records the user prompt, calls `generate_answer`, records the assistant reply, and returns it.
  - `get_answer_without_history(prompt: list[dict]) -> str` – forwards a pre-built message list directly to `generate_answer`.
- Default `generate_answer` – placeholder returning `"answer"`; overridden in `GPTModel`.
AsyncModel (asynchronous)
- Mirrors `Model` but with `async` methods, enabling the `Manager` to run many LLM calls concurrently.
GPTModel (synchronous Groq wrapper)
```python
class GPTModel(Model):
    def __init__(self, api_key=API_KEY, history=History()):
        super().__init__(api_key, history)
        self.client = Groq(api_key=self.api_key)
```
- `generate_answer`
  1. Chooses the current model name from `self.regen_models_name`.
  2. Calls `self.client.chat.completions.create(messages=…, model=model_name, temperature=0.3)`.
  3. On any exception the failing model is removed from the rotation; the loop retries with the next entry.
  4. Returns the content of the first choice (`chat_completion.choices[0].message.content`).
- Error handling – if every configured model fails, an exception "all models do not work" is raised. A sketch of this loop follows.
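A hedged sketch of that rotation/retry loop; the real method in `gpt_model.py` may be structured differently:

```python
def generate_answer(self, messages: list[dict]) -> str:
    # Method of GPTModel; relies on the attributes described above.
    while self.regen_models_name:
        model_name = self.regen_models_name[
            self.current_model_index % len(self.regen_models_name)
        ]
        try:
            chat_completion = self.client.chat.completions.create(
                messages=messages,
                model=model_name,
                temperature=0.3,
            )
            return chat_completion.choices[0].message.content
        except Exception:
            # Drop the failing model and retry with the next candidate.
            self.regen_models_name.remove(model_name)
    raise Exception("all models do not work")
```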
AsyncGPTModel (asynchronous Groq wrapper)
- Same logic as `GPTModel` but uses `AsyncGroq` and `await` for the API call.
- On failure it cycles the index instead of removing the model, allowing a retry with the next candidate.
Interaction with the Rest of the System
| Component | How it uses the model layer |
|---|---|
| `autodocgenerator.manage.Manager` | Instantiates either `GPTModel` or `AsyncGPTModel` (depending on `use_async`) and calls `get_answer` / `get_answer_without_history` to obtain LLM completions for each code fragment. |
| `autodocgenerator.factory.*` | Supplies the textual prompts (intro, description, etc.) that are fed to the model via the `History` object. |
| `autodocgenerator.ui.progress_base.LibProgress` | Receives progress updates from `Manager`; it does not interact with the model directly. |
| `autodocgenerator.engine.config.config` | Provides constants (`API_KEY`, `BASE_SYSTEM_TEXT`, `MODELS_NAME`) consumed by `ParentModel`. |
The model classes are deliberately stateless aside from the rotating list and the shared History; they can be safely recreated for each run or reused across a single documentation generation session.
Assumptions & Constraints
- `API_KEY` must be a valid Groq (or compatible) token; otherwise the client raises an authentication error.
- Network connectivity is required for every `generate_answer` call.
- `MODELS_NAME` contains at least one model identifier; an empty list will cause an `IndexError`.
- The `History` object is expected to contain only dictionaries with `"role"` (`"system"`, `"user"`, `"assistant"`) and `"content"` keys – this matches Groq's chat schema.
Extensibility Tips
- Custom LLM providers – subclass `ParentModel` and replace `self.client` with another SDK; keep the same `generate_answer` signature.
- Alternative retry policy – override the while-loop logic in `GPTModel`/`AsyncGPTModel` to implement exponential back-off or circuit-breaker patterns.
- History persistence – swap the default `History` with a subclass that writes to disk if you need to audit the full prompt/response trail.
Quick Example
```python
from autodocgenerator.engine.models.gpt_model import GPTModel

model = GPTModel()
answer = model.get_answer("Explain the purpose of the `History` class.")
print(answer)
```
In an asynchronous pipeline the same code would use AsyncGPTModel and await model.get_answer(...).
These classes constitute the LLM access layer of Auto‑Doc Generator, isolating the rest of the codebase from vendor‑specific details while providing simple, retry‑aware synchronous and asynchronous interfaces.
Documentation – Factory Layer & Repository‑mix Pre‑processor
(part of the Auto‑Doc Generator pipeline – the “LLM‑driven documentation builder”)
autodocgenerator/factory/base_factory.py
Purpose
Provides the pluggable module framework that the manager uses to assemble a documentation page.
- `BaseModule` – abstract contract for any "generation step" (e.g., intro text, custom description).
- `DocFactory` – orchestrates a list of `BaseModule` instances, feeds them the same `info` payload and a concrete `Model` (sync or async), and concatenates their outputs.
Core Classes
| Class | Responsibility | Important Methods |
|---|---|---|
| `BaseModule` (ABC) | Defines the interface for a generation step. Sub-classes implement `generate(info, model) -> str`. | `generate` – abstract. |
| `DocFactory` | Holds an ordered collection of modules, creates a sub-task in the UI progress bar, runs each module sequentially, and aggregates results. | `__init__(*modules)` – stores modules. `generate_doc(info, model, progress) -> str` – main driver. |
Interaction with the Rest of the System
| Component | How it connects |
|---|---|
| `autodocgenerator.manage.Manager` | Instantiates a `DocFactory` with the desired modules (e.g., `IntroText`, `CustomModule`). Calls `factory.generate_doc(info, model, progress)` to obtain the final markdown/HTML. |
| `autodocgenerator.engine.models.*` | Passed as the `model` argument; modules call `model.get_answer…` inside their `generate` implementation. |
| `autodocgenerator.ui.progress_base.BaseProgress` | Provides `create_new_subtask`, `update_task`, `remove_subtask`, used by `DocFactory` to report per-module progress. |
| `autodocgenerator.factory.modules.*` | Concrete `BaseModule` subclasses that live in the same package; they are the only objects `DocFactory` ever invokes. |
Assumptions & Side‑effects
- `info` is a dictionary produced by the pre-processor (see `code_mix.py`) and contains keys such as `"code_mix"`, `"full_data"`, `"global_data"`, `"language"`.
- Each module returns a plain string (markdown/HTML). `DocFactory` simply concatenates them with double newlines.
- The progress object must support the three methods used; otherwise a runtime `AttributeError` is raised.
- No state is kept inside `DocFactory` after `generate_doc` returns – it can be reused for multiple runs.
autodocgenerator/factory/modules/general_modules.py
Responsibility
Implements a custom description module that lets the user supply a free‑form prompt (discription).
- Splits the large source-code blob (`info["code_mix"]`) into chunks of at most 7000 symbols (via `split_data`).
- Calls `generete_custom_discription` (typo preserved from the original code), which internally talks to the LLM model, feeding each chunk together with the custom prompt.
Key Points
- The constructor stores the user-provided description text.
- `generate` returns the concatenated LLM answer for all chunks.
Interaction
- Relies on `engine.models.model.Model` for the LLM client.
- Uses `preprocessor.spliter.split_data` to respect token limits.
- Calls `preprocessor.postprocess.generete_custom_discription` – the function that builds the final prompt and parses the model response.
autodocgenerator/factory/modules/intro.py
Responsibility
Two small modules that produce the introductory part of the documentation:
| Class | What it does |
|---|---|
| `IntroLinks` | Extracts all HTML links from `info["full_data"]` (`get_all_html_links`) and asks the model to write a short description for each (`get_links_intro`). |
| `IntroText` | Generates a high-level project introduction from `info["global_data"]` (`get_introdaction`). |
Both modules follow the same BaseModule contract and return a string ready to be concatenated.
Interaction
- Imports the same `Model` type as other modules.
- Depends on `preprocessor.postprocess` helpers for link extraction and prompt creation.
autodocgenerator/preprocessor/code_mix.py
Purpose
Creates a single text representation of an entire repository (the “code‑mix”) that later feeds the LLM. It is used by the manager to populate info["code_mix"].
Core Class
| Method | Description |
|---|---|
| `__init__(root_dir=".", ignore_patterns=None)` | Sets the repository root (resolved to an absolute `Path`) and a list of glob patterns / directory names to skip. |
| `should_ignore(path: Path) -> bool` | Returns `True` if the given path matches any ignore pattern (supports full path, basename, and any path component). |
| `build_repo_content(output_file="repomix-output.txt")` | Writes two sections to `output_file`: (1) a tree-like listing of directories/files (respecting ignore rules); (2) the raw content of each non-ignored file wrapped in `<file path="…">` tags. Errors while reading a file are captured and written as a comment line. |
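Typical usage, assuming the class is named `CodeMix` (as the extensibility tips below suggest) and that the import path follows the file location:

```python
from autodocgenerator.preprocessor.code_mix import CodeMix  # import path inferred from the file location

mixer = CodeMix(root_dir=".", ignore_patterns=["*.pyc", "__pycache__", ".git"])
# Writes the directory tree plus <file path="..."> wrapped file contents,
# overwriting repomix-output.txt if it already exists.
mixer.build_repo_content(output_file="repomix-output.txt")
```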
Interaction
- Called once per documentation run (usually by `Manager` before any LLM calls).
- The generated file is read back by a separate pre-processor (not shown) that stores its content in `info["code_mix"]`.
Assumptions & Side‑effects
- `ignore_patterns` must be a list of glob strings; the default list (`ignore_list`, defined at the bottom of the module) filters out binaries, virtual-env folders, IDE caches, etc.
- The method opens the output file in write mode, overwriting any existing file.
- File reading uses `encoding="utf-8"` with `errors="ignore"` – non-UTF-8 files are silently stripped of undecodable bytes.
- The function may raise `OSError` if the output path is not writable.
Extensibility Tips
- Add a new generation step – subclass `BaseModule`, implement `generate(self, info, model)`, and pass an instance to `DocFactory` (see the sketch below).
- Custom ignore logic – override `should_ignore` in a subclass of `CodeMix` (e.g., to exclude large binary files by size).
- Parallel module execution – replace the simple `for` loop in `DocFactory.generate_doc` with `asyncio.gather` and use `AsyncModel` for true concurrency (requires a thread-safe progress implementation).
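As an illustration of the first tip, a hypothetical `ChangelogModule` (the module name and prompt wording here are invented, not part of the library):

```python
from autodocgenerator.factory.base_factory import BaseModule, DocFactory


class ChangelogModule(BaseModule):
    def generate(self, info, model) -> str:
        # Modules receive the shared info payload and the active LLM model.
        prompt = [
            {"role": "system", "content": "Summarise recent changes as markdown."},
            {"role": "user", "content": info["code_mix"]},
        ]
        return model.get_answer_without_history(prompt)


factory = DocFactory(ChangelogModule())  # drop-in alongside the built-in modules
```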
These components together form the factory layer of Auto‑Doc Generator: they turn raw repository data into structured prompts, invoke the LLM via the model layer, and stitch the pieces into a final documentation string.
autodocgenerator/preprocessor/compressor.py
Overview
This module implements the compression pipeline used by the Auto‑Doc Generator to shrink large code fragments (or any textual payload) before they are sent to the LLM.
It works on the pre‑processed data produced by earlier steps (e.g., code_mix or raw file contents) and returns a single, highly‑condensed string that still preserves the essential information required for documentation generation.
The pipeline can run synchronously or asynchronously, and it reports its progress through the shared BaseProgress UI component.
compress(data, project_settings, model, compress_power) → str
- Purpose – build a three-message prompt (system + system + user) and ask the LLM to compress `data`.
- Inputs:
  - `data` – raw text to be shortened.
  - `project_settings` – `ProjectSettings` instance providing the base system prompt (`project_settings.prompt`).
  - `model` – an object implementing the `Model` protocol (`get_answer_without_history`).
  - `compress_power` – integer controlling the aggressiveness of compression; passed to `get_BASE_COMPRESS_TEXT`.
- Output – the LLM's answer (a compressed version of `data`).
- Side-effects – none (pure function apart from the LLM call).
compress_and_compare(data, model, project_settings, compress_power=4, progress_bar=BaseProgress()) → list[str]
- Purpose – batch-compress a list of strings, then concatenate every `compress_power` results into a single chunk.
- Logic flow:
  1. Allocate a result list sized `ceil(len(data) / compress_power)`.
  2. Create a sub-task on `progress_bar` (total = `len(data)`).
  3. Iterate over `data`, compress each element with `compress`, and append the result to the appropriate chunk (`curr_index = i // compress_power`).
  4. Update the progress bar after each element.
  5. Remove the sub-task and return the list of concatenated chunks.
- Assumptions – `compress_power` ≥ 1; `progress_bar` implements `create_new_subtask`, `update_task`, `remove_subtask`.
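The grouping arithmetic is easy to check in isolation; a pure-Python illustration with short strings standing in for LLM answers:

```python
import math

data = ["a", "b", "c", "d", "e"]
compress_power = 2

# Pieces land in result[i // compress_power], so every compress_power
# consecutive answers are concatenated into one chunk.
result = ["" for _ in range(math.ceil(len(data) / compress_power))]
for i, piece in enumerate(data):
    result[i // compress_power] += piece

print(result)  # ['ab', 'cd', 'e']
```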
async_compress(data, project_settings, model, compress_power, semaphore, progress_bar) → str (coroutine)
- Mirrors `compress` but runs inside an `asyncio.Semaphore` to limit concurrent LLM calls.
- Calls `await model.get_answer_without_history(...)` and updates the progress bar once the answer is received.
async_compress_and_compare(data, model, project_settings, compress_power=4, progress_bar=BaseProgress()) → list[str] (coroutine)
- Purpose – parallel version of `compress_and_compare`.
- Steps:
  1. Initialise a semaphore (max 4 concurrent requests).
  2. Spawn a task for each element via `async_compress`.
  3. `await asyncio.gather(*tasks)` to collect all compressed pieces.
  4. Re-group the flat list into chunks of size `compress_power` (identical to the synchronous version).
- Side-effects – progress bar updates are performed inside each `async_compress` call.
compress_to_one(data, model, project_settings, compress_power=4, use_async=False, progress_bar=BaseProgress()) → str
- Purpose – repeatedly compress the list until only a single string remains (the final "code-mix" summary).
- Algorithm – while `len(data) > 1`:
  1. Adjust `compress_power` to `2` when the list is too short for the default chunk size.
  2. Call either `compress_and_compare` or `async_compress_and_compare`, based on `use_async`.
  3. Replace `data` with the newly produced list and increment an iteration counter.
- Result – the sole element `data[0]`, a fully compressed representation of the original input set. A sketch of this loop follows.
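A hedged sketch of the loop, assuming the signatures documented above; the condition for switching to a power of 2 is an assumption, and the real function also threads an iteration counter through for progress naming:

```python
import asyncio

from autodocgenerator.ui.progress_base import BaseProgress


def compress_to_one(data, model, project_settings, compress_power=4,
                    use_async=False, progress_bar=BaseProgress()):
    while len(data) > 1:
        power = compress_power
        if len(data) <= compress_power:
            power = 2  # list too short for the default chunk size (assumed rule)
        if use_async:
            data = asyncio.run(async_compress_and_compare(
                data, model, project_settings, power, progress_bar))
        else:
            data = compress_and_compare(
                data, model, project_settings, power, progress_bar)
    return data[0]
```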
generate_discribtions_for_code(data, model, project_settings, progress_bar=BaseProgress()) → list[str]
- Purpose – ask the LLM to produce developer-oriented documentation snippets for each code block in `data`.
- Prompt – a fixed system message describing the required output format (markdown, parameter tables, usage example) and a user message containing the raw code (`CONTEXT: {code}`).
- Flow:
  1. Create a progress sub-task (total = `len(data)`).
  2. For each `code` element, send the prompt via `model.get_answer_without_history`.
  3. Append the answer to `describtions` (name preserved from the source) and update the progress bar.
  4. Return the list of generated descriptions.
Interaction with the Rest of the System
- Model Layer – imports `Model`/`AsyncModel` from `engine.models.gpt_model`. All compression calls delegate the heavy lifting to the LLM via `get_answer_without_history`.
- Configuration – uses `get_BASE_COMPRESS_TEXT` (engine config) to inject a reusable system prompt fragment that encodes the desired compression ratio.
- UI – progress reporting is unified through `BaseProgress`, allowing the manager UI to display nested tasks (e.g., "Compare all files", "Generate describtions").
- Pre-processor Pipeline – the output of `compress_to_one` feeds `info["code_mix"]` (or similar), which later becomes part of the final prompt stack assembled by the `DocFactory` modules.
Key Assumptions & Side‑effects
- All text inputs are UTF-8 compatible; the LLM is expected to handle any encoding quirks.
- `compress_power` influences both the granularity of chunking and the aggressiveness of the compression prompt.
- Asynchronous functions assume the event loop is not already running; `compress_to_one` safely invokes `asyncio.run` when `use_async=True`.
- Errors from the LLM (network failures, rate limits) propagate as exceptions; callers (typically the manager) must handle them.
This module is the “size‑reduction” stage of the Auto‑Doc Generator, turning potentially huge repository dumps into a compact, LLM‑friendly representation before the final documentation generation steps.
autodocgenerator.preprocessor.postprocess – Post‑processing Helpers
Responsibility
This module prepares the raw markdown produced by the compression stage for the final documentation output.
It extracts section titles, builds markdown anchors, generates introductory texts for the whole document and for individual link groups, and creates custom descriptions on demand. All heavy‑lifting (LLM calls) is delegated to the Model abstraction from engine.models.
Key Functions
| Function | Purpose | Important I/O |
|---|---|---|
| `generate_markdown_anchor(header: str) -> str` | Normalises a heading into a GitHub-style markdown anchor (`#my-section`). | Input: raw heading text. Output: anchor string prefixed with `#`. |
| `get_all_topics(data: str) -> tuple[list[str], list[str]]` | Scans a markdown document for level-2 headings (`## …`) and returns both the titles and their generated anchors. | Input: full markdown text. Output: `(titles, anchors)`. |
| `get_all_html_links(data: str) -> list[str]` | Extracts the names of existing HTML `<a name="…">` anchors (used by the generator to keep track of previously created links). | Input: markdown/HTML text. Output: list of anchor names. |
| `get_links_intro(links: list[str], model: Model, language: str = "en") -> str` | Sends the list of link anchors to the LLM and asks it to produce a short introductory paragraph placed before the Links section. | Input: list of anchor strings, LLM model, language code. Output: generated paragraph. |
| `get_introdaction(global_data: str, model: Model, language: str = "en") -> str` | Generates a high-level introduction for the whole documentation set, based on the compressed "code-mix" text. | Input: concatenated compressed data, LLM model, language code. Output: introduction markdown. |
| `generete_custom_discription(splited_data: str, model: Model, custom_description: str, language: str = "en") -> str` | Iterates over pre-split chunks of text, asking the LLM to answer a custom query (e.g., "Describe the authentication flow"). Stops at the first non-empty answer that does not contain the sentinel `!noinfo`. | Input: iterable of text chunks, LLM model, user-provided query, language. Output: the first satisfactory description or an empty string. |
Logic Flow Highlights
1. Anchor Generation – `generate_markdown_anchor` normalises Unicode, replaces spaces with hyphens, strips illegal characters, collapses repeated hyphens, and finally prefixes `#` (see the sketch below).
2. Topic Extraction – `get_all_topics` walks the markdown string searching for `\n##` markers, slices out the heading text, and builds a parallel list of anchors via the helper above.
3. LLM Interaction – both `get_links_intro` and `get_introdaction` construct a system-user prompt array and call `model.get_answer_without_history`. The system messages embed static prompts (`BASE_INTRODACTION_CREATE_TEXT`, `BASE_INTRO_CREATE`) from the central configuration, ensuring consistent wording across the pipeline.
4. Custom Description Loop – `generete_custom_discription` enforces strict response rules (no hallucination, empty output on missing info). It repeats the request for each chunk until a meaningful answer appears, using the sentinel `!noinfo` to detect "no data".
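A plausible re-implementation of the anchor helper following step 1; the function in `postprocess.py` may differ in details:

```python
import re
import unicodedata


def generate_markdown_anchor(header: str) -> str:
    text = unicodedata.normalize("NFKD", header).strip().lower()
    text = re.sub(r"\s+", "-", text)     # spaces -> hyphens
    text = re.sub(r"[^\w\-]", "", text)  # strip illegal characters
    text = re.sub(r"-{2,}", "-", text)   # collapse repeated hyphens
    return "#" + text


print(generate_markdown_anchor("My Section"))  # "#my-section"
```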
Assumptions & Side‑effects
- Input markdown follows the conventional `##` heading style; otherwise topics will be missed.
- The LLM model supplied implements `get_answer_without_history(prompt: list[dict]) -> str` and may raise network-related exceptions – callers must handle them.
- All functions are pure except for the LLM calls, which have external side-effects (API usage, rate limits).
- The module does not modify the original `data` strings; it only returns derived values.
Interaction with the Rest of the System
- Compression Stage – the output of `compress_to_one` (a single large string) is passed to `get_introdaction` to obtain a human-readable preface.
- DocFactory / UI – the `(titles, anchors)` tuples from `get_all_topics` feed the table-of-contents builder; the introductory paragraphs are concatenated with the generated code-block descriptions to form the final markdown document shown in the UI.
- Configuration Layer – static prompt fragments (`BASE_INTRODACTION_CREATE_TEXT`, `BASE_INTRO_CREATE`) live in `engine.config.config`; any change there instantly propagates to this module.
autodocgenerator.preprocessor.settings – Project‑wide Configuration Wrapper
Responsibility
Encapsulates per‑project metadata (name, arbitrary key/value pairs) and produces a ready‑to‑inject prompt segment (ProjectSettings.prompt) that is later concatenated with other system prompts (e.g., compression, introduction).
Key Class
```python
class ProjectSettings:
    def __init__(self, project_name: str): ...

    def add_info(self, key, value): ...  # store additional metadata

    @property
    def prompt(self) -> str: ...         # render the full settings block
```
- Construction – `project_name` is mandatory; additional data can be added at any time via `add_info`.
- Prompt Generation – the `prompt` property concatenates the global `BASE_SETTINGS_PROMPT` (from `engine.config.config`) with a line `Project Name: …` and then each `key: value` pair on its own line. The result is a plain-text block that can be inserted into any LLM prompt to give the model context about the target project. A usage example follows.
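Example usage (the exact rendering depends on the `BASE_SETTINGS_PROMPT` text in `engine.config.config`):

```python
from autodocgenerator.preprocessor.settings import ProjectSettings

settings = ProjectSettings("MyProject")
settings.add_info("framework", "Django")
settings.add_info("coding standard", "PEP 8")

# Plain-text block: BASE_SETTINGS_PROMPT, then "Project Name: MyProject",
# then one "key: value" line per add_info call.
print(settings.prompt)
```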
Assumptions & Side‑effects
- The caller is responsible for calling `add_info` before accessing `prompt`; otherwise only the project name appears.
- No external I/O occurs; the class is purely in-memory.
System Interaction
- Model Layer – when building prompts for compression or description generation, the `ProjectSettings.prompt` string is appended to the system messages, ensuring the LLM is aware of project-specific constraints (e.g., target framework, coding standards).
- Configuration Centralisation – by pulling `BASE_SETTINGS_PROMPT` from the shared config, the module guarantees that any organisational policy changes (license headers, confidentiality notices) are automatically reflected across all generated documentation.
Together, postprocess.py and settings.py form the post‑compression phase of the Auto‑Doc Generator: they turn the compact “code‑mix” into a structured, navigable markdown document enriched with project‑specific context.
autodocgenerator.preprocessor.spliter – Chunking & LLM‑driven Documentation Generation
Purpose
This module bridges the compression stage (a single large “code‑mix” string) and the post‑processing stage that produces the final markdown document. It:
- Splits the massive mixed‑code payload into size‑limited chunks that respect the LLM token limits.
- Invokes the configured language model (sync or async) for each chunk, feeding the previous chunk’s output as context so the generated documentation remains coherent across parts.
- Aggregates the per‑chunk answers into one continuous markdown string while reporting progress to the UI.
split_data(data: str, max_symbols: int) -> list[str]
| Parameter | Meaning |
|---|---|
| `data` | The full compressed code-mix (plain text). |
| `max_symbols` | Approximate maximum character count that a single LLM request may contain (derived from the model's token budget). |
Logic flow
1. Initial line split – `data.split("\n")` creates a list of logical lines (`splited_by_files`).
2. Oversize line handling – a loop repeatedly checks each line; if a line exceeds `1.5 × max_symbols` it is broken in half (using `int(max_symbols / 2)`) and the two halves are re-inserted. This guarantees no individual element is dramatically larger than the budget.
3. Chunk assembly – a second pass walks the (now sanitised) line list, concatenating lines into `split_objects`. A new chunk starts when the current chunk would exceed `1.25 × max_symbols`. Newlines are preserved.
Output – a list of strings, each kept to roughly `max_symbols` characters or fewer, ready for a single LLM call.
Assumptions & side‑effects
- Input is plain text; no binary data is expected.
- The function never performs I/O; it works purely in memory.
- It assumes `"\n"` is the line delimiter used throughout the pipeline.
write_docs_by_parts(part: str, model: Model, global_info: str, prev_info: str = None, language: str = "en") -> str
Responsibility
Builds a prompt for the part‑completion LLM and returns the model’s raw answer stripped of surrounding markdown fences.
Prompt composition
| Message role | Content |
|---|---|
system |
“For the following task use language {language}”. |
system |
BASE_PART_COMPLITE_TEXT (static instruction fragment from engine.config). |
user |
The current code chunk (part). |
(optional) system |
“it is last part of documentation that you have write before{prev_info}” – provides continuity when prev_info contains the previous chunk’s output. |
user |
The same part again (keeps the user‑side payload at the end of the list). |
The model is called via model.get_answer_without_history(prompt=prompt).
If the answer is wrapped in triple back‑ticks, they are removed; otherwise the raw answer is returned.
Inputs / Outputs
- `part` – a single chunk from `split_data`.
- `model` – any concrete implementation of `engine.models.gpt_model.Model` (sync).
- `global_info` – currently unused (commented out) but reserved for future global context.
- `prev_info` – the tail of the previous answer (up to ~3000 chars) to keep the narrative consistent.
- Returns a markdown-ready string (code fences stripped).

Side-effects – none; the function only builds data structures and calls the LLM.
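A hedged sketch of the prompt assembly from the table above; the message wording is taken from this document, and `BASE_PART_COMPLITE_TEXT` is assumed importable from `engine.config.config`:

```python
from autodocgenerator.engine.config.config import BASE_PART_COMPLITE_TEXT


def build_part_prompt(part: str, prev_info: str | None, language: str = "en") -> list[dict]:
    prompt = [
        {"role": "system", "content": f"For the following task use language {language}"},
        {"role": "system", "content": BASE_PART_COMPLITE_TEXT},
        {"role": "user", "content": part},
    ]
    if prev_info:
        # Continuity: remind the model of the tail of the previous answer.
        prompt.append({
            "role": "system",
            "content": f"it is last part of documentation that you have write before{prev_info}",
        })
    # The chunk is repeated so the user-side payload stays at the end.
    prompt.append({"role": "user", "content": part})
    return prompt
```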
async_write_docs_by_parts(...) -> Awaitable[str]
Same semantics as `write_docs_by_parts` but:

- Accepts an `AsyncModel` instance and runs `await async_model.get_answer_without_history`.
- Executes inside an `asyncio.Semaphore` supplied by the caller, limiting concurrent LLM requests (default 4 in `async_gen_doc_parts`).
- Optionally calls `update_progress()` after the model response to drive UI progress bars.
All other behaviours (prompt layout, fence stripping) are identical.
gen_doc_parts(full_code_mix, global_info, max_symbols, model, language, progress_bar) -> str
Workflow
1. `split_data` → list of chunks.
2. `progress_bar.create_new_subtask` registers a sub-task whose length equals the number of chunks.
3. Iterates over chunks:
   - Calls `write_docs_by_parts` with the current chunk, the model, and the previous chunk's tail (`result`).
   - Appends the returned markdown to `all_result`.
   - Truncates `result` to its last 3000 characters (kept for continuity).
   - Updates the UI progress bar.
4. Removes the sub-task and returns the concatenated documentation.
Assumptions
- `progress_bar` implements the `BaseProgress` interface (create/update/remove sub-tasks).
- The model respects the token budget implied by `max_symbols`.
async_gen_doc_parts(...) -> Awaitable[str]
Parallel version of gen_doc_parts:
1. Splits the input once.
2. Creates a semaphore (max 4 concurrent calls).
3. Launches an `async_write_docs_by_parts` task for each chunk, passing a lambda that updates the progress bar.
4. Awaits `asyncio.gather` to collect all answers, concatenates them with double newlines, and cleans up the progress UI.
Interaction with the Rest of the System
- Compression Stage – receives the output of `compress_to_one` (a single large string).
- DocFactory / UI – the returned markdown is fed to the final document assembler, which adds the table of contents (from `get_all_topics`) and introductory sections.
- Configuration Layer – prompt fragments (`BASE_PART_COMPLITE_TEXT`) are centrally defined; any change propagates automatically.
- Model Layer – both sync and async model classes live in `engine.models.gpt_model`; this module treats them uniformly via the `Model`/`AsyncModel` abstractions.
Key Takeaways for New Developers
- The module’s only external side‑effects are LLM API calls and UI progress updates.
- All chunk-splitting logic is deterministic and pure; you can safely unit-test `split_data` with various `max_symbols` values.
- When extending the pipeline (e.g., adding a new system prompt), modify `BASE_PART_COMPLITE_TEXT` or adjust the prompt construction in the two "write" functions.
- For higher throughput, tune the semaphore limit in `async_gen_doc_parts` according to your LLM provider's rate limits.
Module: autodocgenerator.ui.progress_base
(UI‑level helpers that expose a tiny, test‑friendly progress‑tracking API for the rest of the documentation‑generation pipeline.)
Overview
This file defines a very small abstraction layer over Rich’s Progress object.
The rest of the system (e.g. the doc‑generation workers in autodocgenerator.core) never talks to Rich directly – they depend only on the BaseProgress protocol.
LibProgress is the concrete implementation used by the CLI, while the abstract base makes it trivial to swap in a mock progress reporter for unit‑tests.
BaseProgress (abstract protocol)
| Method | Purpose | Expected behaviour |
|---|---|---|
| `create_new_subtask(name: str, total_len: int)` | Starts a sub-task that represents the processing of a single chunk of code (e.g., one call to the LLM). | Returns nothing; the concrete class should store an identifier for later updates. |
| `update_task()` | Advances the currently active task by one step. | If a sub-task is active it is advanced; otherwise the global "General progress" task is advanced. |
| `remove_subtask()` | Marks the current sub-task as finished and discards its handle. | After this call `update_task()` will affect the base task again. |
BaseProgress contains only the method signatures (implemented as ...). It is deliberately lightweight – no state, no Rich dependency – so that test doubles can inherit from it and override the methods.
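Spelled out as code, the protocol is just three stubs (signatures from the table above):

```python
class BaseProgress:
    def create_new_subtask(self, name: str, total_len: int): ...

    def update_task(self): ...

    def remove_subtask(self): ...
```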
LibProgress (Rich‑backed implementation)
```python
class LibProgress(BaseProgress):
    def __init__(self, progress: Progress, total: int = 4):
        ...
```
Constructor
- `progress` – an already-configured `rich.progress.Progress` instance (usually created in the CLI entry point).
- `total` – the expected number of top-level steps (default 4).
- Creates a base task named "General progress" with the supplied total.
- Initialises `_cur_sub_task` to `None`; this attribute holds the Rich task ID of the active sub-task.
create_new_subtask(name, total_len)
- Calls `self.progress.add_task(name, total=total_len)` and stores the returned task ID in `_cur_sub_task`.
- The `total_len` argument is the number of incremental updates the sub-task will receive (e.g., the number of code chunks).
update_task()
- If a sub-task is active (`_cur_sub_task` is not `None`), it advances that task by one unit.
- Otherwise it advances the base task.
- This design lets the higher-level generator code treat both granular (per-chunk) and overall progress uniformly.
remove_subtask()
- Clears the reference to the current sub-task, effectively signalling its completion.
- No explicit call to `Progress.remove_task` is made – Rich automatically hides finished tasks; the UI simply stops updating the sub-task.
Side‑effects
- UI updates – each call to `update_task` triggers a redraw of the Rich progress bar.
- State mutation – internal task IDs are stored/cleared; no external data is modified.
Interaction with the Rest of the System
- Doc-generation workers (`gen_doc_parts`, `async_gen_doc_parts`, etc.) receive a `BaseProgress` instance via dependency injection.
- Before processing a batch of code chunks, they call `create_new_subtask` with a descriptive name (e.g., "Generating docs for module X") and the number of chunks.
- After each LLM request they invoke `update_task()` – this drives the progress bar shown to the user.
- When the batch finishes they call `remove_subtask()` so that subsequent batches reuse the base task.
Because the workers only depend on the abstract protocol, they can be exercised in tests with a dummy progress that simply records calls, keeping the test suite fast and deterministic.
Extending / Customising
- Alternative UI back-ends – implement a new subclass of `BaseProgress` that forwards calls to `tqdm`, a web-socket UI, or a logger (a sketch follows this list).
- More detailed metrics – add extra methods (e.g., `set_description`) to the abstract class and implement them in `LibProgress` using `Progress.update(task_id, description=…)`.
- Rate-limit handling – the progress layer is deliberately stateless; any throttling logic belongs in the model-calling code, not here.
Testing Tips
```python
class DummyProgress(BaseProgress):
    def __init__(self):
        self.calls = []

    def create_new_subtask(self, name, total_len):
        self.calls.append(("create", name, total_len))

    def update_task(self):
        self.calls.append(("update",))

    def remove_subtask(self):
        self.calls.append(("remove",))
```
Inject DummyProgress into gen_doc_parts and assert the expected sequence of calls – this validates that the generation pipeline correctly reports progress without needing a terminal.
Key Takeaway for New Developers
progress_base.py isolates UI concerns from the core documentation engine. By coding against BaseProgress you keep the generation logic pure, enable fast unit tests, and retain the flexibility to swap the visual progress implementation at runtime.