This project helps you create documentation for your projects.
Project description
Executive Navigation Tree
- 📂 Configuration
- ⚙️ Execution Engine
- 🔧 Processing Pipeline
- 🛠️ Setup & Install
Project Overview – Auto Doc Generator
1. Project Title
Auto Doc Generator
2. Project Goal
Auto Doc Generator is a developer‑focused utility that automates the creation of high‑quality project documentation. By extracting structural information, comments, and docstrings from source code, the tool produces ready‑to‑publish Markdown (or other markup) files, relieving developers of the repetitive, error‑prone manual documentation process. The primary problem it solves is the gap between fast‑paced code development and the often‑neglected upkeep of accurate, comprehensive documentation.
3. Core Logic & Principles
| Aspect | Description |
|---|---|
| Source Inspection | The generator parses the target project’s source files (Python, JavaScript, etc.) using language‑specific Abstract Syntax Tree (AST) libraries. It walks the AST to locate modules, classes, functions, and their associated docstrings/comments. |
| Metadata Extraction | For each discovered element it extracts: • Name and signature • Docstring (or inline comment block) • Type hints / annotations (when available) • Public API surface (filtering out private/dunder members). |
| Template Rendering | Extracted metadata is fed into a lightweight templating engine (e.g., Jinja2). Pre‑defined Markdown templates define the layout for module overviews, class sections, function tables, and usage examples. |
| Configuration‑Driven | Users supply a concise YAML/JSON configuration that controls: • Input directories / file patterns • Output location and file naming • Template selection and custom variables • Inclusion/exclusion rules (e.g., ignore test files). |
| CLI Interface | A small command‑line wrapper parses the configuration, invokes the extraction pipeline, and writes the rendered documentation to disk. The CLI also provides flags for “dry‑run”, verbose logging, and incremental updates. |
| Extensibility | The architecture isolates three interchangeable components: Parser, Renderer, and Dispatcher. Adding support for a new programming language or output format only requires implementing the relevant parser or template set, without touching the core workflow. |
Key algorithms include recursive AST traversal, pattern‑based file discovery, and context‑aware string sanitisation to ensure that generated markdown is syntactically correct.
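As a rough illustration of the recursive AST traversal described above, the sketch below uses Python's standard `ast` module to collect public functions and classes with their docstrings. It is a minimal, self-contained example, not the project's actual parser; the helper name `extract_api` and the filtering rule are illustrative assumptions.

```python
import ast

def extract_api(source: str) -> list:
    """Walk a module's AST and collect public classes/functions with docstrings."""
    tree = ast.parse(source)
    items = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name.startswith("_"):      # filter private/dunder members
                continue
            items.append({
                "name": node.name,
                "kind": type(node).__name__,
                "doc": ast.get_docstring(node) or "",
            })
    return items

sample = '''
def public_fn(x: int) -> int:
    """Add one to x."""
    return x + 1

def _private_fn():
    pass
'''
api = extract_api(sample)   # only public_fn survives the filter
```

A real parser would also record signatures and type annotations, but the traversal skeleton is the same.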
4. Key Features
- Automatic API documentation – Generates module, class, and function reference sections directly from source code.
- Multi‑language support – Built‑in parsers for Python (via `ast`) and JavaScript/TypeScript (via `esprima` / `@babel/parser`).
- Customizable Markdown templates – Ships with default templates; users can supply Jinja2 templates to match their style guide.
- Config‑driven operation – Single YAML/JSON file defines inputs, outputs, filters, and template variables.
- Command‑line tool – Easy integration into development workflows (`autodoc generate --config path/to/config.yaml`).
- Incremental generation – Detects unchanged files and skips re‑rendering, speeding up CI runs.
- CI/CD ready – Designed to be called from GitHub Actions, GitLab CI, Jenkins, etc. (returns non‑zero exit code on failures).
- Extensible plugin architecture – Add new language parsers or output formats by implementing the `Parser` or `Renderer` interface.
5. Dependencies
| Dependency | Purpose | Version (minimum) |
|---|---|---|
| Python >= 3.8 | Runtime environment | – |
| Jinja2 | Template rendering engine | 3.0 |
| PyYAML | YAML configuration parsing | 5.4 |
| `ast` (standard library) | Python source parsing | – |
| `esprima` / `@babel/parser` (optional) | JavaScript/TypeScript parsing | 4.0 / 7.0 |
| click | CLI argument handling | 8.0 |
| watchdog (optional) | File‑system monitoring for incremental builds | 2.1 |
Optional – If the project is used only for Python, the JavaScript parser and watchdog are not required.
End of Overview
This document captures the essential purpose, inner workings, primary capabilities, and required tools for the Auto Doc Generator project, providing a solid foundation for onboarding, stakeholder communication, and further development planning.
To set up the documentation generation workflow you need to:
1. Install the helper scripts
   - Windows (PowerShell): run the script directly from the raw file URL:
     `irm https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.ps1 | iex`
   - Linux/macOS (bash): download and execute the script with curl:
     `curl -sSL https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.sh | bash`
2. Add the required secret to GitHub Actions
   - In your repository’s Settings → Secrets and variables → Actions, create a new secret named `GROCK_API_KEY`.
   - Set its value to the API key you obtain from the Grock documentation site: https://grockdocs.com.
3. Workflow execution
   - The `.github/workflows/autodoc.yml` workflow triggers manually (`workflow_dispatch`).
   - It calls the reusable workflow defined in `.github/workflows/reuseble_agd.yml`, which installs the `autodocgenerator` package, runs the documentation generator, and commits the updated `README.md`.
   - The secret `GROCK_API_KEY` is passed to the reusable workflow and made available as `API_KEY` for the documentation generation step.
| Parameter | Description (as shown in the code) |
|---|---|
| `project_directory: str` | Path to the root of the project you want to document. |
| `project_settings: ProjectSettings` | Instance that holds global information about the documentation project. |
| `sync_model: Model = None` | Synchronous language model used for processing (e.g., `GPTModel`). |
| `async_model: AsyncModel = None` | Asynchronous language model used for processing (e.g., `AsyncGPTModel`). |
| `ignore_files: list = []` | List of glob‑style patterns for files/folders that must be ignored when building the repository mix. |
| `language: str = "en"` | Language code for the generated documentation (default `"en"`). |
| `progress_bar: BaseProgress = BaseProgress()` | Progress‑bar implementation; in the example a `LibProgress` wrapping a Rich `Progress` instance is used. |
Full example (taken directly from the repository)
```python
if __name__ == "__main__":
    # Patterns of files/folders that should be ignored
    ignore_list = [
        "*.pyo", "*.pyd", "*.pdb", "*.pkl", "*.log", "*.sqlite3", "*.db", "data",
        "venv", "env", ".venv", ".env", ".vscode", ".idea", "*.iml", ".gitignore",
        ".ruff_cache", ".auto_doc_cache", "*.pyc", "__pycache__", ".git",
        ".coverage", "htmlcov", "migrations", "*.md", "static", "staticfiles",
        ".mypy_cache"
    ]

    # Initialise language models
    sync_model = GPTModel(API_KEY)        # synchronous model
    async_model = AsyncGPTModel(API_KEY)  # asynchronous model

    # Rich progress bar configuration
    with Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        BarColumn(),
        TaskProgressColumn(),
    ) as progress:
        # Project‑level settings
        project_settings = ProjectSettings("Auto Doc Generator")
        project_settings.add_info(
            "global idea",
            """This project was created to help developers make documentations for them projects"""
        )

        # Create the Manager instance
        manager = Manager(
            r"C:\Users\sinic\OneDrive\Документы\GitHub\ADG",  # project_directory
            project_settings,                    # ProjectSettings instance
            sync_model=sync_model,               # synchronous model
            async_model=async_model,             # asynchronous model
            ignore_files=ignore_list,            # ignore patterns
            progress_bar=LibProgress(progress),  # progress bar wrapper
            language="en"                        # documentation language
        )

        # Example usage of the manager (calls are commented out in the source)
        # manager.generate_code_file()
        # manager.generate_global_info_file(use_async=True, max_symbols=5000)
        # manager.generete_doc_parts(use_async=True, max_symbols=4000)

        # Generate documentation using a factory with the IntroLinks module
        manager.factory_generate_doc(
            DocFactory(
                IntroLinks(),
                # IntroText(),
            )
        )
```
autodocconfig.yml – available options
The file is a YAML document. The following top‑level keys are read by the generator (see autodocgenerator/auto_runner/config_reader.py):
| Key | Type | Description | Example |
|---|---|---|---|
| `project_name` | string | Name of the project that will appear in the generated documentation. | `project_name: "My Awesome Project"` |
| `language` | string | Language code for the documentation (default `en`). | `language: "en"` |
| `project_additional_info` | mapping (key‑value pairs) | Arbitrary additional information about the project. Each entry is added to `ProjectSettings`. | `project_additional_info:`<br>`  global idea: "Tool to auto‑generate docs"` |
| `ignore_files` | list of strings | Glob patterns for files/directories that must be ignored during analysis. If omitted, the default list defined in `Config.__init__` is used. | `ignore_files: ["*.log", "tests/"]` |
| `custom_descriptions` | list of strings | Free‑form text blocks that become custom modules (`CustomModule`) in the documentation generation pipeline. Each string is passed to `CustomModule`. | `custom_descriptions:`<br>`  - "Explain installation workflow …"` |
How to write the file
```yaml
project_name: "Auto Doc Generator"
language: "en"

project_additional_info:
  global idea: "This project was created to help developers make documentations for them projects"

custom_descriptions:
  - "explain how install workflow with install.ps1 and install.sh scripts for install you should use links irm https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.ps1 | iex for powershell and curl -sSL https://raw.githubusercontent.com/Drag-GameStudio/ADG/main/install.sh | bash for linux based systems and also you have to add secret variable to git hub action GROCK_API_KEY with your api key from grock docs https://grockdocs.com to make it work"
  - "how to use Manager class what parameters i need to give. give full example of usage"
  - "explain how to write autodocconfig.yml file what options are available"
```
Only the keys listed above are recognized; any other keys are ignored. The file must be saved as autodocconfig.yml in the repository root.
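The key-filtering behaviour can be sketched without the real reader. The snippet below assumes the YAML has already been parsed into a plain dict (the shape `yaml.safe_load` would produce); the helper name `read_config_dict` and the default values are illustrative assumptions, not the actual `Config` API.

```python
# Shape that yaml.safe_load would produce for an autodocconfig.yml file
parsed = {
    "project_name": "Auto Doc Generator",
    "language": "en",
    "project_additional_info": {"global idea": "Tool to auto-generate docs"},
    "ignore_files": ["*.log", "tests/"],
    "custom_descriptions": ["Explain installation workflow"],
    "unknown_key": "silently ignored",  # keys outside the recognized set are dropped
}

RECOGNIZED = {"project_name", "language", "project_additional_info",
              "ignore_files", "custom_descriptions"}

def read_config_dict(data: dict) -> dict:
    """Keep only recognized keys; missing keys fall back to defaults."""
    cfg = {"project_name": "", "language": "en",
           "project_additional_info": {}, "ignore_files": [],
           "custom_descriptions": []}
    for key in RECOGNIZED & data.keys():
        cfg[key] = data[key]
    return cfg

cfg = read_config_dict(parsed)   # "unknown_key" never reaches the result
```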
autodocgenerator.auto_runner.config_reader
Purpose – Reads autodocconfig.yml and builds the runtime configuration used by the documentation generator.
Core class – Config
| Attribute | Meaning |
|---|---|
| `ignore_files` | Default glob patterns that are excluded from the source‑code scan (e.g. `*.pyc`, `venv`, `.git`). |
| `language` | Target language for generated docs (default `en`). |
| `project_name` | Human‑readable name of the inspected project. |
| `project_additional_info` | Arbitrary key/value pairs that are injected into `ProjectSettings`. |
| `custom_modules` | List of `CustomModule` objects – each wraps a custom description supplied by the user. |
Builder‑style setters
All setters (set_language, set_project_name, add_project_additional_info, add_ignore_file, add_custom_module) return self, enabling fluent chaining in read_config.
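The return-self pattern can be shown with a stripped-down stand-in (this is a sketch, not the real `Config` class; attribute defaults are assumptions):

```python
class ConfigSketch:
    """Minimal illustration of builder-style setters that return self."""
    def __init__(self):
        self.language = "en"
        self.project_name = ""
        self.ignore_files = []

    def set_language(self, lang):
        self.language = lang
        return self  # returning self is what enables fluent chaining

    def set_project_name(self, name):
        self.project_name = name
        return self

    def add_ignore_file(self, pattern):
        self.ignore_files.append(pattern)
        return self

# Fluent chaining, as read_config does with the real Config
cfg = (ConfigSketch()
       .set_project_name("Demo")
       .set_language("de")
       .add_ignore_file("*.pyc"))
```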
Helper methods
- `get_project_settings()` – creates a `ProjectSettings` instance (from `preprocessor.settings`) and fills it with the additional info dictionary.
- `get_doc_factory()` – builds two `DocFactory` objects:
  - `docFactory` – contains the user‑defined `CustomModule`s.
  - `introFactory` – pre‑populated with the built‑in intro modules (`IntroLinks`, `IntroText`).

Both factories are later passed to the Manager to render the final document.

Function – `read_config(file_data: str) -> Config`

- Parses the YAML string with `yaml.safe_load`.
- Populates a fresh `Config` instance:
  - merges `ignore_files` from the file,
  - sets `language` & `project_name`,
  - adds any `project_additional_info` entries,
  - converts each entry in `custom_descriptions` into a `CustomModule`.
- Returns the fully‑configured `Config` object.
Assumptions & side‑effects – The YAML file is well‑formed; missing keys fall back to defaults. No I/O is performed here – the caller supplies the file contents.
autodocgenerator.auto_runner.run_file
Purpose – Entry point executed by the GitHub Action (and locally) that orchestrates the whole documentation generation pipeline.
gen_doc function
```python
def gen_doc(project_settings, ignore_list, project_path,
            doc_factory, intro_factory) -> str:
```

- Model creation – Instantiates a synchronous `GPTModel` and an asynchronous `AsyncGPTModel` using the global `API_KEY` from `engine.config.config`.
- Manager construction – Builds a `Manager` with:
  - `project_path` – root directory to scan,
  - `project_settings` – from `Config.get_project_settings()`,
  - the two GPT model objects,
  - the `ignore_files` list,
  - a `ConsoleGtiHubProgress` progress‑bar implementation,
  - the language (`en`).
- Generation steps (executed sequentially):
  - `generate_code_file()` – extracts source code snippets.
  - `generate_global_info_file()` – produces the high‑level project description (sync, ≤ 8000 symbols).
  - `generete_doc_parts()` – creates the main documentation sections (sync, ≤ 8000 symbols).
  - `factory_generate_doc(doc_factory)` – renders custom user‑defined sections.
  - `factory_generate_doc(intro_factory)` – renders the intro (links & text).
- Returns the final markdown content by reading the cached file with key `"output_doc"`.
Script execution (`if __name__ == "__main__":`)

- Loads `autodocconfig.yml`.
- Calls `read_config` → obtains a `Config`.
- Extracts `project_settings` and the two factories.
- Invokes `gen_doc` with the current directory (`"."`).
- The resulting markdown is written to `.auto_doc_cache/output_doc.md` by the `Manager`; the GitHub Action later copies it to `README.md`.
Interaction map
- ConfigReader → supplies `ProjectSettings`, the ignore list, and the factories.
- Manager (in `autodocgenerator.manage`) → core orchestrator that talks to the GPT models, the pre‑processor (code splitting, compression), and the UI progress bar.
- DocFactory → aggregates modules (`CustomModule`, `IntroLinks`, `IntroText`) that each know how to render a specific markdown chunk.
Important assumptions
- The `API_KEY` environment variable is set (via the workflow secret `GROCK_API_KEY`).
- The project path is readable and contains Python source files.
- The cache directory `.auto_doc_cache` exists or will be created by `Manager`.
Quick start for developers
```shell
# Install dependencies
pip install autodocgenerator

# Generate docs (uses autodocconfig.yml in CWD)
python -m autodocgenerator.auto_runner.run_file
```
The script will produce README.md when run inside the reusable GitHub Action, or you can read the returned string for custom handling.
Configuration constants
The module autodocgenerator.engine.config.config centralises static prompts and runtime settings used by the AutoDoc pipeline.
| Symbol | Purpose | Typical usage |
|---|---|---|
| `BASE_SYSTEM_TEXT` | System‑level instruction fed to the LLM at the very start of a session. It forces the model to treat each incoming snippet as a partial view of the whole codebase and to keep refining its analysis. | Passed to the `GPTModel` as the first message in the conversation history. |
| `BASE_PART_COMPLITE_TEXT` | Prompt that tells the model to produce a concise (≈ 0.5–1 k characters) documentation fragment for a given code piece. | Used by the `DocFactory` when rendering custom sections. |
| `BASE_INTRODACTION_CREATE_TEXT` | Detailed directive for building an “Executive Navigation Tree”. It encodes strict anchoring rules and hierarchy constraints that the LLM must obey when transforming a list of markdown links. | Consumed by the `IntroFactory` to generate the navigation tree. |
| `BASE_INTRO_CREATE` | High‑level briefing that asks the model to write a professional project overview (title, goal, core logic, features, dependencies). | Invoked when the initial project summary is required. |
| `BASE_SETTINGS_PROMPT` | Prompt that establishes a persistent “project knowledge base” – the model memorises key‑value parameters and re‑applies them on later calls. | Employed by the `SettingsReader` to seed the context. |
| `get_BASE_COMPRESS_TEXT(start, power)` | Factory function returning a prompt for summarising large code snippets. It dynamically adjusts the allowed length (~`start/power` characters) and forces a strict usage‑example block. | Called by the compression step before sending oversized code to the LLM. |
| `API_KEY` | Loaded from the environment (`.env`) at import time. It is the authentication token required by the Groq API. | Shared by all model wrappers. |
| `MODELS_NAME` | Ordered list of fallback model identifiers. The system will rotate through this list if a request fails for the current model. | Referenced by `Model`/`AsyncModel` during request retries. |
Assumptions & side‑effects – The environment variable API_KEY must be present; otherwise API_KEY becomes None and API calls will fail. No file I/O occurs here; all values are in‑memory constants.
GPTModel / AsyncGPTModel – Groq model adapters
These two classes are thin wrappers around the Groq SDK that implement the abstract base classes Model (sync) and AsyncModel (async) defined in engine.models.model. Their responsibility is to provide a uniform generate_answer interface used throughout the documentation pipeline.
Common behaviour
1. Construction – Accepts an `api_key` (defaults to the module‑level `API_KEY`) and an optional `History` object that stores the conversation turn list. The base class initialises:
   - `self.api_key` – stored for later client creation.
   - `self.history` – holds prior messages, enabling with‑history mode.
   - `self.regen_models_name` – a copy of `MODELS_NAME`.
   - `self.current_model_index` – index of the model currently being used.
2. Message selection –
   - If `with_history=True`, the entire `self.history.history` list is sent.
   - Otherwise the caller‑provided `prompt` string is used directly.
3. Retry loop – A `while True` loop attempts a completion with the model at `self.regen_models_name[self.current_model_index]`. On any exception the index is advanced (wrapping to `0` when the end is reached) and the loop retries. If all models fail, an exception `"all models do not work"` is raised.
4. Result extraction – Upon success, the method returns `chat_completion.choices[0].message.content`, i.e. the raw LLM text.
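The retry/fallback behaviour can be sketched in isolation. The snippet below is an illustrative stand-in: the model names, the `request` callable, and the one-full-round give-up condition are assumptions, not the wrapper's exact code.

```python
def generate_with_fallback(models, request, max_rounds=1):
    """Try each model name in turn; advance (wrapping to 0) on failure."""
    index = 0
    attempts = 0
    while True:
        name = models[index]
        try:
            return request(name)
        except Exception:
            index = (index + 1) % len(models)  # wrap around to the first model
            attempts += 1
            if attempts >= len(models) * max_rounds:
                raise RuntimeError("all models do not work")

# Simulated transport: only the last model succeeds
calls = []
def flaky(name):
    calls.append(name)
    if name != "model-c":
        raise ConnectionError(name)
    return "ok"

result = generate_with_fallback(["model-a", "model-b", "model-c"], flaky)
```

The real wrappers keep `current_model_index` on the instance, so a model that worked once stays preferred on subsequent calls.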
GPTModel (synchronous)
```python
class GPTModel(Model):
    def __init__(self, api_key=API_KEY, history=History()):
        super().__init__(api_key, history)
        self.client = Groq(api_key=self.api_key)
```
Uses the blocking Groq client. The generate_answer method follows the common flow described above, calling self.client.chat.completions.create(...).
AsyncGPTModel (asynchronous)
```python
class AsyncGPTModel(AsyncModel):
    def __init__(self, api_key=API_KEY, history=History()):
        super().__init__(api_key, history)
        self.client = AsyncGroq(api_key=self.api_key)
```
Uses the async AsyncGroq client; its generate_answer is declared async and therefore awaited by callers (e.g., the Manager when it processes large sections in parallel).
Interaction map
- Manager – Instantiates one `GPTModel` and one `AsyncGPTModel` and calls `generate_answer` to obtain:
  - the global project description (sync, ≤ 8000 symbols),
  - section‑level documentation (sync, ≤ 8000 symbols),
  - optional async calls for parallel chunk processing.
- History – Shared across both wrappers, allowing the pipeline to keep context between calls without re‑sending the whole prompt each time.
- MODELS_NAME – Drives the fallback strategy when a specific model endpoint is unavailable.
Important assumptions
- The Groq service is reachable and the supplied `API_KEY` has sufficient quota.
- Network failures raise generic exceptions caught by the retry loop.
- The `History` object correctly formats messages as a list of `{"role": "...", "content": "..."}` dictionaries, as required by Groq.
Side‑effects – Each call may mutate self.current_model_index and, if with_history=True, may also append new messages to self.history (handled by the base class). No files are written; all state lives in memory.
autodocgenerator/engine/models/model.py
Purpose
Provides the core abstraction for all LLM‑backed generators used by the documentation pipeline.
It defines:
- `History` – in‑memory store of message objects (`role`, `content`) that mimics the chat format expected by Groq/AsyncGroq.
- `ParentModel` – common constructor handling API‑key storage, a mutable conversation history, and a shuffled fallback list (`regen_models_name`) derived from `config.MODELS_NAME`.
- `Model` – synchronous wrapper exposing `generate_answer`, `get_answer_without_history`, and `get_answer`.
- `AsyncModel` – asynchronous counterpart with `async` versions of the same helpers.
Key behaviours
| Method | Behaviour | Side‑effects |
|---|---|---|
| `History.__init__(system_prompt)` | Starts with an empty list; optionally injects a system prompt (`BASE_SYSTEM_TEXT`). | Adds a system message. |
| `History.add_to_history(role, content)` | Appends a dict to `self.history`. | Mutates history. |
| `ParentModel.__init__(api_key, history)` | Stores the API key, the history, an initial model index of 0, and creates a shuffled copy of `MODELS_NAME` for retry/fallback. | May reorder `regen_models_name`. |
| `Model.generate_answer(with_history=True, prompt=None)` | Placeholder – concrete subclasses (`GPTModel`, `AsyncGPTModel`) override this with the real Groq request/response loop. | Returns a string. |
| `Model.get_answer(prompt)` | Records the user prompt, calls `generate_answer`, records the assistant reply, returns the reply. | Updates history. |
| `AsyncModel.generate_answer(...)` / `AsyncModel.get_answer(...)` | Same logic as `Model` but async. | Updates history asynchronously. |
Assumptions & contracts
- `history` must contain dicts compatible with Groq’s `messages` field.
- `api_key` is a valid Groq token; quota limits are handled elsewhere.
- Sub‑classes replace the stub `generate_answer` with a retry loop that iterates over `self.regen_models_name` and raises `"all models do not work"` if none succeed.
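The message format `History` maintains can be sketched with a minimal stand-in (the class name `HistorySketch` and the sample prompts are illustrative, not the real implementation):

```python
class HistorySketch:
    """Minimal stand-in for History: a list of {"role", "content"} dicts."""
    def __init__(self, system_prompt=None):
        self.history = []
        if system_prompt is not None:
            # the system prompt becomes the first message, as Groq expects
            self.add_to_history("system", system_prompt)

    def add_to_history(self, role, content):
        self.history.append({"role": role, "content": content})

h = HistorySketch("You are a documentation assistant.")
h.add_to_history("user", "Describe module X.")
h.add_to_history("assistant", "Module X does ...")
```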
autodocgenerator/factory/base_factory.py
Purpose
Defines the pipeline orchestration building blocks.
- `BaseModule` – abstract base class for any documentation generator piece. Sub‑classes implement `generate(info, model) → str`.
- `DocFactory` – composes a sequence of `BaseModule` instances and drives their execution, reporting progress via a `BaseProgress` implementation.
Core logic (DocFactory.generate_doc)
- Create a sub‑task `"Generate parts"` sized to the number of modules.
- Iterate over `self.modules`:
  - Call `module.generate(info, model)` (synchronous; async modules are wrapped elsewhere).
  - Concatenate results with double new‑lines.
  - Update progress.
- Remove the sub‑task and return the assembled document string.
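The concatenation loop above can be sketched as follows; `EchoModule` is a throwaway stand-in for a real `BaseModule`, and progress handling is omitted:

```python
class EchoModule:
    """Stand-in BaseModule: returns a fixed markdown fragment."""
    def __init__(self, text):
        self.text = text

    def generate(self, info, model):
        return self.text

def generate_doc(modules, info=None, model=None):
    """Run each module and join the fragments with double newlines."""
    parts = []
    for module in modules:
        parts.append(module.generate(info, model))  # progress update would go here
    return "\n\n".join(parts)

doc = generate_doc([EchoModule("# Intro"), EchoModule("## Usage")])
```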
Interaction map
- Modules (`IntroLinks`, `IntroText`, `CustomModule`, …) receive the shared `Model` (or `AsyncModel`) instance, thus re‑using the same `History` and fallback strategy.
- `BaseProgress` (from `ui.progress_base`) supplies a UI‑friendly progress bar; `DocFactory` only calls its public methods (`create_new_subtask`, `update_task`, `remove_subtask`).
- The factory does not perform any I/O itself – all side‑effects (LLM calls, history mutation) are delegated to the supplied model.
autodocgenerator/factory/modules/intro.py & general_modules.py
Overview
Concrete BaseModule implementations that plug into DocFactory.
| Module | Responsibility | Main call chain |
|---|---|---|
| `IntroLinks` | Extracts HTML links from `info["full_data"]` and asks the model to create a short intro for those links. | `get_all_html_links` → `get_links_intro(model, language)` |
| `IntroText` | Generates a high‑level project introduction from `info["global_data"]`. | `get_introdaction(model, language)` |
| `CustomModule` | Generates a custom description for a chunk of source code. | `split_data` → `generete_custom_discription(model, description, language)` |
All modules receive the same Model instance, therefore they benefit from the shared History (context continuity) and the same retry/fallback behaviour.
Summary for developers
- Instantiate a concrete model (`GPTModel` or `AsyncGPTModel`) → it passes its `History` to every module.
- Create a `DocFactory` with the desired `BaseModule` subclasses.
- Call `factory.generate_doc(info, model, progress)` → returns the full documentation string while progress is shown.
The snippet constitutes the glue between the LLM client layer and the modular documentation generation pipeline.
autodocgenerator/manage.py – High‑level orchestration layer
The Manager class glues together every subsystem of Auto‑Doc‑Generator:
| Component | Role in Manager |
|---|---|
| `CodeMix` (`preprocessor/code_mix.py`) | Produces a single “code‑mix” file that contains the repository tree and the raw source of every non‑ignored file. |
| `Spliter` / `Compressor` (`preprocessor/spliter.py`, `preprocessor/compressor.py`) | Split the huge mix into LLM‑friendly chunks (`split_data`) and optionally compress them (`compress_to_one`). |
| Doc‑Factory (`factory/base_factory.py`) | Receives a `DocFactory` instance and runs the modular documentation pipeline (`IntroLinks`, `CustomModule`, …). |
| Model objects (`engine/models/gpt_model.py`) | Provide synchronous (`Model`) or asynchronous (`AsyncModel`) LLM access; all modules share the same history via these objects. |
| Progress UI (`ui/progress_base.py`) | A thin wrapper around Rich’s `Progress` that reports task completion to the console. |
| `ProjectSettings` | Holds static project‑level meta‑information (title, global idea, etc.) that is injected into the LLM prompts. |
Core public API
| Method | Purpose | Important I/O / Side‑effects |
|---|---|---|
| `__init__(project_directory, project_settings, sync_model=None, async_model=None, ignore_files=[], language="en", progress_bar=BaseProgress())` | Sets up paths, creates the hidden cache folder (`.auto_doc_cache`) and stores references to the models, settings and UI. | Creates a folder on disk if missing. |
| `read_file_by_file_key(file_key)` | Reads one of the three cached artefacts (`code_mix`, `global_info`, `output_doc`). | Returns a `str` containing the file contents. |
| `get_file_path(file_key)` | Resolves the absolute path of a cached artefact. | No side‑effects. |
| `generate_code_file()` | Instantiates `CodeMix`, walks the project tree while respecting `ignore_files`, writes the mixed representation to `<cache>/code_mix.txt`. | Updates the progress bar (`update_task`). |
| `generate_global_info_file(max_symbols=10_000, use_async=False)` | Loads the code‑mix, splits it into ≤ `max_symbols` chunks, compresses them into a single “global info” markdown using the chosen model, writes the result to `<cache>/global_info.md`. | May invoke the LLM (sync or async). |
| `generete_doc_parts(max_symbols=5_000, use_async=False)` | Reads the global info and the raw code‑mix, then generates the first draft of the documentation (doc parts) via `gen_doc_parts` / `async_gen_doc_parts`. The result is saved to `<cache>/output_doc.md`. | Calls the LLM, may run an asyncio event loop. |
| `factory_generate_doc(doc_factory)` | Loads the global info, the current `output_doc` and code‑mix, builds the `info` dict expected by modules, runs `doc_factory.generate_doc`, prefixes the newly created sections to the existing doc and rewrites `<cache>/output_doc.md`. | Relies on the shared `self.sync_model`; progress is updated after completion. |
Design assumptions
- All file I/O is confined to the hidden cache folder; the original source tree is never modified.
- `ignore_files` follows Unix‑style glob patterns (handled by `CodeMix.should_ignore`).
- LLM quota handling is external – the manager simply forwards the appropriate `Model` instance.
- Progress updates are fire‑and‑forget: the manager does not wait for UI rendering.
autodocgenerator/preprocessor/code_mix.py – Repository serializer
CodeMix is a small utility that produces a single textual snapshot of a repository:
| Method | Behaviour |
|---|---|
| `__init__(root_dir=".", ignore_patterns=None)` | Normalises `root_dir` to an absolute `Path`; stores a list of glob patterns that must be excluded. |
| `should_ignore(path: str) -> bool` | Returns `True` if `path` (as a `Path`) matches any ignore pattern – either against the whole relative path, its basename, or any of its parts. |
| `build_repo_content(output_file="repomix-output.txt")` | Writes two sections to `output_file`: 1. Structure tree – an indented list of directories and files, respecting the ignore rules. 2. File payloads – for each non‑ignored file, writes `<file path="…">` followed by its raw text (UTF‑8, errors ignored). Errors while reading a file are logged inline. |
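The three-way matching described for `should_ignore` can be sketched with the standard `fnmatch` module (a free-standing illustration; the real method lives on `CodeMix` and works on its stored pattern list):

```python
import fnmatch
from pathlib import PurePath

def should_ignore(path: str, patterns) -> bool:
    """Match against the whole relative path, its basename, or any path part."""
    p = PurePath(path)
    for pat in patterns:
        if (fnmatch.fnmatch(str(p), pat)
                or fnmatch.fnmatch(p.name, pat)          # basename match
                or any(fnmatch.fnmatch(part, pat) for part in p.parts)):
            return True
    return False

patterns = ["*.pyc", "__pycache__", ".git"]
```

Matching each path component separately is what lets a bare directory name like `__pycache__` exclude everything beneath it.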
Interaction points
- Called only from `Manager.generate_code_file()`.
- The generated `<cache>/code_mix.txt` becomes the raw input for all downstream preprocessing steps (splitting, compression, doc‑factory).
Side‑effects & Constraints
- Writing overwrites the target file; no backup is made.
- Ignores binary or unreadable files silently (errors are captured and written as plain text).
- Assumes that the repository fits into memory when read line‑by‑line – acceptable for typical Python projects.
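The two-section output format can be sketched end to end; this is a simplified stand-alone version (the indentation scheme, the inline ignore check, and the temp-directory usage are assumptions, not the real `build_repo_content`):

```python
import fnmatch
import tempfile
from pathlib import Path

def build_repo_content(root, ignore_patterns, output_file):
    """Sketch: write a structure tree, then <file path="..."> payloads."""
    root = Path(root).resolve()
    with open(output_file, "w", encoding="utf-8") as out:
        out.write("Structure tree:\n")
        files = []
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root)
            if any(fnmatch.fnmatch(str(rel), pat) or fnmatch.fnmatch(path.name, pat)
                   for pat in ignore_patterns):
                continue
            out.write("  " * (len(rel.parts) - 1) + rel.name + "\n")
            if path.is_file():
                files.append((rel, path))
        for rel, path in files:
            out.write(f'<file path="{rel}">\n')
            out.write(path.read_text(encoding="utf-8", errors="ignore") + "\n")

# Build the snapshot for a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.py").write_text("print('hi')\n")
    Path(tmp, "a.pyc").write_text("compiled bytes")
    out_path = Path(tmp, "mix.txt")
    # ignore the output file itself so it does not appear in its own snapshot
    build_repo_content(tmp, ["*.pyc", "mix.txt"], out_path)
    snapshot = out_path.read_text()
```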
Quick usage example
```python
from autodocgenerator.manage import Manager
from autodocgenerator.preprocessor.settings import ProjectSettings
from autodocgenerator.engine.models.gpt_model import GPTModel, AsyncGPTModel
from autodocgenerator.engine.config.config import API_KEY
from autodocgenerator.factory.base_factory import DocFactory
from autodocgenerator.factory.modules.intro import IntroLinks
from autodocgenerator.ui.progress_base import LibProgress
from rich.progress import Progress

settings = ProjectSettings("MyProject")
settings.add_info("global idea", "A demo project.")

manager = Manager(
    project_directory="path/to/project",
    project_settings=settings,
    sync_model=GPTModel(API_KEY),
    async_model=AsyncGPTModel(API_KEY),
    ignore_files=["*.pyc", "__pycache__", ".git"],
    progress_bar=LibProgress(Progress()),
    language="en",
)

manager.generate_code_file()         # creates .auto_doc_cache/code_mix.txt
manager.generate_global_info_file()  # compresses it into global_info.md
manager.generete_doc_parts()         # first draft → output_doc.md
manager.factory_generate_doc(
    DocFactory(IntroLinks())
)                                    # final doc with intro links
```
The Manager + CodeMix pair therefore constitutes the data‑acquisition layer of the system, turning a raw source tree into the structured inputs required by the LLM‑driven documentation pipeline.
autodocgenerator/preprocessor/compressor.py – Compression & description layer
Responsibility
Transforms the raw repository snapshot (produced by CodeMix) into a compact representation that fits the LLM token limits and later generates short technical descriptions for each code fragment.
Interaction with the rest of the system
- Called by `Manager.generate_global_info_file()` (and indirectly by `Manager.factory_generate_doc`).
- Receives the list of file‑payload strings from `CodeMix.build_repo_content()`.
- Uses the model objects (`Model` / `AsyncModel`) supplied by the manager – quota handling is performed outside this module.
- Emits the final compressed string that becomes the input for the post‑processing stage (`postprocess.py`).
Key functions
| Function | Main flow | Important notes |
|---|---|---|
| `compress(data, project_settings, model, compress_power)` | Builds a three‑message prompt (system = project prompt, system = compression template from `get_BASE_COMPRESS_TEXT`, user = raw data) and calls `model.get_answer_without_history`. Returns the model’s compressed text. | `compress_power` influences the length hint in the template. |
| `compress_and_compare(data, model, project_settings, compress_power=4, progress_bar=BaseProgress())` | Splits the list into chunks of size `compress_power`, compresses each element with `compress`, concatenates the results per chunk, and updates a sub‑task on `progress_bar`. Returns a new list whose length is `ceil(len(data)/compress_power)`. | Synchronous, fire‑and‑forget UI updates. |
| `async_compress …` / `async_compress_and_compare …` | Same logic as the synchronous version, but runs each `compress` call inside an `asyncio.Semaphore(4)` to limit parallel LLM requests. Progress is updated after each awaited answer. | Allows the manager to enable `use_async=True` for faster throughput. |
| `compress_to_one(data, model, project_settings, compress_power=4, use_async=False, progress_bar=BaseProgress())` | Repeatedly calls the (a)sync compress‑and‑compare functions, reducing the list until a single string remains. The loop adapts `compress_power` to 2 when the list is very short. Returns the final compressed document. | Guarantees that the output fits a single LLM request. |
| `generate_discribtions_for_code(data, model, project_settings, progress_bar=BaseProgress())` | For each compressed code fragment, builds a detailed “explain‑your‑code” prompt (strict rules, markdown example) and collects the model’s answers. | Used later to produce the code‑description section of the final documentation. |
Assumptions & side‑effects
- The LLM model's `get_answer_without_history` is stateless; quota limits are enforced elsewhere.
- Input strings are assumed to be UTF‑8 text; binary blobs are filtered out earlier by `CodeMix`.
- Progress UI updates are fire‑and‑forget – the function does not wait for the UI to render.
- The function returns its result as a single string and does not persist intermediate files.
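The reduction loop of `compress_to_one` can be illustrated without any LLM. The sketch below is a hypothetical re‑implementation of the behaviour described in the table (group, merge, re‑compress until one chunk remains); the `compress` parameter stands in for the model‑backed `compress()` call, and the signature is simplified relative to the real one.

```python
def compress_to_one(chunks, compress, compress_power=4):
    """Sketch of the reduction loop: repeatedly merge groups of
    `compress_power` chunks and re-compress them until one string
    remains. `compress` stands in for the LLM-backed call."""
    while len(chunks) > 1:
        # Fall back to pairs when few chunks remain, as described above.
        power = 2 if len(chunks) <= compress_power else compress_power
        merged = []
        for i in range(0, len(chunks), power):
            group = chunks[i:i + power]        # one batch of chunks
            merged.append(compress("".join(group)))
        chunks = merged                        # ceil(len/power) items survive
    return chunks[0]

# With an identity "compressor", five chunks reduce to one in two passes.
result = compress_to_one(["a", "b", "c", "d", "e"], lambda s: s)
```

Each pass shrinks the list to `ceil(len(data)/compress_power)` items, matching the `compress_and_compare` contract above, so the loop terminates for any non‑empty input.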
autodocgenerator/preprocessor/postprocess.py – Final markdown polishing
Responsibility
Takes the compressed text (and optional custom descriptions) and builds the final, user‑ready Markdown document: generates anchors, extracts topics, creates introductory sections, and formats custom description blocks.
Interaction with the rest of the system
- Invoked after `compress_to_one` and `generate_discribtions_for_code`.
- Consumes the single compressed string and the list of custom description strings.
- Calls the same LLM `Model` interface to request introductions (`get_links_intro`, `get_introdaction`).
- Returns ready‑to‑write Markdown that `Manager.factory_generate_doc` writes to the output file.
Key functions
| Function | Purpose | Important details |
|---|---|---|
| `generate_markdown_anchor(header)` | Normalises a header string into a GitHub‑style anchor (`#my-section`). | Uses Unicode NFKC, replaces spaces with `-`, strips illegal characters, collapses multiple dashes. |
| `get_all_topics(data)` | Scans a Markdown string for level‑2 headings (`## `) and returns a tuple `(topics, anchors)`. | Relies on simple `str.find` loops; suitable for the controlled output produced by this pipeline. |
| `get_all_html_links(data)` | Extracts existing `<a name=…>` anchors (max 25 chars) and returns them as `#anchor` strings. | Helpful when the source already contains manual anchors. |
| `get_links_intro(links, model, language="en")` | Sends the list of anchors to the LLM with a system prompt (`BASE_INTRODACTION_CREATE_TEXT`) to generate a short introductory paragraph for the link section. | Returns the raw LLM answer. |
| `get_introdaction(global_data, model, language="en")` | Similar to the above but operates on the whole document body, using `BASE_INTRO_CREATE` to obtain a global introduction. | |
| `generete_custom_discription(splited_data, model, custom_description, language="en")` | Iterates over already‑split chunks, asks the LLM to produce a custom description (title + `<a name='…'>` anchor) respecting strict "no‑hallucination" rules. Stops at the first non‑empty, non‑`!noinfo` answer. | Returns the first satisfactory description or an empty string. |
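The anchor normalisation performed by `generate_markdown_anchor` can be sketched as below. This is a hypothetical re‑implementation of the steps listed in the table (NFKC normalisation, dash substitution, character stripping, dash collapsing), not the module's exact code.

```python
import re
import unicodedata

def generate_markdown_anchor(header: str) -> str:
    """Sketch of a GitHub-style anchor normaliser: NFKC-normalise,
    lowercase, replace spaces with dashes, strip characters GitHub
    drops, and collapse runs of dashes."""
    text = unicodedata.normalize("NFKC", header).strip().lower()
    text = text.replace(" ", "-")
    text = re.sub(r"[^a-z0-9_-]", "", text)   # drop illegal characters
    text = re.sub(r"-{2,}", "-", text)        # collapse repeated dashes
    return "#" + text

print(generate_markdown_anchor("My  Section: Setup & Install"))
```

The result can then be used directly in a Markdown link such as `[Setup](#my-section-setup-install)`.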
Assumptions & side‑effects
- Input Markdown follows the conventions produced by `compressor.py`; headings are prefixed with `##`.
- The functions do not write files; they only return strings that the caller assembles and persists.
- All LLM calls are fire‑and‑forget with respect to UI – progress handling is done by the caller (the manager).
- The module assumes the model's system prompts (`BASE_INTRODACTION_CREATE_TEXT`, `BASE_INTRO_CREATE`) are present in the global config.
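Because the pipeline controls its own Markdown output, the heading scan behind `get_all_topics` reduces to a line walk. A minimal sketch (using `splitlines` instead of the module's `str.find` loops, and a simplified anchor rule):

```python
def get_all_topics(data: str):
    """Sketch of the level-2 heading scan: collect each '## ' heading
    and derive a matching anchor from its title."""
    topics, anchors = [], []
    for line in data.splitlines():
        if line.startswith("## "):
            title = line[3:].strip()
            topics.append(title)
            anchors.append("#" + title.lower().replace(" ", "-"))
    return topics, anchors

topics, anchors = get_all_topics("# Intro\n## Setup\ntext\n## Usage Notes\nmore")
```

A scan this simple is only safe because the compressed documents are machine‑generated; arbitrary user Markdown (e.g. `##` inside code fences) would need a real parser.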
Together, compressor and postprocess form the pre‑ and post‑processing stages of the documentation pipeline: they shrink the raw repository dump to a token‑friendly form, enrich it with LLM‑generated summaries, and finally shape a polished Markdown document ready for delivery.
preprocessor/settings.py – ProjectSettings
Responsibility
Collects project‑wide metadata (name + arbitrary key/value pairs) and builds the system prompt that is passed to the LLM before any generation step.
Interaction with the system
- Created by the manager after the configuration file is read.
- Its `prompt` property is concatenated with other prompts (e.g., `BASE_SETTINGS_PROMPT`) and supplied to `spliter.write_docs_by_parts` and to the custom‑description generator in postprocess.
Key members
| Member | Meaning |
|---|---|
| `project_name` | Human‑readable project identifier (used by the LLM). |
| `info` (dict) | Additional user‑defined fields (e.g., framework, version). |
| `add_info(key, value)` | Stores a new field. |
| `prompt` (property) | Returns a single string: `BASE_SETTINGS_PROMPT` + "Project Name: …" + each `key: value` line. |
Assumptions / Side‑effects
- Relies on `BASE_SETTINGS_PROMPT` being defined in the global config.
- No I/O – only string construction.
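A minimal sketch of what `ProjectSettings` does, assuming a stand‑in value for `BASE_SETTINGS_PROMPT` (the real prompt text lives in the global config):

```python
# Stand-in for the prompt defined in the global config.
BASE_SETTINGS_PROMPT = "You are documenting the following project."

class ProjectSettings:
    """Sketch of the settings container: a project name plus arbitrary
    key/value metadata, rendered into a single system prompt."""
    def __init__(self, project_name: str):
        self.project_name = project_name
        self.info: dict[str, str] = {}

    def add_info(self, key: str, value: str) -> None:
        self.info[key] = value

    @property
    def prompt(self) -> str:
        lines = [BASE_SETTINGS_PROMPT, f"Project Name: {self.project_name}"]
        lines += [f"{k}: {v}" for k, v in self.info.items()]
        return "\n".join(lines)

settings = ProjectSettings("Auto Doc Generator")
settings.add_info("framework", "none")
```

Since `prompt` is a property, every generation step sees the latest metadata without any caching concerns.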
preprocessor/spliter.py – Chunking & LLM‑driven part generation
Responsibility
Breaks a massive source‑code dump into token‑friendly chunks, sends each chunk to the LLM (synchronously or asynchronously), and stitches the partial answers back together.
Interaction with the system
- Called by `Manager.factory_generate_doc` after `compress_to_one`.
- Receives the global info prompt (from `ProjectSettings.prompt`) and feeds it to the LLM together with each chunk.
- Progress updates are routed to a `BaseProgress` implementation (rich‑based or console).
Key functions
| Function | Purpose |
|---|---|
| `split_data(data, max_symbols)` | Splits raw text on line breaks, then enforces a rough size limit (`max_symbols`). Handles oversized pieces by halving them recursively and finally groups them into `split_objects`. |
| `write_docs_by_parts(part, model, global_info, prev_info=None, language="en")` | Builds the LLM prompt (system language, `BASE_PART_COMPLITE_TEXT`, optional previous part) and calls `model.get_answer_without_history`. Strips surrounding Markdown fences. |
| `async_write_docs_by_parts(...)` | Same as above but works with an `AsyncModel`, respects a semaphore (max parallel calls) and optionally updates a progress bar. |
| `gen_doc_parts(full_code_mix, global_info, max_symbols, model, language, progress_bar)` | Orchestrates synchronous processing: splits, iterates, calls `write_docs_by_parts`, keeps a sliding window of the last 3000 characters for context, updates progress. |
| `async_gen_doc_parts(...)` | Parallel version using `asyncio.gather`. |
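The greedy line‑grouping idea behind `split_data` can be sketched as follows. The recursive halving of oversized single lines mentioned in the table is omitted, and the name and signature mirror the table rather than the exact source:

```python
def split_data(data: str, max_symbols: int) -> list[str]:
    """Sketch of the chunker: split on line breaks, then greedily pack
    lines into chunks that stay under max_symbols. (The real module also
    halves oversized single lines recursively; omitted here.)"""
    chunks, current = [], ""
    for line in data.split("\n"):
        candidate = current + line + "\n"
        if len(candidate) > max_symbols and current:
            chunks.append(current)       # close the current chunk
            current = line + "\n"        # start a new one with this line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

parts = split_data("a\nbb\nccc\ndddd", max_symbols=6)
```

Splitting on line boundaries keeps each chunk syntactically coherent, which matters because every chunk is sent to the LLM as a standalone piece of context.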
Assumptions / Side‑effects
- Input `data` already contains newline separators (`"\n"`).
- `max_symbols` is tuned to the LLM's token limit; the function uses heuristics (`*1.5`, `*1.25`).
- No file I/O – the returned string is later written by the manager.
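The throttled fan‑out used by `async_write_docs_by_parts` / `async_gen_doc_parts` follows a standard asyncio pattern: a semaphore caps concurrent LLM calls while `asyncio.gather` preserves chunk order. A self‑contained sketch, where `fake_llm` is a stand‑in for the real `AsyncModel` call:

```python
import asyncio

async def async_write_all(chunks, ask_llm, max_parallel=4):
    """Sketch of the throttled fan-out: at most `max_parallel` calls run
    at once, and gather() returns results in the original chunk order."""
    sem = asyncio.Semaphore(max_parallel)

    async def one(chunk):
        async with sem:                 # wait for a free slot
            return await ask_llm(chunk)

    return await asyncio.gather(*(one(c) for c in chunks))

async def fake_llm(chunk):              # stand-in for the real model call
    await asyncio.sleep(0)
    return chunk.upper()

results = asyncio.run(async_write_all(["a", "b", "c"], fake_llm))
```

Ordering matters here because the partial answers are stitched back together into one document; the semaphore only limits concurrency, it never reorders results.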
ui/progress_base.py – Progress reporting abstraction
Responsibility
Provides a minimal interface (`BaseProgress`) for reporting task progress, with two concrete implementations:
- `LibProgress` – wraps rich's `Progress` for a nice terminal UI.
- `ConsoleGtiHubProgress` – a simple `print`‑based fallback.
Interaction with the system
- Instances are passed to `spliter.gen_doc_parts` / `async_gen_doc_parts` and to the post‑processing stage.
- The pipeline creates a sub‑task for each major step (e.g., "Generate doc parts") and advances it after each chunk is processed.
Key classes
| Class | Core behaviour |
|---|---|
| `BaseProgress` | Abstract API (`create_new_subtask`, `update_task`, `remove_subtask`). |
| `LibProgress` | Stores a base task (`_base_task`) and the current sub‑task (`_cur_sub_task`). `update_task` advances the appropriate task. |
| `ConsoleTask` | Helper that prints a start line and incremental percent progress. |
| `ConsoleGtiHubProgress` | Uses `ConsoleTask` for both general and sub‑tasks; suitable when rich is unavailable. |
Assumptions / Side‑effects
- `LibProgress` expects a `rich.progress.Progress` object passed at construction.
- All methods are side‑effect‑only (printing or updating the UI); they never modify the documentation data.
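The interface can be sketched as below. `ConsoleProgress` is a hypothetical simplification of `ConsoleGtiHubProgress`, kept here only to show how an implementation plugs into the abstract hooks:

```python
class BaseProgress:
    """Sketch of the progress interface: concrete implementations
    (rich-based or console) override these no-op hooks."""
    def create_new_subtask(self, name: str, total: int) -> None: ...
    def update_task(self, advance: int = 1) -> None: ...
    def remove_subtask(self) -> None: ...

class ConsoleProgress(BaseProgress):
    """Hypothetical print-based fallback in the spirit of
    ConsoleGtiHubProgress: tracks one sub-task at a time."""
    def __init__(self):
        self.done, self.total = 0, 0

    def create_new_subtask(self, name, total):
        self.done, self.total = 0, total
        print(f"start: {name} (0/{total})")

    def update_task(self, advance=1):
        self.done += advance
        print(f"progress: {self.done}/{self.total}")

bar = ConsoleProgress()
bar.create_new_subtask("Generate doc parts", 3)
bar.update_task()
```

Because `BaseProgress` methods are no‑ops, pipeline code can accept a default `BaseProgress()` and run silently when no UI is wired up.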
These three modules constitute the pre‑processing layer: they prepare project context (ProjectSettings), split the raw repository dump into LLM‑friendly parts (spliter), and keep the user informed about progress (progress_base). Their outputs feed directly into the generation and post‑processing stages described elsewhere.
pyproject.toml – Project metadata & dependency manifest
The file is the canonical entry point for building, packaging, and installing autodocgenerator (v0.6.9). It follows the modern PEP 518 layout used by Poetry (the chosen build backend). Below is a concise walkthrough that helps a new developer understand what each block does and why it matters for the overall system.
1. Project definition ([project])
| Key | Meaning | Typical value in this file |
|---|---|---|
| `name` | The distribution identifier as published on PyPI. | `autodocgenerator` |
| `version` | Semantic version of the library. | `0.6.9` |
| `description` | Short one‑liner shown on the package index. | "This Project helps you to create docs for your projects" |
| `authors` | List of maintainers with contact data. | `[{name = "dima‑on", email = "sinica911@gmail.com"}]` |
| `license` | SPDX‑compatible license declaration. | `{text = "MIT"}` |
| `readme` | Path to the long description file (used for PyPI). | `README.md` |
| `requires-python` | Supported interpreter range (lower and upper bounds). | `>=3.11,<4.0` |
These fields are consumed by packaging tools (pip, poetry, build) to generate the wheel / sdist and to populate the metadata that end‑users see.
2. Runtime dependencies (dependencies)
The list enumerates all third‑party packages required for the library to work. A few groups are worth highlighting because they map directly to major subsystems of autodocgenerator:
| Sub‑system | Packages (representative) | Role in the system |
|---|---|---|
| LLM back‑ends | `openai`, `google-genai`, `groq` | Unified interface (`model.get_answer_without_history`) to various large‑language‑model APIs. |
| HTTP & auth | `httpx`, `httpcore`, `requests`, `urllib3`, `google-auth`, `certifi` | Network communication with LLM providers, handling retries, TLS verification, etc. |
| Async & concurrency | `anyio`, `tenacity`, `rich_progress`, `tqdm`, `asyncio` (built‑in) | Retry logic, progress bars, and the async execution model used by `async_gen_doc_parts`. |
| Project inspection | `dulwich`, `findpython`, `platformdirs`, `virtualenv`, `pywin32-ctypes` | Locating the source tree, reading git objects, discovering the active interpreter, handling Windows‑specific path quirks. |
| Parsing & formatting | `markdown-it-py`, `mdurl`, `Pygments`, `pyyaml`, `tomlkit` | Converting source code/comments into Markdown, reading configuration files, syntax highlighting. |
| Caching & compression | `CacheControl`, `zstandard` | Local caching of remote LLM responses and optional compression of large intermediate blobs. |
| Utility & helpers | `annotated-types`, `pydantic`, `pydantic_core`, `typing-extensions`, `typing-inspection`, `more-itertools`, `RapidFuzz` | Strongly‑typed data models, validation, fuzzy matching when linking symbols, and collection utilities. |
| CLI & UX | `cleo`, `rich`, `rich_progress`, `colorama`, `crashtest`, `shellingham` | Building the command‑line interface (`autodocgenerator` entry point), colourful terminal output, and graceful error handling. |
| Security & key storage | `keyring`, `rsa` | Storing API tokens securely on the host system. |
| File handling | `filelock`, `msgpack`, `python-dotenv` | Guarding concurrent writes, binary serialization of intermediate data, loading environment variables. |
| Testing / dev helpers | – (not listed; may be added later) | Typically `pytest`, `mypy`, etc., would appear here. |
All version pins are exact (e.g., openai==2.14.0). This guarantees reproducible builds – a crucial property when the library talks to external AI services that may change behaviour across minor releases.
3. Build system ([build‑system])
```toml
[build-system]
requires = ["poetry-core>=2.0.0"]
build-backend = "poetry.core.masonry.api"
```

- `requires` – Poetry supplies the PEP 517 build backend (`poetry-core`).
- `build-backend` – The entry point that `pip` invokes to build the wheel.
Because the project uses Poetry, developers can run:
```shell
poetry install   # creates a venv and installs all deps
poetry build     # produces .whl and .tar.gz in dist/
poetry publish   # upload to PyPI
```
4. Practical notes for contributors
| Situation | What to do |
|---|---|
| Add a new library (e.g., a new LLM provider) | poetry add <package>. This will automatically update the dependencies block with an exact version. |
| Upgrade a dependency | poetry update <package> – Poetry will rewrite the version pin while keeping the lock file (poetry.lock) in sync. |
| Switch to a different Python range | Edit requires-python (e.g., to support 3.12) and run poetry lock --no-update to regenerate the lock file. |
| Remove unused code | Delete the import and then run poetry remove <package> to keep the manifest lean. |
| Audit security | Run poetry show --outdated and consult snyk / pip-audit on the locked versions. |
5. How this file fits the overall autodocgenerator pipeline
- CLI entry point – Defined elsewhere (`src/autodocgenerator/__main__.py`) but depends on the CLI packages listed here (`cleo`, `rich`).
- Project analysis – Uses `dulwich` (git), `findpython`, and `virtualenv` to collect source files, which are later fed into the spliter module (`split_data`).
- LLM communication – The `model` abstractions (`OpenAIModel`, `GoogleGenAIModel`, `GroqModel`) rely on the HTTP and auth libraries declared.
- Documentation generation – The core generation functions (`write_docs_by_parts`, `gen_doc_parts`, …) invoke the LLM, format the result with `markdown-it-py` / `Pygments`, and finally hand the Markdown string to the post‑processing stage.
- Progress reporting – `rich` / `rich_progress` provide the UI that `ui/progress_base.py` wraps.
All of these runtime components are guaranteed to be available because they are enumerated in dependencies. The lock file (poetry.lock, not shown) pins the exact builds that were tested during development, ensuring the generation pipeline behaves identically on every developer’s machine and on CI.
TL;DR
pyproject.toml is the single source of truth for packaging, Python version support, and the full dependency graph of autodocgenerator. It enables reproducible builds, straightforward CI/CD, and clear visibility into which external libraries power the repository‑analysis → LLM‑prompt → Markdown‑output pipeline. Keeping this file accurate and up‑to‑date is essential for the stability of the whole documentation‑generation system.
Project details
Release history
Download files
Source Distribution
Built Distribution
File details
Details for the file autodocgenerator-0.7.1.tar.gz.
File metadata
- Download URL: autodocgenerator-0.7.1.tar.gz
- Upload date:
- Size: 48.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.1 CPython/3.12.12 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `d443f17cb84ca27d6312ccbb3b51fdfdcea60eb209856d1d1234651211290b5a` |
| MD5 | `05849c3a0eb57ac8b94864cbbb7d44ca` |
| BLAKE2b-256 | `8bf74034deeaf9bbf78c7cb470db92a5813cf8a86ab06219962bd8a4d69232cf` |
File details
Details for the file autodocgenerator-0.7.1-py3-none-any.whl.
File metadata
- Download URL: autodocgenerator-0.7.1-py3-none-any.whl
- Upload date:
- Size: 38.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.1 CPython/3.12.12 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `eea00e38b071ca418ef031aa667bfeb8df889209f5e4d126418380c1f152f74b` |
| MD5 | `c5352f3e4ce8197e1035136c7a204f91` |
| BLAKE2b-256 | `9a2db988f48997e6224fdd2fb5f07e5ec5397f79570abac0b5572e2af86de747` |