AutoDoc — Automatic Project Documentation Generator
AutoDoc helps you create documentation for your projects automatically.
Project Goal
AutoDoc automatically produces high‑quality, up‑to‑date documentation for a codebase by combining source‑code analysis, large‑language‑model (LLM) text generation, and semantic embeddings. The tool scans all relevant files, extracts meaningful snippets, compresses and sorts them, and assembles a coherent document that can be written in one or more parts. It also intelligently decides whether regeneration is necessary by comparing the current repository state against a cached Git commit SHA.
Core Logic & Principles
| Layer | Responsibility | Key Components | Interaction Flow |
|---|---|---|---|
| Orchestrator | Holds configuration, models, and mutable documentation state. | Manager – singleton‑like object that stores config, llm_model, embedding_model, a progress UI, and a mutable doc_info (DocSchema). | Every module receives a reference to the Manager so it can read/write documentation sections. |
| Configuration | Reads autodocconfig.yml and exposes runtime settings. | ConfigReader → Config & StructureSettings. | The Doc Factory uses these settings to decide which modules to invoke and how large each part can be (max_doc_part_size, include_intro_*). |
| Source‑Code Pipeline | Pre‑processes repository files into chunks suitable for LLM consumption. | preprocessor → code_mix → compressor → spliter | Manager.generate_code_file orchestrates this chain, producing a list of code‑to‑text candidates. |
| LLM & Embedding | Transforms code snippets into natural‑language explanations and generates vector embeddings for semantic ordering. | LLM wrapper (GPTModel → sync_model), Embedding (embedding_model). | run_file.gen_doc calls the LLM to produce text and the embedding model to produce vectors used later by the sorting logic. |
| Semantic Ordering & Compression | Sorts the generated text by meaning, compresses large sections, and splits the final output into manageable parts. | Sorting, Compressor, split_text_by_anchors | After the LLM produces raw output, sort_vectors, compress, and split_text_by_anchors produce a clean, well‑structured document. |
| Persistence & Incremental Regeneration | Writes the final documentation and records the current Git commit for future change detection. | check_git_status, should_change?, CacheSettings.last_commit | The Manager checks whether the repository state has changed before running the full pipeline. |
| Factory Pattern | Enables plug‑in of new documentation modules. | Doc Factory, BaseModule subclasses | Modules are dynamically aggregated and executed against the Manager, making the system highly extensible. |
The entire workflow can be started from the command line (Auto Runner) which instantiates the models, loads the configuration, and runs the full generation pipeline.
Key Features
- Automated Doc Generation – Scans all project files, ignoring patterns defined in the config, and produces natural‑language documentation.
- LLM‑Driven Text Generation – Uses a GPT‑based wrapper to convert code snippets into explanatory text.
- Semantic Embedding & Ordering – Generates vector embeddings for each section, sorts them by meaning, and splits long texts into logical anchors.
- Compression – Compresses verbose outputs with a dedicated model to keep sections within user‑defined size limits (max_doc_part_size).
- Incremental Builds – Checks the current Git SHA against a cached value to decide whether regeneration is needed, saving time on unchanged projects.
- Pluggable Module System – Employs a factory pattern (Doc Factory) to allow developers to add custom processing modules without modifying core logic.
- Configurable Pipeline – autodocconfig.yml controls language, ignore patterns, project name, thresholds, logging levels, and toggles such as use_global_file or include_intro_*.
- Cross‑Library Integration – Built on top of google.genai for embeddings, numpy for vector operations, and standard libraries (re, fnmatch, yaml) for parsing and text manipulation.
- Progress Feedback – Integrated progress bar to keep users informed during long runs.
Dependencies
| Category | Library / Tool | Purpose |
|---|---|---|
| Configuration | yaml | Parse autodocconfig.yml. |
| Git Integration | gitpython or subprocess calls | Retrieve current commit SHA. |
| LLM & Embeddings | google.genai | Generate embeddings and LLM text. |
| Numerics & Vector Ops | numpy | Handle vector calculations (length, sorting). |
| Text Processing | re, fnmatch | Pattern matching, ignore list handling. |
| CLI & UX | argparse, tqdm | Command line interface & progress bar. |
| Utilities | os, pathlib, logging | File system operations, logging. |
| Project Settings | Custom modules (projectsettings.py) | Manage project‑level settings. |
| Packaging | setuptools / pip | Install the AutoDoc package. |
All dependencies are pure‑Python and available via pip. The project can be installed and run on any environment that has Python 3.10+ and the necessary credentials for the Google GenAI API.
Executive Navigation Tree
📂 Git & Deployment
- #check-git-status-component
- #install_script
- #install_sh
- #install-workflow-with-remote-scripts-and-github-secret
- #run-file-component
- #folder-system
- #file-paths
📂 Configuration
📂 Core Architecture
- #gpt-model-component
- #gptmodel-class
- #inputs-outputs
- #responsibility
- #interaction
- #logic-flow
- #manager-class
- #manager-class-usage
- #code-mix
- #codemix-module
- #global-info
- #custom-intro
- #factory
- #ordering
- #sorting-module
- #cleanup
📂 Data Processing
- #embedding
- #embedding-layer
- #compressor-component
- #compressor-flow
- #compress-function
- #compress-and-compare-function
- #compress-to-one-function
- #data-contracts
- #data-contract-summary
- #split_data
📂 Logging & Error Handling
📂 Documentation Generation
- #doc-parts
- #write_docs_by_parts
- #gen_doc_parts
- #doc_content
- #doc_head_schema
- #doc_info_schema
Check Git Status Component
Description
The check_git_status function is a critical component in the Auto Doc Generator pipeline, responsible for determining whether the documentation needs to be regenerated based on Git repository changes.
Functional Flow
The function takes a Manager instance as input and performs the following steps:
- Checks whether the current GitHub event is a workflow dispatch. If so, it returns True, indicating that the documentation should be regenerated.
- Retrieves the last commit hash from the .auto_doc_cache_file.json file.
- Uses the get_diff_by_hash function to compare the current repository state with the last committed state, excluding Markdown files.
- If the difference exceeds the threshold defined in the Manager instance's configuration, or if the last commit hash is empty, it updates the last commit hash and returns True.
- Otherwise, it returns False, indicating that the documentation does not need to be regenerated.
Code Structure
The check_git_status function is defined in the autodocgenerator/auto_runner/check_git_status.py file and relies on the following external dependencies:
- subprocess for executing Git commands
- Manager instance for accessing configuration and repository information
- CacheSettings for loading and updating the last commit hash
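A minimal sketch of this change‑detection logic, assuming a CacheSettings object with a last_commit field and a numeric threshold; get_diff_size_by_hash is an illustrative stand‑in for the module's get_diff_by_hash:

```python
import subprocess

def get_diff_size_by_hash(last_commit: str) -> int:
    # Summarise changed lines between the cached commit and HEAD, skipping Markdown.
    stat = subprocess.run(
        ["git", "diff", "--shortstat", last_commit, "HEAD", "--", ".", ":(exclude)*.md"],
        capture_output=True, text=True, check=True,
    ).stdout
    numbers = [int(tok) for tok in stat.split() if tok.isdigit()]
    return sum(numbers[1:]) if numbers else 0  # insertions + deletions

def check_git_status(cache, threshold: int) -> bool:
    head = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if not cache.last_commit or get_diff_size_by_hash(cache.last_commit) > threshold:
        cache.last_commit = head  # record the new baseline for the next run
        return True               # regeneration required
    return False
```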
Config Reader Component
Description
The config_reader module is responsible for parsing the autodocconfig.yml file and extracting relevant configuration settings for the Auto Doc Generator.
Functional Flow
The read_config function takes the contents of the autodocconfig.yml file as input and performs the following steps:
- Loads the configuration data using the yaml library.
- Extracts the project name, language, ignore files, and build settings from the configuration data.
- Creates a Config instance and populates it with the extracted settings.
- Extracts custom module descriptions and creates a list of BaseModule instances.
- Extracts structure settings and creates a StructureSettings instance.
- Returns the Config, the BaseModule list, and the StructureSettings instance.
Code Structure
The config_reader module is defined in the autodocgenerator/auto_runner/config_reader.py file and relies on the following external dependencies:
- yaml for loading configuration data
- Config and StructureSettings for representing configuration settings
- BaseModule for creating custom module instances
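A minimal sketch of this parsing flow, assuming a simplified Config container; the real Config, BaseModule, and StructureSettings classes live in the package and carry more fields:

```python
from dataclasses import dataclass, field
import yaml

@dataclass
class Config:  # simplified stand-in for autodocgenerator's Config
    project_name: str
    language: str = "en"
    ignore_files: list = field(default_factory=list)
    build_settings: dict = field(default_factory=dict)

def read_config(raw_yaml: str):
    data = yaml.safe_load(raw_yaml)
    config = Config(
        project_name=data["project_name"],
        language=data.get("language", "en"),
        ignore_files=data.get("ignore_files", []),
        build_settings=data.get("build_settings", {}),
    )
    # In the real module, custom_descriptions become BaseModule instances and
    # structure_settings becomes a StructureSettings instance.
    return config, data.get("custom_descriptions", []), data.get("structure_settings", {})
```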
ProjectSettings – Metadata Container
- Construction – Instantiated with a mandatory project_name.
- Dynamic Properties – The info dictionary holds arbitrary key/value pairs that are appended to the prompt.
- prompt Property – Concatenates:
  - The global BASE_SETTINGS_PROMPT constant.
  - A Project Name line.
  - One line per key/value in info.

| Method | Input | Output | Notes |
|---|---|---|---|
| add_info(key, value) | str, Any | None | Mutates the info dictionary |
| prompt | None | str | Generated on each access |
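A minimal sketch consistent with the table above; the BASE_SETTINGS_PROMPT text here is an assumed placeholder:

```python
BASE_SETTINGS_PROMPT = "You are documenting the following project."  # assumed content

class ProjectSettings:
    def __init__(self, project_name: str):
        self.project_name = project_name
        self.info: dict = {}

    def add_info(self, key: str, value) -> None:
        self.info[key] = value  # mutates the info dictionary in place

    @property
    def prompt(self) -> str:  # regenerated on each access
        lines = [BASE_SETTINGS_PROMPT, f"Project Name: {self.project_name}"]
        lines += [f"{key}: {value}" for key, value in self.info.items()]
        return "\n".join(lines)
```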
The file is a YAML document that defines several top‑level sections used by the documentation generator.
- project_name – Title of the project.
- language – Language code for generated documentation.
- ignore_files – List of patterns (glob style) for files and directories that the tool should skip. Typical entries include compiled artefacts, virtualenv folders, IDE folders, databases, logs, and markdown files.
- build_settings – Controls the build process.
  - save_logs: Boolean to keep log files.
  - log_level: Integer indicating verbosity.
  - threshold_changes: Numeric limit for what constitutes a significant change.
- structure_settings – Determines how the output is organized.
  - include_intro_links: Add links at the beginning.
  - include_intro_text: Add introductory text.
  - include_order: Preserve ordering of elements.
  - use_global_file: Reference a single global documentation file.
  - max_doc_part_size: Size cap for individual documentation blocks.
- project_additional_info – Custom free‑form fields; here a key global idea contains a short project description.
- custom_descriptions – List of descriptive strings that can be injected into generated content, often containing instructions or explanations about installation scripts, environment secrets, or how to use certain classes or modules.
install.ps1 – CI Setup & Project Configuration
Purpose
Automates creation of the GitHub Actions workflow file and a default autodocconfig.yml for a newly cloned project.
| Entity | Type | Role | Notes |
|---|---|---|---|
| $currentFolderName | Variable | Project name | Derived from the current folder, inserted into the config |
| .github/workflows/autodoc.yml | Workflow file | Reusable workflow trigger | Points to the Drag‑GameStudio/ADG reusable workflow |
| autodocconfig.yml | YAML | Runtime configuration | Contains ignore patterns, build and structure settings |
Core Flow
- Directory Creation – New-Item -ItemType Directory ensures .github/workflows/ exists.
- Workflow Generation – $content holds the workflow YAML; written via Out-File.
- Configuration Generation – $configContent is a multi‑line PowerShell string that populates autodocconfig.yml.
- Feedback – Write-Host signals completion.
The script uses PowerShell here‑strings (literal @'…'@ and expandable @"…"@); the literal form avoids variable interpolation inside the generated YAML.
Interaction with the Rest of the Pipeline
| Component | Interaction | Notes |
|---|---|---|
| BaseLogger | Used by DocFactory, Manager, and other modules to emit status messages. | Singleton ensures a single logger instance. |
| BaseProgress subclasses | Employed by Manager.generate_doc_parts and DocFactory to report progress. | LibProgress is optional if Rich is available. |
| install.ps1 | Runs during project bootstrap to provide CI hooks and configuration. | Generates files before the Auto Runner is invoked. |
Data Contract Summary
| Entity | Type | Role | Notes |
|---|---|---|---|
| log.message | str | Log text | Human‑readable |
| log.level | int | Severity threshold | Lower = less filtering |
| log.file_path | str | Target file for FileLoggerTemplate | Appended, UTF‑8 |
| progress.total_len | int | Sub‑task length | Updated per increment |
| config.project_name | str | Project identifier | Injected into the config file |

The provided code contains no external library calls beyond the standard library and Rich; therefore, no external behavior assumptions are made.
install.sh – CI Setup & Runtime Configuration
Purpose
Automates the creation of a GitHub Actions workflow (autodoc.yml) and a default autodocconfig.yml for a freshly cloned Auto‑Doc project. The script is executed during project bootstrap.
| Entity | Type | Role | Notes |
|---|---|---|---|
| .github/workflows | Directory | Workflow host | Created if missing |
| autodoc.yml | YAML | Reusable workflow trigger | Uses Drag‑GameStudio/ADG/.github/workflows/reuseble_agd.yml@main |
| autodocconfig.yml | YAML | Runtime configuration | Populated with project name, language, ignore patterns, build and structure settings |
| $currentFolderName | Variable | Project identifier | Derived from the current working directory (basename "$PWD") |
| GROCK_API_KEY | Secret | LLM API key | Referenced via ${{ secrets.GROCK_API_KEY }} in the workflow |
Core Flow
- Directory Preparation – mkdir -p .github/workflows guarantees the workflow folder exists.
- Workflow Generation – cat <<EOF > .github/workflows/autodoc.yml writes a minimal reusable workflow that triggers on workflow_dispatch; the workflow file contains a placeholder for the GROCK_API_KEY secret.
- Configuration Generation – cat <<EOF > autodocconfig.yml creates a YAML file containing:
  - project_name – the folder name.
  - language – default "en".
  - ignore_files – a comprehensive list of patterns to skip during scanning.
  - build_settings – flags for log persistence and level.
  - structure_settings – toggles for introductory text, links, ordering, global file usage, and maximum document part size.
- User Feedback – echo "✅ Done! .github/workflows/autodoc.yml has been created." and echo "✅ Done! autodocconfig.yml has been created."
Interaction with the Rest of the Pipeline
| Component | Interaction | Notes |
|---|---|---|
| Auto Runner | Reads autodocconfig.yml during execution | Provides Config for module orchestration |
| GitHub Actions | autodoc.yml triggers the CI pipeline | Delegates to the reusable ADG workflow |
| Manager | Consumes config.project_name | Influences output directory and logging |
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| autodoc.yml | File | CI trigger | Contains permissions, uses, and secrets sections |
| autodocconfig.yml | File | Runtime config | Parsed by the Config Reader to produce a Config object |
| project_name | str | Identifier | Injected into the config; used by DocFactory |
| language | str | Output language | "en" by default |
| ignore_files | list[str] | Skipped paths | Applied by BaseLogger/Manager when scanning the repo |
| build_settings | dict | Build flags | save_logs, log_level |
| structure_settings | dict | Output layout flags | include_intro_*, include_order, use_global_file, max_doc_part_size |
The script uses Bash here‑documents (cat <<EOF … EOF); because the delimiter is unquoted, $currentFolderName is interpolated into the generated YAML (a quoted delimiter, <<'EOF', would suppress interpolation).
The installation procedure uses two remote scripts – one for Windows PowerShell and one for Unix-like systems – accessed through short pipelines. For PowerShell you execute a command that downloads the script and runs it in one step. For Linux and macOS, a similar one-liner pulls the shell script and pipes it to the shell interpreter.
In a Continuous‑Integration environment, you also need to provide an API key for the external service "Grock." This is done by creating a repository secret called GROCK_API_KEY in GitHub Actions and assigning it the key obtained from the Grock documentation site. The secret is then referenced in workflow files so that the installation scripts can authenticate automatically.
Run File Component
Description
The gen_doc function is the entry point for generating documentation using the Auto Doc Generator.
Functional Flow
The gen_doc function takes the project path, Config, BaseModule list, and StructureSettings instance as input and performs the following steps:
- Creates a GPTModel instance for code‑to‑text generation and an Embedding instance for vector embeddings.
- Creates a Manager instance with the provided configuration and models.
- Checks whether the documentation needs to be regenerated using the check_git_status function.
- If regeneration is required, generates code files, global information, and document parts using the Manager instance.
- Applies custom modules and additional modules (e.g., intro text and links) to the document.
- Creates an embedding layer, clears the cache, and saves the generated document.
- Returns the generated document as a string.
Code Structure
The gen_doc function is defined in the autodocgenerator/auto_runner/run_file.py file and relies on the following external dependencies:
- Manager instance for accessing configuration and repository information
- GPTModel and Embedding for code‑to‑text generation and vector embeddings
- Config and StructureSettings for representing configuration settings
- BaseModule for creating custom module instances
Data Contract
The following table summarizes the inputs, outputs, and parameters of the gen_doc function:
| Entity | Type | Role | Notes |
|---|---|---|---|
| project_path | string | input | project directory path |
| config | Config | input | project configuration |
| custom_modules | list[BaseModule] | input | custom module instances |
| structure_settings | StructureSettings | input | structure settings instance |
| output_doc | string | output | generated document |
| manager | Manager | parameter | manager instance |
| gpt_model | GPTModel | parameter | GPT model instance |
| embedding_model | Embedding | parameter | embedding model instance |
Technical Logic Flow
The gen_doc function follows a linear logic flow, with each step building upon the previous one:
- Create models and manager instance
- Check if regeneration is required
- Generate code files, global information, and document parts
- Apply custom modules and additional modules
- Create embedding layer, clear cache, and save document
- Return generated document
Note that, apart from the early exit when regeneration is not required, the flow is linear, with no further branching that alters it.
GPT Model Component
Description
The GPTModel class is a key component of the Auto Doc Generator, responsible for generating human-like text based on a given prompt. It leverages the Groq API and various pre-trained models to produce high-quality text.
Functional Flow
The GPTModel class follows these steps to generate text:
- Initialize the model with an API key, history, and a list of available models.
- Set up the Groq client and logger.
- When generate_answer is called, check whether the history or the prompt should be used.
- Attempt to generate text using the current model. If it fails, try the next model in the list.
- If all models fail, raise a ModelExhaustedException.
- Log the generated answer and return it as a string.
Code Structure
The GPTModel class is defined in the autodocgenerator/engine/models/gpt_model.py file and relies on the following external dependencies:
- Groq and AsyncGroq for interacting with the Groq API
- Model and AsyncModel for inheritance and shared functionality
- History for storing and retrieving conversation history
- BaseLogger and log classes for logging events and errors
Data Contract
The following table summarizes the inputs, outputs, and parameters of the GPTModel class:
| Entity | Type | Role | Notes |
|---|---|---|---|
| api_key | string | input | Groq API key |
| history | History | input | conversation history |
| models_list | list[string] | input | list of available models |
| use_random | bool | input | whether to use a random model |
| prompt | list[dict[string, string]] | input | optional prompt to use instead of history |
| result | string | output | generated text |
| model_name | string | parameter | current model being used |
| client | Groq | parameter | Groq client instance |
| logger | BaseLogger | parameter | logger instance |
Technical Logic Flow
The GPTModel class follows a linear logic flow, with each step building upon the previous one:
- Initialize the model and its dependencies.
- Set up the Groq client and logger.
- When generate_answer is called, determine which input to use (history or prompt).
- Attempt to generate text using the current model.
- If the model fails, try the next model in the list.
- If all models fail, raise a ModelExhaustedException.
- Log the generated answer and return it as a string.
The flow is essentially linear, except that generate_answer contains a loop that continues until a model successfully generates text or all models have been exhausted.
Critical Logic Assumption: The GPTModel class assumes that at least one model in the models_list will be available and functional. If all models fail, a ModelExhaustedException is raised, and the generator will not produce any text.
Warning: The GPTModel class uses a random model order from the models_list if use_random is True. This may lead to inconsistent results if the models have different capabilities or biases.
GPTModel Class – Text Generation Engine
Location: autodocgenerator/engine/models/gpt_model.py
Role in Pipeline: Provides synchronous text generation to the rest of the Auto Doc Generator. It sits between the high‑level DocFactory and the Groq API, exposing a simple generate_answer, get_answer_without_history, and get_answer interface.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| api_key | str | input | Groq API key supplied at construction. |
| history | History | input | Conversation log that may be prepended to each request. |
| models_list | list[str] | input | Ordered (or shuffled if use_random=True) list of model identifiers. |
| use_random | bool | input | Whether to randomise the model order on initialization. |
| prompt | list[dict[str,str]] | input | Optional override of the historical prompt. |
| result | str | output | Generated text returned by generate_answer. |
| model_name | str | parameter | Current model being attempted. |
| client | Groq | parameter | Low‑level client used for API calls. |
| logger | BaseLogger | parameter | Logger used for audit and error reporting. |
Functional Responsibility
The GPTModel orchestrates a fail‑over loop across a set of pre‑configured LLMs hosted on Groq. It:
- Initialises a Groq client with the provided API key(s) and stores the model list.
- Selects the current model (potentially at random).
- Builds a prompt that may include the conversation history or a supplied prompt.
- Invokes the Groq API until a model returns a successful response or all options are exhausted.
- Logs the event and returns the raw text.
If the loop completes without success, a ModelExhaustedException is raised, aborting the documentation generation for that segment.
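A minimal sketch of the fail‑over loop. The exact Groq call syntax is not present in the analysed snippet, so the chat.completions.create call below is an assumption based on the public Groq SDK:

```python
from groq import Groq

class ModelExhaustedException(Exception):
    """Raised when every configured model has failed."""

def generate_answer(client: Groq, models_list: list[str],
                    prompt: list[dict[str, str]]) -> str:
    for model_name in models_list:          # try models in (possibly shuffled) order
        try:
            response = client.chat.completions.create(
                model=model_name,
                messages=prompt,
            )
            return response.choices[0].message.content
        except Exception:
            continue                        # this model failed; fall through to the next
    raise ModelExhaustedException("all configured models failed")
```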
Visible Interactions
- With History – appends user and assistant turns when get_answer is used.
- With BaseLogger – records each attempt, success, or failure at the module level.
- With Model / AsyncModel – inherits method signatures, enabling synchronous or asynchronous usage throughout the factory pipeline.
- With the external API – uses Groq (synchronous client) to send messages and receive completions.
Technical Logic Flow
- Constructor
  - Copies models_list.
  - If use_random, shuffles the list.
  - Sets current_model_index and current_key_index to zero.
- generate_answer(with_history=True, prompt=None)
  - If with_history and prompt is None, constructs a request from self.history.
  - Otherwise, uses the supplied prompt.
  - Tries client.generate on self.regen_models_name[self.current_model_index].
  - On exception, increments current_model_index and retries.
  - Continues until a model succeeds or the list is exhausted.
- get_answer_without_history(prompt)
  - Calls generate_answer(with_history=False, prompt=prompt) and returns the result.
- get_answer(prompt)
  - Adds a user turn to history.
  - Calls generate_answer().
  - Adds an assistant turn to history.
  - Returns the answer.
Critical Logic Assumption: At least one model in models_list will be reachable; otherwise a ModelExhaustedException is raised.
Warning: Random selection (use_random=True) introduces variability in model performance and output style, which may affect the consistency of generated documentation.
Missing or Unimplemented Details
- The actual HTTP/gRPC call syntax to Groq is not present in the provided snippet; implementation details are omitted.
- Error handling for specific Groq responses is not shown.
- No unit tests or configuration examples for api_keys are included in the fragment.
api_keysare included in the fragment.
Manager Class Overview
The Manager encapsulates the end‑to‑end workflow that transforms a project’s source code into a coherent markdown document.
It owns the mutable doc_info schema, orchestrates the LLM and embedding pipelines, and writes intermediate artefacts into a hidden cache directory.
Manager Initialization
```python
def __init__(self, project_directory: str,
             config: Config,
             llm_model: Model,
             embedding_model: Embedding,
             progress_bar: BaseProgress = BaseProgress())
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| project_directory | str | Root of the target project | Path on disk |
| config | Config | Runtime configuration | Parsed from autodocconfig.yml |
| llm_model | Model | LLM wrapper | Handles generation requests |
| embedding_model | Embedding | Vector‑generation engine | Used later in create_embedding_layer |
| progress_bar | BaseProgress | UI progress indicator | Default is a dummy base class |
Behaviour
- Instantiates a DocInfoSchema.
- Sets up a file‑based BaseLogger writing to <project>/.auto_doc_cache/report.txt.
- Invokes init_folder_system() to create cache artefacts.
The Manager class is used as the central orchestrator for generating documentation.
It is instantiated with the project root, a configuration object, a language‑model instance, an embedding model, and a progress‑bar helper:
```python
sync_model = GPTModel(GROQ_API_KEYS, use_random=False)
embedding = Embedding(GOOGLE_EMBEDDING_API_KEY)

manager = Manager(
    project_path,
    config=config,
    llm_model=sync_model,
    embedding_model=embedding,
    progress_bar=ConsoleGtiHubProgress(),
)
```
Typical lifecycle of a Manager instance in a script:
```python
# decide whether to regenerate documentation
if not check_git_status(manager):
    exit()

# 1. Produce a representation of the codebase
manager.generate_code_file()

# 2. Create a global summary file if enabled
if structure_settings.use_global_file:
    manager.generate_global_info(compress_power=4)

# 3. Split the documentation into manageable parts
manager.generete_doc_parts(
    max_symbols=structure_settings.max_doc_part_size,
    with_global_file=structure_settings.use_global_file,
)

# 4. Build the documentation using a factory of modules
manager.factory_generate_doc(DocFactory(*custom_modules))

# 5. Optionally order sections
if structure_settings.include_order:
    manager.order_doc()

# 6. Add intro elements
additionals = []
if structure_settings.include_intro_text:
    additionals.append(IntroText())
if structure_settings.include_intro_links:
    additionals.append(IntroLinks())
manager.factory_generate_doc(DocFactory(*additionals, with_splited=False), to_start=True)

# 7. Final touches
manager.create_embedding_layer()
manager.clear_cache()
manager.save()

# 8. Retrieve the assembled document
full_doc = manager.doc_info.doc.get_full_doc()
```
Key methods exposed by Manager (as used in the provided code):
| Method | Purpose |
|---|---|
| generate_code_file() | Scans the repository and generates internal code representations. |
| generate_global_info(compress_power) | Builds a global information file, optionally compressing it. |
| generete_doc_parts(max_symbols, with_global_file) | Splits documentation into parts limited by max_symbols. |
| factory_generate_doc(factory, to_start=False) | Generates documentation sections using a DocFactory. |
| order_doc() | Orders the generated documentation sections. |
| create_embedding_layer() | Builds embeddings for the documentation content. |
| clear_cache() | Clears temporary cache files. |
| save() | Persists all generated artifacts to disk. |
| doc_info.doc.get_full_doc() | Retrieves the full assembled document as a string. |
| read_file_by_file_key(file_key, is_outside) | Reads the content of a cached file. |
| get_file_path(file_key, is_outside) | Returns the absolute path to a file in the project or outside. |
These methods together allow a user to run the entire documentation generation pipeline programmatically, controlling which parts of the output to include and when to trigger regeneration based on git status.
Cache Folder Initialization
```python
def init_folder_system(self, project_directory)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| CACHE_FOLDER_NAME | str | Directory name | .auto_doc_cache |
| FILE_NAMES | dict | Map of artefact keys to filenames | e.g. "logs": "report.txt" |
Creates the cache folder and writes an empty CacheSettings JSON file when it does not yet exist.
File Path Resolution
```python
def get_file_path(self, file_key: str, is_outside: bool = False)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| file_key | str | Artifact identifier | Must exist in FILE_NAMES |
| is_outside | bool | Determines subfolder | True ⇒ project root |
Utility used by all file I/O operations; keeps the cache layout deterministic.
Code‑Mix Generation
```python
def generate_code_file(self)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| cm | CodeMix | Repository‑content builder | Configured with config.ignore_files |
| code_mix | str | Aggregated source | Stored in doc_info.code_mix |
Logic Flow
- Logs a start message.
- Instantiates CodeMix with the project directory and ignore patterns.
- Calls build_repo_content() to read repository files.
- Stores the result and updates the progress bar.
CodeMix Module: Repository Snapshot Generator
Responsibility
CodeMix produces a textual representation of a project's directory tree and inlined file contents, respecting a user‑defined ignore list.
Interaction with the Rest of the Pipeline
The generated string is typically fed into the embedding step where each file or section receives a semantic vector.
Other components may write this snapshot to disk as part of the global doc structure.
Core Methods
| Method | Entity | Type | Role | Notes |
|---|---|---|---|---|
| __init__ | root_dir, ignore_patterns | Path, list[str] | Initializes state | Defaults to the current directory; ignore_patterns are used in should_ignore |
| should_ignore | path | Path | Returns bool | Uses fnmatch against the full path, the basename, and individual parts |
| build_repo_content | — | — | Generates the repo snapshot | Emits logs for ignored items; collects file contents in a formatted string |
should_ignore
- Convert path to a relative path under root_dir.
- Iterate over ignore_patterns.
- Return True if any pattern matches the relative path, the filename, or any component of the path.
- Otherwise, return False.
build_repo_content
- Begin with the header "Repository Structure:".
- Recursively walk root_dir (sorted) and append indented directory/file names.
- Append a separator line.
- Walk again to embed file contents:
  - For each file that is not ignored, append <file path="relative/path">, its decoded text, and a newline separator.
  - Capture any exceptions while reading and append an error message.
- Join all entries with "\n" and return the final string.
Global Ignore List
The module ships with ignore_list, a predefined set of glob patterns targeting compiled artifacts, virtual environments, cache directories, and other non‑source files.
The list is applied during should_ignore.
Global Information Compression
```python
def generate_global_info(self, compress_power: int = 4, max_symbols: int = 10000)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| compress_power | int | Aggressiveness of LLM compression | Default 4 |
| max_symbols | int | Chunk size for split_data | Default 10 k |
Workflow
- Splits doc_info.code_mix into chunks of at most max_symbols characters with split_data.
- Calls compress_to_one() (LLM‑powered) with compress_power.
- Writes the resulting markdown to global_info.md.
The compressed text becomes a contextual backdrop for subsequent section generation.
Custom Introduction Processor
The Custom Introduction Processor is a lightweight post‑processing module that builds the introductory text of a generated documentation file.
It extracts anchor links, generates introductory paragraphs with or without link lists, and produces custom description blocks for individual sections.
Functional Flow
- get_all_html_links – Scans a markdown string for <a name="…"> tags, collects anchor names longer than five characters, prefixes each with #, and returns a list. Logging: uses BaseLogger to trace the extraction count.
- get_links_intro – Accepts the list from get_all_html_links, builds a chat‑style prompt, and forwards it to an LLM (Model interface). The LLM is instructed to create a short introductory paragraph that references the provided anchors.
- get_introdaction – Generates a generic introduction from a global data block (typically the entire repository summary). A system prompt specifies the target language and a base introduction template (BASE_INTRO_CREATE).
- generete_custom_discription – Iterates over a string of split data (each part representing a documentation chunk). For every chunk it constructs a prompt that includes the chunk as context, a system instruction to produce a precise technical description, and a user‑supplied description template (custom_description). The first non‑empty, non‑"no‑info" response is returned; otherwise an empty string is produced.
- generete_custom_discription_without – Builds a single LLM prompt that forces a one‑time <a name="CONTENT_DESCRIPTION"> header, then generates a short hyphenated summary of the supplied text. The function returns the LLM answer verbatim.
Data Contract
| Function | Entity | Type | Role | Notes |
|---|---|---|---|---|
| get_all_html_links | data | str | Markdown source | Returns a list of #anchor strings |
| | links | list[str] | Result | Exposed for further processing |
| get_links_intro | links | list[str] | Anchor list | Passed to LLM |
| | model | Model | LLM wrapper | Must expose get_answer_without_history |
| | language | str | Optional | Defaults to "en" |
| | intro_links | str | Generated intro | Contains links in markdown |
| get_introdaction | global_data | str | Global summary | Text passed to LLM |
| | model | Model | LLM wrapper | Same as above |
| | language | str | Optional | Defaults to "en" |
| | intro | str | Generated paragraph | Returned directly |
| generete_custom_discription | splited_data | str | Document chunks | Iterated over; should be an iterable |
| | model | Model | LLM wrapper | As above |
| | custom_description | str | Prompt target | Text describing what to produce |
| | language | str | Optional | Defaults to "en" |
| | result | str | Description block | Empty if no useful response |
| generete_custom_discription_without | model | Model | LLM wrapper | Same as above |
| | custom_description | str | Prompt target | Full text to summarise |
| | language | str | Optional | Defaults to "en" |
| | result | str | Summary | Contains mandatory <a name="CONTENT_DESCRIPTION"> header |
Interaction with Other Modules
- LLM: All functions delegate to an instance of Model (often GPTModel) via get_answer_without_history.
- Logging: BaseLogger and InfoLog emit debug information.
- Configuration: BASE_INTRODACTION_CREATE_LINKS, BASE_INTRO_CREATE, and BASE_CUSTOM_DISCRIPTIONS are constant templates imported from config.
Factory‑Based Documentation Generation
```python
def factory_generate_doc(self, doc_factory: DocFactory, to_start: bool = False)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| doc_factory | DocFactory | Module registry | Executes generate_doc() on each registered BaseModule |
| to_start | bool | Append strategy | True ⇒ prepend, False ⇒ append |
Procedure
- Gathers language, full_data, code_mix, and global_info into a dict.
- Delegates to doc_factory.generate_doc() with the LLM and progress bar.
- Concatenates the returned DocContent with the existing doc_info.doc.
Document Ordering
```python
def order_doc(self)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| content_orders | list | Current order metadata | Updated by get_order() |
Calls the post‑processor get_order() (LLM‑driven) to compute a new sequence for the document parts and overwrites the internal ordering.
Sorting Module: Semantic Title Reordering
Responsibility
The sorting module provides utilities for extracting HTML anchors from raw documentation text, segmenting the document into title/section chunks, and ordering those titles semantically using a language model.
Interaction with the Rest of the Pipeline
- split_text_by_anchors is invoked by the post‑processor after the CodeMix output has been concatenated into a single markdown string.
- The resulting dictionary is passed to get_order, which contacts the Model instance from autodocgenerator.engine.models.model to produce an ordered list of section identifiers.
- The ordered list is then used by downstream modules to assemble the final doc schema.
Core Functions
| Function | Entity | Type | Role | Notes |
|---|---|---|---|---|
| extract_links_from_start | chunks | list[str] | Extracts the first anchor name from each chunk | Returns a tuple (links, have_to_del_first) |
| split_text_by_anchors | text | str | Splits a markdown document into {anchor: content} | Raises Exception if the anchor count mismatches the content chunks |
| get_order | model, chanks | list[str] | Requests semantic ordering from the LLM | Logs progress via BaseLogger |
extract_links_from_start
- Iterate over the provided chunks.
- Use the regex ^<a name=["']?(.*?)["']?</a> to capture an anchor tag at the beginning of a chunk.
- If an anchor name is longer than five characters, add #<anchor> to links and set is_find.
- If a chunk does not contain a valid anchor, set have_to_del_first = True.
- Return the collected links and the flag.
split_text_by_anchors
- Split text on the lookahead pattern (?=<a name["']?[^"'>\s]{6,200}["']?</a>) to preserve anchors.
- Strip whitespace from each resulting chunk and filter out empties.
- Call extract_links_from_start on the chunk list to collect anchor references and a deletion flag.
- If the first anchor is not at the very start (text.find("<a name") > 10) or a non‑anchor chunk was found, discard the first chunk (result_chanks.pop(0)).
- Validate that the number of anchors matches the number of content chunks; otherwise raise an exception.
- Build a dictionary mapping each anchor (#name) to its corresponding content chunk.
- Return the dictionary.
get_order
- Instantiate a BaseLogger and log the start of ordering.
- Build a single‑element prompt list with a user instruction to semantically sort the provided titles (chanks).
- Call model.get_answer_without_history(prompt) to obtain a comma‑separated string.
- Strip whitespace from each title and log the final list.
- Return the ordered list of titles.
Cache Clearing
```python
def clear_cache(self)
```
If config.pbc.save_logs is False, removes the log file from the cache.
Persisting Final Document
```python
def save(self) -> None
```
Writes the full markdown (doc_info.doc.get_full_doc()) to output_doc.md and serialises the entire doc_info schema to info.json.
External Dependencies
- CodeMix – Repository reader.
- split_data, gen_doc_parts – Synchronous LLM chunking.
- compress_to_one – LLM‑based compression.
- get_order, split_text_by_anchors – Post‑processing utilities.
- Model, Embedding – Abstract LLM and embedding interfaces.
- BaseProgress, BaseLogger – UI and audit logging.
- Config, DocInfoSchema – Configuration and document metadata.
Missing or Incomplete Information
- The internal structure of DocInfoSchema (fields beyond code_mix, global_info, doc) is not shown.
- The implementations of doc_info.doc.get_full_doc() and doc_info.doc.add_parts() are external.
- Error handling around file I/O is generic (except: data = None).
- There is no explicit cleanup of temporary files beyond logs.
- The logic that populates config.pbc and config.language is not presented.
Embedding Layer Creation
```python
def create_embedding_layer(self) -> None
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| embedding_model | Embedding | Vectorizer | Passed to each part via init_embedding() |
Iterates over every part in doc_info.doc.parts, invoking init_embedding to attach an embedding vector.
Vector Embedding Utilities
This module implements lightweight vector operations and a wrapper around the Google Gemini embedding API.
It is used by the post‑processor to attach semantic vectors to document fragments and to order them.
Core Functions
| Function | Entity | Type | Role | Notes |
|---|---|---|---|---|
| bubble_sort_by_dist | arr | list | In‑place sort | Implements bubble sort on a list of (identifier, distance) tuples |
| get_len_btw_vectors | vector1, vector2 | numpy.ndarray | Distance calculator | Returns the Euclidean norm (np.linalg.norm) |
| sort_vectors | root_vector | list[float] | Reference vector | other is a dict[str, list[float]]; returns sorted keys |
| Embedding.get_vector | prompt | str | LLM request | Returns the embedding of the first token vector |
Data Contract
| Function | Entity | Type | Role | Notes |
|---|---|---|---|---|
| bubble_sort_by_dist | arr | list[tuple] | Sorting buffer | Mutates and returns arr |
| get_len_btw_vectors | vector1, vector2 | numpy.ndarray | Distance | Uses np.linalg.norm |
| sort_vectors | root_vector | list[float] | Reference | other keys sorted by distance |
| | result_list | list[str] | Output | Keys in ascending similarity order |
| Embedding.__init__ | api_key | str | Auth | Instantiates genai.Client |
| Embedding.get_vector | prompt | str | Prompt text | Calls client.models.embed_content with a 768‑dimensional config |
| | text_response | EmbedContentResponse | API response | Raises Exception if embeddings are missing |
| | vector | list[float] | Embedding | First dimension of the returned embeddings |
Interaction with External Libraries
- google.genai: Provides Client and types.EmbedContentConfig.
- numpy: Used for distance calculations.
- typing: Supports type hinting of Any.
Error Handling
- Embedding.get_vector raises a generic Exception("promblem with embedding") if text_response.embeddings is None.
- No other explicit error handling; the calling code must manage exceptions.
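A minimal sketch of the wrapper based on the public google‑genai SDK; the model name is an assumption, and the exception message is quoted from the module:

```python
from google import genai
from google.genai import types

class Embedding:
    def __init__(self, api_key: str):
        self.client = genai.Client(api_key=api_key)

    def get_vector(self, prompt: str) -> list[float]:
        response = self.client.models.embed_content(
            model="text-embedding-004",     # assumed model identifier
            contents=prompt,
            config=types.EmbedContentConfig(output_dimensionality=768),
        )
        if response.embeddings is None:
            raise Exception("promblem with embedding")  # message as in the module
        return response.embeddings[0].values            # first returned embedding
```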
Observations & Missing Context
- generete_custom_discription is typed to accept splited_data: str but is iterated as a collection, implying the caller supplies an iterable of strings.
- The constants (BASE_INTRODACTION_CREATE_LINKS, BASE_INTRO_CREATE, BASE_CUSTOM_DISCRIPTIONS) are external; their content is not available.
- The functions rely on a Model interface that exposes get_answer_without_history, but details of its implementation are absent.
- There are no explicit unit tests, and no error handling for malformed HTML anchors beyond a length check.
Compressor Component – Text Aggregation & LLM‑based Compression
The Compressor module provides three public helpers that wrap the LLM’s text‑summarization logic and progressively reduce a collection of raw snippets to a single compressed document.
Compression Workflow
| Step | Function | Purpose | Key Inputs | Side‑Effects |
|---|---|---|---|---|
| 1 | compress |
Sends a single chunk to the LLM and returns the compressed text. | data: raw string; project_settings: ProjectSettings; model: Model; compress_power: int |
LLM interaction (network I/O); no mutating state |
| 2 | compress_and_compare |
Groups a list of chunks into compress_power‑sized blocks, compresses each block, and emits progress updates. | data: list[str]; model, project_settings, compress_power; progress_bar: BaseProgress |
Produces a list of compressed strings of length ⌈len(data)/compress_power⌉ |
| 3 | compress_to_one |
Iteratively feeds the output of compress_and_compare back into itself until a single string remains. |
data: list[str]; same model and settings; compress_power: int |
Final compressed document as a string |
compress(data: str, project_settings: ProjectSettings, model: Model, compress_power) → str
Logic
- Builds a three‑part prompt:
  - A system message with project_settings.prompt.
  - A system message with the base compression hint get_BASE_COMPRESS_TEXT(len(data), compress_power).
  - A user message containing the raw data.
- Calls model.get_answer_without_history(prompt=prompt) to obtain the compressed output.
- Returns the LLM's answer string.

Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| data | str | Raw content chunk | Must fit within the model's token budget |
| project_settings | ProjectSettings | Provides project‑specific context | The prompt property concatenates base text with project metadata |
| model | Model | LLM abstraction | Exposes get_answer_without_history |
| compress_power | int | Influences the prompt text via get_BASE_COMPRESS_TEXT | Typically 4, but can be overridden |
- Error Handling – No explicit try/except; any exception propagates from model.get_answer_without_history.
compress_and_compare(data: list, model: Model, project_settings: ProjectSettings, compress_power: int = 4, progress_bar: BaseProgress = BaseProgress()) → list
- Chunk Allocation – Pre‑allocates a result list sized ceil(len(data)/compress_power).
- Progress Management – Creates a sub‑task in the progress bar with the total item count.
- Batch Compression – Iterates over data; each element is compressed and appended to its bucket (chosen by integer division by compress_power).
- Update & Finish – Calls progress_bar.update_task() per element and removes the sub‑task once done.
- Return – A list of compressed blocks, each potentially containing multiple original snippets concatenated.
| Entity | Type | Role | Notes |
|---|---|---|---|
| data | list[str] | Source snippets | Arbitrary number of items |
| model | Model | LLM | Same as in compress |
| project_settings | ProjectSettings | Context prompt | Used for each compression call |
| compress_power | int | Batch size | Default 4; can be lowered for small collections |
| progress_bar | BaseProgress | UI feedback | Default instance provided |
compress_to_one(data: list, model: Model, project_settings: ProjectSettings, compress_power: int = 4, progress_bar: BaseProgress = BaseProgress()) → str
- Iterative Reduction – While more than one compressed block remains:
  - Adjusts new_compress_power: if len(data) < compress_power + 1, sets it to 2.
  - Calls compress_and_compare with the current power to shrink the list.
  - Increments an iteration counter (unused beyond debugging).
- Return – The sole remaining string after all rounds.
| Entity | Type | Role | Notes |
|---|---|---|---|
| data | list[str] | List of compressed fragments | Expected to be the output of a prior compress_and_compare or a raw data list |
| model, project_settings | As before | LLM context | Passed unchanged through iterations |
| compress_power | int | Primary batch size | Dynamically reduced in final passes |
| progress_bar | BaseProgress | UI | Propagated to each inner call |
- Corner Cases – If data is empty, the function raises an IndexError at data[0]. No guard is present in this fragment.
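A condensed sketch of the reduction loop under the contracts above. The LLM round trip is replaced by simple concatenation so the sketch is runnable; the bucket arithmetic mirrors the description of compress_and_compare:

```python
import math

def compress_and_compare(data: list[str], compress_power: int = 4) -> list[str]:
    buckets = [[] for _ in range(math.ceil(len(data) / compress_power))]
    for i, snippet in enumerate(data):
        # compress() would call the LLM here; joining keeps the sketch runnable.
        buckets[i // compress_power].append(snippet)
    return ["\n".join(bucket) for bucket in buckets]

def compress_to_one(data: list[str], compress_power: int = 4) -> str:
    while len(data) > 1:
        power = 2 if len(data) < compress_power + 1 else compress_power
        data = compress_and_compare(data, power)
    return data[0]  # raises IndexError if data started empty (no guard, as noted)
```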
Data Contract Summary
| Component | Entity | Type | Role | Notes |
|---|---|---|---|---|
| extract_links_from_start | chunks | list[str] | Input | Raw section fragments |
| extract_links_from_start | links | list[str] | Output | Anchor strings prefixed with # |
| extract_links_from_start | have_to_del_first | bool | Flag | Indicates whether a leading non‑anchor chunk must be dropped |
| split_text_by_anchors | text | str | Input | Raw documentation file |
| split_text_by_anchors | result | dict[str, str] | Output | Mapping from anchor to content chunk |
| get_order | model | Model | Input | LLM abstraction |
| get_order | chanks | list[str] | Input | List of section titles (anchors) |
| get_order | new_result | list[str] | Output | Semantically sorted titles |
| CodeMix.build_repo_content | root_dir | Path | Input | Project root |
| CodeMix.build_repo_content | content | str | Output | Textual repo snapshot |
| CodeMix.should_ignore | path | Path | Input | File or directory path |
| CodeMix.should_ignore | result | bool | Output | Determines whether the path is filtered |
Data Contract Summary
| Component | Entity | Type | Role |
|---|---|---|---|
| compress | data | str | Raw snippet |
| compress | project_settings | ProjectSettings | Context |
| compress | model | Model | LLM |
| compress | compress_power | int | Prompt modifier |
| compress_and_compare | data | list[str] | Input chunks |
| compress_and_compare | model | Model | LLM |
| compress_and_compare | project_settings | ProjectSettings | Prompt |
| compress_and_compare | compress_power | int | Batch size |
| compress_and_compare | progress_bar | BaseProgress | Progress UI |
| compress_to_one | data | list[str] | Initial chunks |
| compress_to_one | model | Model | LLM |
| compress_to_one | project_settings | ProjectSettings | Prompt |
| compress_to_one | compress_power | int | Initial batch size |
| compress_to_one | progress_bar | BaseProgress | UI |
Logging & Error Handling
- compress_and_compare and compress_to_one rely on BaseProgress for progress reporting; no logging of individual LLM calls is visible.
- No exception handling is implemented in these functions; errors from model.get_answer_without_history or from the progress bar surface to the caller.
- The module assumes get_BASE_COMPRESS_TEXT returns a properly formatted system message; misuse may cause malformed prompts.
logging.py – Structured Logging Utilities
Purpose
Centralizes log message creation and dispatch for the Auto Doc Generator UI.
Logs are formatted with timestamps and severity prefixes and can be routed to the console or a file.
| Entity | Type | Role | Notes |
|---|---|---|---|
| BaseLog | Abstract class | Base log message container | Holds message and numeric level (0 = default) |
| ErrorLog, WarningLog, InfoLog | Subclasses | Severity‑specific loggers | Override format to prepend [ERROR], [WARNING], [INFO] |
| BaseLoggerTemplate | Interface | Dispatch layer | log writes; global_log applies the level filter |
| FileLoggerTemplate | Concrete | File‑based logger | Appends each formatted message to a file |
| BaseLogger | Singleton | Public API | Stores the active BaseLoggerTemplate and forwards log calls |
The logger’s level comparison uses log_level < 0 to mean “unfiltered” (the default).
Core Flow
- Message Instantiation – ErrorLog("…") creates an instance with the current timestamp via _log_prefix.
- Formatting – Each subclass implements format() to prefix severity and timestamp.
- Dispatch – BaseLogger.log() forwards to the set template's global_log().
- Filtering – BaseLoggerTemplate.global_log() checks whether the message level is allowed before calling log().
- Output – BaseLoggerTemplate.log() (or FileLoggerTemplate.log()) writes to stdout or a file.
Error Handling & Logging
- split_text_by_anchors: Raises a generic Exception if the anchor count does not match the number of content chunks, signalling malformed input.
- CodeMix.build_repo_content: Wraps file reads in a try/except; any exception is appended to the output rather than terminating the process.
- Logging: Uses BaseLogger to emit InfoLog messages during ordering; CodeMix logs ignored paths. No other explicit error handling is present.
progress_base.py – Task Progress Abstractions
Purpose
Provides a pluggable progress interface for both Rich‑based terminal displays and simple console prints, used during long‑running generation stages.
| Entity | Type | Role | Notes |
|---|---|---|---|
| BaseProgress | Abstract | Progress API | Stub methods for sub‑task handling |
| LibProgress | Rich implementation | Rich progress bar | Uses rich.progress.Progress; tracks base and sub‑tasks |
| ConsoleTask | Helper | Simple textual progress | Prints a percentage for each increment |
| ConsoleGtiHubProgress | Console wrapper | GitHub‑style console progress | Holds a default "General Progress" task and an optional sub‑task |
Core Flow
- Task Creation – create_new_subtask(name, total_len) registers a sub‑task; LibProgress adds it to the Rich progress container.
- Progress Update – update_task() advances the current sub‑task; if none exists, it advances the base task.
- Task Removal – remove_subtask() clears the current sub‑task reference.
- Console Fallback – ConsoleGtiHubProgress emits a human‑readable percentage via ConsoleTask.progress(). A sketch of the interface follows.
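A minimal sketch of the console variant (class name spelled as it appears in the library); LibProgress would back the same interface with rich.progress.Progress:

```python
class BaseProgress:
    """No-op progress API; subclasses override for real displays."""
    def create_new_subtask(self, name: str, total_len: int) -> None: ...
    def update_task(self) -> None: ...
    def remove_subtask(self) -> None: ...

class ConsoleTask:
    def __init__(self, name: str, total_len: int):
        self.name, self.total, self.done = name, max(total_len, 1), 0

    def progress(self) -> None:
        self.done += 1
        print(f"{self.name}: {100 * self.done // self.total}%")

class ConsoleGtiHubProgress(BaseProgress):
    def __init__(self):
        self.base = ConsoleTask("General Progress", 100)
        self.subtask = None

    def create_new_subtask(self, name: str, total_len: int) -> None:
        self.subtask = ConsoleTask(name, total_len)

    def update_task(self) -> None:
        (self.subtask or self.base).progress()   # fall back to the base task

    def remove_subtask(self) -> None:
        self.subtask = None
```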
split_data – Adaptive Text Chunker
Purpose
Takes a long string (full_code_mix) and a symbol limit (max_symbols) and produces a list of text fragments that each fit under the limit, with a safety margin of 25 %.
The algorithm first iteratively halves any fragment exceeding 150 % of max_symbols, then assembles the final chunks while ensuring no single part grows beyond 125 % of the target.
This function is used by gen_doc_parts to feed the LLM a manageable prompt size.
```python
def split_data(full_code_mix: str, max_symbols: int) -> list[str]
```
Logic Flow
- Initialize the logger (BaseLogger) and emit "Starting data splitting…".
- Iteratively split: while any element el in splited_by_files has length > 1.5 × max_symbols, split it at its midpoint and insert the second half after the current index.
- Assemble final parts: traverse splited_by_files and append elements to the current part until the cumulative length would exceed 1.25 × max_symbols, then start a new part.
- Log the number of parts and return the list (see the sketch below).
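A runnable sketch of the two stages. The real implementation starts from per‑file pieces (splited_by_files), which are not shown in the fragment; this sketch seeds the loop with the whole input string:

```python
def split_data(full_code_mix: str, max_symbols: int) -> list[str]:
    # Stage 1: halve any fragment longer than 150% of the limit.
    fragments = [full_code_mix]
    i = 0
    while i < len(fragments):
        el = fragments[i]
        if len(el) > 1.5 * max_symbols:
            mid = len(el) // 2
            fragments[i] = el[:mid]
            fragments.insert(i + 1, el[mid:])  # both halves are re-checked
        else:
            i += 1
    # Stage 2: greedily pack fragments, capping each part at 125% of the limit.
    parts, current = [], ""
    for el in fragments:
        if current and len(current) + len(el) > 1.25 * max_symbols:
            parts.append(current)
            current = ""
        current += el
    if current:
        parts.append(current)
    return parts
```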
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| full_code_mix | str | Raw source code | Input to be split |
| max_symbols | int | Size threshold | Determines chunk limits |
| splited_by_files | list[str] | Intermediate list | Derived from full_code_mix (not shown) |
| split_objects | list[str] | Output list | Returned to the caller |
| logger | BaseLogger | Logger | Emits diagnostics |
CacheSettings – Commit Cache
A lightweight Pydantic model holding the SHA of the last processed commit.
```python
class CacheSettings(BaseModel):
    last_commit: str = ""
```
Only a single field is defined; no additional behavior.
Documentation Parts Generation
```python
def generete_doc_parts(self, max_symbols=5_000, with_global_file: bool = False)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| max_symbols | int | Size of each part | Default 5 k |
| with_global_file | bool | Whether to include global info | Overridden to True internally |
Steps
- Reads the cached global file.
- Calls gen_doc_parts() (the synchronous LLM pipeline) to create chunked markdown.
- Writes the output to output_doc.md.
- Splits the text at anchors with split_text_by_anchors and stores each part in doc_info.doc.parts.
write_docs_by_parts – LLM‑Driven Part Generation
Purpose
Transforms a single text chunk (part) into a documentation fragment via a language model.
The function builds a prompt hierarchy, calls the LLM, and strips markdown fences if present.
```python
def write_docs_by_parts(part: str,
                        model: Model,
                        project_settings: ProjectSettings,
                        prev_info: str | None = None,
                        language: str = "en",
                        global_info: str | None = None) -> str
```
Logic Flow
- Initialize the logger and log "Generating documentation for a part…".
- Construct the prompt:
  - System role: language selector (language).
  - System role: global project metadata (project_settings.prompt).
  - System role: base completion template (BASE_PART_COMPLITE_TEXT).
  - Optional system role: global_info.
  - Optional system role: prev_info indicating the last written part.
  - User role: the actual code chunk (part).
- LLM invocation: model.get_answer_without_history(prompt=prompt) → answer.
- Strip leading/trailing markdown fences (triple backticks) if present.
- Return the cleaned answer.
The function does not handle any LLM‑specific errors; any exception propagates to the caller.
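A sketch of the prompt assembly and fence stripping; the system‑message wording and the BASE_PART_COMPLITE_TEXT placeholder are assumptions:

```python
BASE_PART_COMPLITE_TEXT = "Write markdown documentation for the given code."  # assumed

def write_docs_by_parts(part, model, project_settings,
                        prev_info=None, language="en", global_info=None) -> str:
    prompt = [
        {"role": "system", "content": f"Answer in language: {language}"},
        {"role": "system", "content": project_settings.prompt},
        {"role": "system", "content": BASE_PART_COMPLITE_TEXT},
    ]
    if global_info:
        prompt.append({"role": "system", "content": global_info})
    if prev_info:
        prompt.append({"role": "system", "content": f"Last written part: {prev_info}"})
    prompt.append({"role": "user", "content": part})

    answer = model.get_answer_without_history(prompt=prompt).strip()
    if answer.startswith("```") and answer.endswith("```"):
        answer = answer.strip("`").strip()   # drop a surrounding markdown fence
    return answer
```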
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| part | str | Chunk to document | Input to the LLM |
| model | Model | LLM instance | Must expose get_answer_without_history |
| project_settings | ProjectSettings | Context | Provides the prompt property |
| prev_info | str \| None | Tail of the preceding doc | |
| language | str | Target language | Defaults to "en" |
| global_info | str \| None | Global relation notes | |
| answer | str | LLM output | May contain a fenced code block |
| temp_answer | str | Stripped version | Used for fence removal |
| logger | BaseLogger | Logger | Emits generation details |
gen_doc_parts – Multi‑Pass Documentation Builder
Purpose
Orchestrates the full “split‑then‑generate” pipeline: splits the entire source mix, iteratively calls write_docs_by_parts, and concatenates the results.
It also maintains a sliding window of the most recent 3 000 characters to pass as contextual prev_info for the next part.
```python
def gen_doc_parts(full_code_mix,
                  max_symbols,
                  model: Model,
                  project_settings: ProjectSettings,
                  language,
                  progress_bar: BaseProgress,
                  global_info=None) -> str
```
Logic Flow
- Split full_code_mix → splited_data by calling split_data.
- Log the start of part generation.
- Create a sub‑task in the progress bar: "Generete doc parts".
- Iterate over splited_data:
  - Call write_docs_by_parts with the current part, model, project_settings, and the previous result (result) as prev_info.
  - Append the returned string to all_result.
  - Keep only the last 3 000 characters of result for the next iteration.
  - Update the progress bar.
- Finish the sub‑task, log the total documentation length, and return all_result.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| full_code_mix | str | Raw source | Input to be split |
| max_symbols | int | Chunk size | Used in split_data |
| model | Model | LLM | For generation |
| project_settings | ProjectSettings | Config | Supplies prompt |
| language | str | Locale | Passed to write_docs_by_parts |
| progress_bar | BaseProgress | UI | Tracks per‑part progress |
| global_info | str \| None | Project relations | |
| splited_data | list[str] | Chunks | Intermediate |
| all_result | str | Full documentation | Final output |
DocContent – Embedded Text Block
Encapsulates a documentation string and its optional vector embedding.
```python
class DocContent(BaseModel):
    content: str
    embedding_vector: list | None = None

    def init_embedding(self, embedding_model: Embedding):
        self.embedding_vector = embedding_model.get_vector(self.content)
```
| Method | Input | Output | Notes |
|---|---|---|---|
| init_embedding | Embedding | None | Sets embedding_vector in place |
DocHeadSchema – Ordered Section Collection
Maintains an ordered list of section names and a mapping to their DocContent.
```python
class DocHeadSchema(BaseModel):
    content_orders: list[str] = []
    parts: dict[str, DocContent] = {}

    def add_parts(self, name, content: DocContent): ...
    def get_full_doc(self, split_el: str = "\n") -> str: ...
    def __add__(self, other: "DocHeadSchema") -> "DocHeadSchema": ...
```
Key Operations
| Method | Role | Notes |
|---|---|---|
| add_parts | Insert a uniquely named section | Avoids name collisions by appending _{i} |
| get_full_doc | Concatenate all parts in order | Uses the split_el separator |
| __add__ | Merge with another DocHeadSchema | Preserves ordering |
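A sketch of the two central operations, filling in the elided bodies under the collision and ordering rules from the table:

```python
from pydantic import BaseModel

class DocContent(BaseModel):
    content: str
    embedding_vector: list | None = None

class DocHeadSchema(BaseModel):
    content_orders: list[str] = []
    parts: dict[str, DocContent] = {}

    def add_parts(self, name: str, content: DocContent) -> None:
        unique, i = name, 0
        while unique in self.parts:          # avoid collisions with _{i} suffixes
            i += 1
            unique = f"{name}_{i}"
        self.content_orders.append(unique)
        self.parts[unique] = content

    def get_full_doc(self, split_el: str = "\n") -> str:
        return split_el.join(self.parts[n].content for n in self.content_orders)
```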
DocInfoSchema – Full Document Metadata
Root schema aggregating global info, the raw source mix, and the hierarchical sections.
```python
class DocInfoSchema(BaseModel):
    global_info: str = ""
    code_mix: str = ""
    doc: DocHeadSchema = Field(default_factory=DocHeadSchema)
```
| Field | Type | Role | Notes |
|---|---|---|---|
| global_info | str | Project description | Raw text |
| code_mix | str | Raw source | Used by the generator |
| doc | DocHeadSchema | Structured doc | Holds sections |