AutoDoc — Automatic Project Documentation Generator
AutoDoc helps you create documentation for your projects automatically.
Project Goal
AutoDoc automatically produces high‑quality, up‑to‑date documentation for a codebase by combining source‑code analysis, large‑language‑model (LLM) text generation, and semantic embeddings. The tool scans all relevant files, extracts meaningful snippets, compresses and sorts them, and assembles a coherent document that can be written in one or more parts. It also intelligently decides whether regeneration is necessary by comparing the current repository state against a cached Git commit SHA.
Core Logic & Principles
| Layer | Responsibility | Key Components | Interaction Flow |
|---|---|---|---|
| Orchestrator | Holds configuration, models, and mutable documentation state. | Manager – singleton‑like object that stores config, llm_model, embedding_model, a progress UI, and a mutable doc_info (DocSchema). | Every module receives a reference to the Manager so it can read/write documentation sections. |
| Configuration | Reads autodocconfig.yml and exposes runtime settings. | ConfigReader → Config & StructureSettings. | The Doc Factory uses these settings to decide which modules to invoke and how large each part can be (max_doc_part_size, include_intro_*). |
| Source‑Code Pipeline | Pre‑processes repository files into chunks suitable for LLM consumption. | preprocessor → code_mix → compressor → spliter | Manager.generate_code_file orchestrates this chain, producing a list of code‑to‑text candidates. |
| LLM & Embedding | Transforms code snippets into natural‑language explanations and generates vector embeddings for semantic ordering. | LLM wrapper (GPTModel → sync_model), Embedding (embedding_model). | run_file.gen_doc calls the LLM to produce text and the embedding model to produce vectors used later by the sorting logic. |
| Semantic Ordering & Compression | Sorts the generated text by meaning, compresses large sections, and splits the final output into manageable parts. | Sorting, Compressor, split_text_by_anchors | After the LLM produces raw output, sort_vectors, compress, and split_text_by_anchors produce a clean, well‑structured document. |
| Persistence & Incremental Regeneration | Writes the final documentation and records the current Git commit for future change detection. | check_git_status, should_change?, CacheSettings.last_commit | The Manager checks whether the repository state has changed before running the full pipeline. |
| Factory Pattern | Enables plug‑in of new documentation modules. | Doc Factory, BaseModule subclasses | Modules are dynamically aggregated and executed against the Manager, making the system highly extensible. |
The entire workflow can be started from the command line (Auto Runner) which instantiates the models, loads the configuration, and runs the full generation pipeline.
Key Features
- Automated Doc Generation – Scans all project files, ignoring patterns defined in the config, and produces natural‑language documentation.
- LLM‑Driven Text Generation – Uses a GPT‑based wrapper to convert code snippets into explanatory text.
- Semantic Embedding & Ordering – Generates vector embeddings for each section, sorts them by meaning, and splits long texts into logical anchors.
- Compression – Compresses verbose outputs with a dedicated model to keep sections within user‑defined size limits (max_doc_part_size).
- Incremental Builds – Checks the current Git SHA against a cached value to decide whether regeneration is needed, saving time on unchanged projects.
- Pluggable Module System – Employs a factory pattern (Doc Factory) to allow developers to add custom processing modules without modifying core logic.
- Configurable Pipeline – autodocconfig.yml controls language, ignore patterns, project name, thresholds, logging levels, and toggles such as use_global_file or include_intro_*.
- Cross‑Library Integration – Built on top of google.genai for embeddings, numpy for vector operations, and standard libraries (re, fnmatch, yaml) for parsing and text manipulation.
- Progress Feedback – Integrated progress bar to keep users informed during long runs.
Dependencies
| Category | Library / Tool | Purpose |
|---|---|---|
| Configuration | yaml | Parse autodocconfig.yml. |
| Git Integration | gitpython or subprocess calls | Retrieve current commit SHA. |
| LLM & Embeddings | google.genai | Generate embeddings and LLM text. |
| Numerics & Vector Ops | numpy | Handle vector calculations (length, sorting). |
| Text Processing | re, fnmatch | Pattern matching, ignore list handling. |
| CLI & UX | argparse, tqdm | Command line interface & progress bar. |
| Utilities | os, pathlib, logging | File system operations, logging. |
| Project Settings | Custom modules (projectsettings.py) | Manage project‑level settings. |
| Packaging | setuptools / pip | Install the AutoDoc package. |
All dependencies are pure‑Python and available via pip. The project can be installed and run on any environment that has Python 3.10+ and the necessary credentials for the Google GenAI API.
Executive Navigation Tree
📂 Git & Deployment
- #check-git-status-component
- #install_script
- #install_sh
- #install-workflow-with-remote-scripts-and-github-secret
- #run-file-component
- #folder-system
- #file-paths
📂 Configuration
📂 Core Architecture
- #gpt-model-component
- #gptmodel-class
- #inputs-outputs
- #responsibility
- #interaction
- #logic-flow
- #manager-class
- #manager-class-usage
- #code-mix
- #codemix-module
- #global-info
- #custom-intro
- #factory
- #ordering
- #sorting-module
- #cleanup
📂 Data Processing
- #embedding
- #embedding-layer
- #compressor-component
- #compressor-flow
- #compress-function
- #compress-and-compare-function
- #compress-to-one-function
- #data-contracts
- #data-contract-summary
- #split_data
📂 Logging & Error Handling
📂 Documentation Generation
- #doc-parts
- #write_docs_by_parts
- #gen_doc_parts
- #doc_content
- #doc_head_schema
- #doc_info_schema
Check Git Status Component
Description
The check_git_status function is a critical component in the Auto Doc Generator pipeline, responsible for determining whether the documentation needs to be regenerated based on Git repository changes.
Functional Flow
The function takes a Manager instance as input and performs the following steps:
- Checks whether the current GitHub event is a workflow dispatch. If so, it returns True, indicating that the documentation should be regenerated.
- Retrieves the last commit hash from the .auto_doc_cache_file.json file.
- Uses the get_diff_by_hash function to compare the current repository state with the last committed state, excluding Markdown files.
- If the difference exceeds the threshold defined in the Manager instance's configuration, or if the last commit hash is empty, it updates the last commit hash and returns True.
- Otherwise, it returns False, indicating that the documentation does not need to be regenerated.
Code Structure
The check_git_status function is defined in the autodocgenerator/auto_runner/check_git_status.py file and relies on the following external dependencies:
- subprocess for executing Git commands
- Manager instance for accessing configuration and repository information
- CacheSettings for loading and updating the last commit hash
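A minimal sketch of this change‑detection logic, assuming a CacheSettings object with a last_commit field and a numeric threshold; get_diff_size_by_hash is an illustrative stand‑in for the module's get_diff_by_hash:

```python
import subprocess

def get_diff_size_by_hash(last_commit: str) -> int:
    # Summarise changed lines between the cached commit and HEAD, skipping Markdown.
    stat = subprocess.run(
        ["git", "diff", "--shortstat", last_commit, "HEAD", "--", ".", ":(exclude)*.md"],
        capture_output=True, text=True, check=True,
    ).stdout
    numbers = [int(tok) for tok in stat.split() if tok.isdigit()]
    return sum(numbers[1:]) if numbers else 0  # insertions + deletions

def check_git_status(cache, threshold: int) -> bool:
    head = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if not cache.last_commit or get_diff_size_by_hash(cache.last_commit) > threshold:
        cache.last_commit = head  # record the new baseline for the next run
        return True               # regeneration required
    return False
```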
Config Reader Component
Description
The config_reader module is responsible for parsing the autodocconfig.yml file and extracting relevant configuration settings for the Auto Doc Generator.
Functional Flow
The read_config function takes the contents of the autodocconfig.yml file as input and performs the following steps:
- Loads the configuration data using the yaml library.
- Extracts the project name, language, ignore files, and build settings from the configuration data.
- Creates a Config instance and populates it with the extracted settings.
- Extracts custom module descriptions and creates a list of BaseModule instances.
- Extracts structure settings and creates a StructureSettings instance.
- Returns the Config, the BaseModule list, and the StructureSettings instance.
Code Structure
The config_reader module is defined in the autodocgenerator/auto_runner/config_reader.py file and relies on the following external dependencies:
- yaml for loading configuration data
- Config and StructureSettings for representing configuration settings
- BaseModule for creating custom module instances
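A minimal sketch of this parsing flow, assuming a simplified Config container; the real Config, BaseModule, and StructureSettings classes live in the package and carry more fields:

```python
from dataclasses import dataclass, field
import yaml

@dataclass
class Config:  # simplified stand-in for autodocgenerator's Config
    project_name: str
    language: str = "en"
    ignore_files: list = field(default_factory=list)
    build_settings: dict = field(default_factory=dict)

def read_config(raw_yaml: str):
    data = yaml.safe_load(raw_yaml)
    config = Config(
        project_name=data["project_name"],
        language=data.get("language", "en"),
        ignore_files=data.get("ignore_files", []),
        build_settings=data.get("build_settings", {}),
    )
    # In the real module, custom_descriptions become BaseModule instances and
    # structure_settings becomes a StructureSettings instance.
    return config, data.get("custom_descriptions", []), data.get("structure_settings", {})
```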
ProjectSettings – Metadata Container
- Construction – Instantiated with a mandatory project_name.
- Dynamic Properties – The info dictionary holds arbitrary key/value pairs that are appended to the prompt.
- prompt Property – Concatenates:
  - The global BASE_SETTINGS_PROMPT constant.
  - A Project Name line.
  - One line per key/value in info.

| Method | Input | Output | Notes |
|---|---|---|---|
| add_info(key, value) | str, Any | None | Mutates the info dictionary |
| prompt | None | str | Generated on each access |
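A minimal sketch consistent with the table above; the BASE_SETTINGS_PROMPT text here is an assumed placeholder:

```python
BASE_SETTINGS_PROMPT = "You are documenting the following project."  # assumed content

class ProjectSettings:
    def __init__(self, project_name: str):
        self.project_name = project_name
        self.info: dict = {}

    def add_info(self, key: str, value) -> None:
        self.info[key] = value  # mutates the info dictionary in place

    @property
    def prompt(self) -> str:  # regenerated on each access
        lines = [BASE_SETTINGS_PROMPT, f"Project Name: {self.project_name}"]
        lines += [f"{key}: {value}" for key, value in self.info.items()]
        return "\n".join(lines)
```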
The file is a YAML document that defines several top‑level sections used by the documentation generator.
- project_name – Title of the project.
- language – Language code for generated documentation.
- ignore_files – List of patterns (glob style) for files and directories that the tool should skip. Typical entries include compiled artefacts, virtualenv folders, IDE folders, databases, logs, and markdown files.
- build_settings – Controls the build process.
  - save_logs: Boolean to keep log files.
  - log_level: Integer indicating verbosity.
  - threshold_changes: Numeric limit for what constitutes a significant change.
- structure_settings – Determines how the output is organized.
  - include_intro_links: Add links at the beginning.
  - include_intro_text: Add introductory text.
  - include_order: Preserve ordering of elements.
  - use_global_file: Reference a single global documentation file.
  - max_doc_part_size: Size cap for individual documentation blocks.
- project_additional_info – Custom free‑form fields; here a key global idea contains a short project description.
- custom_descriptions – List of descriptive strings that can be injected into generated content, often containing instructions or explanations about installation scripts, environment secrets, or how to use certain classes or modules.
install.ps1 – CI Setup & Project Configuration
Purpose
Automates creation of the GitHub Actions workflow file and a default autodocconfig.yml for a newly cloned project.
| Entity | Type | Role | Notes |
|---|---|---|---|
| $currentFolderName | Variable | Project name | Derived from the current folder, inserted into the config |
| .github/workflows/autodoc.yml | Workflow file | Reusable workflow trigger | Points to the Drag‑GameStudio/ADG reusable workflow |
| autodocconfig.yml | YAML | Runtime configuration | Contains ignore patterns, build and structure settings |
Core Flow
- Directory Creation – New-Item -ItemType Directory ensures .github/workflows/ exists.
- Workflow Generation – $content holds the workflow YAML; written via Out-File.
- Configuration Generation – $configContent is a multi‑line PowerShell string that populates autodocconfig.yml.
- Feedback – Write-Host signals completion.
The script uses PowerShell here‑strings (literal @'…'@ and expandable @"…"@); the literal form avoids variable interpolation inside the generated YAML.
Interaction with the Rest of the Pipeline
| Component | Interaction | Notes |
|---|---|---|
| BaseLogger | Used by DocFactory, Manager, and other modules to emit status messages. | Singleton ensures a single logger instance. |
| BaseProgress subclasses | Employed by Manager.generate_doc_parts and DocFactory to report progress. | LibProgress is optional if Rich is available. |
| install.ps1 | Runs during project bootstrap to provide CI hooks and configuration. | Generates files before the Auto Runner is invoked. |
Data Contract Summary
| Entity | Type | Role | Notes |
|---|---|---|---|
| log.message | str | Log text | Human‑readable |
| log.level | int | Severity threshold | Lower = less filtering |
| log.file_path | str | Target file for FileLoggerTemplate | Appended, UTF‑8 |
| progress.total_len | int | Sub‑task length | Updated per increment |
| config.project_name | str | Project identifier | Injected into the config file |

The provided code contains no external library calls beyond the standard library and Rich; therefore, no external behavior assumptions are made.
install.sh – CI Setup & Runtime Configuration
Purpose
Automates the creation of a GitHub Actions workflow (autodoc.yml) and a default autodocconfig.yml for a freshly cloned Auto‑Doc project. The script is executed during project bootstrap.
| Entity | Type | Role | Notes |
|---|---|---|---|
| .github/workflows | Directory | Workflow host | Created if missing |
| autodoc.yml | YAML | Reusable workflow trigger | Uses Drag‑GameStudio/ADG/.github/workflows/reuseble_agd.yml@main |
| autodocconfig.yml | YAML | Runtime configuration | Populated with project name, language, ignore patterns, build and structure settings |
| $currentFolderName | Variable | Project identifier | Derived from the current working directory (basename "$PWD") |
| GROCK_API_KEY | Secret | LLM API key | Referenced via ${{ secrets.GROCK_API_KEY }} in the workflow |
Core Flow
- Directory Preparation – mkdir -p .github/workflows guarantees the workflow folder exists.
- Workflow Generation – cat <<EOF > .github/workflows/autodoc.yml writes a minimal reusable workflow that triggers on workflow_dispatch; the workflow file contains a placeholder for the GROCK_API_KEY secret.
- Configuration Generation – cat <<EOF > autodocconfig.yml creates a YAML file containing:
  - project_name – the folder name.
  - language – default "en".
  - ignore_files – a comprehensive list of patterns to skip during scanning.
  - build_settings – flags for log persistence and level.
  - structure_settings – toggles for introductory text, links, ordering, global file usage, and maximum document part size.
- User Feedback – echo "✅ Done! .github/workflows/autodoc.yml has been created." and echo "✅ Done! autodocconfig.yml has been created."
Interaction with the Rest of the Pipeline
| Component | Interaction | Notes |
|---|---|---|
| Auto Runner | Reads autodocconfig.yml during execution | Provides Config for module orchestration |
| GitHub Actions | autodoc.yml triggers the CI pipeline | Delegates to the reusable ADG workflow |
| Manager | Consumes config.project_name | Influences output directory and logging |
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| autodoc.yml | File | CI trigger | Contains permissions, uses, and secrets sections |
| autodocconfig.yml | File | Runtime config | Parsed by the Config Reader to produce a Config object |
| project_name | str | Identifier | Injected into the config; used by DocFactory |
| language | str | Output language | "en" by default |
| ignore_files | list[str] | Skipped paths | Applied by BaseLogger/Manager when scanning the repo |
| build_settings | dict | Build flags | save_logs, log_level |
| structure_settings | dict | Output layout flags | include_intro_*, include_order, use_global_file, max_doc_part_size |
The script uses Bash here‑documents (cat <<EOF … EOF); because the delimiter is unquoted, $currentFolderName is interpolated into the generated YAML (a quoted delimiter, <<'EOF', would suppress interpolation).
The installation procedure uses two remote scripts – one for Windows PowerShell and one for Unix-like systems – accessed through short pipelines. For PowerShell you execute a command that downloads the script and runs it in one step. For Linux and macOS, a similar one-liner pulls the shell script and pipes it to the shell interpreter.
In a Continuous‑Integration environment, you also need to provide an API key for the external service "Grock." This is done by creating a repository secret called GROCK_API_KEY in GitHub Actions and assigning it the key obtained from the Grock documentation site. The secret is then referenced in workflow files so that the installation scripts can authenticate automatically.
Run File Component
Description
The gen_doc function is the entry point for generating documentation using the Auto Doc Generator.
Functional Flow
The gen_doc function takes the project path, Config, BaseModule list, and StructureSettings instance as input and performs the following steps:
- Creates a GPTModel instance for code‑to‑text generation and an Embedding instance for vector embeddings.
- Creates a Manager instance with the provided configuration and models.
- Checks whether the documentation needs to be regenerated using the check_git_status function.
- If regeneration is required, generates code files, global information, and document parts using the Manager instance.
- Applies custom modules and additional modules (e.g., intro text and links) to the document.
- Creates an embedding layer, clears the cache, and saves the generated document.
- Returns the generated document as a string.
Code Structure
The gen_doc function is defined in the autodocgenerator/auto_runner/run_file.py file and relies on the following external dependencies:
- Manager instance for accessing configuration and repository information
- GPTModel and Embedding for code‑to‑text generation and vector embeddings
- Config and StructureSettings for representing configuration settings
- BaseModule for creating custom module instances
Data Contract
The following table summarizes the inputs, outputs, and parameters of the gen_doc function:
| Entity | Type | Role | Notes |
|---|---|---|---|
| project_path | string | input | project directory path |
| config | Config | input | project configuration |
| custom_modules | list[BaseModule] | input | custom module instances |
| structure_settings | StructureSettings | input | structure settings instance |
| output_doc | string | output | generated document |
| manager | Manager | parameter | manager instance |
| gpt_model | GPTModel | parameter | GPT model instance |
| embedding_model | Embedding | parameter | embedding model instance |
Technical Logic Flow
The gen_doc function follows a linear logic flow, with each step building upon the previous one:
- Create models and manager instance
- Check if regeneration is required
- Generate code files, global information, and document parts
- Apply custom modules and additional modules
- Create embedding layer, clear cache, and save document
- Return generated document
Note that, apart from the early exit when regeneration is not required, the flow is linear, with no further branching that alters it.
GPT Model Component
Description
The GPTModel class is a key component of the Auto Doc Generator, responsible for generating human-like text based on a given prompt. It leverages the Groq API and various pre-trained models to produce high-quality text.
Functional Flow
The GPTModel class follows these steps to generate text:
- Initialize the model with an API key, history, and a list of available models.
- Set up the Groq client and logger.
- When generate_answer is called, check whether the history or the prompt should be used.
- Attempt to generate text using the current model. If it fails, try the next model in the list.
- If all models fail, raise a ModelExhaustedException.
- Log the generated answer and return it as a string.
Code Structure
The GPTModel class is defined in the autodocgenerator/engine/models/gpt_model.py file and relies on the following external dependencies:
- Groq and AsyncGroq for interacting with the Groq API
- Model and AsyncModel for inheritance and shared functionality
- History for storing and retrieving conversation history
- BaseLogger and log classes for logging events and errors
Data Contract
The following table summarizes the inputs, outputs, and parameters of the GPTModel class:
| Entity | Type | Role | Notes |
|---|---|---|---|
| api_key | string | input | Groq API key |
| history | History | input | conversation history |
| models_list | list[string] | input | list of available models |
| use_random | bool | input | whether to use a random model |
| prompt | list[dict[string, string]] | input | optional prompt to use instead of history |
| result | string | output | generated text |
| model_name | string | parameter | current model being used |
| client | Groq | parameter | Groq client instance |
| logger | BaseLogger | parameter | logger instance |
Technical Logic Flow
The GPTModel class follows a linear logic flow, with each step building upon the previous one:
- Initialize the model and its dependencies.
- Set up the Groq client and logger.
- When generate_answer is called, determine which input to use (history or prompt).
- Attempt to generate text using the current model.
- If the model fails, try the next model in the list.
- If all models fail, raise a ModelExhaustedException.
- Log the generated answer and return it as a string.
The flow is essentially linear, except that generate_answer contains a loop that continues until a model successfully generates text or all models have been exhausted.
Critical Logic Assumption: The GPTModel class assumes that at least one model in the models_list will be available and functional. If all models fail, a ModelExhaustedException is raised, and the generator will not produce any text.
Warning: The GPTModel class uses a random model order from the models_list if use_random is True. This may lead to inconsistent results if the models have different capabilities or biases.
GPTModel Class – Text Generation Engine
Location: autodocgenerator/engine/models/gpt_model.py
Role in Pipeline: Provides synchronous text generation to the rest of the Auto Doc Generator. It sits between the high‑level DocFactory and the Groq API, exposing a simple generate_answer, get_answer_without_history, and get_answer interface.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| api_key | str | input | Groq API key supplied at construction. |
| history | History | input | Conversation log that may be prepended to each request. |
| models_list | list[str] | input | Ordered (or shuffled if use_random=True) list of model identifiers. |
| use_random | bool | input | Whether to randomise the model order on initialization. |
| prompt | list[dict[str,str]] | input | Optional override of the historical prompt. |
| result | str | output | Generated text returned by generate_answer. |
| model_name | str | parameter | Current model being attempted. |
| client | Groq | parameter | Low‑level client used for API calls. |
| logger | BaseLogger | parameter | Logger used for audit and error reporting. |
Functional Responsibility
The GPTModel orchestrates a fail‑over loop across a set of pre‑configured LLMs hosted on Groq. It:
- Initialises a Groq client with the provided API key(s) and stores the model list.
- Selects the current model (potentially at random).
- Builds a prompt that may include the conversation history or a supplied prompt.
- Invokes the Groq API until a model returns a successful response or all options are exhausted.
- Logs the event and returns the raw text.
If the loop completes without success, a ModelExhaustedException is raised, aborting the documentation generation for that segment.
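A minimal sketch of the fail‑over loop. The exact Groq call syntax is not present in the analysed snippet, so the chat.completions.create call below is an assumption based on the public Groq SDK:

```python
from groq import Groq

class ModelExhaustedException(Exception):
    """Raised when every configured model has failed."""

def generate_answer(client: Groq, models_list: list[str],
                    prompt: list[dict[str, str]]) -> str:
    for model_name in models_list:          # try models in (possibly shuffled) order
        try:
            response = client.chat.completions.create(
                model=model_name,
                messages=prompt,
            )
            return response.choices[0].message.content
        except Exception:
            continue                        # this model failed; fall through to the next
    raise ModelExhaustedException("all configured models failed")
```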
Visible Interactions
- With History – appends user and assistant turns when get_answer is used.
- With BaseLogger – records each attempt, success, or failure at the module level.
- With Model / AsyncModel – inherits method signatures, enabling synchronous or asynchronous usage throughout the factory pipeline.
- With the external API – uses Groq (synchronous client) to send messages and receive completions.
Technical Logic Flow
- Constructor
  - Copies models_list.
  - If use_random, shuffles the list.
  - Sets current_model_index and current_key_index to zero.
- generate_answer(with_history=True, prompt=None)
  - If with_history and prompt is None, constructs a request from self.history.
  - Otherwise, uses the supplied prompt.
  - Tries client.generate on self.regen_models_name[self.current_model_index].
  - On exception, increments current_model_index and retries.
  - Continues until a model succeeds or the list is exhausted.
- get_answer_without_history(prompt)
  - Calls generate_answer(with_history=False, prompt=prompt) and returns the result.
- get_answer(prompt)
  - Adds a user turn to history.
  - Calls generate_answer().
  - Adds an assistant turn to history.
  - Returns the answer.
Critical Logic Assumption: At least one model in models_list will be reachable; otherwise a ModelExhaustedException is raised.
Warning: Random selection (use_random=True) introduces variability in model performance and output style, which may affect the consistency of generated documentation.
Missing or Unimplemented Details
- The actual HTTP/gRPC call syntax to Groq is not present in the provided snippet; implementation details are omitted.
- Error handling for specific Groq responses is not shown.
- No unit tests or configuration examples for api_keys are included in the fragment.
api_keysare included in the fragment.
Manager Class Overview
The Manager encapsulates the end‑to‑end workflow that transforms a project’s source code into a coherent markdown document.
It owns the mutable doc_info schema, orchestrates the LLM and embedding pipelines, and writes intermediate artefacts into a hidden cache directory.
Manager Initialization
```python
def __init__(self, project_directory: str,
             config: Config,
             llm_model: Model,
             embedding_model: Embedding,
             progress_bar: BaseProgress = BaseProgress())
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| project_directory | str | Root of the target project | Path on disk |
| config | Config | Runtime configuration | Parsed from autodocconfig.yml |
| llm_model | Model | LLM wrapper | Handles generation requests |
| embedding_model | Embedding | Vector‑generation engine | Used later in create_embedding_layer |
| progress_bar | BaseProgress | UI progress indicator | Default is a dummy base class |
Behaviour
- Instantiates a DocInfoSchema.
- Sets up a file‑based BaseLogger writing to <project>/.auto_doc_cache/report.txt.
- Invokes init_folder_system() to create cache artefacts.
The Manager class is used as the central orchestrator for generating documentation.
It is instantiated with the project root, a configuration object, a language‑model instance, an embedding model, and a progress‑bar helper:
```python
sync_model = GPTModel(GROQ_API_KEYS, use_random=False)
embedding = Embedding(GOOGLE_EMBEDDING_API_KEY)

manager = Manager(
    project_path,
    config=config,
    llm_model=sync_model,
    embedding_model=embedding,
    progress_bar=ConsoleGtiHubProgress(),
)
```
Typical lifecycle of a Manager instance in a script:
```python
# decide whether to regenerate documentation
if not check_git_status(manager):
    exit()

# 1. Produce a representation of the codebase
manager.generate_code_file()

# 2. Create a global summary file if enabled
if structure_settings.use_global_file:
    manager.generate_global_info(compress_power=4)

# 3. Split the documentation into manageable parts
manager.generete_doc_parts(
    max_symbols=structure_settings.max_doc_part_size,
    with_global_file=structure_settings.use_global_file,
)

# 4. Build the documentation using a factory of modules
manager.factory_generate_doc(DocFactory(*custom_modules))

# 5. Optionally order sections
if structure_settings.include_order:
    manager.order_doc()

# 6. Add intro elements
additionals = []
if structure_settings.include_intro_text:
    additionals.append(IntroText())
if structure_settings.include_intro_links:
    additionals.append(IntroLinks())
manager.factory_generate_doc(DocFactory(*additionals, with_splited=False), to_start=True)

# 7. Final touches
manager.create_embedding_layer()
manager.clear_cache()
manager.save()

# 8. Retrieve the assembled document
full_doc = manager.doc_info.doc.get_full_doc()
```
Key methods exposed by Manager (as used in the provided code):
| Method | Purpose |
|---|---|
| generate_code_file() | Scans the repository and generates internal code representations. |
| generate_global_info(compress_power) | Builds a global information file, optionally compressing it. |
| generete_doc_parts(max_symbols, with_global_file) | Splits documentation into parts limited by max_symbols. |
| factory_generate_doc(factory, to_start=False) | Generates documentation sections using a DocFactory. |
| order_doc() | Orders the generated documentation sections. |
| create_embedding_layer() | Builds embeddings for the documentation content. |
| clear_cache() | Clears temporary cache files. |
| save() | Persists all generated artifacts to disk. |
| doc_info.doc.get_full_doc() | Retrieves the full assembled document as a string. |
| read_file_by_file_key(file_key, is_outside) | Reads the content of a cached file. |
| get_file_path(file_key, is_outside) | Returns the absolute path to a file in the project or outside. |
These methods together allow a user to run the entire documentation generation pipeline programmatically, controlling which parts of the output to include and when to trigger regeneration based on git status.
Cache Folder Initialization
```python
def init_folder_system(self, project_directory)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| CACHE_FOLDER_NAME | str | Directory name | .auto_doc_cache |
| FILE_NAMES | dict | Map of artefact keys to filenames | e.g. "logs": "report.txt" |
Creates the cache folder and writes an empty CacheSettings JSON file when it does not yet exist.
File Path Resolution
```python
def get_file_path(self, file_key: str, is_outside: bool = False)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| file_key | str | Artifact identifier | Must exist in FILE_NAMES |
| is_outside | bool | Determines subfolder | True ⇒ project root |
Utility used by all file I/O operations; keeps the cache layout deterministic.
Code‑Mix Generation
```python
def generate_code_file(self)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| cm | CodeMix | Repository‑content builder | Configured with config.ignore_files |
| code_mix | str | Aggregated source | Stored in doc_info.code_mix |
Logic Flow
- Logs a start message.
- Instantiates CodeMix with the project directory and ignore patterns.
- Calls build_repo_content() to read repository files.
- Stores the result and updates the progress bar.
CodeMix Module: Repository Snapshot Generator
Responsibility
CodeMix produces a textual representation of a project's directory tree and inlined file contents, respecting a user‑defined ignore list.
Interaction with the Rest of the Pipeline
The generated string is typically fed into the embedding step where each file or section receives a semantic vector.
Other components may write this snapshot to disk as part of the global doc structure.
Core Methods
| Method | Entity | Type | Role | Notes |
|---|---|---|---|---|
| __init__ | root_dir, ignore_patterns | Path, list[str] | Initializes state | Defaults to the current directory; ignore_patterns are used in should_ignore |
| should_ignore | path | Path | Returns bool | Uses fnmatch against the full path, the basename, and individual parts |
| build_repo_content | — | — | Generates the repo snapshot | Emits logs for ignored items; collects file contents in a formatted string |
should_ignore
- Convert path to a relative path under root_dir.
- Iterate over ignore_patterns.
- Return True if any pattern matches the relative path, the filename, or any component of the path.
- Otherwise, return False.
build_repo_content
- Begin with the header "Repository Structure:".
- Recursively walk root_dir (sorted) and append indented directory/file names.
- Append a separator line.
- Walk again to embed file contents:
  - For each file that is not ignored, append <file path="relative/path">, its decoded text, and a newline separator.
  - Capture any exceptions while reading and append an error message.
- Join all entries with "\n" and return the final string.
Global Ignore List
The module ships with ignore_list, a predefined set of glob patterns targeting compiled artifacts, virtual environments, cache directories, and other non‑source files.
The list is applied during should_ignore.
Global Information Compression
```python
def generate_global_info(self, compress_power: int = 4, max_symbols: int = 10000)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| compress_power | int | Aggressiveness of LLM compression | Default 4 |
| max_symbols | int | Chunk size for split_data | Default 10 k |
Workflow
- Splits doc_info.code_mix into chunks of at most max_symbols characters with split_data.
- Calls compress_to_one() (LLM‑powered) with compress_power.
- Writes the resulting markdown to global_info.md.
The compressed text becomes a contextual backdrop for subsequent section generation.
Custom Introduction Processor
The Custom Introduction Processor is a lightweight post‑processing module that builds the introductory text of a generated documentation file.
It extracts anchor links, generates introductory paragraphs with or without link lists, and produces custom description blocks for individual sections.
Functional Flow
- get_all_html_links – Scans a markdown string for <a name="…"> tags, collects anchor names longer than five characters, prefixes each with #, and returns a list. Logging: uses BaseLogger to trace the extraction count.
- get_links_intro – Accepts the list from get_all_html_links, builds a chat‑style prompt, and forwards it to an LLM (Model interface). The LLM is instructed to create a short introductory paragraph that references the provided anchors.
- get_introdaction – Generates a generic introduction from a global data block (typically the entire repository summary). A system prompt specifies the target language and a base introduction template (BASE_INTRO_CREATE).
- generete_custom_discription – Iterates over a string of split data (each part representing a documentation chunk). For every chunk it constructs a prompt that includes the chunk as context, a system instruction to produce a precise technical description, and a user‑supplied description template (custom_description). The first non‑empty, non‑"no‑info" response is returned; otherwise an empty string is produced.
- generete_custom_discription_without – Builds a single LLM prompt that forces a one‑time <a name="CONTENT_DESCRIPTION"> header, then generates a short hyphenated summary of the supplied text. The function returns the LLM answer verbatim.
Data Contract
| Function | Entity | Type | Role | Notes |
|---|---|---|---|---|
| get_all_html_links | data | str | Markdown source | Returns a list of #anchor strings |
| | links | list[str] | Result | Exposed for further processing |
| get_links_intro | links | list[str] | Anchor list | Passed to LLM |
| | model | Model | LLM wrapper | Must expose get_answer_without_history |
| | language | str | Optional | Defaults to "en" |
| | intro_links | str | Generated intro | Contains links in markdown |
| get_introdaction | global_data | str | Global summary | Text passed to LLM |
| | model | Model | LLM wrapper | Same as above |
| | language | str | Optional | Defaults to "en" |
| | intro | str | Generated paragraph | Returned directly |
| generete_custom_discription | splited_data | str | Document chunks | Iterated over; should be an iterable |
| | model | Model | LLM wrapper | As above |
| | custom_description | str | Prompt target | Text describing what to produce |
| | language | str | Optional | Defaults to "en" |
| | result | str | Description block | Empty if no useful response |
| generete_custom_discription_without | model | Model | LLM wrapper | Same as above |
| | custom_description | str | Prompt target | Full text to summarise |
| | language | str | Optional | Defaults to "en" |
| | result | str | Summary | Contains mandatory <a name="CONTENT_DESCRIPTION"> header |
Interaction with Other Modules
- LLM: All functions delegate to an instance of Model (often GPTModel) via get_answer_without_history.
- Logging: BaseLogger and InfoLog emit debug information.
- Configuration: BASE_INTRODACTION_CREATE_LINKS, BASE_INTRO_CREATE, and BASE_CUSTOM_DISCRIPTIONS are constant templates imported from config.
Factory‑Based Documentation Generation
```python
def factory_generate_doc(self, doc_factory: DocFactory, to_start: bool = False)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| doc_factory | DocFactory | Module registry | Executes generate_doc() on each registered BaseModule |
| to_start | bool | Append strategy | True ⇒ prepend, False ⇒ append |
Procedure
- Gathers language, full_data, code_mix, and global_info into a dict.
- Delegates to doc_factory.generate_doc() with the LLM and progress bar.
- Concatenates the returned DocContent with the existing doc_info.doc.
Document Ordering
```python
def order_doc(self)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| content_orders | list | Current order metadata | Updated by get_order() |
Calls the post‑processor get_order() (LLM‑driven) to compute a new sequence for the document parts and overwrites the internal ordering.
Sorting Module: Semantic Title Reordering
Responsibility
The sorting module provides utilities for extracting HTML anchors from raw documentation text, segmenting the document into title/section chunks, and ordering those titles semantically using a language model.
Interaction with the Rest of the Pipeline
- split_text_by_anchors is invoked by the post‑processor after the CodeMix output has been concatenated into a single markdown string.
- The resulting dictionary is passed to get_order, which contacts the Model instance from autodocgenerator.engine.models.model to produce an ordered list of section identifiers.
- The ordered list is then used by downstream modules to assemble the final doc schema.
Core Functions
| Function | Entity | Type | Role | Notes |
|---|---|---|---|---|
| extract_links_from_start | chunks | list[str] | Extracts the first anchor name from each chunk | Returns a tuple (links, have_to_del_first) |
| split_text_by_anchors | text | str | Splits a markdown document into {anchor: content} | Raises Exception if the anchor count mismatches the content chunks |
| get_order | model, chanks | list[str] | Requests semantic ordering from the LLM | Logs progress via BaseLogger |
extract_links_from_start
- Iterate over the provided chunks.
- Use the regex ^<a name=["']?(.*?)["']?</a> to capture an anchor tag at the beginning of a chunk.
- If an anchor name is longer than five characters, add #<anchor> to links and set is_find.
- If a chunk does not contain a valid anchor, set have_to_del_first = True.
- Return the collected links and the flag.
split_text_by_anchors
- Split text on the lookahead pattern (?=<a name["']?[^"'>\s]{6,200}["']?</a>) to preserve anchors.
- Strip whitespace from each resulting chunk and filter out empties.
- Call extract_links_from_start on the chunk list to collect anchor references and a deletion flag.
- If the first anchor is not at the very start (text.find("<a name") > 10) or a non‑anchor chunk was found, discard the first chunk (result_chanks.pop(0)).
- Validate that the number of anchors matches the number of content chunks; otherwise raise an exception.
- Build a dictionary mapping each anchor (#name) to its corresponding content chunk.
- Return the dictionary.
get_order
- Instantiate a BaseLogger and log the start of ordering.
- Build a single‑element prompt list with a user instruction to semantically sort the provided titles (chanks).
- Call model.get_answer_without_history(prompt) to obtain a comma‑separated string.
- Strip whitespace from each title and log the final list.
- Return the ordered list of titles.
Cache Clearing
```python
def clear_cache(self)
```
If config.pbc.save_logs is False, removes the log file from the cache.
Persisting Final Document
```python
def save(self) -> None
```
Writes the full markdown (doc_info.doc.get_full_doc()) to output_doc.md and serialises the entire doc_info schema to info.json.
External Dependencies
- CodeMix – Repository reader.
- split_data, gen_doc_parts – Synchronous LLM chunking.
- compress_to_one – LLM‑based compression.
- get_order, split_text_by_anchors – Post‑processing utilities.
- Model, Embedding – Abstract LLM and embedding interfaces.
- BaseProgress, BaseLogger – UI and audit logging.
- Config, DocInfoSchema – Configuration and document metadata.
Missing or Incomplete Information
- The internal structure of DocInfoSchema (fields beyond code_mix, global_info, doc) is not shown.
- The implementations of doc_info.doc.get_full_doc() and doc_info.doc.add_parts() are external.
- Error handling around file I/O is generic (except: data = None).
- There is no explicit cleanup of temporary files beyond logs.
- The logic that populates config.pbc and config.language is not presented.
Embedding Layer Creation
```python
def create_embedding_layer(self) -> None
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| embedding_model | Embedding | Vectorizer | Passed to each part via init_embedding() |
Iterates over every part in doc_info.doc.parts, invoking init_embedding to attach an embedding vector.
Vector Embedding Utilities
This module implements lightweight vector operations and a wrapper around the Google Gemini embedding API.
It is used by the post‑processor to attach semantic vectors to document fragments and to order them.
Core Functions
| Function | Entity | Type | Role | Notes |
|---|---|---|---|---|
| bubble_sort_by_dist | arr | list | In‑place sort | Implements bubble sort on a list of (identifier, distance) tuples |
| get_len_btw_vectors | vector1, vector2 | numpy.ndarray | Distance calculator | Returns the Euclidean norm (np.linalg.norm) |
| sort_vectors | root_vector | list[float] | Reference vector | other is a dict[str, list[float]]; returns sorted keys |
| Embedding.get_vector | prompt | str | LLM request | Returns the embedding of the first token vector |
Data Contract
| Function | Entity | Type | Role | Notes |
|---|---|---|---|---|
| bubble_sort_by_dist | arr | list[tuple] | Sorting buffer | Mutates and returns arr |
| get_len_btw_vectors | vector1, vector2 | numpy.ndarray | Distance | Uses np.linalg.norm |
| sort_vectors | root_vector | list[float] | Reference | other keys sorted by distance |
| | result_list | list[str] | Output | Keys in ascending similarity order |
| Embedding.__init__ | api_key | str | Auth | Instantiates genai.Client |
| Embedding.get_vector | prompt | str | Prompt text | Calls client.models.embed_content with a 768‑dimensional config |
| | text_response | EmbedContentResponse | API response | Raises Exception if embeddings are missing |
| | vector | list[float] | Embedding | First dimension of the returned embeddings |
Interaction with External Libraries
- google.genai: Provides Client and types.EmbedContentConfig.
- numpy: Used for distance calculations.
- typing: Supports type hinting of Any.
Error Handling
- Embedding.get_vector raises a generic Exception("promblem with embedding") if text_response.embeddings is None.
- No other explicit error handling; the calling code must manage exceptions.
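A minimal sketch of the wrapper based on the public google‑genai SDK; the model name is an assumption, and the exception message is quoted from the module:

```python
from google import genai
from google.genai import types

class Embedding:
    def __init__(self, api_key: str):
        self.client = genai.Client(api_key=api_key)

    def get_vector(self, prompt: str) -> list[float]:
        response = self.client.models.embed_content(
            model="text-embedding-004",     # assumed model identifier
            contents=prompt,
            config=types.EmbedContentConfig(output_dimensionality=768),
        )
        if response.embeddings is None:
            raise Exception("promblem with embedding")  # message as in the module
        return response.embeddings[0].values            # first returned embedding
```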
Observations & Missing Context
- generete_custom_discription is typed to accept splited_data: str but is iterated as a collection, implying the caller supplies an iterable of strings.
- The constants (BASE_INTRODACTION_CREATE_LINKS, BASE_INTRO_CREATE, BASE_CUSTOM_DISCRIPTIONS) are external; their content is not available.
- The functions rely on a Model interface that exposes get_answer_without_history, but details of its implementation are absent.
- There are no explicit unit tests, and no error handling for malformed HTML anchors beyond a length check.
Compressor Component – Text Aggregation & LLM‑based Compression
The Compressor module provides three public helpers that wrap the LLM’s text‑summarization logic and progressively reduce a collection of raw snippets to a single compressed document.
Compression Workflow
| Step | Function | Purpose | Key Inputs | Side‑Effects |
|---|---|---|---|---|
| 1 | compress |
Sends a single chunk to the LLM and returns the compressed text. | data: raw string; project_settings: ProjectSettings; model: Model; compress_power: int |
LLM interaction (network I/O); no mutating state |
| 2 | compress_and_compare |
Groups a list of chunks into compress_power‑sized blocks, compresses each block, and emits progress updates. | data: list[str]; model, project_settings, compress_power; progress_bar: BaseProgress |
Produces a list of compressed strings of length ⌈len(data)/compress_power⌉ |
| 3 | compress_to_one |
Iteratively feeds the output of compress_and_compare back into itself until a single string remains. |
data: list[str]; same model and settings; compress_power: int |
Final compressed document as a string |
compress(data: str, project_settings: ProjectSettings, model: Model, compress_power) → str
Logic
- Builds a three‑part prompt:
  - A system message with project_settings.prompt.
  - A system message with the base compression hint get_BASE_COMPRESS_TEXT(len(data), compress_power).
  - A user message containing the raw data.
- Calls model.get_answer_without_history(prompt=prompt) to obtain the compressed output.
- Returns the LLM's answer string.

Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| data | str | Raw content chunk | Must fit within the model's token budget |
| project_settings | ProjectSettings | Provides project‑specific context | The prompt property concatenates base text with project metadata |
| model | Model | LLM abstraction | Exposes get_answer_without_history |
| compress_power | int | Influences the prompt text via get_BASE_COMPRESS_TEXT | Typically 4, but can be overridden |
- Error Handling – No explicit try/except; any exception propagates from model.get_answer_without_history.
compress_and_compare(data: list, model: Model, project_settings: ProjectSettings, compress_power: int = 4, progress_bar: BaseProgress = BaseProgress()) → list
- Chunk Allocation – Pre‑allocates a result list sized ceil(len(data)/compress_power).
- Progress Management – Creates a sub‑task in the progress bar with the total item count.
- Batch Compression – Iterates over data; each element is compressed and appended to its bucket (chosen by integer division by compress_power).
- Update & Finish – Calls progress_bar.update_task() per element and removes the sub‑task once done.
- Return – A list of compressed blocks, each potentially containing multiple original snippets concatenated.
| Entity | Type | Role | Notes |
|---|---|---|---|
| data | list[str] | Source snippets | Arbitrary number of items |
| model | Model | LLM | Same as in compress |
| project_settings | ProjectSettings | Context prompt | Used for each compression call |
| compress_power | int | Batch size | Default 4; can be lowered for small collections |
| progress_bar | BaseProgress | UI feedback | Default instance provided |
compress_to_one(data: list, model: Model, project_settings: ProjectSettings, compress_power: int = 4, progress_bar: BaseProgress = BaseProgress()) → str
- Iterative Reduction – While more than one compressed block remains:
  - Adjusts new_compress_power: if len(data) < compress_power + 1, sets it to 2.
  - Calls compress_and_compare with the current power to shrink the list.
  - Increments an iteration counter (unused beyond debugging).
- Return – The sole remaining string after all rounds.
| Entity | Type | Role | Notes |
|---|---|---|---|
| data | list[str] | List of compressed fragments | Expected to be the output of a prior compress_and_compare or a raw data list |
| model, project_settings | As before | LLM context | Passed unchanged through iterations |
| compress_power | int | Primary batch size | Dynamically reduced in final passes |
| progress_bar | BaseProgress | UI | Propagated to each inner call |
- Corner Cases – If data is empty, the function raises an IndexError at data[0]. No guard is present in this fragment.
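A condensed sketch of the reduction loop under the contracts above. The LLM round trip is replaced by simple concatenation so the sketch is runnable; the bucket arithmetic mirrors the description of compress_and_compare:

```python
import math

def compress_and_compare(data: list[str], compress_power: int = 4) -> list[str]:
    buckets = [[] for _ in range(math.ceil(len(data) / compress_power))]
    for i, snippet in enumerate(data):
        # compress() would call the LLM here; joining keeps the sketch runnable.
        buckets[i // compress_power].append(snippet)
    return ["\n".join(bucket) for bucket in buckets]

def compress_to_one(data: list[str], compress_power: int = 4) -> str:
    while len(data) > 1:
        power = 2 if len(data) < compress_power + 1 else compress_power
        data = compress_and_compare(data, power)
    return data[0]  # raises IndexError if data started empty (no guard, as noted)
```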
Data Contract Summary
| Component | Entity | Type | Role | Notes |
|---|---|---|---|---|
| extract_links_from_start | chunks | list[str] | Input | Raw section fragments |
| extract_links_from_start | links | list[str] | Output | Anchor strings prefixed with # |
| extract_links_from_start | have_to_del_first | bool | Flag | Indicates whether a leading non‑anchor chunk must be dropped |
| split_text_by_anchors | text | str | Input | Raw documentation file |
| split_text_by_anchors | result | dict[str, str] | Output | Mapping from anchor to content chunk |
| get_order | model | Model | Input | LLM abstraction |
| get_order | chanks | list[str] | Input | List of section titles (anchors) |
| get_order | new_result | list[str] | Output | Semantically sorted titles |
| CodeMix.build_repo_content | root_dir | Path | Input | Project root |
| CodeMix.build_repo_content | content | str | Output | Textual repo snapshot |
| CodeMix.should_ignore | path | Path | Input | File or directory path |
| CodeMix.should_ignore | result | bool | Output | Determines whether the path is filtered |
Data Contract Summary
| Component | Entity | Type | Role |
|---|---|---|---|
| compress | data | str | Raw snippet |
| compress | project_settings | ProjectSettings | Context |
| compress | model | Model | LLM |
| compress | compress_power | int | Prompt modifier |
| compress_and_compare | data | list[str] | Input chunks |
| compress_and_compare | model | Model | LLM |
| compress_and_compare | project_settings | ProjectSettings | Prompt |
| compress_and_compare | compress_power | int | Batch size |
| compress_and_compare | progress_bar | BaseProgress | Progress UI |
| compress_to_one | data | list[str] | Initial chunks |
| compress_to_one | model | Model | LLM |
| compress_to_one | project_settings | ProjectSettings | Prompt |
| compress_to_one | compress_power | int | Initial batch size |
| compress_to_one | progress_bar | BaseProgress | UI |
Logging & Error Handling
- compress_and_compare and compress_to_one rely on BaseProgress for progress reporting; no logging of individual LLM calls is visible.
- No exception handling is implemented in these functions; errors from model.get_answer_without_history or from the progress bar surface to the caller.
- The module assumes get_BASE_COMPRESS_TEXT returns a properly formatted system message; misuse may cause malformed prompts.
logging.py – Structured Logging Utilities
Purpose
Centralizes log message creation and dispatch for the Auto Doc Generator UI.
Logs are formatted with timestamps and severity prefixes and can be routed to the console or a file.
| Entity | Type | Role | Notes |
|---|---|---|---|
| BaseLog | Abstract class | Base log message container | Holds message and numeric level (0 = default) |
| ErrorLog, WarningLog, InfoLog | Subclasses | Severity‑specific loggers | Override format to prepend [ERROR], [WARNING], [INFO] |
| BaseLoggerTemplate | Interface | Dispatch layer | log writes; global_log applies the level filter |
| FileLoggerTemplate | Concrete | File‑based logger | Appends each formatted message to a file |
| BaseLogger | Singleton | Public API | Stores the active BaseLoggerTemplate and forwards log calls |
The logger’s level comparison uses log_level < 0 to mean “unfiltered” (the default).
Core Flow
- Message Instantiation – ErrorLog("…") creates an instance with the current timestamp via _log_prefix.
- Formatting – Each subclass implements format() to prefix severity and timestamp.
- Dispatch – BaseLogger.log() forwards to the set template's global_log().
- Filtering – BaseLoggerTemplate.global_log() checks whether the message level is allowed before calling log().
- Output – BaseLoggerTemplate.log() (or FileLoggerTemplate.log()) writes to stdout or a file.
Error Handling & Logging
- split_text_by_anchors: Raises a generic Exception if the anchor count does not match the number of content chunks, signalling malformed input.
- CodeMix.build_repo_content: Wraps file reads in a try/except; any exception is appended to the output rather than terminating the process.
- Logging: Uses BaseLogger to emit InfoLog messages during ordering; CodeMix logs ignored paths. No other explicit error handling is present.
progress_base.py – Task Progress Abstractions
Purpose
Provides a pluggable progress interface for both Rich‑based terminal displays and simple console prints, used during long‑running generation stages.
| Entity | Type | Role | Notes |
|---|---|---|---|
| BaseProgress | Abstract | Progress API | Stub methods for sub‑task handling |
| LibProgress | Rich implementation | Rich progress bar | Uses rich.progress.Progress; tracks base and sub‑tasks |
| ConsoleTask | Helper | Simple textual progress | Prints a percentage for each increment |
| ConsoleGtiHubProgress | Console wrapper | GitHub‑style console progress | Holds a default "General Progress" task and an optional sub‑task |
Core Flow
- Task Creation – create_new_subtask(name, total_len) registers a sub‑task; LibProgress adds it to the Rich progress container.
- Progress Update – update_task() advances the current sub‑task; if none exists, it advances the base task.
- Task Removal – remove_subtask() clears the current sub‑task reference.
- Console Fallback – ConsoleGtiHubProgress emits a human‑readable percentage via ConsoleTask.progress(). A sketch of the interface follows.
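A minimal sketch of the console variant (class name spelled as it appears in the library); LibProgress would back the same interface with rich.progress.Progress:

```python
class BaseProgress:
    """No-op progress API; subclasses override for real displays."""
    def create_new_subtask(self, name: str, total_len: int) -> None: ...
    def update_task(self) -> None: ...
    def remove_subtask(self) -> None: ...

class ConsoleTask:
    def __init__(self, name: str, total_len: int):
        self.name, self.total, self.done = name, max(total_len, 1), 0

    def progress(self) -> None:
        self.done += 1
        print(f"{self.name}: {100 * self.done // self.total}%")

class ConsoleGtiHubProgress(BaseProgress):
    def __init__(self):
        self.base = ConsoleTask("General Progress", 100)
        self.subtask = None

    def create_new_subtask(self, name: str, total_len: int) -> None:
        self.subtask = ConsoleTask(name, total_len)

    def update_task(self) -> None:
        (self.subtask or self.base).progress()   # fall back to the base task

    def remove_subtask(self) -> None:
        self.subtask = None
```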
split_data – Adaptive Text Chunker
Purpose
Takes a long string (full_code_mix) and a symbol limit (max_symbols) and produces a list of text fragments that each fit under the limit, with a safety margin of 25 %.
The algorithm first iteratively halves any fragment exceeding 150 % of max_symbols, then assembles the final chunks while ensuring no single part grows beyond 125 % of the target.
This function is used by gen_doc_parts to feed the LLM a manageable prompt size.
```python
def split_data(full_code_mix: str, max_symbols: int) -> list[str]
```
Logic Flow
- Initialize the logger (BaseLogger) and emit "Starting data splitting…".
- Iteratively split: while any element el in splited_by_files has length > 1.5 × max_symbols, split it at its midpoint and insert the second half after the current index.
- Assemble final parts: traverse splited_by_files and append elements to the current part until the cumulative length would exceed 1.25 × max_symbols, then start a new part.
- Log the number of parts and return the list (see the sketch below).
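A runnable sketch of the two stages. The real implementation starts from per‑file pieces (splited_by_files), which are not shown in the fragment; this sketch seeds the loop with the whole input string:

```python
def split_data(full_code_mix: str, max_symbols: int) -> list[str]:
    # Stage 1: halve any fragment longer than 150% of the limit.
    fragments = [full_code_mix]
    i = 0
    while i < len(fragments):
        el = fragments[i]
        if len(el) > 1.5 * max_symbols:
            mid = len(el) // 2
            fragments[i] = el[:mid]
            fragments.insert(i + 1, el[mid:])  # both halves are re-checked
        else:
            i += 1
    # Stage 2: greedily pack fragments, capping each part at 125% of the limit.
    parts, current = [], ""
    for el in fragments:
        if current and len(current) + len(el) > 1.25 * max_symbols:
            parts.append(current)
            current = ""
        current += el
    if current:
        parts.append(current)
    return parts
```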
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| full_code_mix | str | Raw source code | Input to be split |
| max_symbols | int | Size threshold | Determines chunk limits |
| splited_by_files | list[str] | Intermediate list | Derived from full_code_mix (not shown) |
| split_objects | list[str] | Output list | Returned to the caller |
| logger | BaseLogger | Logger | Emits diagnostics |
CacheSettings – Commit Cache
A lightweight Pydantic model holding the SHA of the last processed commit.
```python
class CacheSettings(BaseModel):
    last_commit: str = ""
```
Only a single field is defined; no additional behavior.
Documentation Parts Generation
```python
def generete_doc_parts(self, max_symbols=5_000, with_global_file: bool = False)
```
| Entity | Type | Role | Notes |
|---|---|---|---|
| max_symbols | int | Size of each part | Default 5 k |
| with_global_file | bool | Whether to include global info | Overridden to True internally |
Steps
- Reads the cached global file.
- Calls gen_doc_parts() (the synchronous LLM pipeline) to create chunked markdown.
- Writes the output to output_doc.md.
- Splits the text at anchors with split_text_by_anchors and stores each part in doc_info.doc.parts.
write_docs_by_parts – LLM‑Driven Part Generation
Purpose
Transforms a single text chunk (part) into a documentation fragment via a language model.
The function builds a prompt hierarchy, calls the LLM, and strips markdown fences if present.
```python
def write_docs_by_parts(part: str,
                        model: Model,
                        project_settings: ProjectSettings,
                        prev_info: str | None = None,
                        language: str = "en",
                        global_info: str | None = None) -> str
```
Logic Flow
- Initialize the logger and log "Generating documentation for a part…".
- Construct the prompt:
  - System role: language selector (language).
  - System role: global project metadata (project_settings.prompt).
  - System role: base completion template (BASE_PART_COMPLITE_TEXT).
  - Optional system role: global_info.
  - Optional system role: prev_info indicating the last written part.
  - User role: the actual code chunk (part).
- LLM invocation: model.get_answer_without_history(prompt=prompt) → answer.
- Strip leading/trailing markdown fences (triple backticks) if present.
- Return the cleaned answer.
The function does not handle any LLM‑specific errors; any exception propagates to the caller.
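A sketch of the prompt assembly and fence stripping; the system‑message wording and the BASE_PART_COMPLITE_TEXT placeholder are assumptions:

```python
BASE_PART_COMPLITE_TEXT = "Write markdown documentation for the given code."  # assumed

def write_docs_by_parts(part, model, project_settings,
                        prev_info=None, language="en", global_info=None) -> str:
    prompt = [
        {"role": "system", "content": f"Answer in language: {language}"},
        {"role": "system", "content": project_settings.prompt},
        {"role": "system", "content": BASE_PART_COMPLITE_TEXT},
    ]
    if global_info:
        prompt.append({"role": "system", "content": global_info})
    if prev_info:
        prompt.append({"role": "system", "content": f"Last written part: {prev_info}"})
    prompt.append({"role": "user", "content": part})

    answer = model.get_answer_without_history(prompt=prompt).strip()
    if answer.startswith("```") and answer.endswith("```"):
        answer = answer.strip("`").strip()   # drop a surrounding markdown fence
    return answer
```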
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| part | str | Chunk to document | Input to the LLM |
| model | Model | LLM instance | Must expose get_answer_without_history |
| project_settings | ProjectSettings | Context | Provides the prompt property |
| prev_info | str \| None | Tail of the preceding doc | |
| language | str | Target language | Defaults to "en" |
| global_info | str \| None | Global relation notes | |
| answer | str | LLM output | May contain a fenced code block |
| temp_answer | str | Stripped version | Used for fence removal |
| logger | BaseLogger | Logger | Emits generation details |
gen_doc_parts – Multi‑Pass Documentation Builder
Purpose
Orchestrates the full “split‑then‑generate” pipeline: splits the entire source mix, iteratively calls write_docs_by_parts, and concatenates the results.
It also maintains a sliding window of the most recent 3 000 characters to pass as contextual prev_info for the next part.
```python
def gen_doc_parts(full_code_mix,
                  max_symbols,
                  model: Model,
                  project_settings: ProjectSettings,
                  language,
                  progress_bar: BaseProgress,
                  global_info=None) -> str
```
Logic Flow
- Split full_code_mix → splited_data by calling split_data.
- Log the start of part generation.
- Create a sub‑task in the progress bar: "Generete doc parts".
- Iterate over splited_data:
  - Call write_docs_by_parts with the current part, model, project_settings, and the previous result (result) as prev_info.
  - Append the returned string to all_result.
  - Keep only the last 3 000 characters of result for the next iteration.
  - Update the progress bar.
- Finish the sub‑task, log the total documentation length, and return all_result.
Data Contract
| Entity | Type | Role | Notes |
|---|---|---|---|
| full_code_mix | str | Raw source | Input to be split |
| max_symbols | int | Chunk size | Used in split_data |
| model | Model | LLM | For generation |
| project_settings | ProjectSettings | Config | Supplies prompt |
| language | str | Locale | Passed to write_docs_by_parts |
| progress_bar | BaseProgress | UI | Tracks per‑part progress |
| global_info | str \| None | Project relations | |
| splited_data | list[str] | Chunks | Intermediate |
| all_result | str | Full documentation | Final output |
DocContent – Embedded Text Block
Encapsulates a documentation string and its optional vector embedding.
```python
class DocContent(BaseModel):
    content: str
    embedding_vector: list | None = None

    def init_embedding(self, embedding_model: Embedding):
        self.embedding_vector = embedding_model.get_vector(self.content)
```
| Method | Input | Output | Notes |
|---|---|---|---|
| init_embedding | Embedding | None | Sets embedding_vector in place |
DocHeadSchema – Ordered Section Collection
Maintains an ordered list of section names and a mapping to their DocContent.
```python
class DocHeadSchema(BaseModel):
    content_orders: list[str] = []
    parts: dict[str, DocContent] = {}

    def add_parts(self, name, content: DocContent): ...
    def get_full_doc(self, split_el: str = "\n") -> str: ...
    def __add__(self, other: "DocHeadSchema") -> "DocHeadSchema": ...
```
Key Operations
| Method | Role | Notes |
|---|---|---|
| add_parts | Insert a uniquely named section | Avoids name collisions by appending _{i} |
| get_full_doc | Concatenate all parts in order | Uses the split_el separator |
| __add__ | Merge with another DocHeadSchema | Preserves ordering |
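A sketch of the two central operations, filling in the elided bodies under the collision and ordering rules from the table:

```python
from pydantic import BaseModel

class DocContent(BaseModel):
    content: str
    embedding_vector: list | None = None

class DocHeadSchema(BaseModel):
    content_orders: list[str] = []
    parts: dict[str, DocContent] = {}

    def add_parts(self, name: str, content: DocContent) -> None:
        unique, i = name, 0
        while unique in self.parts:          # avoid collisions with _{i} suffixes
            i += 1
            unique = f"{name}_{i}"
        self.content_orders.append(unique)
        self.parts[unique] = content

    def get_full_doc(self, split_el: str = "\n") -> str:
        return split_el.join(self.parts[n].content for n in self.content_orders)
```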
DocInfoSchema – Full Document Metadata
Root schema aggregating global info, the raw source mix, and the hierarchical sections.
```python
class DocInfoSchema(BaseModel):
    global_info: str = ""
    code_mix: str = ""
    doc: DocHeadSchema = Field(default_factory=DocHeadSchema)
```
| Field | Type | Role | Notes |
|---|---|---|---|
| global_info | str | Project description | Raw text |
| code_mix | str | Raw source | Used by the generator |
| doc | DocHeadSchema | Structured doc | Holds sections |