ManualForge
Configuration-driven framework for generating management manuals. Define your data sources, fields, and templates in YAML and get a formatted report.
Built on Kedro pipelines with Polars for data processing and Typst for document rendering.
Philosophy
ManualForge separates what you want to produce from how it's produced.
- What: defined in conf/base/parameters_manualforge.yml, which holds your data sources, expected columns, standardization rules, sort orders, summary dimensions, and report templates.
- How: implemented by the pipeline nodes, reusable data-processing functions that read their settings from that config.
To create a new manual for a different domain, you only need to edit the config file (and optionally provide new templates). No Python code changes required.
Features
| Capability | Description |
|---|---|
| Multi-sheet Excel ingestion | Auto-detect headers, filter cover sheets, merge into structured DataFrames |
| Field standardization | Mapping files + exact matching + fuzzy matching (difflib / duckdb) |
| Config-driven summaries | Define group-by dimensions, sort orders, ability categories, and output paths in YAML |
| Typst report generation | Jinja2 templates → Typst source → PDF compilation |
| Pipeline hooks | Shell command hooks at pipeline/node granularity for pre/post processing |
Quick Start
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Copy and customize configuration
cp conf/examples/parameters_manualforge.yml.example conf/base/parameters_manualforge.yml
cp conf/examples/catalog.yml.example conf/base/catalog.yml
cp conf/examples/hooks.yml.example conf/base/hooks.yml
cp conf/examples/parameters.yml.example conf/base/parameters.yml
cp conf/examples/credentials.yml.example conf/local/credentials.yml

# 3. Edit the config files to point to your data sources
# (conf/base/ is gitignored; your real configs stay local)

# 4. Run the pipeline
kedro run

# Run specific node groups
kedro run --tags conversion        # Excel → Parquet only
kedro run --tags standardization   # Standardization only
kedro run --tags csv               # Summary tables only
```
Project Structure
```text
├── conf/
│   ├── base/                          # ★ Gitignored: copy from examples/
│   │   ├── parameters_manualforge.yml # Central project configuration
│   │   ├── catalog.yml                # Kedro data catalog
│   │   ├── hooks.yml                  # Pipeline hooks (shell commands)
│   │   └── parameters.yml             # Pipeline parameters
│   ├── examples/                      # ★ Tracked example templates
│   │   ├── parameters_manualforge.yml.example
│   │   ├── catalog.yml.example
│   │   ├── hooks.yml.example
│   │   ├── parameters.yml.example
│   │   └── credentials.yml.example
│   ├── local/                         # Local-only (gitignored)
│   │   └── credentials.yml
│   └── logging.yml
├── data/                              # Gitignored except .gitkeep
│   ├── 01_raw/                        # Raw Excel/CSV + mapping files
│   ├── 02_intermediate/               # Parquet, reconcile reports
│   ├── 03_primary/                    # Standardized data
│   ├── 04_feature/                    # Summary tables (CSV + Markdown)
│   └── 08_reporting/                  # Typst sources & compiled PDFs
├── scripts/                           # Auxiliary scripts
│   ├── convert_csv_to_md.py           # CSV → Markdown conversion
│   ├── extract_rule_field_mapping.py  # Rule field extraction
│   ├── extract_rule_overview.py       # Rule overview extraction
│   └── render_with_forge.py           # Markdown → DOCX/PDF rendering
├── src/manualforge/                   # Framework source code
│   ├── config.py                      # Configuration helper utilities
│   ├── hooks.py                       # Kedro pipeline hooks
│   ├── io/                            # Custom Kedro datasets (PolarsExcelDataset)
│   ├── pipelines/                     # Pipeline definitions & node functions
│   └── settings.py                    # Kedro project settings
├── templates/                         # Jinja2 Typst templates
│   └── report.typ.j2
├── pyproject.toml                     # Project metadata & dependencies
└── requirements.txt
```
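scripts/convert_csv_to_md.py turns CSV summary tables into Markdown. Its internals are not shown in this README; as a rough, stdlib-only sketch of that kind of conversion (csv_to_markdown is a hypothetical name), it could look like this:

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Convert CSV text (first row is the header) into a pipe-style Markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]

    def fmt(cells: list[str]) -> str:
        # Pad short rows and escape pipes so cell text cannot break the table.
        cells = cells + [""] * (len(header) - len(cells))
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "|" + "---|" * len(header)]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


print(csv_to_markdown("model,department\nModel A,HR\n"))
```

Using the csv module rather than splitting on commas keeps quoted fields that contain commas intact.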
Configuration Guide
The central configuration file is conf/base/parameters_manualforge.yml. Copy it from conf/examples/parameters_manualforge.yml.example and customize the following sections:
1. Data Sources
Define your Excel files, sheet filtering rules, expected headers, and cleaning steps:
```yaml
datasources:
  primary_data:
    filepath: "data/01_raw/your_data.xlsx"
    sheet:
      exclude_names: ["封面", "封皮"]  # Chinese for "cover page" / "cover"
      name_becomes_column: "sheet_name"
    header_detection:
      mode: keyword_match
      expected_headers:
        - "column_a"
        - "column_b"
    cleaning:
      drop_rows_where:
        column_a: ["column_a"]  # drop residual header rows
      fill_null: forward
      deduplicate: true
```
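The pipeline performs these steps with Polars through the custom PolarsExcelDataset. The pure-Python sketch below only illustrates the logic the config keys describe: skipping excluded sheets, keyword-matched header detection, adding the sheet name as a column, then dropping rows, forward-filling nulls, and deduplicating. The in-memory workbook (a dict of sheet name to rows) and the functions detect_header, clean, and ingest are hypothetical; cfg mirrors the primary_data block.

```python
def detect_header(rows, expected_headers):
    """keyword_match: index of the first row that contains every expected header."""
    wanted = set(expected_headers)
    for i, row in enumerate(rows):
        if wanted <= {str(c).strip() for c in row if c is not None}:
            return i
    raise ValueError(f"no header row contains all of {sorted(wanted)}")


def clean(records, cleaning):
    """Apply drop_rows_where, forward fill_null, and deduplicate, in that order."""
    drops = cleaning.get("drop_rows_where", {})
    out = [r for r in records if not any(r.get(col) in vals for col, vals in drops.items())]
    if cleaning.get("fill_null") == "forward":
        last = {}
        for r in out:
            for k, v in r.items():
                if v is None and k in last:
                    r[k] = last[k]  # reuse the previous non-null value
                elif v is not None:
                    last[k] = v
    if cleaning.get("deduplicate"):
        seen, unique = set(), []
        for r in out:
            key = tuple(sorted(r.items()))
            if key not in seen:
                seen.add(key)
                unique.append(r)
        out = unique
    return out


def ingest(workbook, cfg):
    """workbook: hypothetical {sheet_name: [row, ...]} mapping; cfg mirrors primary_data."""
    records = []
    for sheet, rows in workbook.items():
        if sheet in cfg["sheet"]["exclude_names"]:
            continue  # skip cover sheets
        h = detect_header(rows, cfg["header_detection"]["expected_headers"])
        header = [str(c).strip() if c is not None else f"col_{j}" for j, c in enumerate(rows[h])]
        sheet_records = []
        for row in rows[h + 1:]:
            rec = dict(zip(header, row))
            rec[cfg["sheet"]["name_becomes_column"]] = sheet
            sheet_records.append(rec)
        records.extend(clean(sheet_records, cfg["cleaning"]))  # clean per sheet
    return records
```

Cleaning runs per sheet so that forward fill never carries a value from one sheet into the next.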
2. Field Standardization
Define which fields to standardize, their mapping files, and special corrections:
```yaml
standardization:
  fields:
    - name: "dept_name"
      mapping_file: "data/01_raw/dept_list"
      case_corrections:
        wrong_name: "correct_name"
      special_mappings:
        alias: "canonical_name"
  fuzzy:
    enabled: true
    threshold: 0.8
    method: difflib  # difflib | duckdb
```
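A plausible reading of these keys, sketched under assumptions: corrections and aliases are applied first, then an exact match against the canonical list from the mapping file, then fuzzy matching. Only the difflib method is shown (using the standard library's difflib.get_close_matches with the threshold as its cutoff); the duckdb option is not covered. The standardize function and its return tuple are hypothetical.

```python
import difflib


def standardize(value, canonical, case_corrections=None, special_mappings=None,
                fuzzy_enabled=True, threshold=0.8):
    """Map one raw value onto the canonical list; return (value, how_it_matched)."""
    value = value.strip()
    value = (case_corrections or {}).get(value, value)   # fix known misspellings
    value = (special_mappings or {}).get(value, value)   # resolve aliases
    if value in canonical:
        return value, "exact"
    if fuzzy_enabled:
        # get_close_matches keeps candidates whose SequenceMatcher ratio >= cutoff
        match = difflib.get_close_matches(value, canonical, n=1, cutoff=threshold)
        if match:
            return match[0], "fuzzy"
    return value, "unmatched"  # candidate for a reconcile report
```

Returning how each value matched makes it easy to collect fuzzy and unmatched values for review instead of silently accepting them.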
3. Sort Orders
Define reusable sort order lists referenced by summaries:
```yaml
sort_orders:
  model_names:
    - "Model A"
    - "Model B"
  dep_names:
    - "HR"
    - "Finance"
```
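One way a named sort order such as dep_names can drive sorting is a rank lookup used as a sort key. Where values missing from the list end up is not specified in this README; the sketch below assumes they follow the listed ones alphabetically. order_key is a hypothetical helper.

```python
def order_key(order):
    """Sort key: values in `order` keep their listed position; others follow alphabetically."""
    rank = {name: i for i, name in enumerate(order)}
    return lambda value: (rank.get(value, len(order)), value)


dep_names = ["HR", "Finance"]
print(sorted(["Finance", "IT", "HR", "Admin"], key=order_key(dep_names)))
```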
4. Summaries
Define what summary tables to generate:
```yaml
summaries:
  my_summary:
    description: "Fields grouped by model and department"
    group_by: ["model", "department"]
    struct_columns: ["module", "system", "field_name"]
    sort_by:
      department: dep_names
    output:
      csv: "data/04_feature/my_summary.csv"
```
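The following sketch shows one possible interpretation of a summary spec: rows are grouped by group_by, each group collects its unique struct_columns tuples, groups are ordered using the sort order named in sort_by (alphabetical otherwise), and the result is flattened to CSV. The real pipeline uses Polars, and its CSV layout may differ; build_summary and to_csv are hypothetical, and the CSV is returned as a string rather than written to the configured path.

```python
import csv
import io
from itertools import groupby


def build_summary(rows, spec, sort_orders):
    """Group rows by spec['group_by'], collect unique struct_columns tuples per group."""
    group_by, struct_cols = spec["group_by"], spec["struct_columns"]

    def key_for(col):
        # Column listed in sort_by -> use the named sort order; otherwise alphabetical.
        order = sort_orders.get(spec.get("sort_by", {}).get(col), [])
        rank = {v: i for i, v in enumerate(order)}
        return lambda v: (rank.get(v, len(order)), v)

    keys = {col: key_for(col) for col in group_by}
    ordered = sorted(rows, key=lambda r: tuple(keys[c](r[c]) for c in group_by))
    summary = []
    for group, members in groupby(ordered, key=lambda r: tuple(r[c] for c in group_by)):
        items = sorted({tuple(m[c] for c in struct_cols) for m in members})
        summary.append({**dict(zip(group_by, group)),
                        "items": [dict(zip(struct_cols, it)) for it in items]})
    return summary


def to_csv(summary, group_by, struct_cols):
    """Flatten to one CSV row per collected item (the real output layout may differ)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([*group_by, *struct_cols])
    for g in summary:
        for it in g["items"]:
            writer.writerow([*(g[c] for c in group_by), *(it[c] for c in struct_cols)])
    return buf.getvalue()
```

Sorting before itertools.groupby matters: groupby only merges adjacent rows, so equal group keys must be contiguous.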
5. Reports
Define report templates and output:
```yaml
reports:
  my_report:
    description: "Rules cookbook"
    template_source: inline
    data_source: rules_data
    output_typ: "data/08_reporting/output.typ"
    typst_compile:
      enabled: true
```
Data Layers
| Layer | Directory | Description |
|---|---|---|
| Raw | data/01_raw/ | Source Excel/CSV files, mapping files |
| Intermediate | data/02_intermediate/ | Parquet, reconcile reports |
| Primary | data/03_primary/ | Standardized data |
| Feature | data/04_feature/ | Summary tables (CSV + Markdown) |
| Reporting | data/08_reporting/ | Typst sources & PDF output |
Requirements
- Python >= 3.10
- Typst CLI (for PDF compilation)
Development
```bash
pip install -e ".[dev]"
ruff check src/
pytest
```
File details
Details for the file manualforge-0.1.2.tar.gz.
File metadata
- Download URL: manualforge-0.1.2.tar.gz
- Upload date:
- Size: 36.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c594070712b220229b445611f5e418947c6e42fc799ae487b025aa7def64de73 |
| MD5 | 897e7d4d244e5127469e405bba20fa87 |
| BLAKE2b-256 | 5b48f32c86d8d267bc67c54660e2e476c15dc07eb32375dece1e76135b982716 |
File details
Details for the file manualforge-0.1.2-py3-none-any.whl.
File metadata
- Download URL: manualforge-0.1.2-py3-none-any.whl
- Upload date:
- Size: 36.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2af054120b0718f9735e74b7c31932f01e194fa71bf17447d1a4604c16e1bd5a |
| MD5 | 060b2d1558412f4c715e742e9dd67772 |
| BLAKE2b-256 | 712f796c9c319ecfcf2880a4c53107cbb191b1fcb29d48ab43b986b898f99883 |