A configuration-driven management manual generation framework based on Kedro pipelines with Polars and Typst.
ManualForge
Configuration-driven management manual generation framework. 配置驱动的管理手册生成框架。 Define your data sources, fields, and templates in YAML — get a formatted report. 在 YAML 中定义数据源、字段和模板,即可生成格式化报告。
Built on Kedro pipelines with Polars for data processing and Typst for document rendering. 基于 Kedro 流水线 + Polars 数据处理 + Typst 文档渲染。
ManualForge is a reusable Python package (pip install manualforge). For a real-world downstream application, see modelmanual — a Chinese regulatory manual generator that extends ManualForge with rule-code field mapping and reconciliation reporting.
ManualForge 是一个可复用的 Python 包(pip install manualforge)。实际下游应用示例见 modelmanual —— 一个基于 ManualForge 的中文法规手册生成器,扩展了规则代码字段映射和核对报告功能。
Philosophy / 设计理念
ManualForge separates what you want to produce from how it's produced. ManualForge 将「要生成什么」与「如何生成」解耦。
- What / 内容: Defined in conf/base/parameters_manualforge.yml: your data sources, expected columns, standardization rules, sort orders, summary dimensions, and report templates. 在配置文件中定义数据源、期望列、标准化规则、排序、汇总维度和报告模板。
- How / 方法: Implemented by the pipeline nodes, which are reusable data processing functions that read from your config. 由流水线节点实现:可复用的数据处理函数,读取配置驱动行为。
To create a new manual for a different domain, you only need to edit the config file (and optionally provide new templates). No Python code changes required. 要为新领域创建手册,只需编辑配置文件(可选提供新模板),无需修改 Python 代码。
Features / 功能
| Capability / 能力 | Description / 说明 |
|---|---|
| Multi-sheet Excel ingestion / 多表 Excel 读取 | Auto-detect headers, filter cover sheets, merge into structured DataFrames. 自动检测表头,过滤封面页,合并为结构化 DataFrame。 |
| Field standardization / 字段标准化 | Mapping files + exact matching + fuzzy matching (difflib / duckdb). 映射文件 + 精确匹配 + 模糊匹配。 |
| Config-driven summaries / 配置驱动汇总 | Define group-by dimensions, sort orders, ability categories, and output paths in YAML. 在 YAML 中定义分组维度、排序、能力类别和输出路径。 |
| Typst report generation / Typst 报告生成 | Jinja2 templates → Typst source → PDF compilation. Jinja2 模板 → Typst 源码 → PDF 编译。 |
| Pipeline hooks / 流水线钩子 | Shell command hooks at pipeline/node granularity for pre/post processing. 流水线/节点粒度的 shell 命令钩子,用于前后处理。 |
| Auto-backup / 自动备份 | Pre-run config snapshot + post-run data backup via hooks. 跑前配置快照 + 跑后数据备份,通过 hooks 自动触发。 |
| Config deploy / 配置部署 | cfg-backup / cfg-deploy — backup, restore, and deploy configs from templates. 备份、恢复和从模板部署配置文件。 |
Quick Start / 快速开始
# 1. Install dependencies / 安装依赖
pip install -r requirements.txt
# 2. Copy and customize configuration / 复制并自定义配置
# Option A: interactive deployment / 交互式部署
./scripts/cfg-deploy --from-examples
# Option B: manual copy / 手动复制
cp conf/examples/parameters_manualforge.yml.example conf/base/parameters_manualforge.yml
cp conf/examples/catalog.yml.example conf/base/catalog.yml
cp conf/examples/hooks.yml.example conf/base/hooks.yml
cp conf/examples/parameters.yml.example conf/base/parameters.yml
cp conf/examples/credentials.yml.example conf/local/credentials.yml
# 3. Edit the config files to point to your data sources
# 编辑配置文件,指向你的数据源
# (conf/base/ is gitignored — your real configs stay local)
# (conf/base/ 已 gitignore — 实际配置保存在本地)
# 4. Run the pipeline / 运行流水线
kedro run
# Run specific node groups / 运行特定节点组
kedro run --tags conversion # Excel → Parquet only / 仅 Excel → Parquet
kedro run --tags standardization # Standardization only / 仅标准化
kedro run --tags csv # Summary tables only / 仅汇总表
Backup & Config Management / 备份与配置管理
Auto-backup via hooks (runs on every kedro run):
通过 hooks 自动备份(每次 kedro run 自动触发):
kedro run
├─ [before_pipeline] cfg-backup ← snapshot conf/base/
└─ [after_pipeline] backup_data.sh ← snapshot pipeline output data
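The hook flow above can be sketched as a small runner that executes shell commands before and after a pipeline callable. This is a minimal stdlib-only illustration, not ManualForge's hook implementation: the dictionary shape, the function names `run_hooks` and `run_with_hooks`, and the `echo` placeholder commands are assumptions, and the real hooks.yml schema may differ.

```python
import subprocess

# Hypothetical hook configuration; the real hooks.yml schema may differ.
HOOKS = {
    "before_pipeline": ["echo snapshot-config"],
    "after_pipeline": ["echo snapshot-data"],
}


def run_hooks(stage: str, hooks: dict[str, list[str]]) -> list[str]:
    """Run every shell command registered for a stage and return its stdout."""
    outputs = []
    for command in hooks.get(stage, []):
        # check=True raises CalledProcessError if a hook exits non-zero.
        result = subprocess.run(
            command, shell=True, check=True, capture_output=True, text=True
        )
        outputs.append(result.stdout.strip())
    return outputs


def run_with_hooks(pipeline, hooks: dict[str, list[str]]) -> list[str]:
    """Wrap a callable 'pipeline' with before/after hook execution."""
    log = run_hooks("before_pipeline", hooks)
    pipeline()
    log += run_hooks("after_pipeline", hooks)
    return log
```

Because `check=True` is set, a failing pre-run hook stops the run before the pipeline starts. Whether that is the desired behavior for a backup step is a design choice.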
Manual backup/restore/deploy: 手动备份/恢复/部署:
# Config backup / 配置备份
./scripts/cfg-backup # backup conf/base/ → conf/.backups/
./scripts/cfg-backup -l # list existing backups
# Config restore / deploy from examples / 配置恢复 / 从模板部署
./scripts/cfg-deploy # interactive menu | 交互菜单
./scripts/cfg-deploy -l # list config backups
./scripts/cfg-deploy -r 20260617_105645 # restore specific backup | 恢复指定备份
./scripts/cfg-deploy --from-examples # deploy fresh templates | 从模板部署
./scripts/cfg-deploy --from-examples --dry-run # preview | 预览
# Data backup / 数据备份
./scripts/backup_data.sh # backup pipeline output → data/.backups/
./scripts/backup_data.sh -k 5 # keep only last 5 backups
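The retention behavior of the -k option can be illustrated with a short Python sketch. It assumes each backup is a directory under the backup root whose name is a timestamp like 20260617_105645, so that names sort chronologically. The function name `prune_backups` is hypothetical and the actual shell script may work differently.

```python
from pathlib import Path
import shutil


def prune_backups(backup_root: Path, keep: int) -> list[Path]:
    """Delete all but the newest `keep` backup directories; return removed paths."""
    # Timestamped names such as 20260617_105645 sort chronologically as strings.
    backups = sorted(p for p in backup_root.iterdir() if p.is_dir())
    # backups[:-0] would be empty, so keep == 0 is handled explicitly.
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        shutil.rmtree(path)
    return stale
```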
Project Structure / 项目结构
├── conf/
│ ├── base/ # ★ Gitignored — copy from examples/ | 从 examples/ 复制
│ │ ├── parameters_manualforge.yml # Central project configuration | 项目中心配置
│ │ ├── catalog.yml # Kedro data catalog | 数据目录
│ │ ├── hooks.yml # Pipeline hooks (shell commands) | 流水线钩子
│ │ └── parameters.yml # Pipeline parameters | 流水线参数
│ ├── examples/ # ★ Tracked example templates | 版本追踪的示例模板
│ │ ├── parameters_manualforge.yml.example
│ │ ├── catalog.yml.example
│ │ ├── hooks.yml.example
│ │ ├── parameters.yml.example
│ │ └── credentials.yml.example
│ ├── local/ # Local-only (gitignored) | 仅本地 (gitignored)
│ │ └── credentials.yml
│ └── logging.yml
├── data/ # Gitignored except .gitkeep | 除 .gitkeep 外均 gitignored
│ ├── 01_raw/ # Raw Excel/CSV + mapping files | 原始数据 + 映射文件
│ ├── 02_intermediate/ # Parquet, reconcile reports | 中间数据、核对报告
│ ├── 03_primary/ # Standardized data | 标准化后数据
│ ├── 04_feature/ # Summary tables (CSV + Markdown) | 汇总表
│ └── 08_reporting/ # Typst sources & compiled PDFs | Typst 源码和 PDF
├── scripts/ # Auxiliary scripts | 辅助脚本
│ ├── backup_data.sh # ★ Backup pipeline output data | 备份管道输出数据
│ ├── cfg-backup # ★ Backup conf/base/ config | 备份配置文件
│ ├── cfg-deploy # ★ Deploy/restore configs | 部署/恢复配置
│ ├── main.sh # ★ Kedro runner with auto-backup | 带自动备份的启动脚本
│ ├── convert_csv_to_md.py # CSV → Markdown conversion | 转换
│ ├── extract_rule_field_mapping.py # Rule field extraction | 规则字段提取
│ ├── extract_rule_overview.py # Rule overview extraction | 规则概览提取
│ └── render_with_forge.py # Markdown → DOCX/PDF rendering | 渲染
├── src/manualforge/ # Framework source code | 框架源码
│ ├── config.py # Configuration helper utilities | 配置工具
│ ├── hooks.py # Kedro pipeline hooks (PipelineHooks base class) | 流水线钩子基类
│ ├── io/ # Custom Kedro datasets (PolarsExcelDataset) | 自定义数据集
│ ├── pipelines/
│ │ └── data_processing_pl/ # Core pipeline: 12 reusable nodes | 核心流水线:12 个可复用节点
│ │ ├── nodes.py # Node functions | 节点函数
│ │ ├── pipeline.py # Pipeline definition | 流水线定义
│ │ ├── rulecsv2typ.py # CSV → Typst/Jinja conversion | CSV → Typst/Jinja 转换
│ │ └── standardize_fields.py # Field standardization engine | 字段标准化引擎
│ ├── pipeline_registry.py # Pipeline registration | 流水线注册
│ ├── settings.py # Kedro project settings | 项目设置
│ └── __main__.py # CLI entry point | CLI 入口
├── templates/ # Jinja2 Typst templates | Jinja2 Typst 模板
│ └── recipe.typ.j2
├── pyproject.toml # Project metadata & dependencies | 项目元数据和依赖
└── requirements.txt
Configuration Guide / 配置指南
The central configuration file is conf/base/parameters_manualforge.yml. Copy parameters_manualforge.yml.example from conf/examples/ (dropping the .example suffix) and customize it.
核心配置文件为 conf/base/parameters_manualforge.yml。从 conf/examples/ 复制后进行自定义。
1. Data Sources / 数据源
Define your Excel files, expected headers, and sheet filtering rules. 定义 Excel 文件、期望表头和 Sheet 过滤规则:
datasources:
primary_data:
filepath: "data/01_raw/your_data.xlsx"
sheet:
exclude_names: ["封面", "封皮"]
name_becomes_column: "sheet_name"
header_detection:
mode: keyword_match
expected_headers:
- "column_a"
- "column_b"
cleaning:
drop_rows_where:
column_a: ["column_a"] # drop residual header rows | 删除残留表头行
fill_null: forward
deduplicate: true
2. Field Standardization / 字段标准化
Define which fields to standardize, their mapping files, and special corrections. 定义需要标准化的字段、映射文件和特殊修正:
standardization:
fields:
- name: "dept_name"
mapping_file: "data/01_raw/dept_list"
case_corrections:
wrong_name: "correct_name"
special_mappings:
alias: "canonical_name"
fuzzy:
enabled: true
threshold: 0.8
method: difflib # difflib | duckdb
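The matching order implied by this config (special mappings, then exact match, then fuzzy match above a threshold) can be sketched with the standard library's difflib. This is a hedged illustration only: the function name `standardize`, the order of the steps, and returning None for unmatched values are assumptions, and case_corrections and the duckdb method are left out.

```python
import difflib


def standardize(value, canonical, special_mappings=None, threshold=0.8):
    """Resolve a raw value: special mapping, then exact, then fuzzy match."""
    special_mappings = special_mappings or {}
    value = value.strip()
    if value in special_mappings:
        return special_mappings[value]
    if value in canonical:
        return value
    # get_close_matches keeps candidates whose similarity ratio >= cutoff
    # and returns them best-first; n=1 asks for the single best candidate.
    matches = difflib.get_close_matches(value, canonical, n=1, cutoff=threshold)
    return matches[0] if matches else None
```

A threshold of 0.8 tolerates small typos while rejecting unrelated strings; unmatched values are returned as None here so they can be reviewed rather than silently guessed.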
3. Sort Orders / 排序
Define reusable sort order lists referenced by summaries. 定义汇总引用的可复用排序列表:
sort_orders:
model_names:
- "Model A"
- "Model B"
dep_names:
- "HR"
- "Finance"
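Sorting by an explicit order list, rather than alphabetically, could be implemented along these lines. The sketch assumes records are plain dictionaries and that values missing from the list are placed last; the function name `sort_by_order` is hypothetical.

```python
def sort_by_order(records, column, order):
    """Sort records by the position of record[column] in `order`; unknowns go last."""
    rank = {name: index for index, name in enumerate(order)}
    # sorted() is stable, so records with equal rank keep their original order.
    return sorted(records, key=lambda r: rank.get(r[column], len(order)))
```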
4. Summaries / 汇总
Define what summary tables to generate. 定义要生成的汇总表:
summaries:
my_summary:
description: "Fields grouped by model and department"
group_by: ["model", "department"]
struct_columns: ["module", "system", "field_name"]
sort_by:
department: dep_names
output:
csv: "data/04_feature/my_summary.csv"
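One plausible reading of group_by and struct_columns is that rows are grouped by the group_by keys and the struct_columns of each row are collected into a list per group. The sketch below illustrates that reading in plain Python. It is an assumption about the semantics, and the function name `summarize` and the output key "items" are hypothetical.

```python
from collections import defaultdict


def summarize(records, group_by, struct_columns):
    """Group records by `group_by` and collect `struct_columns` per group."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[col] for col in group_by)
        groups[key].append({col: record[col] for col in struct_columns})
    # Groups appear in first-seen order because dicts preserve insertion order.
    return [
        {**dict(zip(group_by, key)), "items": items}
        for key, items in groups.items()
    ]
```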
5. Reports / 报告
Define report templates and output. 定义报告模板和输出:
reports:
my_report:
description: "Rules cookbook"
template_source: inline
data_source: rules_data
output_typ: "data/08_reporting/output.typ"
typst_compile:
enabled: true
Data Layers / 数据分层
| Layer / 层级 | Directory / 目录 | Description / 说明 |
|---|---|---|
| Raw / 原始 | data/01_raw/ | Source Excel/CSV files, mapping files / 源文件与映射文件 |
| Intermediate / 中间 | data/02_intermediate/ | Parquet, reconcile reports / Parquet 与核对报告 |
| Primary / 主数据 | data/03_primary/ | Standardized data / 标准化后数据 |
| Feature / 特征 | data/04_feature/ | Summary tables (CSV + Markdown) / 汇总表 |
| Reporting / 报告 | data/08_reporting/ | Typst sources & PDF output / Typst 源码与 PDF |
Requirements / 环境要求
- Python >= 3.10
- Typst CLI (for PDF compilation / 用于 PDF 编译)
Recent Changes / 近期变更
2026-06-29
- Template extraction (rulecsv2typ.py): Extracted inline RECIPE_TEMPLATE_LEVEL2 into standalone templates/recipe.typ.j2; loaded via _load_recipe_template() using Path(__file__).parents[4]. Removed orphaned templates/report.typ.j2.
2026-06-26
- Rule code parsing (nodes.py): Added _parse_rule_codes and _RULE_CODE_RE for GZW rule code extraction; rule-code column saved as List(String) without forward-fill; added "规则代码" to EXPECTED_HEADER in convert_excel_to_parquet_fj1 and process_attachment1_excel.
- Model name standardization (nodes.py): Added _load_model_mapping() and _normalize_model_name() for fuzzy-matching model names against a reference table; 概览 sheet now uses its own 模型名称 column instead of the sheet name.
- Department field (nodes.py): Added 主研部门 to fj2 EXPECTED_HEADER.
- Path resolution (nodes.py, standardize_fields.py): Changed Path(__file__).parents[4] to Path.cwd() so paths resolve correctly when ManualForge is used as an installed package (e.g., from modelmanual).
- Direct data access (rulecsv2typ.py): convert_rules_to_typst_jinja now reads from the Kedro catalog as a Polars DataFrame directly instead of round-tripping through CSV on disk.
Development / 开发
pip install -e ".[dev]"
ruff check src/
pytest