ManualForge
Configuration-driven framework for generating management manuals. Define your data sources, fields, and templates in YAML and get a formatted report.
Built on the ManualForge framework, with Kedro pipelines, Polars for data processing, and Typst for document rendering.
Philosophy
ManualForge separates what you want to produce from how it is produced.
- What: defined in conf/base/parameters_manualforge.yml, which holds your data sources, expected columns, standardization rules, sort orders, summary dimensions, and report templates.
- How: implemented by the pipeline nodes, which are reusable data processing functions that read their behavior from your config.
To create a new manual for a different domain, you only need to edit the config file (and optionally provide new templates). No Python code changes are required.
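
The split between "what" and "how" can be illustrated with a minimal sketch. It is not ManualForge's actual node code: the function `select_columns` and both config dicts are hypothetical, and the config is written as an inline dict instead of being loaded from YAML so the example stays self-contained.

```python
from typing import Any


def select_columns(rows: list[dict[str, Any]], params: dict[str, Any]) -> list[dict[str, Any]]:
    """Generic node: keep only the columns named in the config."""
    wanted = params["expected_headers"]
    return [{col: row.get(col) for col in wanted} for row in rows]


rows = [{"dept": "HR", "owner": "Li", "budget": 10}]

# Two hypothetical domain configs. The function above never changes;
# only the "what" (the config) differs between the two manuals.
hr_manual = {"expected_headers": ["dept", "owner"]}
finance_manual = {"expected_headers": ["dept", "budget"]}

print(select_columns(rows, hr_manual))
print(select_columns(rows, finance_manual))
```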
Features
| Capability | Description |
|---|---|
| Multi-sheet Excel ingestion | Auto-detect headers, filter cover sheets, merge into structured DataFrames. |
| Field standardization | Mapping files + exact matching + fuzzy matching (difflib / duckdb). |
| Config-driven summaries | Define group-by dimensions, sort orders, ability categories, and output paths in YAML. |
| Typst report generation | Jinja2 templates → Typst source → PDF compilation. |
| Pipeline hooks | Shell command hooks at pipeline/node granularity for pre/post processing. |
Quick Start
    # 1. Install dependencies
    pip install -r requirements.txt
    # 2. Copy and customize configuration
    cp conf/examples/parameters_manualforge.yml.example conf/base/parameters_manualforge.yml
    cp conf/examples/catalog.yml.example conf/base/catalog.yml
    cp conf/examples/hooks.yml.example conf/base/hooks.yml
    cp conf/examples/parameters.yml.example conf/base/parameters.yml
    cp conf/examples/credentials.yml.example conf/local/credentials.yml
    # 3. Edit the config files to point to your data sources
    #    (conf/base/ is gitignored, so your real configs stay local)
    # 4. Run the pipeline
    kedro run
    # Run specific node groups
    kedro run --tags conversion       # Excel → Parquet only
    kedro run --tags standardization  # Standardization only
    kedro run --tags csv              # Summary tables only
Project Structure
    ├── conf/
    │   ├── base/                               # ★ Gitignored; copy from examples/
    │   │   ├── parameters_manualforge.yml      # Central project configuration
    │   │   ├── catalog.yml                     # Kedro data catalog
    │   │   ├── hooks.yml                       # Pipeline hooks (shell commands)
    │   │   └── parameters.yml                  # Pipeline parameters
    │   ├── examples/                           # ★ Tracked example templates
    │   │   ├── parameters_manualforge.yml.example
    │   │   ├── catalog.yml.example
    │   │   ├── hooks.yml.example
    │   │   ├── parameters.yml.example
    │   │   └── credentials.yml.example
    │   ├── local/                              # Local-only (gitignored)
    │   │   └── credentials.yml
    │   └── logging.yml
    ├── data/                                   # Gitignored except .gitkeep
    │   ├── 01_raw/                             # Raw Excel/CSV + mapping files
    │   ├── 02_intermediate/                    # Parquet, reconcile reports
    │   ├── 03_primary/                         # Standardized data
    │   ├── 04_feature/                         # Summary tables (CSV + Markdown)
    │   └── 08_reporting/                       # Typst sources & compiled PDFs
    ├── scripts/                                # Auxiliary scripts
    │   ├── convert_csv_to_md.py                # CSV → Markdown conversion
    │   ├── extract_rule_field_mapping.py       # Rule field extraction
    │   ├── extract_rule_overview.py            # Rule overview extraction
    │   └── render_with_forge.py                # Markdown → DOCX/PDF rendering
    ├── src/manualforge/                        # Framework source code
    │   ├── config.py                           # Configuration helper utilities
    │   ├── hooks.py                            # Kedro pipeline hooks
    │   ├── io/                                 # Custom Kedro datasets (PolarsExcelDataset)
    │   ├── pipelines/                          # Pipeline definitions & node functions
    │   └── settings.py                         # Kedro project settings
    ├── templates/                              # Jinja2 Typst templates
    │   └── report.typ.j2
    ├── pyproject.toml                          # Project metadata & dependencies
    └── requirements.txt
Configuration Guide
The central configuration file is conf/base/parameters_manualforge.yml. Copy it from conf/examples/ and customize it.
1. Data Sources
Define your Excel files, expected headers, and sheet filtering rules:
    datasources:
      primary_data:
        filepath: "data/01_raw/your_data.xlsx"
        sheet:
          exclude_names: ["封面", "封皮"]  # cover sheets to skip
          name_becomes_column: "sheet_name"
        header_detection:
          mode: keyword_match
          expected_headers:
            - "column_a"
            - "column_b"
        cleaning:
          drop_rows_where:
            column_a: ["column_a"]  # drop residual header rows
          fill_null: forward
          deduplicate: true
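
The header detection and cleaning options above could behave roughly as in the sketch below. It uses plain Python lists rather than Polars so that it runs without dependencies. The helper names `find_header_row` and `clean_sheet` are hypothetical, and the order of the cleaning steps (drop, then forward fill, then deduplicate) is an assumption, not something the configuration states.

```python
from typing import Optional

Row = list[Optional[str]]


def find_header_row(rows: list[Row], expected_headers: list[str]) -> int:
    """Return the index of the first row that contains every expected header."""
    for i, row in enumerate(rows):
        cells = {c for c in row if c is not None}
        if all(h in cells for h in expected_headers):
            return i
    raise ValueError("no row matches the expected headers")


def clean_sheet(rows: list[Row], expected_headers: list[str],
                drop_rows_where: dict[str, list[str]]) -> list[dict]:
    start = find_header_row(rows, expected_headers)
    header = rows[start]
    records, seen, last = [], set(), {}
    for row in rows[start + 1:]:
        record = {h: v for h, v in zip(header, row) if h is not None}
        # drop_rows_where: skip rows whose cell repeats a listed value
        if any(record.get(col) in bad for col, bad in drop_rows_where.items()):
            continue
        # fill_null: forward (reuse the previous row's value for empty cells)
        for col in list(record):
            if record[col] is None:
                record[col] = last.get(col)
        last = record
        # deduplicate: keep only the first occurrence of each full row
        key = tuple(record.values())
        if key not in seen:
            seen.add(key)
            records.append(record)
    return records


sheet = [
    ["Annual report", None],      # title row above the real header
    ["column_a", "column_b"],
    ["HR", "x"],
    [None, "y"],                  # column_a is forward-filled to "HR"
    ["column_a", "column_b"],     # residual header row, dropped
    ["HR", "y"],                  # duplicate of the filled row, dropped
]
cleaned = clean_sheet(sheet, ["column_a", "column_b"], {"column_a": ["column_a"]})
print(cleaned)
```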
2. Field Standardization
Define which fields to standardize, their mapping files, and special corrections:
    standardization:
      fields:
        - name: "dept_name"
          mapping_file: "data/01_raw/dept_list"
          case_corrections:
            wrong_name: "correct_name"
          special_mappings:
            alias: "canonical_name"
          fuzzy:
            enabled: true
            threshold: 0.8
            method: difflib  # difflib | duckdb
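
As an illustration of how these options might combine, here is a minimal sketch using the standard library's `difflib.get_close_matches`. The function `standardize` and the sample names are hypothetical, and the order in which corrections, exact matching, and fuzzy matching are applied is an assumption about the framework, not a description of its code.

```python
import difflib
from typing import Optional


def standardize(value: str, canonical: list[str],
                case_corrections: dict[str, str],
                special_mappings: dict[str, str],
                threshold: float = 0.8) -> Optional[str]:
    """Map a raw value to a canonical name, or None if nothing matches."""
    # Explicit corrections and aliases take priority over any matching.
    value = case_corrections.get(value, value)
    value = special_mappings.get(value, value)
    # Exact match against the mapping list.
    if value in canonical:
        return value
    # Fuzzy match: best candidate whose similarity ratio is >= threshold.
    matches = difflib.get_close_matches(value, canonical, n=1, cutoff=threshold)
    return matches[0] if matches else None


canonical = ["Human Resources", "Finance", "Engineering"]
case_corrections = {"finance": "Finance"}
special_mappings = {"HR": "Human Resources"}

for raw in ["finance", "HR", "Enginering", "Legal"]:
    print(raw, "->", standardize(raw, canonical, case_corrections, special_mappings))
```

Values that return `None` here are the ones a reconcile report would need to surface for manual review.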
3. Sort Orders
Define reusable sort order lists that summaries can reference by name:
    sort_orders:
      model_names:
        - "Model A"
        - "Model B"
      dep_names:
        - "HR"
        - "Finance"
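
A sort order list like the ones above can be applied by ranking each value by its position in the list. The sketch below is one way to do that. The helper `sort_by_order` is hypothetical, and placing unlisted values at the end in alphabetical order is an assumption.

```python
def sort_by_order(values: list[str], order: list[str]) -> list[str]:
    """Sort values by their position in `order`; unlisted values go last."""
    rank = {name: i for i, name in enumerate(order)}
    return sorted(values, key=lambda v: (rank.get(v, len(order)), v))


sort_orders = {"dep_names": ["HR", "Finance"]}
departments = ["Finance", "Legal", "HR", "Audit"]
print(sort_by_order(departments, sort_orders["dep_names"]))
```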
4. Summaries
Define which summary tables to generate:
    summaries:
      my_summary:
        description: "Fields grouped by model and department"
        group_by: ["model", "department"]
        struct_columns: ["module", "system", "field_name"]
        sort_by:
          department: dep_names
        output:
          csv: "data/04_feature/my_summary.csv"
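
The sketch below shows one plausible reading of a summary definition: rows are grouped by the `group_by` columns, the `struct_columns` of each row are collected per group, and groups are ordered using the named sort order. The function `build_summary` and the output key `fields` are hypothetical, the interpretation of `struct_columns` is an assumption, and writing the CSV output is left out.

```python
from collections import defaultdict


def build_summary(rows: list[dict], spec: dict, sort_orders: dict) -> list[dict]:
    group_cols = spec["group_by"]
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups[key].append({c: row[c] for c in spec["struct_columns"]})

    def sort_key(key: tuple) -> list[int]:
        # Rank each sorted column by its position in the referenced order list.
        parts = []
        for col, order_name in spec.get("sort_by", {}).items():
            order = sort_orders[order_name]
            value = key[group_cols.index(col)]
            parts.append(order.index(value) if value in order else len(order))
        return parts

    return [
        {**dict(zip(group_cols, key)), "fields": groups[key]}
        for key in sorted(groups, key=sort_key)
    ]


spec = {
    "group_by": ["model", "department"],
    "struct_columns": ["module", "system", "field_name"],
    "sort_by": {"department": "dep_names"},
}
sort_orders = {"dep_names": ["HR", "Finance"]}
rows = [
    {"model": "Model A", "department": "Finance", "module": "m1", "system": "s1", "field_name": "amount"},
    {"model": "Model A", "department": "HR", "module": "m2", "system": "s2", "field_name": "name"},
    {"model": "Model A", "department": "HR", "module": "m2", "system": "s2", "field_name": "grade"},
]
summary = build_summary(rows, spec, sort_orders)
for group in summary:
    print(group["model"], group["department"], len(group["fields"]))
```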
5. Reports
Define report templates and outputs:
    reports:
      my_report:
        description: "Rules cookbook"
        template_source: inline
        data_source: rules_data
        output_typ: "data/08_reporting/output.typ"
        typst_compile:
          enabled: true
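
The report step (template → Typst source → PDF) can be sketched as follows. To stay dependency-free, this example substitutes the standard library's `string.Template` for Jinja2, so it is a stand-in for the templating stage and not the framework's renderer. The function names are hypothetical. The only external command used is `typst compile <input> <output>`, and it runs only when the Typst CLI is found on the PATH.

```python
import shutil
import subprocess
from pathlib import Path
from string import Template

# Stand-in for a Jinja2 template: a Typst heading followed by a bullet list.
TYP_TEMPLATE = Template("= $title\n\n$body\n")


def render_typst(title: str, items: list[str]) -> str:
    """Render Typst source text from a title and a list of items."""
    body = "\n".join(f"- {item}" for item in items)
    return TYP_TEMPLATE.substitute(title=title, body=body)


def compile_command(typ_path: Path) -> list[str]:
    """Build the Typst CLI call that compiles a .typ file to a sibling PDF."""
    return ["typst", "compile", str(typ_path), str(typ_path.with_suffix(".pdf"))]


def write_report(title: str, items: list[str], typ_path: Path,
                 compile_pdf: bool = True) -> None:
    typ_path.parent.mkdir(parents=True, exist_ok=True)
    typ_path.write_text(render_typst(title, items), encoding="utf-8")
    # Mirrors typst_compile.enabled: compile only if requested and available.
    if compile_pdf and shutil.which("typst"):
        subprocess.run(compile_command(typ_path), check=True)


print(render_typst("Rules cookbook", ["Rule one", "Rule two"]))
```

Calling `write_report("Rules cookbook", items, Path("data/08_reporting/output.typ"))` would write the source file and, if Typst is installed, the PDF next to it. Real data would also need Typst special characters escaped, which this sketch does not do.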
Data Layers
| Layer | Directory | Description |
|---|---|---|
| Raw | data/01_raw/ | Source Excel/CSV files, mapping files |
| Intermediate | data/02_intermediate/ | Parquet, reconcile reports |
| Primary | data/03_primary/ | Standardized data |
| Feature | data/04_feature/ | Summary tables (CSV + Markdown) |
| Reporting | data/08_reporting/ | Typst sources & PDF output |
Requirements
- Python >= 3.10
- ManualForge: core framework for config-driven manual generation
- Typst CLI (for PDF compilation)
Development
    pip install -e ".[dev]"
    ruff check src/
    pytest
File details
Details for the file manualforge-0.1.4.tar.gz.
File metadata
- Download URL: manualforge-0.1.4.tar.gz
- Upload date:
- Size: 46.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f9526706c403bbdefb423f05b2bd9be06cad3fb742092d4d53816a419fe66519 |
| MD5 | 7170f14068fcb46d362bbbde1048254c |
| BLAKE2b-256 | 03ec707e5690242ea26e78b85ff5abbb13450bcf86c0159f6b5b7b6ec8a6d1bf |
File details
Details for the file manualforge-0.1.4-py3-none-any.whl.
File metadata
- Download URL: manualforge-0.1.4-py3-none-any.whl
- Upload date:
- Size: 45.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aed7cd21e75262adaa9e11e2c6699105b27449b9bf5daaffaaa40e63004810a5 |
| MD5 | 22dfee01608994297a642e2d47a1ea24 |
| BLAKE2b-256 | 1b64d4d284d4146dc801b83ece41361c1ba36dc1dc9704ec9b56d35570322559 |