ManualForge

Configuration-driven framework for generating management manuals. Define your data sources, fields, and templates in YAML and get a formatted report.

Built on the ManualForge framework with Kedro pipelines, Polars for data processing, and Typst for document rendering.

Philosophy

ManualForge separates what you want to produce from how it is produced.

  • What: defined in conf/base/parameters_manualforge.yml (your data sources, expected columns, standardization rules, sort orders, summary dimensions, and report templates).
  • How: implemented by the pipeline nodes, which are reusable data processing functions that read their behavior from your config.

To create a new manual for a different domain, you only need to edit the config file (and optionally provide new templates). No Python code changes are required.

Features

Capability | Description
--- | ---
Multi-sheet Excel ingestion | Auto-detect headers, filter cover sheets, merge into structured DataFrames.
Field standardization | Mapping files, exact matching, and fuzzy matching (difflib or duckdb).
Config-driven summaries | Define group-by dimensions, sort orders, ability categories, and output paths in YAML.
Typst report generation | Jinja2 templates → Typst source → PDF compilation.
Pipeline hooks | Shell command hooks at pipeline or node granularity for pre- and post-processing.

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Copy and customize configuration
cp conf/examples/parameters_manualforge.yml.example conf/base/parameters_manualforge.yml
cp conf/examples/catalog.yml.example          conf/base/catalog.yml
cp conf/examples/hooks.yml.example            conf/base/hooks.yml
cp conf/examples/parameters.yml.example       conf/base/parameters.yml
cp conf/examples/credentials.yml.example      conf/local/credentials.yml

# 3. Edit the config files to point to your data sources
#    (conf/base/ is gitignored, so your real configs stay local)

# 4. Run the pipeline
kedro run

# Run specific node groups
kedro run --tags conversion        # Excel → Parquet only
kedro run --tags standardization   # Standardization only
kedro run --tags csv               # Summary tables only

Project Structure

├── conf/
│   ├── base/                          # ★ Gitignored; copy from examples/
│   │   ├── parameters_manualforge.yml # Central project configuration
│   │   ├── catalog.yml                # Kedro data catalog
│   │   ├── hooks.yml                  # Pipeline hooks (shell commands)
│   │   └── parameters.yml             # Pipeline parameters
│   ├── examples/                      # ★ Tracked example templates
│   │   ├── parameters_manualforge.yml.example
│   │   ├── catalog.yml.example
│   │   ├── hooks.yml.example
│   │   ├── parameters.yml.example
│   │   └── credentials.yml.example
│   ├── local/                         # Local-only (gitignored)
│   │   └── credentials.yml
│   └── logging.yml
├── data/                              # Gitignored except .gitkeep
│   ├── 01_raw/                        # Raw Excel/CSV + mapping files
│   ├── 02_intermediate/               # Parquet, reconciliation reports
│   ├── 03_primary/                    # Standardized data
│   ├── 04_feature/                    # Summary tables (CSV + Markdown)
│   └── 08_reporting/                  # Typst sources and compiled PDFs
├── scripts/                           # Auxiliary scripts
│   ├── convert_csv_to_md.py           # CSV → Markdown conversion
│   ├── extract_rule_field_mapping.py  # Rule field extraction
│   ├── extract_rule_overview.py       # Rule overview extraction
│   └── render_with_forge.py           # Markdown → DOCX/PDF rendering
├── src/manualforge/                   # Framework source code
│   ├── config.py                      # Configuration helper utilities
│   ├── hooks.py                       # Kedro pipeline hooks
│   ├── io/                            # Custom Kedro datasets (PolarsExcelDataset)
│   ├── pipelines/                     # Pipeline definitions and node functions
│   └── settings.py                    # Kedro project settings
├── templates/                         # Jinja2 Typst templates
│   └── report.typ.j2
├── pyproject.toml                     # Project metadata and dependencies
└── requirements.txt

Configuration Guide

The central configuration file is conf/base/parameters_manualforge.yml. Copy it from conf/examples/ and customize it.

1. Data Sources

Define your Excel files, expected headers, and sheet filtering rules:

datasources:
  primary_data:
    filepath: "data/01_raw/your_data.xlsx"
    sheet:
      exclude_names: ["封面", "封皮"]
      name_becomes_column: "sheet_name"
    header_detection:
      mode: keyword_match
      expected_headers:
        - "column_a"
        - "column_b"
    cleaning:
      drop_rows_where:
        column_a: ["column_a"]   # drop residual header rows
      fill_null: forward
      deduplicate: true
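
The keys above suggest a loading flow: skip excluded sheets, find the header row, tag each record with its sheet name, then clean. The following is a minimal pure-Python sketch of that flow, not the framework's implementation (which uses Polars). The function names find_header_row and load_sheets are hypothetical, and the exact semantics of keyword_match, fill_null, and deduplicate are assumptions made for illustration.

```python
def find_header_row(rows, expected_headers):
    """Return the index of the first row that contains every expected header."""
    wanted = set(expected_headers)
    for i, row in enumerate(rows):
        cells = {str(c).strip() for c in row if c is not None}
        if wanted <= cells:
            return i
    return None


def load_sheets(workbook, cfg):
    """workbook: dict mapping sheet name -> list of rows (lists of cell values)."""
    sheet_cfg = cfg["sheet"]
    expected = cfg["header_detection"]["expected_headers"]
    cleaning = cfg.get("cleaning", {})
    records = []
    for name, rows in workbook.items():
        if name in sheet_cfg.get("exclude_names", []):
            continue  # cover sheets carry no data
        start = find_header_row(rows, expected)
        if start is None:
            continue  # assumption: sheets without a header row are skipped
        header = [str(c).strip() for c in rows[start]]
        last_seen = {}
        for row in rows[start + 1:]:
            rec = dict(zip(header, row))
            # Drop rows whose value in a column is one of the listed values
            # (for example a header row repeated in the middle of the sheet).
            drop = cleaning.get("drop_rows_where", {})
            if any(rec.get(col) in bad for col, bad in drop.items()):
                continue
            if cleaning.get("fill_null") == "forward":
                for col in header:
                    if rec.get(col) is None:
                        rec[col] = last_seen.get(col)
                    else:
                        last_seen[col] = rec[col]
            rec[sheet_cfg["name_becomes_column"]] = name
            records.append(rec)
    if cleaning.get("deduplicate"):
        seen, unique = set(), []
        for rec in records:
            key = tuple(sorted(rec.items()))  # assumes hashable cell values
            if key not in seen:
                seen.add(key)
                unique.append(rec)
        records = unique
    return records
```

Forward fill is reset per sheet here so that values never leak from one sheet into the next; whether the framework does the same is not stated in the source.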

2. Field Standardization

Define which fields to standardize, their mapping files, and special corrections:

standardization:
  fields:
    - name: "dept_name"
      mapping_file: "data/01_raw/dept_list"
      case_corrections:
        wrong_name: "correct_name"
      special_mappings:
        alias: "canonical_name"
      fuzzy:
        enabled: true
        threshold: 0.8
        method: difflib             # difflib | duckdb
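
One plausible reading of this config is a three-stage lookup: apply explicit corrections and aliases, try an exact match against the canonical names, then fall back to fuzzy matching. The sketch below illustrates that order using the standard library's difflib. The function name standardize is hypothetical, the stage order is an assumption, and the duckdb method is not shown.

```python
import difflib


def standardize(value, canonical, field_cfg):
    """Map a raw value to a canonical name, or return None if nothing matches."""
    value = value.strip()
    # 1. Explicit corrections and aliases take priority.
    value = field_cfg.get("case_corrections", {}).get(value, value)
    value = field_cfg.get("special_mappings", {}).get(value, value)
    # 2. Exact match against the canonical list (loaded from the mapping file).
    if value in canonical:
        return value
    # 3. Fuzzy match: best candidate whose similarity ratio meets the threshold.
    fuzzy = field_cfg.get("fuzzy", {})
    if fuzzy.get("enabled") and fuzzy.get("method", "difflib") == "difflib":
        matches = difflib.get_close_matches(
            value, canonical, n=1, cutoff=fuzzy.get("threshold", 0.8)
        )
        if matches:
            return matches[0]
    return None


canonical = ["Human Resources", "Finance", "Engineering"]
field_cfg = {
    "special_mappings": {"HR": "Human Resources"},
    "fuzzy": {"enabled": True, "threshold": 0.8, "method": "difflib"},
}
print(standardize("HR", canonical, field_cfg))       # alias lookup
print(standardize("Finanse", canonical, field_cfg))  # fuzzy match
print(standardize("Legal", canonical, field_cfg))    # no match
```

Returning None for unmatched values keeps them visible for review instead of silently passing raw strings through; a reconciliation report could then list them.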

3. Sort Orders

Define reusable sort order lists that summaries can reference:

sort_orders:
  model_names:
    - "Model A"
    - "Model B"
  dep_names:
    - "HR"
    - "Finance"
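
A sort order list like these can be turned into a sort key by ranking each value by its position in the list. The sketch below shows one way to do that; the helper name order_key is hypothetical, and placing unlisted values last in alphabetical order is an assumption, since the source does not say how they are handled.

```python
def order_key(order):
    """Build a sort key that follows a configured order list."""
    rank = {name: i for i, name in enumerate(order)}

    def key(value):
        # Listed values sort by position; unlisted values go last, alphabetically.
        return (rank.get(value, len(rank)), str(value))

    return key


sort_orders = {"dep_names": ["HR", "Finance"]}
departments = ["Finance", "Ops", "HR", "Legal"]
print(sorted(departments, key=order_key(sort_orders["dep_names"])))
# → ['HR', 'Finance', 'Legal', 'Ops']
```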

4. Summaries

Define which summary tables to generate:

summaries:
  my_summary:
    description: "Fields grouped by model and department"
    group_by: ["model", "department"]
    struct_columns: ["module", "system", "field_name"]
    sort_by:
      department: dep_names
    output:
      csv: "data/04_feature/my_summary.csv"
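
To make the roles of these keys concrete, the sketch below groups rows by the group_by columns, collects the struct_columns of each row into a per-group list, and orders the groups using the named sort order. It is an illustration in plain Python rather than the framework's Polars code: the function name build_summary is hypothetical, and the interpretation of struct_columns as "fields collected per group" is an assumption. Writing the CSV output is omitted.

```python
from collections import defaultdict


def build_summary(rows, summary_cfg, sort_orders):
    """Group rows and return one record per group, in configured order."""
    group_cols = summary_cfg["group_by"]
    struct_cols = summary_cfg["struct_columns"]

    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups[key].append({c: row[c] for c in struct_cols})

    def sort_key(key):
        # Rank each sorted column by its position in the referenced order list;
        # values missing from the list sort after the listed ones.
        ranks = []
        for col, order_name in summary_cfg.get("sort_by", {}).items():
            order = sort_orders[order_name]
            value = key[group_cols.index(col)]
            ranks.append(order.index(value) if value in order else len(order))
        return (ranks, [str(v) for v in key])

    return [
        {**dict(zip(group_cols, key)), "items": groups[key]}
        for key in sorted(groups, key=sort_key)
    ]
```

Called with the my_summary config above and sort_orders containing dep_names, the groups for "HR" come before those for "Finance", regardless of the order in which the rows arrive.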

5. Reports

Define report templates and output:

reports:
  my_report:
    description: "Rules cookbook"
    template_source: inline
    data_source: rules_data
    output_typ: "data/08_reporting/output.typ"
    typst_compile:
      enabled: true
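
The report step follows a render-then-compile pattern: fill a template to produce Typst source, write it to output_typ, and optionally invoke the Typst CLI. The sketch below illustrates the pattern only. To stay within the standard library it uses string.Template as a stand-in for Jinja2, the function names render_typst and write_report are hypothetical, and placing the PDF next to the .typ file is an assumption. It does not escape characters that are special in Typst markup.

```python
import shutil
import subprocess
from pathlib import Path
from string import Template

# Stand-in template: a Typst heading followed by a bullet list.
TEMPLATE = Template("= $title\n\n$body\n")


def render_typst(title, rules):
    """Render Typst source from a title and a list of rule strings."""
    body = "\n".join(f"- {rule}" for rule in rules)
    return TEMPLATE.substitute(title=title, body=body)


def write_report(report_cfg, title, rules):
    """Write the .typ file and compile it to PDF when enabled and available."""
    typ_path = Path(report_cfg["output_typ"])
    typ_path.parent.mkdir(parents=True, exist_ok=True)
    typ_path.write_text(render_typst(title, rules), encoding="utf-8")

    compile_cfg = report_cfg.get("typst_compile", {})
    # Only compile if requested and the typst executable is on PATH.
    if compile_cfg.get("enabled") and shutil.which("typst"):
        pdf_path = typ_path.with_suffix(".pdf")
        subprocess.run(["typst", "compile", str(typ_path), str(pdf_path)], check=True)
    return typ_path
```

Keeping the Typst source on disk, rather than piping it straight to the compiler, makes a failed compilation easier to debug because the generated markup can be inspected directly.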

Data Layers

Layer | Directory | Description
--- | --- | ---
Raw | data/01_raw/ | Source Excel/CSV files, mapping files
Intermediate | data/02_intermediate/ | Parquet, reconciliation reports
Primary | data/03_primary/ | Standardized data
Feature | data/04_feature/ | Summary tables (CSV + Markdown)
Reporting | data/08_reporting/ | Typst sources and PDF output

Requirements

  • Python >= 3.10
  • ManualForge (core framework for config-driven manual generation)
  • Typst CLI (for PDF compilation)

Development

pip install -e ".[dev]"
ruff check src/
pytest
