
ManualForge

A configuration-driven framework for generating management manuals. Define your data sources, fields, and templates in YAML, and ManualForge produces a formatted report.

Built on Kedro pipelines with Polars for data processing and Typst for document rendering.

Philosophy

ManualForge separates what you want to produce from how it's produced.

  • What: Defined in conf/base/parameters_manualforge.yml — your data sources, expected columns, standardization rules, sort orders, summary dimensions, and report templates.
  • How: Implemented by the pipeline nodes — reusable data processing functions that read from your config.

To create a manual for a different domain, you only need to edit the config file (and, optionally, provide new templates). No Python code changes are required.

Features

Capability                  | Description
--------------------------- | -----------
Multi-sheet Excel ingestion | Auto-detect headers, filter cover sheets, merge into structured DataFrames
Field standardization       | Mapping files + exact matching + fuzzy matching (difflib / duckdb)
Config-driven summaries     | Define group-by dimensions, sort orders, ability categories, and output paths in YAML
Typst report generation     | Jinja2 templates → Typst source → PDF compilation
Pipeline hooks              | Shell command hooks at pipeline/node granularity for pre/post processing
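The format of hooks.yml is not documented here, so the following is only a sketch of how shell-command hooks keyed by pipeline and node events might be dispatched. The config shape (`pipeline` / `nodes` keys with `before` / `after` lists) and the `ShellCommandHooks` class are hypothetical. The method names mirror Kedro's hook specs, but the Kedro `@hook_impl` decorator and registration in `settings.py` are left out so the sketch runs without Kedro installed.

```python
import subprocess
from typing import Any, Callable, Mapping, Sequence


def run_shell(cmd: str) -> None:
    """Run one configured command; check=True raises on a non-zero exit status."""
    subprocess.run(cmd, shell=True, check=True)


class ShellCommandHooks:
    """Fire shell commands around pipeline and node events (hypothetical sketch).

    Assumed config shape:
      {"pipeline": {"before": [...], "after": [...]},
       "nodes": {"<node name>": {"before": [...], "after": [...]}}}
    """

    def __init__(self, config: Mapping[str, Any], runner: Callable[[str], None] = run_shell):
        self._config = config
        self._runner = runner  # injectable so dispatch can be tested without a shell

    def _fire(self, commands: Sequence[str]) -> None:
        for cmd in commands:
            self._runner(cmd)

    def _node_commands(self, node_name: str, when: str) -> Sequence[str]:
        return self._config.get("nodes", {}).get(node_name, {}).get(when, [])

    # A real Kedro hook would decorate these with @hook_impl and receive Kedro
    # objects (for example a Node, reading node.name) instead of plain strings.
    def before_pipeline_run(self) -> None:
        self._fire(self._config.get("pipeline", {}).get("before", []))

    def after_pipeline_run(self) -> None:
        self._fire(self._config.get("pipeline", {}).get("after", []))

    def before_node_run(self, node_name: str) -> None:
        self._fire(self._node_commands(node_name, "before"))

    def after_node_run(self, node_name: str) -> None:
        self._fire(self._node_commands(node_name, "after"))
```

Passing `runner=calls.append` (with `calls = []`) records the commands instead of executing them, which makes the dispatch order easy to check.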

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Copy and customize configuration
cp conf/examples/parameters_manualforge.yml.example conf/base/parameters_manualforge.yml
cp conf/examples/catalog.yml.example          conf/base/catalog.yml
cp conf/examples/hooks.yml.example            conf/base/hooks.yml
cp conf/examples/parameters.yml.example       conf/base/parameters.yml
cp conf/examples/credentials.yml.example      conf/local/credentials.yml

# 3. Edit the config files to point to your data sources
#    (conf/base/ is gitignored — your real configs stay local)

# 4. Run the pipeline
kedro run

# Run specific node groups
kedro run --tags conversion        # Excel → Parquet only
kedro run --tags standardization   # Standardization only
kedro run --tags csv               # Summary tables only
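If you prefer to script step 2 (for example in CI), a small Python helper can make the same copies while refusing to overwrite configs you have already customized. This is a sketch, not part of ManualForge: `bootstrap_config` and `CONFIG_COPIES` are hypothetical names, and the paths are the ones from the `cp` commands above.

```python
import shutil
from pathlib import Path

# Example file -> destination pairs, taken from step 2 of the Quick Start.
CONFIG_COPIES = {
    "conf/examples/parameters_manualforge.yml.example": "conf/base/parameters_manualforge.yml",
    "conf/examples/catalog.yml.example": "conf/base/catalog.yml",
    "conf/examples/hooks.yml.example": "conf/base/hooks.yml",
    "conf/examples/parameters.yml.example": "conf/base/parameters.yml",
    "conf/examples/credentials.yml.example": "conf/local/credentials.yml",
}


def bootstrap_config(project_root: Path) -> list[Path]:
    """Copy each example config into place, skipping destinations that already exist.

    Returns the destination paths that were actually written.
    """
    written = []
    for src_rel, dst_rel in CONFIG_COPIES.items():
        src = project_root / src_rel
        dst = project_root / dst_rel
        if dst.exists():
            continue  # never clobber a customized local config
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        written.append(dst)
    return written
```

Running it a second time is a no-op, because every destination already exists.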

Project Structure

├── conf/
│   ├── base/                          # ★ Gitignored — copy from examples/
│   │   ├── parameters_manualforge.yml # Central project configuration
│   │   ├── catalog.yml                # Kedro data catalog
│   │   ├── hooks.yml                  # Pipeline hooks (shell commands)
│   │   └── parameters.yml             # Pipeline parameters
│   ├── examples/                      # ★ Tracked example templates
│   │   ├── parameters_manualforge.yml.example
│   │   ├── catalog.yml.example
│   │   ├── hooks.yml.example
│   │   ├── parameters.yml.example
│   │   └── credentials.yml.example
│   ├── local/                         # Local-only (gitignored)
│   │   └── credentials.yml
│   └── logging.yml
├── data/                              # Gitignored except .gitkeep
│   ├── 01_raw/                        # Raw Excel/CSV + mapping files
│   ├── 02_intermediate/              # Parquet, reconcile reports
│   ├── 03_primary/                   # Standardized data
│   ├── 04_feature/                   # Summary tables (CSV + Markdown)
│   └── 08_reporting/                 # Typst sources & compiled PDFs
├── scripts/                          # Auxiliary scripts
│   ├── convert_csv_to_md.py          # CSV → Markdown conversion
│   ├── extract_rule_field_mapping.py # Rule field extraction
│   ├── extract_rule_overview.py      # Rule overview extraction
│   └── render_with_forge.py          # Markdown → DOCX/PDF rendering
├── src/manualforge/                  # Framework source code
│   ├── config.py                     # Configuration helper utilities
│   ├── hooks.py                      # Kedro pipeline hooks
│   ├── io/                           # Custom Kedro datasets (PolarsExcelDataset)
│   ├── pipelines/                    # Pipeline definitions & node functions
│   └── settings.py                   # Kedro project settings
├── templates/                        # Jinja2 Typst templates
│   └── report.typ.j2
├── pyproject.toml                    # Project metadata & dependencies
└── requirements.txt

Configuration Guide

The central configuration file is conf/base/parameters_manualforge.yml. Copy it from conf/examples/parameters_manualforge.yml.example and customize the following sections:

1. Data Sources

Define your Excel files, expected headers, and sheet filtering rules:

datasources:
  primary_data:
    filepath: "data/01_raw/your_data.xlsx"
    sheet:
      exclude_names: ["封面", "封皮"]
      name_becomes_column: "sheet_name"
    header_detection:
      mode: keyword_match
      expected_headers:
        - "column_a"
        - "column_b"
    cleaning:
      drop_rows_where:
        column_a: ["column_a"]   # drop residual header rows
      fill_null: forward
      deduplicate: true
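The exact semantics of these keys live in the pipeline nodes, which use Polars. The plain-Python sketch below shows one plausible reading of them, under these assumptions: `keyword_match` picks the first row containing every expected header, `drop_rows_where` removes rows whose value matches a listed value, `fill_null: forward` forward-fills each expected column across the merged rows, and `deduplicate` keeps the first of any identical records. `find_header_row` and `load_sheets` are hypothetical names, and `sheets` stands in for an already-read workbook (sheet name → list of rows).

```python
from typing import Any, Optional

Row = list[Any]


def find_header_row(rows: list[Row], expected: list[str]) -> Optional[int]:
    """keyword_match (assumed): first row whose cells include every expected header."""
    wanted = set(expected)
    for i, row in enumerate(rows):
        cells = {str(c).strip() for c in row if c is not None}
        if wanted <= cells:
            return i
    return None


def load_sheets(sheets: dict[str, list[Row]], cfg: dict[str, Any]) -> list[dict[str, Any]]:
    """Merge data rows from all non-excluded sheets, then apply the cleaning rules."""
    sheet_cfg = cfg.get("sheet", {})
    expected = cfg["header_detection"]["expected_headers"]
    cleaning = cfg.get("cleaning", {})
    sheet_col = sheet_cfg.get("name_becomes_column")
    records: list[dict[str, Any]] = []

    for sheet_name, rows in sheets.items():
        if sheet_name in sheet_cfg.get("exclude_names", []):
            continue  # e.g. cover sheets
        header_idx = find_header_row(rows, expected)
        if header_idx is None:
            continue  # no recognizable header row: not a data sheet
        header = ["" if c is None else str(c).strip() for c in rows[header_idx]]
        for raw in rows[header_idx + 1:]:
            by_name = dict(zip(header, raw))
            rec = {col: by_name.get(col) for col in expected}  # expected columns only
            if sheet_col:
                rec[sheet_col] = sheet_name
            records.append(rec)

    # drop_rows_where: remove rows whose value is one of the listed values
    for col, bad_values in cleaning.get("drop_rows_where", {}).items():
        records = [r for r in records if r.get(col) not in bad_values]

    # fill_null: forward carries the last non-null value down each expected column
    if cleaning.get("fill_null") == "forward":
        last: dict[str, Any] = {}
        for r in records:
            for col in expected:
                if r[col] is None:
                    r[col] = last.get(col)
                else:
                    last[col] = r[col]

    # deduplicate: keep the first occurrence of each identical record
    if cleaning.get("deduplicate"):
        seen: set[tuple] = set()
        unique = []
        for r in records:
            key = tuple(r.items())
            if key not in seen:
                seen.add(key)
                unique.append(r)
        records = unique
    return records
```

Note that this sketch forward-fills across sheet boundaries; whether ManualForge does the same is not stated here.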

2. Field Standardization

Define which fields to standardize, their mapping files, and special corrections:

standardization:
  fields:
    - name: "dept_name"
      mapping_file: "data/01_raw/dept_list"
      case_corrections:
        wrong_name: "correct_name"
      special_mappings:
        alias: "canonical_name"
      fuzzy:
        enabled: true
        threshold: 0.8
        method: difflib             # difflib | duckdb
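One plausible precedence for these rules is sketched below using the standard library's `difflib.get_close_matches` (the `duckdb` method is not shown). The order (case corrections, then special mappings, then exact match, then fuzzy match) and the function name `standardize_value` are assumptions, not taken from the ManualForge source; `canonical` stands for the names loaded from `mapping_file`. Returning how each value matched is the kind of detail a reconcile report could use.

```python
import difflib
from typing import Optional


def standardize_value(
    value: str,
    canonical: list[str],
    case_corrections: dict[str, str],
    special_mappings: dict[str, str],
    fuzzy_threshold: Optional[float] = 0.8,
) -> tuple[Optional[str], str]:
    """Map a raw value to a canonical name; return (result, how_it_matched)."""
    v = value.strip()
    v = case_corrections.get(v, v)  # fix known wrong spellings first
    if v in special_mappings:
        return special_mappings[v], "special"
    if v in canonical:
        return v, "exact"
    if fuzzy_threshold is not None:
        # cutoff is a SequenceMatcher.ratio() threshold in [0, 1]
        hits = difflib.get_close_matches(v, canonical, n=1, cutoff=fuzzy_threshold)
        if hits:
            return hits[0], "fuzzy"
    return None, "unmatched"
```

Passing `fuzzy_threshold=None` corresponds to `fuzzy.enabled: false` in this sketch.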

3. Sort Orders

Define reusable sort order lists referenced by summaries:

sort_orders:
  model_names:
    - "Model A"
    - "Model B"
  dep_names:
    - "HR"
    - "Finance"
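A sort-order list can be turned into a sort key that ranks listed values by their position. This is a sketch: how ManualForge treats values missing from the list is not stated, so here they are assumed to sort after the listed ones, alphabetically. `order_key` is a hypothetical helper.

```python
from typing import Any, Callable, Sequence


def order_key(order: Sequence[str]) -> Callable[[Any], tuple[int, int, str]]:
    """Listed values first, in list order; unlisted values after, alphabetically."""
    rank = {name: i for i, name in enumerate(order)}

    def key(value: Any) -> tuple[int, int, str]:
        if value in rank:
            return (0, rank[value], "")
        return (1, 0, str(value))

    return key


sort_orders = {"dep_names": ["HR", "Finance"]}
departments = ["Legal", "Finance", "HR", "Audit"]
print(sorted(departments, key=order_key(sort_orders["dep_names"])))
# ['HR', 'Finance', 'Audit', 'Legal']
```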

4. Summaries

Define what summary tables to generate:

summaries:
  my_summary:
    description: "Fields grouped by model and department"
    group_by: ["model", "department"]
    struct_columns: ["module", "system", "field_name"]
    sort_by:
      department: dep_names
    output:
      csv: "data/04_feature/my_summary.csv"
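How ManualForge groups and serializes summaries is not described here. The standard-library sketch below assumes that each combination of `group_by` values collects its `struct_columns` into a list of records, serialized as a JSON string in an `items` column (an invented convention), and that `sort_by` maps a column to a named list in `sort_orders`, with unlisted values placed after the listed ones. `build_summary` is hypothetical and returns the CSV text; writing it to `output.csv` is left to the caller.

```python
import csv
import io
import json
from collections import defaultdict
from typing import Any


def build_summary(rows: list[dict[str, Any]], spec: dict[str, Any],
                  sort_orders: dict[str, list[str]]) -> str:
    """Group rows, nest struct columns per group, sort groups, and render CSV text."""
    group_by = spec["group_by"]
    struct_cols = spec["struct_columns"]
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for r in rows:
        groups[tuple(r[c] for c in group_by)].append({c: r[c] for c in struct_cols})

    def rank(col: str, value: Any) -> tuple[int, str]:
        # Columns without a sort_by entry fall back to alphabetical order.
        order = sort_orders.get(spec.get("sort_by", {}).get(col, ""), [])
        return (order.index(value), "") if value in order else (len(order), str(value))

    keys = sorted(groups, key=lambda k: tuple(rank(c, v) for c, v in zip(group_by, k)))
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([*group_by, "items"])
    for k in keys:
        writer.writerow([*k, json.dumps(groups[k], ensure_ascii=False)])
    return buf.getvalue()
```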

5. Reports

Define report templates and output:

reports:
  my_report:
    description: "Rules cookbook"
    template_source: inline
    data_source: rules_data
    output_typ: "data/08_reporting/output.typ"
    typst_compile:
      enabled: true
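ManualForge renders reports with Jinja2 templates. The sketch below swaps in the standard library's `string.Template` so it runs without extra dependencies, and shows the two steps the `typst_compile` block implies: render Typst source, then invoke `typst compile <input> <output>`. The template text, `render_report`, `typst_escape`, and `compile_typ` are all hypothetical and are not the contents of `templates/report.typ.j2`.

```python
import shutil
import subprocess
from pathlib import Path
from string import Template
from typing import Optional

# Stand-in for a Jinja2 template; renders a heading and a two-column Typst table.
REPORT_TEMPLATE = Template("""= $title

#table(
  columns: 2,
  [*Rule*], [*Description*],
$rows
)
""")

_TYPST_SPECIAL = "\\#[]*_$@<`"


def typst_escape(text: str) -> str:
    """Backslash-escape a handful of Typst markup characters (not exhaustive)."""
    return "".join("\\" + ch if ch in _TYPST_SPECIAL else ch for ch in text)


def render_report(title: str, rules: list[tuple[str, str]]) -> str:
    rows = "\n".join(f"  [{typst_escape(name)}], [{typst_escape(desc)}]," for name, desc in rules)
    return REPORT_TEMPLATE.substitute(title=typst_escape(title), rows=rows)


def compile_typ(typ_path: Path) -> Optional[Path]:
    """Compile with the Typst CLI if it is on PATH; return the PDF path, else None."""
    if shutil.which("typst") is None:
        return None
    pdf_path = typ_path.with_suffix(".pdf")
    subprocess.run(["typst", "compile", str(typ_path), str(pdf_path)], check=True)
    return pdf_path
```

Skipping compilation when the CLI is missing keeps the `.typ` output usable on machines without Typst; a stricter pipeline might raise instead.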

Data Layers

Layer        | Directory             | Description
------------ | --------------------- | -----------
Raw          | data/01_raw/          | Source Excel/CSV files, mapping files
Intermediate | data/02_intermediate/ | Parquet, reconcile reports
Primary      | data/03_primary/      | Standardized data
Feature      | data/04_feature/      | Summary tables (CSV + Markdown)
Reporting    | data/08_reporting/    | Typst sources & PDF output

Requirements

  • Python >= 3.10
  • Typst CLI (for PDF compilation)

Development

pip install -e ".[dev]"
ruff check src/
pytest



Download files


Source Distribution

manualforge-0.1.2.tar.gz (36.2 kB)

Built Distribution


manualforge-0.1.2-py3-none-any.whl (36.8 kB)

File details

Details for the file manualforge-0.1.2.tar.gz.

File metadata

  • Download URL: manualforge-0.1.2.tar.gz
  • Upload date:
  • Size: 36.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for manualforge-0.1.2.tar.gz
Algorithm   | Hash digest
----------- | -----------
SHA256      | c594070712b220229b445611f5e418947c6e42fc799ae487b025aa7def64de73
MD5         | 897e7d4d244e5127469e405bba20fa87
BLAKE2b-256 | 5b48f32c86d8d267bc67c54660e2e476c15dc07eb32375dece1e76135b982716


File details

Details for the file manualforge-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: manualforge-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 36.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for manualforge-0.1.2-py3-none-any.whl
Algorithm   | Hash digest
----------- | -----------
SHA256      | 2af054120b0718f9735e74b7c31932f01e194fa71bf17447d1a4604c16e1bd5a
MD5         | 060b2d1558412f4c715e742e9dd67772
BLAKE2b-256 | 712f796c9c319ecfcf2880a4c53107cbb191b1fcb29d48ab43b986b898f99883

