ManualForge

A configuration-driven framework for generating management manuals. Define your data sources, fields, and templates in YAML, and get a formatted report.

Built on Kedro pipelines with Polars for data processing and Typst for document rendering.

Philosophy

ManualForge separates what you want to produce from how it's produced.

  • What: Defined in conf/base/parameters_manualforge.yml — your data sources, expected columns, standardization rules, sort orders, summary dimensions, and report templates.
  • How: Implemented by the pipeline nodes — reusable data processing functions that read from your config.

To create a new manual for a different domain, you only need to edit the config file (and optionally provide new templates). No Python code changes are required.

Features

Capability                   Description
Multi-sheet Excel ingestion  Auto-detect headers, filter cover sheets, merge into structured DataFrames
Field standardization        Mapping files + exact matching + fuzzy matching (difflib / duckdb)
Config-driven summaries      Define group-by dimensions, sort orders, ability categories, and output paths in YAML
Typst report generation      Jinja2 templates → Typst source → PDF compilation
Pipeline hooks               Shell command hooks at pipeline/node granularity for pre/post processing

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Copy and customize configuration
cp conf/examples/parameters_manualforge.yml.example conf/base/parameters_manualforge.yml
cp conf/examples/catalog.yml.example          conf/base/catalog.yml
cp conf/examples/hooks.yml.example            conf/base/hooks.yml
cp conf/examples/parameters.yml.example       conf/base/parameters.yml
cp conf/examples/credentials.yml.example      conf/local/credentials.yml

# 3. Edit the config files to point to your data sources
#    (conf/base/ is gitignored — your real configs stay local)

# 4. Run the pipeline
kedro run

# Run specific node groups
kedro run --tags conversion        # Excel → Parquet only
kedro run --tags standardization   # Standardization only
kedro run --tags csv               # Summary tables only

Project Structure

├── conf/
│   ├── base/                          # ★ Gitignored — copy from examples/
│   │   ├── parameters_manualforge.yml # Central project configuration
│   │   ├── catalog.yml                # Kedro data catalog
│   │   ├── hooks.yml                  # Pipeline hooks (shell commands)
│   │   └── parameters.yml             # Pipeline parameters
│   ├── examples/                      # ★ Tracked example templates
│   │   ├── parameters_manualforge.yml.example
│   │   ├── catalog.yml.example
│   │   ├── hooks.yml.example
│   │   ├── parameters.yml.example
│   │   └── credentials.yml.example
│   ├── local/                         # Local-only (gitignored)
│   │   └── credentials.yml
│   └── logging.yml
├── data/                              # Gitignored except .gitkeep
│   ├── 01_raw/                        # Raw Excel/CSV + mapping files
│   ├── 02_intermediate/              # Parquet, reconcile reports
│   ├── 03_primary/                   # Standardized data
│   ├── 04_feature/                   # Summary tables (CSV + Markdown)
│   └── 08_reporting/                 # Typst sources & compiled PDFs
├── scripts/                          # Auxiliary scripts
│   ├── convert_csv_to_md.py          # CSV → Markdown conversion
│   ├── extract_rule_field_mapping.py # Rule field extraction
│   ├── extract_rule_overview.py      # Rule overview extraction
│   └── render_with_forge.py          # Markdown → DOCX/PDF rendering
├── src/manualforge/                  # Framework source code
│   ├── config.py                     # Configuration helper utilities
│   ├── hooks.py                      # Kedro pipeline hooks
│   ├── io/                           # Custom Kedro datasets (PolarsExcelDataset)
│   ├── pipelines/                    # Pipeline definitions & node functions
│   └── settings.py                   # Kedro project settings
├── templates/                        # Jinja2 Typst templates
│   └── report.typ.j2
├── pyproject.toml                    # Project metadata & dependencies
└── requirements.txt

Configuration Guide

The central configuration file is conf/base/parameters_manualforge.yml. Copy it from conf/examples/ and customize the sections described below:

1. Data Sources

Define your Excel files, expected headers, and sheet filtering rules:

datasources:
  primary_data:
    filepath: "data/01_raw/your_data.xlsx"
    sheet:
      exclude_names: ["封面", "封皮"]   # cover-sheet names to skip (both mean "cover" in Chinese)
      name_becomes_column: "sheet_name"
    header_detection:
      mode: keyword_match
      expected_headers:
        - "column_a"
        - "column_b"
    cleaning:
      drop_rows_where:
        column_a: ["column_a"]   # drop residual header rows
      fill_null: forward
      deduplicate: true
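
To make the header_detection and cleaning settings above more concrete, here is a minimal sketch in plain Python over lists of rows. It is not ManualForge's actual implementation (which is built on Polars): the function names find_header_row and clean_rows are hypothetical, and both the matching rule for keyword_match and the order of the cleaning steps are assumptions.

```python
from typing import Optional


def find_header_row(rows: list[list[Optional[str]]], expected: list[str]) -> int:
    """Return the index of the first row that contains every expected header."""
    for i, row in enumerate(rows):
        cells = {str(c).strip() for c in row if c is not None}
        if all(h in cells for h in expected):
            return i
    raise ValueError(f"no row contains all expected headers: {expected}")


def clean_rows(rows, expected, drop_rows_where):
    """Detect the header, drop residual header rows, forward-fill, deduplicate."""
    start = find_header_row(rows, expected)
    header = [str(c).strip() for c in rows[start]]
    records = [dict(zip(header, r)) for r in rows[start + 1:]]

    # drop_rows_where: remove rows whose value in a column is one of the listed values
    records = [
        rec for rec in records
        if not any(rec.get(col) in bad for col, bad in drop_rows_where.items())
    ]

    # fill_null: forward -> carry the last non-null value down each column
    last = {}
    for rec in records:
        for col in header:
            if rec.get(col) is None:
                rec[col] = last.get(col)
            else:
                last[col] = rec[col]

    # deduplicate: true -> keep the first occurrence of each identical row
    seen, unique = set(), []
    for rec in records:
        key = tuple(rec[c] for c in header)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


raw = [
    ["Cover page title", None],   # junk row above the real header
    ["column_a", "column_b"],     # detected header row
    ["x", "1"],
    [None, "2"],                  # null to be forward-filled
    ["column_a", "column_b"],     # residual header row to drop
    ["x", "1"],                   # duplicate row
]
cleaned = clean_rows(raw, ["column_a", "column_b"], {"column_a": ["column_a"]})
print(cleaned)
```

In this sketch the residual header row is dropped before forward filling, so a repeated header never leaks into the filled values.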

2. Field Standardization

Define which fields to standardize, their mapping files, and special corrections:

standardization:
  fields:
    - name: "dept_name"
      mapping_file: "data/01_raw/dept_list"
      case_corrections:
        wrong_name: "correct_name"
      special_mappings:
        alias: "canonical_name"
      fuzzy:
        enabled: true
        threshold: 0.8
        method: difflib             # difflib | duckdb
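
The standardization settings above can be pictured as a chain of lookups. The following is a rough sketch using only the standard library, assuming the corrections are tried first, then special mappings, then exact matching, then difflib fuzzy matching at the configured threshold. The function name standardize, the order of precedence, and the sample department names are assumptions for illustration, not the framework's confirmed behavior.

```python
import difflib


def standardize(value, canonical, case_corrections=None, special_mappings=None,
                fuzzy_enabled=True, threshold=0.8):
    """Return (standardized_value, match_kind); the value is None if unresolved."""
    case_corrections = case_corrections or {}
    special_mappings = special_mappings or {}
    text = value.strip()

    if text in case_corrections:        # known wrong spellings
        return case_corrections[text], "case_correction"
    if text in special_mappings:        # aliases mapped to canonical names
        return special_mappings[text], "special_mapping"
    if text in canonical:               # exact match against the mapping list
        return text, "exact"
    if fuzzy_enabled:                   # closest candidate at or above the threshold
        hits = difflib.get_close_matches(text, canonical, n=1, cutoff=threshold)
        if hits:
            return hits[0], "fuzzy"
    return None, "unmatched"


# Hypothetical canonical list, standing in for the contents of a mapping file.
canonical_names = ["Human Resources", "Finance", "Engineering"]

print(standardize("Finance", canonical_names))
print(standardize("Human Resource", canonical_names))
print(standardize("HR", canonical_names, special_mappings={"HR": "Human Resources"}))
print(standardize("Legal", canonical_names))
```

Returning the match kind alongside the value makes it easy to list fuzzy and unmatched entries for manual review.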

3. Sort Orders

Define reusable sort order lists referenced by summaries:

sort_orders:
  model_names:
    - "Model A"
    - "Model B"
  dep_names:
    - "HR"
    - "Finance"
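
A sort order list like the ones above can be applied with a simple ranking key. This is an illustrative sketch in plain Python: the helper name order_key is hypothetical, and placing unlisted values last in alphabetical order is an assumption about how such values might be handled.

```python
def order_key(order):
    """Build a sort key: listed values keep their list position, others go last."""
    rank = {name: i for i, name in enumerate(order)}
    return lambda value: (rank.get(value, len(order)), str(value))


sort_orders = {"dep_names": ["HR", "Finance"]}
departments = ["Finance", "Legal", "HR", "Audit"]

ordered = sorted(departments, key=order_key(sort_orders["dep_names"]))
print(ordered)
```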

4. Summaries

Define what summary tables to generate:

summaries:
  my_summary:
    description: "Fields grouped by model and department"
    group_by: ["model", "department"]
    struct_columns: ["module", "system", "field_name"]
    sort_by:
      department: dep_names
    output:
      csv: "data/04_feature/my_summary.csv"
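
As a rough illustration of what a summary definition like the one above describes, the sketch below groups records by the group_by columns, collects the struct_columns of each record into a per-group list, and orders the groups using a named sort order. It uses plain Python rather than Polars; the function name build_summary, the "items" key, and the sample records are all assumptions made for this example.

```python
from collections import defaultdict


def build_summary(records, group_by, struct_columns, sort_by, sort_orders):
    """Group records and collect the struct columns of each group into a list."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[c] for c in group_by)
        groups[key].append({c: rec[c] for c in struct_columns})

    def group_sort_key(key):
        parts = []
        for col, value in zip(group_by, key):
            # Columns without a configured sort order fall back to alphabetical.
            order = sort_orders.get(sort_by.get(col), [])
            rank = order.index(value) if value in order else len(order)
            parts.append((rank, str(value)))
        return parts

    return [
        {**dict(zip(group_by, key)), "items": groups[key]}
        for key in sorted(groups, key=group_sort_key)
    ]


records = [
    {"model": "Model A", "department": "Finance",
     "module": "m1", "system": "s1", "field_name": "f1"},
    {"model": "Model A", "department": "HR",
     "module": "m2", "system": "s2", "field_name": "f2"},
    {"model": "Model A", "department": "HR",
     "module": "m3", "system": "s3", "field_name": "f3"},
]

summary = build_summary(
    records,
    group_by=["model", "department"],
    struct_columns=["module", "system", "field_name"],
    sort_by={"department": "dep_names"},
    sort_orders={"dep_names": ["HR", "Finance"]},
)
for row in summary:
    print(row["model"], row["department"], len(row["items"]))
```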

5. Reports

Define report templates and output:

reports:
  my_report:
    description: "Rules cookbook"
    template_source: inline
    data_source: rules_data
    output_typ: "data/08_reporting/output.typ"
    typst_compile:
      enabled: true
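
The report step (template → Typst source → PDF) can be sketched as follows. To stay dependency-free, this example substitutes the standard library's string.Template for Jinja2, so it shows the shape of the flow rather than the framework's real templating. The names render_typst and write_and_compile and the sample rule texts are hypothetical; the only external tool assumed is the Typst CLI, whose `typst compile <file>` command writes a PDF next to the source.

```python
import shutil
import subprocess
from pathlib import Path
from string import Template

# Stand-in for a Jinja2 template: a Typst heading followed by a bullet list.
TYPST_TEMPLATE = Template("= $title\n\n$body\n")


def render_typst(title, rules):
    """Render Typst source text from a title and a list of rule descriptions."""
    body = "\n".join(f"- {rule}" for rule in rules)
    return TYPST_TEMPLATE.substitute(title=title, body=body)


def write_and_compile(source, output_typ, compile_enabled=True):
    """Write the .typ file and compile it to PDF if the Typst CLI is available."""
    output_typ = Path(output_typ)
    output_typ.parent.mkdir(parents=True, exist_ok=True)
    output_typ.write_text(source, encoding="utf-8")
    if compile_enabled and shutil.which("typst"):
        subprocess.run(["typst", "compile", str(output_typ)], check=True)
        return output_typ.with_suffix(".pdf")
    return None  # compilation disabled or Typst CLI not installed


source = render_typst("Rules cookbook", ["First rule", "Second rule"])
print(source)
```

Calling write_and_compile(source, "data/08_reporting/output.typ") would then write the file and, when typst is on the PATH, return the path of the compiled PDF.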

Data Layers

Layer         Directory              Description
Raw           data/01_raw/           Source Excel/CSV files, mapping files
Intermediate  data/02_intermediate/  Parquet, reconcile reports
Primary       data/03_primary/       Standardized data
Feature       data/04_feature/       Summary tables (CSV + Markdown)
Reporting     data/08_reporting/     Typst sources & PDF output

Requirements

  • Python >= 3.10
  • Typst CLI (for PDF compilation)

Development

pip install -e ".[dev]"
ruff check src/
pytest
