Extract xlsx templates with full visual fidelity and render data-driven reports in xlsx and PDF formats.

Maintainers

mindoffwork

Mindoff Dataport

Build high-fidelity Excel and PDF reports from reusable .xlsx templates.

Mindoff Dataport turns styled Excel workbooks into reusable report templates, compiles runtime data into a portable ReportBundle, and exports production-ready .xlsx and .pdf outputs while preserving layout, structure, and visual fidelity.

Source: https://github.com/mindoffwork/mindoff-dataport

Key Features

Template-First Report Generation
Turn real Excel workbooks into reusable report templates without rebuilding layouts in code.
Compile Once. Export Natively to XLSX and PDF.
Build reports once and export polished .xlsx and .pdf outputs from the same source with consistent fidelity.
Dataframes Plug Directly Into Templates
Connect dataframe inputs directly to templates so report generation fits naturally into modern data workflows.
Built for Large Exports Without Memory Bloat
Stream dataframe rows from Parquet in batches so that large exports do not need to hold every row in memory.
Flexible Repeating and Dynamic Sheets
Generate repeated sections and dynamic sheets for customer-wise, region-wise, or report-wise output from a single template system.
Runtime Layout Control Without Template Rework
Fine-tune output layout programmatically without redesigning the original workbook.

Documentation

Purpose
Install
Quick Start
Core Concepts
API Reference
Template Placeholders
Data Contract
Export Options
Dataframe Column Layout
Sizing Options
Supported Styling
Custom Fonts for PDF
ReportBundle Directory
Recipes
Current Scope
License

1. Purpose

Mindoff Dataport is built to turn Excel-based report designs into reusable, data-driven outputs with a template format that is convenient to create, review, and maintain.

Reuse existing Excel report layouts instead of rebuilding them from scratch in code.
Fill those layouts with live business data and keep the final output polished and presentation-ready.
Generate both Excel and PDF from the same report source, so teams do not maintain separate reporting flows.
Scale one template into many outputs, whether that means repeated sections, multiple sheets, or report variants for different audiences.
Handle large exports by keeping dataframe rows in Parquet and reading them in batches at export time.

2. Install

pip install mindoff-dataport

For dataframe support (required when passing Polars DataFrames or LazyFrames):

pip install "mindoff-dataport[polars]"

3. Quick Start

import polars as pl
from mindoff_dataport import mo_dataport

# 1. Extract the template
template = mo_dataport.extract("invoice_template.xlsx")

# 2. Inspect what the template requires
required_inputs = mo_dataport.inputs(template)
# {'Invoice': {'customer_name': 'string', 'invoice_number': 'number', 'line_items': 'dataframe'}}

# 3. Compile: bind data to the template
polars_dataframe = pl.DataFrame(
    {
        "item": ["Widget A", "Widget B"],
        "amount": [125, 275],
    }
)

bundle = mo_dataport.compile(
    template,
    data={
        "Invoice": {
            "customer_name": "Acme Industries",
            "invoice_number": 1024,
            "line_items": polars_dataframe,
        }
    },
)

# 4. Export to XLSX
mo_dataport.export(bundle, "invoice_filled.xlsx")

# 4b. Export to PDF
mo_dataport.export(bundle, "invoice_filled.pdf", format="pdf")

Examples

Clone the repo, install dependencies, then run any example directly:

git clone https://github.com/mindoffwork/mindoff-dataport
cd mindoff-dataport
pip install -e ".[polars]"

python examples/<name>/run.py

Each example folder contains template.xlsx, run.py, and data.parquet (where applicable). Output files are written to examples/<name>/output/ and are not tracked by git.

Example	What it shows
`basic/`	Minimal XLSX + PDF export from a parquet-backed template
`bundle_path/`	Compile to a persistent bundle directory, export later
`dataframe_options/`	Split `dataframe-header` / `dataframe-content` anchors with per-column occupation and alignment
`dataframe_shift/`	`dataframe_shift="both"` — dataframe expands right and down inside repeat blocks
`dynamic_sheets/`	One output sheet per data group using `{{key:sheet-name}}` expansion
`input_discovery/`	Introspect required template inputs before building a payload
`repeat_block/`	One repeat block per customer — per-block scalars and dataframes
`repeat_dataframe_headers/`	`repeat_dataframe_headers=True` — repeat column headers across paginated PDF blocks
`split_workbooks_streaming/`	`max_rows_per_workbook` — split large exports across multiple workbooks
`style_showcase/`	Full style coverage (font, fill, alignment, borders) exported via openpyxl, xlsxwriter, and PDF
`validation_errors/`	How validation errors surface before any file is written
`benchmark/`	Runtime and memory benchmarks vs. raw openpyxl / xlsxwriter / ReportLab

4. Core Concepts

Workflow

.xlsx template  ──extract()──►  WorkbookSchema
                                     │
                              compile(schema, data)
                                     │
                                     ▼
                              ReportBundle (directory)
                              ├── manifest.json
                              ├── report.json
                              └── data/*.parquet
                                     │
                            export(bundle, path, format=…)
                                     │
                             ┌───────┴───────┐
                          .xlsx           .pdf

Import Alias

The recommended entrypoint is:

from mindoff_dataport import mo_dataport

All four public functions are also importable at the top level:

from mindoff_dataport import (
    extract_template,
    get_template_inputs,
    compile_report_bundle,
    export_report_bundle,
)

mo_dataport.extract / mo_dataport.inputs / mo_dataport.compile / mo_dataport.export are short aliases for the same functions.

5. API Reference

Template Extraction API

Reads an .xlsx file and returns a WorkbookSchema containing cell styles, dimensions, merged regions, manual print breaks, and discovered placeholder types.

Usage

schema = mo_dataport.extract("template.xlsx")
# or
schema = extract_template("template.xlsx")

Parameter	Type	Required	Description
`path`	`str`	Yes	Path to the `.xlsx` template file

Returns: WorkbookSchema

Input Discovery API

Inspects the schema and returns a sheet-scoped dictionary of all inputs the template requires, keyed by sheet name and then by placeholder key.

Usage

contract = mo_dataport.inputs(schema)
# or
contract = get_template_inputs(schema)

Parameter	Type	Required	Description
`schema`	`WorkbookSchema`	Yes	Schema produced by `extract()`

Returns: dict[str, dict[str, str | list]]

Example output:

{
    "Sales Summary": {
        "report_title": "string",
        "generated_on": "date",
        "sales_rows": "dataframe",
    }
}
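
The contract can be compared against a payload before calling compile(). The sketch below is a hypothetical helper, not part of the library: it assumes the contract has the static-sheet shape shown above (sheet name, then placeholder key) and only reports missing keys; it does not check value types or dynamic sheet groups.

```python
def missing_inputs(contract: dict, payload: dict) -> dict:
    """Map each sheet name to the placeholder keys its payload does not supply."""
    missing = {}
    for sheet_name, required in contract.items():
        supplied = payload.get(sheet_name, {})
        absent = [key for key in required if key not in supplied]
        if absent:
            missing[sheet_name] = absent
    return missing


contract = {
    "Sales Summary": {
        "report_title": "string",
        "generated_on": "date",
        "sales_rows": "dataframe",
    }
}
payload = {"Sales Summary": {"report_title": "Q1 2026 Sales"}}
print(missing_inputs(contract, payload))
```

An empty result means every placeholder key in the contract has a matching payload entry.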

Bundle Compilation API

Binds runtime data to the template, validates all inputs against the sheet contract, materialises Polars DataFrames / LazyFrames to Parquet, and produces a ReportBundle.

Usage

bundle = mo_dataport.compile(
    template=schema,
    data=payload,
    bundle_path="out_bundle",
    dataframe_options=None,
    dataframe_shift="both",
)
# or
bundle = compile_report_bundle(schema, payload)

Parameter	Type	Required	Description
`template`	`WorkbookSchema`	Yes	Schema from `extract()`
`data`	`dict[str, Any]`	Yes	Sheet-scoped payload. See Data Contract
`bundle_path`	`str | None`	No	If provided, writes the bundle as a directory at this path. Omit for in-memory only
`dataframe_options`	`dict[str, Any] | None`	No	Per-sheet, per-placeholder dataframe layout overrides. See Dataframe Column Layout
`dataframe_shift`	`str`	No	How normal-sheet template cells/merges move around dataframe output: `"both"`, `"horizontal"`, `"vertical"`, or `"none"`

Returns: ReportBundle

Raises: KeyError if a required placeholder key is missing from the payload.

Bundle Export API

Renders the bundle to a file. Accepts an in-memory ReportBundle or a path to a persisted bundle directory.

Usage

mo_dataport.export(bundle, "report.xlsx", format="xlsx")
# or
export_report_bundle("out_bundle", "report.pdf", format="pdf")

Parameter	Type	Required	Default	Description
`bundle_or_path`	`ReportBundle | str`	Yes	-	In-memory bundle or path to a bundle directory
`output_path`	`str`	Yes	-	Destination file path (`.xlsx` or `.pdf`)
`format`	`str`	No	`"xlsx"`	Output format: `"xlsx"`, `"pdf"`. (`"image"` is reserved; raises `NotImplementedError`)
`**options`	-	No	-	Sizing and format-specific options. See Export Options

Returns: None for "fidelity" XLSX and all PDF exports. list[str] for "streaming" XLSX: one workbook path when no split is needed, or one .zip path when the export is split across workbooks.
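
Because the return value differs by mode, calling code may want one uniform shape. The helper below is a hypothetical sketch (the name written_paths is not part of the library) that turns the documented return value into a list of paths, assuming the output path is the file written when None is returned.

```python
from typing import Optional


def written_paths(result: Optional[list], output_path: str) -> list:
    """Normalise the documented export() return value into a list of file paths."""
    if result is None:
        # Fidelity XLSX and PDF exports return None and write to output_path.
        return [output_path]
    # Streaming XLSX returns a one-item list: a workbook path or a .zip path.
    return list(result)


print(written_paths(None, "report.pdf"))
print(written_paths(["output.zip"], "output.xlsx"))
```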

6. Template Placeholders

Mark cells in your .xlsx template using the {{key:type}} syntax. The extractor reads these markers and builds the input contract.

{{report_title:string}}
{{invoice_number:number}}
{{generated_on:date}}
{{line_items:dataframe}}
{{line_items:dataframe-header}}
{{line_items:dataframe-content}}
{{reports:repeat-start}}
  ...
{{reports:repeat-end}}
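
To illustrate the marker syntax, here is a minimal sketch of how {{key:type}} markers could be found in cell text with a regular expression. The pattern and the function name are assumptions made for illustration; the library's own extractor may accept a different grammar.

```python
import re

# Assumed grammar: {{key:type}}, where key is an identifier and type is a
# lowercase name that may contain hyphens (e.g. dataframe-content).
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*:\s*([a-z-]+)\s*\}\}")


def find_placeholders(cell_text: str) -> list:
    """Return (key, type) pairs for every placeholder found in a cell's text."""
    return PLACEHOLDER_RE.findall(cell_text)


cells = [
    "{{report_title:string}}",
    "Customer: {{customer_name:string}}",
    "{{line_items:dataframe-content}}",
    "Plain text, no marker",
]
found = [pair for text in cells for pair in find_placeholders(text)]
print(found)
```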

Placeholder Types

Scalar Types

Type	Accepted Python values
`string`	`str`
`number`	`int`, `float`
`int`	`int`
`float`	`float`
`date`	`datetime.date`, `datetime.datetime`
`boolean`	`bool`

The placeholder cell is replaced in-place with the supplied value, inheriting all cell styles from the template.
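
The table above can be expressed as a simple type check. This is a sketch under stated assumptions, not the library's validator: the mapping comes from the table, while the decision to reject bool for non-boolean placeholders is an assumption (bool is a subclass of int in Python).

```python
import datetime

# Accepted Python types per placeholder type, taken from the table above.
ACCEPTED_TYPES = {
    "string": (str,),
    "number": (int, float),
    "int": (int,),
    "float": (float,),
    "date": (datetime.date, datetime.datetime),
    "boolean": (bool,),
}


def matches_placeholder_type(value: object, placeholder_type: str) -> bool:
    """Return True when value is an accepted Python type for the placeholder."""
    if placeholder_type != "boolean" and isinstance(value, bool):
        # Assumption: True/False should not silently pass as a number.
        return False
    return isinstance(value, ACCEPTED_TYPES[placeholder_type])


print(matches_placeholder_type(1024, "number"))
print(matches_placeholder_type("1024", "number"))
```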

Dataframe Types

Type	What it writes	Typical use
`dataframe`	Headers on the anchor row, content starting the next row	All-in-one table drop-in
`dataframe-header`	Column headers only, on the anchor row	Styled header row defined separately from content
`dataframe-content`	Data rows only, starting at the anchor row	Content area below a separately-styled header

The anchor cell's style (font, fill, border, alignment) is applied to all generated cells. Column names become header text.

Streaming note: dataframe-content placeholders support streaming from Parquet. Only one dataframe-content placeholder is allowed per non-repeat sheet in streaming mode.

Manual Page Breaks

Templates may also contain manual Excel print breaks.

row_page_breaks: 1-based template row indexes after which a new printed page begins
column_page_breaks: 1-based template column indexes after which Excel starts a new printed page

These are extracted from Excel's manual print-break metadata, not placeholder syntax.

During compile(), breaks are resolved against the rendered layout after dataframe expansion and dataframe_shift
PDF uses resolved row breaks only, inserting a new PDF page before later rows
XLSX preserves both resolved row and column breaks in fidelity and streaming exports

Repeat Types

Used in pairs to define a block that is rendered once per record in an ordered list payload.

Type	Description
`repeat-start`	Marks the first row of the repeating block (control row, not rendered)
`repeat-end`	Marks the last row of the repeating block (control row, not rendered)

See Repeat Sections for usage.

7. Data Contract

Payloads are sheet-scoped. The top-level key must match the sheet name in the template.

Static Sheet

{
    "Invoice": {
        "customer_name": "Acme Industries",   # string
        "invoice_number": 1024,               # number
        "due_date": datetime.date(2026, 5, 1),# date
        "line_items": polars_dataframe,       # dataframe / LazyFrame
    }
}

Dynamic Sheet Group

When a template sheet name is exactly {{key}}, it becomes a template for multiple output sheets. Pass a dict of output_sheet_name -> payload keyed under that placeholder key.

{
    "region_sheet": {                          # sheet-name placeholder key
        "North Sheet": {                       # → output sheet name
            "region_name": "North",
            "owner": "Alice",
            "sales_rows": north_df,
        },
        "South Sheet": {
            "region_name": "South",
            "owner": "Bob",
            "sales_rows": south_df,
        },
    }
}

Output sheet order follows the payload dict insertion order.

inputs(schema) reports dynamic sheet groups under the same placeholder key:

{
    "region_sheet": {
        "*": {
            "region_name": "string",
            "owner": "string",
            "sales_rows": "dataframe",
        }
    }
}
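
The "*" entry can be used to check every output sheet in a group before compiling. The following is a hypothetical helper, assuming the contract shape shown above; it is not part of the library and checks key presence only.

```python
def missing_keys_per_sheet(contract_group: dict, payload_group: dict) -> dict:
    """Map each output sheet name to the placeholder keys its payload lacks."""
    required = contract_group["*"]
    return {
        sheet_name: [key for key in required if key not in sheet_payload]
        for sheet_name, sheet_payload in payload_group.items()
    }


group_contract = {
    "*": {"region_name": "string", "owner": "string", "sales_rows": "dataframe"}
}
group_payload = {
    "North Sheet": {"region_name": "North", "owner": "Alice", "sales_rows": []},
    "South Sheet": {"region_name": "South"},
}
print(missing_keys_per_sheet(group_contract, group_payload))
```

The result preserves payload insertion order, which matches the output sheet order described above.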

Repeat Section

{
    "Sheet1": {
        "reports": [                           # key must match repeat-start/end key
            {"customer_name": "Acme", "line_items": acme_df},
            {"customer_name": "Globex", "line_items": globex_df},
        ]
    }
}

Using Polars LazyFrames (Recommended for Large Data)

import polars as pl

rows = pl.scan_parquet("sales.parquet").select(["product", "units", "revenue"])

bundle = mo_dataport.compile(schema, {"Sheet1": {"sales_rows": rows}})

Polars LazyFrame inputs remain disk-backed until export time; rows are never fully materialised in memory.

8. Export Options

All options are passed as keyword arguments to export().

9. Dataframe Column Layout

Use dataframe_options during compile() to control how dataframe columns occupy template columns and to override horizontal alignment per generated column.

The structure is:

dataframe_options = {
    "Sheet Name": {
        "placeholder_key": {
            "columns": {
                "Column Name": {"occupation": 2, "alignment": "left"},
            }
        }
    }
}

For templates that split headers and rows across separate placeholders, configure each placeholder independently:

dataframe_options = {
    "Column Layout": {
        "headers": {
            "columns": {
                "Employee Name": {"occupation": 2, "alignment": "center"},
                "Department": {"occupation": 2, "alignment": "center"},
                "Amount": {"occupation": 1, "alignment": "center"},
            }
        },
        "rows": {
            "columns": {
                "Employee Name": {"occupation": 2, "alignment": "left"},
                "Department": {"occupation": 2, "alignment": "center"},
                "Amount": {"occupation": 1, "alignment": "right"},
            }
        },
    }
}

Rules:

occupation must be a positive integer
alignment must be one of "left", "center", or "right"
Options are keyed by resolved output sheet name, then placeholder key
Unconfigured dataframe columns default to occupation=1 and keep the template cell alignment
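
The rules above can be sketched as a small checker, together with one possible reading of occupation as "number of adjacent template columns a dataframe column spans". Both helpers are hypothetical and the span interpretation is an assumption; the library may lay columns out differently.

```python
ALLOWED_ALIGNMENTS = ("left", "center", "right")


def column_option_errors(columns: dict) -> list:
    """Return one message per column option that breaks the rules above."""
    errors = []
    for name, options in columns.items():
        occupation = options.get("occupation", 1)
        # bool is a subclass of int, so rule it out explicitly.
        if isinstance(occupation, bool) or not isinstance(occupation, int) or occupation < 1:
            errors.append(f"{name}: occupation must be a positive integer")
        alignment = options.get("alignment")
        if alignment is not None and alignment not in ALLOWED_ALIGNMENTS:
            errors.append(f"{name}: alignment must be one of {ALLOWED_ALIGNMENTS}")
    return errors


def column_spans(column_names: list, columns: dict, first_column: int = 1) -> dict:
    """Map each dataframe column to a 1-based (first, last) template column span."""
    spans = {}
    cursor = first_column
    for name in column_names:
        width = columns.get(name, {}).get("occupation", 1)  # default occupation=1
        spans[name] = (cursor, cursor + width - 1)
        cursor += width
    return spans


row_columns = {
    "Employee Name": {"occupation": 2, "alignment": "left"},
    "Department": {"occupation": 2, "alignment": "center"},
    "Amount": {"occupation": 1, "alignment": "right"},
}
print(column_option_errors(row_columns))
print(column_spans(["Employee Name", "Department", "Amount"], row_columns))
```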

Dataframe Collision Shifting

When dataframe output expands into adjacent template space, compile() can move normal-sheet template cells and merged regions out of the dataframe range before XLSX or PDF export.

bundle = mo_dataport.compile(
    schema,
    data,
    dataframe_shift="both",  # "both", "horizontal", "vertical", or "none"
)

Mode	Behavior
`"both"`	Shift right-side cells/merges horizontally and lower cells/merges vertically
`"horizontal"`	Shift only cells/merges to the right of dataframe output
`"vertical"`	Shift only cells/merges below dataframe output
`"none"`	Do not shift; template merges that overlap dataframe output raise `ValueError`

The shift is metadata-only: dataframe rows remain in Parquet, report.json stores compact anchors, and streaming export still reads rows in batches. The same shifted bundle layout is used by XLSX and PDF. Repeat sections keep their stricter merge rules.

See examples/dataframe_shift/xlsx.py and examples/dataframe_shift/pdf.py.

Manual Page Breaks

Excel manual print breaks from the template are extracted into schema metadata and resolved again after compile-time dataframe expansion.

row_page_breaks start a new printed page after the given 1-based template row
column_page_breaks start a new printed page after the given 1-based template column in XLSX output
PDF uses resolved row breaks as manual page boundaries and ignores column breaks

See examples/page_break/xlsx.py and examples/page_break/pdf.py. For opt-in repeated dataframe headers in PDF (including repeat blocks), see examples/repeat_dataframe_headers/xlsx.py and examples/repeat_dataframe_headers/pdf.py.

XLSX Options

Option	Type	Default	Description
`export_mode`	`str`	`"fidelity"`	`"fidelity"`: full in-memory render (supports all features). `"streaming"`: row-by-row write (lower memory, limited features; see constraints below)
`column_width_mode`	`str`	schema value	`"fixed"`, `"even"`, or `"hug"`. Overrides the value stored in the template schema
`row_height_mode`	`str`	schema value	`"fixed"`, `"even"`, or `"hug"`. Overrides the value stored in the template schema
`default_column_width`	`float`	schema value	Fallback column width in Excel character units when mode is `"even"` or no width stored
`default_row_height`	`float`	schema value	Fallback row height in points when mode is `"even"` or no height stored
`streaming_chunk_rows`	`int`	`50000`	Number of Parquet rows read per batch during streaming
`max_rows_per_workbook`	`int`	`1048576`	Split output into multiple `.xlsx` parts when this row limit is reached
`auto_delete_bundle`	`bool`	`False`	Delete the bundle directory after a successful export

Streaming mode constraints:

No hug sizing
No merged cells may remain intersecting dataframe-content output rows after compile-time dataframe_shift
Only one dataframe-content placeholder per non-repeat sheet

Split output: When max_rows_per_workbook is exceeded in streaming mode, export() writes workbook parts, bundles them into output.zip, deletes the individual part files, and returns a one-item list[str] containing the zip path.
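
As a rough planning aid, the number of workbook parts can be estimated from the row count. This sketch assumes a plain ceiling division over dataframe rows and ignores header and static template rows, so the library's actual part boundaries may differ.

```python
import math


def workbook_part_count(total_rows: int, max_rows_per_workbook: int = 1_048_576) -> int:
    """Estimate how many workbook parts a split export would need."""
    if max_rows_per_workbook < 1:
        raise ValueError("max_rows_per_workbook must be positive")
    return max(1, math.ceil(total_rows / max_rows_per_workbook))


def is_split_result(paths: list) -> bool:
    """True when the returned list holds the single .zip path of a split export."""
    return len(paths) == 1 and paths[0].endswith(".zip")


print(workbook_part_count(1_200_000, 500_000))
print(is_split_result(["output.zip"]))
```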

PDF Options

PDF-specific options are passed as keyword arguments alongside sizing options.

Option	Type	Default	Description
`page_size`	`str`	`"A4"`	Paper size: `"A4"`, `"LETTER"`, or `"LEGAL"`
`orientation`	`str`	`"portrait"`	Page orientation: `"portrait"` or `"landscape"`
`margin`	`float`	`36`	Page margin in points (≥ 0). Applied equally on all four sides
`streaming_chunk_rows`	`int`	`50000`	Rows read per batch for `dataframe-content` and repeat sections
`fonts`	`dict | None`	`None`	Custom TrueType / OpenType font families. See Custom Fonts for PDF
`repeat_dataframe_headers`	`bool`	`False`	Opt-in: repeat dataframe header rows across later PDF table chunks/pages when matching `dataframe-header` anchors exist
`column_width_mode`	`str`	schema value	Same as XLSX. For sheets with `dataframe-content`, PDF supports `"fixed"` and `"even"` only
`row_height_mode`	`str`	schema value	Same as XLSX. PDF also supports `"hug"` for `dataframe-content` row height
`default_column_width`	`float`	schema value	Same as XLSX
`default_row_height`	`float`	schema value	Same as XLSX

export_mode is ignored for PDF; PDF always paginates automatically.

10. Sizing Options

Sizing modes control how column widths and row heights are computed at render time.

Column Width Modes

Mode	Source	Limitation
`"fixed"`	Reads widths stored in the template schema per column	Requires widths to be set in the template
`"even"`	Applies `default_column_width` uniformly to all columns	Ignores per-column template widths
`"hug"`	Computes width from cell content at render time	Not available in streaming mode

For PDF sheets that render dataframe-content, column_width_mode="hug" is not supported because it would require buffering all rows before sizing.

Row Height Modes

Mode	Source	Limitation
`"fixed"`	Reads heights stored in the template schema per row	Requires heights to be set in the template
`"even"`	Applies `default_row_height` uniformly to all rows	Ignores per-row template heights
`"hug"`	Auto-fits row height to content	Not available in streaming mode

For PDF sheets that render dataframe-content, row_height_mode="hug" is supported and auto-sizes each streamed row chunk.

Width and Height Units

Parameter	Unit	Default in schema
`default_column_width`	Excel character units	`15.0`
`default_row_height`	Points	`15.0`
`margin` (PDF)	Points (1pt = 1/72 inch)	`36`

Kwargs passed to export() override values stored in the template schema.
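
To make the point-based units concrete, the sketch below computes the area left on a page after applying an equal margin on all four sides. The page dimensions are the standard paper sizes converted to points; the function itself is illustrative and not part of the library.

```python
# Standard paper sizes in points (1 pt = 1/72 inch). A4 is 210 x 297 mm.
MM_PER_INCH = 25.4
PAGE_SIZES_PT = {
    "A4": (210 / MM_PER_INCH * 72, 297 / MM_PER_INCH * 72),
    "LETTER": (8.5 * 72, 11 * 72),
    "LEGAL": (8.5 * 72, 14 * 72),
}


def printable_area(page_size: str = "A4", orientation: str = "portrait", margin: float = 36) -> tuple:
    """Return (width, height) in points left after equal margins on all sides."""
    width, height = PAGE_SIZES_PT[page_size]
    if orientation == "landscape":
        width, height = height, width
    return (width - 2 * margin, height - 2 * margin)


print(printable_area("LETTER"))
print(printable_area("A4", orientation="landscape", margin=28))
```

With the default margin of 36 points (half an inch), a portrait Letter page leaves 540 x 720 points for content.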

11. Supported Styling

Styles are defined in the .xlsx template itself. The library extracts them during extract() and reapplies them faithfully at export time. No runtime style configuration is needed.

Font Properties

Property	Values / Range	Notes
`name`	Any font family name	Falls back to Helvetica in PDF if not registered as a custom font
`size`	`float` (points)	Default `11.0`
`bold`	`True` / `False`
`italic`	`True` / `False`
`underline`	`"single"`, `"double"`, `None`	Rendered in PDF via `<u>` markup
`color`	Hex ARGB string or `theme:<index>:<tint>`	PDF falls back to the default Office theme palette for theme colors
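
The two color notations can be told apart with a small parser. This is a sketch only: it assumes 8 hex digits in AARRGGBB order and a theme string whose index is an integer and whose tint is a decimal number, which the table implies but does not spell out.

```python
def parse_color(value: str) -> dict:
    """Parse 'AARRGGBB' or 'theme:<index>:<tint>' into a small description dict."""
    if value.startswith("theme:"):
        _, index, tint = value.split(":")
        return {"kind": "theme", "index": int(index), "tint": float(tint)}
    if len(value) != 8:
        raise ValueError(f"expected 8 hex digits (AARRGGBB), got {value!r}")
    # Read the four channels two hex digits at a time.
    alpha, red, green, blue = (int(value[i:i + 2], 16) for i in range(0, 8, 2))
    return {"kind": "argb", "alpha": alpha, "red": red, "green": green, "blue": blue}


print(parse_color("FF1F4E79"))
print(parse_color("theme:4:-0.25"))
```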

Fill Properties

Property	Values	Notes
`bg_color`	Hex ARGB string, `theme:<index>:<tint>`, or None	Solid fills only (`fgColor` in openpyxl)

Patterned fills are not extracted or rendered.

Alignment Properties

Property	Values
`horizontal`	`"left"`, `"center"`, `"right"`, `"centerContinuous"`
`vertical`	`"top"`, `"center"`, `"bottom"`
`wrap_text`	`True` / `False`

In PDF output, newline characters render as line breaks only when wrap_text=True; otherwise they are flattened to spaces.
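
The newline rule can be written as a one-branch function. The use of a br tag is an assumption based on ReportLab's Paragraph markup; how the library encodes line breaks internally is not stated here.

```python
def pdf_cell_text(text: str, wrap_text: bool) -> str:
    """Apply the newline rule described above to a cell's text."""
    if wrap_text:
        # ReportLab Paragraph markup uses <br/> for an explicit line break.
        return text.replace("\n", "<br/>")
    # Without wrapping, newlines are flattened to single spaces.
    return text.replace("\n", " ")


print(pdf_cell_text("Line one\nLine two", wrap_text=True))
print(pdf_cell_text("Line one\nLine two", wrap_text=False))
```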

Border Properties

Each cell has four border sides: top, bottom, left, right. Each side has a style and optional color.

Border Style	Rendered Width (PDF points)
`hair`	0.25
`thin`	0.5
`medium`	1.0
`thick`	1.5
`dashed`	0.75
`dotted`	0.5
`double`	1.25

Borders on merged cells are drawn around the full merged region, not only the anchor cell.

Merged Cells

Merged regions are extracted from the template and preserved in both XLSX and PDF output. During XLSX fidelity export, the full merged region is re-applied. During PDF export, merged cells are rendered as SPAN table commands.

Sheet Gridlines

The template's show_gridlines property is preserved in XLSX output.

12. Custom Fonts for PDF

By default the PDF renderer maps all cell fonts to ReportLab's built-in Helvetica family. To use your own TrueType or OpenType fonts, pass a fonts dict to export().

Shorthand: Regular Only

Provide a single file path when you only have a regular weight:

mo_dataport.export(
    bundle,
    "report.pdf",
    format="pdf",
    fonts={
        "Inter": "/path/to/fonts/Inter-Regular.ttf",
    },
)

Any cell whose template font name is "Inter" will use this file. Bold and italic variants fall back to the regular file.

Full Variant Map

Provide a dict with regular, bold, italic, and bold_italic keys to enable distinct variants:

mo_dataport.export(
    bundle,
    "report.pdf",
    format="pdf",
    fonts={
        "Inter": {
            "regular":     "/path/to/fonts/Inter-Regular.ttf",
            "bold":        "/path/to/fonts/Inter-Bold.ttf",
            "italic":      "/path/to/fonts/Inter-Italic.ttf",
            "bold_italic": "/path/to/fonts/Inter-BoldItalic.ttf",
        }
    },
)

Font Config Reference

Key	Required	Description
`regular`	Yes	Path to the regular (normal weight, upright) font file
`bold`	No	Path to the bold variant; falls back to `regular` if absent
`italic`	No	Path to the italic variant; falls back to `regular` if absent
`bold_italic`	No	Path to bold-italic; falls back to `bold` then `regular`

Matching Behaviour

The renderer matches the font.name stored in the template cell against the keys in the fonts dict (case-sensitive). If no match is found, Helvetica is used. Multiple font families can be registered in one call:

fonts={
    "Inter": {...},
    "Roboto Mono": "/path/to/RobotoMono-Regular.ttf",
}

Requirements and Errors

Font files must exist on disk at the time export() is called; a missing file raises ValueError
Each family must supply a regular file; omitting it raises ValueError
Font files are registered with ReportLab once per process; re-registering the same path is a no-op
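
The shorthand form and the fallback rules from the Font Config Reference can be summarised in one function. This is a hypothetical sketch of the resolution logic, not the library's code; it does not check that the files exist on disk.

```python
def resolve_font_family(config) -> dict:
    """Expand one family's config into file paths for all four variants."""
    if isinstance(config, str):
        config = {"regular": config}  # shorthand: a single regular file
    if "regular" not in config:
        raise ValueError("each font family must supply a 'regular' file")
    regular = config["regular"]
    bold = config.get("bold", regular)
    return {
        "regular": regular,
        "bold": bold,
        "italic": config.get("italic", regular),
        # bold_italic falls back to bold, then to regular.
        "bold_italic": config.get("bold_italic", bold),
    }


print(resolve_font_family("/path/to/RobotoMono-Regular.ttf"))
print(resolve_font_family({"regular": "Inter-Regular.ttf", "bold": "Inter-Bold.ttf"}))
```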

13. ReportBundle Directory

When bundle_path is passed to compile(), the bundle is persisted as a directory. The same directory can be re-loaded and re-exported without rerunning compile().

report_bundle/
├── manifest.json      # bundle version, inputs, sheet metadata, dataframe sources, capabilities
├── report.json        # resolved scalar cells and dataframe anchor/repeat plans
└── data/
    └── *.parquet      # dataframe sources materialised from Polars inputs

report.json stores dataframe anchors (column names, start row/column, style), not the expanded row data. Rows stay in Parquet and are read at export time.
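
Before re-exporting a persisted bundle, a quick layout check can catch a partially copied directory. The helper below is a hypothetical sketch that only looks for the three entries shown in the tree above; it assumes data/ is always present and does not inspect file contents.

```python
import tempfile
from pathlib import Path


def check_bundle_layout(bundle_dir: str) -> list:
    """Return a list of problems found in a persisted bundle directory."""
    root = Path(bundle_dir)
    problems = []
    for name in ("manifest.json", "report.json"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    if not (root / "data").is_dir():
        problems.append("missing data/ directory")
    return problems


# Demonstrate on a temporary directory, first empty and then populated.
with tempfile.TemporaryDirectory() as tmp:
    empty_report = check_bundle_layout(tmp)
    (Path(tmp) / "manifest.json").write_text("{}")
    (Path(tmp) / "report.json").write_text("{}")
    (Path(tmp) / "data").mkdir()
    complete_report = check_bundle_layout(tmp)

print(empty_report)
print(complete_report)
```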

Loading a persisted bundle:

mo_dataport.export("report_bundle/", "output.xlsx")
# or load manually:
from mindoff_dataport import ReportBundle
bundle = ReportBundle.load("report_bundle/")

Setting auto_delete_bundle=True in export() deletes the bundle directory after a successful export.

14. Recipes

Scalar Values + Dataframe Table

import datetime as dt
import polars as pl
from mindoff_dataport import mo_dataport

schema = mo_dataport.extract("template.xlsx")
rows   = pl.scan_parquet("sales.parquet").select(["product", "units", "revenue"])

bundle = mo_dataport.compile(
    schema,
    {
        "Sales Summary": {
            "report_title": "Q1 2026 Sales",
            "generated_on": dt.date(2026, 4, 28),
            "sales_rows":   rows,
        }
    },
)
mo_dataport.export(bundle, "report.xlsx", export_mode="streaming")

Repeat Sections (per-customer invoice blocks)

Template cells:

{{reports:repeat-start}}
Customer: {{customer_name:string}}
{{line_items:dataframe-header}}
{{line_items:dataframe-content}}
{{reports:repeat-end}}

Code:

bundle = mo_dataport.compile(
    schema,
    {
        "Sheet1": {
            "reports": [
                {"customer_name": "Acme",   "line_items": acme_df},
                {"customer_name": "Globex", "line_items": globex_df},
            ]
        }
    },
)
mo_dataport.export(bundle, "combined.xlsx", export_mode="streaming")
mo_dataport.export(bundle, "combined.pdf",  format="pdf")

Repeat section constraints:

One or more non-overlapping sibling vertical sections per sheet
Static rows are allowed before, between, and after sections
Repeat keys must be unique per sheet
Merged cells are supported in fixed/static rows, but not over dataframe-content rows
No nested repeats
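
The structural constraints above (unique keys, no overlap, no nesting) can be sketched as a check over section row ranges. The (key, start_row, end_row) representation is an assumption made for illustration and is not the library's internal format.

```python
def check_repeat_sections(sections: list) -> list:
    """Check (key, start_row, end_row) sections against the constraints above."""
    problems = []
    keys = [key for key, _, _ in sections]
    if len(keys) != len(set(keys)):
        problems.append("repeat keys must be unique per sheet")
    furthest_end = 0       # last row covered by any earlier section
    furthest_key = None
    for key, start, end in sorted(sections, key=lambda s: s[1]):
        if furthest_key is not None and start <= furthest_end:
            # Covers both partial overlap and nesting.
            problems.append(f"{key!r} overlaps or nests inside {furthest_key!r}")
        if end > furthest_end:
            furthest_end, furthest_key = end, key
    return problems


print(check_repeat_sections([("reports", 3, 8), ("totals", 12, 15)]))
print(check_repeat_sections([("outer", 3, 8), ("inner", 5, 6)]))
```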

Dynamic Sheets (one sheet per region)

bundle = mo_dataport.compile(
    schema,
    {
        "region_sheet": {           # sheet-name placeholder key
            "North Sheet": {"region_name": "North", "owner": "Alice", "sales_rows": north_df},
            "South Sheet": {"region_name": "South", "owner": "Bob",   "sales_rows": south_df},
        }
    },
)
mo_dataport.export(bundle, "regions.xlsx", export_mode="streaming")

Dataframe Column Occupation and Alignment

rows = pl.scan_parquet("data.parquet").select(
    ["Employee Name", "Department", "Amount"]
)

bundle = mo_dataport.compile(
    schema,
    {
        "Column Layout": {
            "report_title": "Dataframe Column Occupation",
            "headers": rows,
            "rows": rows,
        }
    },
    dataframe_options={
        "Column Layout": {
            "headers": {
                "columns": {
                    "Employee Name": {"occupation": 2, "alignment": "center"},
                    "Department": {"occupation": 2, "alignment": "center"},
                    "Amount": {"occupation": 1, "alignment": "center"},
                }
            },
            "rows": {
                "columns": {
                    "Employee Name": {"occupation": 2, "alignment": "left"},
                    "Department": {"occupation": 2, "alignment": "center"},
                    "Amount": {"occupation": 1, "alignment": "right"},
                }
            },
        }
    },
)
mo_dataport.export(bundle, "column_layout.xlsx", export_mode="streaming")
mo_dataport.export(
    bundle,
    "column_layout.pdf",
    format="pdf",
    orientation="portrait",
    row_height_mode="fixed",
)

See examples/dataframe_column_layout/xlsx.py and examples/dataframe_column_layout/pdf.py.

Discover Inputs Before Compiling

import pprint

schema = mo_dataport.extract("template.xlsx")
pprint.pp(mo_dataport.inputs(schema))
# {'Sales Summary': {'report_title': 'string', 'generated_on': 'date', 'sales_rows': 'dataframe'}}

Persist Bundle for Later Re-Export

bundle = mo_dataport.compile(schema, data, bundle_path="saved_bundle")

# Later in a separate process or script:
mo_dataport.export("saved_bundle", "report.xlsx")
mo_dataport.export("saved_bundle", "report.pdf", format="pdf")

Split Large Exports Across Workbooks

outputs = mo_dataport.export(
    bundle,
    "output.xlsx",
    export_mode="streaming",
    max_rows_per_workbook=500_000,  # split when a sheet exceeds this row count
)
# outputs -> list[str] with a single `.zip` path when the export is split

PDF with Custom Fonts and Landscape Layout

mo_dataport.export(
    bundle,
    "report.pdf",
    format="pdf",
    page_size="A4",
    orientation="landscape",
    margin=28,
    fonts={
        "Inter": {
            "regular":     "fonts/Inter-Regular.ttf",
            "bold":        "fonts/Inter-Bold.ttf",
            "italic":      "fonts/Inter-Italic.ttf",
            "bold_italic": "fonts/Inter-BoldItalic.ttf",
        }
    },
)

15. Current Scope

Feature	Status
Template input	`.xlsx`
Canonical intermediate	`ReportBundle` directory
XLSX export (fidelity)	Supported
XLSX export (streaming)	Supported
PDF export	Supported (ReportLab)
Image export	Reserved; raises `NotImplementedError` in v1
Nested repeat sections	Not supported in v1
Patterned fills	Not extracted or rendered

16. License

Released under the MIT License.

Release history

0.6.1

May 15, 2026

0.6.0

May 14, 2026

This version

0.5.0

May 12, 2026

0.4.0

May 7, 2026

0.3.0

May 2, 2026

0.2.0

Apr 30, 2026

0.1.0

Apr 30, 2026

Download files

Source Distribution

mindoff_dataport-0.5.0.tar.gz (76.8 kB)

Uploaded May 12, 2026 Source

Built Distribution

mindoff_dataport-0.5.0-py3-none-any.whl (69.2 kB)

Uploaded May 12, 2026 Python 3

File details

Details for the file mindoff_dataport-0.5.0.tar.gz.

File metadata

Download URL: mindoff_dataport-0.5.0.tar.gz
Upload date: May 12, 2026
Size: 76.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mindoff_dataport-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`5582d789c44fbd2ee40791b73ea899e85f7d0aec836c278fb0ac6db28d55708a`
MD5	`3ee8715dae6c67d328de8f44828fdac5`
BLAKE2b-256	`3dde433651b1195ccf36eb87388f2fdf4f4a017d7807e21b5dfa1b211a2abef6`

Provenance

The following attestation bundles were made for mindoff_dataport-0.5.0.tar.gz:

Publisher: cd.yml on mindoffwork/mindoff-dataport

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mindoff_dataport-0.5.0.tar.gz
- Subject digest: 5582d789c44fbd2ee40791b73ea899e85f7d0aec836c278fb0ac6db28d55708a
- Sigstore transparency entry: 1519764461
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: mindoffwork/mindoff-dataport@b446130dddf6623c069da2233a9afe2a13c6a8be
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/mindoffwork
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@b446130dddf6623c069da2233a9afe2a13c6a8be
- Trigger Event: release

File details

Details for the file mindoff_dataport-0.5.0-py3-none-any.whl.

File metadata

Download URL: mindoff_dataport-0.5.0-py3-none-any.whl
Upload date: May 12, 2026
Size: 69.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mindoff_dataport-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`913771665373095a7dee4859ef19e1b06a9dbda6c6ffacdd31271474d70ceec5`
MD5	`f0f5ec117fb63d55fbedb4308154d742`
BLAKE2b-256	`0c66f36862b5a5b27b824be756adb35bdd07717656b93c668891dbb978ac97eb`

Provenance

The following attestation bundles were made for mindoff_dataport-0.5.0-py3-none-any.whl:

Publisher: cd.yml on mindoffwork/mindoff-dataport

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mindoff_dataport-0.5.0-py3-none-any.whl
- Subject digest: 913771665373095a7dee4859ef19e1b06a9dbda6c6ffacdd31271474d70ceec5
- Sigstore transparency entry: 1519764496
- Sigstore integration time: May 12, 2026
Source repository:
- Permalink: mindoffwork/mindoff-dataport@b446130dddf6623c069da2233a9afe2a13c6a8be
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/mindoffwork
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@b446130dddf6623c069da2233a9afe2a13c6a8be
- Trigger Event: release
