
Declarative data diff engine for tables, powered by Polars. Output Excel, HTML, or Typst PDF.


diffino

Declarative data diff engine for tables and documents, powered by Polars. Compare Excel, CSV, Parquet, DuckDB, or DOCX files and generate detailed reports with character-level inline diffs.

Output formats: Excel, HTML, Typst PDF, DOCX track-changes, Changelog.

Supports cumulative changelog generation across multiple versions.

Installation

pip install diffino-cli

Quick Start

  1. Prepare a config file (config.yaml):
sources:
  left:
    type: excel
    path: data/v0.2.5.xlsx
  right:
    type: excel
    path: data/v0.2.6.xlsx
    version: "0.2.6"

compare:
  - left_sheet: Sheet1
    key_columns:
      - ID
    ignore_columns:
      - Notes

output:
  project: 我的项目       # "My Project"
  formats:
    - excel
    - changelog
  changelog:
    split: true
  report_dir: ./diffs
  2. Run diffino:
diffino run config.yaml
  3. Or generate the changelog standalone from saved reports:
diffino changelog generate --input-dir ./diffs --releases releases.yaml --split
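In the quick-start config only the right source sets version explicitly, so the left version has to come from the filename; the feature list describes this as parsing name-vX.Y.Z.ext patterns. The sketch below shows that kind of filename parsing. The regular expression and the detect_version helper are hypothetical, not diffino's code, and they assume the name- prefix is optional so that data/v0.2.5.xlsx also matches.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical pattern: an optional "name-" (or "name_") prefix, then "v" and a
# three-part version at the end of the file stem, e.g. "rules-v1.2.3".
_VERSION_RE = re.compile(r"(?:^|[-_])v(\d+\.\d+\.\d+)$")


def detect_version(path: str, override: Optional[str] = None) -> Optional[str]:
    """Return the manual override if one is given, else the version parsed from the filename."""
    if override is not None:
        # Mirrors sources.<side>.version taking precedence over the filename.
        return override
    stem = Path(path).stem  # "data/v0.2.5.xlsx" -> "v0.2.5"
    match = _VERSION_RE.search(stem)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(detect_version("data/v0.2.5.xlsx"))                    # left source: parsed
    print(detect_version("data/v0.2.6.xlsx", override="0.2.6"))  # right source: explicit
```

Files that do not match the pattern yield None in this sketch; that is the case where the YAML version field is needed.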

Features

  • Multi-format sources: Excel, CSV, Parquet, DuckDB, DOCX
  • Key-based or fingerprint matching: Compare by composite keys or full-row hashes
  • Column preprocessing: Decimal rounding, text normalization, case sensitivity control
  • Character-level inline diff: Red strikethrough for deleted text, green bold for inserted text
  • DOCX paragraph diff: Diff DOCX body text paragraph-by-paragraph with inline diffs
  • Five output formats:
    • Excel: Side-by-side old/new rows and yellow-highlighted changed cells with rich-text inline diffs, in three styles: final, side_by_side, and track
    • HTML: Self-contained report with <del>/<ins> tags and JS filtering
    • Typst: Typst PDF with cover page, colored tables, character-level inline diffs
    • DOCX track-changes: Native Word revision tracking (<w:ins>/<w:del>), applied in place for DOCX sources and as a generated table for other sources
    • Changelog: Cumulative changelog Typst files (_summary.typ + _detail.typ), generated on every run; each covers the current report plus all reports previously saved in the same directory
  • Typst cover page: Configurable project name, rendered as {{PROJECT}}对比报告 ("{{PROJECT}} comparison report")
  • DiffReport persistence: Auto-save JSON reports ({old}__{new}.json) for changelog accumulation
  • Version auto-detection: Parses the version from filenames matching name-vX.Y.Z.ext, with a manual override in YAML
  • Changelog generation: diffino changelog generate builds a version summary table plus detailed per-version diffs; --split separates the summary and detail into two files
  • Release date config: releases.yaml maps versions to release dates for changelog display
  • Parallel processing: ThreadPoolExecutor with configurable max_workers for multi-sheet comparisons
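The character-level inline diff, and the <del>/<ins> markup used in the HTML report, can be illustrated with Python's standard difflib. This is a sketch under assumptions: the README does not say which diff algorithm diffino uses, and inline_diff is a hypothetical name.

```python
import difflib
import html


def inline_diff(old: str, new: str) -> str:
    """Wrap deleted characters in <del> and inserted characters in <ins>."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(html.escape(old[i1:i2]))
            continue
        # A "replace" opcode is rendered as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            parts.append(f"<del>{html.escape(old[i1:i2])}</del>")
        if tag in ("insert", "replace"):
            parts.append(f"<ins>{html.escape(new[j1:j2])}</ins>")
    return "".join(parts)


if __name__ == "__main__":
    print(inline_diff("客户列表", "客户名单"))
```

Escaping each fragment keeps cell values containing < or & from breaking the report markup.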

CLI Commands

diffino run

diffino run config.yaml               # Run comparison

Add changelog to output.formats to generate cumulative changelog Typst files on every run:

output:
  formats:
    - excel
    - changelog        # Auto-generates changelog_summary.typ + changelog_detail.typ
  changelog:
    path: changelog.typ     # default
    split: true             # default: true
    summary_keep: 3         # default: 3
    max_summary_items: 3    # default: 3
    releases: releases.yaml # release dates config

When changelog is in formats, save_report is implied: the DiffReport JSON is always saved.
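The summary_keep and max_summary_items options bound how much history reaches the summary. The sketch below shows one plausible reading of them. Which versions count as "kept" (here, the most recent ones) and the newest-first order are assumptions, and SummaryRow and build_summary are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SummaryRow:
    version: str
    items: List[str]  # entries shown in the summary table
    hidden: int       # entries cut off by max_summary_items


def build_summary(
    history: Sequence[Tuple[str, List[str]]],
    summary_keep: int = 3,
    max_summary_items: int = 3,
) -> List[SummaryRow]:
    """Keep the newest `summary_keep` versions, each trimmed to `max_summary_items` entries.

    `history` is a sequence of (version, entries) pairs ordered oldest first.
    """
    # Guard: history[-0:] would return the whole list.
    recent = list(history[-summary_keep:]) if summary_keep > 0 else []
    rows = []
    for version, entries in reversed(recent):  # newest version first (assumed)
        shown = list(entries[:max_summary_items])
        rows.append(SummaryRow(version, shown, len(entries) - len(shown)))
    return rows
```

Older versions drop out of the summary in this sketch but would still be covered by the detail file.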

diffino validate

diffino validate config.yaml          # Validate config only

diffino changelog generate

diffino changelog generate [OPTIONS]      # Generate changelog.typ

  --input-dir ./diffs                     # Directory of saved DiffReport JSON files
  --output changelog.typ                  # Output Typst file
  --releases releases.yaml                # Release dates config
  --summary-keep 3                        # Versions shown in summary (default: 3)
  --max-summary-items 3                   # Max entries per version (default: 3)
  --split                                 # Split into _summary.typ + _detail.typ
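When diffino changelog generate is driven from a Python script (for example a CI step), the options above can be assembled into an argument list and passed to subprocess.run. A minimal sketch: changelog_command and run_changelog are hypothetical helper names, and only the flags listed above are used.

```python
import shlex
import subprocess
from typing import List, Optional


def changelog_command(
    input_dir: str = "./diffs",
    output: str = "changelog.typ",
    releases: Optional[str] = None,
    summary_keep: int = 3,
    max_summary_items: int = 3,
    split: bool = False,
) -> List[str]:
    """Build the argv for `diffino changelog generate` from the documented options."""
    cmd = [
        "diffino", "changelog", "generate",
        "--input-dir", input_dir,
        "--output", output,
        "--summary-keep", str(summary_keep),
        "--max-summary-items", str(max_summary_items),
    ]
    if releases is not None:
        cmd += ["--releases", releases]
    if split:
        cmd.append("--split")  # _summary.typ + _detail.typ instead of one file
    return cmd


def run_changelog(**options) -> None:
    """Run the CLI; requires diffino-cli to be installed and on PATH."""
    subprocess.run(changelog_command(**options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(changelog_command(releases="releases.yaml", split=True)))
```

Passing a list rather than a shell string avoids quoting problems when paths contain spaces.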

Source Types

Type       Key config fields
excel      path
csv        path
parquet    path
duckdb     database, query
docx       path

DOCX mode

When sources.*.type is docx, left_sheet/right_sheet refer to tables, matched by caption (exact or fuzzy) or by 1-based index. To diff the document body text instead of tables, add a compare unit with content: paragraphs:

compare:
  # Table diff by caption
  - left_sheet: 表1-客户列表   # "Table 1 - Customer List"
    key_columns:
      - 客户ID               # "Customer ID"
  # Paragraph diff
  - content: paragraphs
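The caption lookup described above (exact caption, fuzzy caption, or 1-based index) can be sketched with difflib. The precedence order (index, then exact, then fuzzy), the 0.6 similarity cutoff, and the match_caption name are assumptions for illustration; the README does not document how diffino's fuzzy matching works.

```python
import difflib
from typing import List, Union


def match_caption(captions: List[str], selector: Union[str, int], cutoff: float = 0.6) -> int:
    """Return the 0-based position of the table chosen by caption or by 1-based index."""
    if isinstance(selector, int):
        # 1-based table index, as it would be written in YAML (e.g. left_sheet: 2)
        if not 1 <= selector <= len(captions):
            raise IndexError(f"table index {selector} is out of range")
        return selector - 1
    if selector in captions:
        # exact caption match
        return captions.index(selector)
    # fuzzy match: the most similar caption whose similarity ratio is at least `cutoff`
    close = difflib.get_close_matches(selector, captions, n=1, cutoff=cutoff)
    if close:
        return captions.index(close[0])
    raise KeyError(f"no table caption matches {selector!r}")
```

Trying the exact match before the fuzzy one keeps near-duplicate captions (such as "Table 1" and "Table 11") from being confused when the config spells one of them exactly.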

Configuration Reference

See config.example.yaml for a complete example.

Section              Field                       Description
sources.left/right   type                        Source type: excel, csv, parquet, duckdb, docx
sources.left/right   version                     Manual version override (otherwise auto-parsed from the filename)
compare[]            left_sheet / right_sheet    Sheet names (or DOCX table captions); right_sheet defaults to left_sheet
compare[]            content                     Set to paragraphs to diff DOCX body text
compare[]            key_columns                 Column names used for row matching
compare[]            fingerprint                 Match rows by full-row hash instead of key columns
compare[]            ignore_columns              Columns to exclude from comparison
compare[]            column_rules                Preprocessing rules (decimal, text)
compare[]            label                       Human-readable label for this comparison unit
output               project                     Project name on the Typst cover (default: 数据, "data")
output               title                       Report title (default: 更新说明, "release notes")
output               formats                     Any of: excel, html, typst, docx_track, changelog
output               save_report                 Persist the DiffReport as JSON for changelog generation
output               report_dir                  Directory for saved reports (default: ./diffs)
output               max_workers                 Thread pool size (default: 4)
output               release_date                Override the release date (ISO format)
output               releases                    Inline releases config (alternative to a releases file)
output.excel         path                        Output Excel path
output.excel         style                       track, final, or side_by_side
output.html          path                        Output HTML path
output.typst         path                        Output Typst path
output.typst         template                    Custom Typst template path
output.docx_track    path                        Output DOCX path
output.changelog     path                        Output Typst path (default: changelog.typ)
output.changelog     split                       Split into _summary.typ + _detail.typ (default: true)
output.changelog     summary_keep                Versions in the summary table (default: 3)
output.changelog     max_summary_items           Max entries per version (default: 3)
output.changelog     releases                    Path to releases config (default: releases.yaml)
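To make key_columns, fingerprint, and ignore_columns concrete, here is a plain-Python sketch of the two matching modes over lists of dicts. diffino itself is built on Polars and its internals are not described here, so the function names, the JSON-then-SHA-256 fingerprint, and the result shapes are all assumptions.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def _without(row: Row, ignore_columns: Sequence[str]) -> Row:
    """Drop ignored columns before comparing."""
    return {col: val for col, val in row.items() if col not in ignore_columns}


def diff_by_keys(left: List[Row], right: List[Row], key_columns: Sequence[str],
                 ignore_columns: Sequence[str] = ()) -> Dict[str, Any]:
    """Classify rows as added, deleted, or modified using a composite key."""
    def index(rows: List[Row]) -> Dict[Tuple, Row]:
        return {tuple(r[c] for c in key_columns): _without(r, ignore_columns) for r in rows}

    old, new = index(left), index(right)
    modified = {}
    for key in old.keys() & new.keys():
        columns = old[key].keys() | new[key].keys()
        changed = sorted(c for c in columns if old[key].get(c) != new[key].get(c))
        if changed:
            modified[key] = changed
    return {
        "added": [k for k in new if k not in old],
        "deleted": [k for k in old if k not in new],
        "modified": modified,
    }


def fingerprint(row: Row, ignore_columns: Sequence[str] = ()) -> str:
    """Hash the whole row (minus ignored columns) into a stable digest."""
    payload = json.dumps(_without(row, ignore_columns), sort_keys=True,
                         default=str, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_by_fingerprint(left: List[Row], right: List[Row],
                        ignore_columns: Sequence[str] = ()) -> Dict[str, int]:
    """Without keys, an edited row can only appear as one deletion plus one insertion."""
    old = Counter(fingerprint(r, ignore_columns) for r in left)
    new = Counter(fingerprint(r, ignore_columns) for r in right)
    return {"added": sum((new - old).values()), "deleted": sum((old - new).values())}
```

The Quick Start unit (key_columns: ID, ignore_columns: Notes) corresponds to diff_by_keys(left, right, ["ID"], ["Notes"]) here: a change confined to Notes is not reported. Fingerprint mode needs no keys but cannot tell an edited row from a delete plus an add.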

Column preprocessing rules

column_rules:
  - column: 金额          # "Amount"
    type: decimal
    precision: 2
  - column: 名称          # "Name"
    type: text
    normalize_whitespace: true
    case_sensitive: false
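The two rules above normalize values before cells are compared. A minimal sketch of applying such a rule to a single cell follows. apply_rule is a hypothetical helper, and the half-up rounding mode, leaving missing values untouched, and case_sensitive defaulting to true are assumptions not stated in the README.

```python
import re
from decimal import ROUND_HALF_UP, Decimal
from typing import Any, Dict


def apply_rule(value: Any, rule: Dict[str, Any]) -> Any:
    """Normalize one cell value according to a column_rules entry before comparing."""
    if value is None:
        return None  # leave missing cells alone
    if rule["type"] == "decimal":
        # Round to `precision` places; half-up rounding is an assumption.
        quantum = Decimal(1).scaleb(-rule.get("precision", 0))
        return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    if rule["type"] == "text":
        text = str(value)
        if rule.get("normalize_whitespace", False):
            text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
        if not rule.get("case_sensitive", True):
            text = text.casefold()  # compare case-insensitively
        return text
    raise ValueError(f"unknown column rule type: {rule['type']!r}")
```

Converting through str() before building the Decimal avoids carrying binary floating-point noise (for example 1.005 stored as 1.00499...) into the rounding step.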

Releases config (releases.yaml)

Two formats are supported:

With project name (recommended):

name: 穿透监管规则明细表   # "Look-through regulatory rules detail table"
releases:
  - version: "v1.0"
    date: 2026-02-13
  - version: "v1.1"
    date: 2026-05-19

Legacy flat list:

releases:
  - version: "0.2.5"
    date: 2026-05-19
  - version: "0.2.6"
    date: 2026-05-20

Releases config can also be specified inline under output.releases in the main config.
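Both shapes share a releases list of version/date pairs and differ only in the optional top-level name, so a loader can normalize them in one pass. In this sketch the input is the already-parsed YAML (for example the result of PyYAML's yaml.safe_load, which turns unquoted dates such as 2026-02-13 into datetime.date objects). load_releases is a hypothetical name, and stripping a leading "v" so that "v1.0" matches "1.0" is an assumption.

```python
import datetime as dt
from typing import Any, Dict, Optional, Tuple


def load_releases(config: Dict[str, Any]) -> Tuple[Optional[str], Dict[str, dt.date]]:
    """Normalize either releases.yaml shape into (project name, {version: release date})."""
    name = config.get("name")  # only present in the recommended format
    dates: Dict[str, dt.date] = {}
    for entry in config.get("releases", []):
        # Assumption: "v1.0" and "1.0" refer to the same release.
        version = str(entry["version"]).lstrip("vV")
        date = entry["date"]
        if isinstance(date, str):
            date = dt.date.fromisoformat(date)  # quoted dates arrive as strings
        dates[version] = date
    return name, dates
```

The same function would accept the inline form under output.releases, since that is the same mapping embedded in the main config.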

Changelog

Via diffino run (recommended)

Add changelog to output.formats. DiffReport JSONs are auto-saved to output.report_dir (default ./diffs) as {old_version}__{new_version}.json. After each run, all JSON reports in the directory are loaded and cumulative changelog_summary.typ + changelog_detail.typ are generated.

output:
  formats:
    - changelog
  changelog:
    split: true
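Because reports are saved as {old_version}__{new_version}.json, the version history can be recovered from filenames alone. The sketch below chains those names into an ordered sequence. The chaining rule (start at the version that never appears as a newer version, then follow old to new) and the helper names are assumptions, since the README does not say how diffino orders the loaded reports.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def order_reports(filenames: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Chain {old}__{new}.json names into (old, new, filename) steps, oldest first."""
    steps: Dict[str, Tuple[str, str]] = {}
    for name in filenames:
        stem = Path(name).stem  # "0.2.5__0.2.6.json" -> "0.2.5__0.2.6"
        old, sep, new = stem.partition("__")
        if name.endswith(".json") and sep and old and new:
            steps[old] = (new, name)
    # The chain starts at the only version that is never a "new" version.
    targets = {new for new, _ in steps.values()}
    starts = [old for old in steps if old not in targets]
    if len(starts) != 1:
        return []  # empty or branching history; not handled in this sketch
    chain, current, seen = [], starts[0], set()
    while current in steps and current not in seen:  # `seen` guards against cycles
        seen.add(current)
        new, name = steps[current]
        chain.append((current, new, name))
        current = new
    return chain


def discover_reports(report_dir: str = "./diffs") -> List[Tuple[str, str, str]]:
    """Apply order_reports to the saved report files in report_dir."""
    return order_reports(str(p) for p in Path(report_dir).glob("*__*.json"))
```

Keeping the ordering logic in a pure function over names makes it testable without touching the filesystem; discover_reports("./diffs") is the thin wrapper for the default report_dir.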

Via diffino changelog generate (standalone)

diffino changelog generate --input-dir ./diffs --split

Changelog output

  • Summary (_summary.typ): Version table listing each version, date, and notable change counts
  • Detail (_detail.typ): Per-version sections showing every added/deleted/modified row with old/new value inline diffs

Without --split (or split: false in config), the output is a single changelog.typ.
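With the default output.changelog.path of changelog.typ, the split output files are changelog_summary.typ and changelog_detail.typ. A tiny sketch of that naming rule follows. Extending it to a custom path by inserting _summary/_detail before the extension is an assumption, and changelog_outputs is a hypothetical name.

```python
from pathlib import Path
from typing import List


def changelog_outputs(path: str = "changelog.typ", split: bool = True) -> List[Path]:
    """Return the Typst files written for output.changelog.path."""
    target = Path(path)
    if not split:
        return [target]  # single combined file
    # changelog.typ -> changelog_summary.typ + changelog_detail.typ
    return [
        target.with_name(f"{target.stem}_summary{target.suffix}"),
        target.with_name(f"{target.stem}_detail{target.suffix}"),
    ]
```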

License

MIT


Download files


Source Distribution

diffino_cli-0.4.1.tar.gz (33.5 kB)

Built Distribution

diffino_cli-0.4.1-py3-none-any.whl (38.1 kB)

File details

Details for the file diffino_cli-0.4.1.tar.gz.

File metadata

  • Download URL: diffino_cli-0.4.1.tar.gz
  • Upload date:
  • Size: 33.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for diffino_cli-0.4.1.tar.gz
Algorithm Hash digest
SHA256 911a5721b3d64f71f4d75f0240614fe1baad095c7d5d3e07e6551ff9fa66c290
MD5 ee98b826f0c30a7217a1b482d02bda4d
BLAKE2b-256 d961dd56366806d2e07bcb5f791471def66675301c89bd9e1a72add43a77c893


File details

Details for the file diffino_cli-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: diffino_cli-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 38.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for diffino_cli-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 584797e84a58d44aaea0ebb4575bafd168a9daff0939bc20a3da4f0ab947449f
MD5 de0283e74a8cd1a9e833883e9469a6d4
BLAKE2b-256 5a4c5385e92c9e5ddade8bab53346ed4f05df0e863adf343ea02e3cc6428753c

