Declarative data diff engine for tables, powered by Polars. Output Excel, HTML, or Typst PDF.
diffino
Declarative data diff engine for tables and documents, powered by Polars. Compare Excel, CSV, Parquet, DuckDB, or DOCX files and generate detailed reports with character-level inline diffs.
Output formats: Excel, HTML, Typst PDF, DOCX track-changes, Changelog.
Supports cumulative changelog generation across multiple versions.
Installation
pip install diffino-cli
Quick Start
- Prepare a config file:
sources:
  left:
    type: excel
    path: data/v0.2.5.xlsx
  right:
    type: excel
    path: data/v0.2.6.xlsx
    version: "0.2.6"
compare:
  - left_sheet: Sheet1
    key_columns:
      - ID
    ignore_columns:
      - Notes
output:
  project: 我的项目  # "My Project"
  formats:
    - excel
    - changelog
  changelog:
    split: true
  report_dir: ./diffs
- Run diffino:
diffino run config.yaml
- Or generate changelog standalone from saved reports:
diffino changelog generate --input-dir ./diffs --releases releases.yaml --split
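To make the key_columns and ignore_columns settings in the Quick Start config more concrete, here is a minimal sketch of key-based row matching and of a full-row fingerprint. It is not diffino's implementation (which is built on Polars); the function names diff_by_key, row_key, and row_fingerprint, the plain-dict row representation, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def row_key(row, key_columns):
    """Composite key: a tuple of the values in the key columns."""
    return tuple(row[col] for col in key_columns)


def row_fingerprint(row, ignore_columns=()):
    """Hash of every compared column, in sorted column order (assumed scheme)."""
    parts = [f"{col}={row[col]!r}" for col in sorted(row) if col not in ignore_columns]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def diff_by_key(left, right, key_columns, ignore_columns=()):
    """Classify rows as added, deleted, or modified by composite key."""
    old = {row_key(r, key_columns): r for r in left}
    new = {row_key(r, key_columns): r for r in right}
    skip = set(key_columns) | set(ignore_columns)

    added = [new[k] for k in new if k not in old]
    deleted = [old[k] for k in old if k not in new]
    modified = {}
    for k in old.keys() & new.keys():
        cols = (set(old[k]) | set(new[k])) - skip
        changes = {
            c: (old[k].get(c), new[k].get(c))
            for c in sorted(cols)
            if old[k].get(c) != new[k].get(c)
        }
        if changes:
            modified[k] = changes
    return added, deleted, modified


# Hypothetical sample data mirroring the Quick Start config (key: ID, ignored: Notes).
left = [
    {"ID": 1, "Name": "Ann", "Notes": "a"},
    {"ID": 2, "Name": "Bob", "Notes": "b"},
]
right = [
    {"ID": 1, "Name": "Anne", "Notes": "zzz"},
    {"ID": 3, "Name": "Cy", "Notes": "c"},
]
added, deleted, modified = diff_by_key(left, right, ["ID"], ["Notes"])
```

In this sketch the change to Notes on row 1 is not reported because the column is ignored, while the Name change is. A fingerprint comparison would instead treat two rows as the same only when every compared column matches.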
Features
- Multi-format sources: Excel, CSV, Parquet, DuckDB, DOCX
- Key-based or fingerprint matching: Compare by composite keys or full-row hashes
- Column preprocessing: Decimal rounding, text normalization, case sensitivity control
- Character-level inline diff: Red strikethrough for deleted text, green bold for inserted text
- DOCX paragraph diff: Diff DOCX body text paragraph-by-paragraph with inline diffs
- Five output formats:
  - Excel: Side-by-side old/new rows, yellow-highlighted changed cells with rich-text inline diffs; final, side_by_side, and track styles
  - HTML: Self-contained report with <del>/<ins> tags and JS filtering
  - Typst: Typst PDF with cover page, colored tables, character-level inline diffs
  - DOCX track-changes: Native Word revision tracking (<w:ins>/<w:del>); in-place for DOCX sources, generated table for others
  - Changelog: Cumulative changelog Typst files (_summary.typ + _detail.typ), auto-generated with each run; includes the current report plus all previously saved reports in the same directory
- Typst cover page: Configurable project name: {{PROJECT}}对比报告 ("{{PROJECT}} comparison report")
- DiffReport persistence: Auto-save JSON reports ({old}__{new}.json) for changelog accumulation
- Version auto-detection: Parses name-vX.Y.Z.ext patterns, with manual override in YAML
- Changelog generation: diffino changelog generate produces a version summary table plus detailed per-version diffs, with --split to separate summary and detail into two files
- Release date config: releases.yaml maps versions to release dates for changelog display
- Parallel processing: ThreadPoolExecutor with configurable max_workers for multi-sheet comparisons
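The character-level inline diff listed above can be illustrated with Python's standard difflib. This is only a sketch of the general technique, emitting the same <del>/<ins> tags the HTML report is described as using; the function name inline_diff is hypothetical, and diffino's actual diff algorithm is not documented here.

```python
from difflib import SequenceMatcher
from html import escape


def inline_diff(old: str, new: str) -> str:
    """Mark deleted text with <del> and inserted text with <ins>."""
    out = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(escape(old[i1:i2]))
            continue
        # "replace" yields both parts; "delete" and "insert" yield one each.
        if i2 > i1:
            out.append(f"<del>{escape(old[i1:i2])}</del>")
        if j2 > j1:
            out.append(f"<ins>{escape(new[j1:j2])}</ins>")
    return "".join(out)


print(inline_diff("v0.2.5", "v0.2.6"))
```

Unchanged text passes through untouched, so a cell that did not change produces no markup at all.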
CLI Commands
diffino run
diffino run config.yaml # Run comparison
Add changelog to formats to generate cumulative changelog Typst files with each run:
output:
  formats:
    - excel
    - changelog                # Auto-generates changelog_summary.typ + changelog_detail.typ
  changelog:
    path: changelog.typ        # default
    split: true                # default: true
    summary_keep: 3            # default: 3
    max_summary_items: 3       # default: 3
    releases: releases.yaml    # release dates config
When changelog is in formats, save_report is implied — the DiffReport JSON is always saved.
diffino validate
diffino validate config.yaml # Validate config only
diffino changelog generate
diffino changelog generate        # Generate changelog.typ
  --input-dir ./diffs             # Directory of diff JSON files
  --output changelog.typ          # Output Typst file
  --releases releases.yaml        # Release dates config
  --summary-keep 3                # Versions shown in summary (default: 3)
  --max-summary-items 3           # Max entries per version (default: 3)
  --split                         # Split into _summary.typ + _detail.typ
Source Types
| Type | Key config fields |
|---|---|
| excel | path |
| csv | path |
| parquet | path |
| duckdb | database, query |
| docx | path |
DOCX mode
When sources.*.type is docx, sheets are matched by table caption (exact, fuzzy, or 1-based index). Use content: paragraphs in a compare unit to diff document body text instead of tables:
compare:
  # Table diff by caption
  - left_sheet: 表1-客户列表  # "Table 1 - Customer list"
    key_columns:
      - 客户ID  # "Customer ID"
  # Paragraph diff
  - content: paragraphs
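The three caption-matching modes mentioned above (exact, fuzzy, 1-based index) could be resolved roughly as follows. This is a sketch under assumptions: the function name resolve_table, the use of an integer to mean an index, and the difflib similarity cutoff of 0.6 are illustrative choices, not diffino's documented behavior.

```python
from difflib import get_close_matches


def resolve_table(selector, captions, cutoff=0.6):
    """Return the 0-based position of the selected table, or None if not found."""
    # An integer selector is treated as a 1-based table index (assumption).
    if isinstance(selector, int):
        return selector - 1 if 1 <= selector <= len(captions) else None
    # Exact caption match takes priority over fuzzy matching.
    if selector in captions:
        return captions.index(selector)
    # Fall back to the closest caption above the similarity cutoff.
    close = get_close_matches(selector, captions, n=1, cutoff=cutoff)
    return captions.index(close[0]) if close else None


captions = ["Table 1 - Customer list", "Table 2 - Orders"]
print(resolve_table("Table 2 Orders", captions))
```

Trying exact matching before fuzzy matching keeps a slightly misspelled caption from shadowing a table whose caption matches exactly.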
Configuration Reference
See config.example.yaml for a complete example.
| Section | Field | Description |
|---|---|---|
| sources.left/right | type | Source type: excel, csv, parquet, duckdb, docx |
| sources.left/right | version | Manual version override (otherwise auto-parsed from the filename) |
| compare[] | left_sheet / right_sheet | Sheet names (or DOCX table captions); right_sheet defaults to left_sheet |
| compare[] | content | Set to paragraphs for DOCX body text diff |
| compare[] | key_columns | Column names used for row matching |
| compare[] | fingerprint | Use full-row hash instead of key columns |
| compare[] | ignore_columns | Columns to exclude from comparison |
| compare[] | column_rules | Preprocessing rules (decimal, text) |
| compare[] | label | Human-readable label for this comparison unit |
| output | project | Project name for Typst cover (default: 数据, "data") |
| output | title | Report title (default: 更新说明, "update notes") |
| output | formats | List of: excel, html, typst, docx_track, changelog |
| output | save_report | Persist DiffReport as JSON for changelog |
| output | report_dir | Directory for saved reports (default: ./diffs) |
| output | max_workers | Thread pool size (default: 4) |
| output | release_date | Override release date (ISO format) |
| output | releases | Inline releases config (alternative to file) |
| output.excel | path | Output Excel path |
| output.excel | style | track, final, or side_by_side |
| output.html | path | Output HTML path |
| output.typst | path | Output Typst path |
| output.typst | template | Custom Typst template path |
| output.docx_track | path | Output DOCX path |
| output.changelog | path | Output Typst path (default: changelog.typ) |
| output.changelog | split | Split into _summary.typ + _detail.typ (default: true) |
| output.changelog | summary_keep | Versions in summary table (default: 3) |
| output.changelog | max_summary_items | Max entries per version (default: 3) |
| output.changelog | releases | Path to releases config (default: releases.yaml) |
Column preprocessing rules
column_rules:
  - column: 金额  # "Amount"
    type: decimal
    precision: 2
  - column: 名称  # "Name"
    type: text
    normalize_whitespace: true
    case_sensitive: false
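The effect of these rules on a single cell value can be sketched in plain Python. The rule keys (type, precision, normalize_whitespace, case_sensitive) come from the config above; the function name apply_rule, the half-up rounding mode, and the collapse-and-trim whitespace policy are assumptions, since the exact behavior is not specified here.

```python
import re
from decimal import Decimal, ROUND_HALF_UP


def apply_rule(value, rule):
    """Normalize one cell value before comparison, according to one column rule."""
    if value is None:
        return None
    if rule["type"] == "decimal":
        # precision: 2 -> quantize to 0.01 (rounding mode is an assumption).
        quantum = Decimal(1).scaleb(-rule.get("precision", 2))
        return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    if rule["type"] == "text":
        text = str(value)
        if rule.get("normalize_whitespace"):
            text = re.sub(r"\s+", " ", text).strip()
        if not rule.get("case_sensitive", True):
            text = text.casefold()
        return text
    raise ValueError(f"unknown rule type: {rule['type']!r}")


amount_rule = {"type": "decimal", "precision": 2}
name_rule = {"type": "text", "normalize_whitespace": True, "case_sensitive": False}

# Values that differ only below the configured precision compare equal.
print(apply_rule(10.004, amount_rule) == apply_rule("10.0", amount_rule))
print(apply_rule("  Acme   Corp ", name_rule))
```

Normalizing both sides before comparing is what keeps formatting noise, such as trailing zeros or stray spaces, from being reported as a change.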
Releases config (releases.yaml)
Two formats are supported:
With project name (recommended):
name: 穿透监管规则明细表  # "Look-through supervision rules detail table"
releases:
  - version: "v1.0"
    date: 2026-02-13
  - version: "v1.1"
    date: 2026-05-19
Legacy flat list:
releases:
  - version: "0.2.5"
    date: 2026-05-19
  - version: "0.2.6"
    date: 2026-05-20
Releases config can also be specified inline under output.releases in the main config.
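Both layouts shown above reduce to the same version-to-date mapping, with the project name being optional. The sketch below works on an already-parsed config (a plain dict, as a YAML loader would return); the function name load_releases is hypothetical and not part of diffino's API.

```python
from datetime import date


def load_releases(config):
    """Return (project_name, {version: release_date}) from a parsed releases config."""
    name = config.get("name")  # absent in the legacy flat-list layout
    table = {}
    for entry in config.get("releases", []):
        released = entry["date"]
        # YAML loaders usually parse bare dates already; accept ISO strings too.
        if isinstance(released, str):
            released = date.fromisoformat(released)
        table[str(entry["version"])] = released
    return name, table


with_name = {"name": "Rules", "releases": [{"version": "v1.0", "date": "2026-02-13"}]}
legacy = {"releases": [{"version": "0.2.5", "date": date(2026, 5, 19)}]}

print(load_releases(with_name))
print(load_releases(legacy))
```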
Changelog
Via diffino run (recommended)
Add changelog to output.formats. DiffReport JSONs are auto-saved to output.report_dir (default ./diffs) as {old_version}__{new_version}.json. After each run, all JSON reports in the directory are loaded and cumulative changelog_summary.typ + changelog_detail.typ are generated.
output:
  formats:
    - changelog
  changelog:
    split: true
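The naming scheme for saved reports depends on the version detected for each side. The following sketch shows how a version might be parsed from a filename such as name-vX.Y.Z.ext and combined into the {old_version}__{new_version}.json name. The regular expression, the helper names, and the decision to drop the leading v are assumptions for illustration; diffino's own parser may differ.

```python
import re
from pathlib import Path

# Matches a trailing "vX.Y.Z" (any number of dotted parts) in the file stem.
VERSION_RE = re.compile(r"(?:^|[-_])v(\d+(?:\.\d+)*)$", re.IGNORECASE)


def detect_version(path, override=None):
    """Prefer the manual override, else parse the version from the filename."""
    if override:
        return override
    match = VERSION_RE.search(Path(path).stem)
    return match.group(1) if match else None


def report_name(old_version, new_version):
    """File name used for the saved report of one comparison."""
    return f"{old_version}__{new_version}.json"


def version_key(version):
    """Sort key that orders 0.2.10 after 0.2.9 (numeric, not lexicographic)."""
    return tuple(int(part) for part in version.lstrip("vV").split("."))


old = detect_version("data/v0.2.5.xlsx")
new = detect_version("data/v0.2.6.xlsx", override="0.2.6")
print(report_name(old, new))
```

Sorting saved reports with a numeric key such as version_key avoids the usual string-ordering trap where "0.2.10" sorts before "0.2.9".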
Via diffino changelog generate (standalone)
diffino changelog generate --input-dir ./diffs --split
Changelog output
- Summary (_summary.typ): Version table listing each version, date, and notable change counts
- Detail (_detail.typ): Per-version sections showing every added/deleted/modified row with old/new value inline diffs
Without --split (or split: false in config), the output is a single changelog.typ.
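One plausible reading of the summary_keep and max_summary_items options is that the summary shows only the most recent versions and caps the entries listed for each. The sketch below encodes that reading; the function name build_summary, the report structure (a dict with version and changes), and the newest-first ordering are all assumptions, not confirmed behavior.

```python
def build_summary(reports, summary_keep=3, max_summary_items=3):
    """Trim an oldest-first list of reports down to summary rows, newest first."""
    recent = reports[-summary_keep:] if summary_keep > 0 else []
    rows = []
    for report in reversed(recent):
        items = report["changes"][:max_summary_items]
        rows.append({
            "version": report["version"],
            "items": items,
            # How many entries were left out of the summary for this version.
            "more": len(report["changes"]) - len(items),
        })
    return rows


# Hypothetical reports, oldest first.
reports = [
    {"version": "0.1", "changes": ["a"]},
    {"version": "0.2", "changes": ["b", "c", "d", "e"]},
    {"version": "0.3", "changes": []},
    {"version": "0.4", "changes": ["f"]},
]
summary = build_summary(reports)
```

Under this reading, the detail file would still carry every change for every version, while the summary stays short enough to scan.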
License
MIT