Skip to main content

Local-first, rule-based semantic data reconciliation CLI tool

Project description

Reconlify CLI

Semantic data reconciliation for the command line.

Validate structured datasets using declarative YAML rules and produce deterministic JSON reconciliation reports suitable for CI/CD pipelines.

Fully local. No data leaves your machine.

Typical use cases:

  • ETL validation
  • Data migration verification
  • Financial transaction reconciliation
  • CI pipeline dataset checks
  • Log comparison with normalization rules

Quick Example

source.csv

txn_id,amount
1,100
2,200

target.csv

txn_id,amount
1,100
2,210

config.yaml

type: tabular
source: source.csv
target: target.csv
keys:
  - txn_id
reconlify run config.yaml

Exit code: 1 (differences found). Report highlights:

rows_with_mismatches: 1
missing_in_source:    0
missing_in_target:    0

Why Reconlify

diff / fc Excel (VLOOKUP) Beyond Compare Reconlify CLI
Comparison model Line-by-line text Manual formulas Visual file diff Key-based semantic matching
Key-based matching No Manual No Declarative keys
Missing-row detection No concept of rows Manual Manual inspection Automatic, both directions
Column-level comparison No Manual per-cell Limited Automatic with include/exclude
Normalization rules No No Limited Trim, case, nulls, regex, virtual columns
Deterministic JSON output No No No Yes
CI/CD integration Barely No No Exit codes 0/1/2 + JSON report
Local execution Yes Yes Yes Yes — zero network calls

Features

  • Key-based dataset reconciliation (single or composite keys)
  • Automatic missing-row detection (both directions)
  • Column-level mismatch detection with include/exclude control
  • Numeric tolerance support (per-column absolute tolerance)
  • Normalization rules (trim, case-insensitive, null handling, regex, virtual columns)
  • Row filters and exclusions
  • Deterministic JSON reconciliation reports
  • Machine-readable exit codes (0 / 1 / 2)
  • Two engines: tabular (CSV/TSV) and text (line-by-line / unordered)
  • CI/CD pipeline friendly
  • Fully local execution — no network calls

Performance

Reconlify uses DuckDB-backed tabular processing, streaming text comparison, and a local-first architecture. It processes large datasets locally without requiring a database or external service.

Dataset Rows / Lines Mode Time
CSV reconciliation (exact match) 200k rows tabular ~2 s
CSV reconciliation (high mismatch) 200k rows tabular ~12 s
Log comparison (positional diffs) 500k lines line_by_line ~3 s
Log comparison (unordered) 250k lines unordered_lines < 1 s

Benchmarks were executed on a MacBook (Apple Silicon / Python 3.11) with default fixture settings.

Performance depends on dataset structure, rule complexity, and system hardware.

Full benchmark methodology and results: PERF_TESTING.md

Installation

Requires Python 3.11+.

pip install reconlify-cli

Or with pipx for isolated installs:

pipx install reconlify-cli

Package name on PyPI: reconlify-cli

For development:

git clone https://github.com/testuteab/reconlify-cli.git && cd reconlify-cli
make install          # runs: poetry install
reconlify --help

CLI Usage

reconlify run <config.yaml>                # default output: report.json
reconlify run <config.yaml> --out out.json # custom output path

Options

Option Default Description
--out PATH report.json Output path for the JSON report
--include-line-numbers / --no-include-line-numbers on Include original line numbers in text report samples
--max-line-numbers N 0 (unlimited) Max line numbers per distinct line in unordered mode
--debug-report off Include processed line numbers in text report samples

Exit Codes

Code Meaning
0 No differences found
1 Differences found
2 Error (config validation, file not found, runtime failure)

Version

reconlify --version
# reconlify 0.1.0

Reconlify follows Semantic Versioning: MAJOR.MINOR.PATCH.

Real-World Example: Accounting Transaction Reconciliation

A finance team exports transactions from two systems (ERP and bank ledger) and needs to reconcile them nightly. Here's how Reconlify handles it.

1. Create sample data

cat <<'EOF' > source.csv
txn_id,booking_date,account,amount,currency,counterparty,status,memo
TXN-001,2026-01-15,4100,1500.00,USD,Acme Corp,BOOKED,Invoice 9201
TXN-002,2026-01-16,4100,320.50,EUR,Globex Inc,BOOKED,Wire transfer
TXN-003,2026-01-17,4200,75.00,USD,Jane Doe,BOOKED,Expense report
TXN-004,2026-01-18,4100,10000.00,USD,Initech,CANCELLED,Reversed
TXN-005,2026-01-19,4300,249.99,USD,Umbrella Ltd,BOOKED,Subscription
TXN-006,2026-01-20,4100,,USD,Soylent Corp,BOOKED,Pending allocation
EOF

cat <<'EOF' > target.csv
txn_id,booking_date,account,amount,currency,counterparty,status,memo
TXN-001,2026-01-15,4100,1500.00,USD,  acme corp  ,BOOKED,Invoice 9201
TXN-002,2026-01-16,4100,320.75,EUR,Globex Inc,BOOKED,Wire transfer
TXN-003,2026-01-17,4200,75.00,USD,Jane Doe,BOOKED,Expense report
TXN-005,2026-01-19,4300,249.99,USD,Umbrella Ltd,BOOKED,Subscription
TXN-006,2026-01-20,4100,NULL,USD,Soylent Corp,BOOKED,Pending allocation
TXN-007,2026-01-21,4100,430.00,USD,Wayne Ent,BOOKED,New deposit
EOF

What's different between the two files:

Scenario Row Detail
Matches after normalization TXN-001 counterparty has extra spaces + lowercase in target — should match with trim + case rules
Value mismatch TXN-002 amount is 320.50 vs 320.75 (exceeds tolerance)
Exact match TXN-003 Identical
Filtered out TXN-004 Status CANCELLED — excluded by row filter
Exact match TXN-005 Identical
NULL normalization TXN-006 Source has blank amount, target has NULL string — should match
Missing in source TXN-007 Exists only in target

2. Create the reconciliation config

cat <<'EOF' > recon.yaml
type: tabular
source: source.csv
target: target.csv

keys:
  - txn_id

csv:
  delimiter: ","
  header: true
  encoding: utf-8

compare:
  trim_whitespace: true
  case_insensitive: true
  normalize_nulls: true
  exclude_columns:
    - memo

filters:
  row_filters:
    - column: status
      op: not_equals
      value: CANCELLED
EOF

Key config choices:

  • keys: [txn_id] — match rows by transaction ID
  • trim_whitespace + case_insensitive — ignore formatting noise
  • normalize_nulls — treat blank, NULL, and null as equivalent
  • exclude_columns: [memo] — skip free-text fields
  • row_filters — drop CANCELLED transactions before comparison

3. Run the reconciliation

reconlify run recon.yaml --out report.json
echo $?

Expected exit code: 1 (differences found).

The report will contain:

  • missing_in_target: 0 — TXN-004 was filtered out, so not counted
  • missing_in_source: 1 — TXN-007 exists only in target
  • rows_with_mismatches: 1 — TXN-002 has an amount mismatch (320.50 vs 320.75)

TXN-001 and TXN-006 match cleanly thanks to normalization rules.

The report is written to report.json. It is deterministic — identical inputs and config always produce identical output (except the generated_at timestamp).

Text Engine

Reconlify also compares text files line-by-line or as unordered line sets:

type: text
source: expected.log
target: actual.log
mode: unordered_lines
normalize:
  trim_lines: true
  ignore_blank_lines: true
reconlify run text_recon.yaml

In unordered_lines mode, line order is ignored — Reconlify compares occurrence counts of each distinct line. In line_by_line mode (default), lines are compared positionally.

See the examples/ directory for config samples.

Engines

Tabular Engine

  • Key-based reconciliation — Single or composite keys. Detects missing rows on either side and cell-level value mismatches.
  • Column control — Include or exclude specific columns from comparison.
  • Numeric tolerance — Absolute tolerance per column (e.g. amount: 0.01).
  • String rules — Per-column normalization: trim, case_insensitive, contains, regex_extract.
  • Source-side normalization — Virtual columns via ops: map, concat, substr, add, sub, mul, div, coalesce, date_format, upper, lower, trim, round.
  • Row filters — Exclude specific key values or filter rows by column rules.
  • TSV support — Set csv.delimiter: "\t".

Text Engine

  • Two comparison modes: line_by_line (positional) and unordered_lines (multiset).
  • Normalization: trim_lines, collapse_whitespace, case_insensitive, ignore_blank_lines, normalize_newlines.
  • Regex rules: drop_lines_regex to remove lines, replace_regex to transform lines before comparison.

Report Format

Every run produces a JSON report with a consistent structure:

Section Description
summary Aggregate counts (rows, mismatches, missing). Zero differences = exit code 0
details Metadata: keys used, columns compared, filters applied, per-column mismatch stats
samples Concrete examples of differences (tabular: missing_in_target, missing_in_source, value_mismatches, excluded; text: flat list or samples_agg)
error Present only on exit code 2. Machine-readable code, human-readable message, and details
warnings Optional list of warning strings (e.g. large line-number arrays in unordered mode)

CI Usage

reconlify run recon.yaml --out report.json
rc=$?

if [ $rc -eq 2 ]; then
  echo "ERROR: config or runtime failure" >&2
  exit 1
elif [ $rc -eq 1 ]; then
  echo "WARN: differences found — see report.json" >&2
fi

Exit code 1 means differences were found — your pipeline decides whether that's a warning or a failure. Exit code 2 is always an error.

GitHub Actions

- name: Reconcile data
  run: |
    reconlify run recon.yaml --out report.json
    exit_code=$?
    if [ $exit_code -eq 2 ]; then
      echo "::error::Reconciliation failed with error"
      exit 1
    elif [ $exit_code -eq 1 ]; then
      echo "::warning::Differences found — see report.json"
    fi

- name: Upload report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: recon-report
    path: report.json

Documentation

Reconlify Desktop

Reconlify Desktop is a graphical interface for Reconlify CLI. It allows users to:

  • Visually build YAML reconciliation configs
  • Run reconciliations without using the terminal
  • Inspect reconciliation reports interactively

Reconlify CLI remains the core reconciliation engine.

Development

make install       # install dependencies
make test          # unit + integration tests (excludes e2e and perf)
make e2e           # end-to-end CLI tests
make lint          # ruff linter
make format        # auto-fix lint + format
make clean         # remove build artifacts and caches

Performance Testing

make perf          # generate fixtures + run full benchmark suite
make perf-smoke    # lightweight perf smoke tests
make perf-clean    # remove generated fixtures

See Performance Testing for details and baseline results.

Changelog

See CHANGELOG.md for release history.


Reconlify sits between simple file diff tools and heavy enterprise reconciliation systems, providing a deterministic, developer-friendly workflow for validating structured data locally.

License

Released under the MIT License. Copyright (c) 2026 Testute AB.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

reconlify_cli-0.1.0.tar.gz (30.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

reconlify_cli-0.1.0-py3-none-any.whl (29.1 kB view details)

Uploaded Python 3

File details

Details for the file reconlify_cli-0.1.0.tar.gz.

File metadata

  • Download URL: reconlify_cli-0.1.0.tar.gz
  • Upload date:
  • Size: 30.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.11.11 Darwin/22.5.0

File hashes

Hashes for reconlify_cli-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b1aa3a15627c27a889dbadcfe6239558266087c180a2db62a4c7127e01d5ce14
MD5 cdf8a7e91aef340820c2d764fd5bb178
BLAKE2b-256 fa02ef996d397c72bdf53e99e76135c7f9b4a2b5f3fe6edbcc4eeee30473ce76

See more details on using hashes here.

File details

Details for the file reconlify_cli-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: reconlify_cli-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 29.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.11.11 Darwin/22.5.0

File hashes

Hashes for reconlify_cli-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5b72f8d8b259d41fea9b38ba2b05bbbfabad00fe4a8f0efb2e5f469f556bb816
MD5 3569f4e81a7da8bf9f83283be781aa01
BLAKE2b-256 fe63d664f0b1b31d92a47ea0e6793162eb775b77f0546d0f708296f688c863e6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page