minimal-csv-diff

A minimal tool to compare CSV files and generate diff reports for data validation.

Features

  • Compare two CSV files that share column names
  • Interactive selection of key fields for comparison
  • Generate detailed diff reports when differences are found
  • Command-line interface for quick data validation
  • Identify rows unique to one file and column-level differences
  • Export results to CSV for further analysis

Installation

pip install minimal-csv-diff

Usage

Command Line Interface

Navigate to the directory containing your CSV files and run:

minimal-csv-diff

The tool will guide you through:

  1. Selecting the working directory
  2. Choosing the file delimiter
  3. Picking two CSV files to compare
  4. Selecting columns for the surrogate key
  5. Generating a diff.csv report if differences exist
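
To make step 4 concrete, here is a minimal sketch of how a surrogate key can be built by concatenating the selected columns with pandas. It is an illustration only, not the tool's own code: the function name add_surrogate_key, the "|" separator, and the sample columns are all assumptions.

```python
import pandas as pd


def add_surrogate_key(df: pd.DataFrame, key_columns: list[str], sep: str = "|") -> pd.DataFrame:
    """Return a copy of df with a 'surrogate_key' column built from key_columns.

    Hypothetical helper: the separator and function name are assumptions,
    not part of minimal-csv-diff.
    """
    out = df.copy()
    # Cast to string first so numeric and text key parts concatenate uniformly.
    out["surrogate_key"] = out[key_columns].astype(str).agg(sep.join, axis=1)
    return out


# Invented sample data for illustration.
orders = pd.DataFrame(
    {"region": ["EU", "US"], "order_id": [1, 2], "amount": [10.0, 20.0]}
)
keyed = add_surrogate_key(orders, ["region", "order_id"])
print(keyed["surrogate_key"].tolist())  # ['EU|1', 'US|2']
```

Choose key columns that together identify a row uniquely in both files; otherwise two different rows can end up with the same key.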

With uvx (no installation needed)

Run directly without installing:

uvx minimal-csv-diff

Programmatic Usage

from minimal_csv_diff.main import main

# Run the interactive comparison
main()

Output

When differences are found, the tool generates a diff.csv file with:

  • surrogate_key: Concatenated key fields for row identification
  • source: Which file the row comes from
  • failed_columns: The columns whose values differ, or "UNIQUE ROW" for rows that exist in only one file
  • All original columns: Complete data for comparison
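
The sketch below shows one way a report with this shape could be produced with pandas. It is a simplified illustration and not the package's implementation. It assumes both frames already carry a unique surrogate_key column, share the same columns, and hold string values. The function name build_diff, the file labels, and the "|" separator inside failed_columns are hypothetical.

```python
import pandas as pd


def build_diff(left, right, key="surrogate_key", names=("file_a.csv", "file_b.csv")):
    """Illustrative diff of two frames that share columns and have unique keys."""
    value_cols = [c for c in left.columns if c != key]
    left_i, right_i = left.set_index(key), right.set_index(key)
    rows = []
    for k in sorted(set(left_i.index) | set(right_i.index)):
        in_left, in_right = k in left_i.index, k in right_i.index
        if in_left and in_right:
            a, b = left_i.loc[k], right_i.loc[k]
            # Compare values as strings, column by column.
            failed = [c for c in value_cols if str(a[c]) != str(b[c])]
            if not failed:
                continue  # identical rows are left out of the report
            label = "|".join(failed)
            sides = [(names[0], a), (names[1], b)]
        else:
            label = "UNIQUE ROW"
            sides = [(names[0], left_i.loc[k])] if in_left else [(names[1], right_i.loc[k])]
        for source, row in sides:
            rows.append({key: k, "source": source, "failed_columns": label, **row.to_dict()})
    return pd.DataFrame(rows, columns=[key, "source", "failed_columns", *value_cols])


# Invented sample data for illustration.
left = pd.DataFrame(
    {"surrogate_key": ["1", "2", "3"], "name": ["Ann", "Bob", "Cy"], "city": ["Oslo", "Rome", "Lima"]}
)
right = pd.DataFrame(
    {"surrogate_key": ["1", "2", "4"], "name": ["Ann", "Rob", "Di"], "city": ["Oslo", "Rome", "Kyiv"]}
)
diff = build_diff(left, right)
print(diff.to_string(index=False))
```

In this sample, key 2 appears twice in the result (once per file) with failed_columns set to name, while keys 3 and 4 each appear once as "UNIQUE ROW".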

Example Workflow

  1. Place your CSV files in a directory
  2. Run minimal-csv-diff
  3. Follow the prompts to select files and key columns
  4. Review the generated diff.csv for validation results
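
To try the workflow without real data, the following sketch writes two small CSV files into a temporary directory using only the standard library. The file names before.csv and after.csv, the columns, and the values are invented for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data: one changed row (id 2) and one row unique to each file.
BEFORE = [
    {"id": "1", "name": "Ann", "city": "Oslo"},
    {"id": "2", "name": "Bob", "city": "Rome"},
    {"id": "3", "name": "Cy", "city": "Lima"},
]
AFTER = [
    {"id": "1", "name": "Ann", "city": "Oslo"},
    {"id": "2", "name": "Rob", "city": "Rome"},
    {"id": "4", "name": "Di", "city": "Kyiv"},
]


def write_csv(path: Path, rows: list[dict]) -> None:
    """Write rows to path as a comma-delimited CSV with a header line."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


workdir = Path(tempfile.mkdtemp(prefix="csv_diff_demo_"))
write_csv(workdir / "before.csv", BEFORE)
write_csv(workdir / "after.csv", AFTER)
print(f"Sample files written to {workdir}")
```

From the printed directory, run minimal-csv-diff, choose the comma delimiter, pick the two files, and select id as the key column.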

Development

This project uses uv for dependency management.

git clone https://github.com/joon-solutions/looker_data_validation
cd looker_data_validation
uv sync
uv run minimal-csv-diff

Requirements

  • Python >= 3.10
  • pandas >= 2.0.0

License

MIT License - see the LICENSE file for details.
