Minimal tool to compare CSV files and generate diff reports

📊 minimal-csv-diff

A minimal tool to compare CSV files and generate diff reports for data validation.

✨ Features

  • 🔍 Compare two CSV files with common column names
  • 🎯 Interactive selection of key fields for comparison
  • 📋 Generate detailed diff reports when differences are found
  • ⚡ Command-line interface for quick data validation
  • 🔎 Identifies unique rows and column-level differences
  • 📁 Exports results to CSV format for further analysis

🚀 Quick Start

Option 1: Run Instantly (No Installation) ⭐

uvx minimal-csv-diff

Option 2: Install & Run

pip install minimal-csv-diff
minimal-csv-diff

🎮 Try the Demo

Want to see it in action? Check out the demo directory:

cd demo/
minimal-csv-diff
# Follow prompts: select files 0,1 and choose a key column
# See the magic happen! ✨

The demo includes sample CSV files and shows how the tool identifies:

  • 🔴 Unique rows (exist in only one file)
  • 🟡 Column differences (same record, different values)
  • Matching records (excluded from output)
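The three categories above can be illustrated with a small standard-library sketch. This is not the tool's own implementation (the requirements suggest it is built on pandas); the function name `classify_rows` and the status strings are hypothetical and chosen only for illustration.

```python
def classify_rows(left, right, key_columns):
    """Classify rows from two tables by a composite key.

    left, right: lists of dicts, one dict per CSV row.
    Returns a dict mapping each key tuple to a status string.
    The status strings are illustrative, not the tool's output format.
    """
    def index(rows):
        # Map the tuple of key values to the full row.
        return {tuple(row[k] for k in key_columns): row for row in rows}

    left_by_key, right_by_key = index(left), index(right)
    result = {}
    for key in left_by_key.keys() | right_by_key.keys():
        if key not in right_by_key:
            result[key] = "unique to left"
        elif key not in left_by_key:
            result[key] = "unique to right"
        else:
            a, b = left_by_key[key], right_by_key[key]
            # Columns whose values differ for the same key.
            changed = sorted(c for c in a if a[c] != b.get(c))
            result[key] = ("differs: " + ",".join(changed)) if changed else "match"
    return result


left = [
    {"id": "1", "name": "Ann"},
    {"id": "2", "name": "Bob"},
    {"id": "4", "name": "Dee"},
]
right = [
    {"id": "1", "name": "Anne"},
    {"id": "3", "name": "Cy"},
    {"id": "4", "name": "Dee"},
]
result = classify_rows(left, right, ["id"])
```

Here the row with `id` 1 is a column difference, rows 2 and 3 are unique to one side, and row 4 matches, so a report like the tool's would leave it out.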

📖 How It Works

  1. 📂 Select the directory containing your CSV files
  2. ⚙️ Choose delimiter (comma, semicolon, etc.)
  3. 📄 Pick two files to compare
  4. 🔑 Select key columns for row matching
  5. 📊 Get a diff.csv report if differences exist
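Before answering the prompts in steps 1 to 3, it can help to check which CSV files a directory holds and which delimiter they use. The sketch below is a standalone helper, not part of minimal-csv-diff; `list_csv_files` and `guess_delimiter` are hypothetical names, and the delimiter guess is a simple heuristic that counts candidate characters in the header line.

```python
from pathlib import Path


def list_csv_files(directory):
    """Return the .csv files in a directory, sorted by name."""
    return sorted(p for p in Path(directory).iterdir() if p.suffix.lower() == ".csv")


def guess_delimiter(header_line, candidates=",;\t|"):
    """Pick the candidate character that occurs most often in the header.

    Heuristic only: quoted header fields that contain a candidate
    character can fool it. Ties go to the earliest candidate.
    """
    return max(candidates, key=header_line.count)


# Print an indexed listing of the current directory's CSV files.
# The tool's own ordering and numbering may differ.
for i, path in enumerate(list_csv_files(".")):
    print(i, path.name)
```

For example, `guess_delimiter("id;name;city")` returns `";"`.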

📤 Output

When differences are found, the tool generates a diff.csv file with these columns:

  • 🔑 surrogate_key: Concatenated key fields for row identification
  • 📁 source: Which file the row comes from
  • ❌ failed_columns: The columns that differ, or "UNIQUE ROW" when the row exists in only one file
  • 📋 All original columns: Complete data for comparison
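Since the report is itself a CSV file, it can be summarized with a few lines of Python. The sketch below assumes the report is comma-delimited and has a header row with the column names listed above. The sample rows, the file names in `source`, and the column order are made up for illustration and may not match the tool's exact output.

```python
import csv
import io
from collections import Counter


def summarize_diff(lines):
    """Count report rows per (source, failed_columns) pair.

    lines: an open file or any iterable of CSV text lines.
    """
    reader = csv.DictReader(lines)
    return Counter((row["source"], row["failed_columns"]) for row in reader)


# Made-up sample in the shape described above (assumed layout).
sample = io.StringIO(
    "surrogate_key,source,failed_columns,id,name\n"
    "1,a.csv,name,1,Ann\n"
    "1,b.csv,name,1,Anne\n"
    "3,b.csv,UNIQUE ROW,3,Cy\n"
)
summary = summarize_diff(sample)
for (source, failed), count in sorted(summary.items()):
    print(source, failed, count)
```

To run it against a real report, pass an open file instead of the sample: `with open("diff.csv", newline="") as f: summarize_diff(f)`.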

💡 Use Cases

  • 🔄 Data validation between different data sources
  • 🔧 ETL pipeline testing - compare before/after transformations
  • 🗄️ Database migration verification - ensure data integrity
  • 📊 Looker dashboard validation - compare query results across environments
  • 🧪 A/B testing data analysis - identify differences in datasets

🛠️ Development

This project uses uv for dependency management.

git clone https://github.com/joon-solutions/looker_data_validation
cd looker_data_validation
uv sync
uv run minimal-csv-diff

📋 Requirements

  • Python >= 3.10
  • pandas >= 2.0.0
