
Minimal tool to compare CSV files and generate diff reports

📊 minimal-csv-diff

A minimal tool to compare CSV files and generate diff reports for data validation.

✨ Features

  • 🔍 Compare two CSV files with common column names
  • 🎯 Interactive selection of key fields for comparison
  • 📋 Generate detailed diff reports when differences are found
  • ⚡ Command-line interface for quick data validation
  • 🔎 Identifies unique rows and column-level differences
  • 📁 Exports results to CSV format for further analysis

🚀 Quick Start

Option 1: Run Instantly (No Installation) ⭐

uvx minimal-csv-diff

Option 2: Install & Run

pip install minimal-csv-diff
minimal-csv-diff

🎮 Try the Demo

Want to see it in action? Check out the demo directory:

cd demo/
minimal-csv-diff
# Follow prompts: select files 0,1 and choose a key column
# See the magic happen! ✨

The demo includes sample CSV files and shows how the tool identifies:

  • 🔴 Unique rows (exist in only one file)
  • 🟡 Column differences (same record, different values)
  • Matching records (excluded from output)
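
The three categories above can be illustrated with a small pandas sketch. This is not the tool's own code: the function name `classify_rows`, the sample frames, and the use of an outer merge with `indicator=True` are assumptions chosen for illustration.

```python
import pandas as pd


def classify_rows(left: pd.DataFrame, right: pd.DataFrame, key: str) -> dict[str, list]:
    """Hypothetical helper: sort key values into unique, different, and matching."""
    # Outer merge on the key; the "_merge" indicator records which side each row came from.
    merged = left.merge(
        right, on=key, how="outer", suffixes=("_left", "_right"), indicator=True
    )
    # Rows found in only one of the two frames.
    unique = merged.loc[merged["_merge"] != "both", key].tolist()

    # Rows found in both frames: compare every shared non-key column.
    both = merged[merged["_merge"] == "both"]
    value_cols = [c for c in left.columns if c != key and c in right.columns]
    different, matching = [], []
    for _, row in both.iterrows():
        # Note: missing values (NaN) never compare equal, so they count as a difference here.
        changed = [c for c in value_cols if row[f"{c}_left"] != row[f"{c}_right"]]
        (different if changed else matching).append(row[key])
    return {"unique": unique, "different": different, "matching": matching}


# Made-up sample data standing in for two CSV files.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [1, 2, 4], "name": ["a", "B", "d"]})
result = classify_rows(left, right, key="id")
```

In this sketch, ids 3 and 4 are unique rows, id 2 has a column difference, and id 1 matches and would be left out of a report.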

📖 How It Works

  1. 📂 Select directory containing your CSV files
  2. ⚙️ Choose delimiter (comma, semicolon, etc.)
  3. 📄 Pick two files to compare
  4. 🔑 Select key columns for row matching
  5. 📊 Get a diff.csv report if differences exist
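
Steps 2 to 4 can be approximated in a few lines of pandas. The sketch below is an illustration only, not the tool's implementation: `load_csv` and `candidate_keys` are hypothetical helpers, and the in-memory sample data stands in for real files.

```python
import io

import pandas as pd


def load_csv(source, delimiter: str = ",") -> pd.DataFrame:
    """Hypothetical loader: read a CSV with the chosen delimiter."""
    # Reading everything as text avoids silent type coercion (e.g. "01" becoming 1).
    return pd.read_csv(source, sep=delimiter, dtype=str)


def candidate_keys(left: pd.DataFrame, right: pd.DataFrame) -> list[str]:
    """Shared columns whose values are unique in both files."""
    shared = [c for c in left.columns if c in right.columns]
    return [c for c in shared if left[c].is_unique and right[c].is_unique]


# Made-up semicolon-delimited samples standing in for two files on disk.
file_a = io.StringIO("id;name;city\n1;Ann;Oslo\n2;Ann;Oslo\n")
file_b = io.StringIO("id;name;country\n1;Ann;NO\n2;Bo;NO\n")

a = load_csv(file_a, delimiter=";")
b = load_csv(file_b, delimiter=";")
keys = candidate_keys(a, b)
```

Here only `id` qualifies as a key: `name` repeats in the first sample, and `city` and `country` are not shared between the two.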

📤 Output

When differences are found, the tool generates a diff.csv with:

  • 🔑 surrogate_key: Concatenated key fields for row identification
  • 📁 source: Which file the row comes from
  • ❌ failed_columns: The columns that differ, or "UNIQUE ROW" for a row that exists in only one file
  • 📋 All original columns: Complete data for comparison
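
A report with these columns is easy to post-process in pandas. The following sketch relies only on the column names and the "UNIQUE ROW" marker described above. The sample rows, the file names, and the assumption that a column difference appears once per source file are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical diff.csv content; a real report is read the same way from disk.
sample = io.StringIO(
    "surrogate_key,source,failed_columns,id,name\n"
    "2,a.csv,name,2,Bob\n"
    "2,b.csv,name,2,Bo\n"
    "3,a.csv,UNIQUE ROW,3,Cy\n"
)
diff = pd.read_csv(sample, dtype=str)

# Split the report into rows that exist in one file only and rows whose values differ.
is_unique = diff["failed_columns"] == "UNIQUE ROW"
unique_rows = diff[is_unique]
changed_rows = diff[~is_unique]

# Count unique rows per source file and list the keys with column differences.
unique_per_source = unique_rows.groupby("source").size().to_dict()
changed_keys = changed_rows["surrogate_key"].unique().tolist()
```

Splitting on `failed_columns` first keeps missing records and changed values apart, since the two usually call for different follow-up.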

💡 Use Cases

  • 🔄 Data validation between different data sources
  • 🔧 ETL pipeline testing - compare before/after transformations
  • 🗄️ Database migration verification - ensure data integrity
  • 📊 Looker dashboard validation - compare query results across environments
  • 🧪 A/B testing data analysis - identify differences in datasets

🛠️ Development

This project uses uv for dependency management.

git clone https://github.com/joon-solutions/looker_data_validation
cd looker_data_validation
uv sync
uv run minimal-csv-diff

📋 Requirements

  • Python >= 3.10
  • pandas >= 2.0.0

Download files

Source Distribution

minimal_csv_diff-0.2.0.tar.gz (22.0 kB view details)

Uploaded Source

Built Distribution

minimal_csv_diff-0.2.0-py3-none-any.whl (19.3 kB view details)

Uploaded Python 3

File details

Details for the file minimal_csv_diff-0.2.0.tar.gz.

File metadata

  • Download URL: minimal_csv_diff-0.2.0.tar.gz
  • Size: 22.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.16

File hashes

Hashes for minimal_csv_diff-0.2.0.tar.gz

  • SHA256: 3acff49f5762c3f85ea7f92f31d3cd56146dcdd43c35ce583cc1827ee9efee4b
  • MD5: 588223b00ad9f954b6a834f626d1cb84
  • BLAKE2b-256: 134ac3ee4212d87167447220ee2b01becd5b565d56aa8b94f913cea7f0f8b171

File details

Details for the file minimal_csv_diff-0.2.0-py3-none-any.whl.

File hashes

Hashes for minimal_csv_diff-0.2.0-py3-none-any.whl

  • SHA256: e2d9aebc3022dc9a703d7849a71ae5aa087cd6a2336c7eea1c8bd466192b6d4a
  • MD5: 1c0d57b9e4f0642765c2d04925fb1405
  • BLAKE2b-256: 1aadf0d2be4407f1e68114f8e2e498f0f14fa9745ae0041559a890d7afd0df89
