Minimal tool to compare CSV files and generate diff reports

📊 minimal-csv-diff

A minimal tool to compare CSV files and generate diff reports for data validation.

✨ Features

  • 🔍 Compare two CSV files with common column names
  • 🎯 Interactive selection of key fields for comparison
  • 📋 Generate detailed diff reports when differences are found
  • ⚡ Command-line interface for quick data validation
  • 🔎 Identifies unique rows and column-level differences
  • 📁 Exports results to CSV format for further analysis

🚀 Quick Start

Option 1: Run Instantly (No Installation) ⭐

uvx minimal-csv-diff

Option 2: Install & Run

pip install minimal-csv-diff
minimal-csv-diff

🎮 Try the Demo

Want to see it in action? Check out the demo directory:

cd demo/
minimal-csv-diff
# Follow prompts: select files 0,1 and choose a key column
# See the magic happen! ✨

The demo includes sample CSV files and shows how the tool identifies:

  • 🔴 Unique rows (exist in only one file)
  • 🟡 Column differences (same record, different values)
  • Matching records (excluded from output)
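The three categories above can be illustrated with a short pandas sketch. This is not the tool's own code: the function name `classify_rows`, the sample frames, and the single `id` key are assumptions made for illustration only.

```python
import pandas as pd


def classify_rows(left: pd.DataFrame, right: pd.DataFrame, key: str) -> dict:
    """Group key values into unique, differing, and matching (illustrative only)."""
    # An outer merge keeps every key; the _merge column records which side had it.
    merged = left.merge(
        right, on=key, how="outer", suffixes=("_left", "_right"), indicator=True
    )
    unique = merged.loc[merged["_merge"] != "both", key].tolist()

    # Only keys present in both frames can have column-level differences.
    both = merged[merged["_merge"] == "both"]
    value_cols = [c for c in left.columns if c != key and c in right.columns]

    differing, matching = [], []
    for _, row in both.iterrows():
        changed = [c for c in value_cols if row[f"{c}_left"] != row[f"{c}_right"]]
        (differing if changed else matching).append(row[key])
    return {"unique": unique, "differing": differing, "matching": matching}


# Hypothetical sample data, not the files shipped in demo/.
left = pd.DataFrame({"id": ["1", "2", "3"], "name": ["Ann", "Bob", "Cy"]})
right = pd.DataFrame({"id": ["1", "2", "4"], "name": ["Ann", "Rob", "Di"]})
result = classify_rows(left, right, key="id")
print(result)
```

Here ids 3 and 4 each exist in only one frame, id 2 has a different `name` on each side, and id 1 matches and would be left out of a report.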

📖 How It Works

  1. 📂 Select directory containing your CSV files
  2. ⚙️ Choose delimiter (comma, semicolon, etc.)
  3. 📄 Pick two files to compare
  4. 🔑 Select key columns for row matching
  5. 📊 Get a diff.csv report if differences exist
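Steps 2 to 4 can be pictured with the following minimal sketch, which reads delimited text and joins the chosen key columns into one matching key. The helper `load_with_key`, the `|` separator, and the inline sample data are assumptions for illustration; the tool's actual internals may differ.

```python
import io

import pandas as pd


def load_with_key(csv_source, delimiter: str, key_columns: list[str]) -> pd.DataFrame:
    """Read a delimited file as text and add a combined key column (illustrative only)."""
    # Read every column as a string so values compare exactly as written in the file.
    df = pd.read_csv(csv_source, sep=delimiter, dtype=str, keep_default_na=False)
    # Join the key fields into one string; "|" is an arbitrary separator chosen here.
    df["surrogate_key"] = df[key_columns].agg("|".join, axis=1)
    return df


# Hypothetical semicolon-delimited input standing in for a file on disk.
sample = io.StringIO("region;id;amount\nEU;1;10\nUS;2;20\n")
frame = load_with_key(sample, delimiter=";", key_columns=["region", "id"])
print(frame["surrogate_key"].tolist())
```

Reading all columns as strings avoids treating `10` and `10.0` as the same value, which may or may not be what you want for a given validation.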

📤 Output

When differences are found, the tool generates a diff.csv file with these columns:

  • 🔑 surrogate_key: Concatenated key fields for row identification
  • 📁 source: Which file the row comes from
  • ❌ failed_columns: Which columns differ, or "UNIQUE ROW" if the row exists in only one file
  • 📋 All original columns: Complete data for comparison
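As a sketch of the "further analysis" step, the report can be loaded back into pandas and split by the `failed_columns` value. The sample rows below are made up, and the sketch assumes that unique rows carry exactly the text `UNIQUE ROW` and that a differing record appears once per source file; check a real diff.csv before relying on either assumption.

```python
import io

import pandas as pd

# Hypothetical report contents; a real report is the diff.csv written by the tool.
sample_report = io.StringIO(
    "surrogate_key,source,failed_columns,id,name\n"
    "2,a.csv,name,2,Bob\n"
    "2,b.csv,name,2,Rob\n"
    "3,a.csv,UNIQUE ROW,3,Cy\n"
)
report = pd.read_csv(sample_report, dtype=str)

# Separate rows found in only one file from rows whose column values differ.
is_unique = report["failed_columns"] == "UNIQUE ROW"
unique_rows = report[is_unique]
changed_rows = report[~is_unique]

# Count report rows per source file and failure type.
summary = report.groupby(["source", "failed_columns"]).size()
print(summary)
```

To use this on a real report, replace `sample_report` with the path to diff.csv.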

💡 Use Cases

  • 🔄 Data validation between different data sources
  • 🔧 ETL pipeline testing - compare before/after transformations
  • 🗄️ Database migration verification - ensure data integrity
  • 📊 Looker dashboard validation - compare query results across environments
  • 🧪 A/B testing data analysis - identify differences in datasets

🛠️ Development

This project uses uv for dependency management.

git clone https://github.com/joon-solutions/looker_data_validation
cd looker_data_validation
uv sync
uv run minimal-csv-diff

📋 Requirements

  • Python >= 3.10
  • pandas >= 2.0.0
