📊 minimal-csv-diff

A minimal tool to compare CSV files and generate diff reports for data validation.

✨ Features

  • 🔍 Compare two CSV files that share column names
  • 🎯 Interactive selection of key fields for comparison
  • 📋 Generate detailed diff reports when differences are found
  • ⚡ Command-line interface for quick data validation
  • 🔎 Identifies unique rows and column-level differences
  • 📁 Exports results to CSV format for further analysis

🚀 Quick Start

Option 1: Run Instantly (No Installation) ⭐

uvx minimal-csv-diff

Option 2: Install & Run

pip install minimal-csv-diff
minimal-csv-diff

🎮 Try the Demo

Want to see it in action? Check out the demo directory:

cd demo/
minimal-csv-diff
# Follow the prompts: select files 0 and 1, then choose a key column
# See the magic happen! ✨

The demo includes sample CSV files and shows how the tool identifies:

  • 🔴 Unique rows (exist in only one file)
  • 🟡 Column differences (same record, different values)
  • Matching records (excluded from output)
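
The three categories above can be illustrated with a short pandas sketch. This is not the tool's own code: `classify_rows`, the sample frames, and the choice of an outer merge with `indicator=True` are assumptions made for illustration only.

```python
import pandas as pd


def classify_rows(left, right, keys):
    """Count unique, differing and matching records (hypothetical helper)."""
    # Non-key columns present in both frames are the ones worth comparing.
    value_cols = [c for c in left.columns if c in right.columns and c not in keys]
    merged = left.merge(
        right, on=keys, how="outer", suffixes=("_l", "_r"), indicator=True
    )
    both = merged[merged["_merge"] == "both"]
    # A record differs when any shared non-key column has a different value.
    differs = pd.Series(False, index=both.index)
    for col in value_cols:
        differs |= both[f"{col}_l"].astype(str) != both[f"{col}_r"].astype(str)
    return {
        "unique": int((merged["_merge"] != "both").sum()),
        "different": int(differs.sum()),
        "matching": int((~differs).sum()),
    }


# Made-up sample data: id 1 and id 4 are unique, id 3 differs, id 2 matches.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "name": ["b", "x", "d"]})
result = classify_rows(left, right, ["id"])
print(result)
```

Comparing values as strings keeps the sketch simple; a stricter comparison would need to treat missing values and numeric types explicitly.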

📖 How It Works

  1. 📂 Select directory containing your CSV files
  2. ⚙️ Choose delimiter (comma, semicolon, etc.)
  3. 📄 Pick two files to compare
  4. 🔑 Select key columns for row matching
  5. 📊 Get diff.csv report if differences exist
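
Steps 1 to 4 can be approximated without the interactive prompts. The sketch below is an assumption about how such a loading step might look, not the tool's implementation; `list_csv_files`, `load_pair`, and the temporary sample files are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def list_csv_files(directory):
    """Return the CSV files in a directory in a stable, sorted order."""
    return sorted(Path(directory).glob("*.csv"))


def load_pair(directory, delimiter, first, second, keys):
    """Load two files by index and keep only the columns they share."""
    files = list_csv_files(directory)
    # Read everything as text so values are compared exactly as written.
    left = pd.read_csv(files[first], sep=delimiter, dtype=str)
    right = pd.read_csv(files[second], sep=delimiter, dtype=str)
    common = [c for c in left.columns if c in right.columns]
    missing = [k for k in keys if k not in common]
    if missing:
        raise ValueError(f"key columns not present in both files: {missing}")
    return left[common], right[common]


# Made-up semicolon-delimited files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("id;name\n1;alice\n2;bob\n")
    Path(tmp, "b.csv").write_text("id;name;extra\n1;alice;x\n3;carol;y\n")
    left, right = load_pair(tmp, ";", 0, 1, ["id"])

print(list(left.columns), list(right.columns))
```

Restricting both frames to their common columns mirrors the requirement that the compared files share column names.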

📤 Output

When differences are found, the tool generates a diff.csv file with the following columns:

  • 🔑 surrogate_key: Concatenated key fields for row identification
  • 📁 source: Which file the row comes from
  • ❌ failed_columns: The columns whose values differ, or "UNIQUE ROW" when the row exists in only one file
  • 📋 All original columns: Complete data for comparison
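
A report with this layout can be summarized with a few lines of pandas. The sample rows below are invented, and two details are assumptions rather than documented behavior: that `source` holds a file name, and that a changed record appears once per source file.

```python
import io

import pandas as pd

# Invented sample shaped like the columns described above.
sample = io.StringIO(
    "surrogate_key,source,failed_columns,id,name\n"
    "1,a.csv,UNIQUE ROW,1,alice\n"
    "3,a.csv,name,3,carol\n"
    "3,b.csv,name,3,caroline\n"
)
report = pd.read_csv(sample, dtype=str)

# Rows that exist in only one file carry the "UNIQUE ROW" marker.
unique_rows = report[report["failed_columns"] == "UNIQUE ROW"]
# Every other row belongs to a record whose values differ between files.
changed_rows = report[report["failed_columns"] != "UNIQUE ROW"]
# Number of reported rows contributed by each source file.
per_source = report.groupby("source").size().to_dict()

print(len(unique_rows), len(changed_rows), per_source)
```

To analyze a real report, replace the `io.StringIO` sample with the path to the generated diff.csv.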

💡 Use Cases

  • 🔄 Data validation between different data sources
  • 🔧 ETL pipeline testing - compare before/after transformations
  • 🗄️ Database migration verification - ensure data integrity
  • 📊 Looker dashboard validation - compare query results across environments
  • 🧪 A/B testing data analysis - identify differences in datasets

🛠️ Development

This project uses uv for dependency management.

git clone https://github.com/joon-solutions/looker_data_validation
cd looker_data_validation
uv sync
uv run minimal-csv-diff

📋 Requirements

  • Python >= 3.10
  • pandas >= 2.0.0
