Minimal tool to compare CSV files and generate diff reports
📊 minimal-csv-diff
A minimal tool to compare CSV files and generate diff reports for data validation.
✨ Features
- 🔍 Compare two CSV files that share column names
- 🎯 Interactive selection of key fields for comparison
- 📋 Generate detailed diff reports when differences are found
- ⚡ Command-line interface for quick data validation
- 🔎 Identifies unique rows and column-level differences
- 📁 Exports results to CSV format for further analysis
🚀 Quick Start
Option 1: Run Instantly (No Installation) ⭐
uvx minimal-csv-diff
Option 2: Install & Run
pip install minimal-csv-diff
minimal-csv-diff
🎮 Try the Demo
Want to see it in action? Check out the demo directory:
cd demo/
minimal-csv-diff
# Follow prompts: select files 0,1 and choose a key column
# See the magic happen! ✨
The demo includes sample CSV files and shows how the tool identifies:
- 🔴 Unique rows (exist in only one file)
- 🟡 Column differences (same record, different values)
- ✅ Matching records (excluded from output)
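The three categories above can be illustrated with a small standard-library sketch. This is not the tool's own code: the sample data, the `index_by_key` and `classify` helpers, and the choice of `id` as the key are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data; the real demo files may look different.
LEFT = "id,name,amount\n1,alice,10\n2,bob,20\n3,carol,30\n"
RIGHT = "id,name,amount\n1,alice,10\n2,bob,25\n4,dave,40\n"


def index_by_key(text, key):
    """Parse CSV text and map each key value to its row dict."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def classify(left, right):
    """Sort key values into unique, differing, and matching groups."""
    result = {"unique": [], "different": [], "matching": []}
    for key in sorted(left.keys() | right.keys()):
        if key not in left or key not in right:
            result["unique"].append(key)      # exists in only one file
        elif left[key] != right[key]:
            result["different"].append(key)   # same record, different values
        else:
            result["matching"].append(key)    # identical, left out of a report
    return result


groups = classify(index_by_key(LEFT, "id"), index_by_key(RIGHT, "id"))
print(groups)
```

With this sample data, rows 3 and 4 are unique, row 2 differs in one column, and row 1 matches.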
📖 How It Works
- 📂 Select the directory containing your CSV files
- ⚙️ Choose the delimiter (comma, semicolon, etc.)
- 📄 Pick the two files to compare
- 🔑 Select the key columns used to match rows
- 📊 Get a diff.csv report if differences exist
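The first steps of this workflow (finding CSV files, applying a delimiter, and working out which columns both files share) might look roughly like the sketch below. It uses only the standard library and is not taken from the tool. The helper names `list_csv_files`, `read_header`, and `common_columns` are hypothetical, and the tool's interactive prompts are not reproduced.

```python
import csv
import tempfile
from pathlib import Path


def list_csv_files(directory):
    """Return the CSV files in a directory, sorted so indices stay stable."""
    return sorted(Path(directory).glob("*.csv"))


def read_header(path, delimiter=","):
    """Read only the header row of a delimited file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def common_columns(path_a, path_b, delimiter=","):
    """Columns present in both files, in the order of the first file."""
    header_b = set(read_header(path_b, delimiter))
    return [col for col in read_header(path_a, delimiter) if col in header_b]


# Demonstration with two throwaway semicolon-delimited files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("id;name;amount\n1;alice;10\n", encoding="utf-8")
    Path(tmp, "b.csv").write_text("id;amount;region\n1;10;EU\n", encoding="utf-8")
    files = list_csv_files(tmp)
    file_names = [f.name for f in files]
    shared = common_columns(files[0], files[1], delimiter=";")

print(file_names, shared)
```

Only the shared columns are candidates for key selection and value comparison, which is why the header intersection is computed before any rows are read.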
📤 Output
When differences are found, the tool generates a diff.csv with these columns:
- 🔑 surrogate_key: Concatenated key fields for row identification
- 📁 source: Which file the row comes from
- ❌ failed_columns: The columns whose values differ, or "UNIQUE ROW" when the row exists in only one file
- 📋 All original columns: Complete data for comparison
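To make the report layout concrete, here is a minimal sketch that assembles rows with the columns listed above. Several details are assumptions rather than documented behavior: the `|` key separator, the comma-joined format of `failed_columns`, emitting one report row per source file for a differing record, and the `build_report` helper itself.

```python
import csv
import io

KEY_SEPARATOR = "|"  # assumed separator; the tool may join key fields differently


def surrogate_key(row, key_columns):
    """Concatenate the key fields of one row into a single identifier."""
    return KEY_SEPARATOR.join(row[col] for col in key_columns)


def build_report(left_rows, right_rows, key_columns, left_name, right_name):
    """Return report rows for unique and differing records only."""
    left = {surrogate_key(r, key_columns): r for r in left_rows}
    right = {surrogate_key(r, key_columns): r for r in right_rows}
    report = []
    for key in sorted(left.keys() | right.keys()):
        if key in left and key in right:
            failed = [c for c in left[key] if left[key][c] != right[key].get(c)]
            if not failed:
                continue  # matching records are excluded from the output
            label = ",".join(failed)
        else:
            label = "UNIQUE ROW"
        for name, table in ((left_name, left), (right_name, right)):
            if key in table:
                report.append({"surrogate_key": key, "source": name,
                               "failed_columns": label, **table[key]})
    return report


# Hypothetical input rows, keyed on the combination of id and region.
left_rows = [
    {"id": "1", "region": "EU", "amount": "10"},
    {"id": "2", "region": "EU", "amount": "20"},
    {"id": "3", "region": "US", "amount": "30"},
]
right_rows = [
    {"id": "1", "region": "EU", "amount": "10"},
    {"id": "2", "region": "EU", "amount": "25"},
]
report = build_report(left_rows, right_rows, ["id", "region"], "left.csv", "right.csv")

# Serialize the report in CSV form, as a diff.csv file would be.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(report[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(report)
print(buffer.getvalue())
```

Keeping every original column next to `source` lets a reviewer compare the two versions of a differing record side by side in a spreadsheet.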
💡 Use Cases
- 🔄 Data validation between different data sources
- 🔧 ETL pipeline testing - compare before/after transformations
- 🗄️ Database migration verification - ensure data integrity
- 📊 Looker dashboard validation - compare query results across environments
- 🧪 A/B testing data analysis - identify differences in datasets
🛠️ Development
This project uses uv for dependency management.
git clone https://github.com/joon-solutions/looker_data_validation
cd looker_data_validation
uv sync
uv run minimal-csv-diff
📋 Requirements
- Python >= 3.10
- pandas >= 2.0.0