Minimal tool to compare CSV files and generate diff reports
minimal-csv-diff
A minimal tool to compare CSV files and generate diff reports for data validation.
Features
- Compare two CSV files with common column names
- Interactive selection of key fields for comparison
- Generate detailed diff reports when differences are found
- Command-line interface for quick data validation
- Identify unique rows and column-level differences
- Export results to CSV format for further analysis
Installation
pip install minimal-csv-diff
Usage
Command Line Interface
Navigate to the directory containing your CSV files and run:
minimal-csv-diff
The tool will guide you through:
- Selecting the working directory
- Choosing the file delimiter
- Picking two CSV files to compare
- Selecting columns for the surrogate key
- Generating a diff.csv report if differences exist
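The surrogate key step can be pictured with a short sketch. This is not the package's code (the package depends on pandas); it is a standard-library illustration under stated assumptions. The `|` separator, the helper names `read_rows`, `surrogate_key`, and `diff_rows`, the comma-joined `failed_columns` value, and the choice to emit both versions of a mismatched row are all hypothetical.

```python
import csv
import io


def read_rows(text, delimiter=","):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def surrogate_key(row, key_columns, sep="|"):
    # Concatenate the chosen key fields; the "|" separator is an assumption.
    return sep.join(row[col] for col in key_columns)


def diff_rows(left, right, key_columns, left_name="left", right_name="right"):
    """Build diff records for two row lists that share column names."""
    # Rows with duplicate keys would overwrite each other in these dicts.
    left_by_key = {surrogate_key(r, key_columns): r for r in left}
    right_by_key = {surrogate_key(r, key_columns): r for r in right}
    report = []
    for key in sorted(left_by_key.keys() | right_by_key.keys()):
        l_row, r_row = left_by_key.get(key), right_by_key.get(key)
        if l_row is None or r_row is None:
            # The key exists in only one of the two files.
            row, name = (l_row, left_name) if r_row is None else (r_row, right_name)
            report.append({"surrogate_key": key, "source": name,
                           "failed_columns": "UNIQUE ROW", **row})
            continue
        # Columns whose values differ between the two files for this key.
        failed = [col for col in l_row if l_row[col] != r_row.get(col)]
        if failed:
            for row, name in ((l_row, left_name), (r_row, right_name)):
                report.append({"surrogate_key": key, "source": name,
                               "failed_columns": ",".join(failed), **row})
    return report


left = read_rows("id,name,amount\n1,Ann,10\n2,Bob,20\n3,Cy,30\n")
right = read_rows("id,name,amount\n1,Ann,10\n2,Bob,25\n4,Dee,40\n")
report = diff_rows(left, right, ["id"], "a.csv", "b.csv")
for record in report:
    print(record)
```

Choosing key columns that identify each row uniquely matters here: if two rows in one file share a key, only one of them takes part in this sketch's comparison.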
With uvx (no installation needed)
Run directly without installing:
uvx minimal-csv-diff
Programmatic Usage
from minimal_csv_diff.main import main
# Run the interactive comparison
main()
Output
When differences are found, the tool generates a diff.csv file with:
- surrogate_key: Concatenated key fields for row identification
- source: Which file the row comes from
- failed_columns: Which columns differ or "UNIQUE ROW" for rows that exist in only one file
- All original columns: Complete data for comparison
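A report with these columns can be summarized with a few lines of standard-library Python. The sketch below is only an illustration: the function name `summarize_diff` is hypothetical, the in-memory sample stands in for a real diff.csv, and the comma delimiter and column order are assumptions. The `failed_columns` value is treated as an opaque string because its exact format for multiple columns is not documented here.

```python
import csv
import io
from collections import Counter


def summarize_diff(handle, delimiter=","):
    """Count unique rows per source and mismatched rows per failed_columns value."""
    unique_by_source = Counter()
    mismatched_rows = Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        if row["failed_columns"] == "UNIQUE ROW":
            # The row exists in only one of the compared files.
            unique_by_source[row["source"]] += 1
        else:
            # The row exists in both files but some columns differ.
            mismatched_rows[row["failed_columns"]] += 1
    return unique_by_source, mismatched_rows


# Hypothetical report content, used here instead of a real diff.csv.
sample = io.StringIO(
    "surrogate_key,source,failed_columns,id,amount\n"
    "2,a.csv,amount,2,20\n"
    "2,b.csv,amount,2,25\n"
    "3,a.csv,UNIQUE ROW,3,30\n"
)
unique_by_source, mismatched_rows = summarize_diff(sample)
print(dict(unique_by_source))
print(dict(mismatched_rows))
```

For a real report, the same function can be called on a file opened with `open("diff.csv", newline="")`.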
Example Workflow
- Place your CSV files in a directory
- Run minimal-csv-diff
- Follow the prompts to select files and key columns
- Review the generated diff.csv for validation results
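To try this workflow without real data, a pair of small sample files can be generated first. The script below is a minimal sketch: the file names `before.csv` and `after.csv`, the columns, and the values are all made up for illustration, and the files are written to a fresh temporary directory.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header line and data rows to a comma-delimited CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical sample data: one changed amount and one row unique to each file.
workdir = Path(tempfile.mkdtemp(prefix="csv_diff_demo_"))
header = ["id", "name", "amount"]
write_csv(workdir / "before.csv", header,
          [[1, "Ann", 10], [2, "Bob", 20], [3, "Cy", 30]])
write_csv(workdir / "after.csv", header,
          [[1, "Ann", 10], [2, "Bob", 25], [4, "Dee", 40]])
print(f"Sample files written to {workdir}")
```

Running minimal-csv-diff from the printed directory and choosing `id` as the key column should then give the prompts two files with common column names to compare.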
Development
This project uses uv for dependency management.
git clone https://github.com/joon-solutions/looker_data_validation
cd looker_data_validation
uv sync
uv run minimal-csv-diff
Requirements
- Python >= 3.10
- pandas >= 2.0.0
License
MIT License - see the LICENSE file for details.
File details
Details for the file minimal_csv_diff-0.1.0.tar.gz.
File metadata
- Download URL: minimal_csv_diff-0.1.0.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 57dbe0a518214a5cf8e28bbc9980a10c6eefe9b755767433c58f89cd81dbd3f5 |
| MD5 | 23db5c89607e454e91036358585ec7ef |
| BLAKE2b-256 | dd065dc0a997461c730b959c4a31c84ab3dd28b502ae62ca76408aaf55fe8a56 |
File details
Details for the file minimal_csv_diff-0.1.0-py3-none-any.whl.
File metadata
- Download URL: minimal_csv_diff-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36ff251772cad18bfbc4befc63074ba3e7e836380c397bc1cea4c460008d6e18 |
| MD5 | 75ee084ce44820c0a919d93e7b78e17f |
| BLAKE2b-256 | 18442e43140ae095ddc56896ca9e14b8582b0d8903c4cebd4e28fd62befb1c0d |