Command-line tool and Python library to efficiently diff rows across two different databases.
data-diff: compare datasets fast, within or across SQL databases
Use cases
Data Migration & Replication Testing
Compare source to target and check for discrepancies when moving data between systems:
- Migrating to a new data warehouse (e.g., Oracle > Snowflake)
- Converting SQL to a new transformation framework (e.g., stored procedures > dbt)
- Continuously replicating data from an OLTP DB to OLAP DWH (e.g., MySQL > Redshift)
Install data-diff with specific database adapters, e.g.:
pip install data-diff 'data-diff[postgresql,snowflake]' -U
Run data-diff with connection URIs to compare tables:
data-diff \
postgresql://<username>:'<password>'@localhost:5432/<database> \
<table> \
"snowflake://<username>:<password>@<account>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
<TABLE> \
-k activity_id \
-c activity \
-w "event_timestamp < '2022-10-10'"
See the documentation for the full command reference.
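When the same comparison has to run from a script or scheduler, it can help to assemble the command programmatically. The following is a minimal sketch, not part of data-diff itself: `build_data_diff_command` is a hypothetical helper, the connection details and table names are placeholders, and it uses only the `-k`, `-c`, and `-w` options shown in the command above.

```python
import shlex


def build_data_diff_command(db1_uri, table1, db2_uri, table2,
                            key_column, extra_column=None, where=None):
    """Hypothetical helper: build the data-diff CLI call as an argument list."""
    args = ["data-diff", db1_uri, table1, db2_uri, table2, "-k", key_column]
    if extra_column:
        # -c names an extra column to compare, as in the example above
        args += ["-c", extra_column]
    if where:
        # -w restricts the comparison with a SQL WHERE expression
        args += ["-w", where]
    return args


# Placeholder credentials and names, for illustration only
command = build_data_diff_command(
    "postgresql://user:secret@localhost:5432/analytics",
    "activity_log",
    "snowflake://user:secret@my_account/ANALYTICS/PUBLIC?warehouse=COMPUTE_WH&role=ANALYST",
    "ACTIVITY_LOG",
    key_column="activity_id",
    extra_column="activity",
    where="event_timestamp < '2022-10-10'",
)

# shlex.join quotes each argument so the line can be pasted into a shell
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would avoid shell quoting issues altogether, assuming data-diff is installed and on the `PATH`.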
Data Development Testing
Test SQL code and preview changes by comparing development/staging environment data to production:
- Make a change to some SQL code
- Run the SQL code to create a new dataset
- Compare the dataset with its production version or another iteration
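The three steps above can be illustrated end to end with a small, self-contained sketch. It uses Python's built-in `sqlite3` module and a plain `EXCEPT` query rather than data-diff itself, so it shows the idea of the workflow and not the tool's actual algorithm. The table names (`orders_prod`, `orders_dev`), the helper `diff_tables_by_key`, and the `-`/`+` markers are assumptions made for this example.

```python
import sqlite3


def diff_tables_by_key(conn, prod_table, dev_table, key, columns):
    """Return ('-', row) for rows only in prod and ('+', row) for rows only in dev."""
    cols = ", ".join([key, *columns])
    only_prod = conn.execute(
        f"SELECT {cols} FROM {prod_table} EXCEPT SELECT {cols} FROM {dev_table} ORDER BY 1"
    ).fetchall()
    only_dev = conn.execute(
        f"SELECT {cols} FROM {dev_table} EXCEPT SELECT {cols} FROM {prod_table} ORDER BY 1"
    ).fetchall()
    return [("-", row) for row in only_prod] + [("+", row) for row in only_dev]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Production dataset
    CREATE TABLE orders_prod (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO orders_prod VALUES (1, 'paid'), (2, 'pending'), (3, 'refunded');

    -- Steps 1 and 2: changed SQL code produces a new development dataset
    CREATE TABLE orders_dev AS
    SELECT id,
           CASE WHEN status = 'pending' THEN 'open' ELSE status END AS status
    FROM orders_prod;
    """
)

# Step 3: compare the development dataset with its production version
diff = diff_tables_by_key(conn, "orders_prod", "orders_dev", "id", ["status"])
for sign, row in diff:
    print(sign, row)
```

Only the row whose `status` was rewritten appears in the result, once with its production value and once with its development value, which is the kind of preview the workflow is meant to give before a change is merged.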
data-diff integrates with dbt Core and dbt Cloud to seamlessly compare local development to production datasets.
Watch the 4-minute demo video
Get started with data-diff & dbt
Reach out on the dbt Slack in #tools-datafold for advice and support
Supported databases
- PostgreSQL >=10
- MySQL
- Snowflake
- BigQuery
- Redshift
- Oracle
- Presto
- Databricks
- Trino
- Clickhouse
- Vertica
- DuckDB >=0.6
- SQLite (coming soon)
Contributors
We thank everyone who contributed so far!
License
This project is licensed under the terms of the MIT License.
Hashes for data_diff-0.8.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 33611be3ab55c30a5cd373c267a84449b36f69819175abb014674e8b32633a1a
MD5 | 3b0700f587b06b9fa2a8b853046546fa
BLAKE2b-256 | 67c645ca16d7356cb72e114b7931a5dc7c082ff3c6769e22a3bb45a28bc145a2