Command-line tool and Python library to efficiently diff rows across two different databases.

data-diff: compare datasets fast, within or across SQL databases


Use cases

Data Migration & Replication Testing

Compare source to target and check for discrepancies when moving data between systems:

  • Migrating to a new data warehouse (e.g., Oracle > Snowflake)
  • Converting SQL to a new transformation framework (e.g., stored procedures > dbt)
  • Continuously replicating data from an OLTP DB to OLAP DWH (e.g., MySQL > Redshift)

Install data-diff with specific database adapters, e.g.:

pip install data-diff 'data-diff[postgresql,snowflake]' -U

Run data-diff with connection URIs to compare tables:

data-diff \
  postgresql://<username>:'<password>'@localhost:5432/<database> \
  <table> \
  "snowflake://<username>:<password>@<ACCOUNT>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
  <TABLE> \
  -k activity_id \
  -c activity \
  -w "event_timestamp < '2022-10-10'"

Check out the documentation for the full command reference.
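
When the comparison runs from a script or a scheduled job, it can help to assemble the command programmatically. The following is a minimal sketch that builds the argument list using only the flags shown above (-k, -c, -w). The helper name build_command and the placeholder URIs are hypothetical, and passing -c once per column is an assumption to confirm against the command reference.

```python
import shlex


def build_command(uri1, table1, uri2, table2, key, columns=(), where=None):
    """Return a data-diff argument list using only the -k, -c and -w flags."""
    argv = ["data-diff", uri1, table1, uri2, table2, "-k", key]
    for column in columns:
        # Assumption: one -c flag per extra column to compare.
        argv += ["-c", column]
    if where:
        argv += ["-w", where]
    return argv


# Placeholder connection details, not real credentials.
argv = build_command(
    "postgresql://user:secret@localhost:5432/analytics",
    "activity",
    "snowflake://user:secret@account/DB/SCHEMA?warehouse=WH&role=ROLE",
    "ACTIVITY",
    key="activity_id",
    columns=["activity"],
    where="event_timestamp < '2022-10-10'",
)

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Keeping the arguments as a list also means the command can be handed to subprocess.run(argv) without going through a shell, which sidesteps quoting problems with passwords and WHERE clauses.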

Data Development Testing

Test SQL code and preview changes by comparing development/staging environment data to production:

  1. Make a change to some SQL code
  2. Run the SQL code to create a new dataset
  3. Compare the dataset with its production version or another iteration
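
To make step 3 concrete, here is a small self-contained sketch of what a key-based comparison between a production table and a development table reports. It uses Python's built-in sqlite3 module with an in-memory database and is only an illustration of the idea, not how data-diff works internally. The table names, the sample rows, and the diff_by_key helper are hypothetical.

```python
import sqlite3


def diff_by_key(conn, old_table, new_table, key, column):
    """Return ('-', row) for rows only in old_table and ('+', row) for rows
    only in new_table. A changed row appears as a '-'/'+' pair with one key.

    Identifiers are interpolated into the SQL, so pass trusted names only.
    """
    only_in = "SELECT {k}, {c} FROM {a} EXCEPT SELECT {k}, {c} FROM {b}"
    removed = conn.execute(
        only_in.format(k=key, c=column, a=old_table, b=new_table)
    ).fetchall()
    added = conn.execute(
        only_in.format(k=key, c=column, a=new_table, b=old_table)
    ).fetchall()
    rows = [("-", row) for row in removed] + [("+", row) for row in added]
    # Sort by key, listing the old version of a changed row before the new one.
    return sorted(rows, key=lambda item: (item[1][0], item[0] == "+"))


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE activity_prod (activity_id INTEGER PRIMARY KEY, activity TEXT);
    CREATE TABLE activity_dev  (activity_id INTEGER PRIMARY KEY, activity TEXT);
    INSERT INTO activity_prod VALUES (1, 'login'), (2, 'view'), (3, 'logout');
    INSERT INTO activity_dev  VALUES (1, 'login'), (2, 'click'), (4, 'signup');
    """
)

result = diff_by_key(conn, "activity_prod", "activity_dev", "activity_id", "activity")
for sign, row in result:
    print(sign, row)
```

In this sample, row 2 changed, row 3 exists only in production, and row 4 exists only in development, so the sketch reports four differing rows while the unchanged row 1 is omitted.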

dbt

data-diff integrates with dbt Core and dbt Cloud to seamlessly compare local development to production datasets.

Watch 4-min demo video

Get started with data-diff & dbt

Reach out on the dbt Slack in #tools-datafold for advice and support.

Supported databases

  • PostgreSQL >=10
  • MySQL
  • Snowflake
  • BigQuery
  • Redshift
  • Oracle
  • Presto
  • Databricks
  • Trino
  • Clickhouse
  • Vertica
  • DuckDB >=0.6
  • SQLite (coming soon)

Contributors

We thank everyone who contributed so far!


License

This project is licensed under the terms of the MIT License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

data_diff-0.8.0.tar.gz (98.4 kB)

Uploaded Source

Built Distribution

data_diff-0.8.0-py3-none-any.whl (130.2 kB)

Uploaded Python 3