CLI tool for data diffing between two tables
Project description
tablediff
CLI tool for data diffing between two tables:
Installation
The package is available in PyPi and can be installed by specifying the package name and the adapter for your database.
Currently it was tested with DuckDB and Snowflake, but technically should support all adapters that reladiff supports:
| Adapter | Command |
|---|---|
| DuckDB | pip install tablediff-cli[duckdb] |
| Snowflake | pip install tablediff-cli[snowflake] |
To install all available adapters try:
pip install tablediff-cli[all]
Requires Python 3.10+. Technically can be downported to earlier versions, let me know if you need it.
Usage
Once installed, use command-line to run the diffing process:
tablediff TABLE_A TABLE_B --pk PRIMARY_KEY --conn CONNECTION_STRING [OPTIONS]
- tables should be in format
table_nameorschema.table_nameordatabase.schema.table_name- for Snowflake use all identifiers in UPPERCASE
--pkis the primary key column (should exist in both tables)--connis the database connection string
Here are a could of examples of connection strings:
-
DuckDB
duckdb://<file_path> -
Snowflake
"snowflake://<user>[:<password>]@<account>/<database>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<role>[&authenticator=externalbrowser]"
For other databases check docs for reladiff.
Examples
Diffing in DuckDB:
tablediff users_prod users_dev --pk id --conn duckdb://./sample.duckdb
Diffing in Snowflake:
tablediff DEV.MART.USERS PROD.MART.USERS \
--pk USER_ID \
--conn "snowflake://..."
Additional flags
--extended
If you pass --extended flag you'll get an extended output that will show you:
- Common and unique columns in both tables
- For rows, it will return:
- 5 id's that are not matching
- 5 id's that exist only in table A and B
--where
Allows to pass additional WHERE condition that will be applied to both tables:
tablediff table_a table_b \
--pk id \
--conn snowflake://... \
--where "created_at >= CURRENT_DATE - 7 and status = 'active'"
Package development
Clone the repo:
git clone git@github.com:oleg-agapov/tablediff.git
Now setup the local environment (I'm using uv):
# Setup virtual environment
uv sync --extra dev
source .venv/bin/activate
# Run tests
pytest
Generating sample DuckDB for local testing
Use Python script:
python scripts/generate_duckdb_test_data.py \
--db-path sample.duckdb \
--prod-rows 23753 \
--dev-remove-rows 342 \
--dev-add-rows 30 \
--dev-null-status-rows 578
And then:
tablediff users_dev users_prod --pk id --conn duckdb://./sample.duckdb
Future roadmap
- WHERE conditions
- Add tests
- Schema-only comparison (with data types)
- Column-by-column comparison (# of rows that are different)
- Add pre-commit hooks (check vesion bump?)
- Add dbt support
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file tablediff_cli-0.3.0.tar.gz.
File metadata
- Download URL: tablediff_cli-0.3.0.tar.gz
- Upload date:
- Size: 15.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
086ebd8cf618644e5336fb87d9bd990c8b60d8c596303d64e59b18c18c3c7a0b
|
|
| MD5 |
35a6165520cbeafa4931ed9ebb173122
|
|
| BLAKE2b-256 |
f63f797121b294aa7f0afebc566c03cc445808f74e5d796ec02ac9a1cfebcca7
|
Provenance
The following attestation bundles were made for tablediff_cli-0.3.0.tar.gz:
Publisher:
release.yml on oleg-agapov/tablediff
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
tablediff_cli-0.3.0.tar.gz -
Subject digest:
086ebd8cf618644e5336fb87d9bd990c8b60d8c596303d64e59b18c18c3c7a0b - Sigstore transparency entry: 813320706
- Sigstore integration time:
-
Permalink:
oleg-agapov/tablediff@b453af8e7052a348bca2c5a7dced3f6c9f287fbe -
Branch / Tag:
refs/tags/v0.3.0 - Owner: https://github.com/oleg-agapov
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@b453af8e7052a348bca2c5a7dced3f6c9f287fbe -
Trigger Event:
release
-
Statement type:
File details
Details for the file tablediff_cli-0.3.0-py3-none-any.whl.
File metadata
- Download URL: tablediff_cli-0.3.0-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a3c5d949a5b94eb93cb3231f20036eab8c0576512f31c151454cc0b88cf14558
|
|
| MD5 |
52270282022cb546063bf60876f22946
|
|
| BLAKE2b-256 |
03ced3d640b044ed8ac502c8b1e91648a2bbccbe8bdb65c19e25fd3ff2eaebc6
|
Provenance
The following attestation bundles were made for tablediff_cli-0.3.0-py3-none-any.whl:
Publisher:
release.yml on oleg-agapov/tablediff
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
tablediff_cli-0.3.0-py3-none-any.whl -
Subject digest:
a3c5d949a5b94eb93cb3231f20036eab8c0576512f31c151454cc0b88cf14558 - Sigstore transparency entry: 813320707
- Sigstore integration time:
-
Permalink:
oleg-agapov/tablediff@b453af8e7052a348bca2c5a7dced3f6c9f287fbe -
Branch / Tag:
refs/tags/v0.3.0 - Owner: https://github.com/oleg-agapov
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@b453af8e7052a348bca2c5a7dced3f6c9f287fbe -
Trigger Event:
release
-
Statement type: