Skip to main content

CLI tool for data diffing between two tables

Project description

tablediff

CLI tool for data diffing between two tables:

Screenshot

Installation

The package is available in PyPi and can be installed by specifying the package name and the adapter for your database.

Currently it was tested with DuckDB and Snowflake, but should work with all adapters that reladiff supports:

Adapter Command
DuckDB pip install tablediff-cli
Snowflake pip install tablediff-cli[snowflake]
MySQL pip install tablediff-cli[mysql]
PostgreSQL pip install tablediff-cli[postgresql]
BigQuery pip install tablediff-cli[bigquery]
Clickhouse pip install tablediff-cli[clickhouse]
Presto pip install tablediff-cli[presto]
Oracle pip install tablediff-cli[oracle]
Trino pip install tablediff-cli[trino]
Vertica pip install tablediff-cli[vertica]

To install all available adapters try:

pip install tablediff-cli[all]

Requires Python 3.10+. Technically can be downported to earlier versions, let me know if you need it.

Usage

Once installed, use command-line to run the diffing process:

tablediff compare \
  TABLE_A \
  TABLE_B \
  --pk PRIMARY_KEY \
  --conn CONNECTION_STRING [OPTIONS]
  • tables should be in format table_name or schema.table_name or database.schema.table_name
    • for Snowflake use all identifiers in UPPERCASE
  • --pk is the primary key column (should exist in both tables)
  • --conn is the database connection string

Schema-only comparison:

tablediff schema \
  TABLE_A \
  TABLE_B \
  --conn CONNECTION_STRING [--conn2 CONNECTION_STRING]

CSV comparison (same options as for compare):

tablediff files \
  FILE_A \
  FILE_B \
  --pk PRIMARY_KEY [OPTIONS]

Cross-Database Comparison

You can compare tables across different databases using --conn and --conn2:

tablediff compare \
  TABLE_A \
  TABLE_B \
  --pk PRIMARY_KEY \
  --conn CONNECTION_A \
  --conn2 CONNECTION_B [OPTIONS]
  • --conn - connection string for TABLE_A
  • --conn2 - connection string for TABLE_B

Connection strings

Here are a could of examples of connection strings:

  • DuckDB

    duckdb://<file_path>
    
  • Snowflake

    "snowflake://<user>[:<password>]@<account>/<database>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<role>[&authenticator=externalbrowser]"
    

For other databases check docs for reladiff.

Examples

Diffing in DuckDB (same database):

tablediff compare \
  users_prod users_dev \
  --pk id --conn duckdb://./sample.duckdb

Diffing across two DuckDB databases:

tablediff compare users users --pk id \
  --conn duckdb://./prod.duckdb \
  --conn2 duckdb://./dev.duckdb

Diffing in Snowflake:

tablediff compare DEV.MART.USERS PROD.MART.USERS \
  --pk USER_ID \
  --conn "snowflake://..."

Cross-database diffing (Snowflake to DuckDB):

tablediff compare PROD.MART.USERS local_users \
  --pk USER_ID \
  --conn "snowflake://..." \
  --conn2 duckdb://./local.duckdb

Additional options

--extended

If you pass --extended flag you'll get an extended output that will show you:

  • Common and unique columns in both tables
  • For rows, it will return:
    • 5 id's that are not matching
    • 5 id's that exist only in table A and B

--where

Allows to pass additional WHERE condition that will be applied to both tables (compare/files):

tablediff compare table_a table_b \
  --pk id \
  --conn snowflake://... \
  --where "created_at >= CURRENT_DATE - 7 and status = 'active'"

Package development

Clone the repo:

git clone git@github.com:oleg-agapov/tablediff.git

Now setup the local environment (I'm using uv):

# Setup virtual environment
uv sync --extra dev
source .venv/bin/activate

# Install pre-commit hooks
pre-commit install

# Run tests
pytest

Pre-commit hooks are configured to run automatically on every commit. To run them manually on all files:

pre-commit run --all-files

Generating sample DuckDB for local testing

Use Python script:

python scripts/generate_duckdb_test_data.py \
  --db-path sample.duckdb \
  --prod-rows 23753 \
  --dev-remove-rows 342 \
  --dev-add-rows 30 \
  --dev-null-status-rows 578

And then:

tablediff compare users_dev users_prod --pk id --conn duckdb://./sample.duckdb

Future roadmap

  • WHERE conditions
  • Add tests
  • Cross-database comparison
  • Schema-only comparison (with data types)
  • Add pre-commit hooks
  • Column-by-column comparison (# of rows that are different)
  • Add dbt support

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tablediff_cli-0.4.0.tar.gz (22.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tablediff_cli-0.4.0-py3-none-any.whl (14.2 kB view details)

Uploaded Python 3

File details

Details for the file tablediff_cli-0.4.0.tar.gz.

File metadata

  • Download URL: tablediff_cli-0.4.0.tar.gz
  • Upload date:
  • Size: 22.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tablediff_cli-0.4.0.tar.gz
Algorithm Hash digest
SHA256 958b2a81b5e95c7f07462e802bc02d4a40eb18c32b02e21d47d08149da5f083f
MD5 dfb9271845b8631c345ee9169dabe5a1
BLAKE2b-256 d763f64c9e6a10fef58dbae9a9ba0573904576bd87d040d846c62cc4e56f0002

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablediff_cli-0.4.0.tar.gz:

Publisher: release.yml on oleg-agapov/tablediff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tablediff_cli-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: tablediff_cli-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 14.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tablediff_cli-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c14b9bd5527a1b32cb64153d9a70f9568db1e3947c5ab5b056f55b33fefa6c82
MD5 29c9f0ac2a9e106517598301121c230d
BLAKE2b-256 b6f74f89e4cc43f7471f152e8ba05008065754e0289b7b57dcc530c2845d3b34

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablediff_cli-0.4.0-py3-none-any.whl:

Publisher: release.yml on oleg-agapov/tablediff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page