Skip to main content

CLI tool for data diffing between two tables

Project description

tablediff

CLI tool for data diffing between two tables:

Screenshot of the tool

Installation

The package is available in PyPi and can be installed by specifying the package name and the adapter for your database.

Currently it was tested with DuckDB and Snowflake, but technically should support all adapters that reladiff supports:

Adapter Command
DuckDB pip install tablediff-cli[duckdb]
Snowflake pip install tablediff-cli[snowflake]

To install all available adapters try:

pip install tablediff-cli[all]

Requires Python 3.10+. Technically can be downported to earlier versions, let me know if you need it.

Usage

Once installed, use command-line to run the diffing process:

tablediff TABLE_A TABLE_B --pk PRIMARY_KEY --conn CONNECTION_STRING [OPTIONS]
  • tables should be in format table_name or schema.table_name or database.schema.table_name
    • for Snowflake use all identifiers in UPPERCASE
  • --pk is the primary key column (should exist in both tables)
  • --conn is the database connection string

Here are a could of examples of connection strings:

  • DuckDB

    duckdb://<file_path>
    
  • Snowflake

    "snowflake://<user>[:<password>]@<account>/<database>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<role>[&authenticator=externalbrowser]" 
    

For other databases check docs for reladiff.

Examples

Diffing in DuckDB:

tablediff users_prod users_dev --pk id --conn duckdb://./sample.duckdb

Diffing in Snowflake:

tablediff DEV.MART.USERS PROD.MART.USERS \
  --pk USER_ID \
  --conn "snowflake://..."

Additional flags

--extended

If you pass --extended flag you'll get an extended output that will show you:

  • Common and unique columns in both tables
  • For rows, it will return:
    • 5 id's that are not matching
    • 5 id's that exist only in table A and B

--where

Allows to pass additional WHERE condition that will be applied to both tables:

tablediff table_a table_b \
  --pk id \
  --conn snowflake://... \
  --where "created_at >= CURRENT_DATE - 7 and status = 'active'"

Package development

Clone the repo:

git clone git@github.com:oleg-agapov/tablediff.git

Now setup the local environment (I'm using uv):

# Setup virtual environment
uv sync --extra dev
source .venv/bin/activate

# Run tests
pytest

Generating sample DuckDB for local testing

Use Python script:

python scripts/generate_duckdb_test_data.py \
  --db-path sample.duckdb \
  --prod-rows 23753 \
  --dev-remove-rows 342 \
  --dev-add-rows 30 \
  --dev-null-status-rows 578

And then:

tablediff users_dev users_prod --pk id --conn duckdb://./sample.duckdb

Future roadmap

  • WHERE conditions
  • Add tests
  • Schema-only comparison (with data types)
  • Column-by-column comparison (# of rows that are different)
  • Add pre-commit hooks (check vesion bump?)
  • Add dbt support

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tablediff_cli-0.3.0.tar.gz (15.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tablediff_cli-0.3.0-py3-none-any.whl (11.3 kB view details)

Uploaded Python 3

File details

Details for the file tablediff_cli-0.3.0.tar.gz.

File metadata

  • Download URL: tablediff_cli-0.3.0.tar.gz
  • Upload date:
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tablediff_cli-0.3.0.tar.gz
Algorithm Hash digest
SHA256 086ebd8cf618644e5336fb87d9bd990c8b60d8c596303d64e59b18c18c3c7a0b
MD5 35a6165520cbeafa4931ed9ebb173122
BLAKE2b-256 f63f797121b294aa7f0afebc566c03cc445808f74e5d796ec02ac9a1cfebcca7

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablediff_cli-0.3.0.tar.gz:

Publisher: release.yml on oleg-agapov/tablediff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tablediff_cli-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: tablediff_cli-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 11.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tablediff_cli-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a3c5d949a5b94eb93cb3231f20036eab8c0576512f31c151454cc0b88cf14558
MD5 52270282022cb546063bf60876f22946
BLAKE2b-256 03ced3d640b044ed8ac502c8b1e91648a2bbccbe8bdb65c19e25fd3ff2eaebc6

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablediff_cli-0.3.0-py3-none-any.whl:

Publisher: release.yml on oleg-agapov/tablediff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page