Skip to main content

CLI tool for data diffing between two tables

Project description

tablediff

CLI tool for data diffing between two tables:

Screenshot

Installation

The package is available in PyPi and can be installed by specifying the package name and the adapter for your database.

Currently it was tested with DuckDB and Snowflake, but technically should support all adapters that reladiff supports:

Adapter Command
DuckDB pip install tablediff-cli[duckdb]
Snowflake pip install tablediff-cli[snowflake]

To install all available adapters try:

pip install tablediff-cli[all]

Requires Python 3.10+. Technically can be downported to earlier versions, let me know if you need it.

Usage

Once installed, use command-line to run the diffing process:

tablediff TABLE_A TABLE_B --pk PRIMARY_KEY --conn CONNECTION_STRING [OPTIONS]
  • tables should be in format table_name or schema.table_name or database.schema.table_name
    • for Snowflake use all identifiers in UPPERCASE
  • --pk is the primary key column (should exist in both tables)
  • --conn is the database connection string

Here are a could of examples of connection strings:

  • DuckDB

    duckdb://<file_path>
    
  • Snowflake

    "snowflake://<user>[:<password>]@<account>/<database>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<role>[&authenticator=externalbrowser]" 
    

For other databases check docs for reladiff.

Examples

Diffing in DuckDB:

tablediff users_prod users_dev --pk id --conn duckdb://./sample.duckdb

Diffing in Snowflake:

tablediff DEV.MART.USERS PROD.MART.USERS \
  --pk USER_ID \
  --conn "snowflake://..."

Extended output

If you pass --extended you'll get an extended output that will show you:

  • Common and unique columns in both tables
  • For rows, it will return:
    • 5 id's that are not matching
    • 5 id's that exist only in table A and B

Package development

Clone the repo:

git clone git@github.com:oleg-agapov/tablediff.git

Now setup the local environment (I'm using uv):

# Setup virtual environment
uv sync --extra dev
source .venv/bin/activate

# Run tests
pytest

Generating sample DuckDB for local testing

Use Python script:

python scripts/generate_duckdb_test_data.py \
  --db-path sample.duckdb \
  --prod-rows 23753 \
  --dev-remove-rows 342 \
  --dev-add-rows 30 \
  --dev-null-status-rows 578

And then:

tablediff users_dev users_prod --pk id --conn duckdb://./sample.duckdb

Future roadmap

  • Add pre-commit hooks (check vesion bump?)
  • Add tests
  • Add dbt support

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tablediff_cli-0.2.3.tar.gz (11.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tablediff_cli-0.2.3-py3-none-any.whl (11.0 kB view details)

Uploaded Python 3

File details

Details for the file tablediff_cli-0.2.3.tar.gz.

File metadata

  • Download URL: tablediff_cli-0.2.3.tar.gz
  • Upload date:
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tablediff_cli-0.2.3.tar.gz
Algorithm Hash digest
SHA256 8fa04db5e39e43eefbbcbbcb0a0fa704bd1c791846e15aa33f9f13f8f2945014
MD5 b3a7b4eb5327fd2d8f035e9008f1feb0
BLAKE2b-256 c2182b4d03131127e6e6e52b4192b1954384f06cb99a99510a30846d15bfbcd1

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablediff_cli-0.2.3.tar.gz:

Publisher: release.yml on oleg-agapov/tablediff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tablediff_cli-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: tablediff_cli-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 11.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tablediff_cli-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 6a9fedc77e0f8e28ee3b21c868d3a65f9713acfb834fbe8063f520ac7852da82
MD5 ed627f8b94a2d5523c401dc8b7dd18fd
BLAKE2b-256 93c01f000e6342c1517fc7432c43938e9b1a1cccb9c23967b3ad871c9b0ca301

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablediff_cli-0.2.3-py3-none-any.whl:

Publisher: release.yml on oleg-agapov/tablediff

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page