
Tool to compare tables


Table Diff

Table Diff is a Python package that provides a text-based interface for comparing two tables. It is designed to be used by data analysts and data scientists to compare two tables and identify differences between them, especially as data is modified in an ETL pipeline.

The diff between two tables is printed to stdout as Markdown, and can be saved to a Markdown file and/or PDF file.
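Because the diff is written to stdout, it can also be captured from a script. The following is a minimal sketch, not part of this package: the helper names `build_command` and `save_diff` and the default output path `diff.md` are assumptions made for illustration, and only the `table_diff <old> <new> -u <key columns>` invocation is taken from this README. The sketch does not assume anything about the tool's exit code, so it returns it to the caller instead of raising.

```python
import subprocess
from pathlib import Path


def build_command(old_path, new_path, key_columns):
    """Build the argv for the table_diff CLI as shown in Getting Started."""
    return ["table_diff", str(old_path), str(new_path), "-u", *key_columns]


def save_diff(old_path, new_path, key_columns, out_path="diff.md"):
    """Run table_diff and write whatever it prints to stdout into out_path.

    Returns the process exit code; its meaning is not documented here,
    so the caller decides how to interpret it.
    """
    result = subprocess.run(
        build_command(old_path, new_path, key_columns),
        capture_output=True,
        text=True,
    )
    Path(out_path).write_text(result.stdout, encoding="utf-8")
    return result.returncode


# Example call (requires table-diff on PATH; file names are placeholders):
# save_diff("old.csv", "new.csv", ["location_id"])
```

From a shell, plain redirection such as `table_diff old.csv new.csv -u location_id > diff.md` achieves the same result.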

Getting Started

  1. Install Python 3.10 or later.

  2. Install pipx, a tool to create isolated Python environments for individual packages:

pip install pipx

  3. Install this package using pipx:

pipx install "table-diff[pdf]"

# Optionally, install without PDF export support:
pipx install table-diff

  4. Run the following to compare two tables, listing the primary key column(s) after -u:

table_diff <old_csv_path> <new_csv_path> -u PrimaryKeyCol1 PrimaryKeyColN
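
To illustrate why primary key columns are needed, here is a minimal sketch of a key-based comparison using only the standard library. It is not Table Diff's implementation: the function names `index_by_key` and `diff_tables` and the sample rows are assumptions made for illustration. Rows are matched on the key columns, then classified as added, removed, or changed.

```python
import csv
import io


def index_by_key(csv_text, key_columns):
    """Map each row's primary-key tuple to the full row (a dict)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[col] for col in key_columns): row for row in reader}


def diff_tables(old_csv, new_csv, key_columns):
    """Classify rows as added, removed, or changed, matching on key_columns."""
    old_rows = index_by_key(old_csv, key_columns)
    new_rows = index_by_key(new_csv, key_columns)
    return {
        "added": sorted(new_rows.keys() - old_rows.keys()),
        "removed": sorted(old_rows.keys() - new_rows.keys()),
        "changed": sorted(
            key
            for key in old_rows.keys() & new_rows.keys()
            if old_rows[key] != new_rows[key]
        ),
    }


# Hypothetical sample data keyed on location_id.
OLD = "location_id,city,population\n1,Springfield,100\n2,Shelbyville,200\n"
NEW = "location_id,city,population\n1,Springfield,110\n3,Ogdenville,50\n"

result = diff_tables(OLD, NEW, ["location_id"])
print(result)
# {'added': [('3',)], 'removed': [('2',)], 'changed': [('1',)]}
```

Passing several column names as `key_columns` gives a composite key, which mirrors listing more than one column after `-u`.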

For development environment setup, please refer to the CONTRIBUTING.md guide.

Running with Docker

Running this tool with Docker is not recommended.

  1. Clone this repository.
  2. Build the Docker image: docker build -t table-diff .
  3. Run the Docker container with a volume mount: docker run -it -v <local_folder_path>:/files table-diff
  4. Inside the container, run: table_diff /files/<your_file_name_left.csv/pq> /files/<your_file_name_right.csv/pq> -u PrimaryKeyCol
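
The volume mount step can be scripted. The sketch below builds the `docker run` command from a local folder, resolving it to an absolute path first, because Docker treats a non-absolute source in `-v` as a named volume rather than a bind mount. The helper name `docker_run_command` is an assumption made for illustration; the image name `table-diff` and the `/files` mount point come from the steps above.

```python
from pathlib import Path


def docker_run_command(local_folder, image="table-diff"):
    """Build the argv for running the container with local_folder mounted at /files."""
    # Resolve to an absolute path so Docker creates a bind mount.
    folder = Path(local_folder).resolve()
    return ["docker", "run", "-it", "-v", f"{folder}:/files", image]


# Print the command for the current directory; pass the list to
# subprocess.run(...) to actually start the container.
print(" ".join(docker_run_command(".")))
```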

To run the demo with the sample dataset bundled in this repository, run:

docker build -t table-diff .
docker run -it table-diff

# Inside the container:
table_diff tests/demo_datasets/populations/city-populations_2010.csv tests/demo_datasets/populations/city-populations_2015.csv -u location_id

Contributing

Please submit Bug Reports and Merge Requests to the GitLab project.

Please refer to the CONTRIBUTING.md file for more details about the contribution policy.

License

This project is licensed using the MIT License. For more information, see the LICENSE file.

Note that this project has been created and modified with the help of Large Language Model (LLM)-based tools like GitHub Copilot and ChatGPT.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

table_diff-0.1.2.tar.gz (51.9 kB)

Uploaded Source

Built Distribution

table_diff-0.1.2-py3-none-any.whl (15.6 kB)

Uploaded Python 3

File details

Details for the file table_diff-0.1.2.tar.gz.

File metadata

  • Download URL: table_diff-0.1.2.tar.gz
  • Size: 51.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for table_diff-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       9eb510aed13d642a6716f074a9b68b6449f81d3df40742ac0c9aa68c3e07eea7
MD5          efe016c3c4a3e6ef8ad495fb00e1c93f
BLAKE2b-256  af67e4c6cab8adcc764a8cac7469887043fba15d1fd067add3a1b9dd72f373cb


File details

Details for the file table_diff-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: table_diff-0.1.2-py3-none-any.whl
  • Size: 15.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for table_diff-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       e203feab0dd0cdd9a97d5cc3c4bd7f276c52484ddaef2d42bdb8bf1598a5656e
MD5          7974d22dab52e57ca52636120d2121c3
BLAKE2b-256  e1d8267a0710c4b8097b0904d3274d4b9cf98eccd7fb58b6f5b1618cdfd6097c

