Tool to compare tables
Table Diff
Table Diff is a Python package that provides a text-based interface for comparing two tables. It is designed for data analysts and data scientists who need to identify the differences between two versions of a table, especially as data is modified in an ETL pipeline.
The diff is printed to stdout as Markdown and can optionally be saved to a Markdown file, a PDF file, or both.
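To make the idea of a keyed table comparison concrete, here is a minimal standard-library sketch of how two CSV tables can be diffed by matching rows on primary-key columns and reporting added, removed, and changed rows as Markdown. This is not Table Diff's implementation: the functions `load_keyed_rows`, `diff_tables`, and `changes_to_markdown` are hypothetical, and the toy CSV data is made up for illustration.

```python
import csv
import io


def load_keyed_rows(
    csv_text: str, key_cols: list[str]
) -> dict[tuple[str, ...], dict[str, str]]:
    """Parse CSV text into a mapping of primary-key tuple -> row dict."""
    rows: dict[tuple[str, ...], dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[col] for col in key_cols)
        if key in rows:
            raise ValueError(f"duplicate primary key {key!r}")
        rows[key] = row
    return rows


def diff_tables(old_text: str, new_text: str, key_cols: list[str]) -> dict[str, list]:
    """Classify rows as added, removed, or changed, matching rows by primary key."""
    old = load_keyed_rows(old_text, key_cols)
    new = load_keyed_rows(new_text, key_cols)
    changed = []
    for key in sorted(old.keys() & new.keys()):
        # Only compare columns that exist in both versions of the table.
        for col in old[key]:
            if col in new[key] and old[key][col] != new[key][col]:
                changed.append((key, col, old[key][col], new[key][col]))
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }


def changes_to_markdown(changed: list[tuple]) -> str:
    """Render changed cells as a Markdown table."""
    lines = ["| Key | Column | Old | New |", "|---|---|---|---|"]
    for key, col, old_val, new_val in changed:
        lines.append(f"| {', '.join(key)} | {col} | {old_val} | {new_val} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    old_csv = "location_id,city,population\n1,a,100\n2,b,200\n"
    new_csv = "location_id,city,population\n1,a,110\n3,c,50\n"
    result = diff_tables(old_csv, new_csv, ["location_id"])
    print("added:", result["added"])
    print("removed:", result["removed"])
    print(changes_to_markdown(result["changed"]))
```

Keying rows by primary key, rather than by row position, is what lets a diff like this report a changed cell instead of a spurious deleted row plus an inserted row when rows are reordered between versions.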
Getting Started
- Install Python 3.10 or later.
- Install pipx, a tool to create isolated Python environments for individual packages:
pip install pipx
- Install this package using pipx:
pipx install "table-diff[pdf]"
# Optionally, install without PDF export support:
pipx install table-diff
- Run the following to compare two tables, passing one or more primary-key column names after -u:
table_diff <old_csv_path> <new_csv_path> -u PrimaryKeyCol1 PrimaryKeyColN
For development environment setup, please refer to the CONTRIBUTING.md guide.
Running with Docker
Running this tool with Docker is not recommended.
- Clone this repository.
- Build the docker container:
docker build -t table-diff .
- Run the docker container with a volume mount:
docker run -it -v <local_folder_path>:/files table-diff
- Inside the container, run:
table_diff /files/<your_file_name_left.csv/pq> /files/<your_file_name_right.csv/pq> -u PrimaryKeyCol
To run the demo with the sample dataset bundled in this repository, run:
docker build -t table-diff .
docker run -it table-diff
# Inside the container:
table_diff tests/demo_datasets/populations/city-populations_2010.csv tests/demo_datasets/populations/city-populations_2015.csv -u location_id
Contributing
Please submit bug reports and merge requests to the GitLab project.
Please refer to the CONTRIBUTING.md file for more details about the contribution policy.
License
This project is licensed under the MIT License. For more information, see the LICENSE file.
Note that this project has been created and modified with the help of Large Language Model (LLM)-based tools like GitHub Copilot and ChatGPT.
Download files
File details
Details for the file table_diff-0.1.2.tar.gz.
File metadata
- Download URL: table_diff-0.1.2.tar.gz
- Upload date:
- Size: 51.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9eb510aed13d642a6716f074a9b68b6449f81d3df40742ac0c9aa68c3e07eea7 |
| MD5 | efe016c3c4a3e6ef8ad495fb00e1c93f |
| BLAKE2b-256 | af67e4c6cab8adcc764a8cac7469887043fba15d1fd067add3a1b9dd72f373cb |
File details
Details for the file table_diff-0.1.2-py3-none-any.whl.
File metadata
- Download URL: table_diff-0.1.2-py3-none-any.whl
- Upload date:
- Size: 15.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e203feab0dd0cdd9a97d5cc3c4bd7f276c52484ddaef2d42bdb8bf1598a5656e |
| MD5 | 7974d22dab52e57ca52636120d2121c3 |
| BLAKE2b-256 | e1d8267a0710c4b8097b0904d3274d4b9cf98eccd7fb58b6f5b1618cdfd6097c |