pandas_diff
Generate event logs of row-level changes between two pandas DataFrames.
pandas_diff is not a statistical comparison tool. It tells you what changed: which rows were created, deleted, or modified, and exactly which fields changed.
Installation
pip install pandas_diff
# With Parquet support
pip install "pandas_diff[parquet]"
Quick start
import pandas as pd
from pandas_diff import get_diffs
before = pd.DataFrame([
    {"hero": "hulk", "power": "strength"},
    {"hero": "black_widow", "power": "spy"},
    {"hero": "thor", "hammers": 0},
])
after = pd.DataFrame([
    {"hero": "hulk", "power": "smart"},
    {"hero": "captain marvel", "power": "strength"},
    {"hero": "thor", "hammers": 2},
])
df = get_diffs(before, after, keys="hero")
| operation | object_keys | object_values | attribute_changed | old_value | new_value |
|---|---|---|---|---|---|
| create | [hero] | captain marvel | | | |
| delete | [hero] | black_widow | | | |
| modify | [hero] | hulk | power | strength | smart |
| modify | [hero] | thor | hammers | 0 | 2 |
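To make the semantics of the table concrete, here is a minimal sketch of a keyed row-level diff in plain Python. It is not pandas_diff's implementation: the `diff_rows` helper is hypothetical, works on lists of dictionaries rather than DataFrames, and only mirrors the column names shown in the table above.

```python
def diff_rows(before, after, key):
    """Return create/delete/modify events between two lists of row dicts."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    events = []

    # Key values present only in `after` are creations.
    for value in sorted(new.keys() - old.keys()):
        events.append({"operation": "create", "object_keys": [key],
                       "object_values": value})

    # Key values present only in `before` are deletions.
    for value in sorted(old.keys() - new.keys()):
        events.append({"operation": "delete", "object_keys": [key],
                       "object_values": value})

    # Key values present in both: one event per field whose value differs.
    for value in sorted(old.keys() & new.keys()):
        fields = sorted((old[value].keys() | new[value].keys()) - {key})
        for field in fields:
            old_value = old[value].get(field)
            new_value = new[value].get(field)
            if old_value != new_value:
                events.append({"operation": "modify", "object_keys": [key],
                               "object_values": value,
                               "attribute_changed": field,
                               "old_value": old_value,
                               "new_value": new_value})
    return events


before = [
    {"hero": "hulk", "power": "strength"},
    {"hero": "black_widow", "power": "spy"},
    {"hero": "thor", "hammers": 0},
]
after = [
    {"hero": "hulk", "power": "smart"},
    {"hero": "captain marvel", "power": "strength"},
    {"hero": "thor", "hammers": 2},
]

events = diff_rows(before, after, key="hero")
for event in events:
    print(event)
```

Each modified field becomes its own event, which is why a row with several changed columns would produce several `modify` entries.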
CLI
pandas_diff before.csv after.csv --keys id
pandas_diff old.parquet new.parquet --keys name,date --format json
pandas_diff a.csv b.csv --keys id --ignore updated_at -o diff.csv
Supported file formats: CSV, JSON (flat records), Parquet.
Use cases
- Batch to event-driven migration: Detect changes between pipeline runs and stream them to Kafka.
- Audit event logs: Track how resources change over time.
- Data reconciliation: Compare a CMDB against the real state of infrastructure.
- Environment sync: Propagate changes between production and disaster recovery.
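For the streaming and audit use cases, the diff rows typically need to be serialized into messages. The sketch below assumes the rows have already been converted to dictionaries (for example with pandas' `DataFrame.to_dict("records")`) and uses only the standard library. The `to_event_messages` helper is hypothetical, and the actual producer call (Kafka or otherwise) is deliberately left out.

```python
import json


def to_event_messages(records):
    """Turn diff rows (as dicts) into (key, value) pairs of UTF-8 bytes."""
    messages = []
    for record in records:
        # Use the row's key value as the message key so that all events
        # for the same object share a key.
        key = str(record["object_values"]).encode("utf-8")
        # default=str is a fallback for values json cannot encode natively.
        value = json.dumps(record, sort_keys=True, default=str).encode("utf-8")
        messages.append((key, value))
    return messages


# Illustrative rows shaped like the quick start output.
records = [
    {"operation": "modify", "object_keys": ["hero"], "object_values": "hulk",
     "attribute_changed": "power", "old_value": "strength", "new_value": "smart"},
    {"operation": "delete", "object_keys": ["hero"], "object_values": "black_widow",
     "attribute_changed": None, "old_value": None, "new_value": None},
]

messages = to_event_messages(records)
for key, value in messages:
    print(key, value)
```

Each `(key, value)` pair can then be handed to whatever producer or log writer the pipeline already uses.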
API
get_diffs(
    before: pd.DataFrame,        # Previous state
    after: pd.DataFrame,         # Current state
    keys: list[str] | str,       # Column(s) identifying each row
    ignore_columns: list[str],   # Columns to skip (optional)
) -> pd.DataFrame
Returns a DataFrame with columns: operation, object_keys, object_values, object_json, attribute_changed, old_value, new_value.
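As an illustration of what skipping columns can mean for change detection, the following plain-Python sketch compares two versions of one row with and without an ignore list. It is an assumption about the general idea, not the library's code: `changed_fields` is a hypothetical helper and the sample row is made up.

```python
def changed_fields(old_row, new_row, ignore_columns=()):
    """Map each changed field to its (old, new) pair, skipping ignored columns."""
    ignored = set(ignore_columns)
    fields = sorted((old_row.keys() | new_row.keys()) - ignored)
    return {
        field: (old_row.get(field), new_row.get(field))
        for field in fields
        if old_row.get(field) != new_row.get(field)
    }


# Illustrative data: a bookkeeping timestamp changes on every run.
old_row = {"id": 1, "status": "active", "updated_at": "2024-01-01"}
new_row = {"id": 1, "status": "retired", "updated_at": "2024-02-01"}

all_changes = changed_fields(old_row, new_row)
filtered = changed_fields(old_row, new_row, ignore_columns=["updated_at"])

print(all_changes)
print(filtered)
```

Ignoring volatile columns such as timestamps keeps the event log focused on changes that matter to downstream consumers.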
License
MIT
Download files
File details
Details for the file pandas_diff-2.0.0.tar.gz.
File metadata
- Download URL: pandas_diff-2.0.0.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f36ade91e4d40eb13256b8b88abad2e7d2a95088e650adc3c26e220edc6d3c21 |
| MD5 | 60b079caeade51ad65a923ae477abab1 |
| BLAKE2b-256 | d1a9a646978431a376a72a9f59f3067b3029ce88b1557dda10a370cacb2e8fb6 |
Provenance
The following attestation bundles were made for pandas_diff-2.0.0.tar.gz:
Publisher: release.yml on jaimevalero/pandas_diff
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pandas_diff-2.0.0.tar.gz
  - Subject digest: f36ade91e4d40eb13256b8b88abad2e7d2a95088e650adc3c26e220edc6d3c21
- Sigstore transparency entry: 1235649565
- Sigstore integration time:
- Permalink: jaimevalero/pandas_diff@02fea5ddb744fc2cad269dd767050e9780e786c3
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/jaimevalero
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@02fea5ddb744fc2cad269dd767050e9780e786c3
- Trigger Event: push
File details
Details for the file pandas_diff-2.0.0-py3-none-any.whl.
File metadata
- Download URL: pandas_diff-2.0.0-py3-none-any.whl
- Upload date:
- Size: 8.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 573e3fadd76fc7b22a6c7add9f5b5d754283aa81586276cb46b3c3c7956aeab0 |
| MD5 | 1fa6fc9112f859368c5022ac4aca3085 |
| BLAKE2b-256 | 2676b01d997992c145e4d29020752f6016416d4fa32129333deeb1a80a2f4c77 |
Provenance
The following attestation bundles were made for pandas_diff-2.0.0-py3-none-any.whl:
Publisher: release.yml on jaimevalero/pandas_diff
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pandas_diff-2.0.0-py3-none-any.whl
  - Subject digest: 573e3fadd76fc7b22a6c7add9f5b5d754283aa81586276cb46b3c3c7956aeab0
- Sigstore transparency entry: 1235649642
- Sigstore integration time:
- Permalink: jaimevalero/pandas_diff@02fea5ddb744fc2cad269dd767050e9780e786c3
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/jaimevalero
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@02fea5ddb744fc2cad269dd767050e9780e786c3
- Trigger Event: push