Finds and summarizes the differences between two datasets
deltascan
deltascan is a Python package that finds and summarizes the differences between two datasets.
Installation
pip install deltascan
Main Features
The DeltaScan class compares any two supported data structures across one or more dimensions.
Data Structures:
- DataFrame
- Series
- LazyFrame (Polars only)
Dimensions:
- Rows → rows present in one dataset but missing in the other, aligned using join_on.
- Columns → differences in column names and data types.
- Values → mismatched values within matching rows and columns.
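To make the three dimensions concrete, here is a minimal plain-Python sketch of the idea. It is not deltascan's implementation: the records, the `column_names` helper, and every variable name are hypothetical, the join key is simply the dictionary key, and the data-type part of the Columns dimension is left out.

```python
# Conceptual sketch only: plain-Python stand-ins for the three dimensions.
# This is NOT how deltascan works internally; all names are hypothetical.

left = {  # records keyed by the join column
    1: {"name": "Alice", "amount": 10.0},
    2: {"name": "Mike", "amount": 5.3},
    3: {"name": "John", "amount": 33.7},
}
right = {
    1: {"name": "Alice", "amount": 10.0, "color": "Pink"},
    3: {"name": "Michael", "amount": None, "color": "Blue"},
    9: {"name": "Zachary", "amount": 14.0, "color": "Red"},
}


def column_names(records):
    """Collect every column name used by any record."""
    return set().union(*(row.keys() for row in records.values()))


# Rows: join keys present on one side but missing on the other.
rows_only_left = sorted(left.keys() - right.keys())
rows_only_right = sorted(right.keys() - left.keys())

# Columns: names present on one side but missing on the other.
cols_only_left = sorted(column_names(left) - column_names(right))
cols_only_right = sorted(column_names(right) - column_names(left))

# Values: within rows and columns found on both sides, list the mismatches.
shared_rows = sorted(left.keys() & right.keys())
shared_cols = sorted(column_names(left) & column_names(right))
value_diffs = [
    (key, col, left[key][col], right[key][col])
    for key in shared_rows
    for col in shared_cols
    if left[key][col] != right[key][col]
]

print(rows_only_left, rows_only_right)  # [2] [9]
print(cols_only_left, cols_only_right)  # [] ['color']
print(value_diffs)
```

Values are compared only where both the row and the column exist on both sides, which is why row 2, row 9, and the color column never appear in `value_diffs`.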
Example Usage
Imports
Import the DeltaScan class.
from deltascan import DeltaScan
Create DataFrames
Create two sample DataFrame objects to compare.
import pandas as pd
import polars as pl
import datetime
# February Data
left_data = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'date': [pd.to_datetime('2026-02-28')] * 4,
    'first_name': ['Alice', 'Mike', 'John', 'Sarah'],
    'flag': [True, False, True, False],
    'amount': [10.0, 5.3, 33.7, 99.3],
})
# January Data
right_data = pl.DataFrame({
    'id': [1, 3, 9],
    'date': [datetime.date(2026, 1, 31)] * 3,
    'first_name': ['Alice', 'Michael', 'Zachary'],
    'color': ['Pink', 'Blue', 'Red'],
    'last_name': ['Jones', 'Smith', 'Einck'],
    'flag': [False, True, False],
    'amount': [10, None, 14],
})
Compare DataFrames
Create a DeltaScan instance to perform the comparison. See the in-code documentation for a complete list of available arguments.
ds = DeltaScan(
    left_data=left_data,
    right_data=right_data,
    join_on='id',
    left_alias='feb',
    right_alias='jan',
    left_context=['first_name'],
    right_context=None,
    verbose=True,
)
Comparison Results
Access the comparison results using the summary and differences attributes.
print(ds.summary)
shape: (8, 6)
┌─────────────────────┬───────────┬─────────────┬─────────┬───────┬──────────────┐
│ Comparison          ┆ Dimension ┆ Differences ┆ Matches ┆ Total ┆ Match Rate % │
│ ---                 ┆ ---       ┆ ---         ┆ ---     ┆ ---   ┆ ---          │
│ str                 ┆ str       ┆ i64         ┆ i64     ┆ i64   ┆ f64          │
╞═════════════════════╪═══════════╪═════════════╪═════════╪═══════╪══════════════╡
│ jan cols not in feb ┆ columns   ┆ 2           ┆ 5       ┆ 7     ┆ 0.714286     │
│ data types          ┆ columns   ┆ 2           ┆ 3       ┆ 5     ┆ 0.6          │
│ feb rows not in jan ┆ rows      ┆ 2           ┆ 2       ┆ 4     ┆ 0.5          │
│ jan rows not in feb ┆ rows      ┆ 1           ┆ 2       ┆ 3     ┆ 0.666667     │
│ amount              ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
│ date                ┆ values    ┆ 2           ┆ 0       ┆ 2     ┆ 0.0          │
│ first_name          ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
│ flag                ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
└─────────────────────┴───────────┴─────────────┴─────────┴───────┴──────────────┘
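The numbers in this table appear to follow two simple relationships: Differences plus Matches equals Total, and the Match Rate % column holds Matches divided by Total as a fraction between 0 and 1 (for example, 5 / 7 ≈ 0.714286). The sketch below checks that reading against the rows printed above. It is inferred from the output only, not from deltascan's code, and `summary_rows` and `row_is_consistent` are hypothetical names.

```python
import math

# Rows copied from the summary table above:
# (comparison, differences, matches, total, match_rate)
summary_rows = [
    ("jan cols not in feb", 2, 5, 7, 0.714286),
    ("data types", 2, 3, 5, 0.6),
    ("feb rows not in jan", 2, 2, 4, 0.5),
    ("jan rows not in feb", 1, 2, 3, 0.666667),
    ("amount", 1, 1, 2, 0.5),
    ("date", 2, 0, 2, 0.0),
    ("first_name", 1, 1, 2, 0.5),
    ("flag", 1, 1, 2, 0.5),
]


def row_is_consistent(differences, matches, total, match_rate):
    """Check that the counts add up and the rate equals matches / total."""
    counts_add_up = differences + matches == total
    # The printed rate is rounded to six decimals, so allow a small tolerance.
    rate_agrees = math.isclose(matches / total, match_rate, abs_tol=1e-6)
    return counts_add_up and rate_agrees


all_consistent = all(row_is_consistent(*row[1:]) for row in summary_rows)
print(all_consistent)  # True
```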
Export to Excel
Export the results to an Excel file.
ds.to_excel()