deltascan

deltascan is a Python package that finds and summarizes the differences between two datasets.

Installation

pip install deltascan

Main Features

The DeltaScan class compares any two supported data structures across one or more dimensions.

Data Structures:

  • DataFrame
  • Series
  • LazyFrame (Polars only)

Dimensions:

  • Rows → rows present in one dataset but missing in the other, aligned using join_on.
  • Columns → differences in column names and data types.
  • Values → mismatched values within matching rows and columns.
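
To make the three dimensions concrete, here is a minimal plain-Python sketch of the same ideas on records keyed by an id. It is not deltascan's implementation: the sample records and the column_names helper are hypothetical, and the data type check under Columns is left out for brevity.

```python
# Illustrative only: plain-Python versions of the three comparison dimensions.
# The records and the helper below are hypothetical, not part of deltascan.

# Each dataset maps a join key (the id) to a row of column values.
left = {
    1: {"first_name": "Alice", "amount": 10.0},
    2: {"first_name": "Mike", "amount": 5.3},
    3: {"first_name": "John", "amount": 33.7},
}
right = {
    1: {"first_name": "Alice", "color": "Pink"},
    3: {"first_name": "Michael", "color": "Blue"},
    9: {"first_name": "Zachary", "color": "Red"},
}


def column_names(records):
    """Collect every column name used by any row in the dataset."""
    names = set()
    for row in records.values():
        names.update(row)
    return names


# Rows: join keys present in one dataset but missing in the other.
rows_left_only = sorted(left.keys() - right.keys())
rows_right_only = sorted(right.keys() - left.keys())

# Columns: column names present in one dataset but missing in the other.
cols_left_only = sorted(column_names(left) - column_names(right))
cols_right_only = sorted(column_names(right) - column_names(left))

# Values: mismatches within rows and columns that exist on both sides.
shared_keys = sorted(left.keys() & right.keys())
shared_cols = sorted(column_names(left) & column_names(right))
value_mismatches = [
    (key, col, left[key][col], right[key][col])
    for key in shared_keys
    for col in shared_cols
    if left[key][col] != right[key][col]
]

print(rows_left_only, rows_right_only)   # [2] [9]
print(cols_left_only, cols_right_only)   # ['amount'] ['color']
print(value_mismatches)                  # [(3, 'first_name', 'John', 'Michael')]
```

Value comparison only makes sense after rows have been aligned on the join key, which is why the sketch restricts it to shared keys and shared columns.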

Example Usage

Imports

Import the DeltaScan class.

from deltascan import DeltaScan

Create DataFrames

Create two sample DataFrame objects to compare. Here the left dataset is a pandas DataFrame and the right dataset is a Polars DataFrame.

import datetime

import pandas as pd
import polars as pl


# February Data
left_data = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'date': [pd.to_datetime('2026-02-28')] * 4,
    'first_name': ['Alice', 'Mike', 'John', 'Sarah'],
    'flag': [True, False, True, False],
    'amount': [10.0, 5.3, 33.7, 99.3],
    })

# January Data
right_data = pl.DataFrame({
    'id': [1, 3, 9],
    'date': [datetime.date(2026, 1, 31)] * 3,
    'first_name': ['Alice', 'Michael', 'Zachary'],
    'color': ['Pink', 'Blue', 'Red'],
    'last_name': ['Jones', 'Smith', 'Einck'],
    'flag': [False, True, False],
    'amount': [10, None, 14],
    })

Compare DataFrames

Create a DeltaScan instance to perform the comparison. See the in-code documentation for a complete list of available arguments.

ds = DeltaScan(
    left_data=left_data,
    right_data=right_data,
    join_on='id',
    left_alias='feb',
    right_alias='jan',
    left_context=['first_name'],
    right_context=None,
    verbose=True,
    )

Comparison Results

Access the comparison results using the summary and differences attributes.

print(ds.summary)
shape: (8, 6)
┌─────────────────────┬───────────┬─────────────┬─────────┬───────┬──────────────┐
│ Comparison          ┆ Dimension ┆ Differences ┆ Matches ┆ Total ┆ Match Rate % │
│ ---                 ┆ ---       ┆ ---         ┆ ---     ┆ ---   ┆ ---          │
│ str                 ┆ str       ┆ i64         ┆ i64     ┆ i64   ┆ f64          │
╞═════════════════════╪═══════════╪═════════════╪═════════╪═══════╪══════════════╡
│ jan cols not in feb ┆ columns   ┆ 2           ┆ 5       ┆ 7     ┆ 0.714286     │
│ data types          ┆ columns   ┆ 2           ┆ 3       ┆ 5     ┆ 0.6          │
│ feb rows not in jan ┆ rows      ┆ 2           ┆ 2       ┆ 4     ┆ 0.5          │
│ jan rows not in feb ┆ rows      ┆ 1           ┆ 2       ┆ 3     ┆ 0.666667     │
│ amount              ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
│ date                ┆ values    ┆ 2           ┆ 0       ┆ 2     ┆ 0.0          │
│ first_name          ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
│ flag                ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
└─────────────────────┴───────────┴─────────────┴─────────┴───────┴──────────────┘
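
The figures in the summary are consistent with Differences + Matches = Total and Match Rate = Matches / Total, reported as a fraction between 0 and 1 even though the column header reads "%". The sketch below checks that reading against the rows printed above. The match_rate helper and its zero-total guard are assumptions for illustration, not deltascan's code.

```python
# Rows copied from the printed summary: (comparison, differences, matches, total).
summary_rows = [
    ("jan cols not in feb", 2, 5, 7),
    ("data types", 2, 3, 5),
    ("feb rows not in jan", 2, 2, 4),
    ("jan rows not in feb", 1, 2, 3),
    ("amount", 1, 1, 2),
    ("date", 2, 0, 2),
    ("first_name", 1, 1, 2),
    ("flag", 1, 1, 2),
]


def match_rate(matches, total):
    """Fraction of compared items that match (hypothetical helper).

    Returning 0.0 for an empty comparison is an assumption made here,
    not documented deltascan behavior.
    """
    return matches / total if total else 0.0


# Every row should satisfy Differences + Matches == Total.
counts_consistent = all(d + m == t for _, d, m, t in summary_rows)

# Round to six decimals to mirror the precision of the printed table.
rates = {name: round(match_rate(m, t), 6) for name, _, m, t in summary_rows}

print(counts_consistent)             # True
print(rates["jan cols not in feb"])  # 0.714286
print(rates["jan rows not in feb"])  # 0.666667
```

Multiply by 100 if a true percentage is needed, for example when building a report from the summary.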

Export to Excel

Export the results to an Excel file.

ds.to_excel()
