Finds and summarizes the differences between two datasets.

deltascan

deltascan is a Python package that finds and summarizes the differences between two datasets.

Installation

pip install deltascan

Main Features

The DeltaScan class compares any two supported data structures across one or more dimensions.

Data Structures:

  • DataFrame
  • Series
  • LazyFrame (Polars only)

Dimensions:

  • Rows → rows present in one dataset but missing in the other, aligned using join_on.
  • Columns → differences in column names and data types.
  • Values → mismatched values within matching rows and columns.
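To make the three dimensions concrete, here is a minimal sketch in plain Python, using dictionaries keyed by id as stand-ins for two datasets. It only illustrates the concepts and is not DeltaScan's implementation or API. The names left, right, column_names, and value_diffs are hypothetical, and the data type comparison is left out for brevity.

```python
# Illustrative only: plain-Python stand-ins for two datasets keyed by 'id'.
# This is NOT how DeltaScan is implemented; all names here are hypothetical.
left = {
    1: {"first_name": "Alice", "amount": 10.0},
    2: {"first_name": "Mike", "amount": 5.3},
    3: {"first_name": "John", "amount": 33.7},
}
right = {
    1: {"first_name": "Alice", "amount": 10.0, "color": "Pink"},
    3: {"first_name": "Michael", "amount": None, "color": "Blue"},
    9: {"first_name": "Zachary", "amount": 14.0, "color": "Red"},
}


def column_names(rows):
    """Collect every column name that appears in a dataset."""
    names = set()
    for record in rows.values():
        names.update(record)
    return names


# Rows: join keys present on one side only.
left_only_rows = sorted(left.keys() - right.keys())
right_only_rows = sorted(right.keys() - left.keys())

# Columns: column names present on one side only.
left_cols, right_cols = column_names(left), column_names(right)
right_only_cols = sorted(right_cols - left_cols)

# Values: mismatches within rows and columns that both sides share.
shared_rows = sorted(left.keys() & right.keys())
shared_cols = sorted(left_cols & right_cols)
value_diffs = [
    (key, col, left[key][col], right[key][col])
    for key in shared_rows
    for col in shared_cols
    if left[key][col] != right[key][col]
]

print(left_only_rows)   # [2]
print(right_only_rows)  # [9]
print(right_only_cols)  # ['color']
print(value_diffs)      # [(3, 'amount', 33.7, None), (3, 'first_name', 'John', 'Michael')]
```

Value comparison only makes sense for rows that exist on both sides, which is why the rows dimension depends on a join key.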

Example Usage

Imports

Import the DeltaScan class.

from deltascan import DeltaScan

Create DataFrames

Create two sample DataFrame objects to compare.

import pandas as pd
import polars as pl
import datetime


# February Data
left_data = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'date': [pd.to_datetime('2026-02-28')] * 4,
    'first_name': ['Alice', 'Mike', 'John', 'Sarah'],
    'flag': [True, False, True, False],
    'amount': [10.0, 5.3, 33.7, 99.3],
    })

# January Data
right_data = pl.DataFrame({
    'id': [1, 3, 9],
    'date': [datetime.date(2026, 1, 31)] * 3,
    'first_name': ['Alice', 'Michael', 'Zachary'],
    'color': ['Pink', 'Blue', 'Red'],
    'last_name': ['Jones', 'Smith', 'Einck'],
    'flag': [False, True, False],
    'amount': [10, None, 14],
    })

Compare DataFrames

Create a DeltaScan instance to perform the comparison. See the in-code documentation for a complete list of available arguments.

ds = DeltaScan(
    left_data=left_data,
    right_data=right_data,
    join_on='id',
    left_alias='feb',
    right_alias='jan',
    left_context=['first_name'],
    right_context=None,
    verbose=True,
    )

Comparison Results

Access the comparison results using the summary and differences attributes.

print(ds.summary)
shape: (8, 6)
┌─────────────────────┬───────────┬─────────────┬─────────┬───────┬──────────────┐
│ Comparison          ┆ Dimension ┆ Differences ┆ Matches ┆ Total ┆ Match Rate % │
│ ---                 ┆ ---       ┆ ---         ┆ ---     ┆ ---   ┆ ---          │
│ str                 ┆ str       ┆ i64         ┆ i64     ┆ i64   ┆ f64          │
╞═════════════════════╪═══════════╪═════════════╪═════════╪═══════╪══════════════╡
│ jan cols not in feb ┆ columns   ┆ 2           ┆ 5       ┆ 7     ┆ 0.714286     │
│ data types          ┆ columns   ┆ 2           ┆ 3       ┆ 5     ┆ 0.6          │
│ feb rows not in jan ┆ rows      ┆ 2           ┆ 2       ┆ 4     ┆ 0.5          │
│ jan rows not in feb ┆ rows      ┆ 1           ┆ 2       ┆ 3     ┆ 0.666667     │
│ amount              ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
│ date                ┆ values    ┆ 2           ┆ 0       ┆ 2     ┆ 0.0          │
│ first_name          ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
│ flag                ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
└─────────────────────┴───────────┴─────────────┴─────────┴───────┴──────────────┘
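Judging from the figures in the table, the Match Rate % column appears to hold Matches divided by Total as a fraction between 0 and 1, not a value scaled to 100. The sketch below reproduces a few rows under that assumption. The match_rate helper is hypothetical and is not part of deltascan, and returning 0.0 for an empty total is a choice made for this sketch only.

```python
# (differences, matches) pairs copied from the summary table above.
summary_rows = {
    "jan cols not in feb": (2, 5),
    "feb rows not in jan": (2, 2),
    "jan rows not in feb": (1, 2),
    "date": (2, 0),
}


def match_rate(differences, matches):
    """Hypothetical helper: fraction of compared items that match."""
    total = differences + matches
    # Assumption for this sketch: an empty comparison reports 0.0.
    return matches / total if total else 0.0


rates = {
    name: round(match_rate(differences, matches), 6)
    for name, (differences, matches) in summary_rows.items()
}
print(rates)
# {'jan cols not in feb': 0.714286, 'feb rows not in jan': 0.5,
#  'jan rows not in feb': 0.666667, 'date': 0.0}
```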

Export to Excel

Export the results to an Excel file.

ds.to_excel()
