Finds and summarizes the differences between two datasets.

deltascan

deltascan is a Python package that finds and summarizes the differences between two datasets.

Installation

pip install deltascan

Main Features

The DeltaScan class compares any two supported data structures across one or more dimensions.

Data Structures:

  • DataFrame
  • Series
  • LazyFrame (Polars only)

Dimensions:

  • Rows → rows present in one dataset but missing in the other, aligned using join_on.
  • Columns → differences in column names and data types.
  • Values → mismatched values within matching rows and columns.
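To make the three dimensions concrete, here is a minimal sketch of the same ideas using plain pandas. It is not DeltaScan's implementation, only an illustration: the frames `left` and `right` and all variable names are made up for this example, and the key column `id` plays the role of join_on.

```python
import pandas as pd

# Illustrative data only, loosely based on the example frames in this README.
left = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "first_name": ["Alice", "Mike", "John", "Sarah"],
    "amount": [10.0, 5.3, 33.7, 99.3],
})
right = pd.DataFrame({
    "id": [1, 3, 9],
    "first_name": ["Alice", "Michael", "Zachary"],
    "color": ["Pink", "Blue", "Red"],
    "amount": [10, 20, 14],
})

# Columns: names found on only one side, and shared names whose dtypes differ.
shared_cols = [c for c in left.columns if c in right.columns]
right_only_cols = [c for c in right.columns if c not in left.columns]
dtype_mismatches = [c for c in shared_cols if left[c].dtype != right[c].dtype]

# Rows: an outer join on the key with indicator=True labels each key as
# "left_only", "right_only", or "both" in the "_merge" column.
merged = left.merge(
    right, on="id", how="outer", suffixes=("_left", "_right"), indicator=True
)
left_only_ids = sorted(merged.loc[merged["_merge"] == "left_only", "id"].tolist())
right_only_ids = sorted(merged.loc[merged["_merge"] == "right_only", "id"].tolist())

# Values: among keys present on both sides, compare a shared column.
both = merged[merged["_merge"] == "both"]
name_mismatch_ids = sorted(
    both.loc[both["first_name_left"] != both["first_name_right"], "id"].tolist()
)

print("columns only in right:", right_only_cols)
print("dtype mismatches:", dtype_mismatches)
print("rows only in left:", left_only_ids)
print("rows only in right:", right_only_ids)
print("first_name mismatches at ids:", name_mismatch_ids)
```

The outer join with an indicator column handles both row directions in a single pass, which is why it is used here instead of two separate anti-joins.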

Example Usage

Imports

Import the DeltaScan class.

from deltascan import DeltaScan

Create DataFrames

Create two sample DataFrame objects to compare.

import pandas as pd
import polars as pl
import datetime


# February Data
left_data = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'date': [pd.to_datetime('2026-02-28')] * 4,
    'first_name': ['Alice', 'Mike', 'John', 'Sarah'],
    'flag': [True, False, True, False],
    'amount': [10.0, 5.3, 33.7, 99.3],
    })

# January Data
right_data = pl.DataFrame({
    'id': [1, 3, 9],
    'date': [datetime.date(2026, 1, 31)] * 3,
    'first_name': ['Alice', 'Michael', 'Zachary'],
    'color': ['Pink', 'Blue', 'Red'],
    'last_name': ['Jones', 'Smith', 'Einck'],
    'flag': [False, True, False],
    'amount': [10, None, 14],
    })

Compare DataFrames

Create a DeltaScan instance to perform the comparison. See the in-code documentation for a complete list of available arguments.

ds = DeltaScan(
    left_data=left_data,
    right_data=right_data,
    join_on='id',
    left_alias='feb',
    right_alias='jan',
    left_context=['first_name'],
    right_context=None,
    verbose=True,
    )

Comparison Results

Access the comparison results using the summary and differences attributes.

print(ds.summary)
shape: (8, 6)
┌─────────────────────┬───────────┬─────────────┬─────────┬───────┬──────────────┐
│ Comparison          ┆ Dimension ┆ Differences ┆ Matches ┆ Total ┆ Match Rate % │
│ ---                 ┆ ---       ┆ ---         ┆ ---     ┆ ---   ┆ ---          │
│ str                 ┆ str       ┆ i64         ┆ i64     ┆ i64   ┆ f64          │
╞═════════════════════╪═══════════╪═════════════╪═════════╪═══════╪══════════════╡
│ jan cols not in feb ┆ columns   ┆ 2           ┆ 5       ┆ 7     ┆ 0.714286     │
│ data types          ┆ columns   ┆ 2           ┆ 3       ┆ 5     ┆ 0.6          │
│ feb rows not in jan ┆ rows      ┆ 2           ┆ 2       ┆ 4     ┆ 0.5          │
│ jan rows not in feb ┆ rows      ┆ 1           ┆ 2       ┆ 3     ┆ 0.666667     │
│ amount              ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
│ date                ┆ values    ┆ 2           ┆ 0       ┆ 2     ┆ 0.0          │
│ first_name          ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
│ flag                ┆ values    ┆ 1           ┆ 1       ┆ 2     ┆ 0.5          │
└─────────────────────┴───────────┴─────────────┴─────────┴───────┴──────────────┘
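In the table above, every row satisfies Differences + Matches = Total, and the Match Rate % column is consistent with Matches / Total reported as a fraction (0.5 rather than 50). The sketch below reproduces a few rows under that assumption. The helper `match_rate` is hypothetical and not part of deltascan, and returning NaN for an empty total is a choice made for this example.

```python
def match_rate(matches: int, total: int) -> float:
    """Fraction of compared items that match (assumed formula: matches / total)."""
    if total == 0:
        # Assumption for this sketch: no items compared means no defined rate.
        return float("nan")
    return matches / total


# (comparison, differences, matches) taken from the summary table above.
rows = [
    ("jan cols not in feb", 2, 5),
    ("data types", 2, 3),
    ("jan rows not in feb", 1, 2),
    ("date", 2, 0),
]

for name, differences, matches in rows:
    total = differences + matches
    print(f"{name}: {round(match_rate(matches, total), 6)}")
```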

Export to Excel

Export the results to an Excel file.

ds.to_excel()
