Project description

drift-scope analyzes the differences between data sources, i.e. how much dataset A has drifted from dataset B. The framework is engine-agnostic: each engine only needs to comply with a few simple abstract protocols to enable the standard reporting.

Engines Supported:

  • Postgres
  • Anything dispatchable to Narwhals
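To make the "abstract protocols" idea concrete, here is a minimal sketch of what such a contract could look like. The method and attribute names (`comp_freq`, `compile_report`, `data`) are taken from the usage example in this README; the `Comparator` protocol class and the toy `ListComparator` engine are hypothetical and are not part of drift-scope.

```python
from collections import Counter
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Comparator(Protocol):
    """Hypothetical protocol: what an engine might expose for reporting."""

    data: Any  # comparison summary produced by the last comparison

    def comp_freq(self, vars: tuple[str, ...]) -> None: ...

    def compile_report(self) -> None: ...


class ListComparator:
    """Toy in-memory engine over lists of dicts (illustration only)."""

    def __init__(self, df1: list[dict], df2: list[dict]) -> None:
        self.df1, self.df2 = df1, df2
        self.data: Any = None

    def comp_freq(self, vars: tuple[str, ...]) -> None:
        # Count each distinct combination of the requested columns.
        c1 = Counter(tuple(row[v] for v in vars) for row in self.df1)
        c2 = Counter(tuple(row[v] for v in vars) for row in self.df2)
        self.data = [
            {"key": key, "n1": c1[key], "n2": c2[key]}
            for key in sorted(c1.keys() | c2.keys())
        ]

    def compile_report(self) -> None:
        # Print one summary row per distinct key.
        for row in self.data or []:
            print(row)


engine = ListComparator(
    df1=[{"city": "Chicago"}, {"city": "New York"}],
    df2=[{"city": "New York"}],
)
engine.comp_freq(vars=("city",))
```

Because the protocol is structural, `ListComparator` never inherits from `Comparator`; it only has to provide the same members, which is what would let very different engines share one reporting layer.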

Install with uv pip install drift-scope

Usage

Let's create two simple tables and compare them. The fundamental question we're asking is "How much has table2 drifted from table1?"

>>> import duckdb
>>> import polars as pl
>>> # Import path assumed from the package name; adjust if SQLComparator lives in a submodule.
>>> from drift_scope import SQLComparator

>>> with duckdb.connect() as con:
...     con.execute("CREATE TABLE table1 (city VARCHAR, state VARCHAR)")
...     con.execute("INSERT INTO table1 VALUES ('New York', 'NY'), ('Los Angeles', 'CA'), ('Chicago', 'IL')")
...
...     con.execute("CREATE TABLE table2 (city VARCHAR, state VARCHAR)")
...     con.execute("INSERT INTO table2 VALUES ('New York', 'NY'), ('Phoenix', 'AZ'), ('Philadelphia', 'PA')")
...
...     sql = SQLComparator(df1="table1", df2="table2", con=con)
...     sql.comp_freq(vars=("city", "state"))
...
...     sql.compile_report()  # prints reports to console
...
...     res_pl = pl.from_arrow(sql.data)  # pull out the comparison summary

>>> msg = "n1 is incorrectly calculated"
>>> assert res_pl.filter(pl.col("city") == "New York").select("n1").item() == 1, msg
>>> assert res_pl.filter(pl.col("city") == "Los Angeles").select("n1").item() == 1, msg
>>> assert res_pl.filter(pl.col("city") == "Chicago").select("n1").item() == 1, msg

>>> msg = "n2 is incorrectly calculated"
>>> assert res_pl.filter(pl.col("city") == "New York").select("n2").item() == 1, msg
>>> assert res_pl.filter(pl.col("city") == "Phoenix").select("n2").item() == 1, msg
>>> assert res_pl.filter(pl.col("city") == "Philadelphia").select("n2").item() == 1, msg

>>> msg = "real_diff is incorrect"
>>> assert res_pl.filter(pl.col("city") == "New York").select("real_diff").item() == 0, msg
>>> assert res_pl.filter(pl.col("city") == "Los Angeles").select("real_diff").item() == -1, msg
>>> assert res_pl.filter(pl.col("city") == "Chicago").select("real_diff").item() == -1, msg
>>> assert res_pl.filter(pl.col("city") == "Phoenix").select("real_diff").item() == 1, msg
>>> assert res_pl.filter(pl.col("city") == "Philadelphia").select("real_diff").item() == 1, msg

>>> msg = "abs_diff is incorrect"
>>> assert res_pl.filter(pl.col("city") == "New York").select("abs_diff").item() == 0, msg
>>> assert res_pl.filter(pl.col("city") == "Los Angeles").select("abs_diff").item() == 1, msg
>>> assert res_pl.filter(pl.col("city") == "Chicago").select("abs_diff").item() == 1, msg
>>> assert res_pl.filter(pl.col("city") == "Phoenix").select("abs_diff").item() == 1, msg
>>> assert res_pl.filter(pl.col("city") == "Philadelphia").select("abs_diff").item() == 1, msg

>>> msg = "pct_diff is incorrect"
>>> assert res_pl.filter(pl.col("city") == "New York").select("pct_diff").item() == 0.0, msg
>>> assert res_pl.filter(pl.col("city") == "Los Angeles").select("pct_diff").item() == -1.0, msg
>>> assert res_pl.filter(pl.col("city") == "Chicago").select("pct_diff").item() == -1.0, msg
>>> assert res_pl.filter(pl.col("city") == "Phoenix").select("pct_diff").item() == 1.0, msg
>>> assert res_pl.filter(pl.col("city") == "Philadelphia").select("pct_diff").item() == 1.0, msg
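For readers who want to see where the asserted numbers come from, the sketch below recomputes `n1`, `n2`, `real_diff`, and `abs_diff` in plain Python. It is not drift-scope's implementation: `freq_compare` is a hypothetical helper, and the definitions `real_diff = n2 - n1` and `abs_diff = |real_diff|` are inferred from the assertions above. `pct_diff` is deliberately left out, because its denominator is not stated in this README.

```python
from collections import Counter


def freq_compare(rows1, rows2):
    """Count each distinct row in both inputs and compare the counts."""
    c1, c2 = Counter(rows1), Counter(rows2)
    report = {}
    for key in sorted(c1.keys() | c2.keys()):
        n1, n2 = c1[key], c2[key]  # a Counter returns 0 for missing keys
        real_diff = n2 - n1        # assumed sign convention: second minus first
        report[key] = {
            "n1": n1,
            "n2": n2,
            "real_diff": real_diff,
            "abs_diff": abs(real_diff),
        }
    return report


table1 = [("New York", "NY"), ("Los Angeles", "CA"), ("Chicago", "IL")]
table2 = [("New York", "NY"), ("Phoenix", "AZ"), ("Philadelphia", "PA")]
report = freq_compare(table1, table2)
```

Rows that exist only in table1 (Los Angeles, Chicago) come out with a negative `real_diff`, rows that exist only in table2 (Phoenix, Philadelphia) with a positive one, and the shared row (New York) with zero, which matches the assertions in the example.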
