Zero-config data quality monitoring and drift detection for pandas DataFrames, with optional Claude AI diagnosis and hosted dashboard sync.

DataSentinel

Zero-config data quality monitoring for pandas DataFrames. Catches drift, anomalies, and silent data breakage — locally, in seconds, no setup required.

pip install datasentinel-saxon

Quick start (fully local — no account needed)

from datasentinel import DataSentinel
import pandas as pd

df = pd.read_csv("orders.csv")

ds = DataSentinel()
report = ds.check(df)
print(report)

Output:

DataSentinel Report
  500 rows x 11 columns
  Overall: NONE

  First run — baseline established. Run check() again later to detect drift.
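
To picture what "baseline established" means, here is a minimal sketch of a profile-and-cache step. It is not DataSentinel's implementation: the `profile` and `load_or_create_baseline` helpers, the JSON layout, and the cache location are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Summarize a DataFrame into a small JSON-serializable profile (hypothetical layout)."""
    return {
        "rows": int(len(df)),
        "columns": {
            col: {
                "null_rate": float(df[col].isna().mean()),
                "distinct": int(df[col].nunique(dropna=True)),
            }
            for col in df.columns
        },
    }


def load_or_create_baseline(df: pd.DataFrame, cache_path: Path) -> tuple[dict, bool]:
    """Return (baseline, created). On the first run the current profile becomes the baseline."""
    if cache_path.exists():
        return json.loads(cache_path.read_text()), False
    baseline = profile(df)
    cache_path.write_text(json.dumps(baseline))
    return baseline, True


# Example run against a throwaway cache file (the path is an assumption).
cache = Path(tempfile.mkdtemp()) / "baseline.json"
df = pd.DataFrame({"country": ["US", "DE", None], "total": [10.0, 12.5, 9.0]})

baseline, created_first = load_or_create_baseline(df, cache)
_, created_second = load_or_create_baseline(df, cache)
```

The second call finds the cached file and reuses it, which is the point at which a real tool would start comparing new data against the stored profile.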

Run it again tomorrow on a new export of the same data, and DataSentinel compares it against the cached baseline automatically:

df_tomorrow = pd.read_csv("orders_tomorrow.csv")
report = ds.check(df_tomorrow)
print(report)

Output:

DataSentinel Report
  512 rows x 11 columns
  Overall: HIGH

  Flagged columns:
    [HIGH] discount_pct
      - Distribution shifted (PSI=0.342)
    [MEDIUM] country
      - Distinct value count changed from 7 to 11
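
In a scheduled job it can be useful to fail the run when the overall severity is high. This page does not document a severity attribute on the report object, so the sketch below parses the printed text instead. `overall_severity` is a hypothetical helper, and the line format is assumed from the sample output.

```python
import re


def overall_severity(report_text: str) -> str:
    """Extract the value after 'Overall:' from a printed report (format assumed from the samples)."""
    match = re.search(r"^\s*Overall:\s*(\w+)\s*$", report_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Overall:' line found in report text")
    return match.group(1).upper()


# Sample text copied from the report format shown in this README.
sample = """DataSentinel Report
  512 rows x 11 columns
  Overall: HIGH
"""

severity = overall_severity(sample)
should_fail = severity == "HIGH"
```

With the library installed, the helper would be called as `overall_severity(str(report))`, and the job could exit with a non-zero status when `should_fail` is true.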

With a hosted account (history, scheduling, Slack alerts, AI diagnosis)

ds = DataSentinel(api_key="ds_...")
report = ds.check(df, pipeline_name="Orders")  # profiles locally AND syncs to your dashboard

When synced, report.diagnosis contains a plain-English root-cause explanation generated by Claude, and report.pipeline_url links straight to the dashboard.

Get an API key by creating a free account at datasentinel-eight.vercel.app.
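
To keep the key out of source control, one option is to read it from the environment and fall back to local mode when it is absent. This is a sketch under assumptions: the variable name `DATASENTINEL_API_KEY` and the `sentinel_kwargs` helper are hypothetical, not part of the library.

```python
import os


def sentinel_kwargs(env=None) -> dict:
    """Build constructor kwargs: include api_key only when the variable is set."""
    env = os.environ if env is None else env
    api_key = env.get("DATASENTINEL_API_KEY", "").strip()  # hypothetical variable name
    return {"api_key": api_key} if api_key else {}


local_kwargs = sentinel_kwargs({})
hosted_kwargs = sentinel_kwargs({"DATASENTINEL_API_KEY": "ds_example"})

# With the library installed, either result can be passed straight through:
#   ds = DataSentinel(**sentinel_kwargs())
```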

Connecting a live database (hosted only)

ds = DataSentinel(api_key="ds_...")
pipeline = ds.connect_postgres(
    dsn="postgresql://user:pass@host:5432/db",
    table="orders",
    pipeline_name="Orders Pipeline",
)
result = ds.run_pipeline(pipeline["id"])

This registers a scheduled pipeline identical to one created from the dashboard — it'll run automatically on its configured interval and alert you via Slack when something breaks.
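
The DSN above embeds the password in code. A hedged alternative is to assemble it from environment variables and percent-encode the credentials so characters such as `@` or `/` do not break the URL. The `PG*` names follow libpq's conventions. Whether DataSentinel reads them itself is not stated here, so this sketch reads them explicitly. `build_postgres_dsn` is a hypothetical helper.

```python
import os
from urllib.parse import quote


def build_postgres_dsn(env=None) -> str:
    """Assemble a postgresql:// DSN from PG* variables, percent-encoding the credentials."""
    env = os.environ if env is None else env
    user = quote(env["PGUSER"], safe="")
    password = quote(env["PGPASSWORD"], safe="")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = quote(env["PGDATABASE"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Example values only; in practice call build_postgres_dsn() with no argument.
dsn = build_postgres_dsn(
    {"PGUSER": "reporting", "PGPASSWORD": "p@ss/word", "PGHOST": "db.internal", "PGDATABASE": "shop"}
)
```

The resulting string can then be passed as the `dsn` argument to `connect_postgres`.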

What it checks

  • Null rate drift — sudden spikes or drops in missing data
  • Distribution drift (PSI) — numeric and categorical distributions shifting over time
  • Cardinality drift — new or disappearing categories
  • Volume drift — unexpected row count changes
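
For readers who want to see what these checks measure, the sketch below computes each quantity with pandas and NumPy. It illustrates the standard definitions and is not DataSentinel's code: the function names, the quantile binning, and the `eps` clipping are assumptions. PSI is computed as the sum over bins of (current share − baseline share) × ln(current share / baseline share). A commonly cited rule of thumb treats values above 0.25 as a significant shift. The thresholds DataSentinel applies are not stated here.

```python
import numpy as np
import pandas as pd


def psi(baseline: pd.Series, current: pd.Series, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index for a numeric column, binned on baseline quantiles."""
    base = baseline.dropna().to_numpy(dtype=float)
    curr = current.dropna().to_numpy(dtype=float)
    # Interior cut points from the baseline; values beyond them fall in the outer bins.
    cuts = np.unique(np.quantile(base, np.linspace(0.0, 1.0, bins + 1)[1:-1]))
    n_bins = len(cuts) + 1
    base_share = np.bincount(np.searchsorted(cuts, base, side="right"), minlength=n_bins) / len(base)
    curr_share = np.bincount(np.searchsorted(cuts, curr, side="right"), minlength=n_bins) / len(curr)
    # Clip empty bins to a small value to avoid log(0) and division by zero.
    base_share = np.clip(base_share, eps, None)
    curr_share = np.clip(curr_share, eps, None)
    return float(np.sum((curr_share - base_share) * np.log(curr_share / base_share)))


def simple_drift(baseline: pd.DataFrame, current: pd.DataFrame, column: str) -> dict:
    """Null-rate, cardinality, and volume changes for one column."""
    old_values = set(baseline[column].dropna())
    new_values = set(current[column].dropna())
    return {
        "null_rate_change": float(current[column].isna().mean() - baseline[column].isna().mean()),
        "new_categories": sorted(new_values - old_values),
        "missing_categories": sorted(old_values - new_values),
        "row_count_change": len(current) - len(baseline),
    }


# Numeric example: the current data is the baseline shifted upward by 500.
base_totals = pd.Series(np.arange(1000.0))
psi_same = psi(base_totals, base_totals)
psi_shifted = psi(base_totals, base_totals + 500)

# Categorical example with a new category and more missing values.
old = pd.DataFrame({"country": ["US", "DE", "US", None]})
new = pd.DataFrame({"country": ["US", "FR", None, None, "DE"]})
drift = simple_drift(old, new, "country")
```

Identical data yields a PSI of zero, while the shifted series lands far above the 0.25 rule of thumb because half of the baseline bins are empty in the current data.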

Why DataSentinel

Most data quality tools are either too simple (just schema checks) or too heavy (enterprise platforms requiring a deployment team). DataSentinel sits in between: zero config to start, built on standard drift statistics such as the Population Stability Index (PSI), and, when synced, able to explain why something broke in plain English instead of just flagging that it did.

License

MIT
