Zero-config data quality monitoring and drift detection for pandas DataFrames, with optional Claude AI diagnosis and hosted dashboard sync.
DataSentinel
Zero-config data quality monitoring for pandas DataFrames. Catches drift, anomalies, and silent data breakage — locally, in seconds, no setup required.
pip install datasentinel-saxon
Quick start (fully local — no account needed)
from datasentinel import DataSentinel
import pandas as pd
df = pd.read_csv("orders.csv")
ds = DataSentinel()
report = ds.check(df)
print(report)
DataSentinel Report
500 rows x 11 columns
Overall: NONE
First run — baseline established. Run check() again later to detect drift.
Run it again tomorrow on a new export of the same data, and DataSentinel compares it against the cached baseline automatically:
df_tomorrow = pd.read_csv("orders_tomorrow.csv")
report = ds.check(df_tomorrow)
print(report)
DataSentinel Report
512 rows x 11 columns
Overall: HIGH
Flagged columns:
[HIGH] discount_pct
- Distribution shifted (PSI=0.342)
[MEDIUM] country
- Distinct value count changed from 7 to 11
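For readers unfamiliar with the PSI figure in the report above, here is a minimal sketch of how a Population Stability Index can be computed for one numeric column. It illustrates the general technique, not DataSentinel's internal code: the function name `psi`, the quantile binning, the bin count, and the `eps` floor are all assumptions, and how the library bins values or maps PSI to a severity level is not documented here.

```python
import math
from bisect import bisect_right


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two non-empty numeric samples.

    Hypothetical helper for illustration. Bin edges come from the
    baseline's quantiles, so each bin holds roughly the same share of
    baseline values.
    """
    ordered = sorted(baseline)
    # Interior cut points at the 1/bins, 2/bins, ... quantiles of the baseline.
    edges = [ordered[(len(ordered) * i) // bins] for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            # bisect_right maps a value to the index of the bin it falls in.
            counts[bisect_right(edges, v)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    p, q = shares(baseline), shares(current)
    # PSI = sum over bins of (q_i - p_i) * ln(q_i / p_i).
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift. Identical samples give a PSI of exactly 0.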
With a hosted account (history, scheduling, Slack alerts, AI diagnosis)
ds = DataSentinel(api_key="ds_...")
report = ds.check(df, pipeline_name="Orders") # profiles locally AND syncs to your dashboard
When synced, report.diagnosis contains a plain-English root-cause explanation generated by Claude, and report.pipeline_url links straight to the dashboard.
Get an API key by creating a free account at datasentinel-eight.vercel.app.
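Rather than pasting the key into source code, you may prefer to read it from the environment. The sketch below is one way to do that; the helper `load_api_key` and the variable name `DATASENTINEL_API_KEY` are assumptions made for this example, not something the library is known to read on its own.

```python
import os


def load_api_key(env_var: str = "DATASENTINEL_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    The environment variable name is hypothetical; pick any name you like.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your DataSentinel API key")
    return key
```

The result can then be passed to the constructor shown above, for example `DataSentinel(api_key=load_api_key())`.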
Connecting a live database (hosted only)
ds = DataSentinel(api_key="ds_...")
pipeline = ds.connect_postgres(
dsn="postgresql://user:pass@host:5432/db",
table="orders",
pipeline_name="Orders Pipeline",
)
result = ds.run_pipeline(pipeline["id"])
This registers a scheduled pipeline identical to one created from the dashboard — it'll run automatically on its configured interval and alert you via Slack when something breaks.
What it checks
- Null rate drift — sudden spikes or drops in missing data
- Distribution drift (PSI) — numeric and categorical distributions shifting over time
- Cardinality drift — new or disappearing categories
- Volume drift — unexpected row count changes
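To make the null rate, cardinality, and volume checks concrete, here is a rough sketch of how such comparisons can be done by hand with pandas. It is not DataSentinel's implementation: the function `simple_drift_summary` and the shape of its return value are hypothetical, and it applies no thresholds or severity levels.

```python
import pandas as pd


def simple_drift_summary(baseline: pd.DataFrame, current: pd.DataFrame) -> dict:
    """Compare two DataFrames on null rate, distinct values, and row count.

    Hypothetical helper for illustration; assumes a non-empty baseline.
    """
    shared = [c for c in baseline.columns if c in current.columns]
    columns = {}
    for col in shared:
        old_values = set(baseline[col].dropna())
        new_values = set(current[col].dropna())
        columns[col] = {
            # Change in the fraction of missing values (current minus baseline).
            "null_rate_delta": float(
                current[col].isna().mean() - baseline[col].isna().mean()
            ),
            # Distinct non-null value counts before and after.
            "distinct": (len(old_values), len(new_values)),
            # Values that appeared or disappeared between the two runs.
            "appeared": sorted(new_values - old_values, key=str),
            "disappeared": sorted(old_values - new_values, key=str),
        }
    return {
        # Relative change in row count, e.g. 0.25 means 25% more rows.
        "row_count_change": (len(current) - len(baseline)) / len(baseline),
        "columns": columns,
    }
```

The `appeared` and `disappeared` lists are mainly useful for categorical columns such as `country`; for continuous numeric columns the distinct counts and a distribution measure are more informative.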
Why DataSentinel
Most data quality tools are either too simple (just schema checks) or too heavy (enterprise platforms requiring a deployment team). DataSentinel sits in between: zero config to start, statistically rigorous under the hood, and — when synced — explains why something broke in plain English instead of just flagging that it did.
License
MIT
File details
Details for the file datasentinel_saxon-0.1.0.tar.gz.
File metadata
- Download URL: datasentinel_saxon-0.1.0.tar.gz
- Upload date:
- Size: 11.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e73428f1d98c39d7f448fd89c1f407c84354dd5e03732c075ec34d264faa1c7c |
| MD5 | 28a3921a52b6e3b3fe5b2234cf72f514 |
| BLAKE2b-256 | 0e8c84a5a3049541ac335ca1ef1bce76de4f12f3f8cda957315a69293ea2f53a |
File details
Details for the file datasentinel_saxon-0.1.0-py3-none-any.whl.
File metadata
- Download URL: datasentinel_saxon-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a990dd083c4c29ad419d167d6ba9547edd99799a7ce70172040246b84a03225a |
| MD5 | 64d819f51763793c8a1ef7b116b2f409 |
| BLAKE2b-256 | 5d1f05e8fc113c262ed07ed0e4c9c0050f4a64ffa9cd67f12c7dc136cbb903e2 |