Enterprise Data Trust — Chapter 4: DriftSentinel
Unified Drift Monitoring Application
Three Control Patterns. Multiple Datasets. One Platform That Proves All of Them Are Working.
Built by Anthony Johnson | EthereaLogic LLC
The first three chapters of Enterprise Data Trust prove three things: data can be certified at intake, distribution drift can be gated before publication, and control effectiveness can be measured against known failure scenarios. Each chapter solves one problem in isolation.
DriftSentinel solves the next one: running all three control patterns together, across multiple registered datasets, in a production Databricks environment — with append-only evidence for every run and an operator dashboard the platform team can actually use.
Three modules. One registry. Queryable evidence. No assumption that any run passed unless the artifact says so.
Executive Summary
| Leadership question | Answer |
|---|---|
| What business risk does this address? | Enterprises with mature data quality patterns still lack a unified platform for running intake certification, drift gating, and control benchmarking across multiple datasets with governed, queryable evidence. |
| What does this application prove? | A Databricks-deployable application that registers datasets against versioned contracts, runs all three control patterns through a single orchestration layer, and surfaces queryable evidence artifacts in a read-only operator dashboard. |
| Why does it matter? | Proving that controls work on one dataset, once, is a starting point. This chapter proves that the same evidence discipline applies across multiple datasets, in a governed deployment, with an audit trail the platform team can inspect without writing scripts. |
Key Exhibits
Exhibit 1: Dataset Contract Registry
The operator dashboard loads the active registry and shows every registered dataset with its contract version and Unity Catalog location. Two datasets registered — meridian_labor_hours and meridian_project_costs — both at version 1.0.0, bound to the driftsentinel catalog.
Exhibit 2: Control Run Evidence at a Glance
The Run Status tab surfaces 48 evidence artifacts across intake, drift, and benchmark runs — 27 PASS, 20 FAIL, and 1 other. Every row is one artifact: dataset, kind, timestamp, verdict, and run ID. The verdict column makes failures immediately visible without opening a single file.
Exhibit 3: Full Evidence Artifact Inspection
The Evidence Explorer loads a single evidence artifact by filename and renders the complete JSON payload inline. The artifact shown is a drift run against meridian_project_costs with a FAIL verdict: health score 0.2, four columns drifted, gate results visible with measured values recorded alongside thresholds.
Exhibit 4: Visual Analytics Across All Control Runs
The Analytics tab scans the evidence directory and renders four charts: verdict distribution, runs by kind (intake, drift, benchmark), daily activity volume, and daily health trend. 48 artifacts analyzed from a single shared evidence directory — no manual aggregation.
The Business Problem
Enterprises that reach this level of data trust maturity have solved the individual problems. They still face three operational gaps:
- Multiple datasets require coordinated governance. Running intake certification for one dataset is a pattern. Running it across several — with version-aware contracts, consistent policy bindings, and a shared evidence store — is a platform problem.
- Evidence is only useful if it is queryable. Append-only JSON artifacts are the right evidence discipline, but they are not useful to a platform team without a way to filter by dataset, kind, verdict, and run ID without writing scripts.
- A Databricks deployment needs to be reproducible. A control pattern that runs locally in Python is a proof of concept. One that deploys with a validated Asset Bundle, runs on Databricks compute, and exposes a governed Databricks App is a product.
These are not hypothetical gaps. They are the operational reality of any team that has graduated from individual control patterns to platform-level governance.
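To make the "queryable without scripts" bar concrete, here is roughly what a platform team would otherwise have to script by hand. This is a minimal illustrative sketch, not code from the repository: it assumes evidence artifacts are flat JSON files carrying `dataset`, `kind`, and `verdict` fields, which is inferred from the exhibits above.

```python
import json
from pathlib import Path


def find_artifacts(evidence_dir, dataset=None, kind=None, verdict=None):
    """Filter append-only JSON evidence artifacts by dataset, kind, and verdict.

    Any filter left as None matches everything, mirroring the kind of
    ad-hoc querying a dashboard otherwise replaces.
    """
    matches = []
    for path in sorted(Path(evidence_dir).glob("*.json")):
        artifact = json.loads(path.read_text())
        if dataset and artifact.get("dataset") != dataset:
            continue
        if kind and artifact.get("kind") != kind:
            continue
        if verdict and artifact.get("verdict") != verdict:
            continue
        matches.append(artifact)
    return matches
```

A dashboard tab that does exactly this for every operator, consistently, is what turns an evidence directory into a platform surface.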
What This Repository Contains
| Surface | Purpose |
|---|---|
| src/driftsentinel/intake/ | Schema drift detection and contract validation (Chapter 1 pattern) |
| src/driftsentinel/drift/ | Distribution drift detection and gate logic (Chapter 2 pattern) |
| src/driftsentinel/benchmark/ | Control effectiveness benchmarking against known failure scenarios (Chapter 3 pattern) |
| src/driftsentinel/evidence/ | Append-only evidence artifact writing shared across all three modules |
| src/driftsentinel/orchestration/ | Workflow sequencing — runs all three control patterns in order for each registered dataset |
| src/driftsentinel/config/ | Dataset contract and policy configuration |
| app/ | Databricks App (Gradio) — four-tab read-only operator dashboard |
| notebooks/ | Onboarding, execution, and evidence-review notebooks for Databricks |
| resources/ | Databricks Asset Bundle pipeline, job, and app resource definitions |
| templates/ | Dataset contract, drift policy, and benchmark policy templates |
| specs/ | Canonical SDLC documents governing the product |
| tests/ | 297-test suite covering domain logic, packaging, and governance |
Every directory above contains a README.md describing its contents, including each submodule under src/driftsentinel/.
What This Repository Proves
| Verified outcome | Evidence from this repository |
|---|---|
| Multi-dataset registry operates with version-aware contracts | 2 datasets registered at v1.0.0 — visible in the Registry tab with catalog and table location |
| All three control patterns produce evidence through one orchestration layer | Intake, drift, and benchmark artifacts present in a single shared evidence directory |
| Evidence artifacts are queryable without writing scripts | Run Status tab filters 48 artifacts by dataset, kind, verdict, and run ID |
| FAIL verdicts surface measurable gate results | Drift FAIL artifact shows health score 0.2, 4 columns drifted, gate thresholds recorded |
| Databricks deployment is reproducible from source | Asset Bundle validates, deploys, and runs via make bundle-validate and make app-deploy |
Decision / KPI Contract
Business decision: is the full data control pipeline healthy across all registered datasets?
| KPI | Meaning |
|---|---|
| overall_verdict | PASS / WARN / FAIL for each control run artifact |
| health_score | Distribution stability score from drift runs (0.0–1.0) |
| quality_recall | Percentage of injected quality failures caught by benchmark runs |
| columns_drifted | Count of columns whose distribution shifted beyond threshold in a drift run |
| artifacts_pass_rate | Ratio of PASS artifacts to total artifacts in the evidence directory |
Control rule: no run is assumed to have passed unless a PASS artifact exists in the evidence directory. FAIL artifacts carry gate results with measured values and thresholds so the platform team can triage without opening raw files.
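The control rule and the artifacts_pass_rate KPI can both be expressed in a few lines. This is an illustrative sketch, not the repository's implementation; the `run_id` and `verdict` field names are assumptions inferred from the exhibits.

```python
import json
from pathlib import Path


def run_passed(evidence_dir, run_id):
    """Return True only if a PASS artifact exists for this run_id.

    Absence of evidence is treated as failure, never as success —
    the inverse default is how silent failures survive.
    """
    for path in Path(evidence_dir).glob("*.json"):
        artifact = json.loads(path.read_text())
        if artifact.get("run_id") == run_id and artifact.get("verdict") == "PASS":
            return True
    return False


def artifacts_pass_rate(evidence_dir):
    """Ratio of PASS artifacts to all artifacts in the evidence directory."""
    verdicts = [json.loads(p.read_text()).get("verdict")
                for p in Path(evidence_dir).glob("*.json")]
    return sum(v == "PASS" for v in verdicts) / len(verdicts) if verdicts else 0.0
```

The important design choice is the default: a run with no artifact is unhealthy until proven otherwise.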
Why This Pattern
- Gap 1. Control patterns must share an evidence discipline. Intake, drift, and benchmark runs each write to the same append-only evidence directory using the same artifact schema. A single dashboard surfaces all three without special-casing any module.
- Gap 2. The registry must be version-aware. As contracts evolve, older run artifacts retain the contract version that was active when they were produced. Evidence is never retroactively modified.
- Gap 3. The operator dashboard must be read-only. The Gradio app exposes no write surfaces. Evidence is queried, never edited through the UI. This keeps the audit trail clean and the deployment governance simple.
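The shared, append-only evidence discipline described above can be sketched as a single write-once artifact writer. This is a minimal sketch under stated assumptions, not the repository's evidence module: the artifact fields (`run_id`, `kind`, `verdict`, `payload`) are illustrative names inferred from the exhibits.

```python
import json
import time
import uuid
from pathlib import Path


def write_evidence(evidence_dir, dataset, kind, verdict, payload):
    """Write one immutable evidence artifact; all modules share this shape."""
    artifact = {
        "run_id": uuid.uuid4().hex,
        "dataset": dataset,
        "kind": kind,            # e.g. "intake" | "drift" | "benchmark"
        "verdict": verdict,      # "PASS" | "WARN" | "FAIL"
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "payload": payload,      # gate results, measured values, thresholds
    }
    path = Path(evidence_dir) / f"{artifact['kind']}_{artifact['run_id']}.json"
    # Open mode "x" fails if the file already exists, so an artifact,
    # once written, can never be overwritten through this code path.
    with open(path, "x") as f:
        json.dump(artifact, f, indent=2)
    return path
```

Because every module writes the same shape, one dashboard can read all three without special-casing any of them.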
How It Works
- Register datasets. Each dataset is registered with a contract YAML specifying the Unity Catalog location, schema contract, drift policy, and benchmark policy.
- Run the control pipeline. The orchestration layer runs intake certification, drift gating, and control benchmarking in sequence. Each module writes an append-only evidence artifact to the shared evidence directory.
- Inspect run status. The Run Status tab surfaces all artifacts with verdicts, timestamps, and run IDs. FAIL rows are immediately visible — no file parsing required.
- Explore individual artifacts. The Evidence Explorer loads any single artifact by filename and renders the full JSON payload, including gate results with measured values and thresholds.
- Review analytics. The Analytics tab scans the evidence directory and renders verdict distribution, runs-by-kind breakdown, daily activity volume, and health trend over time.
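The control-pipeline step above reduces to a simple ordered loop over the registry. A minimal sketch, not the repository's orchestration module — the registry shape and check signatures are assumptions for illustration:

```python
def run_pipeline(registry, checks):
    """Run every control check, in order, for every registered dataset.

    `registry` maps dataset name -> contract dict; `checks` is an ordered
    list of (kind, check_fn) pairs (e.g. intake, drift, benchmark).
    Each check_fn returns a verdict string, and every run is recorded —
    no check result is ever discarded.
    """
    results = []
    for dataset, contract in registry.items():
        for kind, check_fn in checks:
            results.append({
                "dataset": dataset,
                "kind": kind,
                "verdict": check_fn(dataset, contract),
            })
    return results
```

In the real platform each recorded result would also be written as an append-only evidence artifact; the sketch keeps only the sequencing.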
Databricks Fit
- Databricks Asset Bundles for source-controlled deployment of pipeline, job, and app resource definitions — validated and deployed from the repo with a single make target.
- Databricks Apps (Gradio) for a governed, read-only operator dashboard with no custom web infrastructure required.
- Unity Catalog for governed table publication and the evidence volume backing the operator dashboard.
- Databricks Lakeflow / Jobs for scheduled control pipeline execution across registered datasets.
- DriftSentinel is dataset-agnostic — the registry and policy binding handle per-dataset configuration regardless of schema shape or source system.
Quickstart
```shell
git clone https://github.com/Org-EthereaLogic/DriftSentinel.git
cd DriftSentinel
make sync   # installs runtime + dev dependencies via uv
make test   # runs the 297-test suite
```
Databricks Bundle and App Deployment
```shell
# First prove the catalog exists for your profile.
make bundle-catalog-check CATALOG=my_catalog PROFILE=<profile>

# Validate bundle wiring against that catalog.
make bundle-validate CATALOG=my_catalog PROFILE=<profile>

# Deploy bundle resources and start the Databricks App.
make app-deploy CATALOG=my_catalog PROFILE=<profile>
```
Direct CLI path:
```shell
databricks catalogs get my_catalog -p <profile>
databricks bundle validate -p <profile> --target dev --var="catalog=my_catalog"
databricks bundle deploy -p <profile> --target dev --var="catalog=my_catalog"
databricks apps start driftsentinel -p <profile>
databricks apps deploy -p <profile> --target dev --var="catalog=my_catalog"
databricks apps get driftsentinel -p <profile> -o json
```
`databricks bundle validate` proves bundle wiring, auth, and resource resolution. `databricks apps get` is the proof surface: a healthy deployment reports SUCCEEDED and RUNNING.
Notebook Import
Import the notebooks/ directory into your Databricks workspace to run the control pipeline from the deployed bundle or standalone from GitHub. Notebooks ship with bundled example templates. 01_register_dataset.py and 05_run_control_benchmark.py also accept optional workspace YAML paths for customized contracts from templates/.
AI-Assisted Setup
If you use an AI coding agent (Claude Code, Cursor, GitHub Copilot Workspace, or similar), paste the prompt below directly into your agent session. The agent will clone the repository, install dependencies, run the full test suite, and walk you through the Databricks deployment — no manual steps required.
Before you start, have these ready:
- Python 3.11+
- uv package manager — install with pip install uv
- Databricks CLI configured with a valid profile — run databricks auth login if needed
- A Databricks workspace with Unity Catalog enabled
- The name of your Unity Catalog catalog
Copy and paste this prompt into your AI coding agent:
```
I want to set up DriftSentinel — a Databricks-deployable data trust platform that
unifies intake certification, drift detection, and control benchmarking in a single
application with a four-tab operator dashboard.

Repository: https://github.com/Org-EthereaLogic/DriftSentinel

Please complete these steps in order. Stop at any failure and report it before continuing.

1. Clone the repository:
   git clone https://github.com/Org-EthereaLogic/DriftSentinel.git
   cd DriftSentinel

2. Install dependencies (requires uv):
   make sync
   If uv is not installed: pip install uv

3. Run the full test suite. All 297 tests must pass before proceeding:
   make test

4. Read these files to understand the configuration model before deployment:
   - README.md
   - templates/dataset_contract.yaml
   - templates/drift_policy.yaml
   - templates/benchmark_policy.yaml

5. Ask me for my Databricks setup details:
   - My Unity Catalog catalog name
   - My Databricks CLI profile name
   Then confirm the catalog is reachable:
   make bundle-catalog-check CATALOG=<my_catalog> PROFILE=<my_profile>

6. Validate the Asset Bundle against my workspace:
   make bundle-validate CATALOG=<my_catalog> PROFILE=<my_profile>

7. If validation passes, deploy the bundle and start the Databricks App:
   make app-deploy CATALOG=<my_catalog> PROFILE=<my_profile>

8. Verify the deployment succeeded:
   databricks apps get driftsentinel -p <my_profile> -o json
   Confirm the status shows SUCCEEDED and RUNNING, then report the app URL.

After every step, report what happened. Do not skip a step or proceed past any
error without explaining it and asking me how to continue.
```
Scope Boundary
DriftSentinel validates the unified control platform using registered datasets in a local and Databricks environment. It does not constitute production-scale proof across arbitrary schema shapes or multi-workspace deployments. The registry, evidence model, and orchestration pattern are dataset-agnostic; the Databricks deployment path requires a workspace with Unity Catalog enabled.
Engineering Signals
- GitHub Actions workflow: ci.yml
Part of a Series
This is Chapter 4 of the Enterprise Data Trust portfolio — a four-chapter body of work addressing the full lifecycle of data reliability in enterprise Databricks platforms.
| Chapter | Focus | Repository |
|---|---|---|
| 1. Trusted Source Intake | Validate and certify data before downstream consumption | trusted-source-intake |
| 2. Silent Failure Prevention | Detect distribution drift before it reaches executive dashboards | silent-failure-prevention |
| 3. Measurable Control Effectiveness | Prove that your data controls work against known failure scenarios | measurable-control-effectiveness |
| 4. DriftSentinel | Unified application combining all three control patterns | ← You are here |
MIT License. See LICENSE for details.
File details
Details for the file etherealogic_driftsentinel-0.1.0.tar.gz.

File metadata
- Download URL: etherealogic_driftsentinel-0.1.0.tar.gz
- Size: 2.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 712bf493cd1deda4577ea068308c7c974991b12e680709001ed688a116fdee1a |
| MD5 | eff3e8f279c12e2d7eba283ae5136d17 |
| BLAKE2b-256 | 80e9530ec001cd1f339c21673f4f11c0e1c63c634a6f4c9ee7fdb9b897555af8 |
Provenance
The following attestation bundle was made for etherealogic_driftsentinel-0.1.0.tar.gz:

Publisher: publish.yml on Org-EthereaLogic/DriftSentinel
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: etherealogic_driftsentinel-0.1.0.tar.gz
- Subject digest: 712bf493cd1deda4577ea068308c7c974991b12e680709001ed688a116fdee1a
- Sigstore transparency entry: 1228568560
- Permalink: Org-EthereaLogic/DriftSentinel@6bfec772b21a226eb1b33988dcd46398d815a6a6
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Org-EthereaLogic
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6bfec772b21a226eb1b33988dcd46398d815a6a6
- Trigger Event: workflow_dispatch
File details
Details for the file etherealogic_driftsentinel-0.1.0-py3-none-any.whl.

File metadata
- Download URL: etherealogic_driftsentinel-0.1.0-py3-none-any.whl
- Size: 46.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | de242b9bf7305687060906460a9ae826a6013118b1f1fa12cc79c870e156a6c3 |
| MD5 | 8b6a1001d2650c4eea6e4b8c1d71adcb |
| BLAKE2b-256 | f571c77667929019174d8f44a11d2a11f803cdd39097ee204fca91235d676daa |
Provenance
The following attestation bundle was made for etherealogic_driftsentinel-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Org-EthereaLogic/DriftSentinel
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: etherealogic_driftsentinel-0.1.0-py3-none-any.whl
- Subject digest: de242b9bf7305687060906460a9ae826a6013118b1f1fa12cc79c870e156a6c3
- Sigstore transparency entry: 1228568577
- Permalink: Org-EthereaLogic/DriftSentinel@6bfec772b21a226eb1b33988dcd46398d815a6a6
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Org-EthereaLogic
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6bfec772b21a226eb1b33988dcd46398d815a6a6
- Trigger Event: workflow_dispatch