A Python utility for DataFrame snapshots, schema tracking, and drift detection
Drift Detective
"Did the structure of my data change, and should I care?"
Drift Detective is a Python library for tracking schema evolution and detecting structural drift in tabular datasets using versioned JSON snapshots.
It is designed for data workflows where table schemas evolve over time.
The library focuses on schema-level changes, not row-level changes.
Key Features
- JSON snapshot-based schema tracking
- Added and removed column detection
- Historical timeline of schema evolution
- Structured (dictionary) and human-readable reports
- JSON file-based design (no database backend required)
- Comprehensive schema evolution report
Use cases
- Tracking table evolution over time
- Auditable history of schema changes
- Lightweight schema drift detection
API Reference
Drift Detective is built around four core components, each responsible for a specific part of schema tracking and reporting:
- DfSnapshot: Captures the schema state of a pandas DataFrame at a specific point in time and stores it as a versioned snapshot.
- SnapshotHistory: Builds a timeline of schema evolution, listing each version and its schema changes.
- SnapshotDiff: Compares schema changes between two snapshot versions, listing all added and removed columns across intermediate versions.
- SchemaReport: Combines the other components into a single report covering the latest snapshot, the timeline, and the diff.
Each snapshot is stored as a JSON file containing metadata and schema information:
{
  "table_name": "netflix_titles",
  "filepath": "netflix_titles.csv",
  "timestamp": "20251230_161527",
  "version": 1,
  "column_count": 12,
  "row_count": 8807,
  "schema": {
    "show_id": "object",
    "type": "object",
    "title": "object",
    "director": "object",
    "cast": "object",
    "country": "object",
    "date_added": "object",
    "release_year": "int64",
    "rating": "object",
    "duration": "object",
    "listed_in": "object",
    "description": "object"
  },
  "columns_added": [],
  "columns_removed": []
}
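To make the snapshot format concrete, here is a minimal sketch of how a dictionary with these fields could be assembled and serialized. It is not Drift Detective's implementation: `build_snapshot` and its parameters are hypothetical names chosen for illustration, and the schema is passed in as a plain mapping so the example runs with the standard library alone.

```python
import json
from datetime import datetime


def build_snapshot(table_name, filepath, schema, row_count, version, previous_schema=None):
    """Return a snapshot dictionary shaped like the JSON example above.

    `schema` maps column name -> dtype string. With pandas it could be
    derived as {col: str(dtype) for col, dtype in df.dtypes.items()}.
    This helper is a hypothetical illustration, not the library's API.
    """
    if previous_schema is None:
        # First snapshot: there is nothing to compare against.
        added, removed = [], []
    else:
        added = sorted(set(schema) - set(previous_schema))
        removed = sorted(set(previous_schema) - set(schema))
    return {
        "table_name": table_name,
        "filepath": filepath,
        "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
        "version": version,
        "column_count": len(schema),
        "row_count": row_count,
        "schema": dict(schema),
        "columns_added": added,
        "columns_removed": removed,
    }


snapshot = build_snapshot(
    table_name="netflix_titles",
    filepath="netflix_titles.csv",
    schema={"show_id": "object", "release_year": "int64"},
    row_count=8807,
    version=1,
)
snapshot_json = json.dumps(snapshot, indent=2)
```

Storing dtypes as strings keeps the snapshot JSON-serializable and easy to compare between versions.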
You can print a human-readable timeline of all schema versions:
Snapshot Timeline for table: netflix_titles
────────────────────────────────────────────────────────────
v1 | 20251230_162126
   | columns: 12
   | rows: 8807
   | initial snapshot
v2 | 20251230_163649
   | columns: 11
   | rows: 8807
   | - removed columns: title
v3 | 20251230_163729
   | columns: 10
   | rows: 8807
   | - removed columns: listed_in
────────────────────────────────────────────────────────────
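A timeline like this can be produced by walking the stored snapshots in version order. The sketch below assumes each snapshot is a dictionary with the fields from the JSON example. `format_timeline` is a hypothetical helper, not the library's API, and the layout, including the `+ added columns` label, only approximates the output shown.

```python
def format_timeline(table_name, snapshots):
    """Render snapshot dictionaries (oldest first) as a plain-text timeline.

    Hypothetical helper for illustration; the exact layout is an assumption.
    """
    rule = "-" * 60
    lines = [f"Snapshot Timeline for table: {table_name}", rule]
    for index, snap in enumerate(snapshots):
        lines.append(f"v{snap['version']} | {snap['timestamp']}")
        lines.append(f"   | columns: {snap['column_count']}")
        lines.append(f"   | rows: {snap['row_count']}")
        if index == 0:
            lines.append("   | initial snapshot")
        if snap["columns_added"]:
            lines.append("   | + added columns: " + ", ".join(snap["columns_added"]))
        if snap["columns_removed"]:
            lines.append("   | - removed columns: " + ", ".join(snap["columns_removed"]))
    lines.append(rule)
    return "\n".join(lines)


# Sample data mirroring the three versions shown above.
history = [
    {"version": 1, "timestamp": "20251230_162126", "column_count": 12,
     "row_count": 8807, "columns_added": [], "columns_removed": []},
    {"version": 2, "timestamp": "20251230_163649", "column_count": 11,
     "row_count": 8807, "columns_added": [], "columns_removed": ["title"]},
    {"version": 3, "timestamp": "20251230_163729", "column_count": 10,
     "row_count": 8807, "columns_added": [], "columns_removed": ["listed_in"]},
]
timeline = format_timeline("netflix_titles", history)
print(timeline)
```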
Drift Detective allows you to compare any two schema versions:
Snapshot Diff for table: netflix_titles
───────────────────────────────────────────────────────
Old snapshot v1 | 20251230_162126
New snapshot v3 | 20251230_163729
Added columns (new vs. old): No added column(s)
Removed columns (new vs. old): listed_in, title
───────────────────────────────────────────────────────
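Conceptually, a diff like this comes down to two set differences over the column names. The following sketch shows that idea with the column lists from the example. `diff_columns` is a hypothetical name and not part of the library's documented API.

```python
def diff_columns(old_columns, new_columns):
    """Return columns added to and removed from `old_columns`.

    Hypothetical helper: added = new - old, removed = old - new.
    """
    old_set, new_set = set(old_columns), set(new_columns)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


V1_COLUMNS = [
    "show_id", "type", "title", "director", "cast", "country",
    "date_added", "release_year", "rating", "duration", "listed_in",
    "description",
]
# v3 is v1 without the two columns dropped in v2 and v3.
V3_COLUMNS = [col for col in V1_COLUMNS if col not in {"title", "listed_in"}]

result = diff_columns(V1_COLUMNS, V3_COLUMNS)
print(result)
```

Because the comparison uses only the two endpoint schemas, a column that was added and then removed in intermediate versions would not appear in either list under this approach.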
For a complete view of schema evolution, you can generate a structured report:
Schema Change Report for table: netflix_titles
────────────────────────────────────────────────────────────
Snapshots directory: docs/snapshots/netflix_titles
Latest snapshot version: 3
Available versions: 1, 2, 3
Total snapshots: 3
First snapshot Created: 20251230_162126
Latest snapshot Created: 20251230_163729
────────────────────────────────────────────────────────────
Latest Snapshot for table: netflix_titles
────────────────────────────────────────────────────────────
v3 | 20251230_163729
   | columns: 10
   | rows: 8807
   | current columns: show_id, type, director, cast, country, date_added, release_year, rating, duration, description
   | + all added columns:
   | - all removed columns: title, listed_in
────────────────────────────────────────────────────────────
Snapshot Timeline for table: netflix_titles
────────────────────────────────────────────────────────────
v1 | 20251230_162126
   | columns: 12
   | rows: 8807
   | initial snapshot
v2 | 20251230_163649
   | columns: 11
   | rows: 8807
   | - removed columns: title
v3 | 20251230_163729
   | columns: 10
   | rows: 8807
   | - removed columns: listed_in
────────────────────────────────────────────────────────────
Snapshot Diff for table: netflix_titles
───────────────────────────────────────────────────────
Old snapshot v1 | 20251230_162126
New snapshot v3 | 20251230_163729
Added columns (new vs. old): No added column(s)
Removed columns (new vs. old): listed_in, title
───────────────────────────────────────────────────────
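The header section of such a report can be derived from the snapshot files themselves. The sketch below reads every JSON file in a directory and summarizes versions and timestamps. `summarize_snapshots` and the `netflix_titles_v{n}.json` file naming are assumptions made for illustration, since the source does not specify how snapshot files are named.

```python
import json
import tempfile
from pathlib import Path


def summarize_snapshots(directory):
    """Summarize the JSON snapshots found in `directory`.

    Hypothetical helper; assumes each file holds "version" and "timestamp".
    """
    snapshots = [json.loads(path.read_text()) for path in Path(directory).glob("*.json")]
    if not snapshots:
        raise ValueError(f"no snapshots found in {directory}")
    snapshots.sort(key=lambda snap: snap["version"])
    return {
        "snapshots_directory": str(directory),
        "latest_version": snapshots[-1]["version"],
        "available_versions": [snap["version"] for snap in snapshots],
        "total_snapshots": len(snapshots),
        "first_created": snapshots[0]["timestamp"],
        "latest_created": snapshots[-1]["timestamp"],
    }


# Write three small sample snapshots to a temporary directory and summarize them.
with tempfile.TemporaryDirectory() as tmp:
    samples = [(1, "20251230_162126"), (2, "20251230_163649"), (3, "20251230_163729")]
    for version, timestamp in samples:
        path = Path(tmp) / f"netflix_titles_v{version}.json"  # assumed naming scheme
        path.write_text(json.dumps({"version": version, "timestamp": timestamp}))
    summary = summarize_snapshots(tmp)

print(summary["available_versions"])
```

Returning a dictionary first and formatting it separately makes it straightforward to offer both the structured and the human-readable report from the same data.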
Project Status and Roadmap
This project is in an early stage.
The core functionality for schema snapshotting, history tracking, comparison, and reporting is complete and usable.
Planned improvements:
- Add unit tests for core components
- SQL snapshot support (PostgreSQL)
- Expanded documentation and examples
Tech Stack
- Python
References
- Python: https://www.python.org/doc/
- pandas: https://pandas.pydata.org/