A Python utility for DataFrame snapshots, schema tracking, and drift detection

Drift Detective 🕵️‍♂️📊

“Did the structure of my data change, and should I care?”

Drift Detective is a Python library for tracking schema evolution and detecting structural drift in tabular datasets using versioned JSON snapshots.

It is designed for data workflows where table schemas evolve over time.

The library focuses on schema-level changes, not row-level changes.

Key Features

  • JSON snapshot-based schema tracking
  • Added and removed column detection
  • Historical timeline of schema evolution
  • Structured (dictionary) and human-readable reports
  • JSON file-based design (no database backend required)
  • Comprehensive schema evolution report

Use cases

  • Tracking table evolution over time
  • Auditable history of schema changes
  • Lightweight drift detection

API Reference

Drift Detective is built around four core components, each responsible for a specific part of schema tracking and reporting:

  • DfSnapshot: Captures the schema state of a pandas DataFrame at a specific point in time and stores it as a versioned snapshot.
  • SnapshotHistory: Builds a schema evolution timeline listing each version and its schema changes.
  • SnapshotDiff: Compares two snapshot versions, listing all columns added and removed across the intermediate versions.
  • SchemaReport: Combines the snapshot metadata, latest snapshot, timeline, and diff into a single schema evolution report.

Each snapshot is stored as a JSON file containing metadata and schema information:

{
    "table_name": "netflix_titles",
    "filepath": "netflix_titles.csv",
    "timestamp": "20251230_161527",
    "version": 1,
    "column_count": 12,
    "row_count": 8807,
    "schema": {
        "show_id": "object",
        "type": "object",
        "title": "object",
        "director": "object",
        "cast": "object",
        "country": "object",
        "date_added": "object",
        "release_year": "int64",
        "rating": "object",
        "duration": "object",
        "listed_in": "object",
        "description": "object"
    },
    "columns_added": [],
    "columns_removed": []
}
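As a rough illustration of how a snapshot with this shape could be assembled, here is a minimal standard-library sketch. The function `build_snapshot` and its parameters are hypothetical and are not Drift Detective's API. The schema is passed as a plain column-to-dtype mapping; with pandas, such a mapping could be obtained as `{col: str(dtype) for col, dtype in df.dtypes.items()}`.

```python
import json
from datetime import datetime


def build_snapshot(table_name, filepath, schema, row_count, previous=None):
    """Build a snapshot dict shaped like the JSON example above.

    Hypothetical helper, not part of Drift Detective's API.
    `schema` maps column name -> dtype string; `previous` is the prior
    snapshot dict, or None for the first version.
    """
    old_columns = set(previous["schema"]) if previous else set()
    new_columns = set(schema)
    return {
        "table_name": table_name,
        "filepath": filepath,
        "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
        "version": previous["version"] + 1 if previous else 1,
        "column_count": len(schema),
        "row_count": row_count,
        "schema": dict(schema),
        # The first snapshot has nothing to compare against.
        "columns_added": sorted(new_columns - old_columns) if previous else [],
        "columns_removed": sorted(old_columns - new_columns) if previous else [],
    }


# Small made-up schemas, just to exercise the helper.
v1 = build_snapshot(
    "netflix_titles", "netflix_titles.csv",
    {"show_id": "object", "title": "object", "release_year": "int64"},
    row_count=8807,
)
v2 = build_snapshot(
    "netflix_titles", "netflix_titles.csv",
    {"show_id": "object", "release_year": "int64"},
    row_count=8807, previous=v1,
)
print(json.dumps(v2, indent=4))
```

Storing the added and removed columns inside each snapshot means a timeline can later be rendered from the JSON files alone, without recomputing differences.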

You can print a human-readable timeline of all schema versions.

Snapshot Timeline for table: netflix_titles
────────────────────────────────────────────────────────────

v1  ●  20251230_162126
    │ columns: 12
    │ rows: 8807
    │ initial snapshot

v2  ●  20251230_163649
    │ columns: 11
    │ rows: 8807
    │ - removed columns: title

v3  ●  20251230_163729
    │ columns: 10
    │ rows: 8807
    │ - removed columns: listed_in
────────────────────────────────────────────────────────────
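A timeline like the one above can be produced from the stored snapshot dictionaries alone. The following is a minimal sketch, not the library's implementation: `format_timeline` is a hypothetical name, and the "+ added columns" line format is an assumption, since the sample output only shows removals.

```python
def format_timeline(snapshots):
    """Render snapshot dicts as a plain-text timeline (hypothetical helper)."""
    if not snapshots:
        return ""
    ordered = sorted(snapshots, key=lambda snap: snap["version"])
    rule = "─" * 60
    lines = [f"Snapshot Timeline for table: {ordered[0]['table_name']}", rule]
    for index, snap in enumerate(ordered):
        lines.append("")
        lines.append(f"v{snap['version']}  ●  {snap['timestamp']}")
        lines.append(f"    │ columns: {snap['column_count']}")
        lines.append(f"    │ rows: {snap['row_count']}")
        if index == 0:
            lines.append("    │ initial snapshot")
            continue
        # Assumed format for additions; the sample output shows removals only.
        if snap["columns_added"]:
            lines.append("    │ + added columns: " + ", ".join(snap["columns_added"]))
        if snap["columns_removed"]:
            lines.append("    │ - removed columns: " + ", ".join(snap["columns_removed"]))
    lines.append(rule)
    return "\n".join(lines)


history = [
    {"table_name": "netflix_titles", "version": 1, "timestamp": "20251230_162126",
     "column_count": 12, "row_count": 8807,
     "columns_added": [], "columns_removed": []},
    {"table_name": "netflix_titles", "version": 2, "timestamp": "20251230_163649",
     "column_count": 11, "row_count": 8807,
     "columns_added": [], "columns_removed": ["title"]},
    {"table_name": "netflix_titles", "version": 3, "timestamp": "20251230_163729",
     "column_count": 10, "row_count": 8807,
     "columns_added": [], "columns_removed": ["listed_in"]},
]
timeline_text = format_timeline(history)
print(timeline_text)
```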

Drift Detective allows you to compare any two schema versions:

Snapshot Diff for table: netflix_titles
───────────────────────────────────────────────────────
Old snapshot v1  ●  20251230_162126
New snapshot v3  ●  20251230_163729
Added columns (new → old): No added column(s)
Removed columns (new → old): listed_in, title
───────────────────────────────────────────────────────
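One way to read "across intermediate versions" is to accumulate the per-snapshot change lists for every version after the old one, up to and including the new one. The sketch below follows that interpretation, which is an assumption; `diff_versions` is a hypothetical helper and not the SnapshotDiff API.

```python
def diff_versions(snapshots, old_version, new_version):
    """Collect columns added and removed between two versions.

    Hypothetical helper. Accumulates the change lists of every snapshot
    with old_version < version <= new_version.
    """
    if old_version >= new_version:
        raise ValueError("old_version must be lower than new_version")
    added, removed = set(), set()
    for snap in snapshots:
        if old_version < snap["version"] <= new_version:
            added.update(snap["columns_added"])
            removed.update(snap["columns_removed"])
    return {
        "old_version": old_version,
        "new_version": new_version,
        "added": sorted(added),
        "removed": sorted(removed),
    }


history = [
    {"version": 1, "columns_added": [], "columns_removed": []},
    {"version": 2, "columns_added": [], "columns_removed": ["title"]},
    {"version": 3, "columns_added": [], "columns_removed": ["listed_in"]},
]
result = diff_versions(history, old_version=1, new_version=3)
print(result["added"] or "No added column(s)")
print(", ".join(result["removed"]))
```

Under this interpretation, a column that was added in one intermediate version and removed in a later one appears in both lists. A net comparison of the two schemas would instead omit it from both.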

For a complete view of schema evolution you can generate a structured report.

Schema Change Report for table: netflix_titles
────────────────────────────────────────────────────────────
Snapshots directory: docs/snapshots/netflix_titles
Latest snapshot version: 3
Available versions: 1, 2, 3
Total snapshots: 3
First snapshot Created: 20251230_162126
Latest snapshot Created: 20251230_163729
────────────────────────────────────────────────────────────

Latest Snapshot for table: netflix_titles
────────────────────────────────────────────────────────────
v3  ●  20251230_163729
    |   columns: 10
    |   rows: 8807
    |   current columns: show_id, type, director, cast, country, date_added, release_year, rating, duration, description
    | + all added columns:
    │ - all removed columns: title, listed_in
────────────────────────────────────────────────────────────

Snapshot Timeline for table: netflix_titles
────────────────────────────────────────────────────────────

v1  ●  20251230_162126
    │ columns: 12
    │ rows: 8807
    │ initial snapshot

v2  ●  20251230_163649
    │ columns: 11
    │ rows: 8807
    │ - removed columns: title

v3  ●  20251230_163729
    │ columns: 10
    │ rows: 8807
    │ - removed columns: listed_in
────────────────────────────────────────────────────────────

Snapshot Diff for table: netflix_titles
───────────────────────────────────────────────────────
Old snapshot v1  ●  20251230_162126
New snapshot v3  ●  20251230_163729
Added columns (new → old): No added column(s)
Removed columns (new → old): listed_in, title
───────────────────────────────────────────────────────
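The header and "Latest Snapshot" parts of such a report can also be expressed as a structured dictionary. The sketch below shows one possible shape; `summarize_history` and its key names are hypothetical and may differ from what SchemaReport actually returns.

```python
def summarize_history(snapshots):
    """Summarize a snapshot history as a dictionary (hypothetical helper)."""
    if not snapshots:
        raise ValueError("at least one snapshot is required")
    ordered = sorted(snapshots, key=lambda snap: snap["version"])
    first, latest = ordered[0], ordered[-1]
    return {
        "table_name": latest["table_name"],
        "latest_version": latest["version"],
        "available_versions": [snap["version"] for snap in ordered],
        "total_snapshots": len(ordered),
        "first_created": first["timestamp"],
        "latest_created": latest["timestamp"],
        "current_columns": list(latest["schema"]),
        # Changes are listed in version order, oldest first.
        "all_added_columns": [c for snap in ordered for c in snap["columns_added"]],
        "all_removed_columns": [c for snap in ordered for c in snap["columns_removed"]],
    }


# Shortened made-up schemas; only the keys used by the helper are included.
history = [
    {"table_name": "netflix_titles", "version": 1, "timestamp": "20251230_162126",
     "schema": {"show_id": "object", "title": "object", "listed_in": "object"},
     "columns_added": [], "columns_removed": []},
    {"table_name": "netflix_titles", "version": 2, "timestamp": "20251230_163649",
     "schema": {"show_id": "object", "listed_in": "object"},
     "columns_added": [], "columns_removed": ["title"]},
    {"table_name": "netflix_titles", "version": 3, "timestamp": "20251230_163729",
     "schema": {"show_id": "object"},
     "columns_added": [], "columns_removed": ["listed_in"]},
]
summary = summarize_history(history)
print(summary)
```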

Project Status and Roadmap

This project is in an early stage.

The core functionality for schema snapshotting, history tracking, comparison, and reporting is complete and usable.

Planned improvements:

  • Unit tests for core components
  • SQL snapshot support (PostgreSQL)
  • Expanded documentation and examples

🧰 Tech Stack

  • Python

Download files

Source Distribution

drift_detective-0.1.0.tar.gz (13.2 kB)

Uploaded Source

Built Distribution

drift_detective-0.1.0-py3-none-any.whl (13.5 kB)

Uploaded Python 3

File details

Details for the file drift_detective-0.1.0.tar.gz.

File metadata

  • Download URL: drift_detective-0.1.0.tar.gz
  • Size: 13.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for drift_detective-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3526e9fac5d2f74d9517f07f7862167064699ae0dd362f2fb2b521c9f3e5ce4d
MD5 3d1c44242f67da8d5d4270356e652cac
BLAKE2b-256 e584ac0f5abf052fc4bf0ecc83f4f4bcf9afdb2fb2712e5106f4f48f724e9ef4

File details

Details for the file drift_detective-0.1.0-py3-none-any.whl.

File hashes

Hashes for drift_detective-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f715de6a81117f8a357c5ff923b50c8fb79461742f3683db7224601c3477ba66
MD5 c04cb7f9a98173d29e2b2dacab8b46d6
BLAKE2b-256 c0450dc08b1ee3f33a57b2f5e1e8b50c3faab2f92efe836c89c3b96bc1b7fe14
