DataFrame schema drift detection and alerting
Project description
dfdrift
A DataFrame schema drift detection and alerting library for pandas DataFrames.
Features
- Schema Tracking: Automatically save DataFrame schemas with location information (file:line)
- Change Detection: Detect schema changes between executions and alert when differences are found
- Configurable Storage: Support for local file storage with extensible interface for future cloud storage (GCS, etc.)
- Configurable Alerting: Built-in stderr alerter with extensible interface for future integrations (Slack, etc.)
Installation
# Install in development mode
uv pip install -e .
Usage
dfdrift offers two ways to validate DataFrames:
1. Import Replacement
Simply replace your pandas import with dfdrift.pandas:
import dfdrift.pandas as pd
# Configure validation (optional - uses default settings if omitted)
pd.configure_validation()
# All DataFrame operations are automatically validated
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35],
'city': ['Tokyo', 'Osaka', 'Kyoto']
})
# Schema automatically saved with location info
2. Explicit Validation
import pandas as pd
import dfdrift
# Create a validator instance
validator = dfdrift.DfValidator()
# Validate a DataFrame manually
df = pd.DataFrame({
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35],
'city': ['Tokyo', 'Osaka', 'Kyoto']
})
validator.validate(df)
Configuration
Custom Storage Path
import dfdrift
# Method 1: Import replacement with custom storage
import dfdrift.pandas as pd
pd.configure_validation(
storage=dfdrift.LocalFileStorage("./my_schemas")
)
# Method 2: Explicit validation with custom storage
validator = dfdrift.DfValidator(
storage=dfdrift.LocalFileStorage("./my_schemas")
)
Custom Alerter
import dfdrift
# Built-in stderr alerter (default)
import dfdrift.pandas as pd
pd.configure_validation(alerter=dfdrift.StderrAlerter())
# Or implement your own alerter
class SlackAlerter(dfdrift.Alerter):
def alert(self, message, location_key, old_schema, new_schema):
# Send to Slack
pass
pd.configure_validation(alerter=SlackAlerter())
Schema Change Detection
When a DataFrame schema changes between executions, dfdrift will automatically detect and alert:
- Added columns: New columns that weren't in the previous schema
- Removed columns: Columns that existed before but are now missing
- Type changes: When a column's dtype changes (e.g., int64 → object)
- Shape changes: When the DataFrame dimensions change
Example alert output:
WARNING: DataFrame schema changed at /path/to/file.py:25. Changes: Added columns: ['new_col']; Column 'age' dtype changed: int64 → object
Location: /path/to/file.py:25
Examples
See the samples/ directory for usage examples:
samples/sample.py: Explicit validationsamples/sample_custom_path.py: Custom storage pathsamples/sample_changing_schema.py: Schema change detection demosamples/sample_pandas_import.py: Import replacement
Architecture
Storage Interface
class SchemaStorage(ABC):
def save_schema(self, location_key: str, schema: Dict[str, Any]) -> None:
pass
def load_schemas(self) -> Dict[str, Any]:
pass
Alerter Interface
class Alerter(ABC):
def alert(self, message: str, location_key: str, old_schema: Dict[str, Any], new_schema: Dict[str, Any]) -> None:
pass
Schema Format
Schemas are stored as JSON with the following structure:
{
"/path/to/file.py:line_number": {
"columns": {
"column_name": {
"dtype": "int64",
"null_count": 0,
"total_count": 100
}
},
"shape": [100, 3]
}
}
Development
Run the samples to test functionality:
# Import replacement
uv run python samples/sample_pandas_import.py # Run twice to see alerts
# Explicit validation
uv run python samples/sample.py
# Test schema change detection
uv run python samples/sample_changing_schema.py # Run twice to see alerts
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file dfdrift-0.1.0.tar.gz.
File metadata
- Download URL: dfdrift-0.1.0.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
941c2d1652370af23f34f5c76195afcbd4cc58df784bf1f64e3a865c0bc6a3ae
|
|
| MD5 |
ae76ff483795beaac425d376285ab929
|
|
| BLAKE2b-256 |
b0fe89b9c71f432f39ca6723146463242e5e79441277cd38aa31814978cb899e
|
File details
Details for the file dfdrift-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dfdrift-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
68dd9f45ad30894aaad6e8ab33781d4623841c7df9b6e7f831d15828ea0ba441
|
|
| MD5 |
05c340abdb12e1f68bfd5bfd512b5a99
|
|
| BLAKE2b-256 |
81b01728da187d98892708e9f5a73241ad02d538385df02b0fa26bc59c68a172
|