Catch training ↔ serving feature skew before you ship to production
SkewSentry
Prevent ML model failures with automated feature parity validation
🚀 Why SkewSentry?
SkewSentry transforms fragile ML deployments into reliable production systems through automated feature parity validation.
💰 Prevent Costly ML Failures
- 70% of ML failures stem from training/serving skew
- Months of silent degradation before detection
- Lost revenue and customer trust from broken predictions
⚡ Production-Ready Validation
- Pre-deployment detection - Catch issues in CI before they ship
- Configurable tolerances - Handle expected differences intelligently
- Multi-source support - Python functions, HTTP APIs, any feature pipeline
- Rich reporting - HTML reports with detailed mismatch analysis
🔧 Developer-First Design
- Zero configuration - Works out of the box with intelligent defaults
- CI integration - Exit codes for automated validation gates
- Multiple formats - Text, JSON, and HTML reports for different use cases
📦 Installation
Production
pip install skewsentry
Development
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"
⚡ Quickstart
Basic Feature Parity Check
# Initialize spec from your data
skewsentry init features.yml --data validation.parquet --keys user_id timestamp
# Run parity check
skewsentry check \
--spec features.yml \
--offline training.pipeline:extract_features \
--online serving.api:get_features \
--data validation.parquet \
--html report.html
# ✅ Exit 0: Features match within tolerance
# ❌ Exit 1: Parity violations detected (fails CI)
# 🚨 Exit 2: Configuration error
Realistic Example: E-commerce Features
# features.yml
version: 1
keys: ["user_id", "timestamp"]
features:
  - name: total_spend_7d
    dtype: float
    tolerance:
      abs: 0.01   # $0.01 absolute tolerance
      rel: 0.001  # 0.1% relative tolerance
  - name: order_count_30d
    dtype: int
    tolerance:
      abs: 1  # Allow 1 order difference
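# Offline pipeline (training)
# Assumes `timestamp` is a datetime column sorted ascending within each user.
def extract_features(df):
    by_user = df.groupby('user_id')
    spend = by_user.rolling('7D', on='timestamp')['amount'].sum()
    orders = by_user.rolling('30D', on='timestamp')['amount'].count()
    # Drop the user_id index level so assign() aligns on df's own index
    return df.assign(
        total_spend_7d=spend.reset_index(level=0, drop=True),
        order_count_30d=orders.reset_index(level=0, drop=True),
    )
# Online pipeline (serving) - subtle differences
def get_features(df):
    by_user = df.groupby('user_id')
    # Different windowing! closed='left' excludes the current row (pandas defaults to 'right')
    spend = by_user.rolling('7D', on='timestamp', closed='left')['amount'].sum()
    orders = by_user.rolling('30D', on='timestamp')['amount'].count()
    return df.assign(
        total_spend_7d=spend.reset_index(level=0, drop=True),
        order_count_30d=orders.reset_index(level=0, drop=True),
    )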
SkewSentry catches the windowing difference:
❌ Feature parity violations detected:
- total_spend_7d: mismatch_rate=0.1200 rows=5000 mean_abs_diff=0.0845
🏗️ Feature Adapters
SkewSentry works with any feature pipeline through adapters:
Python Functions
# Direct Python function integration
from skewsentry.adapters import PythonFunctionAdapter
adapter = PythonFunctionAdapter("mymodule:extract_features")
features = adapter.get_features(input_data)
HTTP APIs
# REST API integration with automatic batching
from skewsentry.adapters import HTTPAdapter
adapter = HTTPAdapter("http://api.example.com/features", timeout=30.0)
features = adapter.get_features(input_data)
Usage
Command Line Interface
Initialize Feature Spec
skewsentry init features.yml \
--data sample_data.parquet \
--keys user_id timestamp
Run Parity Check
skewsentry check \
--spec features.yml \
--offline module.offline:build_features \
--online module.online:get_features \
--data validation.parquet \
--sample 10000 \
--seed 42 \
--html artifacts/report.html \
--json artifacts/results.json
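The --sample and --seed flags point to a reproducible subsample of the validation data. How SkewSentry draws that sample internally is not shown in this README; the sketch below only illustrates the general idea with the standard library, and `sample_rows` is a hypothetical helper, not part of the package.

```python
import random

def sample_rows(rows, sample, seed):
    """Return a reproducible random subset of `rows` (hypothetical helper)."""
    if sample >= len(rows):
        # Nothing to subsample: keep every row
        return list(rows)
    # A local generator keeps the global random state untouched
    rng = random.Random(seed)
    return rng.sample(rows, sample)

rows = [{"user_id": i} for i in range(100)]
first = sample_rows(rows, sample=10, seed=42)
second = sample_rows(rows, sample=10, seed=42)
print(first == second)  # the same seed selects the same rows
```

Fixing the seed in CI means a failing check can be reproduced locally on exactly the same rows.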
Python API
from skewsentry import FeatureSpec
from skewsentry.adapters.python import PythonFunctionAdapter
from skewsentry.adapters.http import HTTPAdapter
from skewsentry.runner import run_check
# Define feature comparison rules
spec = FeatureSpec.from_yaml("features.yml")
# Set up adapters for your pipelines
offline_adapter = PythonFunctionAdapter("training.pipeline:extract_features")
online_adapter = HTTPAdapter("https://api.myservice.com/features")
# Run comparison
report = run_check(
    spec=spec,
    data="validation_data.parquet",  # or DataFrame
    offline=offline_adapter,
    online=online_adapter,
    sample=5000,
    seed=42,
    html_out="report.html",
    json_out="results.json",
)
# Check results
if report.ok:
    print("✅ All features match within tolerance")
else:
    print("❌ Feature parity violations detected:")
    print(report.to_text(max_rows=10))
    # Fail CI/CD pipeline
    raise SystemExit(1)
Feature Specification
SkewSentry uses YAML configuration to define feature comparison rules:
version: 1
keys: ["user_id", "timestamp"]  # Row alignment keys
null_policy: "same"             # "same" | "allow_both_null"
features:
  # Numeric features with tolerance
  - name: spend_7d
    dtype: float
    nullable: true
    tolerance:
      abs: 0.01   # Absolute tolerance (optional)
      rel: 0.001  # Relative tolerance (optional)
    window:
      lookback_days: 7
      timestamp_col: "timestamp"
      closed: "right"
  # Categorical features with validation
  - name: country
    dtype: category
    categories: ["US", "UK", "DE", "FR"]  # Expected values
    nullable: false
  # Integer features with range validation
  - name: age
    dtype: int
    nullable: false
    range: [0, 120]  # [min, max] bounds
  # String features (exact match)
  - name: user_segment
    dtype: string
    nullable: true
  # DateTime features (exact match)
  - name: last_login
    dtype: datetime
    nullable: true
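The keys entry is described as the row alignment keys: offline and online rows are paired when they share the same key values, and rows present on only one side are reported as missing. A minimal sketch of that idea follows, using plain dictionaries; `align_by_keys` is a hypothetical helper and not SkewSentry's own alignment code.

```python
def align_by_keys(offline_rows, online_rows, keys):
    """Pair rows sharing the same key tuple; list keys found on one side only."""
    def index(rows):
        # Map (key1, key2, ...) -> row
        return {tuple(row[k] for k in keys): row for row in rows}

    off, on = index(offline_rows), index(online_rows)
    matched = [(off[k], on[k]) for k in off if k in on]
    missing_in_online = [k for k in off if k not in on]
    missing_in_offline = [k for k in on if k not in off]
    return matched, missing_in_online, missing_in_offline

offline_rows = [
    {"user_id": 1, "timestamp": "2024-01-01", "spend_7d": 10.0},
    {"user_id": 2, "timestamp": "2024-01-01", "spend_7d": 5.5},
]
online_rows = [
    {"user_id": 1, "timestamp": "2024-01-01", "spend_7d": 10.0},
]
matched, missing_in_online, missing_in_offline = align_by_keys(
    offline_rows, online_rows, keys=["user_id", "timestamp"]
)
print(len(matched), missing_in_online, missing_in_offline)
```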
Supported Data Types
| Type | Comparison | Tolerance | Notes |
|---|---|---|---|
| int | Numeric | ✅ abs/rel | Coerced to float for comparison |
| float | Numeric | ✅ abs/rel | NaN handling per null_policy |
| bool | Exact | ❌ | True/False only |
| string | Exact | ❌ | Case sensitive |
| category | Exact + Unknown detection | ❌ | Validates against expected categories |
| datetime | Exact | ❌ | Timezone aware |
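For the category type, unknown detection means flagging values that are not in the expected categories list. The sketch below shows one way to express that check; `unknown_categories` is a hypothetical helper, and skipping nulls here is an assumption (the table above leaves null handling to null_policy).

```python
def unknown_categories(values, expected):
    """Return the sorted distinct values that are not in `expected`."""
    allowed = set(expected)
    # Assumption: nulls are handled elsewhere, so None is skipped here
    return sorted({v for v in values if v is not None and v not in allowed})

expected = ["US", "UK", "DE", "FR"]
online_values = ["US", "DE", "ES", None, "uk"]
print(unknown_categories(online_values, expected))
```

Because the comparison is exact, a lowercase "uk" counts as unknown even though "UK" is expected.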
Tolerance Configuration
Absolute Tolerance: |offline_value - online_value| ≤ abs_tolerance
Relative Tolerance: |offline_value - online_value| ≤ rel_tolerance × max(|offline_value|, |online_value|, ε)
Either or both can be specified. If both are provided, the comparison passes if either tolerance is satisfied.
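The two formulas and the "either passes" rule can be written out as a small function. This is a sketch of the rule as stated above, not SkewSentry's comparison code; the value of `eps` and the exact-match fallback when no tolerance is given are assumptions.

```python
def within_tolerance(offline, online, abs_tol=None, rel_tol=None, eps=1e-12):
    """Apply the absolute and relative tolerance rules to one pair of values."""
    diff = abs(offline - online)
    if abs_tol is None and rel_tol is None:
        # Assumption: with no tolerance configured, require an exact match
        return diff == 0
    abs_ok = abs_tol is not None and diff <= abs_tol
    rel_ok = rel_tol is not None and diff <= rel_tol * max(abs(offline), abs(online), eps)
    # If both are provided, satisfying either one is enough
    return abs_ok or rel_ok

print(within_tolerance(19.99, 19.995, abs_tol=0.01))                 # small absolute gap
print(within_tolerance(5000.0, 5002.0, abs_tol=0.01, rel_tol=0.001)) # passes on the relative rule
print(within_tolerance(19.99, 20.49, abs_tol=0.01, rel_tol=0.001))   # fails both rules
```

The `eps` floor only matters when both values are zero or very close to it, where a purely relative bound would collapse to zero.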
Adapters
SkewSentry supports multiple adapter types to connect with different feature pipeline architectures:
Python Function Adapter
For in-process Python functions:
import pandas as pd
from skewsentry.adapters.python import PythonFunctionAdapter
# Your feature function signature
def extract_features(df: pd.DataFrame) -> pd.DataFrame:
    """Extract features from input DataFrame.

    Args:
        df: Input DataFrame with raw data
    Returns:
        DataFrame with feature columns + key columns
    """
    return df[["user_id", "timestamp", "spend_7d", "country"]]
# Reference by module:function string
adapter = PythonFunctionAdapter("mypackage.features:extract_features")
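A "module:function" string is typically resolved by importing the module and looking up the attribute. The sketch below shows that pattern with the standard library; `resolve` is a hypothetical helper, and whether PythonFunctionAdapter works exactly this way is an assumption.

```python
import importlib

def resolve(reference):
    """Turn a 'package.module:function' string into the callable it names."""
    module_name, sep, attr = reference.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {reference!r}")
    func = getattr(importlib.import_module(module_name), attr)
    if not callable(func):
        raise TypeError(f"{reference!r} does not point to a callable")
    return func

# Demonstrated with a standard-library function so the example runs anywhere
dumps = resolve("json:dumps")
print(dumps({"ok": True}))
```

With this pattern the module must be importable from the directory where the check runs, so a failing lookup is usually a PYTHONPATH or working-directory issue.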
HTTP Adapter
For REST API endpoints:
from skewsentry.adapters.http import HTTPAdapter
adapter = HTTPAdapter(
    url="https://features.myservice.com/batch",
    method="POST",
    headers={"Authorization": "Bearer token"},
    batch_size=1000,  # Records per request
    timeout=30.0,
    max_retries=3,
)
Expected API Contract:
- Request: JSON array of input records
- Response: JSON array of feature records (same order)
- Status: 200 for success, 4xx/5xx for errors
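The contract above (a JSON array in, a JSON array out, same order) combined with batch_size implies a simple client loop. The sketch below illustrates it without a network: `fetch_all`, `iter_batches`, and `fake_endpoint` are hypothetical names, and `send` stands in for the actual HTTP request.

```python
import json

def iter_batches(records, batch_size):
    """Yield consecutive slices of at most `batch_size` records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def fetch_all(records, batch_size, send):
    """Send each batch as a JSON array and concatenate the responses in order."""
    results = []
    for batch in iter_batches(records, batch_size):
        response = json.loads(send(json.dumps(batch)))
        # The contract promises one feature record per input record
        if len(response) != len(batch):
            raise ValueError("response length does not match request length")
        results.extend(response)
    return results

def fake_endpoint(body):
    # Stand-in for the feature service: echoes the key and derives one feature
    rows = json.loads(body)
    return json.dumps([{"user_id": r["user_id"], "spend_7d": r["amount"] * 2} for r in rows])

records = [{"user_id": i, "amount": float(i)} for i in range(5)]
features = fetch_all(records, batch_size=2, send=fake_endpoint)
print([f["user_id"] for f in features])
```

Checking the response length per batch surfaces a service that silently drops records, which would otherwise show up later as misaligned rows.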
Reporting
SkewSentry generates multiple report formats for different use cases:
Text Report
# Console-friendly summary
print(report.to_text(max_rows=10))
OK: False
Missing rows — offline: 0, online: 3
Per-feature mismatch rates:
- spend_7d: mismatch_rate=0.1200 rows=1000 mean_abs_diff=0.0845
- country: mismatch_rate=0.0000 rows=1000 mean_abs_diff=None
JSON Report
# Machine-readable results
report.to_json("results.json")
{
  "ok": false,
  "keys": ["user_id", "timestamp"],
  "missing_in_online": 3,
  "missing_in_offline": 0,
  "features": [
    {
      "name": "spend_7d",
      "mismatch_rate": 0.12,
      "num_rows": 1000,
      "mean_abs_diff": 0.0845,
      "unknown_categories": null
    }
  ],
  "failing_features": ["spend_7d"]
}
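Because the JSON report is machine-readable, a downstream script can turn it into a short summary, for example for a pull request comment. The sketch below parses the sample report shown above using only the fields that appear in it; `failing_summary` is a hypothetical helper.

```python
import json

report_json = """
{
  "ok": false,
  "keys": ["user_id", "timestamp"],
  "missing_in_online": 3,
  "missing_in_offline": 0,
  "features": [
    {"name": "spend_7d", "mismatch_rate": 0.12, "num_rows": 1000,
     "mean_abs_diff": 0.0845, "unknown_categories": null}
  ],
  "failing_features": ["spend_7d"]
}
"""

def failing_summary(report):
    """Return one line per failing feature with its mismatch rate and row count."""
    by_name = {f["name"]: f for f in report["features"]}
    return [
        f"{name}: {by_name[name]['mismatch_rate']:.1%} of {by_name[name]['num_rows']} rows"
        for name in report["failing_features"]
        if name in by_name
    ]

report = json.loads(report_json)
summary = failing_summary(report)
print(summary)
```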
HTML Report
# Rich visual report for stakeholders
report.to_html("report.html")
Interactive HTML report includes:
- Executive summary with pass/fail status
- Per-feature mismatch statistics
- Sample mismatched rows with differences highlighted
- Missing row analysis
- Feature distribution comparisons
CI Integration
GitHub Actions
name: Feature Parity Check
on: [push, pull_request]
jobs:
  parity-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install -e ".[dev]"
      - name: Run feature parity check
        run: |
          skewsentry check \
            --spec features.yml \
            --offline training.pipeline:extract_features \
            --online serving.api:get_features \
            --data tests/fixtures/validation.parquet \
            --html artifacts/parity-report.html \
            --json artifacts/parity-results.json
      - name: Upload report artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: parity-reports
          path: artifacts/
Exit Codes
- 0: All features match within specified tolerances ✅
- 1: Feature parity violations detected ❌
- 2: Configuration error or runtime failure 🚨
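A wrapper script can use these three codes to tell a real parity failure apart from a broken setup. The sketch below is one hedged way to do that from Python; `describe` and `run_gate` are hypothetical helpers, and the demo runs a stand-in command rather than skewsentry itself.

```python
import subprocess
import sys

# Meanings taken from the exit code list above
EXIT_MEANINGS = {
    0: "All features match within specified tolerances",
    1: "Feature parity violations detected",
    2: "Configuration error or runtime failure",
}

def describe(code):
    """Translate an exit code into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}")

def run_gate(command):
    """Run `command`, print what its exit code means, and return the code."""
    completed = subprocess.run(command)
    print(describe(completed.returncode))
    return completed.returncode

# Stand-in for `skewsentry check ...`: a child process that exits with code 1
code = run_gate([sys.executable, "-c", "raise SystemExit(1)"])
```

Treating any non-zero code as a failed gate is the safer default, since a configuration error means the check never actually ran.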
Integration Patterns
Pre-deployment Gate:
# Block deployment on any non-zero exit (parity violation or configuration error)
if ! skewsentry check --spec features.yml --offline offline:fn --online online:fn --data validation.parquet; then
    echo "❌ Feature parity check did not pass. Blocking deployment."
    exit 1
fi
Model Registry Integration:
# Validate features before model registration
report = run_check(spec, data, offline_adapter, online_adapter)
if report.ok:
    model_registry.register_model(model, features=spec.features)
else:
    raise ValueError(f"Feature parity check failed: {report.failing_features}")
Examples
Real-World Bug Caught by SkewSentry
This is the exact type of production bug SkewSentry prevents:
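import math
# Training pipeline (offline) - Spark/Python
def extract_features(df):
    # Rolling sum over the last 7 rows per user (7 days if there is one row per user per day)
    spend_7d = (
        df.groupby("user_id")["amount"]
        .rolling(7, min_periods=1)
        .sum()
        .round(2)
        .reset_index(level=0, drop=True)  # drop user_id level so assign() aligns on df's index
    )
    return df.assign(spend_7d=spend_7d)
# Serving pipeline (online) - Java/Kafka Streams
# Translated to Python equivalent for illustration
def get_features(df):
    # Same window length, but closed="left" excludes the current row
    # and the default min_periods leaves early rows as NaN
    spend_7d = (
        df.groupby("user_id")["amount"]
        .rolling(7, closed="left")
        .sum()
        # math.floor raises on NaN, so pass NaN through unchanged
        .apply(lambda x: x if math.isnan(x) else math.floor(x * 100) / 100)
        .reset_index(level=0, drop=True)
    )
    return df.assign(spend_7d=spend_7d)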
The Differences:
- Window boundaries: min_periods=1 vs closed="left"
- Rounding logic: round(2) vs floor() * 100 / 100
The Impact: 12% of feature values differed by 0.01-0.15, causing model accuracy to drop from 94% to 89% in production.
The Solution: SkewSentry with tolerance: {abs: 0.01} caught this in CI:
❌ Feature parity violations detected:
- spend_7d: mismatch_rate=0.1200 rows=5000 mean_abs_diff=0.0845
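The rounding half of this bug can be reproduced without pandas. The sketch below compares the two rounding rules on a few hand-picked amounts; the values and the 0.005 tolerance are hypothetical and chosen only to make the gap visible.

```python
import math

def round_half(x):
    """Offline rule: round to two decimals."""
    return round(x, 2)

def floor_cents(x):
    """Online rule: truncate to two decimals."""
    return math.floor(x * 100) / 100

values = [10.456, 3.999, 7.25, 0.129]  # hypothetical amounts
tolerance = 0.005                      # hypothetical absolute tolerance

mismatches = [v for v in values if abs(round_half(v) - floor_cents(v)) > tolerance]
mismatch_rate = len(mismatches) / len(values)
print(mismatches, mismatch_rate)
```

Only 7.25 survives both rules unchanged; every value with a non-zero third decimal ends up about one cent apart, which is the kind of small, systematic gap that per-feature tolerances are meant to surface.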
Complete Example
See examples/python/ for a runnable demonstration showing how SkewSentry catches windowing and rounding differences between offline and online pipelines.
Development
Setup
git clone https://github.com/your-org/skewsentry.git
cd skewsentry
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"
Testing
# Run all tests
uv run pytest
# With coverage (enforces 85%+)
uv run pytest --cov=skewsentry --cov-fail-under=85
# Run specific test categories
uv run pytest -k test_spec # Specification tests
uv run pytest -k test_adapter # Adapter tests
uv run pytest -m "e2e" # End-to-end integration tests
Project Architecture
skewsentry/
├── __init__.py # Package exports
├── spec.py # FeatureSpec Pydantic models
├── inputs.py # Data loading and sampling
├── adapters/ # Pipeline adapters
│   ├── __init__.py
│   ├── base.py # FeatureAdapter protocol
│   ├── python.py # Python function adapter
│   └── http.py # HTTP/REST API adapter
├── align.py # Row alignment by keys
├── compare.py # Feature comparison logic
├── runner.py # Pipeline orchestration
├── report.py # Report generation
├── cli.py # Command-line interface
├── errors.py # Exception classes
└── utils.py # Logging utilities
Contributing
- Issues: Report bugs or request features via GitHub Issues
- Pull Requests: Fork, create feature branch, add tests, submit PR
- Testing: All changes must include tests and maintain 85%+ coverage
- Documentation: Update README and docstrings for new features
Roadmap
v0.2.0 - Enhanced Analysis
- Statistical significance testing (KS-test, chi-square)
- Feature drift detection over time
- SQL adapter for database sources
- Streaming data support
v0.3.0 - Scale & Performance
- Spark/Dask backends for large datasets
- Distributed comparison for high-volume pipelines
- Advanced sampling strategies
- Performance benchmarking suite
v0.4.0 - Production Features
- Web dashboard for monitoring
- Alert integrations (Slack, PagerDuty)
- Model performance correlation analysis
- Enterprise security features
License: MIT | Python: 3.9+ | Maintained by: Yasser El Haddar
Prevent ML model failures before they reach production. Start validating your feature pipelines today.