Catch training ↔ serving feature skew before you ship to production

SkewSentry

Python: 3.9+ | License: MIT

Prevent ML model failures with automated feature parity validation


🚀 Why SkewSentry?

SkewSentry transforms fragile ML deployments into reliable production systems through automated feature parity validation.

💰 Prevent Costly ML Failures

  • 70% of ML failures stem from training/serving skew
  • Months of silent degradation before detection
  • Lost revenue and customer trust from broken predictions

Production-Ready Validation

  • Pre-deployment detection - Catch issues in CI before they ship
  • Configurable tolerances - Handle expected differences intelligently
  • Multi-source support - Python functions, HTTP APIs, any feature pipeline
  • Rich reporting - HTML reports with detailed mismatch analysis

🔧 Developer-First Design

  • Zero configuration - Works out of the box with intelligent defaults
  • CI integration - Exit codes for automated validation gates
  • Multiple formats - Text, JSON, and HTML reports for different use cases

📦 Installation

Production

pip install skewsentry

Development

uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]"

⚡ Quickstart

Basic Feature Parity Check

# Initialize spec from your data
skewsentry init features.yml --data validation.parquet --keys user_id timestamp

# Run parity check
skewsentry check \
  --spec features.yml \
  --offline training.pipeline:extract_features \
  --online serving.api:get_features \
  --data validation.parquet \
  --html report.html

# ✅ Exit 0: Features match within tolerance
# ❌ Exit 1: Parity violations detected (fails CI)
# 🚨 Exit 2: Configuration error

Realistic Example: E-commerce Features

# features.yml
version: 1
keys: ["user_id", "timestamp"]

features:
  - name: total_spend_7d
    dtype: float
    tolerance:
      abs: 0.01  # $0.01 absolute tolerance
      rel: 0.001  # 0.1% relative tolerance
      
  - name: order_count_30d
    dtype: int
    tolerance:
      abs: 1  # Allow 1 order difference
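
# Offline pipeline (training)
import pandas as pd

def _rolling(df, col, window, how, closed="right"):
    # Per-user time-based window; "timestamp" must be a datetime column.
    df = df.sort_values("timestamp")
    roll = df.groupby("user_id").rolling(window, on="timestamp", closed=closed)[col]
    # Drop the user_id index level so the result aligns with the original rows.
    return getattr(roll, how)().reset_index(level=0, drop=True)

def extract_features(df):
    return df.assign(
        total_spend_7d=_rolling(df, "amount", "7D", "sum"),
        order_count_30d=_rolling(df, "amount", "30D", "count"),  # one row per order
    )

# Online pipeline (serving) - subtle difference
def get_features(df):
    return df.assign(
        # Different windowing! closed="left" excludes the current row;
        # pandas defaults to closed="right" for time-based windows.
        total_spend_7d=_rolling(df, "amount", "7D", "sum", closed="left"),
        order_count_30d=_rolling(df, "amount", "30D", "count"),
    )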

SkewSentry catches the windowing difference:

 Feature parity violations detected:
  - total_spend_7d: mismatch_rate=0.1200 rows=5000 mean_abs_diff=0.0845

🏗️ Feature Adapters

SkewSentry works with any feature pipeline through adapters:

Python Functions

# Direct Python function integration
from skewsentry.adapters import PythonFunctionAdapter

adapter = PythonFunctionAdapter("mymodule:extract_features")
features = adapter.get_features(input_data)

HTTP APIs

# REST API integration with automatic batching
from skewsentry.adapters import HTTPAdapter

adapter = HTTPAdapter("http://api.example.com/features", timeout=30.0)
features = adapter.get_features(input_data)

Usage

Command Line Interface

Initialize Feature Spec

skewsentry init features.yml \
  --data sample_data.parquet \
  --keys user_id timestamp

Run Parity Check

skewsentry check \
  --spec features.yml \
  --offline module.offline:build_features \
  --online module.online:get_features \
  --data validation.parquet \
  --sample 10000 \
  --seed 42 \
  --html artifacts/report.html \
  --json artifacts/results.json
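
The `--sample` and `--seed` flags make a run reproducible: the same seed should select the same rows. The sketch below illustrates that idea with the standard library only. The helper name `sample_rows` is hypothetical, and SkewSentry's own sampling may work differently.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_rows(rows: Sequence[T], sample: int, seed: int) -> List[T]:
    """Draw up to `sample` rows without replacement, reproducibly for a given seed."""
    if sample >= len(rows):
        # Nothing to sample: keep every row.
        return list(rows)
    # A dedicated Random instance avoids touching the global random state.
    return random.Random(seed).sample(rows, sample)


rows = list(range(100_000))
first = sample_rows(rows, 10_000, seed=42)
second = sample_rows(rows, 10_000, seed=42)
print(first == second)  # True: same seed, same rows
```

Fixing the seed in CI means a failing run can be replayed locally against exactly the same rows.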

Python API

from skewsentry import FeatureSpec
from skewsentry.adapters.python import PythonFunctionAdapter
from skewsentry.adapters.http import HTTPAdapter
from skewsentry.runner import run_check

# Define feature comparison rules
spec = FeatureSpec.from_yaml("features.yml")

# Set up adapters for your pipelines
offline_adapter = PythonFunctionAdapter("training.pipeline:extract_features")
online_adapter = HTTPAdapter("https://api.myservice.com/features")

# Run comparison
report = run_check(
    spec=spec,
    data="validation_data.parquet",  # or DataFrame
    offline=offline_adapter,
    online=online_adapter,
    sample=5000,
    seed=42,
    html_out="report.html",
    json_out="results.json"
)

# Check results
if report.ok:
    print("✅ All features match within tolerance")
else:
    print("❌ Feature parity violations detected:")
    print(report.to_text(max_rows=10))
    
    # Fail CI/CD pipeline
    raise SystemExit(1)

Feature Specification

SkewSentry uses YAML configuration to define feature comparison rules:

version: 1
keys: ["user_id", "timestamp"]  # Row alignment keys
null_policy: "same"              # "same" | "allow_both_null"

features:
  # Numeric features with tolerance
  - name: spend_7d
    dtype: float
    nullable: true
    tolerance:
      abs: 0.01      # Absolute tolerance (optional)
      rel: 0.001     # Relative tolerance (optional)
    window:
      lookback_days: 7
      timestamp_col: "timestamp"
      closed: "right"
      
  # Categorical features with validation
  - name: country
    dtype: category
    categories: ["US", "UK", "DE", "FR"]  # Expected values
    nullable: false
    
  # Integer features with range validation
  - name: age
    dtype: int
    nullable: false
    range: [0, 120]  # [min, max] bounds
    
  # String features (exact match)
  - name: user_segment
    dtype: string
    nullable: true
    
  # DateTime features (exact match)
  - name: last_login
    dtype: datetime
    nullable: true

Supported Data Types

Type     | Comparison                | Tolerance  | Notes
int      | Numeric                   | ✅ abs/rel | Coerced to float for comparison
float    | Numeric                   | ✅ abs/rel | NaN handling per null_policy
bool     | Exact                     | -          | True/False only
string   | Exact                     | -          | Case sensitive
category | Exact + Unknown detection | -          | Validates against expected categories
datetime | Exact                     | -          | Timezone aware

Tolerance Configuration

Absolute Tolerance: |offline_value - online_value| ≤ abs_tolerance

Relative Tolerance: |offline_value - online_value| ≤ rel_tolerance × max(|offline_value|, |online_value|, ε)

Either or both can be specified. If both are provided, the comparison passes if either tolerance is satisfied.
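
The two rules can be sketched in a few lines of Python. This is an illustration of the formulas above, not SkewSentry's actual implementation: the function name `within_tolerance`, the value chosen for ε, and the exact-equality fallback when no tolerance is set are all assumptions.

```python
from typing import Optional


def within_tolerance(
    offline: float,
    online: float,
    abs_tol: Optional[float] = None,
    rel_tol: Optional[float] = None,
    eps: float = 1e-12,  # assumed value for ε; the library's value may differ
) -> bool:
    """Return True if the two values agree under either configured tolerance."""
    diff = abs(offline - online)
    if abs_tol is None and rel_tol is None:
        # No tolerance configured: fall back to exact equality (an assumption).
        return diff == 0.0
    abs_ok = abs_tol is not None and diff <= abs_tol
    rel_ok = rel_tol is not None and diff <= rel_tol * max(abs(offline), abs(online), eps)
    # Passing either rule is enough.
    return abs_ok or rel_ok


print(within_tolerance(100.00, 100.005, abs_tol=0.01))                # True (absolute rule)
print(within_tolerance(1000.0, 1000.5, abs_tol=0.01, rel_tol=0.001))  # True (relative rule)
print(within_tolerance(1.0, 1.5, abs_tol=0.01, rel_tol=0.001))        # False (neither rule)
```

Combining both rules with "or" lets a small absolute tolerance cover values near zero while the relative tolerance covers large magnitudes.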

Adapters

SkewSentry supports multiple adapter types to connect with different feature pipeline architectures:

Python Function Adapter

For in-process Python functions:

from skewsentry.adapters.python import PythonFunctionAdapter

# Your feature function signature
def extract_features(df: pd.DataFrame) -> pd.DataFrame:
    """Extract features from input DataFrame.
    
    Args:
        df: Input DataFrame with raw data
        
    Returns:
        DataFrame with feature columns + key columns
    """
    return df[["user_id", "timestamp", "spend_7d", "country"]]

# Reference by module:function string
adapter = PythonFunctionAdapter("mypackage.features:extract_features")

HTTP Adapter

For REST API endpoints:

from skewsentry.adapters.http import HTTPAdapter

adapter = HTTPAdapter(
    url="https://features.myservice.com/batch",
    method="POST",
    headers={"Authorization": "Bearer token"},
    batch_size=1000,  # Records per request
    timeout=30.0,
    max_retries=3
)

Expected API Contract:

  • Request: JSON array of input records
  • Response: JSON array of feature records (same order)
  • Status: 200 for success, 4xx/5xx for errors
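
A service that satisfies this contract could look roughly like the handler below. It is a sketch using only the standard library: `handle_batch` and `compute_features` are hypothetical names, and the feature logic is a placeholder rather than anything SkewSentry provides.

```python
import json
from typing import Tuple


def compute_features(record: dict) -> dict:
    """Placeholder feature logic; a real service would call its feature pipeline."""
    return {
        "user_id": record["user_id"],
        "timestamp": record["timestamp"],
        "spend_7d": round(float(record.get("amount", 0.0)), 2),
    }


def handle_batch(body: bytes) -> Tuple[int, bytes]:
    """Map a JSON array of input records to a JSON array of feature records."""
    try:
        records = json.loads(body)
        if not isinstance(records, list):
            raise ValueError("request body must be a JSON array")
        # A list comprehension preserves input order, as the contract requires.
        features = [compute_features(r) for r in records]
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON or records: report a client error.
        return 400, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps(features).encode()


status, payload = handle_batch(
    b'[{"user_id": 1, "timestamp": "2024-01-01", "amount": 12.5}]'
)
print(status, json.loads(payload))
```

Keeping the response in request order matters because the key columns are then the only other way to line records up again.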

Reporting

SkewSentry generates multiple report formats for different use cases:

Text Report

# Console-friendly summary
print(report.to_text(max_rows=10))

OK: False
Missing rows — offline: 0, online: 3
Per-feature mismatch rates:
  - spend_7d: mismatch_rate=0.1200 rows=1000 mean_abs_diff=0.0845
  - country: mismatch_rate=0.0000 rows=1000 mean_abs_diff=None
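
The per-feature figures can be read as follows. This sketch assumes that `mismatch_rate` is the share of aligned rows whose difference exceeds the tolerance and that `mean_abs_diff` averages the absolute difference over all aligned rows; SkewSentry's exact definitions may differ, and `feature_stats` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def feature_stats(
    pairs: List[Tuple[float, float]], abs_tol: float
) -> Tuple[float, Optional[float]]:
    """Return (mismatch_rate, mean_abs_diff) for aligned (offline, online) values."""
    if not pairs:
        return 0.0, None
    diffs = [abs(offline - online) for offline, online in pairs]
    mismatches = sum(1 for d in diffs if d > abs_tol)
    return mismatches / len(pairs), sum(diffs) / len(diffs)


pairs = [(1.0, 1.0), (2.0, 2.5), (3.0, 3.0), (4.0, 4.0)]
rate, mean_diff = feature_stats(pairs, abs_tol=0.01)
print(f"mismatch_rate={rate:.4f} rows={len(pairs)} mean_abs_diff={mean_diff:.4f}")
# mismatch_rate=0.2500 rows=4 mean_abs_diff=0.1250
```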

JSON Report

# Machine-readable results
report.to_json("results.json")

{
  "ok": false,
  "keys": ["user_id", "timestamp"],
  "missing_in_online": 3,
  "missing_in_offline": 0,
  "features": [
    {
      "name": "spend_7d",
      "mismatch_rate": 0.12,
      "num_rows": 1000,
      "mean_abs_diff": 0.0845,
      "unknown_categories": null
    }
  ],
  "failing_features": ["spend_7d"]
}
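
Because the JSON report is machine-readable, downstream tooling can act on it. The sketch below assumes the field names shown in the sample above and turns them into one summary line per failing feature; `summarize_failures` is a hypothetical helper, not part of the package.

```python
import json
from typing import List

# Same structure as the sample report above.
SAMPLE_REPORT = """
{
  "ok": false,
  "keys": ["user_id", "timestamp"],
  "missing_in_online": 3,
  "missing_in_offline": 0,
  "features": [
    {"name": "spend_7d", "mismatch_rate": 0.12, "num_rows": 1000,
     "mean_abs_diff": 0.0845, "unknown_categories": null}
  ],
  "failing_features": ["spend_7d"]
}
"""


def summarize_failures(report_json: str) -> List[str]:
    """Return one human-readable line per failing feature."""
    report = json.loads(report_json)
    if report["ok"]:
        return []
    failing = set(report["failing_features"])
    return [
        f'{f["name"]}: {f["mismatch_rate"]:.1%} of {f["num_rows"]} rows mismatched'
        for f in report["features"]
        if f["name"] in failing
    ]


for line in summarize_failures(SAMPLE_REPORT):
    print(line)  # spend_7d: 12.0% of 1000 rows mismatched
```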

HTML Report

# Rich visual report for stakeholders
report.to_html("report.html")

Interactive HTML report includes:

  • Executive summary with pass/fail status
  • Per-feature mismatch statistics
  • Sample mismatched rows with differences highlighted
  • Missing row analysis
  • Feature distribution comparisons

CI Integration

GitHub Actions

name: Feature Parity Check
on: [push, pull_request]

jobs:
  parity-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install dependencies
        run: |
          pip install -e ".[dev]"
          
      - name: Run feature parity check
        run: |
          skewsentry check \
            --spec features.yml \
            --offline training.pipeline:extract_features \
            --online serving.api:get_features \
            --data tests/fixtures/validation.parquet \
            --html artifacts/parity-report.html \
            --json artifacts/parity-results.json
            
      - name: Upload report artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: parity-reports
          path: artifacts/

Exit Codes

  • 0: All features match within specified tolerances ✅
  • 1: Feature parity violations detected ❌
  • 2: Configuration error or runtime failure 🚨
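
When the CLI is driven from a Python build script rather than a shell, the exit codes can be interpreted as in the sketch below. The helper names are hypothetical and the demo commands are stand-ins that simply exit with a chosen code; in practice `argv` would be the `skewsentry check ...` command line.

```python
import subprocess
import sys
from typing import List

# Meanings taken from the exit-code list above.
EXIT_MEANINGS = {
    0: "all features match within tolerance",
    1: "feature parity violations detected",
    2: "configuration error or runtime failure",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(argv: List[str]) -> bool:
    """Run a command and return True only when it exits with code 0."""
    result = subprocess.run(argv, check=False)
    print(describe_exit(result.returncode))
    return result.returncode == 0


# Stand-in commands that exit with a fixed code, used here instead of the real CLI.
print(run_gate([sys.executable, "-c", "raise SystemExit(0)"]))
print(run_gate([sys.executable, "-c", "raise SystemExit(2)"]))
```

Treating every non-zero code as a failure means a misconfigured check blocks the pipeline instead of passing silently.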

Integration Patterns

Pre-deployment Gate:

# Block deployment on any non-zero exit (1: parity violations, 2: configuration or runtime error)
if ! skewsentry check --spec features.yml --offline offline:fn --online online:fn --data validation.parquet; then
  echo "❌ Feature parity check did not pass. Blocking deployment."
  exit 1
fi

Model Registry Integration:

# Validate features before model registration
report = run_check(spec, data, offline_adapter, online_adapter)
if report.ok:
    model_registry.register_model(model, features=spec.features)
else:
    raise ValueError(f"Feature parity check failed: {report.failing_features}")

Examples

Real-World Bug Caught by SkewSentry

This is the exact type of production bug SkewSentry prevents:

import numpy as np

# Training pipeline (offline) - Spark/Python
def extract_features(df):
    # Rolling sum over the last 7 rows per user (7 days when there is one row per user per day)
    spend_7d = df.groupby("user_id")["amount"] \
                 .rolling(7, min_periods=1) \
                 .sum() \
                 .round(2) \
                 .reset_index(level=0, drop=True)  # drop the user_id level so assign() aligns on the row index
    return df.assign(spend_7d=spend_7d)

# Serving pipeline (online) - Java/Kafka Streams
# Translated to Python equivalent for illustration
def get_features(df):
    # Same window length, but closed="left" excludes the current row
    spend_7d = df.groupby("user_id")["amount"] \
                 .rolling(7, closed="left") \
                 .sum() \
                 .reset_index(level=0, drop=True)
    # Floor to cents instead of rounding; np.floor tolerates the NaNs that math.floor would reject
    return df.assign(spend_7d=np.floor(spend_7d * 100) / 100)

The Differences:

  1. Window boundaries: the default right-closed window with min_periods=1 vs closed="left", which excludes the current row
  2. Rounding logic: round(2) vs floor(x * 100) / 100

The Impact: 12% of feature values differed by 0.01-0.15, causing model accuracy to drop from 94% to 89% in production.

The Solution: SkewSentry with tolerance: {abs: 0.01} caught this in CI:

 Feature parity violations detected:
  - spend_7d: mismatch_rate=0.1200 rows=5000 mean_abs_diff=0.0845

Complete Example

See examples/python/ for a runnable demonstration showing how SkewSentry catches windowing and rounding differences between offline and online pipelines.

Development

Setup

git clone https://github.com/your-org/skewsentry.git
cd skewsentry
uv venv .venv && source .venv/bin/activate
uv pip install -e ".[dev]"

Testing

# Run all tests
uv run pytest

# With coverage (enforces 85%+)
uv run pytest --cov=skewsentry --cov-fail-under=85

# Run specific test categories
uv run pytest -k test_spec              # Specification tests
uv run pytest -k test_adapter           # Adapter tests  
uv run pytest -m "e2e"                  # End-to-end integration tests

Project Architecture

skewsentry/
├── __init__.py                    # Package exports
├── spec.py                        # FeatureSpec Pydantic models
├── inputs.py                      # Data loading and sampling
├── adapters/                      # Pipeline adapters
│   ├── __init__.py
│   ├── base.py                    # FeatureAdapter protocol
│   ├── python.py                  # Python function adapter
│   └── http.py                    # HTTP/REST API adapter
├── align.py                       # Row alignment by keys
├── compare.py                     # Feature comparison logic
├── runner.py                      # Pipeline orchestration
├── report.py                      # Report generation
├── cli.py                         # Command-line interface
├── errors.py                      # Exception classes
└── utils.py                       # Logging utilities

Contributing

  1. Issues: Report bugs or request features via GitHub Issues
  2. Pull Requests: Fork, create feature branch, add tests, submit PR
  3. Testing: All changes must include tests and maintain 85%+ coverage
  4. Documentation: Update README and docstrings for new features

Roadmap

v0.2.0 - Enhanced Analysis

  • Statistical significance testing (KS-test, chi-square)
  • Feature drift detection over time
  • SQL adapter for database sources
  • Streaming data support

v0.3.0 - Scale & Performance

  • Spark/Dask backends for large datasets
  • Distributed comparison for high-volume pipelines
  • Advanced sampling strategies
  • Performance benchmarking suite

v0.4.0 - Production Features

  • Web dashboard for monitoring
  • Alert integrations (Slack, PagerDuty)
  • Model performance correlation analysis
  • Enterprise security features

License: MIT | Python: 3.9+ | Maintained by: Yasser El Haddar

Prevent ML model failures before they reach production. Start validating your feature pipelines today.
