
Linear Drift Detector

A lightweight, explainable concept drift detection library based on linear coefficient analysis using OLS (Ordinary Least Squares).
This package is designed for both regression and classification models to detect when the underlying data relationship between features and targets has changed over time — i.e., when concept drift occurs.


📖 Table of Contents

  1. Introduction
  2. Core Idea
  3. Mathematical Foundation
  4. Algorithm Overview
  5. Installation
  6. Quick Start Example (Regression)
  7. Example (Classification)
  8. Output Details
  9. Interpretation
  10. When to Use
  11. Limitations
  12. License

Introduction

In many deployed ML systems, the relationship between inputs (X) and target (y) evolves over time.
This evolution — often subtle — causes concept drift, where a model trained on historical data no longer reflects the true structure of incoming (production) data.

Instead of retraining blindly, it’s essential to detect when this drift occurs.
That’s where the linear-drift-detector helps: it quantifies the change in feature relationships using OLS regression coefficients.


Core Idea

Even if your production model is nonlinear (e.g., Random Forest or XGBoost), we can still approximate the structural relationship between features and targets with a simple linear fit.

We fit an OLS model on:

  • The training dataset (X_train, y_train)
  • The production dataset (X_prod, y_prod_prediction — actual labels or predicted outputs)

Then, we compare the learned coefficients.

If the coefficients shift significantly between the two datasets, it indicates potential concept drift in the data-generating process.
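This idea can be sketched in a few lines of numpy, independently of the package (`ols_coefficients` is an illustrative helper, not part of the library's API):

```python
import numpy as np

def ols_coefficients(X, y):
    """Fit y ~ X @ beta (with an intercept) by least squares and return beta."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 2))
y_train = 2.0 * X_train[:, 0] - 1.0 * X_train[:, 1] + 0.1 * rng.standard_normal(200)

X_prod = rng.standard_normal((200, 2))
# Relationship has shifted: the first feature's coefficient grew from 2.0 to 3.0
y_prod = 3.0 * X_prod[:, 0] - 1.0 * X_prod[:, 1] + 0.1 * rng.standard_normal(200)

delta = ols_coefficients(X_prod, y_prod) - ols_coefficients(X_train, y_train)
print(np.round(delta, 2))  # the first feature's coefficient shift dominates
```

A large entry in `delta` localizes the drift to a specific feature, which is what makes the approach explainable.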


Mathematical Foundation

Let the relationship between target y and features X be modeled as:

$$ y = X\beta + \epsilon $$

Where:

  • $X$: Feature matrix
  • $\beta$: Coefficient vector
  • $\epsilon$: Random noise term

We fit two models:

$$ \hat{\beta}_{train} = (X_{train}^T X_{train})^{-1} X_{train}^T y_{train} $$

$$ \hat{\beta}_{prod} = (X_{prod}^T X_{prod})^{-1} X_{prod}^T y_{prod} $$

Then compute:

$$ \Delta \beta = \hat{\beta}_{prod} - \hat{\beta}_{train} $$

To statistically test if the difference is significant:

$$ Z_i = \frac{\hat{\beta}_{prod,i} - \hat{\beta}_{train,i}}{\sqrt{SE_{train,i}^2 + SE_{prod,i}^2}} $$

Where $SE$ is the standard error of each coefficient.

The two-tailed p-value is computed as:

$$ p_i = 2(1 - \Phi(|Z_i|)) $$
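The z-test and p-value formulas translate directly into code. A sketch with made-up coefficient estimates and standard errors (Φ is evaluated via the stdlib `math.erf`, so no extra dependency is needed):

```python
import numpy as np
from math import erf, sqrt

# Illustrative coefficient estimates and standard errors from two OLS fits
beta_train = np.array([2.98, -2.49])
beta_prod = np.array([3.99, -1.50])
se_train = np.array([0.14, 0.14])
se_prod = np.array([0.14, 0.14])

z = (beta_prod - beta_train) / np.sqrt(se_train**2 + se_prod**2)
# Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
phi = np.vectorize(lambda x: 0.5 * (1 + erf(x / sqrt(2))))
p = 2 * (1 - phi(np.abs(z)))  # two-tailed p-values
print(np.round(z, 2))
```

Here both coefficients shift by about one unit against a combined standard error of ~0.2, giving z-values above 5 and p-values that are effectively zero.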


Algorithm Overview

  1. Input:
    • X_train, y_train: historical (training) data
    • X_prod, y_prod_prediction: production data and outputs (actual or predicted)
  2. Fit two OLS models:
    • model_train = OLS(y_train, X_train)
    • model_prod = OLS(y_prod_prediction, X_prod)
  3. Extract coefficients and standard errors
  4. Compute difference metrics:
    • Δβ (coefficient shift)
    • L2 norm distance
    • Z-test and p-values for statistical significance
  5. Return diagnostic report
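Steps 2 through 5 amount to two least-squares fits plus the z-test. A self-contained numpy sketch of that computation (`fit_ols` and `coefficient_shift` are illustrative helpers approximating what the package does, not its actual API):

```python
import numpy as np
from math import erf, sqrt

def fit_ols(X, y):
    """Fit y = X @ beta + eps (intercept included); return coefficients and standard errors."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (len(y) - X1.shape[1])           # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X1.T @ X1)))  # coefficient standard errors
    return beta, se

def coefficient_shift(X_train, y_train, X_prod, y_prod):
    """Steps 2-5: fit both models, compare coefficients, run the z-test."""
    b_tr, se_tr = fit_ols(X_train, y_train)
    b_pr, se_pr = fit_ols(X_prod, y_prod)
    diff = b_pr - b_tr
    z = diff / np.sqrt(se_tr**2 + se_pr**2)
    phi = np.vectorize(lambda t: 0.5 * (1 + erf(t / sqrt(2))))  # standard normal CDF
    return {"coef_diff": diff, "l2_distance": float(np.linalg.norm(diff)),
            "z": z, "p": 2 * (1 - phi(np.abs(z)))}

# Synthetic check: the first feature's true coefficient jumps from 1.0 to 2.0
rng = np.random.default_rng(7)
X_a = rng.standard_normal((300, 2))
y_a = 1.0 * X_a[:, 0] - 0.5 * X_a[:, 1] + 0.2 * rng.standard_normal(300)
X_b = rng.standard_normal((300, 2))
y_b = 2.0 * X_b[:, 0] - 0.5 * X_b[:, 1] + 0.2 * rng.standard_normal(300)
report = coefficient_shift(X_a, y_a, X_b, y_b)
```

The drifted coefficient dominates both `coef_diff` and `l2_distance`, while the stable coefficient's p-value stays large.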

Installation

pip install linear-drift-detector

Quick Start Example (Regression)

import numpy as np
from linear_drift_detector import linear_coefficient_shift

# Generate training data
np.random.seed(42)
X_train = np.random.randn(200, 3)
y_train = 3*X_train[:,0] - 2.5*X_train[:,1] + 4*X_train[:,2] + np.random.randn(200)*0.5

# Generate production data (shifted relationships)
X_prod = np.random.randn(200, 3)
y_prod_pred = 4*X_prod[:,0] - 1.5*X_prod[:,1] + 5*X_prod[:,2] + np.random.randn(200)*0.5

# Run drift detection
result = linear_coefficient_shift(X_train, y_train, X_prod, y_prod_pred)

# Print diagnostic outputs
print(result["z_test"])
print("L2 Distance:", result["l2_distance"])

Example Output

       coef_train  coef_prod     diff  z_value  p_value
const     0.01234    0.02345  0.01111     0.28    0.776
x1        2.98456    3.99123  1.00667     5.12    0.000
x2       -2.48721   -1.49834  0.98887     4.97    0.000
x3        4.01245    4.98678  0.97433     4.54    0.000

L2 Distance: 1.71

Interpretation: Significant p-values (< 0.05) and large L2 distance indicate a strong concept drift.


Example (Classification)

Even for classification tasks, OLS can be used as a proxy detector for internal data structure shifts.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from linear_drift_detector import linear_coefficient_shift

# Training data
X_train, y_train = make_classification(
    n_samples=200, n_features=3, n_informative=3, n_redundant=0, random_state=42
)

# Production data with changed separation
X_prod, y_prod = make_classification(
    n_samples=200, n_features=3, n_informative=3, n_redundant=0, class_sep=1.5, random_state=99
)

# Simulate model predictions
clf = LogisticRegression()
clf.fit(X_train, y_train)
y_prod_pred = clf.predict_proba(X_prod)[:, 1]

# Detect drift
result = linear_coefficient_shift(X_train, y_train, X_prod, y_prod_pred)
print(result["z_test"])
print("L2 Distance:", result["l2_distance"])

Here, the production dataset has a different internal structure, and the drift detector surfaces this as coefficient divergence. The output format is the same as in the regression case.


Output Details

The function returns a dictionary:

Key          Description
coef_train   Coefficients from the training OLS fit
coef_prod    Coefficients from the production OLS fit
coef_diff    Difference vector (production - training)
l2_distance  L2 norm of the coefficient drift
z_test       DataFrame with z-values and p-values for each coefficient

Interpretation

  • High L2 distance: overall structural shift in the data
  • Low p-values (< 0.05): statistically significant coefficient drift
  • Large Δβ: feature relationship changed
  • Stable coefficients: no significant drift
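These rules can be combined into a simple drift alarm on top of the detector's output. A sketch (`flag_drift` and its thresholds are illustrative, not part of the package):

```python
def flag_drift(p_values, l2_distance, alpha=0.05, l2_threshold=1.0):
    """Flag drift when any coefficient shifts significantly or the overall L2 shift is large."""
    return any(p < alpha for p in p_values) or l2_distance > l2_threshold

# Using the example output earlier in this README: x1-x3 all have p ~ 0, L2 distance 1.71
print(flag_drift([0.776, 0.000, 0.000, 0.000], 1.71))  # True
print(flag_drift([0.776, 0.412, 0.230, 0.551], 0.20))  # False
```

In practice, tune `alpha` and `l2_threshold` to your tolerance for false alarms versus missed drift.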


When to Use

  • Monitor deployed regression or classification models
  • Detect concept drift when retraining is expensive
  • Quantify how much the feature-target relationship has changed
  • Build interpretability into drift detection pipelines


Limitations

  • OLS assumes a linear relationship, which may not match nonlinear models
  • Requires the same feature dimensionality (X_train.shape[1] == X_prod.shape[1])
  • Sensitive to feature scaling (consider standardizing features)
  • Works best as a proxy detector, not as a perfect substitute for full statistical drift tests
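The scaling sensitivity noted above can be mitigated by standardizing both datasets with statistics from the training set, so that coefficients remain on comparable scales. A sketch (`standardize_pair` is an illustrative helper, not part of the package):

```python
import numpy as np

def standardize_pair(X_train, X_prod):
    """Scale BOTH datasets with the TRAINING mean/std so coefficients stay comparable."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mu) / sigma, (X_prod - mu) / sigma

rng = np.random.default_rng(3)
X_tr = rng.standard_normal((100, 2)) * np.array([10.0, 0.1])  # wildly different feature scales
X_pr = rng.standard_normal((100, 2)) * np.array([10.0, 0.1])
Z_tr, Z_pr = standardize_pair(X_tr, X_pr)
print(np.round(Z_tr.std(axis=0), 2))  # → [1. 1.]
```

Reusing the training statistics for production data matters: refitting the scaler on production data would silently absorb part of the covariate shift you are trying to detect.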


License

MIT License © 2025. Developed for the open-source data science community.
