
A small toolbox for mlops


TinyShift


**TinyShift** is a lightweight, sklearn-compatible Python library designed for **data drift detection**, **outlier identification**, and **MLOps monitoring** in production machine learning systems. The library provides modular, easy-to-use tools for detecting when data distributions or model performance change over time, with comprehensive visualization capabilities.

For enterprise-grade solutions, consider NannyML.

Features

  • Data Drift Detection: Categorical and continuous data drift monitoring with multiple distance metrics
  • Outlier Detection: HBOS, PCA-based and SPAD outlier detection algorithms
  • Time Series Analysis: Seasonality decomposition, trend analysis, and forecasting diagnostics

Technologies Used

  • Python 3.10+
  • Scikit-learn 1.3.0+
  • Pandas 2.3.0+
  • NumPy
  • SciPy
  • Statsmodels 0.14.5+
  • Plotly 5.22.0+ (optional, for plotting)

📦 Installation

Install TinyShift using pip:

pip install tinyshift

Development Installation

Clone and install from source:

git clone https://github.com/HeyLucasLeao/tinyshift.git
cd tinyshift
pip install -e .

📖 Quick Start

1. Categorical Data Drift Detection

TinyShift provides sklearn-compatible drift detectors that follow the familiar fit() and score() pattern:

import pandas as pd
from tinyshift.drift import CatDrift

# Load your data
df = pd.read_csv("data.csv", parse_dates=["date"])  # parse dates so time-based filtering and grouping work
reference_data = df[df["date"] < "2024-07-01"]
analysis_data = df[df["date"] >= "2024-07-01"]

# Initialize and fit the drift detector
detector = CatDrift(
    freq="D",                    # Daily frequency
    func="chebyshev",           # Distance metric
    drift_limit="auto",         # Automatic threshold detection
    method="expanding"          # Comparison method
)

# Fit on reference data
detector.fit(reference_data)

# Score new data for drift
drift_scores = detector.score(analysis_data)
print(drift_scores)
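TinyShift's internals are not shown here, so as a rough mental model only: the sketch below assumes that `freq="D"` buckets rows by calendar day and that `method="expanding"` compares each day with all earlier days pooled together. `expanding_chebyshev`, `date_col`, and `cat_col` are hypothetical names for illustration, not part of the CatDrift API.

```python
import pandas as pd


def expanding_chebyshev(df: pd.DataFrame, date_col: str, cat_col: str, freq: str = "D") -> pd.Series:
    """Chebyshev distance between each period's category mix and all earlier periods."""
    # Category counts per period (rows: periods, columns: categories)
    counts = pd.crosstab(df[date_col].dt.to_period(freq), df[cat_col])
    # Expanding reference: cumulative counts of every period strictly before the current one
    reference = counts.cumsum().shift(1)
    ref_share = reference.div(reference.sum(axis=1), axis=0)
    cur_share = counts.div(counts.sum(axis=1), axis=0)
    # Largest absolute gap in any category's share; the first period has no reference, so drop it
    return (cur_share - ref_share).abs().max(axis=1).iloc[1:]


df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01"] * 4 + ["2024-01-02"] * 4 + ["2024-01-03"] * 4),
    "cat": ["a", "a", "b", "b", "a", "a", "b", "b", "a", "a", "a", "a"],
})
print(expanding_chebyshev(df, "date", "cat"))  # 0.0 for 2024-01-02, 0.5 for 2024-01-03
```

On the third day every row is category "a", so its share jumps from 0.5 (pooled history) to 1.0, giving a Chebyshev distance of 0.5.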

Available distance metrics for categorical data:

  • "chebyshev": Maximum absolute difference between distributions
  • "jensenshannon": Jensen-Shannon divergence
  • "psi": Population Stability Index
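For reference, the three metrics can be written in a few lines of NumPy. This is an illustrative sketch, not TinyShift's implementation: the base-2 logarithm in the Jensen-Shannon function and the `eps` floor in `psi` are assumptions, and SciPy's `scipy.spatial.distance.jensenshannon` returns the square root of the divergence computed here.

```python
import numpy as np


def _to_dist(counts) -> np.ndarray:
    # Accept raw counts or proportions; normalize so the entries sum to 1
    p = np.asarray(counts, dtype=float)
    return p / p.sum()


def chebyshev(p, q) -> float:
    # Maximum absolute difference between category shares
    return float(np.max(np.abs(_to_dist(p) - _to_dist(q))))


def jensen_shannon_divergence(p, q) -> float:
    # JS divergence in bits (base-2 log), bounded in [0, 1]
    p, q = _to_dist(p), _to_dist(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


def psi(expected, actual, eps: float = 1e-6) -> float:
    # Population Stability Index: sum((a - e) * ln(a / e)); eps keeps empty categories finite
    e = np.clip(_to_dist(expected), eps, None)
    a = np.clip(_to_dist(actual), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))


reference = [50, 30, 20]  # category counts in the reference window
current = [30, 30, 40]    # category counts in a new window
print(chebyshev(reference, current), jensen_shannon_divergence(reference, current), psi(reference, current))
```

Chebyshev reacts only to the single largest shift, while Jensen-Shannon and PSI accumulate changes across all categories; PSI also grows quickly when a category that was rare in the reference becomes common.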

2. Continuous Data Drift Detection

For numerical features, use the continuous drift detector:

from tinyshift.drift import ConDrift

# Initialize continuous drift detector
detector = ConDrift(
    freq="W",                   # Weekly frequency  
    func="ws",                  # Wasserstein distance
    drift_limit="auto",
    method="expanding"
)

# Fit and score
detector.fit(reference_data)
drift_scores = detector.score(analysis_data)
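The Wasserstein (earth mover's) distance between two 1-D samples is available in SciPy. The sketch below compares each weekly slice of the analysis data against a fixed reference sample; `weekly_wasserstein` and the column names are hypothetical, and it does not reproduce ConDrift's `expanding` method or its `"auto"` threshold.

```python
import numpy as np
import pandas as pd
from scipy.stats import wasserstein_distance


def weekly_wasserstein(reference: pd.Series, analysis: pd.DataFrame, date_col: str, value_col: str) -> pd.Series:
    """Wasserstein distance between each weekly slice of `analysis` and a fixed reference sample."""
    ref = reference.to_numpy(dtype=float)
    scores = {}
    # freq="W" groups into weeks ending on Sunday (pandas' W-SUN alias)
    for week_end, values in analysis.groupby(pd.Grouper(key=date_col, freq="W"))[value_col]:
        if len(values):  # skip weeks with no observations
            scores[week_end] = wasserstein_distance(ref, values.to_numpy(dtype=float))
    return pd.Series(scores, name="wasserstein")


reference = pd.Series(np.arange(100, dtype=float))
analysis = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02"] * 100 + ["2024-01-09"] * 100),
    "value": np.concatenate([np.arange(100.0), np.arange(100.0) + 5]),  # second week shifted by +5
})
print(weekly_wasserstein(reference, analysis, "date", "value"))
```

Because the Wasserstein distance is expressed in the feature's own units, the week shifted by +5 scores 5, which makes thresholds easier to reason about than for unitless metrics.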

3. Outlier Detection

TinyShift includes sklearn-compatible outlier detection algorithms:

from tinyshift.outlier import SPAD, HBOS, PCAReconstructionError

# SPAD (Simple Probabilistic Anomaly Detector)
spad = SPAD(plus=True)
spad.fit(X_train)

outlier_scores = spad.decision_function(X_test)
outlier_labels = spad.predict(X_test)

# HBOS (Histogram-Based Outlier Score)
hbos = HBOS(dynamic_bins=True)
hbos.fit(X_train, nbins="fd")
scores = hbos.decision_function(X_test)

# PCA-based outlier detection
pca_detector = PCAReconstructionError()
pca_detector.fit(X_train)
pca_scores = pca_detector.decision_function(X_test)
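To make the scores concrete, here is a plain-NumPy sketch of the ideas behind two of these detectors: an HBOS-style score (sum over features of the negative log histogram density) and a PCA reconstruction error. The bin count, the zero density assigned to values outside the training range, and the function names are assumptions; TinyShift's classes may bin and normalize differently (for example via `nbins="fd"`).

```python
import numpy as np


def hbos_scores(X_train, X_test, n_bins: int = 10) -> np.ndarray:
    """HBOS-style score: sum over features of -log(histogram density). Higher = more anomalous."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    scores = np.zeros(len(X_test))
    for j in range(X_train.shape[1]):
        density, edges = np.histogram(X_train[:, j], bins=n_bins, density=True)
        x = X_test[:, j]
        # Locate each test value's bin; the right edge is folded into the last bin
        idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
        d = density[idx]
        # Values outside the training range get zero density
        d = np.where((x < edges[0]) | (x > edges[-1]), 0.0, d)
        # Floor the density so empty bins yield a large but finite score
        scores += -np.log(np.maximum(d, 1e-12))
    return scores


def pca_reconstruction_error(X_train, X_test, n_components: int = 1) -> np.ndarray:
    """Squared error after projecting onto the top principal components of X_train."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal axes of the centered training data
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    components = vt[:n_components]
    X_hat = (X_test - mean) @ components.T @ components + mean
    return np.sum((X_test - X_hat) ** 2, axis=1)


rng = np.random.default_rng(0)
t = rng.normal(size=500)
X_train = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=500)])
X_test = np.array([[1.0, 2.0], [2.0, -4.0]])  # on-trend point vs. off-trend point
print(hbos_scores(X_train, X_test), pca_reconstruction_error(X_train, X_test))
```

HBOS treats features independently, so it flags values that are rare per feature; the PCA error instead flags points that break the correlation structure, such as `[2.0, -4.0]` here.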

4. Time Series Analysis and Diagnostics

TinyShift provides time series decomposition, trend testing, and stationarity diagnostics:

from tinyshift.plot import seasonal_decompose
from tinyshift.series import trend_significance, permutation_auto_mutual_information

# Seasonal decomposition with multiple periods
seasonal_decompose(
    time_series, 
    periods=[7, 365],  # Weekly and yearly patterns
    width=1200, 
    height=800
)

# Test for significant trends
trend_result = trend_significance(time_series, alpha=0.05)
print(f"Significant trend: {trend_result}")

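# Stationarity analysis (assumes stationarity_analysis is exported by tinyshift.plot)
from tinyshift.plot import stationarity_analysis
fig = stationarity_analysis(time_series)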
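The statistical test behind `trend_significance` is not specified above. As one common choice, the sketch below implements the Mann-Kendall trend test with the normal approximation and no tie correction; `mann_kendall` is a hypothetical helper, not a TinyShift function.

```python
import math

import numpy as np


def mann_kendall(x, alpha: float = 0.05) -> dict:
    """Mann-Kendall trend test (normal approximation, no correction for ties)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # S = (# increasing pairs) - (# decreasing pairs) over all i < j
    i, j = np.triu_indices(n, k=1)
    s = float(np.sign(x[j] - x[i]).sum())
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected z statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"s": s, "z": z, "p_value": p_value, "significant": p_value < alpha}


print(mann_kendall(np.arange(30) + np.sin(np.arange(30))))
```

Mann-Kendall is rank-based, so it detects monotonic trends without assuming linearity or normally distributed residuals, unlike a t-test on an OLS slope.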

5. Advanced Modeling Tools

import numpy as np
from tinyshift.modelling import filter_features_by_vif
from tinyshift.stats import bootstrap_bca_interval

# Detect multicollinearity
mask = filter_features_by_vif(X, threshold=5, verbose=True)
selected_columns = X.columns[mask]  # features that pass the VIF filter

# Bootstrap confidence intervals
confidence_interval = bootstrap_bca_interval(
    data, 
    statistic=np.mean, 
    alpha=0.05, 
    n_bootstrap=1000
)
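Both utilities rest on standard formulas. The sketch below computes the variance inflation factor VIF_j = 1 / (1 - R_j²) by regressing each column on the others, drops the worst column until all fall under the threshold, and then builds a bias-corrected and accelerated (BCa) bootstrap interval. The function names, defaults, and the `seed` argument are assumptions, not the signatures of `filter_features_by_vif` or `bootstrap_bca_interval`.

```python
from statistics import NormalDist

import numpy as np


def vif(X) -> np.ndarray:
    """Variance inflation factor of each column: 1 / (1 - R^2) from regressing it on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + other columns
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def vif_mask(X, threshold: float = 5.0) -> np.ndarray:
    """Boolean mask of columns kept after repeatedly dropping the highest-VIF column."""
    X = np.asarray(X, dtype=float)
    keep = np.ones(X.shape[1], dtype=bool)
    while keep.sum() > 1:
        scores = vif(X[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        keep[np.flatnonzero(keep)[worst]] = False
    return keep


def bca_interval(data, statistic=np.mean, alpha=0.05, n_bootstrap=2000, seed=0):
    """Bias-corrected and accelerated (BCa) bootstrap confidence interval."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    rng = np.random.default_rng(seed)
    norm = NormalDist()
    theta_hat = statistic(data)
    # Bootstrap replicates: resample with replacement and recompute the statistic
    boot = np.array([statistic(data[idx]) for idx in rng.integers(0, n, size=(n_bootstrap, n))])
    # Bias correction z0 from the share of replicates below the estimate (kept away from 0 and 1)
    share = np.clip(np.mean(boot < theta_hat), 1 / (n_bootstrap + 1), n_bootstrap / (n_bootstrap + 1))
    z0 = norm.inv_cdf(float(share))
    # Acceleration a from leave-one-out (jackknife) estimates
    jack = np.array([statistic(np.delete(data, i)) for i in range(n)])
    d = jack.mean() - jack
    denom = 6.0 * float(d @ d) ** 1.5
    a = float(np.sum(d ** 3)) / denom if denom > 0 else 0.0
    # Adjusted percentile levels for the lower and upper tails
    levels = []
    for z_alpha in (norm.inv_cdf(alpha / 2), norm.inv_cdf(1 - alpha / 2)):
        levels.append(norm.cdf(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha))))
    low, high = np.quantile(boot, levels)
    return float(low), float(high)


rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 200))
X_demo = np.column_stack([x1, x2, x1 + x2 + rng.normal(scale=0.01, size=200)])
print(vif_mask(X_demo), bca_interval(rng.normal(loc=3.0, size=100)))
```

Dropping one column at a time matters: removing the single most collinear feature often brings the remaining VIFs back under the threshold, so a one-shot filter would discard more features than necessary.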

📁 Project Structure

tinyshift/
├── association_mining/         # Market basket analysis tools
│   ├── analyzer.py             # Transaction pattern analysis
│   └── encoder.py              # Data encoder
├── drift/                      # Data drift detection
│   ├── base.py                 # Base drift detection classes
│   ├── categorical.py          # CatDrift for categorical features
│   └── continuous.py           # ConDrift for numerical features
├── examples/                   # Jupyter notebook examples
│   ├── drift.ipynb             # Drift detection examples
│   ├── outlier.ipynb           # Outlier detection demos
│   ├── series.ipynb            # Time series analysis
│   └── transaction_analyzer.ipynb
├── modelling/                  # ML modeling utilities
│   ├── multicollinearity.py    # VIF-based multicollinearity detection
│   ├── residualizer.py         # Residualizer Feature
│   └── scaler.py               # Custom scaling transformations
├── outlier/                    # Outlier detection algorithms
│   ├── base.py                 # Base outlier detection classes
│   ├── hbos.py                 # Histogram-Based Outlier Score
│   ├── pca.py                  # PCA-based outlier detection
│   └── spad.py                 # Simple Probabilistic Anomaly Detector
├── plot/                       # Visualization capabilities
│   ├── correlation.py          # Correlation analysis plots
│   └── diagnostic.py           # Time series diagnostics plots
├── series/                     # Time series analysis tools
│   ├── forecastability.py      # Forecast quality metrics
│   ├── outlier.py              # Time series outlier detection
│   └── stats.py                # Statistical analysis functions
└── stats/                      # Statistical utilities
    ├── bootstrap_bca.py        # Bootstrap confidence intervals
    ├── statistical_interval.py # Statistical interval estimation
    └── utils.py                # General statistical utilities
tinyshift
├── LICENSE
├── README.md
├── poetry.lock
├── pyproject.toml
├── tinyshift
│   ├── association_mining
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── analyzer.py
│   │   └── encoder.py
│   ├── examples
│   │   ├── outlier.ipynb
│   │   ├── tracker.ipynb
│   │   └── transaction_analyzer.ipynb
│   ├── modelling
│   │   ├── __init__.py
│   │   ├── multicollinearity.py
│   │   ├── residualizer.py
│   │   └── scaler.py
│   ├── outlier
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── hbos.py
│   │   ├── pca.py
│   │   └── spad.py
│   ├── plot
│   │   ├── __init__.py
│   │   ├── correlation.py
│   │   └── plot.py
│   ├── series
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── forecastability.py
│   │   ├── outlier.py
│   │   └── stats.py
│   ├── stats
│   │   ├── __init__.py
│   │   ├── bootstrap_bca.py
│   │   ├── series.py
│   │   ├── statistical_interval.py
│   │   └── utils.py
│   ├── tests
│   │   ├── test.pca.py
│   │   ├── test_hbos.py
│   │   └── test_spad.py
│   └── drift
│       ├── __init__.py
│       ├── base.py
│       ├── categorical.py
│       ├── continuous.py

Development Setup

git clone https://github.com/HeyLucasLeao/tinyshift.git
cd tinyshift
pip install -e ".[all]"

📋 Requirements

  • Python: 3.10+
  • Core Dependencies:
    • pandas (>2.3.0)
    • scikit-learn (>1.3.0)
    • statsmodels (>=0.14.5)
  • Optional Dependencies:
    • plotly (>5.22.0) - for visualization
    • kaleido (<=0.2.1) - for static plot export
    • nbformat (>=5.10.4) - for notebook support

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments



Download files


Source Distribution

tinyshift-1.1.0.tar.gz (51.2 kB)

Uploaded Source

Built Distribution


tinyshift-1.1.0-py3-none-any.whl (66.1 kB)

Uploaded Python 3

File details

Details for the file tinyshift-1.1.0.tar.gz.

File metadata

  • Download URL: tinyshift-1.1.0.tar.gz
  • Upload date:
  • Size: 51.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tinyshift-1.1.0.tar.gz
Algorithm Hash digest
SHA256 62092a6800234e941bad146e44599c1157fb6bac05c6237f8170753a8175cf68
MD5 a2989c060c2888f3c11a4acfbfbc7445
BLAKE2b-256 312e85daf7823806a35e330b9ee40bdda02790c7ba9ee778527d7ba42ca028fb


File details

Details for the file tinyshift-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: tinyshift-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 66.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for tinyshift-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3d2f845e6bbe7a1d783be5ed84c3f2d8b5d231b416a34461789cec5edabd5310
MD5 5b75504ec0bfaaa7256819073a8bc917
BLAKE2b-256 7795bfd4c0facb0ff450748d5985ff023f49f7223e8c3341cd62f283b17d7c88

