A small toolbox for MLOps
TinyShift
TinyShift is a lightweight, sklearn-compatible Python library designed for data drift detection, outlier identification, and MLOps monitoring in production machine learning systems. The library provides modular, easy-to-use tools for detecting when data distributions or model performance change over time, with comprehensive visualization capabilities.
For enterprise-grade solutions, consider NannyML.
Features
- Data Drift Detection: Categorical and continuous data drift monitoring with multiple distance metrics
- Outlier Detection: HBOS, PCA-based and SPAD outlier detection algorithms
- Time Series Analysis: Seasonality decomposition, trend analysis, forecasting diagnostics, and forecast stabilization
- Forecast Stability: Metrics and interpolation methods for stable forecasting
Technologies Used
- Python 3.10+
- Scikit-learn 1.3.0+
- Pandas 2.3.0+
- NumPy
- SciPy
- Statsmodels 0.14.5+
- Plotly 5.22.0+ (optional, for plotting)
Installation
Install TinyShift using pip:
pip install tinyshift
Development Installation
Clone and install from source:
git clone https://github.com/HeyLucasLeao/tinyshift.git
cd tinyshift
pip install -e .
Quick Start
1. Categorical Data Drift Detection
TinyShift provides sklearn-compatible drift detectors that follow the familiar fit() and predict() pattern:
import pandas as pd
from tinyshift.drift import CatDrift
# Load your data
df = pd.read_csv("data.csv")
reference_data = df[df["date"] < '2024-07-01']
analysis_data = df[df["date"] >= '2024-07-01']
# Initialize and fit the drift detector
detector = CatDrift(
    freq="D",            # Daily frequency
    func="chebyshev",    # Distance metric
    drift_limit="auto",  # Automatic threshold detection
    method="expanding",  # Comparison method
)
# Fit on reference data
detector.fit(reference_data)
# Score new data for drift
drift_scores = detector.predict(analysis_data)
print(drift_scores)
Available distance metrics for categorical data:
- "chebyshev": Maximum absolute difference between distributions
- "jensenshannon": Jensen-Shannon divergence
- "psi": Population Stability Index
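To make the three metrics concrete, here is a minimal standard-library sketch that computes them for two small categorical samples. It is not TinyShift's implementation: the helper names (`proportions`, `kl_divergence`, `jensen_shannon_divergence`, `psi`), the `eps` guard, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def proportions(values, categories):
    """Relative frequency of each category, in a fixed category order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def chebyshev(p, q):
    """Largest absolute difference between two probability vectors."""
    return max(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p == 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2), bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def psi(p, q, eps=1e-6):
    """Population Stability Index; eps (an assumed guard) avoids log(0)."""
    return sum(
        (max(b, eps) - max(a, eps)) * math.log(max(b, eps) / max(a, eps))
        for a, b in zip(p, q)
    )


# Toy reference and analysis samples over the same three categories
reference = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
analysis = ["a"] * 30 + ["b"] * 30 + ["c"] * 40
categories = ["a", "b", "c"]

p = proportions(reference, categories)
q = proportions(analysis, categories)
print(chebyshev(p, q), jensen_shannon_divergence(p, q), psi(p, q))
```

All three return 0 for identical distributions and grow as the category shares move apart, but only the Chebyshev and Jensen-Shannon values are bounded, which matters when choosing a fixed drift limit.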
2. Continuous Data Drift Detection
For numerical features, use the continuous drift detector:
from tinyshift.drift import ConDrift
# Initialize continuous drift detector
detector = ConDrift(
    freq="W",            # Weekly frequency
    func="ws",           # Wasserstein distance
    drift_limit="auto",
    method="expanding",
)
# Fit and score
detector.fit(reference_data)
drift_predicts = detector.predict(analysis_data)
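For intuition about the Wasserstein distance selected by func="ws", the sketch below computes the one-dimensional, first-order distance between two samples of equal size, where it reduces to the mean absolute difference of the sorted values. The function name `wasserstein_1d` and the equal-size restriction are assumptions of this illustration, not a description of how ConDrift computes it.

```python
def wasserstein_1d(x, y):
    """First-order Wasserstein distance between two equal-size 1-D samples."""
    if len(x) != len(y):
        raise ValueError("this sketch only handles samples of equal size")
    xs, ys = sorted(x), sorted(y)
    # With equal sizes, the optimal transport plan pairs the i-th smallest values.
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


reference_week = [1.0, 2.0, 3.0, 4.0]
shifted_week = [2.0, 3.0, 4.0, 5.0]  # same shape, shifted by one unit

print(wasserstein_1d(reference_week, shifted_week))
```

A pure location shift of one unit yields a distance of one unit, so the score is expressed in the feature's own scale.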
3. Outlier Detection
TinyShift includes sklearn-compatible outlier detection algorithms:
from tinyshift.outlier import SPAD, HBOS, PCAReconstructionError
# SPAD (Simple Probabilistic Anomaly Detector)
spad = SPAD(plus=True)
spad.fit(X_train)
outlier_scores = spad.decision_function(X_test)
outlier_labels = spad.predict(X_test)
# HBOS (Histogram-Based Outlier Score)
hbos = HBOS(dynamic_bins=True)
hbos.fit(X_train, nbins="fd")
scores = hbos.predict(X_test)
# PCA-based outlier detection
pca_detector = PCAReconstructionError()
pca_detector.fit(X_train)
pca_scores = pca_detector.predict(X_test)
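The idea behind a histogram-based outlier score can be sketched in a few lines of standard-library Python: build one histogram per feature, then sum the log of the inverse bin frequency for each value in a row. This is a simplified illustration, not the HBOS class shipped with TinyShift; the function names, equal-width bins, and the `eps` smoothing constant are all assumptions.

```python
import math
import random


def fit_histogram(values, n_bins=10):
    """Equal-width histogram of one feature, stored as relative frequencies."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # maximum goes in the last bin
        counts[idx] += 1
    return lo, hi, width, [c / len(values) for c in counts]


def hbos_score(row, histograms, eps=1e-9):
    """Sum of log inverse bin frequencies; higher means more anomalous."""
    score = 0.0
    for value, (lo, hi, width, freqs) in zip(row, histograms):
        if lo <= value <= hi:
            idx = min(int((value - lo) / width), len(freqs) - 1)
            freq = freqs[idx]
        else:
            freq = 0.0  # value lies outside the range seen during fitting
        score += math.log(1.0 / (freq + eps))
    return score


# Synthetic training data: two independent Gaussian features
rng = random.Random(0)
X_train = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(500)]
histograms = [fit_histogram(column) for column in zip(*X_train)]

inlier_score = hbos_score([0.0, 5.0], histograms)
outlier_score = hbos_score([6.0, 20.0], histograms)
print(inlier_score, outlier_score)
```

Because each feature is scored independently, the method is fast but blind to outliers that are only unusual in combination, which is one reason to compare it against the PCA-based detector.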
4. Time Series Analysis and Diagnostics
TinyShift provides comprehensive time series analysis capabilities:
from tinyshift.plot import seasonal_decompose
from tinyshift.series import (
    trend_significance,
    foreca,
    sample_entropy,
    permutation_entropy,
    theoretical_limit,
    hurst_exponent,
    hampel_filter,
    bollinger_bands,
)
seasonal_decompose(
    time_series,
    periods=[7, 365],  # Weekly and yearly patterns
    width=1200,
    height=800,
)
# Test for significant trends
r_squared, p_value = trend_significance(time_series)
# Assess forecastability
forecastability = foreca(time_series)
print(f"Forecastability (Omega): {forecastability}")
# Measure complexity and regularity
complexity = sample_entropy(time_series, m=2, tolerance=0.2)
print(f"Sample Entropy: {complexity}")
# Measure ordinal complexity
perm_entropy = permutation_entropy(time_series, m=3, delay=1, normalize=True)
print(f"Permutation Entropy: {perm_entropy}")
# Calculate theoretical predictability limit
theo_limit = theoretical_limit(time_series, m=3, delay=1)
print(f"Theoretical Limit (Πmax): {theo_limit}")
# Detect long-term memory
hurst, p_value = hurst_exponent(time_series)
print(f"Hurst Exponent: {hurst}, P-value: {p_value}")
# Outlier detection in time series
hampel_outliers = hampel_filter(time_series, window_size=5)
bollinger_outliers = bollinger_bands(time_series, window_size=20)
# Plot lag analysis with PAMI (Permutation Auto-Mutual Information)
from tinyshift.plot import pami
pami(time_series, nlags=20, m=3, delay=1, normalize=False)
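As a rough guide to what a permutation entropy measures, the sketch below counts the ordinal patterns of length m in a series and takes the Shannon entropy of their frequencies. It is an independent illustration, not the library's function: the name `permutation_entropy_sketch` is hypothetical, and the parameter names m, delay, and normalize are borrowed from the call shown above only for readability.

```python
import math
from collections import Counter


def permutation_entropy_sketch(series, m=3, delay=1, normalize=True):
    """Shannon entropy (bits) of the ordinal patterns of length m in a series."""
    n_patterns = len(series) - (m - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series is too short for the chosen m and delay")
    patterns = Counter()
    for i in range(n_patterns):
        window = [series[i + j * delay] for j in range(m)]
        # The pattern is the order of indices that sorts the window (ties keep input order).
        pattern = tuple(sorted(range(m), key=window.__getitem__))
        patterns[pattern] += 1
    entropy = -sum(
        (count / n_patterns) * math.log2(count / n_patterns)
        for count in patterns.values()
    )
    if normalize:
        entropy /= math.log2(math.factorial(m))  # maximum entropy over m! patterns
    return entropy


trend = list(range(20))            # strictly increasing: a single ordinal pattern
mixed = [4, 7, 9, 10, 6, 11, 3]    # four rises and two falls when m=2

print(permutation_entropy_sketch(trend))
print(permutation_entropy_sketch(mixed, m=2))
```

A normalized value near 0 indicates a highly regular series, while a value near 1 indicates that all ordinal patterns are about equally likely, as in white noise.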
5. Forecast Stability and Interpolation
TinyShift includes forecast stability metrics and interpolation methods:
from tinyshift.series import (
    macv, mach,      # Mean Absolute Change metrics
    mascv, masch,    # Mean Absolute Scaled Change metrics
    rmsscv, rmssch,  # Root Mean Squared Scaled Change metrics
    vi, hpi, hfi,    # Interpolation methods
)
# Calculate forecast stability metrics
vertical_stability = macv(y_hat, y_hat_t_minus_1)
horizontal_stability = mach(y_hat)
# Scaled stability metrics
scaled_v_stability = mascv(y_train, y_hat, y_hat_t_minus_1, seasonality=12)
scaled_h_stability = masch(y_train, y_hat, seasonality=12)
# Apply forecast stabilization techniques
# Vertical Interpolation
stable_forecast = vi(y_hat, anchor, w_s=0.3)
# Horizontal Partial Interpolation
smooth_forecast = hpi(y_hat, w_s=0.4)
# Horizontal Full Interpolation
fully_stable_forecast = hfi(y_hat, w_s=0.5)
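The sketch below shows one common way to define the unscaled stability metrics and a weighted interpolation toward an anchor forecast. It is an assumption-laden illustration rather than TinyShift's code: the function names are hypothetical, both forecasts are assumed to be aligned on the same target periods, and the interpolation formula `w_s * anchor + (1 - w_s) * forecast` is assumed, not taken from the library.

```python
def mean_abs_change_vertical(y_hat, y_hat_prev):
    """Mean absolute revision between forecasts of the same periods from two origins."""
    return sum(abs(a - b) for a, b in zip(y_hat, y_hat_prev)) / len(y_hat)


def mean_abs_change_horizontal(y_hat):
    """Mean absolute step between consecutive horizons of a single forecast."""
    return sum(abs(b - a) for a, b in zip(y_hat, y_hat[1:])) / (len(y_hat) - 1)


def interpolate_toward(y_hat, anchor, w_s):
    """Pull each forecast value toward the anchor with weight w_s (assumed formula)."""
    return [w_s * a + (1 - w_s) * f for f, a in zip(y_hat, anchor)]


y_hat = [10.0, 12.0, 14.0]       # forecast made at the current origin
y_hat_prev = [11.0, 11.0, 15.0]  # forecast for the same periods from the previous origin

before = mean_abs_change_vertical(y_hat, y_hat_prev)
stabilized = interpolate_toward(y_hat, y_hat_prev, w_s=0.5)
after = mean_abs_change_vertical(stabilized, y_hat_prev)
print(before, after, mean_abs_change_horizontal(y_hat))
```

Under these assumptions, a larger w_s trades responsiveness to new information for smaller revisions between consecutive forecast origins.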
6. Advanced Modeling Tools
import numpy as np
# FeatureResidualizer import path is assumed from the modelling/ package layout
from tinyshift.modelling import FeatureResidualizer, filter_features_by_vif
from tinyshift.stats import bootstrap_bca_interval
# Residualizer: fit on the training data only
residualizer = FeatureResidualizer()
residualizer.fit(X_train[preprocess_columns], corrcoef=0.70)
# Train
X_train = X_train.astype({x: float for x in preprocess_columns})
X_train.loc[:, preprocess_columns] = residualizer.transform(X_train[preprocess_columns])
# Test: transform before dropping columns so preprocess_columns are still present
X_test = X_test.astype({x: float for x in preprocess_columns})
X_test.loc[:, preprocess_columns] = residualizer.transform(X_test[preprocess_columns])
# Detect multicollinearity (check the keyword spelling against the installed signature)
# mask is assumed to be a boolean array with one entry per column
mask = filter_features_by_vif(X_train, threshold=5, verbose=True)
X_train = X_train.loc[:, mask]
X_test = X_test.loc[:, mask]
# Bootstrap confidence intervals
confidence_interval = bootstrap_bca_interval(
    data,
    statistic=np.mean,
    alpha=0.05,
    n_bootstrap=1000,
)
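For readers new to bootstrap intervals, the following sketch implements the simpler percentile bootstrap, which omits the bias and acceleration corrections that distinguish BCa. It is a swapped-in, simplified technique for illustration only; the function name `bootstrap_percentile_interval` and the `seed` parameter are hypothetical, while `alpha` and `n_bootstrap` mirror the call above.

```python
import random
import statistics


def bootstrap_percentile_interval(data, statistic=statistics.mean,
                                  alpha=0.05, n_bootstrap=1000, seed=0):
    """Percentile bootstrap interval (no BCa bias or acceleration correction)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and evaluate the statistic on each resample.
    estimates = sorted(
        statistic(rng.choices(data, k=n)) for _ in range(n_bootstrap)
    )
    lower_idx = max(int((alpha / 2) * n_bootstrap), 0)
    upper_idx = min(int((1 - alpha / 2) * n_bootstrap), n_bootstrap) - 1
    return estimates[lower_idx], estimates[upper_idx]


data = list(range(1, 101))  # sample mean is 50.5
lower, upper = bootstrap_percentile_interval(data)
print(lower, upper)
```

The percentile interval is adequate for roughly symmetric sampling distributions; BCa is generally preferred when the statistic is skewed or biased.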
Project Structure
tinyshift/
├── association_mining/              # Market basket analysis tools
│   ├── README.md                    # Module documentation
│   ├── analyzer.py                  # Transaction pattern analysis
│   └── encoder.py                   # Data encoder
├── drift/                           # Data drift detection
│   ├── README.md                    # Module documentation
│   ├── base.py                      # Base drift detection classes
│   ├── categorical.py               # CatDrift for categorical features
│   └── continuous.py                # ConDrift for numerical features
├── examples/                        # Jupyter notebook examples
│   ├── decomp_mstl_ml.ipynb         # MSTL decomposition and ML examples
│   ├── drift.ipynb                  # Drift detection examples
│   ├── outlier.ipynb                # Outlier detection demos
│   ├── series.ipynb                 # Time series analysis
│   ├── transaction_analyzer.ipynb   # Transaction analysis examples
│   └── ts_diagnostics.ipynb         # Time series diagnostics
├── modelling/                       # ML modeling utilities
│   ├── README.md                    # Module documentation
│   ├── multicollinearity.py         # VIF-based multicollinearity detection
│   ├── residualizer.py              # Feature residualizer
│   └── scaler.py                    # Custom scaling transformations
├── outlier/                         # Outlier detection algorithms
│   ├── README.md                    # Module documentation
│   ├── base.py                      # Base outlier detection classes
│   ├── hbos.py                      # Histogram-Based Outlier Score
│   ├── pca.py                       # PCA-based outlier detection
│   └── spad.py                      # Simple Probabilistic Anomaly Detector
├── plot/                            # Visualization capabilities
│   ├── README.md                    # Module documentation
│   ├── correlation.py               # Correlation analysis plots
│   └── diagnostic.py                # Time series diagnostics plots
├── series/                          # Time series analysis tools
│   ├── README.md                    # Module documentation
│   ├── forecastability.py           # Forecast quality and complexity metrics
│   ├── interpolation.py             # Forecast stabilization methods
│   ├── outlier.py                   # Time series outlier detection
│   ├── stability.py                 # Forecast stability metrics
│   └── stats.py                     # Statistical analysis functions
└── stats/                           # Statistical utilities
    ├── bootstrap_bca.py             # Bootstrap confidence intervals
    ├── statistical_interval.py      # Statistical interval estimation
    └── utils.py                     # General statistical utilities
Development Setup
git clone https://github.com/HeyLucasLeao/tinyshift.git
cd tinyshift
pip install -e ".[all]"
Requirements
- Python: 3.10+
- Core Dependencies:
  - pandas (>2.3.0)
  - scikit-learn (>1.3.0)
  - statsmodels (>=0.14.5)
- Optional Dependencies:
  - plotly (>5.22.0) - for visualization
  - kaleido (<=0.2.1) - for static plot export
  - nbformat (>=5.10.4) - for notebook support
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by NannyML