A small toolbox for mlops
TinyShift
TinyShift is a small experimental Python library for detecting data drift and performance drops in machine learning models over time. The main goal of the project is to provide quick, lightweight monitoring tools that help identify when data or model performance changes unexpectedly. For more robust solutions, I highly recommend NannyML.
Technologies Used
- Python 3.x
- Scikit-learn
- Pandas
- NumPy
- Plotly
- SciPy
Installation
To install TinyShift in your development environment, use pip:
pip install tinyshift
If you prefer to clone the repository and install manually:
git clone https://github.com/HeyLucasLeao/tinyshift.git
cd tinyshift
pip install .
Note: To enable plotting capabilities, install the optional extras with uv from inside the cloned repository:
uv sync --all-extras
Usage
Below are basic examples of how to use TinyShift's features.
1. Data Drift Detection
To detect data drift, score a new dataset against the reference data. The CategoricalDriftTracker calculates metrics that identify significant differences between the two.
import pandas as pd
from tinyshift.tracker import CategoricalDriftTracker
# Parse the timestamp column so the date comparisons below operate on datetimes
df = pd.read_csv("examples.csv", parse_dates=["datetime"])
df_reference = df[df["datetime"] < "2024-07-01"].copy()
df_analysis = df[df["datetime"] >= "2024-07-01"].copy()
# Weekly ("W") drift on column "discrete_1", with a MAD-based drift limit
detector = CategoricalDriftTracker(df_reference, "discrete_1", "datetime", "W", drift_limit="mad")
analysis_score = detector.score(df_analysis, "discrete_1", "datetime")
print(analysis_score)
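For intuition about what a per-period categorical drift score can look like, here is a minimal, self-contained sketch. It is not TinyShift's implementation: the choice of the Jensen-Shannon distance, the helper names `js_distance` and `weekly_drift`, the synthetic data, and the fixed limit of 0.1 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two count or probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms where a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    # Guard against tiny negative values caused by floating-point rounding
    return float(np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0)))


def weekly_drift(df, reference, column, timestamp, freq="W"):
    """Distance between each period's category mix and the reference mix."""
    # Categories unseen in the reference are ignored in this sketch
    categories = sorted(reference[column].unique())
    ref_counts = reference[column].value_counts().reindex(categories, fill_value=0)
    scores = {}
    for period, values in df.groupby(pd.Grouper(key=timestamp, freq=freq))[column]:
        if values.empty:
            continue
        counts = values.value_counts().reindex(categories, fill_value=0)
        scores[period] = js_distance(counts.to_numpy(), ref_counts.to_numpy())
    return pd.Series(scores, name="js_distance")


# Synthetic data: ten rows per day, with the category mix changing in the last week
stable = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
shifted = ["a"] * 2 + ["b"] * 3 + ["c"] * 5
df_reference = pd.DataFrame({
    "datetime": pd.date_range("2024-06-03", periods=28, freq="D").repeat(10),
    "discrete_1": stable * 28,
})
df_analysis = pd.DataFrame({
    "datetime": pd.date_range("2024-07-01", periods=14, freq="D").repeat(10),
    "discrete_1": stable * 7 + shifted * 7,
})

scores = weekly_drift(df_analysis, df_reference, "discrete_1", "datetime")
drifted = scores > 0.1  # hypothetical fixed limit, chosen only for this example
print(pd.DataFrame({"js_distance": scores, "drifted": drifted}))
```

The first analysis week has the same category mix as the reference, so its distance is zero; the second week's mix differs and exceeds the limit.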
2. Performance Tracker
To track model performance over time, use the PerformanceTracker, which compares model performance on the reference (old) data with performance on new data.
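import joblib
import pandas as pd
from tinyshift.tracker import PerformanceTracker
df_reference = pd.read_csv("reference.csv", parse_dates=["datetime"])
df_analysis = pd.read_csv("analysis.csv", parse_dates=["datetime"])
# Assumes the model was saved with joblib; swap in your own loader otherwise
model = joblib.load("model.pkl")
# scikit-learn estimators expect a 2D feature matrix, hence the double brackets
df_analysis["prediction"] = model.predict(df_analysis[["feature_0"]])
tracker = PerformanceTracker(df_reference, "target", "prediction", "datetime", "W")
analysis_score = tracker.score(df_analysis, "target", "prediction", "datetime")
print(analysis_score)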
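The idea of comparing per-period performance against a reference can be sketched with plain pandas. This is an illustration under stated assumptions, not TinyShift's internals: accuracy as the metric, the helper names `make_frame` and `weekly_accuracy`, the synthetic data, and the "mean minus three standard deviations" lower limit are all hypothetical choices.

```python
import pandas as pd


def make_frame(weeks):
    """Build ten rows per week; the first n_correct predictions match the target."""
    rows = []
    for start, n_correct in weeks:
        for i in range(10):
            rows.append({
                "datetime": pd.Timestamp(start),
                "target": 1,
                "prediction": 1 if i < n_correct else 0,
            })
    return pd.DataFrame(rows)


def weekly_accuracy(df, target, prediction, timestamp, freq="W"):
    """Share of correct predictions in each period."""
    correct = (df[target] == df[prediction]).astype(float)
    grouped = df.assign(_correct=correct).groupby(pd.Grouper(key=timestamp, freq=freq))
    return grouped["_correct"].mean().dropna()


df_reference = make_frame([
    ("2024-06-03", 9), ("2024-06-10", 8), ("2024-06-17", 9), ("2024-06-24", 8),
])
df_analysis = make_frame([("2024-07-01", 9), ("2024-07-08", 5)])

reference_scores = weekly_accuracy(df_reference, "target", "prediction", "datetime")
analysis_scores = weekly_accuracy(df_analysis, "target", "prediction", "datetime")

# Hypothetical lower limit derived from the spread of the reference scores
lower_limit = reference_scores.mean() - 3 * reference_scores.std()
flagged = analysis_scores < lower_limit
print(pd.DataFrame({"accuracy": analysis_scores, "flagged": flagged}))
```

Deriving the limit from the reference period means a week is only flagged when it falls well outside the variation the model already showed on known-good data.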
3. Visualization
TinyShift also provides graphs to visualize the magnitude of drift and performance changes over time.
tracker.plot.scatter(analysis_score, fig_type="png")
tracker.plot.bar(analysis_score, fig_type="png")
4. Outlier Detection
To detect outliers in your dataset, you can use the models provided by TinyShift. Currently, it offers the Histogram-Based Outlier Score (HBOS), Simple Probabilistic Anomaly Detector (SPAD), and SPAD+.
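import pandas as pd
from tinyshift.outlier import SPAD
df = pd.read_csv("data.csv")
# plus=True selects the SPAD+ variant
spad_plus = SPAD(plus=True)
spad_plus.fit(df)
# Continuous anomaly scores and the corresponding outlier labels
anomaly_scores = spad_plus.decision_function(df)
anomaly_pred = spad_plus.predict(df)
print(anomaly_scores)
print(anomaly_pred)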
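To illustrate the general idea behind a histogram-based outlier score, here is a small NumPy sketch. It follows the textbook formulation (sum of negative log densities per feature) and is not the code in TinyShift's `hbos.py`; the function name `hbos_scores`, the bin count, and the treatment of out-of-range values are assumptions.

```python
import numpy as np


def hbos_scores(X_train, X, n_bins=10, eps=1e-9):
    """Sum over features of -log(histogram density); higher means more unusual."""
    X_train = np.asarray(X_train, dtype=float)
    X = np.asarray(X, dtype=float)
    scores = np.zeros(X.shape[0])
    for j in range(X_train.shape[1]):
        density, edges = np.histogram(X_train[:, j], bins=n_bins, density=True)
        x = X[:, j]
        # Index of the bin each value falls into, clipped so the lookup stays valid
        idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
        # Values outside the training range get zero density (assumption of this sketch)
        in_range = (x >= edges[0]) & (x <= edges[-1])
        dens = np.where(in_range, density[idx], 0.0)
        scores += -np.log(dens + eps)
    return scores


rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])  # a typical point and an extreme one

scores = hbos_scores(X_train, X_new)
print(scores)
```

The extreme point lies outside every training bin, so it receives a much higher score than the point near the center of the data.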
5. Anomaly Tracker
The Anomaly Tracker in TinyShift allows you to identify potential outliers based on the drift limit and anomaly scores generated during training. By setting a drift limit, the tracker can flag data points that exceed this threshold as possible outliers.
import joblib
import pandas as pd
from tinyshift.tracker import AnomalyTracker
# Assumes a fitted outlier model (for example SPAD) was saved with joblib
model = joblib.load("model.pkl")
tracker = AnomalyTracker(model, drift_limit="mad")
df_analysis = pd.read_csv("analysis.csv")
outliers = tracker.score(df_analysis)
print(outliers)
In this example, the AnomalyTracker is initialized with a reference model and a specified drift limit. The score method evaluates the analysis dataset, calculating anomaly scores and flagging data points that exceed the drift limit as potential outliers.
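As a rough sketch of how a limit can be derived from training-time anomaly scores, the snippet below uses the median absolute deviation (MAD). How TinyShift interprets `drift_limit='mad'` internally is not documented here, so treat the function name `mad_upper_limit`, the multiplier `k=3.0`, and the sample scores as assumptions.

```python
import numpy as np


def mad_upper_limit(train_scores, k=3.0):
    """Upper limit: median + k * scaled MAD of the training scores."""
    train_scores = np.asarray(train_scores, dtype=float)
    median = np.median(train_scores)
    mad = np.median(np.abs(train_scores - median))
    # 1.4826 makes the MAD comparable to a standard deviation for normal data
    return median + k * 1.4826 * mad


# Hypothetical anomaly scores produced while fitting the reference model
train_scores = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8])
limit = mad_upper_limit(train_scores)

# Scores for new data points; anything above the limit is a potential outlier
new_scores = np.array([1.1, 2.5, 0.95])
flags = new_scores > limit
print(limit, flags)
```

Because the median and MAD are robust statistics, a few extreme training scores shift this limit far less than they would shift a mean-and-standard-deviation limit.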
Project Structure
The basic structure of the project is as follows:
tinyshift
├── LICENSE
├── README.md
├── poetry.lock
├── pyproject.toml
├── tinyshift
│ ├── association_mining
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── analyzer.py
│ │ └── encoder.py
│ ├── examples
│ │ ├── outlier.ipynb
│ │ ├── tracker.ipynb
│ │ └── transaction_analyzer.ipynb
│ ├── modelling
│ │ ├── __init__.py
│ │ ├── multicollinearity.py
│ │ ├── residualizer.py
│ │ └── scaler.py
│ ├── outlier
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── hbos.py
│ │ ├── pca.py
│ │ └── spad.py
│ ├── plot
│ │ ├── __init__.py
│ │ ├── correlation.py
│ │ └── plot.py
│ ├── stats
│ │ ├── __init__.py
│ │ ├── bootstrap_bca.py
│ │ ├── series.py
│ │ ├── statistical_interval.py
│ │ └── utils.py
│ ├── tests
│ │ ├── test.pca.py
│ │ ├── test_hbos.py
│ │ └── test_spad.py
│ └── tracker
│   ├── __init__.py
│   ├── anomaly.py
│   ├── base.py
│   ├── categorical.py
│   ├── continuous.py
│   └── performance.py
License
This project is licensed under the MIT License - see the LICENSE file for more details.