scikit-learn instrumentation tooling
Project description
Generalized instrumentation tooling for scikit-learn models. sklearn_instrumentation allows instrumenting the sklearn package and any scikit-learn compatible packages with estimators and transformers inheriting from sklearn.base.BaseEstimator.
Instrumentation applies decorators to methods of BaseEstimator-derived classes or instances. By default the instrumentor applies instrumentation to the following methods (except when they are properties of instances):
fit
predict
predict_log_proba
predict_proba
transform
_fit
_predict
_predict_log_proba
_predict_proba
_transform
sklearn-instrumentation supports instrumentation of full sklearn-compatible packages, as well as recursive instrumentation of models (metaestimators like Pipeline, or even single estimators like RandomForestClassifier)
Installation
The sklearn-instrumentation package is available on pypi and can be installed using pip
pip install sklearn-instrumentation
Package instrumentation
Instrument any sklearn compatible package that has BaseEstimator-derived classes.
from sklearn_instrumentation import SklearnInstrumentor
instrumentor = SklearnInstrumentor(instrument=my_instrument)
instrumentor.instrument_packages(["sklearn", "xgboost", "lightgbm"])
Full example:
import logging
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn_instrumentation import SklearnInstrumentor
from sklearn_instrumentation.instruments.logging import TimeElapsedLogger
logging.basicConfig(level=logging.INFO)
# Create an instrumentor and instrument sklearn
instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())
instrumentor.instrument_packages(["sklearn"])
# Create a toy model for classification
ss = StandardScaler()
pca = PCA(n_components=3)
rf = RandomForestClassifier()
classification_model = Pipeline(
steps=[
(
"fu",
FeatureUnion(
transformer_list=[
("ss", ss),
("pca", pca),
]
),
),
("rf", rf),
]
)
X, y = load_iris(return_X_y=True)
# Observe logging
classification_model.fit(X, y)
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit elapsed time: 0.0006406307220458984 seconds
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.0001430511474609375 seconds
# INFO:sklearn_instrumentation.instruments.logging:PCA._fit starting.
# INFO:sklearn_instrumentation.instruments.logging:PCA._fit elapsed time: 0.0006711483001708984 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit elapsed time: 0.0026731491088867188 seconds
# INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit elapsed time: 0.1768970489501953 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit elapsed time: 0.17983102798461914 seconds
# Observe logging
classification_model.predict(X)
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.00024509429931640625 seconds
# INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform elapsed time: 0.0002181529998779297 seconds
# INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform elapsed time: 0.0012080669403076172 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.013531208038330078 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.013692140579223633 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict elapsed time: 0.015219926834106445 seconds
# Remove instrumentation
instrumentor.uninstrument_packages(["sklearn"])
# Observe no logging
classification_model.predict(X)
Machine learning model instrumentation
Instrument any sklearn compatible trained estimator or metaestimator.
from sklearn_instrumentation import SklearnInstrumentor
instrumentor = SklearnInstrumentor(instrument=my_instrument)
instrumentor.instrument_estimator(estimator=my_ml_pipeline)
Example:
import logging
from sklearn.datasets import load_iris
from sklearn_instrumentation import SklearnInstrumentor
from sklearn_instrumentation.instruments.logging import TimeElapsedLogger
from sklearn.ensemble import RandomForestClassifier
logging.basicConfig(level=logging.INFO)
# Train a classifier
X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier()
rf.fit(X, y)
# Create an instrumentor which decorates BaseEstimator methods with
# logging output when entering and exiting methods, with time elapsed logged
# on exit.
instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())
# Apply the decorator to all BaseEstimators in each of these libraries
instrumentor.instrument_estimator(rf)
# Observe the logging output
rf.predict(X)
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.014165163040161133 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.014327764511108398 seconds
# Remove the decorator from all BaseEstimators in each of these libraries
instrumentor.uninstrument_estimator(rf)
# No more logging
rf.predict(X)
Instrumentation
The package comes with a handful of instruments which log information about X or timing of execution. You can create your own instrumentation just by creating a decorator, following this pattern
from functools import wraps
def my_instrumentation(func, **dkwargs):
"""Wrap an estimator method with instrumentation.
:param func: The method to be instrumented.
:param dkwargs: Decorator kwargs, which can be passed to the
decorator at decoration time. For estimator instrumentation
this allows different parametrizations for each ml model.
"""
@wraps(func)
def wrapper(*args, **kwargs):
"""Wrapping function.
:param args: The args passed to methods, typically
just ``X`` and/or ``y``
:param kwargs: The kwargs passed to methods, usually
weights or other params
"""
# Code goes here before execution of the estimator method
retval = func(*args, **kwargs)
# Code goes here after execution of the estimator method
return retval
return wrapper
To create a stateful instrument, use a class with the __call__ method for implementing the decorator:
from functools import wraps
from sklearn_instrumentation.instruments.base import BaseInstrument
class MyInstrument(BaseInstrument)
def __init__(self, *args, **kwargs):
# handle any statefulness here
pass
def __call__(self, func, **dkwargs):
"""Wrap an estimator method with instrumentation.
:param func: The method to be instrumented.
:param dkwargs: Decorator kwargs, which can be passed to the
decorator at decoration time. For estimator instrumentation
this allows different parametrizations for each ml model.
"""
@wraps(func)
def wrapper(*args, **kwargs):
"""Wrapping function.
:param args: The args passed to methods, typically
just ``X`` and/or ``y``
:param kwargs: The kwargs passed to methods, usually
weights or other params
"""
# Code goes here before execution of the estimator method
retval = func(*args, **kwargs)
# Code goes here after execution of the estimator method
return retval
return wrapper
To pass kwargs for different ml models:
instrumentor = SklearnInstrumentor(instrument=my_instrument)
instrumentor.instrument_estimator(estimator=ml_model_1, instrument_kwargs={"name": "awesome_model"})
instrumentor.instrument_estimator(estimator=ml_model_2, instrument_kwargs={"name": "better_model"})
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for sklearn-instrumentation-0.6.1.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 84890385022d18e6dce7d1219ebeef6843aaeba787b200da9f950b343864f045 |
|
MD5 | 30cf1ecd1f3b63339a01410b3b50f7ab |
|
BLAKE2b-256 | c03ca697b7cc07d50b21e2dda4284643befb5a9c0780851fa77a23a3138b8974 |
Hashes for sklearn_instrumentation-0.6.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | f8256fe315ac010738ee662ca70520aa60438179ebf04e6ebe8a6e0fb3a07811 |
|
MD5 | 0fa401de6b52ec6daae5bc7377fb4da0 |
|
BLAKE2b-256 | f77b2edc2d94ac98bb28370d72c841e7b494b9a3848f5d95f675bdbf27c861d9 |