This package enables the quick creation of a report comparing the quality of several ML models.
Model Quality Report
This package enables the quick creation of a model quality report, which is returned as a dict.
The main ingredients are a data splitter, which creates test and training data according to various rules, and the quality report itself. The quality report takes care of splitting, fitting, predicting and finally deriving the quality metrics.
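The split, fit, predict and evaluate workflow described above can be sketched in plain pandas. This is a minimal illustration, not the package's implementation: MeanRegressor, LastRowsSplitter and create_report are hypothetical names, create_report takes X and y explicitly (an assumption, unlike create_reports() in the Quickstart), and only two metrics are computed. It assumes, as in the Quickstart, that a splitter's split(X, y) returns X_train, X_test, y_train, y_test and that the model offers fit and predict.

```python
import pandas as pd


class MeanRegressor:
    """Hypothetical stand-in model: anything with fit(X, y) and predict(X) works."""

    def fit(self, X, y):
        self.mean_ = float(y.mean())
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


class LastRowsSplitter:
    """Hypothetical splitter: the last n_test rows become the test set."""

    def __init__(self, n_test):
        self.n_test = n_test

    def split(self, X, y):
        cut = len(X) - self.n_test
        return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


def create_report(model, splitter, X, y):
    """Split, fit, predict, then derive metrics; returns a dict shaped like the example report."""
    X_train, X_test, y_train, y_test = splitter.split(X, y)
    model.fit(X_train, y_train)
    # Keep the test index so 'true' and 'predicted' are keyed by row, as in the example report
    y_pred = pd.Series(model.predict(X_test), index=y_test.index)
    abs_errors = (y_test - y_pred).abs()
    return {
        'metrics': {
            'mean_absolute_error': float(abs_errors.mean()),
            'mean_squared_error': float((abs_errors ** 2).mean()),
        },
        'data': {'true': y_test.to_dict(), 'predicted': y_pred.to_dict()},
    }


X = pd.DataFrame({'a': range(5)})
y = pd.Series([2, 4, 6, 8, 10])
report = create_report(MeanRegressor(), LastRowsSplitter(n_test=2), X, y)
```

Here the model learns the training mean 4.0, so the test rows 3 and 4 (true values 8 and 10) get a mean absolute error of 5.0 and a mean squared error of 26.0.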
Documentation
The official documentation is hosted on ReadTheDocs: https://model-quality-report.readthedocs.io
Installing the package
Latest released version:
pip install model_quality_report
Specific version:
pip install model_quality_report==X.Y.Z
Quickstart
- The RandomDataSplitter splits data randomly using sklearn.model_selection.train_test_split:
import pandas as pd
X = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e']})
y = pd.Series(data=range(5))
splitter = RandomDataSplitter(test_size=0.33, random_state=2)
X_train, X_test, y_train, y_test = splitter.split(X, y)
- The TimeDeltaDataSplitter divides the data such that data from the last period of length time_delta are used as test data. Here a pd.Timedelta and the name of the date column are provided:
splitter = TimeDeltaDataSplitter(date_column_name='shipping_date', time_delta=pd.Timedelta(3, unit='h'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
- The SplitDateDataSplitter splits the data such that data after a provided date are used as test data. Additionally, the name of the date column has to be provided:
splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
- The SortedDataSplitter requires a column with sortable values. Data are divided such that the test data set encompasses the last fraction test_size of the sorted data. Sorting can be in ascending or descending order.
splitter = SortedDataSplitter(sortable_column_name='shipping_date', test_size=0.2, ascending=True)
X_train, X_test, y_train, y_test = splitter.split(X, y)
- Using the RegressionQualityReport class, a quality report for a regression model can be created as follows:
import pandas as pd
import sklearn.linear_model
splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
model = sklearn.linear_model.LinearRegression()
quality_reporter = RegressionQualityReport(model, splitter)
report = quality_reporter.create_reports()
An example report looks as follows:
{'metrics':
    {'explained_variance_score': -6.018595041322246,
     'mape': 0.3863636363636345,
     'mean_absolute_error': 4.242424242424224,
     'mean_squared_error': 29.426997245178825,
     'median_absolute_error': 2.272727272727268,
     'r2_score': -10.03512396694206},
 'data':
    {'true': {3: 10, 4: 12, 2: 8},
     'predicted': {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561}}}
Note that the model must provide model.fit and model.predict methods.
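The two row-based splitters from the Quickstart, RandomDataSplitter and SortedDataSplitter, can be approximated with pandas alone. This is a sketch under stated assumptions, not the package's code: random_split and sorted_split are hypothetical helpers, random_split rounds the test size up the way train_test_split does for a float test_size but will not pick the same rows for a given random_state, and rounding up in sorted_split is an assumption.

```python
import math

import pandas as pd


def random_split(X, y, test_size, random_state=None):
    """Hypothetical helper: draw ceil(test_size * n) rows at random as the test set."""
    n_test = math.ceil(test_size * len(X))
    test_index = X.sample(n=n_test, random_state=random_state).index
    is_train = ~X.index.isin(test_index)  # boolean numpy array, aligned by position
    return X[is_train], X.loc[test_index], y[is_train], y.loc[test_index]


def sorted_split(X, y, sortable_column_name, test_size, ascending=True):
    """Hypothetical helper: sort by one column, the last fraction test_size is the test set."""
    order = X.sort_values(sortable_column_name, ascending=ascending).index
    cut = len(order) - math.ceil(test_size * len(order))  # rounding up is an assumption
    train_index, test_index = order[:cut], order[cut:]
    return X.loc[train_index], X.loc[test_index], y.loc[train_index], y.loc[test_index]


# Usage with the toy data from the RandomDataSplitter example
X = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e']})
y = pd.Series(data=range(5))
X_train, X_test, y_train, y_test = random_split(X, y, test_size=0.33, random_state=2)
X_tr_sorted, X_te_sorted, y_tr_sorted, y_te_sorted = sorted_split(
    X, y, 'a', test_size=0.33, ascending=False
)
```

With five rows and test_size=0.33, both helpers put two rows into the test set; sorting descending on column a makes the rows with a equal to 2 and 1 the test data.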
Available Features
Data Splitter
- RandomDataSplitter: splits randomly
- TimeDeltaDataSplitter: uses data in the last period of length time_delta as test data
- SplitDateDataSplitter: uses data with a timestamp newer than the split date as test data
- SortedDataSplitter: sorts data along a given column and takes the last fraction of size test_size as test data
- ByHorizon: produces a list of splits of temporal data such that each consecutive train set has one more observation and the test set one less
- ByFrequency: produces a list of splits of temporal data such that the data are split by a series of dates with a specified frequency
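The date-based splitters above can be sketched with pandas as a minimal illustration. All function names and parameters here are hypothetical, not the package's API: a time-delta split is treated as a split-date split whose date is the latest timestamp minus time_delta, "newer than the split date" is read as strictly greater, by_horizon_splits assumes rows already sorted by time with a first test set of horizon rows, and by_frequency_splits takes its split dates from pd.date_range.

```python
import pandas as pd


def split_date_split(X, y, date_column_name, split_date):
    """Hypothetical: rows strictly after split_date form the test set (boundary assumed)."""
    is_test = X[date_column_name] > split_date
    return X[~is_test], X[is_test], y[~is_test], y[is_test]


def time_delta_split(X, y, date_column_name, time_delta):
    """Hypothetical: the last period of length time_delta, counted back from the latest date."""
    split_date = X[date_column_name].max() - time_delta
    return split_date_split(X, y, date_column_name, split_date)


def by_horizon_splits(X, y, horizon):
    """Hypothetical: expanding-window splits of time-sorted rows. The first test set has
    `horizon` rows; each following split moves one row from the test set to the train set."""
    splits = []
    for cut in range(len(X) - horizon, len(X)):
        splits.append((X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]))
    return splits


def by_frequency_splits(X, y, date_column_name, start, end, freq):
    """Hypothetical: one split_date_split per date in pd.date_range(start, end, freq)."""
    return [
        split_date_split(X, y, date_column_name, split_date)
        for split_date in pd.date_range(start=start, end=end, freq=freq)
    ]


# Usage with six hourly timestamps starting at 2016-01-01 00:00
dates = [pd.Timestamp('2016-01-01') + pd.Timedelta(hours=i) for i in range(6)]
X = pd.DataFrame({'shipping_date': dates, 'a': range(6)})
y = pd.Series(range(6))

X_train, X_test, y_train, y_test = time_delta_split(
    X, y, 'shipping_date', pd.Timedelta(3, unit='h')
)
horizon_splits = by_horizon_splits(X, y, horizon=3)
frequency_splits = by_frequency_splits(
    X, y, 'shipping_date', '2016-01-01 02:00', '2016-01-01 04:00', pd.Timedelta(hours=1)
)
```

With a three-hour time_delta, the latest timestamp is 05:00, so rows after 02:00 (03:00 to 05:00) become test data. Reusing split_date_split keeps the boundary rule in one place.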
Quality Report
- RegressionQualityReport: creates a quality report for a regression model
Quality Metrics
- RegressionQualityMetrics: holds the following functions:
  - explained_variance_score
  - mean_absolute_error
  - mean_squared_error
  - median_absolute_error
  - r2_score
  - mape
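As a worked check, the standard definitions of these six metrics (as used by scikit-learn) reproduce the example report from the Quickstart, up to floating-point rounding, when applied to its 'data' section. The helper regression_metrics is hypothetical and uses only the standard library. The mape value in the example is consistent with the mean absolute error relative to the true value expressed as a fraction, not a percentage; that reading is inferred from the numbers, not from the package source.

```python
import statistics


def regression_metrics(true, predicted):
    """Hypothetical helper: recompute the report's metrics from dicts keyed by row index."""
    keys = list(true)
    y = [true[k] for k in keys]
    y_hat = [predicted[k] for k in keys]
    errors = [a - b for a, b in zip(y, y_hat)]
    abs_errors = [abs(e) for e in errors]
    mean_y = statistics.fmean(y)
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    ss_tot = sum((v - mean_y) ** 2 for v in y)  # total sum of squares
    return {
        # 1 - Var(y - y_hat) / Var(y), population variances
        'explained_variance_score': 1 - statistics.pvariance(errors) / statistics.pvariance(y),
        # mean of |error| / |true value|, as a fraction
        'mape': statistics.fmean(a / abs(v) for a, v in zip(abs_errors, y)),
        'mean_absolute_error': statistics.fmean(abs_errors),
        'mean_squared_error': ss_res / len(y),
        'median_absolute_error': statistics.median(abs_errors),
        'r2_score': 1 - ss_res / ss_tot,
    }


# The 'data' section of the example report
true = {3: 10, 4: 12, 2: 8}
predicted = {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561}
metrics = regression_metrics(true, predicted)
```

For instance, the true values 10, 12 and 8 have mean 10 and a total sum of squares of 8, while the squared errors sum to about 88.28, which gives r2_score = 1 - 88.28 / 8, roughly -10.035, as in the report.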
Developers should know
Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate
Install the development packages:
pip install -e .[dev]
and use pre-commit to make sure that your code is formatted automatically with the black package:
pre-commit install
Run tests:
pip install -e .[test]
coverage run -m unittest discover tests
coverage report
Build the documentation:
pip install -e .[doc]
mkdocs build
or use
mkdocs serve
if you prefer a live, self-refreshing version of the documentation.
Hashes for model_quality_report-1.0.0rc15.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 936457b75e6952ccf07af65e212bab0ab99c265d9b5ff3d7979be9dba5cf3d07
MD5 | 9c50b55180c6c337091ed1838dc63674
BLAKE2b-256 | bc080c32a816b86a7080e6a436e80f4b758020651f7e2dc747a0fc0cecf1b2d1
Hashes for model_quality_report-1.0.0rc15-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ac613dd8f2285fe67ea79eae15da43fa61e387a47acc0557d7b22dacd57397f2
MD5 | 95cd8c2b9583f4f9165fd76fe52e49da
BLAKE2b-256 | 88ad331a53ce7f393a30a6e7530385f30ac793b635fdc1e5405c3286cd1f23ab