Model Quality Report

This package enables quick creation of a model quality report, which is returned as a dict.

The main ingredients are a data splitter, which creates training and test data according to various rules, and the quality report itself, which takes care of splitting, fitting, predicting, and finally deriving quality metrics.

Installing the package

Latest available code:

pip install model_quality_report

Specific version:

pip install model_quality_report==X.Y.Z

Quickstart

  • The RandomDataSplitter splits data randomly using sklearn.model_selection.train_test_split:
import pandas as pd

X = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e']})
y = pd.Series(data=range(5))

splitter = RandomDataSplitter(test_size=0.33, random_state=2)
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • The TimeDeltaDataSplitter uses the data from the last period of length time_delta as test data. Here a pd.Timedelta and the name of the date column are provided:
splitter = TimeDeltaDataSplitter(date_column_name='shipping_date', time_delta=pd.Timedelta(3, unit='h')) 
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • The SplitDateDataSplitter splits such that data after a provided date are used as test data. Additionally, the name of the date column has to be provided:
splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • The SortedDataSplitter requires a column with sortable values. The data are divided such that the test data set encompasses the last fraction of size test_size. Sorting can be in ascending or descending order.
splitter = SortedDataSplitter(sortable_column_name='shipping_date', test_size=0.2, ascending=True)
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • Using the RegressionQualityReport class, a quality report for a regression model can be created as follows:
import sklearn.linear_model

splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
model = sklearn.linear_model.LinearRegression()
quality_reporter = RegressionQualityReport(model, splitter)
report = quality_reporter.create_quality_report_and_return_dict(X, y)

An example report looks as follows:

{'metrics': 
    {'explained_variance_score': -6.018595041322246, 
     'mape': 0.3863636363636345, 
     'mean_absolute_error': 4.242424242424224, 
     'mean_squared_error': 29.426997245178825, 
     'median_absolute_error': 2.272727272727268, 
     'r2_score': -10.03512396694206}, 
 'data': 
    {'true': {3: 10, 4: 12, 2: 8}, 
     'predicted': {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561}}}  

Note that the model must provide fit and predict methods.
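Any object with that interface works, not only scikit-learn estimators. For illustration only, a minimal hypothetical model that always predicts the training-set mean:

```python
import numpy as np

class MeanModel:
    """Minimal example estimator exposing the fit/predict interface:
    it always predicts the mean of the training targets."""

    def fit(self, X, y):
        # Remember the mean of the training targets.
        self._mean = float(np.mean(y))
        return self

    def predict(self, X):
        # Predict that mean for every row of X.
        return np.full(len(X), self._mean)
```

An instance of such a class could be passed to RegressionQualityReport in place of an sklearn model.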

Available Features

Data Splitter

  • RandomDataSplitter: splits randomly
  • TimeDeltaDataSplitter: uses the data in the last period of a given length as test data
  • SplitDateDataSplitter: uses data with a timestamp newer than the split date as test data
  • SortedDataSplitter: sorts the data along a given column and takes the last fraction of size test_size as test data
  • ByHorizon: produces a list of splits of temporal data such that each consecutive train set has one more observation and each test set one less
  • ByFrequency: produces a list of splits of temporal data such that the data is split by a series of dates at a specified frequency
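For illustration, the sorted-split behaviour can be sketched as follows. This is a hypothetical re-implementation of the semantics described above, not the package's actual code:

```python
import pandas as pd

def sorted_split(X: pd.DataFrame, y: pd.Series, sortable_column_name: str,
                 test_size: float, ascending: bool = True):
    # Sort by the given column, then hold out the last `test_size`
    # fraction of rows as the test set.
    order = X[sortable_column_name].sort_values(ascending=ascending).index
    n_test = max(1, int(round(len(order) * test_size)))
    train_idx, test_idx = order[:-n_test], order[-n_test:]
    return X.loc[train_idx], X.loc[test_idx], y.loc[train_idx], y.loc[test_idx]
```

With ascending order, the rows with the largest values (e.g. the latest dates) end up in the test set.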

Quality Report

RegressionQualityReport: creates a quality report for a regression model

Quality Metrics

RegressionQualityMetrics: provides the following functions:

  • explained_variance_score
  • mean_absolute_error
  • mean_squared_error
  • median_absolute_error
  • r2_score
  • mape
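Most of these metrics come straight from sklearn.metrics, but older scikit-learn releases have no MAPE function. A common definition is sketched below; this is an illustrative assumption about the formula, though it does reproduce the mape value in the sample report above:

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error: mean(|(y_true - y_pred) / y_true|).
    # Assumes y_true contains no zeros.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))
```

Applied to the true/predicted values from the sample report, this gives roughly 0.386, matching the reported value.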

Developers should know

Create a virtual environment and activate it

python -m venv venv
source venv/bin/activate

Install the development packages

pip install -e .[dev]

and use pre-commit to ensure that your code is automatically formatted with the black package:

pre-commit install

Run tests:

pip install -e .[test]
coverage run -m unittest discover tests
coverage report

Build the documentation:

pip install -e .[doc]
mkdocs build

or use

mkdocs serve

if you prefer live, self-refreshing documentation.
