This package enables quick creation of a report comparing the quality of several ML models

Model Quality Report

This package enables quick creation of a model quality report, which is returned as a dict.

The main ingredients are a data splitter, which creates training and test data according to various rules, and the quality report itself. The quality report takes care of splitting, fitting, predicting, and finally deriving quality metrics.
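
The four steps named above (splitting, fitting, predicting, deriving metrics) can be illustrated without the package itself. The following is a minimal sketch using only pandas and scikit-learn; the helper name `sketch_quality_report` and its parameters are hypothetical and are not part of model_quality_report, whose internals may differ.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def sketch_quality_report(model, X, y, test_size=0.4, random_state=0):
    """Hypothetical helper: split, fit, predict, and collect metrics in a dict."""
    # 1. Split into training and test data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # 2. Fit on the training data, 3. predict on the test data.
    model.fit(X_train, y_train)
    y_pred = pd.Series(model.predict(X_test), index=y_test.index)
    # 4. Derive quality metrics and return everything as a dict.
    return {
        "metrics": {
            "mean_absolute_error": mean_absolute_error(y_test, y_pred),
            "r2_score": r2_score(y_test, y_pred),
        },
        "data": {"true": y_test.to_dict(), "predicted": y_pred.to_dict()},
    }


# Toy data with an exactly linear relationship, so the fit is near perfect.
X = pd.DataFrame({"a": range(10)})
y = pd.Series([2 * a + 1 for a in range(10)])
report = sketch_quality_report(LinearRegression(), X, y)
```

The returned dict mirrors the 'metrics' and 'data' layout of the exemplary report in the Quickstart, but only as an illustration of the idea.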

Documentation

The official documentation is hosted on ReadTheDocs: https://model-quality-report.readthedocs.io

Installing the package

Latest release:

pip install model_quality_report

Specific version:

pip install model_quality_report==X.Y.Z

Quickstart

  • The RandomDataSplitter splits data randomly using sklearn.model_selection.train_test_split:
import pandas as pd

X = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e']})
y = pd.Series(data=range(5))

splitter = RandomDataSplitter(test_size=0.33, random_state=2)
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • The TimeDeltaDataSplitter divides the data such that the last period of length time_delta is used as test data. Here a pd.Timedelta and the name of the date column are provided:
splitter = TimeDeltaDataSplitter(date_column_name='shipping_date', time_delta=pd.Timedelta(3, unit='h'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • The SplitDateDataSplitter splits the data such that rows after a provided date are used as test data. The name of the date column has to be provided as well:
splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • The SortedDataSplitter requires a column with sortable values. The data are divided such that the test set is the last fraction test_size of the sorted data. Sorting can be in ascending or descending order.
splitter = SortedDataSplitter(sortable_column_name='shipping_date', test_size=0.2, ascending=True)
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • Using the RegressionQualityReport class, a quality report for a regression model can be created as follows:
import sklearn.linear_model

splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
model = sklearn.linear_model.LinearRegression()
quality_reporter = RegressionQualityReport(model, splitter)
report = quality_reporter.create_reports()

An exemplary report looks as follows:

{'metrics': 
    {'explained_variance_score': -6.018595041322246, 
     'mape': 0.3863636363636345, 
     'mean_absolute_error': 4.242424242424224, 
     'mean_squared_error': 29.426997245178825, 
     'median_absolute_error': 2.272727272727268, 
     'r2_score': -10.03512396694206}, 
 'data': 
    {'true': {3: 10, 4: 12, 2: 8}, 
     'predicted': {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561}}}  

Note that the model must provide both a fit and a predict method (model.fit and model.predict).
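
As an illustration of that requirement, here is a minimal sketch of a custom model exposing `fit` and `predict`. The class `MeanRegressor` is hypothetical and not part of the package; that such an object is accepted by RegressionQualityReport is an assumption based on the note above, not something verified here.

```python
class MeanRegressor:
    """Hypothetical toy model: always predicts the mean of the training target."""

    def fit(self, X, y):
        # Remember the mean of the training target.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Return one constant prediction per input row.
        return [self.mean_] * len(X)


# Standalone usage with plain Python lists.
model = MeanRegressor().fit([[1], [2], [3]], [1.0, 2.0, 3.0])
predictions = model.predict([[4], [5]])
```

scikit-learn estimators such as LinearRegression follow the same fit/predict convention.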

Available Features

Data Splitter

  • RandomDataSplitter: splits the data randomly
  • TimeDeltaDataSplitter: uses the data in the last period of length time_delta as test data
  • SplitDateDataSplitter: uses the data with a timestamp newer than the split date as test data
  • SortedDataSplitter: sorts the data along a given column and takes the last fraction of size test_size as test data
  • ByHorizon: produces a list of splits of temporal data such that each consecutive train set has one more observation and the test set one less
  • ByFrequency: produces a list of splits of temporal data such that the data is split at a series of dates generated at a specified frequency
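
To make the splitting rules concrete, the sketch below reimplements four of them in plain pandas. It does not use the package's classes; the helper `split_by_mask`, the toy data, and the treatment of boundary rows (strictly after the split date) are assumptions of this sketch, and the actual splitters may behave differently at the edges.

```python
import pandas as pd

# Toy data; the column name 'shipping_date' mirrors the Quickstart examples.
X = pd.DataFrame(
    {"shipping_date": pd.date_range("2015-12-30", periods=6, freq="D"), "a": range(6)}
)
y = pd.Series(range(10, 16), index=X.index)


def split_by_mask(X, y, is_test):
    """Hypothetical helper: return X_train, X_test, y_train, y_test for a boolean mask."""
    return X[~is_test], X[is_test], y[~is_test], y[is_test]


# Split-date rule: rows dated after the split date become test data.
date_split = split_by_mask(X, y, X["shipping_date"] > pd.Timestamp("2016-01-01"))

# Time-delta rule: rows in the last period of length time_delta become test data.
time_delta = pd.Timedelta(2, unit="D")
cutoff = X["shipping_date"].max() - time_delta
delta_split = split_by_mask(X, y, X["shipping_date"] > cutoff)

# Sorted rule: sort by a column and keep the last fraction test_size as test data.
test_size = 0.5
X_sorted = X.sort_values("shipping_date", ascending=True)
y_sorted = y.loc[X_sorted.index]
n_test = int(len(X_sorted) * test_size)
sorted_split = (
    X_sorted.iloc[:-n_test],
    X_sorted.iloc[-n_test:],
    y_sorted.iloc[:-n_test],
    y_sorted.iloc[-n_test:],
)

# Horizon rule: each consecutive split has one more train row and one less test row.
horizon_splits = [(X.iloc[:n], X.iloc[n:]) for n in range(1, len(X))]
```

Each single-split result follows the X_train, X_test, y_train, y_test order used in the Quickstart, while `horizon_splits` is a list of (train, test) pairs.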

Quality Report

RegressionQualityReport: creates a quality report for a regression model

Quality Metrics

RegressionQualityMetrics: provides the following metric functions:

  • explained_variance_score
  • mean_absolute_error
  • mean_squared_error
  • median_absolute_error
  • r2_score
  • mape
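
The metrics listed above can be recomputed by hand from the 'data' part of the exemplary report in the Quickstart. The sketch below uses only the Python standard library and the textbook definitions of each metric (with mape expressed as a fraction, not a percentage). That the package computes them in exactly this way is an assumption, although the results agree with the values shown in the exemplary report.

```python
from statistics import mean, median

# 'data' section copied from the exemplary report.
true = {3: 10, 4: 12, 2: 8}
predicted = {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561}

y_true = [true[k] for k in true]
y_pred = [predicted[k] for k in true]
errors = [t - p for t, p in zip(y_true, y_pred)]


def variance(values):
    """Population variance (divides by n)."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


metrics = {
    "explained_variance_score": 1 - variance(errors) / variance(y_true),
    "mape": mean([abs(e) / abs(t) for e, t in zip(errors, y_true)]),
    "mean_absolute_error": mean([abs(e) for e in errors]),
    "mean_squared_error": mean([e ** 2 for e in errors]),
    "median_absolute_error": median([abs(e) for e in errors]),
    # r2 = 1 - residual sum of squares / total sum of squares
    "r2_score": 1 - mean([e ** 2 for e in errors]) / variance(y_true),
}
```

Negative values for r2_score and explained_variance_score, as in this example, indicate that the model performs worse on the test data than always predicting the mean of the true values.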

Developers should know

Create a virtual environment and activate it

python -m venv venv
source venv/bin/activate

Install the development packages

pip install -e .[dev]

and use pre-commit to make sure that your code is formatted automatically with the black package:

pre-commit install

Run tests:

pip install -e .[test]
coverage run -m unittest discover tests
coverage report

Build the documentation:

pip install -e .[doc]
mkdocs build

or use

mkdocs serve

if you prefer live, self-refreshing documentation.
