This package enables quick creation of a report comparing the quality of several ML models.

Model Quality Report

This package enables quick creation of a model quality report, which is returned as a dict.

The main ingredients are a data splitter, which creates test and training data according to various rules, and the quality report itself. The quality report takes care of splitting, fitting, predicting, and finally deriving quality metrics.

Documentation

The official documentation is hosted on ReadTheDocs: https://model-quality-report.readthedocs.io

Installing the package

Latest released version:

pip install model_quality_report

Specific version:

pip install model_quality_report==X.Y.Z

Quickstart

  • The RandomDataSplitter splits data randomly using sklearn.model_selection.train_test_split:
import pandas as pd

X = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e']})
y = pd.Series(data=range(5))

splitter = RandomDataSplitter(test_size=0.33, random_state=2)
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • The TimeDeltaDataSplitter divides the data such that the last period of length time_delta is used as test data. Here a pd.Timedelta and the name of the date column are provided:
splitter = TimeDeltaDataSplitter(date_column_name='shipping_date', time_delta=pd.Timedelta(3, unit='h'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • The SplitDateDataSplitter splits the data such that rows after a provided date are used as test data. The name of the date column has to be provided as well:
splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • The SortedDataSplitter requires a column with sortable values. The data are divided such that the test set comprises the last fraction test_size of the sorted rows. Sorting can be in ascending or descending order:
splitter = SortedDataSplitter(sortable_column_name='shipping_date', test_size=0.2, ascending=True)
X_train, X_test, y_train, y_test = splitter.split(X, y)
  • Using the RegressionQualityReport class, a quality report for a regression model can be created as follows:
import sklearn.linear_model

splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
model = sklearn.linear_model.LinearRegression()
quality_reporter = RegressionQualityReport(model, splitter)
report = quality_reporter.create_reports()

An exemplary report looks as follows:

{'metrics': 
    {'explained_variance_score': -6.018595041322246, 
     'mape': 0.3863636363636345, 
     'mean_absolute_error': 4.242424242424224, 
     'mean_squared_error': 29.426997245178825, 
     'median_absolute_error': 2.272727272727268, 
     'r2_score': -10.03512396694206}, 
 'data': 
    {'true': {3: 10, 4: 12, 2: 8}, 
     'predicted': {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561}}}  
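
The metrics in the exemplary report can be recomputed from its 'data' entry. The following is a minimal sketch in plain Python, assuming the standard textbook definitions of each metric; the function name regression_metrics is hypothetical and not part of the package.

```python
from statistics import mean, median


def regression_metrics(true, predicted):
    """Recompute common regression metrics from two {index: value} dicts.

    Hypothetical helper for illustration; not part of model_quality_report.
    """
    keys = list(true)
    y = [true[k] for k in keys]
    y_hat = [predicted[k] for k in keys]
    residuals = [a - b for a, b in zip(y, y_hat)]
    abs_errors = [abs(r) for r in residuals]

    y_mean = mean(y)
    ss_tot = sum((a - y_mean) ** 2 for a in y)       # total sum of squares
    ss_res = sum(r ** 2 for r in residuals)          # residual sum of squares
    res_mean = mean(residuals)
    ss_res_centered = sum((r - res_mean) ** 2 for r in residuals)

    return {
        "explained_variance_score": 1 - ss_res_centered / ss_tot,
        "mape": mean(e / abs(a) for e, a in zip(abs_errors, y)),
        "mean_absolute_error": mean(abs_errors),
        "mean_squared_error": ss_res / len(y),
        "median_absolute_error": median(abs_errors),
        "r2_score": 1 - ss_res / ss_tot,
    }


# 'data' entry copied from the exemplary report
report_data = {
    "true": {3: 10, 4: 12, 2: 8},
    "predicted": {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561},
}
metrics = regression_metrics(report_data["true"], report_data["predicted"])
```

Up to floating-point rounding, the resulting values agree with the 'metrics' entry of the exemplary report. A negative r2_score such as this one means the model predicts worse than a constant prediction of the mean of the true values.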

Note that the model must provide a fit and a predict method, i.e. model.fit and model.predict.
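
The fit/predict requirement means that any object following this convention can in principle serve as a model, not only scikit-learn estimators. Below is a minimal sketch of such an object; MeanRegressor is a hypothetical name, and whether the package accepts a plain list as the return value of predict is an assumption that should be checked against the official documentation.

```python
class MeanRegressor:
    """Hypothetical minimal model: always predicts the mean of the training target."""

    def fit(self, X, y):
        # X is ignored; only the mean of the target values is stored.
        values = list(y)
        self.mean_ = sum(values) / len(values)
        return self

    def predict(self, X):
        # One prediction per row of X (len works for lists and pandas DataFrames).
        return [self.mean_] * len(X)


model = MeanRegressor().fit([[1], [2], [3]], [10, 12, 8])
predictions = model.predict([[4], [5]])
```

Such a constant model can be a useful baseline when comparing quality reports of more elaborate models.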

Available Features

Data Splitter

  • RandomDataSplitter: splits the data randomly.
  • TimeDeltaDataSplitter: uses the data in the last period of a given length as test data.
  • SplitDateDataSplitter: uses the data with a timestamp newer than the split date as test data.
  • SortedDataSplitter: sorts the data along a given column and takes the last fraction of size test_size as test data.
  • ByHorizon: produces a list of splits of temporal data such that each consecutive train set has one more observation and each test set one less.
  • ByFrequency: produces a list of splits of temporal data such that the data is split by a series of dates on a specified frequency.
  • FixedDates: produces a list of splits of temporal data given a list of fixed dates.
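
The temporal splitting rules described above can be illustrated in plain Python. This is only a sketch of the described behaviour, not the package's implementation: the function names, the row layout with a "date" key, the start parameter, and the treatment of boundary rows (rows exactly on the split date go to the training set) are all assumptions.

```python
from datetime import datetime, timedelta


def split_after_date(rows, split_date):
    """Rows strictly after split_date become test data (boundary handling assumed)."""
    train = [r for r in rows if r["date"] <= split_date]
    test = [r for r in rows if r["date"] > split_date]
    return train, test


def split_last_period(rows, time_delta):
    """The last period of length time_delta, counted from the newest row, is test data."""
    cutoff = max(r["date"] for r in rows) - time_delta
    return split_after_date(rows, cutoff)


def expanding_window_splits(rows, start=1):
    """List of splits: each train set has one more row, each test set one less."""
    ordered = sorted(rows, key=lambda r: r["date"])
    return [(ordered[:i], ordered[i:]) for i in range(start, len(ordered))]


# Five hourly observations as toy data
rows = [{"date": datetime(2016, 1, 1, hour), "value": hour} for hour in range(5)]

train, test = split_last_period(rows, timedelta(hours=2))
splits = expanding_window_splits(rows, start=2)
split_sizes = [(len(tr), len(te)) for tr, te in splits]
```

With the toy data, the last two hourly rows end up in the test set of split_last_period, and expanding_window_splits yields three splits whose train sets grow from two to four rows.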

Quality Report

  • RegressionQualityReport: creates a quality report for a regression model.
  • CrossValidationTimeSeriesQualityReport: creates a quality report for a time series model.

Report Aggregation

  • ModelComparisonReport aggregates reports using a list of classes derived from QualityReportBase, data, and experiment keys.
  • ReportAggregator is designed to aggregate model quality reports from different models that potentially use different input/output data and cannot fit into the framework of ModelComparisonReport. ReportAggregator operates on a list of classes that derive from ExperimentBase.
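
To illustrate what aggregating reports means, the sketch below combines several report dicts of the form shown in the Quickstart into one table per metric. It deliberately does not use ModelComparisonReport or ReportAggregator, whose exact interfaces are documented on ReadTheDocs; the helper names metrics_table and best_model and the model labels are hypothetical.

```python
def metrics_table(reports):
    """Turn {model_name: report} into {metric: {model_name: value}}."""
    table = {}
    for model_name, report in reports.items():
        for metric, value in report["metrics"].items():
            table.setdefault(metric, {})[model_name] = value
    return table


def best_model(table, metric, lower_is_better=True):
    """Name of the model with the best value for the given metric."""
    scores = table[metric]
    pick = min if lower_is_better else max
    return pick(scores, key=scores.get)


reports = {
    # Metrics taken from the exemplary report in the Quickstart
    "linear": {"metrics": {"mean_absolute_error": 4.242424242424224,
                           "mean_squared_error": 29.426997245178825}},
    # Toy baseline: constant prediction 10 for the true values 10, 12, 8
    "baseline": {"metrics": {"mean_absolute_error": 4 / 3,
                             "mean_squared_error": 8 / 3}},
}

table = metrics_table(reports)
winner = best_model(table, "mean_absolute_error")
```

Arranging the values per metric makes it easy to compare models side by side, which is the purpose the aggregation classes serve for full quality reports.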

Developers should know

Create a virtual environment and activate it:

python -m venv venv
source venv/bin/activate

Install the development packages:

pip install -e .[dev]

and use pre-commit to make sure that your code is formatted automatically with the black package:

pre-commit install

Run tests:

pip install -e .[test]
coverage run -m unittest discover tests
coverage report

Build the documentation:

pip install -e .[doc]
mkdocs build

or use

mkdocs serve

if you prefer live, self-refreshing documentation.
