This package enables quick creation of a report comparing the quality of several ML models
Model Quality Report
This package enables quick creation of a model quality report, which is returned as a dict.
The main ingredients are a data splitter, which creates test and training data according to various rules, and the quality report itself. The quality report takes care of splitting, fitting, predicting, and finally deriving quality metrics.
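The flow described above (split, fit, predict, derive metrics, return a dict) can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's implementation: `MeanModel`, `split_last_fraction`, and `create_report` are hypothetical names, and lists stand in for pandas objects so the sketch runs without dependencies.

```python
from statistics import mean


class MeanModel:
    """Toy model (hypothetical): predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def split_last_fraction(X, y, test_size=0.4):
    """Hypothetical splitter: the last `test_size` fraction becomes test data."""
    n_test = max(1, round(len(X) * test_size))
    return X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]


def create_report(model, split, X, y):
    """Split, fit, predict, then derive metrics; the result is a plain dict."""
    X_train, X_test, y_train, y_test = split(X, y)
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    errors = [abs(t - p) for t, p in zip(y_test, predicted)]
    return {
        "metrics": {"mean_absolute_error": mean(errors)},
        "data": {"true": list(y_test), "predicted": list(predicted)},
    }


X = [[1], [2], [3], [4], [5]]
y = [2, 4, 6, 8, 10]
report = create_report(MeanModel(), split_last_fraction, X, y)
print(report)
```

Keeping the splitter and the model as interchangeable arguments mirrors the idea that the report only orchestrates the steps and does not care how the split or the model is implemented.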
Documentation
The official documentation is hosted on ReadTheDocs: https://model-quality-report.readthedocs.io
Installing the package
Latest available code:
pip install model_quality_report
Specific version:
pip install model_quality_report==X.Y.Z
Quickstart
- The RandomDataSplitter splits data randomly using sklearn.model_selection.train_test_split:
import pandas as pd
X = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e']})
y = pd.Series(data=range(5))
splitter = RandomDataSplitter(test_size=0.33, random_state=2)
X_train, X_test, y_train, y_test = splitter.split(X, y)
- The TimeDeltaDataSplitter divides the data such that data from the last period of length time_delta is used as test data. Here a pd.Timedelta and the date column name are provided:
splitter = TimeDeltaDataSplitter(date_column_name='shipping_date', time_delta=pd.Timedelta(3, unit='h'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
- The SplitDateDataSplitter splits such that data after a provided date are used as test data. Additionally, the name of the date column has to be provided:
splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
- The SortedDataSplitter requires a column with sortable values. Data are divided such that the test data set encompasses the last fraction test_size. Sorting can be in ascending or descending order.
splitter = SortedDataSplitter(sortable_column_name='shipping_date', test_size=0.2, ascending=True)
X_train, X_test, y_train, y_test = splitter.split(X, y)
- Using the RegressionQualityReport class, a quality report for a regression model can be created as follows:
import pandas as pd
import sklearn.linear_model
splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
model = sklearn.linear_model.LinearRegression()
quality_reporter = RegressionQualityReport(model, splitter)
report = quality_reporter.create_reports()
An exemplary report looks as follows:
{'metrics':
{'explained_variance_score': -6.018595041322246,
'mape': 0.3863636363636345,
'mean_absolute_error': 4.242424242424224,
'mean_squared_error': 29.426997245178825,
'median_absolute_error': 2.272727272727268,
'r2_score': -10.03512396694206},
'data':
{'true': {3: 10, 4: 12, 2: 8},
'predicted': {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561}}}
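To make the metric names concrete, the following sketch recomputes them from the 'data' entry of the report above using only the standard library. It assumes the usual textbook definitions of each metric (for example, MAPE as the mean of |true - predicted| / |true|); whether the package computes them in exactly this way is an assumption, although the numbers agree with the report shown.

```python
from statistics import mean, median, pvariance

# 'data' entry copied from the example report above
data = {
    "true": {3: 10, 4: 12, 2: 8},
    "predicted": {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561},
}

# Align true and predicted values by their shared index keys
keys = list(data["true"])
true = [data["true"][k] for k in keys]
pred = [data["predicted"][k] for k in keys]
residuals = [t - p for t, p in zip(true, pred)]

# Sum of squared residuals and total sum of squares around the mean of `true`
ss_res = sum(r * r for r in residuals)
ss_tot = pvariance(true) * len(true)

metrics = {
    "explained_variance_score": 1 - pvariance(residuals) / pvariance(true),
    "mape": mean(abs(r) / abs(t) for r, t in zip(residuals, true)),
    "mean_absolute_error": mean(abs(r) for r in residuals),
    "mean_squared_error": ss_res / len(residuals),
    "median_absolute_error": median(abs(r) for r in residuals),
    "r2_score": 1 - ss_res / ss_tot,
}

for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

Negative values of r2_score and explained_variance_score, as in this report, indicate that the predictions fit the test data worse than simply predicting the mean of the true values.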
Note that the model must have a model.fit and a model.predict function.
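If a model exposes differently named methods, a thin adapter can supply the required fit and predict interface. The sketch below is an assumption-laden illustration: `FitPredictAdapter` and `LastValueForecaster` are hypothetical names and are not part of this package.

```python
class FitPredictAdapter:
    """Hypothetical wrapper that exposes fit/predict for any wrapped model."""

    def __init__(self, model, fit_name, predict_name):
        self._model = model
        # Look up the wrapped model's own training and prediction methods once
        self._fit = getattr(model, fit_name)
        self._predict = getattr(model, predict_name)

    def fit(self, X, y):
        self._fit(X, y)
        return self

    def predict(self, X):
        return self._predict(X)


class LastValueForecaster:
    """Hypothetical model that offers train/infer instead of fit/predict."""

    def train(self, X, y):
        self.last_ = y[-1]

    def infer(self, X):
        return [self.last_ for _ in X]


model = FitPredictAdapter(LastValueForecaster(), fit_name="train", predict_name="infer")
predictions = model.fit([[1], [2], [3]], [10, 20, 30]).predict([[4], [5]])
print(predictions)
```

Any object that answers to `fit(X, y)` and `predict(X)` in this way satisfies the stated requirement, regardless of its class hierarchy.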
Available Features
Data Splitter
RandomDataSplitter: splits randomly
TimeDeltaDataSplitter: uses data in the last period of length time_delta as test data
SplitDateDataSplitter: uses data with timestamp newer than split date as test data
SortedDataSplitter: sorts data along a given column and takes the last fraction of size test_size as test data
ByHorizon: produces a list of splits of temporal data such that each consecutive train set has one more observation and test set one less
ByFrequency: produces a list of splits of temporal data such that the data is split by a series of dates on a specified frequency
FixedDates: produces a list of splits of temporal data given a list of fixed dates.
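The splitting rule described for ByHorizon (each consecutive train set gains one observation while the test set loses one) can be sketched as follows. `expanding_splits` and its `min_train` parameter are hypothetical and only illustrate the idea; the actual class may differ in signature and return type.

```python
from datetime import date


def expanding_splits(X, y, min_train=1):
    """Hypothetical helper: list of (X_train, X_test, y_train, y_test) tuples.

    Assumes X and y are already sorted by time. Each consecutive split moves
    one observation from the test set to the train set.
    """
    return [(X[:i], X[i:], y[:i], y[i:]) for i in range(min_train, len(X))]


dates = [date(2016, 1, d) for d in range(1, 5)]  # four daily observations
values = [10, 12, 8, 11]  # made-up illustrative targets

splits = expanding_splits(dates, values)
for X_train, X_test, y_train, y_test in splits:
    print(len(X_train), "train /", len(X_test), "test")
```

Because the test data always lie after the train data in time, every split evaluates the model only on observations it could not have seen during fitting.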
Quality Report
RegressionQualityReport: creates a quality report for a regression model
CrossValidationTimeSeriesQualityReport: creates a quality report for a time series model
Report Aggregation
ModelComparisonReport: aggregates reports using the list of derivatives of QualityReportBase, data, and experiment keys
ReportAggregator: aggregates model quality reports from different models that potentially use different input/output data and cannot fit into the framework of ModelComparisonReport; it operates with the list of classes that derive from ExperimentBase
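As a rough illustration of what aggregating reports by experiment key can mean, the sketch below regroups several report dicts by metric. It does not use the package's classes: `collect_metrics` is a hypothetical helper, and the report values are made-up numbers chosen only for the example.

```python
def collect_metrics(reports):
    """Turn {experiment_key: report} into {metric_name: {experiment_key: value}}."""
    table = {}
    for key, report in reports.items():
        for metric, value in report["metrics"].items():
            table.setdefault(metric, {})[key] = value
    return table


# Made-up reports in the same dict layout as the quality report shown earlier
reports = {
    "linear": {"metrics": {"mean_absolute_error": 4.2, "r2_score": -10.0}},
    "baseline": {"metrics": {"mean_absolute_error": 5.0, "r2_score": -12.5}},
}

table = collect_metrics(reports)
# Experiment with the lowest mean absolute error
best = min(table["mean_absolute_error"], key=table["mean_absolute_error"].get)
print(table)
print("lowest MAE:", best)
```

Grouping by metric first makes side-by-side comparison of experiments a simple dictionary lookup.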
Developers should know
Create a virtual environment and activate it
python -m venv venv
source venv/bin/activate
Install the development packages
pip install -e .[dev]
and use pre-commit to make sure that your code is formatted automatically with the black package:
pre-commit install
Run tests:
pip install -e .[test]
coverage run -m unittest discover tests
coverage report
Build documentation:
pip install -e .[doc]
mkdocs build
or use
mkdocs serve
if you prefer live, self-refreshing documentation.