This package enables quick creation of a report comparing the quality of several ML models.
Model Quality Report
This package enables quick creation of a model quality report, which is returned as a dict.
The main ingredients are a data splitter, which creates training and test data according to various rules, and the quality report itself. The quality report takes care of splitting, fitting, predicting and finally deriving quality metrics.
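To make the split, fit, predict, metrics sequence concrete, here is a stdlib-only sketch. Every name in it (create_report, MeanModel, first_n_split) is hypothetical and chosen for illustration only; it is not the package's API, and the real report computes more metrics and keeps the data keyed by row index.

```python
# Hypothetical, stdlib-only sketch of the workflow a quality report automates:
# split -> fit -> predict -> derive metrics. Not the package's implementation.
from statistics import mean


class MeanModel:
    """Toy regressor that always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def first_n_split(X, y, n_train):
    """Toy splitter: the first n_train rows form the training set."""
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def create_report(model, split, X, y):
    X_train, X_test, y_train, y_test = split(X, y)  # 1. split
    model.fit(X_train, y_train)                     # 2. fit
    y_pred = model.predict(X_test)                  # 3. predict
    mae = mean(abs(t - p) for t, p in zip(y_test, y_pred))  # 4. metrics
    return {
        "metrics": {"mean_absolute_error": mae},
        "data": {"true": list(y_test), "predicted": list(y_pred)},
    }


X = [[1], [2], [3], [4], [5]]
y = [2, 4, 6, 8, 10]
report = create_report(MeanModel(), lambda X, y: first_n_split(X, y, 3), X, y)
print(report["metrics"]["mean_absolute_error"])  # 5
```

Passing the splitter in as a plain callable keeps the report independent of how the data are divided, which is the same separation of concerns the package describes between splitter and report.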
Documentation
The official documentation is hosted on ReadTheDocs: https://model-quality-report.readthedocs.io
Installing the package
Latest available code:
pip install model_quality_report
Specific version:
pip install model_quality_report==X.Y.Z
Quickstart
- The RandomDataSplitter splits data randomly using sklearn.model_selection.train_test_split:
import pandas as pd
X = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': ['a', 'b', 'c', 'd', 'e']})
y = pd.Series(data=range(5))
splitter = RandomDataSplitter(test_size=0.33, random_state=2)
X_train, X_test, y_train, y_test = splitter.split(X, y)
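The idea behind a random split can be sketched without pandas or scikit-learn. This is a stdlib-only illustration, not the package's code; the helper random_split is hypothetical. It assumes, as scikit-learn's train_test_split does for a float test_size, that the number of test rows is rounded up, but the rows it picks will differ from scikit-learn's.

```python
# Stdlib-only sketch of a random split; not the package's code.
# Assumption: a float test_size is turned into a row count by rounding up.
import math
import random


def random_split(X, y, test_size, random_state=None):
    n_test = math.ceil(test_size * len(X))  # 0.33 * 5 rows -> 2 test rows
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # reproducible for a fixed seed
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    return take(X, train_idx), take(X, test_idx), take(y, train_idx), take(y, test_idx)


X = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]  # rows of columns a, b
y = list(range(5))
X_train, X_test, y_train, y_test = random_split(X, y, test_size=0.33, random_state=2)
print(len(X_train), len(X_test))  # 3 2
```

Shuffling one shared index list keeps features and targets aligned, which is the property any split must preserve.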
- The TimeDeltaDataSplitter divides the data such that data from the last period of length time_delta are used as test data. Here a pd.Timedelta and the date column name are provided:
splitter = TimeDeltaDataSplitter(date_column_name='shipping_date', time_delta=pd.Timedelta(3, unit='h'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
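The time-delta rule can be sketched on plain Python rows. The helper time_delta_split is hypothetical, and the boundary handling is an assumption: a row counts as test data when its date lies in the half-open window (latest date minus time_delta, latest date].

```python
# Sketch of the time-delta rule on plain dicts; not the package's code.
# Assumption: test rows satisfy latest_date - time_delta < date <= latest_date.
from datetime import datetime, timedelta


def time_delta_split(rows, y, date_column_name, time_delta):
    cutoff = max(r[date_column_name] for r in rows) - time_delta
    is_test = [r[date_column_name] > cutoff for r in rows]
    X_train = [r for r, t in zip(rows, is_test) if not t]
    X_test = [r for r, t in zip(rows, is_test) if t]
    y_train = [v for v, t in zip(y, is_test) if not t]
    y_test = [v for v, t in zip(y, is_test) if t]
    return X_train, X_test, y_train, y_test


start = datetime(2016, 1, 1)
rows = [{"shipping_date": start + timedelta(hours=h)} for h in range(6)]  # 00:00 .. 05:00
y = list(range(6))
X_train, X_test, y_train, y_test = time_delta_split(rows, y, "shipping_date", timedelta(hours=3))
print(y_train, y_test)  # [0, 1, 2] [3, 4, 5]
```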
- The SplitDateDataSplitter splits the data such that data after a provided date are used as test data. Additionally, the name of the date column has to be provided:
splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
X_train, X_test, y_train, y_test = splitter.split(X, y)
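A plain-Python sketch of the split-date rule follows. The helper split_date_split is hypothetical, and it assumes that only rows dated strictly after split_date become test data, so rows on the split date itself stay in training; the package may treat that boundary differently.

```python
# Sketch of the split-date rule on plain dicts; not the package's code.
# Assumption: rows dated strictly after split_date become test data.
from datetime import date


def split_date_split(rows, y, date_column_name, split_date):
    is_test = [r[date_column_name] > split_date for r in rows]
    X_train = [r for r, t in zip(rows, is_test) if not t]
    X_test = [r for r, t in zip(rows, is_test) if t]
    y_train = [v for v, t in zip(y, is_test) if not t]
    y_test = [v for v, t in zip(y, is_test) if t]
    return X_train, X_test, y_train, y_test


days = [date(2015, 12, 30), date(2015, 12, 31), date(2016, 1, 1), date(2016, 1, 2), date(2016, 1, 3)]
rows = [{"shipping_date": d} for d in days]
y = list(range(5))
X_train, X_test, y_train, y_test = split_date_split(rows, y, "shipping_date", date(2016, 1, 1))
print(y_train, y_test)  # [0, 1, 2] [3, 4]
```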
- The SortedDataSplitter requires a column with sortable values. The data are divided such that the test data set encompasses the last fraction test_size of the sorted rows. Sorting can be in ascending or descending order.
splitter = SortedDataSplitter(sortable_column_name='shipping_date', test_size=0.2, ascending=True)
X_train, X_test, y_train, y_test = splitter.split(X, y)
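The sorted split can be sketched as "order the rows by the column, then take the tail". The helper sorted_split is hypothetical, and rounding test_size * n_rows up to a whole number of rows is an assumption.

```python
# Sketch of the sorted-split rule on plain dicts; not the package's code.
# Assumption: test_size * n_rows is rounded up to get the number of test rows.
import math


def sorted_split(rows, y, sortable_column_name, test_size, ascending=True):
    # Row indices ordered by the sortable column (reversed for descending order).
    order = sorted(
        range(len(rows)),
        key=lambda i: rows[i][sortable_column_name],
        reverse=not ascending,
    )
    n_train = len(rows) - math.ceil(test_size * len(rows))
    train_idx, test_idx = order[:n_train], order[n_train:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    return take(rows, train_idx), take(rows, test_idx), take(y, train_idx), take(y, test_idx)


days = [7, 2, 9, 4, 1, 10, 5, 8, 3, 6]  # unsorted on purpose
rows = [{"shipping_date": f"2016-01-{d:02d}"} for d in days]  # ISO strings sort by date
y = days
_, _, _, y_test = sorted_split(rows, y, "shipping_date", test_size=0.2, ascending=True)
print(y_test)  # [9, 10]
_, _, _, y_test_desc = sorted_split(rows, y, "shipping_date", test_size=0.2, ascending=False)
print(y_test_desc)  # [2, 1]
```

With ascending=False the "last fraction" becomes the earliest dates, which is why the descending call returns days 2 and 1.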
- Using the RegressionQualityReport class, a quality report for a regression model can be created as follows:
import pandas as pd
import sklearn.linear_model
splitter = SplitDateDataSplitter(date_column_name='shipping_date', split_date=pd.Timestamp('2016-01-01'))
model = sklearn.linear_model.LinearRegression()
quality_reporter = RegressionQualityReport(model, splitter)
report = quality_reporter.create_reports()
An example report looks as follows:
{'metrics':
{'explained_variance_score': -6.018595041322246,
'mape': 0.3863636363636345,
'mean_absolute_error': 4.242424242424224,
'mean_squared_error': 29.426997245178825,
'median_absolute_error': 2.272727272727268,
'r2_score': -10.03512396694206},
'data':
{'true': {3: 10, 4: 12, 2: 8},
'predicted': {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561}}}
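Because the example report contains both the true and the predicted values, its metrics can be recomputed by hand. The sketch below uses the standard definitions (those of scikit-learn's regression metrics, with MAPE as a fraction rather than a percentage). They reproduce the numbers above, but that the package computes them exactly this way is an inference from that agreement, not something stated here.

```python
from statistics import mean, median, pvariance

# Values copied from the example report (keys are row indices).
true = {3: 10, 4: 12, 2: 8}
predicted = {3: 12.272727272727268, 4: 20.999999999999964, 2: 6.545454545454561}

y_true = [true[k] for k in true]
y_pred = [predicted[k] for k in true]  # align predictions by row index
errors = [t - p for t, p in zip(y_true, y_pred)]
y_mean = mean(y_true)

metrics = {
    "explained_variance_score": 1 - pvariance(errors) / pvariance(y_true),
    "mape": mean(abs(e) / abs(t) for e, t in zip(errors, y_true)),
    "mean_absolute_error": mean(abs(e) for e in errors),
    "mean_squared_error": mean(e ** 2 for e in errors),
    "median_absolute_error": median(abs(e) for e in errors),
    "r2_score": 1 - sum(e ** 2 for e in errors) / sum((t - y_mean) ** 2 for t in y_true),
}
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

The negative r2_score and explained_variance_score simply mean that, on these three test rows, the model does worse than always predicting the mean of the true values.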
Note that the model must provide model.fit and model.predict methods.
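The fit/predict requirement is a duck-typing contract, which can be expressed with typing.Protocol. SupportsFitPredict and LastValueModel are hypothetical names used only to illustrate the requirement; the package itself may not check it this way.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsFitPredict(Protocol):
    """Any object with fit(X, y) and predict(X) methods satisfies this."""

    def fit(self, X: Any, y: Any) -> Any: ...

    def predict(self, X: Any) -> Any: ...


class LastValueModel:
    """Toy model: always predicts the last training target."""

    def fit(self, X, y):
        self.last_ = list(y)[-1]
        return self

    def predict(self, X):
        return [self.last_] * len(X)


print(isinstance(LastValueModel(), SupportsFitPredict))  # True
print(isinstance(object(), SupportsFitPredict))  # False
```

A runtime_checkable protocol only checks that the methods exist, not their signatures, which matches the loose "must have fit and predict" wording.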
Available Features
Data Splitter
- RandomDataSplitter: splits the data randomly
- TimeDeltaDataSplitter: uses data in the last period of length time_delta as test data
- SplitDateDataSplitter: uses data with a timestamp newer than the split date as test data
- SortedDataSplitter: sorts the data along a given column and takes the last fraction of size test_size as test data
- ByHorizon: produces a list of splits of temporal data such that each consecutive train set has one more observation and the test set one less
- ByFrequency: produces a list of splits of temporal data such that the data are split at a series of dates with a specified frequency
- FixedDates: produces a list of splits of temporal data given a list of fixed dates
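The three list-producing splitters can be sketched with index lists and dates. The function names, the min_train and step parameters, and the boundary rule (train = rows before a split date, test = rows on or after it) are all assumptions for illustration; ByFrequency is approximated as generating split dates at a fixed interval and then splitting like FixedDates.

```python
# Hypothetical sketches of the list-producing splitters; names, parameters
# and boundary conventions are assumptions, not the package's API.
from datetime import date, timedelta


def splits_by_horizon(n_obs, min_train):
    """ByHorizon-like: expanding train set, shrinking test set, one step at a time."""
    return [(list(range(k)), list(range(k, n_obs))) for k in range(min_train, n_obs)]


def dates_by_frequency(start, end, step):
    """ByFrequency-like: split dates every `step`, from start (inclusive) to end (exclusive)."""
    split_dates = []
    current = start
    while current < end:
        split_dates.append(current)
        current += step
    return split_dates


def splits_by_fixed_dates(dates, split_dates):
    """FixedDates-like: per split date, train = earlier rows, test = rows on or after it."""
    return [
        (
            [i for i, d in enumerate(dates) if d < split_date],
            [i for i, d in enumerate(dates) if d >= split_date],
        )
        for split_date in split_dates
    ]


print(splits_by_horizon(5, min_train=2))
# [([0, 1], [2, 3, 4]), ([0, 1, 2], [3, 4]), ([0, 1, 2, 3], [4])]

dates = [date(2021, 1, 1) + timedelta(days=i) for i in range(10)]
split_dates = dates_by_frequency(date(2021, 1, 4), date(2021, 1, 10), timedelta(days=3))
for train_idx, test_idx in splits_by_fixed_dates(dates, split_dates):
    print(len(train_idx), len(test_idx))  # 3 7, then 6 4
```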
Quality Report
- RegressionQualityReport: creates a quality report for a regression model
- CrossValidationTimeSeriesQualityReport: creates a quality report for a time series model
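For a time series model, a natural reading is that the report repeats fit and predict over a list of splits such as the ByHorizon-style expanding windows listed above. The sketch below assumes exactly that; cross_validate and LastValueModel are hypothetical, and the real CrossValidationTimeSeriesQualityReport may structure its output differently.

```python
from statistics import mean


class LastValueModel:
    """Toy time series model: predicts the last observed training target."""

    def fit(self, X, y):
        self.last_ = y[-1]
        return self

    def predict(self, X):
        return [self.last_ for _ in X]


def cross_validate(model, splits, X, y):
    """Fit and evaluate the model once per (train_idx, test_idx) split."""
    results = []
    for train_idx, test_idx in splits:
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = model.predict([X[i] for i in test_idx])
        y_test = [y[i] for i in test_idx]
        mae = mean(abs(t - p) for t, p in zip(y_test, y_pred))
        results.append({"train_size": len(train_idx), "mean_absolute_error": mae})
    return results


X = list(range(5))  # time steps
y = [1, 2, 3, 4, 5]
splits = [(list(range(k)), list(range(k, 5))) for k in range(2, 5)]  # expanding window
results = cross_validate(LastValueModel(), splits, X, y)
print([r["mean_absolute_error"] for r in results])  # [2, 1.5, 1]
```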
Report Aggregation
ModelComparisonReport aggregates reports given a list of classes derived from QualityReportBase, the data, and experiment keys. ReportAggregator is designed to aggregate model quality reports from different models that potentially use different input/output data and therefore cannot fit into the framework of ModelComparisonReport. ReportAggregator operates on a list of classes derived from ExperimentBase.
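The core of aggregation is turning several report dicts into one comparison table. The helper aggregate_reports below is hypothetical and only mirrors the dict layout of the example report; the experiment keys and metric values are made-up toy inputs. A metric missing from one report becomes None, which is one simple way to handle models whose outputs differ.

```python
def aggregate_reports(reports):
    """Turn {experiment_key: report} into {metric_name: {experiment_key: value}}."""
    metric_names = sorted({name for report in reports.values() for name in report["metrics"]})
    return {
        name: {key: report["metrics"].get(name) for key, report in reports.items()}
        for name in metric_names
    }


# Toy inputs for illustration only.
reports = {
    "model_a": {"metrics": {"mean_absolute_error": 4.2, "r2_score": -10.0}},
    "model_b": {"metrics": {"mean_absolute_error": 3.1}},
}
table = aggregate_reports(reports)
mae = table["mean_absolute_error"]
best = min(mae, key=mae.get)  # experiment key with the lowest error
print(best)  # model_b
```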
Developers should know
Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate
Install the development packages:
pip install -e .[dev]
and use pre-commit to make sure that your code is automatically formatted with the black package:
pre-commit install
Run tests:
pip install -e .[test]
coverage run -m unittest discover tests
coverage report
Build documentation (see more details here):
pip install -e .[doc]
mkdocs build
or use
mkdocs serve
if you prefer live, self-refreshing documentation.
Download files
File details
Details for the file model_quality_report-1.3.0.tar.gz.
File metadata
- Download URL: model_quality_report-1.3.0.tar.gz
- Upload date:
- Size: 33.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | f64c1ea7f11784e051d59e93591591e088f336e1f92e833c030440d54c8c742b
MD5 | 6b1773017e9429d62c66174737ae71cc
BLAKE2b-256 | 0f42c4cf6f4c7d6bc3bc8c99540f192eccddd9cc815617c54f32577065104e67
File details
Details for the file model_quality_report-1.3.0-py3-none-any.whl.
File metadata
- Download URL: model_quality_report-1.3.0-py3-none-any.whl
- Upload date:
- Size: 24.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 46f40fcabf8e4c027c7922dcd6d9963dafb75c6654c5ca2b2fe9bfe33b398a30
MD5 | 94c8b969d567016ecca40b615122c122
BLAKE2b-256 | fc8046c9625ca2b36dd398dbba01838c26cb01be689f960ff0e39ae59c6c3b70