
Machine Learning Prediction Confidence Estimation


mlpce



Let's say you have a cool XGBoost model that you've built and now you want to make predictions with it on new data points - how well does your training data cover that model space? In classical statistical analysis, especially the design of experiments (DOE), there are many characteristics of the data used to cover a space that can be considered (e.g. A-, D-, G-, and I-optimality). I-optimality is the average prediction variance over the design space, that is, a measure of how precisely a model built on that data should be able to make new predictions.

mlpce is a Python package that expresses confidence in any given prediction by using an approximating linear function to calculate the standard error of prediction for the new point and comparing it to the same value for the training data. The approximating linear function can either be specified as a string, or the module will pick a high-order polynomial model based on the available degrees of freedom in the training data.
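The underlying calculation can be sketched with NumPy (an illustrative approximation, not mlpce's actual implementation): for an approximating linear model with model matrix X, the unscaled prediction variance at a point x is x'(X'X)⁻¹x, and a new point's value can be compared against the same quantity computed for the training rows.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical training design: 54 runs in 6 factors, plus an intercept column.
x_train = rng.uniform(-1, 1, size=(54, 6))
X = np.hstack([np.ones((len(x_train), 1)), x_train])

xtx_inv = np.linalg.inv(X.T @ X)

def unscaled_pred_variance(points):
    """x'(X'X)^{-1}x for each row: the unscaled prediction variance."""
    M = np.hstack([np.ones((len(points), 1)), points])
    return np.einsum('ij,jk,ik->i', M, xtx_inv, M)

train_var = unscaled_pred_variance(x_train)

# A point inside the training region vs. one well outside it.
inside = np.zeros((1, 6))
outside = np.full((1, 6), 2.0)

# Interpolation: variance sits within the training data's range.
print(unscaled_pred_variance(inside)[0] < train_var.max())
# Extrapolation: variance exceeds anything seen in training.
print(unscaled_pred_variance(outside)[0] > train_var.max())
```

For the training rows these values are the leverages of the hat matrix, so their sum equals the number of model terms (7 here, with the intercept); points far outside the design region produce variances larger than any training row's, which is what mlpce's comparison detects.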

Usage

Consider a dataset chosen to be I-optimal for evaluating a full third-order response surface model; it has 54 rows and 6 columns. This pandas DataFrame can then be passed into the Confidence class, where an approximating linear model will be created and the necessary matrices calculated. We can then pass in a few new rows to be evaluated.

import pandas as pd
from mlpce import Confidence

pd_x = pd.DataFrame(data=[[-1, -0.5, 0.5, -1, 1, 1], [1, -1, 1, -1, -1, -1], [-0.5, 0.5, 1, -0.5, 0, 1],
                          [0.5, 1, 1, 0.5, -1, -1], [-0.5, 0.5, -0.5, 1, -1, 0.5], [-0.5, 0.5, -1, -0.5, 0.5, 1],
                          [1, 1, -1, -1, -1, 0.5], [1, -1, -1, -0.5, 1, 0.5], [1, 0.5, -1, 1, 0.5, 0],
                          [0, -0.5, 0.5, -0.5, -0.5, 0.5], [1, 1, 1, 1, 1, -0.5], [0.5, 1, -0.5, 0.5, -0.5, 1],
                          [0.5, -0.5, -0.5, -0.5, 0.5, -0.5], [1, -1, 1, -1, 0.5, 1], [-1, 1, 0, 1, 1, 1],
                          [1, 1, 0.5, -1, 1, 1], [-0.5, -0.5, -1, -1, 0.5, -1], [1, -1, -1, 0.5, 1, -1],
                          [0.5, -1, -1, -1, -0.5, -0.5], [-1, -1, 0, -0.5, -1, -1], [1, -0.5, 1, 0.5, 1, 0],
                          [0.5, -1, 0.5, 1, 0, -0.5], [1, 0.5, 0.5, -0.5, -0.5, -0.5], [1, -1, 1, 0.5, -1, 1],
                          [0.5, 0.5, -0.5, -1, 1, -1], [0.5, 0.5, 0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0, 1, 1, 1],
                          [-0.5, -0.5, 1, 0.5, -1, -0.5], [-1, 1, 0, -0.5, 1, 0], [1, 1, -0.5, -1, -0.5, -1],
                          [0.5, 0.5, -1, 1, -1, -0.5], [0.5, 1, 1, -1, -1, 0.5], [1, -1, -1, 1, -1, 0.5],
                          [-0.5, -1, -0.5, 0.5, 1, 0], [1, -0.5, -0.5, -1, -1, 1], [-1, -0.5, -1, 1, -0.5, -1],
                          [-1, 1, -1, 1, 0.5, -1], [-0.5, -1, -1, -0.5, -1, 1], [-1, 0, -0.5, -1, -0.5, 0.5],
                          [1, -1, 0.5, -1, 1, -1], [-1, 0.5, -1, -0.5, -1, -1], [1, 1, 1, 1, -1, 1],
                          [1, -1, -0.5, 0.5, -1, -1], [-1, 0.5, 1, 1, -1, -1], [-1, -1, 1, -0.5, 1, -0.5],
                          [-1, -0.5, -1, 0.5, 0, 1], [-1, -1, 1, -1, -1, 1], [-1, 0, 0.5, 1, 1, -1], 
                          [0.5, 1, 1, -1, 0.5, -1], [-0.5, 0.5, 1, -1, -1, -1], [-1, 0, 1, 1, -1, 1], 
                          [-1, 1, 0.5, -0.5, -1, 1], [-0.5, 1, 0.5, 0.5, 0, -0.5], [-1, -1, 1, 1, 0.5, 0.5]],
                    columns=['a', 'b', 'c', 'd', 'e', 'f'])
pd_x_k = pd.DataFrame(data=[[0, 0, 0, 0, 0, 0], [2, 2, 2, 2, 2, 2]],
                      columns=['a', 'b', 'c', 'd', 'e', 'f'])

emm = Confidence(known=pd_x)
pred_variance, confidence = emm.assess_x(pd_x_k)

Both results are dictionaries with keys matching any responses provided, plus a 'Full' key, which evaluates the row using all x values (without regard for missing values in responses). The first value, pred_variance, is the calculated, unscaled prediction variance. The second, confidence, is a string of 'High', 'Mid' or 'Low' indicating how confident you can feel in the model's ability to make predictions in this space.

  • High - the prediction variance is less than the 90th percentile of the training data's prediction variances
  • Mid - the prediction variance is at least the 90th percentile, but no greater than the maximum prediction variance of the training data
  • Low - the prediction variance is greater than the maximum prediction variance of the training data
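The banding rule above can be sketched as a simple function (an illustrative reimplementation, assuming the percentile and maximum are taken over the training rows' unscaled prediction variances; the `train_vars` values here are made up for demonstration):

```python
import numpy as np

def confidence_band(pred_var, train_vars):
    """Classify a new point's unscaled prediction variance
    against the training data's distribution of variances."""
    p90 = np.percentile(train_vars, 90)
    if pred_var < p90:
        return 'High'   # below the 90th percentile of training variances
    if pred_var <= train_vars.max():
        return 'Mid'    # within the range seen in training
    return 'Low'        # beyond any training point: extrapolation

# Hypothetical training prediction variances
train_vars = np.array([0.10, 0.12, 0.15, 0.18, 0.20,
                       0.22, 0.25, 0.30, 0.33, 0.35])

print(confidence_band(0.11, train_vars))  # High
print(confidence_band(0.34, train_vars))  # Mid
print(confidence_band(0.50, train_vars))  # Low
```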
