
Utilities to calibrate model outcome probabilities and evaluate calibration.

Project description

probability-calibration Python Library

Sample × Category Probability Calibration with Two Dimensions (ProbCalib2D)

This repository contains library code (in the Calibration folder) to evaluate calibration and to measure the calibration error of models, including confidence intervals. A model's outcome probabilities carry more information than is traditionally captured by accuracy alone. Generally, calibration is a post-processing step that takes an existing model and corrects its uncertainties to make them more reliable and trustworthy.

Problem Setting

Deep convolutional neural networks have achieved very good performance in text and image classification tasks. However, poor probability calibration leads to myriad problems in safety-critical machine learning applications: without calibration, a model's predictions can be over-confident or otherwise mis-calibrated. As a consequence, it is necessary to evaluate model calibration. A main limitation remains, namely that existing calibration methods operate along only one dimension at a time (samples or categories). The aim here is to find calibration methods that take both dimensions into account simultaneously.

Installation

pip install probability-calibration

Multi-label vs. Multi-class Classification

Multi-label: In the complete and “non-exclusive”, or multi-label, setting, zero, one, or more categories can be associated with a given sample (e.g., Pascal VOC [3]). The label vector associated with a given sample is still a binary one but does not necessarily correspond to any one-hot encoding.

Multi-label classification is a predictive modeling task that involves predicting zero or more mutually non-exclusive class labels. When designing a CNN model to perform a classification task (e.g., classifying objects in the CIFAR-10 dataset or classifying handwritten digits), we want to tell our model whether it is allowed to choose many answers (e.g., both frog and cat) or only one answer (e.g., the digit “8”).

Multi-class: In the complete and “exclusive”, or multi-class, setting, each sample is associated with exactly one category (e.g., MNIST [2], SVHN [5], CIFAR [4], CUB categories [8], ILSVRC [7], and many other standard collections). This section discusses how we can achieve either behavior by applying a sigmoid or a softmax function to our classifier’s raw output values.

Applying Sigmoid versus Softmax

Generally, we use a softmax activation instead of a sigmoid with the cross-entropy loss because softmax distributes the probability mass across all output nodes. For binary classification, sigmoid and softmax are equivalent. For multi-class classification, use softmax with cross-entropy.

At the end of a neural network classifier, you’ll get a vector of “raw output values”: for example [-0.5, 1.2, -0.1, 2.4] if your neural network has four outputs. We’d like to convert these raw values into an understandable format: probabilities. After all, it makes more sense to tell a patient that their risk of diabetes is 91% rather than “2.4” (which looks arbitrary).

We convert a classifier’s raw output values into probabilities using either a sigmoid function or a softmax function.
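As a minimal sketch in plain NumPy (not part of this library's API), here is how the raw output vector above converts under each function: sigmoid maps each output independently into (0, 1) for the multi-label case, while softmax normalizes across outputs so that they sum to one for the multi-class case.

import numpy as np

logits = np.array([-0.5, 1.2, -0.1, 2.4])  # raw classifier outputs from the example above

# Sigmoid: independent per-class probabilities (multi-label setting).
sigmoid = 1.0 / (1.0 + np.exp(-logits))

# Softmax: mutually exclusive probabilities that sum to 1 (multi-class setting).
# Subtracting the max before exponentiating is a standard numerical-stability trick.
exp = np.exp(logits - logits.max())
softmax = exp / exp.sum()

print(sigmoid)  # each entry lies in (0, 1), independent of the others
print(softmax)  # entries sum to 1.0

Note how sigmoid(2.4) ≈ 0.92, matching the “risk of diabetes is 91%” reading of the raw score 2.4 above.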

Applying Softmax Normalization versus Platt Scaling

Can Platt scaling yield well-calibrated probabilities? To answer this question, we compare softmax normalization with methods like Platt scaling. The resulting scores can be used for ranking test images according to their likelihood of containing a given target concept (search task), or for ranking target concepts according to their likelihood of being visible in a given image (classification task).
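As an illustrative sketch (the scores and labels below are made up, and this is not this library's API), Platt scaling fits a one-dimensional logistic regression σ(a·s + b) to held-out validation scores; scikit-learn’s LogisticRegression can serve as the fitter.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical held-out validation scores (raw classifier outputs) and binary labels.
val_scores = np.array([-2.1, -0.3, 0.4, 1.7, 2.5, -1.2, 0.9, 3.0]).reshape(-1, 1)
val_labels = np.array([0, 0, 1, 1, 1, 0, 1, 1])

# Platt scaling: fit sigma(a*s + b) on the raw scores.
platt = LogisticRegression()
platt.fit(val_scores, val_labels)

# Calibrated probabilities for new test scores.
test_scores = np.array([[0.2], [2.0]])
print(platt.predict_proba(test_scores)[:, 1])

Unlike softmax, which normalizes the whole score vector jointly, Platt scaling calibrates each score independently, which is why the comparison between the two is of interest here.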

Evaluating Calibration

To rectify the problem of model uncertainty, prior work has converged on two common ways of measuring calibration: reliability diagrams [17] and estimates of the expected calibration error (ECE) [17].
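A minimal sketch of binned ECE in plain NumPy (the function name and equal-width binning are illustrative choices, not this library's API): predictions are grouped into confidence bins, and the gap between each bin’s accuracy and its mean confidence is averaged, weighted by bin size.

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Weighted average of |accuracy - confidence| over equal-width confidence bins.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Hypothetical predicted confidences and 0/1 correctness indicators.
conf = np.array([0.95, 0.87, 0.66, 0.91, 0.72, 0.55])
hit = np.array([1, 1, 0, 1, 0, 1])
print(expected_calibration_error(conf, hit))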

Both tasks are usually evaluated with different metrics. In the multi-class setting, the classification performance is generally evaluated using the top-N accuracy, usually with N = 1.

The retrieval performance is generally evaluated using the mean average precision (MAP). MAP may also be used to evaluate classification performance in the multi-label setting, as for Pascal VOC, since there may be more than one correct category associated with a given sample.
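As a small illustration (toy scores and labels, not this library's API), top-1 accuracy compares the arg-max prediction with the label, while MAP averages per-class average precision; scikit-learn’s average_precision_score handles the per-class AP.

import numpy as np
from sklearn.metrics import average_precision_score

# Toy scores for 4 samples over 3 classes, with integer ground-truth labels.
scores = np.array([[0.2, 0.5, 0.3],
                   [0.1, 0.2, 0.7],
                   [0.6, 0.3, 0.1],
                   [0.3, 0.4, 0.3]])
labels = np.array([1, 2, 0, 0])

# Top-N accuracy with N = 1: the highest-scoring class must match the label.
top1 = (scores.argmax(axis=1) == labels).mean()

# Mean average precision: per-class AP against one-hot targets, then the mean.
onehot = np.eye(scores.shape[1])[labels]
aps = [average_precision_score(onehot[:, c], scores[:, c]) for c in range(scores.shape[1])]
print(top1, np.mean(aps))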

Calibration can also be visualized with reliability diagrams, which plot per-bin accuracy against mean confidence; a perfectly calibrated model lies on the diagonal.
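A minimal matplotlib sketch (the function name is illustrative, not this library's API): bin the confidences as for ECE, then plot each bin’s accuracy against its mean confidence.

import numpy as np
import matplotlib.pyplot as plt

def reliability_diagram(confidences, correct, n_bins=10):
    # Per-bin accuracy vs. mean confidence; the dashed diagonal is perfect calibration.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    accs, confs = [], []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            accs.append(correct[mask].mean())
            confs.append(confidences[mask].mean())
    plt.plot([0, 1], [0, 1], "k--", label="perfect calibration")
    plt.plot(confs, accs, marker="o", label="model")
    plt.xlabel("confidence")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()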

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

probability-calibration-0.0.1.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

probability_calibration-0.0.1-py2-none-any.whl (5.4 kB)

Uploaded Python 2

File details

Details for the file probability-calibration-0.0.1.tar.gz.

File metadata

  • Download URL: probability-calibration-0.0.1.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.7.9

File hashes

Hashes for probability-calibration-0.0.1.tar.gz
Algorithm Hash digest
SHA256 60805ff91d2bf2ca7283fb848b11a67c32b11d824c2a0876ce0382f75233c958
MD5 3408d2d54b0d5b6a3ac108a9e88d19d8
BLAKE2b-256 17f9ee61d78bf11ccab07df392e6fe22c23e10f1b96303bf24275ca1115b571b


File details

Details for the file probability_calibration-0.0.1-py2-none-any.whl.

File metadata

  • Download URL: probability_calibration-0.0.1-py2-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.7.9

File hashes

Hashes for probability_calibration-0.0.1-py2-none-any.whl
Algorithm Hash digest
SHA256 0b096f8c4362c9f9248d34042c396418e7e8390f0f34e336f80fc8a98629cd3e
MD5 a6fb13aa539f2f9fb935891c984ad176
BLAKE2b-256 4c026c80ed89488f93b3a513439e46ff88bac9d98491a08e333ceff1315128e4

