
online-cp -- Online Conformal Prediction

This project is an implementation of Online Conformal Prediction.

For now, take a look at example.ipynb to see how to use the library.

Quick start

Let's create a dataset with noisy evaluations of the function $f(x_1, x_2) = x_1 + x_2$.

import numpy as np
N = 30
X = np.random.uniform(0, 1, (N, 2))
y = X.sum(axis=1) + np.random.normal(0, 0.1, N)

Import the library and create a regressor:

from online_cp import ConformalRidgeRegressor
cp = ConformalRidgeRegressor()

Alternative 1: Learn the whole dataset online

cp.learn_initial_training_set(X, y)

Predict an object (your output may not be exactly the same, as the dataset depends on the random seed).

cp.predict(np.array([0.5, 0.5]), epsilon=0.1, bounds='both')
(0.8065748777057368, 1.2222461945130274)

The prediction set is the closed interval whose boundaries are indicated by the output.

Alternative 2: Learn the dataset sequentially online, and make predictions as we go. In order to output nontrivial predictions at significance level $\epsilon=0.1$, we need to have learned at least 20 examples.

cp = ConformalRidgeRegressor()
for i, (obj, lab) in enumerate(zip(X, y)):
    print(cp.predict(obj, epsilon=0.1, bounds='both'))
    cp.learn_one(obj, lab)

The output will be (inf, inf) for the first 19 predictions, after which we will typically see meaningful prediction sets.
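By the validity guarantee of online conformal prediction, the long-run fraction of intervals that miss the true label should not exceed $\epsilon$. A small helper like the following (a sketch, not part of the library) can check this empirically on intervals collected in the loop above:

```python
def empirical_error_rate(intervals, labels):
    """Fraction of rounds where the label fell outside the predicted interval.

    `intervals` is a list of (lower, upper) pairs, as returned by predict()
    with bounds='both'; infinite bounds always cover (the trivial set
    contains every label).
    """
    errors = sum(
        1 for (lo, hi), y in zip(intervals, labels) if not (lo <= y <= hi)
    )
    return errors / len(intervals)

# Example with made-up intervals: one of three misses its label.
rate = empirical_error_rate(
    [(0.0, 1.0), (0.5, 1.5), (2.0, 3.0)], [0.5, 1.0, 1.0]
)  # 1/3
```

With a well-calibrated conformal predictor, this rate should hover around $\epsilon$ as the number of rounds grows.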

Future considerations

Release minimal version

For use in projects, it may be good to have a released minimal version of OnlineConformalPrediction. Initially, it could include

  • Conformalised Ridge Regression
  • Plugin martingale
  • Possibly Conformalised Nearest Neighbours Regression (but I will have to check it carefully for any bugs)

Properties of CPs?

  • Should we keep track of errors internally in the parent class?
  • Should we store the average interval size?
  • For classifiers: should we store the efficiency metrics?
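One way such bookkeeping could look in a parent class (purely a sketch; none of these attributes exist in the current library, and the names are hypothetical):

```python
class EfficiencyTracker:
    """Hypothetical mixin: records errors and interval widths as rounds go by."""

    def __init__(self):
        self.n_predictions = 0
        self.n_errors = 0
        self.total_width = 0.0

    def record(self, lower, upper, label):
        """Update counters after one prediction/learn round."""
        self.n_predictions += 1
        if not (lower <= label <= upper):
            self.n_errors += 1
        # inf if the prediction set was trivial
        self.total_width += upper - lower

    @property
    def average_width(self):
        return self.total_width / self.n_predictions

    @property
    def error_rate(self):
        return self.n_errors / self.n_predictions
```

A regressor could inherit from such a mixin and call `record` inside its learn step, so validity and efficiency statistics come for free.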

Linear regression

We will initially focus on regression, but online classification is actually easier. A simple class that uses e.g. scikit-learn classifiers to define a nonconformity measure could easily be implemented.

There are at least three regularisations commonly used in linear regression, all of which are compatible with the kernel trick.

  • $L_1$ (Lasso)
  • $L_2$ (Ridge)
  • A linear combination of the above (Elastic net)

All of these can be conformalised, and at least Ridge can also be used in conformal predictive systems (CPS).

Another relatively simple regressor is the k-nearest neighbours algorithm, which is very flexible as it can use arbitrary distances. It is particularly interesting in the CPS setting. The distance can be measured in feature space as defined by a kernel.
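The kernel-induced distance mentioned here follows from expanding the squared norm in feature space: $d(x, z)^2 = k(x, x) - 2k(x, z) + k(z, z)$. A minimal sketch, using an RBF kernel purely as an illustrative choice:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_distance(x, z, kernel=rbf_kernel):
    """Distance between x and z in the feature space induced by `kernel`.

    Follows from ||phi(x) - phi(z)||^2 = k(x,x) - 2 k(x,z) + k(z,z).
    The max(..., 0.0) guards against tiny negative values from rounding.
    """
    return np.sqrt(max(kernel(x, x) - 2 * kernel(x, z) + kernel(z, z), 0.0))

x = np.array([0.0, 0.0])
z = np.array([1.0, 0.0])
d = kernel_distance(x, z)
```

A kernelised k-NN would then rank training examples by this distance rather than by the Euclidean distance in input space.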

Ridge and KNN are described in detail in Algorithmic Learning in a Random World. Lasso and Elastic net are conformalised in the paper Fast Exact Conformalization of Lasso using Piecewise Linear Homotopy, but I am unaware of any extension to CPS.

Teaching schedule

Section 3.3 in Algorithmic Learning in a Random World deals with so-called weak teachers. In the pure online mode, labels arrive immediately after a prediction is made. This makes little sense in practice. The notion of a teaching schedule formalises this, and makes the relevant validity guarantees clear. There are three types of validity: weak, strong, and iterated-logarithm validity.

There may be settings where the user wants to specify a teaching schedule beforehand, to guarantee some property of validity. It may also be the case that the teaching schedule is implied by the usage, and it would then be useful to know whether the resulting prediction sets are valid.

A teaching schedule also serves as documentation of what has been done, which could be useful in practice.
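As a rough illustration of what a TeachingSchedule might look like (a hypothetical interface, not something the library provides): the schedule decides at which rounds the teacher reveals a label, and doubles as a record of what was actually taught.

```python
class TeachingSchedule:
    """Hypothetical sketch: teach every `stride`-th round and log what was taught."""

    def __init__(self, stride=1):
        self.stride = stride
        self.revealed = []  # rounds at which a label was revealed to the learner

    def should_teach(self, round_number):
        """Decide whether the label is revealed at this round."""
        return round_number % self.stride == 0

    def record(self, round_number):
        """Log that the label of this round was learned."""
        self.revealed.append(round_number)

# In the online loop, the label is learned only when the schedule says so:
# for n, (obj, lab) in enumerate(zip(X, y)):
#     cp.predict(obj, epsilon=0.1, bounds='both')
#     if schedule.should_teach(n):
#         cp.learn_one(obj, lab)
#         schedule.record(n)
```

With stride 1 this recovers the pure online protocol; larger strides model a lazy teacher, and `revealed` documents which labels were used.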

Todo

  • Should we add some scaler? Don't know if it is necessary for Ridge
  • Possibly add a class MimoConformalRidgeRegressor
  • Add CPS version of ridge regressor?
  • Possibly add a TeachingSchedule?
  • Possibly add ACI, both for single, and MIMO CRR?
  • Add references to papers and books to README
  • Add k-NN regressor and CPS

References

How should we cite papers? I have seen citation sections in some repos.

