An implementation of online conformal prediction
Project description
online-cp -- Online Conformal Prediction
This project is an implementation of Online Conformal Prediction.
For now, take a look at example.ipynb
to see how to use the library.
Quick start
Let's create a dataset with noisy evaluations of the function $f(x_1, x_2) = x_1 + x_2$.
import numpy as np
N = 30
X = np.random.uniform(0, 1, (N, 2))
y = X.sum(axis=1) + np.random.normal(0, 0.1, N)
cp.learn_initial_training_set(X, y)
Import the library and create a regressor:
from online_cp import ConformalRidgeRegressor
cp = ConformalRidgeRegressor()
Alternative 1: Learn the whole dataset online
cp.learn_initial_training_set(X, y)
Predict an object (your output may not be exactly the same, as the dataset depends on the random seed).
cp.predict(np.array([0.5, 0.5]), epsilon=0.1, bounds='both')
(0.8065748777057368, 1.2222461945130274)
The prediction set is the closed interval whose boundaries are indicated by the output.
Alternative 2: Learn the dataset sequentially online, and make predictions as we go. In order to output nontrivial prediction at significance level $\epsilon=0.1$, we need to have learned at least 20 examples.
cp = ConformalRidgeRegressor()
for i, (obj, lab) in enumerate(zip(X, y)):
print(cp.predict(obj, epsilon=0.1, bounds='both'))
cp.learn_one(obj, lab)
The output will be (inf, inf)
for the first 19 predictions, after which we will typically see meaningful prediction sets.
Future considerations
Release minimal version
For use in projects, it may be good to have a released minimal version of OnlineConformalPrediction. Initially, it could include
- Conformalised Ridge Regression
- Plugin martingale
- Possibly Conformalised Nearest Neighbours Regression (but I will have to check it carefully for any bugs)
Properties of CPs?
- Should we keep track of errors internally in the parent class?
- Should we store the average interval size?
- For classifiers; should we store the efficiency metrics?
Linear regression
We will initally focus on regression, but online classification is actually easier. A simple class that uses e.g. scikit-learn classifiers to define nonconformity measure could be easily implemented.
There are at least three commonly used regularisations used in linear regression, all of which are compatible with the kernel trick.
- $L1$ (Lasso)
- $L2$ (Ridge)
- Linear combination of the above (Elastic net)
All of these can be conformalized, and at least Ridge can also be used in conformal predictive systems (CPS).
Another relatively simple regressor is the k-nearest neighbours algorithm, which is very flexible as it can use arbitrary distances. It is particularly interesting in the CPS setting. The distance can be measured in feature space as defined by a kernel.
Ridge and KNN are described in detail in Algorithmic Learning in a Random World. Lasso and Elastic net are conformalised in the paper Fast Exact Conformalization of Lasso using Piecewise Linear Homotopy, but I am unaware of any extention to CPS.
Teaching schedule
Section 3.3 in Algorithmic Learning in a Radnom World deals with, so called, weak teachers. In the pure online mode, labels arrive immediately after a predition is made. This makes little sense in practice. The notion of a teaching schedule formalises this, and makes the relevant validity guarantees clear. There are three types of validity; weak, strong, and iterated logartihm validity.
There may be settings where the user wants to specify a teaching schedule beforehand, to guarantee some property of validity. It may also be the case that the teaching schedule is implied by the usage, and it would then be useful to know if the resulting prediciton sets are valid.
A teaching schedule also serves as documentation of what has been done, which could be useful in practice.
Todo
- Should we add some scaler? Don't know if it is neccesary for Ridge
- Possibly add a class MimoConformalRidgeRegressor
- Add CPS version of ridge regressor?
- Possibly add a TeachingSchedule?
- Possibly add ACI, both for single, and MIMO CRR?
- Add references to papers and books to README
- Add k-NN regressor and CPS
References
How to cite papers? I think I have seen it in some repos.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file online_cp-0.0.1.tar.gz
.
File metadata
- Download URL: online_cp-0.0.1.tar.gz
- Upload date:
- Size: 20.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1d9e0131b87fdaf438d52cf0d4a5c75bd60619e59b39317cc6892f74631e673b |
|
MD5 | 099336aca2460cfc4abdf64a4fea13df |
|
BLAKE2b-256 | 0ab86332a4bae5adfc7be4e0f587b3f0a421826499a3256ba5d3c01eb3c607b7 |
File details
Details for the file online_cp-0.0.1-py3-none-any.whl
.
File metadata
- Download URL: online_cp-0.0.1-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5df90a95f605cdf04b115fb4909e83731027820d957dfffb43b55d697d6fd67e |
|
MD5 | 8577bc05e1e00cb5732611c90e29ea35 |
|
BLAKE2b-256 | 5799705c8086cc3714b5d8cea237831ad58683780cce380a9ba4997a4b50e63a |