Optimal binning and conformal prediction for regression
Reason this release was yanked:
Release contained unneeded files
Optimal Binning for Regression via LOO-CRPS
Research project extending the Venn-ABERS calibrated prediction framework to regression (continuous response, real-valued covariate).
Overview
Given training data $(x_i, y_i)$, we sort by $x$, partition the observations into $K$ contiguous bins, and use the within-bin empirical CDF as the predictive distribution. The three main contributions are:
- Optimal binning: bin boundaries are chosen to minimise the total leave-one-out CRPS via dynamic programming in $O(n^2 K)$ time.
- Cross-validated K selection: the number of bins $K$ is selected by test CRPS on a held-out split; within-sample LOO-CRPS is gameable and must not be used for model selection.
- Conformal prediction: the bin ECDF is wrapped in a full conformal predictor (CRPS as nonconformity score) to obtain finite-sample coverage guarantees.
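The dynamic programme behind the first contribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in src/crpsconfreg/binning.py: the names `loo_crps_cost_matrix` and `optimal_bins` are hypothetical, every bin is required to hold at least two points so that the leave-one-out ECDF exists, and the per-bin cost is the closed form listed under Key results. Costs for all contiguous bins are read off a 2-D prefix sum of the pairwise $|y_i - y_j|$ matrix, so the recursion itself takes $O(n^2 K)$ time.

```python
import numpy as np

def loo_crps_cost_matrix(y):
    """cost[i, j] = LOO-CRPS cost of the bin y[i:j]; inf if the bin has < 2 points."""
    n = len(y)
    pair = np.abs(y[:, None] - y[None, :])
    # 2-D prefix sums: P[a, b] = sum of pair[:a, :b]
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = pair.cumsum(axis=0).cumsum(axis=1)
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 2, n + 1):
            m = j - i
            block = P[j, j] - P[i, j] - P[j, i] + P[i, i]  # counts each pair twice
            cost[i, j] = m / (m - 1) ** 2 * block / 2.0
    return cost

def optimal_bins(y, K):
    """Return (total cost, boundaries) for the best split of y into K contiguous bins.

    y must already be ordered by the covariate x; boundaries are indices
    b_0 = 0 < b_1 < ... < b_K = n, and bin k is y[b_{k-1}:b_k].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    cost = loo_crps_cost_matrix(y)
    dp = np.full((K + 1, n + 1), np.inf)   # dp[k, j]: best cost of y[:j] in k bins
    back = np.zeros((K + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(2 * k, n + 1):
            cand = dp[k - 1, :j] + cost[:j, j]
            i = int(np.argmin(cand))
            dp[k, j] = cand[i]
            back[k, j] = i
    # Walk the back-pointers from the last bin to the first
    bounds = [n]
    j = n
    for k in range(K, 0, -1):
        j = int(back[k, j])
        bounds.append(j)
    return float(dp[K, n]), bounds[::-1]

# Two well-separated clusters: the best 2-bin split falls between them
y_demo = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
total_demo, bounds_demo = optimal_bins(y_demo, K=2)
print(total_demo, bounds_demo)
```

The double loop that fills the cost matrix is written for readability; a production version would likely vectorise it.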
The Venn prediction band — the family of augmented ECDFs as the hypothetical label varies — is also formalised as the direct regression analog of the Venn-ABERS interval.
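Within a single bin the band has a simple pointwise form. Adding a hypothetical label $y$ to the $m$ bin labels gives the augmented ECDF $(\#\{y_i \le t\} + \mathbb{1}[y \le t])/(m+1)$, so as $y$ ranges over the real line its value at $t$ lies between $\#\{y_i \le t\}/(m+1)$ and $(\#\{y_i \le t\}+1)/(m+1)$. The sketch below computes these two envelopes; `venn_band` is a hypothetical helper written for this illustration and is not part of the package API.

```python
import numpy as np

def venn_band(y_bin, t_grid):
    """Pointwise lower/upper envelopes of the augmented ECDFs of one bin."""
    y_sorted = np.sort(np.asarray(y_bin, dtype=float))
    m = y_sorted.size
    # counts[k] = number of bin labels <= t_grid[k]
    counts = np.searchsorted(y_sorted, np.asarray(t_grid, dtype=float), side="right")
    lower = counts / (m + 1)        # hypothetical label lies above t
    upper = (counts + 1) / (m + 1)  # hypothetical label lies at or below t
    return lower, upper

y_bin_demo = np.array([1.0, 2.0, 4.0])
t_demo = np.array([0.0, 1.0, 3.0, 5.0])
lower_demo, upper_demo = venn_band(y_bin_demo, t_demo)
print(lower_demo)  # [0.   0.25 0.5  0.75]
print(upper_demo)  # [0.25 0.5  0.75 1.  ]
```

The band has constant width $1/(m+1)$, so it narrows as the bin fills up.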
Package usage
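import numpy as np
from crpsconfreg import BinningPredictor

# Illustrative synthetic data; passing 1-D arrays for x_new, y_new and t_grid is an assumption
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 3.0, size=200))  # x and y must be sorted by x
y_train = rng.normal(loc=x_train, scale=1.0 + x_train)
x_new = np.array([0.5, 1.5, 2.5])
y_new = rng.normal(loc=x_new, scale=1.0 + x_new)
t_grid = np.linspace(-10.0, 15.0, 201)

pred = BinningPredictor().fit(x_train, y_train, K_max=20)  # CV selects K*
# Conformal prediction interval with coverage >= 1 - epsilon
lo, hi = pred.predict_interval(x_new, epsilon=0.10)
# Conformal p-value for each observed (x*, y*) pair
pvals = pred.conformal_pvalue(x_new, y_new)
# Within-bin empirical CDF evaluated on t_grid
cdf = pred.predict_ecdf(x_new, t_grid)
print(f"K* = {pred.K_}, bin sizes = {pred.bin_sizes()}")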
Pass K=k to fit() to fix the number of bins and bypass cross-validation.
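For intuition about what a conformal p-value with a CRPS nonconformity score looks like, here is a minimal single-bin sketch. It is not the package's implementation, and the exact score is an assumption: each point in the augmented bag (bin labels plus the candidate label) is scored by the CRPS of the ECDF of the remaining points, and the p-value is the fraction of scores at least as large as the candidate's. The helper names `crps_ecdf` and `conformal_pvalue_in_bin` are hypothetical. A prediction set at level $\varepsilon$ would then collect the candidate labels whose p-value exceeds $\varepsilon$.

```python
import numpy as np

def crps_ecdf(sample, y):
    """CRPS of the empirical CDF of `sample` at observation y."""
    sample = np.asarray(sample, dtype=float)
    spread = np.abs(sample[:, None] - sample[None, :]).mean()
    return np.abs(sample - y).mean() - 0.5 * spread

def conformal_pvalue_in_bin(y_bin, y_candidate):
    """Full-conformal p-value of y_candidate against the labels of one bin."""
    bag = np.append(np.asarray(y_bin, dtype=float), y_candidate)
    # Leave-one-out CRPS score of every point in the augmented bag
    scores = np.array(
        [crps_ecdf(np.delete(bag, i), bag[i]) for i in range(bag.size)]
    )
    # Candidate is the last element; count scores at least as large as its own
    return float(np.mean(scores >= scores[-1]))

y_bin_demo = np.arange(19.0)                         # labels 0, 1, ..., 18
p_central = conformal_pvalue_in_bin(y_bin_demo, 9.0)    # candidate at the centre
p_far = conformal_pvalue_in_bin(y_bin_demo, 1000.0)     # candidate far outside
print(p_central, p_far)
```

Because the score treats all points in the bag symmetrically, the usual exchangeability argument for conformal validity applies to this sketch; the smallest attainable p-value is $1/(m+1)$ for a bin of $m$ labels.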
Installation
uv sync
# or
pip install -e .
Repository structure
| Path | Description |
|---|---|
| src/crpsconfreg/ | Python package |
| src/crpsconfreg/binning.py | LOO-CRPS cost, DP, bin boundaries |
| src/crpsconfreg/selection.py | CRPS scoring, CV K selection |
| src/crpsconfreg/conformal.py | Conformal p-values (vectorised), prediction intervals |
| src/crpsconfreg/predictor.py | BinningPredictor high-level class |
| tests/ | 61 pytest tests |
| dp_formulation.tex / .pdf | Full write-up: derivations, CV K selection, Venn band, conformal prediction, numerical illustration |
| conformal_binning.ipynb | Clean notebook: CV K selection, Venn band, p-value curves, fan plot, coverage check |
| optimal_binning.ipynb | Original exploratory notebook (includes Bayesian regularisation exploration) |
| save_figures.py | Reproduces all figures in figures/ as PDFs |
| showcase.ipynb | Interactive notebook: edit one config cell to plug in any DGP |
| demo/app.py | Panel + Pyodide static web demo (Python in the browser) |
| demo-js/ | TypeScript + Vite + Plotly.js web demo (instant load, no WASM) |
| docs/ | Generated static site served by GitHub Pages |
Key results
- LOO-CRPS cost of a bin $S$ with $m \ge 2$ observations: $\mathrm{cost}(S) = \frac{m}{(m-1)^2} \sum_{\ell < r} |y_\ell - y_r|$
- CV selects $K^*=3$ on the heteroscedastic example ($n=200$, $Y|X=x \sim \mathcal{N}(x,(1+x)^2)$), with bin sizes 51/60/89
- Empirical coverage at $\varepsilon=0.10$: 91.0% (target $\ge$ 90%) on a 2000-point test set
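The closed-form bin cost in the first bullet can be checked numerically against a brute-force leave-one-out computation. The sketch uses the standard identity for the CRPS of an ECDF built from $z_1, \dots, z_k$, namely $\frac{1}{k}\sum_j |z_j - y| - \frac{1}{2k^2}\sum_{j,l} |z_j - z_l|$; the function names are hypothetical and chosen for this illustration only.

```python
import numpy as np

def crps_ecdf(sample, y):
    """CRPS of the empirical CDF of `sample` at observation y."""
    sample = np.asarray(sample, dtype=float)
    spread = np.abs(sample[:, None] - sample[None, :]).mean()
    return np.abs(sample - y).mean() - 0.5 * spread

def loo_crps_brute(y_bin):
    """Sum over i of the CRPS at y_i of the ECDF built from the other points."""
    y_bin = np.asarray(y_bin, dtype=float)
    return sum(crps_ecdf(np.delete(y_bin, i), y_bin[i]) for i in range(y_bin.size))

def loo_crps_closed(y_bin):
    """Closed form: m / (m - 1)^2 times the sum over pairs l < r of |y_l - y_r|."""
    y_bin = np.asarray(y_bin, dtype=float)
    m = y_bin.size
    upper = np.triu_indices(m, k=1)
    pair_sum = np.abs(y_bin[:, None] - y_bin[None, :])[upper].sum()
    return m / (m - 1) ** 2 * pair_sum

rng = np.random.default_rng(0)
y_check = rng.normal(size=8)
brute_value = loo_crps_brute(y_check)
closed_value = loo_crps_closed(y_check)
print(brute_value, closed_value)  # the two values agree up to rounding
```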
Interactive showcase
showcase.ipynb lets you plug in any data-generating process and immediately see
the full pipeline — K selection, partition, predictive CDFs, conformal p-value curves,
prediction bands, and empirical coverage. Edit only the configuration cell at the top:
def ygiven_x(x, rng):  # sample Y | X = x
    return rng.normal(loc=x, scale=0.5 + x)

def true_quantile(x, p):  # oracle p-quantile (or set to None)
    from scipy.stats import norm
    return norm.ppf(p, loc=x, scale=0.5 + x)

n_train, x_lo, x_hi = 300, 0.0, 3.0
K_max, epsilon, seed = 20, 0.10, 42
Three alternatives are included as comments: skewed (gamma), bimodal mixture, and sinusoidal mean.
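As an illustration of what a skewed configuration could look like with the same two function signatures, here is a hypothetical gamma example. It is not the commented alternative shipped in the notebook: the shape parameter is fixed at 1 (the exponential special case of the gamma family) purely so that the oracle quantile has a closed form and needs only NumPy, and the scale `0.5 + x` is carried over from the Gaussian cell as an assumption.

```python
import numpy as np

def ygiven_x(x, rng):  # sample Y | X = x from Gamma(shape=1, scale=0.5 + x)
    return rng.gamma(shape=1.0, scale=0.5 + np.asarray(x, dtype=float))

def true_quantile(x, p):  # oracle p-quantile of the same distribution
    # For shape 1 the gamma quantile is -scale * log(1 - p)
    return -(0.5 + np.asarray(x, dtype=float)) * np.log1p(-p)

# Sanity check: the sample median at x = 1 should sit near the oracle median
rng = np.random.default_rng(42)
sample = ygiven_x(np.full(20000, 1.0), rng)
print(np.median(sample), true_quantile(1.0, 0.5))
```

For a general shape parameter the quantile has no closed form; `scipy.stats.gamma.ppf` would be the natural replacement, mirroring the use of `norm.ppf` in the Gaussian cell.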
Live demo
A static web demo runs entirely in the browser with no server and no Python runtime.
It is built with TypeScript, Vite, and Plotly.js (demo-js/), giving instant page
loads. Four DGP presets are included; you can also write a custom Data Generating
Process in the editor. All plots from the notebook are reproduced.
To host it at https://ptocca.github.io/RegressionVenn/, enable GitHub Pages
(Settings → Pages → Source: docs/ on the js-demo branch).
To rebuild docs/ after editing the demo source:
cd demo-js
npm install # first time only
npm run build # outputs to ../docs/
The earlier Panel + Pyodide demo (demo/app.py, branch master) is still available
and can be rebuilt with bash demo/build.sh.
Running the tests
uv run pytest
File details
Details for the file crpsconfreg-0.2.0.tar.gz.
File metadata
- Download URL: crpsconfreg-0.2.0.tar.gz
- Upload date:
- Size: 39.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bbc55737c2ed5dfed14d8a5b099caaf03dd50716201262b3681087b9036a8a56 |
| MD5 | 2acbceddebaa2813a6f5d9adaef62dfb |
| BLAKE2b-256 | f40bd077a1510c26fcceb79597eb6d814926318d015f88e4dd886f455737fea0 |
File details
Details for the file crpsconfreg-0.2.0-py3-none-any.whl.
File metadata
- Download URL: crpsconfreg-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 762a6ec00d9624a5eb4130b6227b323ce7948d42c381bc8fee40f1c21380c5de |
| MD5 | 30f0a47d1cbf3aabaae5e73d7da2357c |
| BLAKE2b-256 | b586102dafe77ee1842c5562390fbcd06e3d6569af3e0ae709e4408f40d82671 |