Streaming probability calibration via multiplicative weights

MWU Calibration

License: MIT

Installation

pip install streamcal

For development:

pip install -e ".[dev]"

The Problem

ML models often output miscalibrated probabilities: a predicted 70% does not mean that 70% of those cases are positive. Batch calibrators (Platt scaling, isotonic regression) need periodic refits, which forces a compute-drift tradeoff: refit often and pay the compute cost, or refit rarely and let calibration drift.

The Multiplicative Weights Update (MWU) calibrator maintains one bias factor per bucket at O(#buckets) cost per batch and adapts continuously, with no offline retraining.

Method

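Maintain one bias factor $c_b$ per bucket $b$. After each batch:

$$c_b \leftarrow c_b \cdot \exp(-\eta \cdot (\bar{p}_b - \bar{y}_b))$$

where $\bar{p}_b$ is the mean calibrated probability in bucket $b$, $\bar{y}_b$ is the observed outcome rate in that bucket, and $\eta$ is the step size. A bucket that overpredicts ($\bar{p}_b > \bar{y}_b$) has its factor shrunk; a bucket that underpredicts has it grown.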
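The update rule above can be sketched in plain Python. This is an illustration, not streamcal's implementation: the class name `BucketMWU`, the equal-width bucketing of the raw probability, and the choice to apply $c_b$ as a clipped multiplier on the raw probability are all assumptions made for the example.

```python
import math


class BucketMWU:
    """Illustrative per-bucket multiplicative-weights calibrator (hypothetical)."""

    def __init__(self, n_buckets=50, eta=0.1):
        self.n_buckets = n_buckets
        self.eta = eta
        self.c = [1.0] * n_buckets  # bias factors c_b, starting at "no correction"

    def _bucket(self, p_raw):
        # Assumption: equal-width buckets over the raw probability.
        return min(int(p_raw * self.n_buckets), self.n_buckets - 1)

    def predict(self, p_raw):
        # Assumption: the bias factor scales the raw probability, clipped to [0, 1].
        return min(1.0, max(0.0, self.c[self._bucket(p_raw)] * p_raw))

    def update(self, p_raws, ys):
        """Apply one batch update; return the calibrated probabilities used for it."""
        preds = [self.predict(p) for p in p_raws]
        sums = {}  # bucket index -> [sum of calibrated p, sum of y, count]
        for p_raw, p_cal, y in zip(p_raws, preds, ys):
            s = sums.setdefault(self._bucket(p_raw), [0.0, 0.0, 0])
            s[0] += p_cal
            s[1] += y
            s[2] += 1
        for b, (sum_p, sum_y, n) in sums.items():
            p_bar, y_bar = sum_p / n, sum_y / n
            # c_b <- c_b * exp(-eta * (p_bar_b - y_bar_b))
            self.c[b] *= math.exp(-self.eta * (p_bar - y_bar))
        return preds


# The model says 0.8 but only 4 of 10 outcomes are positive.
cal = BucketMWU(n_buckets=50, eta=0.1)
for _ in range(300):
    cal.update([0.8] * 10, [1, 0, 0, 0, 1, 0, 1, 0, 0, 1])
```

Under these assumptions, repeated batches pull `cal.predict(0.8)` toward the observed rate of 0.4, while buckets that receive no data keep a factor of 1.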

Results

Semi-synthetic experiments (LightGBM base model, linear drift, B=50 buckets):

| Method | Brier | ECE | CPU ms/batch |
|---|---|---|---|
| MWU | 0.133 | 0.070 | 0.08 |
| Platt | 0.129 | 0.043 | 4.92 |
| Isotonic | 0.128 | 0.043 | 4.36 |

MWU is about 61× cheaper per batch than Platt scaling (0.08 ms vs 4.92 ms) with a comparable Brier score (0.133 vs 0.129), although its ECE is higher (0.070 vs 0.043).

Usage

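from streamcal import MWUCalibrator

# n_buckets is the number of buckets B; eta is the step size from the update rule
cal = MWUCalibrator(n_buckets=50, eta=0.1)

# data_stream: any iterable yielding (p_raw, y) pairs
for p_raw, y in data_stream:
    # update() adapts the bias factors and returns the calibrated probability
    p_calibrated = cal.update(p_raw, y)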
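
For trying the loop above without real data, a synthetic stream with linear drift can be generated with the standard library. The generator name `drifting_stream`, its parameters, and the particular drift (the true positive rate falling from 1.0× to 0.5× the raw score) are hypothetical. This README does not state whether `update` expects scalars, lists, or arrays, so the batches may need reshaping.

```python
import random


def drifting_stream(n_batches=100, batch_size=256, seed=0):
    """Yield (p_raw, y) batches whose miscalibration grows linearly over time."""
    rng = random.Random(seed)
    for t in range(n_batches):
        # Hypothetical drift: true rate = scale * raw score, scale goes 1.0 -> 0.5.
        scale = 1.0 - 0.5 * t / max(n_batches - 1, 1)
        p_raw = [rng.random() for _ in range(batch_size)]
        y = [1 if rng.random() < scale * p else 0 for p in p_raw]
        yield p_raw, y
```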

Available Calibrators

Streaming (online):

  • MWUCalibrator - Multiplicative Weights Update
  • OnlineSGD - Online SGD with additive updates
  • PerBucketEMA - Per-bucket exponential moving average

Batch (refit on accumulated data):

  • PlattScaling - Logistic regression on logits
  • IsotonicCalibrator - Isotonic regression
  • TemperatureScaling - Temperature scaling
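
For reference, the textbook functional forms behind two of the batch calibrators can be sketched as follows. These are the standard definitions with parameters supplied by hand, not streamcal's classes; the function names are chosen for the example, and fitting `a`, `b`, or `T` is left out.

```python
import math


def logit(p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite at 0 and 1
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def platt(p_raw, a, b):
    # Platt scaling: logistic regression with slope a and intercept b on the logit.
    return sigmoid(a * logit(p_raw) + b)


def temperature(p_raw, T):
    # Temperature scaling: divide the logit by a single scalar T > 0.
    return sigmoid(logit(p_raw) / T)
```

With `a=1, b=0` or `T=1` both maps are the identity; `T > 1` pulls probabilities toward 0.5.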

Metrics

from streamcal import brier_score, expected_calibration_error

brier = brier_score(y_true, y_pred)
ece = expected_calibration_error(y_true, y_pred, n_bins=20)
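
As a rough guide to what these two metrics measure, here is a minimal standard-library sketch. The names `brier_sketch` and `ece_sketch` are made up for the example, and the equal-width binning is an assumption; streamcal's own functions may bin or weight differently.

```python
def brier_sketch(y_true, y_pred):
    # Mean squared error between predicted probability and the 0/1 outcome.
    return sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def ece_sketch(y_true, y_pred, n_bins=20):
    # Count-weighted mean of |observed rate - mean prediction| over equal-width bins.
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]  # [sum of p, sum of y, count]
    for y, p in zip(y_true, y_pred):
        b = bins[min(int(p * n_bins), n_bins - 1)]
        b[0] += p
        b[1] += y
        b[2] += 1
    n = len(y_true)
    # (count / n) * |sum_y / count - sum_p / count| simplifies to |sum_y - sum_p| / n
    return sum(abs(sum_y - sum_p) / n for sum_p, sum_y, count in bins if count)
```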

Reproduce Experiments

pip install -e ".[experiments]"
python experiments/run_experiments.py
python experiments/generate_figures.py

Paper

See ms/mwu_calibration.pdf for theory and full results.

Related Work

streamcal uses the same MWU / mirror descent algorithm as onlinerake (survey weighting), applied to probability calibration rather than sample reweighting.
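
One way to see the mirror descent connection is to take logs of the update in the Method section. Writing $\theta_b = \log c_b$ (a symbol introduced here for illustration), the multiplicative step becomes an ordinary additive step:

$$\theta_b \leftarrow \theta_b - \eta \cdot (\bar{p}_b - \bar{y}_b), \qquad c_b = \exp(\theta_b)$$

So the factors $c_b$ stay positive by construction, and the calibration gap $\bar{p}_b - \bar{y}_b$ plays the role of the gradient signal.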

License

MIT
