Small Sample Beta Correction - PAC guarantees with small datasets
SSBC: Small-Sample Beta Correction
Small-Sample Beta Correction provides PAC (Probably Approximately Correct) guarantees for conformal prediction with small calibration sets.
- PyPI package: https://pypi.org/project/ssbc/
- Free software: MIT License
- Documentation: https://ssbc.readthedocs.io
Overview
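SSBC addresses the challenge of constructing valid prediction sets when you have limited calibration data. Standard split conformal prediction guarantees coverage only on average over calibration sets; with a small calibration set, the coverage realized for one particular set can fall well below the target. SSBC applies a finite-sample correction so that the target coverage holds with probability at least 1 - δ over the draw of the calibration set (a PAC guarantee).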
Key Features
- ✅ Small-Sample Correction: PAC-valid conformal prediction for small calibration sets
- ✅ Mondrian Conformal Prediction: Per-class calibration for handling class imbalance
- ✅ Comprehensive Statistics: Detailed reporting with Clopper-Pearson confidence intervals
- ✅ Hyperparameter Tuning: Interactive parallel coordinates visualization for parameter optimization
- ✅ Simulation Tools: Built-in data generators for testing and validation
Installation
pip install ssbc
Or from source:
git clone https://github.com/phzwart/ssbc.git
cd ssbc
pip install -e .
Quick Start
import numpy as np
from ssbc import (
    ssbc_correct,
    BinaryClassifierSimulator,
    split_by_class,
    mondrian_conformal_calibrate,
    report_prediction_stats,
)
# 1. Generate simulated data
sim = BinaryClassifierSimulator(
    p_class1=0.1,
    beta_params_class0=(2, 8),
    beta_params_class1=(8, 2),
    seed=42,
)
labels, probs = sim.generate(n_samples=100)
# 2. Split by class for Mondrian CP
class_data = split_by_class(labels, probs)
# 3. Calibrate with SSBC correction
cal_result, pred_stats = mondrian_conformal_calibrate(
    class_data=class_data,
    alpha_target=0.10,  # 10% miscoverage
    delta=0.10,  # 90% PAC guarantee
    mode="beta",
)
# 4. Generate comprehensive report
summary = report_prediction_stats(pred_stats, cal_result, verbose=True)
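Step 3 calibrates each class separately. As a rough illustration of what per-class (Mondrian) calibration involves, the sketch below computes one conformal threshold per class and builds a prediction set for a new point. It is not the ssbc implementation: class_threshold and prediction_set are hypothetical helpers, the nonconformity score 1 - P(label) is an assumed choice, and the plain conformal quantile is used without the SSBC correction.

```python
import math


def class_threshold(scores, alpha):
    """Conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: always include the class
    return sorted(scores)[k - 1]


def prediction_set(p_class1, thresholds):
    """Include every label whose nonconformity score is within its class threshold."""
    # Assumed score: 1 - P(label), so label 0 scores p_class1 and label 1 scores 1 - p_class1.
    scores = {0: p_class1, 1: 1.0 - p_class1}
    return {label for label, s in scores.items() if s <= thresholds[label]}


# Made-up calibration scores, grouped by true class.
thresholds = {
    0: class_threshold([0.1, 0.2, 0.3, 0.4], alpha=0.5),
    1: class_threshold([0.05, 0.15, 0.25, 0.35], alpha=0.5),
}
for p in (0.2, 0.5, 0.9):
    print(p, prediction_set(p, thresholds))
```

Because each class gets its own threshold, a rare class is calibrated only against its own few examples, which is where the small-sample correction matters most.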
Core Algorithm: SSBC
The SSBC algorithm finds the largest (least conservative) corrected miscoverage rate α' that satisfies, with the probability taken over the draw of the calibration set:
P(Coverage(α') ≥ 1 - α_target) ≥ 1 - δ
from ssbc import ssbc_correct
result = ssbc_correct(
    alpha_target=0.10,  # Target 10% miscoverage
    n=50,  # Calibration set size
    delta=0.10,  # PAC parameter (90% confidence)
    mode="beta",  # Infinite test window
)
print(f"Corrected α: {result.alpha_corrected:.4f}")
print(f"u*: {result.u_star}")
Parameters
- alpha_target: Target miscoverage rate (e.g., 0.10 for 90% coverage)
- n: Calibration set size
- delta: PAC risk tolerance (probability of violating guarantee)
- mode: "beta" (infinite test) or "beta-binomial" (finite test)
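To make the criterion concrete, here is a minimal sketch of one way to search for the corrected rate in "beta" mode. It relies on a standard result for split conformal prediction with continuous, exchangeable scores: if the threshold is the (n + 1 - u)-th smallest of n calibration scores, the coverage conditional on the calibration set follows Beta(n + 1 - u, u). The mapping α' = u / (n + 1) and the helper names coverage_tail and sketch_ssbc_correct are assumptions made for illustration, not the package's code.

```python
import math


def coverage_tail(n, u, target_coverage):
    """P(coverage >= target_coverage) when coverage ~ Beta(n + 1 - u, u).

    For integer parameters the Beta tail equals a binomial CDF:
    P(Beta(n + 1 - u, u) >= t) = P(Binomial(n, t) <= n - u).
    """
    t = target_coverage
    return sum(
        math.comb(n, j) * t**j * (1.0 - t) ** (n - j) for j in range(n - u + 1)
    )


def sketch_ssbc_correct(alpha_target, n, delta):
    """Return (alpha_corrected, u_star), or None if no u meets the criterion."""
    u_star = None
    for u in range(1, n + 1):
        if coverage_tail(n, u, 1.0 - alpha_target) >= 1.0 - delta:
            u_star = u  # the tail shrinks as u grows, so keep the largest passing u
        else:
            break
    if u_star is None:
        return None
    return u_star / (n + 1), u_star


result = sketch_ssbc_correct(alpha_target=0.10, n=50, delta=0.10)
print(result)
```

When n is very small, no u may satisfy the criterion; the sketch returns None in that case, and how the package itself handles it is not covered here.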
Module Structure
The library is organized into focused modules:
Core Modules
- ssbc.core: Core SSBC algorithm (ssbc_correct, SSBCResult)
- ssbc.conformal: Mondrian conformal prediction (mondrian_conformal_calibrate, split_by_class)
- ssbc.statistics: Statistical utilities (clopper_pearson_intervals, cp_interval)
Analysis & Visualization
- ssbc.visualization: Reporting and plotting (report_prediction_stats, plot_parallel_coordinates_plotly)
- ssbc.hyperparameter: Parameter tuning (sweep_hyperparams_and_collect, sweep_and_plot_parallel_plotly)
Testing & Simulation
- ssbc.simulation: Data generators (BinaryClassifierSimulator)
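For intuition about what a data generator of this kind produces, the sketch below draws labels and class-1 probabilities using only the standard library. It assumes that each label is Bernoulli(p_class1) and that the predicted P(class 1) is drawn from a Beta distribution chosen by the true label; that reading of the Quick Start parameters is an assumption, and simulate_binary_scores is a hypothetical function, not BinaryClassifierSimulator itself.

```python
import random


def simulate_binary_scores(n_samples, p_class1, beta_params_class0,
                           beta_params_class1, seed=None):
    """Draw (labels, probs) where probs are simulated P(class 1) values."""
    rng = random.Random(seed)
    labels, probs = [], []
    for _ in range(n_samples):
        label = 1 if rng.random() < p_class1 else 0
        # Assumed model: the score distribution depends on the true label.
        a, b = beta_params_class1 if label == 1 else beta_params_class0
        labels.append(label)
        probs.append(rng.betavariate(a, b))
    return labels, probs


labels, probs = simulate_binary_scores(100, 0.1, (2, 8), (8, 2), seed=42)
print(sum(labels), "positives out of", len(labels))
```

With p_class1=0.1 and 100 samples, only about ten calibration points belong to class 1, which is the small-sample regime the package targets.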
Examples
The examples/ directory contains comprehensive demonstrations:
1. Core SSBC Algorithm
python examples/ssbc_core_example.py
Demonstrates the SSBC algorithm for different calibration set sizes.
2. Mondrian Conformal Prediction
python examples/mondrian_conformal_example.py
Complete workflow: simulation → calibration → reporting.
3. Hyperparameter Sweep
python examples/hyperparameter_sweep_example.py
Interactive parameter tuning with parallel coordinates visualization.
Hyperparameter Tuning
Sweep over α and δ values to find optimal configurations:
from ssbc import sweep_and_plot_parallel_plotly
import numpy as np
# Define grid. np.linspace is used because np.arange with a float step
# can include or drop the endpoint depending on rounding.
alpha_grid = np.linspace(0.05, 0.15, 3)  # 0.05, 0.10, 0.15
delta_grid = np.linspace(0.05, 0.15, 3)
# Run sweep and visualize (class_data comes from split_by_class in the Quick Start)
df, fig = sweep_and_plot_parallel_plotly(
    class_data=class_data,
    alpha_0=alpha_grid, delta_0=delta_grid,
    alpha_1=alpha_grid, delta_1=delta_grid,
    color='err_all',  # Color by error rate
)
# Save interactive plot
fig.write_html("sweep_results.html")
# Analyze results
print(df[['a0', 'd0', 'cov', 'sing_rate', 'err_all']].head())
The interactive plot allows you to:
- Brush (select) ranges on any axis to filter configurations
- Explore trade-offs between coverage, automation, and error rates
- Identify Pareto-optimal hyperparameter settings
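Pareto-optimal settings can also be picked out programmatically. The sketch below keeps every configuration that no other configuration beats on both singleton rate (higher is better) and overall error rate (lower is better). The column names sing_rate and err_all come from the sweep output above; the rows are made-up numbers, and pareto_front is a hypothetical helper that works on a list of dicts (for a pandas DataFrame, df.to_dict("records") yields that shape).

```python
def pareto_front(rows, maximize="sing_rate", minimize="err_all"):
    """Return the rows that are not dominated by any other row."""
    front = []
    for r in rows:
        dominated = any(
            o[maximize] >= r[maximize]
            and o[minimize] <= r[minimize]
            and (o[maximize] > r[maximize] or o[minimize] < r[minimize])
            for o in rows
        )
        if not dominated:
            front.append(r)
    return front


# Illustrative values only, not real sweep results.
rows = [
    {"a0": 0.05, "sing_rate": 0.60, "err_all": 0.02},
    {"a0": 0.10, "sing_rate": 0.75, "err_all": 0.05},
    {"a0": 0.15, "sing_rate": 0.70, "err_all": 0.08},
]
front = pareto_front(rows)
print([r["a0"] for r in front])
```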
Understanding the Output
Per-Class Statistics (Conditioned on True Label)
For each class, the report shows:
- Abstentions: Empty prediction sets
- Singletons: Confident predictions (automated decisions)
  - Correct: True label in singleton set
  - Incorrect: True label not in singleton set
- Doublets: Both labels included (escalated to human review)
Marginal Statistics (Deployment View)
Overall performance metrics, aggregated over both classes rather than conditioned on the true label:
- Coverage: Fraction of predictions containing the true label
- Singleton rate: Fraction of confident predictions (automation level)
- Escalation rate: Fraction requiring human review
- Error rates: By predicted class and overall
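The marginal quantities above can be computed directly from a list of prediction sets and true labels. The sketch below is a simplified illustration rather than the output of report_prediction_stats: summarize_prediction_sets is a hypothetical helper, the data is made up, and it assumes that only doublets count as escalations, with abstentions reported separately.

```python
from collections import Counter


def summarize_prediction_sets(pred_sets, true_labels):
    """Classify each prediction set by size and compute marginal rates."""
    counts = Counter()
    for pred, truth in zip(pred_sets, true_labels):
        if len(pred) == 0:
            counts["abstention"] += 1
        elif len(pred) == 1:
            counts["singleton"] += 1
            if truth not in pred:
                counts["singleton_error"] += 1
        else:
            counts["doublet"] += 1
        if truth in pred:
            counts["covered"] += 1
    n = len(true_labels)
    singletons = counts["singleton"]
    return {
        "coverage": counts["covered"] / n,
        "singleton_rate": singletons / n,
        "escalation_rate": counts["doublet"] / n,  # assumption: doublets only
        "abstention_rate": counts["abstention"] / n,
        "singleton_error_rate": (
            counts["singleton_error"] / singletons if singletons else 0.0
        ),
    }


pred_sets = [{0}, {1}, {0, 1}, set(), {0}]
true_labels = [0, 0, 1, 1, 0]
stats = summarize_prediction_sets(pred_sets, true_labels)
print(stats)
```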
PAC Bounds
The report includes theoretical and observed singleton error rates:
- α'_bound: Theoretical upper bound from PAC analysis
- α'_observed: Observed error rate on calibration data
- ✓ if observed ≤ bound (PAC guarantee satisfied)
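Observed rates come from a small number of calibration points, which is why the report pairs them with Clopper-Pearson intervals. As a self-contained illustration of that interval (not the package's clopper_pearson_intervals or cp_interval, whose signatures are not shown here), the sketch below finds the exact limits by bisection on the binomial tail. The function names and the example counts are hypothetical.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1)
    )


def _bisect_increasing(fn, target, iters=60):
    """Find p in [0, 1] with fn(p) == target, for fn increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k, n, confidence=0.95):
    """Exact two-sided interval for a proportion with k events in n trials."""
    tail = (1.0 - confidence) / 2.0
    # Lower limit: the p at which seeing k or more events has probability `tail`.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: binom_sf(k, n, p), tail
    )
    # Upper limit: the p at which seeing k or fewer events has probability
    # `tail`, which is the same as P(X >= k + 1) = 1 - tail.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: binom_sf(k + 1, n, p), 1.0 - tail
    )
    return lower, upper


lower, upper = clopper_pearson(k=2, n=30)
print(f"observed rate {2 / 30:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

The interval is wide at this sample size, so an observed rate sitting just under a bound is weak evidence on its own.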
Citation
If you use SSBC in your research, please cite:
@software{ssbc2024,
  author = {Zwart, Petrus H},
  title = {SSBC: Small-Sample Beta Correction},
  year = {2024},
  url = {https://github.com/phzwart/ssbc}
}
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE file for details.
Credits
This package was created with Cookiecutter and the audreyfeldroy/cookiecutter-pypackage project template.