Utilities for computing optimal classification cutoffs for binary and multiclass classification
Optimal Classification Cut-Offs
Select optimal probability thresholds for binary and multiclass classification.
Maximize F1, precision, recall, accuracy, or custom cost-sensitive utilities with algorithms designed for piecewise‑constant classification metrics.
Why thresholds—and what are we optimizing?
Most probabilistic classifiers output scores or probabilities p = P(y=1|x) (binary) or a probability vector over classes (multiclass). Turning those into decisions requires thresholds:
- Binary: predict 1 if p > τ, else 0.
- Multiclass: predict class argmax_k p_k, or use per‑class thresholds τ_k.
The default τ = 0.5 is rarely optimal for your objective (for example, F1 under class imbalance or asymmetric costs). Metrics like F1, precision, recall, and accuracy only change when a threshold crosses one of the unique predicted probabilities, so they are piecewise‑constant functions of τ. That structure lets us compute globally optimal thresholds quickly and exactly.
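To make the piecewise‑constant claim concrete, here is a minimal NumPy sketch. The helper `f1_at_threshold` is hypothetical and written only for this illustration; it is not part of the library. It shows that two thresholds lying between the same pair of adjacent scores give the same F1, while a threshold that crosses a score changes it.

```python
import numpy as np

def f1_at_threshold(y_true, y_prob, tau):
    """F1 for the rule 'predict 1 if p > tau' (illustrative helper, not library code)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) > tau
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.7, 0.3, 0.9]

# Both thresholds sit between the adjacent scores 0.3 and 0.7, so F1 is identical.
f1_a = f1_at_threshold(y_true, y_prob, 0.31)
f1_b = f1_at_threshold(y_true, y_prob, 0.69)

# Crossing the score 0.7 flips one prediction and changes the metric.
f1_c = f1_at_threshold(y_true, y_prob, 0.75)
print(f1_a, f1_b, f1_c)
```

Because the metric can only change at observed scores, a search over thresholds never needs more candidates than there are unique scores (plus one "predict everything positive" cut).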
Methods at a glance (from basic → advanced)
Intuition: we want the cut(s) over sorted probabilities that maximize your objective.
- Unique scan (unique cuts), baseline / safe: evaluate the metric at all unique predicted probabilities and pick the best. Competitive when n_unique is moderate. Method: "unique_scan".
- Sort & scan (exact, fast), recommended for piecewise metrics: sort probabilities once and compute all candidate scores with vectorized cumulative counts. O(n log n), exact optimum for F1/precision/recall/accuracy. Method: "sort_scan".
- Expected Fβ (Dinkelbach; calibrated), analytical and fastest when valid: solves a fractional program for expected Fβ under perfect calibration. Currently supports F1. Use when you trust calibration and want the expected‑metric optimum. Mode: "expected".
- Continuous optimizers, for non‑piecewise targets or micro‑averaged multiclass joint objectives: fall back to scipy.optimize or simple gradient heuristics. Not guaranteed optimal for stepwise metrics. Methods: "minimize", "gradient".
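The sort‑and‑scan idea can be sketched in a few lines of NumPy. This is an illustrative reimplementation under stated assumptions (binary labels in {0, 1}, no sample weights, decision rule `p > tau`), not the library's own code; the function name `sort_scan_f1` is hypothetical.

```python
import numpy as np

def sort_scan_f1(y_true, y_prob):
    """Return (tau, f1) maximizing F1 for 'predict 1 if p > tau' (illustrative sketch)."""
    y = np.asarray(y_true, dtype=int)
    p = np.asarray(y_prob, dtype=float)

    # Sort once, descending by score (stable sort keeps tied samples in input order).
    order = np.argsort(-p, kind="mergesort")
    y_sorted, p_sorted = y[order], p[order]

    # Confusion counts if the top-(k+1) samples are predicted positive.
    tp = np.cumsum(y_sorted)
    fp = np.cumsum(1 - y_sorted)
    fn = y.sum() - tp
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)

    # A cut is only realizable where the next score is strictly lower: ties cannot be split.
    valid = np.append(p_sorted[:-1] > p_sorted[1:], True)
    k = int(np.argmax(np.where(valid, f1, -1.0)))

    # With the strict rule p > tau, using the next lower score as tau selects exactly the top-(k+1).
    if k + 1 < len(p_sorted):
        tau = float(p_sorted[k + 1])
    else:
        tau = float(np.nextafter(p_sorted[k], -np.inf))  # just below the minimum score
    return tau, float(f1[k])

tau, best_f1 = sort_scan_f1([0, 1, 1, 0, 1], [0.2, 0.8, 0.7, 0.3, 0.9])
tau_tied, f1_tied = sort_scan_f1([1, 0, 1, 0], [0.9, 0.5, 0.5, 0.1])
```

The sort costs O(n log n) and the cumulative sums are O(n), which is where the stated complexity comes from. The sketch does not consider the "predict nothing positive" cut, since its F1 is 0 and it can never beat predicting everything positive when at least one positive label exists.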
Multiclass strategies:
- One‑vs‑Rest (OvR): optimize each class's threshold independently (macro/weighted/none averaging). Simple and effective; by default we predict the highest‑probability class above its threshold, falling back to argmax if none pass. Methods: "auto", "unique_scan", "sort_scan", "minimize", "gradient".
- Coordinate ascent (coupled, single‑label consistent): optimizes F1 for the single‑label rule argmax_k (p_k − τ_k). Typically better for imbalanced problems; currently F1 only, comparison ">" only, and no sample weights. Method: "coord_ascent".
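The two multiclass decision rules described above can differ on the same inputs. The sketch below writes both out in NumPy; `predict_ovr` and `predict_margin` are hypothetical names used for illustration, and the thresholds are made‑up values rather than optimized ones.

```python
import numpy as np

def predict_ovr(probs, thresholds):
    """Highest-probability class among those above their threshold; argmax if none pass."""
    probs = np.asarray(probs, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    passed = probs > thresholds                      # (n_samples, n_classes) mask
    pred = np.where(passed, probs, -np.inf).argmax(axis=1)
    none_passed = ~passed.any(axis=1)
    pred[none_passed] = probs[none_passed].argmax(axis=1)   # fallback rule
    return pred

def predict_margin(probs, thresholds):
    """Single-label rule argmax_k (p_k - tau_k)."""
    return (np.asarray(probs, dtype=float) - np.asarray(thresholds, dtype=float)).argmax(axis=1)

probs = np.array([
    [0.5, 0.4, 0.1],
    [0.3, 0.3, 0.4],
])
thresholds = np.array([0.6, 0.3, 0.5])   # illustrative values only

ovr_pred = predict_ovr(probs, thresholds)
margin_pred = predict_margin(probs, thresholds)
print(ovr_pred, margin_pred)
```

In the second row no class clears its threshold, so the OvR rule falls back to plain argmax (class 2), while the margin rule picks the class closest to clearing its threshold (class 1).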
Practical validation: holdout & cross‑validation
Thresholds are hyperparameters. To estimate a threshold you can trust:
- Split: train your model; reserve validation data (or use cross‑validation) to choose τ.
- (Optional) Calibrate probabilities (CalibratedClassifierCV) for better transportability.
- Select thresholds on validation/CV using this library.
- Freeze the threshold and evaluate on a held‑out test set.
This repository includes cross‑validation utilities to estimate thresholds and quantify uncertainty.
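As a rough illustration of the select‑then‑freeze workflow, the sketch below uses only NumPy and synthetic scores. It deliberately does not call the library's cross‑validation utilities, whose signatures are not shown here; the helpers `f1_score_at` and `best_threshold`, the synthetic data, and the choice of the median as the aggregate are all assumptions made for this example.

```python
import numpy as np

def f1_score_at(y, p, tau):
    """F1 for 'predict 1 if p > tau' (illustrative helper)."""
    pred = p > tau
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    return float(2 * tp / max(2 * tp + fp + fn, 1))

def best_threshold(y, p):
    """Brute-force scan over unique scores, plus one cut below the minimum."""
    candidates = np.concatenate(([np.nextafter(p.min(), -np.inf)], np.unique(p)))
    scores = [f1_score_at(y, p, t) for t in candidates]
    return float(candidates[int(np.argmax(scores))])

# Synthetic, imbalanced data: positives tend to receive higher scores.
rng = np.random.default_rng(0)
n = 600
y = (rng.random(n) < 0.2).astype(int)
p = np.clip(rng.normal(0.35 + 0.3 * y, 0.15), 0.0, 1.0)

# Hold out a test set that plays no part in choosing the threshold.
idx = rng.permutation(n)
val_idx, test_idx = idx[:400], idx[400:]

# Estimate a threshold on each of 5 validation folds; the spread hints at its uncertainty.
folds = np.array_split(val_idx, 5)
fold_taus = np.array([best_threshold(y[f], p[f]) for f in folds])
tau = float(np.median(fold_taus))

# Freeze tau, then evaluate once on the untouched test set.
test_f1 = f1_score_at(y[test_idx], p[test_idx], tau)
print(fold_taus.round(3), round(tau, 3), round(test_f1, 3))
```

If the per‑fold thresholds vary widely, the selected threshold is unlikely to transfer well, which is a signal to gather more validation data or calibrate the model first.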
🚀 Quick start
Install
pip install optimal-classification-cutoffs
Optional dependencies for enhanced performance and testing:
# For performance optimization (recommended)
pip install "optimal-classification-cutoffs[performance]"
# For running examples
pip install "optimal-classification-cutoffs[examples]"
# For development and testing
pip install "optimal-classification-cutoffs[dev]"
# All optional dependencies
pip install "optimal-classification-cutoffs[all]"
Binary
from optimal_cutoffs import get_optimal_threshold
y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.7, 0.3, 0.9]
# Optimize F1 threshold
result = get_optimal_threshold(y_true, y_prob, metric="f1", method="auto")
print(result.threshold) # e.g. 0.7...
y_pred = result.predict(y_prob) # boolean labels
Multiclass (OvR thresholds)
import numpy as np
from optimal_cutoffs import get_optimal_threshold
y_true = [0, 1, 2, 0, 1]
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
])
result = get_optimal_threshold(y_true, y_prob, metric="f1") # auto-detects multiclass
print(result.thresholds) # per-class τ_k
y_pred = result.predict(y_prob) # integer class labels
Cost-Sensitive Binary
from optimal_cutoffs import get_optimal_threshold, bayes_thresholds_from_costs
# Empirical (finite-sample) optimum from labeled data
result = get_optimal_threshold(
    y_true, y_prob,
    utility={"tp": 50.0, "tn": 0.0, "fp": -1.0, "fn": -10.0},  # benefits/costs
)
tau = result.threshold
# Closed-form Bayes threshold (calibrated probabilities)
result_bayes = bayes_thresholds_from_costs(
    fp_costs=[1.0], fn_costs=[10.0]  # costs per class
)
tau_bayes = result_bayes.thresholds[0]
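For intuition about the closed‑form case, the standard decision‑theoretic threshold can be derived by comparing the expected utility of each action under a calibrated probability p. The helper below, `bayes_threshold`, is a hypothetical standalone function written for this explanation; it is not the library's `bayes_thresholds_from_costs`, and utilities follow the sign convention of the `utility` dict above (costs are negative).

```python
def bayes_threshold(tp=0.0, tn=0.0, fp=0.0, fn=0.0):
    """Threshold on calibrated p = P(y=1|x) that maximizes expected utility.

    Predict 1 when  p*tp + (1-p)*fp  >  p*fn + (1-p)*tn,
    which rearranges to  p > (tn - fp) / ((tn - fp) + (tp - fn)).
    """
    gain_neg = tn - fp   # what a correct "0" is worth relative to a false positive
    gain_pos = tp - fn   # what a correct "1" is worth relative to a false negative
    if gain_neg < 0 or gain_pos < 0 or gain_neg + gain_pos == 0:
        raise ValueError("correct decisions must be worth at least as much as errors")
    return gain_neg / (gain_neg + gain_pos)

# Costs only: false positive costs 1, false negative costs 10.
tau_costs = bayes_threshold(fp=-1.0, fn=-10.0)            # 1 / (1 + 10)

# Adding a benefit of 50 for each true positive lowers the threshold further.
tau_benefit = bayes_threshold(tp=50.0, fp=-1.0, fn=-10.0)  # 1 / (1 + 60)
print(tau_costs, tau_benefit)
```

Note that the two calls in the example above use different utilities (one includes a true‑positive benefit of 50, the other only costs), so their thresholds are not expected to coincide.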
API Decision Stack
- Problem: binary or multiclass (auto‑detected).
- Objective: metric ("f1", "precision", "recall", "accuracy") or utility/cost (binary‑only).
- Estimation regime (choose one):
  - Empirical (finite sample): optimize on labeled data.
  - Expected under calibration: Bayes (utility, closed‑form; binary‑only) or Dinkelbach (expected F1; no weights).
- Method (empirical only): "auto", "sort_scan", "unique_scan", "minimize", "gradient"; multiclass adds "coord_ascent". For expected F1, use mode="expected".
- Tolerance: control numerical precision for floating-point comparisons (default: 1e-10).
- Validation: holdout or cross‑validation (cv_threshold_optimization, nested_cv_threshold_optimization).
Examples
- Empirical metric (binary):
get_optimal_threshold(y, p, metric="f1", method="auto")
- Empirical utility (binary):
get_optimal_threshold(y, p, utility={"fp":-1, "fn":-5}, method="sort_scan")
- Bayes utility (calibrated, binary):
bayes_thresholds_from_costs(fp_costs=[1], fn_costs=[5]) # or
get_optimal_threshold(None, p, utility={"fp":-1,"fn":-5}, mode="bayes")
- Expected F1 via Dinkelbach (calibrated, binary):
get_optimal_threshold(y, p, metric="f1", mode="expected")
- Custom tolerance for numerical precision:
get_optimal_threshold(y, p, metric="f1", tolerance=1e-6)
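To give a feel for the expected‑F1 mode, here is a sketch of a Dinkelbach‑style iteration under one common formulation: treat the probabilities as perfectly calibrated and maximize the ratio of expectations 2·Σ p_i d_i / (Σ p_i + Σ d_i) over binary decisions d_i. This is an assumption about the objective made for illustration and may differ from what the library implements; `expected_f1_threshold` is a hypothetical name.

```python
import numpy as np

def expected_f1_threshold(p, max_iter=100, tol=1e-12):
    """Dinkelbach iteration for max_d 2*sum(p*d) / (sum(p) + sum(d)).

    Returns (tau, lam): the threshold and the optimal ratio lam, with tau = lam / 2.
    """
    p = np.asarray(p, dtype=float)
    lam = 0.0
    for _ in range(max_iter):
        # For fixed lam, sum_i d_i * (2*p_i - lam) is maximized by d_i = 1 iff p_i > lam / 2.
        d = p > lam / 2.0
        if not d.any():
            break
        new_lam = 2.0 * p[d].sum() / (p.sum() + d.sum())
        if abs(new_lam - lam) <= tol:
            break
        lam = new_lam
    return lam / 2.0, lam

tau_exp, lam_exp = expected_f1_threshold([0.9, 0.8, 0.1])
print(tau_exp, lam_exp)
```

No labels are needed in this formulation: the threshold depends only on the predicted probabilities, which is why it is only trustworthy when calibration holds.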