
Utilities for computing optimal classification cutoffs for binary and multiclass classification


Optimal Classification Cut-Offs


Select optimal probability thresholds for binary and multiclass classification.
Maximize F1, precision, recall, accuracy, or custom cost-sensitive utilities with algorithms designed for piecewise‑constant classification metrics.


Why thresholds—and what are we optimizing?

Most probabilistic classifiers output scores or probabilities p = P(y=1|x) (binary) or a probability vector over classes (multiclass). Turning those into decisions requires thresholds:

  • Binary: predict 1 if p > τ, else 0.
  • Multiclass: predict class argmax_k p_k or use per‑class thresholds τ_k.

The default τ = 0.5 is rarely optimal for your objective (e.g., F1 under imbalance, cost asymmetry, etc.). Because metrics like F1/precision/recall/accuracy only change when thresholds cross unique probability values, they are piecewise‑constant. That structure lets us compute globally optimal thresholds quickly and exactly.
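To make the piecewise‑constant claim concrete, here is a minimal NumPy sketch. It does not use this library; `f1_at_threshold` is a hypothetical helper written only for illustration. It shows that F1 stays the same while τ moves between two adjacent scores and changes only once τ crosses a score.

```python
import numpy as np

def f1_at_threshold(y_true, y_prob, tau):
    """F1 for the rule: predict 1 if p > tau, else 0 (illustrative helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) > tau
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.7, 0.3, 0.9]

# Two thresholds between the same pair of adjacent scores (0.3 and 0.7)
# produce identical predictions, hence identical F1.
f1_a = f1_at_threshold(y_true, y_prob, 0.31)
f1_b = f1_at_threshold(y_true, y_prob, 0.69)

# Moving tau past the score 0.7 flips one prediction and changes F1.
f1_c = f1_at_threshold(y_true, y_prob, 0.75)

print(f1_a, f1_b, f1_c)
```

Because only the scores themselves can change the predictions, it is enough to check one candidate threshold per gap between sorted unique scores.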


Methods at a glance (from basic → advanced)

Intuition: we want the cut(s) over sorted probabilities that maximize your objective.

  • Smart brute (unique cuts), baseline / safe: evaluate the metric at all unique predicted probabilities and pick the best. Competitive when n_unique is moderate.
    Method: "smart_brute".

  • Sort & scan (exact, fast), recommended for piecewise metrics: sort probabilities once and compute all candidate scores with vectorized cumulative counts. O(n log n), exact optimum for F1/precision/recall/accuracy.
    Method: "sort_scan".

  • Dinkelbach (expected Fβ; calibrated), analytical, fastest when valid: solves a fractional program for expected Fβ under perfect calibration. Currently supports F1. Use when you trust calibration and want the expected‑metric optimum.
    Method: "dinkelbach".

  • Continuous optimizers, for non‑piecewise targets or micro‑averaged multiclass joint objectives: fall back to scipy.optimize or simple gradient heuristics. Not guaranteed to find the optimum for stepwise metrics.
    Methods: "minimize", "gradient".
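The sort‑and‑scan idea from the list above can be sketched in a few lines of NumPy. This is an illustrative reimplementation under stated assumptions, not the library's code: `sort_scan_f1` is a hypothetical name, the decision rule is assumed to be `p > tau`, and sample weights are ignored.

```python
import numpy as np

def sort_scan_f1(y_true, y_prob):
    """Return (tau, best_f1) for the rule `p > tau` using one sort
    and cumulative counts (illustrative sketch, unweighted)."""
    y = np.asarray(y_true, dtype=int)
    p = np.asarray(y_prob, dtype=float)

    order = np.argsort(-p, kind="stable")      # scores in descending order
    y_sorted, p_sorted = y[order], p[order]

    tp = np.cumsum(y_sorted)                   # positives among the top k
    fp = np.cumsum(1 - y_sorted)               # negatives among the top k
    fn = y.sum() - tp                          # positives left below the cut

    denom = 2 * tp + fp + fn
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

    # Tied scores move together, so a cut is only realizable
    # where the score changes (or after the last element).
    valid = np.append(p_sorted[:-1] != p_sorted[1:], True)
    f1 = np.where(valid, f1, -np.inf)

    k = int(np.argmax(f1))
    if k + 1 < len(p_sorted):
        # Midpoint between the last included score and the next lower one.
        tau = 0.5 * (p_sorted[k] + p_sorted[k + 1])
    else:
        # Everything is predicted positive: any value below the minimum works.
        tau = p_sorted[k] - 1.0
    return float(tau), float(f1[k])

tau, best = sort_scan_f1([0, 1, 1, 0, 1], [0.2, 0.8, 0.7, 0.3, 0.9])
print(tau, best)
```

The sort dominates the cost at O(n log n); the scan itself is a handful of vectorized passes. The "predict nothing" cut is omitted because its F1 is zero.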

Multiclass strategies:

  • One‑vs‑Rest (OvR) — optimize each class’s threshold independently (macro/weighted/none averaging). Simple and effective; by default we predict the highest‑probability class above its threshold, falling back to argmax if none pass.
    Methods: "auto", "smart_brute", "sort_scan", "minimize", "gradient".

  • Coordinate Ascent (coupled, single‑label consistent) — optimizes F1 for the single‑label rule argmax_k (p_k − τ_k). Typically better for imbalanced problems; currently F1 only, comparison ">" only, and no sample weights.
    Method: "coord_ascent".
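The two decision rules described above can give different labels for the same thresholds. The sketch below writes both out in NumPy; `predict_ovr` and `predict_margin` are hypothetical names used for illustration and are not part of this library's API.

```python
import numpy as np

def predict_ovr(probs, taus):
    """OvR rule: highest-probability class among those with p_k > tau_k,
    falling back to plain argmax when no class passes."""
    probs = np.asarray(probs, dtype=float)
    passed = probs > np.asarray(taus, dtype=float)
    masked = np.where(passed, probs, -np.inf)
    fallback = probs.argmax(axis=1)
    return np.where(passed.any(axis=1), masked.argmax(axis=1), fallback)

def predict_margin(probs, taus):
    """Coupled single-label rule: argmax_k (p_k - tau_k)."""
    margins = np.asarray(probs, dtype=float) - np.asarray(taus, dtype=float)
    return margins.argmax(axis=1)

probs = np.array([
    [0.5, 0.4, 0.1],
    [0.2, 0.3, 0.5],
])
taus = np.array([0.6, 0.3, 0.7])

ovr = predict_ovr(probs, taus)        # array([1, 2])
margin = predict_margin(probs, taus)  # array([1, 1])
print(ovr, margin)
```

In the second row no class clears its threshold, so the OvR rule falls back to argmax (class 2), while the margin rule picks the class closest to clearing its threshold (class 1).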


Practical validation: holdout & cross‑validation

Thresholds are hyperparameters. To estimate a threshold you can trust:

  1. Split: Train your model; reserve validation data (or use cross‑validation) to choose τ.
  2. (Optional) Calibrate probabilities (CalibratedClassifierCV) for better transportability.
  3. Select thresholds on validation/CV using this library.
  4. Freeze the threshold and evaluate on a held‑out test set.

This repository includes cross‑validation utilities to estimate thresholds and quantify uncertainty.
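The workflow above, minus the optional calibration step, can be sketched with NumPy alone. Everything here is an assumption made for illustration: the scores are synthetic stand‑ins for model outputs, `choose_threshold` is a hypothetical helper that scans unique scores, and the library's own cross‑validation utilities are not used.

```python
import numpy as np

def f1_score_at(y, p, tau):
    """F1 for the rule `p > tau` (illustrative helper)."""
    pred = p > tau
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def choose_threshold(y, p):
    """Pick the F1-maximizing threshold among the unique scores,
    plus one candidate just below the minimum (predict everything positive)."""
    candidates = np.unique(p)
    candidates = np.concatenate(([candidates[0] - 1e-9], candidates))
    scores = [f1_score_at(y, p, t) for t in candidates]
    return float(candidates[int(np.argmax(scores))])

rng = np.random.default_rng(0)
n = 600
y = (rng.random(n) < 0.2).astype(int)                      # imbalanced labels
p = np.clip(0.2 + 0.4 * y + rng.normal(0, 0.15, n), 0, 1)  # stand-in scores

# Step 1: split into validation and test indices.
idx = rng.permutation(n)
val, test = idx[:300], idx[300:]

# Step 3: select the threshold on validation data only.
tau = choose_threshold(y[val], p[val])

# Step 4: freeze tau and evaluate once on the held-out test set.
test_f1 = f1_score_at(y[test], p[test], tau)
print(tau, test_f1)
```

The validation F1 at the selected threshold is optimistically biased because τ was tuned on that data; the test score is the one to report.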


🚀 Quick start

Install

pip install optimal-classification-cutoffs

Binary

from optimal_cutoffs import ThresholdOptimizer

y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.7, 0.3, 0.9]

# Optimize F1 threshold
opt = ThresholdOptimizer(objective="f1", method="auto")
opt.fit(y_true, y_prob)
print(opt.threshold_)            # e.g. 0.7...
y_pred = opt.predict(y_prob)     # boolean labels

Multiclass (OvR thresholds)

import numpy as np
from optimal_cutoffs import ThresholdOptimizer

y_true = [0, 1, 2, 0, 1]
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
])

opt = ThresholdOptimizer(objective="f1")   # auto-detects multiclass
opt.fit(y_true, y_prob)
print(opt.threshold_)                      # per-class τ_k
y_pred = opt.predict(y_prob)               # integer class labels

Cost-Sensitive Binary

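from optimal_cutoffs import get_optimal_threshold, bayes_threshold_from_costs

# Binary labels and scores (same data as the Binary example)
y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.7, 0.3, 0.9]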

# Empirical (finite-sample) optimum from labeled data
tau = get_optimal_threshold(
    y_true, y_prob,
    utility={"tp": 50.0, "tn": 0.0, "fp": -1.0, "fn": -10.0},  # benefits/costs
)

# Closed-form Bayes threshold (calibrated probabilities)
tau_bayes = bayes_threshold_from_costs(
    fp_cost=1.0, fn_cost=10.0, tp_benefit=50.0, tn_benefit=0.0
)
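For intuition, the closed‑form threshold follows from comparing the expected utility of the two decisions: predict 1 when p·U_tp + (1−p)·U_fp > p·U_fn + (1−p)·U_tn. The sketch below solves that inequality for p. It is a derivation aid, not the library's implementation: `bayes_threshold_sketch` is a hypothetical name, it takes signed utilities (costs are negative), and it assumes both decisions are correct for some p so the threshold falls strictly between 0 and 1.

```python
def bayes_threshold_sketch(u_tp, u_tn, u_fp, u_fn):
    """Threshold tau such that predicting 1 is optimal when p > tau,
    assuming perfectly calibrated p and signed utilities."""
    gain_pos = u_tp - u_fn   # benefit of predicting 1 rather than 0 when y = 1
    gain_neg = u_tn - u_fp   # benefit of predicting 0 rather than 1 when y = 0
    if gain_pos <= 0 or gain_neg <= 0:
        # Assumption of this sketch: each decision must be correct for some p.
        raise ValueError("utilities do not yield a threshold in (0, 1)")
    return gain_neg / (gain_pos + gain_neg)

# Utilities from the example above: tp=50, tn=0, fp=-1, fn=-10.
tau_sketch = bayes_threshold_sketch(u_tp=50.0, u_tn=0.0, u_fp=-1.0, u_fn=-10.0)
print(tau_sketch)  # 1/61, about 0.0164
```

With a large benefit for true positives and a small cost for false positives, the threshold sits close to zero: almost any non‑negligible probability justifies a positive prediction.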

API Decision Stack

  1. Problem: binary or multiclass (auto‑detected).

  2. Objective: metric ("f1", "precision", "recall", "accuracy") or utility/cost (binary‑only).

  3. Estimation regime (choose one):
     • Empirical (finite sample): optimize on labeled data.
     • Expected under calibration:
       – Bayes (utility, closed‑form; binary‑only), or
       – Dinkelbach (expected F1; no weights).

  4. Method (empirical only): "auto", "sort_scan", "smart_brute", "minimize", "gradient"; multiclass adds "coord_ascent". For expected F1, use "dinkelbach".

  5. Validation: holdout or cross‑validation (cv_threshold_optimization, nested_cv_threshold_optimization).

Examples

  • Empirical metric (binary):
    get_optimal_threshold(y, p, metric="f1", method="auto")
  • Empirical utility (binary):
    get_optimal_threshold(y, p, utility={"fp": -1, "fn": -5}, method="sort_scan")
  • Bayes utility (calibrated, binary):
    bayes_threshold_from_costs(fp_cost=1, fn_cost=5)  # or
    get_optimal_threshold(None, p, utility={"fp": -1, "fn": -5}, bayes=True)
  • Expected F1 via Dinkelbach (calibrated, binary):
    get_optimal_threshold(y, p, metric="f1", method="dinkelbach")
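To show what a Dinkelbach iteration looks like for expected F1, here is a minimal sketch. It rests on assumptions that may differ from the library's implementation: expected F1 is approximated by a ratio of expectations, 2·Σ p_i·x_i / (Σ x_i + Σ p_i), where x_i marks a positive prediction; probabilities are treated as perfectly calibrated; and `dinkelbach_expected_f1` is a hypothetical name.

```python
import numpy as np

def dinkelbach_expected_f1(p, tol=1e-12, max_iter=100):
    """Dinkelbach iteration for max over x of 2*sum(p*x) / (sum(x) + sum(p)).
    Returns (tau, value), where x_i = 1 exactly when p_i > tau."""
    p = np.asarray(p, dtype=float)
    total = p.sum()
    lam = 0.0  # current guess of the optimal ratio
    for _ in range(max_iter):
        # Inner problem: maximize 2*sum(p*x) - lam*(sum(x) + total),
        # which selects every item with 2*p_i - lam > 0.
        selected = p > lam / 2.0
        den = selected.sum() + total
        new_lam = 2.0 * p[selected].sum() / den if den > 0 else 0.0
        converged = abs(new_lam - lam) <= tol
        lam = new_lam
        if converged:
            break
    return lam / 2.0, lam

tau_d, value_d = dinkelbach_expected_f1([0.2, 0.8, 0.7, 0.3, 0.9])
print(tau_d, value_d)
```

Under this approximation the threshold equals half the optimal ratio. Labels do not enter the computation at all, which is why the approach depends on trusting the calibration of the probabilities.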
