Utilities for computing optimal classification cutoffs for binary and multiclass classification

These details have not been verified by PyPI

Project description

Optimal Classification Cutoffs

Transform your ML model performance with optimal decision thresholds.

Most classifiers output probabilities, but decisions need thresholds. The default τ = 0.5 is rarely optimal for real objectives such as F1, precision/recall, or business costs. This library finds the exact optimal threshold with O(n log n) algorithms in three lines of code; in the example below, that raises F1 from 0.654 to 0.891 (+36%).

The Problem: Default 0.5 Thresholds Are Wrong

# ❌ WRONG: Default 0.5 threshold (what everyone does)
y_pred = (model.predict_proba(X)[:, 1] >= 0.5).astype(int)
# F1 Score: 0.654

# ✅ RIGHT: Optimal threshold (3 lines of code)
from optimal_cutoffs import optimize_thresholds
result = optimize_thresholds(y_true, y_scores, metric="f1") 
y_pred = result.predict(y_scores_test)
# F1 Score: 0.891 (+36% improvement!)

Why this matters: A 0.5 threshold is optimal only when false positives and false negatives cost the same and the objective is accuracy. Real problems have imbalanced data (fraud: 1%, disease: 5%) and asymmetric costs (a missed fraud costs $1000, a false alarm costs $1). For such problems the optimal threshold is typically 0.05-0.30, not 0.50.

Installation

pip install optimal-classification-cutoffs

Optional Performance Boost:

# For 10-100× speedups with Numba JIT compilation
pip install optimal-classification-cutoffs[performance]

# For Jupyter examples and visualizations  
pip install optimal-classification-cutoffs[examples]

Python Support: The package works on Python 3.12 and later, including Python 3.14. Numba acceleration is optional, and the library automatically falls back to pure Python when Numba is unavailable.

Quick Start

Binary Classification: 40%+ F1 Improvement

from optimal_cutoffs import optimize_thresholds

# Your existing model probabilities
y_scores = model.predict_proba(X_test)[:, 1]

# Find optimal threshold (exact solution, O(n log n))
result = optimize_thresholds(y_true, y_scores, metric="f1")
print(f"Optimal threshold: {result.threshold:.3f}")  # e.g., 0.127 not 0.5!
print(f"Expected F1: {result.scores[0]:.3f}")

# Make optimal predictions
y_pred = result.predict(y_scores_new)

Multiclass Classification: Per-Class Thresholds

import numpy as np
from optimal_cutoffs import optimize_thresholds

# Multiclass probabilities (n_samples, n_classes)
y_scores = model.predict_proba(X_test)

# Automatically detects multiclass, optimizes per-class thresholds
result = optimize_thresholds(y_true, y_scores, metric="f1")
print(f"Per-class thresholds: {result.thresholds}")
print(f"Task detected: {result.task.value}")  # "multiclass"
print(f"Method used: {result.method}")        # "coord_ascent"

# Predictions use optimal thresholds
y_pred = result.predict(y_scores_new)

Cost-Sensitive Decisions: No Thresholds Needed

from optimal_cutoffs import optimize_decisions

# Cost matrix: rows=true class, cols=predicted class
# False negatives cost 10x more than false positives
cost_matrix = [[0, 1], [10, 0]]

result = optimize_decisions(y_probs, cost_matrix)
y_pred = result.predict(y_probs_new)  # Bayes-optimal decisions

API Overview

Clean, minimal API designed around user jobs-to-be-done:

Core Functions (The Only Two You Need)

from optimal_cutoffs import optimize_thresholds, optimize_decisions

# For threshold-based optimization (F1, precision, recall, etc.)
result = optimize_thresholds(y_true, y_scores, metric="f1")

# For cost matrix optimization (no thresholds)  
result = optimize_decisions(y_probs, cost_matrix)

Progressive Disclosure: Power When You Need It

from optimal_cutoffs import metrics, bayes, cv, algorithms

# Custom metrics
custom_f2 = lambda tp, tn, fp, fn: (5*tp) / (5*tp + 4*fn + fp)
metrics.register("f2", custom_f2)

# Cross-validation with threshold tuning
thresholds = cv.cross_validate(model, X, y, metric="f1")

# Advanced algorithms
result = algorithms.multiclass.coordinate_ascent(y_true, y_scores)

Auto-Selection with Explanations

Everything is explainable. The library tells you what it detected and why:

result = optimize_thresholds(y_true, y_scores)  # All defaults

print(f"Task: {result.task.value}")           # "binary" (auto-detected)
print(f"Method: {result.method}")             # "sort_scan" (O(n log n))
print(f"Notes: {result.notes}")               # ["Detected binary task...", "Selected sort_scan for O(n log n) optimization..."]

Why This Works: Mathematical Foundations

Piecewise Structure

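Most metrics (F1, precision, recall) are piecewise-constant in the threshold τ: the confusion matrix changes only when τ crosses one of the n observed scores, so at most n + 1 distinct thresholds need to be checked. Sorting the scores once and scanning them with running TP/FP counts gives the exact optimum in O(n log n) time.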
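The sort-and-scan idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `best_f1_threshold` is hypothetical, predictions use the rule `score >= threshold`, and ties between equal scores are handled by only cutting between distinct values.

```python
import numpy as np

def best_f1_threshold(y_true, y_scores):
    """Return (threshold, f1) maximizing F1 for the rule `score >= threshold`."""
    y_true = np.asarray(y_true, dtype=int)
    y_scores = np.asarray(y_scores, dtype=float)

    # Sort once by descending score: this is the O(n log n) step.
    order = np.argsort(-y_scores, kind="stable")
    scores = y_scores[order]
    labels = y_true[order]

    # If the top k samples are predicted positive, tp[k-1] and fp[k-1]
    # are the resulting counts; one cumulative pass yields all of them.
    tp = np.cumsum(labels)
    fp = np.cumsum(1 - labels)
    n_pos = labels.sum()

    # F1 = 2*TP / (2*TP + FP + FN) with FN = n_pos - TP,
    # so the denominator simplifies to TP + FP + n_pos (never zero here).
    f1 = 2 * tp / (tp + fp + n_pos)

    # Only positions where the score changes are realizable cut points.
    is_cut = np.r_[scores[1:] != scores[:-1], True]
    candidates = np.flatnonzero(is_cut)
    best = candidates[np.argmax(f1[candidates])]
    return scores[best], f1[best]

# Toy data (made up for illustration)
y_true = [0, 0, 1, 0, 1, 1]
y_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.80]
thr, f1 = best_f1_threshold(y_true, y_scores)
print(f"threshold={thr:.2f} F1={f1:.3f}")  # threshold=0.20 F1=0.857
```

On this toy data a 0.5 cutoff flags only the highest-scoring sample and reaches F1 = 0.5, while the scan finds the cut at 0.20 with F1 = 6/7.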

Bayes Decision Theory

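Under calibrated probabilities, and assuming correct decisions cost nothing, the optimal binary threshold has a closed form:

τ* = cost_fp / (cost_fp + cost_fn)

Predict positive when p ≥ τ*. The threshold is independent of class priors and depends only on the cost ratio.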
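The closed form follows from comparing expected costs: predicting positive risks a false positive with probability 1 - p, while predicting negative risks a false negative with probability p, and the two costs are equal exactly at τ*. The sketch below works this through for the 10:1 cost ratio used in the cost matrix earlier. The helper names `bayes_threshold` and `expected_costs` are hypothetical and are not part of the library's API.

```python
def bayes_threshold(cost_fp, cost_fn):
    """Threshold on a calibrated P(y=1 | x) that minimizes expected cost.

    Assumes correct decisions cost zero and both error costs are non-negative.
    """
    if cost_fp < 0 or cost_fn < 0 or cost_fp + cost_fn == 0:
        raise ValueError("costs must be non-negative and not both zero")
    return cost_fp / (cost_fp + cost_fn)

def expected_costs(p, cost_fp, cost_fn):
    """Return (cost of predicting positive, cost of predicting negative)."""
    return (1 - p) * cost_fp, p * cost_fn

# False negatives cost 10x more than false positives.
tau = bayes_threshold(cost_fp=1.0, cost_fn=10.0)
print(f"tau* = {tau:.4f}")  # tau* = 0.0909

# Just above tau*, predicting positive is the cheaper action.
cost_pos, cost_neg = expected_costs(0.15, cost_fp=1.0, cost_fn=10.0)
print(cost_pos < cost_neg)  # True
```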

Multiclass Extensions

One-vs-Rest: Independent per-class thresholds (macro averaging)
Coordinate Ascent: Coupled thresholds for single-label consistency
General Costs: Skip thresholds, apply Bayes rule on probability vectors
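For the general-cost case, "applying Bayes rule on probability vectors" amounts to picking, for each sample, the class with the lowest expected cost. The following is a minimal NumPy sketch of that rule, assuming calibrated probabilities and the same cost-matrix convention as above (rows = true class, columns = predicted class). The function name `min_expected_cost_decisions` is hypothetical and is not the library's `optimize_decisions`.

```python
import numpy as np

def min_expected_cost_decisions(y_probs, cost_matrix):
    """Pick the class with the lowest expected cost for each sample."""
    probs = np.asarray(y_probs, dtype=float)      # shape (n_samples, n_classes)
    costs = np.asarray(cost_matrix, dtype=float)  # shape (n_classes, n_classes)
    # expected[i, j] = sum_k probs[i, k] * costs[k, j]
    expected = probs @ costs
    return expected.argmin(axis=1)

cost_matrix = [[0, 1], [10, 0]]  # false negatives cost 10x more
y_probs = np.array([
    [0.95, 0.05],
    [0.85, 0.15],
    [0.40, 0.60],
])
decisions = min_expected_cost_decisions(y_probs, cost_matrix)
print(decisions)  # [0 1 1]
```

The second sample has only a 15% positive probability, yet predicting positive is cheaper (expected cost 0.85 versus 1.5), which matches the binary threshold of 1/11 implied by these costs. The same function works unchanged for more than two classes.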

Performance

O(n log n) exact optimization for piecewise metrics
O(1) closed-form solutions for cost-sensitive objectives
Optional Numba acceleration (10-100× speedups) with automatic pure Python fallback
Compatible with Python 3.12 and later, including Python 3.14
640+ tests ensuring correctness

Typical speedups: 10-100× faster than grid search, with exact solutions. Performance optimizations are optional - core functionality works everywhere.

Complete Example: Real Impact

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.metrics import f1_score
from optimal_cutoffs import optimize_thresholds

# Realistic imbalanced dataset (like fraud detection)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# Train any classifier
model = RandomForestClassifier(random_state=42)
# Out-of-fold scores on the training set, so the threshold is never tuned on test data
y_scores_oof = cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
model.fit(X_train, y_train)
y_scores = model.predict_proba(X_test)[:, 1]

# ❌ Default threshold
y_pred_default = (y_scores >= 0.5).astype(int)
f1_default = f1_score(y_test, y_pred_default)
print(f"Default F1: {f1_default:.3f}")  # ~0.65

# ✅ Optimal threshold: tuned on out-of-fold scores, evaluated on the held-out test set
result = optimize_thresholds(y_train, y_scores_oof, metric="f1")
y_pred_optimal = result.predict(y_scores)
f1_optimal = f1_score(y_test, y_pred_optimal)
print(f"Optimal F1: {f1_optimal:.3f}")

# Relative gain over the default threshold, measured on data the threshold never saw
improvement = (f1_optimal - f1_default) / f1_default * 100
print(f"Improvement: {improvement:+.1f}%")

When to Use This

Perfect for:

Imbalanced classification (fraud, medical, spam)
Cost-sensitive decisions (business impact)
Performance-critical applications (exact solutions)
Research requiring theoretical optimality

Not needed, or not yet applicable, for:

Perfectly balanced classes with symmetric costs
Problems requiring probabilistic outputs rather than hard decisions
Cost-based decisions from uncalibrated models (calibrate first)

Advanced Usage

Cross-Validation with Thresholds

from optimal_cutoffs import cv

# Thresholds are hyperparameters - validate them!
scores = cv.cross_validate(
    model, X, y, 
    metric="f1",
    cv=5,
    return_thresholds=True
)
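The reason thresholds need validating is that a threshold tuned and scored on the same data looks better than it will in production. The sketch below illustrates the principle with NumPy only: the threshold is tuned on k-1 folds and scored on the held-out fold. It is not the library's `cv.cross_validate`; all function names are hypothetical, the scores are synthetic and assumed to be out-of-sample, and the tuning step is a plain brute-force search over observed scores rather than the O(n log n) scan.

```python
import numpy as np

def f1_at(y_true, y_scores, threshold):
    """F1 of the rule `score >= threshold`; returns 0.0 when undefined."""
    pred = y_scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def tune_threshold(y_true, y_scores):
    """Brute-force search over the observed scores (illustration only)."""
    candidates = np.unique(y_scores)
    f1s = [f1_at(y_true, y_scores, t) for t in candidates]
    return candidates[int(np.argmax(f1s))]

def cross_validated_f1(y_true, y_scores, n_splits=5, seed=0):
    """Tune the threshold on k-1 folds, score it on the held-out fold."""
    y_true = np.asarray(y_true)
    y_scores = np.asarray(y_scores, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y_true)), n_splits)
    thresholds, held_out = [], []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        t = tune_threshold(y_true[train_idx], y_scores[train_idx])
        thresholds.append(t)
        held_out.append(f1_at(y_true[test_idx], y_scores[test_idx], t))
    return np.array(thresholds), np.array(held_out)

# Synthetic scores for a roughly 10%-positive problem (made up for illustration)
rng = np.random.default_rng(42)
y = (rng.random(500) < 0.1).astype(int)
scores = np.clip(0.15 + 0.35 * y + rng.normal(0, 0.12, 500), 0, 1)

thresholds, f1s = cross_validated_f1(y, scores)
print("Per-fold thresholds:", np.round(thresholds, 3))
print(f"Held-out F1: {f1s.mean():.3f} +/- {f1s.std():.3f}")
```

The spread of the per-fold thresholds is itself informative: if it is wide, a single tuned threshold is unlikely to transfer well to new data.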

Custom Metrics

from optimal_cutoffs import metrics, optimize_thresholds

# Register custom Fβ score
def f_beta(tp, tn, fp, fn, beta=2.0):
    return (1 + beta**2) * tp / ((1 + beta**2) * tp + beta**2 * fn + fp)

metrics.register("f2", lambda tp, tn, fp, fn: f_beta(tp, tn, fp, fn, 2.0))

# Use like any built-in metric
result = optimize_thresholds(y_true, y_scores, metric="f2")
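Before registering a custom metric, it can help to check the formula by hand on a small set of confusion counts. The sketch below does this for Fβ without touching the library. The counts are made up, and the zero-denominator guard is an addition that the `f_beta` definition above does not have.

```python
def f_beta(tp, tn, fp, fn, beta=2.0):
    """F-beta from confusion counts; `tn` is accepted but unused."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

# Hypothetical confusion counts
tp, tn, fp, fn = 8, 86, 4, 2

precision = tp / (tp + fp)  # 8/12
recall = tp / (tp + fn)     # 8/10

f1 = f_beta(tp, tn, fp, fn, beta=1.0)  # 16/22
f2 = f_beta(tp, tn, fp, fn, beta=2.0)  # 40/52

# Cross-check against the precision/recall form of F-beta with beta = 2.
f2_from_pr = 5 * precision * recall / (4 * precision + recall)

print(f"F1={f1:.3f} F2={f2:.3f}")  # F1=0.727 F2=0.769
```

Because recall (0.80) exceeds precision (0.67) here, F2, which weights recall more heavily, comes out higher than F1.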

Multiple Metrics

from optimal_cutoffs import optimize_thresholds

# Optimize different metrics
f1_result = optimize_thresholds(y_true, y_scores, metric="f1")
precision_result = optimize_thresholds(y_true, y_scores, metric="precision")

print(f"F1 optimal τ: {f1_result.threshold:.3f}")
print(f"Precision optimal τ: {precision_result.threshold:.3f}")

References

Lipton et al. (2014) Optimal Thresholding of Classifiers to Maximize F1
Elkan (2001) The Foundations of Cost-Sensitive Learning
Dinkelbach (1967) Nonlinear Fractional Programming
Platt (1999) Probabilistic Outputs for Support Vector Machines

Project details

Development Status
- 4 - Beta
Intended Audience
- Science/Research
Programming Language
Topic
- Scientific/Engineering :: Mathematics

Release history

Version	Released
2.0.0 (this version)	Dec 26, 2025
0.6.1	Nov 1, 2025
0.6.0	Oct 6, 2025
0.5.0	Sep 30, 2025
0.4.0	Sep 30, 2025
0.3.0	Sep 26, 2025
0.2.1	Sep 25, 2025
0.2.0	Sep 25, 2025
0.1.0	Aug 17, 2025

Download files

Source Distribution

optimal_classification_cutoffs-2.0.0.tar.gz (56.4 kB)

Uploaded Dec 26, 2025 Source

Built Distribution

optimal_classification_cutoffs-2.0.0-py3-none-any.whl (69.1 kB)

Uploaded Dec 26, 2025 Python 3

File details

Details for the file optimal_classification_cutoffs-2.0.0.tar.gz.

File metadata

Download URL: optimal_classification_cutoffs-2.0.0.tar.gz
Upload date: Dec 26, 2025
Size: 56.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for optimal_classification_cutoffs-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`2c054b95afdacee7c494fa2f03b6e67a9daa23c51d20371c1f5945d071328e29`
MD5	`4f888fa5ee095e9ce5dba6462f46598d`
BLAKE2b-256	`6f4adb293d671dec0a985cd47bfbf762fec7b1083d717d77c12503dc4df34f63`

Provenance

The following attestation bundles were made for optimal_classification_cutoffs-2.0.0.tar.gz:

Publisher: python-publish.yml on finite-sample/optimal-classification-cutoffs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: optimal_classification_cutoffs-2.0.0.tar.gz
- Subject digest: 2c054b95afdacee7c494fa2f03b6e67a9daa23c51d20371c1f5945d071328e29
- Sigstore transparency entry: 779893020
- Sigstore integration time: Dec 26, 2025
Source repository:
- Permalink: finite-sample/optimal-classification-cutoffs@8b5cef542786085f2fb7ea7fca310e756e8436d9
- Branch / Tag: refs/heads/master
- Owner: https://github.com/finite-sample
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@8b5cef542786085f2fb7ea7fca310e756e8436d9
- Trigger Event: workflow_dispatch

File details

Details for the file optimal_classification_cutoffs-2.0.0-py3-none-any.whl.

File metadata

Download URL: optimal_classification_cutoffs-2.0.0-py3-none-any.whl
Upload date: Dec 26, 2025
Size: 69.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for optimal_classification_cutoffs-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`864e7122828f69eae705fe59226c0190cc114fede2779be12f66f2397709ba38`
MD5	`667e5419c39fe6ee0de212c49cb0a8ed`
BLAKE2b-256	`8f3dc8bd25391ec7da1a33b9e17af0024ed4c25d0e1213ee1769f73741a361fc`

Provenance

The following attestation bundles were made for optimal_classification_cutoffs-2.0.0-py3-none-any.whl:

Publisher: python-publish.yml on finite-sample/optimal-classification-cutoffs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: optimal_classification_cutoffs-2.0.0-py3-none-any.whl
- Subject digest: 864e7122828f69eae705fe59226c0190cc114fede2779be12f66f2397709ba38
- Sigstore transparency entry: 779893022
- Sigstore integration time: Dec 26, 2025
Source repository:
- Permalink: finite-sample/optimal-classification-cutoffs@8b5cef542786085f2fb7ea7fca310e756e8436d9
- Branch / Tag: refs/heads/master
- Owner: https://github.com/finite-sample
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@8b5cef542786085f2fb7ea7fca310e756e8436d9
- Trigger Event: workflow_dispatch
