Utilities for computing optimal classification cutoffs for binary and multiclass classification

Optimal Classification Cut-Offs

Select optimal probability thresholds for binary and multiclass classification.
Maximize F1, precision, recall, accuracy, or custom cost-sensitive metrics using efficient algorithms designed for piecewise-constant classification metrics.

🚀 Quick Start

Installation

pip install optimal-classification-cutoffs

Binary Classification

from optimal_cutoffs import ThresholdOptimizer

# Your true labels and predicted probabilities
y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.7, 0.3, 0.9]

# Find optimal threshold for F1 score
optimizer = ThresholdOptimizer(objective="f1")
optimizer.fit(y_true, y_prob)
print(f"Optimal threshold: {optimizer.threshold_:.3f}")

# Make predictions
y_pred = optimizer.predict(y_prob)
print(f"Predictions: {y_pred}")  # [0, 1, 1, 0, 1]

Multiclass Classification

import numpy as np

# Multiclass data: 3 classes, 5 samples
y_true = [0, 1, 2, 0, 1] 
y_prob = np.array([
    [0.7, 0.2, 0.1],  # Sample 0: most likely class 0
    [0.1, 0.8, 0.1],  # Sample 1: most likely class 1
    [0.1, 0.1, 0.8],  # Sample 2: most likely class 2
    [0.6, 0.3, 0.1],  # Sample 3: most likely class 0
    [0.2, 0.7, 0.1],  # Sample 4: most likely class 1
])

# Find optimal per-class thresholds
optimizer = ThresholdOptimizer(objective="f1")
optimizer.fit(y_true, y_prob)
print(f"Per-class thresholds: {optimizer.threshold_}")

# Make predictions
y_pred = optimizer.predict(y_prob)
print(f"Predicted classes: {y_pred}")

Cost-Sensitive Optimization ✨ New in v0.3.0

from optimal_cutoffs import get_optimal_threshold

# Medical diagnosis: a false negative costs 10x more than a false positive
threshold = get_optimal_threshold(
    y_true, y_prob,
    utility={"tp": 50, "tn": 0, "fp": -1, "fn": -10}
)

# Or for calibrated probabilities (no training data needed)
from optimal_cutoffs import bayes_threshold_from_costs
optimal_threshold = bayes_threshold_from_costs(
    cost_fp=1.0,    # Cost of false positive
    cost_fn=10.0,   # Cost of false negative (10x worse)
    benefit_tp=50.0 # Benefit of catching true positive
)

🔧 Key Features

⚡ Optimized Algorithms for Piecewise Metrics

Classification metrics like F1, accuracy, precision, and recall are piecewise-constant functions of the threshold: they change only when the threshold crosses one of the unique predicted probabilities. Gradient-based optimizers fail because the gradient is zero everywhere except at those breakpoints.

Our solution: Specialized algorithms that exploit this structure:

- sort_scan: O(n log n) exact algorithm that finds the global optimum for piecewise metrics, 50-100x faster than naive approaches
- coord_ascent: Multiclass optimizer for coupled single-label predictions (converges to a local optimum)
- auto: Selects a method based on your data

[Figure: F1 score piecewise behavior. F1 score only changes at unique probability values. Our algorithms find the true optimum.]
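To make the piecewise-constant behavior concrete, here is a minimal NumPy sketch that evaluates F1 on a fine grid of thresholds for the Quick Start data. It does not use the library; the helper `f1_at_threshold` and the strict `>` decision rule are assumptions made for illustration.

```python
import numpy as np

def f1_at_threshold(y_true, y_prob, threshold):
    """F1 when predicting positive for probabilities strictly above `threshold`."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) > threshold
    tp = int(np.sum(y_pred & (y_true == 1)))
    fp = int(np.sum(y_pred & (y_true == 0)))
    fn = int(np.sum(~y_pred & (y_true == 1)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.7, 0.3, 0.9]

# Evaluate F1 at 1001 evenly spaced thresholds in [0, 1].
grid = np.linspace(0.0, 1.0, 1001)
scores = np.array([f1_at_threshold(y_true, y_prob, t) for t in grid])

n_distinct = len(np.unique(scores))          # number of distinct F1 values
n_jumps = int(np.sum(np.diff(scores) != 0))  # number of places the curve changes
print(n_distinct, n_jumps)
```

With five unique probabilities the curve can change at most five times, so the 1001 grid points collapse to at most six distinct F1 values. Only the breakpoints need to be checked.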

💰 Cost-Sensitive Optimization ✨ New in v0.3.0

Handle scenarios where different errors have different costs:

# Financial fraud: missing fraud (FN) costs much more than false alarms (FP)
from optimal_cutoffs import get_optimal_threshold, make_cost_metric

cost_metric = make_cost_metric(
    cost_fp=1.0,     # False positive cost (false alarm)
    cost_fn=100.0,   # False negative cost (missed fraud)
    benefit_tp=500.0  # True positive benefit (caught fraud)
)

threshold = get_optimal_threshold(y_true, y_prob, metric=cost_metric)

Bayes-Optimal Thresholds: For calibrated probabilities, calculate optimal thresholds directly without training data:

from optimal_cutoffs import bayes_threshold_from_utility

# Direct calculation for calibrated probabilities
threshold = bayes_threshold_from_utility(
    U_tp=50,  U_tn=0,   # Utilities for correct predictions
    U_fp=-1,  U_fn=-10  # Utilities for errors  
)
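The closed form behind such a threshold follows from comparing expected utilities: predict positive when p·U_tp + (1−p)·U_fp ≥ p·U_fn + (1−p)·U_tn. The sketch below implements that textbook rule under the assumption of calibrated probabilities. The function name `bayes_threshold_sketch` is hypothetical, and the library's own implementation may handle edge cases differently.

```python
def bayes_threshold_sketch(u_tp, u_tn, u_fp, u_fn):
    """Threshold t such that predicting positive is optimal when p >= t.

    Derived from p*u_tp + (1-p)*u_fp >= p*u_fn + (1-p)*u_tn.
    """
    gain_pos = u_tp - u_fn  # gain from a correct call on a true positive
    gain_neg = u_tn - u_fp  # gain from a correct call on a true negative
    if gain_pos <= 0 or gain_neg <= 0:
        # Simplification for this sketch: degenerate cases (always predict
        # one class) are rejected rather than mapped to 0 or 1.
        raise ValueError("correct decisions must be strictly better than errors")
    return gain_neg / (gain_pos + gain_neg)

# Same utilities as the example above: t = 1 / (60 + 1)
t = bayes_threshold_sketch(u_tp=50, u_tn=0, u_fp=-1, u_fn=-10)
print(round(t, 4))
```

Because a missed positive is so much worse than a false alarm here, the resulting threshold is far below 0.5.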

🎯 Multiclass Strategies

One-vs-Rest (Default): Independent per-class thresholds

from optimal_cutoffs import get_optimal_multiclass_thresholds
thresholds = get_optimal_multiclass_thresholds(y_true, y_prob, method="auto")

Coordinate Ascent: Coupled optimization for single-label consistency

# Often improves macro-F1 on imbalanced datasets
thresholds = get_optimal_multiclass_thresholds(y_true, y_prob, method="coord_ascent")

🤔 When to Use What?

Threshold Optimization vs Calibration

| Use Threshold Optimization When | Use Calibration When |
|---|---|
| Maximizing classification metrics (F1, precision) | Need reliable probability estimates |
| Making binary decisions for deployment | Comparing model confidence |
| Handling class imbalance | Converting scores to probabilities |

Best Practice: Use Both Together

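from sklearn.calibration import CalibratedClassifierCV
from optimal_cutoffs import ThresholdOptimizer

# 1. Calibrate probabilities first
#    (base_model, X_train, y_train are placeholders for your own estimator and training data)
calibrated_model = CalibratedClassifierCV(base_model)
calibrated_model.fit(X_train, y_train)
y_prob = calibrated_model.predict_proba(X_val)[:, 1]

# 2. Optimize threshold on calibrated probabilities from a held-out validation set
optimizer = ThresholdOptimizer(objective="f1")
optimizer.fit(y_val, y_prob)

# Result: Reliable probabilities AND optimal decisions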

Cost-Sensitive vs Metric Optimization

| Use Cost-Sensitive When | Use Metric Optimization When |
|---|---|
| Different errors have different costs | All errors are equally bad |
| Business impact varies by error type | Optimizing standard metrics (F1, accuracy) |
| Medical, financial, safety applications | General ML model evaluation |

Method Selection Guide

| Method | Best For | Speed | Guarantees |
|---|---|---|---|
| `"auto"` | Most cases | Fast | Selects best method automatically |
| `"sort_scan"` | Binary piecewise metrics | Very Fast | Exact global optimum |
| `"coord_ascent"` | Multiclass, imbalanced data | Medium | Local optimum, single-label consistent |
| `"minimize"` | Custom smooth metrics | Medium | Local optimum |

📖 API Reference

Core Functions

`ThresholdOptimizer(objective="f1", method="auto")`

Scikit-learn style threshold optimization

optimizer = ThresholdOptimizer(objective="f1", method="auto")
optimizer.fit(y_true, y_prob)
y_pred = optimizer.predict(y_prob)

- Auto-detects binary vs multiclass inputs
- Methods: "auto", "sort_scan", "coord_ascent", "minimize", "gradient"
- Returns: Fitted object with threshold_ attribute

`get_optimal_threshold(y_true, y_prob, metric="f1", method="auto", **kwargs)`

Functional interface for threshold optimization

threshold = get_optimal_threshold(y_true, y_prob, metric="f1")

- New in v0.3.0: utility, minimize_cost, and bayes parameters
- Returns: Optimal threshold (float for binary, array for multiclass)

Cost-Sensitive Functions ✨ New in v0.3.0

`bayes_threshold_from_utility(U_tp, U_tn, U_fp, U_fn)`

Calculate Bayes-optimal threshold for calibrated probabilities

threshold = bayes_threshold_from_utility(U_tp=1, U_tn=0, U_fp=-1, U_fn=-5)

`bayes_threshold_from_costs(cost_fp, cost_fn, benefit_tp=0, benefit_tn=0)`

Convenience wrapper for cost-based optimization

threshold = bayes_threshold_from_costs(cost_fp=1, cost_fn=10, benefit_tp=50)

`make_cost_metric(cost_fp, cost_fn, benefit_tp=0, benefit_tn=0)`

Create custom cost-sensitive metrics

custom_metric = make_cost_metric(cost_fp=1, cost_fn=5, benefit_tp=10)
threshold = get_optimal_threshold(y_true, y_prob, metric=custom_metric)

`make_linear_counts_metric(w_tp=0, w_tn=0, w_fp=0, w_fn=0)`

Create metrics from confusion matrix weights

profit_metric = make_linear_counts_metric(w_tp=100, w_fp=-10, w_fn=-50)
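Conceptually, a linear counts metric is a weighted sum of the four confusion-matrix counts. The sketch below shows that idea with a hypothetical factory, `make_linear_metric_sketch`. It is an illustration of the concept, not the library's implementation, and the `(tp, tn, fp, fn)` argument order is assumed from the `register_metric` example in this README.

```python
def make_linear_metric_sketch(w_tp=0.0, w_tn=0.0, w_fp=0.0, w_fn=0.0):
    """Return a metric computing a weighted sum of confusion-matrix counts."""
    def metric(tp, tn, fp, fn):
        return w_tp * tp + w_tn * tn + w_fp * fp + w_fn * fn
    return metric

# Same weights as the profit example above.
profit = make_linear_metric_sketch(w_tp=100, w_fp=-10, w_fn=-50)

# 3 true positives, 2 true negatives, 1 false positive, 1 false negative:
# 100*3 + 0*2 - 10*1 - 50*1
example_profit = profit(tp=3, tn=2, fp=1, fn=1)
print(example_profit)
```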

Multiclass Functions

`get_optimal_multiclass_thresholds(y_true, y_prob, metric="f1", method="auto")`

Multiclass threshold optimization

thresholds = get_optimal_multiclass_thresholds(y_true, y_prob, method="coord_ascent")

Utility Functions

`get_confusion_matrix(y_true, y_prob, threshold)`

tp, tn, fp, fn = get_confusion_matrix(y_true, y_prob, 0.5)
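For reference, the four counts can be reproduced by hand with NumPy. This is a sketch rather than the library's code: the function name `confusion_counts` and the `inclusive` flag are hypothetical, and whether the library treats a probability equal to the threshold as positive is an assumption you should check against its documentation.

```python
import numpy as np

def confusion_counts(y_true, y_prob, threshold, inclusive=False):
    """Return (tp, tn, fp, fn) for binary labels at a given threshold."""
    y = np.asarray(y_true).astype(bool)
    p = np.asarray(y_prob, dtype=float)
    # Ties at the threshold count as positive only when inclusive=True.
    pred = p >= threshold if inclusive else p > threshold
    tp = int(np.sum(pred & y))
    tn = int(np.sum(~pred & ~y))
    fp = int(np.sum(pred & ~y))
    fn = int(np.sum(~pred & y))
    return tp, tn, fp, fn

y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.8, 0.7, 0.3, 0.9]

counts_half = confusion_counts(y_true, y_prob, 0.5)
counts_tie_excl = confusion_counts(y_true, y_prob, 0.7)
counts_tie_incl = confusion_counts(y_true, y_prob, 0.7, inclusive=True)
print(counts_half, counts_tie_excl, counts_tie_incl)
```

The last two calls differ only in how the sample at exactly 0.7 is treated, which is why the comparison operator matters when thresholds are chosen from observed probabilities.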

`get_multiclass_confusion_matrix(y_true, y_prob, thresholds)`

cms = get_multiclass_confusion_matrix(y_true, y_prob, [0.3, 0.5, 0.7])

`register_metric(name, func)` and `register_metrics(metrics_dict)`

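from optimal_cutoffs import register_metric

@register_metric("custom_f2")
def f2_score(tp, tn, fp, fn):
    # F-beta with beta = 2: recall is weighted more heavily than precision
    return (5 * tp) / (5 * tp + 4 * fn + fp)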
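The constants 5 and 4 in the F2 example come from the general F-beta formula, (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP), with β = 2. A small standalone sketch of the general form follows. The function name `f_beta_from_counts` and the zero-denominator convention are assumptions for illustration.

```python
def f_beta_from_counts(tp, tn, fp, fn, beta=1.0):
    """F-beta from confusion-matrix counts; returns 0.0 if undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # tn is unused by F-beta but kept for a uniform (tp, tn, fp, fn) signature.
    return (1 + b2) * tp / denom if denom else 0.0

# beta = 2 reproduces the F2 expression 5*tp / (5*tp + 4*fn + fp)
f2 = f_beta_from_counts(tp=3, tn=10, fp=2, fn=1, beta=2.0)
f2_direct = (5 * 3) / (5 * 3 + 4 * 1 + 2)
print(f2, f2_direct)
```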

📊 Examples

Basic Examples

- Binary classification - Getting started
- Multiclass classification - Multi-class optimization

Advanced Examples

- Cost-sensitive medical diagnosis ✨ New
- Financial fraud detection ✨ New
- Cross-validation workflows - Robust evaluation
- Integration with sklearn - Production pipelines

🧮 Theory & Background

Why do standard optimizers fail? Classification metrics are piecewise-constant functions with zero gradients everywhere except at breakpoints. Traditional optimizers get trapped in flat regions and miss the global optimum.

Our innovation: Exact algorithms that leverage the mathematical structure of classification metrics. The sort-and-scan method achieves O(n log n) complexity while guaranteeing global optimality for piecewise metrics.
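The sort-and-scan idea can be sketched in a few lines of NumPy: sort once by descending probability, then use cumulative sums to get the confusion counts at every candidate cut. This is a simplified illustration for F1 only, not the library's implementation. The function name `sort_scan_f1`, the `prob >= threshold` decision rule, and the requirement of non-empty input are assumptions.

```python
import numpy as np

def sort_scan_f1(y_true, y_prob):
    """Return (threshold, f1) maximizing F1 for the rule `prob >= threshold`."""
    y = np.asarray(y_true, dtype=int)
    p = np.asarray(y_prob, dtype=float)

    # One O(n log n) sort; everything after this is O(n).
    order = np.argsort(-p, kind="stable")
    y_sorted, p_sorted = y[order], p[order]

    # Counts if the top k+1 samples are predicted positive.
    tp = np.cumsum(y_sorted)
    fp = np.cumsum(1 - y_sorted)
    fn = tp[-1] - tp
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)

    # A cut inside a group of tied probabilities is not realizable,
    # so only the last position of each tie group is a valid candidate.
    valid = np.append(p_sorted[1:] != p_sorted[:-1], True)
    f1 = np.where(valid, f1, -1.0)

    k = int(np.argmax(f1))
    return float(p_sorted[k]), float(f1[k])

best_t, best_f1 = sort_scan_f1([0, 1, 1, 0, 1], [0.2, 0.8, 0.7, 0.3, 0.9])
print(best_t, best_f1)
```

Every distinct probability is examined exactly once, so the scan cannot miss the best cut among the thresholds it considers. This is the sense in which the result is exact rather than approximate.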

For detailed mathematical explanations and interactive visualizations, see our comprehensive documentation.

🔬 Advanced Methods

Coordinate Ascent for Multiclass: Unlike One-vs-Rest approaches, our coordinate ascent method maintains single-label consistency by coupling classes through argmax(P - τ) decision rules. This often improves macro-F1 on imbalanced datasets.
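The difference between the two decision rules is easy to see on a toy example. The sketch below contrasts independent per-class thresholding with the coupled argmax(P - τ) rule. The function names and the threshold values are made up for illustration, and the coordinate ascent search that would choose τ is not shown.

```python
import numpy as np

def predict_ovr(P, tau):
    """Independent rule: class k is flagged whenever P[:, k] >= tau[k]."""
    return np.asarray(P) >= np.asarray(tau)

def predict_margin(P, tau):
    """Coupled rule: exactly one class per row, the largest margin P - tau."""
    return np.argmax(np.asarray(P) - np.asarray(tau), axis=1)

P = np.array([
    [0.5, 0.4, 0.1],  # clears the thresholds of class 0 and class 1
    [0.4, 0.2, 0.4],  # clears no threshold at all
])
tau = np.array([0.45, 0.30, 0.50])  # hypothetical per-class thresholds

labels_per_row = predict_ovr(P, tau).sum(axis=1)  # can be 0, 1, or several
single_labels = predict_margin(P, tau)            # always exactly one class
print(labels_per_row, single_labels)
```

Under the independent rule the first row receives two labels and the second none, while the margin rule always returns one class per sample. Note that the first row is assigned class 1 even though class 0 has the higher raw probability, because class 1 exceeds its threshold by a wider margin.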

Dinkelbach Fractional Programming: For expected F-beta optimization under calibrated probabilities, the Dinkelbach method provides fast exact solutions using the F1 threshold identity. Planned for a future release.

👨‍💻 Authors

Suriyan Laohaprapanon and Gaurav Sood

Release history

- 2.0.0: Dec 26, 2025
- 0.6.1: Nov 1, 2025
- 0.6.0: Oct 6, 2025
- 0.5.0: Sep 30, 2025
- 0.4.0: Sep 30, 2025
- 0.3.0 (this version): Sep 26, 2025
- 0.2.1: Sep 25, 2025
- 0.2.0: Sep 25, 2025
- 0.1.0: Aug 17, 2025
