PBS and PLL are superior evaluation metrics for probabilistic classifiers that fix flaws in the Brier Score (MSE) and Log Loss (Cross-Entropy). They are strictly proper, consistent, and better suited to model selection, early stopping, and checkpointing.
Superior Scoring Rules: Better Metrics for Probabilistic Evaluation
Problem with Traditional Metrics
Accuracy-based metrics (Accuracy, F1) treat all correct predictions equally, ignoring confidence. In high-stakes domains, confidence calibration is critical:
- Cancer Diagnosis: 51% and 99% confidence in malignancy should not be treated the same.
- ICU Triage & Mortality: Overconfident mispredictions risk patient safety.
- Autonomous Vehicles: Decisions depend on uncertainty about obstacles.
- Financial Risk Modeling: Pricing and investment hinge on calibrated probabilities.
- Security Threat Detection: High-confidence false negatives undermine defenses.
Thus, Accuracy or F1 Score alone is insufficient: they ignore the confidence of predictions.
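To make the point concrete, here is a minimal sketch in plain Python. The helper names `accuracy` and `mean_true_class_confidence` are illustrative and not part of this package, and the two models are hypothetical. Both models predict the same labels, so accuracy cannot tell them apart, yet their confidence in the true class differs sharply.

```python
def accuracy(true_idx, probs):
    """Fraction of samples whose most probable class is the true class."""
    hits = sum(1 for t, p in zip(true_idx, probs) if p.index(max(p)) == t)
    return hits / len(probs)


def mean_true_class_confidence(true_idx, probs):
    """Average probability assigned to the true class."""
    return sum(p[t] for t, p in zip(true_idx, probs)) / len(probs)


# Three binary samples, all belonging to class 1 (hypothetical data).
true_idx = [1, 1, 1]
hesitant = [[0.49, 0.51]] * 3   # barely favors the true class
confident = [[0.01, 0.99]] * 3  # strongly favors the true class

print("accuracy:", accuracy(true_idx, hesitant), accuracy(true_idx, confident))
print("confidence:",
      mean_true_class_confidence(true_idx, hesitant),
      mean_true_class_confidence(true_idx, confident))
```

Accuracy is identical for both models, while the average confidence in the true class is 0.51 for one and 0.99 for the other.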
Limitations of MSE & Cross-Entropy
Mean Squared Error (Brier Score) and Cross-Entropy (Log Loss) are strictly proper scoring rules, rewarding calibration. However, they can still favor incorrect predictions over correct ones. Example:
| Vector | True Label (Y) | Predicted Probabilities (P) | Brier Score | Log Loss | State |
|---|---|---|---|---|---|
| A | [0, 1, 0] | [0.33, 0.34, 0.33] | 0.6534 | 0.4685 | Correct |
| B | [0, 1, 0] | [0.51, 0.49, 0.00] | 0.5202 | 0.3098 | Incorrect |
Both the Brier Score and Log Loss give B the lower (better) score even though B is misclassified, contradicting the principle of rewarding correct predictions.
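The values in the table can be reproduced with a few lines of plain Python. This sketch assumes the Brier Score is the sum of squared errors over classes and that Log Loss uses a base-10 logarithm, since those conventions match the numbers above. The helper names are illustrative and are not this package's API.

```python
import math


def brier_score(y_true, y_prob):
    """Sum of squared differences between the one-hot label and the prediction."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob))


def log_loss_base10(y_true, y_prob):
    """Negative base-10 log of the probability assigned to the true class."""
    return -math.log10(y_prob[y_true.index(1)])


def is_correct(y_true, y_prob):
    """True when the most probable class is the true class."""
    return y_prob.index(max(y_prob)) == y_true.index(1)


y = [0, 1, 0]
pred_a = [0.33, 0.34, 0.33]
pred_b = [0.51, 0.49, 0.00]

for name, p in (("A", pred_a), ("B", pred_b)):
    print(name,
          round(brier_score(y, p), 4),
          round(log_loss_base10(y, p), 4),
          "Correct" if is_correct(y, p) else "Incorrect")
# A 0.6534 0.4685 Correct
# B 0.5202 0.3098 Incorrect
```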
Our Solution: PBS & PLL
To ensure correct predictions always receive better scores, we introduce a penalty term for misclassifications:
- Penalized Brier Score (PBS)
- Penalized Logarithmic Loss (PLL)
These metrics are both strictly proper and superior (never favor wrong over right).
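The idea of a misclassification penalty can be sketched in plain Python. This is not the package's implementation: the function names are hypothetical, and the default penalties, `(c - 1) / c` for the Brier Score and `log(c)` for Log Loss with `c` classes, are assumptions chosen here so that any misclassified prediction scores worse than any correct one. The exact definitions are given in the paper cited below.

```python
import math


def _argmax_is_true(y_true, y_prob):
    """True when the most probable class is the true class."""
    return y_prob.index(max(y_prob)) == y_true.index(1)


def penalized_brier_sketch(y_true, y_prob, penalty=None):
    """Sum-of-squares Brier Score plus a penalty when the prediction is wrong."""
    c = len(y_true)
    if penalty is None:
        penalty = (c - 1) / c  # assumed penalty, see the note above
    brier = sum((y - p) ** 2 for y, p in zip(y_true, y_prob))
    return brier if _argmax_is_true(y_true, y_prob) else brier + penalty


def penalized_log_loss_sketch(y_true, y_prob, penalty=None, eps=1e-12):
    """Natural-log loss plus a penalty when the prediction is wrong."""
    c = len(y_true)
    if penalty is None:
        penalty = math.log(c)  # assumed penalty, see the note above
    # Clamp to eps so a zero probability does not raise a math domain error.
    loss = -math.log(max(y_prob[y_true.index(1)], eps))
    return loss if _argmax_is_true(y_true, y_prob) else loss + penalty


y = [0, 1, 0]
pred_a = [0.33, 0.34, 0.33]  # correct, low confidence
pred_b = [0.51, 0.49, 0.00]  # incorrect

print("PBS sketch:", penalized_brier_sketch(y, pred_a), penalized_brier_sketch(y, pred_b))
print("PLL sketch:", penalized_log_loss_sketch(y, pred_a), penalized_log_loss_sketch(y, pred_b))
```

Under these assumed penalties, the correct prediction A receives the lower (better) score on both metrics, reversing the ordering produced by the unpenalized scores.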
Quick Start
Installation from PyPI
pip install superior-scoring-rules
Install from Source (Development)
Clone the repository:
git clone https://github.com/Ruhallah93/superior-scoring-rules.git
Basic Usage
import tensorflow as tf
from superior_scoring_rules import pbs, pll
# Sample data (batch_size=3, num_classes=4)
y_true = tf.constant([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
y_pred = tf.constant([[0.9, 0.05, 0.05, 0],
[0.1, 0.8, 0.05, 0.05],
[0.1, 0.1, 0.1, 0.7]])
print("PBS:", pbs(y_true, y_pred).numpy())
print("PLL:", pll(y_true, y_pred).numpy())
Early Stopping & Checkpointing
Use PBS/PLL instead of val_loss:
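class PBSCallback(tf.keras.callbacks.Callback):
    def __init__(self, x_val, y_val):
        super().__init__()
        # TF2 Keras does not populate self.validation_data, so keep the data here
        self.x_val, self.y_val = x_val, y_val

    def on_epoch_end(self, epoch, logs=None):
        logs = logs if logs is not None else {}
        y_prob = self.model.predict(self.x_val, verbose=0)
        logs['val_pbs'] = pbs(self.y_val, y_prob)
        # or
        logs['val_pll'] = pll(self.y_val, y_prob)

# x_val, y_val: held-out validation inputs and one-hot labels
model.fit(..., callbacks=[
    PBSCallback(x_val, y_val),  # listed first so val_pbs is in logs for the callbacks below
    tf.keras.callbacks.EarlyStopping(monitor='val_pbs', patience=5, mode='min'),
    tf.keras.callbacks.ModelCheckpoint('best.h5', monitor='val_pbs', save_best_only=True)
])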
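For readers unfamiliar with patience-based stopping, the following plain-Python sketch shows the logic that `patience` and `mode='min'` describe. It is a simplified illustration, not Keras's actual implementation, and the function name and the score values are hypothetical.

```python
def early_stopping_epoch(scores, patience=5):
    """Return (best_epoch, stop_epoch) for a lower-is-better metric.

    Training stops once the metric has failed to improve for
    `patience` consecutive epochs.
    """
    best_score = float("inf")
    best_epoch = None
    wait = 0
    for epoch, score in enumerate(scores):
        if score < best_score:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores) - 1


# Hypothetical per-epoch validation scores (lower is better).
val_scores = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61]
print(early_stopping_epoch(val_scores, patience=2))  # (2, 4)
```

Here the best score occurs at epoch 2, and training stops at epoch 4 after two epochs without improvement; a checkpoint saved with `save_best_only=True` would correspond to epoch 2.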
Paper & Citation
- Superior scoring rules for probabilistic evaluation of single-label multi-class classification tasks
- arXiv: 2407.17697
@article{ahmadian2025superior,
title={Superior scoring rules for probabilistic evaluation of single-label multi-class classification tasks},
author={Ahmadian, Rouhollah and Ghatee, Mehdi and Wahlstr{\"o}m, Johan},
journal={International Journal of Approximate Reasoning},
pages={109421},
year={2025},
publisher={Elsevier}
}