PBS and PLL are superior evaluation metrics for probabilistic classifiers that fix flaws in the Brier Score (MSE) and Log Loss (Cross-Entropy). They are strictly proper, consistent, and better suited to model selection, early stopping, and checkpointing.

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Superior Scoring Rules: Enhanced Calibrated Metrics for Probabilistic Evaluation

GitHub, arXiv Preprint

superior-scoring-rules is a Python library that provides strictly proper, confidence-aware evaluation metrics for probabilistic multi-class classification. Unlike traditional metrics such as Brier Score or Log Loss, these scoring rules add a penalty for mispredictions, ensuring that a correct prediction is always scored better than an incorrect one.

Why Accuracy, F1, Brier Score, and Log-Loss Fall Short in Probabilistic Classification

In many high-stakes applications, confidence calibration is critical. Traditional accuracy-based metrics (Accuracy, F1) ignore prediction confidence. Consider:

- Cancer Diagnosis: Differentiating 51% vs. 99% confidence in malignancy
- ICU Triage: Overconfident mispredictions risk patient safety
- Autonomous Vehicles: Handling uncertainties about obstacles
- Financial Risk Modeling: Pricing and investment decisions
- Security Threat Detection: High-confidence false negatives

Accuracy or F1 score alone cannot capture this nuance.

Problem with Traditional Metrics

Accuracy-based metrics (Accuracy, F1) treat all correct predictions equally, ignoring confidence. In high-stakes domains, confidence calibration is critical:

- Cancer Diagnosis: 51% and 99% confidence in malignancy should not be treated identically.
- ICU Triage & Mortality: Overconfident mispredictions risk patient safety.
- Autonomous Vehicles: Decisions depend on uncertainty about obstacles.
- Financial Risk Modeling: Pricing and investment hinge on calibrated probabilities.
- Security Threat Detection: High-confidence false negatives undermine defenses.

Thus, Accuracy or F1 Score alone is insufficient because both ignore the confidence of predictions.
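The point can be illustrated with a small sketch. The two classifiers and all probabilities below are made-up values, and the helper names are hypothetical; the sketch only shows that accuracy cannot tell a hesitant model from a confident one when both pick the right class.

```python
import numpy as np

# Three malignant cases (label 1). Columns: P(benign), P(malignant).
# All probabilities are invented for illustration.
y_true = np.array([1, 1, 1])
hesitant = np.array([[0.49, 0.51], [0.49, 0.51], [0.49, 0.51]])
confident = np.array([[0.01, 0.99], [0.01, 0.99], [0.01, 0.99]])

def accuracy(y, p):
    """Fraction of samples whose most probable class equals the label."""
    return float(np.mean(np.argmax(p, axis=1) == y))

def mean_true_class_confidence(y, p):
    """Average probability assigned to the true class."""
    return float(np.mean(p[np.arange(len(y)), y]))

acc_hesitant = accuracy(y_true, hesitant)
acc_confident = accuracy(y_true, confident)
conf_hesitant = mean_true_class_confidence(y_true, hesitant)
conf_confident = mean_true_class_confidence(y_true, confident)
```

Both models reach the same accuracy, while the average probability they place on the true class differs widely, which is the information a probabilistic scoring rule is meant to capture.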

Limitations of MSE & Cross-Entropy

Mean Squared Error (Brier Score) and Cross-Entropy (Log Loss) are strictly proper scoring rules, so they reward calibration. However, they can still assign a better (lower) score to an incorrect prediction than to a correct one. Example:

Vector	True Label (Y)	Predicted Probabilities (P)	Brier Score	Log Loss	State
`A`	`[0, 1, 0]`	`[0.33, 0.34, 0.33]`	0.6534	0.4685	Correct
`B`	`[0, 1, 0]`	`[0.51, 0.49, 0.00]`	0.5202	0.3098	Incorrect

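Both MSE and Log Loss favor B over A, even though B's most probable class is wrong, contradicting the principle of rewarding correct predictions. (In this table the Brier Score is the sum of squared errors over the classes, and the Log Loss uses a base-10 logarithm.)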
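The table values can be reproduced with a few lines of NumPy. This is a sketch that assumes the Brier Score is the sum of squared errors over the classes and that the Log Loss uses a base-10 logarithm, which is what the numbers above correspond to; the function names are illustrative and not part of the library.

```python
import numpy as np

def brier_score(y_onehot, p):
    """Sum of squared differences between the one-hot label and the prediction."""
    y = np.asarray(y_onehot, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum((y - p) ** 2))

def log_loss_base10(y_onehot, p):
    """Negative base-10 log of the probability assigned to the true class."""
    p = np.asarray(p, dtype=float)
    true_class = int(np.argmax(y_onehot))
    return float(-np.log10(p[true_class]))

y = [0, 1, 0]
pred_a = [0.33, 0.34, 0.33]  # argmax is class 1: correct
pred_b = [0.51, 0.49, 0.00]  # argmax is class 0: incorrect

brier_a, brier_b = brier_score(y, pred_a), brier_score(y, pred_b)
logloss_a, logloss_b = log_loss_base10(y, pred_a), log_loss_base10(y, pred_b)

# Both traditional scores rank the incorrect prediction B ahead of A.
b_preferred_by_brier = brier_b < brier_a
b_preferred_by_logloss = logloss_b < logloss_a
```

Rounded to four decimals, these match the Brier Score and Log Loss columns of the table.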

Our Solution: PBS & PLL

To ensure correct predictions always receive better scores, we add a penalty term to the score of every misclassified prediction:

- Penalized Brier Score (PBS)
- Penalized Logarithmic Loss (PLL)

These metrics are both strictly proper and superior, meaning they never score an incorrect prediction better than a correct one.
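The penalty idea can be sketched for a single sample as follows. The penalty values are not stated in this description, so the defaults below are assumptions: (c - 1) / c for the Brier Score and ln(c) for the natural-log loss, where c is the number of classes. They are chosen because each is an upper bound on the corresponding score of any correct prediction. The function names are hypothetical and are not the library's `pbs` and `pll`.

```python
import numpy as np

def penalized_brier_sketch(y_onehot, p, penalty=None):
    """Brier score plus a fixed penalty when the argmax is wrong (single sample)."""
    y = np.asarray(y_onehot, dtype=float)
    p = np.asarray(p, dtype=float)
    c = y.shape[-1]
    if penalty is None:
        penalty = (c - 1) / c  # assumed default, not taken from the library
    brier = np.sum((y - p) ** 2)
    misclassified = np.argmax(p) != np.argmax(y)
    return float(brier + (penalty if misclassified else 0.0))

def penalized_log_loss_sketch(y_onehot, p, penalty=None):
    """Natural-log loss plus a fixed penalty when the argmax is wrong (single sample)."""
    y = np.asarray(y_onehot, dtype=float)
    p = np.asarray(p, dtype=float)
    c = y.shape[-1]
    if penalty is None:
        penalty = np.log(c)  # assumed default, not taken from the library
    true_class = int(np.argmax(y))
    loss = -np.log(p[true_class])
    misclassified = np.argmax(p) != true_class
    return float(loss + (penalty if misclassified else 0.0))

y = [0, 1, 0]
pred_a = [0.33, 0.34, 0.33]  # correct
pred_b = [0.51, 0.49, 0.00]  # incorrect

pbs_a, pbs_b = penalized_brier_sketch(y, pred_a), penalized_brier_sketch(y, pred_b)
pll_a, pll_b = penalized_log_loss_sketch(y, pred_a), penalized_log_loss_sketch(y, pred_b)
```

Under these assumed penalties, the correct prediction A scores lower (better) than the incorrect prediction B on both metrics, reversing the ranking produced by the unpenalized scores.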

Quick Start

Installation from PyPI

pip install superior-scoring-rules

Install from Source (Development)

Clone the repository:

git clone https://github.com/Ruhallah93/superior-scoring-rules.git

Basic Usage

import tensorflow as tf
from superior_scoring_rules import pbs, pll

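# Sample data (batch_size=3, num_classes=4): one-hot labels and predicted probabilities.
# Labels are created as float32 so their dtype matches y_pred.
y_true = tf.constant([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=tf.float32)
y_pred = tf.constant([[0.9, 0.05, 0.05, 0.0],
                      [0.1, 0.8, 0.05, 0.05],
                      [0.1, 0.1, 0.1, 0.7]])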

print("PBS:", pbs(y_true, y_pred).numpy())
print("PLL:", pll(y_true, y_pred).numpy())

Early Stopping & Checkpointing

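Monitor PBS or PLL instead of val_loss:

class PBSCallback(tf.keras.callbacks.Callback):
    """Adds val_pbs and val_pll to the epoch logs so later callbacks can monitor them."""

    def __init__(self, x_val, y_val):
        super().__init__()
        # Recent tf.keras versions do not populate self.validation_data, so the
        # validation set is passed in explicitly (x_val, y_val are placeholder names).
        self.x_val, self.y_val = x_val, y_val

    def on_epoch_end(self, epoch, logs=None):
        if logs is None:
            return
        y_pred = self.model.predict(self.x_val, verbose=0)  # predict once, reuse for both metrics
        logs['val_pbs'] = pbs(self.y_val, y_pred)
        logs['val_pll'] = pll(self.y_val, y_pred)

# PBSCallback comes first so val_pbs is in the logs before the other callbacks read it.
model.fit(..., callbacks=[PBSCallback(x_val, y_val),
    tf.keras.callbacks.EarlyStopping(monitor='val_pbs', patience=5, mode='min'),
    tf.keras.callbacks.ModelCheckpoint('best.h5', monitor='val_pbs', save_best_only=True, mode='min')
])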

Paper & Citation

@article{ahmadian2025superior,
  title={Superior scoring rules for probabilistic evaluation of single-label multi-class classification tasks},
  author={Ahmadian, Rouhollah and Ghatee, Mehdi and Wahlstr{\"o}m, Johan},
  journal={International Journal of Approximate Reasoning},
  pages={109421},
  year={2025},
  publisher={Elsevier}
}


Release history

- 1.0.6: May 17, 2025 (this version)
- 1.0.5: May 17, 2025
- 1.0.4: May 17, 2025
- 1.0.3: May 17, 2025
- 1.0.2: May 17, 2025
- 1.0.1: May 17, 2025
- 1.0.0: May 17, 2025

Download files


Source Distribution

superior_scoring_rules-1.0.5.tar.gz (4.6 kB)

Uploaded May 17, 2025 Source

Built Distribution


superior_scoring_rules-1.0.5-py3-none-any.whl (4.6 kB)

Uploaded May 17, 2025 Python 3

File details

Details for the file superior_scoring_rules-1.0.5.tar.gz.

File metadata

Download URL: superior_scoring_rules-1.0.5.tar.gz
Upload date: May 17, 2025
Size: 4.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for superior_scoring_rules-1.0.5.tar.gz
Algorithm	Hash digest
SHA256	`1d67752063e0b57ce783f34e5a888b1378e58ff35f54ca6b8368e20ce610de5d`
MD5	`c445bd68f392cdc8fe2e10e33e296468`
BLAKE2b-256	`03ce505b7fc2bd0a1bc03562707374427071384cbf208f181408d5c3d5ce3bae`
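A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper names and the local file path are assumptions, and it presumes the archive has already been downloaded to the working directory.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()

# Hypothetical usage, assuming the sdist sits in the current directory:
# matches_published_hash(
#     "superior_scoring_rules-1.0.5.tar.gz",
#     "1d67752063e0b57ce783f34e5a888b1378e58ff35f54ca6b8368e20ce610de5d",
# )
```

A mismatch means the file differs from the one that was uploaded and should not be installed.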


File details

Details for the file superior_scoring_rules-1.0.5-py3-none-any.whl.

File metadata

Download URL: superior_scoring_rules-1.0.5-py3-none-any.whl
Upload date: May 17, 2025
Size: 4.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for superior_scoring_rules-1.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`036ac8cf7eee005dc7a5343558ae8f4598d37f434961b3d0e44df0e689b7e607`
MD5	`05e5ad887447f057ea87bc5254f3a55f`
BLAKE2b-256	`d0c70339980481ee7b3ba2214a3f17eb1829ec93cce706278169e7b356dff30c`

