Probabilistic Scoring List classifier
Probabilistic Scoring Lists
Probabilistic scoring lists (PSLs) are incremental models that evaluate one feature of the dataset at a time. PSLs can be seen as an extension of scoring systems in two ways:
- they can be evaluated at any stage, which allows trading off model complexity against prediction speed.
- they provide a probability distribution over scores instead of hard thresholds.
Scoring systems are used as decision support for human experts in domains such as medicine and law.
The implementation adheres to the scikit-learn API.
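The stage-wise evaluation described above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the skpsl implementation: the names `STAGES`, `PROBAS`, and `predict_proba_at_stage`, as well as every threshold, score, and probability, are hypothetical values chosen for illustration. The assumption is that, at stage k, the first k features are binarized against their thresholds, the scores of the features that exceed their thresholds are summed, and the total score is mapped to a probability.

```python
# Hypothetical toy model, NOT the skpsl implementation.
# Each stage adds one feature: (feature index, threshold, integer score).
STAGES = [
    (0, -2.0, 2),
    (1, 0.5, -1),
]

# PROBAS[k] maps the total score after k stages to an estimate of P(y = 1).
# Stage 0 uses no features, so it only holds a single base-rate estimate.
PROBAS = [
    {0: 0.51},
    {0: 0.10, 2: 0.63},
    {-1: 0.05, 0: 0.10, 1: 0.48, 2: 0.90},
]


def predict_proba_at_stage(x, stage):
    """Return the probability estimate after evaluating `stage` features."""
    total = sum(
        score
        for feature, threshold, score in STAGES[:stage]
        if x[feature] > threshold  # binarize the feature, then add its score
    )
    return PROBAS[stage][total]


sample = [0.0, 1.0]
for k in range(len(PROBAS)):
    print(f"stage {k}: P(y = 1) = {predict_proba_at_stage(sample, k):.2f}")
```

Stopping at an earlier stage uses fewer features and is therefore cheaper to evaluate, at the cost of a coarser probability estimate.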
Install
pip install scikit-psl
Usage
For more examples, have a look at the examples folder. Here is a simple one:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from skpsl import ProbabilisticScoringList
# Generating synthetic data with continuous features and a binary target variable
X, y = make_classification(n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)
# Each feature can be assigned a score from the set {-1, 1, 2}
psl = ProbabilisticScoringList({-1, 1, 2})
psl.fit(X_train, y_train)
print(f"Brier score: {psl.score(X_test, y_test):.4f}")
"""
Brier score: 0.2438 (lower is better)
"""
# Tabular representation of the model up to stage 5
df = psl.inspect(5)
print(df.to_string(index=False, na_rep="-", justify="center", float_format=lambda x: f"{x:.2f}"))
"""
Stage  Threshold  Score  T = -2  T = -1  T = 0  T = 1  T = 2  T = 3  T = 4  T = 5
    0          -      -       -       -   0.51      -      -      -      -      -
    1   >-2.4245   2.00       -       -   0.00      -   0.63      -      -      -
    2   >-0.9625  -1.00       -    0.00   0.00   0.48   1.00      -      -      -
    3    >0.4368  -1.00    0.00    0.00   0.12   0.79   1.00      -      -      -
    4   >-0.9133   1.00    0.00    0.00   0.12   0.12   0.93   1.00      -      -
    5    >2.4648   2.00    0.00    0.00   0.07   0.07   0.92   1.00   1.00   1.00
"""
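The Brier score printed in the example above can be read as a mean squared error on probabilities. The following sketch assumes the usual definition for binary targets, the mean of (p_i - y_i)^2 over all samples; the function name `brier_score` is hypothetical and is not part of skpsl.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


# A constant prediction of 0.5 scores 0.25 on any binary target,
# while perfectly confident and correct predictions score 0.0.
print(brier_score([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
print(brier_score([1, 0, 1, 0], [1.0, 0.0, 1.0, 0.0]))
```

Under this definition, the 0.2438 reported above is only slightly better than the 0.25 that a constant prediction of 0.5 would achieve.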
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.3.0 - 2023-08-10
Added
- PSL classifier can now run with continuous data and optimally (with respect to expected entropy) select thresholds to binarize the data
Changed
- Significantly improved optimum calculation for MinEntropyBinarizer (the same optimization algorithm is shared with the PSL's internal binarization algorithm)
0.2.0 - 2023-08-10
Added
- PSL classifier
- introduced parallelization
- implemented l-step lookahead
- simple inspect(·) method that creates a tabular representation of the model
0.1.0 - 2023-08-08
Added
- Initial implementation of the PSL algorithm