Probabilistic Scoring List classifier
Probabilistic Scoring Lists
Probabilistic scoring lists (PSLs) are incremental models that evaluate one feature of the dataset at a time. PSLs can be seen as an extension of scoring systems in two ways:
- they can be evaluated at any stage, which allows a trade-off between model complexity and prediction speed.
- they provide a probability distribution over scores instead of hard thresholds.
Scoring systems are used as decision support for human experts in domains such as medicine and law.
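To make the difference between hard thresholds and probabilities concrete, here is a toy sketch that contrasts a classic scoring system with a probabilistic one. All names and numbers in it (`SCORES`, `PROBA_BY_TOTAL`, `hard_decision`, `probabilistic_decision`, the features and probabilities) are hypothetical and chosen purely for illustration; this is not the scikit-psl implementation.

```python
# Toy contrast between a classic scoring system and a probabilistic one.
# All scores, cut-offs and probabilities below are made up for illustration.

# Integer score added to the total when the (binary) feature is present.
SCORES = {"fever": 2, "cough": 1, "vaccinated": -1}

# Hypothetical probability of the positive class for each possible total score.
PROBA_BY_TOTAL = {-1: 0.02, 0: 0.10, 1: 0.35, 2: 0.60, 3: 0.90}


def total_score(patient):
    """Sum the scores of all features that are present."""
    return sum(score for name, score in SCORES.items() if patient.get(name, False))


def hard_decision(patient, threshold=2):
    """Classic scoring system: compare the total score with a fixed cut-off."""
    return total_score(patient) >= threshold


def probabilistic_decision(patient):
    """Probabilistic variant: report a probability for the observed total."""
    return PROBA_BY_TOTAL[total_score(patient)]


patient = {"fever": True, "cough": False, "vaccinated": True}
print(total_score(patient))             # 1
print(hard_decision(patient))           # False
print(probabilistic_decision(patient))  # 0.35
```

The hard decision only says that the total is below the cut-off, whereas the probabilistic variant tells the expert how likely the positive class is for that total.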
The implementation adheres to the scikit-learn API.
Install
pip install scikit-psl
Usage
For more examples, have a look at the examples folder. Here is a simple one:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from skpsl import ProbabilisticScoringList
# Generating synthetic data with continuous features and a binary target variable
X, y = make_classification(n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)
psl = ProbabilisticScoringList({-1, 1, 2})
psl.fit(X_train, y_train)
print(f"Brier score: {psl.score(X_test, y_test):.4f}")
"""
Brier score: 0.2438 (lower is better)
"""
df = psl.inspect(5)
print(df.to_string(index=False, na_rep="-", justify="center", float_format=lambda x: f"{x:.2f}"))
"""
Stage  Threshold  Score  T = -2  T = -1  T = 0  T = 1  T = 2  T = 3  T = 4  T = 5
    0          -      -       -       -   0.51      -      -      -      -      -
    1   >-2.4245   2.00       -       -   0.00      -   0.63      -      -      -
    2   >-0.9625  -1.00       -    0.00   0.00   0.48   1.00      -      -      -
    3    >0.4368  -1.00    0.00    0.00   0.12   0.79   1.00      -      -      -
    4   >-0.9133   1.00    0.00    0.00   0.12   0.12   0.93   1.00      -      -
    5    >2.4648   2.00    0.00    0.00   0.07   0.07   0.92   1.00   1.00   1.00
"""
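As we read the table printed above, each stage adds one feature, binarizes it with the shown threshold, and adds its score to a running total; the `T = t` columns then give the estimated probability of the positive class for a total score of `t` at that stage. The following sketch evaluates the first two stages by hand. The thresholds, scores and probabilities are copied from the output above, while the helper `evaluate`, the input values, and the assumption that features arrive in stage order are ours and not part of the library.

```python
# Sketch: evaluating the first stages of the printed table by hand.
# Thresholds, scores and probabilities are copied from the output above;
# the helper name and the input values are hypothetical.

# (threshold, score) of the feature added at stages 1 and 2
STAGES = [(-2.4245, 2), (-0.9625, -1)]

# Probability per total score at stages 0, 1 and 2 (the "T = t" columns)
PROBAS = [
    {0: 0.51},
    {0: 0.00, 2: 0.63},
    {-1: 0.00, 0: 0.00, 1: 0.48, 2: 1.00},
]


def evaluate(values, stage):
    """Return (total score, probability) after `stage` features.

    `values` holds the raw feature values in the order in which the
    stages use them, which is an assumption of this sketch.
    """
    total = 0
    for value, (threshold, score) in zip(values[:stage], STAGES[:stage]):
        if value > threshold:  # binarize the continuous feature
            total += score
    return total, PROBAS[stage][total]


values = [0.3, 1.5]  # hypothetical feature values for one instance
for stage in range(3):
    print(stage, evaluate(values, stage))
# 0 (0, 0.51)
# 1 (2, 0.63)
# 2 (1, 0.48)
```

Stopping at an earlier stage requires fewer feature values but gives a coarser estimate; stage 0 uses no features at all and returns the same probability for every instance.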
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.3.1 - 2023-09-12
Fixed
- PSL now correctly handles the case where all instances belong to the negative class
- #1: if the first feature is assigned a negative score, it is now assigned the most negative score
0.3.0 - 2023-08-10
Added
- PSL classifier can now run with continuous data and optimally (with respect to expected entropy) select thresholds to binarize the data
Changed
- Significantly improved optimum calculation for MinEntropyBinarizer (the same optimization algorithm is shared with the PSL's internal binarization algorithm)
0.2.0 - 2023-08-10
Added
- PSL classifier
- introduced parallelization
- implemented l-step lookahead
- simple inspect(·) method that creates a tabular representation of the model
0.1.0 - 2023-08-08
Added
- Initial implementation of the PSL algorithm