
Protree is a set of utilities for prototype selection in tree-based models and usage of prototypes in drift detection.

Project description


Protree

Protree is a Python library containing a set of utilities for prototype selection for tree-based models. The implemented tools include dataloaders, measures, prototype selection algorithms, and drift detection algorithms. The library is designed to be used with the scikit-learn and river libraries.

The library was implemented as part of a master's thesis by Jacek Karolczak under the supervision of prof. dr hab. Jerzy Stefanowski. The main contributions of the thesis are the measures for assessing and comparing prototypes, and the A-PETE and ANCIENT algorithms.

A-PETE: Adaptive Prototype Explanations of Tree Ensembles

A-PETE is a prototype selection method for ensembles of tree classifiers. A-PETE was presented at PP-RAI 2024: 5th Polish Conference on Artificial Intelligence and is scheduled to be published in the conference proceedings.

Abstract

The need for interpreting machine learning models is addressed through prototype explanations within the context of tree ensembles. An algorithm named Adaptive Prototype Explanations of Tree Ensembles (A-PETE) is proposed to automatise the selection of prototypes for these classifiers. Its unique characteristic is the use of a specialised distance measure and a modified k-medoid approach. Experiments demonstrated its competitive predictive accuracy with respect to earlier explanation algorithms. It also provides a sufficient number of prototypes for the purpose of interpreting the random forest classifier.
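The abstract does not spell out the specialised distance, but a common choice for tree ensembles is one minus the random-forest proximity: the fraction of trees in which two samples land in different leaves. The sketch below illustrates that notion with scikit-learn; `leaf_disagreement_distance` is a hypothetical helper written for this illustration, not part of protree, and A-PETE's actual measure may differ.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def leaf_disagreement_distance(model, a, b):
    """Fraction of trees in which samples a and b fall into different leaves.

    This is 1 minus the classic random-forest proximity; it is one common
    tree-ensemble distance, not necessarily the one A-PETE uses.
    """
    leaves_a = model.apply(a.reshape(1, -1))[0]  # leaf index per tree for a
    leaves_b = model.apply(b.reshape(1, -1))[0]  # leaf index per tree for b
    return float(np.mean(leaves_a != leaves_b))

x, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(x, y)

# Samples of the same class tend to share leaves far more often than
# samples of different classes.
d_same = leaf_disagreement_distance(rf, x[0], x[1])     # two setosa samples
d_cross = leaf_disagreement_distance(rf, x[0], x[100])  # setosa vs virginica
```

Under such a distance, two same-class samples are close whenever most trees route them to the same leaf, which is the structural signal a tree-based prototype selector can exploit.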

How to use?

Here is an example of how to use the A-PETE algorithm to select prototypes for a random forest classifier. The example uses the Iris dataset and the random forest classifier from the scikit-learn library.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

from protree.explainers import APete

x, y = load_iris(as_frame=True, return_X_y=True)
random_forest = RandomForestClassifier().fit(x, y)
explainer = APete(model=random_forest)
prototypes = explainer.select_prototypes(x)
print(prototypes)

Output:

{
  0:   
         sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
      0                5.1               3.5                1.4               0.2, 
  1:     
         sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
     64                5.6               2.9                3.6               1.3, 
  2:
         sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
    104                6.5               3.0                5.8               2.2
}
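The selected prototypes form a compact surrogate classifier: a sample can be assigned the class of its nearest prototype, which is the kind of prototype-based prediction whose accuracy the experiment scripts report. The sketch below uses plain Euclidean distance and a hypothetical `nearest_prototype_predict` helper; it is an illustration of the idea, not protree's API.

```python
import numpy as np

def nearest_prototype_predict(prototypes, samples):
    """Assign each sample the label of its closest prototype (Euclidean).

    `prototypes` maps a class label to rows of prototype features,
    mirroring the dict-of-DataFrames output shown above.
    """
    labels, rows = [], []
    for label, protos in prototypes.items():
        for row in np.atleast_2d(protos):
            labels.append(label)
            rows.append(row)
    rows = np.asarray(rows, dtype=float)
    # Pairwise distances: samples (n, 1, d) against prototypes (1, m, d).
    dists = np.linalg.norm(samples[:, None, :] - rows[None, :, :], axis=-1)
    return np.asarray(labels)[dists.argmin(axis=1)]

# The three Iris prototypes from the output above, one per class.
prototypes = {
    0: [[5.1, 3.5, 1.4, 0.2]],
    1: [[5.6, 2.9, 3.6, 1.3]],
    2: [[6.5, 3.0, 5.8, 2.2]],
}
preds = nearest_prototype_predict(prototypes, np.array([[5.0, 3.4, 1.5, 0.2]]))
# -> array([0]): the query is closest to the class-0 (setosa) prototype
```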

Using A-PETE? Cite us!

@article{karolczak2024apete,
  title={A-PETE: Adaptive Prototype Explanations of Tree Ensembles},
  author={Karolczak, Jacek and Stefanowski, Jerzy},
  journal={Progress in Polish Artificial Intelligence Research},
  volume={5},
  pages={2--8},
  year={2024},
  publisher={Warsaw University of Technology WUT Press}
}

ANCIENT: Algorithm for New Concept Identification and Explanation in Tree-based models

ANCIENT is an algorithm proposed to detect drift and to explain its nature using prototypes. It leverages the principle that measures describing data sub-populations drawn from different distributions are more dissimilar than measures describing sub-populations drawn from the same distribution.
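That principle can be illustrated on a toy univariate stream: a summary measure computed on a sliding window (here simply the window mean, standing in for ANCIENT's prototype-based measures) stays close to the reference value while the distribution is stable, then diverges after the drift. This is a self-contained sketch of the idea only, not the ANCIENT implementation; the threshold and window settings are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stream: the mean shifts from 0.0 to 2.0 at position 600 (a drift).
stream = np.concatenate([rng.normal(0.0, 1.0, 600), rng.normal(2.0, 1.0, 600)])

window = 200
ref_measure = stream[:window].mean()  # measure on a reference window

drifts = []
for end in range(2 * window, len(stream), 50):  # slide a current window
    current = stream[end - window:end]
    # Windows drawn from a different distribution than the reference
    # produce measures that diverge beyond what sampling noise explains.
    if abs(current.mean() - ref_measure) > 0.5:  # arbitrary toy threshold
        drifts.append(end)
        break
```

The detection lags the true drift position by a fraction of the window length, which is why the real example below subtracts half the window size when printing detection positions.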

How to use?

from river import forest

from protree.data.stream_generators import Plane
from protree.explainers import APete
from protree.detectors import Ancient

window_size = 300

model = forest.ARFClassifier()
detector = Ancient(model=model, prototype_selector=APete, window_length=window_size, alpha=0.55,
                   measure="minimal_distance", strategy="total", clock=16)
ds = Plane(drift_position=[1150, 1800, 2500])

for i, (x, y) in enumerate(ds):
    model.learn_one(x, y)
    detector.update(x, y)
    if detector.drift_detected:
        print(f"{int(i - window_size / 2)}) Drift detected!")

Output:

1175) Drift detected!
1687) Drift detected!
1751) Drift detected!
2583) Drift detected!

Experiments reproduction

The experiments conducted in the thesis can be reproduced using the scripts provided in the scripts directory, specifically:

  • scripts/experiment-static.py - reproduces the experiments on static datasets, in particular the A-PETE effectiveness study described in Chapter 3, and computes the measures described in Chapter 4,
  • scripts/experiment-steam-sklearn.py - reproduces the experiments on stream datasets demonstrating the proposed measures' effectiveness in explaining drifts, described in Chapter 5.2,
  • scripts/experiment-detect-drift.py - reproduces the experiments on stream datasets demonstrating the ANCIENT algorithm's effectiveness in detecting drifts, described in Chapter 5.3.

Usage of the scripts is described in the help message, which can be displayed by running the script with the --help flag, for instance:

python scripts/experiment-static.py --help

Output:

Usage: experiment-static.py [OPTIONS]
                            {breast_cancer|caltech|compass|diabetes|mnist|rhc}
                            {KMeans|G_KM|SM_A|SM_WA|SG|APete}

Options:
  -d, --directory TEXT   Directory where datasets are stored.
  -p, --n_features TEXT  The number of features to consider when looking for
                         the best split. Allowable values are 'sqrt', positive
                         ints and floats between 0 and 1.
  -t, --n_trees INTEGER  Number of trees. Allowable values are positive ints.
  -kw, --kw_args TEXT    Additional, keyword arguments for the explainer. Must
                         be in the form of key=value,key2=value2...
  --log                  A flag indicating whether to log the results to
                         wandb.
  --help                 Show this message and exit.
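The `--kw_args` convention above can be made concrete with a few lines of Python. `parse_kw_args` is a hypothetical helper shown only to illustrate the `key=value,key2=value2` format; the best-effort numeric coercion is an assumption, and the actual scripts may parse values differently.

```python
def parse_kw_args(spec: str) -> dict:
    """Parse a 'key=value,key2=value2' string into a keyword dict.

    Values that look numeric are coerced to int or float; everything
    else is kept as a string.
    """
    if not spec:
        return {}
    kwargs = {}
    for pair in spec.split(","):
        key, _, value = pair.partition("=")
        for cast in (int, float):  # try int first, then float
            try:
                value = cast(value)
                break
            except ValueError:
                pass
        kwargs[key.strip()] = value
    return kwargs

parse_kw_args("alpha=0.55,strategy=total")
# -> {'alpha': 0.55, 'strategy': 'total'}
```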

For instance:

python scripts/experiment-static.py diabetes APete -p sqrt

Output:

total_n_prototypes: 5
score/accuracy/train/random_forest: 1.0
score/accuracy/train/prototypes: 0.8391304347826087
score/accuracy/valid/random_forest: 0.7532467532467533
score/accuracy/valid/prototypes: 0.7532467532467533
score/accuracy/test/random_forest: 0.7337662337662338
score/accuracy/test/prototypes: 0.7142857142857143
score/gmean/train/random_forest: 1.0
score/gmean/train/prototypes: 0.8391304347826087
score/gmean/valid/random_forest: 0.7532467532467533
score/gmean/valid/prototypes: 0.7532467532467533
score/gmean/test/random_forest: 0.7337662337662337
score/gmean/test/prototypes: 0.7142857142857143
score/valid/fidelity: 0.948051948051948
score/valid/hubness: 0.9428732702115488
score/valid/mean_in_distribution: 0.09985377777777778
score/valid/mean_out_distribution: 0.01757222222222222
vector/valid/partial_in_distribution:
        0: [0.2089, 0.10524, 0.04924]
        1: [0.07157407407407407, 0.06431481481481481]
vector/valid/partial_hubnesses:
        0: 0.8907627072209605
        1: 0.9949838332021369
vector/valid/partial_out_distribution:
        0: [0.02162962962962963, 0.017, 0.02248148148148148]
        1: [0.01521, 0.01154]
vector/valid/consistent_votes:
        0: [0.9365079365079365, 1.0, 0.9047619047619048]
        1: [0.9230769230769231, 1.0]
vector/valid/voting_frequency:
        0: [0.4090909090909091, 0.16883116883116883, 0.13636363636363635]
        1: [0.16883116883116883, 0.11688311688311688]


File details

Details for the file protree-0.1.1.tar.gz.

File metadata

  • Download URL: protree-0.1.1.tar.gz
  • Upload date:
  • Size: 25.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for protree-0.1.1.tar.gz
Algorithm Hash digest
SHA256 54ff84e7313c49ef9f3597c263f45dff50a7ac2e30efeb3a7e0e5797a9fa2a50
MD5 8b03d6b6c2ffad246262ca4054d372d0
BLAKE2b-256 0806a8a9f8fcd6e99f6a3dbae606864961616f55d9a1b29af3e6966040b14cf6

See more details on using hashes here.

File details

Details for the file protree-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: protree-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 29.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for protree-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7d777dba15102d5d2cad3582ec729d01201e1ee6f22bc098eaca91d470a7afbc
MD5 8c2e511481e70ca415fe8f9cd9039923
BLAKE2b-256 73acae668b3392391c34673d84ac4e6407168d16745cd54cf8f8908f9104ad81

See more details on using hashes here.
