Explainable imbalanceD learninG compARatOr
Overview
Balancing methods such as Random Undersampling, Random Oversampling, SMOTE and NearMiss are a very popular solution when dealing with imbalanced data. However, this raises the question of whether these techniques change the model's behaviour or the relationships present in the data.
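To make the balancing step concrete, the sketch below implements random undersampling and random oversampling by hand with NumPy, so you can see what each does to the class counts. In practice imbalanced-learn (listed under Technologies) provides these and SMOTE/NearMiss through its samplers' fit_resample method. The helper names random_undersample and random_oversample are hypothetical and are not part of edgaro or imbalanced-learn.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Randomly drop rows of larger classes until every class has the minority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

def random_oversample(X, y, seed=0):
    """Randomly duplicate rows of smaller classes until every class has the majority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = np.concatenate([
        np.concatenate([
            np.flatnonzero(y == c),  # keep every original row of class c
            rng.choice(np.flatnonzero(y == c), size=n_max - n, replace=True),  # extra copies
        ])
        for c, n in zip(classes, counts)
    ])
    return X[idx], y[idx]

# Toy imbalanced data: 8 rows of class 0, 2 rows of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(np.bincount(y_under))  # [2 2]
print(np.bincount(y_over))   # [8 8]
```

Undersampling discards information from the majority class while oversampling repeats minority rows; both change the training distribution, which is exactly why the model trained afterwards may behave differently.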
Because there are many kinds of Machine Learning models, this package provides model-agnostic tools for investigating model behaviour and how it changes. These tools belong to Explainable Artificial Intelligence (XAI) and include techniques such as Partial Dependence Profiles (PDP), Accumulated Local Effects (ALE) and Variable Importance (VI).
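The idea behind a model-agnostic PDP can be shown in a few lines: fix one feature to each value on a grid for every observation and average the model's predictions. The sketch below assumes only a predict callable (for a fitted scikit-learn classifier this could be lambda A: clf.predict_proba(A)[:, 1]); the function name partial_dependence and the toy linear model are illustrative. dalex, which the package builds on, computes such profiles through its Explainer.model_profile method, so this hand-written version is not how edgaro does it internally.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Model-agnostic PDP: mean prediction with column `feature` fixed to each grid value."""
    X = np.asarray(X, dtype=float)
    profile = []
    for z in grid:
        X_mod = X.copy()
        X_mod[:, feature] = z  # overwrite the chosen feature for every observation
        profile.append(predict(X_mod).mean())
    return np.array(profile)

# Toy "model": any callable mapping an (n, p) array to n predictions works.
def toy_predict(A):
    return 2.0 * A[:, 0] + A[:, 1]

X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
grid = np.linspace(0.0, 2.0, 3)
pdp = partial_dependence(toy_predict, X, feature=0, grid=grid)
print(pdp)  # [3. 5. 7.]
```

Because only predict is used, the same function works for any model family, which is what "model-agnostic" means here.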
In addition, the package implements novel methods for comparing explanations: the Standard Deviation of Distances (for PDP and ALE) and the Wilcoxon statistical test (for VI).
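A minimal sketch of both comparisons, under stated assumptions. For SDD we assume two profiles evaluated on the same grid, take their pointwise differences and compute the standard deviation of those differences, so a curve that is only shifted up or down scores 0 while a change in shape scores higher; consult the documentation for edgaro's exact definition. For VI we assume two samples of importance values for one variable (for example, from repeated permutation rounds) that can be paired, and apply the Wilcoxon signed-rank test via scipy.stats.wilcoxon; if the samples are not paired, the rank-sum variant scipy.stats.ranksums is the appropriate Wilcoxon test. The function name sdd and all numbers are made up for illustration.

```python
import numpy as np
from scipy.stats import wilcoxon

def sdd(profile_a, profile_b):
    """Std of pointwise distances between two profiles on a shared grid (assumed definition)."""
    d = np.asarray(profile_a, dtype=float) - np.asarray(profile_b, dtype=float)
    return float(np.std(d))

# PDP/ALE curves on a shared grid: b is a pure vertical shift of a, c changes shape.
a = np.array([0.0, 0.25, 0.5, 0.75])
b = a + 0.5
c = np.array([0.0, 0.5, 0.5, 1.0])
print(sdd(a, b))  # 0.0   -> same shape, only shifted
print(sdd(a, c))  # 0.125 -> the shape differs

# Made-up importances of one variable: model on original vs. balanced data,
# assumed to be paired by permutation round.
vi_original = np.array([0.120, 0.150, 0.110, 0.140, 0.130, 0.160, 0.125, 0.155])
vi_balanced = np.array([0.081, 0.102, 0.073, 0.094, 0.118, 0.098, 0.089, 0.105])
result = wilcoxon(vi_original, vi_balanced)
print(result.pvalue)  # a small p-value suggests the importance changed
```

SDD ignores a constant offset between curves by design, which separates "the model predicts higher overall" from "the relationship itself changed".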
In short, this package aims to provide a user-friendly interface for checking whether these changes actually occur.
The package consists of four modules: dataset, balancing, model and explain. It automates balancing data with different methods, training Machine Learning models and calculating PDP/ALE/VI explanations. It can be applied to a single input dataset or to many datasets arranged in arrays or nested arrays.
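Handling a single dataset and nested arrays of datasets uniformly can be pictured as applying each pipeline stage to every leaf while keeping the nesting intact. The sketch below is a hypothetical illustration of that design only: map_nested, balance, train, explain and the dataset names are placeholders, not edgaro's API.

```python
from typing import Any, Callable

def map_nested(fn: Callable[[Any], Any], data: Any) -> Any:
    """Apply `fn` to every leaf of an arbitrarily nested list, preserving its structure."""
    if isinstance(data, list):
        return [map_nested(fn, item) for item in data]
    return fn(data)

# Hypothetical stand-ins for the balancing, model and explain stages.
def balance(name):
    return {"name": name, "balanced": True}

def train(ds):
    return {**ds, "model": "fitted"}

def explain(fitted):
    return f"PDP for {fitted['name']}"

datasets = ["credit", ["spam", ["churn", "fraud"]]]  # made-up dataset names
explanations = map_nested(explain, map_nested(train, map_nested(balance, datasets)))
print(explanations)
# ['PDP for credit', ['PDP for spam', ['PDP for churn', 'PDP for fraud']]]
```

Keeping the nesting means results can be traced back to the group of datasets they came from.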
Technologies
The package is written in Python and has been tested with Python 3.8, Python 3.9 and Python 3.10.
It builds on the most popular Machine Learning libraries in Python:
- pandas, NumPy
- scikit-learn, xgboost
- imbalanced-learn
- dalex
- scipy, statsmodels
- matplotlib
- openml
User Manual
The User Manual is available as part of the documentation.
Installation
The edgaro package is available on PyPI and can be installed by:
pip install edgaro
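After installing, a quick sanity check might confirm the installed version and whether the running interpreter is one of the Python versions listed under Technologies. This sketch uses only the standard library's importlib.metadata; the function names edgaro_version and python_is_tested are illustrative.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Python versions the package is reported to have been tested with.
TESTED_PYTHONS = {(3, 8), (3, 9), (3, 10)}

def edgaro_version():
    """Return the installed edgaro version string, or None if it is not installed."""
    try:
        return version("edgaro")
    except PackageNotFoundError:
        return None

def python_is_tested(info=sys.version_info):
    """True if the (major, minor) Python version is one of the tested ones."""
    return (info[0], info[1]) in TESTED_PYTHONS

print("edgaro:", edgaro_version())
print("tested Python:", python_is_tested())
```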
Documentation
The documentation is available at adrianstando.github.io/edgaro
Project purpose
This package was created for my Engineering Thesis "The impact of data balancing on model behaviour with Explainable Artificial Intelligence tools in imbalanced classification problems".
This package was used in my paper "The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems", presented at the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications (LIDTA 2024).
Citation
@InProceedings{pmlr-v241-stando24a,
title = {The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems},
author = {Stando, Adrian and Cavus, Mustafa and Biecek, Przemyslaw},
booktitle = {Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications},
pages = {16--30},
year = {2024},
volume = {241},
series = {Proceedings of Machine Learning Research},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v241/stando24a/stando24a.pdf}
}
File details
Details for the file edgaro-1.0.2.2.tar.gz.
File metadata
- Download URL: edgaro-1.0.2.2.tar.gz
- Size: 62.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 65151d61242bb16fb4381c2de7d04dc8c5f2a98d3e39ba06aed07fc3d45e7fb3 |
| MD5 | a1708123f6f293db02fca10d12cdee1b |
| BLAKE2b-256 | a3b46c1babbcfef961935cb5eb5012c7b4eb34b879247e237c6a987b041af2eb |
File details
Details for the file edgaro-1.0.2.2-py3-none-any.whl.
File metadata
- Download URL: edgaro-1.0.2.2-py3-none-any.whl
- Size: 61.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fae91440fb8006141627d44df37abbf097c4d7c53020dcc4a4ab3d57b7c4eac9 |
| MD5 | bdc7a0a27878d5d0c373457c1b287895 |
| BLAKE2b-256 | 351d62313dee3d792acfbf0f1ff3c87ee22f3736042d6f3880681f77888ba6c4 |