
Explainable imbalanceD learninG compARatOr


Overview

Balancing methods such as Random Undersampling, Random Oversampling, SMOTE and NearMiss are a popular solution when dealing with imbalanced data. However, it is unclear whether these techniques change the model behaviour or the relationships present in the data.
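To make the concern concrete, here is a minimal pure-Python sketch of Random Oversampling. It is an illustration only, not the implementation used by edgaro or imbalanced-learn; the function name `random_oversample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class is as frequent as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sampling with replacement; k is 0 for the majority class.
        extra = rng.choices(pool, k=target - n)
        out_rows.extend(extra)
        out_labels.extend([cls] * (target - n))
    return out_rows, out_labels


# Toy data: 8 majority rows (class 0) and 2 minority rows (class 1).
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [5.0], [6.0]]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(labels), Counter(bal_labels))
```

After balancing, the two minority rows appear several times each, so the distribution of the feature seen by the model is no longer the original one. This is the kind of change whose effect on the model the package is meant to investigate.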

As there are many kinds of Machine Learning models, this package provides model-agnostic tools to investigate the model behaviour and its changes. These tools are also known as Explainable Artificial Intelligence (XAI) tools and include techniques such as Partial Dependence Profile (PDP), Accumulated Local Effects (ALE) and Variable Importance (VI).
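As an illustration of what "model-agnostic" means here, the following sketch computes a Partial Dependence Profile using nothing but a prediction function. It is a simplified stand-in, not the dalex or edgaro implementation; `partial_dependence` and `toy_model` are hypothetical names introduced for this example.

```python
from statistics import mean


def partial_dependence(predict, data, feature, grid):
    """For each grid value, overwrite `feature` with it in every row and
    average the model predictions over the modified rows."""
    profile = []
    for value in grid:
        modified = [row[:feature] + [value] + row[feature + 1:] for row in data]
        profile.append(mean(predict(row) for row in modified))
    return profile


def toy_model(row):
    # Hypothetical stand-in for any trained model's predict function.
    return 2.0 * row[0] + row[1]


data = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]

pdp = partial_dependence(toy_model, data, feature=0, grid=grid)
print(pdp)  # [2.0, 4.0, 6.0]
```

Because the function only calls `predict`, the same code works for any model type, which is why such profiles can be compared between a model trained on the original data and one trained on balanced data.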

Apart from that, the package implements novel methods to compare the explanations, which are Standard Deviation of Distances (for PDP and ALE) and the Wilcoxon statistical test (for VI).
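One plausible reading of Standard Deviation of Distances is the standard deviation of the pointwise differences between two profiles evaluated on the same grid. The sketch below follows that reading and assumes a population standard deviation; the exact definition used by the package may differ, and the name `sdd` is hypothetical.

```python
from statistics import pstdev


def sdd(profile_a, profile_b):
    """Standard deviation of pointwise differences between two profiles
    computed on the same grid (assumed definition, population SD)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must be evaluated on the same grid")
    return pstdev([a - b for a, b in zip(profile_a, profile_b)])


original = [0.0, 0.25, 0.5, 0.75]
shifted = [0.5, 0.75, 1.0, 1.25]   # same shape, moved up by 0.5
reversed_ = [0.75, 0.5, 0.25, 0.0]  # opposite trend

print(sdd(original, shifted))    # 0.0: a pure vertical shift is ignored
print(sdd(original, reversed_))  # > 0: the shape of the profile changed
```

Under this definition a constant offset between two profiles scores zero, so the measure reacts to changes in shape rather than in level. For Variable Importance, a paired Wilcoxon signed-rank test is available in SciPy as `scipy.stats.wilcoxon`.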

Generally speaking, this package aims to give a user-friendly interface for investigating whether the described phenomena take place.

The package consists of four modules: dataset, balancing, model and explain. Its interface automates the process of balancing data with different methods, training Machine Learning models and calculating PDP/ALE/VI explanations. The package can be used for a single input dataset or for multiple datasets arranged in arrays or nested arrays.

Technologies

The package is written in Python and has been checked for compatibility with Python 3.8, Python 3.9 and Python 3.10.

It uses the most popular Python libraries for Machine Learning:

  • pandas, NumPy
  • scikit-learn, xgboost
  • imbalanced-learn
  • dalex
  • scipy, statsmodels
  • matplotlib
  • openml

User Manual

The User Manual is available as part of the documentation.

Installation

The edgaro package is available on PyPI and can be installed with:

pip install edgaro
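After installing, a quick check like the one below can confirm that the package is visible to the interpreter and that the interpreter is one of the versions listed under Technologies. This is an optional sketch that uses only the standard library; `installed_version` and `TESTED_PYTHONS` are names made up for this example.

```python
import sys
from importlib import metadata

# Interpreter versions the README reports as checked (assumed exhaustive here).
TESTED_PYTHONS = {(3, 8), (3, 9), (3, 10)}


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("edgaro version:", installed_version("edgaro"))
print("interpreter among tested versions:", sys.version_info[:2] in TESTED_PYTHONS)
```

A result of `None` means the package was not found in the current environment, which usually points to `pip` having installed into a different interpreter.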

Documentation

The documentation is available at adrianstando.github.io/edgaro

Project purpose

This package was created for the purpose of my Engineering Thesis "The impact of data balancing on model behaviour with Explainable Artificial Intelligence tools in imbalanced classification problems".
