Python package for tackling multiclass imbalance problems.

multi-imbalance

Multi-class imbalance is a common problem in real-world supervised classification tasks. Although specialized methods for this problem have been studied, most of them still lack a coherent Python implementation that is simple, intuitive and easy to use. multi-imbalance is a Python package that addresses multi-class imbalanced datasets in machine learning.

Requirements

The package has been tested under Python 3.6, 3.7 and 3.8. It relies heavily on scikit-learn and the typical scientific stack (numpy, scipy, pandas, etc.). Requirements include:

  • numpy>=1.17.0
  • scikit-learn>=0.21.3
  • pandas>=0.25.1
  • pytest>=5.1.2
  • imbalanced-learn>=0.6.1
  • IPython>=7.13.0
  • seaborn>=0.10.1
  • matplotlib>=3.2.1

Installation

Install the package from PyPI with pip:

pip install multi-imbalance
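
To confirm which version pip installed, a small check along the following lines may help. This is a minimal sketch, not part of the package: the helper name installed_version is hypothetical, and it assumes Python 3.8 or newer, where importlib.metadata is in the standard library.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("multi-imbalance"))
```

On Python 3.6 or 3.7 the same information is available from the command line with pip show multi-imbalance.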

Implemented algorithms

The package implements the following algorithms:

  • One-vs-One (OVO) and One-vs-All (OVA) ensembles [2],
  • Error-Correcting Output Codes (ECOC) [1] with dense, sparse and complete encoding [9],
  • Global-CS [4],
  • Static-SMOTE [10],
  • Mahalanobis Distance Oversampling [3],
  • Similarity-based Oversampling and Undersampling Preprocessing (SOUP) [5],
  • SPIDER3 cost-sensitive pre-processing [8],
  • Multi-class Roughly Balanced Bagging (MRBB) [7],
  • SOUP Bagging [6].
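
To give a feel for the binarization idea behind the One-vs-One ensemble in the list above, here is a plain-Python sketch of the decomposition and voting steps. It is an illustration only and does not use the package's own classes: the helper names ovo_pairs and ovo_vote are hypothetical, and breaking ties in favour of the smallest label is an arbitrary choice made for this example.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(labels):
    """Return every unordered pair of distinct classes, one per binary sub-problem."""
    return list(combinations(sorted(set(labels)), 2))


def ovo_vote(pair_predictions):
    """Combine the winning class of each binary sub-problem by majority vote.

    Ties go to the smallest label (an illustrative choice, not a standard).
    """
    votes = Counter(pair_predictions)
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


# Toy, imbalanced label vector with three classes (made-up values).
y = [0, 0, 0, 0, 1, 1, 2]
pairs = ovo_pairs(y)          # three classes give three binary problems
winner = ovo_vote([0, 2, 2])  # hypothetical winners of (0, 1), (0, 2), (1, 2)
print(pairs, winner)
```

With k classes the decomposition yields k(k-1)/2 binary problems, each trained only on the samples of its two classes.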

Example usage

import numpy as np
from sklearn.tree import DecisionTreeClassifier

from multi_imbalance.resampling.mdo import MDO

# Mahalanobis Distance Oversampling
mdo = MDO(k=9, k1_frac=0, seed=0)

# read the data
X_train, y_train, X_test, y_test = ...

# preprocess
X_train_resampled, y_train_resampled = mdo.fit_transform(np.copy(X_train), np.copy(y_train))

# train the classifier on preprocessed data
clf_tree = DecisionTreeClassifier(random_state=0)
clf_tree.fit(X_train_resampled, y_train_resampled)

# make predictions
y_pred = clf_tree.predict(X_test)
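
When preprocessing as in the example above, it can be useful to compare the class distribution before and after resampling. The sketch below uses only the standard library. The helper name class_distribution is hypothetical, and the toy label lists are made-up stand-ins for y_train and y_train_resampled, not real output of MDO.

```python
from collections import Counter


def class_distribution(y):
    """Return per-class counts and the ratio of the largest to the smallest class."""
    counts = Counter(y)
    ratio = max(counts.values()) / min(counts.values())
    return dict(counts), ratio


# Made-up labels standing in for y_train and y_train_resampled.
y_before = [0] * 8 + [1] * 2 + [2] * 2
y_after = [0] * 8 + [1] * 8 + [2] * 8

counts_before, ratio_before = class_distribution(y_before)
counts_after, ratio_after = class_distribution(y_after)
print(counts_before, ratio_before)
print(counts_after, ratio_after)
```

For NumPy label arrays, passing y.tolist() keeps the dictionary keys as plain Python values.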

For more examples, see the documentation at https://multi-imbalance.readthedocs.io/en/latest/

About

If you use multi-imbalance in a scientific publication, please consider citing the following thesis:

@bachelorthesis{ MultiImbalance2020,
author = "Jacek Grycza and Damian Horna and Hanna Klimczak and Kamil Plucínski",
title = "Multi-imbalance: Python package for multi-class imbalance learning",
school = "Poznan University of Technology",
address = "Poznan, Poland",
year = "2020",}

References:

[1] Dietterich, T., and Bakiri, G. Solving multi-class learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2 (02 1995), 263–286.

[2] Fernández, A., López, V., Galar, M., del Jesus, M., and Herrera, F. Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches. Knowledge-Based Systems 42 (2013), 97–110.

[3] Abdi, L., and Hashemi, S. To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Transactions on Knowledge and Data Engineering 28 (January 2016), 238–251.

[4] Zhou, Z., and Liu, X. On multi-class cost-sensitive learning. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1 (2006), AAAI’06, AAAI Press, pp. 567–572.

[5] Janicka, M., Lango, M., and Stefanowski, J. Using information on class interrelations to improve classification of multi-class imbalanced data: A new resampling algorithm. International Journal of Applied Mathematics and Computer Science 29 (December 2019).

[6] Lango, M., and Stefanowski, J. SOUP-Bagging: a new approach for multi-class imbalanced data classification. PP-RAI ’19: Polskie Porozumienie na Rzecz Sztucznej Inteligencji (2019).

[7] Lango, M., and Stefanowski, J. Multi-class and feature selection extensions of roughly balanced bagging for imbalanced data. Journal of Intelligent Information Systems 50 (2017), 97–127.

[8] Wojciechowski, S., Wilk, S., and Stefanowski, J. An algorithm for selective preprocessing of multi-class imbalanced data. In Proceedings of the 10th International Conference on Computer Recognition Systems (05 2017), pp. 238–247.

[9] Kuncheva, L. Combining Pattern Classifiers: Methods and Algorithms. Wiley (2004).

[10] Fernández-Navarro, F., Hervás-Martínez, C., and Gutiérrez, P. A. A dynamic over-sampling procedure based on sensitivity for multi-class problems. Pattern Recognition 44, 8 (2011), 1821–1833.
