A general interface for clustering-based over-sampling algorithms.

cluster-over-sampling

Introduction

cluster-over-sampling provides a general interface for clustering-based over-sampling algorithms.

Installation

cluster-over-sampling is available on PyPI and can be installed with pip:

pip install cluster-over-sampling

The SOM clusterer requires optional dependencies:

pip install cluster-over-sampling[som]

Similarly, the Geometric SMOTE oversampler requires optional dependencies:

pip install cluster-over-sampling[gsmote]

You can also install both sets of optional dependencies:

pip install cluster-over-sampling[all]

Usage

All classes in cluster-over-sampling follow the imbalanced-learn API and use the functionality of its base oversampler. Following the scikit-learn convention, the data are represented as follows:

  • Input data X: 2D array-like or sparse matrices.
  • Targets y: 1D array-like.

The clustering-based oversamplers implement a fit method to learn from X and y:

clustering_based_oversampler.fit(X, y)

They also implement a fit_resample method to resample X and y:

X_resampled, y_resampled = clustering_based_oversampler.fit_resample(X, y)
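The fit / fit_resample contract described above can be illustrated with a minimal, dependency-free sketch. The `ToyOverSampler` class below is hypothetical and is not part of cluster-over-sampling: it performs plain random oversampling (duplicating minority samples) rather than any clustering-based method, and it only mimics the shape of the interface. The attribute name `sampling_strategy_` is likewise an assumption chosen for illustration.

```python
import random
from collections import Counter


class ToyOverSampler:
    """Hypothetical stand-in that mimics the fit / fit_resample interface.

    It balances classes by duplicating randomly chosen minority samples.
    This is plain random oversampling, not a clustering-based method and
    not part of cluster-over-sampling.
    """

    def __init__(self, random_state=0):
        self.random_state = random_state

    def fit(self, X, y):
        # Learn how many samples each class needs to match the majority class.
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of samples")
        counts = Counter(y)
        n_majority = max(counts.values())
        self.sampling_strategy_ = {
            label: n_majority - n for label, n in counts.items() if n < n_majority
        }
        return self

    def fit_resample(self, X, y):
        # Fit, then append the generated samples to copies of X and y.
        self.fit(X, y)
        rng = random.Random(self.random_state)
        X_resampled = [list(row) for row in X]
        y_resampled = list(y)
        for label, n_new in self.sampling_strategy_.items():
            rows = [row for row, target in zip(X, y) if target == label]
            for _ in range(n_new):
                X_resampled.append(list(rng.choice(rows)))
                y_resampled.append(label)
        return X_resampled, y_resampled


# X is 2D (samples x features), y is 1D (one label per sample).
X = [[0.0, 0.1], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [2.0, 2.1], [2.2, 1.9]]
y = [0, 0, 0, 0, 1, 1]

X_resampled, y_resampled = ToyOverSampler(random_state=0).fit_resample(X, y)
print(Counter(y_resampled))
```

After resampling, the original six samples are kept in order and two duplicated minority samples are appended, so both classes have four samples. A real clustering-based oversampler would generate new samples within clusters instead of duplicating existing ones, but it would be called in the same way.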

References

If you use cluster-over-sampling in a scientific publication, we would appreciate citations to any of the following papers:

[^1]: G. Douzas, F. Bacao, "Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning", Expert Systems with Applications, vol. 82, pp. 40-52, 2017.

[^2]: G. Douzas, F. Bacao, F. Last, "Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE", Information Sciences, vol. 465, pp. 1-20, 2018.

[^3]: G. Douzas, F. Bacao, F. Last, "G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE", Expert Systems with Applications, vol. 183, 115230, 2021.
