cluster-over-sampling
Introduction
A general interface for clustering-based over-sampling algorithms.
Installation
For user installation, cluster-over-sampling is available on PyPI, and you can install it via pip:
pip install cluster-over-sampling
Development installation requires cloning the repository and then using PDM to install the project along with its main and development dependencies:
git clone https://github.com/georgedouzas/cluster-over-sampling.git
cd cluster-over-sampling
pdm install
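The SOM clusterer requires optional dependencies, installed through the som extra (the quotes keep shells such as zsh from interpreting the square brackets):
pip install "cluster-over-sampling[som]"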
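To confirm that the installation succeeded, one option is to query the installed distribution metadata from Python. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name introduced for illustration and is not part of cluster-over-sampling.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="cluster-over-sampling"):
    """Return the installed version string, or None if it is not installed.

    Hypothetical helper for illustration; it only reads package metadata.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version())
```

The lookup uses the distribution name as published on PyPI, which is not necessarily the name used in import statements.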
Usage
All the classes included in cluster-over-sampling follow the imbalanced-learn API and build on the functionality of its base oversampler. Following the scikit-learn convention, the data are represented as follows:
- Input data X: 2D array-like or sparse matrices.
- Targets y: 1D array-like.
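As a small illustration of this layout, the sketch below builds a toy imbalanced dataset from plain nested Python lists, which count as array-like input under the scikit-learn convention. The values and class labels are made up for the example.

```python
from collections import Counter

# Input data X: one row per sample, one column per feature (2D array-like).
X = [
    [0.1, 1.2],
    [0.3, 0.9],
    [0.2, 1.1],
    [0.4, 1.0],
    [3.5, 4.2],
]
# Targets y: one class label per row of X (1D array-like).
y = [0, 0, 0, 0, 1]

n_samples, n_features = len(X), len(X[0])

# Every row must have the same number of features, and y must match X in length.
assert all(len(row) == n_features for row in X)
assert len(y) == n_samples

# Class 1 is the minority class that an oversampler would augment.
class_counts = Counter(y)
print(n_samples, n_features, dict(class_counts))
```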
The clustering-based oversamplers implement a fit method to learn from X and y:
clustering_based_oversampler.fit(X, y)
They also implement a fit_resample method to resample X and y:
X_resampled, y_resampled = clustering_based_oversampler.fit_resample(X, y)
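The calling convention above can be tried without any third-party dependency using a stand-in object. The sketch below defines `DuplicatingOverSampler`, a hypothetical class that is not part of cluster-over-sampling and performs no clustering: it simply duplicates randomly chosen samples until every class matches the majority class. It is meant only to show the shape of the fit and fit_resample calls and of their return values.

```python
import random
from collections import Counter


class DuplicatingOverSampler:
    """Hypothetical stand-in, NOT a class from cluster-over-sampling.

    It balances classes by duplicating randomly chosen samples and does no
    clustering; it only mirrors the fit / fit_resample calling convention.
    """

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must contain the same number of samples")
        counts = Counter(y)
        majority_size = max(counts.values())
        # Number of extra samples each class needs to match the majority class.
        self.n_extra_ = {label: majority_size - n for label, n in counts.items()}
        return self

    def fit_resample(self, X, y):
        self.fit(X, y)
        rng = random.Random(self.random_state)
        # Start from copies of the original data, then append duplicates.
        X_resampled, y_resampled = list(X), list(y)
        for label, n_extra in self.n_extra_.items():
            pool = [row for row, target in zip(X, y) if target == label]
            for _ in range(n_extra):
                X_resampled.append(list(rng.choice(pool)))
                y_resampled.append(label)
        return X_resampled, y_resampled


X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [5.0, 5.1], [5.2, 4.9]]
y = [0, 0, 0, 0, 1, 1]

oversampler = DuplicatingOverSampler(random_state=0)
X_resampled, y_resampled = oversampler.fit_resample(X, y)
print(Counter(y_resampled))
```

After resampling, both classes contain four samples, and the original six rows appear unchanged at the start of X_resampled. A clustering-based oversampler is called the same way but decides where to generate new samples based on the clusters it finds.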
References
If you use cluster-over-sampling in a scientific publication, we would appreciate citations to any of the following papers:
- G. Douzas, F. Bacao, "Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning", Expert Systems with Applications, vol. 82, pp. 40-52, 2017.
- G. Douzas, F. Bacao, F. Last, "Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE", Information Sciences, vol. 465, pp. 1-20, 2018.
- G. Douzas, F. Bacao, F. Last, "G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE", Expert Systems with Applications, vol. 183, 115230, 2021.
Hashes for cluster-over-sampling-0.6.0.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 05a89a4daa5a1d005aeb48afd041e8a92163d9e852d121236317065fe18097a2 |
MD5 | 0e853fbf879dd877d2e08e66df05574a |
BLAKE2b-256 | 57fc370c2bb96b4b9b3a47265349070acdce681037cfde36e778c90faa11650b |
Hashes for cluster_over_sampling-0.6.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | e035ed3cbe7bf8021b791e47b68f4ec46a191cb509cd9f0b07b50716d5836f81 |
MD5 | 979b31751afc1c1a4521c89fb82b3f85 |
BLAKE2b-256 | 524df942180ff764093d768f76e1dcb78679c6be625d8d1f74fbf9bcb5a383ff |