
Experimental implementations of several over- and under-sampling techniques not yet available in the imbalanced-learn library.


LoRAS


Localized Random Affine Shadowsampling

This repo provides a Python implementation of an oversampling technique for imbalanced datasets known as Localized Random Affine Shadowsampling (LoRAS). It also provides implementations of several other over- and under-sampling algorithms not yet available in the imbalanced-learn package. These implementations build on imbalanced-learn and therefore aim to be as compatible with it as possible.
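LoRAS, as described by Bej et al. (2021), builds each synthetic minority sample in two steps: it first creates "shadow" points by adding small Gaussian noise to a minority sample and its neighbours, then takes a random convex combination (non-negative weights summing to one) of several of those shadow points. The NumPy sketch below illustrates only that core idea and is not pyloras' implementation. Assumptions: `loras_sketch` is a hypothetical function; neighbours are found by brute force in the original feature space rather than in a 2D embedding produced by a manifold learner; the weights come from a flat Dirichlet distribution; and the defaults for `sigma` and `n_affine` are illustrative only. The parameter names `n_neighbors`, `n_shadow` and `n_affine` mirror the flags of the comparison script.

```python
import numpy as np


def loras_sketch(X_min, n_samples, n_neighbors=5, n_shadow=10, sigma=0.005,
                 n_affine=None, rng=None):
    """Hypothetical, simplified LoRAS-style oversampler (NOT pyloras' code).

    X_min: (n, d) array of minority-class samples.
    Returns an (n_samples, d) array of synthetic samples.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    if n_affine is None:
        n_affine = d  # illustrative default (assumption)
    k = min(n_neighbors, n - 1)

    # Brute-force Euclidean nearest neighbours in feature space
    # (assumption: the real method may search a 2D manifold embedding instead).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    neighbours = np.argsort(dists, axis=1)[:, : k + 1]  # row includes the point itself

    synthetic = np.empty((n_samples, d))
    for i in range(n_samples):
        # Pick a random minority point and its local neighbourhood.
        p = rng.integers(n)
        hood = X_min[neighbours[p]]  # shape (k + 1, d)
        # Shadowsampling: n_shadow noisy copies of every neighbourhood point.
        shadows = np.repeat(hood, n_shadow, axis=0)
        shadows = shadows + rng.normal(0.0, sigma, size=shadows.shape)
        # Random convex combination of n_affine distinct shadow points.
        m = min(n_affine, len(shadows))
        idx = rng.choice(len(shadows), size=m, replace=False)
        weights = rng.dirichlet(np.ones(m))  # non-negative, sums to 1
        synthetic[i] = weights @ shadows[idx]
    return synthetic


# Usage on toy 2D minority data.
X_min = np.random.default_rng(0).normal(size=(20, 2))
new = loras_sketch(X_min, n_samples=50, n_neighbors=5, n_shadow=4, rng=0)
print(new.shape)  # (50, 2)
```

Because every synthetic point is a convex combination of slightly perturbed neighbours, it stays close to the local region of the minority class it was drawn from.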

Dependencies

  • Python >= 3.8
  • numpy >= 1.17.3
  • imbalanced-learn < 1.0.0

Installation

Using pip:

$ pip install -U pyloras

Alternatively, one can install from source with the following shell commands:

$ git clone https://github.com/zoj613/pyloras.git
$ cd pyloras/
$ pip install .

Usage

from collections import Counter
from pyloras import LORAS
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20000, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)

lrs = LORAS(random_state=0, manifold_learner_params={'perplexity': 35, 'n_iter': 250})
print(sorted(Counter(y).items()))
# [(0, 270), (1, 1056), (2, 18674)]
X_resampled, y_resampled = lrs.fit_resample(X, y)
print(sorted(Counter(y_resampled.astype(int)).items()))
# [(0, 18674), (1, 18674), (2, 18674)]

# One can also use any custom 2D manifold learner via the `manifold_learner` parameter.
# UMAP is provided by the separately installed umap-learn package.
from umap import UMAP
LORAS(manifold_learner=UMAP()).fit_resample(X, y)
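The resampled counts in the output above follow from raising every non-majority class to the majority count. Assuming LORAS behaves like imbalanced-learn's oversamplers under their default `sampling_strategy='auto'` (resample every class except the majority), the number of synthetic samples per class is simple arithmetic. `samples_to_generate` below is a hypothetical helper for illustration, not part of pyloras.

```python
from collections import Counter

# Class counts printed by the usage example before resampling.
counts = Counter({0: 270, 1: 1056, 2: 18674})


def samples_to_generate(counts):
    """Hypothetical helper: synthetic samples each non-majority class needs
    to reach the majority count (assumed 'auto'-style balancing)."""
    majority = max(counts.values())
    return {cls: majority - n for cls, n in counts.items() if n < majority}


needed = samples_to_generate(counts)
print(needed)  # {0: 18404, 1: 17618}
```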

Visualization

The project compares imbalanced-learn's SMOTE implementation with LoRAS on the dummy data from the Usage section, using default parameters for both.

The plots can be reproduced by running:

$ python scripts/compare_oversamplers.py --n_neighbors=<optional> --n_shadow=<optional> --n_affine=<optional>

References

  • Bej, S., Davtyan, N., Wolfien, M. et al. LoRAS: an oversampling approach for imbalanced datasets. Mach Learn 110, 279–301 (2021). https://doi.org/10.1007/s10994-020-05913-4
  • Bej, S., Schultz, K., Srivastava, P., Wolfien, M., & Wolkenhauer, O. (2021). A multi-schematic classifier-independent oversampling approach for imbalanced datasets. ArXiv, abs/2107.07349.
  • A. Tripathi, R. Chakraborty and S. K. Kopparapu, "A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 10650-10657, doi: 10.1109/ICPR48806.2021.9413002.



Download files


Source Distribution

pyloras-0.1.0b6.tar.gz (13.2 kB)

Built Distribution

pyloras-0.1.0b6-py3-none-any.whl (14.0 kB)
