Skip to main content

LoRAS: An oversampling approach for imbalanced datasets

Project description

LoRAS

CI Codecov PyPI

Localized Random Affine Shadowsampling

This repo provides a python implementation of an imbalanced dataset oversampling technique known as Localized Random Affine Shadowsampling (LoRAS). This implementation piggybacks off the package imbalanced-learn and thus aims to be as compatible as possible with it.

Dependencies

  • Python >= 3.6
  • numpy >= 1.17.0
  • imbalanced-learn

Installation

Using pip:

$ pip install -U pyloras

Installing from source requires an installation of poetry and the following shell commands:

$ git clone https://github.com/zoj613/pyloras.git
$ cd pyloras/
$ poetry install
# add package to python's path
$ export PYTHONPATH=$PWD:$PYTHONPATH 

Usage

from collections import Counter
from pyloras import LORAS
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20000, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)

lrs = LORAS(random_state=0, manifold_learner_params={'perplexity': 35, 'n_iter': 250})
print(sorted(Counter(y).items()))
# [(0, 270), (1, 1056), (2, 18674)]
X_resampled, y_resampled = lrs.fit_resample(X, y)
print(sorted(Counter(y_resampled.astype(int)).items()))
# [(0, 18674), (1, 18674), (2, 18674)]

# one can also use any custom 2d manifold learner via the ``manifold_learner` parameter
from umap import UMAP
LORAS(manifold_learner=UMAP()).fit_resample(X, y)

Visualization

Below is a comparision of imbalanced-learn's SMOTE implementation with LORAS on the dummy data used in this doc page using the default parameters.

The plots can be reproduced by running:

$ python scripts/compare_oversamplers.py --n_neighbors=<optional> --n_shadow=<optional> --n_affine=<optional>

References

  • Bej, S., Davtyan, N., Wolfien, M. et al. LoRAS: an oversampling approach for imbalanced datasets. Mach Learn 110, 279–301 (2021). https://doi.org/10.1007/s10994-020-05913-4
  • Bej, S., Schultz, K., Srivastava, P., Wolfien, M., & Wolkenhauer, O. (2021). A multi-schematic classifier-independent oversampling approach for imbalanced datasets. ArXiv, abs/2107.07349.
  • A. Tripathi, R. Chakraborty and S. K. Kopparapu, "A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 10650-10657, doi: 10.1109/ICPR48806.2021.9413002.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyloras-0.1.0b5.tar.gz (11.4 kB view details)

Uploaded Source

Built Distribution

pyloras-0.1.0b5-py3-none-any.whl (14.3 kB view details)

Uploaded Python 3

File details

Details for the file pyloras-0.1.0b5.tar.gz.

File metadata

  • Download URL: pyloras-0.1.0b5.tar.gz
  • Upload date:
  • Size: 11.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for pyloras-0.1.0b5.tar.gz
Algorithm Hash digest
SHA256 cf6d9b6d9f0009d123d9e60b71a5d5afe7be6a793dc96a05ddcf9f555b3a5e59
MD5 6f4c051ff2618ae3a2e7bd512deb03ed
BLAKE2b-256 73ee2122d464defde26da640f1366c9a99d3448b1fd13fa0af3eb5a32e66cfaa

See more details on using hashes here.

File details

Details for the file pyloras-0.1.0b5-py3-none-any.whl.

File metadata

  • Download URL: pyloras-0.1.0b5-py3-none-any.whl
  • Upload date:
  • Size: 14.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for pyloras-0.1.0b5-py3-none-any.whl
Algorithm Hash digest
SHA256 3160049adfe0303d530e767dcda589afc17ac05f0bce0e5d5b8d4c319c3ba744
MD5 5c8796c9d7c889279a1570a06aee26e3
BLAKE2b-256 d0a05cd5a3376e526edc2f7e9f13cf8e4788858a4f352ed107e248fd9ec95262

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page