LoRAS: An oversampling approach for imbalanced datasets

LoRAS

Localized Random Affine Shadowsampling

This repo provides a Python implementation of Localized Random Affine Shadowsampling (LoRAS), an oversampling technique for imbalanced datasets. The implementation builds on the imbalanced-learn package and aims to be as compatible with it as possible.
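To give a rough sense of what the name describes, here is a minimal NumPy sketch of the idea, loosely following the first paper listed under References. It is not the code used by pyloras. The function name `loras_sketch`, the `std` noise scale, and the use of plain Euclidean neighbours are assumptions made for illustration (the paper selects neighbourhoods in a 2-D manifold embedding). The parameter names `n_neighbors`, `n_shadow` and `n_affine` mirror the options of the comparison script mentioned in this README.

```python
import numpy as np


def loras_sketch(X_min, n_samples, n_neighbors=5, n_shadow=10, n_affine=3,
                 std=0.005, seed=0):
    """Generate synthetic minority samples (simplified, illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X_min.shape
    # Pairwise Euclidean distances in the original feature space.
    # Assumption: the real method picks neighbours in a 2-D embedding instead.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # Each row holds a point's own index followed by its nearest neighbours.
    neighborhoods = np.argsort(dists, axis=1)[:, :n_neighbors + 1]

    synthetic = np.empty((n_samples, d))
    for i in range(n_samples):
        # "Localized": work inside the neighbourhood of a random minority point.
        hood = X_min[neighborhoods[rng.integers(n)]]
        # "Shadowsampling": noisy copies of every point in the neighbourhood.
        noise = rng.normal(0.0, std, size=(n_shadow, *hood.shape))
        shadows = (hood[None, :, :] + noise).reshape(-1, d)
        # "Random affine": combine a few shadowsamples with random
        # non-negative weights that sum to one.
        chosen = shadows[rng.choice(len(shadows), size=n_affine, replace=False)]
        weights = rng.dirichlet(np.ones(n_affine))
        synthetic[i] = weights @ chosen
    return synthetic


# Toy minority class: 20 points with 4 features.
X_min = np.random.default_rng(1).normal(size=(20, 4))
X_new = loras_sketch(X_min, n_samples=50)
print(X_new.shape)  # (50, 4)
```

Because each synthetic point is a convex combination of slightly perturbed neighbours, it stays close to the local region spanned by real minority samples.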

Dependencies

  • Python >= 3.6
  • numpy >= 1.17.0
  • imbalanced-learn

Installation

Using pip:

$ pip install -U pyloras

Installing from source requires poetry. With it installed, run the following shell commands:

$ git clone https://github.com/zoj613/pyloras.git
$ cd pyloras/
$ poetry install
# add package to python's path
$ export PYTHONPATH=$PWD:$PYTHONPATH 

Usage

from collections import Counter
from pyloras import LORAS
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20000, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)

lrs = LORAS(random_state=0, manifold_learner_params={'perplexity': 35, 'n_iter': 250})
print(sorted(Counter(y).items()))
# [(0, 270), (1, 1056), (2, 18674)]
X_resampled, y_resampled = lrs.fit_resample(X, y)
print(sorted(Counter(y_resampled.astype(int)).items()))
# [(0, 18674), (1, 18674), (2, 18674)]

# One can also use any custom 2D manifold learner via the `manifold_learner` parameter
from umap import UMAP
LORAS(manifold_learner=UMAP()).fit_resample(X, y)

Visualization

Below is a comparison of imbalanced-learn's SMOTE implementation with LORAS, both using their default parameters, on the dummy data from this doc page.

The plots can be reproduced by running:

$ python scripts/compare_oversamplers.py --n_neighbors=<optional> --n_shadow=<optional> --n_affine=<optional>

References

  • Bej, S., Davtyan, N., Wolfien, M. et al. LoRAS: an oversampling approach for imbalanced datasets. Mach Learn 110, 279–301 (2021). https://doi.org/10.1007/s10994-020-05913-4
  • Bej, S., Schultz, K., Srivastava, P. et al. A multi-schematic classifier-independent oversampling approach for imbalanced datasets. arXiv:2107.07349 (2021).
