
LoRAS: An oversampling approach for imbalanced datasets

Project description

LoRAS


Localized Random Affine Shadowsampling

This repo provides a Python implementation of the imbalanced-dataset oversampling technique known as Localized Random Affine Shadowsampling (LoRAS). The implementation builds on the imbalanced-learn package and aims to be as compatible with it as possible.
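
For intuition only, the core idea from the Bej et al. paper can be sketched roughly as follows: around each minority-class point, its nearest minority neighbours are perturbed with Gaussian noise to create "shadowsamples", and a synthetic point is drawn as a random affine combination (weights summing to 1) of a few of those shadowsamples. Below is a minimal NumPy sketch of that idea; the function name, noise scale and defaults are illustrative and are not part of the pyloras API.

# Illustrative sketch of the LoRAS idea, not the package's implementation.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def loras_like_sample(X_min, idx, k=5, n_shadow=40, n_affine=3, scale=0.05, seed=None):
    """Generate one synthetic sample around the minority point X_min[idx]."""
    rng = np.random.default_rng(seed)
    # 1. find the k nearest minority neighbours of the chosen point
    nn = NearestNeighbors(n_neighbors=k).fit(X_min)
    _, neigh = nn.kneighbors(X_min[idx:idx + 1])
    neighbourhood = X_min[neigh[0]]                      # shape (k, n_features)
    # 2. create "shadowsamples": noisy copies of points in the neighbourhood
    reps = rng.integers(0, k, size=n_shadow)
    shadows = neighbourhood[reps] + rng.normal(scale=scale, size=(n_shadow, X_min.shape[1]))
    # 3. random affine combination of a few shadowsamples (Dirichlet weights sum to 1)
    chosen = shadows[rng.choice(n_shadow, size=n_affine, replace=False)]
    weights = rng.dirichlet(np.ones(n_affine))
    return weights @ chosen

The package itself repeats this kind of sampling until the minority classes are balanced and exposes it through the usual fit_resample interface shown in the usage section below.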

Dependencies

  • imbalanced-learn

Installation

Using pip:

$ pip install -U pyloras

Installing from source requires poetry and the following shell commands:

$ git clone https://github.com/zoj613/pyloras.git
$ cd pyloras/
$ poetry install
# add package to python's path
$ export PYTHONPATH=$PWD:$PYTHONPATH 

Usage

from collections import Counter
from pyloras import LORAS
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20000, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)

lrs = LORAS(random_state=0, manifold_learner_params={'perplexity': 35, 'n_iter': 250})
print(sorted(Counter(y).items()))
# [(0, 270), (1, 1056), (2, 18674)]
X_resampled, y_resampled = lrs.fit_resample(X, y)
print(sorted(Counter(y_resampled.astype(int)).items()))
# [(0, 18674), (1, 18674), (2, 18674)]

# One can also use any custom 2-d manifold learner via the ``manifold_learner`` parameter
from umap import UMAP
LORAS(manifold_learner=UMAP()).fit_resample(X, y)
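
Because LORAS implements the imbalanced-learn sampler interface (fit_resample), it should also drop into an imblearn pipeline, where resampling is applied during fit only. A minimal sketch, reusing X and y from above; the classifier choice here is arbitrary and not part of this package:

# Minimal sketch: LORAS inside an imbalanced-learn pipeline (assumes X, y from above).
from imblearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(LORAS(random_state=0), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)   # the sampler runs during fit only, never at predict time
print(clf.score(X_test, y_test))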

Visualization

Below is a comparison of imbalanced-learn's SMOTE implementation with LORAS on the dummy data generated in the usage example above, using the default parameters.

The plots can be reproduced by running:

$ python scripts/compare_oversamplers.py --n_neighbors=<optional> --n_shadow=<optional> --n_affine=<optional>
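
For a quick inline comparison without the script, a rough sketch along these lines also works (it reuses X and y from the usage example, plots only the first two features, and uses SMOTE with its defaults):

# Rough sketch of a side-by-side scatter of SMOTE vs LORAS output.
import matplotlib.pyplot as plt
from imblearn.over_sampling import SMOTE

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
for ax, (name, sampler) in zip(axes, [("SMOTE", SMOTE(random_state=0)),
                                      ("LORAS", LORAS(random_state=0))]):
    Xr, yr = sampler.fit_resample(X, y)
    ax.scatter(Xr[:, 0], Xr[:, 1], c=yr.astype(int), s=4, alpha=0.5)
    ax.set_title(name)
plt.show()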

References

Bej, S., Davtyan, N., Wolfien, M. et al. LoRAS: an oversampling approach for imbalanced datasets. Mach Learn 110, 279–301 (2021). https://doi.org/10.1007/s10994-020-05913-4

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyloras-0.1.0b3.tar.gz (6.4 kB)

Uploaded Source

Built Distribution

pyloras-0.1.0b3-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file pyloras-0.1.0b3.tar.gz.

File metadata

  • Download URL: pyloras-0.1.0b3.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for pyloras-0.1.0b3.tar.gz

  • SHA256: ffa6a0bd12b632a9ff6d7d427a17ceb92cad1300eb56bbdd2a11acd98686f97f
  • MD5: 2f4371118e49721bd8620029c0f28927
  • BLAKE2b-256: 9c871710e0238072720817af8fce13ae732b1bf3e9a44f2775f13985a54c3c65

See more details on using hashes here.

File details

Details for the file pyloras-0.1.0b3-py3-none-any.whl.

File metadata

  • Download URL: pyloras-0.1.0b3-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for pyloras-0.1.0b3-py3-none-any.whl

  • SHA256: 92072bcd20842112e654d038b4cff1d2a229766a239f65a10d5e330e43398e5e
  • MD5: aff674f0cc28ac10bcf98bf054e13f01
  • BLAKE2b-256: 9f1131d1bd2b3e18446de30fd725fe6fd67571bddc272d1cd142fa5a5ec69f00

See more details on using hashes here.
