Experimental implementations of several (over/under)-sampling techniques not yet available in the imbalanced-learn library.

LoRAS

Localized Random Affine Shadowsampling

This repo provides a Python implementation of an oversampling technique for imbalanced datasets known as Localized Random Affine Shadowsampling (LoRAS). It also provides implementations of several other over- and under-sampling algorithms not yet available in the imbalanced-learn package. These implementations build on imbalanced-learn and thus aim to be as compatible with it as possible.
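
To give a feel for the idea behind the name, here is a minimal NumPy sketch of "shadowsampling". It is not the code shipped in pyloras and simplifies the published method: neighbourhoods are found with plain Euclidean distance rather than in a 2D manifold embedding. The parameter names n_neighbors, n_shadow and n_affine mirror the options of the comparison script mentioned under Visualization, while the function name loras_sketch and the parameters sigma, n_new_per_point and seed are hypothetical, chosen only for this illustration.

```python
import numpy as np


def loras_sketch(X_min, n_neighbors=5, n_shadow=10, n_affine=3,
                 sigma=0.005, n_new_per_point=2, seed=0):
    """Illustrative sketch: synthetic minority samples built as random
    convex combinations of noisy copies ("shadowsamples") of neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape

    # Pairwise Euclidean distances between minority points (assumption:
    # no manifold embedding is used in this simplified version).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)

    # Each neighbourhood holds the point itself plus its nearest neighbours.
    k = min(n_neighbors, n)
    neighborhoods = np.argsort(dists, axis=1)[:, :k]

    synthetic = []
    for idx in neighborhoods:
        # Shadowsamples: n_shadow Gaussian-perturbed copies of every
        # point in the neighbourhood, flattened to shape (k * n_shadow, d).
        noise = rng.normal(0.0, sigma, size=(k, n_shadow, d))
        shadows = (X_min[idx][:, None, :] + noise).reshape(-1, d)

        for _ in range(n_new_per_point):
            # Pick n_affine shadowsamples and combine them with positive
            # weights that sum to one (drawn from a flat Dirichlet).
            chosen = rng.choice(len(shadows), size=n_affine, replace=False)
            weights = rng.dirichlet(np.ones(n_affine))
            synthetic.append(weights @ shadows[chosen])

    return np.vstack(synthetic)
```

Calling loras_sketch(X[y == 0]) on the rows of a minority class returns n_new_per_point new rows per original row. Because the weights form a convex combination, each synthetic point stays inside the convex hull of the chosen shadowsamples, so it remains close to the local neighbourhood it was drawn from.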

Dependencies

  • Python >= 3.8
  • numpy >= 1.17.3
  • imbalanced-learn < 1.0.0
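
The constraints above can be checked against the current environment with a short standard-library script. This is only a rough sketch: the helper names version_tuple and check_environment are hypothetical, and the comparison looks only at the leading numeric parts of each version, so pre-release suffixes such as rc1 are ignored.

```python
import sys
from importlib import metadata


def version_tuple(version):
    """Turn '1.17.3' or '0.12.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems; empty if all is fine."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python >= 3.8 is required")

    requirements = [
        ("numpy", lambda v: v >= (1, 17, 3), ">= 1.17.3"),
        ("imbalanced-learn", lambda v: v < (1, 0, 0), "< 1.0.0"),
    ]
    for name, satisfied, rule in requirements:
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need {rule})")
            continue
        if not satisfied(version_tuple(found)):
            problems.append(f"{name} {found} found, need {rule}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```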

Installation

Using pip:

$ pip install -U pyloras

Alternatively, one can install from source with the following shell commands:

$ git clone https://github.com/zoj613/pyloras.git
$ cd pyloras/
$ pip install .

Usage

from collections import Counter
from pyloras import LORAS
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20000, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)

lrs = LORAS(random_state=0, manifold_learner_params={'perplexity': 35, 'n_iter': 250})
print(sorted(Counter(y).items()))
# [(0, 270), (1, 1056), (2, 18674)]
X_resampled, y_resampled = lrs.fit_resample(X, y)
print(sorted(Counter(y_resampled.astype(int)).items()))
# [(0, 18674), (1, 18674), (2, 18674)]

# one can also use any custom 2D manifold learner via the ``manifold_learner`` parameter
# (the UMAP import below requires the separate umap-learn package)
from umap import UMAP
LORAS(manifold_learner=UMAP()).fit_resample(X, y)

Visualization

Below is a comparison of imbalanced-learn's SMOTE implementation with LORAS on the dummy data used in this doc page, using the default parameters.

The plots can be reproduced by running:

$ python scripts/compare_oversamplers.py --n_neighbors=<optional> --n_shadow=<optional> --n_affine=<optional>

References

  • Bej, S., Davtyan, N., Wolfien, M. et al. (2021). LoRAS: an oversampling approach for imbalanced datasets. Machine Learning, 110, 279–301. https://doi.org/10.1007/s10994-020-05913-4
  • Bej, S., Schultz, K., Srivastava, P., Wolfien, M., & Wolkenhauer, O. (2021). A multi-schematic classifier-independent oversampling approach for imbalanced datasets. arXiv:2107.07349.
  • Tripathi, A., Chakraborty, R., & Kopparapu, S. K. (2021). A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios. 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10650–10657. https://doi.org/10.1109/ICPR48806.2021.9413002
