LoRAS: An oversampling approach for imbalanced datasets
LoRAS
Localized Random Affine Shadowsampling
This repo provides a Python implementation of an oversampling technique for
imbalanced datasets known as Localized Random Affine Shadowsampling (LoRAS).
The implementation builds on the imbalanced-learn package and aims to be as
compatible with it as possible.
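For readers new to the technique, the following is a minimal NumPy sketch of the general idea as described in the LoRAS paper listed under References: noisy copies ("shadowsamples") are drawn around each minority point's neighborhood and then combined with random weights that sum to one. It is not the code used by this package. The function name `loras_sketch`, the `std` and `n_generated_per_point` parameters, and all default values are assumptions made for illustration; `n_neighbors`, `n_shadow` and `n_affine` only borrow their names from the comparison script's flags.

```python
import numpy as np


def loras_sketch(X_min, n_neighbors=3, n_shadow=10, n_affine=2,
                 std=0.005, n_generated_per_point=4, seed=0):
    """Illustrative only: generate synthetic points around each minority sample."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n_samples = X_min.shape[0]
    k = min(n_neighbors, n_samples)

    # Pairwise Euclidean distances between minority samples (brute force).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # Each row holds the indices of the k closest points, the point itself included.
    neighborhoods = np.argsort(dists, axis=1)[:, :k]

    synthetic = []
    for idx in neighborhoods:
        # Shadowsamples: noisy copies of every point in the neighborhood.
        base = np.repeat(X_min[idx], n_shadow, axis=0)
        shadows = base + rng.normal(0.0, std, size=base.shape)
        for _ in range(n_generated_per_point):
            # Pick n_affine shadowsamples and combine them with random
            # non-negative weights that sum to one.
            chosen = shadows[rng.choice(len(shadows), size=n_affine, replace=False)]
            weights = rng.dirichlet(np.ones(n_affine))
            synthetic.append(weights @ chosen)
    return np.vstack(synthetic)


# Four hypothetical minority samples on the corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = loras_sketch(X_min)
print(new_points.shape)  # (16, 2)
```

Because every synthetic point is a convex combination of shadowsamples, it stays close to the region spanned by the original minority neighborhood, up to the small added noise.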
Dependencies
Python >= 3.6
numpy >= 1.17.0
imbalanced-learn
Installation
Using pip:
$ pip install -U pyloras
Installing from source requires poetry. Once it is installed, run the following shell commands:
$ git clone https://github.com/zoj613/pyloras.git
$ cd pyloras/
$ poetry install
# add package to python's path
$ export PYTHONPATH=$PWD:$PYTHONPATH
Usage
from collections import Counter
from pyloras import LORAS
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=20000, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)
lrs = LORAS(random_state=0, manifold_learner_params={'perplexity': 35, 'n_iter': 250})
print(sorted(Counter(y).items()))
# [(0, 270), (1, 1056), (2, 18674)]
X_resampled, y_resampled = lrs.fit_resample(X, y)
print(sorted(Counter(y_resampled.astype(int)).items()))
# [(0, 18674), (1, 18674), (2, 18674)]
# one can also use any custom 2d manifold learner via the ``manifold_learner`` parameter
# (this example requires the umap-learn package)
from umap import UMAP
LORAS(manifold_learner=UMAP()).fit_resample(X, y)
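To put a single number on how skewed the classes are before and after resampling, a small helper like the one below can be used. It is a sketch that relies only on the standard library; `imbalance_ratio` is a hypothetical name and not part of pyloras or imbalanced-learn. The label lists are built by hand to match the class counts printed in the example above.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the largest class count divided by the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Labels with the same class counts as the example above.
y_before = [0] * 270 + [1] * 1056 + [2] * 18674
y_after = [0] * 18674 + [1] * 18674 + [2] * 18674

print(round(imbalance_ratio(y_before), 2))  # 69.16
print(imbalance_ratio(y_after))  # 1.0
```

A ratio of 1.0 means every class has the same number of samples, which is what the resampled counts above show.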
Visualization
Below is a comparison of imbalanced-learn's SMOTE implementation with LORAS
on the dummy data used in this doc page, using the default parameters.
The plots can be reproduced by running:
$ python scripts/compare_oversamplers.py --n_neighbors=<optional> --n_shadow=<optional> --n_affine=<optional>
References
- Bej, S., Davtyan, N., Wolfien, M. et al. LoRAS: an oversampling approach for imbalanced datasets. Mach Learn 110, 279–301 (2021). https://doi.org/10.1007/s10994-020-05913-4
- Bej, S., Schultz, K., Srivastava, P., Wolfien, M., & Wolkenhauer, O. (2021). A multi-schematic classifier-independent oversampling approach for imbalanced datasets. ArXiv, abs/2107.07349.