Experimental implementations of several (over/under)-sampling techniques not yet available in the imbalanced-learn library.
LoRAS
Localized Random Affine Shadowsampling
This repo provides a Python implementation of an imbalanced dataset oversampling
technique known as Localized Random Affine Shadowsampling (LoRAS). It also provides
implementations of several other over/under-sampling algorithms not yet available in
the imbalanced-learn package. These implementations piggyback off of imbalanced-learn
and thus aim to be as compatible as possible with it.
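To give a rough feel for what "random affine shadowsampling" means, here is a minimal NumPy sketch of the idea as described in the LoRAS paper listed under References. It is not the pyloras implementation: the function name `loras_sketch`, the noise scale `std`, and the use of Dirichlet-distributed weights are illustrative assumptions, and neighbours are found with plain Euclidean distance in the input space rather than in a 2D manifold embedding. The parameter names `n_neighbors`, `n_shadow` and `n_affine` simply mirror the flags of the comparison script shown in the Visualization section.

```python
import numpy as np


def loras_sketch(X_min, n_samples, n_neighbors=5, n_shadow=10,
                 n_affine=3, std=0.005, seed=0):
    """Illustrative LoRAS-style oversampling of minority points X_min.

    Hypothetical helper, not part of pyloras.
    """
    rng = np.random.default_rng(seed)
    n, d = X_min.shape
    # Pairwise Euclidean distances between minority points (assumption:
    # the real method searches neighbours in a low-dimensional embedding).
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # Each row holds a point's n_neighbors closest points (itself included).
    neighborhoods = np.argsort(dists, axis=1)[:, :n_neighbors]

    out = np.empty((n_samples, d))
    for i in range(n_samples):
        # Pick the neighbourhood of a random minority point.
        hood = X_min[neighborhoods[rng.integers(n)]]
        # Shadowsamples: n_shadow noisy copies of every neighbourhood point.
        noise = rng.normal(0.0, std, size=(len(hood), n_shadow, d))
        shadows = (hood[:, None, :] + noise).reshape(-1, d)
        # Random convex (affine) combination of n_affine shadowsamples.
        chosen = shadows[rng.choice(len(shadows), size=n_affine, replace=False)]
        weights = rng.dirichlet(np.ones(n_affine))
        out[i] = weights @ chosen
    return out


X_min = np.random.default_rng(1).normal(size=(30, 2))
X_new = loras_sketch(X_min, n_samples=100)
print(X_new.shape)  # (100, 2)
```

Because every synthetic point is a convex combination of slightly perturbed minority points from one neighbourhood, the new samples stay close to the local structure of the minority class.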
Dependencies
Python >= 3.8
numpy >= 1.17.3
imbalanced-learn < 1.0.0
Installation
Using pip:
$ pip install -U pyloras
Alternatively, one can install from source with the following shell commands:
$ git clone https://github.com/zoj613/pyloras.git
$ cd pyloras/
$ pip install .
Usage
from collections import Counter
from pyloras import LORAS
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=20000, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)
lrs = LORAS(random_state=0, manifold_learner_params={'perplexity': 35, 'n_iter': 250})
print(sorted(Counter(y).items()))
# [(0, 270), (1, 1056), (2, 18674)]
X_resampled, y_resampled = lrs.fit_resample(X, y)
print(sorted(Counter(y_resampled.astype(int)).items()))
# [(0, 18674), (1, 18674), (2, 18674)]
# One can also use any custom 2D manifold learner via the `manifold_learner` parameter
from umap import UMAP
LORAS(manifold_learner=UMAP()).fit_resample(X, y)
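As a sketch of what a custom learner might look like, the class below projects the data onto its top two principal components using only NumPy. The class name `PCA2D` is hypothetical, and it is an assumption that a scikit-learn-style `fit_transform` returning an `(n_samples, 2)` array (plus an `embedding_` attribute, as `UMAP` and `TSNE` expose) is enough for pyloras. The package may require more, such as `get_params` for cloning, so check its source before relying on this.

```python
import numpy as np


class PCA2D:
    """Hypothetical 2D manifold learner based on principal components."""

    def fit_transform(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Center the data so the SVD yields principal directions.
        Xc = X - X.mean(axis=0)
        # Rows of vt are principal directions, ordered by decreasing variance.
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        # Keep the embedding on the instance, mirroring UMAP/TSNE.
        self.embedding_ = Xc @ vt[:2].T
        return self.embedding_


emb = PCA2D().fit_transform(np.random.default_rng(0).normal(size=(50, 5)))
print(emb.shape)  # (50, 2)
```

If that assumption holds, an instance would be passed the same way as `UMAP()` above, i.e. `LORAS(manifold_learner=PCA2D())`.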
Visualization
Below is a comparison of imbalanced-learn's SMOTE implementation with LORAS
on the dummy data used in this doc page, using the default parameters.
The plots can be reproduced by running:
$ python scripts/compare_oversamplers.py --n_neighbors=<optional> --n_shadow=<optional> --n_affine=<optional>
References
- Bej, S., Davtyan, N., Wolfien, M. et al. LoRAS: an oversampling approach for imbalanced datasets. Mach Learn 110, 279–301 (2021). https://doi.org/10.1007/s10994-020-05913-4
- Bej, S., Schultz, K., Srivastava, P., Wolfien, M., & Wolkenhauer, O. (2021). A multi-schematic classifier-independent oversampling approach for imbalanced datasets. ArXiv, abs/2107.07349.
- A. Tripathi, R. Chakraborty and S. K. Kopparapu, "A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 10650-10657, doi: 10.1109/ICPR48806.2021.9413002.
Hashes for pyloras-0.1.0b6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c13ed504adab476617aff876cec82c36f62ff3801391512abca2b48c21599355
MD5 | 853a99809e6ea9fc864af167eea4e201
BLAKE2b-256 | ff7f90020e140ddb5b5d29e6c42d0b8810c53f8a0450ed03ddd8db0345a1a6c9