Experimental implementations of several (over/under)-sampling techniques not yet available in the imbalanced-learn library.
LoRAS
Localized Random Affine Shadowsampling
This repo provides a Python implementation of an imbalanced dataset oversampling
technique known as Localized Random Affine Shadowsampling (LoRAS). It also provides
implementations of several other over/under-sampling algorithms not yet available in
the imbalanced-learn package. These implementations build on imbalanced-learn
and thus aim to be as compatible as possible with it.
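For readers new to the technique, the core idea described in the LoRAS paper (first entry under References) can be sketched in a few lines: noisy copies ("shadowsamples") of the points in a minority-class neighborhood are drawn, and each synthetic sample is a random convex combination of a few of them. The plain-Python sketch below is an illustration only, not the pyloras implementation. All function names and default values are hypothetical, and the neighborhood is passed in directly rather than being found through a 2D manifold embedding.

```python
import random


def shadowsamples(neighborhood, n_shadow, sigma, rng):
    """Create n_shadow noisy copies of every point (zero-mean Gaussian noise)."""
    return [
        [x + rng.gauss(0.0, sigma) for x in point]
        for point in neighborhood
        for _ in range(n_shadow)
    ]


def random_affine_weights(n, rng):
    """Draw n positive weights that sum to 1 (a flat Dirichlet sample)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def loras_like_samples(neighborhood, n_shadow=5, n_affine=3,
                       n_generated=4, sigma=0.005, seed=0):
    """Generate synthetic points from one minority-class neighborhood.

    Hypothetical helper: parameter names and defaults are assumptions.
    """
    rng = random.Random(seed)
    pool = shadowsamples(neighborhood, n_shadow, sigma, rng)
    n_features = len(neighborhood[0])
    generated = []
    for _ in range(n_generated):
        chosen = rng.sample(pool, n_affine)          # pick a few shadowsamples
        weights = random_affine_weights(n_affine, rng)
        generated.append([
            sum(w * p[j] for w, p in zip(weights, chosen))
            for j in range(n_features)
        ])
    return generated


neighborhood = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
for sample in loras_like_samples(neighborhood):
    print(sample)
```

Because the weights are positive and sum to one, every generated point stays inside the convex hull of the chosen shadowsamples, which keeps synthetic samples close to the local minority-class region.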
Dependencies
Python >= 3.8
numpy >= 1.17.3
imbalanced-learn < 1.0.0
Installation
Using pip:
$ pip install -U pyloras
Alternatively, one can install from source with the following shell commands:
$ git clone https://github.com/zoj613/pyloras.git
$ cd pyloras/
$ pip install .
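After installing, it can be useful to confirm which versions of the package and its dependencies ended up in the environment. The snippet below is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution names are the ones used on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ("pyloras", "numpy", "imbalanced-learn"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```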
Usage
from collections import Counter
from pyloras import LORAS
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20000, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)
lrs = LORAS(random_state=0, manifold_learner_params={'perplexity': 35, 'n_iter': 250})
print(sorted(Counter(y).items()))
# [(0, 270), (1, 1056), (2, 18674)]
X_resampled, y_resampled = lrs.fit_resample(X, y)
print(sorted(Counter(y_resampled.astype(int)).items()))
# [(0, 18674), (1, 18674), (2, 18674)]

# one can also use any custom 2d manifold learner via the `manifold_learner` parameter
from umap import UMAP
LORAS(manifold_learner=UMAP()).fit_resample(X, y)
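To check numerically how balanced a label vector is before and after resampling, the class counts can be reduced to a single ratio. The helper below is a minimal sketch and not part of pyloras; `imbalance_ratio` is a hypothetical name, and it assumes the labels are hashable (for example plain ints or the NumPy integers produced by `astype(int)`).

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


# Toy labels: three samples of class 0 and one of class 1.
print(imbalance_ratio([0, 0, 0, 1]))
```

A ratio of 1.0 means every class has the same number of samples. With the arrays from the usage example, `imbalance_ratio(y)` and `imbalance_ratio(y_resampled.astype(int))` could be compared directly.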
Visualization
Below is a comparison of imbalanced-learn's SMOTE implementation with LORAS
on the dummy data used in this doc page, using the default parameters.
The plots can be reproduced by running:
$ python scripts/compare_oversamplers.py --n_neighbors=<optional> --n_shadow=<optional> --n_affine=<optional>
References
- Bej, S., Davtyan, N., Wolfien, M. et al. LoRAS: an oversampling approach for imbalanced datasets. Mach Learn 110, 279–301 (2021). https://doi.org/10.1007/s10994-020-05913-4
- Bej, S., Schultz, K., Srivastava, P., Wolfien, M., & Wolkenhauer, O. (2021). A multi-schematic classifier-independent oversampling approach for imbalanced datasets. ArXiv, abs/2107.07349.
- A. Tripathi, R. Chakraborty and S. K. Kopparapu, "A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 10650-10657, doi: 10.1109/ICPR48806.2021.9413002.