Pre-processing techniques for imbalanced datasets in regression modelling
PyImbalReg
Dealing with imbalanced datasets for regression modelling
Does your trained regression model suffer from heteroskedasticity? Does it struggle to predict extreme values? Consider using these pre-processing techniques to address these issues, which are probably caused by an imbalanced dataset.
- RandomOversampling (RO)
- GaussianNoise and Undersampling (GN)
- WEighted Relevance-based Combination Strategy (WERCS)
How to use (2-minute read)
- Pass your data as a pandas DataFrame to any of these techniques
- Define a relevance function that maps the output variable to [0, 1] (the higher the value, the rarer the sample)
- Set a threshold for flagging samples as rare or normal
- Set model-specific parameters
- Call get() to obtain a new dataset with the new samples
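The relevance and threshold steps above can be sketched in plain Python. This is only an illustration under stated assumptions: it is not PyImbalReg's default relevance function, the names `relevance` and `flag_rare` are hypothetical, and treating a score equal to the threshold as rare is a choice made here. The sketch scores each target value by its distance from the median, scaled to [0, 1], and then flags the samples.

```python
from statistics import median

def relevance(y_values):
    """Hypothetical relevance: 0 at the median, 1 at the farthest value."""
    center = median(y_values)
    distances = [abs(y - center) for y in y_values]
    largest = max(distances)
    if largest == 0:  # all targets identical: nothing is rare
        return [0.0 for _ in y_values]
    return [d / largest for d in distances]

def flag_rare(y_values, threshold=0.7):
    """Return True for samples whose relevance reaches the threshold (assumed >=)."""
    return [r >= threshold for r in relevance(y_values)]

y = [10, 11, 12, 12, 13, 14, 40]
scores = relevance(y)
flags = flag_rare(y, threshold=0.7)
print(flags)  # only the last value (40) is flagged as rare
```

With a threshold of 0.7, only the extreme value 40 is flagged. A lower threshold would flag more of the tail as rare.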
Installation
## PyPI version
pip install PyImbalReg
## GitHub version
pip install git+https://github.com/vd1371/PyImbalReg.git
Example
# importing PyImbalReg
import PyImbalReg as pir
# importing the data
from seaborn import load_dataset
data = load_dataset('dots')
ro = pir.RandomOversampling(
    data,               # Passing the data
    rel_func=None,      # Default relevance function will be used
    threshold=0.7,      # Set the threshold
    o_percentage=5,     # (o_percentage - 1) x n_rare_samples will be added
)
new_data = ro.get()
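To make the `o_percentage` arithmetic concrete, here is a minimal standalone sketch of random oversampling. It does not use PyImbalReg's internals: the helper `random_oversample` is hypothetical, and drawing rare rows with replacement is an assumption. Only the count of added rows, (o_percentage - 1) x n_rare_samples, comes from the comment in the example.

```python
import random

def random_oversample(rare_rows, o_percentage, seed=0):
    """Return copies of rare rows drawn with replacement (assumed strategy).

    Adds (o_percentage - 1) * len(rare_rows) rows, as stated in the
    example's comment; everything else here is illustrative.
    """
    rng = random.Random(seed)
    n_new = int((o_percentage - 1) * len(rare_rows))
    return [rng.choice(rare_rows) for _ in range(n_new)]

# Toy data: 2 rare rows and 8 normal rows
rare = [{"x": 1.0, "y": 40.0}, {"x": 2.0, "y": 42.0}]
normal = [{"x": 0.1 * i, "y": 10.0 + i} for i in range(8)]

extra = random_oversample(rare, o_percentage=5)
new_data = normal + rare + extra
print(len(extra), len(new_data))  # 8 new rows, 18 rows in total
```

With 2 rare samples and `o_percentage = 5`, (5 - 1) x 2 = 8 rows are added, so the rare samples end up five times as numerous as before.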
Requirements
- SciPy
- pandas
- NumPy
Other examples
Contributions
Please share your issues, new techniques, and contributions with us. Your help is much appreciated.
If you are using this repository
Please cite the reference below:
Branco, P., Torgo, L. and Ribeiro, R.P., 2019. Pre-processing approaches for imbalanced distributions in regression. Neurocomputing, 343, pp.76-99.
License
© Vahid Asghari, 2020. Licensed under the General Public License v3.0 (GPLv3).
P.S. Some parts of the readme and code were inspired by https://github.com/nickkunz/smogn