
PyImbalReg

Pre-processing techniques for imbalanced datasets in regression modelling



Dealing with imbalanced datasets for regression modelling

Does your trained regression model suffer from heteroskedasticity? Does it struggle to predict extreme values? These issues may be caused by an imbalanced dataset. Consider addressing them with one of the following pre-processing techniques:

  • RandomOversampling (RO)
  • GaussianNoise and Undersampling (GN)
  • WEighted Relevance-based Combination Strategy (WERCS)

How to use? (2-minute read)

  1. Pass your data as a pandas DataFrame to any of these techniques
  2. Define a relevance function that maps the output variable to [0, 1] (the higher the value, the rarer the sample)
  3. Set a threshold for flagging each sample as rare or normal
  4. Set model-specific parameters
  5. Call get() to obtain a new dataset that includes the new samples
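
Steps 2 and 3 can be illustrated in plain pandas. This is only a sketch under stated assumptions: `distance_relevance` and `flag_rare` are hypothetical helpers written for this illustration, not part of PyImbalReg, and the relevance function shown is not the library's default. It scores each target value by its distance from the median, scaled to [0, 1], and flags a sample as rare when its relevance reaches the threshold (whether the library uses `>=` or `>` is an assumption here).

```python
import pandas as pd


def distance_relevance(y: pd.Series) -> pd.Series:
    """Hypothetical relevance function (not the PyImbalReg default).

    Returns 0 for values at the median and 1 for the value farthest from it.
    """
    dist = (y - y.median()).abs()
    max_dist = dist.max()
    if max_dist == 0:
        # All targets are identical, so nothing is rare.
        return pd.Series(0.0, index=y.index)
    return dist / max_dist


def flag_rare(y: pd.Series, rel_func=distance_relevance, threshold: float = 0.7) -> pd.Series:
    """Return a boolean mask: True for rare samples, False for normal ones."""
    return rel_func(y) >= threshold


# Toy target column with a single extreme value.
y = pd.Series([10.0, 11.0, 9.0, 10.5, 9.5, 30.0])
relevance = distance_relevance(y)
is_rare = flag_rare(y, threshold=0.7)
print(pd.DataFrame({"y": y, "relevance": relevance, "rare": is_rare}))
```

With this toy data only the extreme value 30.0 is flagged as rare. A custom function of this shape is what step 2 asks for: any callable that maps the target values into [0, 1].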

Installation

## PyPI version
pip install PyImbalReg

## GitHub version
pip install git+https://github.com/vd1371/PyImbalReg.git

Example

# importing PyImbalReg
import PyImbalReg as pir

# importing the data
from seaborn import load_dataset
data = load_dataset('dots')

ro = pir.RandomOversampling(
    data,              # Passing the data
    rel_func=None,     # Default relevance function will be used
    threshold=0.7,     # Set the threshold
    o_percentage=5,    # (o_percentage - 1) x n_rare_samples will be added
)
new_data = ro.get()
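
To make the effect of o_percentage concrete, here is a minimal pandas sketch of the arithmetic described in the comment above. It is not PyImbalReg's implementation: `random_oversample` is a hypothetical helper, and the rare samples are flagged with a simple hand-written mask rather than a relevance function.

```python
import pandas as pd


def random_oversample(data: pd.DataFrame, is_rare: pd.Series,
                      o_percentage: float = 5, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: append (o_percentage - 1) x n_rare resampled rare rows."""
    rare = data[is_rare]
    n_new = int((o_percentage - 1) * len(rare))
    # Draw rare rows with replacement so n_new may exceed the number of rare rows.
    extra = rare.sample(n=n_new, replace=True, random_state=seed)
    return pd.concat([data, extra], ignore_index=True)


# Toy dataset: 8 normal samples and 2 rare ones (large target values).
toy = pd.DataFrame({"x": range(10), "y": [1.0] * 8 + [50.0, 60.0]})
rare_mask = toy["y"] > 10

augmented = random_oversample(toy, rare_mask, o_percentage=5)
print(len(toy), "->", len(augmented))
```

With 2 rare samples and o_percentage = 5, the sketch adds (5 - 1) x 2 = 8 rows, so the rare samples go from 2 of 10 rows to 10 of 18.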

Requirements

  1. SciPy
  2. pandas
  3. NumPy

Other examples

  1. RandomOversampling
  2. GaussianNoise
  3. WERCS

Contributions

Please share your issues, new techniques, and contributions with us. Your help is much appreciated.


If you are using this repository

Please cite the reference below:

Branco, P., Torgo, L. and Ribeiro, R.P., 2019. Pre-processing approaches for imbalanced distributions in regression. Neurocomputing, 343, pp.76-99.


License

© Vahid Asghari, 2020. Licensed under the General Public License v3.0 (GPLv3).

P.S. Some parts of the README and code were inspired by https://github.com/nickkunz/smogn

