Pre-processing techniques for imbalanced datasets in regression modelling

PyImbalReg


Dealing with imbalanced datasets for regression modelling

Does your trained regression model suffer from heteroskedasticity? Does it struggle to predict extreme values? Consider using these pre-processing techniques to address these issues, which are probably caused by an imbalanced dataset.

  • RandomOversampling (RO)
  • GaussianNoise and Undersampling (GN)
  • WEighted Relevance-based Combination Strategy (WERCS)

How to use (2-minute read)

  1. Pass your data as a pandas DataFrame to any of these techniques
  2. Define a relevance function that maps the output variable to [0, 1] (the higher this value, the rarer the sample)
  3. Set a threshold for flagging samples as rare or normal
  4. Set model-specific parameters
  5. Call get() to obtain a new dataset with the new samples
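
Steps 2 and 3 can be illustrated with a minimal sketch. The names `distance_relevance` and `flag_rare` are hypothetical helpers written for this illustration; they are not part of PyImbalReg, and the relevance function shown here (distance from the median, scaled to [0, 1]) is only one possible choice, not necessarily the library's default.

```python
import numpy as np
import pandas as pd


def distance_relevance(y):
    """Hypothetical relevance function: scaled distance from the median.

    Returns values in [0, 1]; values far from the median (rarer, by
    assumption) get a relevance close to 1.
    """
    y = np.asarray(y, dtype=float)
    dist = np.abs(y - np.median(y))
    max_dist = dist.max()
    if max_dist == 0:
        # All targets are identical, so nothing is rare
        return np.zeros_like(y)
    return dist / max_dist


def flag_rare(df, target, threshold=0.7, rel_func=distance_relevance):
    """Split df into (rare, normal) using relevance >= threshold."""
    relevance = rel_func(df[target].to_numpy())
    is_rare = relevance >= threshold
    return df[is_rare], df[~is_rare]


# Toy data: one extreme target value among ordinary ones
df = pd.DataFrame({"x": range(7), "y": [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 20.0]})
rare, normal = flag_rare(df, "y", threshold=0.7)
print(len(rare), "rare sample(s),", len(normal), "normal sample(s)")
```

Raising the threshold flags fewer samples as rare; lowering it flags more.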

Installation

## PyPI version
pip install PyImbalReg

## GitHub version
pip install git+https://github.com/vd1371/PyImbalReg.git

Example

# importing PyImbalReg
import PyImbalReg as pir

# importing the data
from seaborn import load_dataset
data = load_dataset('dots')

ro = pir.RandomOversampling(
    data,               # Passing the data
    rel_func=None,      # Default relevance function will be used
    threshold=0.7,      # Set the threshold
    o_percentage=5,     # (o_percentage - 1) x n_rare_samples will be added
)
new_data = ro.get()
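
To make the meaning of o_percentage concrete, here is a rough sketch of random oversampling in plain pandas. It is not the library's implementation: `random_oversample` and the boolean mask `is_rare` are hypothetical names used only for illustration, and the sketch assumes the rare samples have already been flagged.

```python
import pandas as pd


def random_oversample(df, is_rare, o_percentage=5, seed=0):
    """Append (o_percentage - 1) x n_rare_samples resampled rare rows."""
    rare = df[is_rare]
    n_new = int((o_percentage - 1) * len(rare))
    # Draw rare rows with replacement so n_new may exceed len(rare)
    extra = rare.sample(n=n_new, replace=True, random_state=seed)
    return pd.concat([df, extra], ignore_index=True)


# Toy data: 8 ordinary targets and 2 extreme ones
df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 50.0, 60.0]})
mask = df["y"] >= 50.0          # assumed rare flags
new_df = random_oversample(df, mask, o_percentage=5)
print(len(df), "->", len(new_df))
```

With 2 rare samples and o_percentage = 5, the sketch adds (5 - 1) x 2 = 8 rows, so the rare samples end up appearing 10 times in total.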

Requirements

  1. SciPy
  2. pandas
  3. NumPy

Other examples

  1. RandomOversampling
  2. GaussianNoise
  3. WERCS

Contributions

Please share your issues, new techniques, and contributions with us. Your help is much appreciated.


If you are using this repository

Please cite the reference below:

Branco, P., Torgo, L. and Ribeiro, R.P., 2019. Pre-processing approaches for imbalanced distributions in regression. Neurocomputing, 343, pp.76-99.


License

© Vahid Asghari, 2020. Licensed under the GNU General Public License v3.0 (GPLv3).

P.S. Some parts of the README and code were inspired by https://github.com/nickkunz/smogn

