
An interface for applying your favourite resampler to regression tasks.

Project description


Regression ReSampling

A Python library for repurposing traditional classification-based resampling techniques (undersampling and/or oversampling) for regression tasks. Currently supports all resampling techniques present in imblearn.

Why does this exist?

While working on a regression task, we realized that the target variable was skewed, i.e., most samples fell within a particular range. For classification tasks, this kind of imbalance is easily addressed with a slew of resampling techniques (undersampling or oversampling), but these techniques are not directly available for regression tasks. We therefore built an interface that repurposes classification resampling techniques for regression problems: the continuous target is binned into pseudo-classes, and the resampler then balances those pseudo-classes.

How to install?

pip install reg_resampler

Functions and parameters

# Returns a numpy array with a pseudo-class label for each sample. Bins with too few samples are merged automatically
fit(X, target, bins=3, min_n_samples=6, balanced_binning=False, verbose=2)
  • X - Either a pandas DataFrame or a numpy matrix containing the complete data to be resampled, including the target column.
  • target - Either a column name (for pandas) or a column index (for numpy) identifying the target variable.
  • bins=3 - The number of pseudo-classes (bins) to generate. (Default: 3)
  • min_n_samples=6 - Minimum number of samples in each bin. Bins with fewer samples are merged with the closest bin. Must be greater than the number of neighbours used by the imblearn sampler (for example, SMOTE's k_neighbors, which defaults to 5). (Default: 6)
  • balanced_binning=False - Whether to choose the bins so that samples are distributed roughly equally across all classes. (Default: False)
  • verbose=2 - Verbosity level: 0 disables all output from the package, 1 prints info about class mergers, and 2 additionally prints class distributions. (Default: 2)
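
The binning idea behind fit can be illustrated with a minimal numpy sketch. This is not the library's code: make_pseudo_classes is a hypothetical helper, and the specific choices (equal-width bin edges by default, quantile edges when balanced_binning=True, and merging an undersized bin into an adjacent one) are assumptions about how such binning could work.

```python
import numpy as np


def make_pseudo_classes(y, bins=3, min_n_samples=6, balanced_binning=False):
    """Map a continuous target to integer pseudo-classes (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    if balanced_binning:
        # Assumption: quantile edges give roughly the same number of samples per bin
        edges = np.quantile(y, np.linspace(0.0, 1.0, bins + 1))
    else:
        # Assumption: equal-width edges spanning the target's range
        edges = np.linspace(y.min(), y.max(), bins + 1)
    # Digitize against the interior edges only, so labels run from 0 to bins - 1
    classes = np.digitize(y, edges[1:-1])

    # Merge undersized bins into an adjacent bin until every bin is large enough
    while True:
        labels, counts = np.unique(classes, return_counts=True)
        small = labels[counts < min_n_samples]
        if len(small) == 0 or len(labels) == 1:
            break
        pos = np.searchsorted(labels, small[0])
        # Use the next lower bin if there is one, otherwise the next higher bin
        neighbour = labels[pos - 1] if pos > 0 else labels[pos + 1]
        classes[classes == small[0]] = neighbour

    # Relabel the surviving bins as consecutive integers 0..k-1
    _, classes = np.unique(classes, return_inverse=True)
    return classes


# Skewed toy target: the top bin holds only 2 samples and gets merged
y = np.concatenate([np.linspace(0, 1, 10), np.linspace(5, 6, 10), [9.5, 10.0]])
classes = make_pseudo_classes(y, bins=3, min_n_samples=6)
print(np.bincount(classes))  # [10 12]
```

In the example, the two samples at 9.5 and 10.0 form a bin below min_n_samples, so they are folded into the neighbouring bin, leaving two pseudo-classes.
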
# Performs resampling and returns the resampled data and target variable (as DataFrames or numpy matrices)
resample(sampler_obj, trainX, trainY)
  • sampler_obj - An instance of your favourite resampling algorithm (currently imblearn samplers are supported).
  • trainX - Either a pandas DataFrame or a numpy matrix with the data to be resampled. It also contains the target variable.
  • trainY - Numpy array of pseudo-classes returned by fit.
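
How resample can work on numpy data is sketched below, assuming (as the parameter description says) that the target column stays inside trainX. ToyRandomOverSampler and resample_regression are hypothetical names, not part of reg_resampler; the toy sampler only mimics the fit_resample(X, y) method that imblearn samplers expose, by duplicating random rows of the smaller classes.

```python
import numpy as np


class ToyRandomOverSampler:
    """Hypothetical stand-in for an imblearn sampler exposing fit_resample(X, y).

    Duplicates randomly chosen rows of every smaller class until all classes
    are as large as the biggest one.
    """

    def __init__(self, random_state=0):
        self.rng = np.random.default_rng(random_state)

    def fit_resample(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        labels, counts = np.unique(y, return_counts=True)
        n_max = counts.max()
        X_parts, y_parts = [X], [y]
        for label, count in zip(labels, counts):
            if count == n_max:
                continue
            rows = np.flatnonzero(y == label)
            extra = self.rng.choice(rows, size=n_max - count, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
        return np.vstack(X_parts), np.concatenate(y_parts)


def resample_regression(sampler, data, target_col, pseudo_classes):
    """Resample rows by pseudo-class, then split off the continuous target (hypothetical)."""
    # The target is still a column of `data`, so it is resampled with the features
    data_res, _ = sampler.fit_resample(data, pseudo_classes)
    # The resampled pseudo-classes are discarded; the real target is read back out
    y_res = data_res[:, target_col]
    X_res = np.delete(data_res, target_col, axis=1)
    return X_res, y_res


data = np.array([[0.0, 1.0], [1.0, 1.5], [2.0, 2.0], [3.0, 10.0]])
pseudo_classes = np.array([0, 0, 0, 1])
X_res, y_res = resample_regression(ToyRandomOverSampler(), data, 1, pseudo_classes)
print(X_res.shape, y_res.shape)  # (6, 1) (6,)
```

The pseudo-classes only decide which rows get duplicated or synthesised; the continuous target travels with its row. With a sampler such as SMOTE, which creates synthetic rows by interpolating between neighbours, the target values of the new rows are interpolated as well.
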

Important Note

All functions return the same data type as provided in input.

How to import?

from reg_resampler import resampler

Usage

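# Imports (SMOTE comes from imblearn, pandas is used to load the data)
import pandas as pd
from imblearn.over_sampling import SMOTE
from reg_resampler import resampler

# Hypothetical training data: a DataFrame holding the features and a continuous target column
df_train = pd.read_csv("train.csv")
target = "price"
num_bins = 5

# Initialize the resampler object
rs = resampler()

# Generate pseudo-classes (you might receive info about class mergers for low-sample classes)
Y_classes = rs.fit(df_train, target=target, bins=num_bins)

# Create a SMOTE (over-sampling) object from imblearn
smote = SMOTE(random_state=27)

# Now resample; final_Y holds the resampled target variable
final_X, final_Y = rs.resample(smote, df_train, Y_classes)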

Tutorials

You can find further tutorials on how to use this library with cross-validation.

Future Ideas

  • Support for more resampling techniques

Feature Request

Drop us an email at atif.hit.hassan@gmail.com or pvsaikrithik@gmail.com if you would like a particular feature.


Download files


Source Distribution

reg_resampler-2.1.1.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

reg_resampler-2.1.1-py3-none-any.whl (5.3 kB)

Uploaded Python 3

File details

Details for the file reg_resampler-2.1.1.tar.gz.

File metadata

  • Download URL: reg_resampler-2.1.1.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.4

File hashes

Hashes for reg_resampler-2.1.1.tar.gz
Algorithm Hash digest
SHA256 41ceaa75d6375978f071384a85f5625deda965b8e62406bb6e7aa6dc15f1aab0
MD5 32308d34868ac067ae37dcb0cad437c1
BLAKE2b-256 0399dcab55cfe6ebf9c221937a10a14a03415f957381084a00a4ba7ee61399ae


File details

Details for the file reg_resampler-2.1.1-py3-none-any.whl.

File metadata

  • Download URL: reg_resampler-2.1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.4

File hashes

Hashes for reg_resampler-2.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8d2780d6e5e29101273216bb5a84b3b50f69cef1c89ed287e57324394cdcfd95
MD5 dabdc976576ceadcb6f1ad1d60e47f2b
BLAKE2b-256 7bfe911c5f0ee409b18ffeff45cda6e7e7f2196fe604feaf6c4f3483ede46c27

