A model specialized for imbalanced class learning.

ad-hoc-boost

Welcome to ad-hoc-boost, a model specialized for classification when the classes are severely imbalanced.

About

Many data science problems have severely imbalanced classes (e.g. predicting fraudulent transactions, predicting order cancellations in food delivery, predicting whether a day in Berlin will be sunny). In these situations, predicting the positive class is hard! This module aims to alleviate some of that difficulty.

The AdHocBoost model works by creating n sequential models. The first n-1 models can most aptly be thought of as dataset filtering models, i.e. each one does a good job at classifying rows as "definitely not the positive class" versus "maybe the positive class". The nth model only works on this filtered "maybe positive" data.

In this way, the class imbalance is reduced at each filter step, so that by the time the nth model performs the final classification on the filtered dataset, the classes are considerably more balanced.
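The filtering idea can be illustrated with a small, self-contained sketch. This is not the AdHocBoost implementation: the function names, the synthetic data, and the single-feature cutoffs standing in for real filter models are all assumptions made for illustration. Each filter stage discards rows it considers "definitely not positive" while keeping roughly 95% of the positives, and the sketch prints the positive rate of whatever remains.

```python
import random

def make_rows(n_rows=20000, positive_rate=0.01, n_features=3, seed=0):
    """Synthetic imbalanced data: positives have every feature shifted upward."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        label = 1 if rng.random() < positive_rate else 0
        shift = 2.0 if label else 0.0
        features = [rng.gauss(shift, 1.0) for _ in range(n_features)]
        rows.append((features, label))
    return rows

def fit_filter_cutoff(rows, feature, keep_recall=0.95):
    """Pick a cutoff on one feature that keeps roughly `keep_recall` of positives.

    A real filter stage would be a trained classifier; a cutoff is a toy stand-in.
    """
    positive_values = sorted(f[feature] for f, y in rows if y == 1)
    index = int((1.0 - keep_recall) * len(positive_values))
    return positive_values[index]

def positive_rate_of(rows):
    return sum(y for _, y in rows) / len(rows)

def run_filter_stages(rows, n_stages=3):
    """Apply n_stages - 1 filter stages and track the positive rate after each."""
    rates = [positive_rate_of(rows)]
    remaining = rows
    for stage in range(n_stages - 1):
        cutoff = fit_filter_cutoff(remaining, feature=stage)
        # Below the cutoff: "definitely not positive". At or above: "maybe positive".
        remaining = [(f, y) for f, y in remaining if f[stage] >= cutoff]
        rates.append(positive_rate_of(remaining))
    return remaining, rates

rows = make_rows()
maybe_positive, rates = run_filter_stages(rows)
for stage, rate in enumerate(rates):
    print(f"after {stage} filter stage(s): positive rate = {rate:.3f}")
```

In this sketch the final (nth) model is not trained; it would be fit only on the rows left in `maybe_positive`, where the positive class makes up a larger share than in the original data.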

Run Instructions

  1. Clone this module to a location of your choice.
  2. Set an environment variable in your shell startup file of choice (e.g. ~/.zshrc or ~/.bash_profile) pointing to the location where you cloned the module. It should read something like export AD_HOC_BOOST_HOME="path/to/your/ad_hoc_boost".
  3. Use the included env.yml file to create an environment by running conda env create --file env.yml --prefix $AD_HOC_BOOST_HOME/env. This can take up to ~15 minutes, as some dependencies are large and difficult to resolve (e.g. google-cloud).
  4. Activate your new environment with conda activate $AD_HOC_BOOST_HOME/env. It probably works similarly with pip; we leave that as an exercise for the reader ;)
  5. To see an example in action, check out the file at ./scripts/example.py.
  6. To run ./scripts/example.py, you'll need to query BigQuery! Make sure that you have GOOGLE_CLOUD_PROJECT=<your-project-here> configured in your shell startup file of choice.

Other Notes and Documentation

The AdHocBoost model conforms to a scikit-learn-like API: to use it, you simply instantiate it, and then use .fit(), .predict(), and .predict_proba() as you see... fit ;).
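As a rough illustration of that calling convention, here is a minimal sketch using a hypothetical stand-in class. The import path and constructor arguments of AdHocBoost are not given in this document, so `StandInClassifier`, its single-cutoff logic, and the toy data below are assumptions; only the instantiate, .fit(), .predict(), .predict_proba() pattern is taken from the text above.

```python
import math

class StandInClassifier:
    """Hypothetical stand-in that mimics the fit/predict/predict_proba convention."""

    def fit(self, X, y):
        # Learn one cutoff: the midpoint between the class means of feature 0.
        pos = [row[0] for row, label in zip(X, y) if label == 1]
        neg = [row[0] for row, label in zip(X, y) if label == 0]
        self.cutoff_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
        return self  # scikit-learn estimators return self from fit

    def predict_proba(self, X):
        # One [P(class 0), P(class 1)] pair per row, from a logistic curve
        # centered on the learned cutoff.
        probabilities = []
        for row in X:
            p1 = 1.0 / (1.0 + math.exp(-(row[0] - self.cutoff_)))
            probabilities.append([1.0 - p1, p1])
        return probabilities

    def predict(self, X):
        # Label a row positive when P(class 1) is at least 0.5.
        return [1 if p1 >= 0.5 else 0 for _, p1 in self.predict_proba(X)]

# Toy data: one feature, three negatives and two positives.
X_train = [[0.1], [0.3], [0.2], [2.9], [3.1]]
y_train = [0, 0, 0, 1, 1]
X_new = [[0.0], [3.0]]

model = StandInClassifier().fit(X_train, y_train)
labels = model.predict(X_new)
probabilities = model.predict_proba(X_new)
print(labels)
print(probabilities)
```

Swapping in the real model should only change the import and the constructor call; the three method calls stay the same.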

Download files

Source Distribution

adhocboost-0.0.1.tar.gz (8.3 kB)

Built Distribution

adhocboost-0.0.1-py3-none-any.whl (7.3 kB)
