A model specialized for imbalanced class learning.
ad-hoc-boost
Welcome to ad-hoc-boost, a model specialized for classification in severely imbalanced-class scenarios.
About
Many data science problems have severely imbalanced classes (e.g. predicting fraudulent transactions, predicting order cancellations in food delivery, predicting whether a day in Berlin will be sunny). In these situations, predicting the positive class is hard! This module aims to alleviate some of that difficulty.
The `AdHocBoost` model works by creating `n` sequential models. The first `n-1` models can most aptly be thought of as dataset-filtering models, i.e. each one does a good job at classifying rows as "definitely not the positive class" versus "maybe the positive class". The `n`th model only works on this filtered "maybe positive" data. In this way, the class imbalance is alleviated at each filter step, so that by the time the dataset reaches the `n`th model for final classification, the classes are considerably more balanced.
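The filtering idea above can be sketched in a few lines of plain Python. This is not the `AdHocBoost` source code, only an illustration of the cascade under stated assumptions: `FilterCascade`, `CentroidModel`, `filter_threshold`, `final_threshold`, and `stage_balance_` are hypothetical names invented for this example, and the toy data is made up. There is no edge-case handling (for instance, a filter that removes every positive row would break the toy model).

```python
class CentroidModel:
    """Toy stand-in for a real classifier (hypothetical, for illustration).

    Scores a row by projecting it onto the line from the negative-class
    centroid (score 0) to the positive-class centroid (score 1).
    """

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.pos_ = [sum(col) / len(pos) for col in zip(*pos)]
        self.neg_ = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def predict_proba(self, X):
        direction = [p - n for p, n in zip(self.pos_, self.neg_)]
        denom = sum(d * d for d in direction) or 1.0
        out = []
        for x in X:
            t = sum((xi - n) * d for xi, n, d in zip(x, self.neg_, direction)) / denom
            p = min(1.0, max(0.0, t))  # clamp the projection into [0, 1]
            out.append([1.0 - p, p])   # one column per class
        return out


class FilterCascade:
    """Hypothetical sketch: n-1 filter models followed by one final model."""

    def __init__(self, models, filter_threshold=0.3, final_threshold=0.5):
        self.models = models
        self.filter_threshold = filter_threshold  # below this: "definitely not positive"
        self.final_threshold = final_threshold

    def fit(self, X, y):
        self.stage_balance_ = []  # positive fraction seen by each stage
        rows = list(zip(X, y))
        for stage, model in enumerate(self.models):
            Xs = [x for x, _ in rows]
            ys = [label for _, label in rows]
            self.stage_balance_.append(sum(ys) / len(ys))
            model.fit(Xs, ys)
            if stage < len(self.models) - 1:
                # Keep only the "maybe positive" rows for the next stage.
                scores = [p[1] for p in model.predict_proba(Xs)]
                rows = [r for r, s in zip(rows, scores) if s >= self.filter_threshold]
        return self

    def predict_proba(self, X):
        proba = [0.0] * len(X)  # rows dropped by a filter stay at 0.0
        alive = list(range(len(X)))
        for stage, model in enumerate(self.models):
            if not alive:
                break
            scores = [p[1] for p in model.predict_proba([X[i] for i in alive])]
            if stage == len(self.models) - 1:
                for i, s in zip(alive, scores):
                    proba[i] = s
            else:
                alive = [i for i, s in zip(alive, scores) if s >= self.filter_threshold]
        return [[1.0 - p, p] for p in proba]

    def predict(self, X):
        return [int(p[1] >= self.final_threshold) for p in self.predict_proba(X)]


# Made-up data: 95 negatives spread over 0..94, 5 positives near the top end.
X = [[float(i)] for i in range(95)] + [[90.0], [95.0], [100.0], [105.0], [110.0]]
y = [0] * 95 + [1] * 5

cascade = FilterCascade([CentroidModel() for _ in range(3)]).fit(X, y)
print([round(b, 3) for b in cascade.stage_balance_])
print(cascade.predict([[10.0], [92.0], [108.0]]))
```

On this toy data the positive fraction grows at each stage, which is the effect the paragraph above describes: each filter discards rows it is confident are negative, so the final model trains on a much more balanced subset.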
Run Instructions
- Clone this module to a location of your choice.
- Set an environment variable in your shell startup file of choice (e.g. `~/.zshrc` or `~/.bash_profile`) pointing to the location where you cloned the module. It should read something like `export AD_HOC_BOOST_HOME="path/to/your/ad_hoc_boost"`.
- Use the included `env.yml` file to create an environment by running `conda env create --file env.yml --prefix $AD_HOC_BOOST_HOME/env`. This can take as long as ~15 minutes, as some dependencies are large and difficult to resolve (e.g. `google-cloud`).
- Activate your new environment with `conda activate $AD_HOC_BOOST_HOME/env`. It probably works similarly with pip; we leave that as an exercise for the reader ;)
- To see an example in action, check out the file at `./scripts/example.py`.
- To run `./scripts/example.py`, you'll need to hit BigQuery! Make sure that you have `GOOGLE_CLOUD_PROJECT=<your-project-here>` configured in your shell startup file of choice.
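Before running the example script, it can help to confirm that both environment variables from the steps above are actually visible to Python. The following is a small optional check, not part of the module; the helper name `missing_env_vars` is hypothetical.

```python
import os

# The two variables named in the run instructions.
REQUIRED_VARS = ("AD_HOC_BOOST_HOME", "GOOGLE_CLOUD_PROJECT")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment looks ready.")
```

If a variable is reported missing after you edited your startup file, open a new shell (or `source` the file) so the change takes effect.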
Other Notes and Documentation
`AdHocBoost` conforms to a sklearn-like API: to use it, you simply instantiate it, and then use `.fit()`, `.predict()`, and `.predict_proba()` as you see... fit ;).
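As a rough illustration of what a sklearn-like API means in practice, the helper below works with any object that exposes those three methods. The import path and constructor arguments of `AdHocBoost` are not given here, so they are not shown; `MajorityClassModel` is a hypothetical stand-in, and `fit_and_predict` is an invented helper name. The example also assumes `predict_proba` follows scikit-learn's convention of one column per class.

```python
class MajorityClassModel:
    """Hypothetical stand-in exposing fit / predict / predict_proba."""

    def fit(self, X, y):
        # "Learn" only the share of positive labels in the training data.
        self.positive_rate_ = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        # One row per sample, one column per class: [P(negative), P(positive)].
        return [[1.0 - self.positive_rate_, self.positive_rate_] for _ in X]

    def predict(self, X):
        return [int(row[1] >= 0.5) for row in self.predict_proba(X)]


def fit_and_predict(model, X_train, y_train, X_new):
    """Fit any sklearn-like model, then return hard labels and positive-class scores."""
    model.fit(X_train, y_train)
    labels = model.predict(X_new)
    positive_scores = [row[1] for row in model.predict_proba(X_new)]
    return labels, positive_scores


labels, scores = fit_and_predict(
    MajorityClassModel(),
    X_train=[[0.0], [1.0], [2.0], [3.0]],
    y_train=[0, 0, 0, 1],
    X_new=[[1.5], [2.5]],
)
print(labels, scores)
```

If `AdHocBoost` conforms to the API as described, an instance of it should be usable in place of the stand-in without changing the helper.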
Hashes for adhocboost-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e8ce09898395ec70143995126dd502b4d3a4866b4ed23a0e4598981e91e26326
MD5 | 5c95184fc9ca36e757f17049ad8a1e78
BLAKE2b-256 | 49b9738f0f812363260808dae46a6c1c498a8b3711ee0f19dec9d59af3e1a4da