Classic approaches to uplift modelling in scikit-learn style in Python
scikit-uplift
scikit-uplift is a Python module that implements classic approaches to uplift modeling and is built on top of scikit-learn.
Uplift prediction aims to estimate the causal impact of a treatment at the individual level.
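Individual-level uplift cannot be observed directly, because each individual is either treated or not. As a rough illustration only (not part of scikit-uplift), the sketch below computes the group-level analogue: the difference in mean response between treated and control groups, assuming a binary treatment flag and randomized assignment. The function name `average_uplift` and the toy data are hypothetical.

```python
def average_uplift(outcomes, treatment):
    """Difference in mean outcome between treated (1) and control (0) groups.

    Hypothetical helper for illustration; assumes randomized treatment.
    """
    treated = [y for y, w in zip(outcomes, treatment) if w == 1]
    control = [y for y, w in zip(outcomes, treatment) if w == 0]
    if not treated or not control:
        raise ValueError("both treated and control groups must be non-empty")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy data: 1 = target action performed, 0 = not performed
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
# 1 = received the treatment, 0 = control
treatment = [1, 1, 1, 1, 0, 0, 0, 0]

print(average_uplift(outcomes, treatment))
```

Uplift models go one step further and try to predict this difference conditional on each individual's features.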
To learn more about the uplift modelling problem, read the articles (in Russian) on habr.com: Part 1 and Part 2.
Features:
Comfortable and intuitive modelling style, like scikit-learn;
Any estimator that adheres to scikit-learn conventions can be used;
All approaches can be used in sklearn.pipeline (see example (EN, RU));
Almost all implemented approaches solve both classification and regression problems;
Many metrics (such as Area Under Uplift Curve and Area Under Qini Curve) are implemented to evaluate your uplift model;
Useful graphs for analyzing the built model.
Installation
Install the package from PyPI with the following command:
pip install scikit-uplift
Or install from source:
git clone https://github.com/maks-sh/scikit-uplift.git
cd scikit-uplift
python setup.py install
Documentation
The full documentation is available at scikit-uplift.readthedocs.io.
Or you can build the documentation locally using Sphinx 1.4 or later:
cd docs
pip install -r requirements.txt
make html
If you now point your browser to _build/html/index.html, you should see the documentation site.
Quick Start
See the RetailHero tutorial notebook (EN, RU) for details.
Train and predict uplift model
# import approaches
from sklift.models import SoloModel, ClassTransformation, TwoModels
# import any estimator that adheres to scikit-learn conventions
from catboost import CatBoostClassifier
# define approach
sm = SoloModel(CatBoostClassifier(verbose=100, random_state=777))
# fit model
sm = sm.fit(X_train, y_train, treat_train, estimator_fit_params={'plot': True})
# predict uplift
uplift_sm = sm.predict(X_val)
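The import above also brings in ClassTransformation. As a minimal sketch of the idea usually associated with that name (described by Jaskowski and Jaroszewicz, listed under Papers and materials), the target is recoded so that a single classifier can be trained: Z = 1 when a treated object responds or a control object does not. Assuming a binary outcome and equally likely treatment and control assignment, uplift is then 2 * P(Z = 1 | x) - 1. The helper names below are hypothetical and this is not the library's own code.

```python
def transform_target(y, treatment):
    """Recode the outcome: Z = Y*W + (1 - Y)*(1 - W) for binary Y and W."""
    return [yi * wi + (1 - yi) * (1 - wi) for yi, wi in zip(y, treatment)]


def uplift_from_z_proba(p_z):
    """Turn P(Z = 1 | x) into uplift, assuming P(W = 1) = 0.5."""
    return [2 * p - 1 for p in p_z]


# Toy data (hypothetical): outcomes and treatment flags
y = [1, 0, 1, 0]
w = [1, 1, 0, 0]

z = transform_target(y, w)
print(z)

# Probabilities that a classifier trained on Z might output (made up here)
print(uplift_from_z_proba([0.5, 0.75, 0.25]))
```

Any classifier that outputs probabilities could be fitted on Z; the rescaling step only holds under the balanced-assignment assumption stated above.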
Evaluate your uplift model
# import metrics to evaluate your model
from sklift.metrics import qini_auc_score, uplift_auc_score, uplift_at_k
# Uplift@30%
sm_uplift_at_k = uplift_at_k(y_true=y_val, uplift=uplift_sm, treatment=treat_val, k=0.3)
# Area Under Qini Curve
sm_qini_auc_score = qini_auc_score(y_true=y_val, uplift=uplift_sm, treatment=treat_val)
# Area Under Uplift Curve
sm_uplift_auc_score = uplift_auc_score(y_true=y_val, uplift=uplift_sm, treatment=treat_val)
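To make the Uplift@k idea concrete, here is a plain-Python sketch of one common definition: rank objects by predicted uplift, keep the top k share, and subtract the control response rate from the treated response rate within that share. The function `uplift_at_k_sketch` and the toy data are hypothetical, and the library's `uplift_at_k` may differ in details such as how the top share is selected.

```python
def uplift_at_k_sketch(y_true, uplift, treatment, k=0.3):
    """Response-rate difference (treated minus control) in the top-k share.

    Hypothetical illustration; objects are ranked by predicted uplift.
    """
    # Indices sorted by predicted uplift, highest first
    order = sorted(range(len(uplift)), key=lambda i: uplift[i], reverse=True)
    top = order[:int(len(order) * k)]

    treated = [y_true[i] for i in top if treatment[i] == 1]
    control = [y_true[i] for i in top if treatment[i] == 0]
    if not treated or not control:
        raise ValueError("top-k share must contain both treated and control objects")

    return sum(treated) / len(treated) - sum(control) / len(control)


# Toy validation data (made up for the example)
y_val_toy = [1, 0, 0, 0, 1, 1, 0, 1]
uplift_toy = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
treat_val_toy = [1, 1, 0, 0, 1, 0, 1, 0]

print(uplift_at_k_sketch(y_val_toy, uplift_toy, treat_val_toy, k=0.5))
```

A higher value means the model's top-ranked objects respond more strongly to the treatment than the same-ranked control objects.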
Visualize the results
# import visualization tools
from sklift.viz import plot_uplift_preds, plot_uplift_qini_curves
# get conditional predictions (probabilities) of performing a target action
# with interaction for each object
sm_trmnt_preds = sm.trmnt_preds_
# get conditional predictions (probabilities) of performing a target action
# without interaction for each object
sm_ctrl_preds = sm.ctrl_preds_
# draw probability distributions and their difference (uplift)
plot_uplift_preds(trmnt_preds=sm_trmnt_preds, ctrl_preds=sm_ctrl_preds);
# draw Uplift and Qini curves
plot_uplift_qini_curves(y_true=y_val, uplift=uplift_sm, treatment=treat_val);
Development
We welcome new contributors of all experience levels.
Important links
Official source code repo: https://github.com/maks-sh/scikit-uplift/
Issue tracker: https://github.com/maks-sh/scikit-uplift/issues
Release History: https://scikit-uplift.readthedocs.io/en/latest/changelog.html
Papers and materials
- Gutierrez, P., & Gérardy, J. Y.
Causal Inference and Uplift Modelling: A Review of the Literature. In International Conference on Predictive Applications and APIs (pp. 1-13).
- Artem Betlei, Criteo Research; Eustache Diemert, Criteo Research; Massih-Reza Amini, Univ. Grenoble Alpes
Dependent and Shared Data Representations improve Uplift Prediction in Imbalanced Treatment Conditions FAIM’18 Workshop on CausalML.
- Eustache Diemert, Artem Betlei, Christophe Renaudin, and Massih-Reza Amini. 2018.
A Large Scale Benchmark for Uplift Modeling. In Proceedings of AdKDD & TargetAd (ADKDD’18). ACM, New York, NY, USA, 6 pages.
- Athey, Susan, and Imbens, Guido. 2015.
Machine learning methods for estimating heterogeneous causal effects. Preprint, arXiv:1504.01132.
- Oscar Mesalles Naranjo. 2012.
Testing a New Metric for Uplift Models. Dissertation Presented for the Degree of MSc in Statistics and Operational Research.
- Kane, K., V. S. Y. Lo, and J. Zheng. 2014.
Mining for the Truly Responsive Customers and Prospects Using True-Lift Modeling: Comparison of New and Existing Methods. Journal of Marketing Analytics 2 (4): 218–238.
- Maciej Jaskowski and Szymon Jaroszewicz.
Uplift modeling for clinical trial data. ICML Workshop on Clinical Data Analysis, 2012.
- Lo, Victor. 2002.
The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing. SIGKDD Explorations. 4. 78-86.
- Zhao, Yan & Fang, Xiao & Simchi-Levi, David. 2017.
Uplift Modeling with Multiple Treatments and General Response Types. 10.1137/1.9781611974973.66.