Classic approaches of Uplift modelling in scikit-learn style in python
Project description
scikit-uplift
scikit-uplift is a Python module for classic approaches for uplift modeling built on top of scikit-learn.
Uplift prediction aims to estimate the causal impact of a treatment at the individual level.
More about uplift modelling problem read in russian on habr.com: Part 1 and Part 2.
Features:
Comfortable and intuitive style of modelling like scikit-learn;
Applying any estimator adheres to scikit-learn conventions;
Almost all implemented approaches solve both the problem of classification and regression;
A lot of metrics (Such as Area Under Uplift Curve or Area Under Qini Curve) are implemented to evaluate your uplift model.
Installation
Install the package by the following command from PyPI:
pip install scikit-uplift
Or install from source:
git clone https://github.com/maks-sh/scikit-uplift.git
cd scikit-uplift
python setup.py install
Documentation
The full documentation is available at scikit-uplift.readthedocs.io.
Or you can build the documentation locally using Sphinx 1.4 or later:
cd docs
pip install -r requirements.txt
make html
And if you now point your browser to _build/html/index.html, you should see a documentation site.
Quick Start
See the RetailHero tutorial notebook (EN , RU ) for details.
Train and predict uplift model
# import approaches
from sklift.models import SoloModel, ClassTransformation, TwoModels
# import any estimator adheres to scikit-learn conventions.
from catboost import CatBoostClassifier
# define approach
sm = SoloModel(CatBoostClassifier(verbose=100, random_state=777))
# fit model
sm = sm.fit(X_train, y_train, treat_train, estimator_fit_params={{'plot': True})
# predict uplift
uplift_sm = sm.predict(X_val)
Evaluate your uplift model
# import metrics to evaluate your model
from sklift.metrics import qini_auc_score, uplift_auc_score, uplift_at_k
# Uplift@30%
sm_uplift_at_k = uplift_at_k(y_true=y_val, uplift=uplift_sm, treatment=treat_val, k=0.3)
# Area Under Qini Curve
sm_qini_auc_score = qini_auc_score(y_true=y_val, uplift=uplift_sm, treatment=treat_val)
# Area Under Uplift Curve
sm_uplift_auc_score = uplift_auc_score(y_true=y_val, uplift=uplift_sm, treatment=treat_val)
Vizualize the results
# import vizualisation tools
from sklift.viz import plot_uplift_preds, plot_uplift_qini_curves
# get conditional predictions (probabilities) of performing a target action
# with interaction for each object
sm_trmnt_preds = sm.trmnt_preds_
# get conditional predictions (probabilities) of performing a target action
# without interaction for each object
sm_ctrl_preds = sm.ctrl_preds_
# draw probability distributions and their difference (uplift)
plot_uplift_preds(trmnt_preds=sm_trmnt_preds, ctrl_preds=sm_ctrl_preds);
# draw Uplift and Qini curves
plot_uplift_qini_curves(y_true=y_val, uplift=uplift_sm, treatment=treat_val);
Development
We welcome new contributors of all experience levels.
Important links
Official source code repo: https://github.com/maks-sh/scikit-uplift/
Issue tracker: https://github.com/maks-sh/scikit-uplift/issues
Papers and materials
- Gutierrez, P., & Gérardy, J. Y.
Causal Inference and Uplift Modelling: A Review of the Literature. In International Conference on Predictive Applications and APIs (pp. 1-13).
- Artem Betlei, Criteo Research; Eustache Diemert, Criteo Research; Massih-Reza Amini, Univ. Grenoble Alpes
Dependent and Shared Data Representations improve Uplift Prediction in Imbalanced Treatment Conditions FAIM’18 Workshop on CausalML.
- Eustache Diemert, Artem Betlei, Christophe Renaudin, and Massih-Reza Amini. 2018.
A Large Scale Benchmark for Uplift Modeling. In Proceedings of AdKDD & TargetAd (ADKDD’18). ACM, New York, NY, USA, 6 pages.
- Athey, Susan, and Imbens, Guido. 2015.
Machine learning methods for estimating heterogeneous causal effects. Preprint, arXiv:1504.01132. Google Scholar.
- Oscar Mesalles Naranjo. 2012.
Testing a New Metric for Uplift Models. Dissertation Presented for the Degree of MSc in Statistics and Operational Research.
- Kane, K., V. S. Y. Lo, and J. Zheng. 2014.
Mining for the Truly Responsive Customers and Prospects Using True-Lift Modeling: Comparison of New and Existing Methods. Journal of Marketing Analytics 2 (4): 218–238.
- Maciej Jaskowski and Szymon Jaroszewicz.
Uplift modeling for clinical trial data. ICML Workshop on Clinical Data Analysis, 2012.
- Lo, Victor. 2002.
The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing. SIGKDD Explorations. 4. 78-86.
- Zhao, Yan & Fang, Xiao & Simchi-Levi, David. 2017.
Uplift Modeling with Multiple Treatments and General Response Types. 10.1137/1.9781611974973.66.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for scikit_uplift-0.1.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | c3ce21238b6be78cedb47ecfdb36852199f945e65a5bff9ccff81f913fe118db |
|
MD5 | e92a8b2ef03108cca99e8a55a54a4d11 |
|
BLAKE2b-256 | 16bc373659bb6142cf7ddca1d75f471fddc6cfde045b2c824dabc9c5b1b41f8c |