Classic approaches to uplift modeling in scikit-learn style in Python
scikit-uplift
scikit-uplift is a Python module that implements classic approaches to uplift modeling and is built on top of scikit-learn.
Uplift prediction aims to estimate the causal impact of a treatment at the individual level.
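As a minimal illustration of the quantity being estimated, the sketch below computes the average uplift in a toy randomized experiment as the difference in response rates between the treated and control groups. The data and the helper name `average_uplift` are made up for this example and are not part of scikit-uplift; individual-level uplift models estimate the same difference conditional on each object's features.

```python
import numpy as np


def average_uplift(y, treatment):
    """Difference in mean outcome between treated (1) and control (0) units."""
    y = np.asarray(y, dtype=float)
    treatment = np.asarray(treatment)
    return y[treatment == 1].mean() - y[treatment == 0].mean()


# Toy data (hypothetical): 1 = responded, 0 = did not respond
y = [1, 1, 0, 1, 0, 0, 1, 0]
treatment = [1, 1, 1, 1, 0, 0, 0, 0]

# Treated response rate is 0.75, control response rate is 0.25
print(average_uplift(y, treatment))  # 0.5
```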
Read more about the uplift modeling problem in the User Guide, and in the articles in Russian on habr.com: Part 1 and Part 2.
Features:
- Convenient and intuitive modeling style, similar to scikit-learn;
- Works with any estimator that adheres to scikit-learn conventions;
- All approaches can be used in sklearn.pipeline (see the example: EN, RU);
- Almost all implemented approaches solve both classification and regression problems;
- Many metrics (such as Area Under Uplift Curve and Area Under Qini Curve) are implemented to evaluate your uplift model;
- Useful graphs for analyzing the built model.
Installation
Install the package from PyPI with the following command:
pip install scikit-uplift
Or install from source:
git clone https://github.com/maks-sh/scikit-uplift.git
cd scikit-uplift
python setup.py install
Documentation
The full documentation is available at scikit-uplift.readthedocs.io.
Or you can build the documentation locally using Sphinx 1.4 or later:
cd docs
pip install -r requirements.txt
make html
And if you now point your browser to _build/html/index.html, you should see a documentation site.
Quick Start
See the RetailHero tutorial notebook (EN, RU) for details.
Train and predict uplift model
# import approaches
from sklift.models import SoloModel, ClassTransformation, TwoModels
# import any estimator that adheres to scikit-learn conventions
from catboost import CatBoostClassifier

# define models
treatment_model = CatBoostClassifier(iterations=50, thread_count=3, random_state=42, silent=True)
control_model = CatBoostClassifier(iterations=50, thread_count=3, random_state=42, silent=True)

# define approach
tm = TwoModels(treatment_model, control_model, method='vanilla')

# fit model (X_train, y_train, treat_train: features, target and treatment flag)
tm = tm.fit(X_train, y_train, treat_train)

# predict uplift
uplift_preds = tm.predict(X_val)
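To make the idea behind the two-model approach concrete, here is a rough, self-contained sketch that does not use scikit-uplift at all: one classifier is fitted on the treated group, another on the control group, and uplift is taken as the difference of their predicted response probabilities. The synthetic data, the choice of `LogisticRegression`, and the name `uplift_sketch` are assumptions made for illustration; this is not the library's implementation of `TwoModels`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic randomized experiment (hypothetical data)
n = 2000
X = rng.normal(size=(n, 3))
treat = rng.integers(0, 2, size=n)
# The treatment raises the response probability only when the first feature is positive
p = 0.2 + 0.3 * treat * (X[:, 0] > 0)
y = rng.binomial(1, p)

# Simple split into training and validation parts
X_train, X_val = X[:1500], X[1500:]
y_train, treat_train = y[:1500], treat[:1500]

# Fit one model per group
model_t = LogisticRegression().fit(X_train[treat_train == 1], y_train[treat_train == 1])
model_c = LogisticRegression().fit(X_train[treat_train == 0], y_train[treat_train == 0])

# Uplift = P(y=1 | x, treated) - P(y=1 | x, control)
uplift_sketch = model_t.predict_proba(X_val)[:, 1] - model_c.predict_proba(X_val)[:, 1]

print(uplift_sketch.shape)  # (500,)
```

Fitting two independent models is simple, but the difference of two noisy predictions can itself be noisy, which is one reason the library also offers other approaches.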
Evaluate your uplift model
# import metrics to evaluate your model
from sklift.metrics import (
    uplift_at_k, uplift_auc_score, qini_auc_score, weighted_average_uplift
)

# Uplift@30%
tm_uplift_at_k = uplift_at_k(y_true=y_val, uplift=uplift_preds, treatment=treat_val,
                             strategy='overall', k=0.3)

# Area Under Qini Curve
tm_qini_auc = qini_auc_score(y_true=y_val, uplift=uplift_preds, treatment=treat_val)

# Area Under Uplift Curve
tm_uplift_auc = uplift_auc_score(y_true=y_val, uplift=uplift_preds, treatment=treat_val)

# Weighted average uplift
tm_wau = weighted_average_uplift(y_true=y_val, uplift=uplift_preds, treatment=treat_val)
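For intuition about what an uplift@k value measures, the sketch below ranks all objects by predicted uplift, keeps the top share `k`, and returns the difference in response rates between treated and control objects in that slice. It is a simplified illustration in plain NumPy; the helper name `uplift_at_k_sketch` and the toy arrays are hypothetical, and the library's `uplift_at_k` may differ in details such as the `strategy` parameter and edge-case handling.

```python
import numpy as np


def uplift_at_k_sketch(y_true, uplift, treatment, k=0.3):
    """Treated minus control response rate among the top-k share by predicted uplift."""
    y_true, uplift, treatment = map(np.asarray, (y_true, uplift, treatment))
    order = np.argsort(uplift)[::-1]  # highest predicted uplift first
    n_top = int(len(y_true) * k)
    top = order[:n_top]
    y_top, t_top = y_true[top], treatment[top]
    return y_top[t_top == 1].mean() - y_top[t_top == 0].mean()


# Toy validation data (hypothetical), already listed in descending uplift order
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 0])
treatment = np.array([1, 0, 1, 0, 1, 0, 1, 0])
uplift = np.array([0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

# Top 4 objects: treated response rate 1.0, control response rate 0.5
print(uplift_at_k_sketch(y_true, uplift, treatment, k=0.5))  # 0.5
```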
Visualize the results
# import visualization tools
from sklift.viz import plot_qini_curve

plot_qini_curve(y_true=y_val, uplift=uplift_preds, treatment=treat_val)
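If you want to see what kind of numbers sit behind such a plot, the following sketch computes the points of a Qini-style curve with NumPy under one common definition: objects are sorted by predicted uplift, and at each cut-off the cumulative number of responders in the control group, rescaled to the treated group size, is subtracted from the cumulative number of responders in the treated group. The function name `qini_curve_sketch` and the toy data are assumptions for illustration; the curve drawn by `plot_qini_curve` may be computed differently.

```python
import numpy as np


def qini_curve_sketch(y_true, uplift, treatment):
    """Return (number of objects targeted, incremental responders) for a Qini-style curve."""
    y_true, uplift, treatment = map(np.asarray, (y_true, uplift, treatment))
    order = np.argsort(uplift)[::-1]  # highest predicted uplift first
    y_sorted, t_sorted = y_true[order], treatment[order]

    n_treated = np.cumsum(t_sorted)                   # treated objects seen so far
    n_control = np.cumsum(1 - t_sorted)               # control objects seen so far
    y_treated = np.cumsum(y_sorted * t_sorted)        # treated responders so far
    y_control = np.cumsum(y_sorted * (1 - t_sorted))  # control responders so far

    # Rescale control responders to the treated group size; use 0 while no control object is seen
    ratio = np.divide(n_treated, n_control,
                      out=np.zeros(len(y_true)), where=n_control > 0)
    gain = y_treated - y_control * ratio

    x = np.arange(len(y_true) + 1)
    return x, np.concatenate(([0.0], gain))


# Toy validation data (hypothetical)
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 0])
treatment = np.array([1, 0, 1, 0, 1, 0, 1, 0])
uplift = np.array([0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

x, gain = qini_curve_sketch(y_true, uplift, treatment)
print(list(zip(x, gain)))
```

The last point of the curve is the overall incremental effect on the whole sample, so a good model should rise above the straight line from the origin to that point.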
Development
We welcome new contributors of all experience levels.
- Please see our Contributing Guide for more details.
- By participating in this project, you agree to abide by its Code of Conduct.
Important links
- Official source code repo: https://github.com/maks-sh/scikit-uplift/
- Issue tracker: https://github.com/maks-sh/scikit-uplift/issues
- Documentation: https://scikit-uplift.readthedocs.io/en/latest/
- User Guide: https://scikit-uplift.readthedocs.io/en/latest/user_guide/index.html
- Contributing guide: https://scikit-uplift.readthedocs.io/en/latest/contributing.html
- Release History: https://scikit-uplift.readthedocs.io/en/latest/changelog.html
Papers and materials
- Gutierrez, P., & Gérardy, J. Y. Causal Inference and Uplift Modelling: A Review of the Literature. In International Conference on Predictive Applications and APIs (pp. 1-13).
- Artem Betlei, Criteo Research; Eustache Diemert, Criteo Research; Massih-Reza Amini, Univ. Grenoble Alpes. Dependent and Shared Data Representations improve Uplift Prediction in Imbalanced Treatment Conditions. FAIM’18 Workshop on CausalML.
- Eustache Diemert, Artem Betlei, Christophe Renaudin, and Massih-Reza Amini. 2018. A Large Scale Benchmark for Uplift Modeling. In Proceedings of AdKDD & TargetAd (ADKDD’18). ACM, New York, NY, USA, 6 pages.
- Athey, Susan, and Imbens, Guido. 2015. Machine learning methods for estimating heterogeneous causal effects. Preprint, arXiv:1504.01132.
- Oscar Mesalles Naranjo. 2012. Testing a New Metric for Uplift Models. Dissertation Presented for the Degree of MSc in Statistics and Operational Research.
- Kane, K., V. S. Y. Lo, and J. Zheng. 2014. Mining for the Truly Responsive Customers and Prospects Using True-Lift Modeling: Comparison of New and Existing Methods. Journal of Marketing Analytics 2 (4): 218–238.
- Maciej Jaskowski and Szymon Jaroszewicz. Uplift modeling for clinical trial data. ICML Workshop on Clinical Data Analysis, 2012.
- Lo, Victor. 2002. The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing. SIGKDD Explorations 4: 78-86.
- Zhao, Yan & Fang, Xiao & Simchi-Levi, David. 2017. Uplift Modeling with Multiple Treatments and General Response Types. DOI: 10.1137/1.9781611974973.66.
- Nicholas J Radcliffe. 2007. Using control groups to target on predicted lift: Building and assessing uplift models. Direct Marketing Analytics Journal, (3):14–21.
- Devriendt, F., Guns, T., & Verbeke, W. 2020. Learning to rank for uplift modeling. ArXiv, abs/2002.05897.