This package contains several methods for calculating Conditional Average Treatment Effects

Project description

EconML: A Python Package for ML-Based Heterogeneous Treatment Effects Estimation

EconML is a Python package for estimating heterogeneous treatment effects from observational data via machine learning. This package was designed and built as part of the ALICE project at Microsoft Research with the goal of combining state-of-the-art machine learning techniques with econometrics to automate complex causal inference problems. The promise of EconML:

  • Implement recent techniques in the literature at the intersection of econometrics and machine learning
  • Maintain flexibility in modeling the effect heterogeneity (via techniques such as random forests, boosting, lasso and neural nets), while preserving the causal interpretation of the learned model and often offering valid confidence intervals
  • Use a unified API
  • Build on standard Python packages for Machine Learning and Data Analysis

In a nutshell, this toolkit is designed to measure the causal effect of some treatment variable(s) T on an outcome variable Y, controlling for a set of features X. For detailed information about the package, consult the documentation at https://econml.azurewebsites.net/.
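
For instance, a minimal end-to-end sketch on synthetic data (the data-generating process, array names, and the choice of LinearDMLCateEstimator here are purely illustrative; see the usage examples below for the full range of estimators):

    import numpy as np
    from sklearn.linear_model import LassoCV
    from econml.dml import LinearDMLCateEstimator

    # Illustrative synthetic observational data
    n = 1000
    X = np.random.normal(size=(n, 5))                       # features driving effect heterogeneity
    W = np.random.normal(size=(n, 10))                      # confounders to control for
    T = W[:, 0] + np.random.normal(size=n)                  # treatment influenced by a confounder
    Y = (1 + X[:, 0]) * T + W[:, 0] + np.random.normal(size=n)  # heterogeneous effect of T on Y

    est = LinearDMLCateEstimator(model_y=LassoCV(), model_t=LassoCV())
    est.fit(Y, T, X, W, inference='statsmodels')
    X_test = np.random.normal(size=(10, 5))
    treatment_effects = est.effect(X_test)                  # effect of moving T from 0 to 1 at each test point
    lb, ub = est.effect_interval(X_test, alpha=0.05)        # confidence intervals for those effects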

Introduction

About Treatment Effect Estimation

One of the biggest promises of machine learning is to automate decision making in a multitude of domains. At the core of many data-driven personalized decision scenarios is the estimation of heterogeneous treatment effects: what is the causal effect of an intervention on an outcome of interest for a sample with a particular set of features?

Such questions arise frequently in customer segmentation (what is the effect of placing a customer in one tier versus another), dynamic pricing (what is the effect of a pricing policy on demand) and medical studies (what is the effect of a treatment on a patient). In many such settings we have an abundance of observational data, where the treatment was chosen via some unknown policy, but the ability to run controlled A/B tests is limited.

Example Applications

Customer Targeting

Businesses offer personalized incentives to customers to increase sales and levels of engagement. Any such personalized intervention corresponds to a monetary investment, and the main question that business analysts are asked to answer is: what is the return on investment? Analyzing the ROI is inherently a treatment effect question: what was the effect of an investment on a customer's spend? Understanding how ROI varies across customers enables more targeted investment policies and increases overall ROI through better targeting.

Personalized Pricing

Personalized discounts are widespread in the digital economy. To set an optimal personalized discount policy, a business needs to understand how a drop in price affects a customer's demand for a product as a function of customer characteristics. The estimation of such personalized demand elasticities can also be phrased in the language of heterogeneous treatment effects: the treatment is the price, the outcome is the demand, and the effect is estimated as a function of observable features of the customer.

Stratification in Clinical Trials

Which patients should be selected for a clinical trial? If we want to demonstrate that a clinical treatment has an effect on at least some subset of a population, then fully randomized clinical trials are a poor fit, as they only estimate average effects. Heterogeneous treatment effect techniques let us use observational data to estimate these effects and identify good candidate patients for a clinical trial: those our model estimates to have large treatment effects.
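
As a rough sketch of that workflow on synthetic data (the data-generating process is illustrative and the simple S-learner used here is just one option; any estimator from the usage examples below would work):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from econml.metalearners import SLearner

    # Illustrative synthetic observational data with a binary treatment
    n = 2000
    X = np.random.normal(size=(n, 5))                        # patient characteristics
    T = np.random.binomial(1, 1 / (1 + np.exp(-X[:, 1])))    # treatment assigned by an unknown policy
    Y = (0.5 + X[:, 0]) * T + X[:, 1] + np.random.normal(size=n)

    est = SLearner(overall_model=GradientBoostingRegressor())
    est.fit(Y, T, X)
    effects = np.ravel(est.effect(X))                        # estimated per-patient treatment effects

    # Enroll the patients with the largest estimated effects in the trial
    top_k = 100
    candidates = np.argsort(effects)[::-1][:top_k]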

Learning Click-Through-Rates

In the design of a page layout and ad placement, it is important to understand the click-through-rate of page components in different positions on a page. A modern approach is to run multiple A/B tests, but when page components involve revenue considerations, observational data can help decide which A/B tests to run. Heterogeneous treatment effect estimation can provide estimates of the click-through-rate of page components from observational data. In this setting, the treatment is simply whether the component is placed in that page position and the response is whether the user clicked on it.
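
A hedged sketch of that setup (the synthetic click logs and variable names are stand-ins; the doubly robust learner is the one shown in the usage examples below):

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
    from econml.drlearner import LinearDRLearner

    # Illustrative synthetic logs: binary placement treatment, binary click outcome
    n = 5000
    X = np.random.normal(size=(n, 4))                        # page and user features
    W = np.random.normal(size=(n, 6))                        # additional controls
    T = np.random.binomial(1, 1 / (1 + np.exp(-W[:, 0])))    # placement chosen by an unknown policy
    click_prob = np.clip(0.1 + 0.05 * T * (1 + X[:, 0]) + 0.02 * W[:, 0], 0, 1)
    Y = np.random.binomial(1, click_prob)                    # click indicator

    est = LinearDRLearner(model_propensity=GradientBoostingClassifier(),
                          model_regression=GradientBoostingRegressor())
    est.fit(Y, T, X, W, inference='statsmodels')
    ctr_lift = est.effect(X)                                 # estimated click-through-rate lift from placement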

News

January 10, 2020: Release v0.6.1, see release notes here

December 6, 2019: Release v0.6, see release notes here

November 21, 2019: Release v0.5, see release notes here.

June 3, 2019: Release v0.4, see release notes here.

May 3, 2019: Release v0.3, see release notes here.

April 10, 2019: Release v0.2, see release notes here.

March 6, 2019: Release v0.1; try it out and provide feedback.

Getting Started

Installation

Install the latest release from PyPI:

pip install econml

To install from source, see the For Developers section below.

Usage Examples

  • Double Machine Learning

    from econml.dml import LinearDMLCateEstimator
    from sklearn.linear_model import LassoCV
    
    est = LinearDMLCateEstimator(model_y=LassoCV(), model_t=LassoCV())
    est.fit(Y, T, X, W, inference='statsmodels') # W -> high-dimensional confounders, X -> features
    treatment_effects = est.effect(X_test)
    lb, ub = est.effect_interval(X_test, alpha=0.05) # Confidence intervals via OLS asymptotics
    
    from econml.dml import LinearDMLCateEstimator
    from sklearn.linear_model import LassoCV
    from econml.inference import BootstrapInference
    
    est = LinearDMLCateEstimator(model_y=LassoCV(), model_t=LassoCV())
    est.fit(Y, T, X, W, inference='bootstrap')  # with default bootstrap parameters
    est.fit(Y, T, X, W, inference=BootstrapInference(n_bootstrap_samples=100))  # or customized
    treatment_effects = est.effect(X_test)
    lb, ub = est.effect_interval(X_test, alpha=0.05) # Bootstrap confidence intervals
    
    from econml.dml import SparseLinearDMLCateEstimator
    from sklearn.linear_model import LassoCV
    
    est = SparseLinearDMLCateEstimator(model_y=LassoCV(), model_t=LassoCV())
    est.fit(Y, T, X, W, inference='debiasedlasso') # X -> high dimensional features
    treatment_effects = est.effect(X_test)
    lb, ub = est.effect_interval(X_test, alpha=0.05) # Confidence intervals via debiased lasso
    
    from econml.dml import ForestDMLCateEstimator
    from sklearn.ensemble import GradientBoostingRegressor
    
    est = ForestDMLCateEstimator(model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor())
    est.fit(Y, T, X, W, inference='blb') 
    treatment_effects = est.effect(X_test)
    # Confidence intervals via Bootstrap-of-Little-Bags for forests
    lb, ub = est.effect_interval(X_test, alpha=0.05)
    
  • Orthogonal Random Forests

    from econml.ortho_forest import ContinuousTreatmentOrthoForest
    from econml.sklearn_extensions.linear_model import WeightedLasso, WeightedLassoCV
    # Use defaults
    est = ContinuousTreatmentOrthoForest()
    # Or specify hyperparameters
    est = ContinuousTreatmentOrthoForest(n_trees=500, min_leaf_size=10, 
                                        max_depth=10, subsample_ratio=0.7,
                                        lambda_reg=0.01,
                                        model_T=WeightedLasso(alpha=0.01), model_Y=WeightedLasso(alpha=0.01),
                                        model_T_final=WeightedLassoCV(cv=3), model_Y_final=WeightedLassoCV(cv=3))
    est.fit(Y, T, X, W)
    treatment_effects = est.effect(X_test)
    
  • Meta-Learners

    from econml.metalearners import XLearner
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
    import numpy as np
    
    est = XLearner(models=GradientBoostingRegressor(),
                  propensity_model=GradientBoostingClassifier(),
                  cate_models=GradientBoostingRegressor())
    est.fit(Y, T, np.hstack([X, W]))
    treatment_effects = est.effect(np.hstack([X_test, W_test]))
    
    # Fit with bootstrap confidence interval construction enabled
    est.fit(Y, T, np.hstack([X, W]), inference='bootstrap')
    treatment_effects = est.effect(np.hstack([X_test, W_test]))
    lb, ub = est.effect_interval(np.hstack([X_test, W_test]), alpha=0.05) # Bootstrap CIs
    
    from econml.metalearners import SLearner
    from sklearn.ensemble import GradientBoostingRegressor
    import numpy as np
    
    est = SLearner(overall_model=GradientBoostingRegressor())
    est.fit(Y, T, np.hstack([X, W]))
    treatment_effects = est.effect(np.hstack([X_test, W_test]))
    
    from econml.metalearners import TLearner
    from sklearn.ensemble import GradientBoostingRegressor
    import numpy as np
    
    est = TLearner(models=GradientBoostingRegressor())
    est.fit(Y, T, np.hstack([X, W]))
    treatment_effects = est.effect(np.hstack([X_test, W_test]))
    
  • Doubly Robust Learner

    from econml.drlearner import LinearDRLearner
    from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
    
    est = LinearDRLearner(model_propensity=GradientBoostingClassifier(),
                          model_regression=GradientBoostingRegressor())
    est.fit(Y, T, X, W, inference='statsmodels')
    treatment_effects = est.effect(X_test)
    lb, ub = est.effect_interval(X_test, alpha=0.05)
    
    from econml.drlearner import SparseLinearDRLearner
    from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
    
    est = SparseLinearDRLearner(model_propensity=GradientBoostingClassifier(),
                                model_regression=GradientBoostingRegressor())
    est.fit(Y, T, X, W, inference='debiasedlasso')
    treatment_effects = est.effect(X_test)
    lb, ub = est.effect_interval(X_test, alpha=0.05)
    
    from econml.drlearner import ForestDRLearner
    from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
    
    est = ForestDRLearner(model_propensity=GradientBoostingClassifier(),
                          model_regression=GradientBoostingRegressor())
    est.fit(Y, T, X, W, inference='blb') 
    treatment_effects = est.effect(X_test)
    lb, ub = est.effect_interval(X_test, alpha=0.05)
    
  • Deep Instrumental Variables

    import keras
    from econml.deepiv import DeepIVEstimator
    
    treatment_model = keras.Sequential([keras.layers.Dense(128, activation='relu', input_shape=(2,)),
                                       keras.layers.Dropout(0.17),
                                       keras.layers.Dense(64, activation='relu'),
                                       keras.layers.Dropout(0.17),
                                       keras.layers.Dense(32, activation='relu'),
                                       keras.layers.Dropout(0.17)])
    response_model = keras.Sequential([keras.layers.Dense(128, activation='relu', input_shape=(2,)),
                                      keras.layers.Dropout(0.17),
                                      keras.layers.Dense(64, activation='relu'),
                                      keras.layers.Dropout(0.17),
                                      keras.layers.Dense(32, activation='relu'),
                                      keras.layers.Dropout(0.17),
                                      keras.layers.Dense(1)])
    est = DeepIVEstimator(n_components=10, # Number of Gaussians in the mixture density network
                          m=lambda z, x: treatment_model(keras.layers.concatenate([z, x])), # Treatment model
                          h=lambda t, x: response_model(keras.layers.concatenate([t, x])), # Response model
                          n_samples=1 # Number of samples used to estimate the response
                          )
    est.fit(Y, T, X, Z) # Z -> instrumental variables
    treatment_effects = est.effect(X_test)
    

To see more complex examples, go to the notebooks section of the repository. For a more detailed description of the treatment effect estimation algorithms, see the EconML documentation.

For Developers

You can get started by cloning this repository. We use setuptools for building and distributing our package. We rely on some recent features of setuptools, so make sure to upgrade to a recent version with pip install setuptools --upgrade. Then from your local copy of the repository you can run python setup.py develop to get started.

Running the tests

This project uses pytest for testing. To run tests locally after installing the package, you can use python setup.py pytest.

Generating the documentation

This project's documentation is generated via Sphinx. Note that we use graphviz's dot application to produce some of the images in our documentation, so you should make sure that dot is installed and in your path.

To generate a local copy of the documentation from a clone of this repository, just run python setup.py build_sphinx -W -E -a, which will build the documentation and place it under the build/sphinx/html path.

The reStructuredText files that make up the documentation are stored in the docs directory; module documentation is automatically generated by the Sphinx build process.

Blogs and Publications

Citation

If you use EconML in your research, please cite us as follows:

Microsoft Research. EconML: A Python Package for ML-Based Heterogeneous Treatment Effects Estimation. https://github.com/microsoft/EconML, 2019. Version 0.x.

BibTeX:

@misc{econml,
  author={Microsoft Research},
  title={{EconML}: {A Python Package for ML-Based Heterogeneous Treatment Effects Estimation}},
  howpublished={https://github.com/microsoft/EconML},
  note={Version 0.x},
  year={2019}
}

Contributing and Feedback

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

References

V. Syrgkanis, V. Lei, M. Oprescu, M. Hei, K. Battocchi, G. Lewis. Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments. Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), 2019 (Spotlight Presentation).

D. Foster, V. Syrgkanis. Orthogonal Statistical Learning. Proceedings of the 32nd Annual Conference on Learning Theory (COLT), 2019 (Best Paper Award).

M. Oprescu, V. Syrgkanis and Z. S. Wu. Orthogonal Random Forest for Causal Inference. Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.

V. Chernozhukov, D. Nekipelov, V. Semenova, V. Syrgkanis. Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models. ArXiv preprint arXiv:1806.04823, 2018.

Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy. Deep IV: A flexible approach for counterfactual prediction. Proceedings of the 34th International Conference on Machine Learning, ICML'17, 2017.

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. Newey. Double Machine Learning for Treatment and Causal Parameters. ArXiv preprint arXiv:1608.00060, 2016.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

econml-0.6.1.tar.gz (254.2 kB)

Uploaded Source

Built Distribution

econml-0.6.1-py3-none-any.whl (278.3 kB)

Uploaded Python 3

File details

Details for the file econml-0.6.1.tar.gz.

File metadata

  • Download URL: econml-0.6.1.tar.gz
  • Upload date:
  • Size: 254.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.6.9

File hashes

Hashes for econml-0.6.1.tar.gz

  • SHA256: f8e383b882ed527bdb3e22eadf8541c494faf1a3fd8a5e93a32f78420cb35cf7
  • MD5: 28ab9ac6fae9de29fcd2949a87fb0cde
  • BLAKE2b-256: 85665595f9b9fe2d241183891fd7e4aa6255affb26785706af00ee3198a8d60a

See more details on using hashes here.

File details

Details for the file econml-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: econml-0.6.1-py3-none-any.whl
  • Upload date:
  • Size: 278.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.6.9

File hashes

Hashes for econml-0.6.1-py3-none-any.whl

  • SHA256: e3298224c236bd117eed85e4c8b242e4a7caa9ddccb1381d1969ee046528dd23
  • MD5: 5463242492ea0ce0b206120a9d3c8c08
  • BLAKE2b-256: 904fdd76f7be30ec7294b1cfd46b5622b64c29bcdd6a1709205826eb14176c52

See more details on using hashes here.
