Skip to main content

A Python package for flexible and modular causal inference modeling

Project description

CI Status Code Climate coverage PyPI Documentation Status Binder Slack channel Slack channel Downloads

Causal Inference 360

A Python package for inferring causal effects from observational data.

Description

Causal inference analysis enables estimating the causal effect of an intervention on some outcome from real-world non-experimental observational data.

This package provides a suite of causal methods, under a unified scikit-learn-inspired API. It implements meta-algorithms that allow plugging in arbitrarily complex machine learning models. This modular approach supports highly-flexible causal modelling. The fit-and-predict-like API makes it possible to train on one set of examples and estimate an effect on the other (out-of-bag), which allows for a more "honest"1 effect estimation.

The package also includes an evaluation suite. Since most causal-models utilize machine learning models internally, we can diagnose poor-performing models by re-interpreting known ML evaluations from a causal perspective.

If you use the package, please consider citing Shimoni et al., 2019:

Reference
@article{causalevaluations,
  title={An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference},
  author={Shimoni, Yishai and Karavani, Ehud and Ravid, Sivan and Bak, Peter and Ng, Tan Hung and Alford, Sharon Hensley and Meade, Denise and Goldschmidt, Yaara},
  journal={arXiv preprint arXiv:1906.00442},
  year={2019}
}

1 Borrowing Wager & Athey terminology of avoiding overfit.

Installation

pip install causallib

Usage

The package is imported using the name causallib. Each causal model requires an internal machine-learning model. causallib supports any model that has a sklearn-like fit-predict API (note some models might require a predict_proba implementation). For example:

from sklearn.linear_model import LogisticRegression
from causallib.estimation import IPW 
from causallib.datasets import load_nhefs

data = load_nhefs()
ipw = IPW(LogisticRegression())
ipw.fit(data.X, data.a)
potential_outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])

Comprehensive Jupyter Notebooks examples can be found in the examples directory.

Community support

We use the Slack workspace at causallib.slack.com for informal communication. We encourage you to ask questions regarding causal-inference modelling or usage of causallib that don't necessarily merit opening an issue on Github.

Use this invite link to join causallib on Slack.

Approach to causal-inference

Some key points on how we address causal-inference estimation

1. Emphasis on potential outcome prediction

Causal effect may be the desired outcome. However, every effect is defined by two potential (counterfactual) outcomes. We adopt this two-step approach by separating the effect-estimating step from the potential-outcome-prediction step. A beneficial consequence to this approach is that it better supports multi-treatment problems where "effect" is not well-defined.

2. Stratified average treatment effect

The causal inference literature devotes special attention to the population on which the effect is estimated on. For example, ATE (average treatment effect on the entire sample), ATT (average treatment effect on the treated), etc. By allowing out-of-bag estimation, we leave this specification to the user. For example, ATE is achieved by model.estimate_population_outcome(X, a) and ATT is done by stratifying on the treated: model.estimate_population_outcome(X.loc[a==1], a.loc[a==1])

3. Families of causal inference models

We distinguish between two types of models:

  • Weight models: weight the data to balance between the treatment and control groups, and then estimates the potential outcome by using a weighted average of the observed outcome. Inverse Probability of Treatment Weighting (IPW or IPTW) is the most known example of such models.
  • Direct outcome models: uses the covariates (features) and treatment assignment to build a model that predicts the outcome directly. The model can then be used to predict the outcome under any assignment of treatment values, specifically the potential-outcome under assignment of all controls or all treated.
    These models are usually known as Standardization models, and it should be noted that, currently, they are the only ones able to generate individual effect estimation (otherwise known as CATE).
4. Confounders and DAGs

One of the most important steps in causal inference analysis is to have proper selection on both dimensions of the data to avoid introducing bias:

  • On rows: thoughtfully choosing the right inclusion\exclusion criteria for individuals in the data.
  • On columns: thoughtfully choosing what covariates (features) act as confounders and should be included in the analysis.

This is a place where domain expert knowledge is required and cannot be fully and truly automated by algorithms. This package assumes that the data provided to the model fit the criteria. However, filtering can be applied in real-time using a scikit-learn pipeline estimator that chains preprocessing steps (that can filter rows and select columns) with a causal model at the end.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

causallib-0.10.0.tar.gz (2.0 MB view details)

Uploaded Source

Built Distribution

causallib-0.10.0-py3-none-any.whl (2.2 MB view details)

Uploaded Python 3

File details

Details for the file causallib-0.10.0.tar.gz.

File metadata

  • Download URL: causallib-0.10.0.tar.gz
  • Upload date:
  • Size: 2.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for causallib-0.10.0.tar.gz
Algorithm Hash digest
SHA256 e4e384ef3668c371c7c05515100d114ef2c1ba06f78a30d5b9a1de8922afbc0a
MD5 60cf1f64b4eedadb6af1c0bcd92d7d42
BLAKE2b-256 145f1107d0a84ba164977d176d7435257e4ad65c1aaeae31ea5bcf76161240dc

See more details on using hashes here.

File details

Details for the file causallib-0.10.0-py3-none-any.whl.

File metadata

  • Download URL: causallib-0.10.0-py3-none-any.whl
  • Upload date:
  • Size: 2.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for causallib-0.10.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ebb0293ce633df7c667fa284706538c6c7e2c66376870d2c1325d0e1a4bbd76e
MD5 5479148802a4321e0ec26ecf2e7d3dba
BLAKE2b-256 ba61bdfef05e198e6e8702c84efb92993bcd6fd611b9cfdb0ae1ac19f3f005b1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page