A library of causal inference tools by IBM Haifa Research Labs
Causal Inference 360
A Python package for inferring causal effects from observational data.
Description
Causal inference analysis enables estimating the causal effect of an intervention on some outcome from real-world, non-experimental observational data.
This package provides a suite of causal methods
under a unified scikit-learn-inspired API.
It implements meta-algorithms that allow plugging in arbitrarily complex machine learning models.
This modular approach supports highly flexible causal modelling.
The fit-and-predict-like API makes it possible to train on one set of examples
and estimate an effect on another (out-of-bag),
which allows for a more "honest"^{1} effect estimation.
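The train-on-one-set, estimate-on-another idea can be sketched in plain Python. This is an illustrative toy under stated assumptions, not causallib's implementation: StratumPropensity is a hypothetical stand-in for a weight model that estimates the probability of treatment within each value of a single discrete covariate, and all data are made up.

```python
from collections import defaultdict


class StratumPropensity:
    """Hypothetical toy weight model: P(a=1 | x) estimated per covariate value."""

    def fit(self, X, a):
        counts = defaultdict(lambda: [0, 0])  # x -> [n_control, n_treated]
        for x_i, a_i in zip(X, a):
            counts[x_i][a_i] += 1
        self.p_treated_ = {x: n[1] / (n[0] + n[1]) for x, n in counts.items()}
        return self

    def compute_weights(self, X, a):
        # Inverse probability of the treatment each individual actually received.
        weights = []
        for x_i, a_i in zip(X, a):
            p = self.p_treated_[x_i]
            weights.append(1 / p if a_i == 1 else 1 / (1 - p))
        return weights

    def estimate_population_outcome(self, X, a, y):
        # Weighted average of the observed outcome within each treatment arm.
        w = self.compute_weights(X, a)
        outcomes = {}
        for arm in (0, 1):
            num = sum(w_i * y_i for w_i, a_i, y_i in zip(w, a, y) if a_i == arm)
            den = sum(w_i for w_i, a_i in zip(w, a) if a_i == arm)
            outcomes[arm] = num / den
        return outcomes


# Fit the weight model on one set of examples (made-up data) ...
X_train = [0, 0, 0, 0, 1, 1, 1, 1]
a_train = [0, 0, 0, 1, 0, 1, 1, 1]
model = StratumPropensity().fit(X_train, a_train)

# ... and estimate outcomes on a different, held-out set.
X_test = [0, 0, 1, 1]
a_test = [0, 1, 0, 1]
y_test = [1.0, 3.0, 2.0, 5.0]
outcomes = model.estimate_population_outcome(X_test, a_test, y_test)
effect = outcomes[1] - outcomes[0]
```

Because the weights applied to the held-out rows come from a model that never saw them, the estimate is not flattered by a model that overfits its own training data.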
The package also includes an evaluation suite. Since most causal models utilize machine learning models internally, we can diagnose poor-performing models by reinterpreting known ML evaluations from a causal perspective. See arXiv:1906.00442 for more details.
^{1} Borrowing Wager & Athey's terminology for avoiding overfitting.
Installation
pip install causallib
Usage
In general, the package is imported using the name causallib.
Every causal model requires an internal machine-learning model.
causallib supports any model that has a sklearn-like fit-predict API
(note that some models might require a predict_proba implementation).
For example:
    from sklearn.linear_model import LogisticRegression
    from causallib.estimation import IPW
    from causallib.datasets import load_nhefs

    data = load_nhefs()

    # Wrap a scikit-learn classifier in an inverse probability weighting model
    ipw = IPW(LogisticRegression())
    ipw.fit(data.X, data.a)

    # First predict the potential outcomes, then contrast them to get an effect
    potential_outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
    effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])
Comprehensive Jupyter Notebook examples can be found in the examples directory.
Approach to causal inference
Some key points on how we address causal inference estimation
1. Emphasis on potential outcome prediction
A causal effect may be the desired end result.
However, every effect is defined by two potential (counterfactual) outcomes.
We adopt this two-step approach by separating the effect-estimating step
from the potential-outcome-prediction step.
A beneficial consequence of this approach is that it better supports
multi-treatment problems where "effect" is not well-defined.
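The separation of the two steps can be sketched as follows. This is a minimal illustration, not the package's code: contrast_outcomes is a hypothetical helper, the treatment names are invented, and the outcome values are made up rather than estimated from data.

```python
def contrast_outcomes(outcome_1, outcome_0, effect_types=("diff", "ratio")):
    """Hypothetical helper: contrast two already-estimated population outcomes."""
    calculators = {
        "diff": lambda y1, y0: y1 - y0,
        "ratio": lambda y1, y0: y1 / y0,
        # Odds ratio, meaningful only when outcomes are probabilities in (0, 1).
        "or": lambda y1, y0: (y1 / (1 - y1)) / (y0 / (1 - y0)),
    }
    return {name: calculators[name](outcome_1, outcome_0) for name in effect_types}


# Step 1 (assumed already done by some causal model): one predicted
# population outcome per treatment value. These numbers are made up.
potential_outcomes = {"placebo": 0.20, "drug_a": 0.25, "drug_b": 0.40}

# Step 2: the user chooses which pair of outcomes defines "the effect".
a_vs_placebo = contrast_outcomes(
    potential_outcomes["drug_a"], potential_outcomes["placebo"]
)
b_vs_a = contrast_outcomes(
    potential_outcomes["drug_b"],
    potential_outcomes["drug_a"],
    effect_types=("diff", "or"),
)
```

With three treatment values there is no single "effect", but every pairwise contrast can be derived from the same set of predicted potential outcomes.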
2. Stratified average treatment effect
The causal inference literature devotes special attention to the population
on which the effect is estimated.
For example, ATE (average treatment effect on the entire sample),
ATT (average treatment effect on the treated), etc.
By allowing out-of-bag estimation, we leave this specification to the user.
For example, ATE is achieved by model.estimate_population_outcome(X, a),
and ATT is done by stratifying on the treated: model.estimate_population_outcome(X.loc[a==1], a.loc[a==1]).
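The following plain-Python sketch shows why choosing the target population amounts to choosing rows before averaging. It assumes per-individual potential-outcome predictions are already available from some fitted outcome model; the lists y0_hat and y1_hat and the helper average_effect are hypothetical, and the numbers are made up.

```python
# Hypothetical predictions from an already-fitted outcome model: the outcome
# each individual would have under control (y0_hat) and treatment (y1_hat).
a = [1, 1, 0, 0, 0]  # observed treatment assignment
y0_hat = [2.0, 3.0, 1.0, 1.0, 2.0]
y1_hat = [5.0, 5.0, 2.0, 1.5, 2.5]


def mean(values):
    return sum(values) / len(values)


def average_effect(rows):
    """Difference of mean potential outcomes over the chosen row indices."""
    return mean([y1_hat[i] for i in rows]) - mean([y0_hat[i] for i in rows])


everyone = list(range(len(a)))
treated = [i for i in everyone if a[i] == 1]
controls = [i for i in everyone if a[i] == 0]

ate = average_effect(everyone)  # effect on the entire sample
att = average_effect(treated)   # effect on the treated
atc = average_effect(controls)  # effect on the controls
```

The same predictions yield three different estimands, so the model itself does not need to know which one the user is after.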
3. Families of causal inference models
We distinguish between two types of models:
- Weight models: weight the data to balance between the treatment and control groups,
  and then estimate the potential outcome by using a weighted average of the observed outcome.
  Inverse Probability of Treatment Weighting (IPW or IPTW) is the best-known example of such models.
- Direct outcome models: use the covariates (features) and treatment assignment to build a
  model that predicts the outcome directly. The model can then be used to predict the outcome
  under any assignment of treatment values, specifically the potential outcome under assignment of
  all controls or all treated.
  These models are usually known as Standardization models, and it should be noted that, currently, they are the only ones able to generate individual effect estimation (otherwise known as CATE).
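To illustrate why direct outcome models can produce individual-level estimates, here is a toy sketch in plain Python. It is not causallib's Standardization class: StratumOutcomeModel is a hypothetical model that predicts the mean observed outcome within each (covariate, treatment) cell, and the data are made up.

```python
from collections import defaultdict


class StratumOutcomeModel:
    """Hypothetical toy outcome model: mean outcome per (covariate, treatment) cell."""

    def fit(self, X, a, y):
        cells = defaultdict(list)
        for x_i, a_i, y_i in zip(X, a, y):
            cells[(x_i, a_i)].append(y_i)
        self.cell_means_ = {cell: sum(v) / len(v) for cell, v in cells.items()}
        self.treatment_values_ = sorted(set(a))
        return self

    def estimate_individual_outcome(self, X):
        """Predict each individual's outcome under every treatment value."""
        return [
            {t: self.cell_means_[(x_i, t)] for t in self.treatment_values_}
            for x_i in X
        ]


# Made-up data: one binary covariate, binary treatment, continuous outcome.
X = [0, 0, 0, 1, 1, 1]
a = [0, 0, 1, 0, 1, 1]
y = [1.0, 3.0, 4.0, 2.0, 7.0, 9.0]

model = StratumOutcomeModel().fit(X, a, y)
individual = model.estimate_individual_outcome(X)

# Individual effects: each person's predicted treated-minus-control outcome.
individual_effects = [row[1] - row[0] for row in individual]

# Population outcomes: average the individual predictions under each treatment.
population_outcomes = {
    t: sum(row[t] for row in individual) / len(individual) for t in (0, 1)
}
```

A weight model, by contrast, only reweights observed outcomes, so it yields population-level averages but no prediction for an individual under the treatment they did not receive.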
4. Confounders and DAGs
One of the most important steps in causal inference analysis is to have proper selection on both dimensions of the data to avoid introducing bias:
- On rows: thoughtfully choosing the right inclusion/exclusion criteria for individuals in the data.
- On columns: thoughtfully choosing which covariates (features) act as confounders and should be included in the analysis.
This is where domain-expert knowledge is required; the selection cannot be fully automated
by algorithms.
This package assumes that the data provided to the model fits these criteria.
However, filtering can be applied in real time using a scikit-learn pipeline estimator
that chains preprocessing steps (that can filter rows and select columns) with a causal model at the end.