Tools for causal inference

Causality

This package contains tools for causal analysis using observational (rather than experimental) datasets.

Installation

Assuming you have pip installed, run:

pip install causality

Causal Analysis

The simplest interface to this package is probably through the CausalDataFrame object in causality.analysis.CausalDataFrame. This is just an extension of the pandas.DataFrame object, and so it inherits the same methods.

The CausalDataFrame currently supports two kinds of causal analysis. First, it has a CausalDataFrame.zmean method. This method lets you control for a set of variables, z, when you're trying to estimate the effect of a discrete variable, x, on a continuous variable, y. It can return the y estimate at each x value and can also provide bootstrap error bars. For more details, check out the readme.
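To make "controlling for z" concrete, here is a minimal sketch in plain Python of the adjustment idea for a discrete z: average y within each (x, z) cell, then weight those cell means by how common each z value is overall. It illustrates the concept only and is not how zmean is implemented; the function name adjusted_means and the toy rows are hypothetical.

```python
from collections import defaultdict

def adjusted_means(rows):
    """Estimate the mean of y at each x while controlling for discrete z.

    rows is a list of (x, z, y) tuples. For each x, the per-stratum means
    E[y | x, z] are weighted by the overall frequency P(z).
    """
    z_counts = defaultdict(int)
    cell_sums = defaultdict(float)
    cell_counts = defaultdict(int)
    for x, z, y in rows:
        z_counts[z] += 1
        cell_sums[(x, z)] += y
        cell_counts[(x, z)] += 1

    n = len(rows)
    result = {}
    for x in sorted({x for x, _, _ in rows}):
        total = 0.0
        for z, count in z_counts.items():
            if cell_counts[(x, z)] == 0:
                # Without data in a cell, the adjustment is undefined.
                raise ValueError(f"no rows with x={x!r}, z={z!r}")
            total += (cell_sums[(x, z)] / cell_counts[(x, z)]) * (count / n)
        result[x] = total
    return result

# Hypothetical toy data: y = x + 2*z exactly, and x=1 is more common when z=1.
rows = (
    [(0, 0, 0.0)] * 3 + [(1, 0, 1.0)] * 1 +
    [(0, 1, 2.0)] * 1 + [(1, 1, 3.0)] * 3
)

# Naive per-x means, ignoring z.
naive = {
    x: sum(y for xi, _, y in rows if xi == x) / sum(1 for xi, _, _ in rows if xi == x)
    for x in (0, 1)
}
adjusted = adjusted_means(rows)

print(naive)     # {0: 0.5, 1: 2.5}
print(adjusted)  # {0: 1.0, 1: 2.0}
```

The naive difference between x=1 and x=0 is 2.0 because z is unevenly distributed across x; after adjustment the difference is 1.0, which is the coefficient on x used to build the toy data.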

The second kind of analysis supported is plotting, to show the effect of a discrete or continuous x on a continuous y while controlling for z. You can do this with the CausalDataFrame.zplot method. For details, check out the readme.

Measuring Causal Effects

The causality.estimation module contains tools for estimating causal effects from observational and experimental data. Most tools are parametric, like PropensityScoreMatching, and can be found in causality.estimation.parametric. Other models are non-parametric, and rely on directly estimating densities and using the g-estimation approach.
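As a rough illustration of the matching idea behind propensity score methods, the sketch below pairs each treated unit with the control unit whose score is closest and averages the outcome differences. It assumes the scores have already been estimated, and it is not the package's PropensityScoreMatching implementation; att_by_matching and the sample units are hypothetical.

```python
def att_by_matching(treated, controls):
    """Average treatment effect on the treated via nearest-score matching.

    treated and controls are lists of (score, outcome) pairs. Each treated
    unit is matched, with replacement, to the control with the closest score.
    """
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    diffs = []
    for score, outcome in treated:
        _, matched_outcome = min(controls, key=lambda c: abs(c[0] - score))
        diffs.append(outcome - matched_outcome)
    return sum(diffs) / len(diffs)

# Hypothetical units: (propensity score, outcome).
treated = [(0.8, 5.0), (0.6, 4.0), (0.3, 2.5)]
controls = [(0.75, 3.0), (0.55, 2.0), (0.25, 1.5), (0.1, 1.0)]

att = att_by_matching(treated, controls)
print(round(att, 4))  # 1.6667
```

Matching with replacement keeps the sketch short; a real analysis would also check how close the matches are and whether treated and control scores overlap.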

DAG Inference

The causality.inference module will contain various algorithms for inferring causal DAGs. Currently (2016/01/23), the only algorithm implemented is the IC* algorithm from Pearl (2000). It has decent test coverage, but feel free to write some more! I've left some stubs in tests/unit/test_IC.py.
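Graph search algorithms of this kind are built on conditional independence tests. As a minimal sketch of what such a test looks at, the code below computes a partial correlation by regressing the conditioning variable out of both sides and correlating the residuals. This is a simplified stand-in written with the standard library, not the package's RobustRegressionTest; the helper names are hypothetical, and the toy variables follow the same recipe as the example data in this section.

```python
import math
import random

def residuals(y, x):
    """Residuals of a least-squares fit of y on x (with intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [(yi - mean_y) - slope * (xi - mean_x) for xi, yi in zip(x, y)]

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
SIZE = 2000
x1 = [rng.gauss(0, 1) for _ in range(SIZE)]
x2 = [v + rng.gauss(0, 1) for v in x1]
x3 = [v + rng.gauss(0, 1) for v in x1]

# x2 and x3 share the common cause x1, so they are correlated marginally...
marginal = correlation(x2, x3)
# ...but the dependence should vanish once x1 is regressed out of both.
partial = correlation(residuals(x2, x1), residuals(x3, x1))

print("corr(x2, x3)      =", round(marginal, 3))
print("corr(x2, x3 | x1) =", round(partial, 3))
```

In this linear-Gaussian setting the marginal correlation should come out near 0.5 and the partial correlation near zero, which is the pattern a search algorithm uses to drop the x2, x3 edge.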

To run a graph search on a dataset, pass an independence test to the search algorithm and call its search method (using IC* as an example):

import numpy
import pandas as pd

from causality.inference.search import IC
from causality.inference.independence_tests import RobustRegressionTest

# generate some toy data:
SIZE = 2000
x1 = numpy.random.normal(size=SIZE)
x2 = x1 + numpy.random.normal(size=SIZE)
x3 = x1 + numpy.random.normal(size=SIZE)
x4 = x2 + x3 + numpy.random.normal(size=SIZE)
x5 = x4 + numpy.random.normal(size=SIZE)

# load the data into a dataframe:
X = pd.DataFrame({'x1' : x1, 'x2' : x2, 'x3' : x3, 'x4' : x4, 'x5' : x5})

# define the variable types: 'c' is 'continuous'.  The variables defined here
# are the ones the search is performed over  -- NOT all the variables defined
# in the data frame.
variable_types = {'x1' : 'c', 'x2' : 'c', 'x3' : 'c', 'x4' : 'c', 'x5' : 'c'}

# run the search
ic_algorithm = IC(RobustRegressionTest)
graph = ic_algorithm.search(X, variable_types)

Now, we have the inferred graph stored in graph. In this graph, each variable is a node (named from the DataFrame columns), and each edge represents statistical dependence between the nodes that can't be eliminated by conditioning on the variables specified for the search. If an edge can be oriented with the data available, the arrowhead is indicated in 'arrows'. If the edge also satisfies the local criterion for genuine causation, then that directed edge will have marked=True. If we print the edges from the result of our search, we can see which edges are oriented, and which satisfy the local criterion for genuine causation:

>>> graph.edges(data=True)
[('x2', 'x1', {'arrows': [], 'marked': False}),
 ('x2', 'x4', {'arrows': ['x4'], 'marked': False}),
 ('x3', 'x1', {'arrows': [], 'marked': False}),
 ('x3', 'x4', {'arrows': ['x4'], 'marked': False}),
 ('x4', 'x5', {'arrows': ['x5'], 'marked': True})]

We can see the edges from 'x2' to 'x4', 'x3' to 'x4', and 'x4' to 'x5' are all oriented toward the second of each pair. Additionally, we see that the edge from 'x4' to 'x5' satisfies the local criterion for genuine causation. This matches the structure given in figure 2.3(d) in Pearl (2000).
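The edge attributes can be turned into a more readable summary with a few lines of plain Python. The sketch below works on the edge list exactly as printed above; the describe helper is hypothetical and only interprets the 'arrows' and 'marked' attributes as described in this section.

```python
# Edge list copied from the search result shown above.
edges = [
    ('x2', 'x1', {'arrows': [], 'marked': False}),
    ('x2', 'x4', {'arrows': ['x4'], 'marked': False}),
    ('x3', 'x1', {'arrows': [], 'marked': False}),
    ('x3', 'x4', {'arrows': ['x4'], 'marked': False}),
    ('x4', 'x5', {'arrows': ['x5'], 'marked': True}),
]

def describe(u, v, attrs):
    """Render one edge as text, based on which endpoints carry arrowheads."""
    heads = set(attrs['arrows'])
    if heads == {u, v}:
        text = f"{u} <-> {v}"   # arrowheads at both ends
    elif heads == {v}:
        text = f"{u} --> {v}"
    elif heads == {u}:
        text = f"{v} --> {u}"
    else:
        text = f"{u} --- {v}"   # dependence found, orientation unknown
    if attrs['marked']:
        text += " (marked: genuine causation)"
    return text

summary = [describe(u, v, attrs) for u, v, attrs in edges]
for line in summary:
    print(line)
```

This prints the two unoriented edges involving 'x1' with `---`, the edges into 'x4' with `-->`, and flags the 'x4' to 'x5' edge as marked.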

Nonparametric Effects Estimation

The causality.estimation.nonparametric module contains a tool for non-parametrically estimating a causal distribution from an observational data set. You can supply an "admissible set" of variables to control for (the API spells the parameter admissable_set), and then measure either the causal effect distribution of an effect given the cause, or the expected value of the effect given the cause.

I've recently added adjustment for direct causes, where you can estimate the causal effect of fixing a set of X variables on a set of Y variables by adjusting for the parents of X in your graph. Using the dataset above, you can set this up as follows:

from causality.estimation.adjustments import AdjustForDirectCauses
from networkx import DiGraph

g = DiGraph()

g.add_nodes_from(['x1','x2','x3','x4', 'x5'])
g.add_edges_from([('x1','x2'),('x1','x3'),('x2','x4'),('x3','x4')])
adjustment = AdjustForDirectCauses()

Then, you can see the set of variables being adjusted for by

>>> print(adjustment.admissable_set(g, ['x2'], ['x3']))
set(['x1'])

If we hadn't adjusted for 'x1', we would have incorrectly found that 'x2' had a causal effect on 'x3', due to the confounding pathway x2, x1, x3. Adjusting for 'x1' removes this bias.
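The idea of adjusting for direct causes can be sketched without any library: collect the parents of the cause variables in the graph. The helper parents_of below is hypothetical and is not the code behind AdjustForDirectCauses; excluding the cause variables themselves from the result is an assumption of this sketch.

```python
# Same directed edges as the graph g above, written as (parent, child) pairs.
edges = [('x1', 'x2'), ('x1', 'x3'), ('x2', 'x4'), ('x3', 'x4')]

def parents_of(causes, edges):
    """Return the direct causes (parents) of the given nodes."""
    causes = set(causes)
    # Keep every tail whose head is one of the causes, minus the causes themselves.
    return {tail for tail, head in edges if head in causes} - causes

print(parents_of(['x2'], edges))  # {'x1'}
```

For the cause 'x2' this gives the same set, 'x1', that the adjustment object reports above.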

You can then estimate the causal effect of the intervention, P(x3|do(x2)), by passing the admissible set found by adjustment to CausalEffect:

>>> from causality.estimation.nonparametric import CausalEffect
>>> admissable_set = adjustment.admissable_set(g,['x2'], ['x3'])
>>> effect = CausalEffect(X, ['x2'], ['x3'], variable_types=variable_types, admissable_set=list(admissable_set))
>>> x = pd.DataFrame({'x2' : [0.], 'x3' : [0.]})
>>> effect.pdf(x)
0.268915603296

This is close to the correct value of 0.282 for a Gaussian with mean 0 and variance 2. If you change the value of 'x2', you'll find that the probability of 'x3' doesn't change. This is not true of the conditional distribution, P(x3|x2), since in this case observation and intervention are not equivalent.
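The reference values can be checked analytically for the toy model, where x1 is standard normal and x2 and x3 each equal x1 plus independent unit-variance noise. Under intervention, x3 is Gaussian with mean 0 and variance 2 regardless of x2; under observation, x3 given x2 is Gaussian with mean x2/2 and variance 3/2. The function names below are hypothetical, and the closed forms are derived from the toy model rather than taken from the package.

```python
import math

def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def p_x3_do_x2(x3, x2):
    # Intervening on x2 cuts its link to x1, so x3 = x1 + noise ~ N(0, 2),
    # whatever value x2 is set to.
    return normal_pdf(x3, 0.0, 2.0)

def p_x3_given_x2(x3, x2):
    # Observing x2 is informative about x1: x3 | x2 ~ N(x2 / 2, 3 / 2).
    return normal_pdf(x3, x2 / 2.0, 1.5)

for x2 in (0.0, 2.0):
    print(
        "x2 =", x2,
        "| P(x3=0 | do(x2)) =", round(p_x3_do_x2(0.0, x2), 3),
        "| P(x3=0 | x2) =", round(p_x3_given_x2(0.0, x2), 3),
    )
```

The interventional density stays at about 0.282 for both values of x2, while the conditional density moves from about 0.326 to about 0.233.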

Other Notes

This repository is in its early phases. The run-time for the tests is long. Many optimizations and additions are planned for the near future, including:

Implement fast mutual information calculation, O( N log N )
Speed up integrating out variables for controlling
Take a user-supplied graph, and find the set of admissible sets
Front-door criterion method for determining causal effects

Pearl, Judea. Causality. Cambridge University Press, 2000.

Release history

0.0.11 (this version): Mar 11, 2025
0.0.10: Nov 6, 2021
0.0.9: Dec 20, 2018
0.0.8: Dec 1, 2018
0.0.7: Dec 1, 2018
0.0.6: Jan 15, 2018
0.0.5: Nov 4, 2017
0.0.4: Jun 22, 2017
0.0.3: Feb 1, 2016
0.0.2: Jan 25, 2016
0.0.1: Jan 25, 2016
0.0.1a1 (pre-release): Jan 25, 2016

Download files

Source Distribution

causality-0.0.11.tar.gz (24.8 kB)

Uploaded Mar 11, 2025 Source

Built Distribution

causality-0.0.11-py3-none-any.whl (19.9 kB)

Uploaded Mar 11, 2025 Python 3

File details

Details for the file causality-0.0.11.tar.gz.

File metadata

Download URL: causality-0.0.11.tar.gz
Upload date: Mar 11, 2025
Size: 24.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for causality-0.0.11.tar.gz
Algorithm	Hash digest
SHA256	`0de1a3aec8be9be0bb9d4e3ea010d64ab48e873b9de36388a50fdb6197972116`
MD5	`37c1735cf09837c6b0545e8e46445422`
BLAKE2b-256	`d4d171dca24bceee4d65eb2415dcdc2c4c0a3d2fee62e48876a1569a52a72083`

File details

Details for the file causality-0.0.11-py3-none-any.whl.

File metadata

Download URL: causality-0.0.11-py3-none-any.whl
Upload date: Mar 11, 2025
Size: 19.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for causality-0.0.11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`495fed80cfc321192a1cb80b366fd19a9480bea51924e886f155713fce3109f1`
MD5	`c94def89739d29ae8709121aa3326168`
BLAKE2b-256	`b0dc4a6ca4a818b14b317f41366c62b5d71252eaf89ce96b61c994893cee0b07`
