
Structural Causal Models


OS status:

  • Linux: Python 3.7 - 3.9
  • Windows: Python 3.7 - 3.9
  • macOS: Python 3.7 - 3.9

A Python package implementing Structural Causal Models (SCM).

The library uses the computer algebra system (CAS) SymPy to let the user state arbitrary assignment functions and noise distributions, as supported by SymPy, and builds the DAG with networkx.

It supports the following features:

  • Sampling
  • Intervening
  • Plotting
  • Printing

and, by extension, all DAG methods provided by networkx, accessible through the member variable dag.

Installation

Either install via pip

pip install scmodels

or by cloning the repository and running setup.py:

git clone https://github.com/maichmueller/scm
cd scm
python setup.py install

Building an SCM

To build the DAG

X \rightarrow Y \leftarrow Z \rightarrow X

with the assignments

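Z \sim \text{LogLogistic}(\alpha=1, \beta=1)

X = 3Z^2 \cdot N, \quad N \sim \text{LogNormal}(\mu=1, \sigma=1)

Y = 2Z + \sqrt{X} + P, \quad P \sim \text{Normal}(\mu=2, \sigma=1)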
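To make the assignments concrete, the following is a minimal sketch, without scmodels, that draws joint samples by evaluating the variables in causal order (Z, then X, then Y) with Python's standard random module. The function name sample_by_hand is hypothetical. The sketch assumes that LogLogistic(alpha=1, beta=1) has CDF F(m) = m / (1 + m), sampled by inverse transform, and that LogNormal(mean=1, std=1) refers to the mean and standard deviation of the underlying normal (as in SymPy), which is what random.lognormvariate expects. The noise names M, N and P follow the string declarations below.

```python
import math
import random
from typing import Dict


def sample_by_hand(rng: random.Random) -> Dict[str, float]:
    """Draw one joint sample of (Z, X, Y), parents before children."""
    # M ~ LogLogistic(alpha=1, beta=1) via inverse CDF: F(m) = m / (1 + m)  =>  m = u / (1 - u)
    u = rng.random()  # u in [0, 1), so 1 - u > 0
    z = u / (1.0 - u)  # Z = M
    # N ~ LogNormal(mean=1, std=1): exp of a Normal(1, 1) draw
    n = rng.lognormvariate(1.0, 1.0)
    x = 3.0 * z ** 2 * n  # X = 3 Z^2 N
    # P ~ Normal(mean=2, std=1)
    p = rng.gauss(2.0, 1.0)
    y = 2.0 * z + math.sqrt(x) + p  # Y = 2Z + sqrt(X) + P
    return {"Z": z, "X": x, "Y": y}


rng = random.Random(0)
samples = [sample_by_hand(rng) for _ in range(5)]
for s in samples:
    print(f"{s['Z']:.6f} {s['X']:.6f} {s['Y']:.6f}")
```

Because X depends only on Z and Y depends on Z and X, computing the values in this order guarantees that every parent is available when its child is evaluated.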

There are 3 different ways of declaring the SCM:

1. List of Strings

Describe the assignments as strings of the form:

'VAR = FUNC(Noise, parent1, parent2, ...), Noise ~ DistributionXYZ'

Note that, for convenience, in this format one does not need to (and is not allowed to) restate the noise symbol's name inside the distribution, as would otherwise be necessary when constructing SymPy distributions directly.

from scmodels import SCM

assignment_seq = [
    "Z = M, M ~ LogLogistic(alpha=1, beta=1)",
    "X = N * 3 * Z ** 2, N ~ LogNormal(mean=1, std=1)",
    "Y = P + 2 * Z + sqrt(X), P ~ Normal(mean=2, std=1)"
]

myscm = SCM(assignment_seq)

Agreements:

  • The name of the noise variable in the distribution specification (e.g. P ~ Normal(mean=2, std=1)) must match the noise variable name (P) used in the assignment string.
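To illustrate what such a string contains, here is a hypothetical parser sketch; parse_assignment and ParsedAssignment are not part of scmodels, which presumably relies on SymPy for parsing. It uses Python's ast module to find the identifiers in the expression, treats the symbol after the last comma (before "~") as the noise, and derives the parents as the remaining names that are not called as functions. Rejecting a noise symbol that does not occur in the expression is a choice of this sketch, not documented library behavior.

```python
import ast
from typing import NamedTuple, Set


class ParsedAssignment(NamedTuple):
    var: str
    expression: str
    noise: str
    distribution: str
    parents: Set[str]


def _variable_names(expression: str) -> Set[str]:
    """Collect identifiers in the expression, skipping names used as called functions."""
    tree = ast.parse(expression, mode="eval")
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - called


def parse_assignment(line: str) -> ParsedAssignment:
    """Split 'VAR = FUNC(...), Noise ~ Dist(...)' into its parts (hypothetical helper)."""
    # The distribution may contain commas, so split on '~' first ...
    lhs, distribution = line.split("~", 1)
    # ... then the noise name is whatever follows the LAST comma on the left-hand side.
    assignment, noise = lhs.rsplit(",", 1)
    var, expression = assignment.split("=", 1)
    var, expression, noise = var.strip(), expression.strip(), noise.strip()
    names = _variable_names(expression)
    if noise not in names:
        # The noise name after the comma must be the symbol used in the expression.
        raise ValueError(f"noise symbol {noise!r} does not appear in {expression!r}")
    return ParsedAssignment(var, expression, noise, distribution.strip(), names - {noise})


parsed = parse_assignment("Y = P + 2 * Z + sqrt(X), P ~ Normal(mean=2, std=1)")
print(parsed.var, sorted(parsed.parents), parsed.noise, parsed.distribution)
```

Splitting on "~" before looking for the comma keeps the commas inside the distribution's argument list from being mistaken for the separator between assignment and noise.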

2. Assignment Map

One can construct the SCM via an assignment map, with the variables as keys and, as values, tuples defining the assignment and the noise.

2-Tuple: Assignments via SymPy parsing

To rely on SymPy's string parsing (this includes numpy functions), provide a dict entry whose value is a 2-tuple of the form:

'var': ('assignment string', noise)

from sympy.stats import LogLogistic, LogNormal, Normal


assignment_map = {
   "Z": (
       "M",
       LogLogistic("M", alpha=1, beta=1)
   ),
   "X": (
       "N * 3 * Z ** 2",
       LogNormal("N", mean=1, std=1),
   ),
   "Y": (
       "P + 2 * Z + sqrt(X)",
       Normal("P", mean=2, std=1),
   ),
}

myscm2 = SCM(assignment_map)

Agreements:

  • The name of the noise distribution provided in its constructor (e.g. Normal("P", mean=2, std=1)) must match the noise variable name (P) used in the assignment string.

3-Tuple: Assignments with arbitrary callables

One can also declare the SCM by specifying the variable assignments in a dictionary, with the variables as keys and, as values, sequences of length 3 of the form:

'var': (['parent1', 'parent2', ...], Callable, Noise)

This allows the user to supply complex functions that cannot be written as analytical expressions.

import numpy as np


def Y_assignment(p, z, x):
    return p + 2 * z + np.sqrt(x)


functional_map = {
   "Z": (
       [],
       lambda m: m,
       LogLogistic("M", alpha=1, beta=1)
   ),
   "X": (
       ["Z"],
       lambda n, z: n * 3 * z ** 2,
       LogNormal("N", mean=1, std=1),
   ),
   "Y": (
       ["Z", "X"],
       Y_assignment,
       Normal("P", mean=2, std=1),
   ),
}

myscm3 = SCM(functional_map)

Agreements:

  • The callable's first parameter MUST be the noise input (unless the noise distribution is None).
  • The order of the variables in the parents list determines the order in which their values are passed to the callable's remaining parameters (left to right).
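These two agreements determine how each entry's callable is invoked. The following is a minimal sketch of evaluating such a map, assuming plain Python samplers stand in for the SymPy noise distributions (scmodels itself uses SymPy random variables): variables are processed in topological order, the noise draw is passed first, and parent values follow in the order of the parents list. The names sample_once, Entry and loglogistic_1_1 are hypothetical; graphlib requires Python 3.9+, and math.sqrt replaces np.sqrt to stay within the standard library.

```python
import math
import random
from graphlib import TopologicalSorter  # standard library since Python 3.9
from typing import Callable, Dict, List, Optional, Tuple

# (parents, assignment callable, noise sampler or None); the sampler stands in for a SymPy RV
Entry = Tuple[List[str], Callable[..., float], Optional[Callable[[random.Random], float]]]


def sample_once(functional_map: Dict[str, Entry], rng: random.Random) -> Dict[str, float]:
    """Evaluate every variable once, computing parents before their children."""
    graph = {var: set(parents) for var, (parents, _, _) in functional_map.items()}
    values: Dict[str, float] = {}
    for var in TopologicalSorter(graph).static_order():
        parents, func, noise = functional_map[var]
        parent_values = [values[p] for p in parents]  # left to right, as listed
        if noise is None:
            values[var] = func(*parent_values)
        else:
            values[var] = func(noise(rng), *parent_values)  # noise is the first argument
    return values


def loglogistic_1_1(rng: random.Random) -> float:
    # Inverse CDF of LogLogistic(alpha=1, beta=1), whose CDF is F(m) = m / (1 + m)
    u = rng.random()
    return u / (1.0 - u)


def Y_assignment(p, z, x):
    return p + 2 * z + math.sqrt(x)


functional_map = {
    "Z": ([], lambda m: m, loglogistic_1_1),
    "X": (["Z"], lambda n, z: n * 3 * z ** 2, lambda rng: rng.lognormvariate(1.0, 1.0)),
    "Y": (["Z", "X"], Y_assignment, lambda rng: rng.gauss(2.0, 1.0)),
}

print(sample_once(functional_map, random.Random(0)))
```

Ordering by the parents lists rather than by dict insertion order means the map's keys may be given in any order.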

Features

Prettyprint

The SCM supports informative printing of its current setup, including any active interventions and all assignments.

print(myscm)
Structural Causal Model of 3 variables: Z, X, Y
Variables with active interventions: []
Assignments:
Z := f(M) = M	 [ M ~ LogLogistic(alpha=1, beta=1) ]
X := f(N, Z) = N * 3 * Z ** 2	 [ N ~ LogNormal(mean=1, std=1) ]
Y := f(P, Z, X) = P + 2 * Z + sqrt(X)	 [ P ~ Normal(mean=2, std=1) ]

For custom callable assignments, the output is less informative, since the functional form cannot be displayed:

print(myscm3)
Structural Causal Model of 3 variables: Z, X, Y
Variables with active interventions: []
Assignments:
Z := f(M) = __unknown__	 [ M ~ LogLogistic(alpha=1, beta=1) ]
X := f(N, Z) = __unknown__	 [ N ~ LogNormal(mean=1, std=1) ]
Y := f(P, Z, X) = __unknown__	 [ P ~ Normal(mean=2, std=1) ]

Interventions

Interventions on the variables are easy to perform, e.g. a do-intervention or a general intervention, which remodels the connections, assignments, and noise distributions. A general intervention is passed as a dict of the following form:

{var: (new parents (optional), new assignment (optional), new noise (optional))}

Any part of the original variable state that is meant to remain unchanged must be passed as None. For example, to give variable X a new callable assignment without changing its parents or noise, one would call:

my_new_callable = lambda n, z: n + z

myscm.intervention({"X": (None, my_new_callable, None)})

For the do-intervention \text{do}(X=1), one can use the helper method do_intervention. Its counterpart for noise interventions is called soft_intervention:

myscm.do_intervention([("X", 1)])

from sympy.stats import FiniteRV

myscm.soft_intervention([("X", FiniteRV(str(myscm["X"].noise), density={-1: .5, 1: .5}))])

Calling undo_intervention restores the passed variables to their original state from construction time. If no variables are specified (variables=None), all interventions are undone.

myscm.undo_intervention(variables=["X"])
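The None-means-unchanged convention can be sketched on a plain dict of (parents, assignment, noise) tuples. The helpers intervene, do and undo below are hypothetical illustrations, not the scmodels implementation. In this sketch a do-intervention cannot be expressed through intervene, because it must remove the noise while None already means "keep the old value"; do therefore replaces the entry directly with no parents, a constant assignment, and no noise. Undoing copies entries back from the untouched original map.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

# (parents, assignment, noise); noise may be any object (e.g. a SymPy RV) or None
Entry = Tuple[List[str], Callable[..., Any], Optional[Any]]
# (new parents, new assignment, new noise); None keeps the original part
Change = Tuple[Optional[List[str]], Optional[Callable[..., Any]], Optional[Any]]


def intervene(model: Dict[str, Entry], changes: Dict[str, Change]) -> Dict[str, Entry]:
    """Return a new map in which every non-None part of a change replaces the old part."""
    new_model = dict(model)  # shallow copy, so `model` stays available for undoing
    for var, (new_parents, new_assignment, new_noise) in changes.items():
        parents, assignment, noise = model[var]
        new_model[var] = (
            parents if new_parents is None else new_parents,
            assignment if new_assignment is None else new_assignment,
            noise if new_noise is None else new_noise,
        )
    return new_model


def do(model: Dict[str, Entry], var: str, value: Any) -> Dict[str, Entry]:
    """do(var=value): no parents, a constant assignment, and no noise."""
    new_model = dict(model)
    new_model[var] = ([], lambda: value, None)
    return new_model


def undo(current: Dict[str, Entry], original: Dict[str, Entry],
         variables: Optional[Iterable[str]] = None) -> Dict[str, Entry]:
    """Restore the given variables (all of them if None) from the original map."""
    restored = dict(current)
    for var in (original if variables is None else variables):
        restored[var] = original[var]
    return restored


base = {
    "Z": ([], lambda m: m, "M"),  # noise given as placeholder names in this sketch
    "X": (["Z"], lambda n, z: n * 3 * z ** 2, "N"),
}
changed = intervene(base, {"X": (None, lambda n, z: n + z, None)})
fixed = do(base, "X", 1)
restored = undo(fixed, base, variables=["X"])
```

Returning new maps instead of mutating base keeps the construction-time state around, which is all that undoing requires in this sketch.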

Sampling

The SCM allows drawing as many samples as needed through the method myscm.sample(n).

n = 5
myscm.sample(n)
Z X Y
0 3.130168 25.518928 13.524461
1 0.730453 6.036398 7.148895
2 0.179568 0.156701 3.149104
3 0.879909 6.787311 6.056273
4 1.710136 20.079351 8.894617

If unbounded sampling is desired, one can instead obtain a sampling generator:

container = {var: [] for var in myscm}
sampler = myscm.sample_iter(container)

container is an optional target dictionary to store the computed samples in.

import pandas as pd

for i in range(n):
    next(sampler)

pd.DataFrame.from_dict(container)
Z X Y
0 0.341271 1.271099 4.547078
1 2.722751 235.765034 22.591202
2 0.081638 0.107539 3.898544
3 2.745713 210.743838 21.806575
4 1.528015 9.768679 9.058807

If no target container is provided, the generator yields a new dict for every sample.

sample = next(myscm.sample_iter())
pd.DataFrame.from_dict(sample)
Z X Y
0 0.399457 3.369994 6.946475
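The container behavior of sample_iter can be sketched as an infinite generator. This is an illustrative assumption, not scmodels' code: the sample_iter below takes a hypothetical draw callable that returns one value per variable, appends to the container when one is given, and otherwise yields a fresh dict whose values are length-1 lists, consistent with the single-row DataFrame shown above.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Optional


def sample_iter(
    draw: Callable[[], Dict[str, float]],
    container: Optional[Dict[str, List[float]]] = None,
) -> Iterator[Dict[str, List[float]]]:
    """Yield samples forever; fill `container` if given, else yield a fresh dict each time."""
    while True:
        sample = draw()  # one value per variable
        if container is None:
            # A new dict per sample; length-1 lists make it a one-row DataFrame
            yield {var: [value] for var, value in sample.items()}
        else:
            for var, value in sample.items():
                container[var].append(value)
            yield container


# Usage with a hypothetical, deterministic draw function
_counter = itertools.count()


def toy_draw() -> Dict[str, float]:
    k = next(_counter)
    return {"Z": float(k), "X": 2.0 * k}


container = {"Z": [], "X": []}
sampler = sample_iter(toy_draw, container)
for _ in range(3):
    next(sampler)
print(container)  # {'Z': [0.0, 1.0, 2.0], 'X': [0.0, 2.0, 4.0]}
```

Since generator bodies run lazily, no sample is drawn until next() is called, so an unbounded sampler costs nothing until it is consumed.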

Plotting

If you have graphviz installed, you can plot the DAG by calling

myscm.plot(node_size=1000, alpha=1)

[Figure: example plot of the DAG of myscm]


No history capture as of yet.



Download files

  • Source distribution: scmodels-0.2.tar.gz (23.7 kB)
  • Built distribution: scmodels-0.2-py3-none-any.whl (18.6 kB, Python 3)
