A package for causal discovery and causal inference algorithms

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

MagPy: Causal Discovery and Effect Estimation Framework

MagPy is a Python framework for causal discovery and effect estimation, aimed at uncovering causal relationships in observational data and estimating the impact of interventions and counterfactuals.

This is an experimental project, currently under active development.

Installation

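Install from PyPI:

pip install causal-magpy

Or clone the repository and install from source:

git clone https://github.com/ergodic-ai/magpy.git
cd magpy
pip install .

Or install directly from GitHub:

pip install git+https://github.com/ergodic-ai/magpy.git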

1. AStarSearch

The AStarSearch class implements the A* search algorithm for Bayesian network structure learning. Our goal is to decouple the scoring function from the search algorithm, so that you can swap in your own score without touching the search.

A simple graph search using the BIC score:

import pandas as pd
from magpy.search.astar import AStarSearch, bic_score_node

data: pd.DataFrame = ...

astar = AStarSearch(data)
astar.run_scoring(func=bic_score_node, parallel=False)
result = astar.search()

Or a more complex search using a polynomial scoring function:

from typing import Optional
import numpy
from sklearn.preprocessing import PolynomialFeatures


def bic_score_node_poly(
    y: numpy.ndarray,
    X: Optional[numpy.ndarray] = None,
    node: str | None = None,
    parent_set: set | None = None,
    degree=3,
    include_bias=True,
):
    n = len(y)

    if X is None:
        residual = numpy.sum(y**2)
        dof = 0

    else:
        Xf = PolynomialFeatures(degree=degree, include_bias=include_bias).fit_transform(
            X
        )
        n, dof = Xf.shape
        _, residual, _, _ = numpy.linalg.lstsq(a=Xf, b=y, rcond=None)

    bic = n * numpy.log(residual / n) + dof * numpy.log(n)
    return bic.item()

# Pass the custom scoring function defined above to the search.
astar = AStarSearch(data)
astar.run_scoring(func=bic_score_node_poly, parallel=False)
result = astar.search()

The philosophy behind this algorithm is that:

You know the data! And that should be reflected in your choice of scoring function. As long as it adheres to the above interface, it will work.
The algorithm executes the search.
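As a minimal sketch of that interface, here is a hypothetical linear scorer using the same argument names as the polynomial example. `bic_score_node_linear` is not part of MagPy, and the assumption that `AStarSearch` calls the scorer with exactly these arguments is taken from that example only. It runs on synthetic data with NumPy alone and checks that adding the true parent lowers the BIC:

```python
import numpy


def bic_score_node_linear(y, X=None, node=None, parent_set=None):
    """Hypothetical linear BIC scorer with the (y, X, node, parent_set) signature."""
    n = len(y)
    if X is None:
        # No parents: same convention as the polynomial example (no intercept).
        rss = float(numpy.sum(y ** 2))
        dof = 0
    else:
        # Add an intercept column and fit ordinary least squares.
        Xd = numpy.column_stack([numpy.ones(n), X])
        coef, _, _, _ = numpy.linalg.lstsq(Xd, y, rcond=None)
        rss = float(numpy.sum((y - Xd @ coef) ** 2))
        dof = Xd.shape[1]
    # BIC up to an additive constant: n * log(RSS / n) + dof * log(n).
    return float(n * numpy.log(rss / n) + dof * numpy.log(n))


# Synthetic data: b is a noisy linear function of a.
rng = numpy.random.default_rng(0)
a = rng.normal(size=500)
b = 2.0 * a + rng.normal(scale=0.1, size=500)

score_without_parent = bic_score_node_linear(b)
score_with_parent = bic_score_node_linear(b, a.reshape(-1, 1))
print(score_without_parent, score_with_parent)
```

Lower is better here: the parent explains almost all of the variance in `b`, so the drop in `n * log(RSS / n)` far outweighs the `dof * log(n)` penalty.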

2. Skeleton Learning

As part of the causal discovery pipeline, we need to learn the skeleton of the graph. This is typically done using the PC algorithm. The PC algorithm is essentially a wrapper around a conditional independence test, which we'll call an oracle.

An oracle is an object that tests whether a certain variable X is independent of Y given a set of covariates Z: $(X \perp Y \mid Z)$.

We are currently supporting two main oracles:

MixedDataOracle, which can handle continuous, binary, and categorical variables, and assumes linear relationships.
BaseOracle, which is the base class to implement your own oracle.

2.1 BaseOracle

Similarly to the AStar algorithm, our objective with the BaseOracle is to decouple the underlying "learner" from the hypothesis testing.

from magpy.oracles.oracles import BaseOracle, linear
import numpy
import pandas

z = numpy.random.randn(1000)
x = z + numpy.random.randn(1000) * 0.1
y = x + numpy.random.randn(1000) * 0.1

df = pandas.DataFrame({"x": x, "y": y, "z": z})


oracle = BaseOracle(df, threshold=0.05, learner=linear)

print("linear: ")
print("independent: ", oracle("y", "x", ["z"]))
print("pvalue: ", oracle._run("y", "x", ["z"]))

The learner is a function that accepts X and y and returns the residual sum of squares (RSS) of a regression together with the number of degrees of freedom of the model.

Here's an example of how to implement a learner based on polynomial regression:

from typing import Optional, Union
import pandas
import numpy
from sklearn.preprocessing import PolynomialFeatures


def poly_rss(
    X: Union[pandas.DataFrame, None],
    y: pandas.Series,
    node: Optional[str] = None,
    parent_set: Optional[set] = None,
    degree: int = 3,
):
    """Perform polynomial regression and return residual sum of squares and degrees of freedom.

    Args:
        X (Union[pandas.DataFrame, None]): Feature matrix. If None, only intercept is used.
        y (pandas.Series): Target variable.
        node (Optional[str], optional): Node name, not used but included for API compatibility. Defaults to None.
        parent_set (Optional[set], optional): Parent set, not used but included for API compatibility. Defaults to None.
        degree (int, optional): Degree of polynomial features. Defaults to 3.

    Returns:
        tuple: (rss, p)
            rss (float): Residual sum of squares from polynomial regression
            p (int): Number of parameters (degrees of freedom) in the model
    """

    if X is None:
        X_values = numpy.ones(shape=(y.shape[0], 1))

    else:
        X_values = X.values
        X_values = PolynomialFeatures(degree=degree).fit_transform(X_values)

    y_values: numpy.ndarray = y.values  # type: ignore

    _, [rss], _, _ = numpy.linalg.lstsq(X_values, y_values, rcond=None)

    p = X_values.shape[1]

    return rss, p

Using this learner, we can now model more complex relationships:

from magpy.oracles.oracles import BaseOracle, linear
import numpy
import pandas

z = numpy.random.randn(1000)
x = z**2 + numpy.random.randn(1000) * 0.1
y = x**2 + numpy.random.randn(1000) * 0.1

df = pandas.DataFrame({"x": x, "y": y, "z": z})


oracle = BaseOracle(df, threshold=0.05, learner=linear)

print("linear: ")
print("independent: ", oracle("y", "x", ["z"]))
print("pvalue: ", oracle._run("y", "x", ["z"]))



oracle = BaseOracle(df, threshold=0.05, learner=poly_rss)

print("polynomial: ")
print("independent: ", oracle("y", "x", ["z"]))
print("pvalue: ", oracle._run("y", "x", ["z"]))

Again, our philosophy is that you know your data best, and you should be able to implement a learner that best captures the relationship you are interested in.
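To give a rough idea of how a learner that returns an RSS and a parameter count can drive a conditional independence test, here is a self-contained sketch. It is not MagPy's implementation: `linear_rss` and `ci_pvalue` are hypothetical names, and the test shown is a standard likelihood-ratio test between the nested regressions y ~ Z and y ~ Z + x, restricted to the case where x adds exactly one parameter so that the chi-square tail can be computed with `math.erfc`.

```python
import math
import numpy


def linear_rss(X, y):
    """Hypothetical learner: OLS with intercept, returns (rss, n_params)."""
    n = len(y)
    cols = [numpy.ones(n)] if X is None else [numpy.ones(n), X]
    Xd = numpy.column_stack(cols)
    coef, _, _, _ = numpy.linalg.lstsq(Xd, y, rcond=None)
    rss = float(numpy.sum((y - Xd @ coef) ** 2))
    return rss, Xd.shape[1]


def ci_pvalue(y, x, Z, learner=linear_rss):
    """Likelihood-ratio p-value for 'y independent of x given Z'."""
    n = len(y)
    rss0, p0 = learner(Z, y)                            # restricted: y ~ Z
    rss1, p1 = learner(numpy.column_stack([Z, x]), y)   # full: y ~ Z + x
    assert p1 - p0 == 1, "this sketch only handles one added parameter"
    stat = n * math.log(rss0 / rss1)
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Chain z -> x -> y, as in the example above.
rng = numpy.random.default_rng(0)
z = rng.normal(size=1000)
x = z + rng.normal(size=1000) * 0.1
y = x + rng.normal(size=1000) * 0.1

p_dependent = ci_pvalue(y, x, z)    # y depends on x even after conditioning on z
p_independent = ci_pvalue(y, z, x)  # z carries no extra information once x is known
print(p_dependent, p_independent)
```

A small p-value rejects independence. In the chain above, the first p-value is essentially zero, while the second behaves like a draw from a uniform distribution because the null hypothesis holds.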

2.2 MixedDataOracle

We developed this oracle because dealing with mixed data types is a pain. One-hot encoding and praying isn't necessarily a good idea, and this provides a quick way to handle this with some science behind it.

This is loosely based on the work of Tsagris et al.

Here's a silly example:

from magpy.oracles.mixed import MixedDataOracle
import pandas
import numpy

z = numpy.random.randn(1000)
x = z + numpy.random.randn(1000)
y = z + numpy.random.randn(1000)
y_d = [str(int(elm.clip(-2, 2))) for elm in y]

df = pandas.DataFrame({"x": x, "y": y_d, "z": z})

oracle = MixedDataOracle(df, threshold=0.05)
print("Independent: ", oracle("y", "x", ["z"]))
print("pvalue: ", oracle._run("y", "x", ["z"]))

The oracle automatically tags variables as continuous or binary/categorical based on the data. If you want an integer column to be treated as categorical, cast it to a string or object dtype before building the oracle.
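The cast itself is plain pandas. The snippet below is only a sketch of that step, with made-up column names; exactly how MixedDataOracle inspects dtypes is not documented here, so the final check simply confirms that the column is no longer numeric.

```python
import pandas

df = pandas.DataFrame(
    {
        "dose": [0.5, 1.2, 0.7, 2.1],  # continuous measurement
        "clinic_id": [1, 2, 1, 3],     # integer codes that are really labels
    }
)

# Cast the integer codes to strings so they are no longer numeric.
df["clinic_id"] = df["clinic_id"].astype(str)

is_numeric = pandas.api.types.is_numeric_dtype(df["clinic_id"])
print(df.dtypes)
print("clinic_id numeric?", is_numeric)
```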

2.3 PC Algorithm

We haven't implemented the full PC algorithm yet. Our goal is to separate it into its components:

Skeleton search
V-structures detection
Further edge orientation

For now let's stick to skeleton search:

from magpy.search.pcskeleton import pc_skeleton
from magpy.oracles.oracles import BaseOracle, linear, cubic
import pandas
from typing import Callable



def pc_skeleton_magpy(
    X: pandas.DataFrame,
    learner: Callable = linear,
    intersection_or_union: str = "union",
):
    oracle = BaseOracle(X, threshold=0.05, learner=learner)
    skeleton, sepsets = pc_skeleton(
        oracle, X.columns, intersection_or_union=intersection_or_union
    )
    return skeleton

There are a number of niceties inside the PC skeleton implementation, and we'll update the documentation soon to expose them. If you are working with continuous data, we strongly recommend composing the PC skeleton algorithm with a direct search method for orientation.
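For readers unfamiliar with the skeleton phase, the following is a textbook-style sketch of it, not MagPy's `pc_skeleton`. `skeleton_search` and `chain_oracle` are hypothetical names. The only assumption carried over from above is the oracle call convention `oracle(x, y, Z)`, returning True when x and y are judged independent given Z. The toy oracle hard-codes the independencies of a chain A - B - C, so no data is needed.

```python
from itertools import combinations


def skeleton_search(oracle, nodes):
    """Start from a complete graph and drop edges the oracle declares independent."""
    nodes = list(nodes)
    adj = {v: set(nodes) - {v} for v in nodes}
    sepsets = {}
    size = 0
    # Grow the conditioning-set size until no node has enough neighbours left.
    while any(len(adj[v]) - 1 >= size for v in nodes):
        for x in nodes:
            for y in list(adj[x]):
                others = adj[x] - {y}
                for Z in combinations(sorted(others), size):
                    if oracle(x, y, list(Z)):
                        # Remove the edge and remember what separated the pair.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepsets[frozenset((x, y))] = set(Z)
                        break
        size += 1
    return adj, sepsets


def chain_oracle(x, y, Z):
    """Toy oracle for A - B - C: A and C are independent only given B."""
    return {x, y} == {"A", "C"} and "B" in Z


adj, sepsets = skeleton_search(chain_oracle, ["A", "B", "C"])
print(adj)
print(sepsets)
```

The separating sets recorded here are what a later v-structure detection step would consume.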

Composing

Our goal here is to allow for composition of different parts of the causal discovery pipeline. For instance, this is how you could restrict an A* search to the edges found by the PC skeleton, while forcing edges you already know about:

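from magpy.search.pcskeleton import pc_skeleton
from magpy.search.astar import AStarSearch, bic_score_node_poly
from magpy.oracles.oracles import cubic
import pandas
from typing import Callable

# pc_skeleton_magpy is the helper defined in section 2.3 above.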


def full_composite_search(
    X: pandas.DataFrame,
    learner_pc: Callable = cubic,
    learner_astar: Callable = bic_score_node_poly,
    intersection_or_union: str = "union",
    force=True,
):
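    # Fix colinearity. fix_colinearity is not defined in this README; it is
    # assumed to be a user-supplied helper that modifies X in place.
    fix_colinearity(X)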

    skeleton = pc_skeleton_magpy(
        X, intersection_or_union=intersection_or_union, learner=learner_pc
    )

    priors = skeleton.copy() * 0
    priors.loc["known_parent", "known_child"] = 1

    astar = AStarSearch(X, super_graph=skeleton, include_graph=priors)
    astar.run_scoring(parallel=False, func=learner_astar, verbose=False)
    y_df = astar.search()

    return y_df
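The `priors` construction above suggests that the skeleton is a square pandas DataFrame indexed by variable name on both axes. Under that assumption, and with made-up variable names, the sketch below shows what the two matrices might look like. Reading rows as parents and columns as children is also an assumption, taken from the "known_parent", "known_child" placeholders.

```python
import pandas

columns = ["smoking", "tar", "cancer"]  # hypothetical variable names

# Hypothetical skeleton: symmetric 0/1 adjacency matrix of allowed edges.
skeleton = pandas.DataFrame(
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]], index=columns, columns=columns
)

# Same shape and labels as the skeleton, all zeros, then mark one known edge.
priors = skeleton.copy() * 0
priors.loc["smoking", "tar"] = 1

# Sanity checks: exactly one required edge, and it lies inside the skeleton.
n_required = int(priors.to_numpy().sum())
inside = bool(((priors == 0) | (skeleton == 1)).to_numpy().all())
print(priors)
print(n_required, inside)
```

Checking that every required edge is also allowed by the skeleton avoids handing the search contradictory constraints.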

3. Effect Estimation

This part is under heavy development. The SF and Diabetes notebooks are good starting points.


Release history

0.1.1 (this version): Dec 7, 2024
0.1.0: Dec 7, 2024

