Skip to main content

Adjustment Identification Distance: A ๐š๐šŠ๐š๐š“๐š’๐š for Causal Structure Learning

Project description

Adjustment Identification Distance: A ๐š๐šŠ๐š๐š“๐š’๐š for Causal Structure Learning

This is an early release of ๐š๐šŠ๐š๐š“๐š’๐š ๐Ÿฅ and feedback is very welcome! Just open an issue on our github repository.

If you publish research using ๐š๐šŠ๐š๐š“๐š’๐š, please cite our article

@article{henckel2024adjustment,
    title = {{Adjustment Identification Distance: A gadjid for Causal Structure Learning}},
    author = {Leonard Henckel and Theo Wรผrtzen and Sebastian Weichwald},
    journal = {{arXiv preprint arXiv:2402.08616}},
    year = {2024},
    doi = {10.48550/arXiv.2402.08616},
}

Get Started Real Quick ๐Ÿš€ โ€“ Introductory Example

Just pip install gadjid to install the latest release of ๐š๐šŠ๐š๐š“๐š’๐š
and run python -c "import gadjid; help(gadjid)" to get started (or see install alternatives).

import gadjid
from gadjid import example, ancestor_aid, oset_aid, parent_aid, shd
import numpy as np

help(gadjid)

example.run_parent_aid()

Gtrue = np.array([
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
], dtype=np.int8)
Gguess = np.array([
    [0, 0, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
], dtype=np.int8)

print(ancestor_aid(Gtrue, Gguess))
print(shd(Gtrue, Gguess))

๐š๐šŠ๐š๐š“๐š’๐š is implemented in Rust and can conveniently be called from Python via our Python wrapper (implemented using maturin and PyO3).

Evaluating graphs learned by causal discovery algorithms is difficult: The number of edges that differ between two graphs does not reflect how the graphs differ with respect to the identifying formulas they suggest for causal effects. We introduce a framework for developing causal distances between graphs which includes the structural intervention distance for directed acyclic graphs as a special case. We use this framework to develop improved adjustment-based distances as well as extensions to completed partially directed acyclic graphs and causal orders. We develop polynomial-time reachability algorithms to compute the distances efficiently. In our package ๐š๐šŠ๐š๐š“๐š’๐š, we provide implementations of our distances; they are orders of magnitude faster than the structural intervention distance and thereby provide a success metric for causal discovery that scales to graph sizes that were previously prohibitive.

Implemented Distances

  • ancestor_aid(Gtrue, Gguess)
  • oset_aid(Gtrue, Gguess)
  • parent_aid(Gtrue, Gguess)
  • for convenience, the following distances are implemented, too
    • shd(Gtrue, Gguess)
    • sid(Gtrue, Gguess) โ€“ only for DAGs!

where Gtrue and Gguess are adjacency matrices of a DAG or CPDAG. The functions are not symmetric in their input: To calculate a distance, identifying formulas for causal effects are inferred in the graph Gguess and verified against the graph Gtrue. Distances return a tuple (normalised_distance, mistake_count) of the fraction of causal effects inferred in Gguess that are wrong relative to Gtrue, normalised_distance, and the number of wrongly inferred causal effects, mistake_count. There are $p(p-1)$ pairwise causal effects to infer in graphs with $p$ nodes and we define normalisation as normalised_distance = mistake_count / p(p-1).

All graphs are assumed simple, that is, at most one edge is allowed between any two nodes. An adjacency matrix for a DAG may only contain 0s and 1s; a 1 in row s and column t codes a directed edge Xโ‚› โ†’ Xโ‚œ; DAG inputs are validated for acyclicity. An adjacency matrix for a CPDAG may only contain 0s, 1s and 2s; a 2 in row s and column t codes a undirected edge Xโ‚› โ€” Xโ‚œ (an additional 2 in row t and column s is ignored; only one of the two entries is required to code an undirected edge); CPDAG inputs are not validated and the user needs to ensure the adjacency matrix indeed codes a valid CPDAG (instead of just a PDAG). You may also calculate the SID between DAGs via parent_aid(DAGtrue, DAGguess), but we recommend ancestor_aid and oset_aid and for CPDAG inputs our parent_aid does not coincide with the SID (see also our accompanying article).

Empirical Runtime Analysis

Experiments run on a laptop with 8 GB RAM and 4-core i5-8365U processor. Here, for a graph with $p$ nodes, sparse graphs have $10p$ edges in expectation, dense graphs have $0.3p(p-1)/2$ edges in expectation, and sparse graphs have $0.75p$ edges in expectation.

Maximum graph size feasible within 1 minute

Method sparse dense
Parent-AID 13005 960
Ancestor-AID 8200 932
Oset-AID 546 250
SID in R 255 239

Average runtime

Method x-sparse ($p=1000$) sparse ($p=256$) dense ($p=239$)
Parent-AID 6.3 ms 22.8 ms 189 ms
Ancestor-AID 2.7 ms 38.7 ms 226 ms
Oset-AID 3.2 ms 4.69 s 47.3 s
SID in R ~1โ€“2 h ~60 s ~60 s

LICENSE

๐š๐šŠ๐š๐š“๐š’๐š is available in source code form at https://github.com/CausalDisco/gadjid.

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.

See also the MPL-2.0 FAQ.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gadjid-0.0.1.tar.gz (37.0 kB view hashes)

Uploaded Source

Built Distributions

gadjid-0.0.1-pp310-pypy310_pp73-win_amd64.whl (217.6 kB view hashes)

Uploaded PyPy Windows x86-64

gadjid-0.0.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (742.6 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

gadjid-0.0.1-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (744.5 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARM64

gadjid-0.0.1-pp310-pypy310_pp73-macosx_11_0_arm64.whl (331.3 kB view hashes)

Uploaded PyPy macOS 11.0+ ARM64

gadjid-0.0.1-pp310-pypy310_pp73-macosx_10_12_x86_64.whl (345.5 kB view hashes)

Uploaded PyPy macOS 10.12+ x86-64

gadjid-0.0.1-pp39-pypy39_pp73-win_amd64.whl (217.5 kB view hashes)

Uploaded PyPy Windows x86-64

gadjid-0.0.1-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (742.6 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

gadjid-0.0.1-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (744.5 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARM64

gadjid-0.0.1-pp39-pypy39_pp73-macosx_11_0_arm64.whl (331.2 kB view hashes)

Uploaded PyPy macOS 11.0+ ARM64

gadjid-0.0.1-pp39-pypy39_pp73-macosx_10_12_x86_64.whl (345.4 kB view hashes)

Uploaded PyPy macOS 10.12+ x86-64

gadjid-0.0.1-cp38-abi3-win_arm64.whl (203.8 kB view hashes)

Uploaded CPython 3.8+ Windows ARM64

gadjid-0.0.1-cp38-abi3-win_amd64.whl (217.9 kB view hashes)

Uploaded CPython 3.8+ Windows x86-64

gadjid-0.0.1-cp38-abi3-musllinux_1_1_x86_64.whl (909.0 kB view hashes)

Uploaded CPython 3.8+ musllinux: musl 1.1+ x86-64

gadjid-0.0.1-cp38-abi3-musllinux_1_1_aarch64.whl (916.8 kB view hashes)

Uploaded CPython 3.8+ musllinux: musl 1.1+ ARM64

gadjid-0.0.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (742.8 kB view hashes)

Uploaded CPython 3.8+ manylinux: glibc 2.17+ x86-64

gadjid-0.0.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (744.8 kB view hashes)

Uploaded CPython 3.8+ manylinux: glibc 2.17+ ARM64

gadjid-0.0.1-cp38-abi3-macosx_11_0_arm64.whl (331.7 kB view hashes)

Uploaded CPython 3.8+ macOS 11.0+ ARM64

gadjid-0.0.1-cp38-abi3-macosx_10_12_x86_64.whl (346.0 kB view hashes)

Uploaded CPython 3.8+ macOS 10.12+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page