Adjustment Identification Distance: A ๐๐๐๐๐๐ for Causal Structure Learning
Project description
Adjustment Identification Distance: A ๐๐๐๐๐๐ for Causal Structure Learning
This is an early release of ๐๐๐๐๐๐ ๐ฅ and feedback is very welcome! Just open an issue on github.
If you publish research using ๐๐๐๐๐๐, please cite our article
@article{henckel2024adjustment,
title = {{Adjustment Identification Distance: A gadjid for Causal Structure Learning}},
author = {Leonard Henckel and Theo Wรผrtzen and Sebastian Weichwald},
journal = {{arXiv preprint arXiv:2402.08616}},
year = {2024},
doi = {10.48550/arXiv.2402.08616},
}
Get Started Real Quick ๐ โ Introductory Example
Just pip install gadjid
to install the latest release of ๐๐๐๐๐๐
and run python -c "import gadjid; help(gadjid)"
to get started
(or see install alternatives).
import gadjid
from gadjid import example, ancestor_aid, oset_aid, parent_aid, shd
import numpy as np
help(gadjid)
example.run_parent_aid()
Gtrue = np.array([
[0, 1, 1, 1, 1],
[0, 0, 1, 1, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]
], dtype=np.int8)
Gguess = np.array([
[0, 0, 1, 1, 1],
[1, 0, 1, 1, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]
], dtype=np.int8)
print(ancestor_aid(Gtrue, Gguess, edge_direction="from row to column"))
print(shd(Gtrue, Gguess))
๐๐๐๐๐๐ is implemented in Rust and can conveniently be called from Python via our Python wrapper (implemented using maturin and PyO3).
Evaluating graphs learned by causal discovery algorithms is difficult: The number of edges that differ between two graphs does not reflect how the graphs differ with respect to the identifying formulas they suggest for causal effects. We introduce a framework for developing causal distances between graphs which includes the structural intervention distance for directed acyclic graphs as a special case. We use this framework to develop improved adjustment-based distances as well as extensions to completed partially directed acyclic graphs and causal orders. We develop new reachability algorithms to compute the distances efficiently and to prove their low polynomial time complexity. In our package gadjid, we provide implementations of our distances; they are orders of magnitude faster with proven lower time complexity than the structural intervention distance and thereby provide a success metric for causal discovery that scales to graph sizes that were previously prohibitive.
Parallelism โ setting the number of threads
๐๐๐๐๐๐ uses rayon for parallelism
using, per default, as many threads as there are physical CPU cores.
The number of threads to use can be set via the environment variable RAYON_NUM_THREADS
.
We recommend to do so and to set the number of threads manually,
not least to be explicit and to avoid the small runtime overhead for determining the number of physical CPU cores.
Implemented Distances
ancestor_aid(Gtrue, Gguess, edge_direction)
oset_aid(Gtrue, Gguess, edge_direction)
parent_aid(Gtrue, Gguess, edge_direction)
- for convenience, the following distances are implemented, too
shd(Gtrue, Gguess)
sid(Gtrue, Gguess, edge_direction)
โ only for DAGs!
where Gtrue
and Gguess
are adjacency matrices of a DAG or CPDAG
and edge_direction
determines whether a 1
at r-th row and c-th column of an adjacency matrix
codes the edge r โ c
(edge_direction="from row to column"
) or c โ r
(edge_direction="from column to row"
).
The functions are not symmetric in their inputs:
To calculate a distance,
identifying formulas for causal effects are inferred in the graph Gguess
and verified against the graph Gtrue
.
Distances return a tuple (normalised_distance, mistake_count)
of the fraction of causal effects inferred in Gguess that are wrong relative to Gtrue, normalised_distance
,
and the number of wrongly inferred causal effects, mistake_count
.
There are $p(p-1)$ pairwise causal effects to infer in graphs with $p$ nodes
and we define normalisation as normalised_distance = mistake_count / p(p-1)
.
You may also calculate the SID between DAGs via parent_aid(DAGtrue, DAGguess, edge_direction)
,
but we recommend ancestor_aid
and oset_aid
and for CPDAG inputs the parent_aid
does not coincide with the SID
(see also our accompanying article).
If edge_direction="from row to column"
, then
a 1
in row r
and column c
codes a directed edge r โ c
;
if edge_direction="from column to row"
, then
a 1
in row r
and column c
codes a directed edge c โ r
;
for either setting of edge_direction
,
a 2
in row r
and column c
codes an undirected edge r โ c
(an additional 2
in row c
and column r
is ignored;
one of the two entries is sufficient to code an undirected edge).
An adjacency matrix for a DAG may only contain 0s and 1s. An adjacency matrix for a CPDAG may only contain 0s, 1s and 2s. DAG and CPDAG inputs are validated for acyclicity. However, for CPDAG inputs, the user needs to ensure the adjacency matrix indeed codes a valid CPDAG (instead of just a PDAG).
Empirical Runtime Analysis
Experiments run on a laptop with 8 GB RAM and 4-core i5-8365U processor. Here, for a graph with $p$ nodes, sparse graphs have $10p$ edges in expectation, dense graphs have $0.3p(p-1)/2$ edges in expectation, and x-sparse graphs have $0.75p$ edges in expectation.
Maximum graph size feasible within 1 minute
Method | sparse | dense |
---|---|---|
Parent-AID | 13601 | 962 |
Ancestor-AID | 8211 | 932 |
Oset-AID | 1105 | 508 |
SID in R | 256 | 239 |
Results obtained with ๐๐๐๐๐๐ v0.1.0 using the Python interface and the SID R package v1.1 from CRAN.
Average runtime
Method | x-sparse ($p=1000$) | sparse ($p=256$) | dense ($p=239$) |
---|---|---|---|
Parent-AID | 7.3 ms | 30.5 ms | 173 ms |
Ancestor-AID | 3.4 ms | 40.9 ms | 207 ms |
Oset-AID | 5.0 ms | 567 ms | 1.68 s |
SID in R | ~1โ2 h | ~60 s | ~60 s |
Results obtained with ๐๐๐๐๐๐ v0.1.0 using the Python interface and the SID R package v1.1 from CRAN.
LICENSE
๐๐๐๐๐๐ is available in source code form at https://github.com/CausalDisco/gadjid.
This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.
See also the MPL-2.0 FAQ.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Hashes for gadjid-0.1.0-pp310-pypy310_pp73-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 98e9491e54238ed0199b012223cbbbf7ec9d052ab92be1f4b718c6047b4c6639 |
|
MD5 | 77daaeec5c4154e82c76fc9d24dc2c3f |
|
BLAKE2b-256 | d34c10529ef353f4e10201fe275603f65c91e90dba1bdf691637ac4f79ceffc5 |
Hashes for gadjid-0.1.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0957e4f563ad915ec92b86b0ef7a3e9a4729275900de0e2d49a1a5106cc86553 |
|
MD5 | 86d84e6d729a3abdfbb4aa96b79724af |
|
BLAKE2b-256 | a6b703786cbf7b835a30dc1e1d84735969f517a3d47efe7ad3e5f8957581e66a |
Hashes for gadjid-0.1.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 233f0938afeb3221a304984e8d42724f98bd8663ef149d9b6b2c88ac62e077b5 |
|
MD5 | 53a81e8475ddb03c33cfebba3dd48d21 |
|
BLAKE2b-256 | 24ebacf7474953b28ca5e4bb9b145daee2385914100d98c8c67b28c683bf9a2f |
Hashes for gadjid-0.1.0-pp310-pypy310_pp73-macosx_11_0_arm64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | b26dee51744893bc9b62549f40fbdf62d7458652d8eb3c030ffc5dd74929a0b6 |
|
MD5 | 2c41f0563778fdf84e6346fc92d80457 |
|
BLAKE2b-256 | 856c460fd1ebe9996d56d00781c774cbf8cd84a8bc8a0e7c8ccf63edb6044337 |
Hashes for gadjid-0.1.0-pp310-pypy310_pp73-macosx_10_12_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 49ec4c724b05d8ad1eb55d405734a87e025147289f2d92499de0312a0a5831a3 |
|
MD5 | 87f0cda09a7d244d1349502c41125aea |
|
BLAKE2b-256 | 0d644871e4215c8f3a1656c5a88b295a59330589b12d672577c4c6cf15251d6e |
Hashes for gadjid-0.1.0-pp39-pypy39_pp73-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | f70c8d9f372f779d89c1677ebc32ff09cfde71451a0fb448b319b2beaef61df3 |
|
MD5 | 4042afb238cff743995af8175be8725e |
|
BLAKE2b-256 | c1ddced718adf6558d5230c3eb10287d1c30d022b8465303948e3b5f9b290492 |
Hashes for gadjid-0.1.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | aebb0c97e2062619b7967a54913f75d44facf678de2e5b69ce5803198932f0eb |
|
MD5 | 15e20336c116342827fa796a1e0eb2e5 |
|
BLAKE2b-256 | bb6ae40aa34237ee47ca114aaed3ec1b9eb8dcf5038273a0d549f0d24dcde57e |
Hashes for gadjid-0.1.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ba289c6db73e0347978609475dbf6d655afb1153793a69e745e55f2f142402ae |
|
MD5 | 51cb2a0460c7420d4dfd961ea0a06a65 |
|
BLAKE2b-256 | 9b6c7e23a5bf5851966aad69a8bd69d05bc2d9f656011627eb67957fc1ca2390 |
Hashes for gadjid-0.1.0-pp39-pypy39_pp73-macosx_11_0_arm64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3bce721d9de32fec20821e3d3cd5d342324ba72e395caf3dcb6eab0b6780458c |
|
MD5 | 3926c3a7307f4826f2bd28726d19b47a |
|
BLAKE2b-256 | d896bd24333ac74e013048c1b0dd07599b3cee16c8e03e9c3b578a4d220b40f6 |
Hashes for gadjid-0.1.0-pp39-pypy39_pp73-macosx_10_12_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 10ca98e04f129c6410eb7620edf7cbdd46637e53844084bf5e428284ae854416 |
|
MD5 | cc1961b2fbd7b03d10b1dc2b28244ee0 |
|
BLAKE2b-256 | fbd136cc0822e63e75a2050e0d220124212d02493348ce3d19f24f1bc9e0ffbd |
Hashes for gadjid-0.1.0-cp38-abi3-win_arm64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1f45ff3032f7e80b04cd83bace38c4dad935ae7ae0dc52271a4e91d1b397a162 |
|
MD5 | 7d8c1581b18d2f30c05f52eb8c3b8942 |
|
BLAKE2b-256 | a500864fef364c43cfa2690f7eb093a254bb17ceb3eb213a8642b8404e0e4dea |
Hashes for gadjid-0.1.0-cp38-abi3-win_amd64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 551e132268ffac3215c920032112ace7ed39436f369e170e43fa126d694486c8 |
|
MD5 | 7bc8116a41ee33dbf2f89f9c07b262c6 |
|
BLAKE2b-256 | 07e1f27eb262ebc05837202d40d6dc93853649d03ffcc57272d64207c7d93608 |
Hashes for gadjid-0.1.0-cp38-abi3-musllinux_1_1_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | d9310080a90fc4aaded93d081023b5bb9080590c0df24c66439cda178d003288 |
|
MD5 | 639b5beb26ec71d04ee4c4caf0b101ca |
|
BLAKE2b-256 | aed2a851c5f685d6f61a445a44f5019076a3f77c0070dbd3a2c9b76e6eba2e15 |
Hashes for gadjid-0.1.0-cp38-abi3-musllinux_1_1_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 936513302decb33831d9403d7d21f0dc1f995d739ddc1e1984960c35e1737d2e |
|
MD5 | 52f27c085b395ac2dd2fb42ce2e67b61 |
|
BLAKE2b-256 | a389bf47e9ac904b96fd9a2920084928de77fc134ee5fbcada281b6a6cf573e4 |
Hashes for gadjid-0.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3d734b47d544a466b16f0a64370d28df771e8d86e9ab074f3aae40fb19a13689 |
|
MD5 | 82b86d2c93ce59f299a6d36d30f247ec |
|
BLAKE2b-256 | ccea0bfd2c3211fad72934dc70e6d899f9aa74d728d19d3f1eb2d7256e9ec74b |
Hashes for gadjid-0.1.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ffa49fe6fd891f1200d4d07ed6a6475ba485c1f28a3e78a8e59201687471dc75 |
|
MD5 | 204f711df2d84d3797ed4f1f0852e66c |
|
BLAKE2b-256 | 81dc87938a5abe33c557d57cbd5804b35c76e42ebefcaaf3f77dee09da3fa4f7 |
Hashes for gadjid-0.1.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 18c70e77af471c7d885c9e721b1755065733c82602fc897fcdb981e0e92f0752 |
|
MD5 | ebfa6ac0a5bcbc651349c985fb65db27 |
|
BLAKE2b-256 | 32daf863cefd025484f242b292d9d70d85bbdd3af027ad3681aa50c80009597b |
Hashes for gadjid-0.1.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | fc237ad67c4bef18752700b861c3e47346b64579f132b33e4ea9504d60378ca7 |
|
MD5 | 1382d2d22d10529677d4b33523daeaaf |
|
BLAKE2b-256 | 1618e0f14798a1fb70b1c001f7c76e38a529af2c75d5ee693a05596d281848a8 |