A package to extract the causal graph from continuous tabular data.

causalexplain - A library to infer causal-effect relationships from tabular data

'causalexplain' is a library that implements methods for extracting a causal graph from tabular data: the ReX method, plus the methods it is compared against (GES, PC, FCI, LiNGAM, CAM, and NOTEARS).

ReX is a causal discovery method that leverages machine learning (ML) models coupled with explainability techniques, specifically Shapley values, to identify and interpret significant causal relationships among variables. Comparative evaluations on synthetic tabular datasets show that ReX outperforms state-of-the-art causal discovery methods across diverse data generation processes, including non-linear and additive noise models. Moreover, ReX was tested on the Sachs single-cell protein-signaling dataset, achieving a precision of 0.952 and recovering key causal relationships with no incorrect edges. Taken together, these results showcase ReX's effectiveness in accurately recovering true causal structures while minimizing false-positive predictions, its robustness across diverse datasets, and its applicability to real-world problems. By combining ML and explainability techniques with causal discovery, ReX bridges the gap between predictive modeling and causal inference, offering an effective tool for understanding complex causal structures.
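To make the term "Shapley value" concrete, here is a minimal sketch that computes exact Shapley values for a tiny, made-up value function. It is not the ReX pipeline, which is not detailed on this page; the function names (shapley_values, toy_value) and the scores are hypothetical and only illustrate how a contribution is attributed to each input variable.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: the average marginal contribution of each
    player over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(subset):
    """Hypothetical score of a model predicting a target from `subset`.
    x1 and x2 are informative and interact; x3 is irrelevant."""
    score = 0.0
    if "x1" in subset:
        score += 0.6
    if "x2" in subset:
        score += 0.3
    if "x1" in subset and "x2" in subset:
        score += 0.1  # interaction term, shared equally by x1 and x2
    return score


phi = shapley_values(["x1", "x2", "x3"], toy_value)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

In this toy case the irrelevant variable x3 receives a contribution of zero, which is the kind of signal an explainability-based method can use to decide which variables matter for a given target.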

ReX Schema

The library is built on scikit-learn estimators, so it can be used in scikit-learn pipelines and (hyper)parameter searches, while facilitating testing (including some API compliance), documentation, open source development, packaging, and continuous integration.

The datasets used in the examples can be generated with the generators module, which is part of this library. To reproduce results from the articles used as reference, use the datasets in the data folder.

Prerequisites without Docker

  • Operating System: Linux or macOS
  • Environment Manager: PyEnv or Conda
  • Programming Language: Python 3.10.12 or higher
  • Hardware: CPU

Installation

The project can be installed using pip:

$ pip install causalexplain

Data

The datasets used to reproduce the results presented in the manuscript are available under the data folder. The datasets were generated using the generators module.

Executing causalexplain

To run causalexplain on your data, you can use the causalexplain command:

$ python -m causalexplain
   ___                      _                 _       _       
  / __\__ _ _   _ ___  __ _| | _____  ___ __ | | __ _(_)_ __  
 / /  / _` | | | / __|/ _` | |/ _ \ \/ / '_ \| |/ _` | | '_ \ 
/ /__| (_| | |_| \__ \ (_| | |  __/>  <| |_) | | (_| | | | | |
\____/\__,_|\__,_|___/\__,_|_|\___/_/\_\ .__/|_|\__,_|_|_| |_|
                                       |_|                                        
usage: causalexplain [-h] -d DATASET [-m {rex,pc,fci,ges,lingam,cam,notears}] 
                   [-t TRUE_DAG] [-l LOAD_MODEL] [-T THRESHOLD] [-u UNION] 
                   [-i ITERATIONS] [-b BOOTSTRAP] [-r REGRESSOR] [-S SEED] 
                   [-s [SAVE_MODEL]] [-v] [-q] [-o OUTPUT]

Run without arguments, the command prints the usage summary above, which lists the options for selecting the dataset, the method used to infer the causal graph, and the hyperparameters.

The minimum input is a dataset file in CSV format whose first row contains the variable names and whose remaining rows contain the values of those variables. The default method is ReX, but you can also choose PC, FCI, GES, LiNGAM, CAM, or NOTEARS. At the end of the execution, the edges of the plausible causal graph are displayed, along with the resulting metrics if the true DAG is provided (argument -t).
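Which metrics causalexplain reports is not listed on this page. As an illustration of how a predicted graph can be scored against a true DAG, the sketch below computes precision, recall, and F1 over directed edges. The function name edge_metrics and the example edges are hypothetical and are not part of the causalexplain API.

```python
def edge_metrics(predicted_edges, true_edges):
    """Compare two sets of directed edges given as (cause, effect) tuples."""
    predicted = set(predicted_edges)
    true = set(true_edges)
    true_positives = len(predicted & true)

    # Precision: share of predicted edges that are in the true DAG.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of true edges that were recovered.
    recall = true_positives / len(true) if true else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: one edge is correct, one is reversed.
true_dag = [("a", "b"), ("b", "c"), ("a", "c")]
predicted = [("a", "b"), ("c", "b")]
metrics = edge_metrics(predicted, true_dag)
print(metrics)
```

Because the edges are directed, the reversed edge ("c", "b") counts as a false positive here; other scoring conventions treat reversed edges differently.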

Example commands

The following command illustrates how to run causalexplain on the toy dataset using the ReX method:

$ python -m causalexplain -d /path/to/toy_dataset.csv -t /path/to/toy_dataset.dot
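If you do not have the two input files at hand, the following sketch writes a small synthetic pair. It makes several assumptions: the variables x, y, z and their relationships are invented for illustration (this is not the toy dataset shipped with the library), and the true DAG is written in Graphviz DOT syntax, which is inferred only from the .dot extension in the command above.

```python
import csv
import random
import tempfile
from pathlib import Path


def write_toy_files(directory, n_rows=200, seed=0):
    """Write a toy CSV dataset and its ground-truth DAG; return both paths."""
    rng = random.Random(seed)
    directory = Path(directory)
    csv_path = directory / "toy_dataset.csv"
    dot_path = directory / "toy_dataset.dot"

    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y", "z"])  # first row: variable names
        for _ in range(n_rows):
            x = rng.gauss(0.0, 1.0)
            y = 2.0 * x + rng.gauss(0.0, 0.1)   # x causes y (linear)
            z = y ** 2 + rng.gauss(0.0, 0.1)    # y causes z (non-linear)
            writer.writerow([x, y, z])

    # Assumed format: a Graphviz DOT digraph with one edge per line.
    dot_path.write_text("digraph {\n    x -> y;\n    y -> z;\n}\n")
    return csv_path, dot_path


with tempfile.TemporaryDirectory() as tmp:
    csv_path, dot_path = write_toy_files(tmp, n_rows=5)
    print(csv_path.read_text().splitlines()[0])  # header row: x,y,z
```

To keep the files, pass a permanent directory instead of the temporary one, then supply the resulting paths to the -d and -t arguments.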

Adding the -m option runs causalexplain on the same toy dataset with a different method, here CAM:

$ python -m causalexplain -d /path/to/toy_dataset.csv -m cam -t /path/to/toy_dataset.dot

For more information on command line options, run causalexplain -h or go to the Quickstart section in the documentation.
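To compare several methods on the same dataset from a script, the command line can be driven with the standard subprocess module. This is a sketch, not part of causalexplain: the helper names build_command and run_methods are hypothetical, and only the -d, -m, and -t options shown in the usage summary are used.

```python
import subprocess
import sys

# Method names as listed in the usage summary.
METHODS = ("rex", "pc", "fci", "ges", "lingam", "cam", "notears")


def build_command(dataset, method="rex", true_dag=None):
    """Build the argument list for `python -m causalexplain`."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {METHODS}")
    command = [sys.executable, "-m", "causalexplain", "-d", str(dataset), "-m", method]
    if true_dag is not None:
        command += ["-t", str(true_dag)]
    return command


def run_methods(dataset, true_dag=None, methods=("rex", "cam")):
    """Run each method in turn and return its captured standard output."""
    outputs = {}
    for method in methods:
        result = subprocess.run(
            build_command(dataset, method, true_dag),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one method fails
        )
        outputs[method] = result.stdout
    return outputs
```

For example, run_methods("/path/to/toy_dataset.csv", "/path/to/toy_dataset.dot") would run ReX and then CAM, provided causalexplain is installed in the current Python environment.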

Additional Information

WIP
