A package of tools for working with the Tetrad Java program for causal discovery from CMU

fastcda

fastcda is a package for performing causal discovery analysis.

The primary goal of this project is a package that installs quickly with minimal friction, supports multiple platforms (Linux, Windows, macOS), executes fast, and handles large datasets efficiently. Consequently, the plan is to write the core causal search algorithms in C and the other "glue" components in Python.

During the initial phase, we use jpype to call methods in the Tetrad Java program from Carnegie Mellon University (https://github.com/cmu-phil/tetrad). This also facilitates comparison between algorithms. The default Tetrad version is 7.6.3, which corresponds to causal_cmd 1.12.0.

The code has been designed and tested to run on Windows 11, macOS Sequoia, and Ubuntu 22.04. It should also run on other versions of these platforms.

For a simple usage example, try the fastcda_demo_short.ipynb notebook in the GitHub repository. It runs well within VS Code.

Usage

1. Preliminaries

You will need JDK 21 or higher, which can be downloaded from https://www.oracle.com/java/technologies/downloads/#java21

You will also need the Graphviz package, which can be downloaded from https://graphviz.org/download/
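
Before installing, it can help to confirm that both prerequisites are visible from your shell. The following is a minimal sketch using only the Python standard library; the helper name `find_prerequisites` is hypothetical and not part of fastcda. It assumes the `java` launcher and the Graphviz `dot` executable are expected on `PATH`; whether fastcda locates Java through `PATH` or through `JAVA_HOME` is not stated here, so treat a missing `java` entry as a hint rather than a definite failure.

```python
import shutil


def find_prerequisites(names=("java", "dot")):
    """Map each executable name to its full path, or None if not on PATH.

    Hypothetical helper for illustration; not part of fastcda.
    """
    # shutil.which returns the full path of an executable found on PATH,
    # or None when nothing by that name is found
    return {name: shutil.which(name) for name in names}


for name, path in find_prerequisites().items():
    status = path if path else "NOT FOUND on PATH"
    print(f"{name}: {status}")
```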

2. Create a Python virtual environment

python -m venv .venv

# Activate the virtual environment
# On Windows PowerShell:
.venv\Scripts\activate.ps1
# On macOS/Linux:
source .venv/bin/activate

# Then install the necessary packages using pip
pip install fastcda

3. Sample usage

a. Load the packages and create an instance of FastCDA

from fastcda import FastCDA
from dgraph_flex import DgraphFlex
import semopy
import pprint as pp

# create an instance of FastCDA
fc = FastCDA()

b. Read in the built-in sample EMA dataset

# read the sample dataset into a DataFrame
df = fc.getEMAData()

# add the lags, with a suffix of '_lag'
df_lag = fc.add_lag_columns(df, lag_stub='_lag')

# standardize the data
df_lag_std = fc.standardize_df_cols(df_lag)
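
To make the two preprocessing steps concrete, here is a rough pandas sketch of what lagging and standardizing typically mean. It is not fastcda's implementation: the function names `add_lag_columns_sketch` and `standardize_sketch` are hypothetical, the toy values are made up, and details such as how the first row is handled or which standard deviation convention is used are assumptions.

```python
import pandas as pd


def add_lag_columns_sketch(df, lag_stub="_lag"):
    """Append a one-step lagged copy of every column (illustrative only)."""
    # shift(1) moves each value down one row, so row t holds the value from t-1
    lagged = df.shift(1).add_suffix(lag_stub)
    # the first row has no previous observation, so it is dropped here
    return pd.concat([df, lagged], axis=1).iloc[1:].reset_index(drop=True)


def standardize_sketch(df):
    """Z-score each column: subtract the mean, divide by the sample std."""
    return (df - df.mean()) / df.std()


# toy data with two of the column names used in this README (values invented)
toy = pd.DataFrame({"TST": [6.0, 7.0, 8.0, 5.0], "PHQ9": [4.0, 3.0, 2.0, 6.0]})
toy_lag = add_lag_columns_sketch(toy)
toy_std = standardize_sketch(toy_lag)
print(toy_lag)
print(toy_std.round(3))
```

Standardizing puts every variable on the same scale, which is a common step before running score-based or correlation-based searches.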

c. Create the prior knowledge content

# Create the prior knowledge for temporal order.
# Variables in tier 1 (non-lagged) cannot be parents
# of variables in tier 0 (lagged)
knowledge = {'addtemporal': {
                            0: ['alcohol_bev_lag',
                                'TIB_lag',
                                'TST_lag',
                                'PANAS_PA_lag',
                                'PANAS_NA_lag',
                                'worry_scale_lag',
                                'PHQ9_lag'],
                            1: ['alcohol_bev',
                                'TIB',
                                'TST',
                                'PANAS_PA',
                                'PANAS_NA',
                                'worry_scale',
                                'PHQ9']
                            }
            }
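
Since the two tiers differ only by the lag suffix, the same dictionary can be derived from the column names instead of typed by hand. The sketch below assumes every lagged column ends with the suffix passed as `lag_stub`; the helper `build_temporal_knowledge` is hypothetical and not part of fastcda.

```python
def build_temporal_knowledge(columns, lag_stub="_lag"):
    """Split column names into tier 0 (lagged) and tier 1 (non-lagged).

    Hypothetical helper; it only builds a dict in the shape shown above.
    """
    lagged = [c for c in columns if c.endswith(lag_stub)]
    current = [c for c in columns if not c.endswith(lag_stub)]
    return {"addtemporal": {0: lagged, 1: current}}


# small example using two of the variables from the sample dataset
cols = ["TIB", "TST", "TIB_lag", "TST_lag"]
knowledge_sketch = build_temporal_knowledge(cols)
print(knowledge_sketch)
```

With the sample data, calling `build_temporal_knowledge(df_lag_std.columns)` should give an equivalent dictionary, provided the column names match those listed above.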

d. Run the search

# run the model with run_model_search
result, graph = fc.run_model_search(
    df_lag_std,
    model='gfci',
    score={'sem_bic': {'penalty_discount': 1.0}},
    test={'fisher_z': {'alpha': 0.01}},
    knowledge=knowledge,
)
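
For intuition about the `alpha` setting, the following sketch shows the textbook Fisher Z test for a zero (partial) correlation, using only the standard library. It is an illustration of the general technique, not Tetrad's code, and Tetrad's implementation may differ in details. The function name `fisher_z_p_value` and the example numbers are hypothetical. An edge between two variables is typically kept when the p-value falls below `alpha`, so a smaller `alpha` tends to produce a sparser graph.

```python
import math


def fisher_z_p_value(r, n, n_cond=0):
    """Two-sided p-value for H0: the (partial) correlation is zero.

    r      : sample (partial) correlation, strictly between -1 and 1
    n      : sample size
    n_cond : number of variables conditioned on
    """
    # Fisher's z-transform of the sample correlation
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    # under H0, sqrt(n - n_cond - 3) * z is approximately standard normal
    stat = math.sqrt(n - n_cond - 3) * abs(z)
    # two-sided standard normal tail probability: 2 * (1 - Phi(stat))
    return math.erfc(stat / math.sqrt(2.0))


alpha = 0.01
for r in (0.10, 0.30):
    p = fisher_z_p_value(r, n=200)
    print(f"r={r:.2f}  p={p:.4g}  dependent at alpha={alpha}: {p < alpha}")
```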

e. Show the causal graph

graph.show_graph()

Example Graph
