
A Python package streamlining the causal discovery pipeline for easy use.

CausalPipe


CausalPipe is a Python wrapper around causal-learn and R's lavaan that offers a predefined, formalized workflow for causal analysis aimed at everyday users. It provides tools for preparing data, constructing and orienting causal graphs, and visualizing results, and it supports both ordinal and continuous variables.

Features

  • Data Preprocessing: Handle missing values using multiple imputation (MICE), encode categorical variables, standardize features, and perform feature selection based on correlation.
  • Skeleton Identification: Identify the global skeleton of the causal graph using methods like Fast Adjacency Search (FAS) or Bootstrap-based Causal Structure Learning (BCSL).
  • Edge Orientation: Orient edges in the skeleton using algorithms such as Fast Causal Inference (FCI) or Hill Climbing.
  • Causal Effect Estimation: Estimate causal effects using various methods, including Partial Pearson Correlation, Partial Spearman Correlation, Conditional Mutual Information (MI), Kernel Conditional Independence (KCI), Structural Equation Modeling (SEM), and Hill Climbing-based SEM.
  • Visualization: Generate and save visualizations for correlation graphs, skeletons, oriented graphs, and SEM results.
  • Modular Configuration: Easily configure different aspects of the pipeline through dataclasses, allowing for flexible and customizable causal discovery workflows.
  • Integration with R: Utilize R's lavaan package for advanced Structural Equation Modeling directly within Python using rpy2.
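To make the "Partial Pearson Correlation" entry more concrete, here is a minimal sketch of a first-order partial correlation: the correlation between x and y after removing the linear influence of a single variable z. It uses only the standard library and illustrates the statistic itself; it is not CausalPipe's implementation, and the function names `pearson` and `partial_pearson` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_pearson(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Toy data: z drives both x and y, so x and y are related only through z.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [v + rng.gauss(0, 0.5) for v in z]
y = [v + rng.gauss(0, 0.5) for v in z]

print("marginal correlation:", round(pearson(x, y), 3))
print("partial correlation given z:", round(partial_pearson(x, y, z), 3))
```

In this toy setup the marginal correlation is strong while the partial correlation is close to zero, which is the kind of pattern that suggests no direct link between x and y once z is accounted for.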

Installation

Install causal-pipe from PyPI with pip:

pip install causal-pipe

Dependencies

CausalPipe relies on several Python and R packages. Ensure that you have the following dependencies installed:

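  • Python 3.6 or higher (note that the pinned pandas==2.2.3 and networkx==3.2.1 releases themselves require Python 3.9 or newer)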
  • R: Required for Structural Equation Modeling (lavaan) and multiple imputation (mice).
  • Python Packages:
    • numpy>=1.18.0
    • scipy>=1.4.0
    • scikit-learn>=0.22.0
    • causal-learn==0.1.3.8
    • bcsl-python==0.8.0
    • rpy2==3.5.16
    • npeet-plus==0.2.0
    • networkx==3.2.1
    • pandas==2.2.3
    • factor_analyzer==0.5.1
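One way to check an environment against these pins is to query the installed distributions with `importlib.metadata` (Python 3.8+) and to look for the `Rscript` executable on the PATH. This is only a sketch: the helper names `installed_version` and `check_pins` are hypothetical, and finding `Rscript` does not by itself prove that lavaan or mice are installed in R.

```python
import shutil
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Exact pins copied from the dependency list above.
PINNED = {
    "causal-learn": "0.1.3.8",
    "bcsl-python": "0.8.0",
    "rpy2": "3.5.16",
    "npeet-plus": "0.2.0",
    "networkx": "3.2.1",
    "pandas": "2.2.3",
    "factor_analyzer": "0.5.1",
}


def check_pins(pins):
    """Map each mismatching package to an (expected, found) pair."""
    report = {}
    for name, expected in pins.items():
        found = installed_version(name)
        if found != expected:
            report[name] = (expected, found)
    return report


for name, (expected, found) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {found}")

# shutil.which returns the executable's path, or None if it is not on PATH.
print("Rscript:", shutil.which("Rscript") or "not found on PATH")
```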

Quick Start

1. Configuration

Begin by defining the configuration for your causal discovery pipeline using the CausalPipeConfig dataclass. This includes specifying variable types, preprocessing parameters, skeleton identification methods, edge orientation methods, and causal effect estimation methods.

from causal_pipe.pipe_config import (
    DataPreprocessingParams,
    CausalPipeConfig,
    VariableTypes,
    FASSkeletonMethod,
    FCIOrientationMethod,
    CausalEffectMethod,
)

# Define preprocessing parameters
preprocessor_params = DataPreprocessingParams(
    cat_to_codes=False,
    standardize=True,
    # keep_only_correlated_with=None,
    # filter_method="mi",
    # filter_threshold=0.1,
    handling_missing="impute",
    imputation_method="mice",
    use_r_mice=True,
    full_obs_cols=None,
)

# Define variable types
variable_types = VariableTypes(
    continuous=["age", "income"],
    ordinal=["education_level"],
    nominal=["gender", "diagnosis_1", "diagnosis_2"],
)

# Initialize the configuration
config = CausalPipeConfig(
    variable_types=variable_types,
    preprocessing_params=preprocessor_params,
    skeleton_method=FASSkeletonMethod(),
    orientation_method=FCIOrientationMethod(),
    causal_effect_methods=[CausalEffectMethod(name="pearson")],
    study_name="causal_analysis",
    output_path="./output",
    show_plots=True,
    verbose=True,
)
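Because the configuration is built from dataclasses, the standard `dataclasses` module can in principle be used to derive variants of a configuration without retyping every field. The sketch below uses a hypothetical stand-in class, `PipeSettings`, rather than the real CausalPipeConfig, whose full field list is not shown here; whether `dataclasses.replace` suits the real class depends on how it is defined.

```python
from dataclasses import asdict, dataclass, field, replace
from typing import List


@dataclass
class PipeSettings:
    """Hypothetical stand-in for a dataclass-based pipeline configuration."""

    study_name: str
    output_path: str = "./output"
    show_plots: bool = True
    verbose: bool = True
    causal_effect_methods: List[str] = field(default_factory=lambda: ["pearson"])


base = PipeSettings(study_name="causal_analysis")

# Derive a quieter variant for batch runs. `base` keeps its own field values;
# replace() copies fields shallowly, so mutable fields such as the list are shared.
batch = replace(base, study_name="causal_analysis_batch", show_plots=False, verbose=False)

print(asdict(base))
print(asdict(batch))
```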

2. Initializing CausalPipe

Create an instance of the CausalPipe class by passing the configuration object.

from causal_pipe import CausalPipe

# Initialize the toolkit
causal_pipe = CausalPipe(config)

3. Running the Causal Discovery Pipeline

Use the run_pipeline method to execute the full causal discovery process, including data preprocessing, skeleton identification, edge orientation, and causal effect estimation.

import pandas as pd

# Load your data
data = pd.read_csv("your_data.csv")

# Run the causal discovery pipeline
causal_pipe.run_pipeline(data)
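Before calling run_pipeline, it can help to confirm that every column named in VariableTypes is present in the data and that no column has been left unassigned. The helper below is a hypothetical pre-flight check, not part of CausalPipe; it works on plain lists so that it can be fed `list(data.columns)`.

```python
def check_variable_types(columns, continuous=(), ordinal=(), nominal=()):
    """Compare declared variable names with the columns actually present.

    Returns a dict with three lists:
      missing    - declared but not found in the data
      unassigned - found in the data but not declared under any type
      duplicated - declared under more than one type (or twice in one)
    """
    declared = list(continuous) + list(ordinal) + list(nominal)
    present = set(columns)
    seen, duplicated = set(), []
    for name in declared:
        if name in seen and name not in duplicated:
            duplicated.append(name)
        seen.add(name)
    return {
        "missing": [n for n in dict.fromkeys(declared) if n not in present],
        "unassigned": [c for c in columns if c not in seen],
        "duplicated": duplicated,
    }


# Example column names; with a DataFrame, pass list(data.columns) instead.
report = check_variable_types(
    columns=["age", "income", "education_level", "gender", "visit_id"],
    continuous=["age", "income"],
    ordinal=["education_level"],
    nominal=["gender", "diagnosis_1"],
)
print(report)
```

Here `diagnosis_1` is reported as missing and `visit_id` as unassigned, two mismatches that are cheaper to catch before the pipeline starts than partway through preprocessing.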

Usage Examples

Example: Running the Full Pipeline

Below is an example demonstrating how to configure and run the full causal discovery pipeline using CausalPipe.

import numpy as np
import pandas as pd

# Create a dummy DataFrame
np.random.seed(42)
df = pd.DataFrame(
    {
        "age": np.random.randint(20, 70, size=100),
        "income": np.random.normal(50000, 15000, size=100),
        "education_level": np.random.randint(1, 5, size=100),
        "gender": np.random.choice(["Male", "Female"], size=100),
        "diagnosis_1": np.random.randint(0, 2, size=100),
        "diagnosis_2": np.random.randint(0, 2, size=100),
    }
)

# Run the causal discovery pipeline
# (reuses the `causal_pipe` instance created in the Quick Start)
causal_pipe.run_pipeline(df)

# Access causal effects
print("Causal Effects:", causal_pipe.causal_effects)

Example: Custom Configuration

Customize the skeleton identification and orientation methods to suit your specific analysis needs.

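# BCSLSkeletonMethod and HillClimbingOrientationMethod are assumed to be
# importable from causal_pipe.pipe_config, like the classes in the Quick Start.
import pandas as pd

from causal_pipe import CausalPipe
from causal_pipe.pipe_config import (
    BCSLSkeletonMethod,
    CausalEffectMethod,
    CausalPipeConfig,
    DataPreprocessingParams,
    HillClimbingOrientationMethod,
)

# Define preprocessing parameters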
preprocessor_params = DataPreprocessingParams(
    cat_to_codes=True,
    standardize=False,
    keep_only_correlated_with=None,
    filter_method="pearson",
    filter_threshold=0.2,
    handling_missing="drop",
    imputation_method="mice",
    use_r_mice=True,
    full_obs_cols=["age"],
)

# Initialize the configuration with BCSL skeleton method and Hill Climbing orientation
# (`variable_types` is the VariableTypes object defined in the Quick Start)
config = CausalPipeConfig(
    variable_types=variable_types,
    preprocessing_params=preprocessor_params,
    skeleton_method=BCSLSkeletonMethod(
        num_bootstrap_samples=200,
        multiple_comparison_correction="fdr",
        bootstrap_all_edges=True,
        use_aee_alpha=0.05,
        max_k=3,
    ),
    orientation_method=HillClimbingOrientationMethod(
        max_k=3,
        multiple_comparison_correction="fdr",
    ),
    causal_effect_methods=[
        CausalEffectMethod(name="sem"),
        CausalEffectMethod(name="pearson"),
    ],
    study_name="custom_causal_analysis",
    output_path="./output/custom_analysis",
    show_plots=True,
    verbose=True,
)

# Initialize the toolkit
causal_pipe = CausalPipe(config)

# Load your data
data = pd.read_csv("your_custom_data.csv")

# Run the causal discovery pipeline
causal_pipe.run_pipeline(data)

# Access causal effects
print("Causal Effects:", causal_pipe.causal_effects)
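Both method configurations above set multiple_comparison_correction="fdr". Assuming this refers to a false discovery rate procedure of the Benjamini-Hochberg kind (the text does not say which variant is used), the idea can be sketched in a few lines. The function name and the p-values below are illustrative only and do not come from CausalPipe.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up procedure: sort the p-values, find the largest rank k with
    p_(k) <= k / m * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. from independence tests on eight candidate edges.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_values, alpha=0.05))
```

With these numbers only the two smallest p-values survive the correction, even though five of them fall below an uncorrected 0.05 threshold.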

Documentation

Comprehensive documentation is available to help you get started with CausalPipe and explore its full range of functionalities. Visit the CausalPipe Documentation for detailed guides, API references, and tutorials.

Contributing

Contributions are welcome! If you'd like to contribute to CausalPipe, please follow these steps:

  1. Fork the Repository: Click the "Fork" button at the top-right corner of the repository page.
  2. Clone Your Fork:
    git clone https://github.com/your-username/causal-pipe.git
    
  3. Create a Branch:
    git checkout -b feature/your-feature-name
    
  4. Commit Your Changes:
    git commit -m "Add your detailed description here"
    
  5. Push to Your Fork:
    git push origin feature/your-feature-name
    
  6. Open a Pull Request: Navigate to the original repository and click "Compare & pull request."

Please ensure that your code adheres to the project's coding standards and includes appropriate tests.

License

This project is licensed under the MIT License.

Contact

For any questions or suggestions, feel free to reach out:


Additional Notes

  • Visualization Outputs: Visualizations such as correlation graphs, skeletons, oriented graphs, and SEM results are saved under the output_path given in the configuration. Make sure that directory exists, or confirm that CausalPipe creates it for you, before running the pipeline.

  • R Package Dependencies: Since CausalPipe integrates with R's lavaan and mice packages, make sure that R is installed on your system and that these packages are accessible. The toolkit attempts to install missing R packages automatically, but you may need to configure R's library paths or permissions accordingly.

  • Error Handling: The toolkit includes error handling to catch and report issues during data preprocessing, model fitting, and causal effect estimation. Pay attention to console outputs for any warnings or error messages that may require your attention.

  • Extensibility: CausalPipe is designed to be modular. You can extend its functionalities by adding new methods for skeleton identification, edge orientation, or causal effect estimation by creating new dataclasses and integrating them into the pipeline.

  • Performance Considerations: Some methods, especially those involving multiple imputation or complex SEM models, can be computationally intensive. Ensure that your system has sufficient resources, and consider optimizing parameters like num_bootstrap_samples or max_iter based on your dataset's size and complexity.
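If you would rather not rely on the toolkit to create the output directory, one option is to create it yourself before running the pipeline. The following is a minimal standard-library sketch; `ensure_output_dir` is a hypothetical helper, and the demo writes to a temporary directory so that it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def ensure_output_dir(output_path):
    """Create output_path (and any missing parents) and return it as a Path."""
    path = Path(output_path)
    path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return path


# Demo in a throwaway location; in practice pass your own output_path.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = ensure_output_dir(Path(tmp) / "output" / "custom_analysis")
    print(out_dir.is_dir())
```

In a real run you would call something like `ensure_output_dir("./output/custom_analysis")` before building the configuration and pass the same string as output_path.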

By following this guide and leveraging the provided examples, you can effectively utilize CausalPipe to perform sophisticated causal discovery and analysis on your datasets.
