Skip to main content

A Python package streamlining the causal discovery pipeline for easy use.

Project description

CausalPipe

License: MIT Python Version PyPI Version

CausalPipe is a Python wrapper built on Causal-Learn and Lavaan that offers a predefined and well-formalized process for causal analysis tailored for everyday users. It provides intuitive tools for data preparation, constructing and orienting causal graphs, and visualizing results, supporting both ordinal and continuous variables.

Table of Contents

Features

  • Data Preprocessing: Handle missing values using multiple imputation (MICE), encode categorical variables, standardize features, and perform feature selection based on correlation.
  • Skeleton Identification: Identify the global skeleton of the causal graph using methods like Fast Adjacency Search (FAS) or Bootstrap-based Causal Structure Learning (BCSL).
  • Edge Orientation: Orient edges in the skeleton using algorithms such as Fast Causal Inference (FCI) or Hill Climbing.
  • Causal Effect Estimation: Estimate causal effects using methods such as Partial Pearson/Spearman correlations, Conditional Mutual Information (MI), Kernel Conditional Independence (KCI), Structural Equation Modeling (SEM), hill-climbing SEM, and PySR-based symbolic regression that learns nonlinear structural equations and can score cyclic models via pseudo-likelihood or MMD².
  • Visualization: Generate and save visualizations for correlation graphs, skeletons, oriented graphs, and SEM results.
  • Modular Configuration: Easily configure different aspects of the pipeline through dataclasses, allowing for flexible and customizable causal discovery workflows.
  • Integration with R: Utilize R's lavaan package for advanced Structural Equation Modeling directly within Python using rpy2.

Installation

You can install causal-pipe via PyPI using pip:

pip install causal-pipe

Dependencies

CausalPipe relies on several Python and R packages. Ensure that you have the following dependencies installed:

  • Python 3.6 or higher
  • R: Required for Structural Equation Modeling (lavaan) and multiple imputation (mice).
  • Python Packages:
    • numpy>=1.18.0
    • scipy>=1.4.0
    • scikit-learn>=0.22.0
    • causal-learn==0.1.3.8
    • bcsl-python==0.8.0
    • rpy2==3.5.16
    • npeet-plus==0.2.0
    • networkx==3.2.1
    • pandas==2.2.3
    • factor_analyzer==0.5.1

Quick Start

1. Configuration

Begin by defining the configuration for your causal discovery pipeline using the CausalPipeConfig dataclass. This includes specifying variable types, preprocessing parameters, skeleton identification methods, edge orientation methods, and causal effect estimation methods.

from causal_pipe.pipe_config import (
    DataPreprocessingParams,
    CausalPipeConfig,
    VariableTypes,
    FASSkeletonMethod,
    FCIOrientationMethod,
    PearsonCausalEffectMethod,
    SEMCausalEffectMethod,
)

# Define preprocessing parameters
preprocessor_params = DataPreprocessingParams(
    cat_to_codes=False,
    standardize=True,
    # keep_only_correlated_with=None,
    # filter_method="mi",
    # filter_threshold=0.1,
    handling_missing="impute",
    imputation_method="mice",
    use_r_mice=True,
    full_obs_cols=None,
)

# Define variable types
variable_types = VariableTypes(
    continuous=["age", "income"],
    ordinal=["education_level"],
    nominal=["gender", "diagnosis_1", "diagnosis_2"],
)

# Initialize the configuration
config = CausalPipeConfig(
    variable_types=variable_types,
    preprocessing_params=preprocessor_params,
    skeleton_method=FASSkeletonMethod(),
    orientation_method=FCIOrientationMethod(),
    causal_effect_methods=[PearsonCausalEffectMethod()],
    study_name="causal_analysis",
    output_path="./output",
    show_plots=True,
    verbose=True,
)

CausalPipe exposes several configuration dataclasses that can be combined as needed:

  • Variables & preprocessing: VariableTypes, DataPreprocessingParams
  • Skeleton methods: FASSkeletonMethod, BCSLSkeletonMethod
  • Orientation methods: FCIOrientationMethod, HillClimbingOrientationMethod
  • Causal effect methods: PearsonCausalEffectMethod, SpearmanCausalEffectMethod, MICausalEffectMethod, KCICausalEffectMethod, SEMCausalEffectMethod, SEMClimbingCausalEffectMethod, PYSRCausalEffectMethod, PYSRCausalEffectMethodHillClimbing

For a complete list and detailed field descriptions, see the API Reference.

2. Initializing CausalPipe

Create an instance of the CausalPipe class by passing the configuration object.

from causal_pipe import CausalPipe

# Initialize the toolkit
causal_pipe = CausalPipe(config)

3. Running the Causal Discovery Pipeline

Use the run_pipeline method to execute the full causal discovery process, including data preprocessing, skeleton identification, edge orientation, and causal effect estimation.

import pandas as pd

# Load your data
data = pd.read_csv("your_data.csv")

# Run the causal discovery pipeline
causal_pipe.run_pipeline(data)

Usage Examples

Example: Running the Full Pipeline

Below is an example demonstrating how to configure and run the full causal discovery pipeline using CausalPipe.

import numpy as np
import pandas as pd 

# Create a dummy DataFrame
np.random.seed(42)
df = pd.DataFrame(
    {
        "age": np.random.randint(20, 70, size=100),
        "income": np.random.normal(50000, 15000, size=100),
        "education_level": np.random.randint(1, 5, size=100),
        "gender": np.random.choice(["Male", "Female"], size=100),
        "diagnosis_1": np.random.randint(0, 2, size=100),
        "diagnosis_2": np.random.randint(0, 2, size=100),
    }
)

# Run the causal discovery pipeline
causal_pipe.run_pipeline(df)

# Access causal effects
print("Causal Effects:", causal_pipe.causal_effects)

Example: Custom Configuration

Customize the skeleton identification and orientation methods to suit your specific analysis needs.

# Define preprocessing parameters
preprocessor_params = DataPreprocessingParams(
    cat_to_codes=True,
    standardize=False,
    keep_only_correlated_with=None,
    filter_method="pearson",
    filter_threshold=0.2,
    handling_missing="drop",
    imputation_method="mice",
    use_r_mice=True,
    full_obs_cols=["age"],
)
 
# Initialize the configuration with BCSL skeleton method and Hill Climbing orientation
config = CausalPipeConfig(
    variable_types=variable_types,
    preprocessing_params=preprocessor_params,
    skeleton_method=BCSLSkeletonMethod(
        num_bootstrap_samples=200,
        multiple_comparison_correction="fdr",
        bootstrap_all_edges=True,
        use_aee_alpha=0.05,
        max_k=3,
    ),
    orientation_method=HillClimbingOrientationMethod(
        max_k=3,
        multiple_comparison_correction="fdr",
    ),
    causal_effect_methods=[
        SEMCausalEffectMethod(),
        PearsonCausalEffectMethod(),
    ],
    study_name="custom_causal_analysis",
    output_path="./output/custom_analysis",
    show_plots=True,
    verbose=True,
)

# Initialize the toolkit
causal_pipe = CausalPipe(config)

# Load your data
data = pd.read_csv("your_custom_data.csv")

# Run the causal discovery pipeline
causal_pipe.run_pipeline(data)

# Access causal effects
print("Causal Effects:", causal_pipe.causal_effects)

Example: Symbolic Regression with PySR

Learn nonlinear structural equations and optionally orient undirected edges using PySR's symbolic regression engine:

from causal_pipe.pipe_config import PYSRCausalEffectMethod

config.causal_effect_methods = [
    PYSRCausalEffectMethod(hc_orient_undirected_edges=True)
]

causal_pipe = CausalPipe(config)
results = causal_pipe.run_pipeline(data)
print(results["pysr"]["structural_equations"])

See examples/pysr_example.py for a complete walkthrough.

Documentation

Comprehensive documentation is available to help you get started with CausalPipe and explore its full range of functionalities. Visit the CausalPipe Documentation for tutorials and guides, and consult the API Reference for detailed class descriptions.

Contributing

Contributions are welcome! If you'd like to contribute to CausalPipe, please follow these steps:

  1. Fork the Repository: Click the "Fork" button at the top-right corner of the repository page.
  2. Clone Your Fork:
    git clone https://github.com/your-username/causal-pipe.git
    
  3. Create a Branch:
    git checkout -b feature/your-feature-name
    
  4. Commit Your Changes:
    git commit -m "Add your detailed description here"
    
  5. Push to Your Fork:
    git push origin feature/your-feature-name
    
  6. Open a Pull Request: Navigate to the original repository and click "Compare & pull request."

Please ensure that your code adheres to the project's coding standards and includes appropriate tests.

License

This project is licensed under the MIT License.

Contact

For any questions or suggestions, feel free to reach out:


Additional Notes

  • Visualization Outputs: Ensure that the output directory specified in the configuration exists or is created by CausalPipe. The toolkit will save visualizations like correlation graphs, skeletons, oriented graphs, and SEM results in the specified output_path.

  • R Package Dependencies: Since CausalPipe integrates with R's lavaan and mice packages, make sure that R is installed on your system and that these packages are accessible. The toolkit attempts to install missing R packages automatically, but you may need to configure R's library paths or permissions accordingly.

  • Error Handling: The toolkit includes error handling to catch and report issues during data preprocessing, model fitting, and causal effect estimation. Pay attention to console outputs for any warnings or error messages that may require your attention.

  • Extensibility: CausalPipe is designed to be modular. You can extend its functionalities by adding new methods for skeleton identification, edge orientation, or causal effect estimation by creating new dataclasses and integrating them into the pipeline.

  • Performance Considerations: Some methods, especially those involving multiple imputation or complex SEM models, can be computationally intensive. Ensure that your system has sufficient resources, and consider optimizing parameters like num_bootstrap_samples or max_iter based on your dataset's size and complexity.

By following this guide and leveraging the provided examples, you can effectively utilize CausalPipe to perform sophisticated causal discovery and analysis on your datasets.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

causal_pipe-0.9.14.tar.gz (103.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

causal_pipe-0.9.14-py3-none-any.whl (110.3 kB view details)

Uploaded Python 3

File details

Details for the file causal_pipe-0.9.14.tar.gz.

File metadata

  • Download URL: causal_pipe-0.9.14.tar.gz
  • Upload date:
  • Size: 103.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for causal_pipe-0.9.14.tar.gz
Algorithm Hash digest
SHA256 539753e8a64d9bcc5cff1a587dd8103c79dfcea49cdfd2cc3d1348bacc46198a
MD5 78cf0e1e5fba7b3f1f0e08bfdfd2df82
BLAKE2b-256 9fbdb26e32803099e1c1e7038311be829a3354418f44fb7d58be7ce0c463f930

See more details on using hashes here.

File details

Details for the file causal_pipe-0.9.14-py3-none-any.whl.

File metadata

  • Download URL: causal_pipe-0.9.14-py3-none-any.whl
  • Upload date:
  • Size: 110.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for causal_pipe-0.9.14-py3-none-any.whl
Algorithm Hash digest
SHA256 d76598ff7f4533fd62983c2f201e24cc8b04e374021d4643f7029d3e6a9fd6e1
MD5 61982f276d4aa0428e1f77f4f91ae9ab
BLAKE2b-256 0282a37f3c44ef867707b2084c3d50789c4635b948d2d3169add95b2e141890f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page