
A Synthetic Benchmark Generator for Causal Machine Learning


CausalProfiler

Paper: arXiv

CausalProfiler is a synthetic benchmark generator for evaluating Causal ML methods under diverse conditions and assumptions. It allows rigorous, reproducible comparisons by sampling Structural Causal Models, data, and causal queries from user-defined Spaces of Interest, with built-in coverage guarantees.

Installation

For Users (Recommended)

Install directly from PyPI:

pip install causal_profiler

For Developers

We recommend using uv for development:

uv venv --python 3.12
source .venv/bin/activate
uv pip install -e ".[dev]"

This installs the package in editable mode along with all development dependencies (pytest, tox, pandas, scipy, statsmodels, networkx).

Alternatively, you can use pip:

pip install -e ".[dev]"

Additional Dependencies for Examples

If you plan to use the provided example evaluate.py file, install the examples dependencies:

uv pip install -e ".[examples]"
# or with pip:
pip install -e ".[examples]"

This includes: pyyaml, pandas, matplotlib, seaborn.

Project Configuration

The pyproject.toml file defines the project dependencies and optional dependency groups:

  • Main dependencies: numpy, torch (required for core functionality)
  • Dev dependencies ([dev]): pytest, tox, pandas, scipy, statsmodels, networkx (for testing and development)
  • Examples dependencies ([examples]): pyyaml, pandas, matplotlib, seaborn (for running evaluation examples)

You can install specific dependency groups using:

uv pip install -e ".[dev,examples]"  # Install both dev and examples dependencies

Usage Example

To help you get started, we provide a full example in examples/evaluation/:

  1. spaces.yaml - Configuration file defining the spaces of interest to evaluate
  2. evaluate.py - Script to run evaluations for a specific method
  3. summarize_results.py - Script to analyze and visualize results from multiple methods

In this evaluate.py example we demonstrate how to:

  • Load benchmark settings from a config file
  • Set random seeds for reproducibility
  • Run your causal method on multiple synthetic structural causal models (SCMs)
  • Measure and log error, failure rate, and runtime
  • Save results for later analysis
  • Analyze the results
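Seeding, for instance, can be done along these lines (a minimal sketch; the exact seeding code in evaluate.py may differ, and torch is guarded here only so the snippet runs without it):

```python
import random

import numpy as np

try:
    import torch  # main dependency of the package; optional for this sketch
except ImportError:
    torch = None


def set_seed(seed: int) -> None:
    """Seed every RNG the evaluation touches so runs are reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)


set_seed(42)
first = np.random.rand(3)
set_seed(42)
second = np.random.rand(3)
```

With the same seed, both draws are identical, so rerunning an evaluation regenerates the same datasets.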

We've added 🔧 EDIT notes on everything you need to change to use the example with your own method.

1. Replace dummy MyCausalMethod

In evaluate.py, replace from my_causal_method import MyCausalMethod with your own model. Check the 🔧 EDIT notes in evaluate.py to make sure your method is compatible.
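As an illustration only, a drop-in stand-in might have the following shape. The class name matches the dummy import, but the method names and signatures here are hypothetical; the real requirements are spelled out in evaluate.py's 🔧 EDIT notes.

```python
import numpy as np


class MyCausalMethod:
    """Hypothetical stand-in for your own estimator.

    Sketch of the general shape: fit on sampled SCM data,
    then answer a causal query with a point estimate.
    """

    def fit(self, data: np.ndarray) -> "MyCausalMethod":
        # e.g. learn mechanism parameters from the observational data
        self.mean_ = data.mean(axis=0)
        return self

    def estimate(self, query) -> float:
        # return a point estimate for the given causal query
        return float(self.mean_[0])
```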

2. Configure Your Spaces of Interest

In examples/evaluation/spaces.yaml, you can define multiple test spaces with different characteristics:

spaces:
  - name: linear_low_noise
    number_of_nodes: [5, 10]
    mechanism_family: LINEAR
    noise_distribution: GAUSSIAN
    noise_args: [0, 0.5]
    ...
    seed_list: [42, 43, 44]

Each space defines parameters for generating causal graphs, data, and queries. The framework properly handles ranges specified as lists (e.g., [5, 8]) by converting them to tuples.
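That list-to-tuple conversion can be illustrated with a small helper operating on an already-parsed config (a sketch; the framework's own conversion code may differ, and which keys count as ranges is an assumption here):

```python
def to_ranges(space: dict, range_keys: tuple = ("number_of_nodes",)) -> dict:
    """Convert list-valued range parameters to (low, high) tuples.

    Only keys named in range_keys are converted; other lists such as
    seed_list are left untouched.
    """
    converted = dict(space)
    for key in range_keys:
        if isinstance(converted.get(key), list):
            converted[key] = tuple(converted[key])  # [5, 10] -> (5, 10)
    return converted


# a space roughly as yaml.safe_load would parse it from spaces.yaml
space = to_ranges({
    "name": "linear_low_noise",
    "number_of_nodes": [5, 10],
    "seed_list": [42, 43, 44],
})
```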

3. Run the Evaluation

Once configured, run the evaluation script:

python evaluate.py --config spaces.yaml --output_dir results/method1
  • --config: Path to the configuration file
  • --output_dir: Directory to save results
  • --num_runs: Number of runs per seed (different datasets)
  • --num_tries: Number of tries per run (repeated estimations)
  • --wandb: Enable logging to Weights & Biases (optional)

This will:

  • Log progress to the terminal and log.txt
  • Save individual run results as JSON
  • Store a full summary.json in the output directory
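Saving a run's record as JSON can be as simple as the following sketch (the result_&lt;seed&gt;_&lt;run&gt;.json naming is an assumption for illustration; the actual files follow evaluate.py's result_*.json pattern):

```python
import json
import tempfile
from pathlib import Path


def save_run_result(output_dir: str, seed: int, run: int, record: dict) -> Path:
    """Write one run's record as a JSON file in the output directory."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"result_{seed}_{run}.json"
    path.write_text(json.dumps(record, indent=2))
    return path


# demo with a temporary directory and fabricated numbers
out_dir = tempfile.mkdtemp()
path = save_run_result(out_dir, 42, 0, {"mean_error": 0.12, "runtime": 1.5})
```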

The evaluation structure uses a nested loop approach:

for each seed:
  for each run:
    Generate a new dataset and queries
    for each try:
      Estimate queries
      Calculate error
    Calculate average error for the run

This structure captures both:

  • Variability between different causal graphs (runs)
  • Stability of method performance for the same graph (tries)
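The nested loop might be sketched in Python as follows, where generate and estimate are placeholders for the framework's dataset generation and your method's query estimation (the record fields are illustrative, not evaluate.py's exact schema):

```python
import statistics
import time


def run_evaluation(seeds, num_runs, num_tries, generate, estimate):
    """Nested seed/run/try loop mirroring the evaluation structure."""
    results = []
    for seed in seeds:
        for run in range(num_runs):
            data, queries, truth = generate(seed, run)  # new dataset + queries
            errors, runtimes = [], []
            for _ in range(num_tries):  # repeated estimations, same dataset
                start = time.perf_counter()
                estimates = [estimate(data, q) for q in queries]
                runtimes.append(time.perf_counter() - start)
                errors.append(statistics.fmean(
                    abs(e - t) for e, t in zip(estimates, truth)
                ))
            results.append({
                "seed": seed,
                "run": run,
                "mean_error": statistics.fmean(errors),     # run-level average
                "mean_runtime": statistics.fmean(runtimes),
            })
    return results


# demo with dummy callables: one query with truth 1.0, estimator returns 0.0
demo = run_evaluation(
    seeds=[42], num_runs=2, num_tries=3,
    generate=lambda seed, run: (None, [0.0], [1.0]),
    estimate=lambda data, query: query,
)
```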

4. Analyze the Results

To analyze and compare your results, use the summary script:

python summarize_results.py results/method1 results/method2 --output_dir analysis/

This will:

  1. Load all result files from the specified directories
  2. Compute statistics at different levels (try, run, overall)
  3. Generate CSV summaries and visualizations
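The loading and aggregation steps can be sketched with pandas (a sketch only; the field names "space" and "error" in the fabricated records are assumptions about the JSON schema):

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_results(*result_dirs) -> pd.DataFrame:
    """Collect result_*.json files from each method directory into one frame."""
    rows = []
    for d in result_dirs:
        for path in sorted(Path(d).glob("result_*.json")):
            record = json.loads(path.read_text())
            record["method"] = Path(d).name  # tag rows by method directory
            rows.append(record)
    return pd.DataFrame(rows)


# tiny demo with fabricated records
tmp = Path(tempfile.mkdtemp()) / "method1"
tmp.mkdir()
(tmp / "result_0.json").write_text(json.dumps({"space": "linear_low_noise", "error": 0.1}))
(tmp / "result_1.json").write_text(json.dumps({"space": "linear_low_noise", "error": 0.3}))

df = load_results(tmp)
# overall statistics per method and space, analogous to summary.csv
summary = df.groupby(["method", "space"])["error"].mean()
```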

Output Files

  • summary.csv: Overall method performance by space
  • run_summary.csv: Run-level statistics
  • tries_data.csv: All individual try data
  • Visualization plots:
    • error_boxplot.png: Error distribution by method and space
    • runtime_boxplot.png: Runtime distribution by space
    • run_variability.png: Error variability across runs
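A plot like error_boxplot.png could be produced along these lines with matplotlib (fabricated numbers; summarize_results.py's actual plotting may differ, e.g. by using seaborn):

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so this runs headless
import matplotlib.pyplot as plt
from pathlib import Path

# fabricated per-try errors for two methods (illustrative numbers only)
errors = {"method1": [0.10, 0.20, 0.15], "method2": [0.30, 0.25, 0.40]}

fig, ax = plt.subplots()
ax.boxplot(list(errors.values()))
ax.set_xticks([1, 2], labels=list(errors))
ax.set_ylabel("error")
ax.set_title("Error distribution by method")
fig.savefig("error_boxplot.png")
plt.close(fig)

out = Path("error_boxplot.png")
```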

File Structure Overview

evaluate.py                 # Main evaluation script
summarize_results.py        # Summary + plotting script
spaces.yaml                 # Config file for SCM/query spaces
results/
  method1/                  # Output directory for method 1
    result_*.json
    log.txt
    summary.json
analysis/
  summary.csv
  error_boxplot.png
  runtime_boxplot.png

Testing

The tests directory mirrors the structure of src and hosts all tests. To run tests:

pytest -s --ignore=tests/test_scm_sampling_performance.py # Run all tests
pytest tests/test_space_of_interest.py # Runs all tests in test_space_of_interest.py
pytest tests/test_space_of_interest.py::TestSpaceOfInterest::test_number_of_data_points # Runs a specific test function

Running Tests Across Multiple Python Versions

We use tox (included in dev dependencies) to test across multiple Python versions (3.10-3.14). To run tox:

# Run tests on all supported Python versions
tox

# Run all functionality tests (excluding the performance test)
tox -e py312  # or any specific Python version: py310, py311, py312, py313, py314

# Run all tests including benchmarking
tox -e slow

Note: You'll need the respective Python versions installed on your system for tox to work.

Verification experiments

These experiments validate that our implementation correctly adheres to Pearl's Causal Hierarchy. Each verification experiment runs across a --parameter-grid and reports detailed results (the tables in Appendix J of the paper). Note: Install dev dependencies (uv pip install -e ".[dev]") before running verification experiments.

Level 1: Associations (Statistics)

Verifies that d-separations in the graph imply conditional independence.
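As a standalone illustration of this Level-1 property (not the package's verification code): in a linear Gaussian chain X → Z → Y, X and Y are d-separated given Z, so their partial correlation controlling for Z should vanish even though their marginal correlation does not.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

# Linear Gaussian chain X -> Z -> Y
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(scale=0.5, size=n)
y = 0.8 * z + rng.normal(scale=0.5, size=n)

# Residualise X and Y on Z, then correlate the residuals:
# this is the sample partial correlation of X and Y given Z.
rx = x - np.polyval(np.polyfit(z, x, 1), z)
ry = y - np.polyval(np.polyfit(z, y, 1), z)
partial_corr = np.corrcoef(rx, ry)[0, 1]

marginal_corr = np.corrcoef(x, y)[0, 1]
```

The marginal correlation is large (X influences Y through Z), while the partial correlation is statistically indistinguishable from zero, exactly as the d-separation X ⊥ Y | Z predicts.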

python verification/main.py \
    --parameter-grid test8 \
    --verifications-to-run l1_data_ci \
    --output-dir verification/L1

Level 2: Interventions (Do-calculus)

Verifies compliance with Pearl's three rules of do-calculus.

python verification/main.py \
    --parameter-grid test7 \
    --verifications-to-run l2_do_calculus \
    --output-dir verification/L2

Level 3: Counterfactuals (Structural)

Verifies compliance with the three structural counterfactual axioms.

python verification/main.py \
    --parameter-grid test5 \
    --verifications-to-run l3_structural_counterfactual_axioms \
    --output-dir verification/L3
