A Scalable, Reproducible Benchmarking Suite for Causal Machine Learning

CausalProfiler

A Benchmark Generator for Causal Machine Learning

CausalProfiler is a synthetic benchmark generator for evaluating Causal ML methods under diverse conditions and assumptions. It allows rigorous, reproducible comparisons by sampling Structural Causal Models, data, and causal queries from user-defined Spaces of Interest, with built-in coverage guarantees.

Installation

For Users (Recommended)

Install directly from PyPI:

pip install causal_profiler

For Developers

We recommend using uv for development:

uv venv --python 3.12
source .venv/bin/activate
uv pip install -e ".[dev]"

This installs the package in editable mode along with all development dependencies (pytest, tox, pandas, scipy, statsmodels, networkx).

Alternatively, you can use pip:

pip install -e ".[dev]"

Additional Dependencies for Examples

If you plan to use the provided example evaluate.py file, install the examples dependencies:

uv pip install -e ".[examples]"
# or with pip:
pip install -e ".[examples]"

This includes: pyyaml, pandas, matplotlib, seaborn.

Project Configuration

The pyproject.toml file defines the project dependencies and optional dependency groups:

Main dependencies: numpy, torch (required for core functionality)
Dev dependencies ([dev]): pytest, tox, pandas, scipy, statsmodels, networkx (for testing and development)
Examples dependencies ([examples]): pyyaml, pandas, matplotlib, seaborn (for running evaluation examples)

You can install specific dependency groups using:

uv pip install -e ".[dev,examples]"  # Install both dev and examples dependencies

Usage Example:

To help you get started, we provide a full example in examples/evaluation/:

spaces.yaml - Configuration file defining the spaces of interest to evaluate
evaluate.py - Script to run evaluations for a specific method
summarize_results.py - Script to analyze and visualize results from multiple methods

In this evaluate.py example we demonstrate how to:

Load benchmark settings from a config file
Set random seeds for reproducibility
Run your causal method on multiple synthetic structural causal models (SCMs)
Measure and log error, failure rate, and runtime
Save results for later analysis
Analyze the results
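
As a minimal sketch of the seeding step listed above, the helper below seeds Python's `random` module and, when they are installed, NumPy and PyTorch (the package's two core dependencies). The name `set_seed` is hypothetical and is not taken from evaluate.py.

```python
import random


def set_seed(seed: int) -> None:
    """Seed every random number generator available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy generator
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the default PyTorch generator
    except ImportError:
        pass


# Seeding twice with the same value reproduces the same draw.
set_seed(42)
first = random.random()
set_seed(42)
second = random.random()
print(first == second)  # True
```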

Everything you need to change to use the example with your own method is marked with a 🔧 EDIT note.

1. Replace dummy `MyCausalMethod`

In evaluate.py, replace the `from my_causal_method import MyCausalMethod` import with an import of your own method, then check the 🔧 EDIT notes in evaluate.py to make sure your method is compatible.

2. Configure Your Spaces of Interest

In examples/evaluation/spaces.yaml, you can define multiple test spaces with different characteristics:

spaces:
  - name: linear_low_noise
    number_of_nodes: [5, 10]
    mechanism_family: LINEAR
    noise_distribution: GAUSSIAN
    noise_args: [0, 0.5]
    ...
    seed_list: [42, 43, 44]

Each space defines parameters for generating causal graphs, data, and queries. Ranges are specified as lists (e.g., [5, 8]), which the framework converts to tuples.
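
The list-to-tuple conversion can be pictured with the following sketch. It is not the framework's own loader: the function `normalize_space` and the set `RANGE_KEYS` are hypothetical, and which keys actually hold ranges is an assumption made for illustration. The dictionary mirrors what a YAML loader would return for the entry above, with the elided fields left out.

```python
from typing import Any

# Assumption for this sketch: only these keys hold [min, max] ranges.
RANGE_KEYS = {"number_of_nodes"}


def normalize_space(space: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one space entry with range lists turned into tuples."""
    normalized = dict(space)
    for key in RANGE_KEYS:
        value = space.get(key)
        if isinstance(value, list):
            if len(value) != 2 or value[0] > value[1]:
                raise ValueError(f"{key} must be [min, max], got {value!r}")
            normalized[key] = tuple(value)
    return normalized


space = {
    "name": "linear_low_noise",
    "number_of_nodes": [5, 10],
    "mechanism_family": "LINEAR",
    "noise_distribution": "GAUSSIAN",
    "noise_args": [0, 0.5],
    "seed_list": [42, 43, 44],
}
normalized = normalize_space(space)
print(normalized["number_of_nodes"])  # (5, 10)
print(normalized["seed_list"])        # [42, 43, 44]
```

Keeping `seed_list` as a list while converting only range-valued keys avoids confusing an enumeration of values with a (min, max) interval.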

3. Run the Evaluation

Once configured, run the evaluation script:

python evaluate.py --config spaces.yaml --output_dir results/method1

--config: Path to the configuration file
--output_dir: Directory to save results
--num_runs: Number of runs per seed (different datasets)
--num_tries: Number of tries per run (repeated estimations)
--wandb: Enable logging to Weights & Biases (optional)

This will:

Log progress to the terminal and log.txt
Save individual run results as JSON
Store a full summary.json in the output directory
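
For orientation, the command-line flags listed above could be declared with `argparse` roughly as follows. The flag names come from this README; the types, defaults, and which flags are required are assumptions for this sketch and may differ from evaluate.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Evaluate a causal method on sampled SCMs.")
    parser.add_argument("--config", required=True, help="Path to the configuration file")
    parser.add_argument("--output_dir", required=True, help="Directory to save results")
    # The defaults below are placeholders, not the script's real defaults.
    parser.add_argument("--num_runs", type=int, default=1, help="Number of runs per seed")
    parser.add_argument("--num_tries", type=int, default=1, help="Number of tries per run")
    parser.add_argument("--wandb", action="store_true", help="Enable Weights & Biases logging")
    return parser


args = build_parser().parse_args(
    ["--config", "spaces.yaml", "--output_dir", "results/method1", "--num_runs", "3"]
)
print(args.config, args.output_dir, args.num_runs, args.num_tries, args.wandb)
# spaces.yaml results/method1 3 1 False
```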

The evaluation structure uses a nested loop approach:

for each seed:
  for each run:
    Generate a new dataset and queries
    for each try:
      Estimate queries
      Calculate error
    Calculate average error for the run

This structure captures both:

Variability between different causal graphs (runs)
Stability of method performance for the same graph (tries)
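
The nested loop above can be sketched in runnable form as shown below. The functions `generate_dataset` and `estimate` are hypothetical stand-ins (a noisy sample around a true value, and a bootstrap mean) rather than CausalProfiler calls; only the seed/run/try structure and the recorded quantities (error, failure rate, runtime) follow the description.

```python
import random
import statistics
import time


def generate_dataset(rng):
    """Hypothetical stand-in: returns (data, true_value) for one sampled SCM."""
    true_value = rng.uniform(-1.0, 1.0)
    data = [true_value + rng.gauss(0.0, 1.0) for _ in range(200)]
    return data, true_value


def estimate(data, rng):
    """Hypothetical stochastic method: mean of a bootstrap resample."""
    resample = [rng.choice(data) for _ in data]
    return statistics.fmean(resample)


def evaluate(seed_list, num_runs, num_tries):
    results = []
    for seed in seed_list:
        rng = random.Random(seed)  # one generator per seed for reproducibility
        for run in range(num_runs):
            data, true_value = generate_dataset(rng)  # new dataset per run
            errors, failures = [], 0
            start = time.perf_counter()
            for _ in range(num_tries):
                try:
                    errors.append(abs(estimate(data, rng) - true_value))
                except Exception:
                    failures += 1  # a failed try counts toward the failure rate
            results.append({
                "seed": seed,
                "run": run,
                "mean_error": statistics.fmean(errors) if errors else None,
                "failure_rate": failures / num_tries,
                "runtime_s": time.perf_counter() - start,
            })
    return results


results = evaluate(seed_list=[42, 43], num_runs=2, num_tries=3)
print(len(results))  # 4
```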

4. Analyze the Results

To analyze and compare your results, use the summary script:

python summarize_results.py results/method1 results/method2 --output_dir analysis/

This will:

Load all result files from the specified directories
Compute statistics at different levels (try, run, overall)
Generate CSV summaries and visualizations
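
To illustrate what statistics at different levels means, the sketch below aggregates a handful of made-up try-level errors first per run and then per method and space. The record layout is hypothetical and the numbers are illustrative only; summarize_results.py may organize its data differently.

```python
import statistics
from collections import defaultdict

# Hypothetical try-level records: (method, space, run, error).
tries = [
    ("method1", "linear_low_noise", 0, 0.10),
    ("method1", "linear_low_noise", 0, 0.14),
    ("method1", "linear_low_noise", 1, 0.30),
    ("method1", "linear_low_noise", 1, 0.34),
]

# Run level: average the tries belonging to each (method, space, run).
by_run = defaultdict(list)
for method, space, run, error in tries:
    by_run[(method, space, run)].append(error)
run_means = {key: statistics.fmean(errs) for key, errs in by_run.items()}

# Overall level: mean and spread of the run means per (method, space).
by_space = defaultdict(list)
for (method, space, _run), mean_error in run_means.items():
    by_space[(method, space)].append(mean_error)
overall = {
    key: {
        "mean": statistics.fmean(vals),
        "std": statistics.stdev(vals) if len(vals) > 1 else 0.0,
    }
    for key, vals in by_space.items()
}

print(round(overall[("method1", "linear_low_noise")]["mean"], 2))  # 0.22
```

Averaging within runs before averaging across runs keeps try-to-try noise from being mistaken for variability between datasets.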

Output Files

summary.csv: Overall method performance by space
run_summary.csv: Run-level statistics
tries_data.csv: All individual try data
Visualization plots:
- error_boxplot.png: Error distribution by method and space
- runtime_boxplot.png: Runtime distribution by space
- run_variability.png: Error variability across runs

File Structure Overview

evaluate.py                 # Main evaluation script
summarize_results.py        # Summary + plotting script
spaces.yaml                 # Config file for SCM/query spaces
results/
  method1/                  # Output directory for method 1
    result_*.json
    log.txt
    summary.json
analysis/
  summary.csv
  error_boxplot.png
  runtime_boxplot.png
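
Given this layout, the per-run files of one method can be collected with a glob over `result_*.json`. The sketch below assumes nothing about the JSON schema beyond it being valid JSON; the fields written in the demonstration are hypothetical, as is the function name `load_results`.

```python
import json
import tempfile
from pathlib import Path


def load_results(output_dir):
    """Read every result_*.json file in one method's output directory."""
    records = []
    for path in sorted(Path(output_dir).glob("result_*.json")):
        with path.open() as handle:
            records.append(json.load(handle))
    return records


# Demonstration on a throwaway directory with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    method_dir = Path(tmp) / "results" / "method1"
    method_dir.mkdir(parents=True)
    for i in range(2):
        (method_dir / f"result_{i}.json").write_text(json.dumps({"run": i, "error": 0.1 * i}))
    (method_dir / "summary.json").write_text("{}")  # not matched by the pattern
    records = load_results(method_dir)

print(len(records))  # 2
```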

Testing

The tests directory mirrors the structure of src and hosts all tests. To run tests:

pytest -s --ignore=tests/test_scm_sampling_performance.py # Run all tests except the performance test
pytest tests/test_space_of_interest.py # Runs all tests in test_space_of_interest.py
pytest tests/test_space_of_interest.py::TestSpaceOfInterest::test_number_of_data_points # Runs a specific test function

Running Tests Across Multiple Python Versions

We use tox (included in dev dependencies) to test across multiple Python versions (3.10-3.14). To run tox:

# Run tests on all supported Python versions
tox

# Run all functionality tests (excluding the performance test)
tox -e py312  # or any specific Python version: py310, py311, py312, py313, py314

# Run all tests including benchmarking
tox -e slow

Note: You'll need the respective Python versions installed on your system for tox to work.

Verification experiments

These experiments validate that our implementation adheres to Pearl's Causal Hierarchy. Each verification experiment runs across a --parameter-grid and reports detailed results (the tables in Appendix J of the paper). Note: Install dev dependencies (uv pip install -e ".[dev]") before running verification experiments.

Level 1: Associations (Statistics)

Verifies that d-separations in the graph imply conditional independence.

python verification/main.py \
    --parameter-grid test8 \
    --verifications-to-run l1_data_ci \
    --output-dir verification/L1

Level 2: Interventions (Do-calculus)

Verifies compliance with Pearl's three rules of do-calculus.

python verification/main.py \
    --parameter-grid test7 \
    --verifications-to-run l2_do_calculus \
    --output-dir verification/L2
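
For reference, the three rules can be stated as follows, in standard notation rather than notation taken from this project. Here $G_{\overline{X}}$ is the graph with all edges into $X$ removed, $G_{\underline{Z}}$ is the graph with all edges out of $Z$ removed, and $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.

```latex
\begin{align*}
\text{Rule 1 (insertion/deletion of observations):}\quad
  & P(y \mid do(x), z, w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}} \\
\text{Rule 2 (action/observation exchange):}\quad
  & P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\
\text{Rule 3 (insertion/deletion of actions):}\quad
  & P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{align*}
```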

Level 3: Counterfactuals (Structural)

Verifies compliance with the three structural counterfactual axioms.

python verification/main.py \
    --parameter-grid test5 \
    --verifications-to-run l3_structural_counterfactual_axioms \
    --output-dir verification/L3
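
The three axioms are commonly named effectiveness, composition, and reversibility. The sketch below checks them on a tiny hypothetical SCM (X → W → Y with X → Y) for one fixed noise setting; the function `solve` and the structural equations are made up for illustration and are unrelated to the verification code.

```python
def solve(u, do=None):
    """Evaluate a small hypothetical SCM under the interventions in `do`."""
    do = do or {}
    values = {}
    values["X"] = do.get("X", u["X"])
    values["W"] = do.get("W", values["X"] + u["W"])
    values["Y"] = do.get("Y", 2 * values["W"] + values["X"] + u["Y"])
    return values


u = {"X": 1, "W": -2, "Y": 3}  # one fixed setting of the exogenous noise
x = 4

# Effectiveness: after do(X = x), X takes the value x.
effectiveness = solve(u, {"X": x})["X"] == x

# Composition: if W_x(u) = w, then Y_{x,w}(u) = Y_x(u).
w = solve(u, {"X": x})["W"]
composition = solve(u, {"X": x, "W": w})["Y"] == solve(u, {"X": x})["Y"]

# Reversibility: if Y_{x,w}(u) = y and W_{x,y}(u) = w, then Y_x(u) = y.
y = solve(u, {"X": x, "W": w})["Y"]
premise = solve(u, {"X": x, "Y": y})["W"] == w
reversibility = (not premise) or solve(u, {"X": x})["Y"] == y

print(effectiveness, composition, reversibility)  # True True True
```

In an acyclic model such as this one, reversibility follows from composition, so the third check is mainly a sanity test.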

Release history

0.1.1: Jan 5, 2026
0.1.0.post1: Dec 2, 2025
0.1.0 (this version): Nov 28, 2025

Download files

Source Distribution

causal_profiler-0.1.0.tar.gz (67.3 kB), uploaded Nov 28, 2025

Built Distribution

causal_profiler-0.1.0-py3-none-any.whl (43.5 kB), uploaded Nov 28, 2025, Python 3

File details

Details for the file causal_profiler-0.1.0.tar.gz.

File metadata

Download URL: causal_profiler-0.1.0.tar.gz
Upload date: Nov 28, 2025
Size: 67.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for causal_profiler-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`41e4796e405f97f37a1975d40cfd4628865160fd0e1b46ff701899e51e1451e5`
MD5	`b7fcecbb0d8d7306c74660796dc6af09`
BLAKE2b-256	`d07e6a1efd08cd425ebb262b2f1e15dff5db17033dac1ecdc8d612dc024315ae`

File details

Details for the file causal_profiler-0.1.0-py3-none-any.whl.

File metadata

Download URL: causal_profiler-0.1.0-py3-none-any.whl
Upload date: Nov 28, 2025
Size: 43.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.8

File hashes

Hashes for causal_profiler-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a997793015bf61c1cc53198ea7ed2ad8addddf448c1f507ab83f7fbe39c0c822`
MD5	`72b7e87057e740f2a697aaf7eddcf848`
BLAKE2b-256	`fe605f292d81c841421a8fcf5cb2b013e6f7481b4cdbc3ddf8cb706d0c0fc6ac`
