CausalProfiler
A Synthetic Benchmark Generator for Causal Machine Learning
CausalProfiler is a synthetic benchmark generator for evaluating Causal ML methods under diverse conditions and assumptions. It allows rigorous, reproducible comparisons by sampling Structural Causal Models, data, and causal queries from user-defined Spaces of Interest, with built-in coverage guarantees.
Installation
For Users (Recommended)
Install directly from PyPI:
pip install causal_profiler
For Developers
We recommend using uv for development:
uv venv --python 3.12
source .venv/bin/activate
uv pip install -e ".[dev]"
This installs the package in editable mode along with all development dependencies (pytest, tox, pandas, scipy, statsmodels, networkx).
Alternatively, you can use pip:
pip install -e ".[dev]"
Additional Dependencies for Examples
If you plan to use the provided example evaluate.py file, install the examples dependencies:
uv pip install -e ".[examples]"
# or with pip:
pip install -e ".[examples]"
This includes: pyyaml, pandas, matplotlib, seaborn.
Project Configuration
The pyproject.toml file defines the project dependencies and optional dependency groups:
- Main dependencies: numpy, torch (required for core functionality)
- Dev dependencies ([dev]): pytest, tox, pandas, scipy, statsmodels, networkx (for testing and development)
- Examples dependencies ([examples]): pyyaml, pandas, matplotlib, seaborn (for running evaluation examples)
You can install specific dependency groups using:
uv pip install -e ".[dev,examples]" # Install both dev and examples dependencies
Usage Example
To help you get started, we provide a full example in examples/evaluation/:
- spaces.yaml - Configuration file defining the spaces of interest to evaluate
- evaluate.py - Script to run evaluations for a specific method
- summarize_results.py - Script to analyze and visualize results from multiple methods
In this evaluate.py example we demonstrate how to:
- Load benchmark settings from a config file
- Set random seeds for reproducibility
- Run your causal method on multiple synthetic structural causal models (SCMs)
- Measure and log error, failure rate, and runtime
- Save results for later analysis
- Analyze the results
We've added 🔧 EDIT notes marking everything you need to change to run the example with your own method.
1. Replace the Dummy MyCausalMethod
In evaluate.py, replace from my_causal_method import MyCausalMethod with an import of your own model. Check the 🔧 EDIT notes in evaluate.py to make sure your method is compatible.
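As a starting point, a wrapper might look like the minimal sketch below. The class name and the fit/estimate interface are assumptions for illustration only; the authoritative interface is whatever the 🔧 EDIT notes in evaluate.py specify:

```python
import numpy as np

class MyCausalMethod:
    """Hypothetical adapter: wraps your estimator behind a fit/estimate
    interface. Names and signatures here are illustrative, not the real API."""

    def fit(self, data: np.ndarray) -> "MyCausalMethod":
        # Toy example: least-squares fit of column 1 (outcome) on
        # column 0 (treatment), with an intercept.
        design = np.c_[np.ones(len(data)), data[:, 0]]
        self.coef_ = np.linalg.lstsq(design, data[:, 1], rcond=None)[0]
        return self

    def estimate(self, query) -> float:
        # Return an estimate for the causal query; here simply the
        # fitted slope as a stand-in for an ATE estimate.
        return float(self.coef_[1])
```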
2. Configure Your Spaces of Interest
In examples/evaluation/spaces.yaml, you can define multiple test spaces with different characteristics:
spaces:
- name: linear_low_noise
number_of_nodes: [5, 10]
mechanism_family: LINEAR
noise_distribution: GAUSSIAN
noise_args: [0, 0.5]
...
seed_list: [42, 43, 44]
Each space defines parameters for generating causal graphs, data, and queries. The framework handles ranges specified as lists (e.g., [5, 10]) by converting them to tuples.
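This conversion can be sketched in a few lines of plain Python (illustrative only, not the framework's actual loader):

```python
# One parsed entry from spaces.yaml, as a YAML loader would return it
space = {
    "name": "linear_low_noise",
    "number_of_nodes": [5, 10],
    "mechanism_family": "LINEAR",
    "noise_distribution": "GAUSSIAN",
    "noise_args": [0, 0.5],
}

def ranges_to_tuples(space):
    """Convert 2-element numeric lists (ranges like [5, 10]) to tuples,
    leaving all other values untouched. A sketch of the idea only."""
    out = {}
    for key, val in space.items():
        if isinstance(val, list) and len(val) == 2 and all(
            isinstance(v, (int, float)) for v in val
        ):
            out[key] = tuple(val)
        else:
            out[key] = val
    return out
```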
3. Run the Evaluation
Once configured, run the evaluation script:
python evaluate.py --config spaces.yaml --output_dir results/method1
- --config: Path to the configuration file
- --output_dir: Directory to save results
- --num_runs: Number of runs per seed (different datasets)
- --num_tries: Number of tries per run (repeated estimations)
- --wandb: Enable logging to Weights & Biases (optional)
This will:
- Log progress to the terminal and log.txt
- Save individual run results as JSON
- Store a full summary.json in the output directory
The evaluation structure uses a nested loop approach:
for each seed:
for each run:
Generate a new dataset and queries
for each try:
Estimate queries
Calculate error
Calculate average error for the run
This structure captures both:
- Variability between different causal graphs (runs)
- Stability of method performance for the same graph (tries)
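Stripped of logging and file I/O, this loop structure can be written out as a runnable sketch (the dataset generator and estimator below are placeholders, not CausalProfiler's API):

```python
import random
import statistics
import time

def generate_dataset_and_queries(rng):
    # Placeholder for CausalProfiler's SCM/data/query sampling: here the
    # "ground truth" is a single number and the dataset is noisy draws of it.
    truth = rng.uniform(-1, 1)
    data = [truth + rng.gauss(0, 0.1) for _ in range(100)]
    return data, truth

def estimate(data, rng):
    # Placeholder causal method: the mean of a random subsample.
    return statistics.mean(rng.sample(data, 50))

results = []
for seed in [42, 43, 44]:                      # for each seed
    rng = random.Random(seed)
    for run in range(2):                       # for each run: new dataset + queries
        data, truth = generate_dataset_and_queries(rng)
        start = time.perf_counter()
        errors = [abs(estimate(data, rng) - truth)
                  for _ in range(3)]            # for each try: re-estimate
        results.append({"seed": seed, "run": run,
                        "avg_error": statistics.mean(errors),
                        "runtime": time.perf_counter() - start})
```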
4. Analyze the Results
To analyze and compare your results, use the summary script:
python summarize_results.py results/method1 results/method2 --output_dir analysis/
This will:
- Load all result files from the specified directories
- Compute statistics at different levels (try, run, overall)
- Generate CSV summaries and visualizations
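The run-level and overall aggregations can be sketched in plain Python (the records are illustrative; the actual script's implementation may differ):

```python
from statistics import mean

# Try-level records: (method, space, run, error), as loaded from the
# per-run JSON result files. Values are made up for illustration.
tries = [
    ("method1", "linear_low_noise", 0, 0.10),
    ("method1", "linear_low_noise", 0, 0.14),
    ("method1", "linear_low_noise", 1, 0.30),
    ("method1", "linear_low_noise", 1, 0.26),
]

def summarize(tries):
    # Run level: average error per (method, space, run)
    runs = {}
    for method, space, run, err in tries:
        runs.setdefault((method, space, run), []).append(err)
    run_summary = {k: mean(v) for k, v in runs.items()}
    # Overall level: average of the run means per (method, space)
    overall = {}
    for (method, space, _), avg in run_summary.items():
        overall.setdefault((method, space), []).append(avg)
    return run_summary, {k: mean(v) for k, v in overall.items()}
```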
Output Files
- summary.csv: Overall method performance by space
- run_summary.csv: Run-level statistics
- tries_data.csv: All individual try data
- Visualization plots:
  - error_boxplot.png: Error distribution by method and space
  - runtime_boxplot.png: Runtime distribution by space
  - run_variability.png: Error variability across runs
File Structure Overview
evaluate.py # Main evaluation script
summarize_results.py # Summary + plotting script
spaces.yaml # Config file for SCM/query spaces
results/
method1/ # Output directory for method 1
result_*.json
log.txt
summary.json
analysis/
summary.csv
error_boxplot.png
runtime_boxplot.png
Testing
The tests directory mirrors the structure of src and hosts all tests. To run tests:
pytest -s --ignore=tests/test_scm_sampling_performance.py # Run all tests
pytest tests/test_space_of_interest.py # Run all tests in test_space_of_interest.py
pytest tests/test_space_of_interest.py::TestSpaceOfInterest::test_number_of_data_points # Run a specific test function
Running Tests Across Multiple Python Versions
We use tox (included in dev dependencies) to test across multiple Python versions (3.10-3.14). To run tox:
# Run tests on all supported Python versions
tox
# Run all functionality tests (excluding the performance test)
tox -e py312 # or any specific Python version: py310, py311, py312, py313, py314
# Run all tests including benchmarking
tox -e slow
Note: You'll need the respective Python versions installed on your system for tox to work.
Verification experiments
These experiments validate that our implementation correctly adheres to Pearl's Causal Hierarchy.
Each verification experiment runs across a --parameter-grid and reports detailed results (the tables in Appendix J of the paper).
Note: Install dev dependencies (uv pip install -e ".[dev]") before running verification experiments.
Level 1: Associations (Statistics)
Verifies that d-separations in the graph imply conditional independence.
python verification/main.py \
--parameter-grid test8 \
--verifications-to-run l1_data_ci \
--output-dir verification/L1
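The property under test can be illustrated on a toy linear-Gaussian chain X → Y → Z, where Y d-separates X from Z: the data should show X and Z dependent marginally but independent given Y. This numpy sketch is unrelated to the project's actual verification code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Linear-Gaussian chain SCM: X -> Y -> Z
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)

def partial_corr(a, b, given):
    """Correlation of a and b after regressing `given` out of both."""
    g = np.c_[np.ones_like(given), given]
    ra = a - g @ np.linalg.lstsq(g, a, rcond=None)[0]
    rb = b - g @ np.linalg.lstsq(g, b, rcond=None)[0]
    return float(np.corrcoef(ra, rb)[0, 1])

marginal = float(np.corrcoef(x, z)[0, 1])   # strong marginal dependence
conditional = partial_corr(x, z, y)          # ~0: X is independent of Z given Y
```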
Level 2: Interventions (Do-calculus)
Verifies compliance with Pearl's three rules of do-calculus.
python verification/main.py \
--parameter-grid test7 \
--verifications-to-run l2_do_calculus \
--output-dir verification/L2
Level 3: Counterfactuals (Structural)
Verifies compliance with the three structural counterfactual axioms.
python verification/main.py \
--parameter-grid test5 \
--verifications-to-run l3_structural_counterfactual_axioms \
--output-dir verification/L3
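For intuition, the composition axiom says that intervening to set a variable to the value it already takes must leave every other variable unchanged. On a toy two-variable SCM this can be checked directly (a sketch, not the project's verification code):

```python
import random

def scm(u_x, u_y, do_x=None):
    """Tiny SCM: X := U_X, Y := 2*X + U_Y, with an optional do(X = x)."""
    x = u_x if do_x is None else do_x
    return x, 2 * x + u_y

rng = random.Random(0)
violations = 0
for _ in range(100):
    u_x, u_y = rng.gauss(0, 1), rng.gauss(0, 1)
    x_obs, y_obs = scm(u_x, u_y)               # factual world
    _, y_cf = scm(u_x, u_y, do_x=x_obs)        # do(X = observed x)
    if y_cf != y_obs:                          # composition axiom violated?
        violations += 1
```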