Configurable causal DAG simulator for synthetic mixed-type data and CI test benchmarks
dagsampler
What it provides
- CausalDataGenerator class for configurable simulation
- Support for custom and random DAGs
- Mixed continuous/binary/categorical nodes (configurable categorical cardinality)
- Structural forms: linear, polynomial, interaction, sigmoid, cos, sin, stratum_means
- Optional element-wise post_transform (tanh, sin, cos, exp_neg_abs, sqrt_abs, relu, sign)
- Cross-type mechanisms:
  - continuous -> categorical (categorical_model.name = "threshold")
  - categorical -> continuous (functional_form.name = "stratum_means", including mixed-parent cases with metric_weights)
- Noise models:
  - additive (gaussian, student_t, gamma, exponential, laplace, cauchy, uniform)
  - multiplicative (gaussian, student_t, gamma, exponential)
  - heteroskedastic (abs_first_parent, abs_parent_plus_const, mean_abs_plus_const)
- Random weight sampling controls (including exclusion band around zero)
- force_uniform_marginals for balanced exogenous binary / categorical draws
- Template helpers (chain_config, fork_config, collider_config, independence_config)
- Reproducibility via seed_structure and seed_data (or single seed)
- Optional d-separation CI oracle output (store_ci_oracle=true)
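To make the continuous -> categorical "threshold" idea from the list above concrete, here is a minimal NumPy sketch. It is not taken from dagsampler's source: the helper name threshold_categorical and the cut points are hypothetical, and the library's actual threshold model may choose or parameterize its cut points differently.

```python
import numpy as np

def threshold_categorical(score, thresholds):
    """Map a continuous score to integer categories 0..len(thresholds).

    Hypothetical helper for illustration; not part of dagsampler.
    """
    # np.digitize returns, for each value, the index of the bin it falls in.
    return np.digitize(score, np.asarray(thresholds))

rng = np.random.default_rng(0)
parent = rng.normal(size=1000)                      # continuous parent
child = threshold_categorical(parent, [-0.5, 0.5])  # three categories: 0, 1, 2

print(np.bincount(child, minlength=3))              # counts per category
```

Two cut points give three categories, so the categorical cardinality equals the number of thresholds plus one in this sketch.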
Installation
From PyPI:
pip install dagsampler
Or with uv:
uv venv
source .venv/bin/activate
uv pip install dagsampler
From GitHub (latest main):
uv pip install "dagsampler @ git+https://github.com/averinpa/dagsampler.git"
Random weights away from zero
To guarantee a minimum signal strength on every edge, so that a randomly sampled weight cannot end up effectively muting a parent, configure:
{
"simulation_params": {
"random_weight_low": -1.5,
"random_weight_high": 1.5,
"random_weight_min_abs": 0.1
}
}
This samples random structural weights from:
[-1.5, -0.1] U [0.1, 1.5]
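One way to draw from such a two-sided range is sketched below. It assumes the weights are uniform over the union of the two intervals, which the text above does not state explicitly, and the function name sample_weights is hypothetical rather than part of the dagsampler API.

```python
import numpy as np

def sample_weights(rng, size, low=-1.5, high=1.5, min_abs=0.1):
    """Draw weights uniformly from [low, -min_abs] U [min_abs, high].

    Illustrative only; assumes low <= -min_abs and high >= min_abs.
    """
    neg_len = -min_abs - low          # length of the negative interval
    pos_len = high - min_abs          # length of the positive interval
    # Pick a side with probability proportional to its length,
    # so the density is flat over the whole union.
    negative = rng.random(size) < neg_len / (neg_len + pos_len)
    u = rng.random(size)
    return np.where(negative, low + u * neg_len, min_abs + u * pos_len)

rng = np.random.default_rng(42)
weights = sample_weights(rng, 10_000)
print(np.abs(weights).min())          # never inside the excluded band
```

Choosing the side in proportion to interval length matters when the range is asymmetric; with the symmetric defaults shown here each sign is equally likely.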
By default, categorical parents are not allowed with metric functional forms
(linear, polynomial, interaction). Set
"categorical_parent_metric_form_policy": "stratum_means"
to auto-redirect those cases to stratum_means.
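The reason for the redirect is that category labels carry no metric meaning, so multiplying them by a weight is ill-defined. A stratum-means mechanism instead gives each category its own mean. The NumPy sketch below illustrates that idea only; the per-level means and the noise scale are made-up values, and it does not reproduce how dagsampler parameterizes stratum_means.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

parent = rng.integers(0, 3, size=n)          # categorical parent, levels 0..2
level_means = np.array([-2.0, 0.0, 3.0])     # hypothetical mean per level

# The child's mean is looked up by category instead of computed as weight * label.
child = level_means[parent] + rng.normal(scale=0.5, size=n)

for level in range(3):
    print(level, child[parent == level].mean())
```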
Quick start (Python API)
from dagsampler import CausalDataGenerator
config = {
"simulation_params": {"n_samples": 200, "seed": 42},
"graph_params": {
"type": "custom",
"nodes": ["X", "Y", "Z1"],
"edges": [["X", "Z1"], ["Y", "Z1"]],
},
}
result = CausalDataGenerator(config).simulate()
data = result["data"]
dag = result["dag"]
params = result["parametrization"]
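The graph in this quick start is a collider (X -> Z1 <- Y), which is the standard case where X and Y are marginally independent but become dependent once Z1 is conditioned on. The sketch below reproduces that pattern by hand with NumPy, independently of dagsampler, so the coefficients and noise scale are illustrative assumptions and not the values the generator would sample.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

x = rng.normal(size=n)
y = rng.normal(size=n)
z1 = x + y + 0.5 * rng.normal(size=n)   # collider: X -> Z1 <- Y

def residual(a, b):
    """Residual of a after least-squares regression on b (with intercept)."""
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

marginal = np.corrcoef(x, y)[0, 1]
# Partial correlation of X and Y given Z1, via regression residuals.
partial = np.corrcoef(residual(x, z1), residual(y, z1))[0, 1]

print(f"marginal corr(X, Y)     = {marginal:.3f}")
print(f"partial corr(X, Y | Z1) = {partial:.3f}")
```

With these coefficients the marginal correlation should be close to zero and the partial correlation clearly negative (about -0.8 in theory), which is the behavior a CI test benchmark on this graph is meant to detect.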
CLI
The package exposes the dagsampler-generate command.
dagsampler-generate \
--config config.json \
--output dataset.csv \
--params-out params.json \
--edges-out edges.json
config.json must contain the same structure used by CausalDataGenerator.
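As a small sketch of that correspondence, the quick-start dictionary can be serialized with the standard json module. Only keys that appear in the quick start are used; the temporary directory is just a convenience for the example.

```python
import json
import tempfile
from pathlib import Path

# Same structure as the dict passed to CausalDataGenerator in the quick start.
config = {
    "simulation_params": {"n_samples": 200, "seed": 42},
    "graph_params": {
        "type": "custom",
        "nodes": ["X", "Y", "Z1"],
        "edges": [["X", "Z1"], ["Y", "Z1"]],
    },
}

out_dir = Path(tempfile.mkdtemp())
config_path = out_dir / "config.json"
config_path.write_text(json.dumps(config, indent=2))

print(config_path)   # pass this path to --config
```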
For heteroskedastic noise, set noise_model.func to one of:
- abs_first_parent
- abs_parent_plus_const
- mean_abs_plus_const
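The source does not define these three options, so the sketch below is a guess based on their names alone: the noise scale for each sample is taken to be the absolute value of the first parent, that value plus a constant, or the mean absolute parent value plus a constant. The noise_scale helper and its const argument are hypothetical and may not match dagsampler's implementation.

```python
import numpy as np

def noise_scale(parents, func, const=1.0):
    """Per-sample noise scale from an (n_samples, n_parents) array.

    Interpretations are inferred from the option names, not from
    the library's source.
    """
    abs_p = np.abs(parents)
    if func == "abs_first_parent":
        return abs_p[:, 0]
    if func == "abs_parent_plus_const":
        return abs_p[:, 0] + const
    if func == "mean_abs_plus_const":
        return abs_p.mean(axis=1) + const
    raise ValueError(f"unknown func: {func}")

rng = np.random.default_rng(7)
parents = rng.normal(size=(1000, 2))
scale = noise_scale(parents, "mean_abs_plus_const")

# Heteroskedastic child: the noise standard deviation varies with the parents.
child = parents.sum(axis=1) + scale * rng.normal(size=1000)
```

Under this reading, the variants with a constant keep the noise from vanishing when the parent values are near zero.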
Development
uv pip install -e ".[dev]"
pytest -q