Unitless, Unrestricted, Markov-Consistent Random SCM and Data Generation.

UUMC SCM and Data Generation for Causal Benchmarking

Unitless, Unrestricted, Markov-Consistent random SCM and data generation for static models and, in beta, for time series (see the UUMC paper by Herman et al. under Citations).

Our work focuses on generating linear additive Gaussian Structural Causal Models (SCMs) given a causal graph. In addition to our proposed approach, the package supports the other approaches examined in the UUMC paper for comparison:

| Method | Description |
| --- | --- |
| UUMC | Produces unitless, unrestricted, Markov-consistent SCMs. Introduced here; default option. |
| unit-variance-noise (UVN) | Draws coefficients uniformly from [-HIGH, -LOW] ∪ [LOW, HIGH] and sets all noise variances to 1. Defaults: LOW=0.5, HIGH=2. |
| iSCM | Begins with UVN SCM generation. The SCM is not complete until gen_data() is called. During data generation, the structural parameters (and data) for each variable are standardized by the sample standard deviation of the generated data before moving on to the next variable in the topological order. |
| IPA | The structural parameters for each variable are scaled down by the variance the variable would have had if its parents were independent. |
| 50-50 | Begins with UVN SCM generation. The SCM is not complete until gen_data() is called. During data generation, data for each variable is first generated without noise; the causal coefficients and data are then scaled down by $\sqrt{2}$ times the sample standard deviation so that the signal has variance 1/2, and noise with variance 1/2 is added before moving on to the next variable in the topological order. |
| DaO | DAG Adaptation of the Onion Method. |
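
The sample-based rescaling described for iSCM and 50-50 can be sketched in NumPy as follows. This is an illustrative sketch based only on the descriptions above, not the package's implementation. The function names (`sample_uvn_coefficients`, `generate_standardized`), the convention that `A[j, i]` holds the coefficient of the edge $X_j \rightarrow X_i$, the assumption that variables are already indexed in topological order, and the treatment of source nodes in 50-50 are all assumptions made for illustration.

```python
import numpy as np

def sample_uvn_coefficients(adj, low=0.5, high=2.0, rng=None):
    """UVN-style draw: |coefficient| ~ U[low, high] with a random sign; noise std 1."""
    rng = np.random.default_rng(rng)
    magnitude = rng.uniform(low, high, size=adj.shape)
    sign = rng.choice([-1.0, 1.0], size=adj.shape)
    A = adj * magnitude * sign      # zero wherever there is no edge
    s = np.ones(adj.shape[0])       # noise standard deviations
    return A, s

def generate_standardized(A, s, n, method="iSCM", rng=None):
    """Generate n samples, rescaling each variable in turn (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    A = np.array(A, dtype=float)    # copies, so the caller's arrays are untouched
    s = np.array(s, dtype=float)
    d = A.shape[0]
    X = np.zeros((n, d))
    for i in range(d):              # assumes index order is a topological order
        signal = X @ A[:, i]        # contribution of the (already generated) parents
        if method == "iSCM":
            raw = signal + s[i] * rng.standard_normal(n)
            scale = raw.std(ddof=1)
            A[:, i] /= scale        # standardize parameters and data together
            s[i] /= scale
            X[:, i] = raw / scale
        elif method == "50-50":
            if np.any(A[:, i] != 0):
                scale = np.sqrt(2.0) * signal.std(ddof=1)
                A[:, i] /= scale    # rescaled signal has sample variance 1/2
                s[i] = np.sqrt(0.5) # noise variance 1/2
                X[:, i] = signal / scale + s[i] * rng.standard_normal(n)
            else:
                # Assumption: source nodes are pure unit-variance noise.
                X[:, i] = s[i] * rng.standard_normal(n)
        else:
            raise ValueError(f"unknown method: {method}")
    return X, A, s
```

Because the scale factors are computed from the generated sample, the final coefficients depend on the data, which is why the table notes that these SCMs are not complete until data has been generated.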

To generate a random SCM and sample data from it:

  1. Initialize a graph in a CausalModel or tsCausalModel object (CausalModel.py). This can be done:
    • randomly using Erdős-Rényi sampling
    • from a user-provided array where $a_{ji}=1 \Leftrightarrow X_j \rightarrow X_i$. For time series, $a_{ji\tau}=1 \Leftrightarrow X_j(t-\tau)\rightarrow X_i(t)$.
  2. Call gen_coefficients() on the CausalModel using the options from the table above. This sets the coefficient matrix A and the noise vector s. (Time series UUMC SCM generation is available, but under development.)
  3. Call gen_data() on the CausalModel, providing the number of samples. This returns a Data or TimeSeries object (Data.py), which is also stored in the data attribute of the CausalModel.
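
The three steps above can be mirrored in plain NumPy to show what each one produces. This sketch deliberately does not call the package, since the constructor and method arguments are not documented here; it uses unit-variance-noise style coefficients rather than UUMC, and it assumes that the noise vector s holds noise standard deviations. The variable names (`d`, `p_edge`, `n`, `adj`, `E`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p_edge, n = 5, 0.4, 1000   # variables, edge probability, samples (illustrative)

# Step 1: Erdős-Rényi DAG. adj[j, i] = 1 means X_j -> X_i. Keeping only the
# entries above the diagonal makes 0, 1, ..., d-1 a topological order.
adj = np.triu(rng.random((d, d)) < p_edge, k=1).astype(int)

# Step 2: coefficient matrix A and noise vector s (UVN-style, not UUMC).
A = adj * rng.uniform(0.5, 2.0, size=(d, d)) * rng.choice([-1.0, 1.0], size=(d, d))
s = np.ones(d)

# Step 3: sample data. Each row x satisfies x = x A + e, so x = e (I - A)^{-1}.
E = rng.standard_normal((n, d)) * s
X = E @ np.linalg.inv(np.eye(d) - A)
```

With this index convention, column i of A lists the coefficients of the parents of $X_i$, so `X[:, i]` equals `X @ A[:, i] + E[:, i]`.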

Var- and $R^2$-sortability can be examined by calling sortability() on the CausalModel.
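
For readers unfamiliar with these diagnostics, the sketch below computes both quantities from a graph and a data matrix, based on the definitions in Reisach et al. (2021, 2023) cited below. It is not the package's sortability() implementation: the function names, the absolute tolerance `tol` for ties, and the adjacency convention (`adj[j, i] = 1` means $X_j \rightarrow X_i$) are assumptions for illustration.

```python
import numpy as np

def sortability(adj, scores, tol=1e-9):
    """Fraction of directed paths along which `scores` increases from cause to effect.

    Pairs connected by a path of length k are counted once for every k;
    ties contribute 1/2.
    """
    d = adj.shape[0]
    E = (adj != 0).astype(int)
    Ek = E.copy()                            # Ek[j, i] = 1 iff a length-k path j -> i exists
    agree, total = 0.0, 0
    for _ in range(d - 1):
        causes, effects = np.nonzero(Ek)
        diff = scores[effects] - scores[causes]
        agree += np.sum(diff > tol) + 0.5 * np.sum(np.abs(diff) <= tol)
        total += len(causes)
        Ek = ((Ek @ E) > 0).astype(int)      # extend every path by one edge
    return agree / total if total else float("nan")

def var_sortability(adj, X):
    """Sortability by marginal variance."""
    return sortability(adj, X.var(axis=0))

def r2_sortability(adj, X):
    """Sortability by the R^2 of regressing each variable on all the others."""
    corr = np.corrcoef(X, rowvar=False)
    r2 = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    return sortability(adj, r2)
```

A value near 1 means that sorting variables by the score recovers the causal order, a value near 0 means the reverse order does, and 0.5 means the score carries no ordering information.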

Large datasets spanning multiple SCMs can be generated with CausalModel.gen_dataset(). AnalysisPlotting.py and UUMC.ipynb contain code for re-creating the figures from the UUMC paper.

Citations

Please cite the following papers depending on which method you use:

  • Paul Erdős and Alfréd Rényi. "On the evolution of random graphs." Publ. Math. Inst. Hungar. Acad. Sci, 5:17–61 (1960).
  • Rebecca J. Herman, Jonas Wahl, Urmi Ninad, and Jakob Runge. "Unitless Unrestricted Markov-Consistent SCM Generation: Better Benchmark Datasets for Causal Discovery." arXiv preprint (2025). (For the 4th Conference on Causal Learning and Reasoning) https://doi.org/10.48550/arXiv.2503.17037
  • Xun Zheng, Bryon Aragam, Pradeep K Ravikumar, and Eric P Xing. "Dags with no tears: Continuous optimization for structure learning." In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc. (2018). https://proceedings.neurips.cc/paper_files/paper/2018/file/e347c51419ffb23ca3fd5050202f9c3d-Paper.pdf
  • Weronika Ormaniec, Scott Sussex, Lars Lorch, Bernhard Schölkopf, and Andreas Krause. "Standardizing structural causal models" (2024). https://arxiv.org/abs/2406.11601
  • Joris M. Mooij, Sara Magliacane, and Tom Claassen. "Joint causal inference from multiple contexts." Journal of Machine Learning Research, 21(99):1–108 (2020). http://jmlr.org/papers/v21/17-123.html
  • Chandler Squires, Annie Yun, Eshaan Nichani, Raj Agrawal, and Caroline Uhler. "Causal structure discovery between clusters of nodes induced by latent factors." In Bernhard Schölkopf, Caroline Uhler, and Kun Zhang, editors, Proceedings of the First Conference on Causal Learning and Reasoning, volume 177 of Proceedings of Machine Learning Research, pages 669–687. PMLR (11–13 Apr 2022). https://proceedings.mlr.press/v177/squires22a.html
  • Bryan Andrews and Erich Kummerfeld. "Better simulations for validating causal discovery with the dag-adaptation of the onion method." arXiv preprint (2024). https://arxiv.org/abs/2405.13100
  • Alexander Reisach, Christof Seiler, and Sebastian Weichwald. "Beware of the simulated dag! causal discovery benchmarks may be easy to game." In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 27772–27784. Curran Associates, Inc. (2021). https://proceedings.neurips.cc/paper_files/paper/2021/file/e987eff4a7c7b7e580d659feb6f60c1a-Paper.pdf
  • Alexander Reisach, Myriam Tami, Christof Seiler, Antoine Chambaz, and Sebastian Weichwald. "A scale-invariant sorting criterion to find a causal order in additive noise models." In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 785–807. Curran Associates, Inc. (2023). https://proceedings.neurips.cc/paper_files/paper/2023/file/027e86facfe7c1ea52ca1fca7bc1402b-Paper-Conference.pdf

