Sample from general structural causal models (SCMs).

Sempler: generate synthetic and realistic semi-synthetic data with known ground truth for causal discovery

[Figure: real and semi-synthetic data produced from the Sachs dataset]

[Documentation at https://sempler.readthedocs.io/en/latest/]

Sempler generates synthetic data from structural causal models (SCMs), as well as semi-synthetic data that has a known causal ground truth but whose distributions closely resemble those of a real data set of your choice. It is one of the software contributions of the paper "Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions" by Juan L. Gamella, Armeen Taeb, Christina Heinze-Deml and Peter Bühlmann. You can find more details in Appendix E of the paper.

If you find this code useful, please consider citing:

@article{gamella2022characterization,
  title={Characterization and greedy learning of Gaussian structural causal models under unknown interventions},
  author={Gamella, Juan L and Taeb, Armeen and Heinze-Deml, Christina and B{\"u}hlmann, Peter},
  journal={arXiv preprint arXiv:2211.14897},
  year={2022}
}

Overview

The semi-synthetic data generation procedure is implemented in the class sempler.DRFNet (see docs). A detailed explanation of the procedure can be found in Appendix E of the paper.

Additionally, you can generate purely synthetic data from general additive-noise models. Two classes are defined for this purpose.

  • sempler.ANM is for general (acyclic) additive noise SCMs. Arbitrary assignment functions and noise distributions are supported.
  • sempler.LGANM is for linear Gaussian SCMs. While these can also be built with sempler.ANM, this class simplifies the interface and adds the option of sampling "in the population setting", i.e. returning a symbolic Gaussian distribution instead of a finite sample (see sempler.LGANM.sample and sempler.NormalDistribution).
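To make the two sampling modes concrete, here is a minimal standard-library sketch of what sampling from a linear Gaussian SCM involves. It does not use sempler and is not its implementation; the function names, the convention that W[i][j] is the weight of the edge i -> j, and the assumption that variables are already in topological order are all choices made for this illustration.

```python
import random

def sample_linear_gaussian_scm(W, means, variances, n, do=None, seed=0):
    """Draw n samples from a linear Gaussian SCM (illustrative helper).

    W[i][j] is the weight of the edge i -> j. Variables are assumed to be
    indexed in topological order, i.e. W is strictly upper triangular.
    `do` optionally maps a variable index to a fixed value.
    """
    rng = random.Random(seed)
    p = len(W)
    do = do or {}
    samples = []
    for _ in range(n):
        x = [0.0] * p
        for j in range(p):
            if j in do:
                # Under do(X_j = value) the parents and noise of X_j are ignored
                x[j] = float(do[j])
                continue
            noise = rng.gauss(means[j], variances[j] ** 0.5)
            x[j] = sum(W[i][j] * x[i] for i in range(j)) + noise
        samples.append(x)
    return samples

def population_distribution(W, means, variances):
    """Return the mean vector and covariance matrix implied by the SCM."""
    p = len(W)
    # A[j] holds the coefficients that express X_j in terms of the noise terms
    A = []
    for j in range(p):
        row = [0.0] * p
        row[j] = 1.0
        for i in range(j):
            for k in range(p):
                row[k] += W[i][j] * A[i][k]
        A.append(row)
    mean = [sum(A[j][k] * means[k] for k in range(p)) for j in range(p)]
    cov = [[sum(A[a][k] * variances[k] * A[b][k] for k in range(p))
            for b in range(p)] for a in range(p)]
    return mean, cov

# Chain X0 -> X1 -> X2 with edge weights 2 and -1
W = [[0, 2, 0],
     [0, 0, -1],
     [0, 0, 0]]
obs = sample_linear_gaussian_scm(W, [0, 0, 0], [1, 1, 1], n=1000)
intv = sample_linear_gaussian_scm(W, [0, 0, 0], [1, 1, 1], n=1000, do={1: 3.0})
mean, cov = population_distribution(W, [0, 0, 0], [1, 1, 1])
```

The second function corresponds to the "population setting": instead of a finite sample, it returns the exact parameters of the joint Gaussian distribution that the model entails.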

To allow for random generation of SCMs and interventional distributions, the module sempler.generators contains functions to sample random DAGs and intervention targets.
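As a rough illustration of what such generators do, the following standard-library sketch draws a random weighted DAG and random intervention targets. The function names, parameters and sampling scheme (independent edges along a random causal ordering) are assumptions made for this example and are not the functions in sempler.generators; see the documentation for the actual interface.

```python
import random

def random_dag(p, edge_prob, w_min, w_max, seed=0):
    """Return a p x p weight matrix of a random DAG (illustrative helper).

    W[i][j] != 0 means there is an edge i -> j. Assumes 0 < w_min <= w_max
    so that every sampled edge has a non-zero weight.
    """
    rng = random.Random(seed)
    order = list(range(p))
    rng.shuffle(order)  # random causal ordering of the variables
    W = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1, p):
            # Edges only point forward in the ordering, which rules out cycles
            if rng.random() < edge_prob:
                W[order[a]][order[b]] = rng.uniform(w_min, w_max)
    return W

def random_intervention_targets(p, n_envs, size, seed=0):
    """Return one sorted list of `size` distinct targets per environment."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(p), size)) for _ in range(n_envs)]

W = random_dag(p=6, edge_prob=0.5, w_min=0.5, w_max=1.0, seed=1)
targets = random_intervention_targets(p=6, n_envs=3, size=2, seed=1)
```

Fixing the seed makes the generated graph and targets reproducible, which is convenient when comparing causal discovery methods on the same ground truth.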

Installation

You can clone this repo or install using pip. To install sempler in its most basic form, i.e. to generate purely synthetic data with sempler.ANM and sempler.LGANM, simply run

pip install sempler

To install the additional dependencies needed for the semi-synthetic data generation procedure, run

pip install sempler[DRFNet]

which will install sempler with the additional rpy2 dependency. You will also need:

  • an R installation; you can find an installation guide here
  • the R package drf, which you can install by typing install.packages("drf") in an R terminal
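Before using the semi-synthetic procedure, it can help to check that these prerequisites are visible from Python. The helper below is a hypothetical convenience written for this example, not part of sempler. It only checks that the rpy2 package can be found and that an R executable is on the PATH; it does not verify that the R package drf is installed.

```python
import importlib.util
import shutil

def drfnet_prerequisites():
    """Report which prerequisites can be found (illustrative helper)."""
    return {
        # True if the rpy2 Python package can be located for import
        "rpy2": importlib.util.find_spec("rpy2") is not None,
        # True if an executable named "R" is on the PATH
        "R": shutil.which("R") is not None,
    }

status = drfnet_prerequisites()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```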

Versioning

Sempler is still in its infancy and its API is subject to change. Backward-incompatible changes to the API are reflected by a change to the minor or major version number, e.g. code written using sempler==0.1.2 will run with sempler==0.1.3, but may not run with sempler==0.2.0.
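The rule can be written as a small check. This is a sketch with a hypothetical helper name, and it assumes plain "major.minor.patch" version strings. Requiring the installed patch number to be at least the one the code was written for is an extra assumption of this example, not something stated above.

```python
def is_compatible(written_for, installed):
    """True if code written for one version should run on another."""
    def parse(version):
        return tuple(int(part) for part in version.split("."))
    w, i = parse(written_for), parse(installed)
    # Same major and minor number, and a patch release that is not older
    return w[:2] == i[:2] and i >= w

print(is_compatible("0.1.2", "0.1.3"))
print(is_compatible("0.1.2", "0.2.0"))
```

In practice the same constraint is usually expressed as a pip requirement specifier such as sempler>=0.1.2,<0.2.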

Documentation

You can find the full documentation at https://sempler.readthedocs.io/en/latest/.

Feedback

Feedback is most welcome! You can add an issue or send an email.
