Data generation for causal structure learning based on mixed additive noise model (MANM)
Project description
MANM-CS
Data generation module for benchmarking methods for causal structure learning (CSL) from mixed discrete-continuous and nonlinear observational data based upon the mixed additive noise model (MANM). The related paper "MANM-CS: Data Generation for Benchmarking Causal Structure Learning from Mixed Discrete-Continuous and Nonlinear Data" was published at the NeurIPS-21 Workshop "Causal Inference & Machine Learning: Why now?".
Getting started
Installation / Usage via pip
python3 -m pip install manm-cs
python3 -m manm_cs --num_nodes 10 --edge_density 0.5 --num_samples 10000 --discrete_node_ratio 0.5
Installation from Source
Get the code
Start by cloning this repository.
git clone git@github.com:hpi-epic/manm-cs.git
cd manm-cs
Install requirements within venv
Please make sure you have Python 3 installed. We tested the execution of our data generation with Python 3.9. We recommend installing the requirements defined in requirements.txt using virtualenv.
MacOS / Linux
# Install virtualenv
python3 -m pip install --user virtualenv
# Create a new virtual environment
python3 -m venv env
# Activate the virtual environment
source env/bin/activate
Windows
# Install virtualenv
py -m pip install --user virtualenv
# Create a new virtual environment
py -m venv env
# Activate the virtual environment
.\env\Scripts\activate
After the creation of a new virtual environment, we can install the project dependencies defined in setup.cfg for both platforms.
python3 -m pip install .
Execute data generation
You can start the data generation with following command. The generated graph and the dataset are saved as ground_truth.gml and samples.csv in the current working directory. Available parameters for data generation can be seen with python3 -m manm_cs --help
.
python3 -m manm_cs \
--num_nodes 10 \
--edge_density 0.5 \
--num_samples 10000 \
--discrete_node_ratio 0.5
Build Release
python3 -m pip install --upgrade build twine
python3 -m build
# Upload to testPyPi
# use __token__ as username and the pypi token as password
python3 -m twine upload --repository testpypi dist/*
# Upload to PyPi
python3 -m twine upload dist/*
Parameters
name | Value Range | Default | Description |
---|---|---|---|
num_nodes | [1, Inf) | None | Defines the number of nodes to be in the generated DAG. |
edge_density | [0, 1] | None | Defines the density of edges in the generated DAG. |
discrete_node_ratio | [0, 1] | None | Defines the percentage of nodes that shall be of discrete type. Depending on its value the appropriate model (multivariate normal, mixed gaussian, discrete only) is chosen. |
num_samples | [1, Inf) | None | Defines the number of samples that shall be generated from the DAG. |
discrete_signal_to_noise_ratio | [0, 1] | 0.5 | Defines the ratio of uniform noise added within the mixed additive noise model, i.e., 0 = no noise, and 1 = uniform discrete noise |
min_discrete_value_classes | [2, Inf) | 3 | Defines the minimum number of discrete classes a discrete variable shall have. |
max_discrete_value_classes | [2, Inf) | 4 | Defines the maximum number of discrete classes a discrete variable shall have. |
continuous_noise_std | [0, Inf) | 1.0 | Defines the standard deviation of gaussian noise added to continuous variables. |
functions | ([0, 1], func) | id | A list of probabilities and mathmatical functions for relationships between two continuous nodes. Note, the input are tuples (probability, function), where the sum of all probabilities has to equal 1. Command line supported functions are: [linear, quadratic, cubic, tanh, sin, cos] |
num_processes | [1, Inf) | 1 | Number of processes used for data sampling |
conditional_gaussian | 0 or 1 | 1 | '1' Defines that conditional gaussian model is assumed for a mixture of variables. Otherwise '0', discrete variables can have continuous parents. |
beta_lower_limit | (0, Inf) | 0.5 | Lower limit for beta values for influence of continuous parents. Betas are sampled uniform from the union of [-upper,-lower] and [lower,upper]. Upper limit see below. |
beta_upper_limit | (0, Inf) | 1 | Upper limit for beta values for influence of continuous parents. Betas are sampled uniform from the union of [-upper,-lower] and [lower,upper]. Lower limit see above. |
graph_structure_file | None | Defines a path to a .gml file for a fixed DAG structure (ignoring node and edge characteristics) used during manm_cs graph building. Note graph_structure_file is mutually exclusive to num_nodes and edge_density. | |
variables_scaling | {'normal', 'standard', 'rank', 'rank', 'uniform'} | None | 'Scale the continuous variables ('normal' or standard') or all variables ('rank' or 'uniform’) in the dataset once all samples are generated. |
scale_parents | 0 or 1 | 0 | Defines if the influence of the parents on a child node is scaled, e.g., the sum of values of the parents is divided by the number of parents for a continuous child node. |
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file manm_cs-0.1.7.tar.gz
.
File metadata
- Download URL: manm_cs-0.1.7.tar.gz
- Upload date:
- Size: 17.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 659de50e542bbd11b4682b77e5aab2509425c1ec2daf4fc584828cf41e86578c |
|
MD5 | 2c150415d2daf8faf185cc698301da46 |
|
BLAKE2b-256 | beb796e1b118c4af519922b755d97d025658cf725cdf192261a9c03a394e3f55 |
File details
Details for the file manm_cs-0.1.7-py3-none-any.whl
.
File metadata
- Download URL: manm_cs-0.1.7-py3-none-any.whl
- Upload date:
- Size: 25.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.2
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | c7a0046dfb32f3bb483e1b01f04d69c71e212ae72d17ec7f032d1b6b88277529 |
|
MD5 | ce49027e9f573ad6a948ec43c66f99bc |
|
BLAKE2b-256 | 3706514b7b0b10365b01dc3fb6e60371ff11af971f76d4a22def8c1d7f357957 |