causal-sampler

causal-sampler is a Python package that integrates multiple causal (re-)sampling techniques, e.g., causal bootstrapping (CB) and causally weighted Gaussian Mixture Models (CW-GMMs), offering a standardized and user-friendly high-level pipeline and interfaces.

Causal resampling deconfounds causally biased observational data so that the resampled data approximates experimental data collected in a well-controlled environment, such as a randomized controlled trial (RCT). The resampled, deconfounded data should therefore benefit many downstream applications; for example, it enables "causal-blind" machine learning models to learn the intended causal relationship between the high-dimensional features and the prediction target rather than potentially biased and spurious correlations.

Example of "background brightness" confounding in MNIST dataset


Figure 1. Backdoor confounding.

In a backdoor confounding setting, a confounder $U$ acting as a common cause of the cause ($Y$) and effect ($X$) variables of interest may lead to so-called "selection bias". A dataset collected in such an environment can be severely causally biased by the confounder $U$. When a machine learning model that is blind to the underlying causal relationships between variables is trained on the confounded observational dataset, it risks learning unreliable or even spurious associations between the prediction target and the features. A simple and intuitive example is shown below:

Figure 2. Example digits from the confounded (a) and non-confounded (b) background-MNIST datasets. In (a), background brightness is manipulated so that it is a confounding factor with digit class (e.g., "6" is brighter than "2"); in (b), the brightness-digit association is randomized.

In the confounded dataset (Fig. 2 (a)), images of digit "6" tend to have brighter backgrounds than images of digit "2", but this confounding effect is absent from the non-confounded dataset (Fig. 2 (b)). Here, we consider the digit category ("2" or "6") as the cause variable $Y$, the handwritten image as the effect variable $X$, and the background brightness as the confounder $U$, a common cause that opens the backdoor path between the images and the categories.
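To make the bias concrete, assume (as a simplification for illustration) that $U$ is observed, discrete, and satisfies the backdoor criterion for the effect of $Y$ on $X$. The observational and interventional distributions of the images then differ only in how the confounder is weighted:

$$
p(x \mid y) = \sum_{u} p(x \mid y, u)\, p(u \mid y), \qquad
p(x \mid do(y)) = \sum_{u} p(x \mid y, u)\, p(u).
$$

In the confounded dataset, $p(u \mid y)$ concentrates bright backgrounds on digit "6"; under the intervention, every digit class sees the same marginal brightness distribution $p(u)$, as in Fig. 2 (b).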

A standard supervised classifier trained on the confounded MNIST dataset is likely to base its predictions on the input image's average brightness rather than the handwritten digit's actual shape, because the brightness feature is strongly associated with the label. A supervised learning algorithm will tend to exploit this spurious brightness information to maximize prediction accuracy. However, this is not what we expect the classifier to learn: if the predictor were applied to data without the brightness confounder, it would be of no use, despite its apparently high out-of-sample accuracy on the confounded data.
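The failure mode can be reproduced with a toy simulation. The sketch below is not background-MNIST and does not use the causal-sampler package: it assumes a binary confounder (bright or dark background), a binary label, and a deliberately simple classifier that sees only a scalar brightness feature. All names (`make_toy`, `fit_threshold`, `accuracy`) and the 0.9 association strength are hypothetical choices for illustration.

```python
import numpy as np

def make_toy(n, p_match, rng):
    """Toy data: confounder u drives both the label y and the brightness feature."""
    u = rng.integers(0, 2, size=n)                    # 1 = bright background
    y = np.where(rng.random(n) < p_match, u, 1 - u)   # 1 = digit "6", 0 = digit "2"
    brightness = u + 0.1 * rng.standard_normal(n)     # observed image feature
    return brightness, y

def fit_threshold(brightness, y):
    """Midpoint between the class-wise mean brightness values."""
    return 0.5 * (brightness[y == 0].mean() + brightness[y == 1].mean())

def accuracy(threshold, brightness, y):
    """Predict "6" when the image is brighter than the threshold."""
    return float(np.mean((brightness > threshold).astype(int) == y))

rng = np.random.default_rng(0)
b_train, y_train = make_toy(5000, 0.9, rng)   # confounded training set
b_conf, y_conf = make_toy(5000, 0.9, rng)     # confounded held-out set
b_rand, y_rand = make_toy(5000, 0.5, rng)     # brightness randomized w.r.t. label

threshold = fit_threshold(b_train, y_train)
acc_confounded = accuracy(threshold, b_conf, y_conf)
acc_randomized = accuracy(threshold, b_rand, y_rand)
print(f"accuracy on confounded held-out data: {acc_confounded:.3f}")
print(f"accuracy on randomized data:          {acc_randomized:.3f}")
```

The held-out accuracy looks high only because the held-out set shares the same confounding; once brightness is randomized, the brightness-only predictor falls to roughly chance level.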

The causal-sampler package provides multiple causal resampling techniques that use the confounded dataset to generate a deconfounded dataset like the one shown in Fig. 2 (b).
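One way such a deconfounded dataset could be produced is sketched here under simplifying assumptions; it is not taken from the package's source. With a discrete, observed confounder, draw a weighted bootstrap for each class $y$ in which sample $n$ has weight proportional to $1/\hat{p}(y \mid u_n)$ if $y_n = y$ and zero otherwise. The function name `backdoor_resample_indices` and the toy data are hypothetical; only NumPy is used.

```python
import numpy as np

def backdoor_resample_indices(y, u, n_per_class, rng):
    """Weighted bootstrap indices that break the y-u association (hypothetical helper).

    Assumes y and u are 1-D arrays of discrete values and u is the only confounder.
    """
    y = np.asarray(y)
    u = np.asarray(u)
    resampled = []
    for y_val in np.unique(y):
        weights = np.zeros(len(y), dtype=float)
        for u_val in np.unique(u):
            in_stratum = u == u_val
            # Empirical estimate of p(y = y_val | u = u_val).
            p_y_given_u = np.mean(y[in_stratum] == y_val)
            if p_y_given_u > 0:
                # Inverse-probability weight for samples of this class in this stratum.
                weights[in_stratum & (y == y_val)] = 1.0 / p_y_given_u
        weights /= weights.sum()
        resampled.append(rng.choice(len(y), size=n_per_class, replace=True, p=weights))
    return np.concatenate(resampled)

# Toy confounded data: the label agrees with the confounder 90% of the time.
rng = np.random.default_rng(0)
n = 5000
u = rng.integers(0, 2, size=n)
y = np.where(rng.random(n) < 0.9, u, 1 - u)

idx = backdoor_resample_indices(y, u, n_per_class=2500, rng=rng)
agree_before = float(np.mean(u == y))
agree_after = float(np.mean(u[idx] == y[idx]))
print(f"label-confounder agreement before resampling: {agree_before:.3f}")
print(f"label-confounder agreement after resampling:  {agree_after:.3f}")
```

The returned indices would be applied to the image array as well (e.g. `x[idx]`), so that features, labels, and confounder values stay aligned. Within each resampled class the confounder follows its marginal distribution, so the label no longer predicts the background brightness.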

Citing

Please use one of the following to cite the code of this repository.

@article{mao2024mechanism,
  title={Mechanism learning: Reverse causal inference in the presence of multiple unknown confounding through front-door causal bootstrapping},
  author={Mao, Jianqiao and Little, Max A},
  journal={arXiv preprint arXiv:2410.20057},
  year={2024}
}

@article{little2019causal,
  title={Causal bootstrapping},
  author={Little, Max A and Badawy, Reham},
  journal={arXiv preprint arXiv:1910.09648},
  year={2019}
}

Installation and getting started

We currently offer seamless installation with pip.

Simply:

pip install causal-sampler

Alternatively, download the current distribution of the package, and run:

pip install .

in the root directory of the decompressed package.

To import the high-level interfaces:

import causal_sampler.pipeline as cs_pipe
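A quick way to confirm that the installation succeeded, sketched with the Python standard library only. The helper name `installed_version` is hypothetical; `importlib.metadata` is queried with the distribution name used in the `pip install` command above.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Prints the installed version, or None if the package is not installed.
print(installed_version("causal-sampler"))
```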

The current version provides two causal sampling techniques:

  • Causal Bootstrapping [1]: CausalBootstrapSampler
  • Causally Weighted Gaussian Mixture Model [2]: CausalGMMSampler

For a more detailed demonstration, please see the Demo.

Update plan

  • Causal Dirichlet Process Mixture Models
    • Source code development
    • Toy example experiments
    • Interface design and development
  • Causal-aware Markov chain Monte Carlo-based Block Gibbs sampling
    • Source code development
    • Toy example experiments
    • Interface design and development
  • Denoising Diffusion Probabilistic Models (DDPM) based causal sample generation
    • Source code development
    • Toy example experiments
    • Interface design and development

References

[1] Little, Max A., and Reham Badawy. "Causal bootstrapping." arXiv preprint arXiv:1910.09648 (2019).

[2] Mao, Jianqiao. "mechanism-learn." GitHub repository. https://github.com/JianqiaoMao/mechanism-learn, https://zenodo.org/records/17306651.
