causal-sampler is a Python package that integrates multiple causal sampling techniques, e.g., causal bootstrapping and causally weighted Gaussian Mixture Models, offering a standardized pipeline and interfaces.
causal-sampler
causal-sampler is a Python package that integrates multiple causal (re-)sampling techniques, e.g., causal bootstrapping (CB) and causally weighted Gaussian Mixture Models (CW-GMMs), offering a standardized and user-friendly high-level pipeline and interfaces.
Causal resampling deconfounds causally biased observational data so that it approximates experimental data collected in a well-controlled environment, such as a randomized controlled trial. The resampled, deconfounded data should therefore benefit many downstream applications; for example, it enables "causal-blind" machine learning methods to learn the intended causal relationship between the high-dimensional features and the prediction target variable, rather than potentially biased and spurious correlations.
Example of "background brightness" confounding in MNIST dataset
Figure 1. Backdoor confounding.
In a backdoor confounding setting, a confounder $U$ acts as a common cause of both the cause ($Y$) and effect ($X$) variables of interest, which may lead to so-called "selection bias". A dataset collected in such an environment can be severely causally biased by the confounder $U$. When a machine learning model that is blind to the underlying causal relationships between variables is trained on the confounded observational dataset, it risks learning unreliable or even spurious associations between the prediction target and the features. A simple and intuitive example follows:
Figure 2. Example digits from the confounded (a) and non-confounded (b) background-MNIST datasets. In (a), background brightness is manipulated so that it is a confounding factor with digit class (e.g., "6" is brighter than "2"); in (b), the brightness-digit association is randomized.
In the confounded dataset (Fig. 2 (a)), images of digit "6" tend to have brighter backgrounds than images of digit "2", but this confounding effect is not present in the non-confounded dataset (Fig. 2 (b)). Here, we consider the digit category ("2" or "6") as the cause variable $Y$, the handwritten image as the effect variable $X$, and the background brightness as the confounder $U$, a common cause that opens the backdoor path between the images and the categories.
A standard supervised classifier trained on the confounded MNIST dataset is likely to make predictions based on the input image's average brightness rather than the handwritten digit's actual shape, because the brightness feature is strongly associated with the label. Any supervised learning algorithm will use this spurious brightness information to maximize prediction accuracy. However, this brightness information is not what we expect the classifier to learn; should the predictor be applied to data without this brightness confounder, it would be of no use, despite its apparently high out-of-sample accuracy.
The causal-sampler package provides multiple causal resampling techniques that use the confounded dataset to generate a deconfounded dataset like the one shown in Fig. 2 (b).
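The deconfounding idea described above can be illustrated with a small, self-contained NumPy sketch. It does not use the causal-sampler API; it is a toy version of back-door weighting under the assumption that both $Y$ and $U$ are discrete and observed. It relies on the standard back-door adjustment $p(x \mid do(y)) = \sum_u p(x \mid y, u)\, p(u)$, which suggests resampling the samples with $y_n = y$ using weights proportional to $1 / \hat{p}(y \mid u_n)$. The function names, the two-feature toy data, and the probabilities 0.9 and 0.1 are all hypothetical choices made for illustration.

```python
import numpy as np


def backdoor_weights(y, u, y_value):
    """Resampling weights that approximate p(x | do(Y = y_value)).

    Assumes discrete Y and U. Each sample with y_n == y_value receives a
    weight proportional to 1 / p_hat(y_value | u_n); all others get zero.
    """
    y = np.asarray(y)
    u = np.asarray(u)
    weights = np.zeros(len(y), dtype=float)
    for u_value in np.unique(u):
        in_stratum = u == u_value
        # Empirical estimate of p(Y = y_value | U = u_value).
        p_y_given_u = np.mean(y[in_stratum] == y_value)
        if p_y_given_u > 0:
            weights[in_stratum & (y == y_value)] = 1.0 / p_y_given_u
    total = weights.sum()
    if total == 0:
        raise ValueError(f"no samples with y == {y_value}")
    return weights / total


def backdoor_resample(x, y, u, n_per_class, rng):
    """Draw n_per_class weighted bootstrap samples for every value of Y."""
    xs, ys, us = [], [], []
    for y_value in np.unique(y):
        w = backdoor_weights(y, u, y_value)
        idx = rng.choice(len(y), size=n_per_class, replace=True, p=w)
        xs.append(x[idx])
        ys.append(np.full(n_per_class, y_value))
        us.append(u[idx])
    return np.concatenate(xs), np.concatenate(ys), np.concatenate(us)


# Toy confounded data: U ("brightness") drives both the label Y and one
# feature of X; the other feature of X carries the true "shape" signal.
rng = np.random.default_rng(0)
n = 5000
u = rng.integers(0, 2, size=n)
y = (rng.random(n) < np.where(u == 1, 0.9, 0.1)).astype(int)
x = np.column_stack([
    y + rng.normal(0.0, 0.5, size=n),  # "shape" feature, caused by Y
    u + rng.normal(0.0, 0.5, size=n),  # "brightness" feature, caused by U
])

x_dc, y_dc, u_dc = backdoor_resample(x, y, u, n_per_class=2000, rng=rng)

corr_before = np.corrcoef(y, u)[0, 1]
corr_after = np.corrcoef(y_dc, u_dc)[0, 1]
print(f"corr(Y, U) before resampling: {corr_before:.2f}")
print(f"corr(Y, U) after resampling:  {corr_after:.2f}")
```

In the resampled data the label should be nearly uncorrelated with the confounder, so a classifier trained on it has far less incentive to rely on the "brightness" feature.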
Citing
Please use one of the following to cite the code of this repository.
@article{mao2024mechanism,
title={Mechanism learning: Reverse causal inference in the presence of multiple unknown confounding through front-door causal bootstrapping},
author={Mao, Jianqiao and Little, Max A},
journal={arXiv preprint arXiv:2410.20057},
year={2024}
}
@article{little2019causal,
title={Causal bootstrapping},
author={Little, Max A and Badawy, Reham},
journal={arXiv preprint arXiv:1910.09648},
year={2019}
}
Installation and getting started
The package can currently be installed with pip:
pip install causal-sampler
Alternatively, download the current distribution of the package, and run:
pip install .
in the root directory of the decompressed package.
To import the high-level interfaces:
import causal_sampler.pipeline as cs_pipe
The current version provides two causal sampling techniques:
- Causal Bootstrapping [1]: CausalBootstrapSampler
- Causally Weighted Gaussian Mixture Model [2]: CausalGMMSampler
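Since this section does not document the constructor arguments of the two samplers, a cautious first step is to inspect them rather than guess. The sketch below assumes only what is stated above: the module path `causal_sampler.pipeline` and the two class names. It further assumes, without confirmation, that the classes are exposed as attributes of that module. The helper `describe_samplers` is a hypothetical name, and it returns an empty dictionary if the package is not installed.

```python
import importlib
import inspect

# Class names taken from the list above.
SAMPLER_NAMES = ("CausalBootstrapSampler", "CausalGMMSampler")


def describe_samplers(module_name="causal_sampler.pipeline"):
    """Map each sampler name found in the module to its constructor signature."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Package (or one of its dependencies) is not importable here.
        return {}
    found = {}
    for name in SAMPLER_NAMES:
        sampler_cls = getattr(module, name, None)
        if sampler_cls is None:
            continue  # Assumption about attribute exposure did not hold.
        try:
            found[name] = str(inspect.signature(sampler_cls))
        except (TypeError, ValueError):
            found[name] = "(signature unavailable)"
    return found


for name, signature in describe_samplers().items():
    print(f"{name}{signature}")
```

Calling `help()` on either class in an interactive session gives the same information together with any docstrings the package provides.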
For a more detailed demonstration, please see: Demo.
Update plan
- Causal Dirichlet Process Mixture Models
  - Source code development
  - Toy example experiments
  - Interface design and development
- Causal-aware Markov chain Monte Carlo-based Block Gibbs sampling
  - Source code development
  - Toy example experiments
  - Interface design and development
- Denoising Diffusion Probabilistic Models (DDPM) based causal sample generation
  - Source code development
  - Toy example experiments
  - Interface design and development
Reference
[1] Little, Max A., and Reham Badawy. "Causal bootstrapping." arXiv preprint arXiv:1910.09648 (2019).
[2] Mao, Jianqiao. "mechanism-learn." GitHub repository. https://github.com/JianqiaoMao/mechanism-learn, https://zenodo.org/records/17306651.
File details
Details for the file causal_sampler-0.0.4.tar.gz.
File metadata
- Download URL: causal_sampler-0.0.4.tar.gz
- Upload date:
- Size: 54.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5895576c1785b1749b0525ce9360712e9990e3d31cd29a8ec692d03137050a1 |
| MD5 | 2651834b172394ce8e69c3184b2e44d0 |
| BLAKE2b-256 | b280807da8c7f3f9b8319a3030ad830334116885b9cf124a248e36bd47ee9843 |
File details
Details for the file causal_sampler-0.0.4-py3-none-any.whl.
File metadata
- Download URL: causal_sampler-0.0.4-py3-none-any.whl
- Upload date:
- Size: 41.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2f308eb25badf73917c275766ffd4df1b5d1c1faf7622da89f75576be6212d5 |
| MD5 | 9ecaefad4ac7c85b905575853a3dd21a |
| BLAKE2b-256 | 2a32dee5657e248f376be6d8d8a1b4071ca028bd562c2e9b6be6e5b03524a6ac |
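A manually downloaded distribution file can be checked against the SHA256 digests listed in the hash tables above. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `verify_download` are hypothetical, and the script assumes the files keep their original names and sit in the current working directory.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "causal_sampler-0.0.4.tar.gz":
        "b5895576c1785b1749b0525ce9360712e9990e3d31cd29a8ec692d03137050a1",
    "causal_sampler-0.0.4-py3-none-any.whl":
        "b2f308eb25badf73917c275766ffd4df1b5d1c1faf7622da89f75576be6212d5",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's SHA256 matches the digest listed for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no listed digest for {path.name}")
    return sha256_of_file(path) == expected


if __name__ == "__main__":
    # Only files that are actually present are checked.
    for file_name in EXPECTED_SHA256:
        if Path(file_name).exists():
            status = "OK" if verify_download(file_name) else "MISMATCH"
            print(f"{file_name}: {status}")
```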