CausalBootstrapping is an easy-access implementation and extension of the causal bootstrapping (CB) technique for causal analysis. Given observational data, a causal graph, and (estimates of) the relevant variable distributions, CB resamples the data with weights derived from the causal graph. The resampled data then follow the intended interventional (causal) distribution rather than the observational one.

Project description

CausalBootstrapping

Confounding

In a backdoor setting, a confounder can introduce so-called "selection bias". A machine learning model that is blind to the underlying causal relationships between variables therefore risks learning biased and unreliable associations between the prediction target and the features. A simple, intuitive example is shown below:

In the figure, the model trained on the confounded dataset (for example, observational data collected from uncontrolled experiments) is biased because of the confounder. Causal bootstrapping addresses this by adjusting the distribution of the observational data, so that the model learns from data generated by the interventional distribution $P(X|do(Y))$ rather than the observational distribution $P(X|Y)$. As expected, the model trained on the dataset de-confounded by backdoor causal bootstrapping is free of the influence of the confounder $U$: its decision boundary lies closer to the true class boundary.
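For the simple backdoor case (a confounder U affecting both Y and X), the interventional distribution is given by the backdoor adjustment $p(x|do(y)) = \sum_u p(x|y,u)\,p(u)$, and each sample with $Y = y$ can be resampled with probability proportional to $1/p(y|u_n)$. The NumPy sketch below illustrates that idea. It is written for this page and is not the package's API: the function name `backdoor_bootstrap`, the synthetic data-generating process, and the use of plain empirical frequencies for $p(y|u)$ are all assumptions.

```python
import numpy as np

def backdoor_bootstrap(x, y, u, y_intv, rng):
    """Hypothetical helper: resample (x, u) as if Y had been set to y_intv.

    Assumes discrete 1-D arrays y and u; p(y | u) is estimated from
    empirical frequencies. Only samples with y == y_intv can be drawn.
    """
    n = len(y)
    weights = np.zeros(n)
    for u_val in np.unique(u):
        in_u = u == u_val
        p_y_given_u = np.mean(y[in_u] == y_intv)  # empirical p(Y = y_intv | U = u_val)
        if p_y_given_u > 0:  # skip strata where y_intv never occurs (positivity fails)
            weights[in_u & (y == y_intv)] = 1.0 / p_y_given_u
    weights /= weights.sum()  # turn the weights into resampling probabilities
    idx = rng.choice(n, size=n, replace=True, p=weights)
    return x[idx], u[idx]

# Synthetic confounded data (an assumption for illustration): U -> Y, U -> X, Y -> X
rng = np.random.default_rng(0)
n = 20000
u = rng.integers(0, 2, size=n)
y = (rng.random(n) < 0.2 + 0.6 * u).astype(int)
x = y + 2.0 * u + rng.normal(size=n)

print(u[y == 1].mean())  # close to P(U=1 | Y=1) = 0.8: U and Y are associated
x_cb, u_cb = backdoor_bootstrap(x, y, u, y_intv=1, rng=rng)
print(u_cb.mean())       # close to P(U=1) = 0.5: U no longer tracks the intervened Y
print(x[y == 1].mean(), x_cb.mean())  # close to 2.6 (observational) vs 2.0 (interventional)
```

In this toy model, $E[X|Y=1] = 2.6$ while $E[X|do(Y=1)] = 2.0$; the bootstrapped sample recovers the latter because resampling breaks the dependence between U and Y.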

Citing

Please use the following to cite the code of this repository.

@article{little2019causal,
  title={Causal bootstrapping},
  author={Little, Max A and Badawy, Reham},
  journal={arXiv preprint arXiv:1910.09648},
  year={2019}
}

Installation and getting started

The package can be installed from PyPI with pip:

pip install CausalBootstrapping

Alternatively, download the current distribution of the package, and run:

pip install .

in the root directory of the decompressed package.

To import the package:

import causalBootstrapping as cb

Example demo

Please refer to Tutorials for more instructions and examples.

Import causalBootstrapping and the other libraries used in the demo.

import causalBootstrapping as cb
from distEst_lib import MultivarContiDistributionEstimator
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.metrics import classification_report

Define a causal graph

causal_graph = '"General Causal Graph"; \
                Y; X; U; Z; \
                U -> Y; \
                Y -> Z; \
                U -> Z; \
                Z -> X; \
                X <-> Y;'

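In this graph, U is a common cause of Y and Z (U -> Y, U -> Z), Y affects X only through Z (Y -> Z -> X), and the bidirected edge X <-> Y denotes an unobserved common cause of X and Y.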
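To make the graph syntax concrete, the sketch below splits the graph string into its title, node declarations, directed edges (`->`) and bidirected edges (`<->`). The helper `parse_causal_graph` is hypothetical and only illustrates the format used in this demo; it is not part of causalBootstrapping, which parses the string internally and may accept syntax this sketch does not.

```python
def parse_causal_graph(graph_str):
    """Hypothetical helper: split a causal graph string into its parts."""
    # Statements are separated by ';'; the first one is the quoted graph title.
    statements = [s.strip() for s in graph_str.split(";") if s.strip()]
    title = statements[0].strip('"')
    nodes, directed, bidirected = [], [], []
    for stmt in statements[1:]:
        if "<->" in stmt:  # check '<->' first, since it also contains '->'
            a, b = (s.strip() for s in stmt.split("<->"))
            bidirected.append((a, b))
        elif "->" in stmt:
            a, b = (s.strip() for s in stmt.split("->"))
            directed.append((a, b))
        else:
            nodes.append(stmt)
    return title, nodes, directed, bidirected

causal_graph = '"General Causal Graph"; \
                Y; X; U; Z; \
                U -> Y; \
                Y -> Z; \
                U -> Z; \
                Z -> X; \
                X <-> Y;'

title, nodes, directed, bidirected = parse_causal_graph(causal_graph)
print(title)       # General Causal Graph
print(nodes)       # ['Y', 'X', 'U', 'Z']
print(directed)    # [('U', 'Y'), ('Y', 'Z'), ('U', 'Z'), ('Z', 'X')]
print(bidirected)  # [('X', 'Y')]
```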

Analyse the causal graph, and output the weight function expression and the required distributions

weight_func_lam, weight_func_str = cb.general_cb_analysis(causal_graph = causal_graph, 
                                                          effect_var_name = 'X', 
                                                          cause_var_name = 'Y',
                                                          info_print = True)

This is expected to print:

Interventional prob.:p_{Y}(X)=\sum_{U,Z,Y'}[p(X|U,Z,Y')p(Z|U,Y)p(U,Y')]
Causal bootstrapping weights function: [P(U,Y')P(U,Y,Z)]/N*[P(U,Y',Z)P(U,Y)]
Required distributions:
1: P(U,Y')
2: P(U,Y,Z)
3: P(U,Y',Z)
4: P(U,Y)
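The printed weight function follows from the interventional distribution by importance weighting. Each observed sample $(x_n, u_n, z_n, y'_n)$ is drawn from the joint $p(X,U,Z,Y')$, so its weight for the intervention $do(Y=y)$ is the ratio of the integrand of $p_{Y}(X)$ to that joint, divided by $N$. A short derivation using only the expressions printed above (the symbol $w_n(y)$ for the weight of sample $n$ is our notation):

```latex
\begin{aligned}
w_n(y) &= \frac{1}{N}\,\frac{p(x_n \mid u_n, z_n, y'_n)\, p(z_n \mid u_n, y)\, p(u_n, y'_n)}{p(x_n, u_n, z_n, y'_n)} \\
       &= \frac{1}{N}\,\frac{p(z_n \mid u_n, y)\, p(u_n, y'_n)}{p(u_n, z_n, y'_n)} \\
       &= \frac{p(u_n, y'_n)\, p(u_n, y, z_n)}{N\, p(u_n, y'_n, z_n)\, p(u_n, y)}
\end{aligned}
```

The printed expression `[P(U,Y')P(U,Y,Z)]/N*[P(U,Y',Z)P(U,Y)]` should therefore be read as a single fraction with $N\,P(U,Y',Z)\,P(U,Y)$ in the denominator.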

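Read the demo data for causal bootstrapping. The observed values of Y are stored under the key "Y'", matching the weight function above, in which Y denotes the intervened value and Y' the observed one.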

# Read demo data
testdata_dir = "../test_data/complex_scenario/"
X_train = pd.read_csv(testdata_dir + "X_train.csv")
Y_train = pd.read_csv(testdata_dir + "Y_train.csv")
Z_train = pd.read_csv(testdata_dir + "Z_train.csv")
U_train = pd.read_csv(testdata_dir + "U_train.csv")
# Convert the DataFrames to NumPy arrays, the format expected by the causalBootstrapping interfaces
X_train = np.array(X_train)
Y_train = np.array(Y_train)
Z_train = np.array(Z_train)
U_train = np.array(U_train)
data = {"Y'": Y_train,
        "X": X_train,
        "Z": Z_train,
        "U": U_train}
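Before bootstrapping, it can help to check that every entry in `data` is a 2-D array with one row per sample, which is what `np.array(...)` on the DataFrames above produces. The helper `check_data_dict` below is a hypothetical sanity check written for this demo, not part of causalBootstrapping, and the "2-D, equal number of rows" requirement is an assumption based on how the demo prepares its inputs.

```python
import numpy as np

def check_data_dict(data):
    """Hypothetical check: all variables are 2-D arrays with the same number of rows."""
    n_rows = None
    for name, arr in data.items():
        arr = np.asarray(arr)
        if arr.ndim != 2:
            raise ValueError(f"{name!r} should be 2-D (samples x dims), got shape {arr.shape}")
        if n_rows is None:
            n_rows = arr.shape[0]
        elif arr.shape[0] != n_rows:
            raise ValueError(f"{name!r} has {arr.shape[0]} rows, expected {n_rows}")
    return n_rows

# Toy stand-in for the demo's `data` dictionary
toy = {"Y'": np.zeros((4, 1)), "X": np.zeros((4, 3)), "Z": np.zeros((4, 1)), "U": np.zeros((4, 1))}
print(check_data_dict(toy))  # 4
```

In the demo itself, `check_data_dict(data)` would return the number of training samples.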

Estimate the required distributions listed in the output of general_cb_analysis() above. Users are also encouraged to supply their own distribution functions if relevant domain knowledge is available.

# Set the number of histogram bins per dimension (all variables here follow discrete distributions)
n_bins_uyz = [0,0,0,0]
n_bins_uy = [0,0]
data_uyz = np.concatenate((U_train, Y_train, Z_train), axis = 1)
data_uy = np.concatenate((U_train, Y_train), axis = 1)

dist_estimator_uyz = MultivarContiDistributionEstimator(data_fit=data_uyz, n_bins = n_bins_uyz)
pdf_uyz, puyz = dist_estimator_uyz.fit_histogram()
dist_estimator_uy = MultivarContiDistributionEstimator(data_fit=data_uy, n_bins = n_bins_uy)
pdf_uy, puy = dist_estimator_uy.fit_histogram()
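For discrete variables, a histogram estimate of a joint distribution amounts to counting how often each combination of values occurs. The sketch below illustrates that idea with NumPy only. `fit_discrete_pmf` is a hypothetical stand-in, not the implementation of `MultivarContiDistributionEstimator.fit_histogram`, whose binning behaviour (including what `n_bins = 0` means) is not described here. Like `pdf_uyz` in the demo, the returned function takes a list of values, one per column.

```python
import numpy as np

def fit_discrete_pmf(data_fit):
    """Hypothetical helper: empirical joint pmf of the rows of a 2-D array."""
    data_fit = np.asarray(data_fit)
    rows, counts = np.unique(data_fit, axis=0, return_counts=True)
    probs = counts / data_fit.shape[0]
    table = {tuple(r): p for r, p in zip(rows.tolist(), probs)}

    def pmf(values):
        # Flatten e.g. [U, Y] into one key; unseen combinations get probability 0
        key = tuple(np.asarray(values, dtype=data_fit.dtype).ravel().tolist())
        return table.get(key, 0.0)

    return pmf

# Toy (U, Y) data: four samples, one column per variable
data_uy_toy = np.array([[0, 0], [0, 1], [1, 1], [1, 1]])
pmf = fit_discrete_pmf(data_uy_toy)
print(pmf([1, 1]))  # 0.5
print(pmf([1, 0]))  # 0.0
```

A plug-in estimate like this assigns zero probability to unseen combinations, so any weight that divides by such a probability needs a guard.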

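Construct the distribution mapping dictionary. Each key is the sorted tuple of the variable names in one required distribution, and each value is a function returning its estimated probability. Because Y' denotes the observed value of the same variable Y, the distributions involving Y' reuse the estimators fitted for Y.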

dist_map = {tuple(sorted(["U","Y","Z"])): lambda U, Y, Z: pdf_uyz([U, Y, Z]),
            tuple(sorted(["U","Y'","Z"])): lambda U, Y_prime, Z: pdf_uyz([U, Y_prime, Z]),
            tuple(sorted(["U","Y'"])): lambda U, Y_prime: pdf_uy([U,Y_prime]),
            tuple(sorted(["U","Y"])): lambda U, Y: pdf_uy([U, Y])}
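To show how the weight function and the required distributions fit together, the sketch below evaluates the weights printed by general_cb_analysis(), $w_n(y) = P(u_n,y'_n)\,P(u_n,y,z_n) / [N\,P(u_n,y'_n,z_n)\,P(u_n,y)]$, for binary U, Y and Z. It uses empirical frequencies and then resamples indices in proportion to the weights. This is an illustrative re-implementation under stated assumptions: the variables are discrete, the probabilities are plug-in frequency estimates, and the helper name `cb_weights` and the synthetic data are hypothetical. `general_causal_bootstrapping_simple()` may estimate, normalise, or return results differently.

```python
import numpy as np
from collections import Counter

def cb_weights(u, y_obs, z, y_intv):
    """Hypothetical helper: w_n = P(u,y') P(u,y,z) / (N P(u,y',z) P(u,y)) for do(Y = y_intv).

    u, y_obs (the observed Y, i.e. Y') and z are 1-D arrays of discrete values.
    """
    n = len(u)
    u_l, y_l, z_l = u.tolist(), y_obs.tolist(), z.tolist()
    c_uy = Counter(zip(u_l, y_l))        # counts for P(U, Y)
    c_uyz = Counter(zip(u_l, y_l, z_l))  # counts for P(U, Y, Z)
    w = np.zeros(n)
    for i, (ui, yi, zi) in enumerate(zip(u_l, y_l, z_l)):
        p_uy_intv = c_uy[(ui, y_intv)] / n
        if p_uy_intv == 0:  # positivity fails for this u; leave the weight at 0
            continue
        num = (c_uy[(ui, yi)] / n) * (c_uyz[(ui, y_intv, zi)] / n)
        den = n * (c_uyz[(ui, yi, zi)] / n) * p_uy_intv
        w[i] = num / den
    return w

# Synthetic data (an assumption for illustration): U -> Y, U -> Z, Y -> Z
rng = np.random.default_rng(1)
n = 5000
u = rng.integers(0, 2, size=n)
y_obs = (rng.random(n) < 0.3 + 0.4 * u).astype(int)
z = (rng.random(n) < 0.2 + 0.3 * u + 0.3 * y_obs).astype(int)

w = cb_weights(u, y_obs, z, y_intv=1)
print(w.sum())       # 1 up to rounding, since every (u, y', z) combination occurs here
print(np.dot(w, z))  # estimate of P(Z=1 | do(Y=1)); its population value is 0.65
idx = rng.choice(n, size=n, replace=True, p=w / w.sum())
# Rows such as X_train[idx] would then form the bootstrapped sample, labelled Y = 1
```

In this toy model the observational value is $P(Z=1|Y=1) = 0.71$, so the weighted estimate moving toward 0.65 reflects the removal of the U -> Y dependence.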

Bootstrap the dataset using the weight function

cb_data = cb.general_causal_bootstrapping_simple(weight_func_lam = weight_func_lam, 
                                                 dist_map = dist_map, data = data, 
                                                 intv_var_name = "Y", kernel = None)

Train two linear support vector machines, one on the original (confounded) data and one on the de-confounded data produced by causal bootstrapping

clf_conf = svm.SVC(kernel = 'linear', C=2)
clf_conf.fit(X_train, Y_train.reshape(-1))

clf_cb = svm.SVC(kernel = 'linear', C=2)
clf_cb.fit(cb_data['X'], cb_data["intv_Y"].reshape(-1))

Compare their performance on an unconfounded test set

X_test = pd.read_csv(testdata_dir + "X_test.csv")
Y_test = pd.read_csv(testdata_dir + "Y_test.csv")
X_test = np.array(X_test)
Y_test = np.array(Y_test).reshape(-1)  # 1-D label vector, as used for training

y_pred_conf = clf_conf.predict(X_test)
print("Report of confounded model:")
print(classification_report(Y_test, y_pred_conf))

y_pred_deconf = clf_cb.predict(X_test)
print("Report of de-confounded model:")
print(classification_report(Y_test, y_pred_deconf))

The expected output should be similar to:

Report of confounded model:
              precision    recall  f1-score   support

           1       0.56      0.88      0.68       865
           2       0.84      0.46      0.60      1135

    accuracy                           0.65      2000
   macro avg       0.70      0.67      0.64      2000
weighted avg       0.72      0.65      0.63      2000

Report of de-confounded model:
              precision    recall  f1-score   support

           1       0.63      0.84      0.72       865
           2       0.84      0.63      0.72      1135

    accuracy                           0.72      2000
   macro avg       0.73      0.73      0.72      2000
weighted avg       0.75      0.72      0.72      2000

Compare the models' decision boundaries

# Confounding boundary: the plane X1 = 0
conf_x2, conf_x3 = np.meshgrid(np.linspace(-6, 6, 20), np.linspace(-6, 6, 20))
conf_x1 = np.zeros((20, 20))
# Real class boundary: the plane X3 = 0
real_x1, real_x2 = np.meshgrid(np.linspace(-6, 6, 20), np.linspace(-6, 6, 20))
real_x3 = np.full_like(real_x1, 0)

# SVM decision boundaries: solve coef . x + intercept = 0 for X3 on a shared (X1, X2) grid
xx1, xx2 = np.meshgrid(np.linspace(-6, 6, 50), np.linspace(-6, 6, 50))
xx_conf = (-clf_conf.intercept_[0] - clf_conf.coef_[0][0] * xx1 - clf_conf.coef_[0][1] * xx2) / clf_conf.coef_[0][2]
xx_cb = (-clf_cb.intercept_[0] - clf_cb.coef_[0][0] * xx1 - clf_cb.coef_[0][1] * xx2) / clf_cb.coef_[0][2]

plt.figure()
ax = plt.axes(projection='3d')
ax.scatter3D(X_test[:, 0], X_test[:, 1], X_test[:, 2], c=Y_test, s=5, alpha=0.5)
surf1 = ax.plot_surface(conf_x1, conf_x2, conf_x3, alpha=0.5, rstride=100, cstride=100, color="yellow", label="confounding boundary")
surf2 = ax.plot_surface(real_x1, real_x2, real_x3, alpha=0.5, rstride=100, cstride=100, color="green", label="real boundary")
surf3 = ax.plot_surface(xx1, xx2, xx_conf, color='red', alpha=0.5, rstride=100, cstride=100, label="confounded decision boundary")
surf4 = ax.plot_surface(xx1, xx2, xx_cb, color='blue', alpha=0.5, rstride=100, cstride=100, label="deconfounded decision boundary")
ax.set_xlabel('X1')
ax.set_ylabel('X2')
ax.set_zlabel('X3')
# Workaround for matplotlib versions where legend() fails on 3-D surfaces
# because they lack the _facecolors2d/_edgecolors2d attributes
for surf in (surf1, surf2, surf3, surf4):
    surf._facecolors2d = surf._facecolors
    surf._edgecolors2d = surf._edgecolors
ax.legend(["Unconfounded test data", "confounding boundary", "real boundary", "confounded decision boundary", "deconfounded decision boundary"])
plt.title('Decision boundary comparison')
plt.tight_layout()
plt.show()
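The SVM boundary surfaces are obtained by solving the linear decision function $w \cdot x + b = 0$ for the third coordinate, $x_3 = -(b + w_1 x_1 + w_2 x_2)/w_3$, which is what the `xx_conf` and `xx_cb` expressions compute; it requires $w_3 \neq 0$. The hypothetical helper `boundary_x3` below packages that rearrangement and checks it with made-up coefficients; for a fitted model one would pass `clf.coef_[0]` and `clf.intercept_[0]`.

```python
import numpy as np

def boundary_x3(coef, intercept, x1, x2):
    """Hypothetical helper: solve coef[0]*x1 + coef[1]*x2 + coef[2]*x3 + intercept = 0 for x3."""
    if np.isclose(coef[2], 0.0):
        raise ValueError("coef[2] is zero: the boundary is parallel to the X3 axis")
    return (-intercept - coef[0] * x1 - coef[1] * x2) / coef[2]

# Check with illustrative (made-up) coefficients: points on the surface score 0
coef, intercept = np.array([0.5, -1.0, 2.0]), 0.25
x1, x2 = np.meshgrid(np.linspace(-6, 6, 5), np.linspace(-6, 6, 5))
x3 = boundary_x3(coef, intercept, x1, x2)
scores = coef[0] * x1 + coef[1] * x2 + coef[2] * x3 + intercept
print(np.abs(scores).max())  # ~0, up to floating-point error
```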

The resulting figure should look similar to:

Project details

Release history

0.2.5: Jan 1, 2026
0.2.4: Jan 1, 2026
0.2.3: Dec 30, 2025
0.2.2: Dec 30, 2025
0.2.1: Dec 30, 2025
0.2.0: Oct 9, 2025
0.1.5: Oct 17, 2024
0.1.4: Oct 13, 2024
0.1.3: Jun 1, 2024
0.1.2: Feb 7, 2024
0.1.1: Feb 2, 2024
0.1.0: Feb 2, 2024
0.0.3: Feb 2, 2024
0.0.2: Oct 9, 2025 (this version)
0.0.1: Jan 30, 2024

Download files

Source Distribution

causalbootstrapping-0.0.2.tar.gz (56.0 kB)

Uploaded Oct 9, 2025 Source

Built Distribution

causalbootstrapping-0.0.2-py3-none-any.whl (42.0 kB)

Uploaded Oct 9, 2025 Python 3

File details

Details for the file causalbootstrapping-0.0.2.tar.gz.

File metadata

Download URL: causalbootstrapping-0.0.2.tar.gz
Upload date: Oct 9, 2025
Size: 56.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for causalbootstrapping-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`c6e8b7edb937c5fa034b3b74a32b0b3ec79a09f21668e9e9cb1221bf5fbf04d1`
MD5	`8aa02acb204bc5e205f8f1d26e219092`
BLAKE2b-256	`0988ee5225b311940427a2ebcb4f3d82ceae8a6f179eb1b8117bdb5962ff1679`


File details

Details for the file causalbootstrapping-0.0.2-py3-none-any.whl.

File metadata

Download URL: causalbootstrapping-0.0.2-py3-none-any.whl
Upload date: Oct 9, 2025
Size: 42.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for causalbootstrapping-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`10a2ab83a3a9fced6ba2fd1a14139bd0bb1b7dc725009f376eb4e32531100042`
MD5	`8b4005c6ab873ddd6a1b5a5af2c7ef02`
BLAKE2b-256	`dcb66acaa9531a78c9ee761c50fa124cda844222a3774bc35214e1707220a7e3`


causalbootstrapping 0.0.2
