An approach for semi-parametric causal mediation analysis to estimate the natural (in)direct effects of a binary treatment on an outcome of interest.
DeepMed: Python Implementation
A Python package for semi-parametric causal mediation analysis that estimates the natural (in)direct effects of a binary treatment on an outcome of interest. DeepMed uses deep neural networks, along with competing methods (Lasso, random forest, GBM), to estimate the nuisance parameters involved in the influence functions of the causal parameters.
Setup
DeepMed depends on numpy, pandas, multiprocess, tensorflow, keras and sklearn.
Installation
Users can install DeepMed by running the following command on the command line:
pip install DeepMed
Import the module
from DeepMed import DeepMed
Parameters
DeepMed(y,d,m,x,method="DNN",hyper_grid=NA,epochs=500,batch_size=100,trim=0.05)
y: A numeric vector for the outcome variable in causal mediation analysis.
d: A numeric vector for the binary treatment variable in causal mediation analysis, which is coded as 0 or 1.
m: A numeric vector for the mediator variable in causal mediation analysis.
x: A numeric vector or a numeric matrix with p columns for p covariates in causal mediation analysis.
method: The method used to estimate the nuisance functions, with 3-fold cross-fitting. Four methods are provided: deep neural network ("DNN"), gradient boosting machine ("GBM"), random forest ("RF") and Lasso ("Lasso"). See details below. By default, method="DNN".
hyper_grid: A dataframe containing a grid of candidate hyperparameters for "DNN", "GBM", or "RF" (see details below). Three-fold cross-validation is used to select the hyperparameters over hyper_grid, based on the cross-entropy loss for a binary response and the mean-squared loss for a continuous response. If method=="Lasso", this argument is ignored.
epochs: The maximum number of candidate epochs for the deep neural network. By default, epochs=500. If method!="DNN", this argument is ignored.
batch_size: The batch size for the deep neural network. By default, batch_size=100. If method!="DNN", this argument is ignored.
trim: The trimming rate, which prevents conditional treatment or mediator probabilities from being too close to zero. Observations for which any denominator in the potential outcomes is smaller than the trimming rate are excluded from the analysis. By default, trim=0.05.
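The trimming rule described for trim can be illustrated with a small NumPy sketch. The helper name trim_mask and the (n, k) array layout are hypothetical, and the exact set of denominators that DeepMed checks is not listed here; the sketch only assumes that each observation has one or more estimated probabilities that appear in a denominator.

```python
import numpy as np

def trim_mask(denominators, trim=0.05):
    """Return a boolean mask of the observations to keep.

    denominators: array of shape (n, k) holding, for each of the n
    observations, the k estimated probabilities that appear in a
    denominator (hypothetical layout, not DeepMed's internals).
    """
    denominators = np.asarray(denominators, dtype=float)
    # Keep an observation only if every denominator is at least `trim`.
    return np.all(denominators >= trim, axis=1)

probs = np.array([[0.50, 0.40],
                  [0.03, 0.60],   # first denominator below 0.05: dropped
                  [0.90, 0.04],   # second denominator below 0.05: dropped
                  [0.20, 0.70]])
keep = trim_mask(probs, trim=0.05)
ntrimmed = int((~keep).sum())     # number of excluded observations
print(keep, ntrimmed)
```

A larger trim removes more observations with extreme estimated probabilities, which tends to stabilize the estimates at the cost of a smaller effective sample.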
Value
results: The estimates (effect), standard errors (se) and P values (pval) of the total treatment effect (total), the (in)direct treatment effect in the treated group ((in)dir.treat), and the (in)direct treatment effect in the control group ((in)dir.control).
ntrimmed: The number of observations excluded because a denominator in the potential outcomes was smaller than the trimming rate.
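To relate the three reported quantities, the sketch below computes a two-sided P value from an estimate and its standard error under a normal approximation. This is an assumption about how such P values are commonly obtained, not a description of DeepMed's internal code, and the function name two_sided_pval is hypothetical.

```python
import math

def two_sided_pval(effect, se):
    """Two-sided P value for H0: effect = 0, assuming a normal test statistic."""
    z = effect / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Illustrative numbers only, not output from DeepMed.
print(two_sided_pval(0.30, 0.10))
print(two_sided_pval(0.05, 0.10))
```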
Details
All binary variables in the data should be coded as 0 or 1.
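A quick check of the 0/1 coding before running the analysis might look like the sketch below. The helper is_binary_01 is hypothetical and is not part of DeepMed; the recoding step assumes a variable with exactly two levels.

```python
import numpy as np

def is_binary_01(v):
    """Return True if every entry of v is 0 or 1."""
    v = np.asarray(v)
    return bool(np.isin(v, [0, 1]).all())

d = np.array([0, 1, 1, 0])
bad = np.array([1, 2, 1, 2])      # e.g. a treatment coded as 1/2
print(is_binary_01(d), is_binary_01(bad))

# Recode a two-level variable to 0/1: the larger level becomes 1.
recoded = (bad == bad.max()).astype(int)
print(recoded)
```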
Four methods are provided to estimate the nuisance functions. hyper_grid is a dataframe of candidate hyperparameters for "DNN", "GBM", or "RF". If method=="DNN", it has three columns: the L1 regularization parameter in the input layer, the number of hidden layers, and the number of hidden units. If method=="GBM", it has two columns: the maximum depth of each tree and the total number of trees. If method=="RF", it has two columns: the minimum size of terminal nodes and the number of trees. Three-fold cross-validation is used to select the hyperparameters over hyper_grid. All other hyperparameters of these methods are left at their default values in the corresponding packages.
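The examples in this document call a function named expand_grid without showing where it comes from. If it is not available in your environment, a minimal stand-in can be sketched with itertools and pandas as below. The helper, its names argument, and the default column names are assumptions; this document does not state whether DeepMed reads hyper_grid columns by position or by name, so the column order follows the order described above.

```python
import itertools
import pandas as pd

def expand_grid(*values, names=None):
    """Build a dataframe with one row per combination of the inputs.

    values: one sequence of candidate values per hyperparameter,
    given in the column order expected for the chosen method.
    names: optional column names (hypothetical, for readability only).
    """
    rows = list(itertools.product(*values))
    if names is None:
        names = [f"param{i + 1}" for i in range(len(values))]
    return pd.DataFrame(rows, columns=names)

# DNN grid: L1 penalty, number of hidden layers, number of hidden units.
hyper_grid = expand_grid([0, 0.05], [1, 2], [10, 20],
                         names=["l1", "layer", "unit"])
print(hyper_grid)
```

Each row is one candidate configuration, so the grid above has 2 x 2 x 2 = 8 rows; keeping the grid small limits the cost of the cross-validated search.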
References
Xu S, Liu L and Liu Z. DeepMed: Semiparametric Causal Mediation Analysis with Debiased Deep Learning. NeurIPS 2022.
Official R implementation of DeepMed: DeepMed in R GitHub repository.
Examples
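# read files
import numpy as np
import pyreadr
from DeepMed import DeepMed
data = pyreadr.read_r('/data/example.RData')  # dict-like of data frames keyed by R object name
x = np.array(data['x'])
y = np.array(data['y'])
d = np.array(data['d'])
m = np.array(data['m'])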
# DNN
l1 = np.array([0, 0.05])  # the L1 regularization parameter of the input layer
layer = np.array([1, 2])  # the number of hidden layers
unit = np.array([10, 20])  # the number of hidden units
hyper_grid = expand_grid(l1, layer, unit)  # create a grid of candidate hyperparameters
# run DeepMed on the example data with 1000 observations and two covariates.
test = DeepMed(y, d, m, x, method="DNN", hyper_grid=hyper_grid)
result = test.run()
# GBM
depth = np.array([1,2,3]) # the maximum depth of each tree
trees = np.array([10,50,100]) # the total number of trees
hyper_grid = expand_grid(depth, trees)  # same helper as in the DNN and Random Forest examples
test = DeepMed(y, d, m, x, method="GBM", hyper_grid=hyper_grid)
result = test.run()
# Random Forest
nodes = np.array([1,2,3]) # the minimum size of terminal nodes
trees = np.array([10,20,30]) # the number of trees
hyper_grid = expand_grid(nodes,trees)
test= DeepMed(y,d,m,x,method="RF",hyper_grid=hyper_grid)
result = test.run()
# Lasso
test=DeepMed(y,d,m,x,method="Lasso")
result = test.run()