Deep direct likelihood knockoffs
DDLK: Deep direct likelihood knockoffs
This package implements DDLK, a method for variable selection with explicit control of the false discovery rate. Install with:
pip install ddlk
Controlled variable selection with DDLK
Suppose you have a set of features and a response. DDLK identifies the features most predictive of the response at a pre-specified false discovery rate (FDR) threshold. For example, if you choose an FDR of 20%, DDLK controls the selection so that, in expectation, no more than 20% of the selected features are unimportant. To learn more about how it works, check out our paper.
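To make the FDR statement concrete, here is a minimal sketch of the quantity being controlled: the false discovery proportion of a single selection. The FDR is the expectation of this proportion over repeated experiments. The function name and the toy feature indices are hypothetical and are not part of the ddlk package.

```python
def false_discovery_proportion(selected, important):
    """Fraction of selected features that are not truly important."""
    selected = set(selected)
    if not selected:
        return 0.0  # convention: no selections means no false discoveries
    false_discoveries = selected - set(important)
    return len(false_discoveries) / len(selected)


# Hypothetical ground truth: features 0-4 are the important ones.
important = {0, 1, 2, 3, 4}
selected = [0, 1, 2, 3, 7]  # one unimportant feature (7) was selected
fdp = false_discovery_proportion(selected, important)
print(fdp)  # 0.2
```

In practice the set of important features is unknown, which is why a procedure with an FDR guarantee is needed rather than a direct computation like this one.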
Running DDLK
Variable selection with DDLK involves three stages:
- Fit a joint distribution to model features
- Fit a knockoff generator
- Sample knockoffs and apply knockoff filter to select variables at a pre-specified FDR
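The last stage can be sketched with the standard knockoff+ selection rule of Barber and Candès. This is an illustration of the general technique under stated assumptions, not necessarily the filter that ddlk applies internally: `knockoff_plus_select` and the statistics `W` are hypothetical names, and `W[j]` is assumed to be large and positive when feature `j` looks more important than its knockoff.

```python
def knockoff_plus_select(W, fdr):
    """Select features with the knockoff+ threshold.

    W[j] is a feature statistic whose sign is symmetric for unimportant
    features; large positive values favor the real feature over its knockoff.
    """
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if w <= -t)
        positives = sum(1 for w in W if w >= t)
        # Conservative estimate of the false discovery proportion at t.
        if (1 + negatives) / max(positives, 1) <= fdr:
            return [j for j, w in enumerate(W) if w >= t]
    return []  # no threshold meets the target FDR


# Toy statistics: six clearly positive features, four near zero.
W = [5.0, 4.2, 3.9, 3.1, 2.8, 2.5, -0.4, 0.3, -0.2, 0.1]
print(knockoff_plus_select(W, fdr=0.2))  # [0, 1, 2, 3, 4, 5]
```

The `1 +` in the numerator is what makes the rule conservative: with a 20% target, at least five features must clear the threshold before anything is selected.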
To see a complete working example, check our synthetic data example, used to generate the gif above. Below is an excerpt of how to run DDLK.
Fitting a joint distribution
This implementation of DDLK uses the fast and easy PyTorch Lightning framework to fit q_joint:
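import argparse
import pytorch_lightning as pl
# ddlk submodules, inferred from the names used below (assumed package layout)
from ddlk import ddlk, hrt, mdn, utils

# initialize data
x, y = ...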
# put your data in standard PyTorch format
trainloader = ...
# initialize joint distribution model with mean and std of data
((X_mu, ), (X_sigma, )) = utils.get_two_moments(trainloader)
hparams = argparse.Namespace(X_mu=X_mu, X_sigma=X_sigma)
q_joint = mdn.MDNJoint(hparams)
# create and fit a PyTorch Lightning trainer
# (newer PyTorch Lightning releases rename this argument to train_dataloaders)
trainer = pl.Trainer()
trainer.fit(q_joint, train_dataloader=trainloader)
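The README does not show what `utils.get_two_moments` does internally. As a rough illustration of the idea of initializing a model with the mean and standard deviation of the data, the sketch below accumulates per-feature moments over mini-batches in plain Python. The function `two_moments`, the nested-list batch format, and the choice of population (rather than sample) standard deviation are all assumptions made for this example.

```python
import math


def two_moments(batches):
    """Per-feature mean and standard deviation over an iterable of batches.

    Each batch is a list of rows; each row is a list of feature values.
    """
    n = 0
    sums, sq_sums = None, None
    for batch in batches:
        for row in batch:
            if sums is None:
                sums = [0.0] * len(row)
                sq_sums = [0.0] * len(row)
            for j, value in enumerate(row):
                sums[j] += value
                sq_sums[j] += value * value
            n += 1
    if n == 0:
        raise ValueError("no data")
    means = [s / n for s in sums]
    # Population variance E[x^2] - E[x]^2, clipped at 0 against rounding error.
    stds = [math.sqrt(max(sq / n - m * m, 0.0)) for sq, m in zip(sq_sums, means)]
    return means, stds


# Two mini-batches with two features each; the second feature is constant.
batches = [
    [[1.0, 10.0], [2.0, 10.0]],
    [[3.0, 10.0], [4.0, 10.0]],
]
mu, sigma = two_moments(batches)
print(mu)  # [2.5, 10.0]
```

Accumulating sums batch by batch avoids loading the full dataset into memory, which matches how a data loader yields data.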
Fitting a knockoff generator
# initialize and fit a DDLK knockoff generator
q_knockoff = ddlk.DDLK(hparams, q_joint=q_joint)
trainer = pl.Trainer()
trainer.fit(q_knockoff, train_dataloader=trainloader)
Variable selection
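Using the knockoff generator, we sample knockoffs and run a holdout randomization test (HRT):
# xTr, yTr: training features and response (meaning assumed from the names)
# sample one knockoff copy of each feature
xTr_tilde = q_knockoff.sample(xTr)
# fit the HRT on the real features, the response, and the knockoffs
knockoff_test = hrt.HRT_Knockoffs()
knockoff_test.fit(xTr, yTr, xTr_tilde)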
Citing this code
If you use this code, please cite the following paper (available here):
Deep Direct Likelihood Knockoffs
M. Sudarshan, W. Tansey, R. Ranganath
arXiv preprint arXiv:2007.15835
Bibtex entry:
@misc{sudarshan2020deep,
    title={Deep Direct Likelihood Knockoffs},
    author={Mukund Sudarshan and Wesley Tansey and Rajesh Ranganath},
    year={2020},
    eprint={2007.15835},
    archivePrefix={arXiv},
    primaryClass={stat.ML}
}