Deep direct likelihood knockoffs

DDLK: Deep direct likelihood knockoffs

This package implements DDLK, a method for variable selection with explicit control of the false discovery rate. Install with:

pip install ddlk

Controlled variable selection with DDLK

Suppose you have a set of features and a response. DDLK identifies the features most predictive of the response while controlling the false discovery rate (FDR) at a pre-specified threshold. For example, if you choose an FDR of 20%, DDLK aims to ensure that, on average, no more than 20% of the selected features are unimportant. To learn more about how it works, check out our paper.
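
To make the FDR threshold concrete, here is a minimal sketch of the quantity being controlled. The function name and the toy feature indices are hypothetical and not part of the ddlk package. The FDR is the expected value of this proportion over repeated runs, so a single run can land above or below the threshold.

```python
def false_discovery_proportion(selected, important):
    """Fraction of selected features that are not truly important."""
    selected = set(selected)
    if not selected:
        return 0.0  # convention: no selections means no false discoveries
    false_discoveries = selected - set(important)
    return len(false_discoveries) / len(selected)


# Hypothetical run: five features selected, four of them truly important.
fdp = false_discovery_proportion(
    selected=[0, 1, 2, 3, 4],
    important=[0, 1, 2, 3, 7],
)
print(fdp)
```

In practice the set of truly important features is unknown, which is why a procedure with a built-in FDR guarantee is useful.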

Running DDLK

Variable selection with DDLK involves three stages:

  1. Fit a joint distribution to model features
  2. Fit a knockoff generator
  3. Sample knockoffs and apply knockoff filter to select variables at a pre-specified FDR
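
The knockoff filter in stage 3 can be sketched in plain Python. This is an illustrative version of the standard knockoff+ thresholding rule, assuming each feature already has a statistic W_j that tends to be large and positive when the original feature matters more than its knockoff. The function name `knockoff_filter` and the toy statistics are hypothetical; the package's own implementation may differ.

```python
def knockoff_filter(W, fdr=0.2, offset=1):
    """Select features j with W[j] >= tau, where tau is the smallest
    threshold whose estimated false discovery proportion is <= fdr."""
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        n_neg = sum(1 for w in W if w <= -t)  # proxy for false positives
        n_pos = sum(1 for w in W if w >= t)   # features that would be selected
        if (offset + n_neg) / max(n_pos, 1) <= fdr:
            return [j for j, w in enumerate(W) if w >= t]
    return []  # no threshold meets the target FDR


# Hypothetical statistics: six strong features, four near zero.
W = [5.0, 4.2, 3.9, 3.1, 2.8, 2.5, -0.4, 0.3, -0.2, 0.1]
selected = knockoff_filter(W, fdr=0.2)
print(selected)  # [0, 1, 2, 3, 4, 5]
```

With `offset=1` the rule is conservative: at least `1 / fdr` features must clear the threshold before anything is selected.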

To see a complete working example, check our synthetic data example, used to generate the gif above. Below is an excerpt of how to run DDLK.

Fitting a joint distribution

This implementation of DDLK uses the fast and easy PyTorch Lightning framework to fit q_joint:

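import argparse
import pytorch_lightning as pl
# module layout assumed from the names used in this excerpt
from ddlk import ddlk, mdn, utils

# initialize data
x, y = ...
# put your data in standard PyTorch format
trainloader = ...
# initialize joint distribution model with mean and std of data
((X_mu, ), (X_sigma, )) = utils.get_two_moments(trainloader)
hparams = argparse.Namespace(X_mu=X_mu, X_sigma=X_sigma)
q_joint = mdn.MDNJoint(hparams)
# create and fit a PyTorch Lightning trainer
# (newer PyTorch Lightning releases rename this keyword to train_dataloaders)
trainer = pl.Trainer()
trainer.fit(q_joint, train_dataloader=trainloader)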
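
The internals of `utils.get_two_moments` are not shown in this excerpt. As a rough illustration of what computing the mean and standard deviation of the data over a loader involves, here is a dependency-free sketch. It assumes each batch is a list of feature rows; the name `two_moments` and the toy batches are hypothetical, and the package function operates on PyTorch data instead.

```python
import math


def two_moments(batches):
    """Per-feature mean and standard deviation accumulated over mini-batches."""
    n = 0
    total = None
    total_sq = None
    for batch in batches:
        for row in batch:
            if total is None:
                total = [0.0] * len(row)
                total_sq = [0.0] * len(row)
            for j, v in enumerate(row):
                total[j] += v
                total_sq[j] += v * v
            n += 1
    mean = [s / n for s in total]
    # Population variance E[x^2] - E[x]^2, clipped at 0 against rounding error.
    std = [math.sqrt(max(sq / n - m * m, 0.0)) for sq, m in zip(total_sq, mean)]
    return mean, std


# Two hypothetical mini-batches with two features each.
batches = [[[1.0, 10.0], [3.0, 10.0]], [[5.0, 10.0], [7.0, 10.0]]]
mu, sigma = two_moments(batches)
print(mu, sigma)
```

Accumulating sums batch by batch avoids loading the whole dataset into memory at once.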

Fitting a knockoff generator

# initialize and fit a DDLK knockoff generator
q_knockoff = ddlk.DDLK(hparams, q_joint=q_joint)
trainer = pl.Trainer()
trainer.fit(q_knockoff, train_dataloader=trainloader)

Variable selection

Using the knockoff generator, we sample knockoffs, and run a Holdout Randomization Test:

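# xTr, yTr: training features and responses
# sample knockoffs for the training features
xTr_tilde = q_knockoff.sample(xTr)
# fit the holdout randomization test on features, responses, and knockoffs
knockoff_test = hrt.HRT_Knockoffs()
knockoff_test.fit(xTr, yTr, xTr_tilde)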
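
To give a feel for how knockoffs turn into per-feature evidence, the sketch below computes a simple importance statistic: the increase in held-out loss when one feature is replaced by its knockoff. This is a simplified illustration in the spirit of a holdout randomization test, not the implementation behind `hrt.HRT_Knockoffs`. The function name, the toy data, and the stand-in predictor are all hypothetical.

```python
def knockoff_statistics(predict, X, X_tilde, y):
    """W[j] = loss with feature j swapped for its knockoff - loss on original data."""
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    base = mse(X)
    W = []
    for j in range(len(X[0])):
        # Replace column j with the knockoff column, leave the rest untouched.
        swapped = [r[:j] + [k[j]] + r[j + 1:] for r, k in zip(X, X_tilde)]
        W.append(mse(swapped) - base)
    return W


# Hypothetical toy data in which y depends on feature 0 only.
X = [[1.0, 5.0], [2.0, -1.0], [3.0, 0.5], [4.0, 2.0]]
X_tilde = [[2.5, 1.0], [0.5, 3.0], [4.0, -2.0], [1.0, 0.0]]
y = [2.0, 4.0, 6.0, 8.0]


def predict(row):
    # Stands in for a model fitted on separate training data.
    return 2.0 * row[0]


W = knockoff_statistics(predict, X, X_tilde, y)
print(W)  # [14.5, 0.0]
```

Swapping the informative feature hurts the loss, while swapping the irrelevant one leaves it unchanged, so only the first statistic is positive.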

Citing this code

If you use this code, please cite the following paper (available here):

Deep Direct Likelihood Knockoffs
M. Sudarshan, W. Tansey, R. Ranganath
arXiv preprint arXiv:2007.15835

Bibtex entry:

@misc{sudarshan2020deep,
    title={Deep Direct Likelihood Knockoffs},
    author={Mukund Sudarshan and Wesley Tansey and Rajesh Ranganath},
    year={2020},
    eprint={2007.15835},
    archivePrefix={arXiv},
    primaryClass={stat.ML}
}
