Deep direct likelihood knockoffs

DDLK: Deep direct likelihood knockoffs

This package implements DDLK, a method for variable selection with explicit control of the false discovery rate. Install with:

pip install ddlk

Controlled variable selection with DDLK

Suppose you have a set of features and a response. DDLK identifies the features most predictive of the response at a pre-specified false discovery rate (FDR) threshold. For example, if you choose an FDR of 20%, DDLK controls the expected proportion of unimportant features among those it selects at no more than 20%. To learn more about how it works, check out our paper.
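To make the quantity being controlled concrete, here is a minimal sketch that computes the false discovery proportion (FDP) of a single selection. The FDR is the expectation of this proportion over repeated runs. The ground truth, the selected indices, and the function name are all hypothetical and are not part of the ddlk package.

```python
def false_discovery_proportion(selected, important):
    """Fraction of selected features that are not truly important."""
    selected = set(selected)
    if not selected:
        # By convention, an empty selection makes no false discoveries.
        return 0.0
    false_discoveries = selected - set(important)
    return len(false_discoveries) / len(selected)


# Hypothetical ground truth: features 0, 1 and 2 drive the response.
important = {0, 1, 2}
# Hypothetical output of some selection procedure.
selected = [0, 1, 2, 7, 9]

fdp = false_discovery_proportion(selected, important)
print(fdp)  # 0.4: two of the five selected features are unimportant
```

A single run can exceed the chosen threshold, as this one does for a 20% target; FDR control is a statement about the average over runs.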

Running DDLK

Variable selection with DDLK involves three stages:

  1. Fit a joint distribution to model features
  2. Fit a knockoff generator
  3. Sample knockoffs and apply knockoff filter to select variables at a pre-specified FDR
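As a rough illustration of the third stage, the sketch below implements a generic knockoff+ filter in the style of Barber and Candès. It is not code from the ddlk package. It assumes you already have one statistic W_j per feature that tends to be large and positive when the original feature looks more important than its knockoff; the values of W and both function names are hypothetical.

```python
def knockoff_threshold(W, fdr=0.2, offset=1):
    """Smallest t > 0 whose estimated false discovery proportion is at most fdr."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        negatives = sum(1 for w in W if w <= -t)
        positives = sum(1 for w in W if w >= t)
        # Negative statistics estimate how many positives are false discoveries.
        if (offset + negatives) / max(1, positives) <= fdr:
            return t
    return float("inf")  # no threshold achieves the target: select nothing


def knockoff_select(W, fdr=0.2):
    """Indices of the features whose statistic clears the threshold."""
    t = knockoff_threshold(W, fdr)
    return [j for j, w in enumerate(W) if w >= t]


# Hypothetical feature statistics, one per feature.
W = [4.0, 3.5, -0.5, 3.0, 0.2, 2.5, 2.0, -0.3, 1.5, 0.1]
selected = knockoff_select(W, fdr=0.2)
print(selected)  # [0, 1, 3, 5, 6, 8]
```

With offset=1 the estimate is conservative, so at least 1/fdr features must clear the threshold before anything is selected.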

To see a complete working example, check our synthetic data example, used to generate the gif above. Below is an excerpt showing how to run DDLK.

Fitting a joint distribution

This implementation of DDLK uses the PyTorch Lightning framework to fit q_joint, the model of the joint distribution of the features:

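import argparse
import pytorch_lightning as pl
# assumed module layout of the ddlk package, inferred from the names used below
from ddlk import mdn, utils

# initialize data
x, y = ...
# put your data in standard PyTorch format (e.g. a torch.utils.data.DataLoader)
trainloader = ...
# initialize joint distribution model with mean and std of data
((X_mu, ), (X_sigma, )) = utils.get_two_moments(trainloader)
hparams = argparse.Namespace(X_mu=X_mu, X_sigma=X_sigma)
q_joint = mdn.MDNJoint(hparams)
# create and fit a PyTorch Lightning trainer
# (recent PyTorch Lightning releases name this argument train_dataloaders)
trainer = pl.Trainer()
trainer.fit(q_joint, train_dataloader=trainloader)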
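The joint model above is initialized with the mean and standard deviation of the data. The sketch below shows one way such per-feature statistics could be accumulated over batches in plain Python. It only illustrates which statistics are meant; it is not how utils.get_two_moments is implemented, and the function name, the toy batches, and the choice of the population standard deviation are assumptions.

```python
import math


def feature_moments(batches):
    """Per-feature mean and population standard deviation over all rows."""
    n = 0
    sums = None
    sq_sums = None
    for batch in batches:
        for row in batch:
            if sums is None:
                # Allocate accumulators once the feature count is known.
                sums = [0.0] * len(row)
                sq_sums = [0.0] * len(row)
            n += 1
            for j, value in enumerate(row):
                sums[j] += value
                sq_sums[j] += value * value
    means = [s / n for s in sums]
    # Var[x] = E[x^2] - E[x]^2, clipped at zero against rounding error.
    stds = [math.sqrt(max(q / n - m * m, 0.0)) for q, m in zip(sq_sums, means)]
    return means, stds


# Two hypothetical batches of two rows each, with two features per row.
batches = [
    [[1.0, 10.0], [2.0, 10.0]],
    [[3.0, 10.0], [4.0, 10.0]],
]
mu, sigma = feature_moments(batches)
print(mu)  # [2.5, 10.0]
```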

Fitting a knockoff generator

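# assumed module layout, inferred from the names used below
from ddlk import ddlk

# initialize and fit a DDLK knockoff generator, reusing hparams and the fitted q_joint
q_knockoff = ddlk.DDLK(hparams, q_joint=q_joint)
trainer = pl.Trainer()
trainer.fit(q_knockoff, train_dataloader=trainloader)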

Variable selection

Using the knockoff generator, we sample knockoffs and run a holdout randomization test (HRT):

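# assumed module layout, inferred from the names used below
from ddlk import hrt
# xTr, yTr: presumably the training features and response; xTr_tilde: knockoffs sampled for xTr
xTr_tilde = q_knockoff.sample(xTr)
knockoff_test = hrt.HRT_Knockoffs()
knockoff_test.fit(xTr, yTr, xTr_tilde)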
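For intuition about what a holdout randomization test measures, the sketch below computes a generic empirical p-value for one feature. It compares the held-out loss obtained with the real feature against the losses obtained when that feature is replaced by null samples. This is a simplified illustration and makes no claim about the internals of hrt.HRT_Knockoffs; the function name and all loss values are hypothetical.

```python
def empirical_p_value(observed_loss, null_losses):
    """Share of runs (null runs plus the observed one) with loss <= observed."""
    at_least_as_good = sum(1 for loss in null_losses if loss <= observed_loss)
    # The +1 terms count the observed run itself, so the p-value is never zero.
    return (1 + at_least_as_good) / (1 + len(null_losses))


# Hypothetical held-out loss with the real feature in place.
observed = 0.30
# Hypothetical held-out losses with the feature replaced by null samples.
nulls = [0.52, 0.47, 0.61, 0.29, 0.55, 0.50, 0.58, 0.49, 0.53]

p = empirical_p_value(observed, nulls)
print(p)  # 0.2: one of nine null runs did at least as well as the real feature
```

A feature that carries real signal should give a loss well below most null losses, and therefore a small p-value.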

Citing this code

If you use this code, please cite the following paper (available here):

Deep Direct Likelihood Knockoffs
M. Sudarshan, W. Tansey, R. Ranganath
arXiv preprint arXiv:2007.15835

Bibtex entry:

@misc{sudarshan2020deep,
    title={Deep Direct Likelihood Knockoffs},
    author={Mukund Sudarshan and Wesley Tansey and Rajesh Ranganath},
    year={2020},
    eprint={2007.15835},
    archivePrefix={arXiv},
    primaryClass={stat.ML}
}
