Knockoffs for variable selection

Knockpy

A Python implementation of the knockoffs framework for variable selection. See https://amspector100.github.io/knockpy/ for detailed documentation and tutorials.

Installation

To install knockpy, first install choldate:

pip install git+https://github.com/jcrudy/choldate.git

Then, install knockpy using pip:

pip install knockpy

If the installation fails on your system, please reach out to me and I'll try to help.

To use the (optional) kpytorch submodule, you will also need to install PyTorch.

Quickstart

Given a data matrix X and a response vector y, knockpy makes it easy to perform variable selection with knockoffs, using a wide variety of machine learning algorithms (known as "feature statistics") and types of knockoffs. A quick example is shown below, where we use the cross-validated lasso to assign variable importances to the features and knockoffs.

    import knockpy as kpy
    from knockpy.knockoff_filter import KnockoffFilter

    # Generate synthetic data from a Gaussian linear model
    data_gen_process = kpy.dgp.DGP()
    data_gen_process.sample_data(
        n=1500, # Number of datapoints
        p=500, # Dimensionality
        sparsity=0.1,
        x_dist='gaussian',
    )
    X = data_gen_process.X
    y = data_gen_process.y
    Sigma = data_gen_process.Sigma

    # Run model-X knockoffs
    kfilter = KnockoffFilter(
        fstat='lasso',
        ksampler='gaussian',
    )
    rejections = kfilter.forward(X=X, y=y, Sigma=Sigma)
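
As background for the selection step, here is a minimal plain-Python sketch of the standard knockoff+ thresholding rule. It is not knockpy's own code: the function names `knockoff_threshold` and `select_features` and the example statistics `W` are hypothetical, chosen only for illustration. The sketch assumes each feature has a statistic W_j, where a large positive value means the feature looks more important than its knockoff, and it picks the smallest threshold whose estimated false discovery proportion is at most the target level.

```python
def knockoff_threshold(W, fdr=0.1, offset=1):
    """Return the smallest t among the nonzero |W_j| such that
    (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= fdr.
    Returns infinity if no candidate threshold qualifies."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        n_neg = sum(1 for w in W if w <= -t)  # knockoffs that "won"
        n_pos = sum(1 for w in W if w >= t)   # features that "won"
        if (offset + n_neg) / max(1, n_pos) <= fdr:
            return t
    return float("inf")


def select_features(W, fdr=0.1):
    """Return a 0/1 list marking the features with W_j >= threshold."""
    t = knockoff_threshold(W, fdr=fdr)
    return [1 if w >= t else 0 for w in W]


# Hypothetical feature statistics, for illustration only
W = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.2, 1.0, 0.8, -0.5, 0.3, -0.2, 0.0]
threshold = knockoff_threshold(W, fdr=0.2)
selected = select_features(W, fdr=0.2)
print(threshold, sum(selected))
```

With offset=1 this is the "knockoff+" variant; a negative W_j counts as evidence of a false discovery, which is why many negative statistics push the threshold up.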

Most importantly, knockpy is built to be modular, so researchers and analysts can easily layer functionality on top of it.

To run tests

  • To run all tests, run python3 -m pytest.
  • To run only the tests with a specific label (pytest marker), run pytest -v -m {label}.
  • To run all tests except those with a particular label, run pytest -v -m "not {label}" (the quotes are required).
  • To run a specific file, run pytest test/{file_name}.py. To run a specific test within the file, run pytest test/{file_name}.py::classname::test_method. You can also stop at the class, as in pytest test/{file_name}.py::classname, to run every test in that class.
  • To run a test with profiling, run python3 -m pytest {path} --profile (this flag is provided by the pytest-profiling plugin). This should generate a set of .prof files in prof/, which you can visualize by running snakeviz filename.prof. Additional command-line flags control the profiling output.
  • Alternatively, cprofilev is often a better option. To use it, copy the test into proftest/ and then run python3 -m cprofilev proftest/test_name.py.
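
The .prof files mentioned above are ordinary cProfile dumps, so one can also be produced and inspected directly with the standard library. The following is a minimal sketch under that assumption; the `slow_sum` workload and the `example.prof` file name are hypothetical stand-ins for a real test body and its output file.

```python
import cProfile
import os
import pstats
import tempfile


def slow_sum(n):
    """Toy workload standing in for a test body (hypothetical)."""
    return sum(i * i for i in range(n))


# Profile a single call
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Write the stats to a .prof file in a temporary directory
prof_path = os.path.join(tempfile.mkdtemp(), "example.prof")
profiler.dump_stats(prof_path)

# Load the file back and print the five most expensive entries
stats = pstats.Stats(prof_path)
stats.sort_stats("cumulative").print_stats(5)
```

A file written this way can be opened with snakeviz in the same way as the files generated under prof/.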
