Variable Selection with Knockoffs in Python

knockoffspy

An interface to Knockoffs.jl from the Python programming language. knockoffspy provides high-performance methods for sampling various model-X knockoffs and ships with built-in routines for variable selection. Much of this functionality is unique and allows for orders-of-magnitude speedups over conventional methods. knockoffspy attaches a Python interface to Knockoffs.jl, allowing seamless use of this tooling by Python users.

Installation

knockoffspy is not on PyPI yet. To install knockoffspy, use pip to install it directly from GitHub:

pip3 install git+https://github.com/biona001/knockoffspy.git#egg=knockoffspy

Then, in the Python interpreter, run:

>>> import knockoffspy
>>> knockoffspy.install()

This will install the Knockoffs.jl package and all the Julia dependencies that it needs.

Usage

Import the package as

from knockoffspy import ko

The general workflow is the same as in Julia, except that function calls are prefixed with ko. Most commands work without any other modification.

Documentation

Because most Knockoffs.jl commands work in Python once they are prefixed with ko., the Knockoffs.jl documentation is the main in-depth documentation for this package. The example below shows how to translate those docs to Python code.

Example: Exact model-X group knockoffs

Let's simulate X ~ N(0, Sigma), where Sigma is a symmetric Toeplitz matrix. Here we assume that every 5 consecutive variables form a group.

from knockoffspy import ko
from scipy import linalg
import numpy as np

# generate data
n = 1000          # number of samples
p = 1000          # number of covariates
m = 1             # number of knockoffs to generate per feature
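# group labels 0, 0, 0, 0, 0, 1, 1, ...: every 5 consecutive variables form one of 200 groups
groups = np.repeat(np.arange(0,200,1), 5)
# symmetric Toeplitz covariance with Sigma[i, j] = 0.7**(|i - j| + 1)
Sigma = linalg.toeplitz([0.7**i for i in range(1, p+1)])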
mu = np.zeros(p)
X = np.random.multivariate_normal(mean=mu, cov=Sigma, size=(n,))
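
To make the structure of these inputs concrete, here is a dependency-free sketch at a much smaller size (p = 10 instead of 1000). The helper names make_groups and make_toeplitz are illustrative only and are not part of knockoffspy. The sketch rebuilds the same group labels and the same Toeplitz pattern with plain lists, which shows that consecutive blocks of 5 variables share a label and that the diagonal of Sigma is 0.7 rather than 1, because the exponents start at 1.

```python
def make_groups(p, group_size):
    """Label variable j with group j // group_size (hypothetical helper)."""
    return [j // group_size for j in range(p)]


def make_toeplitz(first_column):
    """Symmetric Toeplitz matrix: entry (i, j) depends only on |i - j|."""
    p = len(first_column)
    return [[first_column[abs(i - j)] for j in range(p)] for i in range(p)]


p = 10
groups = make_groups(p, 5)
Sigma = make_toeplitz([0.7 ** k for k in range(1, p + 1)])

print(groups)                       # two groups of five variables each
print(Sigma[0][0])                  # diagonal entry, 0.7 ** 1
print(Sigma[0][1] == Sigma[1][0])   # symmetry of the off-diagonal entries
```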

We generate model-X group knockoffs as follows. With verbose=True, the solver prints its progress at each iteration:

solver = "maxent" # Maximum entropy solver, other choices include "mvr", "sdp", "equi"
result = ko.modelX_gaussian_group_knockoffs(X, solver, groups, mu, Sigma, verbose=True)

Maxent initial obj = -2087.6929364666807
Iter 1 (PCA): obj = -1607.4371641119483, δ = 0.4806840533479427, t1 = 0.28, t2 = 0.46
Iter 2 (CCD): obj = -1589.951838589172, δ = 0.046537146748581976, t1 = 0.42, t2 = 1.28, t3 = 0.0
Iter 3 (PCA): obj = -1570.44910152802, δ = 0.32338244109034703, t1 = 0.67, t2 = 1.74
Iter 4 (CCD): obj = -1562.5471454507458, δ = 0.028462155141072386, t1 = 0.81, t2 = 2.56, t3 = 0.0
Iter 5 (PCA): obj = -1557.1393033537286, δ = 0.124560844581473, t1 = 1.04, t2 = 2.99
Iter 6 (CCD): obj = -1552.4489159484508, δ = 0.020754156607442897, t1 = 1.18, t2 = 3.81, t3 = 0.01
Iter 7 (PCA): obj = -1549.810615656943, δ = 0.07012194156368799, t1 = 1.43, t2 = 4.27
Iter 8 (CCD): obj = -1547.020766696055, δ = 0.015614065719368509, t1 = 1.56, t2 = 5.09, t3 = 0.01
Iter 9 (PCA): obj = -1545.531088575216, δ = 0.0511701859313534, t1 = 1.82, t2 = 5.58
Iter 10 (CCD): obj = -1543.8817717502163, δ = 0.013019011861975537, t1 = 1.95, t2 = 6.4, t3 = 0.01
Iter 11 (PCA): obj = -1542.9960966431295, δ = 0.04029486159842148, t1 = 2.23, t2 = 6.87
Iter 12 (CCD): obj = -1542.0192637205867, δ = 0.011386766045749418, t1 = 2.36, t2 = 7.69, t3 = 0.01
Iter 13 (PCA): obj = -1541.4898664708514, δ = 0.03310222247410438, t1 = 2.61, t2 = 8.17
Iter 14 (CCD): obj = -1540.9005704168408, δ = 0.010234115029284592, t1 = 2.74, t2 = 8.99, t3 = 0.01
Iter 15 (PCA): obj = -1540.5789961008802, δ = 0.027573434751233035, t1 = 3.5, t2 = 9.5
Iter 16 (CCD): obj = -1540.221696231743, δ = 0.009352305423352147, t1 = 3.62, t2 = 10.33, t3 = 0.01
Iter 17 (PCA): obj = -1540.0257796341718, δ = 0.02345771973192861, t1 = 4.15, t2 = 10.83
Iter 18 (CCD): obj = -1539.806801955503, δ = 0.008629474061316174, t1 = 4.28, t2 = 11.68, t3 = 0.02
Iter 19 (PCA): obj = -1539.689234490581, δ = 0.020162878516809316, t1 = 4.81, t2 = 12.21
Iter 20 (CCD): obj = -1539.5543495056934, δ = 0.007951906306609491, t1 = 4.93, t2 = 13.03, t3 = 0.02
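
The verbose log can also be read programmatically. The sketch below is an illustration and not part of knockoffspy: the parse_log helper and its regular expression are assumptions based only on the line format printed above. It extracts the iteration number, the update type (PCA or CCD), the objective, and δ, which makes it easy to confirm that the objective increases from one iteration to the next in this run.

```python
import re

# Pattern inferred from the printed lines above; an assumption, not a documented format.
LINE = re.compile(r"Iter (\d+) \((\w+)\): obj = (-?\d+\.\d+), δ = (\d+\.\d+)")


def parse_log(text):
    """Return a list of (iteration, step, objective, delta) tuples."""
    rows = []
    for match in LINE.finditer(text):
        it, step, obj, delta = match.groups()
        rows.append((int(it), step, float(obj), float(delta)))
    return rows


# First three iterations copied from the output above.
sample = """\
Iter 1 (PCA): obj = -1607.4371641119483, δ = 0.4806840533479427, t1 = 0.28, t2 = 0.46
Iter 2 (CCD): obj = -1589.951838589172, δ = 0.046537146748581976, t1 = 0.42, t2 = 1.28, t3 = 0.0
Iter 3 (PCA): obj = -1570.44910152802, δ = 0.32338244109034703, t1 = 0.67, t2 = 1.74
"""

rows = parse_log(sample)
objectives = [obj for _, _, obj, _ in rows]
increasing = all(a < b for a, b in zip(objectives, objectives[1:]))
print(len(rows), increasing)
```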

Extract the knockoffs, the S matrix, and the final objective as follows:

Xko = result.Xko
S = result.S
obj = result.obj
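
As background on what S represents: for Gaussian model-X knockoffs with a single knockoff copy (m = 1), a valid S must satisfy S ⪰ 0 and 2Σ − S ⪰ 0, and for group knockoffs S is block diagonal along the groups. The sketch below checks those conditions on small hand-built stand-in matrices, not on output from knockoffspy. The helper names are hypothetical, and the scaling factor gamma = 0.2 is an arbitrary choice for illustration. It is written in plain Python so it runs without extra dependencies, and it tests strict positive definiteness via a Cholesky factorization, which is slightly stronger than the semidefinite requirement.

```python
import math


def is_positive_definite(A, tol=1e-12):
    """Attempt a Cholesky factorization; success means A is positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:      # non-positive pivot: not positive definite
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def is_block_diagonal(S, groups, tol=1e-12):
    """True if S[i][j] is zero whenever variables i and j are in different groups."""
    p = len(S)
    return all(abs(S[i][j]) <= tol
               for i in range(p) for j in range(p)
               if groups[i] != groups[j])


# Small stand-in problem: 4 variables in 2 groups of 2 (illustrative values only).
p, group_size, gamma = 4, 2, 0.2
groups = [j // group_size for j in range(p)]
Sigma = [[0.7 ** (abs(i - j) + 1) for j in range(p)] for i in range(p)]

# Hand-built S: the within-group blocks of Sigma scaled by gamma, zero elsewhere.
S = [[gamma * Sigma[i][j] if groups[i] == groups[j] else 0.0 for j in range(p)]
     for i in range(p)]
diff = [[2 * Sigma[i][j] - S[i][j] for j in range(p)] for i in range(p)]

print(is_block_diagonal(S, groups), is_positive_definite(S), is_positive_definite(diff))
```

With NumPy available, the same checks would more naturally use np.linalg.eigvalsh on S and on 2Σ − S.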
