Variable Selection with Knockoffs in Python
Project description
knockoffspy
An interface to Knockoffs.jl from the Python programming language. knockoffspy
provides unique high performance methods for sampling various model-X knockoffs and ships with built-in routines for variable selection. Much of the functionality are unique and allow for orders of magnitude speedup over conventional methods. 'knockoffspy' attaches a Python interface onto the package, allowing seamless use of this tooling by Python users.
Installation
knockoffspy
is not on PyPI yet. To install knockoffspy
, use pip to clone from Github:
pip3 install git+https://github.com/biona001/knockoffspy.git#egg=knockoffspy
Then in the python interpreter,
>>> import knockoffspy
>>> knockoffspy.install()
This will install the Knockoffs.jl
package and all the Julia dependencies that it needs.
Usage
Import the package as
from knockoffspy import ko
The general flow for using the package is to follow exactly as would be done in Julia, except add ko.
in front of function calls. Most of the commands will work without any modification. Thus the Knockoffs.jl documentation is the main in-depth documentation for this package. Below we will show how to translate these docs to Python code.
Documentation
Most of the commands of Knockoffs.jl
will work in python without any modification, just add ko.
in front of function calls. Thus the Knockoffs.jl documentation is the main in-depth documentation for this package. Below we will show how to translate these docs to Python code.
Example: Exact model-X group knockoffs
Lets simulate X ~ N(0, Sigma)
where Sigma
is a symmetric Toesplitz matrix. Here we assume every 5 variables form a group
from knockoffspy import ko
from scipy import linalg
import numpy as np
# generate data
n = 1000 # number of samples
p = 1000 # number of covariates
m = 1 # number of knockoffs to generate per feature
groups = np.repeat(np.arange(0,200,1), 5)
Sigma = linalg.toeplitz([0.7**i for i in range(1, p+1)])
mu = np.zeros(p)
X = np.random.multivariate_normal(mean=mu, cov=Sigma, size=(n,))
We generate model-X group knockoffs as follows
solver = "maxent" # Maximum entropy solver, other choices include "mvr", "sdp", "equi"
result = ko.modelX_gaussian_group_knockoffs(X, solver, groups, mu, Sigma, verbose=True)
Maxent initial obj = -2087.6929364666807
Iter 1 (PCA): obj = -1607.4371641119483, δ = 0.4806840533479427, t1 = 0.28, t2 = 0.46
Iter 2 (CCD): obj = -1589.951838589172, δ = 0.046537146748581976, t1 = 0.42, t2 = 1.28, t3 = 0.0
Iter 3 (PCA): obj = -1570.44910152802, δ = 0.32338244109034703, t1 = 0.67, t2 = 1.74
Iter 4 (CCD): obj = -1562.5471454507458, δ = 0.028462155141072386, t1 = 0.81, t2 = 2.56, t3 = 0.0
Iter 5 (PCA): obj = -1557.1393033537286, δ = 0.124560844581473, t1 = 1.04, t2 = 2.99
Iter 6 (CCD): obj = -1552.4489159484508, δ = 0.020754156607442897, t1 = 1.18, t2 = 3.81, t3 = 0.01
Iter 7 (PCA): obj = -1549.810615656943, δ = 0.07012194156368799, t1 = 1.43, t2 = 4.27
Iter 8 (CCD): obj = -1547.020766696055, δ = 0.015614065719368509, t1 = 1.56, t2 = 5.09, t3 = 0.01
Iter 9 (PCA): obj = -1545.531088575216, δ = 0.0511701859313534, t1 = 1.82, t2 = 5.58
Iter 10 (CCD): obj = -1543.8817717502163, δ = 0.013019011861975537, t1 = 1.95, t2 = 6.4, t3 = 0.01
Iter 11 (PCA): obj = -1542.9960966431295, δ = 0.04029486159842148, t1 = 2.23, t2 = 6.87
Iter 12 (CCD): obj = -1542.0192637205867, δ = 0.011386766045749418, t1 = 2.36, t2 = 7.69, t3 = 0.01
Iter 13 (PCA): obj = -1541.4898664708514, δ = 0.03310222247410438, t1 = 2.61, t2 = 8.17
Iter 14 (CCD): obj = -1540.9005704168408, δ = 0.010234115029284592, t1 = 2.74, t2 = 8.99, t3 = 0.01
Iter 15 (PCA): obj = -1540.5789961008802, δ = 0.027573434751233035, t1 = 3.5, t2 = 9.5
Iter 16 (CCD): obj = -1540.221696231743, δ = 0.009352305423352147, t1 = 3.62, t2 = 10.33, t3 = 0.01
Iter 17 (PCA): obj = -1540.0257796341718, δ = 0.02345771973192861, t1 = 4.15, t2 = 10.83
Iter 18 (CCD): obj = -1539.806801955503, δ = 0.008629474061316174, t1 = 4.28, t2 = 11.68, t3 = 0.02
Iter 19 (PCA): obj = -1539.689234490581, δ = 0.020162878516809316, t1 = 4.81, t2 = 12.21
Iter 20 (CCD): obj = -1539.5543495056934, δ = 0.007951906306609491, t1 = 4.93, t2 = 13.03, t3 = 0.02
To extract the knockoffs, S matrix, and the final objective as
Xko = result.Xko
S = result.S
obj = result.obj
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file knockoffspy-0.0.1.tar.gz
.
File metadata
- Download URL: knockoffspy-0.0.1.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 633cee6d3ef0288fce64230a138619cf772659f119bc6b0c6879f8f4b5e97fe6 |
|
MD5 | 61c942f503158fd7ee3a8a083e98b0a1 |
|
BLAKE2b-256 | 1b6004e8dac787d434796b4a2a05f059d31f71946d805017625be96fcc056878 |
File details
Details for the file knockoffspy-0.0.1-py3-none-any.whl
.
File metadata
- Download URL: knockoffspy-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6c04a7f2a659f06ee40f7717b4b2cd71af5cf3acf60bc047b7f737597ec758ca |
|
MD5 | 68f92b4a79b70a5af09dd9579582b219 |
|
BLAKE2b-256 | c341a225432f779e2beb5993a0121f21c717fa952e0ad0f771eba793e94f1c5c |