Python implementation of Integrated Path Stability Selection (IPSS)
Integrated path stability selection (IPSS)
Integrated path stability selection (IPSS) is a general method for improving feature selection algorithms that yields more robust, accurate, and interpretable models. It does so by allowing users to control the expected number of falsely selected features, E(FP), while producing far more true positives than other versions of stability selection. This implementation of IPSS applied to L1-regularized linear and logistic regression is easy to use, requiring only the X (features) and y (response) data and specification of E(FP).
Associated paper
arXiv:
Installation
To install from PyPI:
pip install ipss
To clone from GitHub:
git clone git@github.com:omelikechi/ipss.git
Or clone from GitHub using HTTPS:
git clone https://github.com/omelikechi/ipss.git
Dependencies
For ipss:
pip install joblib numpy scikit-learn scipy
Additional dependencies required for examples:
pip install matplotlib seaborn
Examples
Examples are available in the examples folder as both .py and .ipynb files. These include:
- IPSS applied to data simulated from a multivariate normal.
- IPSS applied to prostate cancer data.
- IPSS applied to colon cancer data.
Usage
Given an n-by-p numpy array of features, X (n = number of samples, p = number of features), an n-by-1 numpy array of responses, y, and a target number of expected false positives, EFP:
from ipss import ipss
# Load/generate X and y
# Specify EFP
# Run IPSS:
result = ipss(X, y, EFP)
# Print indices of features selected by IPSS
print(result['selected_features'])
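For the "Load/generate X and y" step above, the following is a minimal sketch of one way to simulate inputs in the expected shapes. The helper name `simulate_linear_data`, its parameters, and the choice of independent standard normal features with unit coefficients are illustrative assumptions, not part of the ipss package.

```python
import numpy as np

def simulate_linear_data(n=200, p=50, n_true=5, noise=1.0, seed=0):
    """Simulate features and a continuous response (hypothetical helper).

    X has independent standard normal entries; y depends linearly on
    n_true randomly chosen columns of X, plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))                      # shape (n, p)
    true_features = np.sort(rng.choice(p, size=n_true, replace=False))
    beta = np.zeros(p)
    beta[true_features] = 1.0                            # nonzero coefficients
    y = X @ beta + noise * rng.standard_normal(n)        # shape (n,)
    return X, y, true_features

X, y, true_features = simulate_linear_data()
EFP = 1  # target expected number of false positives
print(X.shape, y.shape)
```

With the package installed, `X`, `y`, and `EFP` can be passed to `ipss(X, y, EFP)` as in the usage snippet, and `result['selected_features']` can be compared against `true_features`.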
Results
result = ipss(X, y, EFP) is a dictionary containing:
- alphas: Grid of regularization parameters (array of shape (n_alphas,)).
- average_select: Average number of features selected at each regularization (array of shape (n_alphas,)).
- scores: IPSS score for each feature (array of shape (p,)).
- selected_features: Indices of features selected by IPSS (list of ints).
- stability_paths: Estimated selection probabilities at each regularization (array of shape (n_alphas, p)).
- stop_index: Index of regularization value at which IPSS threshold is passed (int).
- threshold: The calculated threshold value tau = Integral value / EFP (scalar).
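As a rough illustration of working with the result dictionary, the sketch below ranks features by their IPSS scores. The helper `rank_features` and the `mock_result` values are hypothetical stand-ins made up for this example; in practice `result` would come from `ipss(X, y, EFP)`.

```python
import numpy as np

def rank_features(result, k=5):
    """Return the indices of the k highest-scoring features, best first."""
    scores = np.asarray(result["scores"])
    order = np.argsort(-scores, kind="stable")  # descending by score
    return order[:k].tolist()

# Hypothetical stand-in for the dictionary returned by ipss(X, y, EFP);
# only the keys used here are filled in, and the numbers are invented.
mock_result = {
    "scores": np.array([0.02, 0.91, 0.00, 0.47, 0.10]),
    "selected_features": [1, 3],
}

top = rank_features(mock_result, k=3)
print(top)
```

Ranking by `scores` gives an ordering over all p features, whereas `selected_features` lists only those that IPSS selected at the specified EFP.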
Full list of arguments
ipss takes the following arguments (only X and y are required, and typically only EFP is specified):
- X: Features (array of shape (n, p)).
- y: Responses (array of shape (n,) or (n, 1)). IPSS automatically detects if y is continuous or binary.
- EFP: Target expected number of false positives (positive scalar; default is 1).
- cutoff: Together with EFP, determines IPSS threshold (positive scalar; default is 0.05).
- B: Number of subsampling steps (int; default is 50).
- n_alphas: Number of values in regularization grid (int; default is 25).
- q_max: Max number of features selected (int; default is None, in which case q_max = p/2).
- Z_sparse: If True, tensor of subsamples, Z, is sparse (default is False).
- lars: Implements least angle regression for linear regression if True, lasso otherwise (default is False).
- selection_function: Function to apply to the stability paths. If a positive int, m, function is h_m(x) = (2x - 1)**m if x >= 0.5 and 0 if x < 0.5 (int, callable, or None; default is None, in which case function is h_2 if y is binary, or h_3 if continuous).
- with_stability: If True, uses a stability measure in selection process (default is False).
- delta: Determines scaling of regularization interval (scalar; default is 1).
- standardize_X: If True, standardizes all features (default is True).
- center_y: If True, centers y when it is continuous (default is True).
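The selection function h_m described under selection_function can be written out directly. The sketch below is a standalone NumPy version of that formula, not the package's internal code; the function name `h` and its elementwise application to an array of selection probabilities are assumptions made for illustration.

```python
import numpy as np

def h(x, m=3):
    """Evaluate h_m(x) = (2x - 1)**m for x >= 0.5 and 0 for x < 0.5, elementwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.5, (2.0 * x - 1.0) ** m, 0.0)

probs = np.array([0.3, 0.5, 0.75, 1.0])  # example selection probabilities
print(h(probs, m=2))  # h_2, the documented default for binary y
print(h(probs, m=3))  # h_3, the documented default for continuous y
```

Larger m pushes values for probabilities just above 0.5 closer to zero, so only features selected in most subsamples contribute appreciably. The interface ipss expects from a callable passed as selection_function is not documented here, so check the source before passing a custom function.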