Python implementation of Integrated Path Stability Selection (IPSS)

Integrated path stability selection (IPSS)

Integrated path stability selection (IPSS) is a general method for improving feature selection algorithms that yields more robust, accurate, and interpretable models. IPSS does this by allowing users to control the expected number of falsely selected features, E(FP), while producing far more true positives than other versions of stability selection. This Python implementation of IPSS applied to L1-regularized linear and logistic regression is intended for researchers and practitioners alike, requiring only the X and y data and specification of E(FP).

Associated paper

arXiv:

Installation

Dependencies

pip install joblib numpy scikit-learn scipy

Installing IPSS

To install from PyPI:

pip install ipss

To clone from GitHub:

git clone git@github.com:omelikechi/ipss.git

Or clone from GitHub using HTTPS:

git clone https://github.com/omelikechi/ipss.git

Usage

Given an n-by-p matrix of features, X (n = number of samples, p = number of features), an n-by-1 vector of responses, y, and a target number of expected false positives, EFP:

from ipss import ipss

# Load data X and y
# Specify expected number of false positives (EFP)
# Run IPSS:
result = ipss(X, y, EFP)

# Result analysis
print(result['selected_features'])  # features selected by IPSS

Results

result is a dictionary containing:

  • alphas: Grid of regularization parameters (array of shape (n_alphas,)).
  • average_select: Average number of features selected at each regularization (array of shape (n_alphas,)).
  • scores: IPSS score for each feature (array of shape (p,)).
  • selected_features: Indices of features selected by IPSS (list of ints).
  • stability_paths: Estimated selection probabilities at each regularization (array of shape (n_alphas, p)).
  • stop_index: Index of regularization value at which IPSS threshold is passed (int).
  • threshold: The calculated threshold value tau = Integral value / EFP (scalar).
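As a small illustration of working with these fields, the sketch below pairs each selected feature with its IPSS score and sorts the pairs. The helper name rank_selected and the mock dictionary are hypothetical and not part of the ipss package. The values are made up for illustration, and the sketch assumes that larger scores indicate more stably selected features.

```python
def rank_selected(result):
    """Return (feature_index, score) pairs for the selected features, highest score first."""
    scores = result['scores']
    pairs = [(int(j), float(scores[j])) for j in result['selected_features']]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up stand-in for the dictionary returned by ipss (illustrative values only)
mock_result = {
    'scores': [0.02, 0.31, 0.00, 0.18, 0.27],
    'selected_features': [1, 3, 4],
}

for index, score in rank_selected(mock_result):
    print(f"feature {index}: score {score:.2f}")
```

Only integer indexing into scores is used, so the same helper should work whether scores is a list or a NumPy array.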

Full list of arguments

ipss takes the following arguments (only X and y are required, and typically only EFP is specified):

  • X: Features (array of shape (n, p)).
  • y: Responses (array of shape (n,) or (n, 1)). IPSS automatically detects if y is continuous or binary.
  • EFP: Target expected number of false positives (positive scalar; default is 1).
  • cutoff: Together with EFP, determines IPSS threshold (positive scalar; default is 0.05).
  • B: Number of subsampling steps (int; default is 50).
  • n_alphas: Number of values in regularization grid (int; default is 25).
  • q_max: Max number of features selected (int; default is None, in which case q_max = p/2).
  • Z_sparse: If True, tensor of subsamples, Z, is sparse (default is False).
  • lars: Implements least angle regression (LARS) for linear regression if True, lasso otherwise (default is False).
  • selection_function: Function applied to the stability paths. If a positive int m, the function is h_m(x) = (2x - 1)**m if x >= 0.5 and 0 if x < 0.5 (int, callable, or None; default is None, in which case the function is h_2 if y is binary and h_3 if y is continuous).
  • with_stability: If True, uses a stability measure in selection process (default is False).
  • delta: Determines scaling of regularization interval (scalar; default is 1).
  • standardize_X: If True, standardizes all features (default is True).
  • center_y: If True, centers y when it is continuous (default is True).
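To make the selection_function formula concrete, here is a minimal scalar sketch of h_m written directly from the definition above. The function name h and the range check are illustrative assumptions. This is not the package's internal implementation, and the signature ipss expects from a user-supplied callable is not specified here.

```python
def h(x, m):
    """Selection function h_m: (2x - 1)**m for x >= 0.5, and 0 for x < 0.5."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x is a selection probability and must lie in [0, 1]")
    return (2.0 * x - 1.0) ** m if x >= 0.5 else 0.0


# Compare h_2 and h_3 at a few selection probabilities
for x in (0.4, 0.5, 0.75, 1.0):
    print(x, h(x, 2), h(x, 3))
```

Selection probabilities below 0.5 contribute nothing, and a larger m shrinks the contribution of probabilities strictly between 0.5 and 1. For example, h_2(0.75) = 0.25 and h_3(0.75) = 0.125.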

Examples

Examples are available in the examples folder. These include:

  • A simple example in which features are simulated independently from a standard normal distribution.
  • An example using prostate cancer data, as detailed in the associated paper.
  • An example using colon cancer data, as detailed in the associated paper.
