
Integrated path stability selection (IPSS)

Fast, flexible feature selection with false discovery control

Given an n-by-p feature matrix X (n = number of samples, p = number of features) and an n-dimensional response vector y, IPSS applies a base selection algorithm to subsamples of the data to select features (columns of X) that are related to the response. The final outputs are q-values and efp (expected false positive) scores for each feature.

False discovery control

  • The q-value of feature j is the smallest false discovery rate (FDR) at which feature j is selected.
    • To control the FDR at target_fdr, select the features whose q-values are at most target_fdr.
  • The efp score of feature j is the expected number of false positives, E(FP), incurred when j is selected.
    • To control E(FP) at target_fp, select the features whose efp scores are at most target_fp.

Flexible selection

IPSS applies to a wide range of base feature selection algorithms, including regularized models and any method that computes feature importance scores. This package includes three built-in base selection algorithms: IPSS for L1-regularized linear models (IPSSL), IPSS for importance scores from gradient boosting (IPSSGB), and IPSS for importance scores from random forests (IPSSRF). Users can also seamlessly apply IPSS with their own custom feature importance scores.
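
For instance, a minimal sketch of switching between the built-in base selectors via the selector argument (assuming X and y are already loaded; the options are documented in the full argument list below):

from ipss import ipss

# gradient boosting (IPSSGB) is the default base selector
output_gb = ipss(X, y)

# random forests (IPSSRF)
output_rf = ipss(X, y, selector='rf')

# L1-regularized linear models (IPSSL)
output_l1 = ipss(X, y, selector='l1')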

Speed

In simulation studies using real RNA-sequencing data from ovarian cancer patients, IPSSL, IPSSGB, and IPSSRF all run in under 20 seconds (without parallelization) when n = 500 and p = 5000.

Easy to use

The only required inputs are the feature matrix X and response vector y.

Associated papers

  • IPSS for regularized models: https://arxiv.org/abs/2403.15877
  • IPSS for arbitrary feature importance scores: https://arxiv.org/abs/2410.02208v1

Installation

Install from PyPI:

pip install ipss

Usage

from ipss import ipss

# load n-by-p feature matrix X and n-by-1 response vector y

# run ipss
ipss_output = ipss(X, y)

# select features based on target FDR
target_fdr = 0.1
q_values = ipss_output['q_values']
selected_features = [idx for idx, q_value in q_values.items() if q_value <= target_fdr]
print(f'Selected features (target FDR = {target_fdr}): {selected_features}')

Output

ipss_output = ipss(X, y) is a dictionary containing:

  • efp_scores: Dictionary whose keys are feature indices and values are their efp scores (dict of length p).
  • q_values: Dictionary whose keys are feature indices and values are their q-values (dict of length p).
  • runtime: Runtime of the algorithm in seconds (float).
  • selected_features: Indices of features selected by IPSS; empty list if neither target_fp nor target_fdr is specified (list of ints).
  • stability_paths: Estimated selection probabilities at each parameter value (array of shape (n_alphas, p)).
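
The stability paths can be inspected visually. A minimal plotting sketch, assuming matplotlib is installed and ipss_output comes from the Usage example above:

import matplotlib.pyplot as plt

stability_paths = ipss_output['stability_paths']  # shape (n_alphas, p)

# plot each feature's estimated selection probability across the grid
for j in range(stability_paths.shape[1]):
    plt.plot(stability_paths[:, j], alpha=0.5)
plt.xlabel('grid index')
plt.ylabel('estimated selection probability')
plt.show()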

Usage with custom feature importance scores

For custom feature importance scores, selector must be a function that takes X and y as inputs, along with any keyword arguments supplied via the optional dictionary selector_args, and returns a list or NumPy array of importance scores, one per feature, in the same order as the columns of X.

import numpy as np
from sklearn.linear_model import Ridge

from ipss import ipss

# define custom feature importance function based on ridge regression
def ridge_selector(X, y, alpha):
	# fit ridge regression and use absolute coefficients as importance scores
	model = Ridge(alpha=alpha)
	model.fit(X, y)
	return np.abs(model.coef_)

# keyword arguments passed to ridge_selector via selector_args
selector_args = {'alpha': 1}

# load n-by-p feature matrix X and n-by-1 response vector y

# run ipss
ipss_output = ipss(X, y, selector=ridge_selector, selector_args=selector_args)

# select features based on target FDR
target_fdr = 0.1
q_values = ipss_output['q_values']
selected_features = [idx for idx, q_value in q_values.items() if q_value <= target_fdr]
print(f'Selected features (target FDR = {target_fdr}): {selected_features}')

Examples

The examples folder in the repository contains worked examples.

Full list of ipss arguments

Required arguments:

  • X: Features (array of shape (n,p)), where n is the number of samples and p is the number of features.
  • y: Response (array of shape (n,) or (n, 1)). ipss automatically detects whether y is continuous or binary.

Optional arguments:

  • selector: Base algorithm to use (str or callable; default 'gb'). Options:
    • 'gb': Gradient boosting (uses XGBoost).
    • 'l1': L1-regularized linear or logistic regression (uses scikit-learn).
    • 'rf': Random forest (uses scikit-learn).
    • Custom function that computes feature importance scores (see usage example above).
  • selector_args: Arguments for the base algorithm (dict; default None).
  • preselect: Preselect/filter features prior to subsampling (bool; default True).
  • preselect_args: Arguments for preselection algorithm (dict; default None).
  • target_fp: Target number of false positives to control (positive float; default None).
  • target_fdr: Target false discovery rate (FDR) (positive float; default None).
  • B: Number of subsampling steps (int; default 100 for IPSSGB, 50 otherwise).
  • n_alphas: Number of values in the regularization or threshold grid (int; default 25 for 'l1', 100 otherwise).
  • ipss_function: Function applied to the selection probabilities (str; default 'h2' for 'l1', 'h3' otherwise). Options:
    • 'h1': Linear function, h1(x) = 2x - 1 if x >= 0.5 else 0.
    • 'h2': Quadratic function, h2(x) = (2x - 1)**2 if x >= 0.5 else 0.
    • 'h3': Cubic function, h3(x) = (2x - 1)**3 if x >= 0.5 else 0.
  • cutoff: Maximum value of the theoretical integral bound I(Lambda) (positive float; default 0.05).
  • delta: Defines probability measure; see Associated papers (float; defaults depend on selector).
  • standardize_X: Scale features to have mean 0, standard deviation 1 (bool; default None).
  • center_y: Center response to have mean 0 (bool; default None).
  • n_jobs: Number of jobs to run in parallel (int; default 1).
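
For example, a call that overrides several of these defaults (the values here are purely illustrative):

from ipss import ipss

ipss_output = ipss(
    X, y,
    selector='l1',  # use the IPSSL base selector
    B=50,           # number of subsampling steps
    n_alphas=25,    # size of the regularization grid
    n_jobs=4,       # run subsamples in parallel
)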

General observations/recommendations:

  • IPSSGB is usually best for capturing nonlinear relationships between features and the response.
  • IPSSL is usually best for capturing linear relationships between features and the response.
  • For FDR control, it is usually best to leave target_fdr as None, compute q-values with ipss, and then use them to select features at the desired FDR threshold (as in the Usage section above). This provides greater flexibility when selecting features.
  • For E(FP) control, it is likewise usually best to leave target_fp as None, compute efp scores with ipss, and then use them to select features at the desired false positive threshold (see the sketch after this list).
  • In general, the remaining parameters need not be changed:
    • selector_args include, e.g., decision tree parameters for the tree-based models.
    • Results are robust to B provided it is greater than 25.
    • 'h3' is less conservative than 'h2', which is less conservative than 'h1'.
    • Preselection can significantly reduce computation time.
    • Results are robust to cutoff provided it is between 0.025 and 0.1.
    • Results are robust to delta provided it is between 0 and 1.5.
    • Standardization of X and centering of y are automatically applied for IPSSL; IPSSGB and IPSSRF are unaffected by these settings.
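
To illustrate E(FP) control, here is a sketch mirroring the q-value selection in the Usage section, assuming ipss_output comes from a previous call to ipss:

# select features whose efp scores are at most the target number of false positives
target_fp = 1
efp_scores = ipss_output['efp_scores']
selected_features = [idx for idx, efp in efp_scores.items() if efp <= target_fp]
print(f'Selected features (target E(FP) = {target_fp}): {selected_features}')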
