Factor Importance Ranking and Selection using Total Indices

FIRST: Factor Importance Ranking and Selection using Total Indices

A Python 3 implementation of FIRST, a model-independent factor importance ranking and selection procedure based on total Sobol' indices (Huang and Joseph, 2025). This research is supported by U.S. National Science Foundation grants DMS-2310637 and DMREF-1921873. An R implementation is also available on CRAN.

Installation

pip install pyfirst

or from source

pip install git+https://github.com/BillHuang01/pyfirst.git

Usage

Factor Importance Ranking and Selection

FIRST is the main function of this module. It provides factor importance ranking and selection directly from scattered data without any model fitting, where the importance is computed based on total Sobol' indices (Sobol', 2001). FIRST requires the following two arguments:

a NumPy ndarray or a pandas DataFrame for the factors/predictors X
a NumPy ndarray or a pandas Series for the response y

FIRST returns a NumPy ndarray of factor importances, where a value of zero indicates that the factor is not important for predicting the response.

from pyfirst import FIRST
from sklearn.datasets import make_friedman1

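# Friedman #1 benchmark: only the first 5 of the 10 features affect y
X, y = make_friedman1(n_samples=10000, n_features=10, noise=1.0, random_state=43)

# One importance value per column of X
FIRST(X, y)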
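
Since a zero importance marks a factor as unimportant, the returned array can be turned into a ranked list of selected factors. The following is a minimal sketch using only NumPy; the helper name rank_selected_factors and the toy importance values are hypothetical and are not part of pyfirst or output from FIRST.

```python
import numpy as np

def rank_selected_factors(importance):
    """Return indices of factors with nonzero importance, most important first."""
    importance = np.asarray(importance, dtype=float)
    # Keep only factors with strictly positive importance.
    selected = np.flatnonzero(importance > 0)
    # Order the kept indices by decreasing importance (stable for ties).
    order = np.argsort(-importance[selected], kind="stable")
    return selected[order]

# Made-up importance values for illustration only (not output from FIRST).
toy_importance = np.array([0.30, 0.0, 0.05, 0.45, 0.0])
ranked = rank_selected_factors(toy_importance)
print(ranked)  # [3 0 2]
```

In practice the array returned by FIRST(X, y) would take the place of toy_importance.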

For more advanced usage of FIRST, e.g., speeding up computation on big data, please see the API documentation.

To support easy integration with sklearn.pipeline.Pipeline for a streamlined model training process, we also provide SelectByFIRST, a class built on sklearn.feature_selection.

import numpy as np
from pyfirst import SelectByFIRST
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
X = housing.data
y = np.log(housing.target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=43)

pipe = Pipeline([
    ('selector', SelectByFIRST(regression=True, random_state=43)),
    ('estimator', RandomForestRegressor(random_state=43))
]).fit(X_train, y_train)

pipe.predict(X_test)

For more details, please see the API documentation.

Total Sobol' Indices Estimation

This module also provides the function TotalSobolKNN for consistent estimation of total Sobol' indices (Sobol', 2001) directly from scattered data. When the response is noiseless, TotalSobolKNN implements the Nearest-Neighbor estimator from Broto et al. (2020). For a noisy response, TotalSobolKNN implements the Noise-Adjusted Nearest-Neighbor estimator from Huang and Joseph (2025). TotalSobolKNN returns a NumPy ndarray of the estimated total Sobol' indices.

from pyfirst import TotalSobolKNN
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=10000, n_features=5, noise=1.0, random_state=43)

TotalSobolKNN(X, y, noise=True)
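
To give some intuition for nearest-neighbor estimation in the noiseless case, here is a rough sketch using only NumPy. It is a simplified illustration, not the pyfirst implementation and not the exact estimator of Broto et al. (2020): it assumes independent inputs, a noiseless response, and at least two factors, uses a single nearest neighbor, and computes all pairwise distances by brute force. The function name total_sobol_nn_sketch is hypothetical. The idea is that the total index of factor i is E[Var(Y | X_{-i})] / Var(Y), and two points that nearly agree on every factor except i differ in response mainly because of factor i.

```python
import numpy as np

def total_sobol_nn_sketch(X, y):
    """Crude 1-nearest-neighbor estimate of total Sobol' indices (noiseless y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    var_y = y.var()
    indices = np.empty(d)
    for i in range(d):
        Z = np.delete(X, i, axis=1)  # all factors except factor i
        # Squared Euclidean distances between all pairs of rows of Z.
        sq = (Z ** 2).sum(axis=1)
        dist = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
        np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
        nearest = dist.argmin(axis=1)
        # Neighbors nearly share X_{-i}, so half the mean squared response
        # difference approximates E[Var(Y | X_{-i})].
        indices[i] = 0.5 * np.mean((y - y[nearest]) ** 2) / var_y
    return indices

# Toy check: independent uniform inputs, additive response, third factor inert.
rng = np.random.default_rng(0)
X_toy = rng.uniform(size=(1500, 3))
y_toy = X_toy[:, 0] + 2.0 * X_toy[:, 1]
est = total_sobol_nn_sketch(X_toy, y_toy)
print(np.round(est, 2))
```

For this toy function the exact total indices are 0.2, 0.8, and 0, so the printed estimates should land close to those values. With a noisy response this naive estimate would be inflated by the noise variance, which is the situation the noise-adjusted estimator is meant to handle.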

For more details and applications, please see the API documentation.

Shapley Sobol' Indices Estimation

This module also provides the function ShapleySobolKNN for consistent estimation of Shapley Sobol' indices (Owen, 2014; Song et al., 2016) directly from scattered data. When the response is noiseless, ShapleySobolKNN implements the Nearest-Neighbor estimator from Broto et al. (2020). For a noisy response, ShapleySobolKNN implements the Noise-Adjusted Nearest-Neighbor estimator from Huang and Joseph (2025). ShapleySobolKNN returns a NumPy ndarray of the estimated Shapley Sobol' indices.

from pyfirst import ShapleySobolKNN
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=10000, n_features=5, noise=1.0, random_state=43)

ShapleySobolKNN(X, y, noise=True)
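
Shapley Sobol' indices apply the Shapley value to a variance-based value function defined on subsets of factors, such as the fraction of response variance explained by that subset. The sketch below shows only the Shapley averaging step by exact subset enumeration, using the standard library. The names shapley_from_value and toy_value and the numbers inside toy_value are hypothetical; this is not how ShapleySobolKNN estimates the value function from data.

```python
from itertools import combinations
from math import comb

def shapley_from_value(d, value):
    """Shapley values for d factors, given value(frozenset of indices) -> float."""
    shapley = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight for subsets of this size: 1 / (d * C(d-1, size)).
            weight = 1.0 / (d * comb(d - 1, size))
            for subset in combinations(others, size):
                u = frozenset(subset)
                # Marginal gain from adding factor i to subset u.
                shapley[i] += weight * (value(u | {i}) - value(u))
    return shapley

def toy_value(u):
    """Made-up explained-variance fractions: main effects plus one interaction."""
    main = {0: 0.2, 1: 0.5, 2: 0.1}
    total = sum(main[j] for j in u)
    if 0 in u and 1 in u:
        total += 0.2  # interaction between factors 0 and 1
    return total

shap = shapley_from_value(3, toy_value)
print([round(s, 3) for s in shap])  # [0.3, 0.6, 0.1]
```

The interaction of 0.2 is split equally between factors 0 and 1, and the values sum to the value of the full set (here 1.0). Exact enumeration visits every subset, so it is only practical for a small number of factors.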

For more details and applications, please see the API documentation.

FIRSTRank

This module also provides the function FIRSTRank, which ranks factors by maximizing the cumulative variance that can be explained. Please see Huang and Joseph (2025) for details.

from pyfirst import FIRSTRank
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=10000, n_features=5, noise=1.0, random_state=43)

FIRSTRank(X, y, noise=True)

For more details and applications, please see the API documentation.

References

Huang, C., & Joseph, V. R. (2025). Factor Importance Ranking and Selection using Total Indices. Technometrics.

Sobol', I. M. (2001). Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and computers in simulation, 55(1-3), 271-280.

Broto, B., Bachoc, F., & Depecker, M. (2020). Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA Journal on Uncertainty Quantification, 8(2), 693-716.

Owen, A. B. (2014). Sobol' indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification, 2(1), 245-251.

Song, E., Nelson, B. L., & Staum, J. (2016). Shapley effects for global sensitivity analysis: Theory and computation. SIAM/ASA Journal on Uncertainty Quantification, 4(1), 1060-1083.

Citation

If you find this module useful, please consider citing:

@article{huang2025factor,
  title={Factor Importance Ranking and Selection using Total Indices},
  author={Huang, Chaofan and Joseph, V Roshan},
  journal={Technometrics},
  pages={1--29},
  year={2025},
  publisher={Taylor \& Francis}
}

Release history

1.0.2 (Sep 1, 2025), this version
1.0.1 (May 26, 2025)
1.0.0 (May 26, 2025)
0.1.4 (Aug 6, 2024)
0.1.3 (Feb 24, 2024)
0.1.2 (Feb 15, 2024)
0.1.1 (Feb 15, 2024)
0.1.0 (Feb 14, 2024)

Download files

Source Distribution

pyfirst-1.0.2.tar.gz (13.2 kB), uploaded Sep 1, 2025

Built Distribution

pyfirst-1.0.2-py3-none-any.whl (12.8 kB), uploaded Sep 1, 2025, Python 3

File details

Details for the file pyfirst-1.0.2.tar.gz.

File metadata

Download URL: pyfirst-1.0.2.tar.gz
Upload date: Sep 1, 2025
Size: 13.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.7

File hashes

Hashes for pyfirst-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`c5b4e8e239cb3625503eeb435e8e95fd3a92ac1a8b1ea7d29e1fc7d18013f0a3`
MD5	`e7677509518fa64c3835754557dcc111`
BLAKE2b-256	`a63856ee6e2169e97ca0a811970e569f5904e75c2f2f4d1514dad239e4a19b50`

File details

Details for the file pyfirst-1.0.2-py3-none-any.whl.

File metadata

Download URL: pyfirst-1.0.2-py3-none-any.whl
Upload date: Sep 1, 2025
Size: 12.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.7

File hashes

Hashes for pyfirst-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ed17a0f5be1ad154e50b64ba155cf320962bd10296d1c504eeee83c03b10f466`
MD5	`3b5b1e7d744def16082be03ee76373bf`
BLAKE2b-256	`4107e7ea3e789354b0b48f75717710202aea7c71d6c2b7b9dd8ecda302601912`
