FRUFS: Feature Relevance based Unsupervised Feature Selection

FRUFS stands for Feature Relevance based Unsupervised Feature Selection and is an unsupervised feature selection technique that uses supervised algorithms such as XGBoost to rank features based on their importance.
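
The idea described above can be sketched in a few lines. This is an illustration only, not the library's implementation: it assumes every feature is continuous, uses scikit-learn's DecisionTreeRegressor, and averages the importances a feature receives across targets. The function name `rank_features_by_relevance` and the averaging rule are assumptions made for this sketch.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def rank_features_by_relevance(X, random_state=0):
    """Score each column by how much it helps predict the other columns.

    Hypothetical helper for illustration; not part of the FRUFS package.
    """
    n_features = X.shape[1]
    scores = np.zeros(n_features)
    for target in range(n_features):
        # Use every column except the current one as predictors.
        others = [j for j in range(n_features) if j != target]
        model = DecisionTreeRegressor(random_state=random_state)
        model.fit(X[:, others], X[:, target])
        # Credit each predictor with its importance for this target.
        scores[others] += model.feature_importances_
    # Average over the targets for which each feature acted as a predictor.
    scores /= n_features - 1
    # Highest score first.
    return np.argsort(scores)[::-1], scores


# Toy data: columns 0 and 1 are strongly related, columns 2 and 3 are noise.
rng = np.random.default_rng(27)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    base * 2.0 + rng.normal(scale=0.01, size=(200, 1)),
    rng.normal(size=(200, 2)),
])
ranking, scores = rank_features_by_relevance(X)
print(ranking, scores)
```

Features that help reconstruct many other features collect high scores, so the two related columns should rank ahead of the noise columns.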

How to install?

pip install FRUFS

Functions and parameters

# The initialization of FRUFS takes in multiple parameters as input
model = FRUFS(model_r, model_c, k, n_jobs, verbose, categorical_features, random_state)
  • model_r - estimator object, default=DecisionTreeRegressor() The model which is used to regress current continuous feature given all other features.
  • model_c - estimator object, default=DecisionTreeClassifier() The model which is used to classify current categorical feature given all other features.
  • k - float/int, default=1.0 The number of features to select.
    • float means to consider round(total_features*k) number of features. Values range from 0.0-1.0
    • int means to consider k number of features. Values range from 0-total_features
  • n_jobs - int, default=-1 The number of CPUs to use to do the computation.
    • None means 1 unless in a joblib.parallel_backend context.
    • -1 means using all processors.
  • verbose - int, default=0 Controls the verbosity: higher values print more messages. A value of 0 displays a progress bar.
  • categorical_features - list of integers or strings A list of indices or column names denoting which features are categorical
    • list of integers If input data is a numpy matrix then pass a list of integers that denote indices of categorical features
    • list of strings If input data is a pandas dataframe then pass a list of strings that denote names of categorical features
  • random_state - int or RandomState instance, default=None Pass an int for reproducible output across multiple function calls.
# To fit FRUFS on provided dataset and find recommended features
fit(data)
  • data - A pandas dataframe or a numpy matrix upon which feature selection is to be applied
    (Passing pandas dataframe allows using correct column names. Numpy matrix will apply default column names)
# This function prunes the dataset to selected set of features
transform(data)
  • data - A pandas dataframe or a numpy matrix which needs to be pruned
    (Passing pandas dataframe allows using correct column names. Numpy matrix will apply default column names)
# To fit FRUFS on provided dataset and return pruned data
fit_transform(data)
  • data - A pandas dataframe or numpy matrix upon which feature selection is to be applied
    (Passing pandas dataframe allows using correct column names. Numpy matrix will apply default column names)
# To plot XGBoost style feature importance
feature_importance()
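
The two ways of reading k described above can be sketched as follows. The helper `resolve_k` is hypothetical and not part of the package; it only mirrors the documented rule (round(total_features*k) for a float, k itself for an int), and the range checks are assumptions based on the stated value ranges.

```python
def resolve_k(k, total_features):
    """Return how many features to keep for a float or int k (hypothetical helper)."""
    if isinstance(k, float):
        # A float is treated as a fraction of the available features.
        if not 0.0 <= k <= 1.0:
            raise ValueError("float k must lie in [0.0, 1.0]")
        return round(total_features * k)
    # An int is treated as an absolute number of features.
    if not 0 <= k <= total_features:
        raise ValueError("int k must lie in [0, total_features]")
    return k


print(resolve_k(1.0, 34))    # the default keeps all 34 features
print(resolve_k(0.25, 784))  # a quarter of 784 features: 196
print(resolve_k(13, 14))     # an int is used as-is: 13
```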

How to import?

from FRUFS import FRUFS

Usage

If data is a pandas dataframe

# Import the algorithm and the LightGBM estimators used below
from FRUFS import FRUFS
from lightgbm import LGBMClassifier, LGBMRegressor
# Initialize the FRUFS object (categorical_features is a list of column names for a dataframe)
model_frufs = FRUFS(model_r=LGBMRegressor(random_state=27), model_c=LGBMClassifier(random_state=27, class_weight="balanced"), categorical_features=categorical_features, k=13, n_jobs=-1, verbose=0, random_state=27)
# fit_transform is a wrapper that calls fit and then transform:
# fit ranks the features and transform prunes the dataset to the selected set of features
df_train_pruned = model_frufs.fit_transform(df_train)
df_test_pruned = model_frufs.transform(df_test)
# Get a plot of the feature importance scores
model_frufs.feature_importance()

If data is a numpy matrix

# Import the algorithm and the LightGBM estimators used below
from FRUFS import FRUFS
from lightgbm import LGBMClassifier, LGBMRegressor
# Initialize the FRUFS object (categorical_features is a list of column indices for a numpy matrix)
model_frufs = FRUFS(model_r=LGBMRegressor(random_state=27), model_c=LGBMClassifier(random_state=27, class_weight="balanced"), categorical_features=categorical_features, k=13, n_jobs=-1, verbose=0, random_state=27)
# fit_transform is a wrapper that calls fit and then transform:
# fit ranks the features and transform prunes the dataset to the selected set of features
X_train_pruned = model_frufs.fit_transform(X_train)
X_test_pruned = model_frufs.transform(X_test)
# Get a plot of the feature importance scores
model_frufs.feature_importance()
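
Both snippets above expect a categorical_features list to exist already. A minimal sketch of building it for each input type follows. The column names and values are made up for illustration, and the categorical columns are assumed to be integer-encoded beforehand.

```python
import pandas as pd

# Hypothetical training data; "workclass" and "sex" are integer-encoded categories.
df_train = pd.DataFrame({
    "age": [25, 32, 47],
    "workclass": [0, 1, 0],
    "hours": [40.0, 38.5, 50.0],
    "sex": [0, 1, 0],
})

# Dataframe input: refer to categorical columns by name.
categorical_names = ["workclass", "sex"]

# Numpy input: refer to the same columns by position.
categorical_indices = [df_train.columns.get_loc(name) for name in categorical_names]
X_train = df_train.to_numpy()

print(categorical_names)    # pass this list together with df_train
print(categorical_indices)  # pass this list together with X_train
```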

For better accuracy

  • Try incorporating more features by increasing the value of k
  • Pass strong, hyperparameter-optimized non-linear models

For better speeds

  • Set n_jobs to -1
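
To see how many workers n_jobs=-1 would resolve to on a given machine, joblib's `effective_n_jobs` can be queried directly. This assumes FRUFS hands n_jobs to joblib, which the parameter description suggests but this page does not confirm.

```python
from joblib import effective_n_jobs

# Number of workers joblib would use for each setting on this machine.
all_workers = effective_n_jobs(-1)
single_worker = effective_n_jobs(1)
print(all_workers, single_worker)
```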

Performance in terms of Accuracy (classification) and MSE (regression)

| Dataset | # of samples | # of features | Task type | Score using all features | Score using FRUFS | # of features selected | % of features selected | Tutorial |
|---|---|---|---|---|---|---|---|---|
| Ionosphere | 351 | 34 | Supervised | 88.01 | 91.45 | 24 | 70.5% | tutorial here |
| Adult | 45222 | 14 | Supervised | 62.16 | 62.65 | 13 | 92.8% | tutorial here |
| MNIST | 60000 | 784 | Unsupervised | 50.48 | 53.70 | 329 | 42.0% | tutorial here |
| Waveform | 5000 | 21 | Unsupervised | 38.20 | 39.67 | 15 | 72.0% | tutorial here |

Note: For the first and second tasks we report accuracy and F1 score, respectively, while for the third and fourth tasks we report the NMI metric. In all cases, higher scores indicate better performance.
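
For readers unfamiliar with NMI (normalized mutual information), here is a small illustration using scikit-learn's `normalized_mutual_info_score`. The labels are made up, and the snippet does not reproduce the benchmark above. It only shows that NMI compares groupings rather than label names, which is why it suits unsupervised tasks.

```python
from sklearn.metrics import normalized_mutual_info_score

true_labels = [0, 0, 1, 1, 2, 2]
# Same grouping as true_labels, but with the cluster ids permuted.
same_grouping = [2, 2, 0, 0, 1, 1]
# A grouping that mixes every true class into both clusters.
mixed_grouping = [0, 1, 0, 1, 0, 1]

perfect = normalized_mutual_info_score(true_labels, same_grouping)
poor = normalized_mutual_info_score(true_labels, mixed_grouping)
print(perfect, poor)
```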

Future Ideas

  • Let me know

Feature Request

Drop me an email at atif.hit.hassan@gmail.com if you want any particular feature
