
This is an implementation of LOFO for automatic feature selection.


LOFO

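Leave One Feature Out (LOFO) is a powerful technique for feature selection: it removes one feature at a time, re-evaluates the model with cross-validation, and flags the features whose removal does not hurt the score.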

This repository contains a Python implementation of LOFO that can be used with any of the following models:

  1. Any Scikit-Learn model.
  2. Any TensorFlow/Keras model.
  3. LightGBM.
  4. CatBoost.
  5. XGBoost.
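
To make the idea concrete, here is a minimal sketch of a greedy leave-one-feature-out pass built only on scikit-learn. It is an illustration, not this package's code: the helper name `lofo_sketch`, the synthetic data, and the rule "drop a feature if the score does not get worse" are all assumptions, and it assumes a metric where higher is better.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def lofo_sketch(X, y, model, cv, scoring="roc_auc"):
    """Return (kept, dropped) feature names after one greedy LOFO pass.

    Hypothetical helper for illustration only; assumes higher scores are better.
    """
    kept = list(X.columns)
    dropped = []
    # Baseline: mean cross-validated score with every feature present.
    best = cross_val_score(model, X[kept], y, cv=cv, scoring=scoring).mean()
    for feature in list(X.columns):
        if len(kept) == 1:
            break  # always keep at least one feature
        candidate = [c for c in kept if c != feature]
        score = cross_val_score(model, X[candidate], y, cv=cv, scoring=scoring).mean()
        # Dropping the feature did not hurt, so treat it as harmful or useless.
        if score >= best:
            kept, best = candidate, score
            dropped.append(feature)
    return kept, dropped


# Synthetic data with one pure-noise column appended.
X_arr, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])
X["noise"] = np.random.default_rng(0).normal(size=len(X))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
kept, dropped = lofo_sketch(X, y, LogisticRegression(max_iter=1000), cv)
print("kept:", kept)
print("dropped:", dropped)
```

The package itself takes the metric and its direction as separate arguments, as described under Usage, so it is not limited to higher-is-better scores.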

Usage

  • Install the package:
pip install feature-selection-lofo
  • Import lofo
from feature_selection_lofo import lofo
lofo.LOFO(X, Y, 
          model, 
          cv, 
          metric, 
          direction, 
          fit_params=None, 
          predict_type='predict', 
          return_bad_feats=False, 
          groups=None,
          is_keras_model=False)
Args
X Pandas DataFrame, input features to the model (predictors).
Y array_like, target/label feature.
model object, the model instance (e.g. sklearn.linear_model.LinearRegression()).
cv object, sklearn cross-validation object (e.g. sklearn.model_selection.KFold(n_splits=5, shuffle=True, random_state=0)).
metric object, metric to use during search (e.g. sklearn.metrics.roc_auc_score).
direction string, direction of optimization ('max' or 'min').
fit_params string, keyword arguments for the model's fit method, written as a string (e.g. "{'X': x_train, 'y': y_train}"). Defaults to None, in which case "{'X': x_train, 'y': y_train}" is used.
predict_type string, ('predict' or 'predict_proba'). Defaults to 'predict'.
return_bad_feats boolean, whether to return a list of bad features. Defaults to False.
groups array_like, used with StratifiedGroupKFold. Defaults to None.
is_keras_model boolean, whether the model passed is a Keras model. Defaults to False.
Returns
A Pandas DataFrame with harmful features removed.
If return_bad_feats is set to True, it also returns a list of the harmful features.
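
fit_params is written as a string rather than a dict, and the examples in this README refer to the names x_train, y_train, x_valid and y_valid inside it. One plausible reading, and it is only an assumption about the internals, is that the string is evaluated inside each cross-validation fold once those names are bound to the fold's data. The sketch below illustrates that idea with a hypothetical helper, `resolve_fit_params`; it is not the package's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def resolve_fit_params(fit_params, x_train, y_train, x_valid, y_valid):
    """Turn a fit_params string into keyword arguments for model.fit().

    Hypothetical helper: assumes the string is a dict literal that may
    reference x_train, y_train, x_valid and y_valid.
    """
    names = {"x_train": x_train, "y_train": y_train,
             "x_valid": x_valid, "y_valid": y_valid}
    # eval() runs arbitrary code, so only use it on strings you wrote yourself.
    return eval(fit_params, {"__builtins__": {}}, names)


# Small made-up dataset, just to exercise the helper.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > 0).astype(int)
fit_params = "{'X': x_train, 'y': y_train}"

for train_idx, valid_idx in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    # The placeholders in the string are resolved against this fold's data.
    kwargs = resolve_fit_params(fit_params, X[train_idx], y[train_idx],
                                X[valid_idx], y[valid_idx])
    model = LogisticRegression().fit(**kwargs)
```

Under this reading, the keys of the dict must match the parameter names of the model's fit method, which would explain why the Keras example uses 'x' while the Scikit-Learn and LightGBM examples use 'X'.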
  • Import the needed libraries for your model, cross-validation, etc

Scikit-Learn Model Example

import warnings
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
  • Define the parameters
# suppress warning messages
warnings.filterwarnings('ignore')

X = train_df.iloc[:, :-1]
Y = train_df.iloc[:, -1]
model = LogisticRegression()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metric = roc_auc_score
direction = 'max'
fit_params = "{'X': x_train, 'y': y_train}"
predict_type = 'predict_proba'
return_bad_feats = True
groups = None
is_keras_model = False
  • Define the LOFO object and call it
lofo_object = lofo.LOFO(X, Y, model, cv, metric, direction, fit_params, 
                        predict_type, return_bad_feats, groups, is_keras_model)

clean_X, bad_feats = lofo_object()

clean_X: the dataset containing only the useful features.

bad_feats: the list of harmful or useless features.
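
The snippets in this example assume that a `train_df` DataFrame already exists, with the target in the last column. For trying the example without real data, one option is to build a synthetic frame. The column names and dataset sizes below are arbitrary choices for illustration, not something the package requires.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic binary-classification data: 5 feature columns plus a target column.
features, target = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=1, random_state=0
)
train_df = pd.DataFrame(features, columns=[f"feat_{i}" for i in range(5)])
train_df["target"] = target  # target goes last, matching the iloc slicing used here

X = train_df.iloc[:, :-1]
Y = train_df.iloc[:, -1]
print(X.shape, Y.shape)  # (500, 5) (500,)
```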

LightGBM Model Example

import warnings
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
import lightgbm as lgbm
  • Define the parameters
# suppress warning messages
warnings.filterwarnings('ignore')

X = train_df.iloc[:, :-1]
Y = train_df.iloc[:, -1]
model = lgbm.LGBMClassifier(
    objective='binary',
    metric='auc',
    subsample=0.7,
    learning_rate=0.03,
    n_estimators=100,
    n_jobs=-1)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metric = roc_auc_score
direction = 'max'
fit_params = "{'X': x_train, 'y': y_train, 'eval_set': [(x_valid,y_valid)], 'verbose': 0}"
predict_type = 'predict_proba'
return_bad_feats = True
groups = None
is_keras_model = False
  • Define the LOFO object and call it
lofo_object = lofo.LOFO(X, Y, model, cv, metric, direction, fit_params, 
                        predict_type, return_bad_feats, groups, is_keras_model)
clean_X, bad_feats = lofo_object()

TensorFlow/Keras Model Example

import warnings
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
import tensorflow as tf
from tensorflow.keras import layers
  • Construct the model
def nn_model():
    inputs = layers.Input(shape=(X.shape[-1],))  # shape must be a tuple
    x = layers.Dense(256, activation='relu')(inputs)
    x = layers.Dense(64, activation='relu')(x)
    output = layers.Dense(1, activation='sigmoid')(x)
    
    model = tf.keras.Model(inputs=inputs, outputs=output)
    model.compile(loss='binary_crossentropy',
                  optimizer='adam')
    
    return model
  • Define the parameters
# suppress warning messages
warnings.filterwarnings('ignore')

X = train_df.iloc[:, :-1]
Y = train_df.iloc[:, -1]

tf.keras.backend.clear_session()
model = nn_model()

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metric = roc_auc_score
direction = 'max'
fit_params = "{'x': x_train, 'y': y_train, 'validation_data': (x_valid, y_valid), 'epochs': 10, 'batch_size': 256, 'verbose': 0}"
predict_type = 'predict'
return_bad_feats = True
groups = None
is_keras_model = True
  • Define the LOFO object and call it
lofo_object = lofo.LOFO(X, Y, model, cv, metric, direction, fit_params, 
                        predict_type, return_bad_feats, groups, is_keras_model)

clean_X, bad_feats = lofo_object()
