Automated Feature Selection & Feature Importance Calculation Framework
`autofeatselect` is a Python library that automates and accelerates feature selection for machine learning projects. It computes feature importance scores and rankings with several methods, and it detects and removes highly correlated variables.
Installation
You can install it from PyPI:

```shell
pip install autofeatselect
```
Key Features
`autofeatselect` offers a range of features to support feature selection and importance analysis:
- Automated Feature Selection: Several automated feature selection methods, such as LightGBM importance, XGBoost importance, and RFECV.
- Feature Importance Analysis: Calculation and visualization of feature importance scores for each algorithm separately.
- Correlation Analysis: Correlation analysis to identify and automatically drop highly correlated features.
Full List of Methods
Correlation Calculation Methods
- Pearson, Spearman & Kendall Correlation Coefficients for Continuous Variables
- Cramér's V Scores for Categorical Variables
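To make the categorical measure concrete, here is a minimal, dependency-free sketch of Cramér's V (chi-square statistic from the contingency table, normalized to the 0–1 range). This illustrates the statistic itself, not the library's internal implementation:

```python
import math
from collections import Counter

def cramers_v(x, y):
    """Cramér's V association between two categorical sequences (0 = independent, 1 = perfect)."""
    n = len(x)
    cats_x, cats_y = sorted(set(x)), sorted(set(y))
    joint, mx, my = Counter(zip(x, y)), Counter(x), Counter(y)
    # Pearson chi-square statistic over the contingency table
    chi2 = 0.0
    for a in cats_x:
        for b in cats_y:
            expected = mx[a] * my[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(cats_x), len(cats_y)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Perfectly associated columns give V = 1; independent columns give V = 0
print(cramers_v(['a', 'a', 'b', 'b'], ['u', 'u', 'v', 'v']))  # → 1.0
print(cramers_v(['a', 'a', 'b', 'b'], ['u', 'v', 'u', 'v']))  # → 0.0
```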
Feature Selection Methods
- LightGBM Feature Importance Scores
- XGBoost Feature Importance Scores (with Target Encoding for Categorical Variables)
- Random Forest Feature Importance Scores (with Target Encoding for Categorical Variables)
- LassoCV Coefficients (with One Hot Encoding for Categorical Variables)
- Permutation Importance Scores (LightGBM as the estimator)
- RFECV Rankings (LightGBM as the estimator)
- Boruta Rankings (Random Forest as the estimator)
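The permutation importance idea behind one of the methods above can be sketched in a few lines: shuffle one column at a time and measure how much a fixed scoring function degrades. This is a generic illustration of the technique (with a toy scoring function), not the library's LightGBM-based implementation:

```python
import random

def permutation_importance(score_fn, X, y, n_repeats=5, seed=24):
    """Mean drop in score after shuffling each column; a bigger drop means a more important feature."""
    rng = random.Random(seed)
    baseline = score_fn(X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
            drops.append(baseline - score_fn(X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy "model": accuracy of predicting y directly from feature 0
def accuracy(X, y):
    return sum(row[0] == yi for row, yi in zip(X, y)) / len(y)

X = [[0, 5], [1, 5], [0, 5], [1, 5]]  # feature 1 is constant, hence uninformative
y = [0, 1, 0, 1]
imps = permutation_importance(accuracy, X, y)
```

Shuffling the constant second column changes nothing, so its importance is exactly zero, while the informative first column scores at least as high.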
Usage
- Calculating Correlations & Detecting Highly Correlated Features

```python
from autofeatselect import CorrelationCalculator

# Static features to be kept regardless of the correlation results
num_static_feats = ['x1', 'x2']

corr_df_num, remove_list_num = CorrelationCalculator.numeric_correlations(
    X=X_train,
    features=num_feats,               # list of continuous features
    static_features=num_static_feats,
    corr_method='pearson',
    threshold=0.9)

corr_df_cat, remove_list_cat = CorrelationCalculator.categorical_correlations(
    X=X_train,
    features=cat_feats,               # list of categorical features
    static_features=None,
    threshold=0.9)
```
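One way such a threshold rule can work is shown below: compute the pairwise correlation matrix and, for each pair above the threshold, mark the later feature for removal. This is a simplified sketch of the idea using NumPy, not the library's exact algorithm:

```python
import numpy as np

def features_to_drop(X, names, threshold=0.9):
    """For each pair with |corr| above the threshold, mark the later feature for removal."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    drop = []
    for j in range(len(names)):
        for i in range(j):
            # only compare against features we are keeping
            if names[i] not in drop and corr[i, j] > threshold:
                drop.append(names[j])
                break
    return drop

# x2 is nearly 2 * x1 (|corr| > 0.9), x3 is unrelated
X = np.array([[1.0, 2.1, 5.0],
              [2.0, 4.0, 1.0],
              [3.0, 6.2, 9.0],
              [4.0, 7.9, 2.0]])
print(features_to_drop(X, ['x1', 'x2', 'x3']))  # → ['x2']
```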
- Calculating Single Feature Importance Scores & Plotting Results

```python
from autofeatselect import FeatureSelector

# Create the feature selection object
feat_selector = FeatureSelector(
    modeling_type='classification',  # 'classification' or 'regression'
    X_train=X_train,
    y_train=y_train,
    X_test=None,
    y_test=None,
    numeric_columns=num_feats,
    categorical_columns=cat_feats,
    seed=24)

# Train a LightGBM model & return importance results as a pd.DataFrame
lgbm_importance_df = feat_selector.lgbm_importance(
    hyperparam_dict=None,
    objective=None,
    return_plot=True)

# Apply RFECV with LightGBM as the estimator & return rankings as a pd.DataFrame
lgbm_hyperparams = {'learning_rate': 0.01, 'max_depth': 6, 'n_estimators': 400,
                    'num_leaves': 30, 'random_state': 24, 'importance_type': 'gain'}
rfecv_hyperparams = {'step': 3, 'min_features_to_select': 5, 'cv': 5}

rfecv_importance_df = feat_selector.rfecv_importance(
    lgbm_hyperparams=lgbm_hyperparams,
    rfecv_hyperparams=rfecv_hyperparams,
    return_plot=False)
```
- Automated Correlation Analysis & Feature Selection with Multiple Methods

```python
from autofeatselect import AutoFeatureSelect

# Automated correlation analysis & application of multiple feature selection methods
feat_selector = AutoFeatureSelect(
    modeling_type='classification',
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    numeric_columns=num_feats,
    categorical_columns=cat_feats,
    seed=24)

# Detect & drop highly correlated features
corr_features = feat_selector.calculate_correlated_features(
    static_features=None,
    num_threshold=0.9,
    cat_threshold=0.9)
feat_selector.drop_correlated_features()

# Apply several selection methods & return the combined results
final_importance_df = feat_selector.apply_feature_selection(
    selection_methods=['lgbm', 'xgb', 'perimp', 'rfecv', 'boruta'])
```
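Combining several methods as above raises the question of how to reconcile their disagreeing rankings. One simple, common approach (a sketch, not necessarily what `apply_feature_selection` does internally) is mean-rank aggregation over hypothetical per-method rankings:

```python
# Hypothetical per-method rankings (best feature first); aggregate by mean rank position
rankings = {
    'lgbm':   ['x3', 'x1', 'x2'],
    'xgb':    ['x1', 'x3', 'x2'],
    'boruta': ['x3', 'x2', 'x1'],
}
features = sorted({f for ranks in rankings.values() for f in ranks})
# Lower mean rank = more consistently important across methods
mean_rank = {f: sum(ranks.index(f) for ranks in rankings.values()) / len(rankings)
             for f in features}
consensus = sorted(features, key=mean_rank.get)
print(consensus)  # → ['x3', 'x1', 'x2']
```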
License
This project is free, open-source software licensed under the MIT License.
File details

Details for the file AutoFeatSelect-0.1.5.tar.gz (source distribution):
- Size: 12.1 kB
- Uploaded via: twine/4.0.2, CPython/3.8.8 (Trusted Publishing: No)
- SHA256: 046d7408cadcd371479e0f6cca5c0cd6f3bf358244ed6961e214c89056fdcedd
- MD5: e5ff55aa02fb6acf89199908c97dcfcd
- BLAKE2b-256: dfb480b6d39886c5d7fd3beb6f564ecc436ac234e11e97a1b2e1c45708fe5548

Details for the file AutoFeatSelect-0.1.5-py3-none-any.whl (built distribution, Python 3):
- Size: 12.9 kB
- Uploaded via: twine/4.0.2, CPython/3.8.8 (Trusted Publishing: No)
- SHA256: c7ee58deae929ff4dd1aae0876da3056ac682fb154feb73d61aeec191e3e3ddc
- MD5: 139f90a141e1a60d20a297fda49d4126
- BLAKE2b-256: fb81813be4d53882c404154c50f834a93b6e831b91eca8bfcf01ceb07d9ed508