Feature selection using XAI
Project description
Advanced feature selection using explainable Artificial Intelligence (XAI)
Developed by Yaganteeswarudu Akkem, Data Scientist, Ph.D. Scholar, NIT Silchar
Introduction
As machine learning models grow more complex, feature selection techniques are needed both to improve predictive performance and to explain how a model reaches its decisions. This project presents an architecture that combines the global explanations of SHAP (SHapley Additive exPlanations) with the local interpretability of LIME (Local Interpretable Model-agnostic Explanations) to improve feature selection.
The method identifies features that have a consistent influence across the entire dataset as well as those that are vital to individual predictions. SHAP values are normalized to derive feature weights, and these weights are integrated with LIME scores to produce a maximum interpretation score for each feature. This hybrid framework balances model simplicity against the need for high predictive accuracy and interpretability. It aims to improve computational efficiency and model performance, and is intended for applications where model transparency and an understanding of the decision-making process are critical.
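The combination step described above can be sketched roughly as follows. The package's exact formula is not stated in this description, so everything here is an illustrative assumption: SHAP weights are taken as each feature's mean absolute SHAP value divided by the total, LIME scores as the mean absolute LIME coefficient, and the combined score as their product. The function name `interpretation_scores` and the toy arrays are hypothetical and are not part of the package.

```python
import numpy as np


def interpretation_scores(shap_values, lime_values):
    """Combine global SHAP weights with local LIME scores (illustrative only).

    Both inputs are array-likes of shape (n_samples, n_features).
    """
    shap_values = np.asarray(shap_values, dtype=float)
    lime_values = np.asarray(lime_values, dtype=float)

    # Global importance: mean absolute SHAP value per feature.
    shap_importance = np.abs(shap_values).mean(axis=0)
    # Normalize so the weights sum to 1 (guard against an all-zero input).
    total = shap_importance.sum()
    weights = shap_importance / total if total > 0 else np.zeros_like(shap_importance)

    # Local importance: mean absolute LIME coefficient per feature.
    lime_scores = np.abs(lime_values).mean(axis=0)

    # Assumed combination rule: scale each LIME score by its SHAP weight.
    return weights * lime_scores


# Toy data: 3 samples, 3 features (made-up numbers).
shap_toy = [[0.4, -0.1, 0.0], [0.6, 0.1, 0.0], [-0.5, 0.1, 0.0]]
lime_toy = [[0.3, 0.2, 0.1], [0.5, -0.2, 0.1], [0.4, 0.2, -0.1]]

scores = interpretation_scores(shap_toy, lime_toy)
ranking = np.argsort(scores)[::-1]  # feature indices, highest score first
print(scores, ranking)
```

Under this assumed rule, a feature scores highly only if it matters both globally (SHAP) and locally (LIME); other combination rules, such as a weighted sum, would be equally plausible readings of the description.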
Examples of How to Use Feature Selection
Install the package with:
pip install xai-feature-selection==0.4
Import the package with:
from xai_feature_selection.feature_selection import FeatureSelect
from xai_feature_selection.model_prediction import Model
Currently, xai_feature_selection supports classification and regression problems.
Use the algorithms below for regression:
- LinearRegression
- RandomForestRegressor
Use the algorithm below for classification:
- LogisticRegression
The following parameters are needed to retrieve the best features after calculating feature importance:
- file_path: location of the CSV file on your system
- predict_columns: the column to be predicted (the target of the classification or regression)
- model_type_choice: 0 for regression, 1 for classification
- model_choice:
  - for regression: 0 for LinearRegression, 1 for RandomForestRegressor
  - for classification: 0 for LogisticRegression
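Because the choices are passed as integer codes, it can help to check a combination before training. The following is a minimal sketch that only restates the codes listed above; the dictionaries and the `describe_choice` helper are hypothetical conveniences and are not provided by the package.

```python
# Choice codes as listed above. These lookup tables and the helper below are
# hypothetical conveniences, not part of xai_feature_selection.
MODEL_TYPES = {0: "Regression", 1: "Classification"}
MODEL_CHOICES = {
    0: {0: "LinearRegression", 1: "RandomForestRegressor"},  # regression
    1: {0: "LogisticRegression"},  # classification
}


def describe_choice(model_type_choice, model_choice):
    """Return a readable label for a pair of choice codes, or raise ValueError."""
    try:
        algorithm = MODEL_CHOICES[model_type_choice][model_choice]
    except KeyError:
        raise ValueError(
            f"Unsupported combination: model_type_choice={model_type_choice}, "
            f"model_choice={model_choice}"
        ) from None
    return f"{MODEL_TYPES[model_type_choice]} / {algorithm}"


print(describe_choice(0, 1))  # Regression / RandomForestRegressor
```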
Once all parameters are chosen, call Model to train the model and compute the LIME and SHAP values, then use FeatureSelect to return the important features:
if predict_columns and file_path:
    # Build and train the chosen model on the CSV data.
    model = Model(
        model_type=model_type_choice,
        model_choice=model_choice,
        data_file_path=file_path,
        predict_columns=predict_columns,
    )
    model.train()
    # Compute the LIME and SHAP explanations for the trained model.
    lime_data, shap_data = model.explain()
    feature_handler = FeatureSelect(
        shap_data=shap_data, lime_data=lime_data
    )
    feature_handler.prepare_weights()
    feature_handler.calculate_feature_values()
    # Print the features selected as most important.
    print(feature_handler.get_best_feature_data())
Note:
Preprocessing matters. The cleaner the data you pass in (no null values, outliers handled, and so on), the better the selected features are likely to be.
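As a rough illustration of such preprocessing, the sketch below drops rows with null values and then removes rows whose numeric features fall outside the common 1.5 × IQR range. It uses pandas, which is an assumption here; the helper name `clean_frame`, the column names, and the toy values are hypothetical, and the right outlier rule depends on your data.

```python
import pandas as pd


def clean_frame(df, target_column):
    """Drop rows with nulls, then drop rows with IQR outliers in numeric features."""
    df = df.dropna()
    # Only feature columns are screened for outliers; the target is left alone.
    numeric = df.select_dtypes(include="number").drop(
        columns=[target_column], errors="ignore"
    )
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Keep rows whose numeric features all lie within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    within = ((numeric >= q1 - 1.5 * iqr) & (numeric <= q3 + 1.5 * iqr)).all(axis=1)
    return df[within].reset_index(drop=True)


# Toy data (made-up numbers): one missing value and one obvious outlier in "area".
raw = pd.DataFrame(
    {
        "area": [50.0, 60.0, 55.0, 58.0, 52.0, 900.0, None],
        "rooms": [2, 3, 2, 3, 2, 3, 2],
        "price": [100, 120, 110, 118, 104, 125, 99],
    }
)
clean = clean_frame(raw, target_column="price")
print(clean)
```

The cleaned frame can then be saved with `clean.to_csv(...)` and the resulting path passed as `file_path`.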