All Filter Feature Selection Methods
Project description
Feature Selection
Feature selection is a technique used in machine learning and data mining to improve model performance, reduce irrelevant information, and decrease computational costs. It aims to identify the most important or influential variables among those present in the dataset. During this process, unnecessary or low-impact features are removed, or only those that contribute the most to model performance are selected. Feature selection reduces data dimensionality, thereby enhancing the model's generalization ability, reducing the risk of overfitting, and making the model simpler and more interpretable.
Filter Methods
Filter methods are a class of feature selection techniques in machine learning that evaluate each feature independently of the learning algorithm. They rank features by criteria such as correlation, statistical tests, or information gain, and select the top-ranked features for model training. Unlike wrapper and embedded methods, filter methods are computationally inexpensive and less prone to overfitting, making them well suited to high-dimensional datasets; however, they may overlook interactions between features. Overall, filter methods serve as an initial step in feature selection, providing insight into the relevance of individual features to the target variable.
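To make the ranking idea concrete, here is a minimal NumPy sketch (not the pyallffs implementation; the function name and toy data are illustrative) of one such criterion, the absolute Pearson correlation between each feature and the target:

```python
import numpy as np

def rank_by_abs_pearson(X, y):
    """Score each feature by |Pearson r| with the target and rank best-first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the target
    # Pearson r of every column with y, computed in one vectorized pass
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    scores = np.abs(r)
    order = np.argsort(-scores)      # feature indices, highest score first
    return scores, order

# toy data: feature 0 tracks y closely, feature 1 is pure noise
rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = np.column_stack([y + 0.1 * rng.normal(size=200), rng.normal(size=200)])
scores, order = rank_by_abs_pearson(X, y)
```

A filter method then simply keeps the first k indices of `order`; no model is trained during the selection step, which is what makes the approach cheap.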
All Filter Methods Used in This Package:
- Fisher Score
- T-Score
- Welch's t-statistic
- Chi-Squared
- Information Gain
- Gain Ratio
- Symmetric Uncertainty Coefficient
- Relief Score
- mRMR
- Absolute Pearson Correlation Coefficients
- Maximum Likelihood Feature Selection
- Least Squares Feature Selection
- Laplacian Feature Selection Score
- Mutual Information
- Euclidean Distance
- Cramér's V test
- Markov Blanket Filter
- Kruskal-Wallis test
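As a sketch of how one of the listed scores works, the Fisher score of a feature compares between-class separation of its class means to its within-class variance. The following minimal NumPy version (again illustrative, not the pyallffs code) computes it for a labeled dataset:

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per feature: between-class spread over within-class variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])       # between-class term
    den = np.zeros(X.shape[1])       # within-class term
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = len(Xc)
        num += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        den += n_c * Xc.var(axis=0)
    return num / den

# toy data: feature 0 separates the two classes, feature 1 does not
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
X = np.column_stack([y + 0.2 * rng.normal(size=200), rng.normal(size=200)])
scores = fisher_score(X, y)
```

A higher score means the feature's values cluster tightly within each class while the class means sit far apart, so ranking features by this score and keeping the top k is a complete filter-selection step.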
Example of Usage
from pyallffs import AbsolutePearsonCorrelationCalculator
import pandas as pd
df = pd.read_csv("dataset.csv")
exmp_class = AbsolutePearsonCorrelationCalculator(df, drop_labels=["target_variable"], target="target_variable")
scores, ranked_features = exmp_class.calculate_absolute_pearson_correlation(plot_importance=True)
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file pyallffs-0.1.tar.gz.
File metadata
- Download URL: pyallffs-0.1.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9aab8e7b5f144fb6dc98380078582edfcaa34dae0a012f1d5887adda2ee2000a
MD5 | 306135144fe8496291e94b72c3423f2c
BLAKE2b-256 | e61c1c373c20685dc4208b86c006f66fb802d984dd44ed0c6a8c18775ef3ec9a
File details
Details for the file pyallffs-0.1-py3-none-any.whl.
File metadata
- Download URL: pyallffs-0.1-py3-none-any.whl
- Upload date:
- Size: 19.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | bc2645ac663c7eaa0b73cc18601094e0e21cd2d92df0fde559b0fc50f2d1ea5c
MD5 | cfa38c7354c7e1e5f2662f9a1fefc10c
BLAKE2b-256 | ea6e6b7e49908f9016751e81599b6c2cb063464c57fa2c96187cc03bd097f731