All Filter Feature Selection Methods

Project description

Feature Selection

Feature selection is a technique used in machine learning and data mining to improve model performance, discard irrelevant information, and reduce computational cost. It aims to identify the most influential variables in a dataset: unnecessary or low-impact features are removed, and only those that contribute most to model performance are kept. By reducing data dimensionality, feature selection improves a model's generalization ability, lowers the risk of overfitting, and makes the model simpler and more interpretable.

Filter Methods

Filter methods are one family of feature selection techniques in machine learning: each feature is evaluated independently of the learning algorithm. Features are ranked according to criteria such as correlation with the target, statistical tests, or information gain, and the top-ranked features are kept for model training. Unlike wrapper and embedded methods, filter methods are computationally inexpensive and less prone to overfitting, which makes them well suited to high-dimensional datasets; however, because each feature is scored in isolation, they may overlook interactions between features. Overall, filter methods serve as a useful first step in feature selection, indicating how relevant each individual feature is to the target variable.
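
As a concrete illustration of this pattern, the sketch below ranks the columns of a toy DataFrame by their absolute Pearson correlation with the target and keeps the top k. It uses only pandas and NumPy and is independent of this package; the column names, data, and k are assumptions made for the example.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.normal(size=100),
    "f2": rng.normal(size=100),
    "f3": rng.normal(size=100),
})
df["target"] = 2 * df["f1"] + 0.1 * rng.normal(size=100)  # f1 is informative

# Score each feature independently of any learner: |Pearson r| with the target.
scores = df.drop(columns="target").apply(lambda col: abs(col.corr(df["target"])))

# Rank and keep the top-k features -- the essence of every filter method.
k = 2
selected = scores.sort_values(ascending=False).head(k).index.tolist()
print(selected)  # f1 ranks first by a wide margin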

All Filter Methods Used in This Package:

  • Fisher Score (see the sketch after this list)
  • T-Score
  • Welch's t-statistic
  • Chi-Squared
  • Information Gain
  • Gain Ratio
  • Symmetric Uncertainty Coefficient
  • Relief Score
  • mRMR (minimum Redundancy Maximum Relevance)
  • Absolute Pearson Correlation Coefficients
  • Maximum Likelihood Feature Selection
  • Least Squares Feature Selection
  • Laplacian Feature Selection Score
  • Mutual Information
  • Euclidean Distance
  • Cramér's V test
  • Markov Blanket Filter
  • Kruskal-Wallis test
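
To make these scores concrete, here is a from-scratch sketch of the first method in the list, the Fisher score, in one common formulation (per-feature between-class scatter divided by within-class scatter). This is an illustration under assumed array shapes and class labels, not the package's implementation.

import numpy as np

def fisher_score(X, y):
    # Fisher score per feature j:
    #   F_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    # X: (n_samples, n_features) array, y: (n_samples,) class labels.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    numer = np.zeros(X.shape[1])
    denom = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]                      # samples of class c
        n_c = Xc.shape[0]
        numer += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        denom += n_c * Xc.var(axis=0)       # within-class variance
    return numer / denom                    # higher = more discriminative

# Toy check: feature 0 separates the classes, feature 1 is noise.
X = np.array([[1.0, 5.0], [1.2, 3.0], [0.9, 4.0],
              [3.0, 4.5], [3.1, 3.5], [2.9, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(fisher_score(X, y))                   # feature 0 scores far higher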

Usage Example

from pyallffs import AbsolutePearsonCorrelationCalculator
import pandas as pd

# Load a dataset that contains the target column named below.
df = pd.read_csv("dataset.csv")

# drop_labels presumably lists columns to drop from the feature set
# (here, the target itself); target names the column features are scored against.
exmp_class = AbsolutePearsonCorrelationCalculator(df, drop_labels=["target_variable"], target="target_variable")

# Score every feature by |Pearson r| with the target;
# plot_importance=True additionally plots the feature importances.
scores, ranked_features = exmp_class.calculate_absolute_pearson_correlation(plot_importance=True)
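
As a cross-check, a similar filter ranking can be reproduced outside the package with scikit-learn. The sketch below scores features by mutual information (also listed above) via SelectKBest; the file name and target column mirror the example above and are assumptions, and it presumes target_variable holds class labels (use mutual_info_regression for a continuous target).

import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

df = pd.read_csv("dataset.csv")
X = df.drop(columns=["target_variable"])
y = df["target_variable"]

# Score every feature by mutual information with the target,
# then keep the two highest-scoring features.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
selector.fit(X, y)
print(dict(zip(X.columns, selector.scores_)))
print(list(X.columns[selector.get_support()]))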

Project details


Release history

This version

0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyallffs-0.1.tar.gz (10.3 kB)

Built Distribution

pyallffs-0.1-py3-none-any.whl (19.8 kB)

File details

Details for the file pyallffs-0.1.tar.gz.

File metadata

  • Download URL: pyallffs-0.1.tar.gz
  • Upload date:
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.7

File hashes

Hashes for pyallffs-0.1.tar.gz

  • SHA256: 9aab8e7b5f144fb6dc98380078582edfcaa34dae0a012f1d5887adda2ee2000a
  • MD5: 306135144fe8496291e94b72c3423f2c
  • BLAKE2b-256: e61c1c373c20685dc4208b86c006f66fb802d984dd44ed0c6a8c18775ef3ec9a

See more details on using hashes here.

File details

Details for the file pyallffs-0.1-py3-none-any.whl.

File metadata

  • Download URL: pyallffs-0.1-py3-none-any.whl
  • Upload date:
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.7

File hashes

Hashes for pyallffs-0.1-py3-none-any.whl

  • SHA256: bc2645ac663c7eaa0b73cc18601094e0e21cd2d92df0fde559b0fc50f2d1ea5c
  • MD5: cfa38c7354c7e1e5f2662f9a1fefc10c
  • BLAKE2b-256: ea6e6b7e49908f9016751e81599b6c2cb063464c57fa2c96187cc03bd097f731

See more details on using hashes here.
