Feature selection based on top frequency

Project description

TFFS

Description

TFFS (Feature Selection based on Top Frequency) is a feature selection method that uses Random Forest to identify the features that rank as important most often across multiple model runs. It reduces dimensionality while retaining the features that contribute most to model performance.
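To make the "top frequency" idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the function name top_frequency_features, the rounding rule, and the tie-breaking are all assumptions made for illustration, and the importance scores are made-up numbers standing in for what several Random Forest runs might report.

```python
from collections import Counter
import math


def top_frequency_features(importances_per_run, percent):
    """Pick the features that most often rank in the top `percent`% across runs.

    Hypothetical helper, not part of tffs.
    importances_per_run: list of lists, one importance score per feature per run.
    percent: share of features to keep, e.g. 40 for the top 40%.
    """
    n_features = len(importances_per_run[0])
    # Assumption: round up, and always keep at least one feature.
    k = max(1, math.ceil(n_features * percent / 100))

    counts = Counter()
    for scores in importances_per_run:
        # Indices of the k highest-scoring features in this run.
        ranked = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
        counts.update(ranked[:k])

    # Keep the k features that appeared in a top-k list most often
    # (ties broken by lower index), returned in ascending index order.
    by_frequency = sorted(range(n_features), key=lambda i: (-counts[i], i))
    return sorted(by_frequency[:k])


# Three illustrative runs over five features (made-up scores).
runs = [
    [0.40, 0.05, 0.30, 0.15, 0.10],
    [0.35, 0.10, 0.25, 0.20, 0.10],
    [0.30, 0.05, 0.15, 0.40, 0.10],
]
selected = top_frequency_features(runs, percent=40)
print(selected)  # [0, 2]
```

Feature 0 lands in the top 40% in all three runs and feature 2 in two of them, so those two are kept; feature 3 wins one run but appears only once overall.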

Installation

pip install tffs

🔥 Functionality

The library provides multiple feature selection functions that combine TFFS with classical selection techniques:

🏷 Core Function:

get_frequency_of_feature_by_percent(df, number_of_runs, percent, n_estimators)

📌 Parameters

The function get_frequency_of_feature_by_percent() accepts the following parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| df | pandas.DataFrame | The input dataset containing features and target variables. The first column should be the class label. |
| number_of_runs | int | The number of times a Random Forest model is built to compute feature importance. |
| percent | float | The percentage of top important features to retain (e.g., percent=20 keeps the top 20% most important features). |
| n_estimators | int | The number of decision trees in the Random Forest model. |
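As a rough illustration of how df, number_of_runs, and n_estimators could fit together, the sketch below fits one scikit-learn Random Forest per run and stacks the resulting feature importances. The helper collect_importances is hypothetical and not part of tffs; using RandomForestClassifier with a different seed per run is an assumption about how repeated runs might be made to differ, and the dataset is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def collect_importances(df, number_of_runs, n_estimators):
    """Fit one Random Forest per run and stack the feature importances.

    Hypothetical helper, not part of tffs. Assumes the first column of `df`
    is the class label and every other column is a numeric feature.
    """
    y = df.iloc[:, 0]
    X = df.iloc[:, 1:]
    rows = []
    for run in range(number_of_runs):
        # Assumption: a different seed per run, so the forests vary.
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=run)
        model.fit(X, y)
        rows.append(model.feature_importances_)
    return np.vstack(rows)  # shape: (number_of_runs, n_features)


# Small synthetic dataset: 'signal' tracks the label, the rest is noise.
rng = np.random.default_rng(0)
labels = np.tile([0, 1], 30)
df = pd.DataFrame({
    "class": labels,
    "signal": labels + rng.normal(0, 0.1, size=60),
    "noise_1": rng.normal(size=60),
    "noise_2": rng.normal(size=60),
})

importances = collect_importances(df, number_of_runs=5, n_estimators=20)
print(importances.shape)         # (5, 3): one row per run, one column per feature
print(importances.mean(axis=0))  # average importance per feature
```

Each row is one run's importance vector. Ranking the features within each row and counting how often each one reaches the top percent% is the step that percent would then control.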

📤 Return

The function returns:

  • A NumPy array containing the indices of the selected features: those that rank among the top percent% by importance across the Random Forest runs.

🔄 Example Return:

array([0, 2, 4, 7, 9])
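Indices are easier to work with once mapped back to column names. The sketch below does that for the example return value above. It assumes that index 0 refers to the first feature column (that is, the class label column is not counted); the description does not state this, so check it against your own data. The column names are illustrative.

```python
# Column names of a dataset whose first column is the class label (illustrative).
columns = ["class"] + [f"feature_{i}" for i in range(1, 11)]
feature_names = columns[1:]  # drop the label column

# The example return value shown above, as a plain list.
selected_indices = [0, 2, 4, 7, 9]

# Assumption: index 0 refers to the first feature column, not the label.
selected_names = [feature_names[i] for i in selected_indices]
print(selected_names)
# ['feature_1', 'feature_3', 'feature_5', 'feature_8', 'feature_10']
```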

📌 Example Usage

import pandas as pd
from tffs import get_features_by_forward_and_tffs

# Create a sample DataFrame
data = pd.DataFrame({
    'class': [0, 1, 0, 1, 2, 0, 1, 2, 0, 1],
    'feature_1': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'feature_2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'feature_3': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    'feature_4': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'feature_5': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    'feature_6': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'feature_7': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    'feature_8': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'feature_9': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
})

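# Run forward selection combined with TFFS. Note that this is one of the
# combined functions, not the core function documented above, and its
# parameter names differ.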
selected_features = get_features_by_forward_and_tffs(
    data,
    percent_tffs=50,
    number_run=10,
    n_estimators=100,
    percent_forward=30
)

print("Selected features:", selected_features)

Author

Vu Thi Kieu Anh


© 2025

Download files

Source Distribution

tffs-1.1.2.tar.gz (3.4 kB)

Uploaded Source

Built Distribution

tffs-1.1.2-py3-none-any.whl (3.5 kB)

Uploaded Python 3

File details

Details for the file tffs-1.1.2.tar.gz.

File metadata

  • Download URL: tffs-1.1.2.tar.gz
  • Size: 3.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.5

File hashes

Hashes for tffs-1.1.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | a4652625eaf01678b5787ac0e182f7a666af3de61b75c7068d2cda5576bfdfe9 |
| MD5 | 560df54f9b5ad01736fb8b70b5adecae |
| BLAKE2b-256 | fd03b2ee5d8aa9e5e4806a42587f114502f8b52522d1a71823566053841ed10b |

File details

Details for the file tffs-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: tffs-1.1.2-py3-none-any.whl
  • Size: 3.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.5

File hashes

Hashes for tffs-1.1.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 4c5403608cc53fbb16beaa81b0b3102deab675871a85b4af85dc8ad0a8610dac |
| MD5 | 59726724027a183ad9398a6861d1c2fd |
| BLAKE2b-256 | 69b67140a01588d783c53321a17b6569344e08542cb0685c4f485b7fc1a2941b |
