Feature selection based on top frequency
TFFS
Description
TFFS (Feature Selection based on Top Frequency) is a feature selection method that uses Random Forest to identify the features that most frequently rank as important across multiple model runs. It reduces dimensionality while retaining the features that matter most for model performance.
Installation
pip install tffs
🔥 Functionality
The library provides multiple feature selection functions that combine TFFS with classical selection techniques:
🏷 Core Function:
get_frequency_of_feature_by_percent(df, number_of_runs, percent, n_estimators)
📌 Parameters
The function get_frequency_of_feature_by_percent() accepts the following parameters:
| Parameter | Type | Description |
|---|---|---|
| df | pandas.DataFrame | The input dataset containing features and target variables. The first column should be the class label. |
| number_of_runs | int | The number of times a Random Forest model is built to compute feature importance. |
| percent | float | The percentage of top important features to retain (e.g., percent=20 keeps the top 20% most important features). |
| n_estimators | int | The number of decision trees in the Random Forest model. |
📤 Return
The function returns:
- A NumPy array containing the indices of the selected features that are among the top percent% most important features across multiple Random Forest runs.
🔄 Example Return:
array([0, 2, 4, 7, 9])
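To make the parameters concrete, here is a rough sketch of how a function with this signature could be put together using scikit-learn. It is an illustration under stated assumptions, not the tffs implementation: the name `sketch_frequency_by_percent` is hypothetical, each run is assumed to differ only in its random seed, and the returned indices are assumed to be positions among the feature columns (that is, excluding the class label column).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def sketch_frequency_by_percent(df, number_of_runs, percent, n_estimators):
    """Illustrative stand-in only; the real tffs function may behave differently."""
    y = df.iloc[:, 0]   # first column: class label
    X = df.iloc[:, 1:]  # remaining columns: features
    n_features = X.shape[1]
    k = max(1, int(np.ceil(n_features * percent / 100)))

    counts = np.zeros(n_features, dtype=int)
    for seed in range(number_of_runs):
        # Assumption: runs differ only in the random seed.
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        model.fit(X, y)
        # Indices of the k largest importances in this run.
        top_k = np.argsort(model.feature_importances_)[::-1][:k]
        counts[top_k] += 1

    # Keep the k features that appeared in the top group most often.
    selected = np.argsort(-counts, kind="stable")[:k]
    return np.sort(selected)


# Synthetic data: only feature_0 carries information about the label.
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 4))
demo = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(4)])
demo.insert(0, "class", (features[:, 0] > 0).astype(int))

selected = sketch_frequency_by_percent(demo, number_of_runs=5, percent=50, n_estimators=20)
print(selected)
```

Counting how often a feature reaches the top group, instead of averaging raw importances, keeps a single run with unusually high scores from dominating the result.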
📌 Example Usage
import pandas as pd
from tffs import get_features_by_forward_and_tffs

# Create a sample DataFrame
data = pd.DataFrame({
    'class': [0, 1, 0, 1, 2, 0, 1, 2, 0, 1],
    'feature_1': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    'feature_2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'feature_3': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    'feature_4': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'feature_5': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    'feature_6': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'feature_7': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    'feature_8': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'feature_9': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
})

# Run the function
selected_features = get_features_by_forward_and_tffs(
    data,
    percent_tffs=50,
    number_run=10,
    n_estimators=100,
    percent_forward=30
)

print("Selected features:", selected_features)
Author
Vu Thi Kieu Anh
© 2025