
Feature Selection for Clustering


Feature Selection for Clustering: fselect


A fast and scalable implementation of the A-RANK algorithm, proposed by M. Dash and H. Liu in their paper "Feature Selection for Clustering", for selecting features from a dataset using an entropy measure. It is built on the Python libraries NumPy, pandas and scikit-learn.

Getting Started

Install the package:

pip install fselect

Import the main function:

from fselect import rank_features  

Prepare a dataframe with normalized continuous features:

import pandas as pd

df = pd.DataFrame({
    'feature1': [...],
    'feature2': [...],    
    # ... further feature columns
})
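The README does not say which normalization `rank_features` expects. A minimal sketch, assuming min-max scaling of each column to [0, 1] is acceptable (z-score standardization would be an alternative); the helper `min_max_normalize` and the sample columns are hypothetical:

```python
import pandas as pd


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Scale every column of a numeric dataframe to the range [0, 1]."""
    col_min = df.min()
    col_range = df.max() - col_min
    # Constant columns have zero range; divide by 1 instead so they become all zeros.
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range


raw = pd.DataFrame({
    "height_cm": [150.0, 165.0, 180.0],
    "weight_kg": [50.0, 70.0, 90.0],
})
normalized = min_max_normalize(raw)
print(normalized)
```

The normalized frame can then be passed to `rank_features` in place of `df`.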

Rank the features:

ranked_df = rank_features(df)  

The returned dataframe `ranked_df` has the columns "rank", "feature" and "entropy", sorted by entropy.

Usage

The main parameters:

  • dataframe: pd.DataFrame - Input dataframe with continuous normalized features
  • remove_correlated_columns: bool (optional) - Whether to remove highly correlated columns before ranking
  • correlation_threshold: float (optional) - Correlation threshold to determine correlated columns (default 0.999)

Remove correlated columns first

ranked_df = rank_features(df, remove_correlated_columns=True)  

Custom correlation threshold

ranked_df = rank_features(df, remove_correlated_columns=True, correlation_threshold=0.95) 
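The README does not describe how correlated columns are detected. One plausible approach, sketched here as an assumption rather than fselect's actual internals, compares absolute Pearson correlations against the threshold and keeps the first column of each highly correlated pair; the helper name `drop_correlated_columns` is hypothetical:

```python
import numpy as np
import pandas as pd


def drop_correlated_columns(df: pd.DataFrame, threshold: float = 0.999) -> pd.DataFrame:
    """Drop one column from every pair whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if it is highly correlated with any column to its left.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with "a"
    "c": [4.0, 1.0, 3.0, 2.0],
})
print(drop_correlated_columns(df, threshold=0.95).columns.tolist())
```

Whether fselect uses a strict `>` or `>=`, and which column of a pair it keeps, is not documented, so results near the threshold may differ.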

Algorithm

The entropy calculation follows the equations defined in the A-RANK paper: it computes a pairwise similarity matrix between the rows (data points) of the dataframe and derives the entropy from those similarities.
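To illustrate what such a measure looks like, here is a sketch of our reading of the entropy formulation in Dash and Liu's paper: Euclidean distances D_ij between data points, similarities S_ij = exp(-α·D_ij) with α = -ln(0.5)/mean(D), and E = -Σ [S_ij·log S_ij + (1 - S_ij)·log(1 - S_ij)] over all pairs. Features are ranked by removing each one in turn and recomputing E. This is not fselect's code: the helper names `dataset_entropy` and `rank_by_entropy`, the clipping constant and the descending sort order are assumptions, and the library may scale or order results differently.

```python
import numpy as np
import pandas as pd


def dataset_entropy(X: np.ndarray) -> float:
    """Entropy of a dataset computed from pairwise similarities between rows."""
    # Euclidean distance between every pair of rows (data points).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(X), k=1)  # each unordered pair once
    d = dist[iu]
    # alpha maps the average distance to a similarity of 0.5 (assumes not all points coincide).
    alpha = -np.log(0.5) / d.mean()
    s = np.exp(-alpha * d)
    # Clip to avoid log(0) for identical points (s == 1) or very distant ones (s -> 0).
    s = np.clip(s, 1e-12, 1 - 1e-12)
    return float(-np.sum(s * np.log(s) + (1 - s) * np.log(1 - s)))


def rank_by_entropy(df: pd.DataFrame) -> pd.DataFrame:
    """Entropy of the data with each feature removed in turn, sorted high to low."""
    rows = []
    for col in df.columns:
        rest = df.drop(columns=col).to_numpy(dtype=float)
        rows.append({"feature": col, "entropy": dataset_entropy(rest)})
    out = pd.DataFrame(rows).sort_values("entropy", ascending=False, ignore_index=True)
    out.insert(0, "rank", list(range(1, len(out) + 1)))
    return out


rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((50, 3)), columns=["f1", "f2", "f3"])
print(rank_by_entropy(demo))
```

The full pairwise distance tensor uses O(n²·d) memory, so this sketch only suits small datasets; it is meant to show the computation, not to replace fselect's implementation.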



Download files


Source Distribution

fselect-1.0.4.tar.gz (15.1 kB)

Uploaded Source

Built Distribution


fselect-1.0.4-py3-none-any.whl (15.6 kB)

Uploaded Python 3

File details

Details for the file fselect-1.0.4.tar.gz.

File metadata

  • Download URL: fselect-1.0.4.tar.gz
  • Upload date:
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.11.5 Darwin/23.0.0

File hashes

Hashes for fselect-1.0.4.tar.gz
Algorithm Hash digest
SHA256 1d26adffdf2c38e5cda5ec56ce75c277b609ad2234f59da8d64b3a6be56253cd
MD5 48b7c153d4adef3676ded4256f4d16c4
BLAKE2b-256 d4ab4f4b8ff85292785e636ddb7aefbb62b781caae081dfc79a0b7ab818ac1a3


File details

Details for the file fselect-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: fselect-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 15.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.11.5 Darwin/23.0.0

File hashes

Hashes for fselect-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 7b3237a6ed402f3963d278e654a30994f295c1ea79dff1c4c7359e58f1d33286
MD5 325d6d20cb058e1d642b858637942f43
BLAKE2b-256 86eda405e2cd982ac5bafa60e0a1b1a0e983d991ce63d99c4948fdd7587ef037

