Feature Selection for Clustering

Feature Selection for Clustering: fselect

A fast, scalable implementation of the A-RANK algorithm proposed by Dash, M. and Liu, H. in their paper "Feature Selection for Clustering". It ranks the features of a dataset using an entropy measure and is built on fast Python libraries: numpy, pandas and scikit-learn.

Getting Started

Install the package:

pip install fselect

Import the main function:

from fselect import rank_features  

Prepare a dataframe with normalized continuous features:

import pandas as pd

df = pd.DataFrame({
    'feature1': [...],
    'feature2': [...],    
    [...]
})
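If the raw data is not yet normalized, one possible way to prepare it is min-max scaling with pandas. This is only a sketch: the helper `min_max_normalize` and the column names are hypothetical, and the README does not say which normalization the package expects, so min-max scaling to [0, 1] is an assumption.

```python
import pandas as pd


def min_max_normalize(raw: pd.DataFrame) -> pd.DataFrame:
    """Scale every numeric column to the [0, 1] range (hypothetical helper)."""
    numeric = raw.select_dtypes(include="number")
    span = numeric.max() - numeric.min()
    # A constant column has zero span; divide by 1 instead so it maps to all zeros.
    span = span.replace(0, 1)
    return (numeric - numeric.min()) / span


# Made-up example data, for illustration only.
raw = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 65.0, 80.0, 95.0],
})
df = min_max_normalize(raw)
```

The resulting `df` holds only continuous values between 0 and 1 and can be passed on as the input dataframe.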

Rank the features:

ranked_df = rank_features(df)  

The returned dataframe `ranked_df` has three columns, "rank", "feature" and "entropy", and its rows are sorted by entropy.

Usage

The main parameters:

  • dataframe: pd.DataFrame - Input dataframe with continuous normalized features
  • remove_correlated_columns: bool (optional) - Whether to remove highly correlated columns before ranking
  • correlation_threshold: float (optional) - Correlation threshold to determine correlated columns (default 0.999)

Remove correlated columns first

ranked_df = rank_features(df, remove_correlated_columns=True)  

Custom correlation threshold

ranked_df = rank_features(df, remove_correlated_columns=True, correlation_threshold=0.95) 
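To give an idea of what removing correlated columns can look like, here is a minimal pandas sketch. It is not the package's code: the helper `drop_correlated` is hypothetical, and the use of absolute Pearson correlation and the choice to keep the first column of each correlated pair are assumptions.

```python
import numpy as np
import pandas as pd


def drop_correlated(frame: pd.DataFrame, threshold: float = 0.999) -> pd.DataFrame:
    """Drop the later column of every pair whose |correlation| exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return frame.drop(columns=to_drop)


# Made-up data: "b" is an exact multiple of "a", "c" is unrelated.
demo = pd.DataFrame({
    "a": [0.0, 0.25, 0.5, 1.0],
    "b": [0.0, 0.5, 1.0, 2.0],
    "c": [1.0, 0.0, 0.9, 0.2],
})
reduced = drop_correlated(demo)
```

With the default threshold only near-duplicate columns are dropped; a lower value such as 0.95 removes more loosely related columns as well.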

Algorithm

The entropy calculation follows the equations defined in the A-RANK paper. It computes a similarity matrix from the dataframe and derives the entropy from that matrix.
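The idea can be sketched with numpy as follows. This is an illustration under stated assumptions, not the package's implementation: it assumes Euclidean distances between rows, a similarity of the form exp(-alpha * distance) with alpha chosen so that the mean distance maps to 0.5, and a feature score equal to the entropy of the data with that feature left out. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def dataset_entropy(values: np.ndarray) -> float:
    """Entropy of a dataset computed from pairwise row similarities (sketch)."""
    # Pairwise Euclidean distances between rows; O(n^2) memory, small data only.
    diff = values[:, None, :] - values[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Use each unordered pair of rows once.
    d = dist[np.triu_indices(len(values), k=1)]
    mean_d = d.mean()
    if mean_d == 0:
        return 0.0  # all rows identical: no uncertainty
    # Assumed scaling: similarity is 0.5 at the mean distance.
    alpha = -np.log(0.5) / mean_d
    sim = np.clip(np.exp(-alpha * d), 1e-12, 1 - 1e-12)  # keep log() finite
    return float(-(sim * np.log(sim) + (1 - sim) * np.log(1 - sim)).sum())


def entropy_without_each_feature(frame: pd.DataFrame) -> pd.DataFrame:
    """Score each feature by the entropy of the data with that feature removed."""
    rows = [
        {"feature": col,
         "entropy": dataset_entropy(frame.drop(columns=col).to_numpy(dtype=float))}
        for col in frame.columns
    ]
    return pd.DataFrame(rows)


# Made-up normalized data, for illustration only.
demo = pd.DataFrame({
    "f1": [0.0, 0.1, 0.9, 1.0],
    "f2": [0.0, 0.2, 0.8, 1.0],
    "f3": [0.3, 0.9, 0.1, 0.6],
})
scores = entropy_without_each_feature(demo)
```

Sorting `scores` by its "entropy" column would then give an ordering of the features; how the package maps entropy values to ranks is not stated here.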
