Project description

KSFeatureSelector is a robust and flexible Python package designed for selecting the most discriminatory features in both binary and multi-class classification problems using the Kolmogorov-Smirnov (K-S) test. It provides advanced options for handling multi-class scenarios and aggregating p-values.

Features

  • Uses the K-S test to rank features by their ability to separate classes.

  • Handles target variables with more than two categories (up to 10 classes internally).

  • Flexible Comparison Strategies:
    • pairwise: Performs K-S tests between every unique pair of classes.

    • one-vs-rest: Compares each class against all other classes combined.

  • Multiple P-Value Aggregation Methods:
    • fisher: Uses Fisher’s combined probability test (default, generally recommended).

    • min: Takes the minimum p-value from all comparisons for a feature.

    • max: Takes the maximum p-value from all comparisons for a feature.

  • Scikit-learn Style API: Offers a class-based interface (KSFeatureSelector with fit, transform) for seamless integration into machine learning pipelines.

  • Convenience Function: Provides a simple select_ks_features wrapper for quick, one-off feature selection.

  • Robust Validation & Warnings: Validates inputs and emits a UserWarning for data quality problems, such as categories with too few observations or too few samples to run a K-S test.

  • Pure Python: Built using pandas, scipy, and numpy.
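
The two comparison strategies listed above can be sketched directly with scipy.stats.ks_2samp. This is a minimal illustration under stated assumptions, not the package's internal code: the helper name ks_pvalues and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def ks_pvalues(feature: pd.Series, target: pd.Series, strategy: str = "pairwise") -> dict:
    """Return one K-S p-value per comparison for a single feature (hypothetical helper)."""
    classes = sorted(target.unique())
    pvalues = {}
    if strategy == "pairwise":
        # One two-sample test for every unique pair of classes.
        for a, b in combinations(classes, 2):
            result = ks_2samp(feature[target == a], feature[target == b])
            pvalues[(a, b)] = result.pvalue
    elif strategy == "one-vs-rest":
        # One test per class: that class against all other rows combined.
        for c in classes:
            result = ks_2samp(feature[target == c], feature[target != c])
            pvalues[(c, "rest")] = result.pvalue
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return pvalues


# Synthetic data: the feature's mean shifts with the class label.
rng = np.random.default_rng(0)
target = pd.Series(np.repeat([0, 1, 2], 100))
feature = pd.Series(rng.normal(loc=target.to_numpy() * 2.0, scale=1.0))

pairwise = ks_pvalues(feature, target, "pairwise")
one_vs_rest = ks_pvalues(feature, target, "one-vs-rest")
print(len(pairwise), len(one_vs_rest))  # 3 3
```

With k classes, pairwise runs k(k-1)/2 tests per feature while one-vs-rest runs k, so the two strategies only produce the same number of comparisons when k = 3.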

Installation

pip install ksfeatureselector

For a local (editable) installation from a source checkout:

pip install -e .

Usage

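import numpy as np
import pandas as pd
from ksfeatureselector import select_ks_features

# Illustrative data (an assumption, not part of the package): a three-class target and four features
rng = np.random.default_rng(0)
target = np.repeat([0, 1, 2], 100)
df = pd.DataFrame({
    'f1': rng.normal(loc=target, scale=1.0),        # mean shifts with the class
    'f2': rng.normal(loc=target * 0.5, scale=1.0),  # weaker shift
    'f3': rng.normal(size=target.size),             # noise
    'f4': rng.uniform(size=target.size),            # noise
    'target': target,
})
x_cols = ['f1', 'f2', 'f3', 'f4']
y_var = 'target'

# Select features that pass a 0.01 p-value threshold using
# 'one-vs-rest' comparison and 'min' p-value aggregation
significant_features = select_ks_features(
    df, x_cols, y_var,
    top_p=0.01,
    aggregation_method='one-vs-rest',
    p_value_aggregation_method='min'
)
print(f"Significant features (one-vs-rest, min p-value <= 0.01): {significant_features}")

# Select top 3 features using 'pairwise' comparison
# and 'max' p-value aggregation
top_3_features_max_agg = select_ks_features(
    df, x_cols, y_var,
    top_n=3,
    aggregation_method='pairwise',
    p_value_aggregation_method='max'
)
print(f"Top 3 features (pairwise, max p-value): {top_3_features_max_agg}")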
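
The p_value_aggregation_method options used in the calls above collapse several per-comparison p-values into one value per feature. A rough sketch of the three methods, assuming scipy.stats.combine_pvalues for Fisher's test; aggregate_pvalues is a hypothetical helper, not the package's API:

```python
from scipy.stats import combine_pvalues


def aggregate_pvalues(pvalues, method="fisher"):
    """Collapse several p-values into one (hypothetical helper)."""
    if method == "fisher":
        # Fisher's combined probability test: -2 * sum(log p) follows a
        # chi-squared distribution with 2k degrees of freedom under the null.
        _, combined = combine_pvalues(pvalues, method="fisher")
        return float(combined)
    if method == "min":
        # Optimistic: a feature passes if any single comparison separates classes.
        return float(min(pvalues))
    if method == "max":
        # Conservative: a feature passes only if every comparison separates classes.
        return float(max(pvalues))
    raise ValueError(f"unknown method: {method!r}")


comparison_pvalues = [0.01, 0.20, 0.50]
fisher_p = aggregate_pvalues(comparison_pvalues, "fisher")
min_p = aggregate_pvalues(comparison_pvalues, "min")  # 0.01
max_p = aggregate_pvalues(comparison_pvalues, "max")  # 0.5
print(fisher_p, min_p, max_p)
```

Fisher's method assumes the combined tests are independent, which comparisons computed on overlapping samples generally are not, so the combined value is probably best read as a ranking score rather than an exact significance level.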

Arguments

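  • df (pd.DataFrame): The input DataFrame containing the feature columns and the target column.

  • x_cols (List[str]): A list of column names in df representing the features you want to evaluate.

  • y_var (str): The name of the column in df representing the target variable (binary or multi-class).

  • top_p (float, optional): If provided, only features with an aggregated K-S test p-value less than top_p will be selected.

  • top_n (int, optional): If provided, the top n features with the lowest p-values will be selected.

  • aggregation_method (str, optional): The comparison strategy for multi-class targets, either 'pairwise' or 'one-vs-rest'.

  • p_value_aggregation_method (str, optional): How the p-values from the individual comparisons are combined for each feature: 'fisher' (default), 'min', or 'max'.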
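
As a rough illustration of how top_p and top_n might act on per-feature p-values, the sketch below filters and ranks a plain dictionary. The helper select_by_pvalue, the example p-values, and the behavior when both arguments are given (threshold first, then truncate) are assumptions for illustration, not the package's documented behavior.

```python
def select_by_pvalue(pvalues, top_p=None, top_n=None):
    """Rank features by p-value, then apply the optional filters (hypothetical helper)."""
    # Lowest p-value first, i.e. most discriminatory feature first.
    ranked = sorted(pvalues, key=pvalues.get)
    if top_p is not None:
        # Keep only features strictly below the threshold.
        ranked = [name for name in ranked if pvalues[name] < top_p]
    if top_n is not None:
        # Keep at most the n best-ranked features.
        ranked = ranked[:top_n]
    return ranked


# Made-up p-values, one per feature.
feature_pvalues = {"age": 0.001, "income": 0.04, "height": 0.30, "weight": 0.008}

by_threshold = select_by_pvalue(feature_pvalues, top_p=0.01)
by_rank = select_by_pvalue(feature_pvalues, top_n=3)
print(by_threshold)  # ['age', 'weight']
print(by_rank)       # ['age', 'weight', 'income']
```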

License

MIT License

Author

V Subrahmanya Raghu Ram Kishore Parupudi
Email: pvsrrkishore@gmail.com
