Variance Feature Analysis for binary classification feature selection

VFA - Variance Feature Analysis

A Python package for binary classification feature selection using variance-based analysis.

Overview

Variance Feature Analysis (VFA) is a feature selection method based on the Class-Variance Ratio (CVR), the ratio of a feature's between-class variance to its total variance. Features with a higher CVR separate the two classes more strongly, so VFA ranks all features by CVR and keeps the top k for binary classification tasks.

Installation

pip install vfa

Features

  • Fast variance-based feature selection for binary classification
  • Automatic feature ranking using Class-Variance Ratio (CVR)
  • Weighted feature aggregation
  • Compatible with scikit-learn workflows
  • Lightweight with minimal dependencies (only NumPy)

Usage

from vfa import variance_feature_analysis
import numpy as np

# Example data
X = np.random.rand(100, 20)  # 100 samples, 20 features
y = np.random.randint(0, 2, 100)  # Binary labels

# Select top 5 features
X_selected, f_aggregated, selected_indices, scores = variance_feature_analysis(X, y, k=5)

print(f"Selected feature indices: {selected_indices}")
print(f"Feature scores: {scores[selected_indices]}")

Parameters

  • X (array-like): Training data features of shape (n_samples, n_features)
  • y (array-like): Target labels of shape (n_samples,) - must be binary
  • k (int, default=8): Number of top features to select
  • epsilon (float, default=1e-12): Small constant to prevent division by zero
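The role of epsilon can be illustrated with a small NumPy sketch. It assumes that epsilon is added to the denominator of the ratio; the package may apply it differently. The `ratio` helper and the sample values are hypothetical and not part of the vfa API. A constant feature has zero between-class and zero within-class variance, so the unguarded ratio is 0/0.

```python
import numpy as np


def ratio(between, within, epsilon=0.0):
    """Hypothetical helper: B / (B + W + epsilon), element-wise."""
    # Silence NumPy's 0/0 warning so the unguarded case can be shown.
    with np.errstate(invalid="ignore", divide="ignore"):
        return between / (between + within + epsilon)


# Feature 0 is constant (B = W = 0); feature 1 is an ordinary feature.
between = np.array([0.0, 0.5])
within = np.array([0.0, 1.5])

unguarded = ratio(between, within)                # first entry is NaN
guarded = ratio(between, within, epsilon=1e-12)   # first entry is 0.0

print(unguarded)
print(guarded)
```

With the guard in place, the constant feature receives a score of 0 and ranks last instead of propagating NaN into the ranking.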

Returns

  • X_selected: Selected feature subset
  • f_aggregated: Weighted aggregation of selected features
  • selected_idx: Indices of selected features
  • scores: CVR scores for all features
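The description above does not say how the weights for f_aggregated are derived. The sketch below shows one plausible reading, assuming the weights are the CVR scores of the selected features normalized to sum to 1, and that f_aggregated has one value per sample. The `aggregate_selected` helper and the score values are hypothetical, not the package's implementation.

```python
import numpy as np


def aggregate_selected(X, scores, selected_idx, epsilon=1e-12):
    """Hypothetical sketch: score-weighted sum of the selected columns."""
    X = np.asarray(X, dtype=float)
    X_selected = X[:, selected_idx]                # shape (n_samples, k)
    weights = scores[selected_idx]
    weights = weights / (weights.sum() + epsilon)  # assumed normalization
    f_aggregated = X_selected @ weights            # shape (n_samples,)
    return X_selected, f_aggregated


X = np.arange(12, dtype=float).reshape(4, 3)  # 4 samples, 3 features
scores = np.array([0.1, 0.6, 0.3])            # made-up CVR scores
selected_idx = np.array([1, 2])               # top-2 features by score

X_selected, f_aggregated = aggregate_selected(X, scores, selected_idx)
print(X_selected.shape, f_aggregated.shape)
```

Under these assumptions X_selected keeps one column per selected feature, while f_aggregated collapses them into a single derived feature.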

How It Works

The algorithm:

  1. Computes the within-class variance (W) and between-class variance (B) of each feature
  2. Calculates the Class-Variance Ratio of each feature: CVR = B / (B + W)
  3. Selects the top-k features with highest CVR scores
  4. Returns selected features and their weighted aggregation
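Steps 1 to 3 can be sketched in plain NumPy as follows. This is a minimal illustration, not the package's source: it assumes B is the class-prior-weighted squared distance of each class mean from the overall mean, and W is the class-prior-weighted population variance within each class. With these definitions B + W equals the total variance of the feature, which matches the "between-class to total variance" description. The names `cvr_scores` and `top_k` are hypothetical.

```python
import numpy as np


def cvr_scores(X, y, epsilon=1e-12):
    """Illustrative CVR = B / (B + W) per feature, under assumed definitions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("y must contain exactly two classes")

    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        prior = Xc.shape[0] / X.shape[0]  # fraction of samples in class c
        between += prior * (Xc.mean(axis=0) - overall_mean) ** 2
        within += prior * Xc.var(axis=0)  # population variance (ddof=0)

    # epsilon in the denominator is an assumption; it guards constant features.
    return between / (between + within + epsilon)


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return np.argsort(scores)[::-1][:k]


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 4))
X[:, 2] += 3.0 * y  # only feature 2 depends on the label

scores = cvr_scores(X, y)
best = top_k(scores, k=1)
print(best, scores.round(3))
```

In this synthetic data only feature 2 carries class information, so it receives by far the highest score; the pure-noise features score close to 0.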

Requirements

  • Python >=3.8
  • NumPy >=1.20.0

Development

# Clone the repository
git clone https://github.com/nqmn/vfa.git
cd vfa

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Citation

If you use this package in your research, please cite:

@software{vfa2024,
  title={VFA: Variance Feature Analysis},
  author={Mohd Adil Mokti},
  year={2026},
  url={https://github.com/nqmn/vfa}
}

Download files

Source Distribution

vfa-0.1.0.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

vfa-0.1.0-py3-none-any.whl (4.7 kB view details)

Uploaded Python 3

File details

Details for the file vfa-0.1.0.tar.gz.

File metadata

  • Download URL: vfa-0.1.0.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for vfa-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       80268287669a4f15c3eda0ff8a90850c0bb97ee548d75f88713d237eb2759a36
MD5          cb9efbc5ebca13388bc3fae6343d77f6
BLAKE2b-256  03815f034d0c01b12a4f53bee506d0a477d3ff305edd2b2d6881f0695d60433b

File details

Details for the file vfa-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: vfa-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 4.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for vfa-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       d121273862543b9a303b507e3e846ec83679bfe2683d73da5cdb326dd0c7f741
MD5          151f45bb00631e1bdc98a4aa0d1481f9
BLAKE2b-256  6ed28e16ddeac1cf0825583f8cc0fcac2fb3b961765dee1561047998a9e6bdd7
