
Feature Selection via Genetic Algorithm - ML framework for optimal feature subset discovery


Feature Selection via Genetic Algorithm (FSGA)

A production-ready framework for automated feature selection using Genetic Algorithms with comprehensive evaluation and visualization tools.

Quick Start

# Installation
git clone <repository-url>
cd feature-selection-via-genetic-algorithm
uv venv && source .venv/bin/activate
uv pip install -e .

# Run example
python experiments/run_comparison.py

Basic Usage

from fsga.core.genetic_algorithm import GeneticAlgorithm
from fsga.datasets.loader import load_dataset
from fsga.evaluators.accuracy_evaluator import AccuracyEvaluator
from fsga.ml.models import ModelWrapper
from fsga.mutations.bitflip_mutation import BitFlipMutation
from fsga.operators.uniform_crossover import UniformCrossover
from fsga.selectors.tournament_selector import TournamentSelector

# Load the data, then set up the model and the fitness evaluator
X_train, X_test, y_train, y_test, _ = load_dataset('iris', split=True)
model = ModelWrapper('rf', n_estimators=50, random_state=42)
evaluator = AccuracyEvaluator(X_train, y_train, X_test, y_test, model)

# Configure and run the GA
ga = GeneticAlgorithm(
    num_features=X_train.shape[1],
    evaluator=evaluator,
    selector=TournamentSelector(evaluator, tournament_size=3),
    crossover_operator=UniformCrossover(),
    mutation_operator=BitFlipMutation(probability=0.01),
    population_size=50,
    num_generations=100,
    early_stopping_patience=10
)

results = ga.evolve()
print(f"Accuracy: {results['best_fitness']:.2%}")
print(f"Features: {results['best_chromosome'].sum()}/{X_train.shape[1]}")
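The example above counts selected features with `results['best_chromosome'].sum()`, which suggests the chromosome is a 0/1 mask with one bit per feature. Under that assumption, the sketch below shows how such a mask maps back to feature names and to a reduced data matrix. The helper names `selected_features` and `project_rows` are hypothetical and not part of fsga, and plain lists stand in for the arrays the library presumably returns.

```python
# Hypothetical stand-ins: in real use the mask would come from
# results['best_chromosome'] and the names from your dataset.
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]
best_chromosome = [0, 0, 1, 1]  # 1 = feature kept, 0 = feature dropped


def selected_features(chromosome, names):
    """Return the names whose corresponding bit in the chromosome is set."""
    if len(chromosome) != len(names):
        raise ValueError("chromosome and names must have the same length")
    return [name for name, bit in zip(names, chromosome) if bit]


def project_rows(rows, chromosome):
    """Keep only the columns whose bit is set, for each row of a 2-D list."""
    keep = [i for i, bit in enumerate(chromosome) if bit]
    return [[row[i] for i in keep] for row in rows]


kept = selected_features(best_chromosome, feature_names)
sample = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]  # two made-up rows
reduced = project_rows(sample, best_chromosome)

print(kept)
print(reduced)
```

With a NumPy array, the same projection is usually written as boolean indexing on the columns, for example `X[:, mask.astype(bool)]`.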

Key Features

  • Modular Design: Swappable operators, selectors, and evaluators
  • Multiple Operators: 5 crossover types, 5 selection strategies, 3 fitness functions
  • Baseline Comparisons: Built-in RFE, LASSO, Mutual Information, Chi², ANOVA
  • Statistical Rigor: Wilcoxon, Mann-Whitney, Cohen's d, Jaccard stability
  • Visualization: 9 publication-quality plot functions
  • Experiment Framework: ExperimentRunner for reproducible experiments
  • Configuration: YAML-based configuration system
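
The list above mentions Jaccard stability without defining it. A common definition, assumed here, is the mean pairwise Jaccard index between the feature subsets selected in repeated runs; fsga may compute it differently. The function names below are illustrative only.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two collections of feature indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty subsets are treated as identical
    return len(a & b) / len(a | b)


def jaccard_stability(subsets):
    """Mean Jaccard index over all unordered pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Feature indices selected in three hypothetical GA runs.
runs = [{0, 2, 3}, {2, 3}, {1, 2, 3}]
stability = jaccard_stability(runs)
print(f"stability = {stability:.3f}")
```

A value of 1.0 means every run selected the same subset; values near 0 mean the runs barely overlap.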

Architecture

fsga/
├── core/          # GA engine (genetic_algorithm, population)
├── operators/     # Crossover: uniform, single-point, two-point, multi-point
├── mutations/     # Mutation: bitflip
├── selectors/     # Selection: tournament, roulette, ranking, elitism
├── evaluators/    # Fitness: accuracy, F1, balanced accuracy
├── ml/            # Model wrappers (sklearn integration)
├── datasets/      # Dataset loaders (iris, wine, breast_cancer, digits)
├── analysis/      # Baselines + ExperimentRunner
├── visualization/ # 9 plot functions
└── utils/         # Config, metrics, serialization, logging
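
To make the roles of the operator, mutation, and selector packages concrete, here is a minimal, self-contained sketch of the textbook versions of uniform crossover, bit-flip mutation, and tournament selection, wired into a small loop with patience-based early stopping. It is not fsga's implementation: every function name, the toy fitness function, and the exact early-stopping rule are assumptions made for illustration.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Take each gene from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def bitflip_mutation(chromosome, probability, rng):
    """Flip each bit independently with the given probability."""
    return [1 - bit if rng.random() < probability else bit for bit in chromosome]


def tournament_select(population, fitnesses, tournament_size, rng):
    """Return the fittest of `tournament_size` individuals drawn without replacement."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


def toy_fitness(chromosome):
    """Toy objective (needs >= 4 bits): reward bits 2 and 3, penalise subset size."""
    return chromosome[2] + chromosome[3] - 0.1 * sum(chromosome)


def evolve(num_features=4, population_size=20, num_generations=30,
           patience=10, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(num_features)]
                  for _ in range(population_size)]
    best, best_fit, stale = None, float("-inf"), 0
    for _ in range(num_generations):
        fitnesses = [toy_fitness(c) for c in population]
        top = max(range(population_size), key=lambda i: fitnesses[i])
        if fitnesses[top] > best_fit:
            best, best_fit, stale = population[top][:], fitnesses[top], 0
        else:
            stale += 1
            if stale >= patience:  # stop after `patience` generations without improvement
                break
        # Build the next generation: select two parents, cross them, mutate the child.
        population = [
            bitflip_mutation(
                uniform_crossover(
                    tournament_select(population, fitnesses, 3, rng),
                    tournament_select(population, fitnesses, 3, rng),
                    rng),
                0.01, rng)
            for _ in range(population_size)
        ]
    return best, best_fit


best, best_fit = evolve()
print(best, best_fit)
```

In fsga itself the fitness would come from an evaluator that trains and scores a model on the selected columns, which is far more expensive than this toy objective.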

Documentation

  • Getting Started - Installation and basic usage
  • Tutorial - Step-by-step guide with examples
  • Architecture - System design and extension points
  • Project Plan - Remaining tasks and roadmap
  • Module READMEs - See fsga/*/README.md for component details

Example Results

Breast Cancer Dataset (30 features → 12 features):

  • GA Accuracy: 98.3% with 40% of features
  • All Features: 95.7% with 100% of features
  • +2.6 percentage points of accuracy, 60% dimensionality reduction
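
The summary figures for the Breast Cancer run follow from simple arithmetic on the numbers reported above. The short check below reproduces them and separates the absolute gain (in percentage points) from the relative gain; the variable names are illustrative.

```python
# Figures reported above for the Breast Cancer dataset.
total_features = 30
selected_features_count = 12
ga_accuracy = 98.3        # percent, GA-selected subset
baseline_accuracy = 95.7  # percent, all features

fraction_kept = selected_features_count / total_features
reduction = 1 - fraction_kept
gain_points = round(ga_accuracy - baseline_accuracy, 1)  # absolute difference
relative_gain = (ga_accuracy - baseline_accuracy) / baseline_accuracy

print(f"features kept:     {fraction_kept:.0%}")
print(f"reduction:         {reduction:.0%}")
print(f"gain (abs.):       {gain_points} percentage points")
print(f"gain (relative):   {relative_gain:.1%}")
```

The reported 2.6 is the absolute difference between the two accuracies; relative to the 95.7% baseline, the improvement is slightly larger.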

Iris Dataset (4 features → 2 features):

  • GA Accuracy: 98.3% with 50% of features
  • Selected: petal length, petal width

Wine Dataset (13 features → 6.5 features on average):

  • GA Accuracy: 100% with 50% of features

Running Experiments

# Full analysis (all datasets, all visualizations)
python experiments/run_experiment.py

# Quick test (single dataset, fewer runs)
python experiments/run_experiment.py --quick

# Specific datasets only
python experiments/run_experiment.py --datasets iris wine

# Without visualizations (faster)
python experiments/run_experiment.py --no-plots

# Results saved to: results/{mode}/{dataset}/
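
For readers who want to build a similar entry point, the flags above could be parsed along the following lines. This is only a sketch of one way to do it with `argparse`, not the contents of `experiments/run_experiment.py`; the mode names `quick` and `full`, the `output_dir` helper, and the default dataset list are assumptions.

```python
import argparse

# Dataset names taken from the architecture overview; the default is an assumption.
KNOWN_DATASETS = ["iris", "wine", "breast_cancer", "digits"]


def build_parser():
    parser = argparse.ArgumentParser(
        description="Run feature-selection experiments (illustrative sketch).")
    parser.add_argument("--quick", action="store_true",
                        help="single dataset, fewer runs")
    parser.add_argument("--datasets", nargs="+", choices=KNOWN_DATASETS,
                        default=KNOWN_DATASETS,
                        help="restrict the run to these datasets")
    parser.add_argument("--no-plots", action="store_true",
                        help="skip visualizations")
    return parser


def output_dir(mode, dataset):
    """Mirror the results/{mode}/{dataset}/ layout mentioned above."""
    return f"results/{mode}/{dataset}/"


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--quick", "--datasets", "iris", "wine"])
mode = "quick" if args.quick else "full"  # assumed mode names
dirs = [output_dir(mode, name) for name in args.datasets]
print(dirs)
```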

Tests

# Run all tests
uv run pytest tests/ -v

# With coverage
uv run pytest tests/ --cov=fsga --cov-report=html

# Current: 280+ tests, 82% coverage

Configuration

Example config (configs/default.yaml):

population_size: 50
num_generations: 100
mutation_rate: 0.01
crossover_rate: 0.8
early_stopping_patience: 10

dataset:
  name: iris
  split_ratio: 0.7

Load with:

from fsga.utils.config import Config
config = Config.from_file('configs/default.yaml')
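
The attributes that `Config` exposes are not documented here, so the following sketch does not use it. Instead it shows, under stated assumptions, how the GA-related keys of the YAML above could be range-checked before a run. `GASettings` is a hypothetical class, and the `raw` dictionary stands in for the parsed YAML file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GASettings:
    """Hypothetical container mirroring the top-level keys of default.yaml."""
    population_size: int = 50
    num_generations: int = 100
    mutation_rate: float = 0.01
    crossover_rate: float = 0.8
    early_stopping_patience: int = 10

    def __post_init__(self):
        # Counts must be positive integers.
        for name in ("population_size", "num_generations", "early_stopping_patience"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        # Rates are probabilities and must lie in [0, 1].
        for name in ("mutation_rate", "crossover_rate"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")


# Stand-in for the parsed YAML shown above.
raw = {
    "population_size": 50,
    "num_generations": 100,
    "mutation_rate": 0.01,
    "crossover_rate": 0.8,
    "early_stopping_patience": 10,
    "dataset": {"name": "iris", "split_ratio": 0.7},
}

# Separate the nested dataset section from the flat GA parameters.
ga_fields = {key: value for key, value in raw.items() if key != "dataset"}
settings = GASettings(**ga_fields)
print(settings)
```

Validating early like this surfaces a typo such as `mutation_rate: 1.0` before any model is trained.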

Citation

If you use this framework in research, please cite:

@software{fsga2025,
  title={Feature Selection via Genetic Algorithm},
  author={Piotr Krzysztof Lis},
  year={2025},
  url={https://github.com/straightchlorine/feature-selection-via-genetic-algorithm}
}

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! See module READMEs for extension points:

  • New operators: fsga/operators/README.md
  • New selectors: fsga/selectors/README.md
  • New evaluators: fsga/evaluators/README.md
