
Feature selection using a genetic algorithm


Feature Selection via Genetic Algorithm (FSGA)

A university project implementing feature selection using Genetic Algorithms, with evaluation and visualization tools.

Quick Start

# Installation
git clone <repository-url>
cd feature-selection-via-genetic-algorithm
uv venv && source .venv/bin/activate
uv pip install -e .

# Run example
python experiments/run_comparison.py

Basic Usage

from fsga.core.genetic_algorithm import GeneticAlgorithm
from fsga.datasets.loader import load_dataset
from fsga.evaluators.accuracy_evaluator import AccuracyEvaluator
from fsga.ml.models import ModelWrapper

# Load data and setup
X_train, X_test, y_train, y_test, _ = load_dataset('iris', split=True)
model = ModelWrapper('rf', n_estimators=50, random_state=42)
evaluator = AccuracyEvaluator(X_train, y_train, X_test, y_test, model)

# Run GA
from fsga.selectors.tournament_selector import TournamentSelector
from fsga.operators.uniform_crossover import UniformCrossover
from fsga.mutations.bitflip_mutation import BitFlipMutation

ga = GeneticAlgorithm(
    num_features=X_train.shape[1],
    evaluator=evaluator,
    selector=TournamentSelector(evaluator, tournament_size=3),
    crossover_operator=UniformCrossover(),
    mutation_operator=BitFlipMutation(probability=0.01),
    population_size=50,
    num_generations=100,
    early_stopping_patience=10
)

results = ga.evolve()
print(f"Accuracy: {results['best_fitness']:.2%}")
print(f"Features: {results['best_chromosome'].sum()}/{X_train.shape[1]}")
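The last line sums `best_chromosome` to count the selected features, which suggests the chromosome is a binary mask with one bit per feature. A minimal sketch of how such a mask picks columns, using plain Python lists rather than fsga's own types (the `apply_mask` helper is hypothetical and not part of the package):

```python
# Hypothetical helper, not part of fsga: apply a 0/1 mask to the columns of a dataset.
def apply_mask(rows, chromosome):
    """Keep only the columns whose chromosome bit is 1."""
    if any(len(row) != len(chromosome) for row in rows):
        raise ValueError("every row needs one value per chromosome bit")
    return [[value for value, bit in zip(row, chromosome) if bit] for row in rows]


rows = [
    [5.1, 3.5, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3],
]
chromosome = [0, 0, 1, 1]  # select the last two of four features

selected = apply_mask(rows, chromosome)
print(selected)                                # [[1.4, 0.2], [4.3, 1.3]]
print(f"{sum(chromosome)}/{len(chromosome)}")  # 2/4
```

With NumPy arrays the same selection is usually written as `X[:, chromosome.astype(bool)]`.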

Key Features

  • Modular Design: Swappable operators, selectors, and evaluators
  • Multiple Operators: 5 crossover types, 5 selection strategies, 3 fitness functions
  • Baseline Comparisons: Built-in RFE, LASSO, Mutual Information, Chi², ANOVA
  • Statistical Testing: Wilcoxon, Mann-Whitney, Cohen's d, Jaccard stability
  • Visualization: 9 plot functions for analysis and comparison
  • Experiment Framework: ExperimentRunner for reproducible experiments
  • Configuration: YAML-based configuration system
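Two of the statistics listed above are short enough to sketch directly. The example below illustrates Jaccard stability (taken here as the mean pairwise Jaccard similarity between the feature sets chosen in different runs) and Cohen's d with a pooled standard deviation. The function names and the input numbers are illustrative assumptions; fsga's own implementations may differ.

```python
from itertools import combinations
from statistics import mean, variance


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two feature-index sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_stability(feature_sets):
    """Mean Jaccard similarity over all pairs of runs."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    return mean(jaccard(a, b) for a, b in pairs)


def cohens_d(x, y):
    """Cohen's d: difference of means divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / pooled_var ** 0.5


# Illustrative inputs only, not results from fsga.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
print(round(jaccard_stability(runs), 3))  # 0.667
print(cohens_d([2, 4, 6], [1, 3, 5]))     # 0.5
```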

Architecture

fsga/
├── core/          # GA engine (genetic_algorithm, population)
├── operators/     # Crossover: uniform, single-point, two-point, multi-point
├── mutations/     # Mutation: bitflip
├── selectors/     # Selection: tournament, roulette, ranking, elitism
├── evaluators/    # Fitness: accuracy, F1, balanced accuracy
├── ml/            # Model wrappers (sklearn integration)
├── datasets/      # Dataset loaders (iris, wine, breast_cancer, digits)
├── analysis/      # Baselines + ExperimentRunner
├── visualization/ # 9 plot functions
└── utils/         # Config, metrics, serialization, logging
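To make the operator families in the tree above concrete, here is a standard-library sketch of three of them: uniform crossover, bit-flip mutation, and tournament selection. These are textbook versions written as free functions for illustration. They are not the classes in `fsga/operators`, `fsga/mutations`, or `fsga/selectors`, whose interfaces may differ, and the toy fitness is an assumption.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Each gene of the child is copied from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def bitflip_mutation(chromosome, probability, rng):
    """Flip each bit independently with the given probability."""
    return [1 - bit if rng.random() < probability else bit for bit in chromosome]


def tournament_select(population, fitnesses, tournament_size, rng):
    """Sample `tournament_size` distinct individuals and return the fittest."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


rng = random.Random(42)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
fitnesses = [sum(ind) for ind in population]  # toy fitness: number of selected bits

parent_a = tournament_select(population, fitnesses, 3, rng)
parent_b = tournament_select(population, fitnesses, 3, rng)
child = bitflip_mutation(uniform_crossover(parent_a, parent_b, rng), 0.01, rng)
print(child)
```

In a real run the fitness would come from an evaluator (for example, classifier accuracy on the masked features) rather than from the bit count used here.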

Documentation

  • Getting Started - Installation and basic usage
  • Tutorial - Step-by-step guide with examples
  • Architecture - System design and extension points
  • Project Plan - Status and roadmap
  • Module READMEs - See fsga/*/README.md for component details

Example Results

Breast Cancer Dataset (30 features → 12 features):

  • GA Accuracy: 98.3% with 40% of features
  • All Features: 95.7% with 100% of features
  • +2.6 percentage points accuracy, 60% dimensionality reduction
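The summary figures follow from the numbers in the bullets above. A quick check of the arithmetic, using only values stated in this README:

```python
total_features = 30
selected_features = 12
ga_accuracy = 98.3        # percent, GA-selected features
baseline_accuracy = 95.7  # percent, all features

fraction_kept = selected_features / total_features
reduction = 1 - fraction_kept
gain_points = ga_accuracy - baseline_accuracy

print(f"features kept: {fraction_kept:.0%}")                  # features kept: 40%
print(f"dimensionality reduction: {reduction:.0%}")           # dimensionality reduction: 60%
print(f"accuracy gain: {gain_points:.1f} percentage points")  # accuracy gain: 2.6 percentage points
```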

Iris Dataset (4 features → 2 features):

  • GA Accuracy: 98.3% with 50% of features
  • Selected: petal length, petal width

Wine Dataset (13 features → 6.5 features, presumably averaged over runs):

  • GA Accuracy: 100% with 50% of features

Running Experiments

# Full analysis (all datasets, all visualizations)
python experiments/run_experiment.py

# Quick test (single dataset, fewer runs)
python experiments/run_experiment.py --quick

# Specific datasets only
python experiments/run_experiment.py --datasets iris wine

# Without visualizations (faster)
python experiments/run_experiment.py --no-plots

# Results saved to: results/{mode}/{dataset}/
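The flags above map naturally onto an `argparse` parser. The sketch below is a guess at how such a command line could be declared. It is not the actual contents of `experiments/run_experiment.py`; the `build_parser` name, the defaults, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Declare the three flags shown above (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Run FSGA experiments (illustrative sketch)")
    parser.add_argument("--quick", action="store_true", help="single dataset, fewer runs")
    parser.add_argument("--datasets", nargs="+", default=None, help="restrict to these datasets")
    parser.add_argument("--no-plots", action="store_true", help="skip visualizations")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--datasets", "iris", "wine", "--no-plots"])
print(args.datasets)  # ['iris', 'wine']
print(args.no_plots)  # True
print(args.quick)     # False
```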

Tests

# Run all tests
uv run pytest tests/ -v

# With coverage
uv run pytest tests/ --cov=fsga --cov-report=html

# Current: 280+ tests, 82% coverage

Configuration

Example config (configs/default.yaml):

population_size: 50
num_generations: 100
mutation_rate: 0.01
crossover_rate: 0.8
early_stopping_patience: 10

dataset:
  name: iris
  split_ratio: 0.7

Load with:

from fsga.utils.config import Config
config = Config.from_file('configs/default.yaml')
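For reference, the YAML above parses to a nested mapping. The sketch below writes that mapping out as a Python dict and adds a few sanity checks. `validate` is a hypothetical helper, not part of `fsga.utils.config`, and the ranges it enforces are assumptions about what sensible values look like.

```python
# The example config from above, written as the dict a YAML parser would produce.
config = {
    "population_size": 50,
    "num_generations": 100,
    "mutation_rate": 0.01,
    "crossover_rate": 0.8,
    "early_stopping_patience": 10,
    "dataset": {"name": "iris", "split_ratio": 0.7},
}


def validate(cfg):
    """Return a list of problems found in the config (empty if none)."""
    problems = []
    for key in ("population_size", "num_generations", "early_stopping_patience"):
        if not isinstance(cfg[key], int) or cfg[key] <= 0:
            problems.append(f"{key} must be a positive integer")
    for key in ("mutation_rate", "crossover_rate"):
        if not 0.0 <= cfg[key] <= 1.0:
            problems.append(f"{key} must lie in [0, 1]")
    if not 0.0 < cfg["dataset"]["split_ratio"] < 1.0:
        problems.append("dataset.split_ratio must lie strictly between 0 and 1")
    return problems


print(validate(config))  # []
```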

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! See module READMEs for extension points:

  • New operators: fsga/operators/README.md
  • New selectors: fsga/selectors/README.md
  • New evaluators: fsga/evaluators/README.md

