Feature selection using a genetic algorithm

Feature Selection via Genetic Algorithm (FSGA)

A university project implementing feature selection using Genetic Algorithms, with evaluation and visualization tools.

Quick Start

# Installation
git clone <repository-url>
cd feature-selection-via-genetic-algorithm
uv venv && source .venv/bin/activate
uv pip install -e .

# Run example
python experiments/run_comparison.py

Basic Usage

from fsga.core.genetic_algorithm import GeneticAlgorithm
from fsga.datasets.loader import load_dataset
from fsga.evaluators.accuracy_evaluator import AccuracyEvaluator
from fsga.ml.models import ModelWrapper
from fsga.mutations.bitflip_mutation import BitFlipMutation
from fsga.operators.uniform_crossover import UniformCrossover
from fsga.selectors.tournament_selector import TournamentSelector

# Load data, then set up the model and the fitness evaluator
X_train, X_test, y_train, y_test, _ = load_dataset('iris', split=True)
model = ModelWrapper('rf', n_estimators=50, random_state=42)
evaluator = AccuracyEvaluator(X_train, y_train, X_test, y_test, model)

# Configure and run the GA
ga = GeneticAlgorithm(
    num_features=X_train.shape[1],
    evaluator=evaluator,
    selector=TournamentSelector(evaluator, tournament_size=3),
    crossover_operator=UniformCrossover(),
    mutation_operator=BitFlipMutation(probability=0.01),
    population_size=50,
    num_generations=100,
    early_stopping_patience=10
)

results = ga.evolve()
print(f"Accuracy: {results['best_fitness']:.2%}")
print(f"Features: {results['best_chromosome'].sum()}/{X_train.shape[1]}")
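The snippet above sums `best_chromosome`, which suggests it is a binary mask with one entry per feature. Assuming that reading, the following minimal sketch shows how such a mask could be turned into selected column indices and a reduced dataset. It uses plain lists instead of the fsga API, and `selected_indices` and `apply_mask` are hypothetical helper names, not part of the package.

```python
from typing import List, Sequence


def selected_indices(chromosome: Sequence[int]) -> List[int]:
    """Return the positions whose gene is 1 (feature kept)."""
    return [i for i, gene in enumerate(chromosome) if gene]


def apply_mask(rows: Sequence[Sequence[float]], chromosome: Sequence[int]) -> List[List[float]]:
    """Keep only the columns that the chromosome selects."""
    keep = selected_indices(chromosome)
    return [[row[i] for i in keep] for row in rows]


# Toy data: 2 samples with 4 features each (made-up values)
rows = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]
mask = [0, 0, 1, 1]  # keep only the last two features

reduced = apply_mask(rows, mask)
print(selected_indices(mask))
print(reduced)
```

With NumPy arrays the same idea is usually written as boolean indexing, for example `X[:, mask.astype(bool)]`.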

Key Features

  • Modular Design: Swappable operators, selectors, and evaluators
  • Multiple Operators: 5 crossover types, 5 selection strategies, 3 fitness functions
  • Baseline Comparisons: Built-in RFE, LASSO, Mutual Information, Chi², ANOVA
  • Statistical Testing: Wilcoxon, Mann-Whitney, Cohen's d, Jaccard stability
  • Visualization: 9 plot functions for analysis and comparison
  • Experiment Framework: ExperimentRunner for reproducible experiments
  • Configuration: YAML-based configuration system
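To make two of the statistics in the list above concrete, here is a small standard-library sketch of Jaccard stability (mean pairwise overlap of the feature sets selected across runs) and Cohen's d (standardized mean difference with pooled variance). These are common textbook definitions; the exact formulas used inside fsga are not shown here, so treat the function names and details as assumptions.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two feature-index sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def jaccard_stability(feature_sets):
    """Mean pairwise Jaccard similarity over all runs (needs >= 2 runs)."""
    return mean(jaccard(a, b) for a, b in combinations(feature_sets, 2))


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


# Made-up example: features selected in three runs, and two accuracy samples
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
ga_scores = [0.98, 0.97, 0.99]
baseline_scores = [0.95, 0.96, 0.94]

stability = jaccard_stability(runs)
effect = cohens_d(ga_scores, baseline_scores)
print(f"stability={stability:.3f}, d={effect:.2f}")
```

A stability near 1 means the runs keep choosing the same features; a value near 0 means the selections barely overlap.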

Architecture

fsga/
├── core/          # GA engine (genetic_algorithm, population)
├── operators/     # Crossover: uniform, single-point, two-point, multi-point
├── mutations/     # Mutation: bitflip
├── selectors/     # Selection: tournament, roulette, ranking, elitism
├── evaluators/    # Fitness: accuracy, F1, balanced accuracy
├── ml/            # Model wrappers (sklearn integration)
├── datasets/      # Dataset loaders (iris, wine, breast_cancer, digits)
├── analysis/      # Baselines + ExperimentRunner
├── visualization/ # 9 plot functions
└── utils/         # Config, metrics, serialization, logging
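The operator families named in the tree above can be illustrated with a few plain functions. This is a generic sketch of uniform crossover, bit-flip mutation, and tournament selection on list-based chromosomes, written under the assumption that chromosomes are binary masks. It does not reproduce the fsga classes or their signatures, and all function names here are hypothetical.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Build two children; each gene position is swapped with probability 0.5."""
    child_a, child_b = [], []
    for gene_a, gene_b in zip(parent_a, parent_b):
        if rng.random() < 0.5:
            child_a.append(gene_a)
            child_b.append(gene_b)
        else:
            child_a.append(gene_b)
            child_b.append(gene_a)
    return child_a, child_b


def bitflip_mutation(chromosome, probability, rng):
    """Flip each bit independently with the given probability."""
    return [1 - gene if rng.random() < probability else gene for gene in chromosome]


def tournament_select(population, fitnesses, tournament_size, rng):
    """Sample contenders without replacement and return the fittest one."""
    contenders = rng.sample(range(len(population)), tournament_size)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


rng = random.Random(42)  # fixed seed so the demo is repeatable
parent_a, parent_b = [1] * 6, [0] * 6
child_a, child_b = uniform_crossover(parent_a, parent_b, rng)
mutated = bitflip_mutation(child_a, probability=0.01, rng=rng)

population = [parent_a, parent_b, child_a, child_b]
fitnesses = [sum(c) for c in population]  # toy fitness: number of ones
winner = tournament_select(population, fitnesses, tournament_size=3, rng=rng)
print(child_a, child_b, mutated, winner)
```

Passing the random generator explicitly keeps each operator deterministic under a seed, which helps when experiments need to be reproducible.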

Documentation

  • Getting Started - Installation and basic usage
  • Tutorial - Step-by-step guide with examples
  • Architecture - System design and extension points
  • Project Plan - Status and roadmap
  • Module READMEs - See fsga/*/README.md for component details

Example Results

Breast Cancer Dataset (30 features → 12 features):

  • GA Accuracy: 98.3% with 40% of features
  • All Features: 95.7% with 100% of features
  • +2.6 percentage points of accuracy, 60% dimensionality reduction
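The figures above follow from simple arithmetic on the reported numbers, sketched below. The helper name `reduction_summary` is hypothetical, and the accuracy gain is expressed in percentage points (the difference of two percentages), not as a relative change.

```python
def reduction_summary(total_features, selected_features, ga_accuracy, baseline_accuracy):
    """Summarize how many features were kept and how accuracy changed."""
    kept = selected_features / total_features
    return {
        "kept_fraction": kept,
        "reduction": 1 - kept,
        # difference of two accuracies, scaled to percentage points
        "accuracy_gain_points": (ga_accuracy - baseline_accuracy) * 100,
    }


# Breast Cancer numbers quoted above: 30 -> 12 features, 98.3% vs 95.7%
summary = reduction_summary(30, 12, 0.983, 0.957)
print(f"kept {summary['kept_fraction']:.0%}, "
      f"reduced by {summary['reduction']:.0%}, "
      f"gain {summary['accuracy_gain_points']:.1f} points")
```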

Iris Dataset (4 features → 2 features):

  • GA Accuracy: 98.3% with 50% of features
  • Selected: petal length, petal width

Wine Dataset (13 features → 6.5 features, presumably an average over runs):

  • GA Accuracy: 100% with 50% of features

Running Experiments

# Full analysis (all datasets, all visualizations)
python experiments/run_experiment.py

# Quick test (single dataset, fewer runs)
python experiments/run_experiment.py --quick

# Specific datasets only
python experiments/run_experiment.py --datasets iris wine

# Without visualizations (faster)
python experiments/run_experiment.py --no-plots

# Results saved to: results/{mode}/{dataset}/
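For readers who want to build a similar entry point, the flags above could be parsed roughly as follows. This is a hypothetical sketch using `argparse`, not the actual contents of `experiments/run_experiment.py`. The mode names `quick` and `full`, the default dataset list, and the helper `result_dir` are assumptions made for illustration.

```python
import argparse
from pathlib import Path

# Assumed default: the dataset names listed in the Architecture section
DEFAULT_DATASETS = ["iris", "wine", "breast_cancer", "digits"]


def build_parser():
    """Parser mirroring the --quick, --datasets and --no-plots flags."""
    parser = argparse.ArgumentParser(description="Sketch of an experiment CLI")
    parser.add_argument("--quick", action="store_true",
                        help="single dataset, fewer runs")
    parser.add_argument("--datasets", nargs="+", default=DEFAULT_DATASETS,
                        help="names of the datasets to run")
    parser.add_argument("--no-plots", action="store_true",
                        help="skip visualizations")
    return parser


def result_dir(mode, dataset, root="results"):
    """Build the results/{mode}/{dataset} path."""
    return Path(root) / mode / dataset


# Parse an explicit argument list so the sketch runs without a shell
args = build_parser().parse_args(["--quick", "--datasets", "iris", "wine"])
mode = "quick" if args.quick else "full"  # assumed mode names
dirs = [result_dir(mode, name) for name in args.datasets]
print(mode, args.no_plots, [d.as_posix() for d in dirs])
```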

Tests

# Run all tests
uv run pytest tests/ -v

# With coverage
uv run pytest tests/ --cov=fsga --cov-report=html

# Current: 280+ tests, 82% coverage

Configuration

Example config (configs/default.yaml):

population_size: 50
num_generations: 100
mutation_rate: 0.01
crossover_rate: 0.8
early_stopping_patience: 10

dataset:
  name: iris
  split_ratio: 0.7

Load with:

from fsga.utils.config import Config
config = Config.from_file('configs/default.yaml')
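The fields in the example config map naturally onto typed objects with basic range checks. The sketch below is a hypothetical stand-in built from dataclasses, not the `fsga.utils.config.Config` class. It assumes the YAML file has already been parsed into a dictionary by a YAML loader, and the validation rules are illustrative guesses.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetConfig:
    name: str = "iris"
    split_ratio: float = 0.7


@dataclass
class GAConfig:
    population_size: int = 50
    num_generations: int = 100
    mutation_rate: float = 0.01
    crossover_rate: float = 0.8
    early_stopping_patience: int = 10
    dataset: DatasetConfig = field(default_factory=DatasetConfig)

    def validate(self):
        """Raise ValueError when a value is outside a sensible range."""
        if self.population_size < 2:
            raise ValueError("population_size must be at least 2")
        if self.num_generations < 1:
            raise ValueError("num_generations must be at least 1")
        for name in ("mutation_rate", "crossover_rate"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")
        if not 0.0 < self.dataset.split_ratio < 1.0:
            raise ValueError("dataset.split_ratio must be between 0 and 1")


def config_from_dict(raw):
    """Build and validate a GAConfig from a parsed-YAML-style dictionary."""
    raw = dict(raw)  # copy so the caller's dictionary is left untouched
    dataset = DatasetConfig(**raw.pop("dataset", {}))
    config = GAConfig(dataset=dataset, **raw)
    config.validate()
    return config


# Dictionary mirroring configs/default.yaml as shown above
cfg = config_from_dict({
    "population_size": 50,
    "num_generations": 100,
    "mutation_rate": 0.01,
    "crossover_rate": 0.8,
    "early_stopping_patience": 10,
    "dataset": {"name": "iris", "split_ratio": 0.7},
})
print(cfg)
```

Validating at load time surfaces a mistyped rate or ratio before a long experiment starts.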

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! See module READMEs for extension points:

  • New operators: fsga/operators/README.md
  • New selectors: fsga/selectors/README.md
  • New evaluators: fsga/evaluators/README.md
