MOOSE-FS: Multi-Objective Optimized Ensemble Feature Selection

MOOSE-FS

Overview

MOOSE-FS is a feature selection library that uses an ensemble-based approach to optimize both predictive performance and stability. It combines multiple feature selection methods, merging strategies, and evaluation metrics into a configurable pipeline for classification and regression tasks. The package repeats feature selection over multiple iterations and uses Pareto optimization to identify the feature subsets that best trade off performance against stability.
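
To make the Pareto step concrete, here is a minimal sketch of selecting non-dominated candidates when each one has a performance score and a stability score (both higher is better). It is an illustration only, not the library's implementation: the function names `dominates` and `pareto_front`, the candidate labels, and the scores are all made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical scores per candidate ensemble: (performance, stability).
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, score in candidates.items():
        others = (s for other, s in candidates.items() if other != name)
        if not any(dominates(s, score) for s in others):
            front.append(name)
    return front


# Made-up scores for four candidate ensembles.
candidates = {
    "A": (0.90, 0.60),
    "B": (0.85, 0.80),
    "C": (0.80, 0.70),  # dominated by B
    "D": (0.90, 0.55),  # dominated by A
}
print(pareto_front(candidates))  # ['A', 'B']
```

Neither A nor B beats the other on both objectives, so both stay on the front; how a single winner is then picked from the front is a separate design choice that this sketch does not cover.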

Users can define their feature selection process by:

  • Selecting feature selection methods from predefined options or implementing custom ones.
  • Choosing merging strategies to aggregate feature rankings.
  • Specifying performance metrics to evaluate selected features.
  • Configuring the number of features to select and the number of repetitions.
  • Working with either classification or regression problems.

The library allows defining feature selectors, merging strategies, and metrics either as class instances or as string identifiers, which act as placeholders for built-in methods. The framework is modular and can be easily extended by adding new selection algorithms or merging strategies.


Requirements

  • Python 3.9 or higher
  • Dependencies: installed automatically by pip, as declared in pyproject.toml.

Installation

From PyPI (Recommended)

pip install moose-fs

From Source

To install the package from source, run:

pip install git+https://github.com/CI4CB-lab/moosefs.git

Alternatively, clone the repository and install locally:

git clone https://github.com/CI4CB-lab/moosefs.git
cd moosefs
pip install .

Using the Library

1. Feature Selection Pipeline

The core of MOOSE-FS is the FeatureSelectionPipeline, which provides a fully configurable workflow for feature selection. Users can specify:

  • Feature selection methods
  • Merging strategy
  • Evaluation metrics
  • Task type (classification or regression)
  • Number of features to select
  • Number of repetitions

Example Usage

# Assume `data` is a pandas DataFrame whose last column, "label", holds the targets.
# The pipeline accepts either a single DataFrame `data` (last column = target)
# or separate `X` and `y`; this example passes `X` and `y`.
from moosefs import FeatureSelectionPipeline

fs_methods = ["f_statistic_selector", "random_forest_selector", "svm_selector"]
merging_strategy = "union_of_intersections_merger"

pipeline = FeatureSelectionPipeline(
    X=data.drop(columns=["label"]),
    y=data["label"],
    fs_methods=fs_methods,
    merging_strategy=merging_strategy,
    num_repeats=5,
    task="classification",
    num_features_to_select=10,
    stability_mode="fold_stability",  # Options: "selector_agreement", "fold_stability", "all"
)

# Run the pipeline
selected_features, best_ensemble = pipeline.run()

This runs feature selection with K-fold cross-validation, merges the results using the chosen strategy, and returns the selected features together with the best ensemble after refitting on the full dataset.
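
The cross-validation flow described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the MOOSE-FS implementation: `k_fold_indices`, `select_across_folds`, and `covariance_selector` are hypothetical helpers, ranking by absolute covariance stands in for a real selector, and the merging step is left out to keep the sketch short.

```python
import random
from typing import Callable, List

Rows = List[List[float]]
# Hypothetical selector signature: (rows, targets, k) -> indices of the k chosen columns.
Selector = Callable[[Rows, List[float], int], List[int]]


def k_fold_indices(n_samples: int, n_folds: int, seed: int) -> List[List[int]]:
    """Shuffle the sample indices and deal them into `n_folds` held-out folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def covariance_selector(rows: Rows, targets: List[float], k: int) -> List[int]:
    """Toy selector: rank columns by absolute covariance with the target."""
    n = len(rows)
    mean_y = sum(targets) / n
    scores = []
    for j in range(len(rows[0])):
        mean_x = sum(r[j] for r in rows) / n
        cov = sum((r[j] - mean_x) * (y - mean_y) for r, y in zip(rows, targets)) / n
        scores.append(abs(cov))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def select_across_folds(rows: Rows, targets: List[float], selector: Selector,
                        k: int, n_folds: int = 5, num_repeats: int = 2) -> List[List[int]]:
    """Run `selector` on the training part of every fold, for every repeat."""
    subsets = []
    for repeat in range(num_repeats):
        for held_out in k_fold_indices(len(rows), n_folds, seed=repeat):
            held = set(held_out)
            train = [i for i in range(len(rows)) if i not in held]
            subsets.append(selector([rows[i] for i in train],
                                    [targets[i] for i in train], k))
    return subsets


# Made-up data: column 0 equals the target, column 1 is constant, column 2 is weak noise.
targets = [float(i % 2) for i in range(20)]
rows = [[t, 0.5, 0.1 * (i % 3)] for i, t in enumerate(targets)]

subsets = select_across_folds(rows, targets, covariance_selector, k=1)
final = covariance_selector(rows, targets, k=1)  # refit on the full dataset
print(len(subsets), final)  # 10 [0]
```

The per-fold subsets are what a stability metric would compare, while the final call on all rows mirrors the refit on the full dataset.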

Stability Modes

The stability_mode parameter controls which stability metrics are included in the Pareto optimization:

  • "selector_agreement": Measures agreement between selectors within each ensemble
  • "fold_stability": Measures consistency of selected features across CV folds (default)
  • "all": Includes both stability metrics in the optimization
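
As a rough intuition for the two stability notions, both can be viewed as an average overlap between feature sets: across folds in one case, across selectors in the other. The sketch below uses mean pairwise Jaccard similarity as the overlap measure. That choice is an assumption made for illustration; the metric MOOSE-FS actually computes may differ, and the function names and feature sets are made up.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size divided by union size (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(subsets: List[Set[str]]) -> float:
    """Average Jaccard similarity over all unordered pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Fold-stability view: the same ensemble evaluated on three CV folds.
across_folds = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f2", "f3"}]
# Selector-agreement view: two selectors inside one ensemble, same data.
across_selectors = [{"f1", "f2"}, {"f1", "f5"}]

print(round(mean_pairwise_jaccard(across_folds), 3))      # 0.667
print(round(mean_pairwise_jaccard(across_selectors), 3))  # 0.333
```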

2. Extensibility

MOOSE-FS is designed to be easily extended. Users can implement custom:

  • Feature selection methods: Define a new feature selector class and integrate it into the pipeline.
  • Merging strategies: Implement a custom strategy to aggregate selected features.
  • Metrics: Add new evaluation metrics tailored to specific tasks.

New methods can be used directly in the pipeline by passing the class or a corresponding identifier.
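
The general shape of a custom selector (score every feature, then keep the top ones) might look like the sketch below. The class `VarianceSelector` and its `score` and `select` methods are hypothetical and deliberately standalone: they are not the MOOSE-FS interface, so the base class and method names a real extension needs should be taken from the library's documentation.

```python
from statistics import pvariance
from typing import Dict, List


class VarianceSelector:
    """Hypothetical selector that keeps the columns with the largest variance."""

    def __init__(self, n_features: int) -> None:
        self.n_features = n_features

    def score(self, columns: Dict[str, List[float]]) -> Dict[str, float]:
        """Assign each named column its population variance."""
        return {name: pvariance(values) for name, values in columns.items()}

    def select(self, columns: Dict[str, List[float]]) -> List[str]:
        """Return the names of the `n_features` highest-scoring columns."""
        scores = self.score(columns)
        ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
        return ranked[: self.n_features]


# Made-up data keyed by feature name.
columns = {
    "age": [20.0, 40.0, 60.0, 80.0],
    "flag": [0.0, 0.0, 0.0, 1.0],
    "const": [1.0, 1.0, 1.0, 1.0],
}
print(VarianceSelector(n_features=2).select(columns))  # ['age', 'flag']
```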


Code Structure

  • core/: Core modules for data processing, metrics, and stability computation.
  • feature_selection_pipeline.py: Defines the main feature selection workflow.
  • feature_selectors/: Implements feature selection methods (e.g., F-statistic, mutual information, RandomForest, SVM).
  • merging_strategies/: Implements merging strategies such as Borda count and union of intersections.
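
For readers unfamiliar with the two merging strategies named above, here is a minimal sketch of each idea. It assumes a common textbook form of Borda count and reads "union of intersections" as the union of all pairwise intersections between selectors' subsets. The library's own implementations may differ in scoring and tie-breaking, and the function names here are illustrative.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def borda_count(rankings: List[List[str]], k: int) -> List[str]:
    """A feature at position p (0-based) of an n-item ranking earns n - p points."""
    points: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            points[feature] += n - position
    # Sort by points (descending), then by name so ties resolve deterministically.
    return sorted(points, key=lambda f: (-points[f], f))[:k]


def union_of_intersections(subsets: List[Set[str]]) -> Set[str]:
    """Keep every feature that at least two selectors both chose."""
    merged: Set[str] = set()
    for a, b in combinations(subsets, 2):
        merged |= a & b
    return merged


# Made-up rankings from three selectors, best feature first.
rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f2", "f1", "f4", "f3"],
    ["f1", "f3", "f2", "f4"],
]
top2 = [set(r[:2]) for r in rankings]

print(borda_count(rankings, k=2))            # ['f1', 'f2']
print(sorted(union_of_intersections(top2)))  # ['f1', 'f2']
```

Borda count works on full rankings, while the union of intersections works on already-truncated subsets, so the merged set can be larger or smaller than the requested number of features.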

Contributing

Contributions are welcome! If you have ideas for improving MOOSE-FS, feel free to open an issue or submit a pull request.

Development (uv)

This project uses uv for local environments and dependency management. The library builds via the existing PEP 517 backend (hatchling); uv only manages the environment, installs, and command execution.

  • Install/select Python 3.9+ and ensure uv is installed.
  • Create a local virtual environment in .venv:
      uv venv --python 3.9
  • Install dev dependencies (editable):
      uv pip install -e ".[dev]"
  • Install pre-commit hooks:
      uv run pre-commit install
  • Run formatting and linting:
      uv run ruff format .
      uv run ruff check --fix .
  • Run tests:
      uv run pytest -q

License

This project is licensed under the MIT License.


