
MLFastOpt

Python 3.8+ · License: MIT

MLFastOpt is a high-speed ensemble optimization system for Bayesian hyperparameter tuning of LightGBM, XGBoost, and Random Forest models.

Features

  • 🚀 Fast Optimization: Advanced Bayesian optimization algorithms (Sobol + BoTorch).
  • 🧩 Multi-Model Support: Tune LightGBM, XGBoost, or Random Forest ensembles.
  • ⚙️ Simple Config: Hierarchical JSON configuration and YAML/Python search spaces.
  • 📊 Rich Analytics: Built-in web dashboards and visualization tools.

Prerequisites

  • Python 3.9+
  • macOS users: install the OpenMP runtime (libomp), which LightGBM/XGBoost require:
    brew install libomp
    

Installation

  1. Activate Virtual Environment:

    source .venv/bin/activate
    # OR if you haven't created one yet:
    # python3.12 -m venv .venv && source .venv/bin/activate
    
  2. Install Package:

    pip install -e ".[dev]"   # quote the extras so shells like zsh don't glob the brackets
    

Quick Start (End Users)

If you installed the package via pip install mlfastopt, follow these steps:

  1. Create Configuration Files: You need a config.json and a hyperparameter space file (e.g., hyperparameters.yaml).

    config.json:

    {
      "data": { "path": "train.parquet", "label_column": "target", "features": "features.yaml" },
      "model": { "type": "xgboost", "hyperparameter_path": "config/hyperparameters/xgboost.yaml" },
      "training": { "metric": "f1", "total_trials": 20 },
      "output": { "dir": "outputs" }
    }
    
  2. Run Optimization:

    export OMP_NUM_THREADS=1
    mlfastopt-optimize --config config.json
    
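The `features` field in config.json above points at a features.yaml file. Its exact schema belongs to MLFastOpt; as a rough sketch under that caveat, a plain list of feature column names is the kind of content it would hold (the top-level key name here is an assumption):

```yaml
# hypothetical features.yaml — feature column names to train on
features:
  - feature1
  - feature2
  - feature3
```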

Quick Start (Developers)

Prerequisite: Input data must be preprocessed and fully numerical. Handle all categorical encoding (e.g., one-hot or label encoding) before using MLFastOpt, except with LightGBM or XGBoost, which offer some native categorical support.
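To illustrate the preprocessing step above, here is a minimal one-hot encoder using only the standard library; in practice pandas `get_dummies` or scikit-learn's `OneHotEncoder` do the same job at scale:

```python
def one_hot(values):
    """Map a list of category labels to lists of 0/1 indicator columns.

    Returns the sorted category order (the column layout) and the
    encoded rows, so downstream code can name the new columns.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

cats, encoded = one_hot(["red", "green", "red", "blue"])
print(cats)     # column order: ['blue', 'green', 'red']
print(encoded)
```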

1. Setup

Create the required directory structure:

mkdir -p config/hyperparameters data

2. Define Parameter Space

We recommend using YAML for parameter spaces. Create config/hyperparameters/my_space.yaml:

parameters:
  - name: learning_rate
    type: range
    bounds: [0.01, 0.3]
    value_type: float
    log_scale: true

  - name: max_depth
    type: range
    bounds: [3, 10]
    value_type: int
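To make the space semantics concrete, the sketch below shows how a random sampler could interpret `range` parameters with `log_scale` and `value_type` as defined above. This is not MLFastOpt's actual sampler (it uses Sobol + BoTorch internally); it is a stdlib illustration of what the YAML fields mean:

```python
import math
import random

# Mirrors the YAML space above as plain Python dicts.
SPACE = [
    {"name": "learning_rate", "type": "range", "bounds": [0.01, 0.3],
     "value_type": "float", "log_scale": True},
    {"name": "max_depth", "type": "range", "bounds": [3, 10],
     "value_type": "int"},
]

def sample(space, rng=random):
    """Draw one point from the space, honouring log_scale and value_type."""
    point = {}
    for p in space:
        lo, hi = p["bounds"]
        if p.get("log_scale"):
            # sample uniformly in log space, then exponentiate back
            x = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        else:
            x = rng.uniform(lo, hi)
        if p["value_type"] == "int":
            x = int(round(x))
        point[p["name"]] = x
    return point

print(sample(SPACE))
```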

3. Configure

Create my_config.json using the nested structure:

{
  "data": {
    "path": "data/your_dataset.parquet",
    "label_column": "target",
    "features": ["feature1", "feature2"],
    "class_weight": { "0": 1, "1": 5 },
    "under_sample_majority_ratio": 1.0
  },
  "model": {
    "type": "lightgbm",
    "hyperparameter_path": "config/hyperparameters/my_space.yaml",
    "ensemble_size": 5
  },
  "training": {
    "total_trials": 20,
    "sobol_trials": 5,
    "metric": "soft_recall",
    "parallel": true,
    "n_jobs": -1
  },
  "output": {
    "dir": "outputs/runs"
  }
}
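Before launching a run, it can help to sanity-check the nested config. The helper below is hypothetical (not part of MLFastOpt); it only verifies the required keys shown in the Configuration Reference, using the standard library:

```python
import json

# Required keys per section, per the Configuration Reference below.
REQUIRED = {
    "data": ["path", "label_column", "features"],
    "model": ["hyperparameter_path"],
}

def validate(cfg):
    """Return a list of missing required 'section.key' entries."""
    missing = []
    for section, keys in REQUIRED.items():
        block = cfg.get(section, {})
        for key in keys:
            if key not in block:
                missing.append(f"{section}.{key}")
    return missing

cfg = json.loads('{"data": {"path": "d.parquet", "label_column": "target"}, "model": {}}')
print(validate(cfg))  # ['data.features', 'model.hyperparameter_path']
```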

4. Run

Execute the optimization (force single-threaded OpenMP so LightGBM/XGBoost workers don't deadlock when trials run in parallel):

OMP_NUM_THREADS=1 python -m mlfastopt.cli --config my_config.json

Configuration Reference

Data Section (data)

| Parameter | Description | Default |
| --- | --- | --- |
| path | Path to the dataset (CSV/Parquet). | Required |
| label_column | Name of the target column. | Required |
| features | List of feature names, or path to a YAML file. | Required |
| class_weight | Dictionary of class weights (e.g., {"0": 1, "1": 10}). | None |
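A common way to choose a class_weight dict like {"0": 1, "1": 10} is inverse class frequency, normalized to the majority class. This helper is an illustration, not part of MLFastOpt:

```python
from collections import Counter

def inverse_freq_weights(labels):
    """Weight each class by majority_count / class_count."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {str(c): round(majority / n, 2) for c, n in counts.items()}

print(inverse_freq_weights([0] * 90 + [1] * 10))  # {'0': 1.0, '1': 9.0}
```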

Model Section (model)

| Parameter | Description | Default |
| --- | --- | --- |
| type | Model type: lightgbm, xgboost, random_forest. | lightgbm |
| hyperparameter_path | Path to the parameter space file. | Required |
| ensemble_size | Number of models per ensemble. | 1 |

Training Section (training)

| Parameter | Description | Default |
| --- | --- | --- |
| total_trials | Total number of optimization trials. | 20 |
| metric | Metric to maximize (soft_recall, soft_f1_score, etc.). | soft_recall |
| parallel | Enable parallel training of ensemble members. | false |

Outputs

Results are saved to outputs/:

  • runs/: Detailed logs and models for each run.
  • best_trials/: JSON configurations of the best performing trials.
  • visualizations/: Generated plots.
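The best_trials/ JSONs can be inspected programmatically. The snippet below is a hypothetical reader: it assumes each file stores its score under a "metric_value" key, which is an assumption about the schema, not documented MLFastOpt behavior:

```python
import json
from pathlib import Path

def best_trial(dirpath="outputs/best_trials"):
    """Return the trial dict with the highest 'metric_value', or None.

    Assumes each *.json file in dirpath holds one trial record with a
    numeric 'metric_value' field (hypothetical schema).
    """
    root = Path(dirpath)
    if not root.is_dir():
        return None
    best = None
    for f in root.glob("*.json"):
        trial = json.loads(f.read_text())
        if best is None or trial.get("metric_value", 0) > best.get("metric_value", 0):
            best = trial
    return best
```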
