
MLFastOpt


MLFastOpt is a high-speed ensemble optimization system for Bayesian hyperparameter tuning of LightGBM, XGBoost, and Random Forest models.

Features

  • 🚀 Fast Optimization: Advanced Bayesian optimization algorithms (Sobol + BoTorch).
  • 🧩 Multi-Model Support: Tune LightGBM, XGBoost, or Random Forest ensembles.
  • ⚙️ Simple Config: Hierarchical JSON configuration and YAML/Python search spaces.
  • 📊 Rich Analytics: Built-in web dashboards and visualization tools.

Prerequisites

  • Python 3.9+
  • macOS Users: You must install the OpenMP runtime (libomp) for LightGBM/XGBoost to work:
    brew install libomp
    

Installation

  1. Activate Virtual Environment:

    source .venv/bin/activate
    # OR if you haven't created one yet:
    # python3.12 -m venv .venv && source .venv/bin/activate
    
  2. Install Package (editable install from a source checkout; end users can instead run pip install mlfastopt):

    pip install -e ".[dev]"
    

Quick Start (End Users)

If you installed the package via pip install mlfastopt, follow these steps:

  1. Create Configuration Files: You need a config.json and a hyperparameter space file (e.g., hyperparameters.yaml); a Python sketch for generating these files follows these steps.

    config.json:

    {
      "data": { "path": "train.parquet", "label_column": "target", "features": "features.yaml" },
      "model": { "type": "xgboost", "hyperparameter_path": "config/hyperparameters/xgboost.yaml" },
      "training": { "metric": "f1", "total_trials": 20 },
      "output": { "dir": "outputs" }
    }
    
  2. Run Optimization:

    export OMP_NUM_THREADS=1
    mlfastopt-optimize --config config.json
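
If you prefer to generate these files from Python, the sketch below writes the same config.json and a candidate features.yaml. The features.yaml layout (a plain list of column names) is an assumption based on the features option in the Configuration Reference, not a documented schema.

# Sketch: write config.json (keys mirror the example above) and a
# features.yaml whose format (one feature name per list entry) is an
# assumption, not a documented schema.
import json

config = {
    "data": {"path": "train.parquet", "label_column": "target", "features": "features.yaml"},
    "model": {"type": "xgboost", "hyperparameter_path": "config/hyperparameters/xgboost.yaml"},
    "training": {"metric": "f1", "total_trials": 20},
    "output": {"dir": "outputs"},
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)

with open("features.yaml", "w") as f:
    f.write("\n".join(f"- {name}" for name in ["feature_a", "feature_b"]) + "\n")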
    

Quick Start (Developers)

Prerequisite: Input data must be preprocessed and numerical. Handle all categorical encoding (e.g., one-hot or label encoding) before using MLFastOpt; the exception is LightGBM/XGBoost, which have some native categorical support.
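
For example, a minimal encoding sketch with pandas (the raw input file and column names are hypothetical placeholders):

# Sketch: one-hot encode categorical columns, then save as Parquet for
# MLFastOpt. Writing Parquet requires pyarrow or fastparquet.
import pandas as pd

df = pd.read_csv("raw_data.csv")
df = pd.get_dummies(df, columns=["city", "device_type"], dtype=int)
df.to_parquet("data/your_dataset.parquet", index=False)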

1. Setup

Create the required directory structure:

mkdir -p config/hyperparameters data

2. Define Parameter Space

We recommend using YAML for parameter spaces. Create config/hyperparameters/my_space.yaml:

parameters:
  - name: learning_rate
    type: range
    bounds: [0.01, 0.3]
    value_type: float
    log_scale: true

  - name: max_depth
    type: range
    bounds: [3, 10]
    value_type: int
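
Before a long run, it can be worth sanity-checking the space file. A minimal sketch, assuming PyYAML is installed:

# Sketch: load the parameter space and check that each range is well
# formed before handing it to the optimizer.
import yaml

with open("config/hyperparameters/my_space.yaml") as f:
    space = yaml.safe_load(f)

for p in space["parameters"]:
    low, high = p["bounds"]
    assert low < high, f"bad bounds for {p['name']}"
    print(p["name"], p["type"], p["bounds"])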

3. Configure

Create my_config.json using the nested structure:

{
  "data": {
    "path": "data/your_dataset.parquet",
    "label_column": "target",
    "features": ["feature1", "feature2"],
    "class_weight": { "0": 1, "1": 5 },
    "under_sample_majority_ratio": 1.0
  },
  "model": {
    "type": "lightgbm",
    "hyperparameter_path": "config/hyperparameters/my_space.yaml",
    "ensemble_size": 5
  },
  "training": {
    "total_trials": 20,
    "sobol_trials": 5,
    "metric": "soft_recall",
    "parallel": true,
    "n_jobs": -1
  },
  "output": {
    "dir": "outputs/runs"
  }
}
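
If you are unsure what to put in class_weight, a common heuristic (a suggestion here, not an MLFastOpt requirement) is to weight the minority class by the imbalance ratio:

# Sketch: derive a starting class_weight from the label distribution.
import pandas as pd  # reading Parquet requires pyarrow or fastparquet

y = pd.read_parquet("data/your_dataset.parquet")["target"]
counts = y.value_counts()
ratio = counts.max() / counts.min()
print({str(counts.idxmax()): 1, str(counts.idxmin()): round(ratio)})
# e.g. {"0": 1, "1": 5} when the majority class is five times larger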

4. Run

Execute optimization (ensure single-threading for LightGBM/XGBoost to avoid deadlocks):

OMP_NUM_THREADS=1 python -m mlfastopt.cli --config my_config.json
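
The same launch from Python, with OMP_NUM_THREADS pinned for the child process (a minimal sketch of the shell command above):

# Sketch: run the documented CLI entry point as a subprocess with
# OMP_NUM_THREADS=1, mirroring the shell invocation above.
import os
import subprocess
import sys

env = {**os.environ, "OMP_NUM_THREADS": "1"}
subprocess.run(
    [sys.executable, "-m", "mlfastopt.cli", "--config", "my_config.json"],
    env=env,
    check=True,
)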

Configuration Reference

Data Section (data)

Parameter     Description                                             Default
path          Path to dataset (CSV/Parquet).                          Required
label_column  Name of target column.                                  Required
features      List of features or path to a YAML file.                Required
class_weight  Dictionary of class weights (e.g., {"0": 1, "1": 10}).  None

Model Section (model)

Parameter            Description                                    Default
type                 Model type: lightgbm, xgboost, random_forest.  lightgbm
hyperparameter_path  Path to the parameter space file.              Required
ensemble_size        Models per ensemble.                           1

Training Section (training)

Parameter     Description                                              Default
total_trials  Total optimization trials.                               20
metric        Metric to maximize (soft_recall, soft_f1_score, etc.).   soft_recall
parallel      Enable parallel training of ensemble members.            false

Outputs

Results are saved to outputs/:

  • runs/: Detailed logs and models for each run.
  • best_trials/: JSON configurations of the best performing trials.
  • visualizations/: Generated plots.
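
To inspect results programmatically, you can walk best_trials/; since the JSON schema is not documented here, this sketch only loads and prints each file:

# Sketch: print every best-trial configuration saved by a run.
import json
from pathlib import Path

for path in sorted(Path("outputs/best_trials").glob("*.json")):
    with open(path) as f:
        print(path.name, json.load(f))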

Download files

Download the file for your platform.

Source Distribution

mlfastopt-0.0.9b3.tar.gz (70.4 kB)


Built Distribution


mlfastopt-0.0.9b3-py3-none-any.whl (73.4 kB)


File details

Details for the file mlfastopt-0.0.9b3.tar.gz.

File metadata

  • Download URL: mlfastopt-0.0.9b3.tar.gz
  • Size: 70.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for mlfastopt-0.0.9b3.tar.gz

Algorithm    Hash digest
SHA256       a028cfd4d91b1aacf000e5c3cba8a245128afaa9ccd8885b9d86895e8e9f36e6
MD5          174c76902bd2127b5c523c7703aa88bc
BLAKE2b-256  0ba163ae4867569def7b32597e194e61dc923f9d6f7b5d5989cd4d9a84333178
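
To verify a download against the published digest, a standard-library sketch:

# Sketch: recompute the SHA256 of the downloaded archive and compare it
# to the digest listed above.
import hashlib

expected = "a028cfd4d91b1aacf000e5c3cba8a245128afaa9ccd8885b9d86895e8e9f36e6"
with open("mlfastopt-0.0.9b3.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
assert digest == expected, "SHA256 mismatch"
print("OK:", digest)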


File details

Details for the file mlfastopt-0.0.9b3-py3-none-any.whl.

File metadata

  • Download URL: mlfastopt-0.0.9b3-py3-none-any.whl
  • Size: 73.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for mlfastopt-0.0.9b3-py3-none-any.whl

Algorithm    Hash digest
SHA256       fa08590c534582eb5a19bc58c0ed41910e1f7fc0001e9ac5b9f99a10a1868c03
MD5          cd97688f22de2e2b8c8dc5587b529bdf
BLAKE2b-256  981efe9509dffe2ee452b1b38d598e960935512f4da34a1d9c785fe65d9ea3df

