
MLFastOpt - Advanced ensemble optimization system for LightGBM, XGBoost, and Random Forest hyperparameter tuning

Project description

🚀 MLFastOpt

High-Speed Bayesian Hyperparameter Optimization for ML Ensembles


Installation • Quick Start • Features • Documentation • Contributing


MLFastOpt is a production-ready framework for Bayesian hyperparameter optimization of LightGBM, XGBoost, and Random Forest ensemble models. It combines state-of-the-art Bayesian optimization algorithms with ensemble learning techniques.

✨ Features

| Feature | Description |
| --- | --- |
| 🎯 Bayesian Optimization | Two-phase optimization: quasi-random exploration followed by Bayesian exploitation |
| 🧩 Multi-Model Support | LightGBM, XGBoost, and Random Forest with a unified interface |
| 🔄 Ensemble Learning | Train N models per trial with different seeds; aggregate via soft/hard voting |
| ⚡ Parallel Training | Optional parallel ensemble training with joblib |
| 💾 Model Serialization | Trained model objects saved to disk automatically – deploy the actual ensemble, not a retrained single model |
| 📊 Rich Visualizations | Auto-generated optimization plots and feature importance charts |
| 🎛️ Flexible Configuration | Hierarchical JSON configs with YAML/Python parameter spaces |
| 🔬 SHAP Integration | Built-in SHAP feature importance analysis |
| 🌐 Web Dashboard | Interactive Flask-based visualization tools |

📦 Installation

pip install mlfastopt

Prerequisites

  • Python: 3.12+
  • macOS Users: Install OpenMP for LightGBM/XGBoost support:
    brew install libomp
    

🚀 Quick Start

1. Install the Package

pip install mlfastopt

2. Create Configuration Files

config.json - Main configuration:

{
  "data": {
    "path": "data/train.parquet",
    "label_column": "target",
    "features": ["feature1", "feature2", "feature3"],
    "class_weight": {"0": 1, "1": 5}
  },
  "model": {
    "type": "lightgbm",
    "hyperparameter_path": "config/hyperparameters.yaml",
    "ensemble_size": 10
  },
  "training": {
    "total_trials": 30,
    "sobol_trials": 10,
    "metric": "soft_recall",
    "parallel": true,
    "n_jobs": 4
  },
  "output": {
    "dir": "outputs/runs"
  }
}

config/hyperparameters.yaml - Parameter search space:

parameters:
  - name: learning_rate
    type: range
    bounds: [0.01, 0.3]
    value_type: float
    log_scale: true

  - name: max_depth
    type: range
    bounds: [3, 12]
    value_type: int

  - name: num_leaves
    type: range
    bounds: [20, 150]
    value_type: int

  - name: min_child_samples
    type: range
    bounds: [5, 100]
    value_type: int

3. Run Optimization

MLFastOpt offers two ways to run optimization:

Option A: Command Line (CLI)

# Set OMP_NUM_THREADS=1 to avoid LightGBM/XGBoost deadlocks
OMP_NUM_THREADS=1 mlfastopt-optimize --config config.json

Additional CLI options:

# Validate configuration without running
mlfastopt-optimize --config config.json --validate

# Override trials from command line
mlfastopt-optimize --config config.json --trials 50

# Start web dashboard
mlfastopt-web

# Analysis tools
mlfastopt-analyze

Option B: Python API

from mlfastopt import AEModelTuner

# Initialize with config file
tuner = AEModelTuner(config_path="config.json")

# Run optimization
results = tuner.run_complete_optimization()

# Access results programmatically
print(f"Best parameters: {results['best_parameters']}")
print(f"Output directory: {results['output_dir']}")

| Method | Best For |
| --- | --- |
| CLI | Quick runs, shell scripts, cron jobs, CI/CD pipelines |
| Python API | Jupyter notebooks, integration with larger applications, programmatic access to results |

4. View Results

Results are saved to outputs/runs/<timestamp>/:

  • best_parameters.json – Optimal hyperparameters + metrics (always written)
  • qualifying_trials_*.json – All trials meeting the threshold, with per-trial params + metrics
  • models/manifest.json – Index of every serialized model file
  • models/trial_NNNN_seed_SS.txt – Trained model binaries (LightGBM native .txt; .ubj for XGBoost, .pkl for Random Forest)
  • optimization_progress.png – Training curves
  • feature_importance.png – Feature importance plots
  • README.md – Run summary report

📖 How It Works

MLFastOpt uses a two-level nested optimization loop:

┌──────────────────────────────────────────────────────────────────┐
│ OUTER LOOP: Trial Iteration (total_trials = 30)                  │
│                                                                  │
│  Trial 1: {learning_rate: 0.05, max_depth: 7, ...}               │
│  ├── Train Model 1  (seed=42)                                    │
│  ├── Train Model 2  (seed=43)                                    │
│  ├── ...                                                         │
│  ├── Train Model 10 (seed=51)                                    │
│  └── Ensemble Prediction → Calculate Metrics → Update Optimizer  │
│                                                                  │
│  Trial 2: {learning_rate: 0.12, max_depth: 5, ...}               │
│  └── ... (same ensemble training)                                │
│                                                                  │
│  Phase 1: Quasi-random exploration (sobol_trials)                │
│  Phase 2: Bayesian optimization (remaining trials)               │
└──────────────────────────────────────────────────────────────────┘

Key concepts:

  • Trial: One hyperparameter configuration tested
  • Ensemble: N models trained per trial (different random seeds)
  • Soft Voting: Average probabilities across ensemble members
  • Hard Voting: Average binary predictions across ensemble members
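
A minimal sketch of the two voting schemes, assuming binary classifiers with a scikit-learn-style predict_proba(); the function names are illustrative, not MLFastOpt's internal API:

import numpy as np

def soft_vote(models, X, threshold=0.5):
    # Soft voting: average the positive-class probabilities across
    # ensemble members, then threshold the mean probability once.
    mean_proba = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
    return (mean_proba >= threshold).astype(int)

def hard_vote(models, X, threshold=0.5):
    # Hard voting: threshold each member's probability into a binary
    # prediction first, then average (majority-vote) those predictions.
    votes = [(m.predict_proba(X)[:, 1] >= threshold).astype(int) for m in models]
    return (np.mean(votes, axis=0) >= threshold).astype(int)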

⚙️ Configuration Reference

Data Section

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| path | string | Path to dataset (CSV, Parquet, or URL) | Required |
| label_column | string | Target column name | Required |
| features | list/string | Feature names or path to a YAML file | Required |
| class_weight | dict | Class weights for imbalanced data | None |
| test_size | float | Validation set proportion | 0.2 |
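
When features points to a YAML file rather than an inline list, the file holds the feature names. The exact schema is not documented on this page; by analogy with hyperparameters.yaml, a plausible (unverified) layout would be:

# features.yaml - hypothetical layout, not confirmed by the docs
features:
  - feature1
  - feature2
  - feature3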

Model Section

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| type | string | lightgbm, xgboost, or random_forest | lightgbm |
| hyperparameter_path | string | Path to parameter space file | Required |
| ensemble_size | int | Models per ensemble | 10 |

Training Section

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| total_trials | int | Total optimization trials | 30 |
| sobol_trials | int | Initial exploration trials | 10 |
| metric | string | Optimization metric | soft_recall |
| parallel | bool | Parallel ensemble training | false |
| n_jobs | int | CPU cores for parallel training | 4 |
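
With parallel set to true, the N ensemble members of a trial are independent and can be trained concurrently. A minimal sketch of the idea using joblib and LightGBM (illustrative only, not MLFastOpt's internals; trial_params, X_train, and y_train are placeholders):

from joblib import Parallel, delayed
import lightgbm as lgb

def train_member(seed, params, X_train, y_train):
    # One ensemble member: identical hyperparameters, different seed.
    return lgb.train({**params, "seed": seed, "verbosity": -1},
                     lgb.Dataset(X_train, label=y_train))

# ensemble_size=10 members trained across n_jobs=4 worker processes
models = Parallel(n_jobs=4)(
    delayed(train_member)(seed, trial_params, X_train, y_train)
    for seed in range(42, 52)
)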

Selection Section

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| threshold_saving_enabled | bool | Save all trials meeting the metric threshold (and serialize their model files) | true |
| metric | string | Metric used for threshold comparison | soft_recall |
| threshold_value | float | Minimum metric value to qualify a trial for saving | 0.85 |
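
These options are not shown in the quick-start config above. Assuming they live under a top-level "selection" key (the key name is inferred from the section title, so treat this fragment as a sketch), the config.json addition would look like:

"selection": {
  "threshold_saving_enabled": true,
  "metric": "soft_recall",
  "threshold_value": 0.85
}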

Available Metrics

| Metric | Description |
| --- | --- |
| soft_recall | Recall using probability averaging |
| soft_f1_score | F1 score using soft voting |
| soft_precision | Precision using soft voting |
| soft_roc_auc | AUC-ROC score |
| neg_log_loss | Negative log loss |
| hard_recall | Recall using hard voting |
| hard_f1_score | F1 score using hard voting |
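
As an illustration of the soft_* family, soft_recall can be reproduced from the soft_vote sketch in the How It Works section, assuming binary labels and a default 0.5 threshold (MLFastOpt's exact thresholding may differ; y_val and X_val are placeholder validation arrays):

from sklearn.metrics import recall_score

# Recall computed on soft-voted ensemble predictions (see soft_vote above)
soft_recall = recall_score(y_val, soft_vote(models, X_val))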

📊 Output Files

After optimization, find results in outputs/runs/<timestamp>/:

outputs/runs/20240205_143022/
├── best_parameters.json        # Best trial's hyperparameters & metrics (always written)
├── qualifying_trials_*.json    # All threshold-qualifying trials (threshold mode)
├── config.json                 # Configuration used for this run
├── optimization_progress.png   # Metric curves across all trials
├── feature_importance.png      # Feature importance chart
├── feature_importance.csv      # Numerical importance data
├── README.md                   # Run summary report
└── models/
    ├── manifest.json           # Index: trial → seed → file path + metrics
    ├── trial_0003_seed_00.txt  # LightGBM native format (.ubj for XGBoost,
    ├── trial_0003_seed_01.txt  #   .pkl for RandomForest)
    └── ...                     # One file per sub-model in each qualifying trial

Loading Saved Models for Inference

import json
import lightgbm as lgb
import numpy as np

# Read the manifest
with open("outputs/runs/<timestamp>/models/manifest.json") as f:
    manifest = json.load(f)

# Load all sub-models for the first qualifying trial
trial = manifest["trials"][0]
models = [lgb.Booster(model_file=sub["file"]) for sub in trial["sub_models"]]

# Ensemble soft-vote prediction
probas = np.mean([m.predict(X_new) for m in models], axis=0)

Why save model files? Metrics reported during optimization reflect ensemble performance (N models averaged together). Deploying the saved ensemble directly guarantees you get the same performance at inference; no re-training required.
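
For the other model types the loading call differs. A sketch based on the file extensions listed in the tree above (paths are placeholders following the same naming pattern):

import pickle
import xgboost as xgb

# XGBoost: native .ubj files load into a Booster
booster = xgb.Booster()
booster.load_model("outputs/runs/<timestamp>/models/trial_0003_seed_00.ubj")

# Random Forest: .pkl files are standard pickles of scikit-learn estimators
with open("outputs/runs/<timestamp>/models/trial_0003_seed_00.pkl", "rb") as f:
    rf_model = pickle.load(f)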

📧 Support

For questions, issues, or feature requests, please contact us at contact@genxai.cc.

📄 License

This is proprietary software. See the LICENSE file for details.

๐Ÿข About

Developed by GenX AI Lab - Building intelligent AI solutions.



Download files

Download the file for your platform.

Source Distribution

mlfastopt-0.0.10.tar.gz (75.8 kB)


Built Distribution


mlfastopt-0.0.10-py3-none-any.whl (77.2 kB)


File details

Details for the file mlfastopt-0.0.10.tar.gz.

File metadata

  • Download URL: mlfastopt-0.0.10.tar.gz
  • Upload date:
  • Size: 75.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for mlfastopt-0.0.10.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | cd351ec01f81e5a15f253b21cfff4afa92e88537d38fa501fa2d64a6381b5bc1 |
| MD5 | a118adda6f730c543013ec47433b7245 |
| BLAKE2b-256 | db6f620e498fa97e79bec8e765c2960284943652e27ee0627a31c90bb97fa9ab |


File details

Details for the file mlfastopt-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: mlfastopt-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 77.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for mlfastopt-0.0.10-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | f16b46c0db0a2e3d67150a719e65b6f6aed1756fc2f9e5a532eb1613e95430c1 |
| MD5 | 91a3c51ab7427ccbda090d4834560cb8 |
| BLAKE2b-256 | 4da96a837cba34ea565b0c3cd1ab2c2a439e42fd489c3314d0c4ebc7ba5ef484 |

