MLFastOpt
High-Speed Bayesian Hyperparameter Optimization for ML Ensembles
Installation • Quick Start • Features • Documentation • Contributing
MLFastOpt is a production-ready framework for Bayesian hyperparameter optimization of LightGBM, XGBoost, and Random Forest ensemble models. It combines state-of-the-art Bayesian optimization algorithms with ensemble learning techniques.
Features
| Feature | Description |
|---|---|
| Bayesian Optimization | Two-phase optimization: quasi-random exploration followed by Bayesian exploitation |
| Multi-Model Support | LightGBM, XGBoost, and Random Forest with a unified interface |
| Ensemble Learning | Train N models per trial with different seeds, aggregate via soft/hard voting |
| Parallel Training | Optional parallel ensemble training with joblib |
| Model Serialization | Trained model objects saved to disk automatically; deploy the actual ensemble, not a retrained single model |
| Rich Visualizations | Auto-generated optimization plots and feature importance charts |
| Flexible Configuration | Hierarchical JSON configs with YAML/Python parameter spaces |
| SHAP Integration | Built-in SHAP feature importance analysis |
| Web Dashboard | Interactive Flask-based visualization tools |
Installation
pip install mlfastopt
Prerequisites
- Python: 3.12+
- macOS Users: Install OpenMP for LightGBM/XGBoost support:
brew install libomp
Quick Start
1. Install the Package
pip install mlfastopt
2. Create Configuration Files
config.json - Main configuration:
{
"data": {
"path": "data/train.parquet",
"label_column": "target",
"features": ["feature1", "feature2", "feature3"],
"class_weight": {"0": 1, "1": 5}
},
"model": {
"type": "lightgbm",
"hyperparameter_path": "config/hyperparameters.yaml",
"ensemble_size": 10
},
"training": {
"total_trials": 30,
"sobol_trials": 10,
"metric": "soft_recall",
"parallel": true,
"n_jobs": 4
},
"output": {
"dir": "outputs/runs"
}
}
config/hyperparameters.yaml - Parameter search space:
parameters:
- name: learning_rate
type: range
bounds: [0.01, 0.3]
value_type: float
log_scale: true
- name: max_depth
type: range
bounds: [3, 12]
value_type: int
- name: num_leaves
type: range
bounds: [20, 150]
value_type: int
- name: min_child_samples
type: range
bounds: [5, 100]
value_type: int
3. Run Optimization
MLFastOpt offers two ways to run optimization:
Option A: Command Line (CLI)
# Set OMP_NUM_THREADS=1 to avoid LightGBM/XGBoost deadlocks
OMP_NUM_THREADS=1 mlfastopt-optimize --config config.json
Additional CLI options:
# Validate configuration without running
mlfastopt-optimize --config config.json --validate
# Override trials from command line
mlfastopt-optimize --config config.json --trials 50
# Start web dashboard
mlfastopt-web
# Analysis tools
mlfastopt-analyze
Option B: Python API
from mlfastopt import AEModelTuner
# Initialize with config file
tuner = AEModelTuner(config_path="config.json")
# Run optimization
results = tuner.run_complete_optimization()
# Access results programmatically
print(f"Best parameters: {results['best_parameters']}")
print(f"Output directory: {results['output_dir']}")
| Method | Best For |
|---|---|
| CLI | Quick runs, shell scripts, cron jobs, CI/CD pipelines |
| Python API | Jupyter notebooks, integration with larger applications, programmatic access to results |
4. View Results
Results are saved to outputs/runs/<timestamp>/:
- best_parameters.json – Optimal hyperparameters + metrics (always written)
- qualifying_trials_*.json – All trials meeting the threshold, with per-trial params + metrics
- models/manifest.json – Index of every serialized model file
- models/trial_NNNN_seed_SS.txt – Trained model binaries (LightGBM native format; .pkl for other types)
- optimization_progress.png – Training curves
- feature_importance.png – Feature importance plots
- README.md – Run summary report
How It Works
MLFastOpt uses a two-level nested optimization loop:
OUTER LOOP: Trial Iteration (total_trials = 30)

  Trial 1: {learning_rate: 0.05, max_depth: 7, ...}
  ├── Train Model 1  (seed=42)
  ├── Train Model 2  (seed=43)
  ├── ...
  ├── Train Model 10 (seed=51)
  └── Ensemble Prediction → Calculate Metrics → Update Optimizer

  Trial 2: {learning_rate: 0.12, max_depth: 5, ...}
  └── ... (same ensemble training)

  Phase 1: Quasi-random exploration (sobol_trials)
  Phase 2: Bayesian optimization (remaining trials)
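The diagram above corresponds roughly to the following self-contained sketch, which uses a fixed list of learning rates in place of the Sobol/Bayesian suggestion step and a toy dataset (an illustration of the trial/ensemble structure only, not MLFastOpt internals):
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
best = None
for trial, lr in enumerate([0.03, 0.05, 0.1, 0.2, 0.3]):   # OUTER LOOP: one config per trial
    params = {"learning_rate": lr, "max_depth": 7, "n_estimators": 100}
    probas = []
    for seed in range(3):                                   # INNER LOOP: one model per seed
        model = lgb.LGBMClassifier(**params, random_state=seed)
        model.fit(X_tr, y_tr)
        probas.append(model.predict_proba(X_val)[:, 1])
    ensemble_proba = np.mean(probas, axis=0)                # soft voting: average probabilities
    score = recall_score(y_val, (ensemble_proba >= 0.5).astype(int))  # soft_recall-style metric
    if best is None or score > best[0]:
        best = (score, params)
print("best:", best)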
Key concepts:
- Trial: One hyperparameter configuration tested
- Ensemble: N models trained per trial (different random seeds)
- Soft Voting: Average probabilities across ensemble members
- Hard Voting: Average binary predictions across ensemble members
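For example, with three ensemble members and two validation rows, the two voting schemes aggregate as follows (plain numpy, illustration only):
import numpy as np
# Positive-class probabilities from 3 ensemble members for 2 validation rows
member_probas = np.array([[0.70, 0.40],
                          [0.55, 0.45],
                          [0.60, 0.35]])
soft_vote = member_probas.mean(axis=0)           # average probabilities      -> [0.617, 0.400]
hard_vote = (member_probas >= 0.5).mean(axis=0)  # average binary predictions -> [1.0, 0.0]
print(soft_vote, hard_vote)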
Configuration Reference
Data Section
| Parameter | Type | Description | Default |
|---|---|---|---|
| `path` | string | Path to dataset (CSV, Parquet, or URL) | Required |
| `label_column` | string | Target column name | Required |
| `features` | list/string | Feature names or path to YAML file | Required |
| `class_weight` | dict | Class weights for imbalanced data | None |
| `test_size` | float | Validation set proportion | 0.2 |
Model Section
| Parameter | Type | Description | Default |
|---|---|---|---|
| `type` | string | `lightgbm`, `xgboost`, or `random_forest` | `lightgbm` |
| `hyperparameter_path` | string | Path to parameter space file | Required |
| `ensemble_size` | int | Models per ensemble | 10 |
Training Section
| Parameter | Type | Description | Default |
|---|---|---|---|
| `total_trials` | int | Total optimization trials | 30 |
| `sobol_trials` | int | Initial exploration trials | 10 |
| `metric` | string | Optimization metric | `soft_recall` |
| `parallel` | bool | Parallel ensemble training | false |
| `n_jobs` | int | CPU cores for parallel training | 4 |
Selection Section
| Parameter | Type | Description | Default |
|---|---|---|---|
| `threshold_saving_enabled` | bool | Save all trials meeting the metric threshold (and serialize their model files) | true |
| `metric` | string | Metric used for threshold comparison | `soft_recall` |
| `threshold_value` | float | Minimum metric value to qualify a trial for saving | 0.85 |
Available Metrics
| Metric | Description |
|---|---|
| `soft_recall` | Recall using probability averaging |
| `soft_f1_score` | F1 score using soft voting |
| `soft_precision` | Precision using soft voting |
| `soft_roc_auc` | AUC-ROC score |
| `neg_log_loss` | Negative log loss |
| `hard_recall` | Recall using hard voting |
| `hard_f1_score` | F1 using hard voting |
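As a rough illustration of how the soft_* and hard_* variants differ (plausible definitions based on the descriptions above, not necessarily the library's exact implementation):
import numpy as np
from sklearn.metrics import recall_score
# Positive-class probabilities from 2 ensemble members for 3 validation rows
member_probas = np.array([[0.80, 0.30, 0.55],
                          [0.70, 0.60, 0.40]])
y_val = np.array([1, 0, 1])
soft_pred = (member_probas.mean(axis=0) >= 0.5).astype(int)           # average probabilities, then threshold
hard_pred = ((member_probas >= 0.5).mean(axis=0) >= 0.5).astype(int)  # threshold each member, then majority
print(recall_score(y_val, soft_pred), recall_score(y_val, hard_pred))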
Output Files
After optimization, find results in outputs/runs/<timestamp>/:
outputs/runs/20240205_143022/
├── best_parameters.json        # Best trial's hyperparameters & metrics (always written)
├── qualifying_trials_*.json    # All threshold-qualifying trials (threshold mode)
├── config.json                 # Configuration used for this run
├── optimization_progress.png   # Metric curves across all trials
├── feature_importance.png      # Feature importance chart
├── feature_importance.csv      # Numerical importance data
├── README.md                   # Run summary report
└── models/
    ├── manifest.json           # Index: trial → seed → file path + metrics
    ├── trial_0003_seed_00.txt  # LightGBM native format (.ubj for XGBoost,
    ├── trial_0003_seed_01.txt  #   .pkl for RandomForest)
    └── ...                     # One file per sub-model in each qualifying trial
Loading Saved Models for Inference
import json
import lightgbm as lgb
import numpy as np
# Read the manifest
with open("outputs/runs/<timestamp>/models/manifest.json") as f:
manifest = json.load(f)
# Load all sub-models for the first qualifying trial
trial = manifest["trials"][0]
models = [lgb.Booster(model_file=sub["file"]) for sub in trial["sub_models"]]
# Ensemble soft-vote prediction
probas = np.mean([m.predict(X_new) for m in models], axis=0)
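To turn the averaged probabilities into class labels, apply a decision threshold (0.5 here is an assumption; choose the cutoff that suits your use case):
labels = (probas >= 0.5).astype(int)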
Why save model files? Metrics reported during optimization reflect ensemble performance (N models averaged together). Deploying the saved ensemble directly guarantees you get the same performance at inference; no re-training required.
Support
For questions, issues, or feature requests, please contact us at contact@genxai.cc.
License
This is proprietary software. See the LICENSE file for details.
About
Developed by GenX AI Lab - Building intelligent AI solutions.