MLFastOpt
High-Speed Bayesian Hyperparameter Optimization for ML Ensembles
Installation • Quick Start • Features • Documentation • Contributing
MLFastOpt is a production-ready framework for Bayesian hyperparameter optimization of LightGBM, XGBoost, and Random Forest ensemble models. It combines state-of-the-art Bayesian optimization algorithms with ensemble learning techniques.
Features
| Feature | Description |
|---|---|
| Bayesian Optimization | Two-phase optimization: quasi-random exploration followed by Bayesian exploitation |
| Multi-Model Support | LightGBM, XGBoost, and Random Forest with a unified interface |
| Ensemble Learning | Train N models per trial with different seeds, aggregate via soft/hard voting |
| Parallel Training | Optional parallel ensemble training with joblib |
| Model Serialization | Trained model objects saved to disk automatically; deploy the actual ensemble, not a retrained single model |
| Rich Visualizations | Auto-generated optimization plots and feature importance charts |
| Flexible Configuration | Hierarchical JSON configs with YAML/Python parameter spaces |
| SHAP Integration | Built-in SHAP feature importance analysis |
| Web Dashboard | Interactive Flask-based visualization tools |
Installation
pip install mlfastopt
Prerequisites
- Python: 3.12+
- macOS Users: Install OpenMP for LightGBM/XGBoost support:
brew install libomp
Quick Start
1. Install the Package
pip install mlfastopt
2. Create Configuration Files
config.json - Main configuration:
{
"data": {
"path": "data/train.parquet",
"label_column": "target",
"features": ["feature1", "feature2", "feature3"],
"class_weight": {"0": 1, "1": 5}
},
"model": {
"type": "lightgbm",
"hyperparameter_path": "config/hyperparameters.yaml",
"ensemble_size": 10
},
"training": {
"total_trials": 30,
"sobol_trials": 10,
"metric": "soft_recall",
"parallel": true,
"n_jobs": 4
},
"output": {
"dir": "outputs/runs"
}
}
config/hyperparameters.yaml - Parameter search space:
parameters:
- name: learning_rate
type: range
bounds: [0.01, 0.3]
value_type: float
log_scale: true
- name: max_depth
type: range
bounds: [3, 12]
value_type: int
- name: num_leaves
type: range
bounds: [20, 150]
value_type: int
- name: min_child_samples
type: range
bounds: [5, 100]
value_type: int
3. Run Optimization
MLFastOpt offers two ways to run optimization:
Option A: Command Line (CLI)
# Set OMP_NUM_THREADS=1 to avoid LightGBM/XGBoost deadlocks
OMP_NUM_THREADS=1 mlfastopt-optimize --config config.json
Additional CLI options:
# Validate configuration without running
mlfastopt-optimize --config config.json --validate
# Override trials from command line
mlfastopt-optimize --config config.json --trials 50
# Start web dashboard
mlfastopt-web
# Analysis tools
mlfastopt-analyze
Option B: Python API
from mlfastopt import AEModelTuner
# Initialize with config file
tuner = AEModelTuner(config_path="config.json")
# Run optimization
results = tuner.run_complete_optimization()
# Access results programmatically
print(f"Best parameters: {results['best_parameters']}")
print(f"Output directory: {results['output_dir']}")
| Method | Best For |
|---|---|
| CLI | Quick runs, shell scripts, cron jobs, CI/CD pipelines |
| Python API | Jupyter notebooks, integration with larger applications, programmatic access to results |
4. View Results
Results are saved to outputs/runs/<timestamp>/:
- best_parameters.json – Optimal hyperparameters + metrics (always written)
- qualifying_trials_*.json – All trials meeting the threshold, with per-trial params + metrics
- models/manifest.json – Index of every serialized model file
- models/trial_NNNN_seed_SS.txt – Trained model binaries (LightGBM native format; .pkl for other types)
- optimization_progress.png – Training curves
- feature_importance.png – Feature importance plots
- README.md – Run summary report
How It Works
MLFastOpt uses a two-level nested optimization loop:
OUTER LOOP: Trial Iteration (total_trials = 30)

  Trial 1: {learning_rate: 0.05, max_depth: 7, ...}
  ├── Train Model 1  (seed=42)
  ├── Train Model 2  (seed=43)
  ├── ...
  ├── Train Model 10 (seed=51)
  └── Ensemble Prediction → Calculate Metrics → Update Optimizer

  Trial 2: {learning_rate: 0.12, max_depth: 5, ...}
  └── ... (same ensemble training)

  Phase 1: Quasi-random exploration (sobol_trials)
  Phase 2: Bayesian optimization (remaining trials)
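The diagram above corresponds roughly to the following self-contained sketch, which uses a fixed list of learning rates in place of the Sobol/Bayesian suggestion step and a toy dataset (an illustration of the trial/ensemble structure only, not MLFastOpt internals):
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
best = None
for trial, lr in enumerate([0.03, 0.05, 0.1, 0.2, 0.3]):   # OUTER LOOP: one config per trial
    params = {"learning_rate": lr, "max_depth": 7, "n_estimators": 100}
    probas = []
    for seed in range(3):                                   # INNER LOOP: one model per seed
        model = lgb.LGBMClassifier(**params, random_state=seed)
        model.fit(X_tr, y_tr)
        probas.append(model.predict_proba(X_val)[:, 1])
    ensemble_proba = np.mean(probas, axis=0)                # soft voting: average probabilities
    score = recall_score(y_val, (ensemble_proba >= 0.5).astype(int))  # soft_recall-style metric
    if best is None or score > best[0]:
        best = (score, params)
print("best:", best)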
Key concepts:
- Trial: One hyperparameter configuration tested
- Ensemble: N models trained per trial (different random seeds)
- Soft Voting: Average probabilities across ensemble members
- Hard Voting: Average binary predictions across ensemble members
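For example, with three ensemble members and two validation rows, the two voting schemes aggregate as follows (plain numpy, illustration only):
import numpy as np
# Positive-class probabilities from 3 ensemble members for 2 validation rows
member_probas = np.array([[0.70, 0.40],
                          [0.55, 0.45],
                          [0.60, 0.35]])
soft_vote = member_probas.mean(axis=0)           # average probabilities      -> [0.617, 0.400]
hard_vote = (member_probas >= 0.5).mean(axis=0)  # average binary predictions -> [1.0, 0.0]
print(soft_vote, hard_vote)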
Configuration Reference
Data Section
| Parameter | Type | Description | Default |
|---|---|---|---|
| `path` | string | Path to dataset (CSV, Parquet, or URL) | Required |
| `label_column` | string | Target column name | Required |
| `features` | list/string | Feature names or path to YAML file | Required |
| `class_weight` | dict | Class weights for imbalanced data | None |
| `test_size` | float | Validation set proportion | 0.2 |
Model Section
| Parameter | Type | Description | Default |
|---|---|---|---|
| `type` | string | `lightgbm`, `xgboost`, or `random_forest` | `lightgbm` |
| `hyperparameter_path` | string | Path to parameter space file | Required |
| `ensemble_size` | int | Models per ensemble | 10 |
Training Section
| Parameter | Type | Description | Default |
|---|---|---|---|
| `total_trials` | int | Total optimization trials | 30 |
| `sobol_trials` | int | Initial exploration trials | 10 |
| `metric` | string | Optimization metric | `soft_recall` |
| `parallel` | bool | Parallel ensemble training | false |
| `n_jobs` | int | CPU cores for parallel training | 4 |
Selection Section
| Parameter | Type | Description | Default |
|---|---|---|---|
| `threshold_saving_enabled` | bool | Save all trials meeting the metric threshold (and serialize their model files) | true |
| `metric` | string | Metric used for threshold comparison | `soft_recall` |
| `threshold_value` | float | Minimum metric value to qualify a trial for saving | 0.85 |
Available Metrics
| Metric | Description |
|---|---|
| `soft_recall` | Recall using probability averaging |
| `soft_f1_score` | F1 score using soft voting |
| `soft_precision` | Precision using soft voting |
| `soft_roc_auc` | AUC-ROC score |
| `neg_log_loss` | Negative log loss |
| `hard_recall` | Recall using hard voting |
| `hard_f1_score` | F1 using hard voting |
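As a rough illustration of how the soft_* and hard_* variants differ (plausible definitions based on the descriptions above, not necessarily the library's exact implementation):
import numpy as np
from sklearn.metrics import recall_score
# Positive-class probabilities from 2 ensemble members for 3 validation rows
member_probas = np.array([[0.80, 0.30, 0.55],
                          [0.70, 0.60, 0.40]])
y_val = np.array([1, 0, 1])
soft_pred = (member_probas.mean(axis=0) >= 0.5).astype(int)           # average probabilities, then threshold
hard_pred = ((member_probas >= 0.5).mean(axis=0) >= 0.5).astype(int)  # threshold each member, then majority
print(recall_score(y_val, soft_pred), recall_score(y_val, hard_pred))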
Output Files
After optimization, find results in outputs/runs/<timestamp>/:
outputs/runs/20240205_143022/
├── best_parameters.json        # Best trial's hyperparameters & metrics (always written)
├── qualifying_trials_*.json    # All threshold-qualifying trials (threshold mode)
├── config.json                 # Configuration used for this run
├── optimization_progress.png   # Metric curves across all trials
├── feature_importance.png      # Feature importance chart
├── feature_importance.csv      # Numerical importance data
├── README.md                   # Run summary report
└── models/
    ├── manifest.json           # Index: trial → seed → file path + metrics
    ├── trial_0003_seed_00.txt  # LightGBM native format (.ubj for XGBoost,
    ├── trial_0003_seed_01.txt  #   .pkl for RandomForest)
    └── ...                     # One file per sub-model in each qualifying trial
Loading Saved Models for Inference
import json
import lightgbm as lgb
import numpy as np
# Read the manifest
with open("outputs/runs/<timestamp>/models/manifest.json") as f:
manifest = json.load(f)
# Load all sub-models for the first qualifying trial
trial = manifest["trials"][0]
models = [lgb.Booster(model_file=sub["file"]) for sub in trial["sub_models"]]
# Ensemble soft-vote prediction
probas = np.mean([m.predict(X_new) for m in models], axis=0)
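To turn the averaged probabilities into class labels, apply a decision threshold (0.5 here is an assumption; choose the cutoff that suits your use case):
labels = (probas >= 0.5).astype(int)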
Why save model files? Metrics reported during optimization reflect ensemble performance (N models averaged together). Deploying the saved ensemble directly guarantees you get the same performance at inference; no re-training required.
Support
For questions, issues, or feature requests, please contact us at contact@genxai.cc.
License
This is proprietary software. See the LICENSE file for details.
About
Developed by GenX AI Lab - Building intelligent AI solutions.