Sample-Efficient Bayesian Optimizer — GP surrogate with ensemble acquisition


SEBO — Sample-Efficient Bayesian Optimizer


I designed and implemented a GP-based sequential optimizer from scratch and applied it to a task in the format of the NeurIPS 2020 Black-Box Optimisation Challenge: 8 unknown objective functions (2D to 8D), one evaluation per function per round, 13 rounds. I then benchmarked it head-to-head against common open-source solvers (Optuna-TPE, TuRBO, DE-GP-EI) on identical observation histories. The core contribution is a full BO pipeline: automatic kernel selection by log-marginal likelihood, ensemble acquisition (EI + PI + UCB with centroid fallback), and output warping for skewed objectives. The same suggest / observe API carries over to AutoML hyperparameter search, drug discovery, and materials design, or any other setting where evaluations are expensive and every query counts.

Use SEBO

pip install git+https://github.com/karefyllidis/SEBO.git

Or clone for development:

git clone https://github.com/karefyllidis/SEBO.git && cd SEBO && pip install -e .

With benchmark solvers (Optuna, TuRBO):

pip install "sebo[benchmark] @ git+https://github.com/karefyllidis/SEBO.git"

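import numpy as np
from sebo import BayesianOptimizer

# Toy stand-ins so the snippet runs end to end; replace with your own data and objective.
def oracle(x):
    # Strictly positive output, so the log warping below is valid.
    return float(np.exp(-np.sum((np.asarray(x) - 0.5) ** 2)))

rng = np.random.default_rng(0)
X_init = rng.uniform(size=(10, 4))               # 10 warm-start points in [0, 1]^4
y_init = np.array([oracle(x) for x in X_init])
n_rounds = 13

optimizer = BayesianOptimizer(
    bounds=[(0.0, 1.0)] * 4,     # search space, any dimension
    output_warping="log",        # for skewed objectives (log or boxcox)
    use_ensemble=True,           # EI + PI + UCB with centroid fallback
)
optimizer.fit(X_init, y_init)    # warm-start with existing observations

for _ in range(n_rounds):
    x_next = optimizer.suggest()         # GP surrogate + ensemble acquisition
    y_next = oracle(x_next)              # your expensive function here
    optimizer.observe(x_next, y_next)    # update the surrogate

print(optimizer.best)   # (best_x, best_y)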

See notebooks/demo_sklearn_hpo.ipynb for a fully worked example — no external data required.

Benchmark

notebooks/sebo_benchmark.ipynb: SEBO (built from scratch) benchmarked against common open-source solvers (Optuna-TPE, TuRBO, DE-GP-EI) and Random Search on 6 synthetic black-box functions spanning four orders of magnitude in output scale (log-warping on F3, asymmetric Gaussian peaks on F6). Stopping is adaptive: all solvers halt as soon as any one reaches ≥99% of the true maximum (cap: 80 evaluations), so each subplot shows a different evaluation count depending on function difficulty.
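The adaptive stopping rule can be sketched in a few lines. This is an illustration only, not the notebook's code: the function name stopping_index and the histories dictionary are hypothetical, and the sketch assumes a positive true maximum so that "99% of the maximum" is a lower target.

```python
def stopping_index(histories, true_max, threshold=0.99, cap=80):
    """Return the evaluation count at which every solver is halted.

    histories maps a solver name to its list of observed y values in
    evaluation order. Assumes true_max > 0 (hypothetical simplification).
    """
    target = threshold * true_max
    horizon = min(cap, max(len(h) for h in histories.values()))
    best = {name: float("-inf") for name in histories}
    for t in range(horizon):
        # Update each solver's incumbent with its t-th evaluation.
        for name, ys in histories.items():
            if t < len(ys):
                best[name] = max(best[name], ys[t])
        # Stop all solvers as soon as any incumbent reaches the target.
        if any(b >= target for b in best.values()):
            return t + 1
    return horizon


demo_histories = {
    "solver_a": [0.10, 0.50, 0.995, 1.00],
    "solver_b": [0.20, 0.30, 0.40, 0.50],
}
stop_at = stopping_index(demo_histories, true_max=1.0)
print(stop_at)
```

Here solver_a crosses the 0.99 target on its third evaluation, so both solvers are cut off at three evaluations.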

SEBO Benchmark Convergence

Incumbent best-y convergence. Green band = LHS warm-start. Dashed black line = true maximum. Dash-dot vertical line = stopping point (first solver to reach ≥99% of true max).

notebooks/demo_sklearn_hpo.ipynb — self-contained HPO demo. Tunes a RandomForestClassifier on sklearn's Digits dataset (4D search space: n_estimators, max_depth, min_samples_split, max_features). 10 LHS warm-start + 20 BO iterations vs 30 random search evaluations.

The Problem

Eight unknown objective functions, dimensions 2–8, domain [0, 1]^d. No formula, no gradients. One evaluation per function per round, across 13 rounds. Maximise each within budget.

#	Dim	Real-world analogy	Landscape character
F1	2D	Radiation detection	Sparse signal; near-zero almost everywhere with a narrow high-value peak
F2	2D	Unknown ML model	Noisy; multiple local maxima
F3	3D	Drug discovery	Smooth, always negative; optimisation = least negative
F4	4D	Warehouse logistics	Many local optima, extreme outliers
F5	4D	Chemical process yield	Unimodal; output spans orders of magnitude near domain boundary
F6	5D	Recipe formulation	Noisy oracle; same input returned different y across rounds
F7	6D	Hyperparameter tuning	Sparse in 6D; smooth locally
F8	8D	High-dimensional ML model	Hardest; strong cumulative improvement with coverage

Higher y is always better; F3 and F6 outputs are negative.

Results

Best observed y after 13 rounds (10 warm-start points per function):

Function	Initial best y	Final best y	Improvement
F1	~0.0	0.6704	Large — narrow peak located in round 10
F2	~0.19	0.7248	Large
F3	~−0.44	−0.0032	Large (always negative; less negative = better)
F4	~0.04	0.2987	Moderate
F5	~1700	7493.9	Very large — near-boundary region [0.99, …] confirmed
F6	~−1.3	−0.1402	Large
F7	~0.003	2.7968	Large
F8	~5.6	9.9619	Large

Full per-round strategy notes and GP diagnostics: docs/model_card.md.

GP Surrogate Evolution — Function 3 (Drug Discovery, 3D)

Pairwise IDW projections evolution — Function 3

Weekly evolution of pairwise IDW-interpolated projections of observed y. Red dots are evaluations numbered by round. Colour scale fixed across all frames for direct comparison. Regenerate with python scripts/export_function3_gp_evolution_gif.py once local observations.csv is present.

Methodology

Bayesian Optimisation maintains a probabilistic surrogate (GP) over the unknown function and uses it to select the next query — balancing exploitation of known good regions against exploration of uncertain ones.

fit GP → maximise acquisition → evaluate f(x*) → append (x*, y*) → repeat
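A minimal sketch of this loop, assuming scikit-learn's GaussianProcessRegressor as the surrogate and Expected Improvement maximised over random candidates. The objective toy_objective, the candidate count, and the fixed Matérn kernel are illustrative choices, not SEBO's actual implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def toy_objective(x):
    """Hypothetical stand-in for an expensive black-box function."""
    return float(np.exp(-20.0 * np.sum((x - 0.3) ** 2)))


def expected_improvement(mu, sigma, best_y):
    """EI for maximisation; sigma is clipped to avoid division by zero."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
dim = 2
X = rng.uniform(size=(5, dim))                 # initial design in [0, 1]^d
y = np.array([toy_objective(x) for x in X])
initial_best = y.max()

for _ in range(10):
    # fit GP
    gp = GaussianProcessRegressor(kernel=Matern(nu=1.5), normalize_y=True, random_state=0)
    gp.fit(X, y)
    # maximise acquisition over a random candidate set
    candidates = rng.uniform(size=(2000, dim))
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
    # evaluate f(x*) and append (x*, y*)
    y_next = toy_objective(x_next)
    X = np.vstack([X, x_next])
    y = np.append(y, y_next)

final_best = y.max()
print(initial_best, final_best)
```

Random candidate search keeps the sketch short; a gradient-based or Sobol-seeded inner optimiser would be a natural replacement in higher dimensions.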

Kernel Selection

At each round, three kernels compete for the best log-marginal likelihood (LML): RBF, Matérn ν=1.5, and RBF + WhiteKernel. The winner is selected automatically; hyperparameters are tuned by L-BFGS-B MLE with multiple restarts.
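The selection step can be sketched with scikit-learn, whose GaussianProcessRegressor tunes kernel hyperparameters with L-BFGS-B by default and exposes the fitted log-marginal likelihood. The helper name select_kernel_by_lml, the initial length scales, and the restart count are assumptions for illustration; SEBO's own code may differ.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, WhiteKernel


def select_kernel_by_lml(X, y, n_restarts=5, seed=0):
    """Fit one GP per candidate kernel and keep the one with the highest LML."""
    candidates = {
        "RBF": RBF(length_scale=0.5),
        "Matern-1.5": Matern(length_scale=0.5, nu=1.5),
        "RBF+White": RBF(length_scale=0.5) + WhiteKernel(noise_level=1e-2),
    }
    fitted = {}
    for name, kernel in candidates.items():
        gp = GaussianProcessRegressor(
            kernel=kernel,
            normalize_y=True,
            n_restarts_optimizer=n_restarts,  # extra random restarts of the MLE
            random_state=seed,
        )
        gp.fit(X, y)
        fitted[name] = gp
    lml = {name: gp.log_marginal_likelihood_value_ for name, gp in fitted.items()}
    best_name = max(lml, key=lml.get)
    return best_name, fitted[best_name], lml


rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(20, 2))
y_demo = np.sin(3.0 * X_demo[:, 0]) + X_demo[:, 1] + 0.05 * rng.normal(size=20)
best_name, best_gp, lml_by_kernel = select_kernel_by_lml(X_demo, y_demo)
print(best_name, lml_by_kernel)
```

Comparing raw LML values across kernels with different numbers of hyperparameters slightly favours the more flexible kernel; with only three candidates this is usually an acceptable simplification.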

Ensemble Acquisition

Three acquisition functions run simultaneously — EI, PI, and UCB. If their suggested next points are close together (agree), SEBO follows the EI recommendation. If they diverge (disagree), SEBO queries the centroid of all three — a soft blend that avoids over-committing to one strategy.
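The agree/disagree rule can be sketched as follows. The text above does not say how closeness is measured, so this sketch assumes the largest pairwise Euclidean distance between the three suggestions compared against a hypothetical tolerance agree_tol; the defaults for xi and kappa are also illustrative.

```python
import numpy as np
from scipy.stats import norm


def ensemble_suggest(candidates, mu, sigma, best_y, xi=0.01, kappa=2.0, agree_tol=0.1):
    """Pick the next query from EI, PI and UCB with a centroid fallback.

    candidates: (n, d) array of points; mu, sigma: GP posterior mean and std there.
    Returns (x_next, mode) where mode is "ei" or "centroid".
    """
    sigma = np.maximum(sigma, 1e-12)
    improvement = mu - best_y - xi
    z = improvement / sigma
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)   # Expected Improvement
    pi = norm.cdf(z)                                       # Probability of Improvement
    ucb = mu + kappa * sigma                               # Upper Confidence Bound
    picks = np.array([candidates[np.argmax(a)] for a in (ei, pi, ucb)])
    # Largest pairwise distance between the three suggested points.
    diffs = picks[:, None, :] - picks[None, :, :]
    spread = np.sqrt((diffs ** 2).sum(axis=-1)).max()
    if spread <= agree_tol:
        return picks[0], "ei"          # agreement: follow EI
    return picks.mean(axis=0), "centroid"  # disagreement: blend all three


# Agreement: one candidate clearly dominates under all three criteria.
x_a, mode_a = ensemble_suggest(
    np.array([[0.1], [0.5], [0.9]]),
    mu=np.array([0.0, 1.0, 0.0]),
    sigma=np.array([0.1, 0.1, 0.1]),
    best_y=0.5,
)
# Disagreement: PI prefers the near-certain small gain, EI and UCB the uncertain point.
x_b, mode_b = ensemble_suggest(
    np.array([[0.0], [1.0]]),
    mu=np.array([1.0, 0.0]),
    sigma=np.array([0.01, 5.0]),
    best_y=0.9,
)
print(mode_a, x_a, mode_b, x_b)
```

Note that the centroid is not itself a maximiser of any single acquisition function, which is the trade-off the blend accepts in exchange for not committing to one strategy.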

Output Warping

For objectives spanning orders of magnitude (e.g. F5: values from ~1700 to ~7500), targets are log-transformed before GP fitting. The GP, acquisition functions, and incumbent tracking all operate in warped space; raw y values are stored and reported.
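A minimal sketch of a log warp with an inverse, so that the surrogate sees warped targets while raw values stay available for reporting. The class name LogWarper and the shift used to handle non-positive targets are assumptions for illustration; they are not taken from SEBO's warping.py.

```python
import numpy as np


class LogWarper:
    """Monotone log transform of targets, shifted if needed to keep the argument positive."""

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        # Assumed convention: shift so the smallest target maps to log(1) = 0
        # whenever the data contain non-positive values.
        self.shift_ = 1.0 - y.min() if y.min() <= 0.0 else 0.0
        return self

    def transform(self, y):
        return np.log(np.asarray(y, dtype=float) + self.shift_)

    def inverse_transform(self, z):
        return np.exp(np.asarray(z, dtype=float)) - self.shift_


y_raw = np.array([1700.0, 2500.0, 7493.9])      # raw values are stored and reported
warper = LogWarper().fit(y_raw)
y_warped = warper.transform(y_raw)              # the GP would be fitted on these
y_back = warper.inverse_transform(y_warped)

y_neg = np.array([-0.44, -0.1, -0.0032])        # always-negative objective
z_neg = LogWarper().fit(y_neg).transform(y_neg)
print(y_warped, z_neg)
```

Because the transform is strictly increasing, the incumbent (the argmax) is the same in raw and warped space, which is what lets acquisition run entirely on warped values.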

Why This Matters — Real-World Applications

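The suggest / observe loop in SEBO follows the same sequential model-based pattern as tools such as Optuna, SMAC, and Ax, although their default surrogates differ (Optuna defaults to TPE and SMAC to random forests, for example). Building it from scratch makes every design decision explicit and auditable.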

AutoML / Hyperparameter Optimisation — GP surrogate replaces grid/random search; finds better configs in fewer model-training calls. Demonstrated directly in the HPO demo: SEBO tunes a RandomForest on Digits using 30 evaluations and consistently outperforms random search.
Drug Discovery & Materials Science — sample-efficient search over molecular or material property spaces where each lab measurement is costly (F3 analogue: drug potency proxy).
Simulation Optimisation — engineering or physics simulations where one run takes minutes to hours; BO is the standard approach for tuning parameters.
Neural Architecture Search — treating layer widths, learning rates, and dropout as a continuous search space; same GP + acquisition loop applies.
Sequential Experiment Design — A/B tests, clinical dose-finding, adaptive sampling — any setting where observations arrive one at a time and each one is expensive.

Project Structure

sebo/
│
├── notebooks/
│   ├── function_{1..8}_*.ipynb         # One notebook per objective — full BO pipeline
│   ├── sebo_benchmark.ipynb            # Head-to-head benchmark vs open-source solvers
│   └── demo_sklearn_hpo.ipynb          # Standalone HPO demo — start here
│
├── src/
│   ├── optimizers/
│   │   ├── optimizer.py                # BayesianOptimizer — stateful suggest/observe API
│   │   ├── my_bayesian/my_gp_skopt.py # GP + ensemble acquisition (EI/PI/UCB)
│   │   └── wrappers/                   # optuna, turbo, de_gp_ei, hyperopt solvers
│   └── utils/
│       ├── warping.py                  # log / Box-Cox output warping
│       ├── sampling_utils.py           # Sobol / LHS candidate generation
│       ├── plot_utilities.py           # Shared plot helpers
│       └── load_challenge_data.py      # Data loader
│
├── append_results/
│   ├── append_week{1..13}_results.py  # Round-append scripts (idempotent)
│   └── run_optimizers_on_data.py      # Benchmark all solvers on oracle data
│
├── scripts/
│   └── export_function3_gp_evolution_gif.py
│
├── configs/                            # Solver configs (optuna, turbo, de_gp_ei, hyperopt)
├── docs/
│   ├── model_card.md                  # Architecture, per-round performance, limitations
│   ├── datasheet.md                   # Data provenance, composition, uses
│   └── TECHNICAL_FOUNDATIONS.md       # BO theory, kernel choices, key papers
│
├── run_pipeline.py
├── requirements.txt
└── requirements-benchmark.txt

Note on data: Raw evaluation CSVs (data/problems/, initial_data/) are gitignored. The demo and benchmark notebooks run without them; GP evolution plots require local observations.csv files.

Stack

Python 3.10+ · NumPy · SciPy · scikit-learn (GaussianProcessRegressor) · scikit-optimize · Optuna · BoTorch/TuRBO · Matplotlib

References

Turner et al., PMLR Vol. 133 — Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020
NeurIPS 2020 Black-Box Optimisation Challenge

Licence

MIT. Initial warm-start data provided by Imperial College London for educational use; redistribution permitted for non-commercial, academic purposes.

Project details


This version

0.1.0

Jun 27, 2026

Download files


Source Distribution

sebo-0.1.0.tar.gz (37.4 kB view details)

Uploaded Jun 27, 2026 Source

Built Distribution


sebo-0.1.0-py3-none-any.whl (46.4 kB view details)

Uploaded Jun 27, 2026 Python 3

File details

Details for the file sebo-0.1.0.tar.gz.

File metadata

Download URL: sebo-0.1.0.tar.gz
Upload date: Jun 27, 2026
Size: 37.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for sebo-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`dbc8700486eea851f45ec84471134c710c4671422986b95cf379efd3a0fc81f3`
MD5	`74fa0aa2b376b17f8440a3690d85fb29`
BLAKE2b-256	`fc430572f631890e52961951b2827a9e62008058b579380b97d976ff72a87b4e`


File details

Details for the file sebo-0.1.0-py3-none-any.whl.

File metadata

Download URL: sebo-0.1.0-py3-none-any.whl
Upload date: Jun 27, 2026
Size: 46.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for sebo-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d1012a53ca30a0dece9c497d163e2387cc1c53d3c8dd4bf44dc7fe75820c47f5`
MD5	`ee5cd6c375c303e825774c4bea16f98b`
BLAKE2b-256	`d5c8fc0970c96f38664a1772696e8afd276bee700efdeed6428850c8721b3492`

