GLOSS: Global-Local-Unexplored Sampling Strategy for batch surrogate optimization in vast chemical search spaces
GLOSS
Global–Local–Unexplored Sampling Strategy — a multi-strategy batch recommender for surrogate-based optimization in vast chemical search spaces.
What it does
Standard batch Bayesian optimization (BO) picks all q candidates per round by greedily maximizing a single acquisition function. When the surrogate—fit on scarce data—has locked onto a secondary peak rather than the global optimum, the whole batch is wasted.
GLOSS decomposes each q-point batch across three complementary streams that share one surrogate:
| Stream | Role | Selection |
|---|---|---|
| Global | Exploitation | UCB acquisition s·μ(x) + κ·σ(x) |
| Local | Refinement | BallTree neighborhood around current best x* (top-K truncation, K = 500 by default; O(K) per round) |
| Unexplored | Exploration | Maximizes geometric distance to observed points; uses no surrogate signal |
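As a rough illustration of the Global and Local rows, the sketch below scores a small pool with the UCB expression from the table and then picks the pool points closest to the incumbent. It rests on several assumptions: `s` is taken to be +1 for maximization and -1 for minimization (the table does not define it), brute-force NumPy distances stand in for the BallTree, and the function names `ucb_scores` and `nearest_to_best` are hypothetical rather than part of the gloss API.

```python
import numpy as np

def ucb_scores(mu, sigma, kappa=2.0, maximize=True):
    """UCB score s*mu + kappa*sigma.

    Assumption: s = +1 when maximizing and -1 when minimizing.
    """
    s = 1.0 if maximize else -1.0
    return s * np.asarray(mu) + kappa * np.asarray(sigma)

def nearest_to_best(candidates, x_best, k):
    """Indices of the k pool points closest to x_best (Euclidean, brute force)."""
    dists = np.linalg.norm(candidates - x_best, axis=1)
    return np.argsort(dists)[:k]

# Toy surrogate predictions for four candidates
mu = np.array([0.2, 0.9, 0.5, 0.1])
sigma = np.array([0.4, 0.05, 0.1, 0.5])
scores = ucb_scores(mu, sigma, kappa=2.0)
global_pick = int(np.argmax(scores))  # highest UCB score

# Toy 2-D pool; the incumbent x* is pool[1]
pool = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [0.1, 0.0]])
local_picks = nearest_to_best(pool, x_best=pool[1], k=2)
```

In this toy case the candidate with the lowest mean wins the UCB ranking because its uncertainty term dominates, and the local neighborhood includes the incumbent itself; a real recommender would presumably filter out points that have already been measured.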
The Unexplored stream is the operational answer to the overfitting trap: it forces every round to deposit data in regions the surrogate has not yet seen, so its blind spots get filled even when the μ/σ predictions are unreliable.
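One common way to realize "maximize distance to observed points" is greedy max-min selection, sketched below. This is an assumption about the general technique, not a description of the gloss implementation; the function name `farthest_from_observed` is hypothetical, and Euclidean distance is assumed.

```python
import numpy as np

def farthest_from_observed(candidates, X_obs, n_pick):
    """Greedy max-min selection.

    Repeatedly take the candidate whose nearest observed (or already
    picked) point is farthest away. No surrogate predictions are used.
    """
    # Distance from every candidate to its nearest observed point
    diffs = candidates[:, None, :] - X_obs[None, :, :]
    min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    picked = []
    for _ in range(n_pick):
        idx = int(np.argmax(min_dist))
        picked.append(idx)
        # Treat the new pick as observed so later picks spread out
        d_new = np.linalg.norm(candidates - candidates[idx], axis=1)
        min_dist = np.minimum(min_dist, d_new)
    return picked

pool = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.0, 1.0]])
X_obs = np.array([[0.0, 0.0]])
picks = farthest_from_observed(pool, X_obs, n_pick=2)
```

With one observation at the origin, the first pick is the opposite corner and the second is the remaining corner that is far from both, which is the space-filling behavior the paragraph above describes.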
Install
pip install gloss-opt
The PyPI distribution is gloss-opt (the bare gloss name was
already taken on PyPI), but the Python import name is still
gloss:
from gloss import GLOSS
Optional extras:
pip install "gloss-opt[nn]" # + torch (for NN surrogate)
pip install "gloss-opt[ml]" # + xgboost, lightgbm
pip install "gloss-opt[all]" # everything above
Install from source:
git clone https://github.com/zbc0315/gloss.git
cd gloss
pip install -e ".[all]"
Python 3.9+ required.
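To see which optional backends are importable in the current environment, a small standard-library check along these lines may help. The mapping from extras to module names follows the comments in the install commands above; the helper `missing_extras` is hypothetical and not shipped with gloss.

```python
import importlib.util
import sys

# Extra name -> modules it is expected to pull in (taken from the install notes)
OPTIONAL_MODULES = {"nn": ["torch"], "ml": ["xgboost", "lightgbm"]}

def missing_extras(groups=OPTIONAL_MODULES):
    """Map each extra to the modules that cannot be found in this environment."""
    missing = {}
    for extra, modules in groups.items():
        absent = [m for m in modules if importlib.util.find_spec(m) is None]
        if absent:
            missing[extra] = absent
    return missing

python_ok = sys.version_info >= (3, 9)  # the README requires Python 3.9+
report = missing_extras()
```

An empty `report` means every optional module was found; otherwise the keys name the extras worth reinstalling.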
Quick start
import numpy as np
from gloss import GLOSS

# A candidate pool of 10,000 points in 5 dimensions
candidates = np.random.rand(10_000, 5)

g = GLOSS(
    space={"candidates": candidates},
    direction="maximize",
    ratio={"global_best": 4, "local_best": 2, "unexplored": 2},
    ucb_kappa=2.0,
    diversity_radius=0.02,
)

# Bootstrap with a few initial measurements
X_obs = candidates[np.random.choice(len(candidates), 8, replace=False)]
y_obs = my_oracle(X_obs)  # your evaluation here

# Round-by-round recommendation
for _ in range(20):
    batch = g.recommend(X_obs, y_obs, n_points=8)
    y_new = my_oracle(batch)
    X_obs = np.vstack([X_obs, batch])
    y_obs = np.concatenate([y_obs, y_new])
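The quick start leaves `my_oracle` undefined because it stands for your own experiment or simulation. For a dry run, a toy stand-in such as the one below can be defined before the loop. The objective is purely hypothetical: a single smooth peak at 0.7 in every dimension, chosen only so that the loop has something to optimize.

```python
import numpy as np

def my_oracle(X):
    """Hypothetical toy objective with one Gaussian-shaped peak at 0.7 per dimension."""
    X = np.atleast_2d(X)
    return np.exp(-np.sum((X - 0.7) ** 2, axis=1) / 0.1)

# Value at the peak and at the origin of a 5-D unit cube
vals = my_oracle(np.array([[0.7] * 5, [0.0] * 5]))
```

The function takes an array of shape (n, d) and returns n values, which matches how the loop above stacks `batch` and concatenates `y_new`.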
See benchmarks/bench_main.py for end-to-end runnable examples on
QM9, Buchwald–Hartwig and a virtual reaction surface.
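The `ratio` argument in the quick start (4:2:2 with `n_points=8`) divides evenly into 4, 2 and 2 points per stream. How gloss rounds when the batch size does not divide evenly is not stated here; the sketch below shows one plausible scheme, largest-remainder rounding, with a hypothetical helper name `split_batch`.

```python
def split_batch(ratio, n_points):
    """Split n_points across streams in proportion to ratio (largest remainder)."""
    total = sum(ratio.values())
    exact = {k: n_points * v / total for k, v in ratio.items()}
    counts = {k: int(e) for k, e in exact.items()}
    leftover = n_points - sum(counts.values())
    # Hand the remaining slots to the streams with the largest fractional parts
    by_remainder = sorted(ratio, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts

ratio = {"global_best": 4, "local_best": 2, "unexplored": 2}
counts_8 = split_batch(ratio, 8)  # divides evenly
counts_5 = split_batch(ratio, 5)  # needs rounding
```

With five points the exact shares are 2.5, 1.25 and 1.25, so the spare slot goes to the Global stream under this scheme.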
Reproducing the benchmark
git clone https://github.com/zbc0315/gloss.git
cd gloss
pip install -e ".[all]"
python -m benchmarks.bench_main --study all
The benchmark compares GLOSS against UCB-BO, BO(EI), GA and Random on three chemistry datasets across 5 seeds × 20 rounds:
| Dataset | n | Source |
|---|---|---|
| Buchwald–Hartwig | 3,955 | Experimental yields |
| QM9 HOMO–LUMO gap | 100,000 | DFT, 20 RDKit descriptors |
| Arrhenius-2D | 10,000 | Virtual reaction surface |
Headline numbers on QM9-100k (mean t₉₅ over 5 seeds; Reach 95% is the number of seeds that hit the threshold):
| Algorithm | t₉₅ (rounds) | Reach 95% |
|---|---|---|
| GLOSS (4:2:2) | 7.2 | 5/5 |
| UCB-BO | 16.6 | 3/5 |
| BO(EI) | 18.4 | 2/5 |
By mean t₉₅, GLOSS is 2.31× faster than UCB-BO and 2.56× faster than BO(EI).
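The metric is not defined in this README. Assuming t₉₅ means the first round in which the best value found so far reaches 95% of the pool's global optimum, it could be computed as in the sketch below, which also recomputes the quoted speedups from the table's mean values. The helper name `t95` and the toy curve are hypothetical.

```python
def t95(best_so_far, global_best, threshold=0.95):
    """First 1-based round whose best-so-far value reaches threshold * global_best.

    Returns None if no round does. Assumes maximization and a positive optimum.
    """
    target = threshold * global_best
    for round_idx, value in enumerate(best_so_far, start=1):
        if value >= target:
            return round_idx
    return None

# Toy best-so-far curve, normalized so the global optimum is 1.0
curve = [0.60, 0.81, 0.90, 0.96, 0.99]
first_hit = t95(curve, global_best=1.0)

# Speedups recomputed from the mean t95 values in the table
speedup_ucb = round(16.6 / 7.2, 2)
speedup_ei = round(18.4 / 7.2, 2)
```

Runs that never reach the threshold return `None` here; how such runs enter the reported means is not specified in this README.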
Documentation
- Algorithm details, design decisions, and a per-stream walkthrough are in the paper (link to be added on submission).
- Per-class API: gloss/gloss.py (top-level GLOSS class), gloss/strategies/ (the three streams), gloss/surrogate/ (RF / GP / NN backends).
- Benchmark scripts: benchmarks/bench_*.py.
Citation
If you use GLOSS in your research, please cite:
@article{gloss2026,
title = {GLOSS: A Multi-Strategy Sampling Framework for Optimization in Vast Chemical Search Spaces},
author = {Zhang, Baicheng and Zhang, Guoqing and Luo, Yi and Jiang, Jun and Zhu, Zhuoying},
year = {2026},
note = {Submitted}
}
License
MIT. See LICENSE.