GLOSS: Global-Local-Unexplored Sampling Strategy for batch surrogate optimization in vast chemical search spaces

GLOSS

Global–Local–Unexplored Sampling Strategy — a multi-strategy batch recommender for surrogate-based optimization in vast chemical search spaces.


What it does

Standard batch Bayesian optimization (BO) picks all q candidates per round by greedily maximizing a single acquisition function. When the surrogate—fit on scarce data—has locked onto a secondary peak rather than the global optimum, the whole batch is wasted.

GLOSS decomposes each q-point batch across three complementary streams that share one surrogate:

| Stream | Role | Selection |
| --- | --- | --- |
| Global | Exploitation | UCB acquisition s·μ(x) + κ·σ(x) |
| Local | Refinement | BallTree neighborhood around current best x* (top-K truncation, K = 500 by default; O(K) per round) |
| Unexplored | Exploration | Maximizes geometric distance to observed points; uses no surrogate signal |
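
As a rough illustration of the Global and Local streams, the sketch below scores candidates with s·μ(x) + κ·σ(x) and picks the nearest neighbors of the current best point. It rests on several assumptions: s is taken to be +1 for maximization and −1 for minimization (the text here does not define it), μ and σ are plain arrays from any surrogate, and a brute-force NumPy distance search stands in for the BallTree. The function names are hypothetical and are not part of the gloss API.

```python
import numpy as np

def ucb_scores(mu, sigma, kappa=2.0, maximize=True):
    """Score candidates as s*mu + kappa*sigma, with s = +1 or -1 (assumed)."""
    s = 1.0 if maximize else -1.0
    return s * np.asarray(mu) + kappa * np.asarray(sigma)

def global_pick(mu, sigma, n, kappa=2.0, maximize=True):
    """Indices of the n highest-scoring candidates, best first."""
    scores = ucb_scores(mu, sigma, kappa, maximize)
    return np.argsort(-scores, kind="stable")[:n]

def local_pick(candidates, x_best, n):
    """Indices of the n candidates closest to x_best (brute force).

    If x_best itself is in the pool it is returned first; a real
    implementation would presumably skip already-observed points.
    """
    dist = np.linalg.norm(candidates - x_best, axis=1)
    return np.argsort(dist, kind="stable")[:n]

# Toy surrogate output for four candidates.
mu = np.array([0.2, 0.9, 0.5, 0.1])
sigma = np.array([0.4, 0.0, 0.3, 0.1])
top2 = global_pick(mu, sigma, n=2, kappa=2.0)

pool = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]])
near2 = local_pick(pool, x_best=np.array([0.0, 0.0]), n=2)
```

With κ = 2 the uncertain candidates (indices 2 and 0) outrank the confident one (index 1), which is the exploration bonus that the UCB term provides.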

The Unexplored stream is the operational answer to the overfitting trap: it forces every round to deposit data in regions the surrogate has not yet seen, so its blind spots get filled even when the μ/σ predictions are unreliable.
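
One way to picture the Unexplored stream is greedy farthest-point selection, sketched below. This is an assumption about what "maximizes geometric distance to observed points" could look like, not the library's actual code: each pick is the candidate whose nearest observed point is farthest away, and each pick is then treated as observed so the batch spreads out. The name `unexplored_pick` is hypothetical.

```python
import numpy as np

def unexplored_pick(candidates, X_obs, n):
    """Greedily pick n candidate indices that are far from observed points."""
    # Distance from every candidate to its nearest observed point.
    diff = candidates[:, None, :] - X_obs[None, :, :]
    min_dist = np.linalg.norm(diff, axis=2).min(axis=1)
    picked = []
    for _ in range(n):
        i = int(np.argmax(min_dist))
        picked.append(i)
        # Treat the new pick as observed so the next pick lands elsewhere.
        d_new = np.linalg.norm(candidates - candidates[i], axis=1)
        min_dist = np.minimum(min_dist, d_new)
    return picked

# One-dimensional toy pool with a single observation at 0.0.
pool = np.array([[0.0], [0.1], [0.5], [1.0]])
observed = np.array([[0.0]])
far2 = unexplored_pick(pool, observed, n=2)
```

Note that the selection never consults μ or σ, so it is unaffected by a surrogate that has locked onto the wrong peak.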


Install

pip install gloss-opt

The PyPI distribution is gloss-opt (the bare gloss name was already taken on PyPI), but the Python import name is still gloss:

from gloss import GLOSS
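
Because the distribution name and the import name differ, it can help to check which one a tool expects. The snippet below is a small sketch using the standard-library `importlib.metadata`, which looks packages up by distribution name (`gloss-opt`), not by import name; the helper `installed_version` is a hypothetical convenience function.

```python
from importlib import metadata

def installed_version(dist_name="gloss-opt"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Prints a version string if gloss-opt is installed, otherwise None.
print(installed_version("gloss-opt"))
```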

Optional extras:

pip install "gloss-opt[nn]"   # + torch (for NN surrogate)
pip install "gloss-opt[ml]"   # + xgboost, lightgbm
pip install "gloss-opt[all]"  # everything above

Install from source:

git clone https://github.com/zbc0315/gloss.git
cd gloss
pip install -e ".[all]"

Python 3.9+ required.


Quick start

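import numpy as np
from gloss import GLOSS

def my_oracle(X):
    # Placeholder objective so the example runs end to end; swap in your own
    # experiment or simulation. Returns one value per row of X.
    return -np.sum((X - 0.5) ** 2, axis=1)

# A candidate pool of 10,000 points in 5 dimensions
candidates = np.random.rand(10_000, 5)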

g = GLOSS(
    space={"candidates": candidates},
    direction="maximize",
    ratio={"global_best": 4, "local_best": 2, "unexplored": 2},
    ucb_kappa=2.0,
    diversity_radius=0.02,
)

# Bootstrap with a few initial measurements
X_obs = candidates[np.random.choice(len(candidates), 8, replace=False)]
y_obs = my_oracle(X_obs)                       # your evaluation here

# Round-by-round recommendation
for _ in range(20):
    batch = g.recommend(X_obs, y_obs, n_points=8)
    y_new = my_oracle(batch)
    X_obs = np.vstack([X_obs, batch])
    y_obs = np.concatenate([y_obs, y_new])

See benchmarks/bench_main.py for end-to-end runnable examples on QM9, Buchwald–Hartwig and a virtual reaction surface.
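
In the quick start, the `ratio` weights 4:2:2 add up to the batch size of 8, so each stream's share is obvious. When the weights do not divide `n_points` evenly, some rounding rule is needed. The sketch below shows one plausible rule (largest-remainder rounding); it is an assumption for illustration, and `split_batch` is a hypothetical helper, not a documented gloss function.

```python
def split_batch(ratio, n_points):
    """Split n_points across streams in proportion to non-negative weights.

    Uses largest-remainder rounding so the counts always sum to n_points.
    """
    total = sum(ratio.values())
    if total <= 0:
        raise ValueError("ratio weights must sum to a positive number")
    exact = {k: n_points * w / total for k, w in ratio.items()}
    counts = {k: int(v) for k, v in exact.items()}  # floor of each share
    leftover = n_points - sum(counts.values())
    # Hand the remaining points to the largest fractional remainders.
    order = sorted(ratio, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts

ratio = {"global_best": 4, "local_best": 2, "unexplored": 2}
even = split_batch(ratio, 8)
uneven = split_batch(ratio, 10)
```

Ties between equal remainders are broken by the order of the keys in `ratio`, which is one of several reasonable choices.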


Reproducing the benchmark

git clone https://github.com/zbc0315/gloss.git
cd gloss
pip install -e ".[all]"
python -m benchmarks.bench_main --study all

The benchmark compares GLOSS against UCB-BO, BO(EI), GA and Random on three chemistry datasets across 5 seeds × 20 rounds:

| Dataset | n | Source |
| --- | --- | --- |
| Buchwald–Hartwig | 3,955 | Experimental yields |
| QM9 HOMO–LUMO gap | 100,000 | DFT, 20 RDKit descriptors |
| Arrhenius-2D | 10,000 | Virtual reaction surface |

Headline numbers on QM9-100k (5 seeds, mean t₉₅; "Reach 95%" counts the seeds that reached the 95% threshold):

| Algorithm | t₉₅ (rounds) | Reach 95% |
| --- | --- | --- |
| GLOSS (4:2:2) | 7.2 | 5/5 |
| UCB-BO | 16.6 | 3/5 |
| BO(EI) | 18.4 | 2/5 |

By mean t₉₅, GLOSS is 2.31× faster than UCB-BO and 2.56× faster than BO(EI).
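
The speedup figures follow directly from the mean t₉₅ values reported for QM9-100k: each baseline's t₉₅ divided by that of GLOSS. The short check below reproduces the arithmetic; the variable names are illustrative only.

```python
# Mean t95 (rounds) as reported for QM9-100k.
t95 = {"GLOSS (4:2:2)": 7.2, "UCB-BO": 16.6, "BO(EI)": 18.4}

baseline = t95["GLOSS (4:2:2)"]
speedup = {
    name: round(rounds / baseline, 2)
    for name, rounds in t95.items()
    if name != "GLOSS (4:2:2)"
}
print(speedup)
```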


Documentation

  • Algorithm details, design decisions, and a per-stream walkthrough are in the paper (link to be added on submission).
  • Per-class API: gloss/gloss.py (top-level GLOSS class), gloss/strategies/ (the three streams), gloss/surrogate/ (RF / GP / NN backends).
  • Benchmark scripts: benchmarks/bench_*.py.

Citation

If you use GLOSS in your research, please cite:

@article{gloss2026,
  title  = {GLOSS: A Multi-Strategy Sampling Framework for Optimization in Vast Chemical Search Spaces},
  author = {Zhang, Baicheng and Zhang, Guoqing and Luo, Yi and Jiang, Jun and Zhu, Zhuoying},
  year   = {2026},
  note   = {Submitted}
}

License

MIT. See LICENSE.
