Benchmark harness for black-box optimizers that speak an ask/tell JSON Lines protocol

hypara

A benchmark harness for measuring how well an optimizer searches an unknown black-box evaluation function.

hypara is deliberately not about solving famous problems (TSP, knapsack, bin packing) where a strong off-the-shelf solver wins. Each problem ships a natural-language description, a mixed search space, and a hidden evaluator whose shape changes with the instance seed. To score well an optimizer has to read the description, reason about the space, and adapt its strategy from the evaluation history within a limited budget.

Optimizers are language-agnostic external processes: they talk to the runner over a stdin/stdout JSON Lines protocol, so an optimizer can be written in Python, Rust, Go, TypeScript, or anything else that can be launched as an executable.

Install

pip install hypara

For development (tests + build tooling):

pip install -e .[dev]
python -m pytest

Quickstart

List the built-in problems:

hypara list

Write a minimal optimizer. Create my_opt/manifest.json:

{"name": "my_opt", "command": ["python", "main.py"]}

and my_opt/main.py:

import json, random, sys

space = []
rng = random.Random()

def send(msg):
    # stdout is protocol-only: one JSON object per line, flushed immediately
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()

def sample(p):
    # draw one value uniformly from a single param spec
    if p["type"] == "categorical":
        return rng.choice(p["choices"])
    if p["type"] == "bool":
        return rng.random() < 0.5
    v = rng.uniform(p["low"], p["high"])
    return int(round(v)) if p["type"] == "int" else v

for line in sys.stdin:
    msg = json.loads(line)
    t = msg.get("type")
    if t == "init":
        space = msg["problem"]["space"]
        rng = random.Random(msg.get("optimizer_seed"))
        send({"type": "ready"})
    elif t == "ask":
        # propose a candidate: a trivial uniform random pick over the active params
        cand = {p["name"]: sample(p) for p in space if p.get("condition") is None}
        for p in space:
            # a conditional param is active only when its parent's value is in "equals"
            cond = p.get("condition")
            if cond is not None and cand.get(cond["param"]) in cond["equals"]:
                cand[p["name"]] = sample(p)
        send({"type": "propose", "candidate": cand})
    elif t == "tell":
        pass  # inspect msg["score"], msg["valid"], msg["remaining"] to adapt
    elif t == "finish":
        break

Run it against one problem:

hypara run --problem smooth_hill --optimizer ./my_opt --seed 1

The source repository also includes two reference optimizers (optimizers/random_search, optimizers/hill_climb) and ready-made suite configs (configs/smoke.json, configs/full.json):

hypara suite --config configs/smoke.json
hypara report --dir results/smoke-YYYYmmdd-HHMMSS

Built-in problems

Every problem is single-objective and maximized, with an achievable maximum near 1.0. The hidden landscape is reseeded per run, so memorizing an instance does not help.

Problem	What it tests
`smooth_hill`	Smooth unimodal surface; local search should win.
`rugged_trap`	Multimodal with a decoy hill; needs restarts / exploration.
`conditional_knobs`	A categorical choice switches which knobs exist.
`noisy_lab`	Additive gaussian noise; beware chasing lucky readings.
`multi_fidelity`	Cheap biased low-fidelity vs. expensive true high-fidelity.
`sparse_needle`	One hidden combination scores high; weak partial-match signal.
`cost_aware`	The candidate's own `samples` knob drives its evaluation cost.
`rag_pipeline`	Surrogate RAG tuning (chunking, top_k, reranker interactions).
`image_pipeline`	Surrogate diffusion tuning; steps drive quality and cost.
`dispatch_policy`	Surrogate delivery policy; balance, batching, mild noise.

Protocol

The runner launches the optimizer as a child process (working directory = the optimizer's directory; if command[0] is "python" it is replaced with the runner's own interpreter). Messages are one JSON object per line: runner → optimizer on stdin, optimizer → runner on stdout. Optimizer stdout is protocol-only; write debug output to stderr (the runner saves it to optimizer.stderr.log). Receivers ignore unknown keys. NaN/Infinity must not be sent. Current protocol_version is 1.
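As a minimal sketch of the stdout/stderr split and the NaN/Infinity rule, a Python optimizer could route all output through helpers like the ones below. The names `encode`, `send`, and `log` are illustrative and not part of hypara. Python's `json.dumps` emits `NaN` and `Infinity` tokens by default, so the sketch passes `allow_nan=False` to turn that into an error before anything reaches the runner.

```python
import json
import sys


def encode(msg: dict) -> str:
    """Serialize one protocol message as a single JSON line (no newline).

    allow_nan=False makes json.dumps raise ValueError on NaN/Infinity
    instead of emitting tokens that are not valid JSON.
    """
    return json.dumps(msg, allow_nan=False)


def send(msg: dict) -> None:
    # stdout carries protocol messages only: one JSON object per line
    sys.stdout.write(encode(msg) + "\n")
    sys.stdout.flush()


def log(text: str) -> None:
    # debug output goes to stderr, never to stdout
    print(text, file=sys.stderr, flush=True)
```

Flushing after every message matters because the runner waits on each reply; a buffered `propose` would look like a hang.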

Messages and turn-taking

Direction	`type`	Reply
runner → optimizer	`init`	`ready` (once)
runner → optimizer	`ask`	`propose` (once)
runner → optimizer	`tell`	none
runner → optimizer	`finish`	none; exit promptly

Only one ask is outstanding at a time. The init reply may take up to 30s, each ask reply up to 60s by default; overruns end the run as optimizer_timeout. A crash, an unparseable line, or an out-of-order message ends the run as failed. The best-so-far is recorded in every case.

init (runner → optimizer):

{"type": "init", "protocol_version": 1, "run_id": "smooth_hill--my_opt--s1",
 "problem": {
   "description": "natural-language prompt",
   "space": [ ...param specs (below)... ],
   "objective": "maximize",
   "budget": {"evaluations": 100, "cost_limit": null, "time_limit_sec": 300.0},
   "fidelities": null
 },
 "optimizer_seed": 12345}

budget always has at least one of evaluations or cost_limit non-null. fidelities, when non-null, is ordered low→high (last entry = top fidelity).

ready / propose (optimizer → runner):

{"type": "ready"}
{"type": "propose", "candidate": {"x0": 0.5, "algo": "alpha"}, "fidelity": "low"}

fidelity is optional; omitted/null means top fidelity. Sending a non-null fidelity to a problem with no fidelities is invalid.
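One way to avoid the invalid case is to build every propose message through a small guard. This is a sketch only: `make_propose` is a hypothetical helper, and it assumes the entries of `fidelities` are plain names such as "low", which the init example above does not confirm.

```python
def make_propose(candidate, fidelity=None, fidelities=None):
    """Build a propose message, refusing a fidelity the problem cannot accept.

    `fidelities` is problem["fidelities"] from init. Assumption: when
    non-null it is a list of fidelity names ordered low to high.
    """
    if fidelity is not None:
        if not fidelities:
            raise ValueError("problem has no fidelities; omit the fidelity field")
        if fidelity not in fidelities:
            raise ValueError(f"unknown fidelity {fidelity!r}")
    msg = {"type": "propose", "candidate": candidate}
    if fidelity is not None:
        msg["fidelity"] = fidelity  # omitted means top fidelity
    return msg
```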

tell (runner → optimizer):

{"type": "tell", "candidate_id": "c-0007", "candidate": {"x0": 0.5},
 "valid": true, "score": 0.73, "cost": 1.0, "fidelity": null, "error": null,
 "remaining": {"evaluations": 92, "cost": null, "time_sec": 291.3}}

When invalid: valid: false, score: null, and error gives the reason.
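An optimizer that wants to adapt usually starts by remembering its own best result. The class below is a minimal sketch of that bookkeeping on the optimizer side; `BestTracker` is an illustrative name, not a hypara API. It skips invalid tells, whose score is null, and does not filter by fidelity, so on a multi-fidelity problem its notion of "best" can differ from the runner's.

```python
class BestTracker:
    """Keep the best valid score seen in tell messages (objective: maximize)."""

    def __init__(self):
        self.best_score = None
        self.best_candidate = None
        self.invalid = 0

    def update(self, tell):
        """Record one tell message; return True if it improved the best."""
        if not tell.get("valid") or tell.get("score") is None:
            self.invalid += 1  # tell["error"] explains why
            return False
        if self.best_score is None or tell["score"] > self.best_score:
            self.best_score = tell["score"]
            self.best_candidate = tell["candidate"]
            return True
        return False
```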

finish (runner → optimizer): {"type": "finish", "reason": "budget_exhausted"} (reason is budget_exhausted or time_limit).

Search space

[
  {"name": "lr", "type": "float", "low": 1e-4, "high": 1.0, "log": true},
  {"name": "layers", "type": "int", "low": 1, "high": 12},
  {"name": "opt", "type": "categorical", "choices": ["sgd", "adam"]},
  {"name": "warmup", "type": "bool"},
  {"name": "warmup_steps", "type": "int", "low": 10, "high": 1000,
   "condition": {"param": "warmup", "equals": [true]}}
]

Types: float, int, categorical, bool. Bounds low/high are inclusive; log: true hints a log scale.
A param with condition is active only when candidate[condition.param] is in equals. Conditioning is one level deep (the parent must be unconditional).
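The `log: true` hint suggests sampling uniformly in log space, so that each decade of the range gets equal attention. The sketch below is one way an optimizer might act on that hint; `sample_float` is a hypothetical helper, and the runner does not require any particular sampling scheme.

```python
import math
import random


def sample_float(spec, rng):
    """Draw a float within [low, high], log-uniformly when the spec hints it."""
    lo, hi = spec["low"], spec["high"]
    if spec.get("log") and lo > 0:
        v = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        return min(max(v, lo), hi)  # clamp away floating-point drift at the bounds
    return rng.uniform(lo, hi)
```

For the `lr` spec above, about half of the draws then fall below 0.01, the geometric midpoint of the range, whereas a plain uniform draw would put roughly 99% of them above it.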

A candidate is validated by the runner: it must be a JSON object containing exactly the active params (no unknown keys, no inactive params, none missing), each of the right type and within range.
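Because invalid proposals still cost budget, it can be worth checking candidates locally before sending them. The sketch below mirrors the rules as stated above; it is not the runner's validator, and `check` and `value_ok` are illustrative names. Whether the runner accepts an integer for a float param is an assumption here.

```python
def value_ok(spec, v):
    """Check one value against its param spec (type and inclusive range)."""
    kind = spec["type"]
    if kind == "bool":
        return isinstance(v, bool)
    if kind == "categorical":
        return v in spec["choices"]
    if isinstance(v, bool):  # bool is a subclass of int in Python
        return False
    if kind == "int":
        return isinstance(v, int) and spec["low"] <= v <= spec["high"]
    # assumption: ints are acceptable where a float is expected
    return isinstance(v, (int, float)) and spec["low"] <= v <= spec["high"]


def check(space, candidate):
    """Return a list of problems; an empty list means the candidate looks valid."""
    if not isinstance(candidate, dict):
        return ["candidate must be a JSON object"]
    active = {}
    for p in space:
        cond = p.get("condition")
        # a conditional param is active only when its parent's value is in "equals"
        if cond is None or candidate.get(cond["param"]) in cond["equals"]:
            active[p["name"]] = p
    errors = [f"unknown or inactive param: {k}" for k in candidate if k not in active]
    errors += [f"missing param: {k}" for k in active if k not in candidate]
    errors += [f"bad value for {k}: {candidate[k]!r}"
               for k, spec in active.items()
               if k in candidate and not value_ok(spec, candidate[k])]
    return errors
```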

Budget rules

A valid evaluation consumes the evaluator's cost on the cost axis (this may depend on the candidate and fidelity) and always consumes 1 on the evaluations axis.
An invalid proposal still consumes budget (1 evaluation, cost 1.0), so spamming invalid candidates cannot mine the space for free.
The stop check runs before each ask, so the final evaluation may slightly overshoot cost_limit.
For problems with fidelities, only top-fidelity evaluations count toward best_score; lower fidelities are available as history but not scored.
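The overshoot follows directly from checking the budget before each ask rather than after each evaluation. The sketch below replays a list of evaluations under these rules; `replay_budget` is a hypothetical helper, and the exact comparison the runner uses (`>=` here) is an assumption.

```python
def replay_budget(evals, evaluations=None, cost_limit=None):
    """Replay (valid, cost) pairs under the budget rules.

    Returns (evaluations used, cost used). The stop check runs before each
    ask, so the last accepted evaluation may push cost past cost_limit.
    """
    used_evals, used_cost = 0, 0.0
    for valid, cost in evals:
        if evaluations is not None and used_evals >= evaluations:
            break
        if cost_limit is not None and used_cost >= cost_limit:
            break
        used_evals += 1
        # an invalid proposal is charged a flat cost of 1.0
        used_cost += cost if valid else 1.0
    return used_evals, used_cost
```

With `cost_limit=5.0` and evaluations that each cost 2.0, the third ask is still issued at a cumulative cost of 4.0, so the run ends at 6.0.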

Metrics

hypara report recomputes everything from the saved logs. Per run: best score, best candidate, best-so-far curve (over evaluations or cumulative cost), valid rate, status, wall time. Aggregated per (problem, optimizer): mean best, a baseline-relative normalized best and normalized anytime AUC (0 = baseline median, 1 = best observed for that problem), and an overall mean across problems.
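To make the normalization concrete, the sketch below computes a best-so-far curve and rescales a score so that 0 is the baseline median and 1 is the best observed value. It is an illustration under assumptions, not the code behind `hypara report`: the function names are hypothetical, the affine rescaling is inferred from the two anchor points, and the anytime AUC is taken here as the mean of the best-so-far curve over evaluations.

```python
from statistics import median


def best_so_far(scores, floor=0.0):
    """Running maximum over per-evaluation scores; None marks an invalid one.

    Positions before the first valid score take `floor` (an assumption).
    """
    curve, best = [], None
    for s in scores:
        if s is not None and (best is None or s > best):
            best = s
        curve.append(floor if best is None else best)
    return curve


def normalized(value, baseline_bests, best_observed):
    """Map value so that baseline median -> 0 and best observed -> 1."""
    base = median(baseline_bests)
    span = best_observed - base
    if span <= 0:
        return 0.0  # degenerate case: baseline already matches the best
    return (value - base) / span


def anytime_auc(scores, floor=0.0):
    """Mean of the best-so-far curve: rewards finding good scores early."""
    curve = best_so_far(scores, floor)
    return sum(curve) / len(curve) if curve else floor
```

Two runs that reach the same best score can differ in anytime AUC: the one that finds it earlier has the higher value.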

Adding a problem

Implement Problem under src/hypara/problems/ and register it in src/hypara/registry.py. Keep the description and the evaluator's actual behavior in sync — the point of the benchmark is that reading the description helps. The shared invariants in tests/test_problems.py (finite scores, determinism given a seed, instance-seed sensitivity) apply automatically.

License

MIT. See LICENSE.

This version

0.1.0

Jul 1, 2026

Download files

Source Distribution

hypara-0.1.0.tar.gz (37.9 kB)

Uploaded Jul 1, 2026 Source

Built Distribution

hypara-0.1.0-py3-none-any.whl (36.1 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file hypara-0.1.0.tar.gz.

File metadata

Download URL: hypara-0.1.0.tar.gz
Upload date: Jul 1, 2026
Size: 37.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for hypara-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`7e71d6b29ee1c21b1701afa6e7b1453b85ae7b9dfa7fa966de88d1b7c9c65ed4`
MD5	`63578c02a52e7c7a7fec1dcb220e080e`
BLAKE2b-256	`27b35436a2282d4f7c362d2a57f296033ac04ce4012e9b22e527b7f8ba470ce9`

File details

Details for the file hypara-0.1.0-py3-none-any.whl.

File metadata

Download URL: hypara-0.1.0-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 36.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for hypara-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4ada38a4fb9dcbaba8121b5bde35fa95c3c0384eccedde0226a9e7f3c53e313e`
MD5	`2134e0b15ca6ce1d1ee4b56dceda173d`
BLAKE2b-256	`54fa0600effee4a77312817b3e17607fb71f322caa050e7cdd6339c1a176125c`
