Python SDK + CLI for Verifiable Labs — evaluate frontier LLMs on conformal-calibrated scientific RL environments.
verifiable-labs
Python SDK for the Verifiable Labs Hosted Evaluation API — evaluate frontier LLMs on conformal-calibrated scientific RL environments without writing any HTTP plumbing.
v0.1.0a1 — alpha. The Hosted Evaluation API itself is v0.1.0-alpha (open / rate-limited / no auth / single-process session store). This SDK is a thin httpx wrapper that mirrors the 8-endpoint API surface; it'll keep working when we add auth + persistence in v0.2.
Install
pip install verifiable-labs
Python >=3.11 required.
Quickstart
Synchronous
from verifiable_labs import Client

with Client() as client:  # localhost:8000 by default
    print(client.health().version)  # "0.1.0-alpha"
    env = client.env("stelioszach/sparse-fourier-recovery")
    result = env.evaluate(
        seed=0,
        answer='{"support_idx": [12, 47, 91], "support_amp_x1000": [800, -300, 1200]}',
        env_kwargs={"calibration_quantile": 2.0},
    )
    print(f"reward={result.reward:.3f} parse_ok={result.parse_ok}")
Asynchronous
import asyncio
from verifiable_labs import AsyncClient

async def main():
    async with AsyncClient(base_url="https://api.verifiable-labs.com") as client:
        env = client.env("sparse-fourier-recovery")
        # Multi-turn flow: keep submitting until session.complete is True
        session = await env.start_session(seed=42)
        while not session.complete:
            answer = my_agent.solve(session.observation)  # your code
            await session.submit(answer_text=answer)
        print("turns:", len(session.history))

asyncio.run(main())
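The loop above relies on `session.complete` eventually becoming true. A turn cap is one way to guard against a session that never signals completion. The sketch below shows the idea against a local stub; `StubSession` and `run_bounded` are hypothetical names for illustration, not part of the SDK, and the stub only imitates the `complete`, `observation`, `history`, and `submit(answer_text=...)` members used in the quickstart.

```python
import asyncio


class StubSession:
    """Hypothetical stand-in for the SDK's Session; not part of verifiable_labs."""

    def __init__(self, finish_after=None):
        self.history = []
        self.observation = "initial observation"
        self._finish_after = finish_after

    @property
    def complete(self):
        # True only once a configured number of turns has been submitted.
        return self._finish_after is not None and len(self.history) >= self._finish_after

    async def submit(self, answer_text):
        self.history.append(answer_text)


async def run_bounded(session, solve, max_turns=5):
    """Submit answers until the session completes or max_turns is reached."""
    turns = 0
    while not session.complete and turns < max_turns:
        answer = solve(session.observation)
        await session.submit(answer_text=answer)
        turns += 1
    return turns


def demo():
    # A session that never completes stops at the cap instead of looping forever.
    never_done = asyncio.run(run_bounded(StubSession(), lambda obs: "{}", max_turns=3))
    # A session that completes after two turns exits early.
    done_early = asyncio.run(run_bounded(StubSession(finish_after=2), lambda obs: "{}", max_turns=5))
    return never_done, done_early


if __name__ == "__main__":
    print(demo())
```

With a real session, the same `run_bounded` helper would presumably be awaited inside the `async with AsyncClient(...)` block, passing the object returned by `env.start_session(...)`.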
Leaderboard
lb = client.leaderboard("sparse-fourier-recovery")
for row in lb.top_models(n=3):
    print(f"{row.model:35s} mean={row.mean_reward:.3f} n={row.n}")
Public surface
| name | sync / async | purpose |
|---|---|---|
| `Client(api_key=None, base_url=...)` | sync | top-level client |
| `AsyncClient(api_key=None, base_url=...)` | async | top-level client |
| `client.health()` | both | liveness + version |
| `client.environments()` | both | list all 10 envs |
| `client.env(env_id)` | both | returns Environment handle |
| `client.leaderboard(env_id)` | both | aggregated benchmark numbers |
| `env.evaluate(seed, answer)` | both | one-shot eval, returns SubmitResponse |
| `env.start_session(seed)` | both | returns multi-turn Session |
| `session.submit(answer_text=...)` | both | append a turn, returns score |
| `session.history` | sync (property) | list of past SubmitResponses |
| `session.complete` | sync (property) | bool; true once the env has signalled done |
| `session.refresh()` | both | re-fetch state from the server |
Exceptions
The SDK raises typed exceptions on non-2xx HTTP status codes, so callers can catch the specific failure mode they care about.
from verifiable_labs import (
    VerifiableLabsError,  # base class
    TransportError,       # network / timeout
    InvalidRequestError,  # 400 / 422
    NotFoundError,        # 404
    RateLimitError,       # 429
    ServerError,          # 5xx
)
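Because the API is rate-limited, one common use of the typed exceptions is retrying on `RateLimitError` with exponential backoff. The sketch below is a minimal illustration, not SDK functionality: `call_with_backoff` is a hypothetical helper, and the two exception classes are local stand-ins that only mirror the SDK's names so the example runs without the package installed.

```python
import time


class VerifiableLabsError(Exception):
    """Local stand-in mirroring the SDK's base class name."""


class RateLimitError(VerifiableLabsError):
    """Local stand-in mirroring the SDK's 429 error name."""


def call_with_backoff(fn, retry_on=(RateLimitError,), attempts=4,
                      base_delay=0.5, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**i, then retry."""
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** i)


def demo():
    calls = {"n": 0}
    delays = []

    def flaky():
        # Fails twice with a rate-limit error, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RateLimitError("429")
        return "ok"

    # Record the delays instead of actually sleeping.
    result = call_with_backoff(flaky, sleep=delays.append)
    return result, calls["n"], delays


if __name__ == "__main__":
    print(demo())
```

In real code the stand-in classes would be replaced by `from verifiable_labs import RateLimitError`, and `fn` would wrap a call such as `lambda: env.evaluate(seed=0, answer=...)`.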
Configuration
Client(
    api_key=None,                     # forward-compat for v0.2; no effect in v0.1
    base_url="http://localhost:8000",
    timeout=30.0,                     # httpx total-timeout in seconds
    http_client=None,                 # inject your own httpx.Client for custom transport
)
AsyncClient takes the same args + accepts an httpx.AsyncClient.
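Since the default `base_url` points at localhost, deployments may want to supply the URL and timeout from the environment. The sketch below builds the keyword arguments only. The variable names `VERIFIABLE_LABS_BASE_URL` and `VERIFIABLE_LABS_TIMEOUT` are hypothetical, chosen for this example; nothing in this README indicates the SDK reads them itself.

```python
import os


def client_kwargs(env=None):
    """Build keyword arguments for Client(...) from environment variables.

    The variable names are assumptions made for this sketch.
    Defaults match the documented Client defaults.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("VERIFIABLE_LABS_BASE_URL", "http://localhost:8000"),
        "timeout": float(env.get("VERIFIABLE_LABS_TIMEOUT", "30.0")),
    }


if __name__ == "__main__":
    print(client_kwargs({}))
```

The result can then be unpacked into either client, for example `Client(**client_kwargs())`.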
What's NOT in v0.1
Same caveats as the Hosted Evaluation API:
- No authentication. `api_key=` is accepted for forward-compat but unused.
- Multi-turn sessions don't yet route turns through the env's residual-feedback rollout (server records turns but doesn't dispatch). The SDK exposes the full `Session` API anyway so the shape is stable for v0.2.
- Structured `answer` dicts return HTTP 422; pass strings.
- No persistence: the session store is in-memory on the API side.
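Because structured answers must be passed as strings, a dict can be serialized with the standard library before submission. The sketch below uses the field names from the quickstart answer; `to_answer_text` is a hypothetical helper name, not part of the SDK.

```python
import json


def to_answer_text(payload):
    """Serialize a structured answer dict to the JSON string form the API accepts."""
    return json.dumps(payload)


answer = to_answer_text({
    "support_idx": [12, 47, 91],
    "support_amp_x1000": [800, -300, 1200],
})

if __name__ == "__main__":
    print(answer)
```

The resulting string can be passed as `answer=` to `env.evaluate(...)` or as `answer_text=` to `session.submit(...)`.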
License
Apache-2.0. See LICENSE.