Python SDK + CLI for Verifiable Labs — evaluate frontier LLMs on conformal-calibrated scientific RL environments.

verifiable-labs

Python SDK for the Verifiable Labs Hosted Evaluation API — evaluate frontier LLMs on conformal-calibrated scientific RL environments without writing any HTTP plumbing.

v0.1.0a1 — alpha. The Hosted Evaluation API itself is v0.1.0-alpha (open / rate-limited / no auth / single-process session store). This SDK is a thin httpx wrapper that mirrors the 8-endpoint API surface; it'll keep working when we add auth + persistence in v0.2.

Install

pip install verifiable-labs

Python >=3.11 required.

Quickstart

Synchronous

from verifiable_labs import Client

with Client() as client:                                # localhost:8000 by default
    print(client.health().version)                      # "0.1.0-alpha"

    env = client.env("stelioszach/sparse-fourier-recovery")
    result = env.evaluate(
        seed=0,
        answer='{"support_idx": [12, 47, 91], "support_amp_x1000": [800, -300, 1200]}',
        env_kwargs={"calibration_quantile": 2.0},
    )
    print(f"reward={result.reward:.3f}  parse_ok={result.parse_ok}")
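To compare an agent across several problem instances, the one-shot call can be wrapped in a small loop. The sketch below relies only on what the quickstart shows (`env.evaluate(seed=..., answer=...)` returning an object with `reward` and `parse_ok`); `sweep_seeds` and `make_answer` are hypothetical names introduced here, not part of the SDK.

```python
from statistics import mean
from typing import Any, Callable, Iterable


def sweep_seeds(env: Any, seeds: Iterable[int], make_answer: Callable[[int], str]) -> dict:
    """Evaluate one answer per seed and summarise the rewards.

    `env` is anything with an `evaluate(seed=..., answer=...)` method whose
    result exposes `reward` and `parse_ok`, as in the quickstart.
    `make_answer` is a hypothetical callback returning the answer string
    for a given seed.
    """
    rewards = []
    parse_failures = 0
    for seed in seeds:
        result = env.evaluate(seed=seed, answer=make_answer(seed))
        rewards.append(result.reward)
        if not result.parse_ok:
            parse_failures += 1
    return {
        "n": len(rewards),
        "mean_reward": mean(rewards) if rewards else None,
        "parse_failures": parse_failures,
    }
```

Inside the `with Client() as client:` block this could be called as `sweep_seeds(env, range(5), my_answer_fn)`, where `my_answer_fn` stands for your own code.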

Asynchronous

import asyncio
from verifiable_labs import AsyncClient

async def main():
    async with AsyncClient(base_url="https://api.verifiable-labs.com") as client:
        env = client.env("sparse-fourier-recovery")
        # Multi-turn flow: keep submitting until session.complete is True
        session = await env.start_session(seed=42)
        while not session.complete:
            answer = my_agent.solve(session.observation)         # my_agent is a placeholder for your own solver
            await session.submit(answer_text=answer)
        print("turns:", len(session.history))

asyncio.run(main())
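The loop above runs until the server reports completion. A bounded variant can guard against a session that never does so. This is a minimal sketch that uses only the attributes shown in the quickstart (`complete`, `observation`, `submit(answer_text=...)`); `run_session`, `solve` and `max_turns` are hypothetical names, not SDK API.

```python
from typing import Any, Callable


async def run_session(session: Any, solve: Callable[[Any], str], max_turns: int = 8) -> int:
    """Submit answers until the session completes or `max_turns` is reached.

    `session` is anything exposing `complete`, `observation` and an awaitable
    `submit(answer_text=...)`. Returns the number of turns submitted.
    """
    turns = 0
    while not session.complete and turns < max_turns:
        await session.submit(answer_text=solve(session.observation))
        turns += 1
    return turns
```

Inside `main()` this would replace the `while` loop with `await run_session(session, my_agent.solve)`.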

Leaderboard

# Assumes an open client, e.g. inside the "with Client() as client:" block above.
lb = client.leaderboard("sparse-fourier-recovery")
for row in lb.top_models(n=3):
    print(f"{row.model:35s}  mean={row.mean_reward:.3f}  n={row.n}")

Public surface

| name | sync / async | purpose |
| --- | --- | --- |
| `Client(api_key=None, base_url=...)` | sync | top-level client |
| `AsyncClient(api_key=None, base_url=...)` | async | top-level client |
| `client.health()` | both | liveness + version |
| `client.environments()` | both | list all 10 envs |
| `client.env(env_id)` | both | returns `Environment` handle |
| `client.leaderboard(env_id)` | both | aggregated benchmark numbers |
| `env.evaluate(seed, answer)` | both | one-shot eval, returns `SubmitResponse` |
| `env.start_session(seed)` | both | returns multi-turn `Session` |
| `session.submit(answer_text=...)` | both | append a turn, returns score |
| `session.history` | sync (property) | list of past `SubmitResponse`s |
| `session.complete` | sync (property) | `bool`: env signalled done |
| `session.refresh()` | both | re-fetch state from the server |

Exceptions

The SDK raises typed exceptions on non-2xx HTTP status codes and on transport failures (network errors, timeouts), so callers can catch the specific failure mode.

from verifiable_labs import (
    VerifiableLabsError,        # base class
    TransportError,             # network / timeout
    InvalidRequestError,        # 400 / 422
    NotFoundError,              # 404
    RateLimitError,             # 429
    ServerError,                # 5xx
)
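Because the alpha API is rate-limited, one plausible use of the typed exceptions is retrying with backoff. The helper below is a generic sketch, not part of the SDK: `call_with_retry` and its parameters are hypothetical, and the exception classes to retry on are passed in by the caller.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x ... base_delay
    raise ValueError("attempts must be at least 1")
```

With the SDK installed, a call might look like `call_with_retry(lambda: env.evaluate(seed=0, answer=answer), retry_on=(RateLimitError, TransportError))`. Whether `ServerError` is worth retrying depends on the failure.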

Configuration

Client(
    api_key=None,               # forward-compat for v0.2; no effect in v0.1
    base_url="http://localhost:8000",
    timeout=30.0,               # httpx total-timeout in seconds
    http_client=None,           # inject your own httpx.Client for custom transport
)

AsyncClient takes the same arguments, except that http_client accepts an httpx.AsyncClient.
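When the same script runs against both a local server and a hosted one, it can help to read these settings from the environment. The sketch below builds the keyword arguments shown in the Configuration block above. The variable names `VERIFIABLE_LABS_API_KEY`, `VERIFIABLE_LABS_BASE_URL` and `VERIFIABLE_LABS_TIMEOUT` are hypothetical, and the SDK is not documented to read them itself.

```python
import os
from typing import Mapping, Optional


def client_kwargs(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for Client / AsyncClient from environment variables.

    The variable names are assumptions made for this sketch; the defaults
    mirror the values in the Configuration block.
    """
    env = os.environ if environ is None else environ
    return {
        "api_key": env.get("VERIFIABLE_LABS_API_KEY"),  # accepted but unused in v0.1
        "base_url": env.get("VERIFIABLE_LABS_BASE_URL", "http://localhost:8000"),
        "timeout": float(env.get("VERIFIABLE_LABS_TIMEOUT", "30.0")),
    }
```

The result can then be passed as `Client(**client_kwargs())`.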

What's NOT in v0.1

Same caveats as the Hosted Evaluation API:

No authentication. api_key= is accepted for forward-compat but unused.
Multi-turn sessions don't yet route turns through the env's residual-feedback rollout (server records turns but doesn't dispatch). The SDK exposes the full Session API anyway so the shape is stable for v0.2.
Structured answer dicts return HTTP 422; pass the answer as a JSON-encoded string instead, as in the quickstart.
No persistence — session store is in-memory on the API side.
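Since answers must be strings, serialising with `json.dumps` is less error-prone than writing the JSON by hand. The sketch below uses the field names from the quickstart answer; `encode_answer` is a hypothetical helper, and the equal-length check is an assumption based on that example rather than a documented rule.

```python
import json
from typing import Sequence


def encode_answer(support_idx: Sequence[int], support_amp_x1000: Sequence[int]) -> str:
    """Serialise a sparse-fourier-recovery answer to a JSON string."""
    # Assumption: each support index is paired with exactly one amplitude.
    if len(support_idx) != len(support_amp_x1000):
        raise ValueError("support_idx and support_amp_x1000 must have equal length")
    return json.dumps({
        "support_idx": list(support_idx),
        "support_amp_x1000": list(support_amp_x1000),
    })
```

For example, `encode_answer([12, 47, 91], [800, -300, 1200])` produces the same string that the synchronous quickstart passes as `answer`.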

License

Apache-2.0. See LICENSE.

Release history

0.1.0a4 (pre-release, this version): Apr 28, 2026
0.1.0a3 (pre-release): Apr 28, 2026
0.1.0a2 (pre-release): Apr 28, 2026

Download files

Source Distribution

verifiable_labs-0.1.0a4.tar.gz (26.8 kB)

Uploaded Apr 28, 2026 Source

Built Distribution

verifiable_labs-0.1.0a4-py3-none-any.whl (24.9 kB)

Uploaded Apr 28, 2026 Python 3

File details

Details for the file verifiable_labs-0.1.0a4.tar.gz.

File metadata

Download URL: verifiable_labs-0.1.0a4.tar.gz
Upload date: Apr 28, 2026
Size: 26.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for verifiable_labs-0.1.0a4.tar.gz
Algorithm	Hash digest
SHA256	`d530992c0f27d827a76ab6d6e3d64234638ec2a1ccb663f26be94ab796e31192`
MD5	`2837bf1d621120e859d331cf4d0c4e78`
BLAKE2b-256	`c215fba3c891c63b7d218e3853919f188e37f14a8bac26a26b032af86b5ab9e0`

File details

Details for the file verifiable_labs-0.1.0a4-py3-none-any.whl.

File metadata

Download URL: verifiable_labs-0.1.0a4-py3-none-any.whl
Upload date: Apr 28, 2026
Size: 24.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for verifiable_labs-0.1.0a4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bd44a8e85ded1c527cffbf1016103d9cd4673d7173d44b7db03ec9db240ce2df`
MD5	`0992eccb8cb197a3de942c183e42b3ff`
BLAKE2b-256	`c29f90c2a8977244fbd153ac5c6d8fe10d4fb214957b09d852408e67feda29dc`
