Ashr Labs Python SDK
A Python client library for evaluating AI agents against Ashr Labs test datasets.
Documentation
- Testing Your Agent — start here (includes debugging failures with transcripts and classification)
- Quick Start Guide
- Installation
- Authentication
- API Reference
- Error Handling
- Examples
Installation
pip install ashr-labs
Quick Start
from ashr_labs import AshrLabsClient, EvalRunner
# Only need your API key — base_url and tenant_id are automatic
client = AshrLabsClient(api_key="tp_your_api_key_here")
# Fetch a dataset and run your agent against it
runner = EvalRunner.from_dataset(client, dataset_id=42)
run = runner.run(my_agent)
# Submit results — grading happens server-side
created = run.deploy(client, dataset_id=42)
# Wait for grading to complete (typically 1-3 minutes)
graded = client.poll_run(created["id"])
metrics = graded["result"]["aggregate_metrics"]
print(f"Passed: {metrics['tests_passed']}/{metrics['total_tests']}")
Your agent just needs two methods:
class MyAgent:
    def respond(self, message: str) -> dict:
        # Call your LLM, return {"text": "...", "tool_calls": [...]}
        return {"text": "response", "tool_calls": []}

    def reset(self) -> None:
        # Clear conversation history between scenarios
        pass
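For example, a minimal in-memory agent that satisfies this two-method protocol. `EchoAgent` is an illustrative name, not part of the SDK:

```python
class EchoAgent:
    def __init__(self):
        self.history = []

    def respond(self, message: str) -> dict:
        # Record the message and echo it back, calling no tools.
        self.history.append(message)
        return {"text": f"You said: {message}", "tool_calls": []}

    def reset(self) -> None:
        # Clear conversation history between scenarios.
        self.history = []
```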
See Testing Your Agent for a full end-to-end guide.
Agents
Agents group your datasets and define how they should be generated and graded. Create an agent once, then generate consistent datasets for it.
# Create an agent with tool definitions and grading config
agent = client.create_agent(
    name="Support Bot",
    description="Spanish-language healthcare scheduling agent",
    config={
        "tool_definitions": [
            {"name": "fetch_kareo_data", "required": True, "description": "Fetch appointment availability"},
            {"name": "save_data", "required": True, "description": "Persist caller info"},
            {"name": "end_session", "required": False, "description": "Close the conversation"},
        ],
        "behavior_rules": [
            {"rule": "Always fetch before quoting availability", "strictness": "required"},
            {"rule": "Save caller name via save_data", "strictness": "required"},
        ],
        "grading_config": {
            "tool_strictness": {
                "fetch_kareo_data": "required",
                "end_session": "optional",
                "await_user_response": "optional",
            },
        },
    },
)
# Link a dataset to the agent
client.set_dataset_agent(dataset_id=42, agent_id=agent["id"])
# Submit a run and auto-link to agent
run.deploy(client, dataset_id=42, agent_id=agent["id"])
Grading behavior
The grading system uses agent config to make smarter decisions:
- `required` tools: Must be called. If the agent skips a required tool, it's a failure.
- `optional` tools: If the agent achieves the same intent via text (e.g. ends the conversation naturally instead of calling `end_session`), the grader recovers it as a partial match instead of a failure.
- `expected` tools: Should be called, but a miss is a warning, not a failure.
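The grader itself runs server-side; as a rough mental model, the strictness rules can be summarized as a small decision table. The status labels below (`pass`, `fail`, `warning`, `partial`, `miss`) are illustrative assumptions, not the SDK's actual return values:

```python
def grade_tool_call(strictness: str, called: bool, intent_in_text: bool = False) -> str:
    # Illustrative sketch of the documented strictness rules;
    # the real grader is server-side and its labels may differ.
    if called:
        return "pass"
    if strictness == "required":
        return "fail"        # skipping a required tool is a failure
    if strictness == "expected":
        return "warning"     # a miss is a warning, not a failure
    if strictness == "optional" and intent_in_text:
        return "partial"     # intent achieved via text is recovered
    return "miss"
```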
Observability — Production Tracing
Trace your agent in production. Captures LLM calls, tool invocations, and events. Tracing never crashes your agent: if the backend is unreachable, errors are caught and logged instead of raised.
# Context managers (recommended) — auto-end on exit, auto-capture errors
with client.trace("handle-ticket", user_id="user_42") as trace:
    with trace.generation(
        "classify",
        model="claude-sonnet-4-6",
        input=[{"role": "user", "content": "help"}],
    ) as gen:
        result = call_llm(...)
        gen.end(output=result, usage={"input_tokens": 50, "output_tokens": 12})

    with trace.span("tool:search", input={"q": "..."}) as tool:
        data = search(...)
        tool.end(output=data)
# Analytics
analytics = client.get_observability_analytics(days=7)
print(f"Traces: {analytics['overview']['total_traces']}")
print(f"Tool calls: {analytics['overview']['total_tool_calls']}")
See API Reference for full Trace/Span/Generation docs.
VM Stream Logs
Attach virtual machine session logs to test results for browser-based or desktop-based agents:
test = run.add_test("checkout_flow")
test.start()
# ... run agent, add tool calls and responses ...
# Kernel browser session (first-class support)
test.set_kernel_vm(
    session_id="kern_sess_abc123",
    duration_ms=15000,
    logs=[
        {"ts": 0, "type": "navigation", "data": {"url": "https://app.example.com"}},
        {"ts": 1200, "type": "action", "data": {"action": "click", "selector": "#login"}},
    ],
    replay_id="replay_abc123",
    replay_view_url="https://www.kernel.sh/replays/replay_abc123",
    stealth=True,
    viewport={"width": 1920, "height": 1080},
)

# Or use the generic set_vm_stream() for any provider
test.set_vm_stream(
    provider="browserbase",
    session_id="sess_abc123",
    duration_ms=45000,
    logs=[
        {"ts": 0, "type": "navigation", "data": {"url": "https://app.example.com"}},
        {"ts": 1200, "type": "action", "data": {"action": "click", "selector": "#login"}},
    ],
)
test.complete()
Available Methods
All methods that accept tenant_id auto-resolve it from your API key if omitted.
Agents
| Method | Description |
|---|---|
| `list_agents()` | List all agents with dataset counts |
| `create_agent(name, description, config)` | Create a new agent |
| `update_agent(agent_id, name, description, config)` | Update an agent |
| `delete_agent(agent_id)` | Soft-delete an agent |
| `get_agent_datasets(agent_id)` | Get datasets linked to an agent |
| `set_dataset_agent(dataset_id, agent_id)` | Link or unlink a dataset and an agent |
Datasets
| Method | Description |
|---|---|
| `get_dataset(dataset_id, ...)` | Get a dataset by ID |
| `list_datasets(limit, cursor, ...)` | List datasets (cursor-based pagination) |
Runs
| Method | Description |
|---|---|
| `create_run(dataset_id, result, ...)` | Create a new test run |
| `get_run(run_id)` | Get a run by ID |
| `list_runs(dataset_id, limit)` | List runs |
| `delete_run(run_id)` | Delete a run |
| `poll_run(run_id, timeout, poll_interval)` | Wait for server-side grading to complete |
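`poll_run` presumably loops on `get_run` until grading reaches a terminal state. A self-contained sketch of that pattern, with a stand-in `fetch_run` callable in place of the SDK and assumed terminal statuses (`graded`, `failed`):

```python
import time

def poll_until_graded(fetch_run, run_id, timeout=180.0, poll_interval=2.0):
    # Fetch the run repeatedly until its status is terminal or the
    # timeout elapses. Status names here are assumptions.
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_run(run_id)
        if run.get("status") in ("graded", "failed"):
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} not graded within {timeout}s")
        time.sleep(poll_interval)
```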
EvalRunner
| Method | Description |
|---|---|
| `EvalRunner.from_dataset(client, dataset_id)` | Create a runner from a dataset |
| `runner.run(agent, max_workers=1, on_environment=...)` | Run the agent against all scenarios; returns a RunBuilder |
| `runner.run_and_deploy(agent, client, dataset_id, max_workers=1)` | Run and submit in one call |
RunBuilder
| Method | Description |
|---|---|
| `RunBuilder()` | Create a new run builder |
| `run.start()` | Mark the run as started |
| `run.add_test(test_id)` | Add a test and get a TestBuilder |
| `run.complete(status)` | Mark the run as completed |
| `run.build()` | Serialize to a result dict |
| `run.deploy(client, dataset_id, agent_id)` | Build and submit via the API |
TestBuilder
| Method | Description |
|---|---|
| `test.start()` | Mark the test as started |
| `test.add_user_file(file_path, description)` | Record a user file upload |
| `test.add_user_text(text, description)` | Record a user text input |
| `test.add_tool_call(expected, actual, match_status)` | Record an agent tool call |
| `test.add_agent_response(expected_response, actual_response, match_status)` | Record an agent response |
| `test.set_vm_stream(provider, session_id, logs, ...)` | Attach VM session logs |
| `test.set_kernel_vm(session_id, ...)` | Attach a Kernel VM session (convenience) |
| `test.complete(status)` | Mark the test as completed |
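The builders ultimately serialize to a plain result dict. The exact schema is defined by the SDK, so every field name below is an assumption; this sketch only illustrates the shape of the run → test → build flow using bare dicts:

```python
def build_result(tests):
    # Hypothetical result shape: per-test records plus aggregate counts.
    return {
        "status": "completed",
        "tests": tests,
        "aggregate": {
            "total_tests": len(tests),
            "tests_passed": sum(1 for t in tests if t["status"] == "passed"),
        },
    }

# One test record, mirroring what add_test/add_tool_call would collect.
test = {
    "test_id": "checkout_flow",
    "status": "passed",
    "tool_calls": [
        {"expected": "save_data", "actual": "save_data", "match_status": "match"},
    ],
}
result = build_result([test])
```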
Requests
| Method | Description |
|---|---|
| `create_request(request_name, request, ...)` | Create a new request |
| `get_request(request_id)` | Get a request by ID |
| `list_requests(status, limit, cursor)` | List requests |
Observability
| Method | Description |
|---|---|
| `client.trace(name, ...)` | Start a production trace (returns a Trace) |
| `trace.span(name, ...)` / `trace.generation(name, ...)` | Add spans or LLM calls |
| `trace.end(output=...)` | Flush the trace to the backend (never raises) |
| `list_observability_traces(user_id, session_id, ...)` | List traces |
| `get_observability_trace(trace_id)` | Get a trace with its full observation tree |
| `get_observability_analytics(days)` | Analytics: tokens, latency, errors, tool performance |
| `get_observability_errors(days, limit, page)` | Traces with errors |
| `get_observability_tool_errors(days, limit, page)` | Traces with tool failures |
API Keys & Session
| Method | Description |
|---|---|
| `init()` | Validate credentials and get user/tenant info |
| `list_api_keys(include_inactive)` | List API keys for your tenant |
| `revoke_api_key(api_key_id)` | Revoke an API key |
| `health_check()` | Check if the API is reachable |
Error Handling
from ashr_labs import AshrLabsClient, NotFoundError, AuthenticationError
client = AshrLabsClient(api_key="tp_...")
try:
    dataset = client.get_dataset(dataset_id=999)
except AuthenticationError:
    print("Invalid API key")
except NotFoundError:
    print("Dataset not found")
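For transient network failures, a generic retry wrapper can be layered on top of any client call. Nothing below is part of the SDK, and which exceptions the client raises on network errors is an assumption (`ConnectionError`/`TimeoutError` are used here as stand-ins):

```python
import time

def with_retries(call, attempts=3, backoff=0.5, retry_on=(ConnectionError, TimeoutError)):
    # Retry a zero-argument callable with exponential backoff on the
    # given transient exception types; re-raise after the last attempt.
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (2 ** attempt))
```

Usage would look like `dataset = with_retries(lambda: client.get_dataset(dataset_id=42))`.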
Configuration
# All defaults — just pass API key
client = AshrLabsClient(api_key="tp_...")
# From environment (reads ASHR_LABS_API_KEY)
client = AshrLabsClient.from_env()
# Custom timeout
client = AshrLabsClient(api_key="tp_...", timeout=60)
# Custom base URL (for self-hosted)
client = AshrLabsClient(api_key="tp_...", base_url="https://your-api.example.com")
Requirements
- Python 3.10+
- No external dependencies (uses only standard library)
License
MIT