Ashr Labs Python SDK

A Python client library for evaluating AI agents against Ashr Labs test datasets.

Installation

pip install ashr-labs

Quick Start

from ashr_labs import AshrLabsClient, EvalRunner

# Only need your API key — base_url and tenant_id are automatic
client = AshrLabsClient(api_key="tp_your_api_key_here")

# Fetch a dataset and run your agent against it
runner = EvalRunner.from_dataset(client, dataset_id=42)
run = runner.run(my_agent)

# Submit results — grading happens server-side
created = run.deploy(client, dataset_id=42)

# Wait for grading to complete (typically 1-3 minutes)
graded = client.poll_run(created["id"])
metrics = graded["result"]["aggregate_metrics"]
print(f"Passed: {metrics['tests_passed']}/{metrics['total_tests']}")

Your agent just needs two methods:

class MyAgent:
    def respond(self, message: str) -> dict:
        # Call your LLM, return {"text": "...", "tool_calls": [...]}
        return {"text": "response", "tool_calls": []}

    def reset(self) -> None:
        # Clear conversation history between scenarios
        pass

See Testing Your Agent for a full end-to-end guide.

Agents

Agents group your datasets and define how they should be generated and graded. Create an agent once, then generate consistent datasets for it.

# Create an agent with tool definitions and grading config
agent = client.create_agent(
    name="Support Bot",
    description="Spanish-language healthcare scheduling agent",
    config={
        "tool_definitions": [
            {"name": "fetch_kareo_data", "required": True, "description": "Fetch appointment availability"},
            {"name": "save_data", "required": True, "description": "Persist caller info"},
            {"name": "end_session", "required": False, "description": "Close the conversation"},
        ],
        "behavior_rules": [
            {"rule": "Always fetch before quoting availability", "strictness": "required"},
            {"rule": "Save caller name via save_data", "strictness": "required"},
        ],
        "grading_config": {
            "tool_strictness": {
                "fetch_kareo_data": "required",
                "end_session": "optional",
                "await_user_response": "optional",
            },
        },
    },
)

# Link a dataset to the agent
client.set_dataset_agent(dataset_id=42, agent_id=agent["id"])

# Submit a run and auto-link to agent
run.deploy(client, dataset_id=42, agent_id=agent["id"])

Grading behavior

The grading system uses agent config to make smarter decisions:

  • required tools: Must be called. If the agent skips a required tool, it's a failure.
  • optional tools: If the agent achieves the same intent via text (e.g. ends the conversation naturally instead of calling end_session), the grader recovers it as a partial match instead of a failure.
  • expected tools: Should be called, but a miss is a warning, not a failure.
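
Taken together, a grading_config covering all three levels might look like the following sketch. Note that send_summary is a hypothetical tool name, and "expected" as a tool_strictness value is an assumption based on the list above (the example earlier only shows "required" and "optional"):

grading_config = {
    "tool_strictness": {
        "fetch_kareo_data": "required",  # skipping this fails the test
        "send_summary": "expected",      # hypothetical tool; a miss is only a warning
        "end_session": "optional",       # same intent via text grades as a partial match
    },
}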

Observability — Production Tracing

Trace your agent in production: the SDK captures LLM calls, tool invocations, and events. Tracing never crashes your agent; if the backend is unreachable, errors are logged rather than raised.

# Context managers (recommended) — auto-end on exit, auto-capture errors
with client.trace("handle-ticket", user_id="user_42") as trace:
    with trace.generation("classify", model="claude-sonnet-4-6",
                          input=[{"role": "user", "content": "help"}]) as gen:
        result = call_llm(...)
        gen.end(output=result, usage={"input_tokens": 50, "output_tokens": 12})

    with trace.span("tool:search", input={"q": "..."}) as tool:
        data = search(...)
        tool.end(output=data)

# Analytics
analytics = client.get_observability_analytics(days=7)
print(f"Traces: {analytics['overview']['total_traces']}")
print(f"Tool calls: {analytics['overview']['total_tool_calls']}")

See API Reference for full Trace/Span/Generation docs.

Voice Observability — LiveKit

For realtime voice agents, the SDK ships an ashr_labs.voice_obs submodule that captures STT/LLM/TTS metrics, turn boundaries, barge-ins, and mixed-audio replay from a LiveKit AgentSession. Two-line attach:

import os
from ashr_labs.voice_obs.livekit import VoiceObservability

obs = VoiceObservability(api_key=os.environ["ASHR_API_KEY"])
obs.attach(session, agent_id="support_v3", agent_version="v42")

LiveKit deps live behind an extra:

pip install ashr-labs[livekit]

Runnable demos:

python -m ashr_labs.voice_obs.examples.livekit_worker dev          # minimal
python -m ashr_labs.voice_obs.examples.ashr_support_agent dev      # full demo

Voice sessions land in the same Observability panel as text-trace sessions; the dashboard auto-renders turns, transcripts, per-stage cost, latency, and audio replay.

VM Stream Logs

Attach virtual machine session logs to test results for browser-based or desktop-based agents:

test = run.add_test("checkout_flow")
test.start()
# ... run agent, add tool calls and responses ...

# Kernel browser session (first-class support)
test.set_kernel_vm(
    session_id="kern_sess_abc123",
    duration_ms=15000,
    logs=[
        {"ts": 0, "type": "navigation", "data": {"url": "https://app.example.com"}},
        {"ts": 1200, "type": "action", "data": {"action": "click", "selector": "#login"}},
    ],
    replay_id="replay_abc123",
    replay_view_url="https://www.kernel.sh/replays/replay_abc123",
    stealth=True,
    viewport={"width": 1920, "height": 1080},
)

# Or use the generic set_vm_stream() for any provider
test.set_vm_stream(
    provider="browserbase",
    session_id="sess_abc123",
    duration_ms=45000,
    logs=[
        {"ts": 0, "type": "navigation", "data": {"url": "https://app.example.com"}},
        {"ts": 1200, "type": "action", "data": {"action": "click", "selector": "#login"}},
    ],
)
test.complete()

Available Methods

All methods that accept tenant_id auto-resolve it from your API key if omitted.

Agents

  • list_agents(): List all agents with dataset counts
  • create_agent(name, description, config): Create a new agent
  • update_agent(agent_id, name, description, config): Update an agent
  • delete_agent(agent_id): Soft-delete an agent
  • get_agent_datasets(agent_id): Get the datasets linked to an agent
  • set_dataset_agent(dataset_id, agent_id): Link or unlink a dataset to an agent

Datasets

  • get_dataset(dataset_id, ...): Get a dataset by ID
  • list_datasets(limit, cursor, ...): List datasets (cursor-based pagination)
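
For long dataset lists, pagination could look like the sketch below. The response field names ("items", "next_cursor") are assumptions about the response shape, not confirmed by this reference:

cursor = None
while True:
    page = client.list_datasets(limit=50, cursor=cursor)
    for ds in page["items"]:          # assumed response field
        print(ds["id"])
    cursor = page.get("next_cursor")  # assumed cursor field
    if not cursor:
        break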

Runs

  • create_run(dataset_id, result, ...): Create a new test run
  • get_run(run_id): Get a run by ID
  • list_runs(dataset_id, limit): List runs
  • delete_run(run_id): Delete a run
  • poll_run(run_id, timeout, poll_interval): Wait for server-side grading to complete

EvalRunner

  • EvalRunner.from_dataset(client, dataset_id): Create a runner from a dataset
  • runner.run(agent, max_workers=1, on_environment=...): Run the agent against all scenarios and return a RunBuilder
  • runner.run_and_deploy(agent, client, dataset_id, max_workers=1): Run and submit in one call

RunBuilder

  • RunBuilder(): Create a new run builder
  • run.start(): Mark the run as started
  • run.add_test(test_id): Add a test and get a TestBuilder
  • run.complete(status): Mark the run as completed
  • run.build(): Serialize to a result dict
  • run.deploy(client, dataset_id, agent_id): Build and submit via the API

TestBuilder

  • test.start(): Mark the test as started
  • test.add_user_file(file_path, description): Record a user file upload
  • test.add_user_text(text, description): Record a user text input
  • test.add_tool_call(expected, actual, match_status): Record an agent tool call
  • test.add_agent_response(expected_response, actual_response, match_status): Record an agent response
  • test.set_vm_stream(provider, session_id, logs, ...): Attach VM session logs
  • test.set_kernel_vm(session_id, ...): Attach a Kernel VM session (convenience)
  • test.complete(status): Mark the test as completed
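
Putting RunBuilder and TestBuilder together, a manually built run (without EvalRunner) might look like this sketch. The match_status/status values ("match", "passed", "completed") and the payload shapes are illustrative assumptions, and RunBuilder is assumed to be importable from the package root like EvalRunner:

from ashr_labs import RunBuilder  # assumed import path

run = RunBuilder()
run.start()

test = run.add_test("greeting_flow")
test.start()
test.add_user_text("Hola, quiero una cita", description="opening message")
test.add_tool_call(
    expected={"name": "fetch_kareo_data"},  # illustrative payload shape
    actual={"name": "fetch_kareo_data"},
    match_status="match",                   # illustrative value
)
test.complete("passed")                     # illustrative status

run.complete("completed")                   # illustrative status
created = run.deploy(client, dataset_id=42)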

Requests

  • create_request(request_name, request, ...): Create a new request
  • get_request(request_id): Get a request by ID
  • list_requests(status, limit, cursor): List requests

Observability

  • client.trace(name, ...): Start a production trace (returns a Trace)
  • trace.span(name, ...) / trace.generation(name, ...): Add spans or LLM calls
  • trace.end(output=...): Flush the trace to the backend (never raises)
  • list_observability_traces(user_id, session_id, ...): List traces
  • get_observability_trace(trace_id): Get a trace with its full observation tree
  • get_observability_analytics(days): Analytics: tokens, latency, errors, tool performance
  • get_observability_errors(days, limit, page): Traces with errors
  • get_observability_tool_errors(days, limit, page): Traces with tool failures
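
Error triage might look like this sketch; the response shape (a "traces" list with "trace_id" and "name" fields) is an assumption, not confirmed above:

errors = client.get_observability_errors(days=7, limit=20)
for t in errors.get("traces", []):       # assumed response field
    print(t.get("trace_id"), t.get("name"))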

API Keys & Session

  • init(): Validate credentials and get user/tenant info
  • list_api_keys(include_inactive): List API keys for your tenant
  • revoke_api_key(api_key_id): Revoke an API key
  • health_check(): Check whether the API is reachable
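
A minimal startup sketch combining these calls; the return shapes of health_check() and init() are assumptions:

from ashr_labs import AshrLabsClient

client = AshrLabsClient.from_env()
if not client.health_check():  # assumed to return a truthy value when reachable
    raise SystemExit("Ashr Labs API unreachable")
info = client.init()           # validates credentials; returns user/tenant info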

Error Handling

from ashr_labs import AshrLabsClient, NotFoundError, AuthenticationError

client = AshrLabsClient(api_key="tp_...")

try:
    dataset = client.get_dataset(dataset_id=999)
except AuthenticationError:
    print("Invalid API key")
except NotFoundError:
    print("Dataset not found")

Configuration

# All defaults — just pass API key
client = AshrLabsClient(api_key="tp_...")

# From environment (reads ASHR_LABS_API_KEY)
client = AshrLabsClient.from_env()

# Custom timeout
client = AshrLabsClient(api_key="tp_...", timeout=60)

# Custom base URL (for self-hosted)
client = AshrLabsClient(api_key="tp_...", base_url="https://your-api.example.com")

Requirements

  • Python 3.10+
  • No external dependencies (uses only standard library)

License

MIT

