Simulation and trace-based evaluation for agentic systems

understudy

PyPI version Python 3.12+ License: MIT

Test your AI agents with simulated users.

Installation

pip install understudy[all]

Quick Start

1. Wrap your agent

from understudy.adk import ADKApp
from my_agent import agent

app = ADKApp(agent=agent)

2. Mock your tools

Your agent has tools that call external services. Mock them for testing:

from understudy.mocks import MockToolkit

mocks = MockToolkit()

@mocks.handle("lookup_order")
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "items": [...], "status": "delivered"}

@mocks.handle("create_return")
def create_return(order_id: str, item_sku: str, reason: str) -> dict:
    return {"return_id": "RET-001", "status": "created"}
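
To make the decorator pattern above concrete, here is a minimal, self-contained sketch of how a name-to-handler registry of this kind could work. It is not understudy's implementation: `MockRegistry`, its `call` method, and the `calls` log are hypothetical names used only for illustration.

```python
from typing import Any, Callable


class MockRegistry:
    """Hypothetical stand-in showing a decorator-based tool registry."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[..., Any]] = {}
        # Every dispatched call is logged as (tool_name, kwargs).
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def handle(self, tool_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Return a decorator that registers a function under tool_name."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._handlers[tool_name] = func
            return func  # the function itself is left unchanged
        return decorator

    def call(self, tool_name: str, **kwargs: Any) -> Any:
        """Dispatch a tool call to its registered mock and record it."""
        if tool_name not in self._handlers:
            raise KeyError(f"no mock registered for tool {tool_name!r}")
        self.calls.append((tool_name, kwargs))
        return self._handlers[tool_name](**kwargs)


registry = MockRegistry()


@registry.handle("lookup_order")
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "delivered"}


result = registry.call("lookup_order", order_id="ORD-10031")
```

Keeping a call log next to the handlers is what lets a test later ask whether a given tool was invoked, and with which arguments.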

3. Write a scene

Create scenes/return_backpack.yaml:

id: return_eligible_backpack
description: Customer wants to return a backpack

starting_prompt: "I'd like to return an item please."
conversation_plan: |
  Goal: Return the hiking backpack from order ORD-10031.
  - Provide order ID when asked
  - Return reason: too small

persona: cooperative
max_turns: 15

expectations:
  required_tools:
    - lookup_order
    - create_return
  allowed_terminal_states:
    - return_created

4. Run simulation

from understudy import Scene, run, check

scene = Scene.from_file("scenes/return_backpack.yaml")
trace = run(app, scene, mocks=mocks)

assert trace.called("lookup_order")
assert trace.called("create_return")
assert trace.terminal_state == "return_created"

Or with pytest (define app and mocks fixtures in conftest.py):

pytest test_returns.py -v

CLI Commands

After running simulations, use the CLI to inspect results:

# List all saved runs
understudy list

# Show aggregate metrics (pass rate, avg turns, tool usage, terminal states)
understudy summary

# Show details for a specific run
understudy show <run_id>

# Generate static HTML report
understudy report --output report.html

# Start interactive report browser
understudy serve --port 8080

# Delete runs
understudy delete <run_id>
understudy clear

LLM Judges

For qualities that can't be checked deterministically:

from understudy.judges import Judge

empathy_judge = Judge(
    rubric="The agent acknowledged frustration and was empathetic while enforcing policy.",
    samples=5,
)

result = empathy_judge.evaluate(trace)
assert result.score == 1
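
One plausible reading of samples=5 is that the judge is queried several times and the binary verdicts are combined. The sketch below shows majority voting over 0/1 verdicts under that assumption. The source does not state how understudy aggregates samples, so `majority_vote` and the agreement ratio are hypothetical.

```python
from collections import Counter


def majority_vote(verdicts: list[int]) -> tuple[int, float]:
    """Aggregate binary verdicts (0 or 1) into a score and an agreement ratio.

    Ties resolve to 0, so a pass needs a strict majority of 1s.
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    if any(v not in (0, 1) for v in verdicts):
        raise ValueError("verdicts must be 0 or 1")
    counts = Counter(verdicts)
    score = 1 if counts[1] > counts[0] else 0
    agreement = counts[score] / len(verdicts)
    return score, agreement


# Five hypothetical judge samples for a single trace.
score, agreement = majority_vote([1, 1, 0, 1, 1])
```

Using an odd number of samples avoids ties, which may be why a value such as 5 is convenient.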

Built-in rubrics:

from understudy.judges import (
    TOOL_USAGE_CORRECTNESS,
    POLICY_COMPLIANCE,
    TONE_EMPATHY,
    ADVERSARIAL_ROBUSTNESS,
    TASK_COMPLETION,
)

Report Contents

The understudy summary command shows:

  • Pass rate - percentage of scenes that passed all expectations
  • Avg turns - average conversation length
  • Tool usage - distribution of tool calls across runs
  • Terminal states - breakdown of how conversations ended
  • Agents - which agents were invoked
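
To show how the first four metrics relate to individual runs, here is a small sketch that computes them from a list of run records. The `RunRecord` shape and the `summarize` function are assumptions made for illustration. They are not understudy's storage format or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical per-run record; field names are illustrative only."""
    passed: bool
    turns: int
    tools_called: list[str]
    terminal_state: str


def summarize(runs: list[RunRecord]) -> dict:
    """Compute pass rate, average turns, tool usage, and terminal states."""
    if not runs:
        raise ValueError("no runs to summarize")
    tool_usage = Counter(tool for run in runs for tool in run.tools_called)
    terminal_states = Counter(run.terminal_state for run in runs)
    return {
        "pass_rate": sum(run.passed for run in runs) / len(runs),
        "avg_turns": sum(run.turns for run in runs) / len(runs),
        "tool_usage": dict(tool_usage),
        "terminal_states": dict(terminal_states),
    }


runs = [
    RunRecord(True, 6, ["lookup_order", "create_return"], "return_created"),
    # "escalated" is a made-up terminal state for the failing example.
    RunRecord(False, 10, ["lookup_order"], "escalated"),
]
summary = summarize(runs)
```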

The HTML report (understudy report) includes:

  • All metrics above
  • Full conversation transcripts
  • Tool call details with arguments
  • Expectation check results
  • Judge evaluation results (when used)

Documentation

See the full documentation for details.

License

MIT
