Testing and validation framework for monocle AI agent tracing

Monocle Test Tools

A comprehensive testing and validation framework for monocle AI agent tracing. This package provides tools for validating agent behavior, tool invocations, inference responses, and overall AI system performance.

Features

  • Agentic Response: Verify that an agent request gets the appropriate response.
  • Agent Invocation: Verify that specific agents are invoked and delegate tasks correctly.
  • Tool Validation: Ensure tools are called with the expected inputs and produce the expected outputs.
  • Inference Testing: Test model inference responses against expected schemas or content.
  • Cost/Performance/Quality: Verify token usage, error states, and warnings.
  • Evaluation: Integrate with third-party or custom evaluation tools to validate LLM responses.

How does it work

The test tool runs your agent or workflow code with Monocle instrumentation enabled. It examines the traces generated by the GenAI components used in your code (e.g. Google ADK, LangGraph) and verifies the test conditions you want to validate.
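To make the idea concrete, here is a minimal sketch of that kind of check: given the spans captured from one run, look for a span with the expected type and entities. The span dictionaries and the `find_matching_span` helper are hypothetical. They only mirror the test format used in this document and are not the package's internal representation or API. The `adk_book_flight` and `adk_flight_booking_agent` names are also made up for illustration.

```python
from typing import Optional


def find_matching_span(captured: list[dict], expected: dict) -> Optional[dict]:
    """Return the first captured span with the expected type that contains every expected entity."""
    wanted = {(e["type"], e["name"]) for e in expected.get("entities", [])}
    for span in captured:
        if span.get("span_type") != expected["span_type"]:
            continue
        present = {(e["type"], e["name"]) for e in span.get("entities", [])}
        if wanted <= present:  # every expected entity appears in this span
            return span
    return None


# Hypothetical spans, as if collected from one instrumented run.
captured_spans = [
    {
        "span_type": "agentic.tool.invocation",
        "entities": [
            {"type": "tool", "name": "adk_book_flight"},
            {"type": "agent", "name": "adk_flight_booking_agent"},
        ],
    },
    {
        "span_type": "agentic.tool.invocation",
        "entities": [
            {"type": "tool", "name": "adk_book_hotel"},
            {"type": "agent", "name": "adk_hotel_booking_agent"},
        ],
    },
]

# The condition under test: the hotel tool was invoked by the hotel agent.
expected_span = {
    "span_type": "agentic.tool.invocation",
    "entities": [
        {"type": "tool", "name": "adk_book_hotel"},
        {"type": "agent", "name": "adk_hotel_booking_agent"},
    ],
}

match = find_matching_span(captured_spans, expected_span)
```

In this sketch a test passes when `match` is not `None`. The real framework reads Monocle traces rather than hand-written dictionaries.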

Installation

pip install monocle_test_tools

Quick Start

Here's a test that runs root_travel_agent with a few inputs and validates its response and the tools it invokes.

import pytest

from monocle_test_tools import TestCase, MonocleValidator
from adk_travel_agent import root_travel_agent

# Test cases for testing travel booking agent
agent_test_cases:list[TestCase] = [
    {
        "test_input": ["Book a flight from San Francisco to Mumbai for 26th Nov 2025. Book a two queen room at Marriot Intercontinental at Juhu, Mumbai for 27th Nov 2025 for 4 nights."],
        "test_output": "A flight from San Francisco to Mumbai has been booked, along with a four night stay in a two queen room at the Marriot Intercontinental in Juhu, Mumbai, starting November 27th, 2025.",
        "comparer": "similarity",
    },
    {
        "test_input": ["Book a flight from San Francisco to Mumbai for 26th Nov 2025. Book a two queen room at Marriot Intercontinental at Juhu, Mumbai for 27th Nov 2025 for 4 nights."],
        "test_spans": [
            {
                "span_type": "agentic.tool.invocation",
                "entities": [
                    {"type": "tool", "name": "adk_book_hotel"},
                    {"type": "agent", "name": "adk_hotel_booking_agent"}
                ],
            }
        ]
    }
]

# Run each test case through the Monocle test framework
@MonocleValidator().monocle_testcase(agent_test_cases)
async def test_run_workflows(my_test_case: TestCase):
    await MonocleValidator().test_workflow_async(root_travel_agent, my_test_case)

if __name__ == "__main__":
    pytest.main([__file__])

Test format

Testcase

A TestCase defines the input, expected output, and evaluation criteria for testing
AI agent behaviors. It can contain multiple test spans representing different 
interaction points (tool invocations, agent delegations, etc.) within the test.

Each test case can specify comparison methods for evaluating test results against 
expected outcomes and can be configured to expect certain errors or warnings.

{
    "test_input": "Input data provided to the test case, can be a prompt or structured data.",
    "test_output": "Expected output that the test should produce.",
    "comparer": "Method used to compare actual results with expected results. The default comparer does an exact match. The 'similarity' comparer does a fuzzy match using BERTScore.",
    "test_spans": "Array of TestSpan objects defining specific interactions to test."
}
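The difference between the two comparers can be illustrated with a small sketch. This is not the package's implementation. The `exact_match` and `fuzzy_match` functions are hypothetical, and the fuzzy variant uses the standard library's `difflib.SequenceMatcher` as a stand-in for BERTScore so that the example runs without a model download. The 0.8 threshold is likewise an assumption.

```python
from difflib import SequenceMatcher


def exact_match(expected: str, actual: str) -> bool:
    """Pass only when the two strings are identical."""
    return expected == actual


def fuzzy_match(expected: str, actual: str, threshold: float = 0.8) -> bool:
    """Pass when the strings are close enough.

    Stand-in only: this is a character-level ratio from difflib, NOT BERTScore,
    which compares contextual embeddings instead of characters.
    """
    ratio = SequenceMatcher(None, expected.lower(), actual.lower()).ratio()
    return ratio >= threshold


expected = "A flight from San Francisco to Mumbai has been booked."
actual = "A flight from San Francisco to Mumbai has now been booked."

exact_result = exact_match(expected, actual)  # wording differs, so this fails
fuzzy_result = fuzzy_match(expected, actual)  # nearly identical, so this passes
```

Agent responses rarely repeat word for word across runs, which is why the Quick Start test case that checks free-text output sets "comparer" to "similarity" rather than relying on the default.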

TestSpan

Represents a specific interaction or event within a test case in the Monocle testing framework.

A TestSpan defines a single testable unit of interaction such as a tool invocation,
agent delegation, or inference process. Each span captures the entities involved,
inputs and outputs, and validation criteria for that specific interaction.

Test spans enforce specific validation rules based on their type. For example:
- Tool invocation spans must include at least one tool entity
- Agentic delegation spans must include at least two agent entities (delegator and delegatee)
- Agentic invocation spans must include at least one agent entity

{
    "span_type": "Type of interaction this span represents (e.g., tool invocation, agent delegation).",
    "entities": "List of entities (tools, agents) involved in this interaction. Each entity has two attributes, name and type. The type can be 'tool', 'agent', or 'inference'.",
    "input": "Input provided to this interaction.",
    "output": "Expected output from this interaction.",
    "test_type": "Whether this is a 'positive' (expected to succeed) or 'negative' (expected to fail) test."
}
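The per-type entity rules listed above can be sketched as a small checker. The `validate_test_span` function is hypothetical and is not part of the package. Only the "agentic.tool.invocation" span type appears in this document. The "agentic.delegation" and "agentic.invocation" strings are assumed names for the other two span types.

```python
TOOL_INVOCATION = "agentic.tool.invocation"  # used in the Quick Start example
AGENT_DELEGATION = "agentic.delegation"      # assumed name, not confirmed by this document
AGENT_INVOCATION = "agentic.invocation"      # assumed name, not confirmed by this document


def validate_test_span(span: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the span is well formed."""
    entities = span.get("entities", [])
    tools = sum(1 for e in entities if e.get("type") == "tool")
    agents = sum(1 for e in entities if e.get("type") == "agent")
    span_type = span.get("span_type")

    errors = []
    if span_type == TOOL_INVOCATION and tools < 1:
        errors.append("tool invocation span needs at least one tool entity")
    elif span_type == AGENT_DELEGATION and agents < 2:
        errors.append("delegation span needs a delegator and a delegatee agent")
    elif span_type == AGENT_INVOCATION and agents < 1:
        errors.append("agent invocation span needs at least one agent entity")
    return errors


# A well-formed tool invocation span, taken from the Quick Start test case.
good_span = {
    "span_type": TOOL_INVOCATION,
    "entities": [
        {"type": "tool", "name": "adk_book_hotel"},
        {"type": "agent", "name": "adk_hotel_booking_agent"},
    ],
}

# A delegation span that names only one agent, which breaks the second rule.
bad_span = {
    "span_type": AGENT_DELEGATION,
    "entities": [{"type": "agent", "name": "adk_hotel_booking_agent"}],
}

good_errors = validate_test_span(good_span)
bad_errors = validate_test_span(bad_span)
```

Here `good_errors` is empty and `bad_errors` holds one message. Returning a list of messages instead of raising on the first problem lets a test report describe everything that is wrong with a span.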

Check out these examples of test cases.


Source Distribution

monocle_test_tools-0.7.5.tar.gz (38.9 kB)

Built Distribution

monocle_test_tools-0.7.5-py3-none-any.whl (37.0 kB)
