
Testing and validation framework for monocle AI agent tracing


Monocle Test Tools

A comprehensive testing and validation framework for monocle AI agent tracing. This package provides tools for validating agent behavior, tool invocations, inference responses, and overall AI system performance.

Features

  • Agentic Response: Verify that an agent request gets the appropriate response.
  • Agent Invocation: Verify that specific agents are invoked and delegate tasks correctly.
  • Tool Validation: Ensure tools are called with expected inputs and produce expected outputs.
  • Inference Testing: Test model inference responses against expected schemas or content.
  • Cost/Performance/Quality: Verify token usage, error states, and warnings.
  • Evaluation: Integrate with any third-party or custom evaluation tools to validate LLM responses.

How does it work

The test tool runs your agent or workflow code with Monocle instrumentation enabled. It examines the traces generated by the GenAI components used in your code (e.g., Google ADK, LangGraph) and verifies the test conditions you want to validate.
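The idea of checking conditions against recorded traces can be illustrated with a small, self-contained sketch. It does not use Monocle's API: the span dictionaries, the `find_matching_spans` helper, the "inference" span type, and the sample trace are hypothetical stand-ins for real trace data, shaped like the test spans described later in this document.

```python
from typing import Any

Span = dict[str, Any]


def find_matching_spans(
    trace: list[Span], span_type: str, entities: list[dict[str, str]]
) -> list[Span]:
    """Return spans of the given type that involve every expected entity.

    Hypothetical helper for illustration; not part of monocle_test_tools.
    """
    matches = []
    for span in trace:
        if span.get("span_type") != span_type:
            continue
        # Collect the (type, name) pairs recorded on this span.
        recorded = {(e["type"], e["name"]) for e in span.get("entities", [])}
        if all((e["type"], e["name"]) in recorded for e in entities):
            matches.append(span)
    return matches


# Hypothetical trace data; real traces carry many more attributes.
sample_trace: list[Span] = [
    {
        "span_type": "inference",
        "entities": [{"type": "inference", "name": "some_model"}],
    },
    {
        "span_type": "agentic.tool.invocation",
        "entities": [
            {"type": "tool", "name": "adk_book_hotel"},
            {"type": "agent", "name": "adk_hotel_booking_agent"},
        ],
    },
]

expected_entities = [{"type": "tool", "name": "adk_book_hotel"}]
hits = find_matching_spans(sample_trace, "agentic.tool.invocation", expected_entities)
```

In this sketch a test condition passes when at least one recorded span has the expected type and includes all expected entities; the real framework may apply stricter or different matching rules.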

Installation

pip install monocle_test_tools

Quick Start

Here's a test that executes a root_travel_agent with a few inputs and validates its response and the tools invoked.

import pytest

from monocle_test_tools import TestCase, MonocleValidator
from adk_travel_agent import root_travel_agent

# Test cases for testing travel booking agent
agent_test_cases: list[TestCase] = [
    {
        "test_input": ["Book a flight from San Francisco to Mumbai for 26th Nov 2025. Book a two queen room at Marriot Intercontinental at Juhu, Mumbai for 27th Nov 2025 for 4 nights."],
        "test_output": "A flight from San Francisco to Mumbai has been booked, along with a four night stay in a two queen room at the Marriot Intercontinental in Juhu, Mumbai, starting November 27th, 2025.",
        "comparer": "similarity",
    },
    {
        "test_input": ["Book a flight from San Francisco to Mumbai for 26th Nov 2025. Book a two queen room at Marriot Intercontinental at Juhu, Mumbai for 27th Nov 2025 for 4 nights."],
        "test_spans": [
            {
                "span_type": "agentic.tool.invocation",
                "entities": [
                    {"type": "tool", "name": "adk_book_hotel"},
                    {"type": "agent", "name": "adk_hotel_booking_agent"}
                ],
            }
        ]
    }
]

# Run test cases using Monocle test framework
@MonocleValidator().monocle_testcase(agent_test_cases)
async def test_run_workflows(my_test_case: TestCase):
    await MonocleValidator().test_workflow_async(root_travel_agent, my_test_case)

if __name__ == "__main__":
    pytest.main([__file__]) 

Test format

Testcase

A TestCase defines the input, expected output, and evaluation criteria for testing
AI agent behaviors. It can contain multiple test spans representing different 
interaction points (tool invocations, agent delegations, etc.) within the test.

Each test case can specify comparison methods for evaluating test results against 
expected outcomes and can be configured to expect certain errors or warnings.
{
    "test_input": "Input data provided to the test case, can be a prompt or structured data.",
    "test_output": "Expected output that the test should produce.",
    "comparer": "Method used to compare actual results with expected results. The default comparer does an exact match. The 'similarity' comparer does a fuzzy match using BERTScore.",
    "test_spans": "Array of TestSpan objects defining specific interactions to test."
}
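
The difference between an exact comparer and a fuzzy one can be sketched with the standard library alone. This is only an illustration: the package is described as using BERTScore for the 'similarity' comparer, whereas the sketch below substitutes `difflib.SequenceMatcher` (a character-level ratio, not a semantic score). The function names and the 0.8 threshold are assumptions, not part of monocle_test_tools.

```python
from difflib import SequenceMatcher


def exact_match(expected: str, actual: str) -> bool:
    """Pass only when the two strings are identical."""
    return expected == actual


def fuzzy_match(expected: str, actual: str, threshold: float = 0.8) -> bool:
    """Pass when the strings are similar enough.

    Stand-in for a semantic comparer: uses a character-level similarity
    ratio in [0, 1] and an assumed threshold of 0.8.
    """
    ratio = SequenceMatcher(None, expected.lower(), actual.lower()).ratio()
    return ratio >= threshold


expected_text = "A flight from San Francisco to Mumbai has been booked."
actual_text = "A flight from San Francisco to Mumbai has been booked!"

exact_result = exact_match(expected_text, actual_text)   # differs in the last character
fuzzy_result = fuzzy_match(expected_text, actual_text)   # nearly identical strings
```

An exact comparer suits deterministic outputs such as tool arguments, while a fuzzy comparer is the more practical choice for free-form LLM responses whose wording varies between runs.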

TestSpan

Represents a specific interaction or event within a test case in the Monocle testing framework.

A TestSpan defines a single testable unit of interaction such as a tool invocation,
agent delegation, or inference process. Each span captures the entities involved,
inputs and outputs, and validation criteria for that specific interaction.

Test spans enforce specific validation rules based on their type. For example:
- Tool invocation spans must include at least one tool entity
- Agentic delegation spans must include at least two agent entities (delegator and delegatee)
- Agentic invocation spans must include at least one agent entity

{
    "span_type": "Type of interaction this span represents (e.g., tool invocation, agent delegation)",
    "entities": "List of entities (tools, agents) involved in this interaction. Each entity has two attributes, name and type. The type can be 'tool', 'agent', or 'inference'",
    "input": "Input provided to this interaction",
    "output": "Expected output from this interaction",
    "test_type": "Whether this is a 'positive' (expected to succeed) or 'negative' (expected to fail) test"
}
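
The per-type entity rules listed above can be sketched as a small standalone check. This is a hypothetical illustration rather than the framework's own validation code: only the span type "agentic.tool.invocation" appears in this document, so the names "agentic.delegation" and "agentic.invocation", the `MIN_ENTITIES` table, and the `validate_test_span` helper are assumptions.

```python
from collections import Counter

# Minimum number of entities of a given type required per span type.
# Only "agentic.tool.invocation" is taken from this document; the other two
# span type names are assumed for illustration.
MIN_ENTITIES = {
    "agentic.tool.invocation": ("tool", 1),
    "agentic.delegation": ("agent", 2),
    "agentic.invocation": ("agent", 1),
}


def validate_test_span(span: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the span passes."""
    rule = MIN_ENTITIES.get(span.get("span_type"))
    if rule is None:
        return []  # no rule known for this span type in this sketch
    entity_type, minimum = rule
    counts = Counter(e.get("type") for e in span.get("entities", []))
    if counts[entity_type] < minimum:
        return [
            f"{span['span_type']} requires at least {minimum} "
            f"'{entity_type}' entity(ies), found {counts[entity_type]}"
        ]
    return []


valid_span = {
    "span_type": "agentic.tool.invocation",
    "entities": [
        {"type": "tool", "name": "adk_book_hotel"},
        {"type": "agent", "name": "adk_hotel_booking_agent"},
    ],
}
# A delegation needs both a delegator and a delegatee, so one agent is not enough.
invalid_span = {
    "span_type": "agentic.delegation",
    "entities": [{"type": "agent", "name": "adk_hotel_booking_agent"}],
}

valid_errors = validate_test_span(valid_span)
invalid_errors = validate_test_span(invalid_span)
```

Returning a list of messages instead of raising on the first problem lets a test report every malformed span in one pass.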

Check out these examples of test cases.


