
Testing and validation framework for monocle AI agent tracing

Project description

Monocle Test Tools

A comprehensive testing and validation framework for monocle AI agent tracing. This package provides tools for validating agent behavior, tool invocations, inference responses, and overall AI system performance.

Features

  • Agentic Response: Verify that an agent request receives the appropriate response.
  • Agent Invocation: Verify that specific agents are invoked and delegate tasks correctly.
  • Tool Validation: Ensure tools are called with expected inputs and produce expected outputs.
  • Inference Testing: Test model inference responses against expected schemas or content.
  • Cost/Performance/Quality: Verify token usage, error states, and warnings.
  • Evaluation: Integrate with any third-party or custom evaluation tool to validate LLM responses.
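As an illustration of the cost checks, verifying token usage amounts to summing the token counts recorded on inference spans and comparing the total against a budget. The sketch below is hypothetical: it models spans as plain dicts, and the "inference" span type and the `token_usage`/`total_tokens` keys are assumptions, not the package's API or Monocle's trace schema.

```python
# Hypothetical sketch: span dicts and the "token_usage"/"total_tokens" keys
# are illustrative assumptions, not Monocle's actual trace schema.

def total_tokens(spans: list[dict]) -> int:
    """Sum the total token counts recorded on inference spans."""
    return sum(
        span.get("token_usage", {}).get("total_tokens", 0)
        for span in spans
        if span.get("span_type") == "inference"
    )


def check_token_budget(spans: list[dict], budget: int) -> bool:
    """Return True if the run stayed within the token budget."""
    return total_tokens(spans) <= budget


captured = [
    {"span_type": "inference", "token_usage": {"total_tokens": 420}},
    {"span_type": "agentic.tool.invocation"},  # no token usage recorded
    {"span_type": "inference", "token_usage": {"total_tokens": 380}},
]
print(total_tokens(captured))                     # 800
print(check_token_budget(captured, budget=1000))  # True
```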

How it works

The test tool runs your agent or workflow code with Monocle instrumentation enabled. It examines the traces generated by the genAI components used in your code (e.g., Google ADK, LangGraph) and verifies the test conditions you want to validate.
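Conceptually, the verification step checks each expected span against the spans captured in the trace. The sketch below is a simplified, hypothetical model of that matching, not the package's implementation: it uses plain dicts instead of real Monocle spans, the helper names `span_matches` and `trace_satisfies` are invented for illustration, and the flight-booking tool and agent names are made up. An expected span passes if some captured span has the same span type and contains every expected entity.

```python
# Hypothetical sketch of span matching; real traces come from Monocle
# instrumentation, not hand-written dicts.

def span_matches(expected: dict, actual: dict) -> bool:
    """True if `actual` has the expected span type and every expected entity."""
    if expected["span_type"] != actual.get("span_type"):
        return False
    actual_entities = {(e["type"], e["name"]) for e in actual.get("entities", [])}
    return all((e["type"], e["name"]) in actual_entities for e in expected["entities"])


def trace_satisfies(expected_spans: list[dict], captured_spans: list[dict]) -> bool:
    """Every expected span must be matched by at least one captured span."""
    return all(
        any(span_matches(exp, act) for act in captured_spans)
        for exp in expected_spans
    )


expected = [{
    "span_type": "agentic.tool.invocation",
    "entities": [
        {"type": "tool", "name": "adk_book_hotel"},
        {"type": "agent", "name": "adk_hotel_booking_agent"},
    ],
}]
captured = [
    {"span_type": "agentic.tool.invocation", "entities": [
        {"type": "tool", "name": "adk_book_flight"},            # hypothetical
        {"type": "agent", "name": "adk_flight_booking_agent"},  # hypothetical
    ]},
    {"span_type": "agentic.tool.invocation", "entities": [
        {"type": "tool", "name": "adk_book_hotel"},
        {"type": "agent", "name": "adk_hotel_booking_agent"},
    ]},
]
print(trace_satisfies(expected, captured))  # True
```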

Installation

pip install monocle_test_tools

Quick Start

Here's a test that runs root_travel_agent against two test cases: the first validates its final response, and the second validates that the expected tool and agent are invoked.

import pytest

from monocle_test_tools import TestCase, MonocleValidator
from adk_travel_agent import root_travel_agent

# Test cases for testing travel booking agent
agent_test_cases: list[TestCase] = [
    {
        "test_input": ["Book a flight from San Francisco to Mumbai for 26th Nov 2025. Book a two queen room at Marriot Intercontinental at Juhu, Mumbai for 27th Nov 2025 for 4 nights."],
        "test_output": "A flight from San Francisco to Mumbai has been booked, along with a four night stay in a two queen room at the Marriot Intercontinental in Juhu, Mumbai, starting November 27th, 2025.",
        "comparer": "similarity",
    },
    {
        "test_input": ["Book a flight from San Francisco to Mumbai for 26th Nov 2025. Book a two queen room at Marriot Intercontinental at Juhu, Mumbai for 27th Nov 2025 for 4 nights."],
        "test_spans": [
            {
                "span_type": "agentic.tool.invocation",
                "entities": [
                    {"type": "tool", "name": "adk_book_hotel"},
                    {"type": "agent", "name": "adk_hotel_booking_agent"}
                ],
            }
        ]
    }
]

# Run test cases using Monocle test framework
@MonocleValidator().monocle_testcase(agent_test_cases)
async def test_run_workflows(my_test_case: TestCase):
    await MonocleValidator().test_workflow_async(root_travel_agent, my_test_case)

if __name__ == "__main__":
    pytest.main([__file__])

Test format

TestCase

A TestCase defines the input, expected output, and evaluation criteria for testing AI agent behaviors. It can contain multiple test spans representing different interaction points (tool invocations, agent delegations, etc.) within the test.

Each test case can specify a comparison method for evaluating test results against expected outcomes and can be configured to expect certain errors or warnings.
{
    "test_input": "Input data provided to the test case, can be a prompt or structured data.",
    "test_output": "Expected output that the test should produce.",
    "comparer": "Method used to compare actual results with expected results. The default comparer does an exact match; the 'similarity' comparer does a fuzzy match using BERTScore.",
    "test_spans": "Array of TestSpan objects defining specific interactions to test."
}
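To illustrate the difference between the two comparers, here is a hypothetical `compare` helper. It is not the package's implementation: Python's `difflib.SequenceMatcher` ratio stands in for BERTScore, and the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical sketch: a character-level similarity ratio stands in for
# BERTScore, and the default threshold of 0.8 is an assumed value.

def compare(actual: str, expected: str, comparer: str = "default",
            threshold: float = 0.8) -> bool:
    """Compare an actual response with the expected one."""
    if comparer == "similarity":
        # Fuzzy match: pass if the similarity score reaches the threshold.
        score = SequenceMatcher(None, actual.lower(), expected.lower()).ratio()
        return score >= threshold
    # Default comparer: exact string match.
    return actual == expected


expected = "Your flight to Mumbai has been booked."
print(compare("Your flight to Mumbai has been booked.", expected))  # True
print(compare("Your flight to Mumbai is booked.", expected))        # False
print(compare("Your flight to Mumbai is booked.", expected,
              comparer="similarity"))                               # True
```

A character-level ratio is only a rough proxy: BERTScore compares contextual token embeddings, so it tolerates paraphrases that share few characters.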

TestSpan

Represents a specific interaction or event within a test case in the Monocle testing framework.

A TestSpan defines a single testable unit of interaction such as a tool invocation,
agent delegation, or inference process. Each span captures the entities involved,
inputs and outputs, and validation criteria for that specific interaction.

Test spans enforce specific validation rules based on their type. For example:
- Tool invocation spans must include at least one tool entity
- Agentic delegation spans must include at least two agent entities (delegator and delegatee)
- Agentic invocation spans must include at least one agent entity
{
    "span_type": "Type of interaction this span represents, such as a tool invocation (e.g., 'agentic.tool.invocation') or an agent delegation.",
    "entities": "List of entities (tools, agents) involved in this interaction. Each entity has two attributes, name and type; the type can be 'tool', 'agent', or 'inference'.",
    "input": "Input provided to this interaction.",
    "output": "Expected output from this interaction.",
    "test_type": "Whether this is a 'positive' (expected to succeed) or 'negative' (expected to fail) test."
}
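The per-type rules for test spans can be expressed as a small validation function. This is a minimal sketch, assuming test spans are plain dicts. Only "agentic.tool.invocation" appears in this README, so the "agentic.delegation" and "agentic.invocation" type strings are assumptions, as is treating a missing `test_type` as 'positive'. The helper name `validate_test_span` is hypothetical.

```python
# Hypothetical sketch of the per-type TestSpan rules.
# span_type -> (required entity type, minimum count)
MIN_ENTITIES = {
    "agentic.tool.invocation": ("tool", 1),
    "agentic.delegation": ("agent", 2),  # assumed name: delegator + delegatee
    "agentic.invocation": ("agent", 1),  # assumed name
}


def validate_test_span(span: dict) -> list[str]:
    """Return a list of rule violations (empty if the span is valid)."""
    errors = []
    rule = MIN_ENTITIES.get(span.get("span_type"))
    if rule is not None:
        entity_type, minimum = rule
        count = sum(1 for e in span.get("entities", []) if e.get("type") == entity_type)
        if count < minimum:
            errors.append(f"{span['span_type']} needs at least {minimum} "
                          f"'{entity_type}' entities, found {count}")
    # Assumption: a missing test_type defaults to 'positive'.
    if span.get("test_type", "positive") not in ("positive", "negative"):
        errors.append("test_type must be 'positive' or 'negative'")
    return errors


ok = {"span_type": "agentic.tool.invocation",
      "entities": [{"type": "tool", "name": "adk_book_hotel"}]}
bad = {"span_type": "agentic.delegation",
       "entities": [{"type": "agent", "name": "adk_hotel_booking_agent"}]}
print(validate_test_span(ok))   # []
print(validate_test_span(bad))  # ["agentic.delegation needs at least 2 'agent' entities, found 1"]
```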

Check out these examples of test cases.


Project details


Download files


Source Distribution

monocle_test_tools-0.7.3.tar.gz (38.7 kB)

Uploaded Source

Built Distribution


monocle_test_tools-0.7.3-py3-none-any.whl (36.9 kB)

Uploaded Python 3

File details

Details for the file monocle_test_tools-0.7.3.tar.gz.

File metadata

  • Download URL: monocle_test_tools-0.7.3.tar.gz
  • Upload date:
  • Size: 38.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for monocle_test_tools-0.7.3.tar.gz
Algorithm Hash digest
SHA256 97a7296cc625c0a8df1548965f77806c8e7dfcd9afe2c9a18051193c9b186153
MD5 0e8ce63a9a22cd0abdfac9fa68cf196a
BLAKE2b-256 0557d97f7e0c3c4bcb0d1ff36011d220778c970775caa6265143b2e8d7331b69


File details

Details for the file monocle_test_tools-0.7.3-py3-none-any.whl.

File metadata

File hashes

Hashes for monocle_test_tools-0.7.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a64c0d724aa8f54c967247d4a900c338cb0612daab52949e76493a84560a77d7
MD5 90db82ef02107de4c0a38b6aa70817d5
BLAKE2b-256 9b36d0123969e6ca5f5e8d4d916b705ce47709e6ccdd23da9817ef96a43a147f

