Metric Computation Engine
Service for computing metrics on AI agent telemetry data
The Metric Computation Engine (MCE) is a tool for computing metrics from observability telemetry collected from our instrumentation SDK (https://github.com/agntcy/observe). The list of currently supported metrics is defined below, but the MCE was designed to make it easy to implement new metrics and extend the library over time.
The MCE is available as a Docker image for service deployment or as a Python package for direct integration. It can also be installed manually, as described below.
Supported metrics
Metrics can be computed at three levels of aggregation: span level, session level and population level (which is a batch of sessions).
The current supported metrics are listed in the table below, along with their aggregation levels.
| Metric Name | Aggregation level |
|---|---|
| Tool Utilization Accuracy | Span |
| Tool Error | Span |
| Tool Error Rate | Session |
| Groundedness | Session |
| Agent to Agent Interactions | Session |
| Agent to Tool Interactions | Session |
| Cycles Count | Session |
Tool Utilization Accuracy: Measures the application's ability to select and use the appropriate tools efficiently.
Tool Error: Indicates whether a tool failed or not.
Tool Error Rate: Measures the rate of tool errors throughout a session.
Groundedness: Measures how much the output is backed by retrieved documents (in a RAG pipeline).
Agent to Agent Interactions: Counts the interactions between pairs of agents.
Agent to Tool Interactions: Counts the interactions between one agent and a tool.
Cycles Count: Counts how many times an entity returns to the entity it came from within a session.
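To make the session-level metrics above concrete, here is a minimal sketch (not the MCE implementation) that computes a Tool Error Rate and a Cycles Count from toy session data; the span and session representations are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ToolSpan:
    """Assumed shape of a tool-call span for this example."""
    tool_name: str
    failed: bool  # corresponds to the span-level "Tool Error" metric


def tool_error_rate(spans: list[ToolSpan]) -> float:
    """Fraction of tool-call spans in a session that errored."""
    if not spans:
        return 0.0
    return sum(s.failed for s in spans) / len(spans)


def cycles_count(entities: list[str]) -> int:
    """Count A -> B -> A patterns in the ordered sequence of entities."""
    return sum(
        1
        for i in range(2, len(entities))
        if entities[i] == entities[i - 2] and entities[i] != entities[i - 1]
    )


spans = [ToolSpan("search", False), ToolSpan("search", True),
         ToolSpan("calculator", False), ToolSpan("calculator", False)]
print(tool_error_rate(spans))  # 0.25

path = ["planner", "search", "planner", "calculator", "planner"]
print(cycles_count(path))  # 2
```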
Prerequisites
Agentic apps must be instrumented with AGNTCY's observe SDK, as the MCE relies on its observability data schema.
Getting started
Several example scripts are available to help you get started with the MCE.
MCE usage
The MCE can be used in two ways: as a REST API service or as a Python module. Both methods allow you to compute various metrics on your agent telemetry data. The preferred usage for the MCE is to deploy it as a service.
There are three main input parameters to the MCE, as shown in the example scripts: metrics, llm_judge_config, and batch_config.
1. Metrics Parameter
The metrics parameter is a list of metric names that you want to compute. Each metric operates at different levels (span, session, or population) and may have different computational requirements. You can specify any combination of the available metrics:
"metrics": [
"ToolUtilizationAccuracy",
"ToolError",
"ToolErrorRate",
"AgentToToolInteractions",
"AgentToAgentInteractions",
"CyclesCount",
"Groundedness",
]
2. LLM Judge Config
The llm_judge_config parameter configures the LLM used for metrics that require LLM-as-a-Judge evaluation (such as ToolUtilizationAccuracy and Groundedness):
"llm_judge_config": {
"OPENAI_API_KEY": "your_openai_api_key",
"LLM_MODEL_NAME": "gpt-4o",
"LLM_BASE_MODEL_URL": "https://api.openai.com/v1", # Optional: for custom OpenAI-compatible endpoints
"CUSTOM_API_KEY": "", # Optional: for custom API endpoints
}
Configuration options:
- OPENAI_API_KEY: Your OpenAI API key for using GPT models
- LLM_MODEL_NAME: The specific model to use (e.g., "gpt-4o")
- LLM_BASE_MODEL_URL: Optional base URL for custom OpenAI-compatible endpoints
- CUSTOM_API_KEY: Optional API key for custom model endpoints
Future support is planned for Anthropic, Google Gemini, and Mistral models.
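For instance, the same configuration keys can point the judge at a self-hosted OpenAI-compatible endpoint. The values below are placeholders, not real credentials or a real deployment; only the key names come from this README:

```python
# Hypothetical llm_judge_config for a self-hosted OpenAI-compatible server
# (e.g. a local inference endpoint). All values are illustrative placeholders.
llm_judge_config = {
    "OPENAI_API_KEY": "",                       # unused when a custom key is set
    "LLM_MODEL_NAME": "llama-3.1-8b-instruct",  # model served by the endpoint
    "LLM_BASE_MODEL_URL": "http://localhost:8001/v1",
    "CUSTOM_API_KEY": "your_custom_api_key",
}
print(llm_judge_config["LLM_BASE_MODEL_URL"])
```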
3. Batch Config
The batch_config parameter determines which sessions from your database will be included in the metric computation. You have three options:
Option 1: By Number of Sessions
"batch_config": {
"num_sessions": 10 # Get the last 10 sessions
}
This retrieves the most recent N agent sessions from the database.
Option 2: By Time Range
"batch_config": {
"time_range": {
"start": "2024-01-01T00:00:00Z",
"end": "2024-12-31T23:59:59Z"
}
}
This retrieves all agent sessions that occurred within the specified time window.
Option 3: By App Name (Not yet implemented)
"batch_config": {
"app_name": "my_agent_app"
}
This would retrieve agent sessions associated with a specific application or project name.
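Putting the three parameters together, a request body for the compute_metrics endpoint could be assembled as follows. This is a sketch with placeholder values; only the field names and metric names come from this README:

```python
import json

# Illustrative request payload combining the three MCE input parameters.
payload = {
    "metrics": ["ToolErrorRate", "AgentToAgentInteractions", "CyclesCount"],
    "llm_judge_config": {
        "OPENAI_API_KEY": "your_openai_api_key",
        "LLM_MODEL_NAME": "gpt-4o",
    },
    "batch_config": {"num_sessions": 10},  # the last 10 sessions
}

# With the MCE deployed as a service, this payload would be POSTed to the
# compute_metrics endpoint, e.g. with the requests library:
#   requests.post("http://localhost:8000/compute_metrics", json=payload)
print(json.dumps(payload, indent=2))
```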
Deployment as a service
For easy deployment of the MCE as a service, a docker compose file is provided. It locally deploys an OTel collector, a ClickHouse DB, the API layer, and the MCE. OTel + ClickHouse is the default setup for retrieving and storing traces from agentic apps, and the API layer provides the interface through which components such as the MCE access that data.
Once deployed, you can generate traces from an agentic app instrumented with our Observe SDK.
API Endpoints
- GET /: Returns the list of available endpoints
- GET /status: Returns the server status
- POST /compute_metrics: Computes metrics from the provided configuration
Manual installation for module usage
To install the MCE manually, you will need:
- Python 3.10 or higher
- uv package manager

- Install uv (if not installed). If you are installing in the OS:
  curl -LsSf https://astral.sh/uv/install.sh | sh
  or, if you are installing in a virtual environment (mamba, conda):
  pip install uv
- Install the package:
  chmod +x install.sh
  ./install.sh
- Set up environment variables:
  cp .env.example .env
  # Edit .env with your API keys and configuration
Configure the following variables in your .env file:

Server Configuration:
- HOST: Server host (default: 0.0.0.0)
- PORT: Server port (default: 8000)
- RELOAD: Enable auto-reload (default: false)
- API_BASE_URL: Data API endpoint (default: http://localhost:8080)

LLM Configuration:
- LLM_BASE_MODEL_URL: LLM endpoint (default: https://api.openai.com/v1)
- LLM_MODEL_NAME: LLM model name (default: gpt-4o)
- OPENAI_API_KEY: OpenAI API key for LLM-based metrics
- ANTHROPIC_API_KEY: Anthropic API key (planned support)
- GEMINI_API_KEY: Google Gemini API key (planned support)
- MISTRAL_API_KEY: Mistral API key (planned support)
- CUSTOM_API_KEY: API key for custom deployments (e.g. Azure, Bedrock)
- Run the server:
  source .venv/bin/activate
  mce-server
  or run it through uv without activating the environment:
  uv run --env-file .env mce-server
The server will be available at http://localhost:8000
This assumes that you have the API layer deployed at the address defined through the env variable API_BASE_URL.
Running Unit Tests
This project uses pytest for running unit tests.
- Run All Tests:
  uv run pytest
- Run Tests in a Specific Folder:
  uv run pytest tests/test_metrics
- Run a Specific Test File:
  uv run pytest tests/mce_tests/test_metrics/session/test_agent_to_tool_interactions.py
Contributing
Contributions are welcome! Please follow these steps to contribute:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Commit your changes (git commit -am 'Add new feature').
- Push to the branch (git push origin feature-branch).
- Create a new Pull Request.
File details
Details for the file ioa_metrics_computation_engine-0.1.0.tar.gz.
File metadata
- Download URL: ioa_metrics_computation_engine-0.1.0.tar.gz
- Upload date:
- Size: 242.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f980a1a63c102f2fd04dcdd3a25c9c89bacbcf2e717bf977317bf0d6f16992f |
| MD5 | cf0535fd4ff4583c987ff00bb5d56ab5 |
| BLAKE2b-256 | 86fb2c3c693a272ddf48b53efd67bb911799c5bed95d20ec5da1b7a8e8632d61 |
File details
Details for the file ioa_metrics_computation_engine-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ioa_metrics_computation_engine-0.1.0-py3-none-any.whl
- Upload date:
- Size: 107.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4c4c8feca778b3d8f425a75ff6ecf5c4999ecb71354c926aaacdbfed0951ce07 |
| MD5 | 2c361b9859c91a8aa1f800b9de3fd1b0 |
| BLAKE2b-256 | 3f0439c6cfbbbf760c7cbb78e3b4fa0b396a49fea12abbca35135d0b558a579b |