
bifrost-eval


MCP pipeline evaluation toolkit — grade AI agent workflows on accuracy, cost, and reliability.

What It Does

bifrost-eval evaluates multi-agent MCP pipelines as complete workflows, not just individual prompts. It answers:

  • Did the pipeline get the right answer? (accuracy scoring)
  • Did agents use the right tools in the right order? (tool correctness)
  • How fast was it? (latency breakdown per agent/tool)
  • How much did it cost? (cost attribution per agent/tool)
  • How do different configurations compare? (A/B testing)
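The tool-correctness idea can be pictured with a small sketch (plain Python for illustration; this is an assumed scoring scheme, not bifrost-eval's internal formula):

```python
def tool_order_score(expected: list[str], actual: list[str]) -> float:
    """Score 1.0 when the pipeline called exactly the expected tools in
    the expected order; otherwise credit the longest matching prefix."""
    if expected == actual:
        return 1.0
    matched = 0
    for want, got in zip(expected, actual):
        if want != got:
            break
        matched += 1
    return matched / max(len(expected), len(actual), 1)

print(tool_order_score(["search", "calculator"], ["search", "calculator"]))  # 1.0
print(tool_order_score(["search", "calculator"], ["calculator"]))            # 0.0
```

A prefix-based score like this rewards pipelines that start down the right tool path while still penalizing missing or extra calls.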

Install

pip install bifrost-eval

With agent-mcp-framework integration:

pip install "bifrost-eval[amf]"

Quick Start

import asyncio
from bifrost_eval import (
    AccuracyMetric,
    CostEfficiencyMetric,
    EvalRunner,
    EvalSuite,
    LatencyMetric,
    Scenario,
    ToolCorrectnessMetric,
)

# Define test scenarios
suite = EvalSuite(
    name="my-agent-eval",
    scenarios=[
        Scenario(
            name="basic-query",
            input_data={"query": "What is 2+2?"},
            expected_output=4,
            expected_tool_calls=["calculator"],
        ),
    ],
)

# Implement the PipelineExecutor protocol for your agent
class MyExecutor:
    async def execute(self, scenario):
        from bifrost_eval import ExecutionTrace
        # Run your agent pipeline here; record tool calls, latency,
        # and cost on the trace so each metric has data to score
        return ExecutionTrace(output=4, success=True)

# Run evaluation
runner = EvalRunner(
    executor=MyExecutor(),
    metrics=[
        AccuracyMetric(weight=2.0),
        ToolCorrectnessMetric(weight=1.0),
        LatencyMetric(target_ms=5000),
        CostEfficiencyMetric(budget_usd=0.10),
    ],
)

result = asyncio.run(runner.run_suite(suite))
print(f"Pass rate: {result.pass_rate:.0%}")
print(f"Grade: {result.grade.value}")
print(f"Total cost: ${result.total_cost.total_usd:.4f}")
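Per-agent cost attribution of the kind summarized by `result.total_cost` can be pictured as a simple roll-up over per-call records. This is a sketch only; the record fields below are assumptions for illustration, not bifrost-eval's schema:

```python
from collections import defaultdict

# Hypothetical per-call cost records collected during one pipeline run
calls = [
    {"agent": "planner", "tool": "llm", "usd": 0.012},
    {"agent": "solver", "tool": "calculator", "usd": 0.0},
    {"agent": "solver", "tool": "llm", "usd": 0.021},
]

# Roll individual call costs up to the agent level
per_agent = defaultdict(float)
for call in calls:
    per_agent[call["agent"]] += call["usd"]

total_usd = sum(per_agent.values())
print(dict(per_agent))      # {'planner': 0.012, 'solver': 0.021}
print(f"${total_usd:.4f}")  # $0.0330
```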

A/B Comparison

from bifrost_eval.adapters.comparison import ComparisonRunner

comparator = ComparisonRunner(metrics=[AccuracyMetric(), CostEfficiencyMetric()])
result = asyncio.run(comparator.compare(
    suite,
    {"config-a": executor_a, "config-b": executor_b},
))
print(f"Winner: {result.winner}")

agent-mcp-framework Integration

from agent_mcp_framework import SequentialPipeline
from bifrost_eval.adapters.amf_adapter import AMFAdapter

pipeline = SequentialPipeline("my-pipeline", agents=[...])
adapter = AMFAdapter(pipeline)
runner = EvalRunner(executor=adapter, metrics=[...])

CLI

# Validate a suite file
bifrost-eval validate suite.json

# Show version
bifrost-eval --version
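A suite file for `bifrost-eval validate` presumably mirrors the `EvalSuite` and `Scenario` fields shown in the Quick Start. The example below is a guess based on those Python field names, not a confirmed schema:

```json
{
  "name": "my-agent-eval",
  "scenarios": [
    {
      "name": "basic-query",
      "input_data": {"query": "What is 2+2?"},
      "expected_output": 4,
      "expected_tool_calls": ["calculator"]
    }
  ]
}
```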

Metrics

Metric                  What It Measures            Default Weight
AccuracyMetric          Output correctness          1.0
ToolCorrectnessMetric   Right tools, right order    1.0
LatencyMetric           Speed vs target             1.0
CostEfficiencyMetric    Cost vs budget              1.0
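How per-metric weights might combine into an overall score can be sketched as a weighted average. This is illustrative arithmetic only; bifrost-eval's exact grading formula is not documented here:

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-metric scores (each 0.0-1.0) by their weights."""
    total_weight = sum(weights.values())
    return sum(scores[m] * weights[m] for m in scores) / total_weight

# Accuracy weighted 2.0, as in the Quick Start runner configuration
scores = {"accuracy": 1.0, "tool_correctness": 1.0, "latency": 0.8, "cost": 0.5}
weights = {"accuracy": 2.0, "tool_correctness": 1.0, "latency": 1.0, "cost": 1.0}
print(round(weighted_score(scores, weights), 2))  # 0.86
```

Doubling the accuracy weight, as in the Quick Start, pulls the overall score toward output correctness and away from the speed and cost dimensions.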

License

MIT
