agentevals-evaluator-sdk

Lightweight SDK for building custom agentevals evaluators.

An evaluator is a standalone program that scores agent traces. It reads EvalInput JSON from stdin and writes EvalResult JSON to stdout. This SDK provides the Python types and an @evaluator decorator that handles the stdin/stdout plumbing for you.
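
To make the stdin/stdout contract concrete, here is a minimal sketch of an evaluator written by hand, without the SDK, using only the standard library. The JSON field names are taken from the Types section of this README, but the exact wire format is an assumption; the helper names evaluate and run are hypothetical and exist only for this illustration. In-memory streams stand in for the real stdin and stdout.

```python
import io
import json


def evaluate(payload: dict) -> dict:
    """Score a decoded EvalInput-like dict and return an EvalResult-like dict."""
    invocations = payload.get("invocations", [])
    # 1.0 when the agent produced a non-empty final response, else 0.0.
    scores = [1.0 if inv.get("final_response") else 0.0 for inv in invocations]
    overall = sum(scores) / len(scores) if scores else 0.0
    return {"score": overall, "per_invocation_scores": scores}


def run(stdin, stdout) -> None:
    """Read one JSON document from stdin and write one JSON result to stdout."""
    payload = json.load(stdin)
    json.dump(evaluate(payload), stdout)


# Assumed payload shape, based on the field names listed under Types.
sample_input = {
    "metric_name": "has_response",
    "threshold": 0.5,
    "config": {},
    "invocations": [
        {"user_content": "What is 2 + 2?", "final_response": "4"},
        {"user_content": "And 3 + 3?", "final_response": ""},
    ],
}

fake_stdin = io.StringIO(json.dumps(sample_input))
fake_stdout = io.StringIO()
run(fake_stdin, fake_stdout)

result = json.loads(fake_stdout.getvalue())
print(result)
```

In a real script you would pass sys.stdin and sys.stdout to run; this is the part the SDK is described as handling on your behalf.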

Installation

pip install agentevals-evaluator-sdk

Usage

from agentevals_evaluator_sdk import evaluator, EvalInput, EvalResult

@evaluator
def my_evaluator(input: EvalInput) -> EvalResult:
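    # Score each invocation: 1.0 if the agent produced a final response, else 0.0.
    scores = []
    for inv in input.invocations:
        score = 1.0 if inv.final_response else 0.0
        scores.append(score)

    # The overall score is the mean; an empty trace scores 0.0 to avoid dividing by zero.
    return EvalResult(
        score=sum(scores) / len(scores) if scores else 0.0,
        per_invocation_scores=scores,
    )

if __name__ == "__main__":
    # Read EvalInput JSON from stdin and write EvalResult JSON to stdout.
    my_evaluator.run()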

The @evaluator decorator marks your function as a runnable evaluator. Calling .run() executes it as a stdin/stdout script: it reads EvalInput JSON from stdin, calls your function, and writes the resulting EvalResult JSON to stdout. The decorated function remains directly callable, so you can unit-test it without going through stdin.
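
The pattern described above (a function that stays callable but also gains a .run() entry point) can be sketched with a small wrapper class. This is an illustration of the general technique only, not the SDK's actual implementation: RunnableEvaluator and evaluator_sketch are hypothetical names, and the sketch passes plain dicts instead of the SDK's EvalInput and EvalResult types.

```python
import functools
import io
import json
import sys


class RunnableEvaluator:
    """Hypothetical wrapper: behaves like the wrapped function, plus .run()."""

    def __init__(self, func):
        self._func = func
        # Copy __name__, __doc__, etc. so the wrapper looks like the original.
        functools.update_wrapper(self, func)

    def __call__(self, payload):
        # Direct calls bypass all I/O, which keeps unit tests simple.
        return self._func(payload)

    def run(self, stdin=None, stdout=None):
        # Default to the process streams; tests can inject in-memory ones.
        stdin = sys.stdin if stdin is None else stdin
        stdout = sys.stdout if stdout is None else stdout
        result = self._func(json.load(stdin))
        json.dump(result, stdout)
        stdout.write("\n")


def evaluator_sketch(func):
    """Hypothetical stand-in for a decorator of this kind."""
    return RunnableEvaluator(func)


@evaluator_sketch
def has_response(payload):
    scores = [
        1.0 if inv.get("final_response") else 0.0
        for inv in payload.get("invocations", [])
    ]
    return {
        "score": sum(scores) / len(scores) if scores else 0.0,
        "per_invocation_scores": scores,
    }


# Called directly, as a test would.
direct = has_response({"invocations": [{"final_response": "hello"}]})

# Called through the stdin/stdout path, using in-memory streams.
out = io.StringIO()
has_response.run(io.StringIO('{"invocations": []}'), out)
piped = json.loads(out.getvalue())
```

Keeping the scoring logic free of I/O is what makes both call paths possible: the same function serves the script entry point and the test suite.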

Types

  • EvalInput -- input payload with metric_name, threshold, config, invocations, and optional expected_invocations
  • EvalResult -- output payload with score (0.0-1.0), optional status, per_invocation_scores, and details (dict)
  • InvocationData -- a single agent turn with user_content, final_response, and intermediate_steps
  • IntermediateStepData -- the steps between user input and final response: tool_calls and tool_responses
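
As a rough mental model of how these four types nest, the sketch below mirrors them with standard-library dataclasses. Only the field names come from the list above; every field type and default is an assumption, the Sketch-suffixed class names are hypothetical, and the real SDK classes may be implemented differently. The range check on score reflects the documented 0.0-1.0 range, though whether the SDK itself enforces it is not stated here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class IntermediateStepDataSketch:
    # Element shapes are not specified in this README, so they are left as Any.
    tool_calls: List[Any] = field(default_factory=list)
    tool_responses: List[Any] = field(default_factory=list)


@dataclass
class InvocationDataSketch:
    user_content: Any = None
    final_response: Any = None
    intermediate_steps: Optional[IntermediateStepDataSketch] = None


@dataclass
class EvalInputSketch:
    metric_name: str
    threshold: float  # assumed to be numeric
    config: Dict[str, Any] = field(default_factory=dict)
    invocations: List[InvocationDataSketch] = field(default_factory=list)
    expected_invocations: Optional[List[InvocationDataSketch]] = None


@dataclass
class EvalResultSketch:
    score: float
    status: Optional[str] = None  # assumed to be a string when present
    per_invocation_scores: List[float] = field(default_factory=list)
    details: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # The README documents score as lying in the range 0.0-1.0.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be within [0.0, 1.0], got {self.score}")


sample = EvalInputSketch(
    metric_name="has_response",
    threshold=0.5,
    invocations=[
        InvocationDataSketch(
            user_content="What is 2 + 2?",
            final_response="4",
            intermediate_steps=IntermediateStepDataSketch(),
        )
    ],
)
result = EvalResultSketch(score=1.0, per_invocation_scores=[1.0])
```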

Documentation

See the custom evaluators documentation for the full protocol reference and examples in other languages.
