
llm-workers-evaluation

Evaluation framework for LLM Workers.

Overview

llm-workers-evaluation provides tools for running evaluation suites against LLM scripts and reporting scores.

  • llm-workers-evaluate: CLI tool for running evaluation suites

Installation

pip install llm-workers-evaluation

This will install llm-workers (core) as a dependency.

Usage

Running Evaluations

# Basic usage
llm-workers-evaluate my-script.yaml my-suite.yaml

# With custom iteration count
llm-workers-evaluate -n 5 my-script.yaml my-suite.yaml

# With verbose output
llm-workers-evaluate --verbose my-script.yaml my-suite.yaml

# With debug mode
llm-workers-evaluate --debug my-script.yaml my-suite.yaml

Evaluation Suite Format

Evaluation suites are YAML files defining tests that return scores between 0.0 and 1.0:

shared:
  data:
    expected: "hello"
  tools: []

iterations: 10

suites:
  basic:
    data: {}
    tools: []
    tests:
      always-pass:
        do:
          eval: 1.0
      always-fail:
        do:
          eval: 0.0
      conditional:
        data:
          value: "hello"
        do:
          eval: "${1.0 if value == expected else 0.0}"
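The `conditional` test above evaluates a `${...}` expression against the merged data (here, `expected` from `shared` and `value` from the test). As a rough illustration of the semantics (not the library's actual implementation; the function name is hypothetical), the expression could be evaluated like this:

```python
# Hypothetical sketch of ${...} expression evaluation -- illustrative only,
# the real llm-workers implementation may differ.

def evaluate_expression(template: str, data: dict) -> float:
    """Strip the ${...} wrapper and evaluate the inner expression
    with the merged test data as the variable namespace."""
    if template.startswith("${") and template.endswith("}"):
        expression = template[2:-1]
        return float(eval(expression, {"__builtins__": {}}, data))
    # literal scores like "1.0" pass through unchanged
    return float(template)

data = {"expected": "hello", "value": "hello"}  # shared + test data merged
score = evaluate_expression("${1.0 if value == expected else 0.0}", data)
print(score)  # 1.0
```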

Output Format

Results are output as YAML:

final_score: 0.75
per_suite:
  basic:
    final_score: 0.75
    per_test:
      always-pass: 1.0
      always-fail: 0.0
      conditional: 1.0

Score Handling

  • Tests are expected to return a float between 0.0 and 1.0
  • None results are treated as 0.0
  • Non-numeric results are treated as 0.0
  • Scores below 0.0 are clamped to 0.0
  • Scores above 1.0 are clamped to 1.0
  • Exceptions during test execution result in 0.0 for that iteration
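The rules above amount to a normalization step applied to each raw test result. A minimal sketch of that logic (illustrative only, not the package's actual code):

```python
def normalize_score(result) -> float:
    """Coerce a raw test result into a score in [0.0, 1.0],
    following the score-handling rules listed above."""
    if result is None:
        return 0.0  # None counts as 0.0
    try:
        score = float(result)
    except (TypeError, ValueError):
        return 0.0  # non-numeric results count as 0.0
    # clamp out-of-range scores into [0.0, 1.0]
    return max(0.0, min(1.0, score))

print(normalize_score(None))    # 0.0
print(normalize_score("oops"))  # 0.0
print(normalize_score(1.7))     # 1.0
print(normalize_score(-0.2))    # 0.0
```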

Data and Tool Merging

Data and tools are merged in order: shared -> suite -> test

  • For data: later values override earlier ones
  • For tools: lists are concatenated
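The merge order can be sketched as follows (an illustrative sketch of the documented semantics; the function name and config shape are assumptions, not the library's API):

```python
def merge_config(shared: dict, suite: dict, test: dict) -> dict:
    """Merge in shared -> suite -> test order: data dicts are merged
    with later values overriding earlier ones, tool lists are concatenated."""
    data = {}
    for level in (shared, suite, test):
        data.update(level.get("data", {}))
    tools = []
    for level in (shared, suite, test):
        tools.extend(level.get("tools", []))
    return {"data": data, "tools": tools}

shared = {"data": {"expected": "hello"}, "tools": ["search"]}
suite = {"data": {}, "tools": []}
test = {"data": {"value": "hello"}, "tools": ["calc"]}
print(merge_config(shared, suite, test))
# {'data': {'expected': 'hello', 'value': 'hello'}, 'tools': ['search', 'calc']}
```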

Documentation

Full documentation: https://mrbagheera.github.io/llm-workers/

License

See main repository for license information.

Download files

Source Distribution

llm_workers_evaluation-1.1.2.tar.gz (6.8 kB)

Built Distribution

llm_workers_evaluation-1.1.2-py3-none-any.whl (9.0 kB)

File details

llm_workers_evaluation-1.1.2.tar.gz

  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.4 Darwin/23.6.0

Hashes:

  • SHA256: c281660097df41e5ca232ff3c82a6315559c90af64d1c5f0ec8f0f13e07ce061
  • MD5: 64973be89c007c06eea76545a03a171d
  • BLAKE2b-256: 72459df32f774c1a75026943c8d46f12e6410a098e3a929e0b6536038b85ab5f

File details

llm_workers_evaluation-1.1.2-py3-none-any.whl

Hashes:

  • SHA256: f79b09c4413cff5fba92b7f8c8f9838d8a66748ba13b5d258d0a8e1c502f1986
  • MD5: ee2b8e6d36623091569b0239e2c9597a
  • BLAKE2b-256: f16cb3ba603ce30c0e91ec6e047285343ab0e5efc2fc38077699a480d633c184
