
llm-workers-evaluation

Evaluation framework for LLM Workers.

Overview

llm-workers-evaluation provides tools for running evaluation suites against LLM scripts and reporting scores.

  • llm-workers-evaluate: CLI tool for running evaluation suites

Installation

pip install llm-workers-evaluation

This will install llm-workers (core) as a dependency.

Usage

Running Evaluations

# Basic usage
llm-workers-evaluate my-script.yaml my-suite.yaml

# With custom iteration count
llm-workers-evaluate -n 5 my-script.yaml my-suite.yaml

# With verbose output
llm-workers-evaluate --verbose my-script.yaml my-suite.yaml

# With debug mode
llm-workers-evaluate --debug my-script.yaml my-suite.yaml

Evaluation Suite Format

Evaluation suites are YAML files defining tests that return scores between 0.0 and 1.0:

shared:
  data:
    expected: "hello"
  tools: []

iterations: 10

suites:
  basic:
    data: {}
    tools: []
    tests:
      always-pass:
        do:
          eval: 1.0
      always-fail:
        do:
          eval: 0.0
      conditional:
        data:
          value: "hello"
        do:
          eval: "${1.0 if value == expected else 0.0}"
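The `conditional` test above evaluates a `${...}` expression against the merged data (here, `expected` from `shared` and `value` from the test). As a rough illustration of the semantics (not the library's actual implementation; the function name is hypothetical), the expression could be evaluated like this:

```python
# Hypothetical sketch of ${...} expression evaluation -- illustrative only,
# the real llm-workers implementation may differ.

def evaluate_expression(template: str, data: dict) -> float:
    """Strip the ${...} wrapper and evaluate the inner expression
    with the merged test data as the variable namespace."""
    if template.startswith("${") and template.endswith("}"):
        expression = template[2:-1]
        return float(eval(expression, {"__builtins__": {}}, data))
    # literal scores like "1.0" pass through unchanged
    return float(template)

data = {"expected": "hello", "value": "hello"}  # shared + test data merged
score = evaluate_expression("${1.0 if value == expected else 0.0}", data)
print(score)  # 1.0
```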

Output Format

Results are output as YAML:

final_score: 0.75
per_suite:
  basic:
    final_score: 0.75
    per_test:
      always-pass: 1.0
      always-fail: 0.0
      conditional: 1.0

Score Handling

  • Tests are expected to return a float between 0.0 and 1.0
  • None results are treated as 0.0
  • Non-numeric results are treated as 0.0
  • Scores below 0.0 are clamped to 0.0
  • Scores above 1.0 are clamped to 1.0
  • Exceptions during test execution result in 0.0 for that iteration
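The rules above amount to a normalization step applied to each raw test result. A minimal sketch of that logic (illustrative only, not the package's actual code):

```python
def normalize_score(result) -> float:
    """Coerce a raw test result into a score in [0.0, 1.0],
    following the score-handling rules listed above."""
    if result is None:
        return 0.0  # None counts as 0.0
    try:
        score = float(result)
    except (TypeError, ValueError):
        return 0.0  # non-numeric results count as 0.0
    # clamp out-of-range scores into [0.0, 1.0]
    return max(0.0, min(1.0, score))

print(normalize_score(None))    # 0.0
print(normalize_score("oops"))  # 0.0
print(normalize_score(1.7))     # 1.0
print(normalize_score(-0.2))    # 0.0
```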

Data and Tool Merging

Data and tools are merged in order: shared -> suite -> test

  • For data: later values override earlier ones
  • For tools: lists are concatenated
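The merge order can be sketched as follows (an illustrative sketch of the documented semantics; the function name and config shape are assumptions, not the library's API):

```python
def merge_config(shared: dict, suite: dict, test: dict) -> dict:
    """Merge in shared -> suite -> test order: data dicts are merged
    with later values overriding earlier ones, tool lists are concatenated."""
    data = {}
    for level in (shared, suite, test):
        data.update(level.get("data", {}))
    tools = []
    for level in (shared, suite, test):
        tools.extend(level.get("tools", []))
    return {"data": data, "tools": tools}

shared = {"data": {"expected": "hello"}, "tools": ["search"]}
suite = {"data": {}, "tools": []}
test = {"data": {"value": "hello"}, "tools": ["calc"]}
print(merge_config(shared, suite, test))
# {'data': {'expected': 'hello', 'value': 'hello'}, 'tools': ['search', 'calc']}
```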

Documentation

Full documentation: https://mrbagheera.github.io/llm-workers/

License

See main repository for license information.

Download files

Source Distribution

llm_workers_evaluation-1.1.2.tar.gz (6.8 kB)

Built Distribution

llm_workers_evaluation-1.1.2-py3-none-any.whl (9.0 kB)

File details

llm_workers_evaluation-1.1.2.tar.gz

  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.4 Darwin/23.6.0

Hashes:

  • SHA256: c281660097df41e5ca232ff3c82a6315559c90af64d1c5f0ec8f0f13e07ce061
  • MD5: 64973be89c007c06eea76545a03a171d
  • BLAKE2b-256: 72459df32f774c1a75026943c8d46f12e6410a098e3a929e0b6536038b85ab5f

File details

llm_workers_evaluation-1.1.2-py3-none-any.whl

Hashes:

  • SHA256: f79b09c4413cff5fba92b7f8c8f9838d8a66748ba13b5d258d0a8e1c502f1986
  • MD5: ee2b8e6d36623091569b0239e2c9597a
  • BLAKE2b-256: f16cb3ba603ce30c0e91ec6e047285343ab0e5efc2fc38077699a480d633c184
