Evaluation framework for LLM Workers
llm-workers-evaluation
Overview
llm-workers-evaluation provides tools for running evaluation suites against LLM scripts and reporting scores.
- llm-workers-evaluate: CLI tool for running evaluation suites
Installation
    pip install llm-workers-evaluation
This will install llm-workers (core) as a dependency.
Usage
Running Evaluations
    # Basic usage
    llm-workers-evaluate my-script.yaml my-suite.yaml

    # With custom iteration count
    llm-workers-evaluate -n 5 my-script.yaml my-suite.yaml

    # With verbose output
    llm-workers-evaluate --verbose my-script.yaml my-suite.yaml

    # With debug mode
    llm-workers-evaluate --debug my-script.yaml my-suite.yaml
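When evaluations run from a Python script (for example in CI), the same command lines can be assembled programmatically. The sketch below is only an illustration: `build_command` and `run_evaluation` are hypothetical helper names, and it uses only the `-n` and `--verbose` options shown above. It assumes `llm-workers-evaluate` is on `PATH`.

```python
import subprocess
from typing import Optional


def build_command(script: str, suite: str, iterations: Optional[int] = None,
                  verbose: bool = False) -> list[str]:
    """Assemble an llm-workers-evaluate command line (hypothetical helper)."""
    cmd = ["llm-workers-evaluate"]
    if iterations is not None:
        # -n sets a custom iteration count.
        cmd += ["-n", str(iterations)]
    if verbose:
        cmd.append("--verbose")
    # The script file comes first, then the suite file.
    cmd += [script, suite]
    return cmd


def run_evaluation(script: str, suite: str, **options) -> None:
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(script, suite, **options), check=True)


print(build_command("my-script.yaml", "my-suite.yaml", iterations=5))
# ['llm-workers-evaluate', '-n', '5', 'my-script.yaml', 'my-suite.yaml']
```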
Evaluation Suite Format
Evaluation suites are YAML files defining tests that return scores between 0.0 and 1.0:
    shared:
      data:
        expected: "hello"
      tools: []
    iterations: 10
    suites:
      basic:
        data: {}
        tools: []
        tests:
          always-pass:
            do:
              eval: 1.0
          always-fail:
            do:
              eval: 0.0
          conditional:
            data:
              value: "hello"
            do:
              eval: "${1.0 if value == expected else 0.0}"
Output Format
Results are output as YAML:
    final_score: 0.75
    per_suite:
      basic:
        final_score: 0.75
        per_test:
          always-pass: 1.0
          always-fail: 0.0
          conditional: 1.0
Score Handling
- Tests must return a float between 0.0 and 1.0
- None results are treated as 0.0
- Non-numeric results are treated as 0.0
- Scores below 0.0 are clamped to 0.0
- Scores above 1.0 are clamped to 1.0
- Exceptions during test execution result in 0.0 for that iteration
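The rules above can be sketched in Python as follows. This is a minimal illustration, not the package's actual implementation: `normalize_score` and `run_iteration` are hypothetical names, and scoring booleans and NaN as 0.0 is an assumption the list above does not cover.

```python
import math
from typing import Any, Callable


def normalize_score(result: Any) -> float:
    """Map a raw test result to a score in [0.0, 1.0] (hypothetical helper)."""
    # None and non-numeric results count as 0.0.
    # Assumption: bool is treated as non-numeric here.
    if result is None or isinstance(result, bool) or not isinstance(result, (int, float)):
        return 0.0
    # Assumption: NaN is scored as 0.0.
    if math.isnan(result):
        return 0.0
    # Clamp out-of-range scores into [0.0, 1.0].
    return max(0.0, min(1.0, float(result)))


def run_iteration(test: Callable[[], Any]) -> float:
    """Run one test callable; an exception scores 0.0 for this iteration."""
    try:
        return normalize_score(test())
    except Exception:
        return 0.0


print(normalize_score(1.7))   # 1.0
print(normalize_score(None))  # 0.0
```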
Data and Tool Merging
Data and tools are merged in order: shared -> suite -> test
- For data: later values override earlier ones
- For tools: lists are concatenated
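As a rough illustration of these merge rules, the sketch below combines the three levels in order. `merge_context` is a hypothetical helper, and it assumes a shallow, key-by-key override for data; the actual implementation may differ.

```python
from typing import Any


def merge_context(*levels: dict[str, Any]) -> dict[str, Any]:
    """Merge levels given in the order shared, suite, test (hypothetical helper)."""
    data: dict[str, Any] = {}
    tools: list[Any] = []
    for level in levels:
        # Later data values override earlier ones (shallow merge, an assumption).
        data.update(level.get("data") or {})
        # Tool lists are concatenated in order.
        tools.extend(level.get("tools") or [])
    return {"data": data, "tools": tools}


shared = {"data": {"expected": "hello"}, "tools": []}
suite = {"data": {}, "tools": []}
test = {"data": {"value": "hello"}}

merged = merge_context(shared, suite, test)
print(merged["data"])  # {'expected': 'hello', 'value': 'hello'}
```

With this merged data, the `conditional` test from the suite example sees both `value` and `expected`, so its expression evaluates to 1.0.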
Documentation
Full documentation: https://mrbagheera.github.io/llm-workers/
License
See main repository for license information.