eval-harness

A Python CLI that evaluates LLM outputs from production logs against a dual-dimension rubric (faithfulness + task completion).

Install

From the root of a source checkout, install in editable mode with the dev extras:

pip install -e ".[dev]"

Quickstart

export OPENRIXER_API_KEY=sk-or-...
eval-harness run path/to/logs.jsonl --judge meta-llama/llama-3.1-8b-instruct:free

Input JSONL schema:

{"input": "user prompt", "output": "model response", "reference": "optional ground truth"}
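Before handing a log file to the CLI, it can help to check each line against the schema above. The following is a minimal sketch, not part of eval-harness: `validate_record` and `validate_lines` are hypothetical helper names, and the rule that "input" and "output" are required strings is an assumption drawn from the example record (only "reference" is documented as optional).

```python
import json

# Assumption: "input" and "output" are required; "reference" is optional.
REQUIRED_KEYS = ("input", "output")


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check it against the documented schema."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    for key in REQUIRED_KEYS:
        if not isinstance(record.get(key), str):
            raise ValueError(f"missing or non-string field: {key!r}")
    reference = record.get("reference")
    if reference is not None and not isinstance(reference, str):
        raise ValueError("'reference' must be a string when present")
    return record


def validate_lines(lines) -> list[dict]:
    """Validate an iterable of JSONL lines, skipping blank ones."""
    return [validate_record(line) for line in lines if line.strip()]
```

Passing an open file object to `validate_lines` checks a whole log file; a `ValueError` or `json.JSONDecodeError` points at the first malformed record.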

Commands

  • eval-harness run <file> — ingest, evaluate, and report
  • eval-harness judges — list free judge models (cached in ~/.eval-harness/judges.json)
  • eval-harness report --run-id UUID — show a stored run
  • eval-harness export --run-id UUID --format json|csv --output-file PATH
  • eval-harness cache [--stats] [--clear]

Exit codes: 0 if all cases pass, 1 if any case fails, 2 on an evaluator error.
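A wrapper script can branch on these exit codes. The sketch below assumes `eval-harness` is on the PATH and uses only the `run` subcommand and `--judge` flag shown in the Quickstart; `describe_exit` and `run_eval` are hypothetical helper names, not part of the tool.

```python
import subprocess

# Meanings copied from the documented exit codes.
EXIT_MEANINGS = {0: "all pass", 1: "any failures", 2: "evaluator error"}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_eval(log_path: str, judge: str) -> str:
    """Run `eval-harness run` and describe how it exited."""
    result = subprocess.run(
        ["eval-harness", "run", log_path, "--judge", judge],
        check=False,  # a non-zero exit is a result here, not an exception
    )
    return describe_exit(result.returncode)
```

Keeping `check=False` lets the caller tell failed cases (1) apart from an evaluator error (2) instead of treating both as one generic failure.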

CI/CD example

- run: pip install eval-harness-oni
- run: OPENRIXER_API_KEY=${{ secrets.OPENRIXER_API_KEY }} eval-harness run eval/cases.jsonl --pass-threshold 0.7

Development

pip install -e ".[dev]"
ruff check src tests && ruff format --check src tests
pytest tests/ -v --cov=src
