Evaluation harness for the MedEval triage agent. Scores accuracy, safety, hallucination, cost, and latency.

medeval-harness

Evaluation harness for the MedEval triage agent.

Scores a running MedEval agent against a 50-case Emergency Severity Index (ESI) dataset on:

  • Exact and adjacent ESI level accuracy
  • Under-triage and over-triage rates (the safety metrics)
  • Hallucination rate (LLM-extracted facts unsupported by complaint text)
  • Decision-path consistency
  • Cost per case and per evaluation
  • Latency (p50 / p95)
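
The accuracy, safety, and latency metrics listed above can be illustrated with a minimal sketch. This is not the harness's own API: the names `TriageMetrics`, `score_triage`, and `percentile` are hypothetical, and the definitions are assumptions based on common usage. ESI levels run from 1 (most urgent) to 5 (least urgent), "adjacent" is taken to mean within one level of the reference (so it includes exact matches), under-triage means the agent assigned a less urgent level than the reference, and percentiles use the nearest-rank method.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TriageMetrics:
    """Hypothetical container for the accuracy and safety metrics."""
    exact_accuracy: float
    adjacent_accuracy: float
    under_triage_rate: float
    over_triage_rate: float


def score_triage(expected: Sequence[int], predicted: Sequence[int]) -> TriageMetrics:
    """Score predicted ESI levels against reference levels.

    Assumes ESI 1 is the most urgent level and ESI 5 the least urgent.
    """
    if not expected or len(expected) != len(predicted):
        raise ValueError("expected and predicted must be non-empty and of equal length")
    pairs = list(zip(expected, predicted))
    n = len(pairs)
    exact = sum(e == p for e, p in pairs)
    # Assumption: "adjacent" counts predictions within one level, exact matches included.
    adjacent = sum(abs(e - p) <= 1 for e, p in pairs)
    # Under-triage: the predicted level number is higher, i.e. less urgent than the reference.
    under = sum(p > e for e, p in pairs)
    # Over-triage: the predicted level number is lower, i.e. more urgent than the reference.
    over = sum(p < e for e, p in pairs)
    return TriageMetrics(exact / n, adjacent / n, under / n, over / n)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=50 for p50 and pct=95 for p95."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Under these assumptions, `score_triage([1, 2, 3, 4, 5], [1, 3, 3, 3, 2])` counts one under-triaged case (reference 2, predicted 3) and two over-triaged cases. A real harness may weight under-triage more heavily, since it is the error that delays care.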

Install

pip install medeval-harness

Download files

Source Distribution

medeval_harness-0.1.0.tar.gz (16.4 kB)

Uploaded Source

Built Distribution

medeval_harness-0.1.0-py3-none-any.whl (16.3 kB)

Uploaded Python 3

File details

Details for the file medeval_harness-0.1.0.tar.gz.

File metadata

  • Download URL: medeval_harness-0.1.0.tar.gz
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for medeval_harness-0.1.0.tar.gz

  • SHA256: 3debe19f34388a2e26956416828a66b4715bfea06fe89f59014919ac94e61c7b
  • MD5: aa336ca15c156f355e50b947db22f3e2
  • BLAKE2b-256: 92afeff1cbada98e86e4b3115036b83f55b77fdf4b42129869dd58c8b916b0ce
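
A downloaded archive can be checked against the published SHA256 digest before installing. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of` and `matches_published` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Union

# SHA256 digest published above for medeval_harness-0.1.0.tar.gz.
PUBLISHED_SHA256 = "3debe19f34388a2e26956416828a66b4715bfea06fe89f59014919ac94e61c7b"


def sha256_of(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Union[str, Path], expected: str = PUBLISHED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected.lower())
```

For example, `matches_published("medeval_harness-0.1.0.tar.gz")` should return `True` for an unmodified copy of the source distribution saved in the current directory.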

File details

Details for the file medeval_harness-0.1.0-py3-none-any.whl.

File hashes

Hashes for medeval_harness-0.1.0-py3-none-any.whl

  • SHA256: 23ccb0f9331c0c8843aba300c223d1bd4eb635bfb81dbfe7a903dc4c2a1ead4c
  • MD5: 8eb536b9fe674646839b7adf29f08d5f
  • BLAKE2b-256: 2ed105ce2d0411d461b9970292a21bccedce12b80d24ba14e3be344ac1abec64
