Profile of Avalyset

2 projects

Last released Jun 30, 2026

Measure LLM-judge verdict drift across model versions by re-grading a stored Inspect eval log with two graders over the same samples.

Last released Jun 25, 2026

A claim-support / faithfulness scorer for Inspect AI — does the transcript actually substantiate the claimed answer?
