Last released May 18, 2026
Cost-vs-accuracy CI for LLM ops. Pick the cheapest API tier, compare self-hosted vLLM vs cloud APIs on one Pareto, and grade open-ended outputs with an LLM-as-judge scorer — all on your own data with Wilson 95% CIs.
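The two statistical ideas in that summary are the Wilson 95% confidence interval on an accuracy rate and the cost-vs-accuracy Pareto frontier across tiers or deployments. The sketch below shows both in plain Python. It is not the project's own code. The names `wilson_interval`, `Candidate`, and `pareto_front`, and the per-item cost unit, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass

# Two-sided 95% normal quantile (z for alpha = 0.05).
Z95 = 1.959963984540054


def wilson_interval(successes: int, trials: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be in [0, trials]")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    # Center is shrunk toward 0.5; this keeps the interval sensible near 0% or 100%.
    center = (p_hat + z2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    # Clamp tiny floating-point excursions outside [0, 1].
    return max(0.0, center - half), min(1.0, center + half)


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one API tier or self-hosted deployment."""
    name: str
    cost: float      # assumed unit: dollars per eval item
    accuracy: float  # fraction of items graded correct


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep candidates not dominated on (lower cost, higher accuracy), sorted by cost."""
    front: list[Candidate] = []
    best_acc = -math.inf
    # Cheapest first; on equal cost, the more accurate one comes first.
    for c in sorted(cands, key=lambda c: (c.cost, -c.accuracy)):
        # A more expensive candidate survives only if it is strictly more accurate.
        if c.accuracy > best_acc:
            front.append(c)
            best_acc = c.accuracy
    return front


if __name__ == "__main__":
    print(wilson_interval(80, 100))
    tiers = [
        Candidate("A", 1.0, 0.70),
        Candidate("B", 2.0, 0.80),
        Candidate("C", 3.0, 0.75),
        Candidate("D", 2.0, 0.60),
    ]
    print([c.name for c in pareto_front(tiers)])
```

The Wilson interval is used here instead of the simpler normal (Wald) interval. On small eval sets, or when accuracy is close to 0% or 100%, the Wald interval can extend outside [0, 1] or collapse to zero width, and the Wilson interval avoids both problems. The frontier is computed on point accuracies. When neighboring tiers have overlapping intervals, the data cannot reliably separate their accuracy.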