Last released Jan 2, 2026
Strict, auditable HumanEval benchmark for GGUF models via llama.cpp