3 projects
truthbench
A pipeline-based framework to evaluate factual consistency metrics.
truthscore
A fast, modular reimplementation of RAGAS's FactualCorrectness metric, supporting both open-weight and dedicated LLMs.
mab-forge
A toolset for generating multi-armed bandit problems at a user-defined difficulty.