2 projects
llm-benchmark-toolkit
Benchmark LLMs across 10 benchmarks and 132K+ questions. Supports 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Unified CLI and web dashboard.
ai-safety-tester
LLM security testing framework with CVE-style severity scoring and multi-model benchmarking.