4 projects
inceptbench
Comprehensive benchmark and evaluation framework for educational AI question generation
mirofish-simulator
Agentic student simulation using misconception matching: realistic wrong answers without LLM cheating
incept-eval
Standalone CLI tool for comprehensive, AI-powered evaluation of educational questions
agent-sprint-testkit
AgentSprint TestKit - Professional AI agent evaluation with OpenAI Evals integration