3 projects
evalsig
Statistical inference for LLM evaluations: paired tests, clustered standard errors, minimum detectable effect (MDE), sequential testing, release gating.
mcpolish
Fast static linter for Model Context Protocol (MCP) servers. Catches vague, colliding, or misleading tool descriptions before agents pick the wrong tool.
memnex
Cross-channel memory infrastructure for conversational AI agents.