13 projects
llm-sentry
Unified AI reliability platform. One install, 12 diagnostic engines. Continuous monitoring, fault diagnosis, and compliance for LLM pipelines.
rag-pathology
Diagnose RAG pipeline failures by type and location. Four Soils classification. Epistemic mismatch detection.
agent-patrol
Runtime pathology detection for AI agents. Diagnoses loops, stalls, oscillation, drift, and silent abandonment.
chain-probe
Step-level semantic fault isolation for multi-step LLM pipelines. CASCADE analysis. Framework-agnostic.
context-recall
Test whether your LLM retrieves information from every position in its context window.
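The idea behind a positional-recall test can be illustrated with a minimal sketch (this is an assumed illustration of the technique, not context-recall's actual API; `build_prompt`, `recall_by_position`, and the truncating stub model are all hypothetical names):

```python
# Hypothetical sketch of a positional-recall ("needle in a haystack") test:
# plant a needle fact at several fractional depths in filler context, ask the
# model to retrieve it, and record which positions it can recall from.

FILLER = "The sky was grey and the meeting ran long. " * 200

def build_prompt(needle: str, depth: float) -> str:
    """Insert the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    cut = int(len(FILLER) * depth)
    return FILLER[:cut] + needle + FILLER[cut:] + "\nWhat is the secret code?"

def recall_by_position(ask, needle="The secret code is 7341.", answer="7341",
                       depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """ask(prompt) -> str is any model call; returns {depth: recalled?}."""
    return {d: answer in ask(build_prompt(needle, d)) for d in depths}

# Stub "model" that only sees the last 2000 characters of its context.
truncating_model = lambda prompt: prompt[-2000:]
# With this stub, only the needle placed at the very end is recalled.
print(recall_by_position(truncating_model))
```

Swapping the stub for a real model call turns the returned map into a per-position recall profile for that model's context window.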
drift-sentinel
Verify that pull requests do what they say. Git-native PR semantic intent verifier.
inject-lock
Git-native prompt regression testing with judge calibration for LLM CI/CD pipelines.
prompt-brittleness
Catch brittle prompts before production does. Brittleness score and CI gate for LLM prompts under paraphrase.
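A paraphrase brittleness score can be sketched as follows (an assumed illustration of the general technique, not prompt-brittleness's actual API; `brittleness_score`, `ci_gate`, and the toy model are hypothetical):

```python
# Hypothetical sketch of a brittleness score: run the same task under N
# paraphrases of a prompt and measure how often the model's answer diverges
# from its answer to the original wording.

def brittleness_score(model, prompt: str, paraphrases: list[str]) -> float:
    """Fraction of paraphrases that change the model's answer (0.0 = robust)."""
    baseline = model(prompt)
    flips = sum(1 for p in paraphrases if model(p) != baseline)
    return flips / len(paraphrases)

def ci_gate(score: float, threshold: float = 0.2) -> bool:
    """Pass the build only if brittleness is at or below the threshold."""
    return score <= threshold

# Toy model that keys on the literal word "exactly" and changes its answer
# when a paraphrase drops it -- exactly the brittleness being measured.
toy = lambda p: "42" if "exactly" in p else "about 40"
score = brittleness_score(toy, "How many exactly?",
                          ["How many, exactly?", "Roughly how many?",
                           "Give the exact count."])
# Two of three paraphrases flip the answer, so this prompt fails the gate.
```

In a real pipeline the paraphrases would come from an automatic rewriter and the threshold would be tuned per task.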
llm-contract
Define, version, and enforce behavioral contracts on LLM function calls.
spec-drift
Semantic specification drift detector for LLM outputs. Catches violations that Pydantic cannot see.
llm-mutation
Mutation testing for LLM prompts. Find the gaps in your eval suite before production does.
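Mutation testing for prompts can be sketched in a few lines (an assumed illustration of the idea, not llm-mutation's actual API; `mutate`, `surviving_mutants`, and the toy model are hypothetical):

```python
# Hypothetical sketch of prompt mutation testing: apply small mutations to a
# prompt, then re-run the eval suite on each mutant. A mutant that still
# passes every eval "survives", exposing behavior the suite does not pin down.

def mutate(prompt: str) -> list[str]:
    """Two simple mutation operators: drop a sentence; weaken 'must' to 'may'."""
    sentences = [s for s in prompt.split(". ") if s]
    dropped = [". ".join(sentences[:i] + sentences[i + 1:])
               for i in range(len(sentences))]
    return dropped + [prompt.replace("must", "may")]

def surviving_mutants(model, prompt: str, evals) -> list[str]:
    """evals: list of (input, check) pairs where check(output) -> bool."""
    return [m for m in mutate(prompt)
            if all(check(model(m, x)) for x, check in evals)]

# Toy model: emits JSON only if told to, appends a sign-off only if required.
toy = lambda prompt, x: (('{"a": 1}' if "JSON" in prompt else "plain")
                         + (" -- bye" if "must include" in prompt else ""))

prompt = "Answer in JSON. You must include a sign-off"
evals = [("q1", lambda out: out.startswith("{"))]  # only checks JSON-ness
survivors = surviving_mutants(toy, prompt, evals)
# Mutants that drop or weaken the sign-off requirement survive: the eval
# suite never checks for a sign-off, and that gap is now visible.
```

Each survivor points at a missing assertion; killing all mutants is the analogue of full mutation coverage in classical mutation testing.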
model-parity
Certify your replacement LLM is behaviorally equivalent before you migrate. 7 dimensions. YAML test suites. Parity certificate. CI gate.
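The core parity check can be sketched as follows (an assumed illustration, not model-parity's actual API or its seven dimensions; `parity`, `certify`, and the toy models are hypothetical):

```python
# Hypothetical sketch of a behavioral parity check: run the incumbent and the
# replacement model over a shared test suite, compute per-dimension agreement,
# and certify the migration only if every dimension clears a threshold.

def parity(old, new, suite: dict) -> dict:
    """suite: {dimension: [inputs]}. Returns per-dimension agreement rates."""
    return {dim: sum(old(x) == new(x) for x in xs) / len(xs)
            for dim, xs in suite.items()}

def certify(report: dict, threshold: float = 0.95) -> bool:
    """A pass/fail gate over the parity report, suitable for CI."""
    return all(rate >= threshold for rate in report.values())

# Toy models: the replacement diverges on one input in the "format" dimension.
old_model = lambda s: s.upper()
new_model = lambda s: "X" if s == "b" else s.upper()
report = parity(old_model, new_model, {"format": ["a", "b"], "tone": ["c"]})
# format agreement is 0.5, so certification fails despite perfect tone parity.
```

In practice the suite would be loaded from versioned YAML files and the report emitted as a certificate artifact.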
cot-coherence
Detect silent incoherence in AI chain-of-thought reasoning