
Director-AI

Real-time LLM hallucination guardrail — NLI + RAG fact-checking with token-level streaming halt


Active Development — Director-AI is under intensive development. The core guardrail engine, NLI scoring pipeline, five-SDK guard, FastAPI middleware, REST/gRPC servers, and intent-grounded injection detection are fully functional, tested (4,126+ passing tests, zero functional failures), and production-deployable via PyPI. We are currently adding Rust-accelerated signal paths and expanding adversarial-robustness coverage. APIs may evolve as this work progresses.


What It Does

Director-AI sits between your LLM and the user. It scores every output for hallucination before it reaches anyone — and can halt generation mid-stream if coherence drops below threshold.

graph LR
    LLM["LLM<br/>(any provider)"] --> D["Director-AI"]
    D --> S["Scorer<br/>NLI + RAG"]
    D --> K["StreamingKernel<br/>token-level halt"]
    S --> V{Approved?}
    K --> V
    V -->|Yes| U["User"]
    V -->|No| H["HALT + evidence"]

Eleven things make it different:

  1. Token-level streaming halt — not post-hoc review. Severs output the moment coherence degrades.
  2. Dual-entropy scoring — NLI contradiction detection (DeBERTa) + RAG fact-checking against your knowledge base.
  3. Meta-confidence — the guardrail tells you how confident it is in its own verdict. Route low-confidence results to human review.
  4. Structured output verification — JSON Schema validation, tool-call fabrication detection, and hallucinated-API detection in code. Zero dependencies (stdlib only).
  5. Online calibration — collects human feedback, automatically adjusts thresholds for your deployment. The longer you use it, the better it gets.
  6. Contradiction tracking — detects when an AI contradicts itself across conversation turns.
  7. EU AI Act compliance — automated Article 15 documentation. Accuracy metrics, drift detection, feedback loop detection, audit trails, per-model breakdown with confidence intervals. Ready for August 2026 enforcement.
  8. Verification gems — numeric consistency checks, reasoning chain verification, temporal freshness scoring, cross-model consensus, conformal prediction intervals. All stdlib-only, zero dependencies.
  9. Agentic loop monitor — detects circular tool calls, goal drift, and budget exhaustion in AI agent loops. The first guardrail that monitors agent execution, not just individual calls.
  10. Adversarial self-test — 25-pattern robustness suite tests your guardrail against zero-width chars, homoglyphs, encoding tricks, and prompt injection.
  11. Intent-grounded injection detection — two-stage pipeline: regex pattern matching (fast) + bidirectional NLI divergence scoring (semantic). Detects the effect of injection in the output — works regardless of how the attack was encoded. Per-claim attribution with grounded/drifted/injected verdicts.
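The token-level halt in item 1 can be sketched in plain Python. This is an illustrative toy, not Director-AI's internal implementation: a generator wrapper re-scores the growing output after each token and severs the stream the moment the score drops below the threshold. The scorer here is a stand-in for the real NLI + RAG pipeline.

```python
# Illustrative sketch of a token-level streaming halt (not Director-AI's
# actual code): score the text-so-far after each token, stop emitting the
# moment coherence falls below threshold.

def halting_stream(tokens, score_fn, threshold=0.3):
    """Yield tokens until score_fn(text_so_far) falls below threshold."""
    buffer = []
    for token in tokens:
        buffer.append(token)
        if score_fn("".join(buffer)) < threshold:
            break  # halt mid-stream; this token and everything after is dropped
        yield token

# Toy scorer: coherence collapses once the word "unicorns" appears.
def toy_score(text):
    return 0.0 if "unicorns" in text else 1.0

out = list(halting_stream(
    ["Refunds ", "are ", "valid ", "for ", "30 ", "days. ", "unicorns ", "fly."],
    toy_score,
))
print("".join(out))  # everything up to, but not including, the incoherent token
```

The key property is that the halt happens inside the generation loop, so the user never sees the degraded tail of the stream.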

Scope

Pure Python core — no compiled extensions required. Optional Rust kernel (pip install director-ai[rust]) for SIMD-accelerated scoring. Works on any platform with Python 3.11+.

Layer                  | Packages                                                                             | Install
Core (zero heavy deps) | CoherenceScorer, StreamingKernel, GroundTruthStore, HaltMonitor                      | pip install director-ai
NLI models             | DeBERTa, FactCG, MiniCheck, ONNX Runtime                                             | pip install director-ai[nli]
Vector DBs             | ChromaDB ([vector]), Pinecone ([pinecone]), Weaviate ([weaviate]), Qdrant ([qdrant]) | pip install director-ai[vector]
LLM judge              | OpenAI, Anthropic escalation                                                         | pip install director-ai[openai]
Observability          | OpenTelemetry spans                                                                  | pip install director-ai[otel]
Server                 | FastAPI + Uvicorn                                                                    | pip install director-ai[server]

Four Ways to Add Guardrails

A: Wrap your SDK (6 lines)

Duck-type detection for five SDK shapes: OpenAI-compatible (OpenAI, vLLM, Groq, LiteLLM, Ollama), Anthropic, AWS Bedrock, Google Gemini, and Cohere.

from director_ai import guard
from openai import OpenAI

client = guard(
    OpenAI(),
    facts={"refund_policy": "Refunds within 30 days only"},
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the refund policy?"}],
)

B: One-shot check (4 lines)

Score a single prompt/response pair without an SDK client:

from director_ai import score

cs = score("What is the refund policy?", response_text,
           facts={"refund": "Refunds within 30 days only"},
           threshold=0.3)
print(f"Coherence: {cs.score:.3f}  Approved: {cs.approved}")

C: Zero code changes (2 lines)

Point any OpenAI-compatible client at the proxy:

pip install director-ai[server]
director-ai proxy --port 8080 --facts kb.txt --threshold 0.3

Then set OPENAI_BASE_URL=http://localhost:8080/v1 in your app. Every response gets scored; hallucinations are rejected (or flagged with --on-fail warn).

D: FastAPI middleware (3 lines)

Guard your own API endpoints:

from director_ai.integrations.fastapi_guard import DirectorGuard

app.add_middleware(DirectorGuard,
    facts={"policy": "Refunds within 30 days only"},
    on_fail="reject",
)

Responses on POST endpoints get X-Director-Score and X-Director-Approved headers. Set paths=["/api/chat"] to limit which endpoints are scored.
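A downstream client can route on those headers. The header names come from the docs above; the parsing rules and the review band in this helper are illustrative assumptions, not part of the library's API:

```python
# Hypothetical client-side triage on the X-Director-* response headers.
# Header names are from the middleware docs; the "true"/float parsing and
# the review band are assumptions for illustration.

def triage(headers, review_band=(0.3, 0.5)):
    """Return 'pass', 'review', or 'reject' from Director headers."""
    if headers.get("X-Director-Approved", "").lower() != "true":
        return "reject"
    score = float(headers.get("X-Director-Score", "0"))
    lo, hi = review_band
    return "review" if lo <= score < hi else "pass"

print(triage({"X-Director-Score": "0.82", "X-Director-Approved": "true"}))   # pass
print(triage({"X-Director-Score": "0.41", "X-Director-Approved": "true"}))   # review
print(triage({"X-Director-Score": "0.12", "X-Director-Approved": "false"}))  # reject
```

Routing borderline scores to human review mirrors the meta-confidence workflow described above.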

Installation

pip install "director-ai[nli]"                    # recommended — NLI model scoring
pip install "director-ai[nli,vector,server]"       # production stack with RAG + REST API
pip install "director-ai[nli,voice]"               # voice AI with TTS adapters
pip install director-ai                            # heuristic-only (limited accuracy)

  • Extras: [vector] (ChromaDB), [voice] (ElevenLabs, OpenAI TTS, Deepgram), [finetune] (domain adaptation), [ingestion] (PDF/DOCX parsing), [colbert] (late-interaction retrieval).
  • Framework integrations: [langchain], [llamaindex], [langgraph], [haystack], [crewai], Semantic Kernel, DSPy/Instructor.
  • Kubernetes: Helm chart with GPU toggle, HPA, Sigstore-signed releases.
  • Voice AI: VoiceGuard (sync) and AsyncVoiceGuard + voice_pipeline() (async) — real-time token filter for TTS pipelines with ElevenLabs, OpenAI TTS, and Deepgram adapters (guide).

Full installation guide: docs.

Docker

Dockerfile included for self-hosted builds. Pre-built images not yet published to a registry.

docker build -t director-ai .                                      # build locally
docker run -p 8080:8080 director-ai                                # CPU
docker build -f Dockerfile.gpu -t director-ai:gpu .                # GPU build
docker run --gpus all -p 8080:8080 director-ai:gpu                 # GPU

Benchmarks

Accuracy — LLM-AggreFact (29,320 samples)

Scoring model: yaxili96/FactCG-DeBERTa-v3-Large (0.4B params, MIT license).

Model                | Balanced Acc | Params | Latency | Streaming
Bespoke-MiniCheck-7B | 77.4%        | 7B     | ~100 ms | No
Director-AI (FactCG) | 75.8%        | 0.4B   | 14.6 ms | Yes
MiniCheck-Flan-T5-L  | 75.0%        | 0.8B   | ~120 ms | No
MiniCheck-DeBERTa-L  | 72.6%        | 0.4B   | ~120 ms | No

The 75.8% balanced accuracy comes from the FactCG-DeBERTa-v3-Large model (77.2% in the NAACL 2025 paper; our evaluation yields 75.86% due to threshold tuning and a different data-split version). Latency: 14.6 ms/pair measured on a GTX 1060 6GB with ONNX GPU batching (16-pair batches, 30 iterations, 5 warmup). Director-AI's unique value is the system: NLI + KB + streaming halt.
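A back-of-envelope conversion of the quoted latency into throughput (arithmetic on the numbers above, not a new measurement):

```python
# Back-of-envelope throughput from the reported 14.6 ms/pair GPU latency
# (16-pair ONNX batches). This is just arithmetic, not a benchmark.

latency_ms = 14.6                     # per claim/evidence pair
pairs_per_sec = 1000 / latency_ms
print(f"{pairs_per_sec:.0f} pairs/sec")       # ≈ 68 pairs/sec
print(f"{pairs_per_sec * 60:.0f} pairs/min")  # ≈ 4110 pairs/min
```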

Full results: benchmarks/comparison/COMPETITOR_COMPARISON.md. Performance trade-offs and E2E pipeline metrics: docs.

Domain Presets

10 built-in profiles with preset thresholds (starting points — adjust for your data):

director-ai config --profile medical   # threshold=0.30, NLI on, reranker on
director-ai config --profile finance   # threshold=0.30, w_fact=0.6
director-ai config --profile legal     # threshold=0.30, w_logic=0.6
director-ai config --profile creative  # threshold=0.40, permissive
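The preset values from the commands above, collected into one mapping for comparison. The key names ("threshold", "w_fact", ...) are illustrative, not the library's actual config schema:

```python
# Preset values quoted in the profile commands above, as plain data.
# Key names are illustrative, not the library's config schema.

PROFILES = {
    "medical":  {"threshold": 0.30, "nli": True, "reranker": True},
    "finance":  {"threshold": 0.30, "w_fact": 0.6},
    "legal":    {"threshold": 0.30, "w_logic": 0.6},
    "creative": {"threshold": 0.40},  # permissive
}

print(PROFILES["creative"]["threshold"])  # 0.4
```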

Domain-specific benchmark scripts exist but have not yet been validated with measured results. Run them yourself (requires GPU + HuggingFace datasets):

python -m benchmarks.medical_eval   # MedNLI + PubMedQA
python -m benchmarks.legal_eval     # ContractNLI + CUAD (RAGBench)
python -m benchmarks.finance_eval   # FinanceBench + Financial PhraseBank

Known Limitations & When Not to Use

Accuracy

  • Heuristic fallback is weak: Without [nli], scoring uses word-overlap heuristics (~55% accuracy). Use strict_mode=True to reject (0.9) instead of guessing.
  • Summarisation FPR at 10.5%: Reduced from 95% via bidirectional NLI + baseline calibration (v3.5). AggreFact-CNN: 68.8%, ExpertQA: 59.1% (structurally expected at 0.4B params).
  • NLI-only scoring needs KB grounding: Without a knowledge base, PubMedQA F1=62.1%, FinanceBench 80%+ FPR. Load your domain facts into the vector store.
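To see why the heuristic fallback is weak, here is a minimal word-overlap (Jaccard) scorer of the kind such fallbacks rely on. Director-AI's actual heuristic differs; the point is that lexical overlap alone cannot catch a contradiction that shares vocabulary with the fact:

```python
# Minimal Jaccard word-overlap scorer, illustrating the weakness of
# lexical heuristics (Director-AI's actual fallback differs).

def overlap_score(response, fact):
    a, b = set(response.lower().split()), set(fact.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

fact = "refunds within 30 days only"
good = "refunds within 30 days only"
bad  = "refunds within 90 days only"   # contradicts the fact

print(overlap_score(good, fact))  # 1.0
print(overlap_score(bad, fact))   # ≈ 0.67 despite the contradiction
```

An NLI model flags "90 days" vs "30 days" as a contradiction; word overlap rewards it.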

Performance

  • ONNX CPU is slow: 383 ms/pair without GPU. Use onnxruntime-gpu for production.
  • Long documents need ≥16GB VRAM: Legal contracts and SEC filings exceed 6GB during chunked NLI inference.

Configuration

  • Weights are domain-dependent: Default w_logic=0.6, w_fact=0.4 suits general QA. Adjust for your domain or use a built-in profile.
  • Threshold defaults differ by API surface: guard()/score() default to threshold=0.3 (permissive). DirectorConfig defaults to coherence_threshold=0.6 (conservative). Always set the threshold explicitly.
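The same coherence score can therefore pass one API surface and fail the other. The threshold values below come from the text above; the decision rule (approved when score ≥ threshold) is assumed for illustration:

```python
# Same score, different verdicts under the two documented defaults
# (0.3 for guard()/score(), 0.6 for DirectorConfig). The comparison
# rule (score >= threshold => approved) is an assumption.

def approved(score, threshold):
    return score >= threshold

coherence = 0.45
print(approved(coherence, 0.3))  # True  — passes the permissive default
print(approved(coherence, 0.6))  # False — rejected by the conservative default
```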

Privacy

  • LLM-as-judge sends data externally: When llm_judge_enabled=True, truncated prompt+response (500 chars) are sent to the configured provider. Do not enable in privacy-sensitive deployments without user consent. The default NLI-only mode runs entirely locally with no external calls.

Citation

@software{sotek2026director,
  author    = {Sotek, Miroslav},
  title     = {Director-AI: Real-time LLM Hallucination Guardrail},
  year      = {2026},
  url       = {https://github.com/anulum/director-ai},
  version   = {3.12.0},
  license   = {AGPL-3.0-or-later}
}

License

Dual-licensed:

  1. Open-Source: GNU AGPL v3.0 — research, personal use, open-source projects.
  2. Commercial: Proprietary license — removes copyleft for closed-source and SaaS.

See Licensing for pricing tiers and FAQ.

Contact: anulum.li | director.class.ai@anulum.li

Community

Join the Director-AI Discord for CI notifications, release announcements, and support. The Discord bot also provides /version, /docs, /install, /status, and /quickstart slash commands.

Contributing

See CONTRIBUTING.md. By contributing, you agree to AGPL v3 terms.


Developed by ANULUM / Fortis Studio
