Director-AI

Real-time LLM hallucination guardrail — NLI + RAG fact-checking with token-level streaming halt


What It Does

Director-AI sits between your LLM and the user. It scores every output for hallucination before it reaches anyone — and can halt generation mid-stream if coherence drops below threshold.

graph LR
    LLM["LLM<br/>(any provider)"] --> D["Director-AI"]
    D --> S["Scorer<br/>NLI + RAG"]
    D --> K["StreamingKernel<br/>token-level halt"]
    S --> V{Approved?}
    K --> V
    V -->|Yes| U["User"]
    V -->|No| H["HALT + evidence"]

Three things make it different:

  1. Token-level streaming halt — not post-hoc review. Severs output the moment coherence degrades.
  2. Dual-entropy scoring — NLI contradiction detection (DeBERTa) + RAG fact-checking against your knowledge base (see the scoring sketch after this list).
  3. Your data, your rules — ingest your own documents. The scorer checks against your ground truth.
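
The two signals in item 2 are combined into a single coherence score. A minimal sketch of one plausible combination, assuming a simple weighted sum with the default weights w_logic=0.6, w_fact=0.4 mentioned under Known Limitations (the function below is illustrative, not library API):

def combined_score(nli_entailment: float, kb_support: float,
                   w_logic: float = 0.6, w_fact: float = 0.4) -> float:
    # Blend the NLI logical-consistency score with the RAG factual-support score.
    return w_logic * nli_entailment + w_fact * kb_support

# Logically fluent but unsupported by the knowledge base:
print(combined_score(nli_entailment=0.9, kb_support=0.1))  # 0.58, below a 0.6 threshold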

Scope

100% Python — no compiled extensions required. Works on any platform with Python 3.10+.

| Layer | Packages | Install |
|---|---|---|
| Core (zero heavy deps) | CoherenceScorer, StreamingKernel, GroundTruthStore, SafetyKernel | pip install director-ai |
| NLI models | DeBERTa, FactCG, MiniCheck, ONNX Runtime | pip install director-ai[nli] |
| Vector DBs | ChromaDB, Pinecone, Weaviate, Qdrant | pip install director-ai[vector] |
| LLM judge | OpenAI, Anthropic escalation | pip install director-ai[openai] |
| Observability | OpenTelemetry spans | pip install director-ai[otel] |
| Server | FastAPI + Uvicorn | pip install director-ai[server] |

Quickstart

| Method | Command |
|---|---|
| pip install | pip install director-ai |
| CLI scaffold | director-ai quickstart --profile medical |
| Colab | Open in Colab |
| HF Spaces | Try it live |
| Docker | docker run -p 8080:8080 ghcr.io/anulum/director-ai:latest |

6-line guard

from director_ai import guard
from openai import OpenAI

client = guard(
    OpenAI(),
    facts={"refund_policy": "Refunds within 30 days only"},
    threshold=0.6,
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the refund policy?"}],
)

Score a response

from director_ai.core import CoherenceScorer, GroundTruthStore

store = GroundTruthStore()
store.add("sky color", "The sky is blue due to Rayleigh scattering.")

scorer = CoherenceScorer(threshold=0.6, ground_truth_store=store)
approved, score = scorer.review("What color is the sky?", "The sky is green.")

print(approved)     # False
print(score.score)  # 0.42

Streaming halt

from director_ai.core import StreamingKernel

# hard_limit: coherence score below which the stream is severed;
# window_size: size of the rolling scoring window.
kernel = StreamingKernel(hard_limit=0.4, window_size=5)
session = kernel.stream_tokens(token_generator, lambda tok: my_scorer(tok))

if session.halted:
    print(f"Halted at token {session.halt_index}: {session.halt_reason}")

Installation

pip install director-ai                      # heuristic scoring
pip install director-ai[nli]                 # NLI model (DeBERTa)
pip install director-ai[vector]              # ChromaDB knowledge base
pip install "director-ai[nli,vector,server]" # production stack

Framework integrations: [langchain], [llamaindex], [langgraph], [haystack], [crewai].
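
Assuming each integration follows the same extras pattern as the table above (the extras names are taken from the bracketed list; the pattern itself is the only assumption):

pip install "director-ai[langchain]"     # e.g. the LangChain integration extra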

Full installation guide: docs.

Docker

docker run -p 8080:8080 ghcr.io/anulum/director-ai:latest        # CPU
docker run --gpus all -p 8080:8080 ghcr.io/anulum/director-ai:gpu # GPU

Benchmarks

Accuracy — LLM-AggreFact (29,320 samples)

| Model | Balanced Acc | Params | Latency | Streaming |
|---|---|---|---|---|
| Bespoke-MiniCheck-7B | 77.4% | 7B | ~100 ms | No |
| Director-AI (FactCG) | 75.8% | 0.4B | 14.6 ms | Yes |
| MiniCheck-Flan-T5-L | 75.0% | 0.8B | ~120 ms | No |
| MiniCheck-DeBERTa-L | 72.6% | 0.4B | ~120 ms | No |

75.8% balanced accuracy with roughly 17× fewer parameters than the leader. 14.6 ms/pair with ONNX GPU batching — faster than every competitor at this accuracy tier. Director-AI's distinctive value is the combined system: NLI scoring + knowledge base + streaming halt.

Full results: benchmarks/comparison/COMPETITOR_COMPARISON.md.

Domain Presets

8 built-in profiles with tuned thresholds, for example:

director-ai config --profile medical   # threshold=0.75, NLI on, reranker on
director-ai config --profile finance   # threshold=0.70, w_fact=0.6
director-ai config --profile legal     # threshold=0.68, w_logic=0.6
director-ai config --profile creative  # threshold=0.40, permissive

Known Limitations

  1. Heuristic fallback is weak: Without [nli], scoring uses word-overlap heuristics (~55% accuracy). Use strict_mode=True to return a neutral 0.5 instead (sketched after this list).
  2. Summarisation is a weak spot: NLI models under-perform on summarisation (AggreFact-CNN: 68.8%, ExpertQA: 59.1%).
  3. ONNX CPU is slow: 383 ms/pair without GPU. Use onnxruntime-gpu for production.
  4. Weights are domain-dependent: Default w_logic=0.6, w_fact=0.4 suits general QA. Adjust for your domain.
  5. Chunked NLI: Very short chunks (<3 sentences) may lose context.
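
A minimal sketch of the strict-mode fallback from item 1, assuming strict_mode is a CoherenceScorer constructor flag (not verified against the API):

from director_ai.core import CoherenceScorer

# Assumption: without the [nli] extra installed, strict_mode=True makes the scorer
# return a neutral 0.5 instead of relying on the weak word-overlap heuristic.
scorer = CoherenceScorer(threshold=0.6, strict_mode=True)
approved, score = scorer.review("What color is the sky?", "The sky is green.")
print(score.score)  # expected: 0.5 under this assumption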

Citation

@software{sotek2026director,
  author    = {Sotek, Miroslav},
  title     = {Director-AI: Real-time LLM Hallucination Guardrail},
  year      = {2026},
  url       = {https://github.com/anulum/director-ai},
  version   = {2.2.0},
  license   = {AGPL-3.0-or-later}
}

License

Dual-licensed:

  1. Open-Source: GNU AGPL v3.0 — research, personal use, open-source projects.
  2. Commercial: Proprietary license — removes copyleft for closed-source and SaaS.

See Licensing for pricing tiers and FAQ.

Contact: anulum.li/contact | invest@anulum.li

Contributing

See CONTRIBUTING.md. By contributing, you agree to AGPL v3 terms.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

director_ai-2.2.0.tar.gz (155.2 kB, Source)

Built Distribution

director_ai-2.2.0-py3-none-any.whl (98.6 kB, Python 3)

File details

Details for the file director_ai-2.2.0.tar.gz.

File metadata

  • Download URL: director_ai-2.2.0.tar.gz
  • Upload date:
  • Size: 155.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for director_ai-2.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | f58a1fc8fbd31d4b797c31ba934992a12dbf65eabff5bf8b4650a5d4bf7b6832 |
| MD5 | 631d7c15b9596c25b2b0a3d631ce97d9 |
| BLAKE2b-256 | 1edc633abd6754a8bdcb739f90f77dc29c51b8156d813608c0a58b929190c53b |


Provenance

The following attestation bundles were made for director_ai-2.2.0.tar.gz:

Publisher: publish.yml on anulum/director-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file director_ai-2.2.0-py3-none-any.whl.

File metadata

  • Download URL: director_ai-2.2.0-py3-none-any.whl
  • Upload date:
  • Size: 98.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for director_ai-2.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 02fca97d23b4988b1e767cbce5c83ac08c0f68bd0e11e43b4297adeb049ad6f8 |
| MD5 | 199f6e9a8e09e8d1f98a579b16bbede7 |
| BLAKE2b-256 | dc58e271e31e4717560fc7e78948ff5fdc3a8bc5ec67f187018c2c19ad8c3ed6 |


Provenance

The following attestation bundles were made for director_ai-2.2.0-py3-none-any.whl:

Publisher: publish.yml on anulum/director-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
