Real-time LLM hallucination guardrail — NLI + RAG fact-checking with token-level streaming halt

Director-AI — Real-time LLM Hallucination Guardrail

CI PyPI Coverage Python 3.10+ mypy Docker License: AGPL v3


What It Does

Director-AI sits between your LLM and the user. It scores every output for hallucination before it reaches anyone — and can halt generation mid-stream if coherence drops below threshold.

from director_ai import CoherenceAgent

agent = CoherenceAgent()
result = agent.process("What color is the sky?")

print(result.coherence.score)      # 0.90 (high coherence: 1 - (0.6 * 0.10 + 0.4 * 0.10))
print(result.coherence.approved)   # True
print(result.coherence.h_logical)  # 0.10 — low contradiction probability
print(result.coherence.h_factual)  # 0.10 — low factual deviation

Three things make it different:

  1. Token-level streaming halt — not post-hoc review. The safety kernel monitors coherence token-by-token and severs output the moment it degrades.
  2. Dual-entropy scoring — NLI contradiction detection (DeBERTa) + RAG fact-checking against your own knowledge base. Both must pass.
  3. Your data, your rules — ingest PDFs, directories, or any text into a ChromaDB-backed knowledge base. The scorer checks LLM output against your ground truth, not a generic model.

Architecture

          ┌──────────────────────────┐
          │    Coherence Agent       │
          │    (Orchestrator)        │
          └─────────┬────────────────┘
                    │
       ┌────────────┼────────────────┐
       │            │                │
┌──────▼──────┐ ┌───▼──────────┐ ┌───▼────────────┐
│  Generator  │ │  Coherence   │ │  Safety        │
│  (LLM       │ │  Scorer      │ │  Kernel        │
│   backend)  │ │              │ │  (streaming    │
│             │ │  NLI + RAG   │ │   interlock)   │
└─────────────┘ └───┬──────────┘ └────────────────┘
                    │
          ┌─────────▼─────────┐
          │  Ground Truth     │
          │  Store            │
          │  (ChromaDB / RAM) │
          └───────────────────┘

Installation

# Basic install (heuristic scoring, no GPU needed)
pip install director-ai

# With NLI model (DeBERTa-based contradiction detection)
# Extras are quoted so the brackets survive shells such as zsh
pip install "director-ai[nli]"

# With vector store (ChromaDB for custom knowledge bases)
pip install "director-ai[vector]"

# With LangChain or LlamaIndex
pip install "director-ai[langchain]"
pip install "director-ai[llamaindex]"

# With REST API server
pip install "director-ai[server]"

# Fine-tuning pipeline
pip install "director-ai[train]"

# NLI, vector store, and REST server together
pip install "director-ai[nli,vector,server]"

# Development
git clone https://github.com/anulum/director-ai.git
cd director-ai
pip install -e ".[dev]"

Usage

Score a single response

from director_ai.core import CoherenceScorer, GroundTruthStore

store = GroundTruthStore()
store.add("sky color", "The sky is blue due to Rayleigh scattering.")

scorer = CoherenceScorer(threshold=0.6, ground_truth_store=store)
approved, score = scorer.review("What color is the sky?", "The sky is green.")

print(approved)     # False — contradicts ground truth
print(score.score)  # 0.42

With a real LLM backend

from director_ai import CoherenceAgent

# Works with any OpenAI-compatible endpoint (llama.cpp, vLLM, Ollama, etc.)
agent = CoherenceAgent(llm_api_url="http://localhost:8080/completion")
result = agent.process("Explain quantum entanglement")

if result.halted:
    print("Output blocked — coherence too low")
else:
    print(result.output)

Token-level streaming with halt

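from director_ai.core import StreamingKernel

# Placeholders you supply:
#   my_token_iterator: an iterable of tokens coming from your LLM
#   my_scorer: a callable that takes a token and returns its coherence score
kernel = StreamingKernel(hard_limit=0.4, window_size=5, window_threshold=0.5)

session = kernel.stream_tokens(
    token_generator=my_token_iterator,
    coherence_callback=my_scorer,
)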

for event in session.events:
    if event.halted:
        print(f"\n[HALTED — {session.halt_reason}]")
        break
    print(event.token, end="")
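The gating idea behind the example above can be sketched in plain Python. This is a minimal illustration, not StreamingKernel's actual implementation: the function name `gate_tokens` is hypothetical, and the meaning of `hard_limit` (halt on any single score below it) and `window_size` / `window_threshold` (halt when the mean of the last N scores drops below the threshold) is an assumption inferred from the parameter names.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Optional, Tuple


def gate_tokens(
    tokens: Iterable[str],
    score_token: Callable[[str], float],
    hard_limit: float = 0.4,
    window_size: int = 5,
    window_threshold: float = 0.5,
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Yield (token, None) while coherent, then (None, reason) once and stop.

    Hypothetical sketch; the halt semantics are assumed, not taken from the library.
    """
    window = deque(maxlen=window_size)
    for token in tokens:
        score = score_token(token)
        # Assumed rule 1: a single score below the hard limit halts immediately.
        if score < hard_limit:
            yield None, f"hard_limit: {score:.2f} < {hard_limit}"
            return
        window.append(score)
        # Assumed rule 2: a full window whose mean is too low also halts.
        if len(window) == window_size:
            mean = sum(window) / window_size
            if mean < window_threshold:
                yield None, f"window_avg: {mean:.2f} < {window_threshold}"
                return
        yield token, None


# Toy scores standing in for a real coherence scorer.
toy_scores = {"The": 0.9, "sky": 0.8, "is": 0.7, "green": 0.2, "today": 0.9}
demo = list(gate_tokens(["The", "sky", "is", "green", "today"], lambda t: toy_scores[t]))
print(demo[-1])  # (None, 'hard_limit: 0.20 < 0.4')
```

The offending token is never emitted: the generator stops before yielding "green", so downstream consumers only ever see text that passed both checks.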

NLI-based scoring (requires torch)

from director_ai.core import CoherenceScorer

scorer = CoherenceScorer(use_nli=True, threshold=0.6)
approved, score = scorer.review(
    "The Earth orbits the Sun.",
    "The Sun orbits the Earth."
)
print(score.h_logical)  # High — NLI detects contradiction

Custom knowledge base with ChromaDB

from director_ai.core import CoherenceScorer, VectorGroundTruthStore

store = VectorGroundTruthStore()  # Uses ChromaDB
store.add_fact("company policy", "Refunds are available within 30 days.")
store.add_fact("pricing", "Enterprise plan starts at $99/month.")

scorer = CoherenceScorer(ground_truth_store=store)
approved, score = scorer.review(
    "What is the refund policy?",
    "We offer full refunds within 90 days."  # Wrong
)
# approved = False — contradicts your KB

Integration examples

See examples/ for ready-to-run integrations:

| Example | Backend | What it shows |
|---|---|---|
| quickstart.py | None | Guard any output in 10 lines |
| openai_guard.py | OpenAI | Score + streaming halt for GPT-4o |
| ollama_guard.py | Ollama | Local LLM guard with Llama 3 |
| langchain_guard.py | LangChain | Output checker for chains |
| streaming_halt_demo.py | Simulated | All 3 halt mechanisms visualised |

Interactive demo

Try Director-AI in the browser — no install needed:

Open in Colab

Or run the Gradio demo locally:

pip install director-ai gradio
python demo/app.py

Scoring Formula

Coherence = 1 - (0.6 * H_logical + 0.4 * H_factual)

| Component | Source | Range | Meaning |
|---|---|---|---|
| H_logical | NLI model (DeBERTa) | 0-1 | Contradiction probability |
| H_factual | RAG retrieval | 0-1 | Ground truth deviation |

  • Score >= 0.6 → approved (configurable)
  • Score < 0.5 → safety kernel emergency halt
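As a worked illustration of the formula and thresholds above, here is a minimal standalone sketch. The names `coherence`, `judge`, and `Verdict` are hypothetical helpers for this example and are not part of the director_ai API; only the weights (0.6 / 0.4) and the thresholds (0.6 and 0.5) come from this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    score: float
    approved: bool        # score >= approval threshold
    emergency_halt: bool  # score < halt threshold


def coherence(h_logical: float, h_factual: float,
              w_logical: float = 0.6, w_factual: float = 0.4) -> float:
    """Coherence = 1 - (w_logical * H_logical + w_factual * H_factual)."""
    for name, value in (("h_logical", h_logical), ("h_factual", h_factual)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return 1.0 - (w_logical * h_logical + w_factual * h_factual)


def judge(h_logical: float, h_factual: float,
          approve_at: float = 0.6, halt_below: float = 0.5) -> Verdict:
    """Apply the approval and emergency-halt thresholds to one response."""
    score = coherence(h_logical, h_factual)
    return Verdict(score, score >= approve_at, score < halt_below)


print(judge(0.2, 0.3))  # mild entropy on both axes
print(judge(0.9, 0.5))  # strong contradiction signal
```

With H_logical = 0.2 and H_factual = 0.3 the score is 1 - (0.12 + 0.12) = 0.76, which is approved. With H_logical = 0.9 and H_factual = 0.5 it is 1 - (0.54 + 0.20) = 0.26, which falls below the halt threshold. Scores between 0.5 and 0.6 are neither approved nor an emergency halt under these defaults.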

Benchmarks

Evaluated on LLM-AggreFact (29,320 samples across 11 datasets):

| Model | AggreFact Balanced Acc | Latency (avg) |
|---|---|---|
| DeBERTa-v3-base (baseline) | 66.2% | 220 ms |
| Fine-tuned DeBERTa-v3-large | 64.7% | 223 ms |
| Fine-tuned DeBERTa-v3-base | 59.0% | 220 ms |

Per-dataset highlights:

| Dataset | Balanced Accuracy | Notes |
|---|---|---|
| Reveal | 80.7% | Strong on factual claims |
| FactCheck-GPT | 71.7% | Good on GPT-generated text |
| Lfqa | 64.8% | Long-form QA |
| RAGTruth | 58.9% | RAG-specific hallucination |
| AggreFact-CNN | 53.0% | Summarization (known weak spot) |

Head-to-head (same benchmark, same metric — LLM-AggreFact leaderboard):

| Tool | Bal. Acc | Params | Latency | Streaming |
|---|---|---|---|---|
| Bespoke-MiniCheck-7B | 77.4% | 7B | ~100 ms (GPU) | No |
| MiniCheck-Flan-T5-L | 75.0% | 0.8B | ~120 ms | No |
| MiniCheck-DeBERTa-L | 72.6% | 0.4B | ~120 ms | No |
| HHEM-2.1-Open | 71.8% | ~0.4B | ~200 ms | No |
| Director-AI | 66.2% | 0.4B | 220 ms | Yes |

Honest assessment: The NLI scorer alone is not state-of-the-art: at 66.2% balanced accuracy it trails every other tool in the head-to-head table. Director-AI's value is in the system, which combines NLI with your own KB facts, streaming token-level gating, and configurable halt thresholds. None of the other tools listed offers streaming halt. The NLI component is pluggable; swap in any model that improves on these numbers.

Full comparison with SelfCheckGPT, RAGAS, NeMo Guardrails, Lynx, and others in benchmarks/comparison/. Benchmark scripts in benchmarks/. Fine-tuning pipeline in training/.
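Balanced accuracy, the metric reported in this section, is the mean of per-class recall, so it does not reward a scorer for always predicting the majority class. The sketch below shows the computation for binary labels. The function name and the label convention (1 = supported, 0 = unsupported) are assumptions for illustration and are not taken from the scripts in benchmarks/.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall for binary labels (assumed: 1 = supported, 0 = not)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    recalls = []
    for label in (0, 1):
        total = sum(1 for t in y_true if t == label)
        if total == 0:
            raise ValueError(f"no samples with label {label}")
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# A scorer that approves everything on an 80/20 split:
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5, even though plain accuracy is 0.8
```

This is why a score near 50% on a dataset indicates performance close to chance, regardless of class balance.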

Package Structure

src/director_ai/
├── core/                           # Production API
│   ├── agent.py                    # CoherenceAgent — main orchestrator
│   ├── scorer.py                   # Dual-entropy coherence scorer
│   ├── kernel.py                   # Safety kernel (streaming interlock)
│   ├── streaming.py                # Token-level streaming oversight
│   ├── async_streaming.py          # Non-blocking async streaming
│   ├── nli.py                      # NLI scorer (DeBERTa)
│   ├── actor.py                    # LLM generator interface
│   ├── knowledge.py                # Ground truth store (in-memory)
│   ├── vector_store.py             # Vector store (ChromaDB backend)
│   ├── policy.py                   # YAML declarative policy engine
│   ├── audit.py                    # Structured JSONL audit logger
│   ├── tenant.py                   # Multi-tenant KB isolation
│   ├── sanitizer.py                # Prompt injection hardening
│   ├── bridge.py                   # Physics-backed scorer (optional)
│   └── types.py                    # CoherenceScore, ReviewResult
├── integrations/                   # Framework integrations
│   ├── langchain.py                # LangChain Runnable guardrail
│   └── llamaindex.py               # LlamaIndex postprocessor
├── cli.py                          # CLI: review, process, batch, serve
└── server.py                       # FastAPI REST wrapper
benchmarks/                         # AggreFact evaluation suite
training/                           # DeBERTa fine-tuning pipeline

Testing

pytest tests/ -v

License

Dual-licensed:

  1. Open-Source: GNU AGPL v3.0 — academic research, personal use, open-source projects
  2. Commercial: Proprietary license from ANULUM — closed-source and commercial use

See NOTICE for full terms and third-party acknowledgements.

Citation

@software{sotek2026director,
  author    = {Sotek, Miroslav},
  title     = {Director-AI: Real-time LLM Hallucination Guardrail},
  year      = {2026},
  url       = {https://github.com/anulum/director-ai},
  version   = {1.0.0},
  license   = {AGPL-3.0-or-later}
}

Contributing

See CONTRIBUTING.md for guidelines. By contributing, you agree to the Code of Conduct and AGPL v3 licensing terms.

Security

See SECURITY.md for reporting vulnerabilities.
