A composable validation framework for LLM inputs and outputs

Composable validation guardrails for LLM applications: accuracy, relevancy, toxicity, privacy, and bias checks combined in a single pipeline.

Documentation →


Install

pip install validate-llm

Requires Python 3.11+. Optional extras:

pip install "validate-llm[demo]"   # FastAPI demo server + web UI
pip install "validate-llm[test]"   # pytest + datasets
pip install "validate-llm[dev]"    # everything

Quick start

from llm_validation_framework import ValidationFramework, LLMProvider, Pipe
from llm_validation_framework import ToxicityAgent, PrivacyAgent, AccuracyAgent
from llm_validation_framework.config_loader import load_api_key

llm = LLMProvider(provider="anthropic", model="claude-haiku-4-5-20251001", key=load_api_key())

vf = ValidationFramework(
    llm=llm,
    input_guardrail=Pipe(steps=[ToxicityAgent()], verbose=False),
    output_guardrail=Pipe(steps=[ToxicityAgent(), PrivacyAgent(), AccuracyAgent()], verbose=False),
)

result = vf.validate("What is the Pacific Ocean?")
print(result["status"], result["score"])  # e.g. PASS 0.87 (scores vary between runs)

validate() returns a structured dict with status, score, and per-agent results for both the input and output guardrails. See the docs for the full schema.
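
As a rough sketch of how a caller might act on that result, the helper below gates a response on the overall status and lists any failing agents. Only `status` and `score` appear in the quick start; the `input`, `output`, `agents`, and `name` keys, the `min_score` threshold, and the `summarize` helper itself are hypothetical placeholders, so check the documented schema before relying on them.

```python
from typing import Any


def summarize(result: dict[str, Any], min_score: float = 0.5) -> dict[str, Any]:
    """Decide whether to release a response and list failing agents.

    Assumes a hypothetical layout such as:
    {"status": "PASS", "score": 0.87,
     "input": {"agents": [{"name": "...", "status": "..."}]},
     "output": {"agents": [...]}}
    """
    failed = []
    for stage in ("input", "output"):  # hypothetical key names
        for agent in result.get(stage, {}).get("agents", []):
            if agent.get("status") != "PASS":
                failed.append(f"{stage}:{agent.get('name', 'unknown')}")
    allowed = result.get("status") == "PASS" and result.get("score", 0.0) >= min_score
    return {"allowed": allowed, "failed_agents": failed}


# Hand-written sample data, not real framework output.
sample = {
    "status": "FAIL",
    "score": 0.31,
    "input": {"agents": [{"name": "ToxicityAgent", "status": "PASS"}]},
    "output": {
        "agents": [
            {"name": "ToxicityAgent", "status": "PASS"},
            {"name": "PrivacyAgent", "status": "FAIL"},
        ]
    },
}
print(summarize(sample))
```

Keeping the gating logic in one small function makes it easy to adjust once the real schema is known.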

Agents

Agent | What it does | Needs API key
ToxicityAgent | Three-layer check: profanity filter → toxicity model → semantic similarity | No
PrivacyAgent | Regex scan for SSN, credit cards, API keys; optional system prompt leakage detection | No
AccuracyAgent | LLM-as-a-judge factual accuracy + relevancy, with optional RAG grounding | Yes
RelevancyAgent | LLM-as-a-judge check that the answer addresses the question | Yes
BiasAgent | LLM-as-a-judge scan for stereotypes and discriminatory language | Yes

ToxicityAgent and PrivacyAgent run fully locally with no external calls.
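
To illustrate what a local, regex-based privacy scan can look like, here is a minimal sketch. It is not PrivacyAgent's implementation: the patterns, the `PATTERNS` table, and the `scan` function are assumptions made up for this example, and the rules are deliberately simple.

```python
import re

# Illustrative patterns only; production detectors need stricter rules
# (for example, a Luhn checksum for card numbers).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)-[A-Za-z0-9_-]{16,}\b"),
}


def scan(text: str) -> list[str]:
    """Return the names of the pattern categories found in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


print(scan("My SSN is 123-45-6789"))
print(scan("What is the Pacific Ocean?"))
```

Because everything here is plain pattern matching, the scan needs no network access or API key, which is the property the table attributes to the local agents.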

Config

export ANTHROPIC_API_KEY=your-key

Or create a config.ini at the repo root (gitignored):

[ANTHROPIC]
API_KEY=your-key

Provider names follow LiteLLM's naming conventions, as with provider="anthropic" in the quick start.

RAG grounding

Pass a retriever to AccuracyAgent to ground factual checks against your own corpus:

from llm_validation_framework import AccuracyAgent, RAGProvider

accuracy = AccuracyAgent(rag=RAGProvider(your_vectorstore.as_retriever()))

See the RAG Integration guide for a full walkthrough.
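
To make the idea of grounding concrete, here is a toy sketch that is independent of the library. It retrieves the most relevant passage from a small in-memory corpus by word overlap and builds a judge prompt that includes it. The `tokenize`, `retrieve`, and `build_judge_prompt` names are hypothetical, the scoring is deliberately naive, and RAGProvider's actual behavior may differ.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by the number of words shared with the query (toy scoring)."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]


def build_judge_prompt(question: str, answer: str, passages: list[str]) -> str:
    """Assemble a prompt asking a judge model to check the answer against context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Judge whether the answer is supported by the context.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        f"Answer: {answer}"
    )


corpus = [
    "The Pacific Ocean is the largest and deepest ocean on Earth.",
    "The Atlantic Ocean separates the Americas from Europe and Africa.",
    "Python 3.11 was released in October 2022.",
]
question = "What is the Pacific Ocean?"
passages = retrieve(question, corpus, k=1)
print(build_judge_prompt(question, "The largest ocean.", passages))
```

A real retriever would typically use embeddings rather than word overlap; the point is only that the judge sees retrieved passages alongside the question and answer.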

Demo

The demo consists of a FastAPI backend and a static web UI, each started in its own terminal:

# Terminal 1
uvicorn demo.api_server:app --host 127.0.0.1 --port 5050

# Terminal 2
python demo/serve_ui.py

Open http://127.0.0.1:8000.

Contributors

  • Hitha Shri Nagaruru
  • James Wu
  • Lewis Lui
  • Thomas Yeoh

License

MIT — see LICENSE
