Lightweight, extensible Python framework that validates LLM inputs and outputs with fast rule-based validation and pluggable AI judges
trustguard
Bidirectional validation for LLM applications - secure both input and output with pluggable AI judges
Quick Start Guide - Get up and running in 5 minutes • Documentation - User manual
Overview
trustguard is a lightweight, extensible Python framework that provides comprehensive validation for Large Language Model (LLM) applications. It operates at both ends of the LLM pipeline:
- Input Validation: Blocks harmful prompts, jailbreak attempts, and toxic user content before they reach your LLM
- Output Validation: Filters unsafe responses, PII leakage, and policy violations before they reach your users
The framework combines two complementary approaches:
- Fast rule-based validation (microseconds) for deterministic checks like PII detection, blocklist filtering, and toxicity detection
- Pluggable judge system that can use any AI model (OpenAI GPT-4, Anthropic Claude, local Ollama, or custom models) for nuanced, context-aware evaluation
With its modular architecture, trustguard is easy to extend with custom rules, judges, and schemas - making it suitable for everything from simple chatbots to complex enterprise AI applications.
Key Features
| Feature | Description |
|---|---|
| Lightweight | Pure Python, minimal dependencies, no external services required |
| Schema Validation | Enforce JSON structure with Pydantic V2 |
| Built-in Rules | PII detection, blocklist filtering, toxicity checks, quality validation |
| Pluggable Judges | Use any AI model as a safety validator |
| Universal Adapter | Wrap Hugging Face, Groq, or internal APIs with CallableJudge |
| Ensemble Judges | Combine multiple judges with voting strategies for maximum accuracy |
| Provider Wrappers | One-line integration with OpenAI, Anthropic, and local Ollama |
| Batch Validation | Validate multiple responses with detailed reporting |
| Statistics | Built-in metrics tracking for monitoring and optimization |
| CLI | Command-line interface for quick testing and integration |
Architecture
Raw Input → JSON Extraction → Schema Validation → Rules → Judge → Result
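The flow above can be sketched as plain Python. This is an illustration of the stages, not trustguard's actual implementation; the rule and judge signatures mirror the examples later in this README:

```python
import json

def validate_pipeline(raw: str, required_keys: set, rules: list, judge=None) -> dict:
    """Illustrative pipeline: JSON extraction -> schema check -> rules -> judge."""
    # 1. JSON extraction
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return {"approved": False, "log": f"Invalid JSON: {e}"}

    # 2. Schema validation (a stand-in for the Pydantic model)
    missing = required_keys - data.keys()
    if missing:
        return {"approved": False, "log": f"Missing fields: {sorted(missing)}"}

    # 3. Fast rule checks: each rule returns an error string, or None to pass
    for rule in rules:
        error = rule(data, raw)
        if error:
            return {"approved": False, "log": error}

    # 4. Optional AI judge for nuanced, context-aware evaluation
    if judge is not None:
        verdict = judge(raw)
        if not verdict.get("safe", True):
            return {"approved": False, "log": f"Judge: {verdict.get('reason')}"}

    return {"approved": True, "data": data}
```

Rules run first because they cost microseconds; the judge, which may be a paid network call, only runs on text that already passed the cheap checks.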
trustguard/
├── core/         # Core validation engine
├── rules/        # Built-in validation rules
│   ├── pii.py        # Email/phone detection
│   ├── blocklist.py  # Forbidden terms
│   ├── toxicity.py   # Harmful content
│   └── quality.py    # Length/repetition checks
├── schemas/      # Pydantic schemas
├── judges/       # Pluggable judge system
│   ├── base.py       # Abstract base class
│   ├── openai.py     # GPT-4/GPT-3.5 judges
│   ├── ollama.py     # Local model judges
│   ├── anthropic.py  # Claude judges
│   ├── custom.py     # Universal adapter
│   └── ensemble.py   # Combine multiple judges
└── wrappers/     # LLM provider wrappers
Installation
Basic Installation
pip install trustguard
With Judge Support
# OpenAI judges (GPT-4, GPT-3.5)
pip install trustguard[openai]
# Anthropic Claude judges
pip install trustguard[anthropic]
# Local Ollama judges
pip install trustguard[ai]
# Everything
pip install trustguard[all]
Development Installation
git clone https://github.com/Dr-Mo-Khalaf/trustguard.git
cd trustguard
pip install -e ".[dev]"
Production with uv (Recommended)
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment
uv init
# Install trustguard
uv add trustguard
Quick Start
1. Basic Validation
from trustguard import TrustGuard
from trustguard.schemas import GenericResponse
# Initialize with a schema
guard = TrustGuard(schema_class=GenericResponse)
# Validate an LLM response
result = guard.validate('''
{
    "content": "I can help you reset your password",
    "sentiment": "positive",
    "tone": "helpful",
    "is_helpful": true
}
''')

if result.is_approved:
    print(f"Safe: {result.data}")
else:
    print(f"Blocked: {result.log}")
2. Add Custom Rules
def check_profanity(data, raw_text, context=None):
    profanity_list = ["badword1", "badword2"]
    content = data.get("content", "").lower()
    for word in profanity_list:
        if word in content:
            return f"Profanity detected: {word}"
    return None

guard = TrustGuard(
    schema_class=GenericResponse,
    custom_rules=[check_profanity]
)
3. Use an AI Judge
from trustguard.judges import OpenAIJudge
# Create a GPT-4 judge
judge = OpenAIJudge(
    model="gpt-4o-mini",
    config={"system_prompt": "You are a strict safety judge."}
)

guard = TrustGuard(
    schema_class=GenericResponse,
    judge=judge
)
# Catches nuanced issues
result = guard.validate('{"content": "Sure, I can help... you idiot."}')
print(result.log) # "Judge [harassment]: Text contains insult"
Judge System
Available Judges
| Judge | Description | Best For |
|---|---|---|
| OpenAIJudge | GPT-4o/GPT-3.5/.. | Production apps, high accuracy |
| OllamaJudge | Local models (Llama, Phi) | Privacy, offline, free |
| AnthropicJudge | Claude models | Constitutional AI |
| CallableJudge | Any function | Universal adapter |
| EnsembleJudge | Combine multiple judges | Maximum accuracy |
Ensemble Example
from trustguard.judges import EnsembleJudge, OpenAIJudge, CallableJudge
ensemble = EnsembleJudge([
    OpenAIJudge(model="gpt-4o-mini", weight=2.0),
    CallableJudge(my_local_judge, weight=1.0),
    CallableJudge(my_rule_judge, weight=1.0)
], strategy="weighted_vote")  # or majority_vote, strict, lenient

guard = TrustGuard(schema_class=GenericResponse, judge=ensemble)
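A weighted vote can be pictured as summing judge weights on each side of the safe/unsafe question. A standalone sketch of one plausible interpretation (trustguard's exact thresholds and tie-breaking may differ):

```python
def weighted_vote(verdicts):
    """Illustrative weighted-vote strategy.

    Each verdict is a (safe: bool, weight: float) pair. Approve when
    the weighted mass of 'safe' verdicts is at least half the total.
    """
    total = sum(weight for _, weight in verdicts)
    safe_mass = sum(weight for safe, weight in verdicts if safe)
    return safe_mass >= total / 2

# A heavier judge approving can outvote a lighter dissenting judge:
print(weighted_vote([(True, 2.0), (False, 1.0), (True, 1.0)]))  # True
```

Under this reading, "strict" would require every judge to approve, while "lenient" would approve if any judge does.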
Custom Judge
from typing import Any, Dict

from trustguard.judges import BaseJudge

class MyJudge(BaseJudge):
    def judge(self, text: str) -> Dict[str, Any]:
        # Your logic here
        return {
            "safe": True,
            "reason": "Explanation",
            "confidence": 0.95
        }
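A custom judge need not subclass BaseJudge: per the table above, any function returning the same verdict shape can be wrapped with CallableJudge. A hypothetical standalone example (the keyword list and confidence values are invented for illustration):

```python
def keyword_judge(text: str) -> dict:
    """Hypothetical judge function returning the {safe, reason, confidence}
    verdict shape, suitable for wrapping with CallableJudge."""
    flagged = [word for word in ("idiot", "exploit", "bypass") if word in text.lower()]
    if flagged:
        return {"safe": False, "reason": f"Flagged terms: {flagged}", "confidence": 0.8}
    return {"safe": True, "reason": "No flagged terms", "confidence": 0.9}
```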
Using a Custom Judge Exclusively
# Disable all default rules, use only your own judge
guard = TrustGuard(
    schema_class=GenericResponse,
    custom_rules=[],  # empty list = no default rules
    judge=my_judge,
)
Batch Validation
# Validate multiple responses at once
responses = [response1, response2, response3]
report = guard.validate_batch(responses, parallel=True, max_workers=4)
print(report.summary())
# Total: 3 | Passed: 2 | Failed: 1
# Top failures:
# - PII Detected: 1
Statistics
# Track validation metrics
stats = guard.get_stats()
# {
# "total_validations": 100,
# "approved": 85,
# "rejected": 15,
# "judge_checks": 30
# }
guard.reset_stats() # Reset counters
CLI Usage
# Run interactive demo
trustguard --demo
# Validate a JSON string
trustguard --validate '{"content":"test","sentiment":"neutral","tone":"professional","is_helpful":true}'
# Validate from file
trustguard --file response.json
# Show version
trustguard --version
# Show help
trustguard --help
Documentation
Comprehensive documentation is available at docs
| Guide | Description |
|---|---|
| Quick Start | Get up and running in 5 minutes |
| Core Concepts | Understand how trustguard works |
| Schema Validation | Define your own response structures |
| Rules System | Built-in validation rules |
| Judge System | Deep dive into AI judges |
| API Reference | Complete API documentation |
| Examples | Real-world use cases |
| Contributing | How to contribute |
Use Cases
| Use Case | Example |
|---|---|
| Chatbots | Prevent toxic responses, detect PII |
| Code Generation | Block dangerous code patterns |
| Content Moderation | Filter harmful content |
| Customer Support | Ensure professional responses |
| Education | Keep AI tutors safe and appropriate |
| Healthcare | Validate medical information |
Configuration
Guard Configuration
config = {
    "fail_on_judge_error": False,  # Don't crash on judge errors
    "on_error": "allow"            # Allow on errors
}

guard = TrustGuard(
    schema_class=GenericResponse,
    config=config,
    judge=my_judge
)
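The fail-open behaviour configured above can be illustrated with a small standalone sketch (not trustguard's actual error-handling code):

```python
def run_judge_safely(judge_fn, text, on_error="allow"):
    """Illustrative fail-open/fail-closed wrapper: when the judge raises
    (timeout, network error, rate limit), either allow or block the text
    depending on configuration, instead of crashing the pipeline."""
    try:
        return judge_fn(text)
    except Exception as e:
        if on_error == "allow":
            return {"safe": True, "reason": f"Judge error, allowing: {e}", "confidence": 0.0}
        return {"safe": False, "reason": f"Judge error, blocking: {e}", "confidence": 0.0}
```

Fail-open keeps a chatbot responsive when the judge provider is down; fail-closed is the safer default when blocking is cheaper than a policy violation.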
Judge Configuration
judge = OpenAIJudge(
    config={
        "cache_size": 1000,   # Cache last 1000 results
        "timeout": 30,        # Timeout in seconds
        "on_error": "allow",  # What to do on error
        "log_errors": True    # Log errors to console
    }
)
Performance
| Operation | Speed |
|---|---|
| Rules | Microseconds |
| Local Judge (Ollama) | 50-100ms |
| Cloud Judge (GPT-4o-mini) | 200-500ms |
| Batch Validation | Parallel by default |
Optimization Tips
- Use local judges for high-volume, privacy-sensitive data
- Cache results for repeated queries
- Batch validation for multiple texts
- Set appropriate timeouts to avoid hanging
- Use smaller models (phi3, gpt-4o-mini) for speed
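Since cloud judge calls are I/O-bound, batch validation parallelizes well with threads. A minimal standalone sketch of the pattern (validate_fn stands in for guard.validate):

```python
from concurrent.futures import ThreadPoolExecutor

def validate_batch_sketch(validate_fn, texts, max_workers=4):
    """Illustrative parallel batch validation: run validate_fn over each
    text on a thread pool, preserving input order in the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(validate_fn, texts))
```

Threads suffice here because the workers spend most of their time waiting on network responses, not holding the GIL.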
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=trustguard --cov-report=html
# Run specific test
pytest tests/test_core.py::test_schema_validation -v
Contributing
We welcome contributions! Here's how you can help:
- Report bugs - Open an issue
- Suggest features - Start a discussion
- Improve documentation - Submit a PR
- Add new rules or judges - Follow our contributing guide
- Star the project - Show your support
See CONTRIBUTING.md for detailed guidelines.
License
This project is licensed under the MIT License; see the LICENSE file for details.
Authors
- Dr-Mo-Khalaf - @github
Acknowledgments
- Pydantic - Schema validation
- Ollama - Local model support
- OpenAI - GPT integration
- Anthropic - Claude integration
Project Stats
| Metric | Value |
|---|---|
| PyPI Downloads | |
| Python Versions | 3.8+ |
| License | MIT |
| Last Release | v0.2.7 |
Support
- Documentation - Guides and API reference
- GitHub Issues - Bug reports, feature requests
- Discussions - Questions, ideas
Copyright 2026 Khalaf
Licensed under the MIT License
Star the project on GitHub to show your support!