
Lightweight, extensible Python framework that validates LLM inputs and outputs with fast rule-based validation and pluggable AI judges

Project description

🔒 trustguard

Bidirectional validation for LLM applications - secure both input and output with pluggable AI judges


Quick Start Guide - Get up and running in 5 minutes •
Documentation - User manual


📋 Overview

trustguard is a lightweight, extensible Python framework that provides comprehensive validation for Large Language Model (LLM) applications. It operates at both ends of the LLM pipeline:

  • Input Validation: Blocks harmful prompts, jailbreak attempts, and toxic user content before they reach your LLM
  • Output Validation: Filters unsafe responses, PII leakage, and policy violations before they reach your users

The framework combines two complementary approaches:

  • Fast rule-based validation (microseconds) for deterministic checks like PII detection, blocklist filtering, and toxicity detection
  • Pluggable judge system that can use any AI model (OpenAI GPT-4, Anthropic Claude, local Ollama, or custom models) for nuanced, context-aware evaluation
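To make the rule-based layer concrete, a deterministic check can be as simple as a compiled regex scan plus a blocklist lookup. This is a standalone sketch of the idea, not trustguard's implementation:

```python
import re

# Each check returns a violation message, or None when the text passes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKLIST = {"ssn", "credit card"}

def check_pii(text: str):
    """Flag email addresses (a stand-in for richer PII detection)."""
    match = EMAIL_RE.search(text)
    return f"PII detected: {match.group()}" if match else None

def check_blocklist(text: str):
    """Flag any forbidden term from a blocklist."""
    lowered = text.lower()
    for term in BLOCKLIST:
        if term in lowered:
            return f"Forbidden term: {term}"
    return None

print(check_pii("Contact me at alice@example.com"))  # PII detected: alice@example.com
print(check_blocklist("What is your SSN?"))          # Forbidden term: ssn
```

Checks like these run in microseconds, which is why they sit in front of the slower judge layer.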

With its modular architecture, trustguard is easy to extend with custom rules, judges, and schemas - making it suitable for everything from simple chatbots to complex enterprise AI applications.


โœจ Key Features

| Feature | Description |
|---------|-------------|
| 🚀 Lightweight | Pure Python, minimal dependencies, no external services required |
| 📋 Schema Validation | Enforce JSON structure with Pydantic V2 |
| 🛡️ Built-in Rules | PII detection, blocklist filtering, toxicity checks, quality validation |
| 🤖 Pluggable Judges | Use ANY AI model as a safety validator |
| 🎯 Universal Adapter | Wrap Hugging Face, Groq, or internal APIs with CallableJudge |
| 🔀 Ensemble Judges | Combine multiple judges with voting strategies for maximum accuracy |
| 🔌 Provider Wrappers | One-line integration with OpenAI, Anthropic, and local Ollama |
| 📊 Batch Validation | Validate multiple responses with detailed reporting |
| 📈 Statistics | Built-in metrics tracking for monitoring and optimization |
| 🖥️ CLI | Command-line interface for quick testing and integration |

๐Ÿ—๏ธ Architecture

Raw Input โ†’ JSON Extraction โ†’ Schema Validation โ†’ Rules โ†’ Judge โ†’ Result
trustguard/
โ”œโ”€โ”€ core/          # Core validation engine
โ”œโ”€โ”€ rules/         # Built-in validation rules
โ”‚   โ”œโ”€โ”€ pii.py     # Email/phone detection
โ”‚   โ”œโ”€โ”€ blocklist.py # Forbidden terms
โ”‚   โ”œโ”€โ”€ toxicity.py # Harmful content
โ”‚   โ””โ”€โ”€ quality.py # Length/repetition checks
โ”œโ”€โ”€ schemas/       # Pydantic schemas
โ”œโ”€โ”€ judges/        # Pluggable judge system
โ”‚   โ”œโ”€โ”€ base.py    # Abstract base class
โ”‚   โ”œโ”€โ”€ openai.py  # GPT-4/GPT-3.5 judges
โ”‚   โ”œโ”€โ”€ ollama.py  # Local model judges
โ”‚   โ”œโ”€โ”€ anthropic.py # Claude judges
โ”‚   โ”œโ”€โ”€ custom.py  # Universal adapter
โ”‚   โ””โ”€โ”€ ensemble.py # Combine multiple judges
โ””โ”€โ”€ wrappers/      # LLM provider wrappers
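The pipeline stages can be sketched in plain Python. This is a conceptual illustration with a simplified schema check, not trustguard's actual internals:

```python
import json
import re

def extract_json(raw: str):
    """JSON extraction: pull the first {...} block out of raw LLM text."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group())
    except json.JSONDecodeError:
        return None

def run_pipeline(raw: str, required_keys, rules, judge=None):
    """Run extraction -> schema -> rules -> judge, returning (approved, payload_or_reason)."""
    data = extract_json(raw)
    if data is None:
        return False, "No valid JSON found"
    missing = [k for k in required_keys if k not in data]   # schema check (simplified)
    if missing:
        return False, f"Missing keys: {missing}"
    for rule in rules:                                      # fast rule checks
        reason = rule(data)
        if reason:
            return False, reason
    if judge is not None:                                   # optional AI judge
        verdict = judge(data)
        if not verdict.get("safe", True):
            return False, verdict.get("reason", "Judge rejected")
    return True, data

ok, result = run_pipeline('noise {"content": "hi"} noise', ["content"], rules=[])
print(ok, result)  # True {'content': 'hi'}
```

Each stage short-circuits on failure, so cheap checks always run before any expensive judge call.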

๐Ÿ“ฆ Installation

Basic Installation

pip install trustguard

With Judge Support

# OpenAI judges (GPT-4, GPT-3.5)
pip install trustguard[openai]

# Anthropic Claude judges
pip install trustguard[anthropic]

# Local Ollama judges
pip install trustguard[ai]

# Everything
pip install trustguard[all]

Development Installation

git clone https://github.com/Dr-Mo-Khalaf/trustguard.git
cd trustguard
pip install -e ".[dev]"

Production with uv (Recommended)

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a new project
uv init
# Install trustguard
uv add trustguard

🚀 Quick Start

1. Basic Validation

from trustguard import TrustGuard
from trustguard.schemas import GenericResponse

# Initialize with a schema
guard = TrustGuard(schema_class=GenericResponse)

# Validate an LLM response
result = guard.validate('''
{
    "content": "I can help you reset your password",
    "sentiment": "positive",
    "tone": "helpful",
    "is_helpful": true
}
''')

if result.is_approved:
    print(f"✅ Safe: {result.data}")
else:
    print(f"🛑 Blocked: {result.log}")

2. Add Custom Rules

def check_profanity(data, raw_text, context=None):
    profanity_list = ["badword1", "badword2"]
    content = data.get("content", "").lower()
    
    for word in profanity_list:
        if word in content:
            return f"Profanity detected: {word}"
    return None

guard = TrustGuard(
    schema_class=GenericResponse,
    custom_rules=[check_profanity]
)
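Because a rule is just a function returning None on pass and a reason string on failure, it can be unit-tested directly before being wired into a guard. Repeating check_profanity from above as a standalone sketch:

```python
def check_profanity(data, raw_text, context=None):
    """Return a reason string if forbidden words appear in data["content"], else None."""
    profanity_list = ["badword1", "badword2"]
    content = data.get("content", "").lower()
    for word in profanity_list:
        if word in content:
            return f"Profanity detected: {word}"
    return None

# A failing input yields a reason string; a clean input yields None.
assert check_profanity({"content": "this has badword1"}, "") == "Profanity detected: badword1"
assert check_profanity({"content": "all clean"}, "") is None
```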

3. Use an AI Judge

from trustguard.judges import OpenAIJudge

# Create a GPT-4 judge
judge = OpenAIJudge(
    model="gpt-4o-mini",
    config={"system_prompt": "You are a strict safety judge."}
)

guard = TrustGuard(
    schema_class=GenericResponse,
    judge=judge
)

# Catches nuanced issues
result = guard.validate('{"content": "Sure, I can help... you idiot."}')
print(result.log)  # "Judge [harassment]: Text contains insult"

🤖 Judge System

Available Judges

| Judge | Description | Best For |
|-------|-------------|----------|
| OpenAIJudge | GPT-4o, GPT-3.5, and other OpenAI models | Production apps, high accuracy |
| OllamaJudge | Local models (Llama, Phi) | Privacy, offline, free |
| AnthropicJudge | Claude models | Constitutional AI |
| CallableJudge | Any function | Universal adapter |
| EnsembleJudge | Combine multiple judges | Maximum accuracy |

Ensemble Example

from trustguard.judges import EnsembleJudge, OpenAIJudge, CallableJudge

ensemble = EnsembleJudge([
    OpenAIJudge(model="gpt-4o-mini", weight=2.0),
    CallableJudge(my_local_judge, weight=1.0),
    CallableJudge(my_rule_judge, weight=1.0)
], strategy="weighted_vote")  # or majority_vote, strict, lenient

guard = TrustGuard(schema_class=GenericResponse, judge=ensemble)
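To illustrate what a weighted-vote strategy does with conflicting verdicts, here is a minimal standalone sketch of the voting arithmetic (illustrative only; the real strategy logic lives inside EnsembleJudge):

```python
def weighted_vote(verdicts):
    """verdicts: list of (safe: bool, weight: float). Safe wins only on a strict weight majority."""
    safe_weight = sum(w for safe, w in verdicts if safe)
    unsafe_weight = sum(w for safe, w in verdicts if not safe)
    return safe_weight > unsafe_weight

# A weight-2.0 judge says unsafe; two weight-1.0 judges say safe.
print(weighted_vote([(False, 2.0), (True, 1.0), (True, 1.0)]))  # False (tie resolves to unsafe)
```

Giving the strongest judge extra weight means it can veto a split decision, which is usually the conservative choice for safety validation.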

Custom Judge

from typing import Any, Dict

from trustguard.judges import BaseJudge

class MyJudge(BaseJudge):
    def judge(self, text: str) -> Dict[str, Any]:
        # Your logic here
        return {
            "safe": True,
            "reason": "Explanation",
            "confidence": 0.95
        }

Using a Custom Judge Exclusively

# Disable all default rules, use only your own judge
guard = TrustGuard(
    schema_class=GenericResponse,
    custom_rules=[],  # empty list = no default rules
    judge=my_judge,
)

📊 Batch Validation

# Validate multiple responses at once
responses = [response1, response2, response3]
report = guard.validate_batch(responses, parallel=True, max_workers=4)

print(report.summary())
# Total: 3 | Passed: 2 | Failed: 1
# Top failures:
#   - PII Detected: 1

📈 Statistics

# Track validation metrics
stats = guard.get_stats()
# {
#     "total_validations": 100,
#     "approved": 85,
#     "rejected": 15,
#     "judge_checks": 30
# }

guard.reset_stats()  # Reset counters

🖥️ CLI Usage

# Run interactive demo
trustguard --demo

# Validate a JSON string
trustguard --validate '{"content":"test","sentiment":"neutral","tone":"professional","is_helpful":true}'

# Validate from file
trustguard --file response.json

# Show version
trustguard --version

# Show help
trustguard --help

📚 Documentation

Comprehensive documentation is available in the docs.

| Guide | Description |
|-------|-------------|
| Quick Start | Get up and running in 5 minutes |
| Core Concepts | Understand how trustguard works |
| Schema Validation | Define your own response structures |
| Rules System | Built-in validation rules |
| Judge System | Deep dive into AI judges |
| API Reference | Complete API documentation |
| Examples | Real-world use cases |
| Contributing | How to contribute |

🎯 Use Cases

| Use Case | Example |
|----------|---------|
| Chatbots | Prevent toxic responses, detect PII |
| Code Generation | Block dangerous code patterns |
| Content Moderation | Filter harmful content |
| Customer Support | Ensure professional responses |
| Education | Keep AI tutors safe and appropriate |
| Healthcare | Validate medical information |

🔧 Configuration

Guard Configuration

config = {
    "fail_on_judge_error": False,  # Don't crash on judge errors
    "on_error": "allow"             # Allow on errors
}

guard = TrustGuard(
    schema_class=GenericResponse,
    config=config,
    judge=my_judge
)

Judge Configuration

judge = OpenAIJudge(
    config={
        "cache_size": 1000,     # Cache last 1000 results
        "timeout": 30,           # Timeout in seconds
        "on_error": "allow",     # What to do on error
        "log_errors": True       # Log errors to console
    }
)

🚀 Performance

| Operation | Speed |
|-----------|-------|
| Rules | Microseconds |
| Local Judge (Ollama) | 50-100 ms |
| Cloud Judge (GPT-4o-mini) | 200-500 ms |
| Batch Validation | Parallel by default |

Optimization Tips

  1. Use local judges for high-volume, privacy-sensitive data
  2. Cache results for repeated queries
  3. Batch validation for multiple texts
  4. Set appropriate timeouts to avoid hanging
  5. Use smaller models (phi3, gpt-4o-mini) for speed

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=trustguard --cov-report=html

# Run specific test
pytest tests/test_core.py::test_schema_validation -v

๐Ÿค Contributing

We welcome contributions! Here's how you can help:

  • ๐Ÿ› Report bugs - Open an issue
  • ๐Ÿ’ก Suggest features - Start a discussion
  • ๐Ÿ“ Improve documentation - Submit a PR
  • ๐Ÿ”ง Add new rules or judges - Follow our contributing guide
  • ๐ŸŒŸ Star the project - Show your support

See CONTRIBUTING.md for detailed guidelines.


📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


👥 Authors


🙏 Acknowledgments


📊 Project Stats

| Metric | Value |
|--------|-------|
| Python Versions | 3.8+ |
| License | MIT |
| Last Release | v0.2.7 |


📬 Support


Copyright 2026 Khalaf

Licensed under the MIT License

Star the project on GitHub to show your support ⭐!


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trustguard-0.2.7.tar.gz (35.0 kB)

Uploaded Source

Built Distribution


trustguard-0.2.7-py3-none-any.whl (35.2 kB)

Uploaded Python 3

File details

Details for the file trustguard-0.2.7.tar.gz.

File metadata

  • Download URL: trustguard-0.2.7.tar.gz
  • Upload date:
  • Size: 35.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for trustguard-0.2.7.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 685adff8bb289c8e727e897a5ec03323838ba30dd33175532caf1d0778a102e7 |
| MD5 | f898fde1fa06f862b517c653164ca29e |
| BLAKE2b-256 | 529a53dfff6219dae519aceaaf587eafe133bbb034d6930e651daadb6bf8cfbe |


File details

Details for the file trustguard-0.2.7-py3-none-any.whl.

File metadata

  • Download URL: trustguard-0.2.7-py3-none-any.whl
  • Upload date:
  • Size: 35.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for trustguard-0.2.7-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | d000c6e8b89b02faee1dc2fdacdd1ae4787cc11eec2e6e9bc1a5e07354bd0492 |
| MD5 | 878d716cd4656aa2a20c09fc1cf28e2b |
| BLAKE2b-256 | abf316eff984ac215edf0c5a701a09888a1bf975f8a96c00f5e7a85e8fae4f30 |

