
Lightweight, extensible Python framework that validates LLM inputs and outputs with fast rule-based validation and pluggable AI judges

Project description

🔒 trustguard

Bidirectional validation for LLM applications - secure both input and output with pluggable AI judges


Quick Start Guide - Get up and running in 5 minutes •
Documentation - User manual


📋 Overview

trustguard is a lightweight, extensible Python framework that provides comprehensive validation for Large Language Model (LLM) applications. It operates at both ends of the LLM pipeline:

  • Input Validation: Blocks harmful prompts, jailbreak attempts, and toxic user content before they reach your LLM
  • Output Validation: Filters unsafe responses, PII leakage, and policy violations before they reach your users

The framework combines two complementary approaches:

  • Fast rule-based validation (microseconds) for deterministic checks like PII detection, blocklist filtering, and toxicity detection
  • Pluggable judge system that can use any AI model (OpenAI GPT-4, Anthropic Claude, local Ollama, or custom models) for nuanced, context-aware evaluation
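To make the rule-based layer concrete, a deterministic check can be as simple as a compiled regex scan plus a blocklist lookup. This is a standalone sketch of the idea, not trustguard's implementation:

```python
import re

# Each check returns a violation message, or None when the text passes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKLIST = {"ssn", "credit card"}

def check_pii(text: str):
    """Flag email addresses (a stand-in for richer PII detection)."""
    match = EMAIL_RE.search(text)
    return f"PII detected: {match.group()}" if match else None

def check_blocklist(text: str):
    """Flag any forbidden term from a blocklist."""
    lowered = text.lower()
    for term in BLOCKLIST:
        if term in lowered:
            return f"Forbidden term: {term}"
    return None

print(check_pii("Contact me at alice@example.com"))  # PII detected: alice@example.com
print(check_blocklist("What is your SSN?"))          # Forbidden term: ssn
```

Checks like these run in microseconds, which is why they sit in front of the slower judge layer.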

With its modular architecture, trustguard is easy to extend with custom rules, judges, and schemas - making it suitable for everything from simple chatbots to complex enterprise AI applications.


โœจ Key Features

| Feature | Description |
|---------|-------------|
| 🚀 Lightweight | Pure Python, minimal dependencies, no external services required |
| 📋 Schema Validation | Enforce JSON structure with Pydantic V2 |
| 🛡️ Built-in Rules | PII detection, blocklist filtering, toxicity checks, quality validation |
| 🤖 Pluggable Judges | Use ANY AI model as a safety validator |
| 🎯 Universal Adapter | Wrap Hugging Face, Groq, or internal APIs with CallableJudge |
| 🔀 Ensemble Judges | Combine multiple judges with voting strategies for maximum accuracy |
| 🔌 Provider Wrappers | One-line integration with OpenAI, Anthropic, and local Ollama |
| 📊 Batch Validation | Validate multiple responses with detailed reporting |
| 📈 Statistics | Built-in metrics tracking for monitoring and optimization |
| 🖥️ CLI | Command-line interface for quick testing and integration |

๐Ÿ—๏ธ Architecture

Raw Input โ†’ JSON Extraction โ†’ Schema Validation โ†’ Rules โ†’ Judge โ†’ Result
trustguard/
โ”œโ”€โ”€ core/          # Core validation engine
โ”œโ”€โ”€ rules/         # Built-in validation rules
โ”‚   โ”œโ”€โ”€ pii.py     # Email/phone detection
โ”‚   โ”œโ”€โ”€ blocklist.py # Forbidden terms
โ”‚   โ”œโ”€โ”€ toxicity.py # Harmful content
โ”‚   โ””โ”€โ”€ quality.py # Length/repetition checks
โ”œโ”€โ”€ schemas/       # Pydantic schemas
โ”œโ”€โ”€ judges/        # Pluggable judge system
โ”‚   โ”œโ”€โ”€ base.py    # Abstract base class
โ”‚   โ”œโ”€โ”€ openai.py  # GPT-4/GPT-3.5 judges
โ”‚   โ”œโ”€โ”€ ollama.py  # Local model judges
โ”‚   โ”œโ”€โ”€ anthropic.py # Claude judges
โ”‚   โ”œโ”€โ”€ custom.py  # Universal adapter
โ”‚   โ””โ”€โ”€ ensemble.py # Combine multiple judges
โ””โ”€โ”€ wrappers/      # LLM provider wrappers
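The pipeline stages can be sketched in plain Python. This is a conceptual illustration with a simplified schema check, not trustguard's actual internals:

```python
import json
import re

def extract_json(raw: str):
    """JSON extraction: pull the first {...} block out of raw LLM text."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group())
    except json.JSONDecodeError:
        return None

def run_pipeline(raw: str, required_keys, rules, judge=None):
    """Run extraction -> schema -> rules -> judge, returning (approved, payload_or_reason)."""
    data = extract_json(raw)
    if data is None:
        return False, "No valid JSON found"
    missing = [k for k in required_keys if k not in data]   # schema check (simplified)
    if missing:
        return False, f"Missing keys: {missing}"
    for rule in rules:                                      # fast rule checks
        reason = rule(data)
        if reason:
            return False, reason
    if judge is not None:                                   # optional AI judge
        verdict = judge(data)
        if not verdict.get("safe", True):
            return False, verdict.get("reason", "Judge rejected")
    return True, data

ok, result = run_pipeline('noise {"content": "hi"} noise', ["content"], rules=[])
print(ok, result)  # True {'content': 'hi'}
```

Each stage short-circuits on failure, so cheap checks always run before any expensive judge call.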

๐Ÿ“ฆ Installation

Basic Installation

pip install trustguard

With Judge Support

# OpenAI judges (GPT-4, GPT-3.5)
pip install trustguard[openai]

# Anthropic Claude judges
pip install trustguard[anthropic]

# Local Ollama judges
pip install trustguard[ai]

# Everything
pip install trustguard[all]

Development Installation

git clone https://github.com/Dr-Mo-Khalaf/trustguard.git
cd trustguard
pip install -e ".[dev]"

Production with uv (Recommended)

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a new project
uv init
# Install trustguard
uv add trustguard

🚀 Quick Start

1. Basic Validation

from trustguard import TrustGuard
from trustguard.schemas import GenericResponse

# Initialize with a schema
guard = TrustGuard(schema_class=GenericResponse)

# Validate an LLM response
result = guard.validate('''
{
    "content": "I can help you reset your password",
    "sentiment": "positive",
    "tone": "helpful",
    "is_helpful": true
}
''')

if result.is_approved:
    print(f"✅ Safe: {result.data}")
else:
    print(f"🛑 Blocked: {result.log}")

2. Add Custom Rules

def check_profanity(data, raw_text, context=None):
    profanity_list = ["badword1", "badword2"]
    content = data.get("content", "").lower()
    
    for word in profanity_list:
        if word in content:
            return f"Profanity detected: {word}"
    return None

guard = TrustGuard(
    schema_class=GenericResponse,
    custom_rules=[check_profanity]
)
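Because a rule is just a function returning None on pass and a reason string on failure, it can be unit-tested directly before being wired into a guard. Repeating check_profanity from above as a standalone sketch:

```python
def check_profanity(data, raw_text, context=None):
    """Return a reason string if forbidden words appear in data["content"], else None."""
    profanity_list = ["badword1", "badword2"]
    content = data.get("content", "").lower()
    for word in profanity_list:
        if word in content:
            return f"Profanity detected: {word}"
    return None

# A failing input yields a reason string; a clean input yields None.
assert check_profanity({"content": "this has badword1"}, "") == "Profanity detected: badword1"
assert check_profanity({"content": "all clean"}, "") is None
```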

3. Use an AI Judge

from trustguard.judges import OpenAIJudge

# Create a GPT-4 judge
judge = OpenAIJudge(
    model="gpt-4o-mini",
    config={"system_prompt": "You are a strict safety judge."}
)

guard = TrustGuard(
    schema_class=GenericResponse,
    judge=judge
)

# Catches nuanced issues
result = guard.validate('{"content": "Sure, I can help... you idiot."}')
print(result.log)  # "Judge [harassment]: Text contains insult"

🤖 Judge System

Available Judges

| Judge | Description | Best For |
|-------|-------------|----------|
| OpenAIJudge | GPT-4o, GPT-3.5, and other OpenAI models | Production apps, high accuracy |
| OllamaJudge | Local models (Llama, Phi) | Privacy, offline, free |
| AnthropicJudge | Claude models | Constitutional AI |
| CallableJudge | Any function | Universal adapter |
| EnsembleJudge | Combine multiple judges | Maximum accuracy |

Ensemble Example

from trustguard.judges import EnsembleJudge, OpenAIJudge, CallableJudge

ensemble = EnsembleJudge([
    OpenAIJudge(model="gpt-4o-mini", weight=2.0),
    CallableJudge(my_local_judge, weight=1.0),
    CallableJudge(my_rule_judge, weight=1.0)
], strategy="weighted_vote")  # or majority_vote, strict, lenient

guard = TrustGuard(schema_class=GenericResponse, judge=ensemble)
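To illustrate what a weighted-vote strategy does with conflicting verdicts, here is a minimal standalone sketch of the voting arithmetic (illustrative only; the real strategy logic lives inside EnsembleJudge):

```python
def weighted_vote(verdicts):
    """verdicts: list of (safe: bool, weight: float). Safe wins only on a strict weight majority."""
    safe_weight = sum(w for safe, w in verdicts if safe)
    unsafe_weight = sum(w for safe, w in verdicts if not safe)
    return safe_weight > unsafe_weight

# A weight-2.0 judge says unsafe; two weight-1.0 judges say safe.
print(weighted_vote([(False, 2.0), (True, 1.0), (True, 1.0)]))  # False (tie resolves to unsafe)
```

Giving the strongest judge extra weight means it can veto a split decision, which is usually the conservative choice for safety validation.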

Custom Judge

from typing import Any, Dict

from trustguard.judges import BaseJudge

class MyJudge(BaseJudge):
    def judge(self, text: str) -> Dict[str, Any]:
        # Your logic here
        return {
            "safe": True,
            "reason": "Explanation",
            "confidence": 0.95
        }

Using a Custom Judge Exclusively

# Disable all default rules, use only your own judge
guard = TrustGuard(
    schema_class=GenericResponse,
    custom_rules=[],  # empty list = no default rules
    judge=my_judge,
)

📊 Batch Validation

# Validate multiple responses at once
responses = [response1, response2, response3]
report = guard.validate_batch(responses, parallel=True, max_workers=4)

print(report.summary())
# Total: 3 | Passed: 2 | Failed: 1
# Top failures:
#   - PII Detected: 1

📈 Statistics

# Track validation metrics
stats = guard.get_stats()
# {
#     "total_validations": 100,
#     "approved": 85,
#     "rejected": 15,
#     "judge_checks": 30
# }

guard.reset_stats()  # Reset counters

🖥️ CLI Usage

# Run interactive demo
trustguard --demo

# Validate a JSON string
trustguard --validate '{"content":"test","sentiment":"neutral","tone":"professional","is_helpful":true}'

# Validate from file
trustguard --file response.json

# Show version
trustguard --version

# Show help
trustguard --help

📚 Documentation

Comprehensive documentation is available in the docs.

| Guide | Description |
|-------|-------------|
| Quick Start | Get up and running in 5 minutes |
| Core Concepts | Understand how trustguard works |
| Schema Validation | Define your own response structures |
| Rules System | Built-in validation rules |
| Judge System | Deep dive into AI judges |
| API Reference | Complete API documentation |
| Examples | Real-world use cases |
| Contributing | How to contribute |

🎯 Use Cases

| Use Case | Example |
|----------|---------|
| Chatbots | Prevent toxic responses, detect PII |
| Code Generation | Block dangerous code patterns |
| Content Moderation | Filter harmful content |
| Customer Support | Ensure professional responses |
| Education | Keep AI tutors safe and appropriate |
| Healthcare | Validate medical information |

🔧 Configuration

Guard Configuration

config = {
    "fail_on_judge_error": False,  # Don't crash on judge errors
    "on_error": "allow"             # Allow on errors
}

guard = TrustGuard(
    schema_class=GenericResponse,
    config=config,
    judge=my_judge
)

Judge Configuration

judge = OpenAIJudge(
    config={
        "cache_size": 1000,     # Cache last 1000 results
        "timeout": 30,           # Timeout in seconds
        "on_error": "allow",     # What to do on error
        "log_errors": True       # Log errors to console
    }
)

🚀 Performance

| Operation | Speed |
|-----------|-------|
| Rules | Microseconds |
| Local Judge (Ollama) | 50-100 ms |
| Cloud Judge (GPT-4o-mini) | 200-500 ms |
| Batch Validation | Parallel by default |

Optimization Tips

  1. Use local judges for high-volume, privacy-sensitive data
  2. Cache results for repeated queries
  3. Batch validation for multiple texts
  4. Set appropriate timeouts to avoid hanging
  5. Use smaller models (phi3, gpt-4o-mini) for speed

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=trustguard --cov-report=html

# Run specific test
pytest tests/test_core.py::test_schema_validation -v

๐Ÿค Contributing

We welcome contributions! Here's how you can help:

  • ๐Ÿ› Report bugs - Open an issue
  • ๐Ÿ’ก Suggest features - Start a discussion
  • ๐Ÿ“ Improve documentation - Submit a PR
  • ๐Ÿ”ง Add new rules or judges - Follow our contributing guide
  • ๐ŸŒŸ Star the project - Show your support

See CONTRIBUTING.md for detailed guidelines.


📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


👥 Authors


🙏 Acknowledgments


📊 Project Stats

| Metric | Value |
|--------|-------|
| Python Versions | 3.8+ |
| License | MIT |
| Last Release | v0.2.7 |


📬 Support


Copyright 2026 Khalaf

Licensed under the MIT License

Star the project on GitHub to show your support ⭐!


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trustguard-0.2.7.tar.gz (35.0 kB)

Uploaded Source

Built Distribution


trustguard-0.2.7-py3-none-any.whl (35.2 kB)

Uploaded Python 3

File details

Details for the file trustguard-0.2.7.tar.gz.

File metadata

  • Download URL: trustguard-0.2.7.tar.gz
  • Upload date:
  • Size: 35.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for trustguard-0.2.7.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 685adff8bb289c8e727e897a5ec03323838ba30dd33175532caf1d0778a102e7 |
| MD5 | f898fde1fa06f862b517c653164ca29e |
| BLAKE2b-256 | 529a53dfff6219dae519aceaaf587eafe133bbb034d6930e651daadb6bf8cfbe |


File details

Details for the file trustguard-0.2.7-py3-none-any.whl.

File metadata

  • Download URL: trustguard-0.2.7-py3-none-any.whl
  • Upload date:
  • Size: 35.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for trustguard-0.2.7-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | d000c6e8b89b02faee1dc2fdacdd1ae4787cc11eec2e6e9bc1a5e07354bd0492 |
| MD5 | 878d716cd4656aa2a20c09fc1cf28e2b |
| BLAKE2b-256 | abf316eff984ac215edf0c5a701a09888a1bf975f8a96c00f5e7a85e8fae4f30 |

