DSPy Guardrails
A comprehensive collection of AI guardrails built with DSPy for content moderation and security.
About
DSPy Guardrails is a suite of AI guardrails built with DSPy. Each guardrail is a self-contained module that you can configure, run, and test on its own for one specific content moderation or security check.
- Modular design: each guardrail type is a separate, self-contained module
- Two-stage detection: a fast regex prefilter checks for known patterns before any LLM call, so requests that match are blocked without reaching the model
- Programmatic testing: run guardrails directly in Python for fast iteration
- Comprehensive coverage: twelve guardrail types covering major content moderation and security scenarios (see Available Guardrails)
- DSPy integration: uses DSPy's programmatic prompting for consistent results from the LLM-backed checks
- Type safety: full type hints and dataclass definitions
Quick Start
Install
Install dspy-guardrails with uv (recommended)
uv add dspy-guardrails
Install with pip (alternative)
pip install dspy-guardrails
Continue with the usage examples below.
Usage
Basic Usage
import dspy
from dspy_guardrails import guardrail
# Configure DSPy (required)
lm = dspy.LM("openrouter/google/gemini-2.5-flash-preview-09-2025")
guardrail.configure(lm=lm)
# Create and run guardrails
topic_guardrail = guardrail.Topic(topic_scopes=["AI", "Machine Learning"])
result = guardrail.Run(topic_guardrail, "I want to learn about neural networks")
print(f"Allowed: {result.is_allowed}") # True
Multiple Guardrails
Assumes guardrail.configure(lm=lm) has already been called.
all_guardrails = [
    guardrail.Topic(topic_scopes=["AI"]),
    guardrail.Nsfw(),
    guardrail.Pii(),
]
result = guardrail.Run(all_guardrails, "Safe AI content")
print(f"All passed: {result.is_allowed}") # True
Early Return
Stop execution on the first failing guardrail.
guardrails = [guardrail.Topic(topic_scopes=["AI"]), guardrail.Nsfw()]
result = guardrail.Run(guardrails, "Risky content", early_return=True)
print(result.is_allowed)
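The early-return behavior can be pictured with a small stand-alone sketch. It does not use the library: `Check`, `BulkResult`, and `run_checks` are hypothetical names invented for illustration, and the real `guardrail.Run` may be implemented differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Check:
    """Hypothetical stand-in for a guardrail: a name plus an 'is allowed' function."""
    name: str
    allows: Callable[[str], bool]


@dataclass
class BulkResult:
    is_allowed: bool
    executed: List[str] = field(default_factory=list)  # names of checks that actually ran


def run_checks(checks: List[Check], text: str, early_return: bool = False) -> BulkResult:
    result = BulkResult(is_allowed=True)
    for check in checks:
        result.executed.append(check.name)
        if not check.allows(text):
            result.is_allowed = False
            if early_return:
                break  # skip the remaining checks after the first failure
    return result


checks = [
    Check("no_digits", lambda t: not any(c.isdigit() for c in t)),
    Check("short", lambda t: len(t) < 20),
]
eager = run_checks(checks, "call 555-0100", early_return=True)
full = run_checks(checks, "call 555-0100", early_return=False)
print(eager.executed)  # ['no_digits']
print(full.executed)   # ['no_digits', 'short']
```

Both runs report `is_allowed=False`; the difference is only how many checks execute, which matters when later checks involve an LLM call.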
For more examples and patterns, see the complete quickstart guide and guardrail types. For the regex prefilter catalog, opt-out flags, and custom-pattern reference, see docs/REGEX_PREFILTERS.md.
Two-Stage Detection: Regex Prefilter + LLM
Most guardrails (Pii, PromptInjection, SecretKeys, Jailbreak,
Keywords, Gibberish, Toxicity, Language, Topic) run a fast
regex prefilter before calling the DSPy LLM. If the prefilter
matches, the guardrail short-circuits with is_allowed=False and
method="regex_prefilter" in the result metadata — no model call is
made, so the request is handled deterministically and at zero API
cost. Only requests that pass the prefilter are sent to the LLM.
# Assumes guardrail.configure(lm=lm) has already been called.
pii_guardrail = guardrail.Pii()
result = guardrail.Run(pii_guardrail, "Email me at user@example.com")
result.metadata["method"] # "regex_prefilter" (fast) or "dspy" (LLM)
result.metadata["matches"] # list of {slug, matched_text, ...} on a prefilter hit
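The control flow described above can be sketched without the library. This is a minimal illustration under stated assumptions, not the package's implementation: `Result`, `check_pii`, `fake_llm`, and the email regex are hypothetical, and only the metadata keys (`method`, `matches`, `slug`, `matched_text`) are taken from the description above.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Result:
    is_allowed: bool
    metadata: Dict[str, object] = field(default_factory=dict)


# Illustrative pattern only; it is not taken from the library's prefilter catalog.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_pii(text: str, llm_check: Callable[[str], bool]) -> Result:
    # Stage 1: deterministic regex prefilter, no model call.
    matches = [{"slug": "email", "matched_text": m.group(0)} for m in EMAIL.finditer(text)]
    if matches:
        return Result(False, {"method": "regex_prefilter", "matches": matches})
    # Stage 2: only text that passed the prefilter reaches the (stubbed) LLM.
    return Result(llm_check(text), {"method": "dspy"})


calls: List[str] = []


def fake_llm(text: str) -> bool:
    """Stub standing in for the DSPy-backed check; records what it was asked."""
    calls.append(text)
    return True


blocked = check_pii("Email me at user@example.com", fake_llm)
passed = check_pii("Hello there", fake_llm)
print(blocked.metadata["method"])  # regex_prefilter
print(passed.metadata["method"])   # dspy
print(calls)                       # ['Hello there']
```

The stub is called once, for the text that the prefilter did not match, which is the cost saving the two-stage design is after.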
The prefilter is opt-out per guardrail (e.g.
enable_regex_prefilter=False, enable_script_prefilter=False,
enable_blocked_topic_prefilter=False) and accepts user-supplied
custom patterns where the API supports them (Pii.custom_patterns,
SecretKeys.custom_patterns). Custom patterns are ReDoS-screened at
construction time.
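The README does not say how the ReDoS screening works. As a rough idea of what a construction-time screen might look like, the sketch below rejects patterns with a quantified group that itself contains a quantifier. The function name `screen_pattern` and the heuristic are assumptions for illustration; the library's actual check may be stricter or entirely different.

```python
import re

# Heuristic (an assumption, not the library's rule): a group containing + or *
# that is itself followed by +, * or {n,m}, such as (a+)+, is a classic source
# of catastrophic backtracking.
_NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def screen_pattern(pattern: str) -> re.Pattern:
    """Compile `pattern`, rejecting it first if it looks ReDoS-prone."""
    if _NESTED_QUANTIFIER.search(pattern):
        raise ValueError(f"pattern rejected, nested quantifier: {pattern!r}")
    return re.compile(pattern)  # raises re.error if the syntax is invalid


safe = screen_pattern(r"EMP-\d{6}")  # hypothetical employee-ID pattern
try:
    screen_pattern(r"(a+)+$")
    rejected = False
except ValueError:
    rejected = True

print(bool(safe.fullmatch("EMP-123456")))  # True
print(rejected)                            # True
```

A heuristic like this is conservative and incomplete: it can flag harmless patterns (for example, an escaped `\+` inside a group) and it does not catch every risky construct, such as overlapping alternations.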
To run multiple guardrails concurrently in a single bulk check, pass
parallel=True to guardrail.Run() (uses a ThreadPoolExecutor):
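# Assumes guardrail.configure(lm=lm) has already been called.
pii_gr = guardrail.Pii()
secret_keys_gr = guardrail.SecretKeys(key_patterns=["sk-", "ghp_"], entropy_threshold=3.5)
prompt_injection_gr = guardrail.PromptInjection(injection_patterns=["ignore previous", "system override"])
result = guardrail.Run(
    [pii_gr, secret_keys_gr, prompt_injection_gr],
    "Email me at user@example.com",
    parallel=True,
)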
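For readers unfamiliar with the pattern, here is a minimal stand-alone sketch of running independent checks on a `ThreadPoolExecutor` and combining the outcomes. The `run_parallel` helper and the two toy checks are hypothetical; how `guardrail.Run` schedules work and aggregates results internally is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_parallel(checks: Dict[str, Callable[[str], bool]], text: str) -> Dict[str, bool]:
    """Run every check on its own worker thread and collect the outcomes by name."""
    with ThreadPoolExecutor(max_workers=len(checks) or 1) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in checks.items()}
        # .result() blocks until that check finishes and re-raises any exception.
        return {name: fut.result() for name, fut in futures.items()}


checks = {
    "no_email": lambda t: "@" not in t,
    "no_secret": lambda t: "sk-" not in t,
}
outcomes = run_parallel(checks, "Email me at user@example.com")
is_allowed = all(outcomes.values())
print(outcomes)    # {'no_email': False, 'no_secret': True}
print(is_allowed)  # False
```

Threads suit this workload because LLM-backed checks spend most of their time waiting on network I/O, so total latency approaches that of the slowest check rather than the sum of all of them.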
Available Guardrails
| Guardrail | Example variables | Example instantiation |
|---|---|---|
| Topic | topic_scopes=["AI", "Machine Learning"], blocked_topics=["spam"] | guardrail.Topic(topic_scopes=["AI", "Machine Learning"], blocked_topics=["spam"]) |
| NSFW | sensitivity_level="high" | guardrail.Nsfw(sensitivity_level="high") |
| PII | allowed_pii_types=["email"] | guardrail.Pii(allowed_pii_types=["email"]) |
| Toxicity | toxicity_threshold=0.8 | guardrail.Toxicity(toxicity_threshold=0.8) |
| Tone | desired_tone="helpful", unwanted_tones=["sarcastic"] | guardrail.Tone(desired_tone="helpful", unwanted_tones=["sarcastic"]) |
| Grounding | grounding_threshold=0.8 | guardrail.Grounding(grounding_threshold=0.8) |
| Language | allowed_languages=["en", "es"] | guardrail.Language(allowed_languages=["en", "es"]) |
| Keywords | blocked_keywords=["password", "secret"], case_sensitive=False | guardrail.Keywords(blocked_keywords=["password", "secret"], case_sensitive=False) |
| Secret Keys | key_patterns=["sk-", "ghp_"], entropy_threshold=3.5 | guardrail.SecretKeys(key_patterns=["sk-", "ghp_"], entropy_threshold=3.5) |
| Gibberish | prob_threshold=0.7 | guardrail.Gibberish(prob_threshold=0.7) |
| Prompt Injection | injection_patterns=["ignore previous", "system override"] | guardrail.PromptInjection(injection_patterns=["ignore previous", "system override"]) |
| Jailbreak | detection_threshold=0.9 | guardrail.Jailbreak(detection_threshold=0.9) |
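The table does not define the units of entropy_threshold. One plausible reading, assumed here and not confirmed by the source, is Shannon entropy in bits per character, where random-looking tokens score higher than ordinary words. The sketch below shows that calculation; `shannon_entropy` is an illustrative helper, not a function from the package.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of `s`, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    # Sum of p * log2(1/p) over each distinct character, with p = count / n.
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


THRESHOLD = 3.5  # same value as the example in the table; units are an assumption

print(shannon_entropy("password"))          # 2.75
print(shannon_entropy("0123456789abcdef"))  # 4.0
print(shannon_entropy("0123456789abcdef") >= THRESHOLD)  # True
```

Under this reading, a dictionary word such as "password" falls below 3.5 while a string of sixteen distinct characters exceeds it, which is why an entropy cutoff can complement fixed prefixes like "sk-" or "ghp_".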
Development
Code Quality
This project uses several tools to maintain code quality:
- Ruff: Linting and formatting
- isort: Import sorting
- pytest: Testing framework
Available commands:
# Run all quality checks
uv run poe clean
# Individual checks
uv run poe lint # Ruff linting
uv run poe format # Ruff formatting
uv run poe sort # Import sorting
Testing
Run tests using pytest:
# Run all tests
uv run pytest
# Run specific test
uv run pytest path/to/test.py::test_name
Contributing
Quick workflow:
- Fork and branch: git checkout -b feature/name
- Make changes
- Run checks: uv run poe clean-full
- Commit and push
- Open a Pull Request
License
MIT (as declared in pyproject.toml).
Built by thememium