DSPy Guardrails

A comprehensive collection of AI guardrails built with DSPy for content moderation and security.

Table of Contents
  1. About
  2. Quick Start
  3. Usage
  4. Development
  5. Contributing
  6. License

About

DSPy Guardrails is a comprehensive suite of AI guardrails built with DSPy. Each guardrail is implemented as a self-contained module that can be used to test and validate different types of content moderation and security checks.

  • Modular design: Each guardrail type is implemented as a separate, self-contained module
  • Two-stage detection: A fast regex prefilter handles known patterns before any LLM call, so inputs that match a known pattern never reach the model
  • Programmatic testing: Run guardrails directly in Python for fast iteration
  • Comprehensive coverage: Covers major content moderation and security scenarios
  • DSPy integration: Leverages DSPy's programmatic prompting for consistent, reliable results
  • Type safety: Full type hints and dataclass definitions for robust implementations

Quick Start

Install

Install dspy-guardrails with uv (recommended)

uv add dspy-guardrails

Install with pip (alternative)

pip install dspy-guardrails

Continue with the usage examples below.

Usage

Basic Usage

import dspy
from dspy_guardrails import guardrail

# Configure DSPy (required)
lm = dspy.LM("openrouter/google/gemini-2.5-flash-preview-09-2025")
guardrail.configure(lm=lm)

# Create and run guardrails
topic_guardrail = guardrail.Topic(topic_scopes=["AI", "Machine Learning"])
result = guardrail.Run(topic_guardrail, "I want to learn about neural networks")
print(f"Allowed: {result.is_allowed}")  # True

Multiple Guardrails

Assumes guardrail.configure(lm=lm) has already been called.

all_guardrails = [
    guardrail.Topic(topic_scopes=["AI"]),
    guardrail.Nsfw(),
    guardrail.Pii(),
]
result = guardrail.Run(all_guardrails, "Safe AI content")
print(f"All passed: {result.is_allowed}")  # True

Early Return

Stop execution on the first failing guardrail.

guardrails = [guardrail.Topic(topic_scopes=["AI"]), guardrail.Nsfw()]
result = guardrail.Run(guardrails, "Risky content", early_return=True)
print(result.is_allowed)
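
The difference between running every guardrail and returning early can be illustrated with a small standard-library sketch. It does not use dspy-guardrails: `CheckResult`, `run_checks`, and the three toy checks are hypothetical stand-ins, and the library's own aggregation logic may differ.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for illustration; not the dspy-guardrails classes.
@dataclass
class CheckResult:
    is_allowed: bool
    metadata: dict = field(default_factory=dict)

Check = Callable[[str], bool]  # returns True when the text is allowed

def run_checks(checks: dict[str, Check], text: str, early_return: bool = False) -> CheckResult:
    """Run checks in order and aggregate them into a single result."""
    executed: list[str] = []
    failed: list[str] = []
    for name, check in checks.items():
        executed.append(name)
        if not check(text):
            failed.append(name)
            if early_return:
                break  # skip the remaining checks after the first failure
    return CheckResult(
        is_allowed=not failed,
        metadata={"executed": executed, "failed": failed},
    )

checks = {
    "no_digits": lambda text: not any(ch.isdigit() for ch in text),
    "short": lambda text: len(text) <= 20,
    "lowercase": lambda text: text == text.lower(),
}

full = run_checks(checks, "Call 555 0100")
early = run_checks(checks, "Call 555 0100", early_return=True)
print(full.metadata["executed"])   # ['no_digits', 'short', 'lowercase']
print(early.metadata["executed"])  # ['no_digits']
```

Both calls report the text as not allowed. With early return the later checks never run, which saves model calls but leaves their outcomes unknown.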

Parallel Execution

For checks with multiple guardrails, pass parallel=True to fan them out concurrently on a ThreadPoolExecutor. The guardrails for each text run on separate pool threads, so a check with N guardrails takes roughly the time of the slowest single guardrail rather than the sum of all of them, provided the pool has enough threads to run them at once.

# pii_gr, secret_keys_gr and prompt_injection_gr are guardrail instances
# created beforehand (for example, pii_gr = guardrail.Pii()).
result = guardrail.Run(
    [pii_gr, secret_keys_gr, prompt_injection_gr],
    "Email me at user@example.com",
    parallel=True,
    num_threads=8,    # optional thread pool size
)
print(result.is_allowed)
print(result.metadata["parallel"])    # True
print(result.metadata["num_threads"])  # 8

parallel=True only affects the aggregated path (multiple guardrails or multiple texts). It composes with early_return=True: all guardrails for a text still execute concurrently, the result reflects the first failure, and processing stops at the first text with any failure.
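
The timing claim above can be illustrated with a standard-library sketch. It does not call dspy-guardrails: `make_check` and `run_parallel` are hypothetical names, and each stand-in check sleeps to mimic a slow model call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def make_check(name: str, delay: float, allowed: bool):
    """Build a stand-in check that sleeps to mimic a slow model call."""
    def check(text: str) -> tuple[str, bool]:
        time.sleep(delay)
        return name, allowed
    return check

checks = [
    make_check("pii", 0.2, False),
    make_check("secret_keys", 0.1, True),
    make_check("prompt_injection", 0.3, True),
]

def run_parallel(checks, text: str, num_threads: int = 8) -> dict:
    """Submit every check to a thread pool and aggregate the outcomes."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(check, text) for check in checks]
        # Collect in submission order so the failure list is deterministic.
        outcomes = [future.result() for future in futures]
    failed = [name for name, allowed in outcomes if not allowed]
    return {"is_allowed": not failed, "failed": failed}

start = time.perf_counter()
result = run_parallel(checks, "Email me at user@example.com")
elapsed = time.perf_counter() - start
print(result)             # {'is_allowed': False, 'failed': ['pii']}
print(f"{elapsed:.2f}s")  # near the slowest check (0.3 s), not the 0.6 s sum
```

Threads suit this workload because the checks spend their time waiting on network I/O. If num_threads is smaller than the number of guardrails, some checks queue behind others and the total time grows accordingly.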

For more examples and patterns, see the complete quickstart guide and guardrail types. For the regex prefilter catalog, opt-out flags, and custom-pattern reference, see docs/REGEX_PREFILTERS.md.

Two-Stage Detection: Regex Prefilter + LLM

Most guardrails (Pii, PromptInjection, SecretKeys, Jailbreak, Keywords, Gibberish, Toxicity, Language, Topic) run a fast regex prefilter before calling the DSPy LLM. If the prefilter matches, the guardrail short-circuits with is_allowed=False and method="regex_prefilter" in the result metadata — no model call is made, so the request is handled deterministically and at zero API cost. Only requests that pass the prefilter are sent to the LLM.

# pii_guardrail is a Pii guardrail created beforehand, e.g. guardrail.Pii()
result = guardrail.Run(pii_guardrail, "Email me at user@example.com")
result.metadata["method"]    # "regex_prefilter" (fast) or "dspy" (LLM)
result.metadata["matches"]   # list of {slug, matched_text, ...} on a prefilter hit

The prefilter is opt-out per guardrail (e.g. enable_regex_prefilter=False, enable_script_prefilter=False, enable_blocked_topic_prefilter=False) and accepts user-supplied custom patterns where the API supports them (Pii.custom_patterns, SecretKeys.custom_patterns). Custom patterns are ReDoS-screened at construction time. See Parallel Execution above for running multiple guardrails concurrently.
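
The control flow described in this section can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the library's implementation: the email pattern, `check_pii`, and `fake_llm` are hypothetical, and the real pattern catalog is documented in docs/REGEX_PREFILTERS.md.

```python
import re
from typing import Callable

# Illustrative pattern only; not taken from the library's catalog.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def check_pii(text: str, llm_check: Callable[[str], bool]) -> dict:
    """Stage 1: regex prefilter. Stage 2: fall back to the (stubbed) model."""
    matches = [
        {"slug": "email", "matched_text": m.group(0)}
        for m in EMAIL.finditer(text)
    ]
    if matches:
        # Prefilter hit: block without calling the model.
        return {"is_allowed": False, "method": "regex_prefilter", "matches": matches}
    return {"is_allowed": llm_check(text), "method": "dspy", "matches": []}

llm_calls: list[str] = []

def fake_llm(text: str) -> bool:
    """Hypothetical stand-in for the DSPy call; records each invocation."""
    llm_calls.append(text)
    return True

hit = check_pii("Email me at user@example.com", fake_llm)
miss = check_pii("Let's talk tomorrow", fake_llm)
print(hit["method"], hit["matches"][0]["matched_text"])  # regex_prefilter user@example.com
print(miss["method"], len(llm_calls))                    # dspy 1
```

Only the second input reaches the stubbed model, which is the cost saving the prefilter is meant to provide.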

Available Guardrails

Guardrail | Example variables | Example instantiation
--- | --- | ---
Topic | topic_scopes=["AI", "Machine Learning"], blocked_topics=["spam"] | guardrail.Topic(topic_scopes=["AI", "Machine Learning"], blocked_topics=["spam"])
NSFW | sensitivity_level="high" | guardrail.Nsfw(sensitivity_level="high")
PII | allowed_pii_types=["email"] | guardrail.Pii(allowed_pii_types=["email"])
Toxicity | toxicity_threshold=0.8 | guardrail.Toxicity(toxicity_threshold=0.8)
Tone | desired_tone="helpful", unwanted_tones=["sarcastic"] | guardrail.Tone(desired_tone="helpful", unwanted_tones=["sarcastic"])
Grounding | grounding_threshold=0.8 | guardrail.Grounding(grounding_threshold=0.8)
Language | allowed_languages=["en", "es"] | guardrail.Language(allowed_languages=["en", "es"])
Keywords | blocked_keywords=["password", "secret"], case_sensitive=False | guardrail.Keywords(blocked_keywords=["password", "secret"], case_sensitive=False)
Secret Keys | key_patterns=["sk-", "ghp_"], entropy_threshold=3.5 | guardrail.SecretKeys(key_patterns=["sk-", "ghp_"], entropy_threshold=3.5)
Gibberish | prob_threshold=0.7 | guardrail.Gibberish(prob_threshold=0.7)
Prompt Injection | injection_patterns=["ignore previous", "system override"] | guardrail.PromptInjection(injection_patterns=["ignore previous", "system override"])
Jailbreak | detection_threshold=0.9 | guardrail.Jailbreak(detection_threshold=0.9)
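
To give a feel for a value such as entropy_threshold=3.5, the sketch below computes Shannon entropy over a string's characters, in bits per character. This is an assumption about what the threshold measures: the README does not say how SecretKeys computes entropy, and `shannon_entropy` and `looks_like_secret` are hypothetical helpers, not library functions.

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    # sum of p * log2(1/p) over each distinct character's frequency p
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())

def looks_like_secret(token: str, entropy_threshold: float = 3.5) -> bool:
    """Flag tokens whose characters are spread out enough to look random."""
    return shannon_entropy(token) >= entropy_threshold

print(shannon_entropy("aaaaaaaa"))               # 0.0
print(shannon_entropy("abcd"))                   # 2.0
print(looks_like_secret("password"))             # False
print(looks_like_secret("sk-9fQ2xLr7Zp0VbT4m"))  # True
```

Under this reading, ordinary words fall well below the threshold while long random-looking tokens sit above it, so raising the threshold trades recall for fewer false positives.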

Development

Code Quality

This project uses several tools to maintain code quality:

  • Ruff: Linting and formatting
  • isort: Import sorting
  • pytest: Testing framework

Available commands:

# Run all quality checks
uv run poe clean

# Individual checks
uv run poe lint          # Ruff linting
uv run poe format        # Ruff formatting
uv run poe sort          # Import sorting

Testing

Run tests using pytest:

# Run all tests
uv run pytest

# Run specific test
uv run pytest path/to/test.py::test_name

Contributing

Quick workflow:

  1. Fork and branch: git checkout -b feature/name
  2. Make changes
  3. Run checks: uv run poe clean-full
  4. Commit and push
  5. Open a Pull Request

License

MIT (as declared in pyproject.toml).

Built by thememium
