Skip to main content

Production-Grade LLM Security Framework - Protect against prompt injection, jailbreaks, and data leakage

Project description

PromptShields

Secure AI Applications in 3 Lines of Code

PyPI Python License Downloads

Stop prompt injection, jailbreaks, and data leaks in production LLM applications.


Installation

pip install promptshields

Quick Start

from promptshield import Shield

shield = Shield.balanced()
result = shield.protect_input(user_input, system_prompt)

if result['blocked']:
    print(f"Blocked: {result['reason']} (score: {result['threat_level']:.2f})")
    print(f"Breakdown: {result['threat_breakdown']}")

That's it. Production-ready security in 3 lines.


Why PromptShields?

Feature PromptShields DIY Regex Paid APIs
Setup Time 3 minutes Weeks Days
Cost Free Free $$$$
Privacy 100% Local Local Cloud
F1 Score 0.97 (RF) / 0.96 (DeBERTa) ~0.60 ~0.95
ML Models 4 + DeBERTa None Black box
Async ✅ Native DIY Varies

What We Block

  • 🛡️ Prompt injection attacks (direct + indirect)
  • 🎭 Jailbreak attempts (DAN, persona replacement)
  • 🔑 System prompt extraction
  • 🔒 PII leakage
  • 📊 Session anomalies
  • 🔤 Encoded/obfuscated attacks (Base64, URL, Unicode)

Security Modes

Choose the right tier for your application:

Shield.fast()       # ~1ms  - High throughput (pattern matching only)
Shield.balanced()   # ~2ms  - Production default (patterns + session tracking)
Shield.strict()     # ~7ms  - Sensitive apps (+ 1 ML model + PII detection)
Shield.secure()     # ~12ms - Maximum security (4 ML models ensemble)

New in v2.7.0 (Streaming & Output Protection)

Streaming LLM Output Scanning

Securely wrap real-time LLM token streams to detect leaked PII or injected prompts before the full response finishes generating.

# Auto-scans generator stream chunks in real-time
for safe_chunk in shield.protect_stream(llm.stream("Summarize...")):
    print(safe_chunk, end="")

Upgraded Zero-Leakage Classical ML

The random_forest, logistic_regression, linear_svc, and gradient_boosting ensemble models were retrained from scratch on a curated 14K dataset to completely eliminate "data leakage" false positives (e.g., blocking benign math/conversion queries). Combine with DeBERTa via Shield.balanced() for optimal F1 performance.


New in v2.6.0 (Developer Experience)

YAML Configuration

Launch shields declaratively without changing application code.

shield = Shield.from_config("promptshield.yml")

Slack / Teams Webhooks

Instantly trigger webhooks whenever high-severity threats are blocked natively.

shield = Shield.balanced(webhook_url="https://hooks.slack.com/...")

New in v2.5.0

Per-Layer Threat Breakdown

Every response now shows exactly which layer triggered:

result = shield.protect_input(user_text, system_prompt)
print(result["threat_breakdown"])
# {"pattern_score": 0.0, "ml_score": 0.994, "session_score": 0.0}

DeBERTa Support

shield = Shield(models=["deberta"])  # Auto-downloads from HuggingFace

Async Support

from promptshield import AsyncShield

shield = AsyncShield.balanced()
result = await shield.aprotect_input(user_text, system_prompt)

FastAPI Middleware

from promptshield import Shield
from promptshield.integrations.fastapi import PromptShieldMiddleware

app.add_middleware(PromptShieldMiddleware, shield=Shield.balanced())

Allowlist & Custom Rules

shield = Shield(
    patterns=True,
    models=["random_forest"],
    allowlist=["summarize this document", "translate to french"],
    custom_patterns=[r"jailbreak|dan mode|evil\s*bot"],
)

Benchmark Results

Trained on neuralchemy/Prompt-injection-dataset:

Model F1 ROC-AUC FPR Latency
Random Forest 0.969 0.994 6.9% <1ms
Logistic Regression 0.964 0.995 6.4% <1ms
Gradient Boosting 0.961 0.994 7.9% <1ms
LinearSVC 0.959 0.995 10.3% <1ms
DeBERTa-v3-small 0.959 0.950 8.5% ~50ms

Pre-trained models: neuralchemy/prompt-injection-detector · neuralchemy/prompt-injection-deberta


Documentation

📖 Full Documentation — Complete guide with framework integrations, enterprise observability, and output filtering.


License

MIT License — see LICENSE


Built by NeurAlchemy — AI Security & LLM Safety Research

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

promptshields-3.0.0.tar.gz (15.7 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

promptshields-3.0.0-py3-none-any.whl (15.9 MB view details)

Uploaded Python 3

File details

Details for the file promptshields-3.0.0.tar.gz.

File metadata

  • Download URL: promptshields-3.0.0.tar.gz
  • Upload date:
  • Size: 15.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for promptshields-3.0.0.tar.gz
Algorithm Hash digest
SHA256 199f4fca2dd0a2a00a62c98606544ffbd85b4f7ea26be9a84086154bc14c0c0b
MD5 8e94d536160333451fb16d9ba9aecb77
BLAKE2b-256 0787a834d522b47b385f6fa27b0702d5590ed4107015ad29d94ac4d16ffc581f

See more details on using hashes here.

File details

Details for the file promptshields-3.0.0-py3-none-any.whl.

File metadata

  • Download URL: promptshields-3.0.0-py3-none-any.whl
  • Upload date:
  • Size: 15.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for promptshields-3.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3db507f05b47dec2a899e798ed3941c75ea8da34b787f2f5602f183ca05da8c7
MD5 b62d32d8e4a56112846677d67960bec9
BLAKE2b-256 851dc83e4109485cc1d8e2efa499a2090cfcb77094740f20df0d696cdc0bea34

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page