Pull the plug on bad AI. Fast prompt injection detection and redaction for LLM apps, agents, and RAG pipelines.
Project description
Unplug SDK
Find the attack. Cut the attack. Keep the rest.
Unplug is a runtime defense layer for LLM apps and agents. It tracks where every piece of text came from, scans untrusted content for prompt injection, and gates tool calls before they do damage. Attacks are redacted at the span level, so the rest of the document stays usable.
Why Unplug
- Span-level redaction. Binary blocking throws away the whole document. Unplug localizes the injected instruction to character offsets and removes just that.
- Provenance built in. Nothing enters as a raw string. Every text carries its source (user, retrieved, tool output) and trust level.
- Tool-call gates. Destructive calls block. Tainted sessions force review before side-effect tools run.
- Fail closed. Scanner errors block, never silently allow.
- Offline by default. Regex + normalization scanning needs zero ML dependencies. One line upgrades to the ML span model.
Install
pip install unplug-ai # regex-only core, zero ML deps
pip install "unplug-ai[ml]" # add the ML span model
Or from source:
git clone https://github.com/UnplugAI/Unplug.git && cd Unplug/sdk
uv sync && uv pip install -e ".[ml]"
60-second quickstart
from unplug import Guard
guard = Guard() # local mode, offline, regex scanners
result = guard.scan("Ignore all previous instructions", source="user")
if not result.safe:
print(result.action) # block / review / redact
print(result.redacted_text) # attack spans replaced
print(result.findings) # evidence with span offsets
Upgrade to the ML span model. Weights download once from Unplug-AI/unplug-tiny-v1 and cache locally:
guard = Guard.with_tiny()
result = guard.scan(rag_chunk, source="retrieved")
No install needed to try it: live demo on Hugging Face.
Protect an agent
Wire Unplug into any agent that fetches external content or calls tools:
- Scan user input.
guard.scan(text, source="user")capturesuser_intentfor later gates. - Wrap untrusted content before it enters LLM context.
guard.wrap_for_context(rag_chunk, source="retrieved"). Auto-wrap also runs onscan(..., source="retrieved")when[boundaries] auto_wrap_untrusted = true. - After fetch/read tools.
guard.notify_taint_source("web_fetch")so side-effect tools require review. - Before every tool call.
guard.check_tool_call(name, args, taint_sources=[...]). Destructive calls block. A tainted session plus a side-effect tool returnsREVIEW. Crescendo patterns tightenexec,web_fetch, and browser tools adaptively ([degradation]). - Scan agent output.
guard.scan_output(text). Setstrip_on_output = trueto remove boundary markers from redacted output. - New trusted turn.
guard.reset_session_taint()clears taint and degradation.
Context files (AGENTS.md and similar): guard.scan_context_file(text, filename="AGENTS.md") before loading into the system prompt.
Full walkthrough: examples/agent_exfil_demo.py shows a hidden webpage injection leading to a tainted session and a blocked exfil tool call.
Long documents and streams
Documents past 8K chars are scanned with sliding windows (2048 chars, 256 overlap) so the full text is covered, not just head and tail. Configure under [catalog.tiers.tiny.config] or unplug.toml.
# Streamed LLM output: scan incrementally, full coverage on flush
scanner = guard.stream_scanner(scan_every_chars=1024)
for chunk in token_stream:
if hit := scanner.push(chunk):
handle(hit)
result = scanner.flush()
# Or scan a finished chunk list as one document
guard.scan_stream(["part1", "part2", "part3"])
Deployment modes
| Mode | When to use | Init | ML runs where |
|---|---|---|---|
| Local regex | Dev, air-gapped, zero deps | Guard() |
Nowhere |
| Local + ML | Single agent, offline | Guard.with_tiny() or active_model="tiny" |
Agent process |
| Hosted | Production, no GPU on client | Guard(mode="server") + API key |
Unplug API |
| Local sidecar | Many local agents, one model load | Sidecar + Guard(mode="server") to localhost |
Local server |
Full architecture and decision guide: docs/DEPLOYMENT.md.
Hosted
export UNPLUG_SERVER_URL=https://api.your-unplug-host.com
export UNPLUG_API_KEY=up_live_xxxxxxxx
guard = Guard(mode="server") # or server_url= / server_api_key= in ctor
The server handles /v1/scan and /v1/scan/output. check_tool_call() always runs locally (toolchain, collusion, taint). See examples/hosted_client.py.
Local sidecar
Same wire format as hosted, run on localhost without an API key:
# Terminal 1, from the unplug-server repo
docker compose -f docker-compose.sidecar.yml up
# Terminal 2
export UNPLUG_SERVER_URL=http://127.0.0.1:8000
unplug-sidecar doctor
python examples/local_sidecar_client.py
ML model: unplug-tiny
The dual-head checkpoint has a document classifier (recall) and a BIOES span head (localization and redaction). Without it, regex + tool enforcement remain the default.
pip install "unplug-ai[ml]"
unplug-models download tiny # optional; Guard.with_tiny() auto-downloads too
# unplug.toml
active_model = "tiny"
auto_download_model = true
require_ml = true # optional fail-fast at init
UNPLUG_MODEL_PATH alone auto-selects the tiny tier; prefer setting both explicitly in production. Checkpoint layout and integration steps: docs/ML_INTEGRATION.md.
All published model metrics come from a frozen golden-eval harness on held-out data and are recorded on the model card. No hand-typed numbers, measured not target.
Verify your wiring anytime:
unplug-audit # wiring + ML status
unplug-audit --probes # FP + encoding + boundary batteries
unplug-audit --require-ml # fail if checkpoint / config / ML not active
| Check | Meaning |
|---|---|
ml_checkpoint |
Checkpoint dir found on disk |
ml_configured |
active_model set in config |
ml_active |
injection_ml loaded and weights ready |
Configuration
Copy unplug.example.toml to unplug.toml to customize scanners, tool profiles, boundaries, and limits.
| Variable | Hosted | Local ML |
|---|---|---|
UNPLUG_SERVER_URL |
required | - |
UNPLUG_API_KEY |
required if server auth on | - |
UNPLUG_ACTIVE_MODEL |
- | tiny |
UNPLUG_MODEL_PATH |
- | checkpoint dir |
UNPLUG_REQUIRE_ML |
- | optional |
Integrations
Framework hooks for LangGraph and Agno, plus framework-agnostic patterns: docs/INTEGRATIONS.md.
Threat scanners live under unplug.safeguards (canonical). The older unplug.scanners path still works but emits deprecation warnings:
from unplug.safeguards.injection import InjectionScanner
from unplug.safeguards.destructive import DestructiveScanner
from unplug.safeguards.registry import SafeguardRegistry
Examples
examples/agent_exfil_demo.py: hidden injection, tainted session, blocked exfil tool callexamples/langgraph_hooks_demo.pyandexamples/agno_hooks_demo.py: framework hooksexamples/hosted_client.pyandexamples/local_sidecar_client.py: server modesdemo/: the Gradio app behind the Hugging Face demo
Documentation
| Doc | Covers |
|---|---|
docs/DEPLOYMENT.md |
Hosted vs embedded vs sidecar architecture |
docs/BENCHMARKS.md |
Regex SDK eval results (neuralchemy, microsoft) |
docs/ML_INTEGRATION.md |
Checkpoint layout, thresholds, long-text and streaming config |
docs/INTEGRATIONS.md |
LangGraph, Agno, framework-agnostic hooks |
docs/AGENT_FLOW_SECURITY.md |
End-to-end agent hardening flow |
docs/HERMES_AGENT_SECURITY.md |
Context-file scanning for agent frameworks |
Development
cd sdk && uv sync --all-extras --dev
make fix # auto-fix lint + format
make check # lint + format check + full pytest
make check-ci # CI parity: check + exfil demo + security regression
make test-security
make audit # unplug-audit wiring
make audit-ml # unplug-audit --require-ml
Contributions welcome. See CONTRIBUTING.md.
License
Apache-2.0
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file unplug_ai-0.2.3.tar.gz.
File metadata
- Download URL: unplug_ai-0.2.3.tar.gz
- Upload date:
- Size: 366.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
64e19f95a7a3c0c9b3d85d9840ed2ce715973c78cae6672448c2f9b0c23eb579
|
|
| MD5 |
8c1553a975c6ec3ce50fe0aebe626b20
|
|
| BLAKE2b-256 |
8670d0c4b019f6510c791655d6d4c6b8ec98d0d146eea8acb057304db8a9572a
|
File details
Details for the file unplug_ai-0.2.3-py3-none-any.whl.
File metadata
- Download URL: unplug_ai-0.2.3-py3-none-any.whl
- Upload date:
- Size: 139.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1e3e4820916b87ec410e01c342a0f517538520e71da8cf04ed2387c2aeecff79
|
|
| MD5 |
e505d3f357bf122102550093a0772015
|
|
| BLAKE2b-256 |
7350e073ade468fedbf4fc25796dd73bdf75763a7525c56c7cdd198e784e63dd
|