Capability-based IFC, streaming-native cascaded guardrails, and an eval-to-guardrail compiler for LLM agents.
AIsafePy
Capability-based information-flow control, streaming-native cascaded guardrails, and a continuous eval-to-guardrail compiler for LLM agents.
AIsafePy fills three gaps that the existing OSS guardrails ecosystem (NeMo, Guardrails AI, llm-guard, LlamaFirewall, OpenAI Guardrails) has not closed:
- aisafepy.flow. Capability-based, taint-propagating runtime around tool-calling agents (CaMeL / FIDES / RTBAS-style information-flow control), packaged as drop-in adapters for OpenAI Agents SDK, LangGraph, LlamaIndex, Anthropic tools, and MCP servers.
- aisafepy.stream. Streaming-native cascaded guardrails with a deterministic Tier 1, a small-classifier Tier 2, and an optional Tier 3 (white-box activation probes or an LLM judge), plus an explicit p95 latency budget and structured GuardDecisions.
- aisafepy.adapt. A continuous eval-to-guardrail compiler that promotes PyRIT / Garak / Inspect failures into runtime guards: distilled classifiers, synthesized regexes, Cedar/OPA policy rules, steering vectors (for self-hosted models), and deliberative cases.
Status
Alpha (v0.1). API surface is stable enough to build against, but expect rough edges and missing optional dependencies in the heavier extras.
Install
pip install aisafepy # core only
pip install "aisafepy[stream]" # + HF classifiers and regex guards (Tiers 1 and 2)
pip install "aisafepy[probes]" # + linear/MLP activation probes for HF models
pip install "aisafepy[adapt]" # + clustering and compiler targets
pip install "aisafepy[flow-openai]" # + OpenAI Agents SDK adapter
pip install "aisafepy[all]" # everything except contrib-* extras
For development:
uv venv
uv pip install -e ".[dev,all]"
uv run pytest
Quickstart
flow: defeating indirect prompt injection by construction
from aisafepy.flow import Policy, Capability, secure_agent, Tainted
from agents import Agent, Runner # openai-agents
policy = (
    Policy()
    .label_source("web.fetch", integrity="UNTRUSTED")
    .label_source("gmail.read", integrity="UNTRUSTED", caps={Capability.READ_USER})
    .label_source("user_prompt", integrity="TRUSTED")
    .require("send_email", control_flow_integrity="TRUSTED")
    .require("payments.transfer", control_flow_integrity="TRUSTED",
             caps={Capability.WRITE_EXTERNAL})
    .deny_if("send_email",
             when=lambda to, body: "read.secrets" in body.provenance,
             reason="secret-to-external-sink")
)
# gmail_read, web_fetch, send_email and transfer are your own tool definitions (not shown).
agent = Agent(name="ops-bot", tools=[gmail_read, web_fetch, send_email, transfer])
safe_agent = secure_agent(agent, policy=policy)
result = Runner.run_sync(safe_agent, "Read my last email and act on it.")
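The flow policy above blocks injected instructions by tracking where data came from rather than inspecting what it says. The toy sketch below illustrates that taint-propagation idea in isolation. It assumes plain string labels and explicit combine() calls; Labeled, source, combine, call_sensitive_tool and PolicyViolation are hypothetical names for illustration, not the aisafepy.flow API, and CaMeL-style systems derive these dependencies automatically rather than through manual calls.

```python
# Hypothetical illustration of taint propagation; not the aisafepy.flow API.
from __future__ import annotations

from dataclasses import dataclass, field

TRUSTED, UNTRUSTED = "TRUSTED", "UNTRUSTED"


@dataclass(frozen=True)
class Labeled:
    """A value tagged with an integrity label and the sources it came from."""
    value: str
    integrity: str
    provenance: frozenset = field(default_factory=frozenset)


def source(name: str, value: str, integrity: str) -> Labeled:
    # Label data at the point where a tool or the user introduces it.
    return Labeled(value, integrity, frozenset({name}))


def combine(*parts: Labeled) -> Labeled:
    # Derived data is only as trusted as its least-trusted input and
    # inherits the provenance of every input (taint propagation).
    trusted = all(p.integrity == TRUSTED for p in parts)
    return Labeled(
        value=" ".join(p.value for p in parts),
        integrity=TRUSTED if trusted else UNTRUSTED,
        provenance=frozenset().union(*(p.provenance for p in parts)),
    )


class PolicyViolation(Exception):
    pass


def call_sensitive_tool(tool: str, decided_by: Labeled) -> str:
    # Control-flow integrity check: the data that decided to invoke a
    # sensitive tool must itself be TRUSTED, whatever its content says.
    if decided_by.integrity != TRUSTED:
        raise PolicyViolation(
            f"{tool} blocked: decision depends on {sorted(decided_by.provenance)}"
        )
    return f"{tool} executed"
```

With user = source("user_prompt", ..., TRUSTED) and email = source("gmail.read", ..., UNTRUSTED), call_sensitive_tool("send_email", user) succeeds, while call_sensitive_tool("send_email", combine(user, email)) raises PolicyViolation no matter how harmless the email text looks. The check never reads content, which is what "by construction" means here.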
stream: cascaded guardrails with a latency budget
from aisafepy.stream import (
    GuardPipeline, RegexGuard, ClassifierGuard, probes,
)
pipeline = GuardPipeline(
    tier1=[
        RegexGuard.compile_pii(),
        RegexGuard.blocklist(["api_key=", "BEGIN PRIVATE"]),
    ],
    tier2=[ClassifierGuard.from_hf("meta-llama/Llama-Prompt-Guard-2-22M")],
    tier3=[ClassifierGuard.from_hf("meta-llama/Llama-Guard-4-12B")],
    budget_ms_p95=80,
)
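async def guarded(model, prompt):
    # model.generate_stream(prompt) is your own async stream of text chunks.
    async for chunk_or_decision in pipeline.guard_stream(model.generate_stream(prompt)):
        # A GuardDecision carries an `action`; plain text chunks do not.
        if hasattr(chunk_or_decision, "action"):
            log_otel(chunk_or_decision)  # your telemetry hook (not shown)
            break
        yield chunk_or_decision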
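The cascade and latency budget can be illustrated with a self-contained sketch. Everything here is an assumption for illustration: Verdict, the three-way allow/block/escalate protocol, the keyword stand-in for a classifier, the fail-open default and the nearest-rank p95 are hypothetical choices, not aisafepy.stream internals. The README does not say what GuardPipeline does when the budget is exceeded, so the sketch only measures latency.

```python
# Hypothetical streaming cascade; not the aisafepy.stream implementation.
import asyncio
import re
import time
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, List, Union


@dataclass
class Verdict:
    action: str  # "allow", "block", or "escalate" (defer to the next tier)
    tier: int
    reason: str = ""


Guard = Callable[[str], Awaitable[Verdict]]


async def tier1_blocklist(text: str) -> Verdict:
    # Deterministic: a hit is conclusive; a miss proves nothing, so escalate.
    m = re.search(r"api_key=|BEGIN PRIVATE", text)
    return Verdict("block", 1, f"blocklist hit {m.group(0)!r}") if m else Verdict("escalate", 1)


async def tier2_classifier(text: str) -> Verdict:
    # Stand-in for a small classifier; only the uncertain middle band escalates.
    score = 0.95 if "ignore previous instructions" in text.lower() else 0.05
    if score >= 0.8:
        return Verdict("block", 2, f"score {score:.2f}")
    if score <= 0.2:
        return Verdict("allow", 2)
    return Verdict("escalate", 2)


async def cascade(text: str, tiers: List[Guard]) -> Verdict:
    # Run tiers cheapest-first and stop at the first conclusive verdict.
    for guard in tiers:
        verdict = await guard(text)
        if verdict.action != "escalate":
            return verdict
    # Nothing was conclusive: this sketch fails open; a stricter deployment might block.
    return Verdict("allow", len(tiers), "no tier was conclusive")


async def guard_stream(
    chunks: AsyncIterator[str], tiers: List[Guard], latencies_ms: List[float]
) -> AsyncIterator[Union[str, Verdict]]:
    seen = ""
    async for chunk in chunks:
        seen += chunk  # judge accumulated text so patterns split across chunks are caught
        start = time.perf_counter()
        verdict = await cascade(seen, tiers)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        if verdict.action == "block":
            yield verdict  # emit the decision in place of the chunk, then stop
            return
        yield chunk


def p95(samples: List[float]) -> float:
    # Nearest-rank percentile using an integer ceiling: rank = ceil(0.95 * n).
    ordered = sorted(samples)
    return ordered[(95 * len(ordered) + 99) // 100 - 1]


BUDGET_MS_P95 = 80.0  # mirrors budget_ms_p95=80 in the example above
```

Fed the chunks "Here is the config: ", "api_", "key=sk-123", guard_stream yields the first two chunks and then a tier-1 block verdict, and p95(latencies) <= BUDGET_MS_P95 reports whether the cascade stayed within budget. Note that "api_" was already released before the block fired; a streaming guard may need to hold back a short window of output for this reason.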
adapt: PyRIT failures → deployed guard pipeline
from aisafepy.adapt import PyRITSource, GuardCompiler, Target, promote
from aisafepy.stream import GuardPipeline
source = PyRITSource(memory_db="pyrit_memory.duckdb")
compiler = GuardCompiler(
    source=source,
    targets=[
        Target.distill_classifier(base="meta-llama/Llama-Prompt-Guard-2-22M"),
        Target.synthesize_regex(min_precision=0.99),
        Target.steering_vector(model="Qwen/Qwen3-8B"),  # self-hosted models only
        Target.deliberative_case(policy="policies/company_safety.md"),
    ],
    min_attack_success_rate=0.05,
)
report = compiler.compile()
promote(report, to=GuardPipeline.from_yaml("guards.yaml"),
        canary_traffic_pct=1.0, fp_budget=0.005)
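The README does not describe how synthesize_regex(min_precision=...) or promote(..., fp_budget=...) work internally. As a rough, hypothetical illustration of what those two gates could mean, the sketch below mines recurring word n-grams from successful attacks, keeps only patterns whose precision against benign samples meets min_precision, and checks a candidate's false-positive rate on canary traffic against fp_budget. The n-gram heuristic and the names candidate_patterns, precision, synthesize_regex and passes_canary are assumptions chosen for brevity.

```python
# Hypothetical precision-gated regex synthesis; not the aisafepy.adapt compiler.
import re
from collections import Counter
from typing import List, Optional


def candidate_patterns(attacks: List[str], n: int = 3, min_support: int = 2) -> List[str]:
    # Mine word n-grams that recur across successful attacks (counted once per attack).
    counts: Counter = Counter()
    for text in attacks:
        words = re.findall(r"\w+", text.lower())
        counts.update({tuple(words[i:i + n]) for i in range(len(words) - n + 1)})
    # Allow any run of non-word characters between words, so punctuation still matches.
    return [r"\W+".join(map(re.escape, gram)) for gram, c in counts.most_common() if c >= min_support]


def precision(pattern: str, attacks: List[str], benign: List[str]) -> float:
    rx = re.compile(pattern, re.IGNORECASE)
    tp = sum(bool(rx.search(t)) for t in attacks)
    fp = sum(bool(rx.search(t)) for t in benign)
    return tp / (tp + fp) if tp + fp else 0.0


def synthesize_regex(attacks: List[str], benign: List[str], min_precision: float = 0.99) -> Optional[str]:
    # Keep only candidates whose precision clears the bar, then OR them together.
    kept = [p for p in candidate_patterns(attacks) if precision(p, attacks, benign) >= min_precision]
    return "|".join(f"(?:{p})" for p in kept) if kept else None


def passes_canary(pattern: str, canary_benign: List[str], fp_budget: float = 0.005) -> bool:
    # Promotion gate: false-positive rate on held-out benign traffic must stay within budget.
    rx = re.compile(pattern, re.IGNORECASE)
    fp_rate = sum(bool(rx.search(t)) for t in canary_benign) / len(canary_benign)
    return fp_rate <= fp_budget
```

In practice precision should be measured on held-out attacks and benign traffic rather than on the attacks the patterns were mined from, since in-sample precision is optimistic.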
Layout
src/aisafepy/
├── core/ # shared primitives: GuardDecision, telemetry, budgets, progress, policies
├── flow/ # Gap 1. Capability-based IFC
│ └── adapters/ # openai_agents, langgraph, llamaindex, anthropic_tools, mcp
├── stream/ # Gap 2. Streaming cascade
│ └── adapters/ # openai_agents, langchain, llamaindex
├── adapt/ # Gap 3. Eval-to-guardrail compiler
│ └── compile/ # classifier, regex, policy, steering, deliberative
└── contrib/ # thin wrappers: presidio, llama_guard, shield_gemma, prompt_guard, llm_guard, lakera
Design principles
- Pythonic, not DSL-first. Decorators and types, not Colang. Cedar / OPA appears only as an emission target inside adapt.compile.policy.
- Composable primitives. Every guard is a Callable[[Context], Awaitable[GuardDecision]]. Pipelines, IFC, and adapt all consume and produce this type.
- Bring your own model. No proprietary models are shipped. contrib/ wraps Llama Guard 4, ShieldGemma, Prompt Guard 2, llm-guard, Lakera, and Presidio.
- Defense in depth. flow (architectural), stream (detective), and adapt (continuous) compose.
- Observability is a first-class output. Structured GuardDecision / IFCViolation records, OpenTelemetry-native, with explicit why_blocked and evidence fields.
- Self-hosted parity. Probe-based and steering-based features work on HF Transformers; hosted APIs fall back to classifier guards.
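The composable-primitives principle can be sketched in plain asyncio. Only the Callable[[Context], Awaitable[GuardDecision]] shape and the why_blocked and evidence fields come from this README; the remaining fields, the example guards and the all_of combinator are hypothetical.

```python
# Hypothetical shapes for the guard contract; field names beyond why_blocked/evidence are assumptions.
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Context:
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class GuardDecision:
    action: str  # "allow" or "block" in this sketch
    guard: str
    why_blocked: Optional[str] = None
    evidence: Tuple[str, ...] = ()


Guard = Callable[[Context], Awaitable[GuardDecision]]


async def no_secrets(ctx: Context) -> GuardDecision:
    hits = tuple(m for m in ("api_key=", "BEGIN PRIVATE") if m in ctx.text)
    if hits:
        return GuardDecision("block", "no_secrets", "secret marker in text", hits)
    return GuardDecision("allow", "no_secrets")


async def max_length(ctx: Context, limit: int = 2000) -> GuardDecision:
    if len(ctx.text) > limit:
        return GuardDecision("block", "max_length", f"{len(ctx.text)} chars > {limit}")
    return GuardDecision("allow", "max_length")


def all_of(*guards: Guard) -> Guard:
    # The combinator returns another Guard, so composites nest like primitives.
    async def combined(ctx: Context) -> GuardDecision:
        decisions = await asyncio.gather(*(g(ctx) for g in guards))  # run concurrently
        for d in decisions:  # report the first block, in argument order
            if d.action == "block":
                return d
        return GuardDecision("allow", "all_of")

    return combined
```

Because all_of(no_secrets, max_length) is itself a Guard, it can be passed to all_of again or used wherever a single guard is expected, and asyncio.gather lets independent guards run concurrently.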
Caveats
Capability-based defenses reduce risk substantially, but they are not free: CaMeL reports roughly 2.7× token usage, and RTBAS reports about 2% utility loss. Streaming forecasters require Monte Carlo rollouts or token-level supervision to train. Activation probes are model-specific. AIsafePy does not solve sleeper-agent or deceptive-alignment problems. See docs/CAVEATS.md.
License
Apache-2.0. See LICENSE.
Download files
File details
Details for the file aisafepy-0.1.0.tar.gz.
File metadata
- Download URL: aisafepy-0.1.0.tar.gz
- Upload date:
- Size: 108.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5a67dc539b68eb175ac59a14cc0b6d637a71d29b992b5c2df87de9cc1b2bc0a |
| MD5 | 6124b9be72ddd1133f8103e445afa66d |
| BLAKE2b-256 | 867cb060d7dbf25d7d50e0cdc78fedb14b10f962323aba681d52c4a5ece0dd87 |
File details
Details for the file aisafepy-0.1.0-py3-none-any.whl.
File metadata
- Download URL: aisafepy-0.1.0-py3-none-any.whl
- Upload date:
- Size: 88.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ad4dd997c57259eb480003cd4e1180d67a1c0c74d6df00e63387ebdacec9824 |
| MD5 | db8aaed1616c03c4b59d6bcf7069fd22 |
| BLAKE2b-256 | 313595214363a5fec53884dddf05662c38d1f04ffcde16fc2b47b2cee0479c8a |