aisafepy

Capability-based IFC, streaming-native cascaded guardrails, and an eval-to-guardrail compiler for LLM agents.

These details have not been verified by PyPI

Project links

Project description

AIsafePy

Capability-based information-flow control, streaming-native cascaded guardrails, and a continuous eval-to-guardrail compiler for LLM agents.

AIsafePy fills three gaps that the existing OSS guardrails ecosystem (NeMo, Guardrails AI, llm-guard, LlamaFirewall, OpenAI Guardrails) has not closed:

aisafepy.flow. Capability-based, taint-propagating runtime around tool-calling agents (CaMeL / FIDES / RTBAS-style information-flow control), packaged as drop-in adapters for OpenAI Agents SDK, LangGraph, LlamaIndex, Anthropic tools, and MCP servers.
aisafepy.stream. Streaming-native cascaded guardrails with deterministic Tier-1, small-classifier Tier-2, and optional white-box activation probes / LLM-judge Tier-3, plus an explicit p95 latency budget and structured GuardDecisions.
aisafepy.adapt. A continuous eval-to-guardrail compiler that promotes PyRIT / Garak / Inspect failures into runtime guards: distilled classifiers, synthesized regexes, Cedar/OPA policy rules, steering vectors (for self-hosted models), and deliberative cases.

Status

Alpha (v0.1). API surface is stable enough to build against, but expect rough edges and missing optional dependencies in the heavier extras.

Install

pip install aisafepy                       # core only
pip install "aisafepy[stream]"             # + HF classifiers, regex, deterministic Tier 1/2
pip install "aisafepy[probes]"             # + linear/MLP activation probes for HF models
pip install "aisafepy[adapt]"              # + clustering and compiler targets
pip install "aisafepy[flow-openai]"        # + OpenAI Agents SDK adapter
pip install "aisafepy[all]"                # everything except contrib-* extras

For development:

uv venv
uv pip install -e ".[dev,all]"
uv run pytest

Quickstart

`flow`: defeating indirect prompt injection by construction

from aisafepy.flow import Policy, Capability, secure_agent, Tainted
from agents import Agent, Runner  # openai-agents

policy = (
    Policy()
    .label_source("web.fetch", integrity="UNTRUSTED")
    .label_source("gmail.read", integrity="UNTRUSTED", caps={Capability.READ_USER})
    .label_source("user_prompt", integrity="TRUSTED")
    .require("send_email", control_flow_integrity="TRUSTED")
    .require("payments.transfer", control_flow_integrity="TRUSTED",
             caps={Capability.WRITE_EXTERNAL})
    .deny_if("send_email",
             when=lambda to, body: "read.secrets" in body.provenance,
             reason="secret-to-external-sink")
)

agent = Agent(name="ops-bot", tools=[gmail_read, web_fetch, send_email, transfer])
safe_agent = secure_agent(agent, policy=policy)
result = Runner.run_sync(safe_agent, "Read my last email and act on it.")

`stream`: cascaded guardrails with a latency budget

from aisafepy.stream import (
    GuardPipeline, RegexGuard, ClassifierGuard, probes,
)

pipeline = GuardPipeline(
    tier1=[
        RegexGuard.compile_pii(),
        RegexGuard.blocklist(["api_key=", "BEGIN PRIVATE"]),
    ],
    tier2=[ClassifierGuard.from_hf("meta-llama/Llama-Prompt-Guard-2-22M")],
    tier3=[ClassifierGuard.from_hf("meta-llama/Llama-Guard-4")],
    budget_ms_p95=80,
)

async for chunk_or_decision in pipeline.guard_stream(model.generate_stream(prompt)):
    if hasattr(chunk_or_decision, "action"):
        log_otel(chunk_or_decision)
        break
    yield chunk_or_decision

`adapt`: PyRIT failures → deployed guard pipeline

from aisafepy.adapt import PyRITSource, GuardCompiler, Target, promote
from aisafepy.stream import GuardPipeline

source = PyRITSource(memory_db="pyrit_memory.duckdb")
compiler = GuardCompiler(
    source=source,
    targets=[
        Target.distill_classifier(base="meta-llama/Llama-Prompt-Guard-2-22M"),
        Target.synthesize_regex(min_precision=0.99),
        Target.steering_vector(model="Qwen/Qwen3-8B-Instruct"),
        Target.deliberative_case(policy="policies/company_safety.md"),
    ],
    min_attack_success_rate=0.05,
)
report = compiler.compile()
promote(report, to=GuardPipeline.from_yaml("guards.yaml"),
        canary_traffic_pct=1.0, fp_budget=0.005)

Layout

src/aisafepy/
├── core/           # shared primitives: GuardDecision, telemetry, budgets, progress, policies
├── flow/           # Gap 1. Capability-based IFC
│   └── adapters/   # openai_agents, langgraph, llamaindex, anthropic_tools, mcp
├── stream/         # Gap 2. Streaming cascade
│   └── adapters/   # openai_agents, langchain, llamaindex
├── adapt/          # Gap 3. Eval-to-guardrail compiler
│   └── compile/    # classifier, regex, policy, steering, deliberative
└── contrib/        # thin wrappers: presidio, llama_guard, shield_gemma, prompt_guard, llm_guard, lakera

Design principles

Pythonic, not DSL-first. Decorators and types, not Colang. Cedar / OPA appears only as an emission target inside adapt.compile.policy.
Composable primitives. Every guard is a Callable[[Context], Awaitable[GuardDecision]]. Pipelines, IFC, and adapt all consume and produce this type.
Bring your own model. No proprietary models are shipped. contrib/ wraps Llama Guard 4, ShieldGemma, Prompt Guard 2, llm-guard, Lakera, Presidio.
Defense in depth. flow (architectural) + stream (detective) + adapt (continuous) compose.
Observability is a first-class output. Structured GuardDecision / IFCViolation, OpenTelemetry-native, with explicit why_blocked + evidence.
Self-hosted parity. Probe-based and steering-based features work on HF Transformers; hosted APIs fall back to classifier guards.

Caveats

Capability-based defenses reduce risk dramatically but are not free. CaMeL reports ~2.7× tokens, RTBAS ~2% utility loss. Streaming forecasters require MC rollouts or token-level supervision to train. Activation probes are model-specific. AIsafePy does not solve sleeper-agent / deceptive-alignment problems. See docs/CAVEATS.md.

License

Apache-2.0. See LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

May 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aisafepy-0.1.0.tar.gz (108.7 kB view details)

Uploaded May 11, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

aisafepy-0.1.0-py3-none-any.whl (88.0 kB view details)

Uploaded May 11, 2026 Python 3

File details

Details for the file aisafepy-0.1.0.tar.gz.

File metadata

Download URL: aisafepy-0.1.0.tar.gz
Upload date: May 11, 2026
Size: 108.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for aisafepy-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`c5a67dc539b68eb175ac59a14cc0b6d637a71d29b992b5c2df87de9cc1b2bc0a`
MD5	`6124b9be72ddd1133f8103e445afa66d`
BLAKE2b-256	`867cb060d7dbf25d7d50e0cdc78fedb14b10f962323aba681d52c4a5ece0dd87`

See more details on using hashes here.

File details

Details for the file aisafepy-0.1.0-py3-none-any.whl.

File metadata

Download URL: aisafepy-0.1.0-py3-none-any.whl
Upload date: May 11, 2026
Size: 88.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for aisafepy-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0ad4dd997c57259eb480003cd4e1180d67a1c0c74d6df00e63387ebdacec9824`
MD5	`db8aaed1616c03c4b59d6bcf7069fd22`
BLAKE2b-256	`313595214363a5fec53884dddf05662c38d1f04ffcde16fc2b47b2cee0479c8a`

See more details on using hashes here.

aisafepy 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

AIsafePy

Status

Install

Quickstart

`flow`: defeating indirect prompt injection by construction

`stream`: cascaded guardrails with a latency budget

`adapt`: PyRIT failures → deployed guard pipeline

Layout

Design principles

Caveats

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

aisafepy 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

AIsafePy

Status

Install

Quickstart

flow: defeating indirect prompt injection by construction

stream: cascaded guardrails with a latency budget

adapt: PyRIT failures → deployed guard pipeline

Layout

Design principles

Caveats

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`flow`: defeating indirect prompt injection by construction

`stream`: cascaded guardrails with a latency budget

`adapt`: PyRIT failures → deployed guard pipeline