Deterministic policy runtime for AI agents. Same policy file gates CI and enforces in production, with no LLM in the decision loop.

Mirage

Mirage is the open-source policy gateway for AI agents in production. The same YAML policy file enforces the gateway at runtime and gates the CI build pre-merge. No LLM in the decision loop.

Screenshot: Mirage review console over a risky procurement run trace.

Mirage sits between an agent and the rest of the world. Every outbound action is evaluated against a portable policy DSL and decided deterministically: allow, block, or flag. The same policies.yaml enforces a runtime gateway in production and gates the CI build pre-merge.

Why Mirage exists

Agents do not just generate text. They submit bids, mutate billing systems, file tickets, push code, and call APIs that move money. A bad retry, hallucinated route, or out-of-policy payload can charge a customer twice, leak data across tenants, or ship a regression that only surfaces after deploy.

Existing safety tooling either grades outputs with an LLM judge (flaky, stochastic, and unsafe to fail a CI build on) or ships bundled inside one framework or one cloud, with the lock-in that implies. Mirage is the deterministic, framework-agnostic layer underneath: a policy DSL that runs the same file in production and in CI, with no model in the decision loop.

Positioning

Mirage is the open-source policy gateway for AI agents. The same policy file runs in two modes:

Gateway mode: agent runs against real upstreams, every action evaluated, deterministic trace emitted, configurable enforcement (passthrough+log or hard-block on violation). This is the production runtime.
CI mode: agent runs against mocked responses, every action evaluated, deterministic trace emitted, build fails on policy regression. This is the pre-merge gate that lets a team adopt Mirage safely without touching production traffic on day one.

One policy file. Two modes. The decision is rule-based; no LLM judges, no stochastic verdicts.

How Mirage is different

The agent-safety landscape splits into three buckets. Mirage sits in a fourth.

Quality eval (LangSmith, Braintrust, Patronus, Galileo, Maxim, Arize, Future AGI): graders that score whether a model's output was good. They run an LLM judge in the loop. Useful for response quality. Cannot deterministically fail a CI build, cannot ground a SOC2/HIPAA control. Mirage doesn't compete; it sits one layer down: agent actions, not response quality, evaluated by rules, not models.
Observability (Sentrial, Laminar, Helicone, Langfuse, Lucidic): passive watching. Tells you what happened. Doesn't enforce. Mirage is enforcement.
Bundled framework guardrails (Microsoft Agent Governance Toolkit, OpenAI Agents SDK callbacks, NeMo Guardrails, LangChain callbacks): shipped inside one framework or one cloud. Useful if you live entirely inside that vendor's stack. Mirage is framework-agnostic: the same policies.yaml runs against any agent that crosses an HTTP boundary you control.
Deterministic policy runtime: the layer Mirage occupies. A portable policy DSL, evaluated by rules, that runs the same file in production (against real upstreams) and in CI (against mocks), with no LLM in the decision loop. None of the three buckets above offers that combination.
Benchmarked: Mirage ships a reproducible benchmark harness that reports containment rate, false-positive rate, and decision latency on synthetic scenarios. See BENCHMARKS.md for current scores and the methodology.

Direct comparisons

Salus, Playgent, Cascade, Clam (YC agent-infra cohort): adjacent to Mirage, but each takes a different shape. Salus is a runtime engine that wraps and checks actions; Mirage is the portable policy language that sits above the engine. Playgent is sandbox+mocks for testing; Mirage runs the same policy file in test and in production. Cascade learns from observed failures; Mirage enforces declarative rules. Clam is a network-layer firewall with prompt-injection scanning; Mirage is a policy DSL one layer up.
Microsoft Agent Governance Toolkit (April 2026, MIT): covers OWASP Top 10 agentic risks with framework-bundled SDK helpers (LangChain, CrewAI, LangGraph, OpenAI Agents SDK). Excellent if you live inside the Azure/MSFT stack. Mirage is the framework-agnostic alternative: same policy file, any framework, any cloud, deterministic decisions, exportable policy artifacts that survive a stack migration.
Future AGI: closed-loop agent platform with simulation, eval, observability, and "Protect" guardrails. Their evaluation surface is LLM-judged (hallucination, factuality, toxicity scores). Mirage is the opposite category: deterministic action policies, not LLM-graded outputs. Different buyer, different decision class, different SLA shape.
Runtime LLM-judge guards (Llama Guard, policy agents that prompt a model to decide allow/block): arbitrate via a model. Mirage arbitrates via rules. CI can deterministically gate on rules; it can't on judges.

Dev-tool overlap

Mirage's CI mode looks superficially like HTTP-mocking libraries. The wedge is run-scoped policy enforcement, not per-test response stubbing:

pytest-httpx / respx: per-test httpx mocking with response stubs. Mirage is run-scoped: one MirageSession spans an entire agent run, enforces a declarative policy file (not just response stubs), and writes a trace you can gate CI on via assert_clean() or mirage gate-run.
VCR.py: record-and-replay cassettes. Mirage does not record; it evaluates against a policy so a brand-new risky action is caught on its first appearance, not only after a cassette exists.
WireMock / mitmproxy: general-purpose mock servers and intercepting proxies. Mirage is narrower: declarative policy + deterministic decision + trace, tuned for agent action review.

When not to use Mirage today

Your agent doesn't cross an HTTP boundary you control (direct DB writes, filesystem mutation, subprocess calls; none of those go through Mirage yet).
Your decision criteria are inherently subjective ("did the answer sound right?"); that's an LLM-judge problem, not a policy-rule problem.
You have no CI step that can run the agent and no staging environment that can route through a gateway. Mirage's value is enforcement at one of those two boundaries.

See it in 60 seconds

Two starting points. Pick whichever matches where you are today.

60 seconds, production gateway

pip install mirage-ci

In one terminal, start the gateway against a real upstream with the bundled PII-redaction example policy:

mirage gateway \
  --upstream https://your-api.example.com \
  --mode passthrough \
  --policies-path examples/policies/pii_redaction.yaml

In a second terminal, send a request with an SSN-shaped string in the body:

curl -X POST http://127.0.0.1:8001/v1/customer/profile \
  -H "Content-Type: application/json" \
  -d '{"name": "Alice", "ssn": "123-45-6789"}'

Mirage logs a flagged policy decision and forwards the request (the right default for a new deployment). Switch to --mode enforce and the same request returns HTTP 403 with the failed policy decisions in the response body.

60 seconds, CI gate

pip install mirage-ci

In one terminal, start the Mirage proxy with the bundled example mocks and policies:

python -m uvicorn mirage.proxy:app --host 127.0.0.1 --port 8000

In a second terminal, submit a bid above the policy limit and gate the run:

python <<'PY'
from mirage import MirageSession

with MirageSession(run_id="sixty-second-demo") as mirage:
    mirage.post("/v1/submit_bid", json={"bid_amount": 99999})
PY

mirage gate-run --run-id sixty-second-demo

Mirage flags the bid as a policy_violation and gate-run exits non-zero, the same signal that fails a CI build:

Mirage run: sixty-second-demo
Summary: 1 action(s), 0 safe, 1 risky
Risky actions:
- [policy_violation] POST /v1/submit_bid (event 1, mock=submit_bid):
  enforce_bid_limit: Agents cannot submit bids above the approved threshold.
  (bid_amount lte 10000, got 99999)
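The gate logic behind that output can be sketched as follows. The trace record shape (outcome, method, path) is an assumption, not Mirage's documented trace format, and treating an empty run as a failure mirrors the note further down that gate-run fails on a missing run. `gate` and `SAFE_OUTCOMES` are hypothetical names.

```python
# Hypothetical trace records: one dict per action with "outcome", "method", "path".
SAFE_OUTCOMES = {"allowed"}


def gate(run_id: str, actions: list[dict]) -> tuple[str, int]:
    """Render a run summary and return (text, exit_code); non-zero fails CI."""
    if not actions:
        # A missing or empty run fails the gate rather than passing silently.
        return f"Mirage run: {run_id}\nNo actions recorded.", 1
    risky = [a for a in actions if a["outcome"] not in SAFE_OUTCOMES]
    lines = [
        f"Mirage run: {run_id}",
        f"Summary: {len(actions)} action(s), "
        f"{len(actions) - len(risky)} safe, {len(risky)} risky",
    ]
    if risky:
        lines.append("Risky actions:")
        lines.extend(f"- [{a['outcome']}] {a['method']} {a['path']}" for a in risky)
    return "\n".join(lines), 1 if risky else 0
```

A CI wrapper would print the text and pass the exit code to sys.exit, which is the only signal a CI system needs.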

For the bundled multi-step procurement harness (requires a repo checkout), see examples/procurement_harness/README.md.

Policies you can express

Mirage ships five real-world example policies in examples/policies/. Each is a starting point: copy one into your own policy file, narrow path and method to your endpoints, and tighten the regex or limits to your domain.

File	What it prevents
`pii_redaction.yaml`	SSNs, payment-card numbers, and email addresses leaking into outbound payloads
`prompt_injection.yaml`	Common prompt-injection markers in outbound payload text
`outbound_allowlist.yaml`	Outbound HTTP traffic to hosts that are not on an allowlist
`cost_guard.yaml`	Agents spending above approved per-call thresholds (bids, refunds, transfers)
`output_length_cap.yaml`	Runaway agent text generation flooding downstream systems
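To illustrate the kind of check pii_redaction.yaml expresses, here is a hypothetical scan of a JSON payload for SSN-, card-, and email-shaped strings. The regular expressions are illustrative guesses, not the patterns shipped in the bundled file; they will miss some formats and match some non-PII numbers.

```python
import re
from collections.abc import Iterator
from typing import Any

# Illustrative patterns only; the bundled pii_redaction.yaml may use different ones.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested anywhere inside a JSON-like payload."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def find_pii(payload: Any) -> set[str]:
    """Return the names of the PII patterns that match anywhere in the payload."""
    hits: set[str] = set()
    for text in iter_strings(payload):
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                hits.add(name)
    return hits
```

Walking the whole payload matters because agents often nest sensitive values several levels deep rather than in a top-level field.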

Operator coverage in the policy DSL today: exists, eq, neq, lt, lte, gt, gte, in, not_in, regex_match, not_regex_match, contains, not_contains, starts_with, not_starts_with, ends_with, length_lte, length_gte, host_in, host_not_in. New operators ship in minor releases.
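One way to keep an operator set like this deterministic is a plain dispatch table from operator name to a pure comparison. This sketch covers a subset of the operators above, and the semantics it picks (regex_match uses re.search, a missing value fails lte and gt, host_in compares only the URL hostname) are assumptions that may differ from Mirage's PolicyEvaluator.

```python
import re
from typing import Any, Callable
from urllib.parse import urlparse

# Hypothetical dispatch table: operator name -> pure check(actual, expected).
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda a, e: a == e,
    "neq": lambda a, e: a != e,
    "lte": lambda a, e: a is not None and a <= e,
    "gt": lambda a, e: a is not None and a > e,
    "in": lambda a, e: a in e,
    "not_in": lambda a, e: a not in e,
    # re.search: the pattern may match anywhere in the string (an assumption).
    "regex_match": lambda a, e: isinstance(a, str) and re.search(e, a) is not None,
    "not_regex_match": lambda a, e: not (isinstance(a, str) and re.search(e, a)),
    "starts_with": lambda a, e: isinstance(a, str) and a.startswith(e),
    "length_lte": lambda a, e: a is not None and len(a) <= e,
    # Compare only the hostname of a URL against the allowlist.
    "host_in": lambda a, e: isinstance(a, str) and urlparse(a).hostname in e,
}


def check(operator: str, actual: Any, expected: Any = None) -> bool:
    """Apply one operator; 'exists' needs no expected value."""
    if operator == "exists":
        return actual is not None
    fn = OPERATORS.get(operator)
    if fn is None:
        raise ValueError(f"unsupported operator: {operator!r}")
    return fn(actual, expected)
```

Because every entry is a pure function of its two inputs, the same payload always produces the same verdict, which is what makes a CI gate on these rules reliable.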

Framework integrations

Mirage's policy decisions are the same regardless of the framework an agent runs on. The integration adapters are thin glue that route framework-native tool calls through a Mirage gateway.

OpenAI Agents SDK: mirage.integrations.openai_agents.wrap_with_mirage wraps an Agent so every tool call is policy-checked first. See docs/INTEGRATIONS_OPENAI_AGENTS_SDK.md for the minimal example, the contrast to OpenAI's own model-graded guardrails, and the install command (pip install mirage-ci[openai-agents]).
LangChain: mirage.integrations.langchain.wrap_with_mirage wraps a LangChain AgentExecutor so every tool call is policy-checked first. Includes a configurable payload_mapper for matching custom policy field shapes. See docs/INTEGRATIONS_LANGCHAIN.md and pip install mirage-ci[langchain].
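The payload_mapper idea, translating a framework tool call into the method, path, and body a policy rule matches on, can be sketched like this. The real adapter's payload_mapper signature is documented in docs/INTEGRATIONS_LANGCHAIN.md; `map_tool_call`, `MappedAction`, the `TOOL_ROUTES` table, and the amount-to-bid_amount rename below are all hypothetical.

```python
from typing import Any, NamedTuple


class MappedAction(NamedTuple):
    method: str
    path: str
    json: dict[str, Any]


# Hypothetical routing table: tool name -> (HTTP method, policy path).
TOOL_ROUTES = {
    "submit_bid": ("POST", "/v1/submit_bid"),
    "lookup_supplier": ("GET", "/v1/suppliers"),
}


def map_tool_call(tool_name: str, tool_args: dict[str, Any]) -> MappedAction:
    """Translate a framework tool call into the shape a policy rule matches on."""
    method, path = TOOL_ROUTES[tool_name]  # unknown tools raise KeyError
    payload = dict(tool_args)  # copy so the caller's args are not mutated
    # Reshape: the tool calls the field "amount", the policy expects "bid_amount".
    if "amount" in payload:
        payload["bid_amount"] = payload.pop("amount")
    return MappedAction(method, path, payload)
```

Keeping the mapping outside the policy file lets one policies.yaml serve several frameworks whose tool signatures differ.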

Benchmarks

Run the benchmark suite with make bench. See BENCHMARKS.md for the methodology, the synthetic scenarios, and current scoring numbers (containment rate, false-positive rate, decision-latency p50/p95/p99). Benchmarks are reproducible from the benchmarks/ directory.
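For orientation, here is how the three headline metrics could be computed from labeled scenario results. The definitions are assumptions (containment rate as the share of risky actions that were caught, false-positive rate as the share of safe actions that were caught); BENCHMARKS.md is the authority on how Mirage actually defines them. `benchmark_metrics` and its record shape are hypothetical.

```python
from statistics import quantiles


def benchmark_metrics(results: list[dict]) -> dict[str, float]:
    """Compute containment rate, false-positive rate, and latency percentiles.

    Each result is a hypothetical record:
        {"risky": bool, "caught": bool, "latency_ms": float}
    At least two results are needed for the percentile calculation.
    """
    risky = [r for r in results if r["risky"]]
    safe = [r for r in results if not r["risky"]]
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    # "inclusive" interpolates within the observed range instead of extrapolating.
    cuts = quantiles([r["latency_ms"] for r in results], n=100, method="inclusive")
    return {
        # Conventions chosen here for empty groups: 1.0 and 0.0.
        "containment_rate": sum(r["caught"] for r in risky) / len(risky) if risky else 1.0,
        "false_positive_rate": sum(r["caught"] for r in safe) / len(safe) if safe else 0.0,
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
    }
```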

Start here

Want to understand the product quickly: read docs/README.md
Want the current alpha snapshot: read docs/releases/v0.1.0.md
Want to integrate Mirage into your own agent: read docs/FIRST_INTEGRATION.md
Want the framework-agnostic integration paths: read docs/INTEGRATION_PATTERNS.md
Want to wire Mirage into CI: read docs/CI_INTEGRATION.md
Want to try the bundled workflow first: read examples/procurement_harness/README.md
Want the straight licensing/commercial answer: read docs/OPEN_SOURCE_FAQ.md

What ships today (v0.2.0)

The deterministic policy runtime, both halves:

Gateway mode (mirage.gateway, mirage gateway)

same policies.yaml evaluated against real upstream traffic
passthrough mode: forward every request, log policy decisions, do not block (the right starting mode for a new deployment)
enforce mode: forward when policy passes, block with HTTP 403 when it fails
four outcomes per action: allowed, flagged, blocked, error
trace events carry a mode discriminator so dashboards can distinguish gateway runs from CI runs

CI mode (mirage.proxy, MirageSession, mirage gate-run)

declarative policy DSL (policies.yaml)
mocked responses (mocks.yaml) for deterministic CI runs
run-scoped trace store with structured policy decisions
mirage gate-run exits non-zero on regression; drop-in fail-build for any CI
four outcomes per action: allowed, policy_violation, unmatched_route, config_error

Shared substrate

PolicyEvaluator: pure, deterministic, mock-free; identical decisions in gateway and CI
containment-rate, decision-latency, and time-to-decide metrics surfaced in the console and the /api/runs/{run_id}/containment endpoint
review console over the trace store, both legacy HTML and a Next.js operator client
Python-first integration via MirageSession (CI) or any HTTP client (gateway); httpx-native, framework-agnostic
container-ready (Dockerfile + docker-compose)

Framework integrations

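OpenAI Agents SDK adapter (mirage.integrations.openai_agents); see docs/INTEGRATIONS_OPENAI_AGENTS_SDK.md for the minimal example and the model-graded versus rule-graded comparison
LangChain adapter (mirage.integrations.langchain) with a configurable payload_mapper; see docs/INTEGRATIONS_LANGCHAIN.md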

Benchmarks

reproducible benchmark harness with three synthetic scenarios (PII leak, prompt injection, cost runaway) that reports containment rate, false-positive rate, and decision-latency percentiles. See BENCHMARKS.md.

What ships next (v0.3 and beyond)

chaos-library testing harness: prove policies hold under hostile environments
adversarial benchmark scenarios with realistic false-positive surfaces
async-native gating in the LangChain adapter

The mission sentence is the contract: same policy file, production and CI, no LLM in the decision loop.

Gateway forwarding behavior

The gateway is a deliberate, in-line proxy. When you point mirage gateway at an upstream URL, every request the agent makes flows through it on the way to that upstream. A few load-bearing details, intentional but worth being explicit about:

Authorization, Cookie, and other application headers are forwarded unchanged. Upstream auth has to keep working through the gateway, so the gateway does not strip credentials. Pointing --upstream at an unintended host would forward those credentials to the wrong destination.
Mirage strips X-Mirage-* headers before forwarding (so internal trace metadata never leaks to the upstream) and strips standard hop-by-hop headers (Connection, Transfer-Encoding, Upgrade, Host, Content-Length, etc.) before forwarding so they do not interfere with the outbound request.
The trace is not scrubbed of secrets. Request and response bodies are stored verbatim in the trace store; if your payload contains secrets, you are responsible for either redacting in policy or not putting them in the body.
Operator responsibility. --upstream is the load-bearing config knob. The right deployment posture is: bind the gateway to a private network, scope the upstream URL to a single host you own, and start in passthrough mode before flipping to enforce.
Log-only first. passthrough mode is the recommended starting state. Run real traffic through it, watch the trace store fill with flagged events, tune your policies, then switch to enforce once the containment rate is at the floor you want.
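The header rules above amount to a small filter. This sketch assumes the conventional hop-by-hop set from RFC 2616 section 13.5.1 plus the Host and Content-Length headers named above; Mirage's exact list (the "etc.") is not spelled out, so treat the set, and the `forwardable_headers` name, as illustrative.

```python
from collections.abc import Mapping

# Conventional hop-by-hop headers (RFC 2616 section 13.5.1) plus Host and
# Content-Length, which the outbound client recomputes for the new request.
# This exact set is an assumption, not Mirage's documented list.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "trailers", "transfer-encoding", "upgrade",
    "host", "content-length",
}


def forwardable_headers(incoming: Mapping[str, str]) -> dict[str, str]:
    """Return the request headers a gateway like this would send upstream."""
    out: dict[str, str] = {}
    for name, value in incoming.items():
        lower = name.lower()
        if lower.startswith("x-mirage-"):
            continue  # internal trace metadata never reaches the upstream
        if lower in HOP_BY_HOP:
            continue  # connection-scoped; not meaningful on the next hop
        out[name] = value  # Authorization, Cookie, etc. pass through unchanged
    return out
```

Note what the filter does not do: it never touches Authorization or Cookie, which is exactly why a mis-pointed --upstream leaks credentials.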

Quickstart

Requires Python 3.11+.

pip install mirage-ci

The package installs as mirage-ci on PyPI and imports as mirage:

from mirage import MirageSession

It also exposes a mirage console script. If that script is not on your PATH, use python -m mirage.cli ... directly.

For a development checkout (editable install from source), see Contributing.

Integrate your own agent

The canonical Mirage integration is MirageSession. One run ID, an httpx client surface the agent uses directly, one assertion point for CI.

from mirage import MirageSession

with MirageSession(run_id="demo-run") as mirage:
    response = mirage.post(
        "/v1/submit_bid",
        json={"contract_id": "STANDARD-7", "bid_amount": 7500},
    )
    summary = mirage.assert_clean()
    print(summary.trace_path)

For the full 30-minute walkthrough of pointing Mirage at your own agent, see docs/FIRST_INTEGRATION.md. For CI gating recipes (pytest and GitHub Actions), see docs/CI_INTEGRATION.md.

Try the bundled procurement harness

If you want to see Mirage working on a realistic pre-built workflow before integrating your own agent:

make proxy-procurement

In a second terminal:

make procurement-demo-safe
make test-procurement

Run with Docker:

docker compose up --build

That Docker path starts the Mirage proxy with the procurement harness config on http://localhost:8000.

MirageSession

MirageSession is the recommended path for:

local developer runs
pytest integration tests
CI gates on risky actions

For agent code that already expects a client-like object:

from examples.procurement_harness.agent import ProcurementAgent
from mirage import MirageSession

with MirageSession(run_id="procurement-safe") as mirage:
    agent = ProcurementAgent(mirage)
    result = agent.run_compliant_bid_workflow()
    summary = mirage.assert_clean()
    print(result.action.mirage.outcome)
    print(summary.to_text())

Alternative: per-response primitives

If you want per-response access instead of a run-level session, the lower-level httpx primitives remain available:

from mirage.httpx_client import (
    assert_mirage_response_safe,
    create_mirage_client,
    mirage_response_report,
)

with create_mirage_client(run_id="demo-run") as client:
    response = client.post(
        "/v1/submit_bid",
        json={"contract_id": "STANDARD-7", "bid_amount": 7500},
    )
    report = mirage_response_report(response)
    assert_mirage_response_safe(response)
    print(report.trace_path)

Mirage adds response metadata headers so tests and agents can inspect what happened without changing the mocked response body:

X-Mirage-Run-Id
X-Mirage-Outcome
X-Mirage-Policy-Passed
X-Mirage-Trace-Path
X-Mirage-Matched-Mock
X-Mirage-Message
X-Mirage-Decision-Summary
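To show how a test might read these headers without the helper, here is a sketch over a plain header mapping. The assumption that X-Mirage-Policy-Passed is serialized as "true" or "false" is unverified, and assert_mirage_response_safe remains the supported path; `read_mirage_headers`, `assert_safe`, and `UnsafeActionError` are hypothetical.

```python
from collections.abc import Mapping


class UnsafeActionError(AssertionError):
    """Raised when Mirage metadata marks a response as unsafe."""


def read_mirage_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Collect X-Mirage-* headers, keyed by the lowercased suffix."""
    prefix = "x-mirage-"
    return {
        name.lower()[len(prefix):]: value
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }


def assert_safe(headers: Mapping[str, str]) -> None:
    meta = read_mirage_headers(headers)
    # Assumed serialization: "true"/"false". A missing header fails closed.
    if meta.get("policy-passed", "").lower() != "true":
        raise UnsafeActionError(f"{meta.get('outcome', 'unknown')}: {meta.get('message', '')}")
```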

CI gating

Mirage now has a run-level CLI for CI or shell workflows:

make mirage-summary RUN_ID=procurement-risky-demo
make mirage-gate RUN_ID=procurement-risky-demo

Equivalent direct commands:

python -m mirage.cli summarize-run --run-id procurement-risky-demo
python -m mirage.cli gate-run --run-id procurement-risky-demo
python -m mirage.cli validate-config

gate-run exits non-zero when the run is risky or missing, so it can fail CI directly. validate-config exits non-zero when Mirage config is missing or malformed, so you can fail fast before starting the proxy.
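Because both commands signal through exit codes, a CI script can chain them and stop at the first failure. A minimal sketch using subprocess; `run_ci_steps` and `MIRAGE_STEPS` are hypothetical, and the step list reuses the python -m mirage.cli commands shown above.

```python
import subprocess
import sys


def run_ci_steps(steps: list[list[str]]) -> int:
    """Run commands in order; return the first non-zero exit code, else 0."""
    for cmd in steps:
        returncode = subprocess.run(cmd, check=False).returncode
        if returncode != 0:
            return returncode  # fail fast: later steps never run
    return 0


# Example CI sequence using the commands shown above (run ID is illustrative).
MIRAGE_STEPS = [
    [sys.executable, "-m", "mirage.cli", "validate-config"],
    # ...run the agent tests here so the trace for the run ID exists...
    [sys.executable, "-m", "mirage.cli", "gate-run", "--run-id", "procurement-risky-demo"],
]
```

In a CI job you would call sys.exit(run_ci_steps(MIRAGE_STEPS)) once the agent tests have written their trace.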

For complete GitHub Actions and pytest recipes, see docs/CI_INTEGRATION.md.

If your agent does not already use `httpx`

Mirage does not require your whole stack to be built directly on httpx. It only needs the outbound action path to cross a client boundary you control.

If your SDK or framework lets you inject a base URL, transport, or HTTP client, point that boundary at Mirage.
If your orchestration layer hides HTTP completely, wrap the side-effecting calls in your own gateway and test that gateway with Mirage.
If you only need a starting point, intercept writes first: bids, orders, ticket creation, CRM updates, or billing actions.

See docs/INTEGRATION_PATTERNS.md for the concrete patterns.

Config

The primary onboarding config now lives in:

When you run Mirage from a repo checkout, local mocks.yaml and policies.yaml remain the default fallback config. Installed Mirage also ships bundled example defaults, so the CLI and proxy still boot outside the source tree.

Example policy:

policies:
  - name: enforce_bid_limit
    method: POST
    path: /v1/submit_bid
    field: bid_amount
    operator: lte
    value: 10000
    message: Agents cannot submit bids above the approved threshold.

Optional environment variables:

MIRAGE_PROXY_URL
MIRAGE_RUN_ID
MIRAGE_MOCKS_PATH
MIRAGE_POLICIES_PATH
MIRAGE_ARTIFACT_ROOT

Validate config before a local run or CI job:

make mirage-validate-config
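For a sense of what a config check catches, here is a hypothetical validator for the policies section. The required keys are inferred from the single example policy above and the operator names come from the operator list; the real validate-config may check more or less, and policies such as host allowlists may use a different shape.

```python
# Inferred from the example policy; an assumption, not Mirage's schema.
REQUIRED_KEYS = {"name", "method", "path", "field", "operator", "message"}
KNOWN_OPERATORS = {
    "exists", "eq", "neq", "lt", "lte", "gt", "gte", "in", "not_in",
    "regex_match", "not_regex_match", "contains", "not_contains",
    "starts_with", "not_starts_with", "ends_with", "length_lte",
    "length_gte", "host_in", "host_not_in",
}


def validate_policies(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config looks valid."""
    policies = config.get("policies")
    if not isinstance(policies, list):
        return ["top-level 'policies' must be a list"]
    problems: list[str] = []
    for i, rule in enumerate(policies):
        if not isinstance(rule, dict):
            problems.append(f"#{i}: rule must be a mapping")
            continue
        label = rule.get("name", f"#{i}")
        missing = REQUIRED_KEYS - set(rule)
        if missing:
            problems.append(f"{label}: missing {sorted(missing)}")
        op = rule.get("operator")
        if op is not None and op not in KNOWN_OPERATORS:
            problems.append(f"{label}: unknown operator {op!r}")
        if op not in (None, "exists") and "value" not in rule:
            problems.append(f"{label}: operator {op!r} needs a 'value'")
    return problems
```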

Procurement harness

The default onboarding path now lives in examples/procurement_harness/.

It gives one coherent workflow instead of isolated request demos:

look up an approved supplier
submit a compliant or risky bid
inspect Mirage outcomes and trace paths

Primary commands:

make proxy-procurement
make procurement-demo-safe
make procurement-demo-risky
make procurement-demo-unmatched
make test-procurement

Harness docs:

examples/procurement_harness/README.md

Action review console

Mirage currently ships two console surfaces over the same review backend:

demo_ui/: the shared FastAPI console API plus a zero-dependency legacy HTML shell
ui/: a richer Next.js operator client that consumes that API

Both read Mirage trace artifacts, show aggregate action metrics, surface recent risky runs, and let you drill into one run at a time.

The shared backend still supports the scenario launcher for founder demos, but the primary value of the console is now:

aggregate action counts across runs
review queue for recent runs that need attention
top endpoints by action volume
top policy failures
containment rate (fleet-wide and per-run)
overview-first run detail with request, outcome, policy reasoning, and trace
per-run graph view for decision flow review

Start it with:

make demo-ui

Then open http://127.0.0.1:5100. Override the port with PORT=5101 make demo-ui if needed.

For the Next.js client:

make ui-install
make ui-dev-local

Then open http://127.0.0.1:3000.

For live demos, use the terminal-first script in docs/live-demo-script.md.

Example scenarios

This repo ships the policy library, the procurement harness, and three standalone example flows:

examples/policies/: five real-world example policies (PII redaction, prompt injection, outbound allowlist, cost guard, output length cap)
examples/procurement_harness/: realistic private-alpha procurement harness
examples/safe_agent.py: safe request passes policy checks
examples/rogue_agent.py: unsafe request is flagged while control flow continues
examples/unmatched_route.py: unmatched route fails clearly

Worklog

Create a new implementation review entry with:

make worklog TITLE="Short Task Title"

The template and index live in docs/worklog/.

Repo structure

examples/policies/: real-world example policies
examples/procurement_harness/: primary private-alpha onboarding harness
benchmarks/: reproducible benchmark harness
demo_ui/: shared console API plus legacy HTML review shell
ui/: Next.js operator client over the demo_ui API
mirage/engine.py: policy evaluation, outcomes, and trace writes
mirage/proxy.py: FastAPI CI-mode boundary and Mirage response headers
mirage/gateway.py: runtime gateway against real upstreams
mirage/integrations/: framework integration adapters
mirage/httpx_client.py: Python httpx helper and response assertions
tests/: engine, proxy, gateway, integration, and httpx helper coverage
docs/worklog/: per-task review log for agentic development

Supporting docs

docs/README.md: docs hub for the main Mirage paths
docs/FIRST_INTEGRATION.md: 30-minute walkthrough for integrating your own httpx agent
docs/CI_INTEGRATION.md: pytest and GitHub Actions gating recipes
docs/INTEGRATIONS_OPENAI_AGENTS_SDK.md: OpenAI Agents SDK adapter usage
docs/INTEGRATIONS_LANGCHAIN.md: LangChain adapter usage
docs/INTEGRATIONS_DATABASE.md: SQLAlchemy event-hook pattern for DB-write policy enforcement
docs/OPEN_SOURCE_FAQ.md: practical guidance on MIT licensing and commercial use
examples/procurement_harness/README.md: bundled end-to-end example workflow
examples/policies/README.md: real-world example policies and how to load them
BENCHMARKS.md: benchmark methodology and current numbers
ui/README.md: how the Next.js client consumes the shared console API

Contributing

Bug reports and pull requests are welcome. See CONTRIBUTING.md for the local dev loop and expectations, CODE_OF_CONDUCT.md for community standards, and SECURITY.md for private vulnerability reporting.

Source install

For a development checkout:

git clone https://github.com/ysham123/Mirage
cd Mirage
pip install setuptools wheel
pip install -e '.[dev]'

Or, with the bundled Makefile:

make install

The editable install exposes the mirage console script and the mirage Python package directly from your checkout.

License

Mirage is released under the MIT License.

Project details

Release history

0.2.0 (this version): May 2, 2026
0.1.3: Apr 27, 2026
0.1.1: Apr 23, 2026

Download files

Source Distribution

mirage_ci-0.2.0.tar.gz (106.6 kB)

Uploaded May 2, 2026 Source

Built Distribution

mirage_ci-0.2.0-py3-none-any.whl (85.9 kB)

Uploaded May 2, 2026 Python 3

File details

Details for the file mirage_ci-0.2.0.tar.gz.

File metadata

Download URL: mirage_ci-0.2.0.tar.gz
Upload date: May 2, 2026
Size: 106.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for mirage_ci-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`5821a661ce9786e70334c986a66271876b2f32fd89d7c14e2cda9417b1d7b5f0`
MD5	`ae69cbdade5962ba4085ffc5e6a5aec4`
BLAKE2b-256	`0823093c06e7ccd868127625aaf5ff3b31c39c4c1b537592cdd1009e645dfe24`

File details

Details for the file mirage_ci-0.2.0-py3-none-any.whl.

File metadata

Download URL: mirage_ci-0.2.0-py3-none-any.whl
Upload date: May 2, 2026
Size: 85.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for mirage_ci-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`01fd65ba442968e7b41f6821da0e3277f42c6d13eb28e9ba7e0e30d86d1e10ac`
MD5	`7ee796fa4527138afd6bf502779e7bfc`
BLAKE2b-256	`936d18136548d244f3baa7c46019a22de5821bea6c8abe5fc97b8e3da43d24ed`
