Open-source AI agent guardrails linter — Claude Code, Codex CLI, OpenClaw, MCP
xrails
Open-source AI agent guardrails linter. Find dangerous permissions, unsafe agent configs, and cross-file attack paths before your AI agent runs.
xrails reads the configuration that drives Claude Code, Codex CLI, OpenClaw,
and MCP servers, normalizes it into a typed fact graph, evaluates 34
deterministic rules, and reports findings — including composite attack paths
where individual settings look fine but combine into something dangerous.
pipx install xrails # or: pip install xrails
xrails scan .
xrails scan — /repo
Profiles: claude-code · Grade: F (28/100) · Attack paths: 1
─────────────────────────────────────────────────────────────
[CRITICAL] XR-AP-EXFIL-001 High-confidence secret exfiltration path
Secret-like file detected
→ Agent can read workspace files
→ Network egress enabled (WebFetch unscoped)
→ Approval prompts disabled (bypassPermissions)
→ Potential data exfiltration
[HIGH] XR-CLAUDE-001 bypassPermissions mode enabled
.claude/settings.json:2
Why: skips approvals for nearly every tool action.
Fix: set defaultMode to "default" or run inside an isolated container.
…
2 critical · 4 high · 1 medium · template-example: 0
- Deterministic. Same input → same findings. No LLM in the pass/fail loop.
- Facts-first. Rules query a typed fact graph, never raw config strings.
- Multi-vendor. Claude Code, Codex CLI, OpenClaw, MCP — one CLI.
- CI-ready. Stable JSON, SARIF 2.1.0, GitHub Action, suppressions, baselines.
- Attack paths, not just lints. Source → capability → sink correlation finds the misconfigurations that only matter together.
Table of contents
- Why this exists
- Quick start
- Example output
- What xrails detects
- Supported platforms
- Why facts, not regex
- Attack-path example
- GitHub Action
- LLM explain mode (optional)
- Suppressions and baselines
- Authoring rules
- How is this different from AgentShield?
- Roadmap
- Responsible disclosure
- License
Why this exists
AI coding agents ship with broad permissions by default. A single setting —
bypassPermissions, Bash(*), danger-full-access, an unpinned
/var/run/docker.sock mount — can let prompt injection from a web page or a
README turn into shell execution, secret exfiltration, or destructive git
operations.
The risk usually isn't one bad setting. It's the combination: a secret file plus filesystem read plus an outbound channel plus a weak approval policy. Each looks fine alone. Together they're a complete exfiltration path.
xrails is a static analysis tool that finds both atomic misconfigurations
and these composite paths, before the agent runs.
Quick start
# Install (pipx is preferred — keeps xrails out of your project venv)
pipx install xrails
# or:
pip install xrails
# Scan the current directory
xrails scan .
# Or aim it at a specific config dir
xrails scan ./examples/vulnerable/claude_code
# JSON for tooling, SARIF for GitHub code scanning
xrails scan . --format json --output xrails.json
xrails scan . --format sarif --output xrails.sarif
Try the demo configs that ship with the repo:
git clone https://github.com/guardrailsai/xrails
cd xrails
xrails scan examples/vulnerable/claude_code # expect findings
xrails scan examples/secure/claude_code # expect clean
xrails scan examples/vulnerable/attack-path-exfil # expect XR-AP-EXFIL-001
Example output
A vulnerable Claude Code config:
xrails scan — examples/vulnerable/claude_code
Profiles: claude-code · Grade: F (10/100) · Attack paths: 1
[HIGH] XR-AP-EXFIL-001 High-confidence secret exfiltration path
[HIGH] XR-CLAUDE-001 bypassPermissions mode enabled
[HIGH] XR-CLAUDE-002 Bash(*) allows unrestricted shell execution
[HIGH] XR-CLAUDE-003 WebFetch allowed without domain scoping
[MEDIUM] XR-CLAUDE-009 No deny rules for .env or credential files
5 findings (0 critical, 4 high, 1 medium, 0 low) · active-runtime: 5
Same scan as JSON:
xrails scan . --format json | jq '.score, .findings[].rule_id'
{ "numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4, ... }
"XR-AP-EXFIL-001"
"XR-CLAUDE-001"
…
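For tooling that does not go through jq, the same report can be summarized in a few lines of Python. This is a minimal sketch that relies only on the fields visible in the jq output above (`score.numeric`, `score.grade`, `findings[].rule_id`); the function name `summarize_report` and the sample report are illustrative and not part of xrails.

```python
import json
from collections import Counter


def summarize_report(report: dict) -> dict:
    """Summarize a parsed xrails JSON report.

    Assumes a top-level "score" object and a "findings" list whose items
    carry a "rule_id", as in the jq example above.
    """
    score = report.get("score", {})
    rule_counts = Counter(f["rule_id"] for f in report.get("findings", []))
    return {
        "grade": score.get("grade"),
        "numeric": score.get("numeric"),
        "findings_per_rule": dict(rule_counts),
    }


# Hypothetical report fragment, shaped like the jq output above.
sample = json.loads("""
{
  "score": {"numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4},
  "findings": [
    {"rule_id": "XR-AP-EXFIL-001"},
    {"rule_id": "XR-CLAUDE-001"}
  ]
}
""")
summary = summarize_report(sample)
print(summary)
```

In practice the dict would come from `json.load()` on the file written by `--output xrails.json`.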
What xrails detects
34 rules across five profiles. Severity ladder: critical / high / medium / low / info.
Claude Code (10 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-CLAUDE-001 | HIGH | bypassPermissions mode enabled |
| XR-CLAUDE-002 | HIGH | Bash(*) wildcard with no deny for destructive commands |
| XR-CLAUDE-003 | HIGH | WebFetch without domain scoping |
| XR-CLAUDE-004 | MEDIUM | additionalDirectories includes home directory |
| XR-CLAUDE-005 | MEDIUM | CLAUDE.md instructs the agent to skip permissions |
| XR-CLAUDE-006 | HIGH | Command hooks run with full user permissions |
| XR-CLAUDE-007 | HIGH | HTTP hooks send to remote URLs |
| XR-CLAUDE-008 | MEDIUM | allowedHttpHookUrls unset while hooks enabled |
| XR-CLAUDE-009 | MEDIUM | No deny rules for .env or credential files |
| XR-CLAUDE-010 | MEDIUM | auto mode without managed org constraints |
Codex (8 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-CODEX-001 | HIGH | sandbox_mode = "danger-full-access" |
| XR-CODEX-002 | HIGH | approval_policy = "never" in writable sandbox |
| XR-CODEX-003 | HIGH | Network access enabled in workspace-write sandbox |
| XR-CODEX-004 | MEDIUM | Project config present but possibly untrusted |
| XR-CODEX-005 | HIGH | MCP servers without enterprise allowlist |
| XR-CODEX-006 | HIGH | MCP tool approval overrides are permissive |
| XR-CODEX-007 | MEDIUM | Shell environment policy too permissive |
| XR-CODEX-008 | MEDIUM | otel.log_user_prompt = true |
OpenClaw (7 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-OPENCLAW-001 | HIGH | Sandboxing disabled — tools run on host |
| XR-OPENCLAW-002 | HIGH | Workspace treated as boundary but sandbox off |
| XR-OPENCLAW-003 | HIGH | Docker bind mount without :ro |
| XR-OPENCLAW-004 | CRITICAL | /var/run/docker.sock bind mount |
| XR-OPENCLAW-005 | MEDIUM | Broad tool groups in allow list |
| XR-OPENCLAW-006 | MEDIUM | sandbox.mode = non-main |
| XR-OPENCLAW-007 | MEDIUM | .env with secrets in agent workspace |
MCP (2 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-MCP-001 | HIGH | MCP server + weak approvals + network egress |
| XR-MCP-002 | MEDIUM | Remote MCP transport without identity pinning |
Composite attack paths (7 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-AP-EXFIL-001 | HIGH | Secret + read + egress + bypass = exfiltration path |
| XR-AP-DESTRUCT-001 | HIGH | Shell + bypass = destructive git ops |
| XR-AP-DRIFT-001 | MEDIUM | Effective policy ≠ intended config |
| XR-AP-HOOK-EXFIL-001 | HIGH | Hooks + unrestricted URLs + secrets |
| XR-AP-MCP-SUPPLYCHAIN-001 | HIGH | MCP auto-install without version pinning |
| XR-AP-ADDDIR-BYPASS-001 | HIGH | bypassPermissions + home dir in additionalDirectories |
| XR-AP-SANDBOX-DRIFT-001 | HIGH | Sandbox configured but not effective |
xrails rules list browses every installed rule with profile and severity
filters; xrails rules validate checks the YAML catalog against the schema.
Supported platforms
| Agent | Files xrails reads | Notes |
|---|---|---|
| Claude Code | .claude/settings.json, .claude/settings.local.json, .mcp.json, CLAUDE.md, .claude/CLAUDE.md | Permission rules, hooks, MCP servers, instruction files |
| Codex CLI | .codex/config.toml | Sandbox modes, approvals, MCP, telemetry |
| OpenClaw | openclaw.json | Sandbox, Docker binds, tool policy |
| MCP | Servers declared inside any of the above | Remote/local transport, approval overrides |
| All | .env, .env.* | Secret-like keys for attack-path correlation |
Findings inside examples/, fixtures/, docs/, samples/, etc. are
classified as template-example and scored at a discount, so a repo that
ships demo configs won't produce misleading grades.
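The path-based classification described above could look roughly like the sketch below. It is an assumption about the approach, not the xrails implementation: the directory set only covers the four names listed in the text (the real list is longer), `classify_path` is a hypothetical helper, paths are assumed to be relative to the scan root, and the score discount itself is not modeled.

```python
from pathlib import PurePosixPath

# Directory names taken from the text above; the real list is longer ("etc.").
TEMPLATE_DIRS = {"examples", "fixtures", "docs", "samples"}


def classify_path(path: str) -> str:
    """Return "template-example" if any parent directory is a demo-style dir."""
    parents = PurePosixPath(path).parts[:-1]  # ignore the file name itself
    if any(part in TEMPLATE_DIRS for part in parents):
        return "template-example"
    return "active-runtime"


print(classify_path("examples/vulnerable/claude_code/.claude/settings.json"))
print(classify_path(".claude/settings.json"))
```

Matching on whole path components rather than substrings keeps a file such as `docs.json` in the repository root from being treated as documentation.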
Why facts, not regex
Many "AI security" linters run regular expressions over raw config files. That approach is brittle: JSON keys reorder, TOML tables nest, and YAML aliases hide values. Worse, a regex over a single file cannot see combinations of settings.
xrails parses each config file with a real parser (tomllib, json,
PyYAML), normalizes the result into a typed FactGraph, and evaluates rules
against facts. Two consequences:
- Composite findings are first-class. A secret-bearing .env, a filesystem capability, a network capability, and a weak approval policy are four independent facts. The XR-AP-EXFIL rule fires only when all four coexist.
- Rules become portable. New parsers add new facts; rules that query network.enabled == true or approval.policy == never work across vendors without rewriting.
The rule format is YAML — most rules are 20 lines and require no Python. See docs/rule-authoring.md.
Attack-path example
examples/vulnerable/attack-path-exfil/ ships a demo with four conditions
that are individually plausible but together unsafe:
Source → Capability → Capability → Sink
[.env with API keys]
↓
[Read access (no .env deny)]
↓
[WebFetch unscoped → outbound HTTP]
↓
[bypassPermissions disables prompts]
↓
[Network exfiltration sink]
xrails scan examples/vulnerable/attack-path-exfil
# → XR-AP-EXFIL-001 fires + the underlying atomic findings.
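The conjunction behind this path can be sketched as a check that returns an evidence chain only when every link is present. The fact names below (`secret.present`, `filesystem.read`, `approval.bypassed`) are hypothetical stand-ins for whatever xrails records internally; only the four conditions and their labels come from the demo above.

```python
from typing import Optional

# The four conditions from the demo, keyed by hypothetical fact names.
REQUIRED = [
    ("secret.present", "Secret-like file detected"),
    ("filesystem.read", "Agent can read workspace files"),
    ("network.enabled", "Network egress enabled"),
    ("approval.bypassed", "Approval prompts disabled"),
]


def exfil_path(facts: dict) -> Optional[list]:
    """Return the evidence chain if every condition holds, else None."""
    if all(facts.get(name) is True for name, _ in REQUIRED):
        return [label for _, label in REQUIRED] + ["Potential data exfiltration"]
    return None


vulnerable = {name: True for name, _ in REQUIRED}
print(exfil_path(vulnerable))

# Removing a single link breaks the path, even though three risky facts remain.
scoped = dict(vulnerable, **{"network.enabled": False})
print(exfil_path(scoped))
```

This is why fixing any one of the atomic findings is enough to clear the composite finding, while the remaining atomic findings would still be reported on their own.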
GitHub Action
name: xrails
on: [pull_request, push]
jobs:
  xrails:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install xrails
      - run: xrails scan . --format sarif --output xrails.sarif --fail-on high
      - if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: xrails.sarif
          category: xrails
Findings appear in the Security tab of the repository. Full guide and reusable action recipe: docs/github-action.md.
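The workflow above passes `--fail-on high`. One plausible reading, sketched below, is that the scan fails when any finding sits at or above that level on the severity ladder listed under "What xrails detects". The exact semantics of the flag are an assumption here, and `should_fail` is a hypothetical helper, not an xrails API.

```python
# Severity ladder as listed under "What xrails detects", most severe first.
LADDER = ["critical", "high", "medium", "low", "info"]


def should_fail(severities: list, fail_on: str) -> bool:
    """True if any finding is at or above the fail_on threshold (assumed semantics)."""
    threshold = LADDER.index(fail_on)
    return any(LADDER.index(s.lower()) <= threshold for s in severities)


print(should_fail(["HIGH", "MEDIUM"], "high"))
print(should_fail(["medium", "low"], "high"))
```

Because the SARIF upload step runs with `if: always()`, findings still reach the Security tab when the scan step fails the job.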
LLM explain mode (optional)
xrails never uses an LLM to decide whether a finding is real. The optional
explain command rewrites a single finding's evidence into prose:
xrails scan . --format json --output report.json
xrails explain --report report.json --id XR-CLAUDE-001 --llm-provider echo
echo is the default and runs entirely offline. openai and anthropic
providers are loaded lazily and only used when explicitly requested. All
evidence is run through a redactor that strips secret-like keys and long
opaque tokens before any prompt is assembled.
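A redaction pass of the kind described above might look like the following sketch. Both regular expressions and the 32-character threshold are illustrative guesses rather than the rules xrails actually applies, and `redact` is a hypothetical name.

```python
import re

# Both patterns are illustrative guesses, not xrails' actual redaction rules.
SECRET_KEY = re.compile(r"(secret|token|password|api[_-]?key)", re.IGNORECASE)
OPAQUE_TOKEN = re.compile(r"\b[A-Za-z0-9_\-]{32,}\b")


def redact(evidence: dict) -> dict:
    """Return a copy of the evidence that is safer to place in a prompt."""
    clean = {}
    for key, value in evidence.items():
        if SECRET_KEY.search(key):
            # The key name itself looks secret-like: drop the whole value.
            clean[key] = "[REDACTED]"
        elif isinstance(value, str):
            # Otherwise mask any long opaque run inside the value.
            clean[key] = OPAQUE_TOKEN.sub("[REDACTED]", value)
        else:
            clean[key] = value
    return clean


evidence = {"API_KEY": "abc", "line": "token=" + "a" * 40, "path": ".env"}
print(redact(evidence))
```

Redacting before the prompt is assembled means the provider choice does not affect what leaves the machine.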
Suppressions and baselines
Adopt xrails on a legacy repo without being blocked by pre-existing findings:
xrails scan . --format json --output .xrails-baseline.json
xrails scan . --baseline .xrails-baseline.json # only NEW findings break the build
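Conceptually, a baseline comparison reduces to a set difference over finding identities. The sketch below assumes a finding can be identified by its `rule_id` plus a `path` field; the `path` field name and the fingerprint shape are assumptions, and the real comparison may use more than these two values.

```python
def fingerprint(finding: dict) -> tuple:
    """Identify a finding by rule and file; the real key may include more."""
    return (finding["rule_id"], finding.get("path"))


def new_findings(current: list, baseline: list) -> list:
    """Return findings in current that are absent from the baseline."""
    known = {fingerprint(f) for f in baseline}
    return [f for f in current if fingerprint(f) not in known]


baseline = [{"rule_id": "XR-CLAUDE-001", "path": ".claude/settings.json"}]
current = baseline + [{"rule_id": "XR-CLAUDE-003", "path": ".claude/settings.json"}]
print(new_findings(current, baseline))
```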
Suppress one specific finding for one specific path with audit metadata in
.xrails-suppressions.yml:
suppressions:
  - rule_id: XR-CLAUDE-009
    path: examples/**
    reason: demo fixture for documentation
    expires: 2026-12-31
Suppressions carry a reason and an expiry. Expired suppressions stop applying automatically — no silently-stale ignore lists.
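The expiry behaviour can be sketched as follows. This assumes glob-style path matching via `fnmatch` and treats a suppression as still valid on its expiry date; both are assumptions, and `is_suppressed` is a hypothetical helper that works on already-parsed entries.

```python
from datetime import date
from fnmatch import fnmatch


def is_suppressed(finding: dict, suppressions: list, today: date) -> bool:
    """True if a matching, unexpired suppression covers the finding."""
    for s in suppressions:
        if s["rule_id"] != finding["rule_id"]:
            continue
        if not fnmatch(finding["path"], s["path"]):
            continue
        # str() accepts both a date object (as a YAML loader may produce)
        # and a plain "YYYY-MM-DD" string.
        if date.fromisoformat(str(s["expires"])) < today:
            continue  # expired suppressions no longer apply
        return True
    return False


suppressions = [{
    "rule_id": "XR-CLAUDE-009",
    "path": "examples/**",
    "reason": "demo fixture for documentation",
    "expires": "2026-12-31",
}]
finding = {"rule_id": "XR-CLAUDE-009", "path": "examples/vulnerable/claude_code/.claude/settings.json"}
print(is_suppressed(finding, suppressions, date(2026, 6, 1)))
print(is_suppressed(finding, suppressions, date(2027, 1, 1)))
```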
Authoring rules
A rule is a YAML file:
schema_version: "2.0"
id: XR-CODEX-001
title: sandbox_mode = danger-full-access
category: excessive_privilege
severity: high
confidence: high
profiles: [codex]
facts:
  all:
    - fact: sandbox.mode
      op: eq
      value: danger-full-access
remediation:
  steps:
    - Use sandbox_mode = "workspace-write" with approval_policy = "on-request".
references:
  - title: Codex — Agent approvals & security
    url: https://developers.openai.com/codex/agent-approvals-security
cwe: [CWE-269 Improper Privilege Management]
Drop it into src/xrails/rules/<vendor>/, add trigger/safe fixtures under
tests/fixtures/<vendor>/, run xrails rules validate and xrails rules test. No Python required for atomic rules.
Detailed walkthrough including operators (eq, contains, regex,
matches_any, present, absent, gt/gte/lt/lte, in/not_in) and the
fact catalog: docs/rule-authoring.md.
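As a rough mental model of how a `facts.all` clause is evaluated, the sketch below applies a small operator table to a flat fact mapping. It covers only a subset of the operators named above, and their exact semantics in xrails are assumptions; the rule is written as an already-parsed dict so the example needs no YAML library.

```python
import re

# A subset of the operators named above; the semantics here are assumptions.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: actual is not None and expected in actual,
    "regex": lambda actual, expected: isinstance(actual, str)
    and re.search(expected, actual) is not None,
    "in": lambda actual, expected: actual in expected,
    "present": lambda actual, _: actual is not None,
    "absent": lambda actual, _: actual is None,
}


def rule_matches(rule: dict, facts: dict) -> bool:
    """Evaluate the rule's facts.all clause: every condition must hold."""
    for cond in rule["facts"]["all"]:
        check = OPERATORS[cond["op"]]
        if not check(facts.get(cond["fact"]), cond.get("value")):
            return False
    return True


# The XR-CODEX-001 rule from the example above, already parsed into a dict.
rule = {
    "id": "XR-CODEX-001",
    "facts": {"all": [{"fact": "sandbox.mode", "op": "eq", "value": "danger-full-access"}]},
}
print(rule_matches(rule, {"sandbox.mode": "danger-full-access"}))
print(rule_matches(rule, {"sandbox.mode": "workspace-write"}))
```

Keeping operators in a lookup table is one way to let atomic rules stay declarative: a new operator is one more table entry rather than new rule code.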
How is this different from AgentShield?
AgentShield is the closest existing tool and it's good — we share the underlying motivation that AI agent misconfigurations need real auditing. The differences are scope and architecture:
| | AgentShield | xrails |
|---|---|---|
| Primary scope | Claude Code | Claude Code, Codex CLI, OpenClaw, MCP |
| Engine model | Pattern + scoring | Typed fact graph + DSL evaluator |
| Composite findings | Implicit via scoring | First-class attack-path rules over a correlation graph |
| Rule format | Python | YAML DSL (Python only for graph rules) |
| LLM role | None | Optional explain layer, opt-in |
| CI artifacts | JSON / SARIF | JSON, SARIF, GitHub Action, baseline, suppressions |
We respect AgentShield's prior art: xrails' runtimeConfidence and discovery-skip patterns are clearly inspired by it.
Roadmap
- v0.2 (current): multi-vendor parity, schema v2, correlation engine, baseline + suppressions, GitHub Action, demo + benchmark scaffolding.
- v0.3: additional vendors (Cline, Cursor agents, Aider), expanded MCP identity rules, OPA/Conftest export for org-level policy packs.
- v0.4: VS Code extension surfacing findings inline; benchmark report pipeline; rule packs (xrails-rules-enterprise).
Responsible disclosure
Security issues: see SECURITY.md. Public benchmark scans follow the rules in benchmarks/README.md — no raw secrets, no live exploit chains, no naming projects with critical live exposures before private outreach.
License
xrails is part of GuardrailsAI.