Open-source AI agent guardrails linter — Claude Code (incl. Skills), Codex CLI, OpenClaw, MCP
xrails
Open-source AI agent guardrails linter. Find dangerous permissions, unsafe agent configs, risky skills, and cross-file attack paths before your AI agent runs.
xrails reads the configuration that drives Claude Code (including
Skills), Codex CLI, OpenClaw, and MCP servers,
normalizes it into a typed fact graph, evaluates 40 deterministic rules,
and reports findings — including composite attack paths where individual
settings look fine but combine into something dangerous.
```bash
pipx install xrails   # or: pip install xrails
xrails scan .
```

```text
xrails scan — /repo
Profiles: claude-code · Grade: F (28/100) · Attack paths: 1
─────────────────────────────────────────────────────────────
[CRITICAL] XR-AP-EXFIL-001  High-confidence secret exfiltration path
  Secret-like file detected
    → Agent can read workspace files
    → Network egress enabled (WebFetch unscoped)
    → Approval prompts disabled (bypassPermissions)
    → Potential data exfiltration

[HIGH] XR-CLAUDE-001  bypassPermissions mode enabled
  .claude/settings.json:2
  Why: skips approvals for nearly every tool action.
  Fix: set defaultMode to "default" or run inside an isolated container.
…
2 critical · 4 high · 1 medium · template-example: 0
```
- Deterministic. Same input → same findings. No LLM in the pass/fail loop.
- Facts-first. Rules query a typed fact graph, never raw config strings.
- Multi-vendor. Claude Code, Codex CLI, OpenClaw, MCP, and Claude Code Skills (`SKILL.md`), all from one CLI.
- CI-ready. Stable JSON, SARIF 2.1.0, GitHub Action, suppressions, baselines.
- Attack paths, not just lints. Source → capability → sink correlation finds the misconfigurations that only matter together.
- Runs inside Claude Code. The `/xrails` slash command triages findings using the agent's own LLM; no extra API key needed.
Table of contents
- Why this exists
- Quick start
- Example output
- What xrails detects
- Supported platforms
- Skill scanning
- LLM advisory review (opt-in)
- Run inside Claude Code (no extra API key)
- Privacy and data handling
- Why facts, not regex
- Attack-path example
- GitHub Action
- LLM explain mode (optional)
- Suppressions and baselines
- Authoring rules
- How is this different from AgentShield?
- Roadmap
- Responsible disclosure
- License
Why this exists
AI coding agents ship with broad permissions by default. A single setting,
such as bypassPermissions, Bash(*), danger-full-access, or a
/var/run/docker.sock bind mount, can let prompt injection from a web page or a
README turn into shell execution, secret exfiltration, or destructive git
operations.
The risk usually isn't one bad setting. It's the combination: a secret file plus filesystem read plus an outbound channel plus a weak approval policy. Each looks fine alone. Together they're a complete exfiltration path.
xrails is a static analysis tool that finds both atomic misconfigurations
and these composite paths, before the agent runs.
Quick start
```bash
# Install (pipx is preferred: it keeps xrails out of your project venv)
pipx install xrails
# or:
pip install xrails

# Scan the current directory
xrails scan .

# Or aim it at a specific config dir
xrails scan ./examples/vulnerable/claude_code

# JSON for tooling, SARIF for GitHub code scanning
xrails scan . --format json  --output xrails.json
xrails scan . --format sarif --output xrails.sarif
```
Try the demo configs that ship with the repo:
```bash
git clone https://github.com/xrails-ai/xrails
cd xrails
xrails scan examples/vulnerable/claude_code        # expect findings
xrails scan examples/secure/claude_code            # expect clean
xrails scan examples/vulnerable/attack-path-exfil  # expect XR-AP-EXFIL-001
```
Example output
A vulnerable Claude Code config:
```text
xrails scan — examples/vulnerable/claude_code
Profiles: claude-code · Grade: F (10/100) · Attack paths: 1

[HIGH]   XR-AP-EXFIL-001  High-confidence secret exfiltration path
[HIGH]   XR-CLAUDE-001    bypassPermissions mode enabled
[HIGH]   XR-CLAUDE-002    Bash(*) allows unrestricted shell execution
[HIGH]   XR-CLAUDE-003    WebFetch allowed without domain scoping
[MEDIUM] XR-CLAUDE-009    No deny rules for .env or credential files

5 findings (0 critical, 4 high, 1 medium, 0 low) · active-runtime: 5
```
Same scan as JSON:
```bash
xrails scan examples/vulnerable/claude_code --format json | jq '.score, .findings[].rule_id'
```

```text
{ "numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4, ... }
"XR-AP-EXFIL-001"
"XR-CLAUDE-001"
…
```
What xrails detects
40 rules across six profiles. Severity ladder: critical / high / medium / low / info.
Claude Code (10 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-CLAUDE-001 | HIGH | bypassPermissions mode enabled |
| XR-CLAUDE-002 | HIGH | Bash(*) wildcard with no deny for destructive commands |
| XR-CLAUDE-003 | HIGH | WebFetch without domain scoping |
| XR-CLAUDE-004 | MEDIUM | additionalDirectories includes home directory |
| XR-CLAUDE-005 | MEDIUM | CLAUDE.md instructs the agent to skip permissions |
| XR-CLAUDE-006 | HIGH | Command hooks run with full user permissions |
| XR-CLAUDE-007 | HIGH | HTTP hooks send to remote URLs |
| XR-CLAUDE-008 | MEDIUM | allowedHttpHookUrls unset while hooks enabled |
| XR-CLAUDE-009 | MEDIUM | No deny rules for .env or credential files |
| XR-CLAUDE-010 | MEDIUM | auto mode without managed org constraints |
Codex (8 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-CODEX-001 | HIGH | sandbox_mode = "danger-full-access" |
| XR-CODEX-002 | HIGH | approval_policy = "never" in writable sandbox |
| XR-CODEX-003 | HIGH | Network access enabled in workspace-write sandbox |
| XR-CODEX-004 | MEDIUM | Project config present but possibly untrusted |
| XR-CODEX-005 | HIGH | MCP servers without enterprise allowlist |
| XR-CODEX-006 | HIGH | MCP tool approval overrides are permissive |
| XR-CODEX-007 | MEDIUM | Shell environment policy too permissive |
| XR-CODEX-008 | MEDIUM | otel.log_user_prompt = true |
OpenClaw (7 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-OPENCLAW-001 | HIGH | Sandboxing disabled — tools run on host |
| XR-OPENCLAW-002 | HIGH | Workspace treated as boundary but sandbox off |
| XR-OPENCLAW-003 | HIGH | Docker bind mount without :ro |
| XR-OPENCLAW-004 | CRITICAL | /var/run/docker.sock bind mount |
| XR-OPENCLAW-005 | MEDIUM | Broad tool groups in allow list |
| XR-OPENCLAW-006 | MEDIUM | sandbox.mode = non-main |
| XR-OPENCLAW-007 | MEDIUM | .env with secrets in agent workspace |
MCP (2 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-MCP-001 | HIGH | MCP server + weak approvals + network egress |
| XR-MCP-002 | MEDIUM | Remote MCP transport without identity pinning |
Skills (5 rules + 1 composite)
| Rule | Sev | What it detects |
|---|---|---|
| XR-SKILL-001 | HIGH | SKILL.md allowed-tools includes bare wildcards (Bash, Read, WebFetch) |
| XR-SKILL-002 | HIGH | Skill body contains shell-exfil patterns (`curl --data`, `nc <host>`, `base64 \| curl`) |
| XR-SKILL-003 | MEDIUM | Skill body references credential paths (~/.aws, ~/.ssh, .env) |
| XR-SKILL-004 | MEDIUM | Skill body uses approval-bypass language ("skip permissions", "dangerously") |
| XR-SKILL-005 | MEDIUM | Third-party skill loaded from a plugin / vendored tree without pinning |
| XR-AP-SKILL-EXFIL-001 | HIGH | Composite: one or more skills read secrets AND one or more emit network traffic |
Composite attack paths (7 rules)
| Rule | Sev | What it detects |
|---|---|---|
| XR-AP-EXFIL-001 | HIGH | Secret + read + egress + bypass = exfiltration path |
| XR-AP-DESTRUCT-001 | HIGH | Shell + bypass = destructive git ops |
| XR-AP-DRIFT-001 | MEDIUM | Effective policy ≠ intended config |
| XR-AP-HOOK-EXFIL-001 | HIGH | Hooks + unrestricted URLs + secrets |
| XR-AP-MCP-SUPPLYCHAIN-001 | HIGH | MCP auto-install without version pinning |
| XR-AP-ADDDIR-BYPASS-001 | HIGH | bypassPermissions + home dir in additionalDirectories |
| XR-AP-SANDBOX-DRIFT-001 | HIGH | Sandbox configured but not effective |
xrails rules list browses every installed rule with profile and severity
filters; xrails rules validate checks the YAML catalog against the schema.
Supported platforms
| Agent | Files xrails reads | Notes |
|---|---|---|
| Claude Code | `.claude/settings.json`, `.claude/settings.local.json`, `.mcp.json`, `CLAUDE.md`, `.claude/CLAUDE.md` | Permission rules, hooks, MCP servers, instruction files |
| Codex CLI | `.codex/config.toml` | Sandbox modes, approvals, MCP, telemetry |
| OpenClaw | `openclaw.json` | Sandbox, Docker binds, tool policy |
| MCP | Servers declared inside any of the above | Remote/local transport, approval overrides |
| Skills | `.claude/skills/*/SKILL.md`, plugin/vendored skills | Frontmatter `allowed-tools` + body heuristics (shell exfil, sensitive paths, bypass language) |
| All | `.env`, `.env.*` | Secret-like keys for attack-path correlation |
Findings inside examples/, fixtures/, docs/, samples/, etc. are
classified as template-example and scored at a discount, so a repo that
ships demo configs won't produce misleading grades.
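The path classification can be pictured as a check on a finding's directory components. This sketch assumes only the four directory names listed above (the real list is longer, per "etc.") and leaves out the score discount, whose size isn't stated; the `active-runtime` label is borrowed from the summary line in the example output, and `runtime_class` is a hypothetical name.

```python
from pathlib import PurePosixPath

# Directory names quoted in the README; xrails' full list is not reproduced here.
TEMPLATE_DIRS = {"examples", "fixtures", "docs", "samples"}


def runtime_class(path: str) -> str:
    """Classify a repo-relative path as 'template-example' or 'active-runtime'."""
    directories = PurePosixPath(path).parts[:-1]  # ignore the file name itself
    if any(part in TEMPLATE_DIRS for part in directories):
        return "template-example"
    return "active-runtime"
```

Checking directory components rather than substrings keeps a root-level file such as `docs.json` or a directory like `my-docs-tool/` from being misclassified.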
Skill scanning
xrails reads every SKILL.md under .claude/skills/, plugin trees, and
vendored skill directories. Six rules detect:
- broad `allowed-tools` (bare `Bash`, `Read`, `WebFetch`)
- shell exfil patterns in the body (`curl --data`, `nc <host>`, `base64 | curl`)
- references to credential paths (`~/.aws`, `~/.ssh`, `.env`)
- approval-bypass language ("skip permissions", "dangerously")
- third-party skills loaded without version pinning
- a composite path: skill reads secrets ∧ skill emits network traffic
```bash
xrails scan examples/vulnerable/risky-skill
```

Output (abridged):

```text
[HIGH]   XR-AP-SKILL-EXFIL-001  Skill exfiltration path
[HIGH]   XR-SKILL-001           Skill declares broad allowed-tools
[HIGH]   XR-SKILL-002           Skill body contains shell exfil patterns
[MEDIUM] XR-SKILL-003           Skill body references sensitive paths
[MEDIUM] XR-SKILL-004           Skill body uses approval-bypass language
```
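Skill bodies are free-form Markdown rather than structured config, so body checks are necessarily pattern heuristics. The sketch below shows the general shape of such checks using the example patterns from the rule table; the regular expressions and function names are assumptions for illustration, not xrails' rule definitions.

```python
import re

# Hypothetical patterns modelled on the examples in the Skills rule table.
SHELL_EXFIL = (
    re.compile(r"\bcurl\b[^\n]*--data"),         # curl ... --data
    re.compile(r"\bnc\s+\S+"),                   # nc <host>
    re.compile(r"\bbase64\b[^\n]*\|\s*curl\b"),  # ... base64 | curl
)
CREDENTIAL_PATHS = (
    re.compile(r"~/\.aws\b"),
    re.compile(r"~/\.ssh\b"),
    re.compile(r"(?<![\w.])\.env\b"),  # ".env" but not "foo.env"
)
BYPASS_LANGUAGE = (
    re.compile(r"skip permissions", re.IGNORECASE),
    re.compile(r"dangerously", re.IGNORECASE),
)


def scan_skill_body(body: str) -> dict[str, bool]:
    """Report which heuristic families appear anywhere in one SKILL.md body."""
    def hit(patterns: tuple[re.Pattern[str], ...]) -> bool:
        return any(p.search(body) for p in patterns)

    return {
        "shell_exfil": hit(SHELL_EXFIL),
        "credential_paths": hit(CREDENTIAL_PATHS),
        "bypass_language": hit(BYPASS_LANGUAGE),
    }


def skill_exfil_path(bodies: list[str]) -> bool:
    """Composite in the spirit of XR-AP-SKILL-EXFIL-001: some skill reads secrets
    and some skill (possibly a different one) sends data out."""
    hits = [scan_skill_body(b) for b in bodies]
    return any(h["credential_paths"] for h in hits) and any(h["shell_exfil"] for h in hits)
```

The composite deliberately looks across all skills: one skill that reads `~/.ssh` and another that runs `curl --data` are harmless alone but form a path together.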
LLM advisory review (opt-in)
Run an LLM advisory pass against a SKILL.md or a directory of them:
```bash
# Default offline: no API key required
xrails review-skill ./.claude/skills

# Use a real provider
export OPENAI_API_KEY=sk-...
xrails review-skill ./.claude/skills --llm-provider openai

# Or Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
xrails review-skill ./.claude/skills --llm-provider anthropic
```
This output is advisory, not a finding. It does not affect exit code,
score, or SARIF, and is clearly labelled in the terminal. The skill body is
redacted (secret-like keys, long opaque tokens) before any prompt is
assembled. The deterministic scanner (xrails scan) remains the source
of truth in CI.
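The redaction step described above (strip secret-like keys, then long opaque tokens) can be sketched with two regular expressions. What counts as "secret-like" and "long opaque" here (key names containing key/token/secret/password, runs of 32+ token characters) is an assumption for illustration; xrails' actual patterns are not documented in this README.

```python
import re

# 1) Values assigned to keys whose names look secret-like (KEY, TOKEN, SECRET, PASSWORD, ...).
_SECRET_ASSIGNMENT = re.compile(
    r"(?im)^(\s*[\w.-]*(?:key|token|secret|passw(?:or)?d)[\w.-]*\s*[:=]\s*)\S.*$"
)
# 2) Long opaque runs such as API keys or base64 blobs: 32+ token-ish characters.
_OPAQUE_TOKEN = re.compile(r"[A-Za-z0-9_\-+/=]{32,}")


def redact(text: str) -> str:
    """Blank out secret-looking values, then any remaining long opaque token."""
    text = _SECRET_ASSIGNMENT.sub(r"\1[REDACTED]", text)
    return _OPAQUE_TOKEN.sub("[REDACTED_TOKEN]", text)
```

Both passes err toward over-redaction (a key named `keyboard` also gets blanked), which is the safe direction for text that is about to leave the machine.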
Run inside Claude Code (no extra API key)
xrails ships a slash command and an auto-invoke skill for Claude Code:
```bash
xrails install-agent --scope project   # writes into ./.claude/
# or:
xrails install-agent --scope user      # writes into ~/.claude/
```

Then in Claude Code:

- Type `/xrails` to run a triage on the current repo.
- Or ask Claude "audit my agent setup"; the `xrails` skill auto-invokes when context fits.
The integration uses Claude's own LLM for triage. xrails provides the
deterministic findings (via xrails scan and xrails review-skill
JSON output); Claude reasons over them with project context. No
separate API-key configuration needed.
The slash command and skill both pin allowed-tools to
Bash(xrails *), Bash(which xrails), and Read — Claude cannot run
arbitrary shell or write files from this integration.
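To see why pinning `allowed-tools` this way limits the integration, here is a simplified matcher for `Tool(pattern)` entries. It is an assumption for illustration only: Claude Code's real permission matching has its own semantics. This sketch uses `fnmatch` globs and additionally rejects shell operators outright, so that something like `xrails scan .; rm -rf ~` cannot slip through the `xrails *` pattern.

```python
import re
from fnmatch import fnmatchcase

# The allowed-tools described above; the matching rules below are a simplification.
ALLOWED_TOOLS = ("Bash(xrails *)", "Bash(which xrails)", "Read")

_ENTRY = re.compile(r"^(?P<tool>\w+)(?:\((?P<pattern>.*)\))?$")
# Chaining, pipes, substitution and redirection could smuggle in a second command.
_SHELL_OPERATORS = re.compile(r"[;&|`$<>\n]")


def is_allowed(tool: str, argument: str = "", allowed: tuple[str, ...] = ALLOWED_TOOLS) -> bool:
    """Return True if a (tool, argument) call matches some allowed-tools entry."""
    if tool == "Bash" and _SHELL_OPERATORS.search(argument):
        return False
    for entry in allowed:
        m = _ENTRY.match(entry)
        if m is None or m["tool"] != tool:
            continue
        # A bare tool name ("Read") allows any argument; "Tool(glob)" must match the glob.
        if m["pattern"] is None or fnmatchcase(argument, m["pattern"]):
            return True
    return False
```

Under these rules the integration can run `xrails scan . --format json` and read files, but `Write` and arbitrary shell commands fall through to `False`.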
Privacy and data handling
xrails is built so the deterministic scan runs entirely on your machine.
| Surface | Default behavior | Network calls? | Notes |
|---|---|---|---|
| `xrails scan` | Local | No | Reads files, evaluates rules, writes report. |
| `xrails review-skill` | `echo` provider (offline) | No | Heuristic pseudo-review from the parser. |
| `xrails review-skill --llm-provider openai/anthropic` | Opt-in | Yes | Body redacted before prompt. |
| `xrails explain --llm-provider openai/anthropic` | Opt-in | Yes | Evidence redacted before prompt. |
| `/xrails` slash command in Claude Code | Uses host LLM | Inherits Claude Code's network policy | xrails subprocess itself stays local. |
Concrete guarantees, all enforced by tests:
- `xrails scan` never imports `openai`, `anthropic`, or any HTTP client.
- `.env` parsing stores only `value_present: bool`; raw secret values never reach reporters, logs, JSON, or SARIF.
- Redaction (strip secret-like keys + long opaque tokens) runs on every byte of evidence before any LLM provider receives a prompt.
- `ReviewSuggestion` is a separate type from `Finding`; it cannot enter exit code, score, SARIF, or `--fail-on` decisions.
- Provider SDKs are imported lazily inside `_get_client()` and only when a non-`echo` provider is explicitly selected.
If any of these claims look wrong, please open a security issue per SECURITY.md.
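The `value_present: bool` guarantee amounts to checking each value and then discarding it. A minimal sketch of that idea follows; the `EnvKey` type, its `secret_like` name heuristic, and `parse_env` are hypothetical stand-ins, not xrails' parser.

```python
from dataclasses import dataclass

# Hypothetical name heuristic for "secret-like" keys.
_SECRET_HINTS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


@dataclass(frozen=True)
class EnvKey:
    name: str
    value_present: bool  # the only trace of the value that is kept
    secret_like: bool


def parse_env(text: str) -> list[EnvKey]:
    """Parse .env-style text, keeping key names and dropping every value."""
    keys: list[EnvKey] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        line = line.removeprefix("export ").lstrip()
        name, _, value = line.partition("=")
        name = name.strip()
        keys.append(
            EnvKey(
                name=name,
                value_present=bool(value.strip().strip("\"'")),
                secret_like=any(hint in name.upper() for hint in _SECRET_HINTS),
            )
        )
    return keys
```

Because the dataclass has no field for the value, no reporter downstream can print it, even by accident through `repr()`.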
Why facts, not regex
Most "AI security" linters are regex over config files. That breaks fast: JSON keys reorder, TOML tables nest, YAML aliases hide things. Worse, regex can't see combinations.
xrails parses each config file with a real parser (tomllib, json,
PyYAML), normalizes the result into a typed FactGraph, and evaluates rules
against facts. Two consequences:
- Composite findings are first-class. A secret-bearing `.env`, a filesystem capability, a network capability, and a weak approval policy are four independent facts. The XR-AP-EXFIL rule fires only when all four coexist.
- Rules become portable. New parsers add new facts; rules that query `network.enabled == true` or `approval.policy == never` work across vendors without rewriting.
The rule format is YAML — most rules are 20 lines and require no Python. See docs/rule-authoring.md.
Attack-path example
examples/vulnerable/attack-path-exfil/ ships a demo with four conditions
that are individually plausible but together unsafe:
```text
Source → Capability → Capability → Sink

[.env with API keys]
        ↓
[Read access (no .env deny)]
        ↓
[WebFetch unscoped → outbound HTTP]
        ↓
[bypassPermissions disables prompts]
        ↓
[Network exfiltration sink]
```

```bash
xrails scan examples/vulnerable/attack-path-exfil
# → XR-AP-EXFIL-001 fires + the underlying atomic findings.
```
GitHub Action
```yaml
name: xrails
on: [pull_request, push]

jobs:
  xrails:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install xrails
      - run: xrails scan . --format sarif --output xrails.sarif --fail-on high
      - if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: xrails.sarif
          category: xrails
```
Findings appear in the Security tab of the repository. Full guide and reusable action recipe: docs/github-action.md.
LLM explain mode (optional)
xrails never uses an LLM to decide whether a finding is real. The optional
explain command rewrites a single finding's evidence into prose:
```bash
xrails scan . --format json --output report.json
xrails explain --report report.json --id XR-CLAUDE-001 --llm-provider echo
```
echo is the default and runs entirely offline. openai and anthropic
providers are loaded lazily and only used when explicitly requested. All
evidence is run through a redactor that strips secret-like keys and long
opaque tokens before any prompt is assembled.
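The lazy-loading pattern described above (and in the privacy guarantees' `_get_client()`) can be sketched as imports placed inside the branch that needs them. `get_client` and `EchoClient` are hypothetical names, not xrails' code; `openai.OpenAI` and `anthropic.Anthropic` are the real SDK client classes, which read their API keys from the environment.

```python
class EchoClient:
    """Offline provider: hands back the (already redacted) prompt, no network involved."""

    def complete(self, prompt: str) -> str:
        return prompt


def get_client(provider: str = "echo"):
    """Return a client for the requested provider, importing SDKs only on demand."""
    if provider == "echo":
        return EchoClient()
    if provider == "openai":
        from openai import OpenAI  # imported only when explicitly requested
        return OpenAI()            # reads OPENAI_API_KEY from the environment
    if provider == "anthropic":
        from anthropic import Anthropic
        return Anthropic()         # reads ANTHROPIC_API_KEY from the environment
    raise ValueError(f"unknown provider: {provider!r}")
```

Keeping the imports inside the branches means the default `echo` path works even when neither SDK is installed, and a test can assert that the SDK modules were never loaded.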
Suppressions and baselines
Adopt xrails on a legacy repo without being blocked by pre-existing findings:
```bash
xrails scan . --format json --output .xrails-baseline.json
xrails scan . --baseline .xrails-baseline.json   # only NEW findings break the build
```
Suppress one specific finding for one specific path with audit metadata in
.xrails-suppressions.yml:
```yaml
suppressions:
  - rule_id: XR-CLAUDE-009
    path: examples/**
    reason: demo fixture for documentation
    expires: 2026-12-31
```
Suppressions carry a reason and an expiry. Expired suppressions stop applying automatically — no silently-stale ignore lists.
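Baseline and suppression filtering combine into a single "what is new and not waived?" step. This is a sketch under stated assumptions: the `Finding` and `Suppression` shapes are hypothetical, a baseline entry is identified by `(rule_id, path)` alone (real baselines likely fingerprint more), and `expires` is treated as inclusive.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Finding:  # hypothetical minimal shape; real findings carry much more
    rule_id: str
    path: str


@dataclass(frozen=True)
class Suppression:
    rule_id: str
    path: str      # glob such as "examples/**"
    reason: str
    expires: date  # assumed inclusive: still valid on this date


def is_suppressed(finding: Finding, suppressions: list[Suppression], today: date) -> bool:
    return any(
        s.rule_id == finding.rule_id
        and fnmatchcase(finding.path, s.path)  # fnmatch's * also matches "/"
        and today <= s.expires                 # expired entries simply stop matching
        for s in suppressions
    )


def new_findings(
    current: list[Finding],
    baseline: list[Finding],
    suppressions: list[Suppression],
    today: date,
) -> list[Finding]:
    """Findings that are neither in the baseline nor covered by a live suppression."""
    known = set(baseline)
    return [f for f in current if f not in known and not is_suppressed(f, suppressions, today)]
```

Passing `today` explicitly keeps the expiry logic testable; once the date passes `expires`, the suppressed finding reappears without anyone editing the file.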
Authoring rules
A rule is a YAML file:
```yaml
schema_version: "2.0"
id: XR-CODEX-001
title: sandbox_mode = danger-full-access
category: excessive_privilege
severity: high
confidence: high
profiles: [codex]
facts:
  all:
    - fact: sandbox.mode
      op: eq
      value: danger-full-access
remediation:
  steps:
    - Use sandbox_mode = "workspace-write" with approval_policy = "on-request".
references:
  - title: Codex — Agent approvals & security
    url: https://developers.openai.com/codex/agent-approvals-security
cwe: [CWE-269 Improper Privilege Management]
```
Drop it into `src/xrails/rules/<vendor>/`, add trigger/safe fixtures under
`tests/fixtures/<vendor>/`, then run `xrails rules validate` and `xrails rules test`. No Python is required for atomic rules.
Detailed walkthrough including operators (eq, contains, regex,
matches_any, present, absent, gt/gte/lt/lte, in/not_in) and the
fact catalog: docs/rule-authoring.md.
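To make the operator list concrete, here is a hypothetical evaluator for the `facts: all:` block over a flat fact dictionary. The operator names come from the list above, but their exact semantics (for example, that a missing fact satisfies only `absent`) are assumptions, and this is not xrails' DSL evaluator. A rule file could be loaded into this dict shape with PyYAML's `yaml.safe_load`.

```python
import re
from typing import Any, Callable

_MISSING = object()  # sentinel: the fact is not in the graph at all


def _numeric(cmp: Callable[[Any, Any], bool]) -> Callable[[Any, Any], bool]:
    return lambda a, b: isinstance(a, (int, float)) and cmp(a, b)


def _contains(a: Any, b: Any) -> bool:
    if isinstance(a, str):
        return isinstance(b, str) and b in a
    return isinstance(a, (list, tuple, set, frozenset)) and b in a


OPS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda a, b: a is not _MISSING and a == b,
    "contains": _contains,
    "regex": lambda a, b: isinstance(a, str) and re.search(b, a) is not None,
    "matches_any": lambda a, b: isinstance(a, str) and any(re.search(p, a) for p in b),
    "present": lambda a, _b: a is not _MISSING,
    "absent": lambda a, _b: a is _MISSING,
    "gt": _numeric(lambda a, b: a > b),
    "gte": _numeric(lambda a, b: a >= b),
    "lt": _numeric(lambda a, b: a < b),
    "lte": _numeric(lambda a, b: a <= b),
    "in": lambda a, b: a is not _MISSING and a in b,
    "not_in": lambda a, b: a is not _MISSING and a not in b,
}


def eval_clause(facts: dict[str, Any], clause: dict[str, Any]) -> bool:
    value = facts.get(clause["fact"], _MISSING)
    return OPS[clause["op"]](value, clause.get("value"))


def eval_rule(facts: dict[str, Any], rule: dict[str, Any]) -> bool:
    """Evaluate a rule's `facts: all:` block: every clause must hold."""
    return all(eval_clause(facts, c) for c in rule["facts"]["all"])
```

Usage with the XR-CODEX-001 rule above: `eval_rule({"sandbox.mode": "danger-full-access"}, rule)` returns `True`, while `{"sandbox.mode": "workspace-write"}` or an empty fact set returns `False`.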
How is this different from AgentShield?
AgentShield is the closest existing tool and it's good — we share the underlying motivation that AI agent misconfigurations need real auditing. The differences are scope and architecture:
| | AgentShield | xrails |
|---|---|---|
| Primary scope | Claude Code | Claude Code (incl. Skills), Codex CLI, OpenClaw, MCP |
| Engine model | Pattern + scoring | Typed fact graph + DSL evaluator |
| Composite findings | Implicit via scoring | First-class attack-path rules over a correlation graph |
| Rule format | Python | YAML DSL (Python only for graph rules) |
| LLM role | None | Optional, opt-in: xrails explain, xrails review-skill |
| Claude Code integration | — | single xrails skill (acts as both /xrails slash command and auto-invoke) |
| CI artifacts | JSON / SARIF | JSON, SARIF, GitHub Action, baseline, suppressions |
We respect AgentShield's prior art — runtimeConfidence and the
discovery-skip patterns are clearly inspired by it.
Roadmap
- v0.3 (current, beta): skill scanning (`SKILL.md`), LLM advisory review (`xrails review-skill`), Claude Code slash command + skill (`/xrails` skill), 40 rules, 312 tests.
- v0.4: additional vendors (Cline, Cursor agents, Aider), expanded MCP identity rules, OPA/Conftest export for org-level policy packs.
- v0.5: VS Code extension surfacing findings inline; public benchmark report pipeline; rule packs (`xrails-rules-enterprise`).
Responsible disclosure
Security issues: see SECURITY.md. Public benchmark scans follow the rules in benchmarks/README.md — no raw secrets, no live exploit chains, no naming projects with critical live exposures before private outreach.
License
xrails is part of GuardrailsAI.
Download files
File details
Details for the file xrails-0.3.0.tar.gz.
File metadata
- Download URL: xrails-0.3.0.tar.gz
- Upload date:
- Size: 243.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54e015c46bfb8249571553af34208dea38e062ce5b68fe66df5788b6b56e4d5d |
| MD5 | 8a7a8a3246d7d0663b4ab1ed775bbf81 |
| BLAKE2b-256 | 63b8720720506bbb151e46d5958964fb93e8aa1e75f40ebf417488b22ddc146e |
File details
Details for the file xrails-0.3.0-py3-none-any.whl.
File metadata
- Download URL: xrails-0.3.0-py3-none-any.whl
- Upload date:
- Size: 140.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 694cfb003438d82e33671a6a40951bc98ecf5e4992443b5718f42635658eff0a |
| MD5 | d5561e27420bdef5502731f4775dea40 |
| BLAKE2b-256 | 35e2319745e4d8bd97533fef2209b1d727450502af76fae10a34d179a72c4c2d |