Open-source AI agent guardrails linter — Claude Code, Codex CLI, OpenClaw, MCP

xrails

Open-source AI agent guardrails linter. Find dangerous permissions, unsafe agent configs, and cross-file attack paths before your AI agent runs.

xrails reads the configuration that drives Claude Code, Codex CLI, OpenClaw, and MCP servers, normalizes it into a typed fact graph, evaluates 34 deterministic rules, and reports findings — including composite attack paths where individual settings look fine but combine into something dangerous.

pipx install xrails    # or: pip install xrails
xrails scan .

Sample output:
xrails scan — /repo
Profiles: claude-code · Grade: F (28/100) · Attack paths: 1
─────────────────────────────────────────────────────────────
[CRITICAL] XR-AP-EXFIL-001  High-confidence secret exfiltration path
  Secret-like file detected
    → Agent can read workspace files
    → Network egress enabled (WebFetch unscoped)
    → Approval prompts disabled (bypassPermissions)
    → Potential data exfiltration

[HIGH]  XR-CLAUDE-001  bypassPermissions mode enabled
  .claude/settings.json:2
  Why: skips approvals for nearly every tool action.
  Fix: set defaultMode to "default" or run inside an isolated container.

…
2 critical · 4 high · 1 medium · template-example: 0

  • Deterministic. Same input → same findings. No LLM in the pass/fail loop.
  • Facts-first. Rules query a typed fact graph, never raw config strings.
  • Multi-vendor. Claude Code, Codex CLI, OpenClaw, MCP — one CLI.
  • CI-ready. Stable JSON, SARIF 2.1.0, GitHub Action, suppressions, baselines.
  • Attack paths, not just lints. Source → capability → sink correlation finds the misconfigurations that only matter together.

Why this exists

AI coding agents ship with broad permissions by default. A single setting (bypassPermissions, Bash(*), danger-full-access, or a /var/run/docker.sock bind mount) can let prompt injection from a web page or a README turn into shell execution, secret exfiltration, or destructive git operations.

The risk usually isn't one bad setting. It's the combination: a secret file plus filesystem read plus an outbound channel plus a weak approval policy. Each looks fine alone. Together they're a complete exfiltration path.

xrails is a static analysis tool that finds both atomic misconfigurations and these composite paths, before the agent runs.

Quick start

# Install (pipx is preferred — keeps xrails out of your project venv)
pipx install xrails
# or:
pip install xrails

# Scan the current directory
xrails scan .

# Or aim it at a specific config dir
xrails scan ./examples/vulnerable/claude_code

# JSON for tooling, SARIF for GitHub code scanning
xrails scan . --format json   --output xrails.json
xrails scan . --format sarif  --output xrails.sarif

Try the demo configs that ship with the repo:

git clone https://github.com/guardrailsai/xrails
cd xrails

xrails scan examples/vulnerable/claude_code     # expect findings
xrails scan examples/secure/claude_code         # expect clean
xrails scan examples/vulnerable/attack-path-exfil   # expect XR-AP-EXFIL-001

Example output

A vulnerable Claude Code config:

xrails scan — examples/vulnerable/claude_code
Profiles: claude-code · Grade: F (10/100) · Attack paths: 1

[HIGH]     XR-AP-EXFIL-001  High-confidence secret exfiltration path
[HIGH]     XR-CLAUDE-001    bypassPermissions mode enabled
[HIGH]     XR-CLAUDE-002    Bash(*) allows unrestricted shell execution
[HIGH]     XR-CLAUDE-003    WebFetch allowed without domain scoping
[MEDIUM]   XR-CLAUDE-009    No deny rules for .env or credential files

5 findings (0 critical, 4 high, 1 medium, 0 low) · active-runtime: 5

Same scan as JSON:

xrails scan . --format json | jq '.score, .findings[].rule_id'

{ "numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4, ... }
"XR-AP-EXFIL-001"
"XR-CLAUDE-001"

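The same report can be read from Python instead of jq. The sketch below relies only on the fields visible in the output above (score.numeric, score.grade, score.high_count, findings[].rule_id); the rest of the report schema is not shown here, and the summarize helper is a hypothetical name used for illustration.

```python
import json


def summarize(report_text: str) -> tuple[str, list[str]]:
    """Return a one-line score summary and the rule IDs found in a report."""
    report = json.loads(report_text)
    score = report["score"]
    line = f"grade {score['grade']} ({score['numeric']}/100), {score['high_count']} high"
    rule_ids = [finding["rule_id"] for finding in report["findings"]]
    return line, rule_ids


# Sample trimmed to the fields shown above; a real report carries more.
sample = json.dumps({
    "score": {"numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4},
    "findings": [{"rule_id": "XR-AP-EXFIL-001"}, {"rule_id": "XR-CLAUDE-001"}],
})

line, rule_ids = summarize(sample)
print(line)
print(rule_ids)
```

In practice the text would come from the file written by --output, for example Path("xrails.json").read_text().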
What xrails detects

34 rules in five groups: four platform profiles plus composite attack paths. Severity ladder: critical / high / medium / low / info.

Claude Code (10 rules)

| Rule | Sev | What it detects |
|---|---|---|
| XR-CLAUDE-001 | HIGH | bypassPermissions mode enabled |
| XR-CLAUDE-002 | HIGH | Bash(*) wildcard with no deny for destructive commands |
| XR-CLAUDE-003 | HIGH | WebFetch without domain scoping |
| XR-CLAUDE-004 | MEDIUM | additionalDirectories includes home directory |
| XR-CLAUDE-005 | MEDIUM | CLAUDE.md instructs the agent to skip permissions |
| XR-CLAUDE-006 | HIGH | Command hooks run with full user permissions |
| XR-CLAUDE-007 | HIGH | HTTP hooks send to remote URLs |
| XR-CLAUDE-008 | MEDIUM | allowedHttpHookUrls unset while hooks enabled |
| XR-CLAUDE-009 | MEDIUM | No deny rules for .env or credential files |
| XR-CLAUDE-010 | MEDIUM | auto mode without managed org constraints |

Codex (8 rules)

| Rule | Sev | What it detects |
|---|---|---|
| XR-CODEX-001 | HIGH | sandbox_mode = "danger-full-access" |
| XR-CODEX-002 | HIGH | approval_policy = "never" in writable sandbox |
| XR-CODEX-003 | HIGH | Network access enabled in workspace-write sandbox |
| XR-CODEX-004 | MEDIUM | Project config present but possibly untrusted |
| XR-CODEX-005 | HIGH | MCP servers without enterprise allowlist |
| XR-CODEX-006 | HIGH | MCP tool approval overrides are permissive |
| XR-CODEX-007 | MEDIUM | Shell environment policy too permissive |
| XR-CODEX-008 | MEDIUM | otel.log_user_prompt = true |

OpenClaw (7 rules)

| Rule | Sev | What it detects |
|---|---|---|
| XR-OPENCLAW-001 | HIGH | Sandboxing disabled; tools run on host |
| XR-OPENCLAW-002 | HIGH | Workspace treated as boundary but sandbox off |
| XR-OPENCLAW-003 | HIGH | Docker bind mount without :ro |
| XR-OPENCLAW-004 | CRITICAL | /var/run/docker.sock bind mount |
| XR-OPENCLAW-005 | MEDIUM | Broad tool groups in allow list |
| XR-OPENCLAW-006 | MEDIUM | sandbox.mode = non-main |
| XR-OPENCLAW-007 | MEDIUM | .env with secrets in agent workspace |

MCP (2 rules)

| Rule | Sev | What it detects |
|---|---|---|
| XR-MCP-001 | HIGH | MCP server + weak approvals + network egress |
| XR-MCP-002 | MEDIUM | Remote MCP transport without identity pinning |

Composite attack paths (7 rules)

| Rule | Sev | What it detects |
|---|---|---|
| XR-AP-EXFIL-001 | HIGH | Secret + read + egress + bypass = exfiltration path |
| XR-AP-DESTRUCT-001 | HIGH | Shell + bypass = destructive git ops |
| XR-AP-DRIFT-001 | MEDIUM | Effective policy ≠ intended config |
| XR-AP-HOOK-EXFIL-001 | HIGH | Hooks + unrestricted URLs + secrets |
| XR-AP-MCP-SUPPLYCHAIN-001 | HIGH | MCP auto-install without version pinning |
| XR-AP-ADDDIR-BYPASS-001 | HIGH | bypassPermissions + home dir in additionalDirectories |
| XR-AP-SANDBOX-DRIFT-001 | HIGH | Sandbox configured but not effective |

xrails rules list browses every installed rule with profile and severity filters; xrails rules validate checks the YAML catalog against the schema.

Supported platforms

| Agent | Files xrails reads | Notes |
|---|---|---|
| Claude Code | .claude/settings.json, .claude/settings.local.json, .mcp.json, CLAUDE.md, .claude/CLAUDE.md | Permission rules, hooks, MCP servers, instruction files |
| Codex CLI | .codex/config.toml | Sandbox modes, approvals, MCP, telemetry |
| OpenClaw | openclaw.json | Sandbox, Docker binds, tool policy |
| MCP | Servers declared inside any of the above | Remote/local transport, approval overrides |
| All | .env, .env.* | Secret-like keys for attack-path correlation |

Findings inside examples/, fixtures/, docs/, samples/, etc. are classified as template-example and scored at a discount, so a repo that ships demo configs won't produce misleading grades.
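A minimal sketch of how such a path-based classification could work, assuming a finding is tagged by the directories its file sits under. The runtime_class helper and the exact directory set are assumptions (the list above ends in "etc."), and the scoring discount itself is not modeled because its weight is not stated here.

```python
from pathlib import PurePosixPath

# Directory names taken from the sentence above; the real list may be longer.
TEMPLATE_DIRS = {"examples", "fixtures", "docs", "samples"}


def runtime_class(path: str) -> str:
    """Classify a repo-relative file path by the directories that contain it."""
    parent_dirs = PurePosixPath(path).parts[:-1]  # drop the file name itself
    if TEMPLATE_DIRS.intersection(parent_dirs):
        return "template-example"
    return "active-runtime"


demo = runtime_class("examples/vulnerable/claude_code/.claude/settings.json")
live = runtime_class(".claude/settings.json")
print(demo, live)
```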

Why facts, not regex

Most "AI security" linters are regex over config files. That breaks fast: JSON keys reorder, TOML tables nest, YAML aliases hide things. Worse, regex can't see combinations.

xrails parses each config file with a real parser (tomllib, json, PyYAML), normalizes the result into a typed FactGraph, and evaluates rules against facts. Two consequences:

  1. Composite findings are first-class. A secret-bearing .env, a filesystem capability, a network capability, and a weak approval policy are four independent facts. The XR-AP-EXFIL-001 rule fires only when all four coexist.
  2. Rules become portable. New parsers add new facts; rules that query network.enabled == true or approval.policy == never work across vendors without rewriting.

The rule format is YAML — most rules are 20 lines and require no Python. See docs/rule-authoring.md.

Attack-path example

examples/vulnerable/attack-path-exfil/ ships a demo with four conditions that are individually plausible but together unsafe:

Source → Capability → Capability → Sink

[.env with API keys]
       ↓
[Read access (no .env deny)]
       ↓
[WebFetch unscoped → outbound HTTP]
       ↓
[bypassPermissions disables prompts]
       ↓
[Network exfiltration sink]

xrails scan examples/vulnerable/attack-path-exfil
# → XR-AP-EXFIL-001 fires + the underlying atomic findings.

GitHub Action

name: xrails
on: [pull_request, push]
jobs:
  xrails:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install xrails
      - run: xrails scan . --format sarif --output xrails.sarif --fail-on high
      - if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: xrails.sarif
          category: xrails

Findings appear in the Security tab of the repository. Full guide and reusable action recipe: docs/github-action.md.

LLM explain mode (optional)

xrails never uses an LLM to decide whether a finding is real. The optional explain command rewrites a single finding's evidence into prose:

xrails scan . --format json --output report.json
xrails explain --report report.json --id XR-CLAUDE-001 --llm-provider echo

echo is the default and runs entirely offline. openai and anthropic providers are loaded lazily and only used when explicitly requested. All evidence is run through a redactor that strips secret-like keys and long opaque tokens before any prompt is assembled.
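As a rough illustration of what such a redaction pass might look like, the sketch below masks values under secret-like keys and replaces long opaque tokens inside other values. The patterns, the 32-character threshold, and the redact function are all assumptions for illustration; the actual redactor's rules are not described here.

```python
import re

# Assumed heuristics; the real redactor may use different patterns.
SECRET_KEY = re.compile(r"(key|token|secret|password)", re.IGNORECASE)
OPAQUE_TOKEN = re.compile(r"[A-Za-z0-9_\-]{32,}")
MASK = "[REDACTED]"


def redact(evidence: dict[str, str]) -> dict[str, str]:
    """Mask secret-like keys entirely and long opaque tokens inside other values."""
    clean = {}
    for key, value in evidence.items():
        if SECRET_KEY.search(key):
            clean[key] = MASK
        else:
            clean[key] = OPAQUE_TOKEN.sub(MASK, value)
    return clean


evidence = {
    "API_KEY": "abc123",
    "note": "found value " + "a" * 40,
    "mode": "bypassPermissions",
}
cleaned = redact(evidence)
print(cleaned)
```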

Suppressions and baselines

Adopt xrails on a legacy repo without being blocked by pre-existing findings:

xrails scan . --format json --output .xrails-baseline.json
xrails scan . --baseline .xrails-baseline.json   # only NEW findings break the build
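Conceptually, a baseline comparison is a set difference over finding fingerprints. The sketch below assumes a fingerprint of (rule_id, path); rule_id appears in the JSON output shown earlier, while the path field and the new_findings helper are hypothetical, and xrails may fingerprint findings differently.

```python
def fingerprint(finding: dict) -> tuple[str, str]:
    """Assumed identity of a finding: which rule fired, and on which file."""
    return finding["rule_id"], finding["path"]


def new_findings(current: list[dict], baseline: list[dict]) -> list[dict]:
    """Keep only findings whose fingerprint is absent from the baseline."""
    known = {fingerprint(f) for f in baseline}
    return [f for f in current if fingerprint(f) not in known]


baseline = [{"rule_id": "XR-CLAUDE-001", "path": ".claude/settings.json"}]
current = baseline + [{"rule_id": "XR-CLAUDE-003", "path": ".claude/settings.json"}]

fresh = new_findings(current, baseline)
print([f["rule_id"] for f in fresh])
```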

Suppress one specific finding for one specific path with audit metadata in .xrails-suppressions.yml:

suppressions:
  - rule_id: XR-CLAUDE-009
    path: examples/**
    reason: demo fixture for documentation
    expires: 2026-12-31

Suppressions carry a reason and an expiry. Expired suppressions stop applying automatically — no silently-stale ignore lists.
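The expiry behaviour can be sketched as a three-part check: rule ID, path glob, and date. This is an illustration under stated assumptions: the is_suppressed helper and the finding's path field are hypothetical, glob matching uses Python's fnmatchcase rather than whatever matcher xrails uses, and the suppression is assumed to still apply on its expiry date.

```python
from datetime import date
from fnmatch import fnmatchcase


def is_suppressed(finding: dict, suppressions: list[dict], today: date) -> bool:
    """True if some unexpired suppression matches the finding's rule and path."""
    for entry in suppressions:
        if entry["rule_id"] != finding["rule_id"]:
            continue
        if not fnmatchcase(finding["path"], entry["path"]):
            continue
        if today > entry["expires"]:  # expired entries silently stop applying
            continue
        return True
    return False


suppressions = [{
    "rule_id": "XR-CLAUDE-009",
    "path": "examples/**",
    "reason": "demo fixture for documentation",
    "expires": date(2026, 12, 31),
}]
finding = {"rule_id": "XR-CLAUDE-009", "path": "examples/vulnerable/settings.json"}

active = is_suppressed(finding, suppressions, date(2026, 6, 1))
expired = is_suppressed(finding, suppressions, date(2027, 1, 1))
print(active, expired)
```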

Authoring rules

A rule is a YAML file:

schema_version: "2.0"
id: XR-CODEX-001
title: sandbox_mode = danger-full-access
category: excessive_privilege
severity: high
confidence: high
profiles: [codex]
facts:
  all:
    - fact: sandbox.mode
      op: eq
      value: danger-full-access
remediation:
  steps:
    - Use sandbox_mode = "workspace-write" with approval_policy = "on-request".
  references:
    - title: Codex — Agent approvals & security
      url: https://developers.openai.com/codex/agent-approvals-security
cwe: [CWE-269 Improper Privilege Management]

Drop it into src/xrails/rules/<vendor>/, add trigger/safe fixtures under tests/fixtures/<vendor>/, run xrails rules validate and xrails rules test. No Python required for atomic rules.

Detailed walkthrough including operators (eq, contains, regex, matches_any, present, absent, gt/gte/lt/lte, in/not_in) and the fact catalog: docs/rule-authoring.md.
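To show how a facts block like the one above might be evaluated, here is a small sketch of an operator table applied to a flat dictionary of facts. The operator semantics are assumptions inferred from their names, matches_any is left out because its meaning is not described here, and condition_holds and rule_fires are hypothetical helpers rather than the xrails evaluator.

```python
import operator
import re

# Assumed semantics for the binary operators named above.
BINARY_OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "contains": lambda actual, expected: expected in actual,
    "in": lambda actual, expected: actual in expected,
    "not_in": lambda actual, expected: actual not in expected,
    "regex": lambda actual, expected: re.search(expected, str(actual)) is not None,
}


def condition_holds(cond: dict, facts: dict) -> bool:
    name, op = cond["fact"], cond["op"]
    if op == "present":
        return name in facts
    if op == "absent":
        return name not in facts
    # Assumed: a binary operator on a missing fact is simply false.
    return name in facts and BINARY_OPS[op](facts[name], cond["value"])


def rule_fires(rule: dict, facts: dict) -> bool:
    """A rule fires when every condition under facts.all holds."""
    return all(condition_holds(cond, facts) for cond in rule["facts"]["all"])


RULE = {
    "id": "XR-CODEX-001",
    "facts": {"all": [{"fact": "sandbox.mode", "op": "eq", "value": "danger-full-access"}]},
}

print(rule_fires(RULE, {"sandbox.mode": "danger-full-access"}))
print(rule_fires(RULE, {"sandbox.mode": "workspace-write"}))
```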

How is this different from AgentShield?

AgentShield is the closest existing tool and it's good — we share the underlying motivation that AI agent misconfigurations need real auditing. The differences are scope and architecture:

| | AgentShield | xrails |
|---|---|---|
| Primary scope | Claude Code | Claude Code, Codex CLI, OpenClaw, MCP |
| Engine model | Pattern + scoring | Typed fact graph + DSL evaluator |
| Composite findings | Implicit via scoring | First-class attack-path rules over a correlation graph |
| Rule format | Python | YAML DSL (Python only for graph rules) |
| LLM role | None | Optional explain layer, opt-in |
| CI artifacts | JSON / SARIF | JSON, SARIF, GitHub Action, baseline, suppressions |

We respect AgentShield's prior art — runtimeConfidence and the discovery-skip patterns are clearly inspired by it.

Roadmap

  • v0.2 — current. Multi-vendor parity, schema v2, correlation engine, baseline + suppressions, GitHub Action, demo + benchmark scaffolding.
  • v0.3 — additional vendors (Cline, Cursor agents, Aider), expanded MCP identity rules, OPA/Conftest export for org-level policy packs.
  • v0.4 — VS Code extension surfacing findings inline; benchmark report pipeline; rule packs (xrails-rules-enterprise).

Responsible disclosure

Security issues: see SECURITY.md. Public benchmark scans follow the rules in benchmarks/README.md — no raw secrets, no live exploit chains, no naming projects with critical live exposures before private outreach.

License

Apache 2.0.

xrails is part of GuardrailsAI.

Download files

Source Distribution

xrails-0.2.0.tar.gz (205.6 kB view details)

Uploaded Source

Built Distribution

xrails-0.2.0-py3-none-any.whl (112.0 kB view details)

Uploaded Python 3

File details

Details for the file xrails-0.2.0.tar.gz.

File metadata

  • Download URL: xrails-0.2.0.tar.gz
  • Size: 205.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xrails-0.2.0.tar.gz
Algorithm Hash digest
SHA256 444f1ea5daf99e62fb7e86d5c4bc343d89384f99e91a0c2a552ac0b188c33be3
MD5 c50028e0467006422824e69fbc4971bc
BLAKE2b-256 3b2c4d59bccce5fc699d2cda4ce07cddab9a8f6682bae5b04d4c1fb1f45d1bf7

File details

Details for the file xrails-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: xrails-0.2.0-py3-none-any.whl
  • Size: 112.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xrails-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9b2944385619666fa1ef830e32beadd2e342217d5e482ed8ee237201ab93b505
MD5 2b7a5dd12f104ad7827bbef69aa32ac9
BLAKE2b-256 75cc3c5c84223f5568c2f575cfa08e681137d9544536874e921dee76e59cf781
