Skip to main content

Open-source AI agent guardrails linter — Claude Code (incl. Skills), Codex CLI, OpenClaw, MCP

Project description

xrails

Open-source AI agent guardrails linter. Find dangerous permissions, unsafe agent configs, risky skills, and cross-file attack paths before your AI agent runs.

xrails reads the configuration that drives Claude Code (including Skills), Codex CLI, OpenClaw, and MCP servers, normalizes it into a typed fact graph, evaluates 40 deterministic rules, and reports findings — including composite attack paths where individual settings look fine but combine into something dangerous.

pipx install xrails    # or: pip install xrails
xrails scan .
xrails scan — /repo
Profiles: claude-code · Grade: F (28/100) · Attack paths: 1
─────────────────────────────────────────────────────────────
[CRITICAL] XR-AP-EXFIL-001  High-confidence secret exfiltration path
  Secret-like file detected
    → Agent can read workspace files
    → Network egress enabled (WebFetch unscoped)
    → Approval prompts disabled (bypassPermissions)
    → Potential data exfiltration

[HIGH]  XR-CLAUDE-001  bypassPermissions mode enabled
  .claude/settings.json:2
  Why: skips approvals for nearly every tool action.
  Fix: set defaultMode to "default" or run inside an isolated container.

…
2 critical · 4 high · 1 medium · template-example: 0
  • Deterministic. Same input → same findings. No LLM in the pass/fail loop.
  • Facts-first. Rules query a typed fact graph, never raw config strings.
  • Multi-vendor. Claude Code, Codex CLI, OpenClaw, MCP, and Claude Code Skills (SKILL.md) — one CLI.
  • CI-ready. Stable JSON, SARIF 2.1.0, GitHub Action, suppressions, baselines.
  • Attack paths, not just lints. Source → capability → sink correlation finds the misconfigurations that only matter together.
  • Runs inside Claude Code. Slash command /xrails triages findings using the agent's own LLM — no extra API key needed.

Table of contents


Why this exists

AI coding agents ship with broad permissions by default. A single setting — bypassPermissions, Bash(*), danger-full-access, an unpinned /var/run/docker.sock mount — can let prompt injection from a web page or a README turn into shell execution, secret exfiltration, or destructive git operations.

The risk usually isn't one bad setting. It's the combination: a secret file plus filesystem read plus an outbound channel plus a weak approval policy. Each looks fine alone. Together they're a complete exfiltration path.

xrails is a static analysis tool that finds both atomic misconfigurations and these composite paths, before the agent runs.

Quick start

# Install (pipx is preferred — keeps xrails out of your project venv)
pipx install xrails
# or:
pip install xrails

# Scan the current directory
xrails scan .

# Or aim it at a specific config dir
xrails scan ./examples/vulnerable/claude_code

# JSON for tooling, SARIF for GitHub code scanning
xrails scan . --format json   --output xrails.json
xrails scan . --format sarif  --output xrails.sarif

Try the demo configs that ship with the repo:

git clone https://github.com/xrails-ai/xrails
cd xrails

xrails scan examples/vulnerable/claude_code     # expect findings
xrails scan examples/secure/claude_code         # expect clean
xrails scan examples/vulnerable/attack-path-exfil   # expect XR-AP-EXFIL-001

Example output

A vulnerable Claude Code config:

xrails scan — examples/vulnerable/claude_code
Profiles: claude-code · Grade: F (10/100) · Attack paths: 1

[HIGH]     XR-AP-EXFIL-001  High-confidence secret exfiltration path
[HIGH]     XR-CLAUDE-001    bypassPermissions mode enabled
[HIGH]     XR-CLAUDE-002    Bash(*) allows unrestricted shell execution
[HIGH]     XR-CLAUDE-003    WebFetch allowed without domain scoping
[MEDIUM]   XR-CLAUDE-009    No deny rules for .env or credential files

5 findings (0 critical, 4 high, 1 medium, 0 low) · active-runtime: 5

Same scan as JSON:

xrails scan . --format json | jq '.score, .findings[].rule_id'
{ "numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4, ... }
"XR-AP-EXFIL-001"
"XR-CLAUDE-001"

What xrails detects

40 rules across six profiles. Severity ladder: critical / high / medium / low / info.

Claude Code (10 rules)
Rule Sev What it detects
XR-CLAUDE-001 HIGH bypassPermissions mode enabled
XR-CLAUDE-002 HIGH Bash(*) wildcard with no deny for destructive commands
XR-CLAUDE-003 HIGH WebFetch without domain scoping
XR-CLAUDE-004 MEDIUM additionalDirectories includes home directory
XR-CLAUDE-005 MEDIUM CLAUDE.md instructs the agent to skip permissions
XR-CLAUDE-006 HIGH Command hooks run with full user permissions
XR-CLAUDE-007 HIGH HTTP hooks send to remote URLs
XR-CLAUDE-008 MEDIUM allowedHttpHookUrls unset while hooks enabled
XR-CLAUDE-009 MEDIUM No deny rules for .env or credential files
XR-CLAUDE-010 MEDIUM auto mode without managed org constraints
Codex (8 rules)
Rule Sev What it detects
XR-CODEX-001 HIGH sandbox_mode = "danger-full-access"
XR-CODEX-002 HIGH approval_policy = "never" in writable sandbox
XR-CODEX-003 HIGH Network access enabled in workspace-write sandbox
XR-CODEX-004 MEDIUM Project config present but possibly untrusted
XR-CODEX-005 HIGH MCP servers without enterprise allowlist
XR-CODEX-006 HIGH MCP tool approval overrides are permissive
XR-CODEX-007 MEDIUM Shell environment policy too permissive
XR-CODEX-008 MEDIUM otel.log_user_prompt = true
OpenClaw (7 rules)
Rule Sev What it detects
XR-OPENCLAW-001 HIGH Sandboxing disabled — tools run on host
XR-OPENCLAW-002 HIGH Workspace treated as boundary but sandbox off
XR-OPENCLAW-003 HIGH Docker bind mount without :ro
XR-OPENCLAW-004 CRITICAL /var/run/docker.sock bind mount
XR-OPENCLAW-005 MEDIUM Broad tool groups in allow list
XR-OPENCLAW-006 MEDIUM sandbox.mode = non-main
XR-OPENCLAW-007 MEDIUM .env with secrets in agent workspace
MCP (2 rules)
Rule Sev What it detects
XR-MCP-001 HIGH MCP server + weak approvals + network egress
XR-MCP-002 MEDIUM Remote MCP transport without identity pinning
Skills (5 rules + 1 composite)
Rule Sev What it detects
XR-SKILL-001 HIGH SKILL.md allowed-tools includes bare wildcards (Bash, Read, WebFetch)
XR-SKILL-002 HIGH Skill body contains shell-exfil patterns (curl --data, nc <host>, base64 | curl)
XR-SKILL-003 MEDIUM Skill body references credential paths (~/.aws, ~/.ssh, .env)
XR-SKILL-004 MEDIUM Skill body uses approval-bypass language ("skip permissions", "dangerously")
XR-SKILL-005 MEDIUM Third-party skill loaded from a plugin / vendored tree without pinning
XR-AP-SKILL-EXFIL-001 HIGH Composite: one or more skills read secrets AND one or more emit network traffic
Composite attack paths (7 rules)
Rule Sev What it detects
XR-AP-EXFIL-001 HIGH Secret + read + egress + bypass = exfiltration path
XR-AP-DESTRUCT-001 HIGH Shell + bypass = destructive git ops
XR-AP-DRIFT-001 MEDIUM Effective policy ≠ intended config
XR-AP-HOOK-EXFIL-001 HIGH Hooks + unrestricted URLs + secrets
XR-AP-MCP-SUPPLYCHAIN-001 HIGH MCP auto-install without version pinning
XR-AP-ADDDIR-BYPASS-001 HIGH bypassPermissions + home dir in additionalDirectories
XR-AP-SANDBOX-DRIFT-001 HIGH Sandbox configured but not effective

xrails rules list browses every installed rule with profile and severity filters; xrails rules validate checks the YAML catalog against the schema.

Supported platforms

Agent Files xrails reads Notes
Claude Code .claude/settings.json, .claude/settings.local.json, .mcp.json, CLAUDE.md, .claude/CLAUDE.md Permission rules, hooks, MCP servers, instruction files
Codex CLI .codex/config.toml Sandbox modes, approvals, MCP, telemetry
OpenClaw openclaw.json Sandbox, Docker binds, tool policy
MCP Servers declared inside any of the above Remote/local transport, approval overrides
Skills .claude/skills/*/SKILL.md, plugin/vendored skills Frontmatter allowed-tools + body heuristics (shell exfil, sensitive paths, bypass language)
All .env, .env.* Secret-like keys for attack-path correlation

Findings inside examples/, fixtures/, docs/, samples/, etc. are classified as template-example and scored at a discount, so a repo that ships demo configs won't produce misleading grades.

Skill scanning

xrails reads every SKILL.md under .claude/skills/, plugin trees, and vendored skill directories. Six rules detect:

  • broad allowed-tools (bare Bash, Read, WebFetch)
  • shell exfil patterns in the body (curl --data, nc <host>, base64 | curl)
  • references to credential paths (~/.aws, ~/.ssh, .env)
  • approval-bypass language ("skip permissions", "dangerously")
  • third-party skills loaded without version pinning
  • a composite path: skill reads secrets ∧ skill emits network traffic
xrails scan examples/vulnerable/risky-skill

Output (abridged):

[HIGH]   XR-AP-SKILL-EXFIL-001  Skill exfiltration path
[HIGH]   XR-SKILL-001           Skill declares broad allowed-tools
[HIGH]   XR-SKILL-002           Skill body contains shell exfil patterns
[MEDIUM] XR-SKILL-003           Skill body references sensitive paths
[MEDIUM] XR-SKILL-004           Skill body uses approval-bypass language

LLM advisory review (opt-in)

Run an LLM advisory pass against a SKILL.md or a directory of them:

# Default offline — no API key required
xrails review-skill ./.claude/skills

# Use a real provider
export OPENAI_API_KEY=sk-...
xrails review-skill ./.claude/skills --llm-provider openai

# Or Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
xrails review-skill ./.claude/skills --llm-provider anthropic

This output is advisory, not a finding. It does not affect exit code, score, or SARIF, and is clearly labelled in the terminal. Skill body is redacted (secret-like keys, long opaque tokens) before any prompt is assembled. The deterministic scanner (xrails scan) remains the source of truth in CI.

Run inside Claude Code (no extra API key)

xrails ships a slash command and an auto-invoke skill for Claude Code:

xrails install-agent --scope project   # writes into ./.claude/
# or:
xrails install-agent --scope user      # writes into ~/.claude/

Then in Claude Code:

  • Type /xrails to run a triage on the current repo.
  • Or ask Claude "audit my agent setup" — the xrails-audit skill auto-invokes when context fits.

The integration uses Claude's own LLM for triage. xrails provides the deterministic findings (via xrails scan and xrails review-skill JSON output); Claude reasons over them with project context. No separate API-key configuration needed.

The slash command and skill both pin allowed-tools to Bash(xrails *), Bash(which xrails), and Read — Claude cannot run arbitrary shell or write files from this integration.

Privacy and data handling

xrails is built so the deterministic scan runs entirely on your machine.

Surface Default behavior Network calls? Notes
xrails scan Local No Reads files, evaluates rules, writes report.
xrails review-skill echo provider (offline) No Heuristic pseudo-review from the parser.
xrails review-skill --llm-provider openai/anthropic Opt-in Yes Body redacted before prompt.
xrails explain --llm-provider openai/anthropic Opt-in Yes Evidence redacted before prompt.
/xrails slash command in Claude Code Uses host LLM Inherits Claude Code's network policy xrails subprocess itself stays local.

Concrete guarantees, all enforced by tests:

  • xrails scan never imports openai, anthropic, or any HTTP client.
  • .env parsing stores only value_present: bool — raw secret values never reach reporters, logs, JSON, or SARIF.
  • Redaction (strip secret-like keys + long opaque tokens) runs on every byte of evidence before any LLM provider receives a prompt.
  • ReviewSuggestion is a separate type from Finding — it cannot enter exit code, score, SARIF, or --fail-on decisions.
  • Provider SDKs are imported lazily inside _get_client() and only when a non-echo provider is explicitly selected.

If any of these claims look wrong, please open a security issue per SECURITY.md.

Why facts, not regex

Most "AI security" linters are regex over config files. That breaks fast: JSON keys reorder, TOML tables nest, YAML aliases hide things. Worse, regex can't see combinations.

xrails parses each config file with a real parser (tomllib, json, PyYAML), normalizes the result into a typed FactGraph, and evaluates rules against facts. Two consequences:

  1. Composite findings are first-class. A secret-bearing .env, a filesystem capability, a network capability, and a weak approval policy are four independent facts. The XR-AP-EXFIL rule fires only when all four coexist.
  2. Rules become portable. New parsers add new facts; rules that query network.enabled == true or approval.policy == never work across vendors without rewriting.

The rule format is YAML — most rules are 20 lines and require no Python. See docs/rule-authoring.md.

Attack-path example

examples/vulnerable/attack-path-exfil/ ships a demo with four conditions that are individually plausible but together unsafe:

Source → Capability → Capability → Sink

[.env with API keys]
       ↓
[Read access (no .env deny)]
       ↓
[WebFetch unscoped → outbound HTTP]
       ↓
[bypassPermissions disables prompts]
       ↓
[Network exfiltration sink]
xrails scan examples/vulnerable/attack-path-exfil
# → XR-AP-EXFIL-001 fires + the underlying atomic findings.

GitHub Action

name: xrails
on: [pull_request, push]
jobs:
  xrails:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install xrails
      - run: xrails scan . --format sarif --output xrails.sarif --fail-on high
      - if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: xrails.sarif
          category: xrails

Findings appear in the Security tab of the repository. Full guide and reusable action recipe: docs/github-action.md.

Run inside Claude Code (no extra API key)

xrails ships a slash command and a skill for Claude Code. Install once:

xrails install-agent --scope project   # writes into ./.claude/
# or:
xrails install-agent --scope user      # writes into ~/.claude/

Then in Claude Code:

  • Type /xrails to run a triage on the current repo.
  • Or ask Claude "audit my agent setup" — the xrails-audit skill auto-invokes when context fits.

The integration uses Claude's own LLM for triage. xrails provides the deterministic findings (via xrails scan and xrails review-skill JSON output); Claude reasons over them with project context. No separate API-key configuration needed.

LLM explain mode (optional)

xrails never uses an LLM to decide whether a finding is real. The optional explain command rewrites a single finding's evidence into prose:

xrails scan . --format json --output report.json
xrails explain --report report.json --id XR-CLAUDE-001 --llm-provider echo

echo is the default and runs entirely offline. openai and anthropic providers are loaded lazily and only used when explicitly requested. All evidence is run through a redactor that strips secret-like keys and long opaque tokens before any prompt is assembled.

Suppressions and baselines

Adopt xrails on a legacy repo without being blocked by pre-existing findings:

xrails scan . --format json --output .xrails-baseline.json
xrails scan . --baseline .xrails-baseline.json   # only NEW findings break the build

Suppress one specific finding for one specific path with audit metadata in .xrails-suppressions.yml:

suppressions:
  - rule_id: XR-CLAUDE-009
    path: examples/**
    reason: demo fixture for documentation
    expires: 2026-12-31

Suppressions carry a reason and an expiry. Expired suppressions stop applying automatically — no silently-stale ignore lists.

Authoring rules

A rule is a YAML file:

schema_version: "2.0"
id: XR-CODEX-001
title: sandbox_mode = danger-full-access
category: excessive_privilege
severity: high
confidence: high
profiles: [codex]
facts:
  all:
    - fact: sandbox.mode
      op: eq
      value: danger-full-access
remediation:
  steps:
    - Use sandbox_mode = "workspace-write" with approval_policy = "on-request".
  references:
    - title: Codex — Agent approvals & security
      url: https://developers.openai.com/codex/agent-approvals-security
cwe: [CWE-269 Improper Privilege Management]

Drop it into src/xrails/rules/<vendor>/, add trigger/safe fixtures under tests/fixtures/<vendor>/, run xrails rules validate and xrails rules test. No Python required for atomic rules.

Detailed walkthrough including operators (eq, contains, regex, matches_any, present, absent, gt/gte/lt/lte, in/not_in) and the fact catalog: docs/rule-authoring.md.

How is this different from AgentShield?

AgentShield is the closest existing tool and it's good — we share the underlying motivation that AI agent misconfigurations need real auditing. The differences are scope and architecture:

AgentShield xrails
Primary scope Claude Code Claude Code (incl. Skills), Codex CLI, OpenClaw, MCP
Engine model Pattern + scoring Typed fact graph + DSL evaluator
Composite findings Implicit via scoring First-class attack-path rules over a correlation graph
Rule format Python YAML DSL (Python only for graph rules)
LLM role None Optional, opt-in: xrails explain, xrails review-skill
Claude Code integration /xrails slash command + xrails-audit skill
CI artifacts JSON / SARIF JSON, SARIF, GitHub Action, baseline, suppressions

We respect AgentShield's prior art — runtimeConfidence and the discovery-skip patterns are clearly inspired by it.

Roadmap

  • v0.3 (current, beta) — Skill scanning (SKILL.md), LLM advisory review (xrails review-skill), Claude Code slash command + skill (/xrails, xrails-audit), 40 rules, 312 tests.
  • v0.4 — additional vendors (Cline, Cursor agents, Aider), expanded MCP identity rules, OPA/Conftest export for org-level policy packs.
  • v0.5 — VS Code extension surfacing findings inline; public benchmark report pipeline; rule packs (xrails-rules-enterprise).

Responsible disclosure

Security issues: see SECURITY.md. Public benchmark scans follow the rules in benchmarks/README.md — no raw secrets, no live exploit chains, no naming projects with critical live exposures before private outreach.

License

Apache 2.0.

xrails is part of GuardrailsAI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xrails-0.3.0b2.tar.gz (240.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xrails-0.3.0b2-py3-none-any.whl (140.4 kB view details)

Uploaded Python 3

File details

Details for the file xrails-0.3.0b2.tar.gz.

File metadata

  • Download URL: xrails-0.3.0b2.tar.gz
  • Upload date:
  • Size: 240.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xrails-0.3.0b2.tar.gz
Algorithm Hash digest
SHA256 4fc1e6e834cd5b706f7c06e2bd3402772b8676e24ae9b5b7f45a2bb78003e108
MD5 fa0054a89a1e78aa30f2ea117e067ab0
BLAKE2b-256 58655baba33294485cb8492a71d61ec9076cd5abc06b5d9cea09f78b3a44f094

See more details on using hashes here.

File details

Details for the file xrails-0.3.0b2-py3-none-any.whl.

File metadata

  • Download URL: xrails-0.3.0b2-py3-none-any.whl
  • Upload date:
  • Size: 140.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xrails-0.3.0b2-py3-none-any.whl
Algorithm Hash digest
SHA256 27d606ec6f1c0961c23ca1956c9233dd67c5a860936f141a0100c68786032254
MD5 495be445e68f143e64e003e6165240c6
BLAKE2b-256 0b29e693d40193d0a1245c3e015fe280fc3d8d4e3c7a2e14474641667557e17e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page