Open-source AI agent guardrails linter — Claude Code, Codex CLI, OpenClaw, MCP

xrails

Open-source AI agent guardrails linter. Find dangerous permissions, unsafe agent configs, and cross-file attack paths before your AI agent runs.

xrails reads the configuration that drives Claude Code, Codex CLI, OpenClaw, and MCP servers, normalizes it into a typed fact graph, evaluates 34 deterministic rules, and reports findings — including composite attack paths where individual settings look fine but combine into something dangerous.

pipx install xrails    # or: pip install xrails
xrails scan .

xrails scan — /repo
Profiles: claude-code · Grade: F (28/100) · Attack paths: 1
─────────────────────────────────────────────────────────────
[CRITICAL] XR-AP-EXFIL-001  High-confidence secret exfiltration path
  Secret-like file detected
    → Agent can read workspace files
    → Network egress enabled (WebFetch unscoped)
    → Approval prompts disabled (bypassPermissions)
    → Potential data exfiltration

[HIGH]  XR-CLAUDE-001  bypassPermissions mode enabled
  .claude/settings.json:2
  Why: skips approvals for nearly every tool action.
  Fix: set defaultMode to "default" or run inside an isolated container.

…
2 critical · 4 high · 1 medium · template-example: 0

Deterministic. Same input → same findings. No LLM in the pass/fail loop.
Facts-first. Rules query a typed fact graph, never raw config strings.
Multi-vendor. Claude Code, Codex CLI, OpenClaw, MCP — one CLI.
CI-ready. Stable JSON, SARIF 2.1.0, GitHub Action, suppressions, baselines.
Attack paths, not just lints. Source → capability → sink correlation finds the misconfigurations that only matter together.

Why this exists
Quick start
Example output
What xrails detects
Supported platforms
Why facts, not regex
Attack-path example
GitHub Action
LLM explain mode (optional)
Suppressions and baselines
Authoring rules
How is this different from AgentShield?
Roadmap
Responsible disclosure
License

Why this exists

AI coding agents ship with broad permissions by default. A single setting (bypassPermissions, Bash(*), danger-full-access, or a /var/run/docker.sock bind mount) can let prompt injection from a web page or a README turn into shell execution, secret exfiltration, or destructive git operations.

The risk usually isn't one bad setting. It's the combination: a secret file plus filesystem read plus an outbound channel plus a weak approval policy. Each looks fine alone. Together they're a complete exfiltration path.

xrails is a static analysis tool that finds both atomic misconfigurations and these composite paths, before the agent runs.

Quick start

# Install (pipx is preferred — keeps xrails out of your project venv)
pipx install xrails
# or:
pip install xrails

# Scan the current directory
xrails scan .

# Or aim it at a specific config dir
xrails scan ./examples/vulnerable/claude_code

# JSON for tooling, SARIF for GitHub code scanning
xrails scan . --format json   --output xrails.json
xrails scan . --format sarif  --output xrails.sarif

Try the demo configs that ship with the repo:

git clone https://github.com/guardrailsai/xrails
cd xrails

xrails scan examples/vulnerable/claude_code     # expect findings
xrails scan examples/secure/claude_code         # expect clean
xrails scan examples/vulnerable/attack-path-exfil   # expect XR-AP-EXFIL-001

Example output

A vulnerable Claude Code config:

xrails scan — examples/vulnerable/claude_code
Profiles: claude-code · Grade: F (10/100) · Attack paths: 1

[HIGH]     XR-AP-EXFIL-001  High-confidence secret exfiltration path
[HIGH]     XR-CLAUDE-001    bypassPermissions mode enabled
[HIGH]     XR-CLAUDE-002    Bash(*) allows unrestricted shell execution
[HIGH]     XR-CLAUDE-003    WebFetch allowed without domain scoping
[MEDIUM]   XR-CLAUDE-009    No deny rules for .env or credential files

5 findings (0 critical, 4 high, 1 medium, 0 low) · active-runtime: 5

Same scan as JSON:

xrails scan . --format json | jq '.score, .findings[].rule_id'

{ "numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4, ... }
"XR-AP-EXFIL-001"
"XR-CLAUDE-001"
…
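
For scripting beyond jq, the same report can be read from Python. The sketch below relies only on the fields visible in the jq output above (score.numeric, score.grade, findings[].rule_id). The inline sample report and the helper name summarize are illustrative and not part of xrails.

```python
import json

# Illustrative sample shaped like the jq output above; real reports carry more fields.
SAMPLE_REPORT = """
{
  "score": {"numeric": 10, "grade": "F", "critical_count": 0, "high_count": 4},
  "findings": [
    {"rule_id": "XR-AP-EXFIL-001"},
    {"rule_id": "XR-CLAUDE-001"}
  ]
}
"""


def summarize(report_text: str) -> tuple[str, list[str]]:
    """Return a one-line score summary and the rule IDs in report order."""
    report = json.loads(report_text)
    score = report["score"]
    headline = f"Grade {score['grade']} ({score['numeric']}/100)"
    rule_ids = [finding["rule_id"] for finding in report["findings"]]
    return headline, rule_ids


headline, rule_ids = summarize(SAMPLE_REPORT)
print(headline)
print(rule_ids)
```

In practice report_text would come from the file written by --output xrails.json.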

What xrails detects

34 rules across five profiles. Severity ladder: critical / high / medium / low / info.

Claude Code (10 rules)

Rule	Sev	What it detects
XR-CLAUDE-001	HIGH	`bypassPermissions` mode enabled
XR-CLAUDE-002	HIGH	`Bash(*)` wildcard with no deny for destructive commands
XR-CLAUDE-003	HIGH	`WebFetch` without domain scoping
XR-CLAUDE-004	MEDIUM	`additionalDirectories` includes home directory
XR-CLAUDE-005	MEDIUM	`CLAUDE.md` instructs the agent to skip permissions
XR-CLAUDE-006	HIGH	Command hooks run with full user permissions
XR-CLAUDE-007	HIGH	HTTP hooks send to remote URLs
XR-CLAUDE-008	MEDIUM	`allowedHttpHookUrls` unset while hooks enabled
XR-CLAUDE-009	MEDIUM	No deny rules for `.env` or credential files
XR-CLAUDE-010	MEDIUM	`auto` mode without managed org constraints

Codex (8 rules)

Rule	Sev	What it detects
XR-CODEX-001	HIGH	`sandbox_mode = "danger-full-access"`
XR-CODEX-002	HIGH	`approval_policy = "never"` in writable sandbox
XR-CODEX-003	HIGH	Network access enabled in workspace-write sandbox
XR-CODEX-004	MEDIUM	Project config present but possibly untrusted
XR-CODEX-005	HIGH	MCP servers without enterprise allowlist
XR-CODEX-006	HIGH	MCP tool approval overrides are permissive
XR-CODEX-007	MEDIUM	Shell environment policy too permissive
XR-CODEX-008	MEDIUM	`otel.log_user_prompt = true`

OpenClaw (7 rules)

Rule	Sev	What it detects
XR-OPENCLAW-001	HIGH	Sandboxing disabled — tools run on host
XR-OPENCLAW-002	HIGH	Workspace treated as boundary but sandbox off
XR-OPENCLAW-003	HIGH	Docker bind mount without `:ro`
XR-OPENCLAW-004	CRITICAL	`/var/run/docker.sock` bind mount
XR-OPENCLAW-005	MEDIUM	Broad tool groups in allow list
XR-OPENCLAW-006	MEDIUM	`sandbox.mode = non-main`
XR-OPENCLAW-007	MEDIUM	`.env` with secrets in agent workspace

MCP (2 rules)

Rule	Sev	What it detects
XR-MCP-001	HIGH	MCP server + weak approvals + network egress
XR-MCP-002	MEDIUM	Remote MCP transport without identity pinning

Composite attack paths (7 rules)

Rule	Sev	What it detects
XR-AP-EXFIL-001	HIGH	Secret + read + egress + bypass = exfiltration path
XR-AP-DESTRUCT-001	HIGH	Shell + bypass = destructive git ops
XR-AP-DRIFT-001	MEDIUM	Effective policy ≠ intended config
XR-AP-HOOK-EXFIL-001	HIGH	Hooks + unrestricted URLs + secrets
XR-AP-MCP-SUPPLYCHAIN-001	HIGH	MCP auto-install without version pinning
XR-AP-ADDDIR-BYPASS-001	HIGH	`bypassPermissions` + home dir in `additionalDirectories`
XR-AP-SANDBOX-DRIFT-001	HIGH	Sandbox configured but not effective

xrails rules list browses every installed rule with profile and severity filters; xrails rules validate checks the YAML catalog against the schema.

Supported platforms

Agent	Files xrails reads	Notes
Claude Code	`.claude/settings.json`, `.claude/settings.local.json`, `.mcp.json`, `CLAUDE.md`, `.claude/CLAUDE.md`	Permission rules, hooks, MCP servers, instruction files
Codex CLI	`.codex/config.toml`	Sandbox modes, approvals, MCP, telemetry
OpenClaw	`openclaw.json`	Sandbox, Docker binds, tool policy
MCP	Servers declared inside any of the above	Remote/local transport, approval overrides
All	`.env`, `.env.*`	Secret-like keys for attack-path correlation

Findings inside examples/, fixtures/, docs/, samples/, etc. are classified as template-example and scored at a discount, so a repo that ships demo configs won't produce misleading grades.
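
A minimal sketch of how such a classification could work, assuming paths are taken relative to the scan root and that a finding counts as a template when any parent directory has one of the names listed above. The function name classify and the exact matching logic are assumptions, not the xrails implementation.

```python
from pathlib import PurePosixPath

# Directory names taken from the paragraph above; the real list is longer ("etc.").
TEMPLATE_DIRS = {"examples", "fixtures", "docs", "samples"}


def classify(relative_path: str) -> str:
    """Label a finding's path as 'template-example' or 'active-runtime'."""
    parts = PurePosixPath(relative_path).parts
    # Only directory components count; the file name itself is ignored.
    if any(part in TEMPLATE_DIRS for part in parts[:-1]):
        return "template-example"
    return "active-runtime"


print(classify("examples/vulnerable/claude_code/.claude/settings.json"))
print(classify(".claude/settings.json"))
```

A scorer could then weight template-example findings lower than active-runtime ones; the discount factor itself is not documented here.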

Why facts, not regex

Many "AI security" linters run regular expressions over raw config files. That breaks quickly: JSON keys get reordered, TOML tables nest, and YAML aliases hide values. Worse, a regex that matches one line cannot see combinations across files.

xrails parses each config file with a real parser (tomllib, json, PyYAML), normalizes the result into a typed FactGraph, and evaluates rules against facts. Two consequences:

Composite findings are first-class. A secret-bearing .env, a filesystem capability, a network capability, and a weak approval policy are four independent facts. The XR-AP-EXFIL rule fires only when all four coexist.
Rules become portable. New parsers add new facts; rules that query network.enabled == true or approval.policy == never work across vendors without rewriting.
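
The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the xrails FactGraph API: the Fact and FactGraph classes below are hypothetical, and of the fact names only network.enabled and approval.policy appear in this document (secrets.present and filesystem.read are made up for the example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One normalized observation, independent of the vendor config it came from."""
    key: str       # e.g. "network.enabled"
    value: object
    source: str    # file the fact was derived from


class FactGraph:
    """Hypothetical lookup structure: one fact per key."""

    def __init__(self, facts: list[Fact]) -> None:
        self._by_key = {fact.key: fact for fact in facts}

    def get(self, key: str) -> object:
        fact = self._by_key.get(key)
        return None if fact is None else fact.value


def exfil_path_fires(graph: FactGraph) -> bool:
    """Composite check: fire only when all four facts coexist."""
    return (
        graph.get("secrets.present") is True
        and graph.get("filesystem.read") is True
        and graph.get("network.enabled") is True
        and graph.get("approval.policy") == "never"
    )


risky = FactGraph([
    Fact("secrets.present", True, ".env"),
    Fact("filesystem.read", True, ".claude/settings.json"),
    Fact("network.enabled", True, ".claude/settings.json"),
    Fact("approval.policy", "never", ".claude/settings.json"),
])
scoped = FactGraph([
    Fact("secrets.present", True, ".env"),
    Fact("filesystem.read", True, ".codex/config.toml"),
    Fact("network.enabled", False, ".codex/config.toml"),
    Fact("approval.policy", "never", ".codex/config.toml"),
])
print(exfil_path_fires(risky))
print(exfil_path_fires(scoped))
```

The check never looks at the source file, so the same rule applies whether a parser derived the facts from JSON or TOML.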

The rule format is YAML: most rules are about 20 lines and require no Python. See docs/rule-authoring.md.

Attack-path example

examples/vulnerable/attack-path-exfil/ ships a demo with four conditions that are individually plausible but together unsafe:

Source → Capability → Capability → Approval gap → Sink

[.env with API keys]
       ↓
[Read access (no .env deny)]
       ↓
[WebFetch unscoped → outbound HTTP]
       ↓
[bypassPermissions disables prompts]
       ↓
[Network exfiltration sink]

xrails scan examples/vulnerable/attack-path-exfil
# → XR-AP-EXFIL-001 fires + the underlying atomic findings.

GitHub Action

name: xrails
on: [pull_request, push]
jobs:
  xrails:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install xrails
      - run: xrails scan . --format sarif --output xrails.sarif --fail-on high
      - if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: xrails.sarif
          category: xrails

Findings appear in the Security tab of the repository. Full guide and reusable action recipe: docs/github-action.md.
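
For orientation, here is roughly what a SARIF 2.1.0 document for one finding looks like, built in Python. The SARIF property names (version, runs, tool.driver.name, results, ruleId, level, locations) come from the SARIF specification. The input field names other than rule_id and the severity-to-level mapping are assumptions made for this sketch and may differ from what xrails emits.

```python
import json

# Assumed mapping from xrails severities to SARIF result levels.
LEVELS = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "note",
}


def to_sarif(findings: list[dict]) -> dict:
    """Wrap findings in a minimal SARIF 2.1.0 log with a single run."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["rule_id"],
            "level": LEVELS[finding["severity"]],
            "message": {"text": finding["title"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["path"]},
                    "region": {"startLine": finding["line"]},
                }
            }],
        })
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "xrails"}}, "results": results}],
    }


sarif = to_sarif([{
    "rule_id": "XR-CLAUDE-001",
    "severity": "high",
    "title": "bypassPermissions mode enabled",
    "path": ".claude/settings.json",
    "line": 2,
}])
print(json.dumps(sarif, indent=2))
```

The file path and start line in each result's location are what let code scanning attach an alert to a specific line.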

LLM explain mode (optional)

xrails never uses an LLM to decide whether a finding is real. The optional explain command rewrites a single finding's evidence into prose:

xrails scan . --format json --output report.json
xrails explain --report report.json --id XR-CLAUDE-001 --llm-provider echo

echo is the default and runs entirely offline. openai and anthropic providers are loaded lazily and only used when explicitly requested. All evidence is run through a redactor that strips secret-like keys and long opaque tokens before any prompt is assembled.
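
A redactor of this kind can be sketched as follows. The two patterns and the 32-character threshold are assumptions chosen for illustration; the actual xrails redactor may use different heuristics.

```python
import re

# Hypothetical heuristics: key names that suggest a secret, and long opaque runs.
SECRET_KEY = re.compile(r"(key|token|secret|password)", re.IGNORECASE)
OPAQUE_TOKEN = re.compile(r"[A-Za-z0-9_\-]{32,}")


def redact(evidence: dict[str, str]) -> dict[str, str]:
    """Return a copy of the evidence that is safe to place in a prompt."""
    clean = {}
    for key, value in evidence.items():
        if SECRET_KEY.search(key):
            # The key name itself looks sensitive: drop the whole value.
            clean[key] = "[REDACTED]"
        else:
            # Otherwise mask only long opaque substrings inside the value.
            clean[key] = OPAQUE_TOKEN.sub("[REDACTED]", value)
    return clean


evidence = {
    "OPENAI_API_KEY": "sk-example",
    "note": "found " + "a" * 40 + " in .env",
    "defaultMode": "bypassPermissions",
}
print(redact(evidence))
```

Short configuration values such as bypassPermissions pass through unchanged, so the explanation stays useful after redaction.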

Suppressions and baselines

Adopt xrails on a legacy repo without being blocked by pre-existing findings:

xrails scan . --format json --output .xrails-baseline.json
xrails scan . --baseline .xrails-baseline.json   # only NEW findings break the build
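
Conceptually, baseline mode is a set difference between two reports. The sketch below assumes a finding is identified by its rule ID and file path; that fingerprint, and the path field name, are assumptions rather than the documented xrails behavior.

```python
def fingerprint(finding: dict) -> tuple[str, str]:
    """Assumed identity of a finding: rule ID plus file path."""
    return (finding["rule_id"], finding.get("path", ""))


def new_findings(current: list[dict], baseline: list[dict]) -> list[dict]:
    """Keep only findings that were not already present in the baseline."""
    known = {fingerprint(finding) for finding in baseline}
    return [finding for finding in current if fingerprint(finding) not in known]


baseline = [{"rule_id": "XR-CLAUDE-009", "path": ".claude/settings.json"}]
current = [
    {"rule_id": "XR-CLAUDE-009", "path": ".claude/settings.json"},
    {"rule_id": "XR-CLAUDE-001", "path": ".claude/settings.json"},
]
fresh = new_findings(current, baseline)
print([finding["rule_id"] for finding in fresh])
```

Leaving line numbers out of the fingerprint keeps old findings from reappearing as new when unrelated edits shift them within a file.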

Suppress a specific rule for a specific path or glob, with audit metadata, in .xrails-suppressions.yml:

suppressions:
  - rule_id: XR-CLAUDE-009
    path: examples/**
    reason: demo fixture for documentation
    expires: 2026-12-31

Suppressions carry a reason and an expiry. Expired suppressions stop applying automatically — no silently-stale ignore lists.
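
The expiry behavior can be sketched like this. Glob matching via fnmatchcase and treating the expiry date as the last valid day are both assumptions; the real matcher may differ. The entry mirrors the YAML example above, already parsed into Python values.

```python
from datetime import date
from fnmatch import fnmatchcase


def is_suppressed(finding: dict, suppressions: list[dict], today: date) -> bool:
    """True if an unexpired suppression matches the finding's rule and path."""
    for entry in suppressions:
        if entry["rule_id"] != finding["rule_id"]:
            continue
        if not fnmatchcase(finding["path"], entry["path"]):
            continue
        if today > entry["expires"]:
            continue  # expired entries are ignored, so the finding resurfaces
        return True
    return False


suppressions = [{
    "rule_id": "XR-CLAUDE-009",
    "path": "examples/**",
    "reason": "demo fixture for documentation",
    "expires": date(2026, 12, 31),
}]
finding = {
    "rule_id": "XR-CLAUDE-009",
    "path": "examples/vulnerable/claude_code/.claude/settings.json",
}
print(is_suppressed(finding, suppressions, date(2026, 6, 1)))
print(is_suppressed(finding, suppressions, date(2027, 1, 1)))
```

Passing today in explicitly keeps the check deterministic and easy to test.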

Authoring rules

A rule is a YAML file:

schema_version: "2.0"
id: XR-CODEX-001
title: sandbox_mode = danger-full-access
category: excessive_privilege
severity: high
confidence: high
profiles: [codex]
facts:
  all:
    - fact: sandbox.mode
      op: eq
      value: danger-full-access
remediation:
  steps:
    - Use sandbox_mode = "workspace-write" with approval_policy = "on-request".
  references:
    - title: Codex — Agent approvals & security
      url: https://developers.openai.com/codex/agent-approvals-security
cwe: [CWE-269 Improper Privilege Management]

Drop it into src/xrails/rules/<vendor>/, add trigger/safe fixtures under tests/fixtures/<vendor>/, run xrails rules validate and xrails rules test. No Python required for atomic rules.

Detailed walkthrough including operators (eq, contains, regex, matches_any, present, absent, gt/gte/lt/lte, in/not_in) and the fact catalog: docs/rule-authoring.md.
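
To make the DSL concrete, here is a toy evaluator for the facts.all block of the rule above. It covers only a subset of the listed operators, and their semantics here (for example, a missing fact being None) are assumptions for illustration; docs/rule-authoring.md remains the reference.

```python
import re

# Subset of the operators listed above, with assumed semantics.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: actual is not None and expected in actual,
    "regex": lambda actual, expected: isinstance(actual, str)
    and re.search(expected, actual) is not None,
    "in": lambda actual, expected: actual in expected,
    "not_in": lambda actual, expected: actual not in expected,
    "present": lambda actual, _expected: actual is not None,
    "absent": lambda actual, _expected: actual is None,
}


def rule_fires(rule: dict, facts: dict) -> bool:
    """True when every condition under facts.all holds for the given facts."""
    for condition in rule["facts"]["all"]:
        actual = facts.get(condition["fact"])  # missing facts read as None
        check = OPERATORS[condition["op"]]
        if not check(actual, condition.get("value")):
            return False
    return True


# The XR-CODEX-001 rule from above, reduced to the fields the evaluator needs.
RULE = {
    "id": "XR-CODEX-001",
    "facts": {
        "all": [{"fact": "sandbox.mode", "op": "eq", "value": "danger-full-access"}]
    },
}
print(rule_fires(RULE, {"sandbox.mode": "danger-full-access"}))
print(rule_fires(RULE, {"sandbox.mode": "workspace-write"}))
```

Because a rule is plain data, adding one needs no Python changes, which fits the claim that atomic rules require no Python.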

How is this different from AgentShield?

AgentShield is the closest existing tool and it's good — we share the underlying motivation that AI agent misconfigurations need real auditing. The differences are scope and architecture:

	AgentShield	xrails
Primary scope	Claude Code	Claude Code, Codex CLI, OpenClaw, MCP
Engine model	Pattern + scoring	Typed fact graph + DSL evaluator
Composite findings	Implicit via scoring	First-class attack-path rules over a correlation graph
Rule format	Python	YAML DSL (Python only for graph rules)
LLM role	None	Optional explain layer, opt-in
CI artifacts	JSON / SARIF	JSON, SARIF, GitHub Action, baseline, suppressions

We respect AgentShield's prior art: xrails's runtimeConfidence and discovery-skip patterns are directly inspired by it.

Roadmap

v0.2 — current. Multi-vendor parity, schema v2, correlation engine, baseline + suppressions, GitHub Action, demo + benchmark scaffolding.
v0.3 — additional vendors (Cline, Cursor agents, Aider), expanded MCP identity rules, OPA/Conftest export for org-level policy packs.
v0.4 — VS Code extension surfacing findings inline; benchmark report pipeline; rule packs (xrails-rules-enterprise).

Responsible disclosure

Security issues: see SECURITY.md. Public benchmark scans follow the rules in benchmarks/README.md — no raw secrets, no live exploit chains, no naming projects with critical live exposures before private outreach.

License

Apache 2.0.

xrails is part of GuardrailsAI.

Project details

Release history

0.3.0 (Apr 29, 2026)
0.3.0b2 (Apr 29, 2026, pre-release)
0.2.0 (Apr 29, 2026, this version)
0.1.0 (Apr 5, 2026)

File details

Details for the file xrails-0.2.0.tar.gz.

File metadata

Download URL: xrails-0.2.0.tar.gz
Upload date: Apr 29, 2026
Size: 205.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xrails-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`444f1ea5daf99e62fb7e86d5c4bc343d89384f99e91a0c2a552ac0b188c33be3`
MD5	`c50028e0467006422824e69fbc4971bc`
BLAKE2b-256	`3b2c4d59bccce5fc699d2cda4ce07cddab9a8f6682bae5b04d4c1fb1f45d1bf7`

File details

Details for the file xrails-0.2.0-py3-none-any.whl.

File metadata

Download URL: xrails-0.2.0-py3-none-any.whl
Upload date: Apr 29, 2026
Size: 112.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xrails-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9b2944385619666fa1ef830e32beadd2e342217d5e482ed8ee237201ab93b505`
MD5	`2b7a5dd12f104ad7827bbef69aa32ac9`
BLAKE2b-256	`75cc3c5c84223f5568c2f575cfa08e681137d9544536874e921dee76e59cf781`
