An AI coding agent you can actually trust - with built-in impact preview

🛡️ Safe Agent

Guardrails for AI code agents.

Safe Agent previews every file edit with impact-preview so AI helpers can't quietly ship risky changes. Drop it into CI or run locally and require approvals before writes.

pip install safe-agent-cli
safe-agent "add error handling to api.py" --dry-run

✨ New in v0.4.4

  • 🔓 API-keyless diff gate - Run safe-agent --diff-gate to analyze Git changes with no LLM/API key
  • 🧷 Fork PR coverage - PR workflow now falls back to diff-gate mode when secrets are unavailable
  • 📊 Same CI artifacts, more contexts - summary/scorecard/policy JSON now work in both task mode and diff mode
  • 🛡️ Input hardening - --diff-ref validation prevents unsafe ref injection patterns
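The input-hardening item above does not say what the --diff-ref validation checks. As a rough illustration only, and not Safe Agent's actual implementation, a validator could allowlist the characters of a ref and reject values that Git might read as an option or a range. The name `is_safe_ref` and the specific rules are assumptions made for this sketch.

```python
import re

# Hypothetical allowlist: must start with a letter or digit, then letters,
# digits, and a few separators that are common in branch and tag names.
_REF_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]*")


def is_safe_ref(ref: str) -> bool:
    """Return True if `ref` looks like a plain Git ref name (illustrative rules)."""
    if not _REF_PATTERN.fullmatch(ref):
        # Rejects empty strings, a leading '-', whitespace, and shell metacharacters.
        return False
    if ".." in ref or ref.endswith(("/", ".", ".lock")):
        # Rejects range syntax and a few endings that Git does not allow in ref names.
        return False
    return True


for candidate in ("origin/main", "--upload-pack=evil", "main; rm -rf /"):
    print(candidate, is_safe_ref(candidate))
```

A strict allowlist like this also rejects revision expressions such as HEAD~1, which may or may not match the real tool's behavior.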

The Problem

AI coding agents are powerful but dangerous:

  • Replit Agent deleted a production database
  • Cursor YOLO mode deleted an entire system
  • You can't see what's about to happen until it's too late

The Solution

Safe Agent previews every change before execution:

$ safe-agent "update database config to use production"

📋 Task: update database config to use production

📝 Planned Changes
┌────────┬─────────────────┬─────────────────────────┐
│ Action │ File            │ Description             │
├────────┼─────────────────┼─────────────────────────┤
│ MODIFY │ config/db.yaml  │ Update database URL     │
└────────┴─────────────────┴─────────────────────────┘

Step 1/1

╭─────────────── Impact Preview ───────────────╮
│ Update database URL                          │
│                                              │
│ **File:** `config/db.yaml`                   │
│ **Action:** MODIFY                           │
│ **Risk:** 🔴 CRITICAL                        │
│ **Policy:** REQUIRE_APPROVAL [builtin]       │
│ **Scanner:** LOW                             │
╰──────────────────────────────────────────────╯

Risk Factors:
  ⚠️  Production pattern detected: production
  ⚠️  Database configuration change

Diff:
- url: postgresql://localhost:5432/dev
+ url: postgresql://prod-server:5432/production

⚠️  CRITICAL RISK - Please review carefully!
Apply this change? [y/N]: 

Installation

pip install safe-agent-cli

Set your Anthropic API key:

export ANTHROPIC_API_KEY=your-key-here

Usage

Basic Usage

# Run a coding task
safe-agent "add input validation to user registration"

# Preview only (no execution)
safe-agent "refactor auth module" --dry-run

# Auto-approve low-risk changes
safe-agent "add docstrings" --auto-approve-low

CI / Non-interactive mode

Use --non-interactive to avoid prompts (auto-approves when policy allows; skips anything requiring approval). Combine with --fail-on-risk to fail the process if risky changes are proposed:

safe-agent "scan repository for risky config changes" --dry-run --non-interactive --fail-on-risk high
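The threshold behavior of --fail-on-risk (fail when any proposed change meets or exceeds the given level) can be sketched as an ordered comparison. This is an illustration under assumptions: the function name `exit_code`, the lowercase level names, and the choice of 1 as the non-zero status are not taken from Safe Agent's source.

```python
# Risk levels from lowest to highest, matching the LOW/MEDIUM/HIGH/CRITICAL
# levels used elsewhere in this README.
RISK_ORDER = ["low", "medium", "high", "critical"]


def exit_code(change_risks: list[str], fail_on: str) -> int:
    """Return 1 if any change is at or above `fail_on`, otherwise 0."""
    threshold = RISK_ORDER.index(fail_on.lower())
    hit = any(RISK_ORDER.index(risk.lower()) >= threshold for risk in change_risks)
    return 1 if hit else 0


print(exit_code(["low", "medium"], "high"))    # no change reaches the threshold
print(exit_code(["low", "critical"], "high"))  # one change exceeds the threshold
```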

Need an API-keyless gate for forks or locked-down CI? Use diff mode:

# Analyze current HEAD + working tree diff, no ANTHROPIC_API_KEY needed
safe-agent --diff-gate --non-interactive --fail-on-risk high

# Analyze diff against a base ref (typical PR gate)
safe-agent --diff-gate --diff-ref origin/main --non-interactive --fail-on-risk high
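If a CI step is written in Python rather than shell, the same diff-gate invocation can be wrapped with subprocess. The sketch below uses only the flags shown above; the helper names `diff_gate_command` and `gate_passed` are hypothetical, and treating exit status 0 as a pass is an assumption based on the --fail-on-risk description.

```python
import subprocess
from typing import Callable, Optional, Sequence


def diff_gate_command(base_ref: Optional[str] = None, fail_on: str = "high") -> list[str]:
    """Build the argv list for an API-keyless diff gate run."""
    cmd = ["safe-agent", "--diff-gate", "--non-interactive", "--fail-on-risk", fail_on]
    if base_ref is not None:
        # Compare against a base ref, as in a typical PR gate.
        cmd += ["--diff-ref", base_ref]
    return cmd


def gate_passed(cmd: Sequence[str], runner: Callable = subprocess.run) -> bool:
    """Run the gate and report whether it exited with status 0."""
    # An argv list (no shell=True) keeps the ref from being interpreted by a shell.
    completed = runner(list(cmd), check=False)
    return completed.returncode == 0


print(diff_gate_command("origin/main"))
```

The `runner` parameter exists only so the wrapper can be exercised without the CLI installed; in CI the default subprocess.run would be used.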

For CI artifacts, emit a markdown summary, safety scorecard, and machine-readable report:

safe-agent "scan repository for risky config changes" \
  --dry-run \
  --non-interactive \
  --fail-on-risk high \
  --ci-summary-file .safe-agent-ci/summary.md \
  --safety-scorecard-file .safe-agent-ci/safety-scorecard.md \
  --policy-report .safe-agent-ci/policy-report.json

Adversarial Evaluation (Stage 3 trust signal)

Run the built-in adversarial fixture suite and emit markdown/JSON reports:

safe-agent \
  --adversarial-suite docs/adversarial-suite-v1.json \
  --adversarial-markdown-out .safe-agent-ci/adversarial.md \
  --adversarial-json-out .safe-agent-ci/adversarial.json

Policy (allow/deny/require approval)

By default Safe Agent enforces a built-in policy that:

  • denies obvious secret/key targets (e.g. .env, .ssh, .pem)
  • allows LOW/MEDIUM risk actions
  • requires approval for HIGH/CRITICAL risk actions
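The three default rules above can be read as a small decision function. The sketch below is an approximation for illustration, not the built-in policy itself: the function name `decide`, the path-matching rules, and the outcome strings ALLOW and DENY are assumptions (REQUIRE_APPROVAL is the label shown in the impact preview).

```python
from pathlib import PurePosixPath


def decide(path: str, risk: str) -> str:
    """Return 'DENY', 'ALLOW', or 'REQUIRE_APPROVAL' for one planned action."""
    parts = PurePosixPath(path).parts
    name = parts[-1] if parts else ""
    # Illustrative secret/key checks based on the examples .env, .ssh, .pem.
    touches_secret = name == ".env" or name.endswith(".pem") or ".ssh" in parts
    if touches_secret:
        return "DENY"
    if risk.upper() in ("LOW", "MEDIUM"):
        return "ALLOW"
    # HIGH and CRITICAL fall through to human approval.
    return "REQUIRE_APPROVAL"


print(decide("config/db.yaml", "CRITICAL"))
print(decide(".env", "LOW"))
```

Checking the deny rule first means a secret target is blocked even when its risk level would otherwise be allowed.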

Override with a bundled preset:

safe-agent --list-policy-presets
safe-agent "update auth flow" --policy-preset fintech

Preset guide:

Preset  | Best for                              | Tradeoff
--------|---------------------------------------|----------------------------------------
startup | Fast-moving product teams             | Balanced safety; fewer automatic blocks
fintech | Regulated or security-sensitive repos | Slower flow due to stricter approvals
games   | Content/asset-heavy iteration         | More permissive for rapid iteration

CI quickstarts (one per preset):

# Startup (balanced)
safe-agent "scan repo for risky config edits" \
  --dry-run --non-interactive --policy-preset startup \
  --ci-summary-file .safe-agent-ci/startup-summary.md \
  --safety-scorecard-file .safe-agent-ci/startup-safety-scorecard.md \
  --policy-report .safe-agent-ci/startup-policy-report.json

# Fintech (strict)
safe-agent "scan repo for risky config edits" \
  --dry-run --non-interactive --policy-preset fintech --fail-on-risk high \
  --ci-summary-file .safe-agent-ci/fintech-summary.md \
  --safety-scorecard-file .safe-agent-ci/fintech-safety-scorecard.md \
  --policy-report .safe-agent-ci/fintech-policy-report.json

# Games (iterative)
safe-agent "scan repo for risky config edits" \
  --dry-run --non-interactive --policy-preset games \
  --ci-summary-file .safe-agent-ci/games-summary.md \
  --safety-scorecard-file .safe-agent-ci/games-safety-scorecard.md \
  --policy-report .safe-agent-ci/games-policy-report.json

See docs/policy-presets.md for detailed guidance.

Or load a policy file (JSON/YAML):

safe-agent "update auth flow" --policy ./policy.json

Interactive Mode

safe-agent --interactive

From File

safe-agent --file task.md

How It Works

  1. Plan - Claude analyzes your task and plans file changes
  2. Preview - Each change runs through impact-preview for risk analysis
  3. Approve - You see the diff and risk level before anything executes
  4. Execute - Only approved changes are applied
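The four steps above amount to a loop in which nothing executes until it has been previewed and approved. The sketch below shows that control flow with stand-in callables; `Change`, `run_pipeline`, and the callback signatures are hypothetical and do not reflect Safe Agent's internal API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Change:
    path: str
    description: str
    risk: str = "LOW"  # overwritten by the preview step


def run_pipeline(
    planned: Iterable[Change],
    preview: Callable[[Change], str],
    approve: Callable[[Change], bool],
    execute: Callable[[Change], None],
) -> list[Change]:
    """Preview every planned change and execute only the approved ones."""
    applied = []
    for change in planned:             # 1. Plan: changes come from a planner
        change.risk = preview(change)  # 2. Preview: attach a risk level
        if approve(change):            # 3. Approve: decision is made before execution
            execute(change)            # 4. Execute: only approved changes run
            applied.append(change)
    return applied


plan = [Change("api.py", "add docstrings"), Change("config/db.yaml", "update database URL")]
applied = run_pipeline(
    plan,
    preview=lambda c: "CRITICAL" if c.path.startswith("config/") else "LOW",
    approve=lambda c: c.risk == "LOW",
    execute=lambda c: None,
)
print([c.path for c in applied])
```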

Enterprise & Compliance Features

Safe Agent now includes features for insurance partnerships, regulatory compliance, and enterprise deployments.

Audit Export for Insurance

Export complete audit trails for insurance underwriting and claims:

safe-agent "update production config" --audit-export audit.json

The audit export includes:

  • Complete task history with timestamps
  • Risk assessments for all operations
  • Approval/rejection records (human oversight)
  • Change execution status
  • Compliance flags for regulatory requirements

The export is intended for use with AI liability insurance carriers such as AIUC, Armilla AI, and Beazley.

See docs/insurance-integration.md for details on insurance partnerships and premium rate factors.

EU AI Act Compliance Mode

Enable strict compliance mode for EU AI Act requirements:

safe-agent "modify user data" --compliance-mode --audit-export audit.json

Compliance mode:

  • Disables all auto-approve features (Article 14: Human Oversight)
  • Requires explicit approval for every operation
  • Records all compliance flags in audit exports
  • Supports Article 12 (Record-Keeping) requirements
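The interaction between compliance mode and auto-approval can be sketched as a single predicate. This is a simplified model under assumptions: the function name is hypothetical, and outside compliance mode it only models the --auto-approve-low case, not the full policy engine.

```python
def needs_human_approval(risk: str, auto_approve_low: bool, compliance_mode: bool) -> bool:
    """Return True if a person must explicitly approve this operation."""
    if compliance_mode:
        # Compliance mode disables every auto-approve path.
        return True
    if auto_approve_low and risk.upper() == "LOW":
        return False
    return True


print(needs_human_approval("LOW", auto_approve_low=True, compliance_mode=False))
print(needs_human_approval("LOW", auto_approve_low=True, compliance_mode=True))
```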

Ready for the August 2, 2026 enforcement deadline.

See docs/eu-ai-act-compliance.md for complete compliance guide and requirements mapping.

Incident Documentation

We maintain a database of AI agent incidents to raise awareness and demonstrate prevention mechanisms.

Submit an incident report to help the community.

Options

Flag                       | Description
---------------------------|------------------------------------------------------------------------
--dry-run                  | Preview changes without executing
--auto-approve-low         | Auto-approve low-risk changes
--non-interactive          | Run without prompts (CI-friendly)
--fail-on-risk             | Exit non-zero if any change meets/exceeds risk level
--policy                   | Path to a policy file (JSON/YAML) for deterministic allow/deny/approval
--policy-preset            | Use a bundled policy preset (startup, fintech, games)
--list-policy-presets      | List available policy presets and exit
--adversarial-suite        | Run adversarial fixture suite from JSON and exit
--adversarial-json-out     | Write adversarial evaluation JSON report
--adversarial-markdown-out | Write adversarial evaluation markdown report
--diff-gate                | Analyze Git diff directly (no LLM / no API key)
--diff-ref                 | Base Git ref used by --diff-gate (for PR comparisons)
--interactive, -i          | Interactive mode
--file, -f                 | Read task from file
--version                  | Print installed safe-agent version and exit
--model                    | Claude model to use (default: claude-sonnet-4-20250514)
--audit-export             | Export audit trail to JSON file (insurance/compliance)
--compliance-mode          | Enable strict compliance mode (disables auto-approve)
--ci-summary               | Print a concise markdown CI summary block
--ci-summary-file          | Write CI summary markdown to a file
--safety-scorecard         | Print a markdown safety scorecard block
--safety-scorecard-file    | Write markdown safety scorecard to a file
--policy-report            | Write machine-readable policy/scanner report JSON
--json-out                 | Write machine-readable run result JSON (status + summary + policy report)

MCP Server (For Other AI Agents)

Safe Agent can be used as an MCP server, letting other AI agents delegate coding tasks safely.

# Start the MCP server
safe-agent-mcp

Claude Desktop Integration

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "safe-agent": {
      "command": "safe-agent-mcp"
    }
  }
}
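Editing the config by hand risks clobbering servers that are already registered. As one possible approach, the entry above can be merged in with a short script; the helper names `with_safe_agent` and `update_config_file` are hypothetical, and the path is the macOS location given above.

```python
import json
from pathlib import Path

# macOS location from the instructions above.
CONFIG_PATH = (
    Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
)


def with_safe_agent(config: dict) -> dict:
    """Return a copy of `config` with the safe-agent MCP server entry added."""
    servers = dict(config.get("mcpServers", {}))  # copy so the input is not mutated
    servers["safe-agent"] = {"command": "safe-agent-mcp"}
    return {**config, "mcpServers": servers}


def update_config_file(path: Path = CONFIG_PATH) -> None:
    """Merge the entry into the JSON file at `path`, creating the file if needed."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_safe_agent(existing), indent=2) + "\n")


print(json.dumps(with_safe_agent({"mcpServers": {"other": {"command": "other-mcp"}}}), indent=2))
```

Calling update_config_file() with no arguments would rewrite the real config file, so the example only prints the merged result.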

Available MCP Tools

Tool                | Description                         | Safety
--------------------|-------------------------------------|---------------
run_coding_task     | Execute a coding task with preview  | 🔴 Destructive
preview_coding_task | Preview changes without executing   | 🟢 Read-only
get_agent_status    | Check agent status and capabilities | 🟢 Read-only

Cursor Plugin (Beta)

This repo now includes a Cursor plugin layout:

  • .cursor-plugin/plugin.json
  • .mcp.json
  • rules/, skills/, commands/, agents/

The plugin is aimed at PR safety workflows (risk preview + policy artifacts) and can be submitted to the Cursor Marketplace.

Moltbook Integration

Safe Agent is available as a Moltbook skill for AI agent networks.

See moltbook-skill.json for the skill definition.

GitHub PR Risk Gate

This repo ships a production workflow and local composite action for PR gating:

  • Workflow: .github/workflows/safe-agent-pr-review.yml
  • Action: .github/actions/safe-agent-review/action.yml

The workflow runs on PRs and manual dispatch, then uploads:

  • safe-agent-summary.md (human-readable markdown summary)
  • safety-scorecard.md (risk/policy/scanner metrics for trust reviews)
  • policy-report.json (machine-readable report with rule IDs/outcomes)
  • run-result.json (machine-readable run status for automation adapters)
  • safe-agent.log (full run log)

If ANTHROPIC_API_KEY is unavailable (for example, fork PRs), the workflow automatically falls back to --diff-gate mode using the PR base ref.
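The fallback described above boils down to choosing CLI arguments based on whether the key is present. The sketch below is a guess at that selection logic, not a copy of the shipped workflow: the function name `pr_gate_args`, the task text (borrowed from the CI example earlier), and the --fail-on-risk high setting are all assumptions.

```python
import os
from typing import Mapping


def pr_gate_args(base_ref: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Pick task mode when an API key is set, otherwise diff-gate mode."""
    common = ["--non-interactive", "--fail-on-risk", "high"]
    if env.get("ANTHROPIC_API_KEY"):
        # Key available: run the LLM-backed task in preview-only mode.
        return ["scan repository for risky config changes", "--dry-run", *common]
    # No key (for example, a fork PR): analyze the Git diff against the PR base ref.
    return ["--diff-gate", "--diff-ref", base_ref, *common]


print(pr_gate_args("origin/main", env={}))
```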

For AI Agents

If you're an AI agent wanting to use Safe Agent programmatically:

import asyncio

from safe_agent import SafeAgent

agent = SafeAgent(
    auto_approve_low_risk=True,      # Skip approval for low-risk changes
    dry_run=False,                   # Set True to preview only
    audit_export_path="audit.json",  # Export audit trail for compliance
    compliance_mode=False,           # Enable for EU AI Act compliance
)

# agent.run() is awaited, so drive it with an event loop when calling from a script
result = asyncio.run(agent.run("add error handling to api.py"))

For insurance and compliance use cases:

# EU AI Act compliant configuration
agent = SafeAgent(
    compliance_mode=True,              # Strict compliance mode
    audit_export_path="audit.json",    # Required for Article 12
    non_interactive=False,             # Human oversight required
)

Powered By

  • impact-preview - Impact analysis and diff generation
  • Claude - AI planning and code generation
  • Rich - Beautiful terminal output
  • MCP - Model Context Protocol for agent interoperability

Known Incidents

AI coding agents without proper safeguards have caused real damage. We document these incidents to raise awareness and demonstrate why preview-before-execute architecture matters.

Submit an Incident

Experienced an AI agent incident? Help the community by submitting an incident report.

Browse all documented incidents in docs/incident-reports/.

License

MIT License - see LICENSE for details.


Built by developers who want AI agents they can actually trust.


