
An AI coding agent you can actually trust - with built-in impact preview


๐Ÿ›ก๏ธ Safe Agent

Guardrails for AI code agents.

Safe Agent previews every file edit with impact-preview so AI helpers can't quietly ship risky changes. Drop it into CI or run locally and require approvals before writes.

pip install safe-agent-cli
safe-agent "add error handling to api.py" --dry-run

Project Map

  • impact-preview (Agent Polis): the guardrail layer that previews and scores risky actions.
  • safe-agent-cli (this repo): a reference coding agent that uses impact-preview for approvals.
  • Roadmap: staged execution plan in ROADMAP.md.
  • Compatibility Matrix: version contract in docs/compatibility-matrix.md.
  • Monday Packet: current assignment bundle in docs/monday-assignment-packet.md.

The Problem

AI coding agents are powerful but dangerous:

  • Replit Agent deleted a production database
  • Cursor YOLO mode deleted an entire system
  • You can't see what's about to happen until it's too late

The Solution

Safe Agent previews every change before execution:

$ safe-agent "update database config to use production"

📋 Task: update database config to use production

📝 Planned Changes
┌────────┬─────────────────┬─────────────────────────┐
│ Action │ File            │ Description             │
├────────┼─────────────────┼─────────────────────────┤
│ MODIFY │ config/db.yaml  │ Update database URL     │
└────────┴─────────────────┴─────────────────────────┘

Step 1/1

╭─────────────── Impact Preview ───────────────╮
│ Update database URL                          │
│                                              │
│ **File:** `config/db.yaml`                   │
│ **Action:** MODIFY                           │
│ **Risk:** 🔴 CRITICAL                        │
╰──────────────────────────────────────────────╯

Risk Factors:
  ⚠️  Production pattern detected: production
  ⚠️  Database configuration change

Diff:
- url: postgresql://localhost:5432/dev
+ url: postgresql://prod-server:5432/production

⚠️  CRITICAL RISK - Please review carefully!
Apply this change? [y/N]: 
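
The risk factors in the sample output can be pictured as pattern checks over the file path and the lines a diff adds. The sketch below is a toy illustration only: the two patterns, the `assess` function, and the rule that maps the number of factors to a level are all assumptions made for this example, not the rules impact-preview actually applies.

```python
import re

# Hypothetical patterns, chosen only to reproduce the sample output above.
PRODUCTION_PATTERN = re.compile(r"\bproduction\b", re.IGNORECASE)
DB_CONFIG_PATTERN = re.compile(
    r"(^|/)(db|database)[^/]*\.(ya?ml|json|toml|ini)$", re.IGNORECASE
)


def assess(path: str, diff: str) -> tuple[str, list[str]]:
    """Return (risk_level, risk_factors) for one planned file change."""
    factors = []
    # Only lines added by the change ("+" prefix) are inspected.
    added = [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    for line in added:
        match = PRODUCTION_PATTERN.search(line)
        if match:
            factors.append(f"Production pattern detected: {match.group(0)}")
            break
    if DB_CONFIG_PATTERN.search(path):
        factors.append("Database configuration change")
    # Assumed scoring rule: zero factors is LOW, one is HIGH, two or more is CRITICAL.
    levels = ["LOW", "HIGH", "CRITICAL"]
    return levels[min(len(factors), 2)], factors
```

Run against the diff shown above with the path `config/db.yaml`, this toy version reports both factors and a CRITICAL level; a real analyzer would need far richer signals than two regular expressions.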

Installation

pip install safe-agent-cli

Set your Anthropic API key:

export ANTHROPIC_API_KEY=your-key-here

Usage

Basic Usage

# Run a coding task
safe-agent "add input validation to user registration"

# Preview only (no execution)
safe-agent "refactor auth module" --dry-run

# Auto-approve low-risk changes
safe-agent "add docstrings" --auto-approve-low

CI / Non-interactive mode

Use --non-interactive to avoid prompts (auto-approves LOW/MEDIUM, rejects HIGH/CRITICAL). Combine with --fail-on-risk to fail the process if risky changes are proposed:

safe-agent "scan repository for risky config changes" --dry-run --non-interactive --fail-on-risk high
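
The approval and exit-code rules described above can be written down as a small ordering over risk levels. This is a sketch of the documented behavior, not Safe Agent's source: the `Risk` enum, the function names, and the specific exit code `1` are assumptions (the README only says the process exits non-zero).

```python
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def auto_decision(risk: Risk) -> bool:
    """Non-interactive approval: LOW/MEDIUM are approved, HIGH/CRITICAL rejected."""
    return risk <= Risk.MEDIUM


def exit_code(risks: list[Risk], fail_on: Optional[Risk]) -> int:
    """Non-zero when any proposed change meets or exceeds the fail-on level."""
    if fail_on is not None and any(r >= fail_on for r in risks):
        return 1  # assumed value; only "non-zero" is documented
    return 0
```

With `--fail-on-risk high`, a run that proposes one LOW and one CRITICAL change would fail under this model, while a run with only LOW and MEDIUM changes would pass.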

Interactive Mode

safe-agent --interactive

From File

safe-agent --file task.md

How It Works

  1. Plan - Claude analyzes your task and plans file changes
  2. Preview - Each change runs through impact-preview for risk analysis
  3. Approve - You see the diff and risk level before anything executes
  4. Execute - Only approved changes are applied
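
The four steps above can be sketched as a loop over planned changes. Everything here is hypothetical scaffolding for illustration: the `Change` dataclass, `run_pipeline`, and the callback signatures are not Safe Agent's internal API, and the planner, previewer, and writer are passed in as plain functions so the control flow is easy to see.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Change:
    action: str        # e.g. "MODIFY"
    path: str
    description: str
    risk: str = "LOW"  # filled in by the preview step


def run_pipeline(
    planned: Iterable[Change],
    preview: Callable[[Change], str],
    approve: Callable[[Change], bool],
    apply: Callable[[Change], None],
) -> list[Change]:
    """Preview, approve, and execute each planned change; return the applied ones."""
    applied = []
    for change in planned:             # 1. Plan: changes come from the planner
        change.risk = preview(change)  # 2. Preview: attach a risk level
        if not approve(change):        # 3. Approve: decide before anything executes
            continue
        apply(change)                  # 4. Execute: only approved changes are written
        applied.append(change)
    return applied
```

The point of this shape is that `apply` is unreachable for a change that was not approved, which is the guarantee the README describes.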

Options

Flag                 Description
--dry-run            Preview changes without executing
--auto-approve-low   Auto-approve low-risk changes
--non-interactive    Run without prompts (CI-friendly)
--fail-on-risk       Exit non-zero if any change meets/exceeds risk level
--interactive, -i    Interactive mode
--file, -f           Read task from file
--model              Claude model to use (default: claude-sonnet-4-20250514)

MCP Server (For Other AI Agents)

Safe Agent can be used as an MCP server, letting other AI agents delegate coding tasks safely.

# Start the MCP server
safe-agent-mcp

Claude Desktop Integration

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "safe-agent": {
      "command": "safe-agent-mcp"
    }
  }
}
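
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The helper below is a minimal sketch of that merge; `add_mcp_server` is a hypothetical name, and it assumes the file, when present, contains valid JSON.

```python
import json
from pathlib import Path


def add_mcp_server(
    config_path: Path,
    name: str = "safe-agent",
    command: str = "safe-agent-mcp",
) -> dict:
    """Add or update one entry under "mcpServers", preserving everything else."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {"command": command}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On macOS it could be called as `add_mcp_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`; back up the file first if it holds settings you care about.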

Available MCP Tools

Tool                  Description                            Safety
run_coding_task       Execute a coding task with preview     🔴 Destructive
preview_coding_task   Preview changes without executing      🟢 Read-only
get_agent_status      Check agent status and capabilities    🟢 Read-only

Moltbook Integration

Safe Agent is available as a Moltbook skill for AI agent networks.

See moltbook-skill.json for the skill definition.

Demo Producer

Set up a canned risky-edit scenario and print recording commands:

safe-agent-demo prepare  # creates a demo repo with config/db.yaml
cd /tmp/safe-agent-demo-*  # or your chosen path
safe-agent-demo record     # shows asciinema + GIF commands

By default the demo runs safe-agent --dry-run "switch database config to production" against the prepared repo.

Safe Agent demo

For AI Agents

If you're an AI agent wanting to use Safe Agent programmatically:

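import asyncio

from safe_agent import SafeAgent


async def main():
    agent = SafeAgent(
        auto_approve_low_risk=True,  # Skip approval for low-risk changes
        dry_run=False,               # Set True to preview only
    )
    # agent.run() is awaited, so it has to be called from inside a coroutine
    return await agent.run("add error handling to api.py")


result = asyncio.run(main())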

Powered By

  • impact-preview - Impact analysis and diff generation
  • Claude - AI planning and code generation
  • Rich - Beautiful terminal output
  • MCP - Model Context Protocol for agent interoperability

Marketing Helpers

A lightweight CLI to generate headline variants, channel-specific copy (HN, Twitter/X, LinkedIn), and README hero blocks:

safe-agent-marketing generate --audience "Teams running AI code agents in CI" \
  --hypothesis "Guardrail that blocks risky edits" --update-readme

This writes JSON/Markdown bundles to marketing/ and (optionally) refreshes the README hero block. Queue posts with:

safe-agent-marketing queue --slot 2026-02-05T15:00:00Z --slot 2026-02-05T20:00:00Z

Log traction daily:

safe-agent-marketing analytics --repo agent-polis/safe-agent --log experiments/experiments.csv

License

MIT License - see LICENSE for details.


Built by developers who want AI agents they can actually trust.
