An AI coding agent you can actually trust - with built-in impact preview
🛡️ Safe Agent
Guardrails for AI code agents.
Safe Agent previews every file edit with impact-preview so AI helpers can't quietly ship risky changes. Drop it into CI or run it locally and require approvals before writes.
pip install safe-agent-cli
safe-agent "add error handling to api.py" --dry-run
✨ New in v0.4.1
- PR risk gate workflow - Production-ready GitHub workflow + local composite action for CI adoption
- CI artifacts - --ci-summary-file and --policy-report for reviewer-friendly markdown + machine-readable output
- Weekly analytics summary - safe-agent-marketing weekly-summary for stars/traffic/click deltas
- Preset docs + guidance - Preset quickstarts plus clearer invalid-preset feedback
Project Map
- impact-preview (Agent Polis): the guardrail layer that previews and scores risky actions.
- safe-agent-cli (this repo): a reference coding agent that uses impact-preview for approvals.
- Roadmap: staged execution plan in ROADMAP.md.
- Compatibility Matrix: version contract in docs/compatibility-matrix.md.
- What's New (v0.4.1): release summary and launch copy in docs/whats-new-v0.4.1.md.
- Monday Packet: current assignment bundle in docs/monday-assignment-packet.md.
The Problem
AI coding agents are powerful but dangerous:
- Replit Agent deleted a production database
- Cursor YOLO mode deleted an entire system
- You can't see what's about to happen until it's too late
The Solution
Safe Agent previews every change before execution:
$ safe-agent "update database config to use production"
Task: update database config to use production
Planned Changes
┌────────┬─────────────────┬─────────────────────────┐
│ Action │ File            │ Description             │
├────────┼─────────────────┼─────────────────────────┤
│ MODIFY │ config/db.yaml  │ Update database URL     │
└────────┴─────────────────┴─────────────────────────┘
Step 1/1
╭─────────────── Impact Preview ────────────────╮
│ Update database URL                           │
│                                               │
│ **File:** `config/db.yaml`                    │
│ **Action:** MODIFY                            │
│ **Risk:** 🔴 CRITICAL                         │
│ **Policy:** REQUIRE_APPROVAL [builtin]        │
│ **Scanner:** LOW                              │
╰───────────────────────────────────────────────╯
Risk Factors:
⚠️ Production pattern detected: production
⚠️ Database configuration change
Diff:
- url: postgresql://localhost:5432/dev
+ url: postgresql://prod-server:5432/production
⚠️ CRITICAL RISK - Please review carefully!
Apply this change? [y/N]:
Installation
pip install safe-agent-cli
Set your Anthropic API key:
export ANTHROPIC_API_KEY=your-key-here
Usage
Basic Usage
# Run a coding task
safe-agent "add input validation to user registration"
# Preview only (no execution)
safe-agent "refactor auth module" --dry-run
# Auto-approve low-risk changes
safe-agent "add docstrings" --auto-approve-low
CI / Non-interactive mode
Use --non-interactive to avoid prompts: changes the policy allows are auto-approved, and anything that requires approval is skipped. Combine with --fail-on-risk to fail the process if risky changes are proposed:
safe-agent "scan repository for risky config changes" --dry-run --non-interactive --fail-on-risk high
For CI artifacts, also emit a markdown summary and machine-readable report:
safe-agent "scan repository for risky config changes" \
--dry-run \
--non-interactive \
--fail-on-risk high \
--ci-summary-file .safe-agent-ci/summary.md \
--policy-report .safe-agent-ci/policy-report.json
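The --fail-on-risk rule ("exit non-zero if any change meets or exceeds the risk level") is a threshold comparison over ordered risk levels. A minimal sketch, assuming the four levels used throughout this README (LOW, MEDIUM, HIGH, CRITICAL) and an exit code of 1 on failure; Risk, gate_exit_code and parse_level are hypothetical names, and Safe Agent's real internals and exit codes may differ:

```python
from enum import IntEnum


class Risk(IntEnum):
    """Risk levels ordered so that LOW < MEDIUM < HIGH < CRITICAL."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def parse_level(text: str) -> Risk:
    """Map a CLI value such as "high" onto a Risk member."""
    return Risk[text.upper()]


def gate_exit_code(proposed: list[Risk], fail_on: Risk) -> int:
    """Return 1 if any proposed change meets or exceeds fail_on, else 0 (assumed codes)."""
    return 1 if any(risk >= fail_on for risk in proposed) else 0
```

With --fail-on-risk high, a run that proposes one LOW and one CRITICAL change fails: gate_exit_code([Risk.LOW, Risk.CRITICAL], parse_level("high")) returns 1. Using IntEnum keeps the "meets or exceeds" check a plain >= comparison.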
Policy (allow/deny/require approval)
By default Safe Agent enforces a built-in policy that:
- denies obvious secret/key targets (e.g. .env, .ssh, .pem)
- allows LOW/MEDIUM risk actions
- requires approval for HIGH/CRITICAL risk actions
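The three default rules can be read as an ordered decision function. The sketch below assumes deny rules are checked first, so a LOW-risk edit to .env is still denied. The path matcher is a simplified, hypothetical stand-in for the built-in secret patterns, and Decision/decide are illustrative names rather than Safe Agent's API:

```python
from enum import Enum
from pathlib import PurePosixPath


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


def is_secret_target(path: str) -> bool:
    """Simplified matcher: .env files, anything under .ssh/, and *.pem keys."""
    p = PurePosixPath(path)
    return p.name == ".env" or ".ssh" in p.parts or p.suffix == ".pem"


def decide(path: str, risk: str) -> Decision:
    """Apply the default rules in order: deny secrets, allow LOW/MEDIUM, else ask."""
    if is_secret_target(path):
        return Decision.DENY
    if risk in ("LOW", "MEDIUM"):
        return Decision.ALLOW
    return Decision.REQUIRE_APPROVAL
```

For the walkthrough above, decide("config/db.yaml", "CRITICAL") yields REQUIRE_APPROVAL, which matches the "Policy: REQUIRE_APPROVAL [builtin]" line in the preview panel.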
Override with a bundled preset:
safe-agent --list-policy-presets
safe-agent "update auth flow" --policy-preset fintech
Preset guide:
| Preset | Best for | Tradeoff |
|---|---|---|
| startup | Fast-moving product teams | Balanced safety; fewer automatic blocks |
| fintech | Regulated or security-sensitive repos | Slower flow due to stricter approvals |
| games | Content/asset-heavy iteration | More permissive for rapid iteration |
CI quickstarts (one per preset):
# Startup (balanced)
safe-agent "scan repo for risky config edits" \
--dry-run --non-interactive --policy-preset startup \
--ci-summary-file .safe-agent-ci/startup-summary.md \
--policy-report .safe-agent-ci/startup-policy-report.json
# Fintech (strict)
safe-agent "scan repo for risky config edits" \
--dry-run --non-interactive --policy-preset fintech --fail-on-risk high \
--ci-summary-file .safe-agent-ci/fintech-summary.md \
--policy-report .safe-agent-ci/fintech-policy-report.json
# Games (iterative)
safe-agent "scan repo for risky config edits" \
--dry-run --non-interactive --policy-preset games \
--ci-summary-file .safe-agent-ci/games-summary.md \
--policy-report .safe-agent-ci/games-policy-report.json
See docs/policy-presets.md for detailed guidance.
Or load a policy file (JSON/YAML):
safe-agent "update auth flow" --policy ./policy.json
Interactive Mode
safe-agent --interactive
From File
safe-agent --file task.md
How It Works
- Plan - Claude analyzes your task and plans file changes
- Preview - Each change runs through impact-preview for risk analysis
- Approve - You see the diff and risk level before anything executes
- Execute - Only approved changes are applied
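The four steps amount to a loop in which nothing reaches disk without passing through preview and approval. A minimal sketch of that control flow, assuming the plan has already been produced; PlannedChange and run_plan are hypothetical names, and the approve/apply callbacks stand in for the interactive prompt (or policy decision) and the file writer:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlannedChange:
    """One planned edit; field names are illustrative, not Safe Agent's types."""
    path: str
    description: str
    risk: str  # e.g. "LOW" or "CRITICAL", as scored in the Preview step


def run_plan(
    plan: list[PlannedChange],
    approve: Callable[[PlannedChange], bool],
    apply: Callable[[PlannedChange], None],
    dry_run: bool = False,
) -> list[PlannedChange]:
    """Preview every change, then apply only those the approver accepts."""
    applied: list[PlannedChange] = []
    for change in plan:
        # Preview: the real tool renders a diff and risk panel at this point.
        print(f"[{change.risk}] {change.path}: {change.description}")
        if dry_run:
            continue  # preview only; nothing is executed
        if approve(change):  # Approve: human prompt or policy decision
            apply(change)  # Execute: only approved changes are applied
            applied.append(change)
    return applied
```

Passing approve=lambda c: c.risk == "LOW" roughly mimics --auto-approve-low combined with rejecting everything else, and dry_run=True mirrors --dry-run: every change is previewed and none is applied.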
Enterprise & Compliance Features
Safe Agent now includes features for insurance partnerships, regulatory compliance, and enterprise deployments.
Audit Export for Insurance
Export complete audit trails for insurance underwriting and claims:
safe-agent "update production config" --audit-export audit.json
The audit export includes:
- Complete task history with timestamps
- Risk assessments for all operations
- Approval/rejection records (human oversight)
- Change execution status
- Compliance flags for regulatory requirements
Perfect for working with AI liability insurance carriers like AIUC, Armilla AI, and Beazley.
See docs/insurance-integration.md for details on insurance partnerships and premium rate factors.
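The items listed above map naturally onto one record per operation. The sketch below shows one way such a record could be modeled and serialized. The field names are hypothetical and chosen only to mirror the list; the real audit.json schema is described in docs/insurance-integration.md and may differ:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    """Hypothetical audit record; the real audit.json schema may differ."""
    task: str
    file: str
    risk: str  # risk assessment for the operation
    decision: str  # "approved" or "rejected" (human oversight record)
    executed: bool  # change execution status
    compliance_flags: list[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def export_audit(entries: list[AuditEntry]) -> str:
    """Serialize entries as a JSON array, one object per operation."""
    return json.dumps([asdict(e) for e in entries], indent=2)
```

Recording timestamps in UTC ISO 8601 keeps entries sortable and unambiguous across machines, which matters when a third party reviews the trail.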
EU AI Act Compliance Mode
Enable strict compliance mode for EU AI Act requirements:
safe-agent "modify user data" --compliance-mode --audit-export audit.json
Compliance mode:
- Disables all auto-approve features (Article 14: Human Oversight)
- Requires explicit approval for every operation
- Records all compliance flags in audit exports
- Supports Article 12 (Record-Keeping) requirements
Ready for the August 2, 2026 enforcement deadline.
See docs/eu-ai-act-compliance.md for complete compliance guide and requirements mapping.
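The interaction between --compliance-mode and --auto-approve-low comes down to precedence: compliance mode wins. A simplified sketch of that rule; needs_explicit_approval is a hypothetical helper, and it ignores policy denials, which block an operation regardless of approval:

```python
def needs_explicit_approval(risk: str, auto_approve_low: bool, compliance_mode: bool) -> bool:
    """Decide whether a human must confirm this operation.

    Compliance mode overrides every auto-approve setting (Article 14: Human Oversight).
    """
    if compliance_mode:
        return True
    # Outside compliance mode, only LOW-risk changes may skip the prompt.
    return not (auto_approve_low and risk == "LOW")
```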
Incident Documentation
We document AI agent incidents to raise awareness and demonstrate prevention mechanisms:
- Replit SaaStr Database Deletion - Production database deleted during demo
- Cursor YOLO Mode Bypass - Security controls circumvented
Submit an incident report to help the community.
Options
| Flag | Description |
|---|---|
| --dry-run | Preview changes without executing |
| --auto-approve-low | Auto-approve low-risk changes |
| --non-interactive | Run without prompts (CI-friendly) |
| --fail-on-risk | Exit non-zero if any change meets or exceeds the given risk level |
| --policy | Path to a policy file (JSON/YAML) for deterministic allow/deny/approval |
| --policy-preset | Use a bundled policy preset (startup, fintech, games) |
| --list-policy-presets | List available policy presets and exit |
| --interactive, -i | Interactive mode |
| --file, -f | Read task from file |
| --model | Claude model to use (default: claude-sonnet-4-20250514) |
| --audit-export | Export audit trail to JSON file (insurance/compliance) |
| --compliance-mode | Enable strict compliance mode (disables auto-approve) |
| --ci-summary | Print a concise markdown CI summary block |
| --ci-summary-file | Write CI summary markdown to a file |
| --policy-report | Write machine-readable policy/scanner report JSON |
MCP Server (For Other AI Agents)
Safe Agent can be used as an MCP server, letting other AI agents delegate coding tasks safely.
# Start the MCP server
safe-agent-mcp
Claude Desktop Integration
On macOS, add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"safe-agent": {
"command": "safe-agent-mcp"
}
}
}
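If claude_desktop_config.json already lists other MCP servers, pasting the snippet above over the file would drop them. A minimal standard-library sketch that merges the safe-agent entry instead; add_safe_agent_server is a hypothetical helper, not part of Safe Agent:

```python
import json
from pathlib import Path


def add_safe_agent_server(config_path: Path) -> dict:
    """Register safe-agent-mcp in a Claude Desktop config, keeping existing servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # setdefault creates "mcpServers" only if it is missing
    config.setdefault("mcpServers", {})["safe-agent"] = {"command": "safe-agent-mcp"}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

On macOS, call it with Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", then restart Claude Desktop so it picks up the new server.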
Available MCP Tools
| Tool | Description | Safety |
|---|---|---|
| run_coding_task | Execute a coding task with preview | 🔴 Destructive |
| preview_coding_task | Preview changes without executing | 🟢 Read-only |
| get_agent_status | Check agent status and capabilities | 🟢 Read-only |
Moltbook Integration
Safe Agent is available as a Moltbook skill for AI agent networks.
See moltbook-skill.json for the skill definition.
Demo Producer
Set up a canned risky-edit scenario and print recording commands:
safe-agent-demo prepare # creates a demo repo with config/db.yaml
cd /tmp/safe-agent-demo-* # or your chosen path
safe-agent-demo record # shows asciinema + GIF commands
By default the demo runs safe-agent --dry-run "switch database config to production" against the prepared repo.
GitHub PR Risk Gate
This repo ships a production workflow and a local composite action for PR gating:
- Workflow: .github/workflows/safe-agent-pr-review.yml
- Action: .github/actions/safe-agent-review/action.yml
The workflow runs on PRs (non-forks) and on manual dispatch, then uploads:
- safe-agent-summary.md (human-readable markdown summary)
- policy-report.json (machine-readable report with rule IDs/outcomes)
- safe-agent.log (full run log)
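Because policy-report.json is machine-readable, a later CI step can post-process it, for example to tally rule outcomes. Its exact schema is not documented in this README, so the sketch below assumes a hypothetical {"results": [{"rule_id": ..., "outcome": ...}]} layout purely for illustration; check a real report before reusing these key names:

```python
import json
from collections import Counter


def summarize_report(report_json: str) -> Counter:
    """Count (rule_id, outcome) pairs in a policy report.

    Assumes the hypothetical shape {"results": [{"rule_id": ..., "outcome": ...}]};
    a report without "results" yields an empty Counter.
    """
    report = json.loads(report_json)
    return Counter(
        (item["rule_id"], item["outcome"]) for item in report.get("results", [])
    )
```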
For AI Agents
If you're an AI agent wanting to use Safe Agent programmatically:
import asyncio
from safe_agent import SafeAgent

async def main() -> None:
    agent = SafeAgent(
        auto_approve_low_risk=True,  # Skip approval for low-risk changes
        dry_run=False,  # Set True to preview only
        audit_export_path="audit.json",  # Export audit trail for compliance
        compliance_mode=False,  # Enable for EU AI Act compliance
    )
    # run() is a coroutine, so it must be awaited inside an event loop
    result = await agent.run("add error handling to api.py")
    print(result)

asyncio.run(main())
For insurance and compliance use cases:
# EU AI Act compliant configuration
agent = SafeAgent(
compliance_mode=True, # Strict compliance mode
audit_export_path="audit.json", # Required for Article 12
non_interactive=False, # Human oversight required
)
Powered By
- impact-preview - Impact analysis and diff generation
- Claude - AI planning and code generation
- Rich - Beautiful terminal output
- MCP - Model Context Protocol for agent interoperability
Known Incidents
AI coding agents without proper safeguards have caused real damage. We document these incidents to raise awareness and demonstrate why preview-before-execute architecture matters.
Recent Incidents
- Replit SaaStr Database Deletion (July 2025) - Production database deleted, 1,200+ executives affected
- Cursor YOLO Mode Bypass (July 2025) - Security controls bypassed, arbitrary command execution possible
Submit an Incident
Experienced an AI agent incident? Help the community by submitting an incident report.
Browse all documented incidents in docs/incident-reports/.
Marketing Helpers
A lightweight CLI to generate headline variants, channel-specific copy (HN, Twitter/X, LinkedIn), and README hero blocks:
safe-agent-marketing generate --audience "Teams running AI code agents in CI" \
--hypothesis "Guardrail that blocks risky edits" --update-readme
This writes JSON/Markdown bundles to marketing/ and (optionally) refreshes the README hero block. Queue posts with:
safe-agent-marketing queue --slot 2026-02-05T15:00:00Z --slot 2026-02-05T20:00:00Z
Log traction daily:
safe-agent-marketing analytics --repo agent-polis/safe-agent --log experiments/experiments.csv
Generate a weekly markdown rollup (stars/traffic/click deltas + variant notes):
safe-agent-marketing weekly-summary --log experiments/experiments.csv --out experiments/weekly-summary.md
License
MIT License - see LICENSE for details.
Built by developers who want AI agents they can actually trust.
File details
Details for the file safe_agent_cli-0.4.1.tar.gz.
File metadata
- Download URL: safe_agent_cli-0.4.1.tar.gz
- Size: 250.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 17ef48d7b09d07cf27731069a1b1344174ab66c216656630c2fd2f6c1dbc6d1c |
| MD5 | f0db195630c8ec599a0a9804e9d3bd8c |
| BLAKE2b-256 | 5a1daf9a8f06e2f6795b498a6bba208112406ca3d661463185e5021e706ab20b |
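To check a downloaded archive against the SHA256 digest above, pip hash <file> prints the digest directly. The same check as a small standard-library Python sketch; sha256_of, verify and EXPECTED_SHA256 are illustrative names:

```python
import hashlib
from pathlib import Path

# SHA256 of safe_agent_cli-0.4.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "17ef48d7b09d07cf27731069a1b1344174ab66c216656630c2fd2f6c1dbc6d1c"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

Usage: verify(Path("safe_agent_cli-0.4.1.tar.gz")) should return True for an untampered sdist.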
Provenance
The following attestation bundles were made for safe_agent_cli-0.4.1.tar.gz:
Publisher: release.yml on agent-polis/safe-agent
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: safe_agent_cli-0.4.1.tar.gz
- Subject digest: 17ef48d7b09d07cf27731069a1b1344174ab66c216656630c2fd2f6c1dbc6d1c
- Sigstore transparency entry: 954366202
- Permalink: agent-polis/safe-agent@d704eb1f42c0de01848849e4eeabe78a9aa9fc9e
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/agent-polis
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d704eb1f42c0de01848849e4eeabe78a9aa9fc9e
- Trigger Event: release
File details
Details for the file safe_agent_cli-0.4.1-py3-none-any.whl.
File metadata
- Download URL: safe_agent_cli-0.4.1-py3-none-any.whl
- Size: 33.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f3da517cce669a196017d201c54520d2de0af56f7deff3344f403c28ec159c1 |
| MD5 | a1cb2563f18e47a1f367c41544c23f3e |
| BLAKE2b-256 | ee90c1b77d8cdc5fbc21308868d48beb852b7c359a17085df027b66641c8d56c |
Provenance
The following attestation bundles were made for safe_agent_cli-0.4.1-py3-none-any.whl:
Publisher: release.yml on agent-polis/safe-agent
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: safe_agent_cli-0.4.1-py3-none-any.whl
- Subject digest: 0f3da517cce669a196017d201c54520d2de0af56f7deff3344f403c28ec159c1
- Sigstore transparency entry: 954366206
- Permalink: agent-polis/safe-agent@d704eb1f42c0de01848849e4eeabe78a9aa9fc9e
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/agent-polis
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d704eb1f42c0de01848849e4eeabe78a9aa9fc9e
- Trigger Event: release