An AI coding agent you can actually trust - with built-in impact preview

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

danwaterfield

These details have not been verified by PyPI

Project description

🛡️ Safe Agent

Guardrails for AI code agents.

Safe Agent previews every file edit with impact-preview so AI helpers can’t quietly ship risky changes. Drop it into CI or run locally and require approvals before writes.

pip install safe-agent-cli
safe-agent "add error handling to api.py" --dry-run

✨ New in v0.4.3

🧪 Deterministic adversarial eval - Run --adversarial-suite in CI (no API key) for a reproducible trust signal
📊 Safety scorecard artifact - --safety-scorecard-file captures risk/policy/scanner metrics for reviews
🧱 Hardened CI gate - composite action uses pipefail + fallback artifacts to avoid false-green gates
🎛️ Preset docs + guidance - Preset quickstarts plus clearer invalid-preset feedback

Project Map

impact-preview (Agent Polis): the guardrail layer that previews and scores risky actions.
safe-agent-cli (this repo): a reference coding agent that uses impact-preview for approvals.
Roadmap: staged execution plan in ROADMAP.md.
Compatibility Matrix: version contract in docs/compatibility-matrix.md.
What's New (v0.4.3): release summary in docs/whats-new-v0.4.3.md.
Monday Packet: current assignment bundle in docs/monday-assignment-packet.md.

The Problem

AI coding agents are powerful but dangerous:

Replit Agent deleted a production database
Cursor YOLO mode deleted an entire system
You can't see what's about to happen until it's too late

The Solution

Safe Agent previews every change before execution:

$ safe-agent "update database config to use production"

📋 Task: update database config to use production

📝 Planned Changes
┌────────┬─────────────────┬─────────────────────────┐
│ Action │ File            │ Description             │
├────────┼─────────────────┼─────────────────────────┤
│ MODIFY │ config/db.yaml  │ Update database URL     │
└────────┴─────────────────┴─────────────────────────┘

Step 1/1

╭─────────────── Impact Preview ───────────────╮
│ Update database URL                          │
│                                              │
│ **File:** `config/db.yaml`                   │
│ **Action:** MODIFY                           │
│ **Risk:** 🔴 CRITICAL                        │
│ **Policy:** REQUIRE_APPROVAL [builtin]       │
│ **Scanner:** LOW                             │
╰──────────────────────────────────────────────╯

Risk Factors:
  ⚠️  Production pattern detected: production
  ⚠️  Database configuration change

Diff:
- url: postgresql://localhost:5432/dev
+ url: postgresql://prod-server:5432/production

⚠️  CRITICAL RISK - Please review carefully!
Apply this change? [y/N]:
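
The risk factors in the preview above come from impact-preview. As a rough illustration only, pattern-based detection of this kind could look like the sketch below. The patterns and the `risk_factors` function are hypothetical and are not impact-preview's actual rules.

```python
import re

# Hypothetical patterns for illustration; impact-preview's real rules may differ.
PRODUCTION_PATTERN = re.compile(r"\bproduction\b", re.IGNORECASE)
DATABASE_PATH_PATTERN = re.compile(
    r"(^|/)(db|database)[^/]*\.(ya?ml|json|toml)$", re.IGNORECASE
)


def risk_factors(path: str, added_lines: list[str]) -> list[str]:
    """Return human-readable risk factors for a proposed file change."""
    factors = []
    # Flag the first added line that mentions a production target.
    for line in added_lines:
        match = PRODUCTION_PATTERN.search(line)
        if match:
            factors.append(f"Production pattern detected: {match.group(0)}")
            break
    # Flag edits to files that look like database configuration.
    if DATABASE_PATH_PATTERN.search(path):
        factors.append("Database configuration change")
    return factors
```

Called with `config/db.yaml` and the added line from the diff above, this sketch reports both factors shown in the preview.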

Installation

pip install safe-agent-cli

Set your Anthropic API key:

export ANTHROPIC_API_KEY=your-key-here

Usage

Basic Usage

# Run a coding task
safe-agent "add input validation to user registration"

# Preview only (no execution)
safe-agent "refactor auth module" --dry-run

# Auto-approve low-risk changes
safe-agent "add docstrings" --auto-approve-low

CI / Non-interactive mode

Use --non-interactive to run without prompts: changes the policy allows are auto-approved, and anything that requires approval is skipped. Combine it with --fail-on-risk to exit non-zero when any proposed change meets or exceeds the given risk level:

safe-agent "scan repository for risky config changes" --dry-run --non-interactive --fail-on-risk high
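
The "meets or exceeds" rule behind --fail-on-risk can be pictured as an ordered comparison. The sketch below is an assumption about the semantics, not the CLI's source: the function name is hypothetical, and the exit status 1 stands in for "non-zero".

```python
# Risk levels in ascending order, as named in this README.
RISK_ORDER = ["low", "medium", "high", "critical"]


def exit_code(proposed_risks: list[str], fail_on_risk: str) -> int:
    """Return 1 if any proposed change meets or exceeds the threshold, else 0."""
    threshold = RISK_ORDER.index(fail_on_risk.lower())
    for risk in proposed_risks:
        if RISK_ORDER.index(risk.lower()) >= threshold:
            return 1  # assumed non-zero status; the real CLI may use another value
    return 0
```

With a threshold of `high`, a run that proposes only low and medium changes passes, while a single high or critical change fails the gate.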

For CI artifacts, emit a markdown summary, safety scorecard, and machine-readable report:

safe-agent "scan repository for risky config changes" \
  --dry-run \
  --non-interactive \
  --fail-on-risk high \
  --ci-summary-file .safe-agent-ci/summary.md \
  --safety-scorecard-file .safe-agent-ci/safety-scorecard.md \
  --policy-report .safe-agent-ci/policy-report.json
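
If the gate is driven from a Python script rather than a shell step, the same invocation can be assembled as an argument list. This is a sketch: `build_ci_command` and `run_ci_gate` are hypothetical helper names, and only flags documented in this README are used.

```python
import subprocess
from pathlib import PurePosixPath


def build_ci_command(
    task: str, out_dir: str = ".safe-agent-ci", fail_on: str = "high"
) -> list[str]:
    """Build the safe-agent argv for a dry-run CI gate that writes all three artifacts."""
    out = PurePosixPath(out_dir)
    return [
        "safe-agent", task,
        "--dry-run",
        "--non-interactive",
        "--fail-on-risk", fail_on,
        "--ci-summary-file", str(out / "summary.md"),
        "--safety-scorecard-file", str(out / "safety-scorecard.md"),
        "--policy-report", str(out / "policy-report.json"),
    ]


def run_ci_gate(task: str) -> int:
    """Run safe-agent and return its exit code (requires safe-agent on PATH)."""
    return subprocess.run(build_ci_command(task), check=False).returncode
```

Passing the arguments as a list avoids shell quoting issues when the task text contains spaces or quotes.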

Adversarial Evaluation (Stage 3 trust signal)

Run the built-in adversarial fixture suite and emit markdown/JSON reports:

safe-agent \
  --adversarial-suite docs/adversarial-suite-v1.json \
  --adversarial-markdown-out .safe-agent-ci/adversarial.md \
  --adversarial-json-out .safe-agent-ci/adversarial.json

Policy (allow/deny/require approval)

By default Safe Agent enforces a built-in policy that:

denies obvious secret/key targets (e.g. .env, .ssh, .pem)
allows LOW/MEDIUM risk actions
requires approval for HIGH/CRITICAL risk actions
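
The three default rules can be read as a small decision function. The sketch below is an illustration under assumptions: the `Decision` enum, the `decide` function, and the exact matching of secret targets are hypothetical and may not match the built-in policy's implementation.

```python
from enum import Enum
from pathlib import PurePosixPath


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


# Hypothetical secret markers based on the examples above (.env, .ssh, .pem).
SECRET_NAMES = {".env", ".ssh"}
SECRET_SUFFIXES = {".pem"}


def decide(path: str, risk: str) -> Decision:
    """Apply deny, then require-approval, then allow, in that order."""
    target = PurePosixPath(path)
    # Deny if any path component is a known secret file or directory.
    if any(part in SECRET_NAMES for part in target.parts):
        return Decision.DENY
    # Deny key material identified by file extension.
    if target.suffix in SECRET_SUFFIXES:
        return Decision.DENY
    if risk.upper() in {"HIGH", "CRITICAL"}:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

Checking the deny rules first means a secret target is blocked even when its risk score is low.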

Override with a bundled preset:

safe-agent --list-policy-presets
safe-agent "update auth flow" --policy-preset fintech

Preset guide:

Preset	Best for	Tradeoff
`startup`	Fast-moving product teams	Balanced safety; fewer automatic blocks
`fintech`	Regulated or security-sensitive repos	Slower flow due to stricter approvals
`games`	Content/asset-heavy iteration	More permissive for rapid iteration

CI quickstarts (one per preset):

# Startup (balanced)
safe-agent "scan repo for risky config edits" \
  --dry-run --non-interactive --policy-preset startup \
  --ci-summary-file .safe-agent-ci/startup-summary.md \
  --safety-scorecard-file .safe-agent-ci/startup-safety-scorecard.md \
  --policy-report .safe-agent-ci/startup-policy-report.json

# Fintech (strict)
safe-agent "scan repo for risky config edits" \
  --dry-run --non-interactive --policy-preset fintech --fail-on-risk high \
  --ci-summary-file .safe-agent-ci/fintech-summary.md \
  --safety-scorecard-file .safe-agent-ci/fintech-safety-scorecard.md \
  --policy-report .safe-agent-ci/fintech-policy-report.json

# Games (iterative)
safe-agent "scan repo for risky config edits" \
  --dry-run --non-interactive --policy-preset games \
  --ci-summary-file .safe-agent-ci/games-summary.md \
  --safety-scorecard-file .safe-agent-ci/games-safety-scorecard.md \
  --policy-report .safe-agent-ci/games-policy-report.json

See docs/policy-presets.md for detailed guidance.

Or load a policy file (JSON/YAML):

safe-agent "update auth flow" --policy ./policy.json

Interactive Mode

safe-agent --interactive

From File

safe-agent --file task.md

How It Works

Plan - Claude analyzes your task and plans file changes
Preview - Each change runs through impact-preview for risk analysis
Approve - You see the diff and risk level before anything executes
Execute - Only approved changes are applied
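
The four steps above can be sketched as a loop over planned changes. This is a simplified model, not Safe Agent's code: the `Change` dataclass, `run_pipeline`, and the callback signatures are all hypothetical, and the planning step (done by Claude) is represented by the `planned` list passed in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    action: str          # e.g. "MODIFY"
    path: str
    description: str
    risk: str = "LOW"    # filled in by the preview step


def run_pipeline(
    planned: list[Change],
    preview: Callable[[Change], str],
    approve: Callable[[Change], bool],
    execute: Callable[[Change], None],
) -> list[Change]:
    """Preview, approve, and execute each planned change; return those applied."""
    applied = []
    for change in planned:
        change.risk = preview(change)   # Preview: score the change
        if not approve(change):         # Approve: a human (or policy) decides
            continue                    # rejected changes are never executed
        execute(change)                 # Execute: only approved changes run
        applied.append(change)
    return applied
```

The key property is that `execute` is only reachable after `approve` returns true for that specific change.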

Enterprise & Compliance Features

Safe Agent now includes features for insurance partnerships, regulatory compliance, and enterprise deployments.

Audit Export for Insurance

Export complete audit trails for insurance underwriting and claims:

safe-agent "update production config" --audit-export audit.json

The audit export includes:

Complete task history with timestamps
Risk assessments for all operations
Approval/rejection records (human oversight)
Change execution status
Compliance flags for regulatory requirements

The export is designed for work with AI liability insurance carriers such as AIUC, Armilla AI, and Beazley.

See docs/insurance-integration.md for details on insurance partnerships and premium rate factors.

EU AI Act Compliance Mode

Enable strict compliance mode for EU AI Act requirements:

safe-agent "modify user data" --compliance-mode --audit-export audit.json

Compliance mode:

Disables all auto-approve features (Article 14: Human Oversight)
Requires explicit approval for every operation
Records all compliance flags in audit exports
Supports Article 12 (Record-Keeping) requirements
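
One way to picture how compliance mode interacts with the auto-approve options is as an override applied before a run starts. The sketch below is a hypothetical model: `RunSettings` and `effective_settings` are illustrative names, not part of the Safe Agent API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunSettings:
    auto_approve_low: bool = False
    compliance_mode: bool = False


def effective_settings(requested: RunSettings) -> RunSettings:
    """Under compliance mode, switch off auto-approve regardless of the request."""
    if requested.compliance_mode:
        return replace(requested, auto_approve_low=False)
    return requested
```

In this model, requesting both options together still results in every operation needing explicit approval.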

Ready for the August 2, 2026 enforcement deadline.

See docs/eu-ai-act-compliance.md for complete compliance guide and requirements mapping.

Incident Documentation

We maintain a comprehensive database of AI agent incidents to raise awareness and demonstrate prevention mechanisms:

Replit SaaStr Database Deletion - Production database deleted during demo
Cursor YOLO Mode Bypass - Security controls circumvented

Submit an incident report to help the community.

Options

Flag	Description
`--dry-run`	Preview changes without executing
`--auto-approve-low`	Auto-approve low-risk changes
`--non-interactive`	Run without prompts (CI-friendly)
`--fail-on-risk`	Exit non-zero if any change meets/exceeds risk level
`--policy`	Path to a policy file (JSON/YAML) for deterministic allow/deny/approval
`--policy-preset`	Use a bundled policy preset (startup, fintech, games)
`--list-policy-presets`	List available policy presets and exit
`--adversarial-suite`	Run adversarial fixture suite from JSON and exit
`--adversarial-json-out`	Write adversarial evaluation JSON report
`--adversarial-markdown-out`	Write adversarial evaluation markdown report
`--interactive`, `-i`	Interactive mode
`--file`, `-f`	Read task from file
`--version`	Print installed safe-agent version and exit
`--model`	Claude model to use (default: claude-sonnet-4-20250514)
`--audit-export`	Export audit trail to JSON file (insurance/compliance)
`--compliance-mode`	Enable strict compliance mode (disables auto-approve)
`--ci-summary`	Print a concise markdown CI summary block
`--ci-summary-file`	Write CI summary markdown to a file
`--safety-scorecard`	Print a markdown safety scorecard block
`--safety-scorecard-file`	Write markdown safety scorecard to a file
`--policy-report`	Write machine-readable policy/scanner report JSON

MCP Server (For Other AI Agents)

Safe Agent can be used as an MCP server, letting other AI agents delegate coding tasks safely.

# Start the MCP server
safe-agent-mcp

Claude Desktop Integration

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "safe-agent": {
      "command": "safe-agent-mcp"
    }
  }
}
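
If the config file already lists other MCP servers, the entry can be merged in rather than pasted over the file. The helper below is a sketch using only the standard library; `add_safe_agent_server` is a hypothetical name, and it assumes any existing file contains valid JSON.

```python
import json
from pathlib import Path


def add_safe_agent_server(config_path: Path) -> dict:
    """Add the safe-agent MCP entry to a Claude Desktop config, keeping other entries."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Create the mcpServers section if it is missing, then add or update our entry.
    servers = config.setdefault("mcpServers", {})
    servers["safe-agent"] = {"command": "safe-agent-mcp"}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On macOS the path from this README would be passed as `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"`.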

Available MCP Tools

Tool	Description	Safety
`run_coding_task`	Execute a coding task with preview	🔴 Destructive
`preview_coding_task`	Preview changes without executing	🟢 Read-only
`get_agent_status`	Check agent status and capabilities	🟢 Read-only

Moltbook Integration

Safe Agent is available as a Moltbook skill for AI agent networks.

See moltbook-skill.json for the skill definition.

GitHub PR Risk Gate

This repo ships a production workflow and local composite action for PR gating:

Workflow: .github/workflows/safe-agent-pr-review.yml
Action: .github/actions/safe-agent-review/action.yml

The workflow runs on PRs (non-forks) and manual dispatch, then uploads:

safe-agent-summary.md (human-readable markdown summary)
safety-scorecard.md (risk/policy/scanner metrics for trust reviews)
policy-report.json (machine-readable report with rule IDs/outcomes)
safe-agent.log (full run log)

For AI Agents

If you're an AI agent wanting to use Safe Agent programmatically:

import asyncio

from safe_agent import SafeAgent

agent = SafeAgent(
    auto_approve_low_risk=True,      # Skip approval for low-risk changes
    dry_run=False,                   # Set True to preview only
    audit_export_path="audit.json",  # Export audit trail for compliance
    compliance_mode=False,           # Enable for EU AI Act compliance
)

# agent.run() is awaited, so drive it from an event loop when calling from sync code.
result = asyncio.run(agent.run("add error handling to api.py"))

For insurance and compliance use cases:

# EU AI Act compliant configuration
agent = SafeAgent(
    compliance_mode=True,              # Strict compliance mode
    audit_export_path="audit.json",    # Required for Article 12
    non_interactive=False,             # Human oversight required
)

Powered By

impact-preview - Impact analysis and diff generation
Claude - AI planning and code generation
Rich - Beautiful terminal output
MCP - Model Context Protocol for agent interoperability

Known Incidents

AI coding agents without proper safeguards have caused real damage. We document these incidents to raise awareness and demonstrate why preview-before-execute architecture matters.

Recent Incidents

Replit SaaStr Database Deletion (July 2025) - Production database deleted, 1,200+ executives affected
Cursor YOLO Mode Bypass (July 2025) - Security controls bypassed, arbitrary command execution possible

Submit an Incident

Experienced an AI agent incident? Help the community by submitting an incident report.

Browse all documented incidents in docs/incident-reports/.

License

MIT License - see LICENSE for details.

Built by developers who want AI agents they can actually trust.

Project details

Release history

0.4.4 - Feb 20, 2026
0.4.3 - Feb 18, 2026 (this version)
0.4.2 - Feb 18, 2026
0.4.1 - Feb 16, 2026
0.4.0 - Feb 14, 2026
0.3.0 - Feb 11, 2026
0.2.0 - Feb 3, 2026

Download files

Source Distribution

safe_agent_cli-0.4.3.tar.gz (221.5 kB)

Uploaded Feb 18, 2026 Source

Built Distribution

safe_agent_cli-0.4.3-py3-none-any.whl (24.8 kB)

Uploaded Feb 18, 2026 Python 3

File details

Details for the file safe_agent_cli-0.4.3.tar.gz.

File metadata

Download URL: safe_agent_cli-0.4.3.tar.gz
Upload date: Feb 18, 2026
Size: 221.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for safe_agent_cli-0.4.3.tar.gz
Algorithm	Hash digest
SHA256	`0acee9afdcdd21c296452bdbf5c76dbb4358001e54ba2396ea6fb99d356e10a7`
MD5	`597a38f3e119c8d4161d3b635b7c9c16`
BLAKE2b-256	`923de943efd9ae1bdff5c01578bbdbf15182e5868f93399bfacda9e5cb6ffce1`
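
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib`. The helper names `sha256_of` and `verify` are illustrative; the expected digest is the one from the table for the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for safe_agent_cli-0.4.3.tar.gz.
EXPECTED_SHA256 = "0acee9afdcdd21c296452bdbf5c76dbb4358001e54ba2396ea6fb99d356e10a7"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("safe_agent_cli-0.4.3.tar.gz"))` should return True for an intact download of that file.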

Provenance

The following attestation bundles were made for safe_agent_cli-0.4.3.tar.gz:

Publisher: release.yml on agent-polis/safe-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: safe_agent_cli-0.4.3.tar.gz
- Subject digest: 0acee9afdcdd21c296452bdbf5c76dbb4358001e54ba2396ea6fb99d356e10a7
- Sigstore transparency entry: 962220927
- Sigstore integration time: Feb 18, 2026
Source repository:
- Permalink: agent-polis/safe-agent@b9e6d0f997d31cc08a478090a8860028ac3ff8c6
- Branch / Tag: refs/tags/v0.4.3
- Owner: https://github.com/agent-polis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b9e6d0f997d31cc08a478090a8860028ac3ff8c6
- Trigger Event: release

File details

Details for the file safe_agent_cli-0.4.3-py3-none-any.whl.

File metadata

Download URL: safe_agent_cli-0.4.3-py3-none-any.whl
Upload date: Feb 18, 2026
Size: 24.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for safe_agent_cli-0.4.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eb9e692ae99a98d96189e756891c355d2cb5e44fb118b5e6c4b022d8d468f536`
MD5	`74cb5347c2dc0207c0739ecac3233724`
BLAKE2b-256	`cd60a6dde1a9a578556001fd6e957bb7872e14e189c12ea1d31d970ddd195873`

Provenance

The following attestation bundles were made for safe_agent_cli-0.4.3-py3-none-any.whl:

Publisher: release.yml on agent-polis/safe-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: safe_agent_cli-0.4.3-py3-none-any.whl
- Subject digest: eb9e692ae99a98d96189e756891c355d2cb5e44fb118b5e6c4b022d8d468f536
- Sigstore transparency entry: 962220934
- Sigstore integration time: Feb 18, 2026
Source repository:
- Permalink: agent-polis/safe-agent@b9e6d0f997d31cc08a478090a8860028ac3ff8c6
- Branch / Tag: refs/tags/v0.4.3
- Owner: https://github.com/agent-polis
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b9e6d0f997d31cc08a478090a8860028ac3ff8c6
- Trigger Event: release

safe-agent-cli 0.4.3
