Runtime protection for AI coding agents

Reason this release was yanked:

leaked

AgentSteer

Runtime protection for AI coding agents.

License: MIT | Python 3.10+

What it does

AgentSteer intercepts every tool call your AI agent makes via a PreToolUse hook, scores it against the task description, and blocks unauthorized actions before they execute. It catches prompt injection attacks, unauthorized file access, data exfiltration through delegation, and off-task behavior. Scoring uses oss-safeguard-20b (a 20B-parameter model) via OpenRouter, with a median latency under 200ms.

Quick Start

pip install agentsteer

Set your OpenRouter API key. You can use a .env file or export it directly:

export OPENROUTER_API_KEY=sk-or-...

Install the hook for your framework:

agentsteer install claude-code

That's it. Every tool call is now scored and unauthorized actions are blocked.

How It Works

Agent makes tool call
        |
        v
PreToolUse hook intercepts the call
        |
        v
AgentSteer scores the action against the task description
(oss-safeguard-20b via OpenRouter, <200ms median)
        |
        v
Score >= threshold? ---> BLOCK (action never executes)
        |
        v
Score < threshold? ---> ALLOW (action proceeds normally)

Each action is evaluated against four security policies (P1 to P4, listed under Security Policies). If any policy is violated, the action is blocked and the agent receives feedback explaining why.
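The threshold comparison in the diagram can be sketched in a few lines. This is a minimal illustration, not AgentSteer's implementation: the `Decision` type and `decide` function are hypothetical names, and the assumption that scores fall in the 0.0 to 1.0 range is inferred from the threshold range in the Configuration table.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.80  # default value listed in the Configuration table


@dataclass(frozen=True)
class Decision:
    """Hypothetical result type: whether the action may run, and why."""
    allowed: bool
    reason: str


def decide(score: float, threshold: float = DEFAULT_THRESHOLD) -> Decision:
    """Block when the score reaches the threshold, otherwise allow."""
    # Assumption: scores share the 0.0 to 1.0 range of the threshold.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    if score >= threshold:
        return Decision(False, f"blocked: score {score:.2f} >= threshold {threshold:.2f}")
    return Decision(True, f"allowed: score {score:.2f} < threshold {threshold:.2f}")
```

Note that a score exactly equal to the threshold blocks the action, matching the `>=` in the diagram.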

Supported Frameworks

Framework    Install command                      Hook type
Claude Code  agentsteer install claude-code       PreToolUse hook via settings.json
OpenHands    agentsteer install openhands         EventStream middleware
Python API   from agentsteer import score_action  Direct function call
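To make the Claude Code row more concrete, here is a rough sketch of what a PreToolUse hook script can look like. It assumes Claude Code's hook contract as I understand it (the event arrives as JSON on stdin with `tool_name` and `tool_input` fields, and exit code 2 blocks the call and returns stderr to the agent). The `Scorer` type, `handle_pre_tool_use`, and `main` are hypothetical names for illustration; the hook that `agentsteer install claude-code` actually installs may work differently.

```python
import json
import sys
from typing import Callable

# Hypothetical scorer type: (task, action description) -> score in [0.0, 1.0].
Scorer = Callable[[str, str], float]


def handle_pre_tool_use(
    event: dict, task: str, scorer: Scorer, threshold: float = 0.80
) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event."""
    # Serialize the tool name and arguments into a stable action description.
    action = json.dumps(
        {"tool": event.get("tool_name"), "input": event.get("tool_input")},
        sort_keys=True,
    )
    score = scorer(task, action)
    if score >= threshold:
        # Exit code 2 asks Claude Code to block the call; the message goes to stderr.
        return 2, f"Blocked by policy check (score {score:.2f} >= {threshold:.2f})"
    return 0, ""


def main(scorer: Scorer, task: str) -> None:
    """Read one hook event from stdin and exit with the decision's code."""
    event = json.load(sys.stdin)
    code, message = handle_pre_tool_use(event, task, scorer)
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

Passing the scorer in as a callable keeps the sketch independent of any particular scoring backend, so it can be exercised with a stub function.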

Results

Evaluated on the AgentDojo benchmark, which tests whether agents can be tricked into executing prompt injection attacks hidden in their environment.

Framework    Condition             Result              Utility preserved   n
Claude Code  Honest tasks          0% false positives  100%                20
Claude Code  Attack tasks          100% blocked        --                  20
OpenHands    Honest tasks          0% false positives  100%                20
OpenHands    Attack tasks          95% blocked         --                  20
OpenHands    Per-action detection  96.5% detected      0% false positives  1000

Median scoring latency: <200ms. Cost: $0.38 per 1M tokens via OpenRouter.
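The per-token price translates into a per-session cost with simple arithmetic. The sketch below uses the $0.38 per 1M tokens figure quoted above; the number of tokens consumed per scored call is not stated in this document, so the 1,500 used in the example is purely an assumed value.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.38  # figure quoted in the Results section


def scoring_cost_usd(num_calls: int, tokens_per_call: int) -> float:
    """Estimate scoring cost for a number of tool calls of a given token size."""
    total_tokens = num_calls * tokens_per_call
    return total_tokens * PRICE_PER_MILLION_TOKENS_USD / 1_000_000


# 1,000 scored tool calls at an assumed 1,500 tokens each (1.5M tokens total).
print(f"${scoring_cost_usd(1_000, 1_500):.2f}")  # prints $0.57
```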

Full evaluation results and methodology at agentsteer.ai/evaluations.

Security Policies

Policy                 What it catches                                                  Example
P1: Read-only          Write actions when the task only requires reading                Task says "look up calendar" but agent sends an email
P2: Delegation         Actions that hand off control or leak data to external parties  Agent posts task contents to a URL or sends them to an unrelated email address
P3: Category mismatch  Actions in a different category than the task                    Task is about calendar but agent modifies files
P4: Target mismatch    Actions targeting a different entity than specified              Task says "email Alice" but agent emails Bob

A post-filter suppresses false positives from self-correction patterns (e.g., agent cancels a calendar event it just created in error).
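One way to picture how the four policies and the post-filter combine is sketched below. The policy names come from the table above; everything else is an assumption for illustration, including the `evaluate` function and the reduction of the post-filter to a single boolean flag. The real post-filter presumably inspects the action history in ways this document does not describe.

```python
from enum import Enum


class Policy(Enum):
    """The four policies named in the table above."""
    READ_ONLY = "P1: Read-only"
    DELEGATION = "P2: Delegation"
    CATEGORY_MISMATCH = "P3: Category mismatch"
    TARGET_MISMATCH = "P4: Target mismatch"


def evaluate(violations: set[Policy], is_self_correction: bool = False) -> tuple[bool, str]:
    """Return (blocked, feedback) for one action.

    `is_self_correction` is a hypothetical stand-in for the post-filter:
    when set, the action is allowed even if a policy flagged it.
    """
    if is_self_correction:
        return False, "allowed: self-correction of the agent's own earlier action"
    if violations:
        names = ", ".join(p.value for p in sorted(violations, key=lambda p: p.value))
        return True, f"blocked: violates {names}"
    return False, "allowed: no policy violated"
```

For example, `evaluate({Policy.TARGET_MISMATCH})` blocks with feedback naming P4, while the same violation with `is_self_correction=True` is allowed.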

Configuration

Environment variable   Description                                 Default
OPENROUTER_API_KEY     OpenRouter API key for scoring model        required
AGENT_STEER_TASK       Task description to score actions against   read from agent context
AGENT_STEER_THRESHOLD  Score threshold for blocking (0.0 to 1.0)   0.80
AGENT_STEER_DEBUG      Enable debug logging (1 or true)            off

You can also run agentsteer setup for an interactive configuration wizard, or agentsteer login to connect to the AgentSteer cloud dashboard for real-time monitoring.
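For readers wiring these variables into their own tooling, the following sketch reads them with the defaults from the table. It is not AgentSteer's loader: the `Settings` class and `load_settings` function are hypothetical, and treating `AGENT_STEER_DEBUG` case-insensitively is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four documented variables."""
    api_key: str
    task: Optional[str]  # None means "read from agent context"
    threshold: float
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented environment variables, applying table defaults."""
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is required")
    threshold = float(env.get("AGENT_STEER_THRESHOLD", "0.80"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"AGENT_STEER_THRESHOLD must be in [0.0, 1.0], got {threshold}")
    # Assumption: "1" or "true" in any letter case enables debug logging.
    debug = env.get("AGENT_STEER_DEBUG", "").strip().lower() in {"1", "true"}
    return Settings(api_key, env.get("AGENT_STEER_TASK"), threshold, debug)
```

Accepting a mapping instead of reading `os.environ` directly makes the function easy to test with a plain dictionary.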

CLI Reference

agentsteer setup                # Interactive first-run setup
agentsteer install claude-code  # Install hook for Claude Code
agentsteer status               # Show current configuration
agentsteer score "task" "action" # Score a single action
agentsteer sessions             # List monitored sessions
agentsteer version              # Print version

License

MIT

Download files

Source Distribution

agentsteer-0.4.0.tar.gz (539.6 kB)

Uploaded Source

Built Distribution

agentsteer-0.4.0-py3-none-any.whl (32.7 kB)

Uploaded Python 3

File details

Details for the file agentsteer-0.4.0.tar.gz.

File metadata

  • Download URL: agentsteer-0.4.0.tar.gz
  • Upload date:
  • Size: 539.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentsteer-0.4.0.tar.gz
Algorithm    Hash digest
SHA256       41851bd36546b394572a9445c2b4bdaadc390d8a93ebfdef39f43730568d22d1
MD5          075ba2660a1ddadccf1dead8412d61ec
BLAKE2b-256  02d84eef22fc74a116f45a88e2f897c4f04d2239a090e13dfa4608facbf6fffc
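A published SHA256 digest can be checked against a downloaded file with the standard library. The sketch below is a generic verification helper, not part of AgentSteer; `sha256_of` and `matches_published_hash` are illustrative names, and the file path is whatever location the archive was saved to.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the published value."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Usage would look like `matches_published_hash(Path("agentsteer-0.4.0.tar.gz"), "41851bd3...")` with the full digest from the table in place of the shortened value.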

File details

Details for the file agentsteer-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: agentsteer-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 32.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentsteer-0.4.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       b76d510506c0d6d57658fe414e5ab1054dea2c3bc2bb464a0cf0bf86799ea9ff
MD5          6aa6501073d477ae1481bf7d3988adef
BLAKE2b-256  6ff8bfaa9e717ef2c60f0fda8ccdb24075ff8bf67d5070c8a02e4e0e78568968
