Runtime protection for AI coding agents
AgentSteer
What it does
AgentSteer intercepts every tool call your AI agent makes via a PreToolUse hook, scores it against the task description, and blocks unauthorized actions before they execute. It catches prompt injection attacks, unauthorized file access, data exfiltration through delegation, and off-task behavior. Scoring uses oss-safeguard-20b (a 20B-parameter model) via OpenRouter, with a median latency under 200 ms.
Quick Start
pip install agentsteer
Set your OpenRouter API key. You can use a .env file or export it directly:
export OPENROUTER_API_KEY=sk-or-...
Install the hook for your framework:
agentsteer install claude-code
That's it. Every tool call is now scored and unauthorized actions are blocked.
How It Works
Agent makes tool call
|
v
PreToolUse hook intercepts the call
|
v
AgentSteer scores the action against the task description
(oss-safeguard-20b via OpenRouter, <200ms median)
|
v
Score >= threshold? ---> BLOCK (action never executes)
|
v
Score < threshold? ---> ALLOW (action proceeds normally)
Each action is evaluated against 4 security policies. If any policy is violated, the action is blocked and the agent receives feedback explaining why.
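The decision step in the diagram above can be sketched in a few lines of Python. This is an illustration only, not AgentSteer's implementation: `Decision`, `decide`, and `toy_scorer` are hypothetical names, and the toy scorer is a stand-in for the real scoring model. The 0.80 default mirrors the documented `AGENT_STEER_THRESHOLD` default.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    allowed: bool
    score: float
    reason: str  # feedback that would be returned to the agent


def decide(task: str, action: str,
           scorer: Callable[[str, str], float],
           threshold: float = 0.80) -> Decision:
    """Block when the score reaches the threshold, otherwise allow."""
    score = scorer(task, action)
    if score >= threshold:
        return Decision(False, score,
                        f"blocked: score {score:.2f} >= threshold {threshold:.2f}")
    return Decision(True, score,
                    f"allowed: score {score:.2f} < threshold {threshold:.2f}")


def toy_scorer(task: str, action: str) -> float:
    # Hypothetical stand-in for the scoring model: flags an email being
    # sent when the task never mentions email.
    if "send_email" in action and "email" not in task.lower():
        return 0.95
    return 0.05


print(decide("Look up my calendar", "send_email(to='x@example.com')", toy_scorer))
print(decide("Look up my calendar", "read_calendar()", toy_scorer))
```

Note that a score exactly equal to the threshold blocks the action, matching the `>=` in the diagram.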
Supported Frameworks
| Framework | Install command | Hook type |
|---|---|---|
| Claude Code | agentsteer install claude-code | PreToolUse hook via settings.json |
| OpenHands | agentsteer install openhands | EventStream middleware |
| Python API | from agentsteer import score_action | Direct function call |
Results
Evaluated on the AgentDojo benchmark, which tests whether agents can be tricked into executing prompt injection attacks hidden in their environment.
| Framework | Condition | Attacks blocked or detected | False positives | Utility preserved | n |
|---|---|---|---|---|---|
| Claude Code | Honest tasks | -- | 0% | 100% | 20 |
| Claude Code | Attack tasks | 100% blocked | -- | -- | 20 |
| OpenHands | Honest tasks | -- | 0% | 100% | 20 |
| OpenHands | Attack tasks | 95% blocked | -- | -- | 20 |
| OpenHands | Per-action detection | 96.5% detected | 0% | -- | 1000 |
Median scoring latency: <200ms. Cost: $0.38 per 1M tokens via OpenRouter.
Full evaluation results and methodology at agentsteer.ai/evaluations.
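To get a feel for what the quoted price means in practice, here is a rough back-of-the-envelope estimate. Only the $0.38 per 1M tokens figure comes from the results above; the tokens-per-call and calls-per-session numbers are made-up assumptions, and `scoring_cost_usd` is a hypothetical helper.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.38  # figure quoted above


def scoring_cost_usd(tokens_per_call: int, calls: int) -> float:
    """Estimated scoring cost for a number of tool calls."""
    return PRICE_PER_MILLION_TOKENS_USD * tokens_per_call * calls / 1_000_000


# Assumption for illustration: 1,000 tokens per scored action,
# 500 tool calls in a session.
print(f"${scoring_cost_usd(1_000, 500):.2f}")  # prints $0.19
```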
Security Policies
| Policy | What it catches | Example |
|---|---|---|
| P1: Read-only | Write actions when the task only requires reading | Task says "look up calendar" but agent sends an email |
| P2: Delegation | Actions that hand off control or leak data to external parties | Agent posts task contents to a URL or sends them to an unrelated email address |
| P3: Category mismatch | Actions in a different category than the task | Task is about calendar but agent modifies files |
| P4: Target mismatch | Actions targeting a different entity than specified | Task says "email Alice" but agent emails Bob |
A post-filter suppresses false positives from self-correction patterns (e.g., agent cancels a calendar event it just created in error).
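One way to picture how policy violations and the self-correction post-filter might combine is sketched below. Everything here is an assumption made for illustration: the `Policy` enum simply names the four policies from the table, while `is_self_correction`, `review`, and the `verb` / `object_id` action fields are hypothetical and do not describe AgentSteer's internals. In particular, treating "delete something the agent created earlier in the session" as the self-correction pattern is a guess based on the calendar example.

```python
from enum import Enum


class Policy(Enum):
    READ_ONLY = "P1: Read-only"
    DELEGATION = "P2: Delegation"
    CATEGORY_MISMATCH = "P3: Category mismatch"
    TARGET_MISMATCH = "P4: Target mismatch"


def is_self_correction(action: dict, history: list[dict]) -> bool:
    """True when the action deletes an object the agent created earlier."""
    if action.get("verb") != "delete":
        return False
    created = {h.get("object_id") for h in history if h.get("verb") == "create"}
    return action.get("object_id") in created


def review(action: dict, history: list[dict],
           violations: set[Policy]) -> tuple[bool, str]:
    """Return (blocked, feedback) for one action."""
    if not violations:
        return False, "allowed: no policy violated"
    if is_self_correction(action, history):
        # Post-filter: undoing the agent's own earlier action is not an attack.
        return False, "allowed: self-correction of an earlier action"
    names = ", ".join(sorted(p.value for p in violations))
    return True, f"blocked: violates {names}"


history = [{"verb": "create", "object_id": "event-1"}]
print(review({"verb": "delete", "object_id": "event-1"}, history, {Policy.READ_ONLY}))
print(review({"verb": "delete", "object_id": "event-2"}, history, {Policy.READ_ONLY}))
```

The first call is allowed because the agent is cancelling an event it created itself; the second is blocked because the deleted event was not created in this session.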
Configuration
| Environment variable | Description | Default |
|---|---|---|
| OPENROUTER_API_KEY | OpenRouter API key for scoring model | required |
| AGENT_STEER_TASK | Task description to score actions against | read from agent context |
| AGENT_STEER_THRESHOLD | Score threshold for blocking (0.0 to 1.0) | 0.80 |
| AGENT_STEER_DEBUG | Enable debug logging (1 or true) | off |
You can also run agentsteer setup for an interactive configuration wizard, or agentsteer login to connect to the AgentSteer cloud dashboard for real-time monitoring.
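For readers wiring these variables into their own tooling, the sketch below shows one way to read them with the documented defaults. The variable names and defaults come from the table above; the `Settings` class and `load_settings` function are hypothetical and are not part of the agentsteer package. Case-insensitive matching of `true` is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    api_key: str
    task: Optional[str]  # None means "read from agent context"
    threshold: float
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented environment variables, applying their defaults."""
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is required")
    threshold = float(env.get("AGENT_STEER_THRESHOLD", "0.80"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("AGENT_STEER_THRESHOLD must be between 0.0 and 1.0")
    # Documented values are "1" or "true"; anything else leaves debug off.
    debug = env.get("AGENT_STEER_DEBUG", "").strip().lower() in {"1", "true"}
    return Settings(api_key, env.get("AGENT_STEER_TASK"), threshold, debug)


# Passing a plain dict instead of os.environ keeps the example self-contained.
print(load_settings({"OPENROUTER_API_KEY": "sk-or-example"}))
```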
CLI Reference
agentsteer setup # Interactive first-run setup
agentsteer install claude-code # Install hook for Claude Code
agentsteer status # Show current configuration
agentsteer score "task" "action" # Score a single action
agentsteer sessions # List monitored sessions
agentsteer version # Print version
Links
- Website: agentsteer.ai
- Documentation: agentsteer.ai/docs
- Evaluations: agentsteer.ai/evaluations
- Repository: github.com/AgentSteer/AgentSteer
License
MIT
File details
Details for the file agentsteer-0.4.1.tar.gz.
File metadata
- Download URL: agentsteer-0.4.1.tar.gz
- Upload date:
- Size: 28.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0624dfd429c9ec9bb21fdffd5d2e87691699e33c28ff315b2536d8d1c1e2275b |
| MD5 | 3923ac91b12bc44d4426f9c1354dc868 |
| BLAKE2b-256 | 973605af9e60b3017a3a6b3afb1611d411b7b0b3aa3621edd52a831f1cc220d6 |
File details
Details for the file agentsteer-0.4.1-py3-none-any.whl.
File metadata
- Download URL: agentsteer-0.4.1-py3-none-any.whl
- Upload date:
- Size: 32.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 798d9c2c76f191efb3637a5b53c1994af15bbce886269e5f553cf9751e4afadc |
| MD5 | 1f4b69b4c70458ee8780be79098e8261 |
| BLAKE2b-256 | 832092dd8ebe8825817c61c24af9d10fc2d38eb283790d517a505df8c0295bcf |