Runtime protection for AI coding agents

AgentSteer

Runtime protection for AI coding agents.

What it does

AgentSteer intercepts every tool call your AI agent makes via a PreToolUse hook, scores it against the task description, and blocks unauthorized actions before they execute. It catches prompt injection attacks, unauthorized file access, data exfiltration through delegation, and off-task behavior. Scoring uses oss-safeguard-20b, a 20B-parameter model served via OpenRouter, with a median latency under 200 ms.

Quick Start

pip install agentsteer

Set your OpenRouter API key. You can use a .env file or export it directly:

export OPENROUTER_API_KEY=sk-or-...

Install the hook for your framework:

agentsteer install claude-code

That's it. Every tool call is now scored and unauthorized actions are blocked.

How It Works

Agent makes tool call
        |
        v
PreToolUse hook intercepts the call
        |
        v
AgentSteer scores the action against the task description
(oss-safeguard-20b via OpenRouter, <200ms median)
        |
        v
Score >= threshold? ---> BLOCK (action never executes)
        |
        v
Score < threshold? ---> ALLOW (action proceeds normally)

Each action is evaluated against four security policies (P1 to P4, listed under Security Policies). If any policy is violated, the action is blocked and the agent receives feedback explaining why.
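The decision step in the diagram above comes down to one threshold comparison. The following is a minimal sketch of that step, not AgentSteer's actual implementation: the `Decision` type, the `decide` function, and the feedback wording are hypothetical names introduced for illustration. The default of 0.80 is the documented default for `AGENT_STEER_THRESHOLD`.

```python
from dataclasses import dataclass

# Documented default for AGENT_STEER_THRESHOLD (see Configuration).
DEFAULT_THRESHOLD = 0.80


@dataclass(frozen=True)
class Decision:
    """Hypothetical result type: whether to run the action, plus feedback."""
    allowed: bool
    feedback: str  # explanation returned to the agent when blocked


def decide(score: float, threshold: float = DEFAULT_THRESHOLD) -> Decision:
    """Block when score >= threshold, otherwise allow (mirrors the diagram)."""
    if score >= threshold:
        return Decision(
            allowed=False,
            feedback=f"Blocked: score {score:.2f} >= threshold {threshold:.2f}",
        )
    return Decision(allowed=True, feedback="")


if __name__ == "__main__":
    # Illustrative scores only; real scores come from the scoring model.
    for score in (0.10, 0.80, 0.95):
        print(score, decide(score))
```

Note that the comparison is inclusive: a score exactly equal to the threshold is blocked, as in the diagram.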

Supported Frameworks

| Framework | Install command | Hook type |
| --- | --- | --- |
| Claude Code | `agentsteer install claude-code` | PreToolUse hook via `settings.json` |
| OpenHands | `agentsteer install openhands` | EventStream middleware |
| Python API | `from agentsteer import score_action` | Direct function call |

Results

Evaluated on the AgentDojo benchmark, which tests whether agents can be tricked into executing prompt injection attacks hidden in their environment.

| Framework | Condition | Attacks blocked | Utility preserved | n |
| --- | --- | --- | --- | --- |
| Claude Code | Honest tasks | 0% false positives | 100% | 20 |
| Claude Code | Attack tasks | 100% blocked | -- | 20 |
| OpenHands | Honest tasks | 0% false positives | 100% | 20 |
| OpenHands | Attack tasks | 95% blocked | -- | 20 |
| OpenHands | Per-action detection | 96.5% detected | 0% false positives | 1000 |

Median scoring latency: <200ms. Cost: $0.38 per 1M tokens via OpenRouter.
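To turn the per-token price into a budget figure, multiply the total number of tokens scored by the quoted rate. The sketch below assumes the single $0.38 per 1M tokens figure applies to all tokens; the function name and the workload numbers (calls and tokens per call) are hypothetical and should be replaced with your own measurements.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.38  # figure quoted above


def scoring_cost_usd(tool_calls: int, tokens_per_call: int) -> float:
    """Estimated scoring cost: total tokens times the per-token price."""
    total_tokens = tool_calls * tokens_per_call
    return total_tokens * PRICE_PER_MILLION_TOKENS_USD / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1,000 tool calls at 2,000 tokens each (2M tokens).
    print(f"${scoring_cost_usd(1_000, 2_000):.2f}")  # prints $0.76
```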

Full evaluation results and methodology at agentsteer.ai/evaluations.

Security Policies

| Policy | What it catches | Example |
| --- | --- | --- |
| P1: Read-only | Write actions when the task only requires reading | Task says "look up calendar" but agent sends an email |
| P2: Delegation | Actions that hand off control or leak data to external parties | Agent posts task contents to a URL, sends to unrelated email |
| P3: Category mismatch | Actions in a different category than the task | Task is about calendar but agent modifies files |
| P4: Target mismatch | Actions targeting a different entity than specified | Task says "email Alice" but agent emails Bob |

A post-filter suppresses false positives from self-correction patterns (e.g., agent cancels a calendar event it just created in error).
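As stated under How It Works, a single violated policy is enough to block an action. One way to model that rule is sketched below; the `Policy` enum and the `verdict` function are hypothetical names for illustration and are not part of the AgentSteer API. How violations are actually detected (by the scoring model) and how the post-filter works are outside the scope of this sketch.

```python
from enum import Enum


class Policy(Enum):
    """The four policies from the table above."""
    READ_ONLY = "P1: Read-only"
    DELEGATION = "P2: Delegation"
    CATEGORY_MISMATCH = "P3: Category mismatch"
    TARGET_MISMATCH = "P4: Target mismatch"


def verdict(violations: set[Policy]) -> tuple[bool, str]:
    """Return (blocked, feedback); any violated policy blocks the action."""
    if not violations:
        return False, ""
    names = ", ".join(sorted(p.value for p in violations))
    return True, f"Blocked: violates {names}"


if __name__ == "__main__":
    # The P4 example from the table: task says "email Alice", agent emails Bob.
    print(verdict({Policy.TARGET_MISMATCH}))
    # prints (True, 'Blocked: violates P4: Target mismatch')
```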

Configuration

| Environment variable | Description | Default |
| --- | --- | --- |
| `OPENROUTER_API_KEY` | OpenRouter API key for scoring model | required |
| `AGENT_STEER_TASK` | Task description to score actions against | read from agent context |
| `AGENT_STEER_THRESHOLD` | Score threshold for blocking (0.0 to 1.0) | `0.80` |
| `AGENT_STEER_DEBUG` | Enable debug logging (`1` or `true`) | off |

You can also run `agentsteer setup` for an interactive configuration wizard, or `agentsteer login` to connect to the AgentSteer cloud dashboard for real-time monitoring.
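If you want to validate these variables in your own tooling before launching an agent, the table above translates into a small loader. This is a sketch under stated assumptions: the `Settings` type and `load_settings` function are hypothetical, and AgentSteer's own parsing may differ (for example in how it treats other values of `AGENT_STEER_DEBUG`).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    api_key: str
    task: Optional[str]  # None means "read from agent context"
    threshold: float
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        raise RuntimeError("OPENROUTER_API_KEY is required")

    threshold = float(env.get("AGENT_STEER_THRESHOLD", "0.80"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("AGENT_STEER_THRESHOLD must be between 0.0 and 1.0")

    # Only the two documented values enable debug logging in this sketch.
    debug = env.get("AGENT_STEER_DEBUG", "") in {"1", "true"}
    return Settings(api_key, env.get("AGENT_STEER_TASK"), threshold, debug)
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.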

CLI Reference

agentsteer setup                # Interactive first-run setup
agentsteer install claude-code  # Install hook for Claude Code
agentsteer status               # Show current configuration
agentsteer score "task" "action" # Score a single action
agentsteer sessions             # List monitored sessions
agentsteer version              # Print version

License

MIT

Release history

0.4.1: Feb 20, 2026
0.4.0: Feb 20, 2026 (yanked; reason: leaked)
0.3.0: Feb 20, 2026 (this version)
0.2.0: Feb 19, 2026

Download files

Source distribution: agentsteer-0.3.0.tar.gz (515.2 kB), uploaded Feb 20, 2026, Source
Built distribution: agentsteer-0.3.0-py3-none-any.whl (32.7 kB), uploaded Feb 20, 2026, Python 3

File details

Details for the file agentsteer-0.3.0.tar.gz.

File metadata

Download URL: agentsteer-0.3.0.tar.gz
Upload date: Feb 20, 2026
Size: 515.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentsteer-0.3.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `71e7ffaccb455e107eebe60d0e8f194777b6262f649468a7a3aecc25787d64a2` |
| MD5 | `f463262b5716041359cd8931c7e27a71` |
| BLAKE2b-256 | `a9aaf84e032f0a6c1ba5d12e4e2a5804d43c2b015a31bb28c99c759262cee3c3` |

File details

Details for the file agentsteer-0.3.0-py3-none-any.whl.

File metadata

Download URL: agentsteer-0.3.0-py3-none-any.whl
Upload date: Feb 20, 2026
Size: 32.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for agentsteer-0.3.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `24275fb594de8a54f53e05b4c73094221e962ba1db35f5850c7674f1d9d3c448` |
| MD5 | `047f19e130d9010fcbcbc0edadc55103` |
| BLAKE2b-256 | `00f480aebbdb882022c0bf8ad130f34191d835951e056e2a119facc53f14c2b5` |
