Safety control credential isolation for AI agents. Implements ACRF-10 defense pattern.
Project description
acrf-safety-shield
Safety control credential isolation for AI agents. Implements the ACRF-10 (Safety Controls Not Self-Protecting) defense pattern.
Part of the ACRF framework: https://github.com/kannasekar-alt/ACRF PyPI: https://pypi.org/project/acrf-safety-shield/ Presented at RSA Conference 2026.
Try it in your environment right now
No Docker. No setup. Just Python 3.10+.
Step 1 - Install:
pip install acrf-safety-shield
Step 2 - Generate an admin credential (one-time, by ops/security team):
from acrf_safety_shield import AdminCredential
admin = AdminCredential.generate(admin_name="security-team")
admin.save_private("admin_private.json") # KEEP OFFLINE
admin.public_card().save_to("admin_public.json") # safe to load into shield
Step 3 - Build the shield and trust the admin:
from acrf_safety_shield import SafetyShield, PublicAdminCard
shield = SafetyShield()
shield.trust_admin(PublicAdminCard.load("admin_public.json"))
Step 4 - Modify safety state - admin only:
shield.set_guardrail("max_trade_amount", 1000, signer=admin)
shield.set_guardrail("require_human_approval", True, signer=admin)
Step 5 - Agents can read but never modify:
from acrf_safety_shield import AgentCredential, PrivilegeError
agent = AgentCredential(agent_name="PricingAgent", token="agt_xxx")
shield.get_guardrail("max_trade_amount", agent) # OK -> 1000
try:
shield.set_guardrail("max_trade_amount", 999999, signer=agent)
except PrivilegeError:
# Agent cannot escalate. Only admin credentials can change safety state.
...
A compromised agent token is now incapable of disabling the very controls meant to constrain that agent.
The problem this solves
In most agent platforms, the same credential that authorizes routine agent operations also authorizes changes to safety controls.
If an attacker compromises that credential, they can call:
shield.disable("require_human_approval")
and the protection is gone.
This is ACRF-10: safety controls not self-protecting. Reference: CVE-2026-25253. The control plane and the data plane share a credential.
acrf-safety-shield enforces a hard wall:
Agent credentials -> can READ safety state
Admin credentials -> can READ and MODIFY safety state
Agents -> CANNOT escalate to admin
The admin credential is an Ed25519 keypair stored offline. The shield only ever holds the public card. Modifying safety state requires an admin to sign the change with the private key.
Security features built in
Two distinct credential types AdminCredential is an Ed25519 keypair. AgentCredential is a regular agent token. The shield enforces the type distinction in code: a non-Admin signer raises PrivilegeError.
Hard wall enforcement set_guardrail() and delete_guardrail() check isinstance(signer, AdminCredential). get_guardrail() and list_guardrails() accept either. There is no API path where an agent can mutate state.
Trust set The shield only accepts admin credentials whose public card has been trusted via trust_admin(). An attacker who generates a fresh AdminCredential gets UnknownAdminError.
Two-person rule (optional) Mark high-risk keys via declare_high_risk(key) and set required_approvals > 1. Changes to those keys are staged as pending and require additional admin signatures before applying. The initiating admin cannot self-approve.
Tamper-evident audit log Every operation - read, write, denied, pending, applied - produces an audit entry with timestamp, actor type, actor name, key, and result.
Persistence The shield can be saved to a JSON file and reloaded. Trusted admin cards and high-risk keys persist; pending changes do not (they live only within a process to prevent stale approvals).
CLI
Set the shield state path once:
export ACRF_SAFETY_SHIELD=/etc/acrf/safety_shield.json
Initialize the shield:
acrf-safety-shield init
Generate an admin credential (do this on a secure offline machine):
acrf-safety-shield generate-admin security-team
Trust the admin:
acrf-safety-shield trust-admin security-team_admin_public.json
Set a guardrail (admin only):
acrf-safety-shield set max_trade_amount 1000 \
--admin security-team_admin_private.json
Read a guardrail (any actor):
acrf-safety-shield get max_trade_amount
List all guardrails:
acrf-safety-shield list
Show the audit log:
acrf-safety-shield audit
Try to use an agent credential (default --actor agent) and you will see that read works, while set/delete simply do not exist for agents.
How it works
- Ops generates an AdminCredential offline (Ed25519 keypair)
- The public card is loaded into the SafetyShield via trust_admin()
- The private credential is stored in HSM, paper backup, or hardware token
- To change safety state, an admin signs the change payload with the private key
- The shield verifies the signature using the trusted public card
- If the key is high-risk and required_approvals > 1, the change is staged
- Other admins approve via approve_pending() until the threshold is met
- The change is applied; an audit entry is written
- Agents call get_guardrail() to read state; modification calls fail with PrivilegeError
Two-person rule example
shield.declare_high_risk("kill_switch")
shield.set_required_approvals(2)
# Admin 1 initiates the change
change_id = shield.set_guardrail("kill_switch", "armed", signer=admin1)
# change_id is non-None - the change is pending
# Admin 1 self-approval is rejected
applied = shield.approve_pending(change_id, signer=admin1) # False
# Admin 2 approves - applied
applied = shield.approve_pending(change_id, signer=admin2) # True
shield.get_guardrail("kill_switch", admin1) # "armed"
The pattern matches how production systems already handle critical infrastructure changes.
What the admin private key needs
Treat the admin private key like a code-signing key:
- Hardware Security Module (HSM)
- Hardware token (Yubikey, etc.)
- Paper backup in a safe
- Multi-party computation (split key shares)
What NOT to do:
- Commit it to source control
- Bake it into a Docker image
- Mount it on the same host as the agents
- Hand it to a deployment automation account
If the admin private key is on the same host as the agents, you do not have ACRF-10 protection. The whole point is that an agent compromise cannot reach the admin private key.
Real-world use
Wrap your agent action handler:
from acrf_safety_shield import (
SafetyShield,
AgentCredential,
PrivilegeError,
)
import os
SHIELD = SafetyShield.load(os.environ["ACRF_SAFETY_SHIELD"])
def handle_trade(request, agent_token):
agent = AgentCredential(agent_name=request.agent_name, token=agent_token)
max_amount = SHIELD.get_guardrail("max_trade_amount", agent)
if request.amount > max_amount:
return {"error": "exceeds max_trade_amount"}, 403
return execute_trade(request)
def admin_change_max_trade(new_value, admin_credential):
# only this code path is reachable with an AdminCredential object
SHIELD.set_guardrail("max_trade_amount", new_value, signer=admin_credential)
SHIELD.save(os.environ["ACRF_SAFETY_SHIELD"])
If a compromised agent token reaches handle_trade, the worst it can do is exceed the existing max_trade_amount, which is rejected. The agent cannot raise the cap. Only admin_change_max_trade can do that, and that function is only callable from within the offline admin tooling.
ACRF-10 control objectives addressed
SP-1 Agents operate with minimum necessary permissions
SP-2 Safety controls require a separate admin credential, not the agent token
SP-3 All safety control changes go through approval and audit trail
What this library does NOT do
- It does not store or distribute admin private keys (use an HSM/Vault)
- It does not enforce kernel-level isolation (use OS sandboxing too)
- It does not protect against an attacker who already has the admin private key
- It does not replace per-agent token validation (use acrf-tokens or similar)
It only ensures that, given a properly isolated admin private key, agent compromise cannot disable or weaken safety controls. That is the ACRF-10 defense pattern.
Works with any Python AI agent framework
LangChain, CrewAI, AutoGen, MCP-based systems, custom agents. Anywhere your agents enforce safety controls in production, you can use this library.
Authors
Ravi Karthick Sankara Narayanan, Kanna Sekar
License
Apache 2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file acrf_safety_shield-0.1.0.tar.gz.
File metadata
- Download URL: acrf_safety_shield-0.1.0.tar.gz
- Upload date:
- Size: 18.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
84713d70299a79215d22cc774e9d766e8eb3ed923e56050b7e6d44e30e70a77d
|
|
| MD5 |
7318154e637c16c4863d2b12bb7802d1
|
|
| BLAKE2b-256 |
fdf71ba6bbfcc8458f194d2acd1464c26d1d7156cb7c032a45f421cf45cf7758
|
File details
Details for the file acrf_safety_shield-0.1.0-py3-none-any.whl.
File metadata
- Download URL: acrf_safety_shield-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
874e9e04f8a0488c4157e25ba8697850436ec80873cd30037a1d34287288f71c
|
|
| MD5 |
f9a621edb4024759eef70ea1bb30c460
|
|
| BLAKE2b-256 |
2dce36278b7e6b41c69545f55eaacf2167afd16a8fd3f4d0f7e0f25493acbedf
|