A runtime governance layer that enforces hard behavioral bounds in autonomous agents.
Agent-Harness
A Rigorous Engineering Kernel for Bounded Autonomous Agents
A deterministic runtime governance engine that enforces bounded, auditable execution for autonomous AI agents.
Table of Contents
- Overview & Motivation · Determinism vs. Industry · How It's Useful
- Deployment & Quick Start · Architecture · System Behavior
- Core Components · Governance Checklist · Failure Semantics
- Integrations · Performance · Security & Threat Model
- Testing & Validation
What is Agent-Harness?
Agent-Harness is a deterministic governance runtime that acts as a non-bypassable Runtime Execution Firewall between an agent's reasoning loop and real-world execution surfaces. It translates environmental feedback (risk, effort, stagnation) into strict mathematical execution budgets. When an agent exceeds its bounds, the Harness forcibly terminates execution.
Motivation / Problem Statement
Autonomous agents rely on "soft" boundaries (prompt engineering, RLHF). Under sustained failure or adversarial conditions, agents ignore these and enter unbounded loops. Agent-Harness replaces soft boundaries with a runtime execution firewall that dynamically tracks Effort, Exploration, and Risk. Once budgets deplete, permission to execute always collapses.
Why Agent-Harness?
Most AI safety solutions are probabilistic (LLM validation) or stateless (regex). Agent-Harness introduces Stateful Determinism: physical runtime bounds that an agent cannot reason its way out of.
With vs. Without Agent-Harness
| Feature / Risk | Without Agent-Harness | With Agent-Harness |
|---|---|---|
| Runaway Loops | Infinite token burn & cost | Deterministically halted via STAGNATION |
| Prompt Injection | Agent goal hijacking | Blocked via GuardrailStack |
| Malicious Code | Direct os.system execution | Intercepted & session terminated |
| Cost Control | Manual monitoring | Hard-capped effort & risk budgets |
| Auditability | Opaque/Manual logs | SHA256 hash-chained JSONL traces |
| Safety Logic | Hardcoded in prompts (soft) | Enforced by external kernel (hard) |
How Agent-Harness Compares
| Capability | Agent-Harness | NeMo Guardrails | Llama Guard | Guardrails AI | LangChain Limits |
|---|---|---|---|---|---|
| Stops a runaway agent mid-loop | ✅ | ❌ | ❌ | ❌ | ⚠️ max_iterations only |
| Agent cannot bypass it | ✅ External sidecar | ⚠️ In-app | ⚠️ Wrapper | ❌ In-process | ❌ In-framework |
| Detects stagnation (busy ≠ productive) | ✅ Signal-based | ❌ | ❌ | ❌ | ❌ |
| Blocks dangerous tool calls before execution | โ Guardrail stack | โ ๏ธ Dialog-level | โ Text only | โ Validators | โ |
| Works without an LLM in the safety path | โ Pure math | โ Uses LLM | โ Is an LLM | โ ๏ธ Some regex | โ |
| Cryptographic audit trail | ✅ SHA256 chain | ❌ | ❌ | ❌ | ❌ |
| Multi-agent budget coordination | ✅ SharedBudgetPool | ❌ | ❌ | ❌ | ❌ |
| Same input โ same decision, always | โ Deterministic | โ Probabilistic | โ Probabilistic | โ ๏ธ Heuristic | โ Static |
| Works across any LLM vendor | โ | โ ๏ธ NVIDIA stack | โ ๏ธ Meta models | โ | โ ๏ธ LangChain only |
Key difference: Other systems ask "is this output safe?" Agent-Harness asks "should this agent still be allowed to act?"
How Agent-Harness Is Useful
1. Prevent Runaway Loops: Agents frequently retry failing tools indefinitely. Agent-Harness detects stagnation (zero reward across steps) and halts execution deterministically via FailureType.STAGNATION.
2. Bound LLM API Costs: Effort budgets drain faster when the agent fails to produce real environment progress, preventing token burn from "reasoning theater" (verbose monologue with zero action).
3. Secure Tool Execution: Every tool call passes through GuardrailStack before execution, blocking dangerous code patterns (os.system(), exec(), PII leakage). The dangerous call never reaches the execution surface.
4. Multi-Agent Stability: SharedBudgetPool and CascadeDetector enforce global budget limits across swarms, preventing cascading agent spawn failures.
5. Compliance Audit Trails: SHA256 hash-chained JSONL logs provide tamper-evident, non-repudiable decision history for regulated environments (finance, healthcare, enterprise).
Agent-Harness uses deterministic budget math to guarantee bounded execution without LLM inference.
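As a rough illustration of the stagnation idea in item 1, the sketch below halts after a fixed number of consecutive zero-reward steps. The class name `StagnationWindow`, the `window` parameter, and the `epsilon` cutoff are assumptions made for this example, not the package's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class StagnationWindow:
    """Hypothetical detector: halts after `window` consecutive near-zero rewards."""
    window: int = 3        # assumed window length, not a documented default
    epsilon: float = 1e-9  # rewards at or below this count as "no progress"
    _streak: int = 0       # consecutive no-progress steps seen so far

    def observe(self, reward: float) -> bool:
        """Record one step; return True when the agent should be halted."""
        if reward <= self.epsilon:
            self._streak += 1
        else:
            self._streak = 0  # any real progress resets the streak
        return self._streak >= self.window
```

Because the decision depends only on the reward sequence, replaying the same sequence always produces the same halt point.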
Deployment & Installation
Local Engine Setup
Install the lightweight engine via pip to integrate governance into your Python logic:
```shell
# Install via pip
pip install agentharnessengine

# Or install from source
git clone https://github.com/Sarthaksahu777/Agent-Harness
cd Agent-Harness
pip install -e .
```
Docker Proxy (Production Ready)
Deploy a non-root, multi-stage optimized governance proxy in seconds. This is the recommended method for production environments to ensure non-bypassable enforcement:
```shell
# Start the Governance Proxy + Metrics Stack
docker-compose -f deployment/docker-compose.yml up -d

# Check health
curl http://localhost:8080/health
```
The proxy exposes port 8080 (Production) and 8081 (Dev/Hot-reload).
Quick Start (Engine Only)
A minimal, framework-free implementation of the engine checking a basic step progression:
```python
from governance.kernel import GovernanceKernel
from governance.profiles import BALANCED

# 1. Initialize the kernel with a standard behavioral profile
kernel = GovernanceKernel(profile=BALANCED)

# 2. Simulate environmental signals on an agent step
result = kernel.step(reward=0.6, novelty=0.1, urgency=0.0)

# 3. Act only if the kernel has not halted the session
if result.halted:
    print(f"TERMINATED: {result.failure}")
else:
    print(f"ALLOWED. Remaining Effort: {result.budget.effort:.2f}")
```
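In an agent loop, the same check would typically run before every action. The sketch below shows that pattern without requiring the package: `StepResult`, `FakeKernel`, and `run_governed` are hypothetical stand-ins, and only the `step(reward=..., novelty=..., urgency=...)` call and the `halted` / `failure` attributes mirror the quick start above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class StepResult:
    """Stand-in for the kernel's step result (only the fields used here)."""
    halted: bool
    failure: Optional[str] = None


class FakeKernel:
    """Stand-in kernel: halts once a fixed number of steps has been taken."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def step(self, reward: float, novelty: float, urgency: float) -> StepResult:
        self.steps += 1
        if self.steps > self.max_steps:
            return StepResult(halted=True, failure="EXHAUSTION")
        return StepResult(halted=False)


def run_governed(kernel, signals: Iterable[Tuple[float, float, float]],
                 act: Callable[[int], None]) -> Optional[str]:
    """Ask the kernel before every action; stop at the first halt."""
    for i, (reward, novelty, urgency) in enumerate(signals):
        result = kernel.step(reward=reward, novelty=novelty, urgency=urgency)
        if result.halted:
            return result.failure  # the action for this step never runs
        act(i)
    return None
```

Swapping `FakeKernel` for a real kernel instance should leave `run_governed` unchanged, provided the result object exposes the same two attributes.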
Architecture
Agent-Harness intercepts execution across three distinct, modular layers. It strictly separates the "Mind" (Agent Reasoning) from the "Body" (Tool Execution).
LEVEL 1: High-Level System Model
- Agent (Mind): The LLM controller generating tool inputs and plans. It never touches APIs directly.
- Agent-Harness (Governance Runtime): The non-bypassable firewall. It translates abstract environmental signals into concrete behavioral budgets and blocks non-compliant behavior.
- Execution Surface (Tools/APIs): The actual code functions, network requests, or database queries the agent wishes to execute.
LEVEL 2: Governance Pipeline
Every tool call goes through a deterministic processing flow before execution:
- Agent outputs a tool request.
- Proxy Enforcer intercepts the request and normalizes it.
- Guardrail Stack scans the payload for malicious logic or PII.
- Signal Evaluator continuously processes external feedback (reward, difficulty).
- Governance Kernel translates signals into deterministic behavioral budgets.
- Budget Decision fuses the state and issues a hard `ALLOW` or `HALT`.
- Execute / Halt: The tool is executed, or a 403 Forbidden block terminates the session.
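The ordering of these stages can be sketched as a single function. This is a minimal illustration under stated assumptions: `Decision`, `govern`, the `guardrail` callable, and the threshold parameter are hypothetical names, and the real proxy may structure the flow differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Decision:
    verdict: str      # "ALLOW" or "HALT"
    status: int       # 200 when allowed, 403 when blocked
    reason: str = ""


def govern(request: dict,
           guardrail: Callable[[str], Optional[str]],
           effort_remaining: float,
           threshold: float = 0.0) -> Decision:
    """Hypothetical pipeline: normalize, scan, check budget, then decide."""
    # 1. Normalize: flatten every argument into one stripped string.
    payload = " ".join(str(v).strip() for v in request.get("args", {}).values())
    # 2. Guardrail scan: a non-None result names the violated rule.
    violation = guardrail(payload)
    if violation is not None:
        return Decision("HALT", 403, f"guardrail: {violation}")
    # 3. Budget decision: fail closed once effort is at or below the threshold.
    if effort_remaining <= threshold:
        return Decision("HALT", 403, "effort_exhausted")
    return Decision("ALLOW", 200)
```

Running the guardrail before the budget check means a dangerous payload is rejected even when plenty of budget remains.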
LEVEL 3: Budget Decision Loop
The internal kernel state update algorithm ensures that no action is taken without a fresh budget calculation.
Example System Behavior
Here is how the Agent-Harness reacts to standard scenarios:
1. Successful Step (Allowed)
```
[KERNEL] Action: query_database (args: search="Q3 revenue")
[KERNEL] Signals: Reward=0.8, Novelty=0.1, Urgency=0.0
[KERNEL] Status: ALLOWED.
[KERNEL] Remaining Effort: 0.98. Executing tool...
```
2. Blocked Tool Call (Guardrail Triggered)
```
[KERNEL] Action: execute_python (args: code="os.system('cat /etc/passwd')")
[GUARDRAIL] Triggered: code_execution
[GUARDRAIL] Reason: Potentially dangerous code pattern detected (matched: \bos\.system\s*\()
[KERNEL] Status: HALTED.
[KERNEL] Result: FailureType.SAFETY (Guardrail violation). Block 403.
```
3. Halted Agent (Budget Exhausted / Stagnation)
```
[KERNEL] Action: search_web (args: query="retry 104")
[KERNEL] Signals: Reward=0.0, Difficulty=1.0, Urgency=0.4
[KERNEL] ⚠️ Effort depleted to 0.15.
[KERNEL] Status: HALTED.
[KERNEL] Result: FailureType.EXHAUSTION (effort_exhausted). Session terminated.
```
Core Components
Based on the src/governance/ module structure, the repository is composed of several critical files:
- `kernel.py` (`GovernanceKernel`): The central state machine. A deterministic orchestrator that receives signals, evaluates budgets, and enforces terminal halt conditions based on progress accumulation.
- `guardrails.py` (`GuardrailStack`): Pluggable security detectors that intercept and block dangerous payloads.
- `audit.py` (`HashChainedAuditLogger`): Records a hash-chained, tamper-evident JSONL log of every system decision.
- `evaluation.py` (`SignalEvaluator`): Normalizes external semantic signals before integrating them into the core control state.
- `profiles.py` & `behavior.py`: Define agent temperament patterns (e.g., `BALANCED`, `CONSERVATIVE`).
The 15-Point AI Governance Checklist
Agent-Harness natively implements the 15-Point AI Governance Checklist via deterministic runtime execution mechanisms:
| Governance Rule | Enforcement Mechanism |
|---|---|
| 1. Unbounded Behavior (Prevent infinite loops) | Finite effort and persistence budgets deplete with action. Zero budget = Terminal Halt. |
| 2. Runtime Control (Intervene during execution) | Dynamic step() evaluates signals at runtime (Hz) and updates control state immediately. |
| 3. Deterministic Behavior (Same inputs → Same decision) | State transitions use hardcoded, versioned matrices. No random seeds exist in the Kernel. |
| 4. Explainable Halting (Explicit halt reasons) | Halts return explicit FailureType flags (e.g., OVERRISK, STAGNATION) and precise string reasons. |
| 5. Fail-Closed Semantics (Default to block) | Once halted, state freezes permanently. Proxy middleware returns 403 on any error. |
| 6. Physical Enforcement (Physically block actions) | ProxyEnforcer intercepts all tool calls at the network level, forcing halts. |
| 7. Auditability (Log every decision) | Hash-chained SHA256-linked entries in append-only JSONL files. CLI verification utility provided. |
| 8. Accountability (Who authorized this?) | Multi-agent coordination mechanisms track agent_id and parentage for every logged action. |
| 9. Risk Containment (Bound risky actions) | Dedicated risk budget accumulator. Hard caps trigger an immediate terminal halt. |
| 10. Progress Discrimination (Busy ≠ Productive) | Stagnation detection windows identify and halt cycles of low-reward, busy-work activity. |
| 11. Bad Telemetry Resilience (If sensors lie, slow down) | trust signal dampens positive feedback (reward) but correctly passes negative feedback (difficulty). |
| 12. Model-Agnosticism (Work across vendors) | The Kernel consumes dimensional Signals (floats), completely independent of the LLM or embeddings used. |
| 13. Human Override (Humans remain authority) | The reset() method requires explicit calls. There is no autonomous self-healing from terminal failures. |
| 14. Compliance Readiness (Support reporting) | Hash-chained JSONL trace exports. Prometheus metrics natively broadcast at /metrics. |
| 15. Scalability (Scale across multiple agents) | SystemGovernor and SharedBudgetPool manage swarms globally to prevent cascading swarm failures. |
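Rule 15 relies on a budget shared across agents. A minimal sketch of that idea follows; `SharedPool` and `try_spend` are hypothetical names chosen for this example and are not the `SharedBudgetPool` API, which this document does not spell out.

```python
import threading


class SharedPool:
    """Hypothetical global effort pool drawn on by several agents."""

    def __init__(self, total: float) -> None:
        self._remaining = total
        self._lock = threading.Lock()
        self.spent_by_agent = {}  # agent_id -> total effort spent

    def try_spend(self, agent_id: str, cost: float) -> bool:
        """Atomically deduct `cost`; refuse (fail closed) if the pool cannot cover it."""
        with self._lock:
            if cost < 0 or cost > self._remaining:
                return False
            self._remaining -= cost
            self.spent_by_agent[agent_id] = self.spent_by_agent.get(agent_id, 0.0) + cost
            return True

    @property
    def remaining(self) -> float:
        return self._remaining
```

Tracking spend per `agent_id` keeps the accounting needed to answer "who used the budget" once the pool runs dry.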
Budget Dynamics
Behavioral budgets (effort, persistence) are strictly bounded and monotonically depleting. Under sustained failure, the budget inevitably crosses the exhaustion threshold, forcing a terminal halt.
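To make the depletion argument concrete, here is a toy model in which every step costs at least `base_cost` and failed steps cost more. The update rule and all parameter names are assumptions for illustration; the kernel's actual budget equations are not given in this document.

```python
def steps_until_exhaustion(effort: float = 1.0, base_cost: float = 0.02,
                           failure_penalty: float = 4.0, reward: float = 0.0,
                           threshold: float = 0.0, max_steps: int = 10_000) -> int:
    """Count steps until effort reaches the threshold under a constant reward.

    Each step costs base_cost * (1 + failure_penalty * (1 - reward)). The cost
    is never below base_cost, so effort strictly decreases and the loop ends
    after at most roughly effort / base_cost steps.
    """
    reward = min(max(reward, 0.0), 1.0)  # clamp reward to [0, 1]
    steps = 0
    while effort > threshold and steps < max_steps:
        effort -= base_cost * (1.0 + failure_penalty * (1.0 - reward))
        steps += 1
    return steps
```

In this model a fully failing agent (`reward=0.0`) exhausts its budget `1 + failure_penalty` times faster than a fully succeeding one, and even the succeeding agent eventually halts.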
Failure Semantics
| Failure | Trigger | Consequence |
|---|---|---|
| EXHAUSTION | Effort drops below threshold | Terminal halt |
| STAGNATION | Zero reward beyond stagnation window | Terminal halt |
| OVERRISK | Risk exceeds maximum limit | Immediate 403 |
| SAFETY | Exploration capacity exceeded | Session severed |
| EXTERNAL | Hard step fuse limit reached | Instant halt |
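The table can be read as a lookup from control state to failure type. The sketch below encodes it that way; the enum here is a local mirror of the names in the table, and the thresholds and the order in which conditions are checked are assumptions, not documented behavior.

```python
from enum import Enum
from typing import Optional


class FailureType(Enum):
    """Local mirror of the failure names in the table above."""
    EXHAUSTION = "effort below threshold"
    STAGNATION = "zero reward beyond stagnation window"
    OVERRISK = "risk above maximum"
    SAFETY = "exploration capacity exceeded"
    EXTERNAL = "hard step fuse reached"


def classify(effort: float, zero_reward_streak: int, risk: float,
             exploration: float, steps: int, *,
             effort_floor: float = 0.05, stagnation_window: int = 5,
             risk_cap: float = 0.9, exploration_cap: float = 1.0,
             step_fuse: int = 100) -> Optional[FailureType]:
    """Return the first matching failure, or None if the agent may continue.

    All default thresholds are hypothetical values chosen for the example.
    """
    if steps >= step_fuse:
        return FailureType.EXTERNAL
    if risk > risk_cap:
        return FailureType.OVERRISK
    if exploration > exploration_cap:
        return FailureType.SAFETY
    if effort < effort_floor:
        return FailureType.EXHAUSTION
    if zero_reward_streak > stagnation_window:
        return FailureType.STAGNATION
    return None
```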
Failure Mode Progression
State transitions: Healthy → Warning (low ROI) → Critical (near boundary) → Terminal Halt (EXHAUSTION, STAGNATION, OVERRISK). A 403 permanently ends the session.
Integrations
The Harness detects native abstractions across common LLM agent frameworks and enforces boundaries transparently. Working wrapper examples are located in integrations/:
- LangChain: Intercepts `AgentExecutor` loops (`langchain_ollama.py`, `langchain_openai.py`).
- CrewAI: Limits task iterations and monitors role boundaries (`crewai_ollama.py`, `crewai_openai.py`).
- AutoGen: Hooks into `UserProxyAgent` conversations (`autogen_ollama.py`, `autogen_openai.py`).
- OpenAI SDK: Provides direct function-calling guardrails (`openai_sdk.py`, `openai_sdk_ollama.py`).
Project Structure
```
Agent-Harness/
├── src/
│   └── governance/    # Core kernel, budgets, evaluators, mechanisms
├── problems/          # S-class problem sets and benchmarks
├── integrations/      # Connectors for LangChain, AutoGen, CrewAI, OpenAI
├── examples/          # Demonstrations of layered governance and edge cases
├── tests/             # PyTest suites verifying mathematical limits
├── docs/              # Deep dives and visual assets
├── deployment/        # Docker, Compose, and Dashboards
├── scripts/           # Utility and maintenance scripts
└── config/            # YAML configuration bindings
```
Performance & Efficiency
Agent-Harness is designed for high-frequency runtime interception with near-zero overhead. Values below are from a real benchmark of 10,000 kernel.step() calls on commodity hardware.
| Metric | Value | Notes |
|---|---|---|
| Step Latency (Median) | ~0.06ms | Full kernel evaluation (signal processing + budget update). |
| Step Latency (P99) | ~0.23ms | Worst-case tail latency under sustained load. |
| Guardrail Check (Median) | ~0.07ms | Full GuardrailStack.check_all() with 3 detectors. |
| Throughput | ~13k step()/sec | Single-threaded on standard hardware. |
| Memory (Kernel Delta) | ~2 KB | Kernel + internal state after 100 steps (excludes Python runtime). |
Anti-Gaming & Robustness
Native anti-gaming logic in extractor.py:
- Yapping Detection: Decays trust when the agent produces high internal activity with zero environment change.
- Fake Success Anchoring: Rejects agent-claimed success if `env_state_delta` hasn't moved.
- Novelty Debt: Prevents budget resets via "new but useless" actions.
- State Cycling Detection: Environment hashing penalizes oscillating behaviors.
Observability and Logging
```shell
# Live terminal dashboard
python -m governance.visualizer

# Prometheus metrics export
curl http://localhost:8080/metrics

# Verify audit chain integrity
python scripts/replay_audit.py verify audit_chain.jsonl
```
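The idea behind audit chain verification can be illustrated with a small hash-chained log. The record layout (`prev`, `payload`, `hash`), the all-zero genesis value, and the function names are assumptions for this sketch and are not the format written by `HashChainedAuditLogger`.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA256 over the previous hash plus the canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(lines: list, payload: dict) -> None:
    """Append one JSONL record whose hash covers the previous record's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"prev": prev_hash, "payload": payload,
              "hash": _digest(prev_hash, payload)}
    lines.append(json.dumps(record, sort_keys=True))


def verify_chain(lines: list) -> bool:
    """Recompute every hash; an edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

A chain like this detects edits and reordering, but not removal of the most recent records; catching truncation would need the latest hash to be stored somewhere outside the log.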
Security Model
- Fail-Closed: Exceeding any budget translates into `FailureType` flags and physical session termination. There is no auto-recovery; humans must explicitly call `reset()`.
- Pattern Isolation: `GuardrailStack` catches malicious patterns via regex without LLM inference.
- Read-Only Access: External systems cannot mutate `ControlState` except through deterministic `evaluator` signals.
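Regex-only pattern checking, with no LLM in the path, might look like the sketch below. The `os.system` pattern is the one quoted in the guardrail log example; the `exec` and email patterns, the `Detector` tuple, and this `check_all` function are illustrative assumptions rather than the package's detectors.

```python
import re
from typing import List, NamedTuple, Optional


class Detector(NamedTuple):
    name: str
    pattern: re.Pattern


# Only the first pattern comes from this document; the others are assumed.
DETECTORS: List[Detector] = [
    Detector("code_execution", re.compile(r"\bos\.system\s*\(")),
    Detector("code_execution", re.compile(r"\bexec\s*\(")),
    Detector("pii_email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
]


def check_all(payload: str) -> Optional[str]:
    """Return the name of the first detector that fires, or None if clean."""
    for detector in DETECTORS:
        if detector.pattern.search(payload):
            return detector.name
    return None
```

Patterns like these are cheap and deterministic, but an encoded or string-concatenated payload would pass them unnoticed, which is why obfuscation is listed as out of scope.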
Threat Model
Agent-Harness enforces bounded execution. It is designed with a specific threat landscape in mind.
What it PROTECTS against:
- Runaway Loops: Agents getting stuck hallucinating the same failing tool calls indefinitely.
- Prompt Isolation Breakdown: Adversarial inputs (Prompt Injections) overriding the system prompt and hijacking the agent's goals.
- Unauthorized OS Probing: LLMs attempting to execute arbitrary code (e.g., `os.system()`, `exec()`) when they shouldn't.
- Unbounded Costs: Infinite loops draining LLM API token budgets.
- Data Exfiltration (Basic): Pattern-based PII leakage in tool outputs or inputs.
What it DOES NOT protect against:
- Semantic Hallucinations: If the underlying evaluator feeds "fake" rewards to the Harness, the Harness cannot know the progress is hallucinated.
- Sophisticated Obfuscation: The current guardrails use regex/heuristics and may miss highly encoded or obfuscated attacks.
- Host System Compromise: Agent-Harness bounds the agent's logic, but sandbox isolation (like Docker) is still required to secure the host machine.
Testing & Validation Rigor
Agent-Harness is validated against extreme operational conditions to ensure zero-bypass governance:
- Event Horizon Benchmarks: 15 internal problem sets mapping complex semantic failures (from the Black-Hole problem set) to deterministic kernel signal patterns (RL Generalization, Distributed Consensus, etc.).
- Monte Carlo Stress Tests: A high-intensity suite (`monte_carlo_stress.py`) that executes 12,000 randomized trajectories per run to verify that governance invariants hold under chaotic conditions.
- Adversarial Pattern Suite: Unit tests verifying the `GuardrailStack` against known prompt injection, PII leakage, and dangerous code patterns (regex-based detection).
- Multi-Agent Swarm Simulations: Validated coordination of swarm-level constraints via `SharedBudgetPool` and `CascadeDetector` tests.
- Policy Enforcement Scaling: Continuous stability validation across 1,000+ sequential policy evaluations to ensure zero memory leaks or state drift.
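The randomized-trajectory approach can be sketched against a toy budget model. Everything here is an assumption for illustration: `toy_step` is not the kernel's update rule, and `check_invariants` is not the contents of `monte_carlo_stress.py`. It only shows the shape of such a test, which is to drive the system with seeded random signals and assert invariants after every step.

```python
import random


def toy_step(effort: float, reward: float) -> float:
    """Toy budget update: every step costs something, failures cost more."""
    return effort - (0.01 + 0.04 * (1.0 - reward))


def check_invariants(trajectories: int = 200, max_steps: int = 500,
                     seed: int = 0) -> bool:
    """Run random reward sequences and check two invariants of the toy model:
    effort strictly decreases, and every run halts within a step bound."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    bound = 101  # 1.0 effort at the minimum cost of 0.01 per step, plus slack
    for _ in range(trajectories):
        effort, steps = 1.0, 0
        while effort > 0.0 and steps < max_steps:
            new_effort = toy_step(effort, rng.random())
            if new_effort >= effort:
                return False  # budget failed to deplete on this step
            effort, steps = new_effort, steps + 1
        if effort > 0.0 or steps > bound:
            return False  # run did not halt within the bound
    return True
```

Seeding the generator keeps the test itself deterministic, so a failing trajectory can be replayed exactly.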
Limitations
- Heuristic Reliance: The built-in `PromptInjectionDetector`, `PIIDetector`, and `CodeExecutionGuard` use hardcoded regex strategies. They may raise false positives on highly contextual payloads or miss deeply obfuscated attacks.
- Signal Definition: The engine trusts the semantic signals (`reward`, `novelty`) provided by the broader orchestrator. If the orchestrator feeds "fake" rewards, the engine cannot know that the reported progress is hallucinated.
- Coarse Reset Handling: Because halting is permanent for safety, workflows hitting false-positive stagnation must manage their own session checkpoints (`kernel.reset()`) conservatively.
Future Projections & Research
The Agent-Harness roadmap focuses on broadening the scope of autonomous safety. Key research directions include:
- OS-Level Isolation: Exploring eBPF and Kernel-level enforcement to move security boundaries outside the application runtime.
- Mechanical Signal Extraction: Researching reasoning-free telemetry to eliminate evaluation bias.
- Policy & Swarms: Enhancing multi-agent coordination and decentralized budget pooling.
- LLM-assisted Semantic Guardrails: Moving beyond regex into fast, quantized models evaluating signal context natively.
- Dynamic Threshold Selection: Automatically pivoting between `CONSERVATIVE` and `AGGRESSIVE` profiles based on historical session hashes.
For a full list of documentation and research papers, see the Master Index.