Layer 0 of agent reliability — the framework-agnostic harness that makes any AI agent stable.
Project description
Not another agent framework.
A reliability layer that makes any agent stable.
Quick Start · 5-Layer Model · Individual Layers · User Manual · Architecture
Agent = Loop(Model + Harness)
You provide the Model. harness0 provides the Harness — the engineering infrastructure that makes the model work reliably in production.
┌─────────────────────────────────────────┐
│ Your Application │
├─────────────────────────────────────────┤
│ Orchestration (pick one) │
│ LangChain / CrewAI / PydanticAI / DIY │
├─────────────────────────────────────────┤
│ ★ harness0 — reliability layer │ ← Layer 0
│ Context · Tools · Security │
│ Feedback · Entropy │
├─────────────────────────────────────────┤
│ LLM API │
│ OpenAI / Anthropic / DeepSeek / Local │
└─────────────────────────────────────────┘
Keep using whatever framework you already use. harness0 is complementary — it adds the reliability layer underneath.
Concept origin: Harness Engineering was introduced by OpenAI (Feb 2026), based on building a 1M-line, fully agent-generated codebase. The key insight: the model is the engine; the harness is what makes it driveable. harness0 is the first open-source library built entirely around this discipline.
The Problem
Every agent developer hits the same walls:
| Problem | Root Cause | harness0 Layer |
|---|---|---|
| "Demo works, production fails" | No structured context management | L1 Context Assembly |
| "More tools = less stable" | Tools are an ungoverned bag of functions | L2 Tool Governance |
| "Afraid to let agents run commands" | Security relies on prompt-level trust | L3 Security Guard |
| "Agent fails but doesn't know why" | System errors aren't translated for the model | L4 Feedback Loop |
| "Agent drifts on long tasks" | Context decays — stale rules, bloated history | L5 Entropy Management |
Existing frameworks solve orchestration. harness0 solves reliability.
Quick Start
```shell
pip install harness0
```

```python
import asyncio
from openai import AsyncOpenAI
from harness0 import HarnessEngine, RiskLevel

engine = HarnessEngine.default()

@engine.tool(risk_level=RiskLevel.READ)
async def read_file(path: str) -> str:
    """Read a file and return its contents."""
    return open(path).read()

@engine.tool(risk_level=RiskLevel.EXECUTE, requires_approval=True, timeout=30)
async def run_command(command: str) -> str:
    """Execute a shell command."""
    import subprocess
    return subprocess.check_output(command, shell=True, text=True)

async def main():
    result = await engine.run("Summarise README.md", llm_client=AsyncOpenAI())
    print(result.output)

asyncio.run(main())
```
v0.0.3 — L1–L5 and `HarnessEngine` are implemented and functional. Framework integrations are planned. See TODO.md.
With harness.yaml — declarative configuration
```yaml
llm:
  provider: openai
  model: gpt-4o

context:
  layers:
    - name: base
      source: prompts/base.md
      priority: 0
      disclosure_level: index
    - name: security-guide
      source: docs/security.md
      priority: 10
      disclosure_level: detail
      keywords: ["security", "permission", "auth"]
  total_token_budget: 8000

security:
  blocked_commands: ["rm -rf", "sudo", "> /dev/sda"]
  approval_mode: risky_only

entropy:
  gardener_enabled: true
  gardener_interval_turns: 5
  golden_rules:
    - id: no_duplicate_tools
      description: "No two tools may share the same description"
      severity: error
    - id: no_stale_layers
      description: "All FileSource layers must be fresher than 24h"
      severity: warning
```

```python
engine = HarnessEngine.from_config("harness.yaml")
```
The 5-Layer Harness Model
L1: Context Assembly
Prompts are assembly systems, not documents.
Give the agent a map, not a 1,000-page manual. INDEX layers are always injected (base prompt, rules summary). DETAIL layers are keyword-gated — loaded only when the task mentions relevant terms. Per-layer and total token budgets prevent context overflow.
```python
from harness0.context import (ContextAssembler, ContextLayer, FileSource,
                              CallableSource, DisclosureLevel, Freshness)

assembler = ContextAssembler(layers=[
    ContextLayer(name="base", source=FileSource("base.md"),
                 disclosure_level=DisclosureLevel.INDEX),    # always loaded
    ContextLayer(name="security", source=FileSource("security.md"),
                 disclosure_level=DisclosureLevel.DETAIL,    # loaded only for security tasks
                 keywords=["security", "auth"]),
    ContextLayer(name="state", source=CallableSource(get_state),
                 freshness=Freshness.PER_TURN),              # dynamic per turn
], total_token_budget=8000)
```
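The gating rule itself can be sketched in plain Python. This is an illustration of the selection logic only, not the harness0 implementation; the dict keys and token counts are assumptions:

```python
def select_layers(layers, task, budget):
    """Always include INDEX layers; include DETAIL layers only when the
    task mentions one of their keywords; skip anything over budget."""
    chosen, used = [], 0
    for layer in sorted(layers, key=lambda l: l["priority"]):
        gated = layer["level"] == "detail" and not any(
            kw in task.lower() for kw in layer.get("keywords", []))
        if gated or used + layer["tokens"] > budget:
            continue
        chosen.append(layer["name"])
        used += layer["tokens"]
    return chosen

layers = [
    {"name": "base", "level": "index", "priority": 0, "tokens": 500},
    {"name": "security", "level": "detail", "priority": 10,
     "tokens": 3000, "keywords": ["security", "auth"]},
]
print(select_layers(layers, "rotate the auth tokens", budget=8000))
# ['base', 'security']
```

A task that never mentions "security" or "auth" gets only the base layer, so the detail document costs nothing until it is actually relevant.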
L2: Tool Governance
Tools are governed capabilities, not a bag of functions.
Four risk levels (READ → WRITE → EXECUTE → CRITICAL), schema validation, output truncation, and full audit trail. Every tool call passes through a unified pipeline; every failure emits a structured signal the agent can act on.
```python
@engine.tool(risk_level=RiskLevel.EXECUTE, requires_approval=True, timeout=30)
async def run_command(command: str) -> str: ...
```

```
ToolCall → Validate → CommandGuard → Approval → Execute → Truncate → Audit → ToolResult
```
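In miniature, such a stage pipeline might look like the following plain-Python sketch. The stage names and dict shape are illustrative, not the harness0 internals:

```python
def pipeline(call, stages):
    """Thread a tool call through each stage in order; a stage vetoes
    by attaching an "error", which stops the pipeline."""
    for stage in stages:
        call = stage(call)
        if call.get("error"):
            break
    return call

def validate(call):
    # Schema check: require the `command` argument.
    if "command" not in call["args"]:
        return {**call, "error": "Missing required parameter `command`"}
    return call

def guard(call):
    # Blocklist check, the CommandGuard step.
    if "sudo" in call["args"].get("command", ""):
        return {**call, "error": "Command blocked: sudo"}
    return call

print(pipeline({"tool": "run_command", "args": {"command": "sudo ls"}},
               [validate, guard])["error"])
# Command blocked: sudo
```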
L3: Security Guard
Security at runtime, not in prompts.
Three lines of defense: CommandGuard (pattern blocklist with fix instructions), ProcessSandbox (configurable resource limits), ApprovalManager (human-in-the-loop with SHA-256 fingerprint cache — approve once per session, not once per call).
```python
result = engine.command_guard.check("sudo rm -rf /tmp")
result.allowed          # False
result.matched_pattern  # "sudo"
result.signal.fix_instructions
# "1. Do NOT retry — matches the security blocklist.
#  2. Reason: `sudo` causes irreversible side effects.
#  3. Safer alternatives: run without sudo, or use targeted delete."
```
Even if the model "misbehaves," the system has hard boundaries.
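The "approve once per session" fingerprint cache can be sketched independently of harness0. The class and method names below are illustrative, not the library's API; only the SHA-256 fingerprinting idea comes from the text above:

```python
import hashlib

class ApprovalCache:
    """Identical tool calls share a SHA-256 fingerprint, so the human
    is asked once per session rather than once per call."""
    def __init__(self):
        self._approved = set()

    def fingerprint(self, tool: str, args: str) -> str:
        return hashlib.sha256(f"{tool}:{args}".encode()).hexdigest()

    def needs_approval(self, tool: str, args: str) -> bool:
        return self.fingerprint(tool, args) not in self._approved

    def approve(self, tool: str, args: str) -> None:
        self._approved.add(self.fingerprint(tool, args))

cache = ApprovalCache()
print(cache.needs_approval("run_command", "ls /tmp"))  # True
cache.approve("run_command", "ls /tmp")
print(cache.needs_approval("run_command", "ls /tmp"))  # False
```

Any change to the command text produces a new fingerprint, so the cache never silently extends an approval to a different call.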
L4: Feedback Loop
System events must be translated into model-consumable signals.
The agent should never see a bare PermissionError. It should see what happened, why, and what to do next:
| System Event | Without L4 | With L4 |
|---|---|---|
| Command blocked | `PermissionError` | "Command blocked: sudo. Step 1: don't retry. Step 2: run without sudo." |
| Output truncated | Silent cutoff | "Output truncated 12K→5K tokens. Narrow your search scope." |
| Subprocess timeout | `TimeoutError` | "Exceeded 30s timeout. Break into smaller steps or increase timeout." |
| Schema invalid | `ValidationError` | "Missing required parameter `content`. Check the tool schema and retry." |
Every signal carries a fix_instructions field — numbered steps the agent can execute immediately. Signals are rendered as XML and auto-injected into the next turn's context via L1:
```xml
<harness:signal id="a3f8c1d2" type="constraint" source="security.command_guard">
  <message>Command `sudo apt install` blocked.</message>
  <fix_instructions>1. Do NOT retry.
2. Install without sudo.
3. Or request user approval.</fix_instructions>
</harness:signal>
```
L5: Entropy Management
Active quality maintenance, not passive compression.
Agent context decays over time. Other frameworks react only when tokens overflow. harness0 proactively detects and repairs degradation every turn:
| | Passive (other frameworks) | Active (harness0) |
|---|---|---|
| Trigger | Token count near limit | Every turn, proactively |
| Method | LLM summarizes old messages | Detect + classify + targeted fix |
| Stale signal removal | No | Yes |
| Duplicate detection | No | Yes |
| Conflict detection | No | Yes |
| Background GC | No | Yes — EntropyGardener |
Golden rules are mechanically verifiable invariants declared in YAML. Violations emit FeedbackSignals — the agent can self-repair:
```yaml
entropy:
  golden_rules:
    - id: no_duplicate_tools
      description: "No two tools may share the same description"
      severity: error
    - id: no_stale_layers
      description: "All FileSource layers must be fresher than 24h"
      severity: warning
```
Cross-Layer Coordination
The 5 layers are not independent pipelines — they form a coordinated feedback loop:
```
L3 SecurityGuard blocks "rm -rf /"
  → L4 FeedbackTranslator generates signal with fix_instructions
  → L1 ContextAssembler injects signal into next turn's context
  → LLM receives actionable feedback, adjusts behavior
  → L5 EntropyManager garbage-collects stale signals later
```
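The final step of that loop, stale-signal garbage collection, can be illustrated in isolation. This is a plain-Python sketch; the `Signal` dataclass and the age threshold are assumptions, not the harness0 API:

```python
from dataclasses import dataclass

@dataclass
class Signal:
    id: str
    message: str
    turn_created: int

def gc_signals(signals, current_turn, max_age=3):
    """Keep only signals younger than max_age turns; older feedback has
    either been acted on or is no longer relevant."""
    return [s for s in signals if current_turn - s.turn_created < max_age]

signals = [Signal("a3f8", "Command `sudo` blocked", turn_created=1),
           Signal("b901", "Output truncated", turn_created=5)]
print([s.id for s in gc_signals(signals, current_turn=6)])
# ['b901']
```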
→ Full API reference → User Manual
Use Individual Layers
Every layer is independently importable. No full buy-in required.
```python
from harness0.context import ContextAssembler     # L1 — multi-layer prompt assembly
from harness0.tools import ToolInterceptor        # L2 — governed tool execution
from harness0.security import CommandGuard        # L3 — security enforcement
from harness0.feedback import FeedbackTranslator  # L4 — better error messages for models
from harness0.entropy import EntropyManager       # L5 — context quality maintenance
```
Use just L3 for security, just L1 for prompt assembly, or all 5 together. Each layer has zero dependencies on the others.
→ Individual layer usage examples → User Manual §11
How It Compares
Based on publicly available documentation as of March 2026. See competitive-analysis.md for methodology.
| Capability | LangChain | OpenAI SDK | MS AGT | harness0 |
|---|---|---|---|---|
| Multi-layer context assembly | — | Basic (2-tier) | — | ✅ L1 |
| Progressive disclosure | — | — | — | ✅ INDEX/DETAIL |
| Tool risk classification | — | allow/reject | Policy engine | ✅ 4-level |
| Sandbox execution | Remote | — | 4 privilege rings | ✅ Lightweight |
| Approval workflows | HITL | approve/reject | Yes | ✅ + fingerprint cache |
| Feedback translation | — | — | — | ✅ L4 |
| Entropy detection + GC | — | — | — | ✅ L5 |
| Golden rule enforcement | — | — | — | ✅ EntropyGardener |
| Declarative config | — | — | OPA/Rego | ✅ harness.yaml |
| Framework agnostic | No | No | Yes | ✅ |
Three capabilities no major framework addresses: multi-layer context assembly, feedback translation, and entropy management.
Framework Integrations [Planned]
harness0 works with your existing framework. Adapters are on the roadmap:
| Framework | Install | Strategy |
|---|---|---|
| LangChain | `pip install harness0[langchain]` | Middleware hooks |
| OpenAI Agents SDK | `pip install harness0[openai]` | Input/output/tool guardrails |
| PydanticAI | `pip install harness0[pydantic-ai]` | Dependency injection |
| CrewAI | `pip install harness0[crewai]` | `@harness_tool` decorator |
→ Integration architecture → Architecture docs
Why "harness0"?
The 0 means Layer 0 — the foundational reliability substrate beneath every agent framework, just as Layer 0 in networking is the physical medium that all higher layers depend on. Ground zero of agent reliability.
Three lessons from OpenAI's harness engineering that directly shaped the design:
- "Give the agent a map, not a manual" → L1 Progressive Disclosure (INDEX/DETAIL)
- "Error messages must contain fix instructions" → L4
fix_instructionson every signal - "Entropy is inevitable — automate the gardening" → L5
EntropyGardenerwith golden rules
Requirements
- Python 3.11+
- Dependencies: `pydantic>=2.0` · `pyyaml>=6.0` · `tiktoken>=0.7` · `httpx>=0.27` · `aiofiles>=24.0`
- Any `openai.AsyncOpenAI`-compatible LLM client
Contributing
Contributions welcome. See TODO.md for the full roadmap.
Priority areas: test suite · LLM provider layer · built-in tool plugins · framework adapters · entropy detection strategies
License
MIT
User Manual · Architecture · Competitive Analysis · Vision · Roadmap
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file harness0-0.0.3.tar.gz.
File metadata
- Download URL: harness0-0.0.3.tar.gz
- Upload date:
- Size: 639.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `62e6c48961440983b1456195eae0daa413876a1e82739d61102a55152ac4a6bd` |
| MD5 | `b91ad53ed10fedaa66edce337c4159d9` |
| BLAKE2b-256 | `3930e18849fa66f57a076cca76e5ef03ebb56c42aeb97c0d54a841a732eda416` |
File details
Details for the file harness0-0.0.3-py3-none-any.whl.
File metadata
- Download URL: harness0-0.0.3-py3-none-any.whl
- Upload date:
- Size: 42.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b567e1edf9ac91678f03c175324f50de7bfefe89594cf3c918f7db216ed8eaca` |
| MD5 | `1fe09a8cd996fefa7d3d709be4050a86` |
| BLAKE2b-256 | `f0b658028b5e15d4748d08c023db7f289d5c9e9bc000d4e460c96a65a09a978c` |