
harness0 — Layer 0 of Agent Reliability

PyPI version Python 3.11+ License: MIT

Harness Engine for AI Agents.
A reliability layer that makes any agent stable.

Quick Start · 5-Layer Model · Individual Layers · User Manual · Architecture


Agent = Loop(Model + Harness)

You provide the Model. harness0 provides the Harness — the engineering infrastructure that makes the model work reliably in production.

```
┌─────────────────────────────────────────┐
│           Your Application              │
├─────────────────────────────────────────┤
│  Orchestration (pick one)               │
│  LangChain / CrewAI / PydanticAI / DIY  │
├─────────────────────────────────────────┤
│  ★ harness0 — reliability layer         │  ← Layer 0
│  Context · Tools · Security             │
│  Feedback · Entropy                     │
├─────────────────────────────────────────┤
│  LLM API                                │
│  OpenAI / Anthropic / DeepSeek / Local  │
└─────────────────────────────────────────┘
```

Keep using whatever framework you already use. harness0 is complementary — it adds the reliability layer underneath.

Concept origin: Harness Engineering was introduced by OpenAI (Feb 2026), based on building a 1M-line, fully agent-generated codebase. The key insight: the model is the engine; the harness is what makes it driveable. harness0 is the first open-source library built entirely around this discipline.

The Problem

Every agent developer hits the same walls:

| Problem | Root cause | harness0 layer |
| --- | --- | --- |
| "Demo works, production fails" | No structured context management | L1 Context Assembly |
| "More tools = less stable" | Tools are an ungoverned bag of functions | L2 Tool Governance |
| "Afraid to let agents run commands" | Security relies on prompt-level trust | L3 Security Guard |
| "Agent fails but doesn't know why" | System errors aren't translated for the model | L4 Feedback Loop |
| "Agent drifts on long tasks" | Context decays — stale rules, bloated history | L5 Entropy Management |

Existing frameworks solve orchestration. harness0 solves reliability.


Quick Start

```bash
pip install harness0
```

```python
import asyncio
from openai import AsyncOpenAI
from harness0 import HarnessEngine, RiskLevel

engine = HarnessEngine.default()

@engine.tool(risk_level=RiskLevel.READ)
async def read_file(path: str) -> str:
    """Read a file and return its contents."""
    with open(path) as f:   # context manager so the handle is closed on error
        return f.read()

@engine.tool(risk_level=RiskLevel.EXECUTE, requires_approval=True, timeout=30)
async def run_command(command: str) -> str:
    """Execute a shell command."""
    import subprocess
    return subprocess.check_output(command, shell=True, text=True)

async def main():
    result = await engine.run("Summarise README.md", llm_client=AsyncOpenAI())
    print(result.output)

asyncio.run(main())
```

v0.0.5 — L1–L5 and HarnessEngine are implemented and functional. Framework integrations are planned. See TODO.md.

With harness.yaml — declarative configuration

```yaml
llm:
  provider: openai
  model: gpt-4o

context:
  layers:
    - name: base
      source: prompts/base.md
      priority: 0
      disclosure_level: index
    - name: security-guide
      source: docs/security.md
      priority: 10
      disclosure_level: detail
      keywords: ["security", "permission", "auth"]
  total_token_budget: 8000

security:
  blocked_commands: ["rm -rf", "sudo", "> /dev/sda"]
  approval_mode: risky_only

entropy:
  gardener_enabled: true
  gardener_interval_turns: 5
  golden_rules:
    - id: no_duplicate_tools
      description: "No two tools may share the same description"
      severity: error
    - id: no_stale_layers
      description: "All FileSource layers must be fresher than 24h"
      severity: warning
```

```python
engine = HarnessEngine.from_config("harness.yaml")
```

The 5-Layer Harness Model

L1: Context Assembly

Prompts are assembly systems, not documents.

Give the agent a map, not a 1,000-page manual. INDEX layers are always injected (base prompt, rules summary). DETAIL layers are keyword-gated — loaded only when the task mentions relevant terms. Per-layer and total token budgets prevent context overflow.

```python
assembler = ContextAssembler(layers=[
    ContextLayer(name="base", source=FileSource("base.md"),
                 disclosure_level=DisclosureLevel.INDEX),          # always loaded
    ContextLayer(name="security", source=FileSource("security.md"),
                 disclosure_level=DisclosureLevel.DETAIL,          # loaded only for security tasks
                 keywords=["security", "auth"]),
    ContextLayer(name="state", source=CallableSource(get_state),
                 freshness=Freshness.PER_TURN),                    # dynamic per turn
], total_token_budget=8000)
```
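The selection logic behind INDEX/DETAIL gating is easy to sketch framework-free. Everything in this snippet (the names, the crude 4-characters-per-token estimate) is illustrative, not harness0's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str
    text: str
    index: bool = True                                  # INDEX layers: always injected
    keywords: list[str] = field(default_factory=list)   # DETAIL layers: keyword-gated

def assemble(layers: list[Layer], task: str, token_budget: int) -> str:
    """Pick INDEX layers plus keyword-matched DETAIL layers, then enforce a budget."""
    task_lower = task.lower()
    chosen = [
        layer for layer in layers
        if layer.index or any(k in task_lower for k in layer.keywords)
    ]
    out, used = [], 0
    for layer in chosen:
        cost = len(layer.text) // 4 + 1   # crude ~4 chars/token estimate
        if used + cost > token_budget:
            break                          # budget guard: stop before overflowing
        out.append(layer.text)
        used += cost
    return "\n\n".join(out)

layers = [
    Layer("base", "You are a careful agent."),
    Layer("security", "Never run sudo.", index=False, keywords=["security", "auth"]),
]
print(assemble(layers, "Fix the auth security bug", token_budget=100))
```

With this task, both layers are injected; for a task that never mentions security, only the INDEX layer survives.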

L2: Tool Governance

Tools are governed capabilities, not a bag of functions.

Four risk levels (`READ` / `WRITE` / `EXECUTE` / `CRITICAL`), schema validation, output truncation, and a full audit trail. Every tool call passes through a unified pipeline; every failure emits a structured signal the agent can act on.

```python
@engine.tool(risk_level=RiskLevel.EXECUTE, requires_approval=True, timeout=30)
async def run_command(command: str) -> str: ...
```

```
ToolCall → Validate → CommandGuard → Approval → Execute → Truncate → Audit → ToolResult
```
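A toy version of that pipeline (minus the approval and audit stages) fits in a few lines. The blocklist, limits, and function names below are illustrative assumptions, not harness0's API:

```python
import inspect

BLOCKLIST = ("sudo", "rm -rf")   # illustrative patterns, not harness0's defaults
MAX_OUTPUT = 200                 # illustrative truncation limit, in characters

def call_tool(fn, **kwargs) -> str:
    """Toy pipeline: Validate -> CommandGuard -> Execute -> Truncate."""
    # Validate: reject parameters the tool's signature doesn't declare
    sig = inspect.signature(fn)
    for name in kwargs:
        if name not in sig.parameters:
            raise ValueError(f"unknown parameter: {name}")
    # CommandGuard: pattern blocklist over string arguments
    for value in kwargs.values():
        if isinstance(value, str) and any(p in value for p in BLOCKLIST):
            return "BLOCKED: matches security blocklist. Do NOT retry."
    # Execute, then Truncate oversized output
    out = fn(**kwargs)
    return out if len(out) <= MAX_OUTPUT else out[:MAX_OUTPUT] + " [truncated]"

def echo(command: str) -> str:
    return "ran: " + command

print(call_tool(echo, command="ls"))       # ran: ls
print(call_tool(echo, command="sudo ls"))  # BLOCKED: matches security blocklist. Do NOT retry.
```

The point of the single entry point is that no tool call can skip validation or the guard, no matter which tool it targets.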

L3: Security Guard

Security at runtime, not in prompts.

Three lines of defense: CommandGuard (pattern blocklist with fix instructions), ProcessSandbox (configurable resource limits), ApprovalManager (human-in-the-loop with SHA-256 fingerprint cache — approve once per session, not once per call).

```python
result = engine.command_guard.check("sudo rm -rf /tmp")
result.allowed          # False
result.matched_pattern  # "sudo"
result.signal.fix_instructions
# "1. Do NOT retry — matches the security blocklist.
#  2. Reason: `sudo` causes irreversible side effects.
#  3. Safer alternatives: run without sudo, or use targeted delete."
```

Even if the model "misbehaves," the system has hard boundaries.
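The approve-once-per-session idea can be sketched with the standard library alone. This `ApprovalCache` is a hypothetical stand-in for illustration, not harness0's `ApprovalManager`:

```python
import hashlib

class ApprovalCache:
    """Approve-once-per-session cache keyed by the SHA-256 of the command."""

    def __init__(self) -> None:
        self._approved: set[str] = set()

    @staticmethod
    def fingerprint(command: str) -> str:
        return hashlib.sha256(command.encode()).hexdigest()

    def needs_prompt(self, command: str) -> bool:
        # Only ask the human if this exact command hasn't been approved yet
        return self.fingerprint(command) not in self._approved

    def approve(self, command: str) -> None:
        self._approved.add(self.fingerprint(command))

cache = ApprovalCache()
assert cache.needs_prompt("pip install requests")      # first time: ask the human
cache.approve("pip install requests")
assert not cache.needs_prompt("pip install requests")  # same session: skip the prompt
```

Hashing the exact command string means a slightly different command (say, a different package name) still triggers a fresh approval.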

L4: Feedback Loop

System events must be translated into model-consumable signals.

The agent should never see a bare PermissionError. It should see what happened, why, and what to do next:

| System event | Without L4 | With L4 |
| --- | --- | --- |
| Command blocked | `PermissionError` | "Command blocked: sudo. Step 1: don't retry. Step 2: run without sudo." |
| Output truncated | Silent cutoff | "Output truncated 12K→5K tokens. Narrow your search scope." |
| Subprocess timeout | `TimeoutError` | "Exceeded 30s timeout. Break into smaller steps or increase timeout." |
| Schema invalid | `ValidationError` | "Missing required parameter `content`. Check the tool schema and retry." |

Every signal carries a fix_instructions field — numbered steps the agent can execute immediately. Signals are rendered as XML and auto-injected into the next turn's context via L1:

```xml
<harness:signal id="a3f8c1d2" type="constraint" source="security.command_guard">
  <message>Command `sudo apt install` blocked.</message>
  <fix_instructions>1. Do NOT retry.
2. Install without sudo.
3. Or request user approval.</fix_instructions>
</harness:signal>
```
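A minimal translator in the spirit of L4 might look like the following. The `FeedbackSignal` shape and the wording are assumptions for illustration, not the library's exact types:

```python
from dataclasses import dataclass

@dataclass
class FeedbackSignal:
    signal_type: str
    message: str
    fix_instructions: str   # numbered steps the model can act on next turn

def translate(exc: Exception) -> FeedbackSignal:
    """Turn a raw exception into an actionable, model-readable signal."""
    if isinstance(exc, PermissionError):
        return FeedbackSignal(
            "constraint", f"Command blocked: {exc}",
            "1. Do NOT retry.\n2. Run without elevated privileges.\n3. Or request approval.",
        )
    if isinstance(exc, TimeoutError):
        return FeedbackSignal(
            "limit", f"Timed out: {exc}",
            "1. Break the task into smaller steps.\n2. Or increase the tool timeout.",
        )
    # Fallback: still better than a bare traceback
    return FeedbackSignal("error", str(exc), "1. Inspect the message.\n2. Adjust and retry once.")

signal = translate(PermissionError("sudo"))
print(signal.fix_instructions.splitlines()[0])   # 1. Do NOT retry.
```

The dispatch on exception type is the whole trick: each system error class maps to one recovery recipe the model can follow without guessing.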

L5: Entropy Management

Active quality maintenance, not passive compression.

Agent context decays over time. Other frameworks react only when tokens overflow. harness0 proactively detects and repairs degradation every turn:

| | Passive (other frameworks) | Active (harness0) |
| --- | --- | --- |
| Trigger | Token count near limit | Every turn, proactively |
| Method | LLM summarizes old messages | Detect + classify + targeted fix |
| Stale signal removal | No | Yes |
| Duplicate detection | No | Yes |
| Conflict detection | No | Yes |
| Background GC | No | Yes (`EntropyGardener`) |

Golden rules are mechanically verifiable invariants declared in YAML. Violations emit FeedbackSignals — the agent can self-repair:

```yaml
entropy:
  golden_rules:
    - id: no_duplicate_tools
      description: "No two tools may share the same description"
      severity: error
    - id: no_stale_layers
      description: "All FileSource layers must be fresher than 24h"
      severity: warning
```
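A rule like `no_duplicate_tools` is mechanically checkable in a few lines of plain Python. This checker is a sketch of the idea, not harness0's implementation:

```python
def check_no_duplicate_tools(tools: dict[str, str]) -> list[str]:
    """Golden rule no_duplicate_tools: no two tools may share a description.

    `tools` maps tool name -> description; returns a list of violation messages.
    """
    seen: dict[str, str] = {}   # description -> first tool that used it
    violations: list[str] = []
    for name, description in tools.items():
        if description in seen:
            violations.append(
                f"error[no_duplicate_tools]: {name!r} duplicates {seen[description]!r}"
            )
        else:
            seen[description] = name
    return violations

tools = {
    "read_file": "Read a file and return its contents.",
    "cat_file": "Read a file and return its contents.",   # duplicate on purpose
}
print(check_no_duplicate_tools(tools))
```

Because the rule is a pure function over observable state, it can run every turn and feed violations back as signals rather than waiting for the context to overflow.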

Cross-Layer Coordination

The 5 layers are not independent pipelines — they form a coordinated feedback loop:

```
L3 SecurityGuard blocks "rm -rf /"
  → L4 FeedbackTranslator generates signal with fix_instructions
    → L1 ContextAssembler injects signal into next turn's context
      → LLM receives actionable feedback, adjusts behavior
        → L5 EntropyManager garbage-collects stale signals later
```

Full API reference → User Manual


Use Individual Layers

Every layer is independently importable. No full buy-in required.

```python
from harness0.context import ContextAssembler      # L1 — multi-layer prompt assembly
from harness0.tools import ToolInterceptor         # L2 — governed tool execution
from harness0.security import CommandGuard         # L3 — security enforcement
from harness0.feedback import FeedbackTranslator   # L4 — better error messages for models
from harness0.entropy import EntropyManager        # L5 — context quality maintenance
```

Use just L3 for security, just L1 for prompt assembly, or all 5 together. Each layer has zero dependencies on the others.

Individual layer usage examples → User Manual §11


How It Compares

Based on publicly available documentation as of March 2026. See competitive-analysis.md for methodology.

| Capability | LangChain | OpenAI SDK | MS AGT | harness0 |
| --- | --- | --- | --- | --- |
| Multi-layer context assembly | Basic (2-tier) | — | — | ✅ L1 |
| Progressive disclosure | — | — | — | ✅ INDEX/DETAIL |
| Tool risk classification | — | allow/reject | Policy engine | ✅ 4-level |
| Sandbox execution | — | Remote | 4 privilege rings | ✅ Lightweight |
| Approval workflows | HITL | approve/reject | Yes | ✅ + fingerprint cache |
| Feedback translation | — | — | — | ✅ L4 |
| Entropy detection + GC | — | — | — | ✅ L5 |
| Golden rule enforcement | — | — | — | ✅ EntropyGardener |
| Declarative config | — | — | OPA/Rego | ✅ harness.yaml |
| Framework agnostic | No | No | — | Yes |

Three capabilities no major framework addresses: multi-layer context assembly, feedback translation, and entropy management.


Framework Integrations [Planned]

harness0 works with your existing framework. Adapters are on the roadmap:

| Framework | Install | Strategy |
| --- | --- | --- |
| LangChain | `pip install harness0[langchain]` | Middleware hooks |
| OpenAI Agents SDK | `pip install harness0[openai]` | Input/output/tool guardrails |
| PydanticAI | `pip install harness0[pydantic-ai]` | Dependency injection |
| CrewAI | `pip install harness0[crewai]` | `@harness_tool` decorator |

Integration architecture → Architecture docs


Why "harness0"?

The 0 means Layer 0 — the foundational reliability substrate beneath every agent framework, just as the physical medium in networking underlies every higher layer. Ground zero of agent reliability.

Three lessons from OpenAI's harness engineering that directly shaped the design:

  1. "Give the agent a map, not a manual" → L1 Progressive Disclosure (INDEX/DETAIL)
  2. "Error messages must contain fix instructions" → L4 fix_instructions on every signal
  3. "Entropy is inevitable — automate the gardening" → L5 EntropyGardener with golden rules

Requirements

  • Python 3.11+
  • Dependencies: pydantic>=2.0 · pyyaml>=6.0 · tiktoken>=0.7 · httpx>=0.27 · aiofiles>=24.0
  • Any openai.AsyncOpenAI-compatible LLM client

Contributing

Contributions welcome. See TODO.md for the full roadmap.

Priority areas: test suite · LLM provider layer · built-in tool plugins · framework adapters · entropy detection strategies

License

MIT


User Manual · Architecture · Competitive Analysis · Vision · Roadmap
