Production-grade reusable AI agent infrastructure base

llm-harness

Chinese Introduction · English Introduction · GitHub

Production-grade reusable agent infrastructure base — ~10,000 lines, 290 tests.

Build an AI agent by defining your tools, writing your skills, and choosing a provider. Everything else — ReAct loop, tool pipeline, permissions, hooks, session persistence, memory consolidation, observability — is handled by the harness.

from agent_harness import AgentLoop, LoopCallbacks, ToolRegistry, AnthropicProvider

tools = ToolRegistry()
tools.register(MyBusinessTool())

callbacks = LoopCallbacks(
    build_messages=...,               # your system prompt
    execute_tool=...,                 # your tool execution
    get_tool_definitions=lambda: tools.to_api_schema("anthropic"),
)

agent = AgentLoop(AnthropicProvider(api_key="..."), callbacks)
# Run inside an async function, e.g. via asyncio.run(main())
result = await agent.process_direct("Do the thing")
print(result.final_content)

Why This Exists

Option               Problem
LangChain/LangGraph  300K+ lines, 50+ dependencies, constant API churn
From scratch         Rebuild loop, retry, registry, session, permissions... every time
llm-harness          ~10K lines. Read in an afternoon. Fork without fear. 290 tests

Architecture

Each tool call goes through:
  LLM → Permission.check → Hook.execute(PRE_TOOL_USE) → Tool.execute → Hook.execute(POST_TOOL_USE) → LLM
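
The tool-call path above can be pictured as a small async pipeline. The sketch below is illustrative only and does not use the harness API: `check_permission`, `run_hooks`, `run_tool_call`, and the deny-list rule are all hypothetical names and assumptions chosen for the demo.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-ins for the harness components; none of these names
# come from llm-harness itself.
Hook = Callable[[str, str, dict], Awaitable[None]]
Tool = Callable[[dict], Awaitable[str]]

DENIED_TOOLS = {"delete_everything"}  # assumed permission rule for the demo


def check_permission(tool_name: str) -> bool:
    """Permission step: reject tools on the deny list."""
    return tool_name not in DENIED_TOOLS


async def run_hooks(hooks: list[Hook], stage: str, tool_name: str, payload: dict) -> None:
    """Run every hook registered for a stage, in order."""
    for hook in hooks:
        await hook(stage, tool_name, payload)


async def run_tool_call(tool_name: str, args: dict,
                        tools: dict[str, Tool], hooks: list[Hook]) -> str:
    """Permission check -> pre hooks -> tool -> post hooks -> result for the LLM."""
    if not check_permission(tool_name):
        return f"error: permission denied for {tool_name}"
    await run_hooks(hooks, "PRE_TOOL_USE", tool_name, args)
    output = await tools[tool_name](args)
    await run_hooks(hooks, "POST_TOOL_USE", tool_name, {"output": output})
    return output


async def demo() -> tuple[str, str, list[str]]:
    seen: list[str] = []

    async def audit_hook(stage: str, tool_name: str, payload: dict) -> None:
        seen.append(f"{stage}:{tool_name}")

    async def greet(args: dict) -> str:
        return f"Hello, {args['name']}!"

    tools = {"greet": greet, "delete_everything": greet}
    ok = await run_tool_call("greet", {"name": "Alice"}, tools, [audit_hook])
    denied = await run_tool_call("delete_everything", {"name": "x"}, tools, [audit_hook])
    return ok, denied, seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A denied call returns an error string to the model instead of raising, so the model can react to the refusal. Whether the harness does the same is not stated here.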

Each conversation turn goes through:
  Message → AgentLoop → Provider.chat_with_retry → (tool calls? → execute → loop) → Text response
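
As a rough illustration of the turn flow above, the following sketch combines a retrying provider call with the "tool calls? execute, then loop" step. It is a standalone approximation, not the harness code: `call_with_retry`, `run_turn`, `TransientError`, and the message dictionaries are hypothetical, and the backoff parameters are arbitrary.

```python
import asyncio
import random


class TransientError(Exception):
    """Stand-in for a retryable provider failure (e.g. a rate limit)."""


async def call_with_retry(call, *, attempts: int = 3, base_delay: float = 0.01):
    """Retry an async call with exponential backoff plus a little jitter."""
    for attempt in range(attempts):
        try:
            return await call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)


async def run_turn(user_text: str, provider, execute_tool, max_iterations: int = 5) -> str:
    """Ask the model, run any requested tools, feed results back, repeat."""
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_iterations):
        reply = await call_with_retry(lambda: provider(messages))
        if not reply.get("tool_calls"):
            return reply["content"]  # plain text ends the turn
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:
            result = await execute_tool(call["name"], call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "stopped: iteration limit reached"


async def demo() -> str:
    state = {"calls": 0}

    async def fake_provider(messages):
        state["calls"] += 1
        if state["calls"] == 1:
            raise TransientError("simulated rate limit")
        if state["calls"] == 2:
            return {"tool_calls": [{"name": "greet", "args": {"name": "Alice"}}]}
        return {"content": f"Tool said: {messages[-1]['content']}"}

    async def execute_tool(name, args):
        return f"Hello, {args['name']}!"

    return await run_turn("Greet Alice!", fake_provider, execute_tool)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The iteration cap is a design choice of this sketch: it keeps a model that requests tools forever from looping indefinitely.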

Every event flows through:
  Any module → EventBus → Tracker (JSONL file) / Prometheus / Dashboard

llm-harness/
  loop/             ReAct skeleton + concurrency (per-session Lock + Semaphore)
  tools/            24 built-in tools + config-driven builder
  providers/        Anthropic + OpenAI-compatible (25 backends), retry + backoff
  permissions/      Sensitive path protection, 3 modes, path/cmd rules
  hooks/            PreToolUse/PostToolUse, 4 hook types (cmd/http/prompt/agent)
  security/         SSRF protection (DNS + private IP blocking)
  sandbox/          OS-level isolation (srt CLI wrapper)
  session/          JSONL persistence + legal boundary alignment
  memory/           Two-tier (MEMORY.md + HISTORY.md) + LLM consolidation
  skills/           .md loading + dependency checking
  cron/             Scheduler (at/every/cron) + persistence
  mcp/              MCP stdio/SSE/HTTP, tools as BaseTool subclasses
  channels/         BaseChannel ABC + ChannelManager (WebSocket, Telegram...)
  commands/         4-tier slash command router
  plugins/          Discovery + manifest loading
  auth/             Credential storage (file + keyring + encryption)
  prompts/          AGENTS.md discovery + environment + SectionProviders
  tasks/            Background subprocess manager + stdout capture
  coordinator/      Subagent spawning with restricted tools
  state/            Observable state store (get/set/subscribe)
  config/           Multi-layer (CLI > env > file > defaults)
  observability/    Structured events + EventBus + JSONL tracker (auto-start)
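
The loop/ entry above mentions a per-session Lock combined with a Semaphore. One plausible reading of that pattern is sketched below with plain asyncio; `SessionGate` and its methods are hypothetical names, and the real module may be organized differently.

```python
import asyncio
from collections import defaultdict


class SessionGate:
    """Serialize work within a session while capping total concurrency."""

    def __init__(self, max_concurrent: int) -> None:
        self._global = asyncio.Semaphore(max_concurrent)
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, session_key: str, work):
        # The per-session lock keeps one session's turns in order;
        # the semaphore bounds how many turns run at the same time.
        async with self._locks[session_key]:
            async with self._global:
                return await work()


async def demo() -> list[str]:
    gate = SessionGate(max_concurrent=2)
    log: list[str] = []

    def make_turn(label: str):
        async def turn() -> None:
            log.append(f"start {label}")
            await asyncio.sleep(0.01)
            log.append(f"end {label}")
        return turn

    await asyncio.gather(
        gate.run("cli:a", make_turn("a1")),
        gate.run("cli:a", make_turn("a2")),  # same session: waits for a1
        gate.run("cli:b", make_turn("b1")),  # other session: runs alongside a1
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Taking the session lock before the semaphore means a queued turn for a busy session does not occupy a global slot while it waits.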

Quick Start

pip install "llm-harness[all]"

import asyncio
from pathlib import Path
from agent_harness import (
    AgentLoop, LoopCallbacks, ToolRegistry, BaseTool,
    ToolResult, ToolExecutionContext, AnthropicProvider,
)
from pydantic import BaseModel, Field

class GreetInput(BaseModel):
    name: str = Field(description="Who to greet")

class GreetTool(BaseTool):
    name = "greet"
    description = "Greet someone"
    input_model = GreetInput

    async def execute(self, args, ctx):
        return ToolResult(output=f"Hello, {args.name}!")

tools = ToolRegistry()
tools.register(GreetTool())

async def _exec(tools, name, args):
    tool = tools.get(name)
    parsed = tool.input_model.model_validate(args)
    result = await tool.execute(parsed, ToolExecutionContext(cwd=Path.cwd()))
    return result.output

callbacks = LoopCallbacks(
    build_messages=lambda msg: [
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": msg.content},
    ],
    execute_tool=lambda name, args: _exec(tools, name, args),
    get_tool_definitions=lambda: tools.to_api_schema("anthropic"),
    on_event=lambda e: print(f"[{type(e).__name__}]"),  # optional observability
)

agent = AgentLoop(AnthropicProvider(api_key="..."), callbacks)

async def main():
    result = await agent.process_direct("Greet Alice!")
    print(result.final_content)

asyncio.run(main())

Config-Driven Setup

{
  "agent": { "model": "claude-sonnet-4-6" },
  "tools": { "enabled": ["web_search", "message", "write_memory"] },
  "permission": { "mode": "default" },
  "observability": { "track_file": "~/.llm-harness/track.jsonl" }
}

import asyncio
from agent_harness import load_config, build_tools_from_config, start_tracker_from_config

async def main():
    config = load_config()
    tools = build_tools_from_config(config.tools)
    # Starts the JSONL tracker only if observability.track_file is configured
    tracker = await start_tracker_from_config(config)

asyncio.run(main())
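
The architecture overview lists config precedence as CLI > env > file > defaults. A minimal way to model that ordering is a `ChainMap`, where the first layer containing a key wins. This is a sketch under assumptions, not the behavior of `load_config`: the `HARNESS_` environment prefix, the key names, and `load_layers` itself are hypothetical.

```python
import json
from collections import ChainMap

# Hypothetical defaults for the sketch; not the harness's real settings.
DEFAULTS = {"model": "default-model", "track_file": None, "max_iterations": 10}


def load_layers(cli_args: dict, env: dict, file_text: str) -> ChainMap:
    """Merge config layers so that CLI > env > file > defaults."""
    file_cfg = json.loads(file_text) if file_text else {}
    # Assumed convention: HARNESS_<KEY> environment variables map to <key>.
    prefix = "HARNESS_"
    env_cfg = {k[len(prefix):].lower(): v for k, v in env.items() if k.startswith(prefix)}
    # Unset CLI flags (None) must not shadow lower layers.
    cli_cfg = {k: v for k, v in cli_args.items() if v is not None}
    return ChainMap(cli_cfg, env_cfg, file_cfg, DEFAULTS)


if __name__ == "__main__":
    cfg = load_layers(
        {"model": "model-from-cli", "track_file": None},
        {"HARNESS_MODEL": "model-from-env", "HARNESS_TRACK_FILE": "/tmp/env.jsonl"},
        '{"model": "model-from-file", "max_iterations": 20}',
    )
    print(dict(cfg))
```

Environment values arrive as strings, so a fuller version would also need type coercion per key.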

Observability

Zero-config by default. Set observability.track_file in config to auto-start JSONL tracking:

{"type":"SessionOpened","ts":"...","data":{"session_key":"cli:test"}}
{"type":"ToolExecutionStarted","ts":"...","data":{"tool_name":"web_search","tool_input":{...}}}
{"type":"ToolExecutionCompleted","ts":"...","data":{"tool_name":"web_search","output":"...","is_error":false,"duration_ms":123.4}}
{"type":"AssistantTurnComplete","ts":"...","data":{"content":"Done","usage":{"prompt_tokens":10,"completion_tokens":5}}}
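
Because the tracker writes one JSON object per line, the file can be analyzed with the standard library alone. The sketch below aggregates per-tool latency from `ToolExecutionCompleted` events, using the field names shown in the sample output above; the `summarize_tool_latency` helper and the sample values are made up for illustration.

```python
import json
from collections import defaultdict


def summarize_tool_latency(lines) -> dict:
    """Aggregate call count, error count, and average latency per tool."""
    totals = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("type") != "ToolExecutionCompleted":
            continue  # other event types carry no duration
        data = event["data"]
        stats = totals[data["tool_name"]]
        stats["calls"] += 1
        stats["errors"] += int(bool(data.get("is_error")))
        stats["total_ms"] += data["duration_ms"]
    return {
        name: {**stats, "avg_ms": stats["total_ms"] / stats["calls"]}
        for name, stats in totals.items()
    }


# Hypothetical sample records shaped like the tracker output.
SAMPLE = [
    '{"type":"SessionOpened","ts":"t0","data":{"session_key":"cli:test"}}',
    '{"type":"ToolExecutionCompleted","ts":"t1","data":{"tool_name":"web_search","output":"ok","is_error":false,"duration_ms":100.0}}',
    '{"type":"ToolExecutionCompleted","ts":"t2","data":{"tool_name":"web_search","output":"boom","is_error":true,"duration_ms":200.0}}',
]

if __name__ == "__main__":
    print(summarize_tool_latency(SAMPLE))
```

To run it against a real track file, pass an open file handle, since iterating a text file yields one line at a time.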

Or subscribe programmatically for real-time metrics:

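from agent_harness.observability import get_event_bus
# Assumption: the event class is exported from the same module; adjust the import if not.
from agent_harness.observability import ToolExecutionCompleted

def histogram(name, value):
    # Placeholder: replace with your metrics client (e.g. a Prometheus histogram).
    print(f"{name}={value}")

async def prometheus_collector(event):
    # React only to completed tool calls and record their latency.
    if isinstance(event, ToolExecutionCompleted):
        histogram(f"tool.{event.tool_name}.latency_ms", event.duration_ms)

get_event_bus().subscribe(prometheus_collector)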
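
To make the subscribe pattern concrete, here is a toy publish/subscribe bus that accepts both sync and async handlers and isolates subscriber failures. It is a hedged illustration of the general technique only: `MiniEventBus` and `ToolFinished` are hypothetical, and the harness's own EventBus may dispatch differently.

```python
import asyncio
import inspect
from dataclasses import dataclass


@dataclass
class ToolFinished:
    """Hypothetical event type for this sketch."""
    tool_name: str
    duration_ms: float


class MiniEventBus:
    """Fan each published event out to every subscriber."""

    def __init__(self) -> None:
        self._subscribers = []

    def subscribe(self, handler) -> None:
        self._subscribers.append(handler)

    async def publish(self, event) -> None:
        for handler in list(self._subscribers):
            try:
                result = handler(event)
                if inspect.isawaitable(result):
                    await result  # async subscribers are awaited in turn
            except Exception:
                # Isolate failures so one bad subscriber cannot stop the others.
                continue


async def demo() -> dict:
    bus = MiniEventBus()
    latencies: dict[str, list[float]] = {}

    async def collect_latency(event) -> None:
        if isinstance(event, ToolFinished):
            latencies.setdefault(event.tool_name, []).append(event.duration_ms)

    def broken_subscriber(event) -> None:
        raise RuntimeError("simulated subscriber bug")

    bus.subscribe(broken_subscriber)
    bus.subscribe(collect_latency)
    await bus.publish(ToolFinished("web_search", 123.4))
    await bus.publish(ToolFinished("web_search", 80.0))
    return latencies


if __name__ == "__main__":
    print(asyncio.run(demo()))
```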

Deployment

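# One Deployment per agent scenario
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cs-agent
spec:
  replicas: 3
  selector:                      # required by apps/v1; must match the pod labels
    matchLabels:
      app: cs-agent
  template:
    metadata:
      labels:
        app: cs-agent
    spec:
      containers:
      - name: agent              # container name is required
        image: llm-harness:latest
        env:
        - name: AGENT_SCENARIO
          value: "customer-service"
        volumeMounts:
        - name: tools
          mountPath: /app/tools
        - name: skills
          mountPath: /app/skills
      # Define matching "tools" and "skills" entries under volumes: here
      # (ConfigMap, PVC, etc., depending on your cluster).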

Kafka routing, one topic per scenario:
  topic:customer-service → cs-agent (3 pods)
  topic:code-review      → cr-agent (2 pods)
  topic:ops-automation   → ops-agent (1 pod)

Installation

pip install llm-harness                 # base
pip install "llm-harness[anthropic]"    # + Claude
pip install "llm-harness[openai]"       # + OpenAI
pip install "llm-harness[all]"          # everything
pip install "llm-harness[dev]"          # + pytest, ruff

Requirements

Core: Python >= 3.10, pydantic >= 2.0, httpx >= 0.27, pyyaml >= 6.0, mcp >= 1.0, croniter >= 2.0, json-repair >= 0.57

Optional: anthropic, openai, ddgs, readability-lxml

Tests

290 passed, 9 skipped, 0 failed

The 9 skipped tests cover optional dependencies (ddgs, readability-lxml). Install those packages to enable them.

Design Principles

  1. Callback injection, not inheritance. LoopCallbacks dataclass holds all app-specific behavior. The loop knows nothing about your tools, channels, or prompts.

  2. Config-driven. Switch agent behavior via JSON. Tools, permissions, provider, sandbox, observability — all configurable without code changes.

  3. Transport-agnostic. BaseChannel defines the contract. WebSocket, HTTP, gRPC, Telegram — same interface.

  4. You own the code. ~10,000 lines. Fork it. Modify it. No framework to learn.

  5. Production observability. Structured events, EventBus, JSONL tracker, auto-start from config. Zero overhead when disabled.
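
The first principle, callback injection rather than inheritance, can be shown with a toy loop. The sketch below is not the real `LoopCallbacks` or `AgentLoop`: the `Callbacks` fields, `TinyLoop`, and `echo_model` are hypothetical and exist only to show that one generic loop serves two different applications without subclassing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Callbacks:
    """App-specific behavior, passed in as plain callables (hypothetical fields)."""
    build_prompt: Callable[[str], str]
    handle_reply: Callable[[str], str]


class TinyLoop:
    """Generic loop: it depends only on the callback contract."""

    def __init__(self, model: Callable[[str], str], callbacks: Callbacks) -> None:
        self.model = model
        self.callbacks = callbacks

    def process(self, user_text: str) -> str:
        prompt = self.callbacks.build_prompt(user_text)
        reply = self.model(prompt)
        return self.callbacks.handle_reply(reply)


def echo_model(prompt: str) -> str:
    """Stand-in for an LLM call: upper-cases the prompt."""
    return prompt.upper()


# Two "applications" share one loop class and differ only in their callbacks.
support_bot = TinyLoop(echo_model, Callbacks(
    build_prompt=lambda text: f"support: {text}",
    handle_reply=lambda reply: reply + "!",
))
review_bot = TinyLoop(echo_model, Callbacks(
    build_prompt=lambda text: f"review: {text}",
    handle_reply=lambda reply: reply.lower(),
))

if __name__ == "__main__":
    print(support_bot.process("hi"))
    print(review_bot.process("Hi"))
```

Swapping behavior means passing a different dataclass instance, so the loop class never needs to be edited or subclassed.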

License

MIT — see LICENSE.

Credits

Extracted and refined from two mature open-source agent projects:

  • OpenHarness — tools, permissions, hooks, skills, sandbox, plugins, tasks
  • nanobot — agent loop, providers, message bus, session, memory, cron, channels
