HALO engine: LLM agent runtime over OTel trace data, with bundled CLI.

HALO

✨ RLM-based Automatic Agent Optimization Loop ✨

What is this?

Note: If you're looking for a hosted, plug-n-play version of HALO, please sign up for inference.net.

HALO (Hierarchical Agent Loop Optimization) is a methodology for building recursively self-improving agent harnesses using RLMs. This repository contains:

Information on HALO methodology.
A Python package that implements the core HALO-RLM engine. View on PyPI
A demo project that shows how to build HALO loops for your agents using the Python package. View demo
Benchmarking examples applying HALO to popular agent benchmarks. (View AppWorld).

HALO Loop

The core HALO loop is surprisingly simple:

Collect execution traces from your agent harness. HALO uses OpenTelemetry-compatible tracing.
Feed traces into HALO-RLM engine.
The engine decomposes the traces to identify common failure modes across harness executions and produces a report with its findings.
This report is fed into a coding agent like Cursor or Claude Code to generate and apply a set of changes to your harness.
The harness is then re-deployed, more traces are gathered, and the cycle repeats.
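
The cycle above can be sketched as a plain function. This is a minimal illustration, not part of the HALO package: `halo_loop` and its four callables are hypothetical stand-ins for your trace collection, the HALO-RLM engine, your coding agent, and your deployment step.

```python
from typing import Callable, Sequence


def halo_loop(
    collect_traces: Callable[[], Sequence[dict]],
    analyze: Callable[[Sequence[dict]], str],
    apply_changes: Callable[[str], None],
    redeploy: Callable[[], None],
    iterations: int,
) -> list[str]:
    """Run the collect -> analyze -> edit -> redeploy cycle a fixed number of times."""
    reports = []
    for _ in range(iterations):
        traces = collect_traces()  # step 1: gather traces from the harness
        report = analyze(traces)   # steps 2-3: engine turns traces into a report
        apply_changes(report)      # step 4: coding agent edits the harness
        redeploy()                 # step 5: ship the updated harness
        reports.append(report)
    return reports
```

Keeping each stage behind a callable makes it easy to swap a manual step (for example, a human reviewing the report) for an automated one later.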

HALO is most useful on production agent deployments. We find that high-traffic environments tend to generate more data with higher variance across executions, which produces the kind of systemic issues HALO is designed to identify.

Why an RLM?

A general-purpose harness like Claude Code is the wrong tool for trace analysis. This isn’t because the model isn’t smart, but because traces can get extremely long, and observing systemic agent behavior requires a specialized toolkit. In our testing, harnesses like Claude Code often overfit to an error present in one or a few traces rather than generalizing to harness-level problems. This led us to create a specialized form of an RLM.
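
To make the single-trace versus harness-level distinction concrete, here is a toy sketch. It is not how the HALO engine works internally; `systemic_failures`, the error-signature strings, and the input shape are all assumptions made for illustration. The idea is to count how many traces an error appears in, rather than how often it appears, so one noisy trace cannot dominate.

```python
from collections import defaultdict


def systemic_failures(traces, min_share=0.2):
    """Return error signatures seen in at least `min_share` of all traces.

    `traces` maps a trace id to the list of error signatures found in it
    (a hypothetical, pre-extracted format).
    """
    seen_in = defaultdict(set)
    for trace_id, errors in traces.items():
        for signature in errors:
            seen_in[signature].add(trace_id)  # count traces, not occurrences
    total = len(traces)
    return {
        signature: len(ids) / total
        for signature, ids in seen_in.items()
        if len(ids) / total >= min_share
    }
```

An error repeated many times inside one trace still counts once, while an error that shows up across most traces is flagged as a harness-level problem.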

Get Started

Install

Install the HALO engine + CLI from PyPI:

pip install halo-engine

# Verify installation
halo --help

Usage

Integrate Tracing
Collect traces by running your agent
Run the HALO engine (see the CLI docs for more info)

export OPENAI_API_KEY=...

halo path_to_your_traces.jsonl -p "Diagnose errors you find and suggest fixes"
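
If you want to trigger the same command from a Python script, a thin `subprocess` wrapper is one option. This is a sketch under assumptions: `build_halo_command` and `run_halo` are hypothetical helpers, only the positional trace path and the `-p` flag come from the command above, and it assumes the CLI writes its output to stdout. `OPENAI_API_KEY` must already be set in the environment.

```python
import subprocess


def build_halo_command(trace_path: str, prompt: str) -> list[str]:
    """Build the argv for the `halo` invocation shown above."""
    return ["halo", trace_path, "-p", prompt]


def run_halo(trace_path: str, prompt: str) -> str:
    """Run the CLI and return its stdout (assumed to carry the output).

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        build_halo_command(trace_path, prompt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.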

We have provided a simple demo and an AppWorld demo.

Python API

The engine exposes six entry points from engine.main: three async functions and a sync counterpart for each. Use whichever matches the trade-off you want between observability and code simplicity. The yielded types (AgentOutputItem and AgentTextDelta) are defined in engine/models/engine_output.py:

Function	Sync / async	Returns	When to use
`stream_engine_async`	async	`AsyncIterator[AgentOutputItem | AgentTextDelta]`	You want every event including streaming-token deltas (live UI, custom rendering).
`stream_engine_output_async`	async	`AsyncIterator[AgentOutputItem]`	You want to log / persist each completed step (assistant message, tool call, tool result) as it lands.
`run_engine_async`	async	`list[AgentOutputItem]`	You want the final list at the end and don't care about per-step observability.
`stream_engine`	sync	`Iterator[AgentOutputItem | AgentTextDelta]`	Sync generator; yields every event including deltas. Drives the async iterator on a private event loop.
`stream_engine_output`	sync	`Iterator[AgentOutputItem]`	Sync generator; yields completed items only. Same shape as the async variant for sync callers.
`run_engine`	sync	`list[AgentOutputItem]`	Sync, collects to a list. Pure convenience over `asyncio.run(run_engine_async(...))`.

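import logging

from engine.main import stream_engine_output_async

logger = logging.getLogger(__name__)

async def log_steps(messages, cfg, trace_path):
    # Log each completed step (assistant message, tool call, tool result) as it lands.
    async for item in stream_engine_output_async(messages, cfg, trace_path):
        logger.info("step", extra={"sequence": item.sequence, "agent": item.agent_name})
        # item.item is an AgentMessage (assistant / tool / etc.)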
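
The relationship between the entry points (full stream, completed items only, collected list, and sync wrappers) can be illustrated without the engine installed. The sketch below is not the engine's implementation: `Item` and `Delta` are stand-ins for AgentOutputItem and AgentTextDelta, and every function name is hypothetical. It shows the general pattern of filtering a mixed async stream, collecting it, and driving it from sync code on a private event loop.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterator, Union


@dataclass
class Item:  # stand-in for AgentOutputItem (a completed step)
    sequence: int


@dataclass
class Delta:  # stand-in for AgentTextDelta (a streamed text chunk)
    text: str


async def stream_all() -> AsyncIterator[Union[Item, Delta]]:
    """Toy event source: each delta is followed by a completed item."""
    for seq in range(2):
        yield Delta(text=f"tok{seq}")
        yield Item(sequence=seq)


async def stream_items() -> AsyncIterator[Item]:
    """Completed items only: the full stream with deltas filtered out."""
    async for event in stream_all():
        if isinstance(event, Item):
            yield event


async def run_all() -> list[Item]:
    """Collect the completed items into a list."""
    return [item async for item in stream_items()]


_DONE = object()  # sentinel marking the end of the async stream


async def _next_or_done(agen):
    try:
        return await agen.__anext__()
    except StopAsyncIteration:
        return _DONE


def stream_items_sync() -> Iterator[Item]:
    """Drive the async iterator from sync code on a private event loop."""
    loop = asyncio.new_event_loop()
    agen = stream_items()
    try:
        while True:
            event = loop.run_until_complete(_next_or_done(agen))
            if event is _DONE:
                break
            yield event
    finally:
        # Close generators even if the caller stops iterating early.
        loop.run_until_complete(agen.aclose())
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
```

A sync caller can then write `for item in stream_items_sync(): ...`, while async callers use `async for` or `await run_all()` directly.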

Benchmarks

HALO is consistently capable of driving improvements on benchmarks, solely by optimizing the harness.

AppWorld

We applied HALO to the AppWorld benchmark, a set of agentic tasks that assess the LLM’s ability to use multi-app services like Spotify, Venmo, file systems, and phone contacts. We tested HALO’s ability to improve harnesses for both Gemini 3 Flash and Sonnet 4.6. We iterated on the harness using the dev split, and then used the test_normal split as a proxy to verify that improvements did not come from overfitting.

The feedback from the HALO engine surfaced failures in the harnesses such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt edit. HALO’s claims were independently verified against the source trace files, and the findings held up under scrutiny.

The peak improvements over baseline were substantial for both models, measured by AppWorld's Scenario Goal Completion (SGC) metric. For Gemini 3 Flash, dev SGC went from 36.8% to 52.6% (+15.8 points) and test_normal SGC went from 37.5% to 48.2% (+10.7 points). For Sonnet 4.6, dev SGC went from 73.7% to 89.5% (+15.8 points) and test_normal SGC went from 62.5% to 73.2% (+10.7 points).
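
The point gains quoted above are simple differences between peak and baseline scores. A quick check using only the numbers reported in this section (`point_gain` is a helper introduced here for illustration):

```python
# (model, split) -> (baseline SGC %, peak SGC %), copied from the text above
results = {
    ("Gemini 3 Flash", "dev"): (36.8, 52.6),
    ("Gemini 3 Flash", "test_normal"): (37.5, 48.2),
    ("Sonnet 4.6", "dev"): (73.7, 89.5),
    ("Sonnet 4.6", "test_normal"): (62.5, 73.2),
}


def point_gain(baseline: float, peak: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(peak - baseline, 1)


gains = {key: point_gain(*pair) for key, pair in results.items()}
for (model, split), gain in gains.items():
    print(f"{model} {split}: +{gain} points")
```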

Development

Local development against this repo uses uv for dependency management and go-task as the task runner.

Setup

git clone https://github.com/context-labs/HALO
cd HALO
task env:setup

task env:setup installs uv (if missing), syncs the venv from uv.lock, and configures the repo's git hooks. After that, the halo CLI is available via uv run halo ... (or activate .venv/).

Common tasks

Run task --list for the full list. The ones you'll use most:

Task	What it does
`task check`	Run all pre-commit checks: pinned-versions, lint, format, typecheck, unit tests
`task check:fix`	Same, but auto-fix lint/format issues
`task test:unit`	Unit tests under `tests/unit/`
`task test:integration`	Integration tests under `tests/integration/`

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a pull request.
