HALO engine: LLM agent runtime over OTel trace data, with bundled CLI.

😇
HALO

✨ RLM-based Automatic Agent Optimization Loop ✨

What is this?

HALO (Hierarchical Agent Loop Optimization) is a methodology for building recursively self-improving agent harnesses using RLMs. This repository contains:

  • Information on HALO methodology.
  • A Python package that implements the core HALO-RLM engine. View on PyPI
  • A demo project that shows how to build HALO loops for your agents using the Python package. View demo
  • Benchmarking examples applying HALO to popular agent benchmarks. View AppWorld

HALO Loop

The core HALO loop is surprisingly simple:

  1. Collect execution traces from your agent harness. HALO uses OpenTelemetry-compatible tracing.
  2. Feed the traces into the HALO RLM.
  3. The RLM decomposes the traces to identify common failure modes across harness executions and produces a report with its findings.
  4. This report is then fed to a coding agent like Cursor or Claude Code, which is responsible for generating and applying a set of changes to your harness to improve performance.
  5. The harness is then redeployed, more traces are gathered, and the cycle repeats.
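
The five steps above can be sketched as a plain control loop. This is a minimal illustration, not the halo-engine API: `LoopSteps`, `run_halo_loop`, and every callable they take are hypothetical placeholders standing in for your own trace collection, the RLM analysis, the coding agent, and your deployment step.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical placeholder types; halo-engine may represent these differently.
Trace = dict   # one recorded agent execution
Report = str   # findings produced by the analysis step


@dataclass
class LoopSteps:
    """Pluggable callables for each stage of the loop (all names are assumptions)."""
    collect_traces: Callable[[], List[Trace]]   # step 1: gather traces from the harness
    analyze: Callable[[List[Trace]], Report]    # steps 2-3: RLM analysis, returns a report
    apply_changes: Callable[[Report], None]     # step 4: coding agent edits the harness
    redeploy: Callable[[], None]                # step 5: ship the updated harness


def run_halo_loop(steps: LoopSteps, iterations: int) -> List[Report]:
    """Run collect -> analyze -> apply -> redeploy a fixed number of times."""
    reports: List[Report] = []
    for _ in range(iterations):
        traces = steps.collect_traces()
        report = steps.analyze(traces)
        steps.apply_changes(report)
        steps.redeploy()
        reports.append(report)
    return reports
```

Keeping each stage behind a callable makes it easy to swap in a real coding agent or deployment script later, and to test the loop with stubs first.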

HALO is particularly effective at finding issues in production agent deployments. Production environments tend to generate more data, with higher variance across executions, which creates the kind of issues that HALO’s RLM decomposition is good at spotting.

Install

Install the HALO engine + CLI from PyPI:

pip install halo-engine

# Verify
halo --help
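
If you prefer to confirm the install from Python rather than the shell, the standard library's `importlib.metadata` can report the installed version. This is a small sketch; the helper name `installed_version` is our own, and only the distribution name `halo-engine` comes from the install command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "halo-engine") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"halo-engine {version}" if version else "halo-engine is not installed")
```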

Get Started

For instructions on using the HALO loop with your OpenAI Agents SDK agent, see our integration guide to start gathering traces. Then, use the HALO Python package to generate a report you can use to improve your agent. The package also includes a CLI.
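
Before analysis, it can help to sanity-check the traces you gathered. The sketch below groups exported spans by trace so you can see how many executions you captured. It assumes a hypothetical export with one JSON span per line and illustrative field names (`trace_id`, `span_id`, `name`); the format HALO actually expects is described in the integration guide, not here.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_spans_by_trace(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Parse one JSON span per line and bucket the spans by trace ID."""
    traces: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        span = json.loads(line)
        traces[span["trace_id"]].append(span)
    return dict(traces)


# Illustrative spans only; field names are assumptions, not HALO's schema.
sample = [
    '{"trace_id": "t1", "span_id": "a", "name": "agent.run"}',
    '{"trace_id": "t1", "span_id": "b", "name": "tool.call"}',
    '{"trace_id": "t2", "span_id": "c", "name": "agent.run"}',
]
grouped = group_spans_by_trace(sample)
print({trace_id: len(spans) for trace_id, spans in grouped.items()})
```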

For integration examples, we have provided a simple demo and an AppWorld demo.

Why an RLM?

A general-purpose harness like Claude Code is the wrong tool for trace analysis. The problem is not the model's intelligence: traces can get extremely long, and observing systemic agent behavior requires a specialized toolkit. In our testing, harnesses like Claude Code often overfit to an error present in one or a few traces rather than generalizing to harness-level problems. This led us to create a specialized form of an RLM.
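
To make the distinction between a one-off error and a harness-level problem concrete, here is a toy sketch that measures how many traces each failure label appears in. It is not how the HALO RLM works internally; the function names, the error labels, and the `min_fraction` threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def failure_prevalence(trace_errors: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of traces in which each error label appears at least once."""
    total = len(trace_errors)
    if total == 0:
        return {}
    counts = Counter(label for labels in trace_errors.values() for label in labels)
    return {label: n / total for label, n in counts.items()}


def systemic_failures(trace_errors: Dict[str, Set[str]], min_fraction: float = 0.3) -> List[str]:
    """Labels seen in at least `min_fraction` of traces, sorted alphabetically."""
    prevalence = failure_prevalence(trace_errors)
    return sorted(label for label, frac in prevalence.items() if frac >= min_fraction)
```

A label that shows up in most traces points at the harness (prompt, tool schema, control flow), while a label seen once is more likely noise. Fixating on the latter is the overfitting behavior described above.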

Benchmarks

HALO is consistently capable of driving improvements on benchmarks, solely by optimizing the harness.

AppWorld

We applied HALO to the AppWorld benchmark, a set of agentic tasks that assess the LLM’s ability to use multi-app services like Spotify, Venmo, file systems, and phone contacts. We tested HALO’s ability to improve harnesses for both Gemini 3 Flash and Sonnet 4.6. We iterated on the harness using the dev split, and then used the test_normal split as a proxy to verify that improvements did not come from overfitting.

The feedback from the HALO engine surfaced failures in the harnesses such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt edit. We independently verified HALO’s claims against the source trace files, and the findings held up under scrutiny.

The peak improvements over baseline in SGC (Scenario Goal Completion) were substantial for both models. For Gemini 3 Flash, dev SGC went from 36.8% to 52.6% (+15.8 points) and test_normal SGC went from 37.5% to 48.2% (+10.7 points). For Sonnet 4.6, dev SGC went from 73.7% to 89.5% (+15.8 points) and test_normal SGC went from 62.5% to 73.2% (+10.7 points).
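
The gains above are absolute percentage-point differences. The short sketch below recomputes them from the reported baseline and peak scores, and also derives the relative gain for readers who prefer that view. The helper names are our own; the numbers are the ones quoted in this section.

```python
from typing import Dict, Tuple

# Reported SGC percentages as (baseline, peak), keyed by (model, split).
results: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("Gemini 3 Flash", "dev"): (36.8, 52.6),
    ("Gemini 3 Flash", "test_normal"): (37.5, 48.2),
    ("Sonnet 4.6", "dev"): (73.7, 89.5),
    ("Sonnet 4.6", "test_normal"): (62.5, 73.2),
}


def point_gain(baseline: float, peak: float) -> float:
    """Absolute improvement in percentage points."""
    return round(peak - baseline, 1)


def relative_gain(baseline: float, peak: float) -> float:
    """Improvement as a percentage of the baseline score."""
    return round(100 * (peak - baseline) / baseline, 1)


for (model, split), (baseline, peak) in results.items():
    print(f"{model} {split}: +{point_gain(baseline, peak)} points, "
          f"{relative_gain(baseline, peak)}% relative")
```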

Development

Local development against this repo uses uv for dependency management and go-task as the task runner.

Setup

git clone https://github.com/context-labs/HALO
cd HALO
task env:setup

task env:setup installs uv (if missing), syncs the venv from uv.lock, and configures the repo's git hooks. After that, the halo CLI is available via uv run halo ... (or activate .venv/).

Common tasks

Run task --list for the full list. The ones you'll use most:

Task                     What it does
task check               Run all pre-commit checks: pinned-versions, lint, format, typecheck, unit tests
task check:fix           Same, but auto-fix lint/format issues
task test:unit           Unit tests under tests/unit/
task test:integration    Integration tests under tests/integration/

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a pull request.

