HALO engine: LLM agent runtime over OTel trace data, with bundled CLI.
HALO
✨ RLM-based Automatic Agent Optimization Loop ✨
What is this? • Install • Why RLM? • Benchmarks • Development • Contributing
What is this?
HALO (Hierarchical Agent Loop Optimization) is a methodology for building recursively self-improving agent harnesses using RLMs. This repository contains:
- Information on HALO methodology.
- A Python package that implements the core HALO-RLM engine. View on PyPI
- A demo project that shows how to build HALO loops for your agents using the Python package. View demo
- Benchmarking examples applying HALO to popular agent benchmarks. (View AppWorld).
HALO Loop
The core HALO loop is surprisingly simple:
- Collect execution traces from your agent harness. HALO uses OpenTelemetry-compatible tracing.
- Feed the traces into the HALO RLM.
- The RLM decomposes the traces to identify common failure modes across harness executions and produces a report with its findings.
- This report is then fed to a coding agent like Cursor or Claude Code, which is responsible for generating and applying a set of changes to your harness to improve performance.
- The harness is then re-deployed, more traces are gathered, and the cycle repeats.
HALO is great at finding issues in production agent deployments. We find production environments tend to generate more data with higher variance across executions, creating the type of issues that HALO’s RLM-decomposition is great at spotting.
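The loop described above can be sketched as a small driver function. This is a minimal illustration of the control flow only, not the HALO package's API: `run_halo_loop`, `LoopStep`, and the four callables are hypothetical names introduced here, and in practice each callable would wrap your own tracing setup, the HALO engine, your coding agent, and your deployment process.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoopStep:
    """Record of one pass through the loop (hypothetical structure)."""
    iteration: int
    trace_count: int
    report: str


def run_halo_loop(
    collect_traces: Callable[[], List[dict]],
    analyze_traces: Callable[[List[dict]], str],
    apply_changes: Callable[[str], None],
    redeploy: Callable[[], None],
    iterations: int = 3,
) -> List[LoopStep]:
    """Run collect -> analyze -> apply -> redeploy for a fixed number of passes.

    All four callables are placeholders supplied by the caller; none of them
    is part of the halo-engine package.
    """
    history: List[LoopStep] = []
    for i in range(iterations):
        traces = collect_traces()          # step 1: gather execution traces
        report = analyze_traces(traces)    # steps 2-3: trace analysis yields a report
        apply_changes(report)              # step 4: a coding agent edits the harness
        redeploy()                         # step 5: ship the updated harness
        history.append(LoopStep(i, len(traces), report))
    return history
```

Keeping each stage behind a plain callable makes it easy to swap in a manual review step before `apply_changes`, which may be preferable when the harness runs in production.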
Install
Install the HALO engine + CLI from PyPI:
pip install halo-engine
# Verify
halo --help
Get Started
For instructions on using the HALO loop with your OpenAI Agents SDK Agent, see our integration guide to start gathering traces. Then, use the HALO Python package to generate a report you can use to improve your agent. Included in the package is a CLI.
For integration examples, we have provided a simple demo and an AppWorld demo.
Why an RLM?
A general-purpose harness like Claude Code is the wrong tool for trace analysis. This isn’t because the model isn’t smart, but because traces can get extremely long, and you need a specialized toolkit to make observations about systemic agentic behavior. In our testing, harnesses like Claude Code would often overfit to an error present in one or a few traces rather than generalize to harness-level problems. This led us to create a specialized form of an RLM.
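To make the distinction between a one-off error and a harness-level problem concrete, here is a toy sketch that counts how many distinct traces exhibit each failure label and keeps only the labels that recur widely. It is not how the HALO RLM works internally; the function name `systemic_failures`, the `(trace_id, label)` input shape, and the `min_share` threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def systemic_failures(
    observations: Iterable[Tuple[str, str]],
    total_traces: int,
    min_share: float = 0.2,
) -> List[Tuple[str, float]]:
    """Return (label, share_of_traces) for labels seen in at least min_share of traces.

    observations holds (trace_id, failure_label) pairs. A label repeated many
    times inside one trace still counts as a single affected trace, so a noisy
    individual trace cannot dominate the ranking.
    """
    if total_traces <= 0:
        raise ValueError("total_traces must be positive")

    traces_by_label: Dict[str, Set[str]] = defaultdict(set)
    for trace_id, label in observations:
        traces_by_label[label].add(trace_id)

    ranked = [
        (label, len(ids) / total_traces)
        for label, ids in traces_by_label.items()
        if len(ids) / total_traces >= min_share
    ]
    # Most widespread first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Counting affected traces rather than raw occurrences is the design choice that matters here: it is one simple way to avoid the single-trace overfitting described above.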
Benchmarks
HALO is consistently capable of driving improvements on benchmarks, solely by optimizing the harness.
AppWorld
We applied HALO to the AppWorld benchmark, a set of agentic tasks that assess the LLM’s ability to use multi-app services like Spotify, Venmo, file systems, and phone contacts. We tested HALO’s ability to improve harnesses for both Gemini 3 Flash and Sonnet 4.6. We iterated on the harness using the dev split, and then used the test_normal split as a proxy to verify that improvements did not come from overfitting.
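The dev/held-out check described above can be expressed as a short comparison of pass rates. This is a hedged sketch, not the evaluation code used for the AppWorld runs: `pass_rate`, `improvement_generalizes`, and the `min_gain` parameter are hypothetical names, and the inputs are simply per-task pass/fail outcomes that you would supply from your own runs.

```python
from typing import Sequence


def pass_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks that passed."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


def improvement_generalizes(
    dev_before: Sequence[bool],
    dev_after: Sequence[bool],
    held_out_before: Sequence[bool],
    held_out_after: Sequence[bool],
    min_gain: float = 0.0,
) -> bool:
    """True only if the pass rate rose by more than min_gain on BOTH splits.

    A gain on the split used for iteration that does not show up on the
    held-out split is a warning sign of overfitting to the dev tasks.
    """
    dev_gain = pass_rate(dev_after) - pass_rate(dev_before)
    held_out_gain = pass_rate(held_out_after) - pass_rate(held_out_before)
    return dev_gain > min_gain and held_out_gain > min_gain
```

A stricter version might compare the two gains against each other or add a significance test; the threshold here is deliberately simple.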
The feedback from the HALO engine surfaced failures in the harnesses such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt edit. HALO’s claims were independently verified against the source trace files, and the findings held up under scrutiny.
Development
Local development against this repo uses uv for dependency management and go-task as the task runner.
Setup
git clone https://github.com/context-labs/HALO
cd HALO
task env:setup
task env:setup installs uv (if missing), syncs the venv from uv.lock, and configures the repo's git hooks. After that, the halo CLI is available via uv run halo ... (or activate .venv/).
Common tasks
Run task --list for the full list. The ones you'll use most:
| Task | What it does |
|---|---|
| task check | Run all pre-commit checks: pinned-versions, lint, format, typecheck, unit tests |
| task check:fix | Same, but auto-fix lint/format issues |
| task test:unit | Unit tests under tests/unit/ |
| task test:integration | Integration tests under tests/integration/ |
License
Contributing
Contributions are welcome! Please feel free to submit a pull request.
File details
Details for the file halo_engine-0.1.1.tar.gz.
File metadata
- Download URL: halo_engine-0.1.1.tar.gz
- Upload date:
- Size: 2.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3cfb9099a41d6cf181dad5bcaae717c54fc43bffb2559dc11ae3a83acabb652 |
| MD5 | 94bdf8423e674320499435faa30ed327 |
| BLAKE2b-256 | c9dfc3ea438971376c0d126ed1fcaf6c3a70ac246cdf8d04ae3d5a0392760eed |
File details
Details for the file halo_engine-0.1.1-py3-none-any.whl.
File metadata
- Download URL: halo_engine-0.1.1-py3-none-any.whl
- Upload date:
- Size: 82.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.8 {"installer":{"name":"uv","version":"0.11.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53f5ab1c4c4da6c654ea0e18680f513799d17eb81010b172a1b723b34a1fb395 |
| MD5 | 7daf870f0910b02cfe7a667ead18733f |
| BLAKE2b-256 | 0be163026029ffe6eaab531e2db8045c992f0322c6218c9f426ebbfef98d01cd |