
Deterministic, framework-agnostic detector of multi-agent coordination pathologies that trips at iteration 2


looptrip

Deterministic, framework-agnostic detection of multi-agent coordination pathologies — caught at iteration 2, not on the invoice.

looptrip watches a multi-agent run as a stream of normalized events and flags the coordination pathologies that make agent systems burn money and spin: duplicate-work loops, ping-pong / livelock, deadlock, and non-termination. It is detection-first — it works over data you already have (OpenTelemetry GenAI spans, or a CAST cast.db) — and deterministic / zero-LLM: the same event stream always yields the same verdict. looptrip is an observer, never a gate; it reports, it never blocks.

This release (0.1.1) ships full pathology coverage (duplicate-work, ping-pong / livelock, deadlock, non-termination), configurable sensitivity controls, counterfactual-replay attribution (via the attribute subcommand), and the cast.db adapter with reproducible proof on real data. It also adds OpenTelemetry support (Phase 4) in the looptrip[otel] extra: an offline adapter (OTelSpanAdapter, which ingests flat span dicts and OTLP/JSON exports) and a live LooptripSpanProcessor for in-flight detection. (0.1.0 was published before the Phase 4 code merged and shipped without the OTel modules; 0.1.1 is the first artifact to actually include them.)

The headline

On two real recorded multi-agent runaway sessions, a single workflow-subagent dispatch recurred 54 and 49 times with no progress between repeats. Tripping at the second dispatch — the first repeat — instead of letting the loop run to exhaustion would have saved:

session    runaway loop        dispatches   trip point    saved
2e6c0288   workflow-subagent   54           dispatch #2   $320.16
da27b414   workflow-subagent   49           dispatch #2   $472.80
total                                                     $792.96

Reproduce it yourself. No database is required, because the data is a committed fixture:

pip install -e .
looptrip proof

Why "iteration 2"

Native runaway guards are blunt total-step counters that trip at N=10–25, after the waste has compounded. looptrip's trip is a safety predicate keyed on the pathology signature: no signature (agent, tool, args_hash) may recur without an intervening progress delta. The instant a signature is seen a second time (within a configurable input-token tolerance, with no progress marker in between), it fires, before the third wasted turn and before the O(N²) context-cost compounding sets in. "2" is the default threshold, not a magic number; what matters is the approach: signature-keyed detection with configurable thresholds. The detector itself is not the moat. The durable asset is standards engagement: adopting the upstream OpenTelemetry GenAI gen_ai.agent.handoff.* convention and contributing the agent-loop pathology layer (pending-wait and loop-termination semantics) that looptrip uniquely detects.
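To make the O(N²) remark concrete, here is a small worked calculation. It assumes, purely for illustration, that every turn re-sends the whole context and that the context grows by a fixed number of tokens per turn; the function name and the token figures are hypothetical and are not taken from the recorded sessions.

```python
def cumulative_input_tokens(turns: int, base: int, growth: int) -> int:
    """Total input tokens billed over `turns` turns.

    Turn i (0-based) sends `base + growth * i` tokens, so the total is
    turns * base + growth * turns * (turns - 1) / 2, which is quadratic in turns.
    """
    return sum(base + growth * i for i in range(turns))


# Hypothetical figures, chosen only to show the shape of the curve.
for n in (2, 10, 25, 54):
    print(n, cumulative_input_tokens(n, base=2_000, growth=1_500))
```

Under these assumptions, stopping at the second turn pays for two small prompts, while a guard at N=25 has already paid for the quadratic part of the curve.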

The worst real runaways are the hardest to catch: a workflow-subagent loop emits no structured handoff contract at all. So looptrip detects from the (agent, ts) repeat signal plus input-token variance alone; any handoff metadata only enriches the signal — it is never required.
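The predicate described in this section can be sketched in a few lines of stdlib Python. This is a minimal illustration under stated assumptions, not looptrip's actual code: the Event fields, the progress flag, and the trips function are hypothetical names, and treating a progress marker as a full reset is one possible reading of "intervening progress delta".

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    agent: str
    tool: Optional[str]
    args_hash: Optional[str]   # may be None when the source has no args
    input_tokens: int
    progress: bool = False     # hypothetical progress-delta marker


def trips(events: Iterable[Event], token_tolerance: int = 0) -> Iterator[tuple[int, Event]]:
    """Yield (1-based position, event) whenever a signature recurs
    with no progress marker since its previous occurrence."""
    last_tokens: dict[tuple, int] = {}
    for position, ev in enumerate(events, start=1):
        if ev.progress:
            last_tokens.clear()  # assumption: progress resets every signature
            continue
        signature = (ev.agent, ev.tool, ev.args_hash)
        previous = last_tokens.get(signature)
        if previous is not None and abs(ev.input_tokens - previous) <= token_tolerance:
            yield position, ev
        last_tokens[signature] = ev.input_tokens


loop = [Event("workflow-subagent", None, None, 1000 + 10 * i) for i in range(5)]
first_position, _ = next(trips(loop, token_tolerance=50))
print(first_position)  # the first repeat is the second dispatch
```

Note that the signature still works when tool and args_hash are both None: the agent name and the input-token comparison carry the signal on their own.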

Usage

looptrip proof                                  # reproduce the $792.96 headline on the committed fixture
looptrip scan fixture:<session_id>              # scan a session from the packaged fixture
looptrip scan cast-db:<session_id>              # scan a live cast.db session (CAST hosts only)
looptrip scan --all fixture:<session_id>        # run all four detectors (adds a 'kind' column)
looptrip attribute fixture:<session_id>         # attribute pathologies to decisive handoffs (overdetermined = no single one)
looptrip --version
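One way to read the "overdetermined" label on the attribute subcommand is through a leave-one-out replay. The sketch below is not looptrip's implementation: the reduced (agent, tool) event shape, the toy has_repeat detector, and the attribute function are all hypothetical, and a real replay would presumably re-run the full detector rather than this stand-in.

```python
from typing import Callable, Sequence

Event = tuple[str, str]  # hypothetical reduced event: (agent, tool)


def has_repeat(events: Sequence[Event]) -> bool:
    """Toy detector: True if any (agent, tool) pair occurs more than once."""
    return len(set(events)) < len(events)


def decisive_indices(events: Sequence[Event],
                     detect: Callable[[Sequence[Event]], bool] = has_repeat) -> list[int]:
    """Indices whose individual removal makes the pathology disappear."""
    if not detect(events):
        return []
    return [i for i in range(len(events))
            if not detect([*events[:i], *events[i + 1:]])]


def attribute(events: Sequence[Event],
              detect: Callable[[Sequence[Event]], bool] = has_repeat) -> str:
    if not detect(events):
        return "clean"
    hits = decisive_indices(events, detect)
    # No single removal clears the pathology: more than one cause is sufficient.
    return f"decisive: {hits}" if hits else "overdetermined"


print(attribute([("a", "x"), ("b", "y"), ("a", "x")]))  # decisive: [0, 2]
print(attribute([("a", "x")] * 3))                      # overdetermined
```

With three identical events, removing any single one still leaves a repeat, so no single event is decisive; that is the situation the "overdetermined = no single one" note describes.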

How it works

  1. One normalized event: (agent, tool, args_hash, ts, handoff_state) plus optional cost/token metadata. An adapter maps each source's fields onto this schema, so detection logic never touches source-specific span-attribute renames.
  2. Detection-first. Phase 1 ships a cast.db adapter. Because agent_runs carries no per-dispatch args, that adapter sets args_hash=None and detection leans on the token-variance signal. Phase 4 (now shipped) adds, in the looptrip[otel] extra, an offline OTel adapter (OTelSpanAdapter, ingesting flat span dicts and OTLP/JSON/JSONL exports) and a live LooptripSpanProcessor for in-flight pathology detection.
  3. Stdlib state machine — the detector groups events by signature and trips on the 2nd same-signature occurrence with no progress delta. The core is stdlib-only; OpenTelemetry is an optional [otel] extra, never imported by the detector.
  4. False-positive control is first-class — a configurable input-token tolerance, a progress-delta marker, and an idempotent_agents allowlist keep legitimately-repeatable work (commits, reviews) from tripping. looptrip is meant to be run detect-then-print and dogfooded before any signal is trusted.
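The normalization step in item 1 can be illustrated with two small adapter functions. This is a sketch under assumptions, not the shipped OTelSpanAdapter or cast.db adapter: the Event tuple, the function names, the flat span layout ("attributes", "start_time_unix_nano"), the "tool_args" and "handoff_state" keys, and the row columns are hypothetical. The gen_ai.agent.name, gen_ai.tool.name, and gen_ai.usage.input_tokens attribute names come from the OpenTelemetry GenAI semantic conventions.

```python
import hashlib
import json
from typing import Any, NamedTuple, Optional


class Event(NamedTuple):
    agent: str
    tool: Optional[str]
    args_hash: Optional[str]
    ts: float
    handoff_state: Optional[str]
    input_tokens: Optional[int] = None


def hash_args(args: Any) -> Optional[str]:
    """Stable short hash of JSON-serializable args; None when args are absent."""
    if args is None:
        return None
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def from_span(span: dict) -> Event:
    """Map a flat span dict (hypothetical layout) onto the normalized event."""
    attrs = span.get("attributes", {})
    return Event(
        agent=attrs.get("gen_ai.agent.name", "unknown"),
        tool=attrs.get("gen_ai.tool.name"),
        args_hash=hash_args(attrs.get("tool_args")),   # hypothetical key
        ts=span["start_time_unix_nano"] / 1e9,
        handoff_state=attrs.get("handoff_state"),      # hypothetical key
        input_tokens=attrs.get("gen_ai.usage.input_tokens"),
    )


def from_run_row(row: dict) -> Event:
    """Map a run row with no per-dispatch args: args_hash stays None."""
    return Event(row["agent"], None, None, row["ts"], None, row.get("input_tokens"))
```

Because both functions return the same tuple, a detector written against Event never needs to know which source produced it, which is the point of keeping source-specific field names inside the adapter.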

Honest framing

This project tries hard not to oversell:

  • Attribution numbers. Published LLM-prompting baselines for "which handoff broke the run" sit around 14%, but that is the prompting baseline; structured / deterministic methods reach 29–52%. Adding structure is the lever, and looptrip's deterministic replay (Phase 3) is the limit case of that frontier, not a fix for a permanent ceiling. We don't anchor to "14%."
  • Cost numbers. The $792.96 here is verifiable from the committed fixture. Larger figures circulate — e.g. a widely-reported "$47K" agent-loop bill — but those are unverified, and we label them as such.
  • Prior art. The market gap is real, but the durable asset is the standard, not the ~200-line detector. A direct competitor exists — Watchtower (MIT, LangGraph-only, trips at 3+ repeats, no handoff contract, no attribution). looptrip differentiates on framework-agnosticism, speed, and standards engagement with the OpenTelemetry GenAI agent-observability conventions — adopting the upstream gen_ai.agent.handoff.* handoff identity and contributing the agent-loop pathology layer it uniquely detects.

Roadmap

  • Phase 1 (SHIPPED): cast.db adapter + duplicate-work / iteration-2 detector + reproducible proof.
  • Phase 2 (SHIPPED): full pathology coverage (ping-pong / livelock, deadlock, non-termination) + sensitivity controls.
  • Phase 3 (SHIPPED): counterfactual replay attribution ("which handoff was decisive").
  • Phase 4 (SHIPPED): OpenTelemetry support: offline adapter (OTelSpanAdapter ingesting flat span dicts, OTLP/JSON, JSONL exports) and live LooptripSpanProcessor (in-flight pathology detection via on_start hooks, thread-safe, deduped) in the looptrip[otel] extra. Unit and synthetic testing complete; production multi-agent validation pending.
  • Phase 5 (SHIPPED): packaging (Claude Code plugin, Homebrew).
  • Phase 6 (SHIPPED): documentation (reference deep-dives, examples, architecture notes).
  • Phase 7 (repo work merged; upstream engagement in progress): OpenTelemetry GenAI agent-observability semantic-convention engagement: adopt the upstream gen_ai.agent.handoff.* handoff identity (semantic-conventions-genai) and contribute the pathology layer: a pending/blocking wait-for state and loop-termination (gen_ai.agent.finish_reason) semantics, with looptrip as the deterministic reference implementation.
  • Phase 8 (in progress): launch.

Documentation

  • Proof — Reproduce the $792.96 headline. Evidence that the fixture is real and reproducible.
  • Usage — CLI and library API reference, adapters, and configuration.
  • Architecture — Detector design, event normalization, signature matching, and phase-by-phase roadmap.
  • Adapters — Implementing a custom adapter for your event source (OTel spans, custom JSON, etc.).
  • Testing — Test structure, mutation sanity, fixture integrity, and independent re-derivation.
  • Framing — Attribution, cost baselines, related work (Watchtower), and the role of standards.
  • Case Studies — Real runaways: workflow-subagent loops, deadlock scenarios, and non-termination traces.
  • Contributing — How to contribute, issue triage, and development setup.

License

Apache-2.0. See LICENSE.
