
Watches your LLM agent in real time and intercepts meltdowns before they spiral.


Sotis

Sotis watches your LLM agent and catches it before it spirals.

Python 3.10+ | License: MIT

pip install sotis

Long-running agents fail in predictable ways — they loop on the same tool calls, flood their context with error traces, and spiral until the task collapses. Sotis detects these failure patterns in real time and transparently resets execution before they take hold.

Based on "Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents" (arXiv:2603.29231, April 2026)


The Problem

Current AI agents fail predictably under long-horizon execution. As tasks grow longer, agents accumulate errors and drift into terminal failure modes:

  • Infinite Loops: repeating the same tool calls with identical arguments
  • Semantic Spirals: rephrasing failed queries in the hope of a different outcome
  • Context Poisoning: flooding history with massive error traces and linter warnings
  • Edit Storms: making rapid, uncoordinated file edits that do not change the outcome (for example, the same test keeps failing)

Frontier models do not fail because they lack capability. They fail because reliability decays over long-horizon execution until the agent's strategy collapses. Sotis acts as an active runtime stabilizer: it monitors execution, detects behavioral meltdowns, and transparently resets context to restore forward progress.
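To make the "identical arguments" failure mode concrete, here is a minimal sketch of exact loop detection. It is not Sotis's implementation: the class name `ExactLoopDetector` and the `window` and `max_repeats` defaults are assumptions chosen for illustration.

```python
import json
from collections import Counter, deque


class ExactLoopDetector:
    """Hypothetical helper: flags a tool call repeated with identical arguments."""

    def __init__(self, window: int = 6, max_repeats: int = 2):
        # window and max_repeats are illustrative values, not Sotis defaults.
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    def observe(self, tool: str, args: dict) -> bool:
        # Canonical JSON (sorted keys) so key order cannot hide a repeat.
        key = (tool, json.dumps(args, sort_keys=True, default=str))
        self.recent.append(key)
        # True once the same call appears max_repeats times in the window.
        return Counter(self.recent)[key] >= self.max_repeats
```

Keeping only a short window means an agent that legitimately re-runs a command much later is not penalized for it.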


Usage

from sotis import SotisGuard

guard = SotisGuard()

# `agent`, `tools`, and `max_steps` come from your own agent loop.
for step in range(max_steps):
    action = agent.decide()
    result = tools.execute(action)

    # Report each step; a truthy return value signals a detected meltdown.
    meltdown = guard.watch(action.name, action.args, result.summary)

    if meltdown:
        guard.reset()  # rolls back files, distills context, resumes cleanly

What it looks like in practice

[Step 22] write_file -> {"path": "src/main.py", "content": "import math"} | SUCCESS
[Step 23] run_tests  -> {"cmd": "pytest"} | FAIL (ImportError)
[Step 24] write_file -> {"path": "src/main.py", "content": "import math"} | SUCCESS
[Step 25] run_tests  -> {"cmd": "pytest"} | FAIL (ImportError)

[WARNING]   Anomaly detected: Workspace edit storm and exact argument loops
[INTERCEPT] Sotis Meltdown Interception Triggered!
[RECOVER]   Restored workspace files to stable baseline (step 22 diff)
[RECOVER]   Distilled session context history (78% token savings)
[RESUME]    Injecting resumption briefing into agent context...

[Step 26] grep_search -> {"query": "math"} | Execution resumed cleanly
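The trace above shows the same file being rewritten between failing test runs. As a rough illustration of how such a same-file edit cycle could be flagged, the sketch below counts recent writes per path. The function name `find_edit_storm`, the `(tool, args)` step format, and the `window` and `max_edits` values are assumptions, not part of Sotis.

```python
from collections import Counter


def find_edit_storm(steps, window=4, max_edits=2):
    """Return paths written at least max_edits times in the last `window` steps.

    `steps` is a list of (tool_name, args_dict) tuples; this format is a
    hypothetical stand-in for whatever history the runtime keeps.
    """
    recent = steps[-window:]
    writes = Counter(
        args["path"] for tool, args in recent if tool == "write_file"
    )
    return sorted(path for path, count in writes.items() if count >= max_edits)


history = [
    ("write_file", {"path": "src/main.py", "content": "import math"}),
    ("run_tests", {"cmd": "pytest"}),
    ("write_file", {"path": "src/main.py", "content": "import math"}),
    ("run_tests", {"cmd": "pytest"}),
]
print(find_edit_storm(history))
```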

Active Stabilization, Not Passive Tracing

Tools like LangSmith, Langfuse, and Helicone log what happened, after your agent has already spent $20 looping in production.

Sotis intervenes during execution. It intercepts spiraling tool calls, rolls back uncommitted file edits, distills conversation history, and redirects the model's reasoning loop — before the damage accumulates.


Capabilities

Capability                Description
Meltdown Detection        Sliding-window Shannon entropy (w=5, H=1.5) + exact loop detection
Workspace Density Guard   Detects infinite same-file edit cycles
Transparent Reset         Git-diff checkpointing + distilled context rebuild (≥60% token savings)
Graceful Degradation      GDS scoring preserves partial progress across resets
LangGraph Integration     Native guard node; intercepts state, rolls back files
Document Processing       PDF, XLSX, Word, CSV support + Jaccard semantic loop detection
LLM Support               OpenAI, Anthropic, DeepSeek, Google Gemini
Observability             Streamlit dashboard + structured JSON session logs
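The Jaccard-based semantic loop detection listed above can be pictured with a small sketch. This is one plausible reading, assuming word-level token sets and a similarity threshold. The function names and the 0.6 threshold are hypothetical and may differ from what Sotis does.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_semantic_repeat(query: str, previous: list, threshold: float = 0.6) -> bool:
    """True if `query` is a near-rephrasing of any earlier query.

    The threshold is an illustrative assumption, not a Sotis setting.
    """
    return any(jaccard(query, old) >= threshold for old in previous)
```

For example, "fix import error in main" and "fix the import error in main" share five of six distinct words, so their similarity is 5/6 and the second query would count as a rephrasing of the first.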

The Science

Sotis operationalizes the formal reliability framework from "Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents" (arXiv:2603.29231, April 2026).

Four key findings from the paper that Sotis directly addresses:

Meltdown Onset Point (MOP) — the paper quantifies the transition from coherent planning to chaotic looping via sliding-window Shannon entropy. Sotis implements this as a live runtime monitor with a calibrated threshold of H=1.5 bits over a 5-step window.
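A sliding-window entropy monitor of this kind can be sketched in a few lines. The sketch assumes entropy is taken over the distribution of tool names in the last 5 steps. The symbols Sotis actually feeds into the entropy are not stated here, and neither is the direction in which the value is compared against H=1.5. For that reason the hypothetical `EntropyMonitor` below only reports the value and leaves the comparison to the caller.

```python
import math
from collections import Counter, deque
from typing import Optional


def shannon_entropy(items) -> float:
    """Shannon entropy in bits of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return float(sum(-(c / total) * math.log2(c / total) for c in counts.values()))


class EntropyMonitor:
    """Hypothetical helper: entropy of the most recent `window` symbols."""

    def __init__(self, window: int = 5):
        self.symbols = deque(maxlen=window)

    def update(self, symbol: str) -> Optional[float]:
        self.symbols.append(symbol)
        if len(self.symbols) < self.symbols.maxlen:
            return None  # not enough history yet
        return shannon_entropy(self.symbols)
```

With a 5-step window the value ranges from 0 bits (the same symbol five times) to log2(5), about 2.32 bits (five distinct symbols), so a 1.5-bit threshold sits inside the reachable range.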

Super-linear reliability decay: agent success rates decay faster than a model with independent per-step errors would predict, because errors are positively correlated across steps. A confused agent stays confused. Sotis acts as a circuit breaker that interrupts this error correlation by restarting from a verified checkpoint.

Episodic memory failures — the paper demonstrates that naive memory scaffolds universally degrade long-horizon performance by accumulating context overhead. Sotis uses controlled checkpointed resets instead of continuous memory accumulation.

Graceful Degradation Score (GDS) — rather than binary pass/fail, Sotis scores partial task completion using weighted subtask graphs, preserving measured progress across reset boundaries.
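As a rough picture of partial-credit scoring, the sketch below computes a weighted completion ratio in which a subtask only counts if its direct prerequisites are also complete. This is a simplified stand-in: the exact GDS formula is not given here, and the function name, the `requires` mapping, and the example weights are all assumptions.

```python
def graceful_degradation_score(weights, completed, requires=None):
    """Weighted fraction of subtasks completed, in the range 0.0 to 1.0.

    weights:   {subtask_name: weight}
    completed: set of finished subtask names
    requires:  optional {subtask_name: [direct prerequisite names]}
    All three structures are hypothetical, for illustration only.
    """
    requires = requires or {}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    def counts(name):
        # Credit a subtask only when its direct prerequisites are done too.
        return name in completed and all(
            dep in completed for dep in requires.get(name, ())
        )

    return sum(w for name, w in weights.items() if counts(name)) / total


weights = {"parse": 1, "implement": 3, "test": 2}
print(graceful_degradation_score(weights, {"parse", "implement"}))
```

Because the score depends only on which subtasks are complete, it can be recomputed after a reset without replaying the steps that led there.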


Performance

Metric                                 Result
Entropy + loop detection latency       < 0.2 ms per step
Context distillation token reduction   86.14%
Test suite                             127 tests, 88% coverage
Live recovery                          Verified on circular import and AST recursive loop traps

Full empirical ledger: performance_metrics.txt


Project Structure

sotis/
  core/     # Entropy, loop detection, checkpoint, decomposition, GDS
  lib/      # ReAct runtime, LangGraph integration, LLM adapters
  obs/      # Streamlit dashboard + structured JSON logger
  bench/    # Benchmark harness and task generators

License

MIT
