Watches your LLM agent in real time and intercepts meltdowns before they spiral.
Sotis
Sotis watches your LLM agent and catches it before it spirals.
pip install sotis
Long-running agents fail in predictable ways — they loop on the same tool calls, flood their context with error traces, and spiral until the task collapses. Sotis detects these failure patterns in real time and transparently resets execution before they take hold.
Based on "Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents" (arXiv:2603.29231, April 2026)
The Problem
Current AI agents fail predictably under long-horizon execution. As tasks grow longer, agents accumulate errors and drift into terminal failure modes:
- Infinite Loops: repeating the same tool calls with identical arguments
- Semantic Spirals: rephrasing failed queries hoping for different outcomes
- Context Poisoning: flooding history with massive error traces and linter warnings
- Edit Storms: making rapid, uncoordinated edits to the same files without changing the outcome (for example, the same test keeps failing)
Frontier models do not fail for lack of capability. They fail because their reliability decays over long horizons until their strategy collapses. Sotis acts as an active runtime stabilizer: it monitors execution, detects behavioral meltdowns, and transparently resets context to restore forward progress.
Usage
from sotis import SotisGuard
guard = SotisGuard()
# agent, tools, and max_steps are your own agent objects and step budget
for step in range(max_steps):
    action = agent.decide()
    result = tools.execute(action)
    meltdown = guard.watch(action.name, action.args, result.summary)
    if meltdown:
        guard.reset()  # rolls back files, distills context, resumes cleanly
What it looks like in practice
[Step 22] write_file -> {"path": "src/main.py", "content": "import math"} | SUCCESS
[Step 23] run_tests -> {"cmd": "pytest"} | FAIL (ImportError)
[Step 24] write_file -> {"path": "src/main.py", "content": "import math"} | SUCCESS
[Step 25] run_tests -> {"cmd": "pytest"} | FAIL (ImportError)
[WARNING] Anomaly detected: Workspace edit storm and exact argument loops
[INTERCEPT] Sotis Meltdown Interception Triggered!
[RECOVER] Restored workspace files to stable baseline (step 22 diff)
[RECOVER] Distilled session context history (78% token savings)
[RESUME] Injecting resumption briefing into agent context...
[Step 26] grep_search -> {"query": "math"} | Execution resumed cleanly
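The two anomalies named in the log (exact argument loops and a workspace edit storm) can be approximated in a few lines of plain Python. This is a minimal sketch, not the sotis implementation: `MeltdownSketch`, `signature`, `EDIT_TOOLS`, and the thresholds `max_period`, `edit_window`, and `edit_limit` are hypothetical names and values chosen for illustration. It flags an exact loop when the last *p* calls repeat the *p* calls before them, and an edit storm when one path is written too often within a short window of steps.

```python
import json
from collections import Counter, deque

# Hypothetical sketch, not the sotis API: names and thresholds are illustrative.
EDIT_TOOLS = {"write_file"}  # assumption: tools that modify workspace files


def signature(name, args):
    """Canonical call signature; dict key order does not affect equality."""
    return name + " " + json.dumps(args, sort_keys=True)


class MeltdownSketch:
    def __init__(self, max_period=3, edit_window=6, edit_limit=3):
        self.calls = []                                # every call signature so far
        self.max_period = max_period                   # longest repeating cycle checked
        self.recent_paths = deque(maxlen=edit_window)  # edited path (or None) per recent step
        self.edit_limit = edit_limit                   # writes to one path that count as a storm

    def loop_period(self):
        """Smallest p where the last p calls exactly repeat the p before them; 0 if none."""
        h = self.calls
        for p in range(1, self.max_period + 1):
            if len(h) >= 2 * p and h[-p:] == h[-2 * p:-p]:
                return p
        return 0

    def watch(self, name, args):
        """Record one tool call and return the anomalies it triggers."""
        self.calls.append(signature(name, args))
        path = args.get("path") if name in EDIT_TOOLS else None
        self.recent_paths.append(path)

        reasons = []
        period = self.loop_period()
        if period:
            reasons.append(f"exact loop (period {period})")
        edits = Counter(p for p in self.recent_paths if p is not None)
        if any(n >= self.edit_limit for n in edits.values()):
            reasons.append("edit storm")
        return reasons


# Replay steps 22-25 from the log.
guard = MeltdownSketch(edit_limit=2)
steps = [
    ("write_file", {"path": "src/main.py", "content": "import math"}),
    ("run_tests", {"cmd": "pytest"}),
    ("write_file", {"path": "src/main.py", "content": "import math"}),
    ("run_tests", {"cmd": "pytest"}),
]
for step, (name, args) in enumerate(steps, start=22):
    print(step, guard.watch(name, args))
# 22 []
# 23 []
# 24 ['edit storm']
# 25 ['exact loop (period 2)', 'edit storm']
```

With `edit_limit=2`, the repeated write to `src/main.py` is flagged at step 24 and the period-2 write/test cycle at step 25, matching the point where the log reports the anomaly.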
Active Stabilization, Not Passive Tracing
Tools like LangSmith, Langfuse, and Helicone record what happened; you read the trace only after your agent has already spent $20 looping in production.
Sotis intervenes during execution. It intercepts spiraling tool calls, rolls back uncommitted file edits, distills conversation history, and redirects the model's reasoning loop — before the damage accumulates.
Capabilities
| Capability | Description |
|---|---|
| Meltdown Detection | Sliding-window Shannon entropy (window w=5 steps, threshold H=1.5 bits) + exact loop detection |
| Workspace Density Guard | Detects runaway edit cycles on the same file |
| Transparent Reset | Git-diff checkpointing + distilled context rebuild (≥60% token savings) |
| Graceful Degradation | Graceful Degradation Score (GDS) preserves partial progress across resets |
| LangGraph Integration | Native guard node — intercepts state, rolls back files |
| Document Processing | PDF, XLSX, Word, CSV support + Jaccard semantic loop detection |
| LLM Support | OpenAI, Anthropic, DeepSeek, Google Gemini |
| Observability | Streamlit dashboard + structured JSON session logs |
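The table lists Jaccard similarity for semantic loop detection. A minimal sketch, assuming similarity is computed over lowercase word sets of consecutive queries, with a hypothetical threshold of 0.6 over the last 3 queries; `tokens`, `jaccard`, and `is_rephrase_loop` are illustrative names, not the sotis API. Jaccard over word sets measures lexical overlap, so it catches rephrasings that reuse most of the same words.

```python
import re

# Hypothetical sketch: tokenization, threshold, and window are assumptions.

def tokens(text):
    """Lowercase word set; punctuation and dots split words."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Size of intersection over size of union of the word sets; two empty inputs count as identical."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_rephrase_loop(queries, threshold=0.6, window=3):
    """True if every consecutive pair among the last `window` queries is at least `threshold` similar."""
    recent = queries[-window:]
    if len(recent) < window:
        return False
    return all(jaccard(x, y) >= threshold for x, y in zip(recent, recent[1:]))


spiral = [
    "search for circular import in main",
    "search circular import in main module",
    "circular import in main module",
]
progress = ["read src/main.py", "run pytest", "grep math"]
print(is_rephrase_loop(spiral))    # True (pair scores 5/7 and 5/6)
print(is_rephrase_loop(progress))  # False
```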
The Science
Sotis operationalizes the formal reliability framework from "Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents" (arXiv:2603.29231, April 2026).
Four key findings from the paper that Sotis directly addresses:
Meltdown Onset Point (MOP) — the paper quantifies the transition from coherent planning to chaotic looping via sliding-window Shannon entropy. Sotis implements this as a live runtime monitor with a calibrated threshold of H=1.5 bits over a 5-step window.
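A minimal sketch of a sliding-window entropy check, assuming each step is reduced to a single symbol such as the tool name, and that a meltdown is flagged when entropy drops *below* H=1.5 bits (the README does not state the direction of the comparison). `EntropyMonitor` and `score_sequence` are hypothetical names, not part of sotis. For scale, a 5-step window has a maximum entropy of log2(5) ≈ 2.32 bits, reached when all five steps differ.

```python
import math
from collections import Counter, deque

# Hypothetical sketch: "entropy below threshold = meltdown" is an assumption.

class EntropyMonitor:
    def __init__(self, window=5, threshold=1.5):
        self.recent = deque(maxlen=window)  # last `window` step symbols
        self.threshold = threshold          # in bits

    def entropy(self):
        """Shannon entropy H = sum of p * log2(1/p) over symbols in the window, in bits."""
        n = len(self.recent)
        return sum((c / n) * math.log2(n / c) for c in Counter(self.recent).values())

    def observe(self, symbol):
        """Add one step; return (entropy, flagged). Only a full window can be flagged."""
        self.recent.append(symbol)
        h = self.entropy()
        full = len(self.recent) == self.recent.maxlen
        return h, full and h < self.threshold


def score_sequence(symbols):
    """Feed a non-empty sequence and return the final (rounded entropy, flagged)."""
    m = EntropyMonitor()
    for s in symbols:
        h, flagged = m.observe(s)
    return round(h, 2), flagged


print(score_sequence(["write_file", "run_tests"] * 2 + ["write_file"]))  # (0.97, True)
print(score_sequence(["read_file", "grep_search", "write_file", "run_tests", "git_diff"]))  # (2.32, False)
```

The alternating write/test pattern from the log scores about 0.97 bits and is flagged; five distinct calls score log2(5) and are not.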
Super-linear reliability decay: agent success rates decay faster than a model with independent per-step errors predicts, because errors are positively correlated across steps. A confused agent stays confused. Sotis acts as a circuit breaker that breaks this chain of correlated errors by restarting from a verified checkpoint.
Episodic memory failures — the paper demonstrates that naive memory scaffolds universally degrade long-horizon performance by accumulating context overhead. Sotis uses controlled checkpointed resets instead of continuous memory accumulation.
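Sotis describes its checkpoints as git-diff based. The sketch below swaps in a simpler technique, an in-memory snapshot of file text, to show the save/restore cycle behind a checkpointed reset. `WorkspaceCheckpoint` and its `pattern` parameter are hypothetical; the sketch only tracks files matching one glob pattern and treats them as text.

```python
from pathlib import Path

# Hypothetical stand-in for git-diff checkpointing: snapshots file text in memory.

class WorkspaceCheckpoint:
    def __init__(self, root, pattern="**/*.py"):
        self.root = Path(root)
        self.pattern = pattern  # which files the checkpoint tracks
        self.snapshot = {}      # Path -> file text at save time

    def _tracked(self):
        return [p for p in self.root.glob(self.pattern) if p.is_file()]

    def save(self):
        """Record the current text of every tracked file."""
        self.snapshot = {p: p.read_text() for p in self._tracked()}

    def restore(self):
        """Return tracked files to the saved state."""
        for p in self._tracked():
            if p not in self.snapshot:
                p.unlink()  # file was created after the checkpoint
        for p, text in self.snapshot.items():
            p.parent.mkdir(parents=True, exist_ok=True)
            p.write_text(text)  # rewrites edited files and recreates deleted ones
```

In use, `save()` would be called at a verified point (for example, after tests pass) and `restore()` when a detector fires, so the agent resumes from files known to be good rather than from its accumulated edits.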
Graceful Degradation Score (GDS) — rather than binary pass/fail, Sotis scores partial task completion using weighted subtask graphs, preserving measured progress across reset boundaries.
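The README does not give the GDS formula. One plausible reading of "weighted subtask graphs", sketched here purely as an assumption, is the weighted fraction of subtasks that are complete and whose prerequisites are also complete. `graceful_degradation_score`, its arguments, and the example subtasks are hypothetical, and the dependency graph must be acyclic.

```python
# Hypothetical GDS sketch: the formula is an assumption, not taken from the paper.

def graceful_degradation_score(weights, deps, completed):
    """Weighted fraction of subtasks that are done and whose prerequisites also count.

    weights:   subtask -> positive weight
    deps:      subtask -> list of prerequisite subtasks (must be acyclic)
    completed: set of subtasks marked done
    """
    memo = {}

    def counts(task):
        if task not in memo:
            memo[task] = task in completed and all(counts(d) for d in deps.get(task, ()))
        return memo[task]

    total = sum(weights.values())
    return sum(w for task, w in weights.items() if counts(task)) / total


weights = {"locate_bug": 1, "fix_import": 2, "tests_pass": 3}
deps = {"fix_import": ["locate_bug"], "tests_pass": ["fix_import"]}
print(graceful_degradation_score(weights, deps, {"locate_bug", "fix_import"}))  # 0.5
print(graceful_degradation_score(weights, deps, {"fix_import"}))                # 0.0
```

Because `completed` is plain data, it could be saved before a reset and reloaded afterwards, which is one way a score like this could carry measured progress across a reset boundary.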
Performance
| Metric | Result |
|---|---|
| Entropy + loop detection latency | < 0.2 ms per step |
| Context distillation token reduction | 86.14% |
| Test suite | 127 tests, 88% coverage |
| Live recovery | Verified on circular import and AST recursive loop traps |
Full empirical ledger: performance_metrics.txt
Project Structure
sotis/
  core/    # Entropy, loop detection, checkpoint, decomposition, GDS
  lib/     # ReAct runtime, LangGraph integration, LLM adapters
  obs/     # Streamlit dashboard + structured JSON logger
  bench/   # Benchmark harness and task generators
License
MIT
Download files
File details
Details for the file sotis-1.0.2.tar.gz.
File metadata
- Download URL: sotis-1.0.2.tar.gz
- Upload date:
- Size: 68.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d8c46a8752c4102054171aa744613458aec2ff65a4e1c01bae9a6c42f76a930 |
| MD5 | 4fee40908d0b29fb9e9f9b2d31621fdc |
| BLAKE2b-256 | b7f92680dd2625348df5fe4eb92ccc05e42ec0c2f1af5db20df07c641b79c7fe |
File details
Details for the file sotis-1.0.2-py3-none-any.whl.
File metadata
- Download URL: sotis-1.0.2-py3-none-any.whl
- Upload date:
- Size: 54.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c6ae7e1804f24ef6d14f13b12325d6ebb2efb9741aa06969c59cf9b1f2fc72b2 |
| MD5 | 00eeb6c05c0742c74ea6933e003dd52f |
| BLAKE2b-256 | d945b6b744af9917b01f7d907301b927efabdcea8d6e40d6b78fa565794062e0 |