
Context garbage collector for long-running LLM agents — offload and recall to save tokens (Claude Code & Codex via MCP)


LETHE

Live Ephemeral Token & History Engine — a model-agnostic context garbage collector for long-running LLM agents.


🌍 This README is bilingual. English · Español


🔌 Use it in Claude Code or Codex (save tokens now)

LETHE ships as an MCP server. Two lines and your agent can offload big outputs out of its context and recall them on demand — fewer tokens on every long task. / LETHE viene como servidor MCP. Dos líneas y tu agente descarga outputs grandes fuera del contexto y los recupera cuando los necesita — menos tokens en cada tarea larga.

Claude Code:

pip install "lethe-llm-context[mcp]"
claude mcp add lethe -- lethe-mcp

Codex: add an MCP block to ~/.codex/config.toml — see integrations/codex/mcp-config.md.

Then drop in the guiding skill so it happens automatically: integrations/claude-code/SKILL.md.

Tools exposed: lethe_archive · lethe_recall · lethe_status. Full guide: integrations/claude-code/mcp-config.md.


English

When an LLM agent runs a long task (tens to hundreds of steps), its context window fills with material that was useful but no longer is: stale tool outputs, files read 30 steps ago, dead reasoning branches. This causes three failures: quality decay (relevant tokens buried under noise), cost growth (every turn re-sends the bloated history), and hard limits (the agent eventually hits the context ceiling and breaks).

LETHE sits inside the agent loop and manages the live context like an operating system manages virtual memory. A multi-agent core scores each context block's relevance to the current goal, compacts finished work into dense notes, and pages cold material to an external store — losslessly, so anything can be recalled on demand.

The mental model (OS analogy)

| Operating system | LETHE |
| --- | --- |
| Physical RAM | The context window (working set) |
| Disk | External store (SQLite) |
| Page-table entry | Stub / handle left in context |
| Page-in on fault | Rehydrating an evicted block |
| Eviction policy | Curator (relevance scoring) |
| Cold-page compression | Compactor (consolidation notes) |
| Wired / non-swappable memory | Pinned blocks |
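
To make the analogy concrete, here is a minimal sketch of lossless paging against SQLite. It is not LETHE's actual implementation: the class name `PagingStore`, the methods `page_out` and `page_in`, the table schema, and the stub text are all hypothetical and chosen only for illustration.

```python
import sqlite3
import uuid


class PagingStore:
    """Hypothetical external store: full block bodies live here, stubs stay in context."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS blocks ("
            "handle TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def page_out(self, body: str) -> tuple[str, str]:
        """Persist the full block and return (handle, stub).

        The stub is the short "page-table entry" left in the context window.
        """
        handle = uuid.uuid4().hex
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO blocks VALUES (?, ?)", (handle, body))
        stub = f"[paged out: handle={handle}, {len(body)} chars]"
        return handle, stub

    def page_in(self, handle: str) -> str:
        """Rehydrate an evicted block; the stored body comes back unchanged."""
        row = self.db.execute(
            "SELECT body FROM blocks WHERE handle = ?", (handle,)
        ).fetchone()
        if row is None:
            raise KeyError(f"unknown handle: {handle}")
        return row[0]


store = PagingStore()
big_output = "line of stale tool output\n" * 500
handle, stub = store.page_out(big_output)
restored = store.page_in(handle)
```

The context keeps only `stub`, which is far shorter than `big_output`, while `page_in` returns the original text byte for byte. That round trip is what "lossless" means in this sketch.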

The three workers

  • Curator — scores each block 0..1 for relevance to the current goal (heuristics + a cheap model).
  • Compactor — replaces runs of finished steps with one dense summary note.
  • Archivist — pages cold blocks to the store and brings them back on demand.
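
As a rough illustration of the Curator's role, the sketch below scores blocks with a toy heuristic and picks the lowest-scoring unpinned blocks for eviction until the context fits a token budget. Everything here is an assumption made for the example: the `Block` fields, the 0.7/0.3 weighting, and the functions `score` and `pick_evictions` are hypothetical, and the real Curator also consults a cheap model, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    step: int            # step at which the block entered the context
    tokens: int          # token count of the block
    pinned: bool = False  # pinned blocks are never evicted


def score(block: Block, goal: str, current_step: int) -> float:
    """Toy relevance score in [0, 1]: goal-word overlap blended with recency."""
    goal_words = set(goal.lower().split())
    block_words = set(block.text.lower().split())
    overlap = len(goal_words & block_words) / len(goal_words) if goal_words else 0.0
    age = max(current_step - block.step, 0)
    recency = 1.0 / (1.0 + age)
    return 0.7 * overlap + 0.3 * recency


def pick_evictions(blocks: list[Block], goal: str, current_step: int, budget: int) -> list[Block]:
    """Return unpinned blocks to evict, lowest score first, until the rest fit the budget."""
    total = sum(b.tokens for b in blocks)
    candidates = sorted(
        (b for b in blocks if not b.pinned),
        key=lambda b: score(b, goal, current_step),
    )
    evicted = []
    for b in candidates:
        if total <= budget:
            break
        evicted.append(b)
        total -= b.tokens
    return evicted


blocks = [
    Block("fix the parser bug in lexer", step=9, tokens=100),
    Block("old directory listing output", step=1, tokens=400),
    Block("system prompt", step=0, tokens=50, pinned=True),
]
evicted = pick_evictions(blocks, goal="fix parser bug", current_step=10, budget=200)
```

In this run the stale directory listing is the only block evicted: it shares no words with the goal and is nine steps old, and removing it brings the total from 550 to 150 tokens. The pinned system prompt is never a candidate.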

A Scheduler orchestrates them on triggers (every K steps, or when over budget).
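
The two triggers can be sketched as a single predicate. This is only an illustration: the function name `should_run`, its parameters, and the default of `every_k=5` are hypothetical, since the README does not state the value of K or the Scheduler's real interface.

```python
def should_run(step: int, context_tokens: int, budget: int, every_k: int = 5) -> bool:
    """Decide whether the workers should run on this step.

    Fires when the context is over budget, or on every K-th step (step 0 excluded).
    """
    over_budget = context_tokens > budget
    periodic = step > 0 and step % every_k == 0
    return over_budget or periodic


# Simulate a loop where the context grows by 300 tokens per step.
fired_on = []
context_tokens = 0
for step in range(1, 11):
    context_tokens += 300
    if should_run(step, context_tokens, budget=2000):
        fired_on.append(step)
        context_tokens = 600  # pretend the workers shrank the context
```

Checking the budget on every step means an unusually large tool output can trigger a pass immediately rather than waiting for the next periodic tick.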

Status & progress / Estado y progreso

This repository is being built as a vertical slice first: the full block lifecycle working end-to-end with a single provider (Claude), proven by a needle-in-haystack test, before adding multi-provider, ensemble curation, embeddings, and the MCP adapter.

Each milestone ships as a tagged release. Full notes in CHANGELOG.md.

| Version | Milestone | What it does / Qué hace | Status |
| --- | --- | --- | --- |
| v0.1.0 | A — Foundation | Core types, fake adapter, stores — the testable bedrock | |
| v0.2.0 | B — Heuristic Engine | Curator + Scheduler + Manager: score & evict under budget | |
| v0.3.0 | C — Compactor | Summarize finished runs into dense notes | |
| v0.4.0 | D — Archivist & Paging | Lossless paging + recall + needle test (1721→197 tok, ~89% ↓) | |
| v0.5.0 | E — Visualizer + Claude | Live console view + real Claude adapter + runnable demos | |
| v0.6.0 | MCP server | lethe_archive/recall/status for Claude Code + Codex, plus guiding skill | |

🎉 Vertical slice complete and shipping via MCP. Next: PyPI + MCP registry publish, then multi-provider, ensemble, and embeddings — each its own spec → plan → release cycle.
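
The ~89% figure quoted for the v0.4.0 needle test follows directly from the two token counts in the table. A quick check, with variable names chosen here only for illustration:

```python
before_tokens = 1721  # context size before paging, from the v0.4.0 row
after_tokens = 197    # context size after paging, from the v0.4.0 row

reduction = (before_tokens - after_tokens) / before_tokens
print(f"{reduction:.1%}")  # 88.6%, which rounds to the ~89% quoted above
```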

See the design and plan:

  • docs/specs/2026-06-12-lethe-vertical-slice-design.md — approved design
  • docs/plans/2026-06-12-lethe-vertical-slice.md — task-by-task implementation plan
  • docs/LETHE_engineering_design.md — the full long-term engineering design

Quickstart (no API key needed)

python -m pytest -q                  # run the full test suite, including the needle test
python -m lethe.examples.fake_loop   # WATCH it work: live view, blocks paging out, budget held

Real Claude demo

$env:ANTHROPIC_API_KEY="sk-..."     # PowerShell
export ANTHROPIC_API_KEY="sk-..."   # bash / zsh equivalent (use one or the other)
python -m lethe.examples.claude_loop

License

Released into the public domain under the Unlicense. Free for everyone, anywhere.


Español

Cuando un agente LLM ejecuta una tarea larga (decenas o cientos de pasos), su ventana de contexto se llena de material que fue útil pero ya no lo es: resultados de herramientas obsoletos, archivos leídos hace 30 pasos, ramas de razonamiento muertas. Esto provoca tres fallos: pérdida de calidad (lo relevante queda enterrado entre ruido), aumento de costo (cada turno reenvía todo el historial inflado) y límites duros (el agente acaba chocando con el techo de contexto y se rompe).

LETHE vive dentro del bucle del agente y gestiona el contexto vivo como un sistema operativo gestiona la memoria virtual. Un núcleo multi-agente puntúa la relevancia de cada bloque respecto al objetivo actual, compacta el trabajo terminado en notas densas, y pagina el material frío a un almacén externo — sin pérdida, de modo que todo se puede recuperar cuando haga falta.

El modelo mental (analogía con el SO)

| Sistema operativo | LETHE |
| --- | --- |
| Memoria RAM | La ventana de contexto (working set) |
| Disco | Almacén externo (SQLite) |
| Entrada de tabla de páginas | Stub / handle que queda en contexto |
| Traer página al fallar | Rehidratar un bloque expulsado |
| Política de expulsión | Curator (puntúa relevancia) |
| Compresión de páginas frías | Compactor (notas de consolidación) |
| Memoria fija / no intercambiable | Bloques fijados (pinned) |

Los tres trabajadores

  • Curator — puntúa cada bloque 0..1 según su relevancia al objetivo actual (heurísticas + un modelo barato).
  • Compactor — reemplaza secuencias de pasos terminados por una nota-resumen densa.
  • Archivist — pagina los bloques fríos al almacén y los recupera cuando se necesitan.

Un Scheduler los coordina mediante disparadores (cada K pasos, o al exceder el presupuesto).

Estado

Este repositorio se construye primero como un corte vertical: el ciclo de vida completo de un bloque funcionando de punta a punta con un solo proveedor (Claude), demostrado por una prueba de "aguja en el pajar", antes de añadir multi-proveedor, curación por ensamble, embeddings y el adaptador MCP.

Consulta el diseño y el plan:

  • docs/specs/2026-06-12-lethe-vertical-slice-design.md — diseño aprobado
  • docs/plans/2026-06-12-lethe-vertical-slice.md — plan de implementación tarea por tarea
  • docs/LETHE_engineering_design.md — el diseño de ingeniería completo a largo plazo

Inicio rápido (sin API key)

python -m pytest -q                  # corre toda la suite, incluida la prueba de la aguja
python -m lethe.examples.fake_loop   # VELO funcionar: vista en vivo, bloques paginándose, presupuesto sostenido

Demo con Claude real

$env:ANTHROPIC_API_KEY="sk-..."   # PowerShell
python -m lethe.examples.claude_loop

Licencia

Liberado al dominio público bajo la Unlicense. Libre para todos, en cualquier lugar.
