Context garbage collector for long-running LLM agents — offload and recall to save tokens (Claude Code & Codex via MCP)

LETHE

Live Ephemeral Token & History Engine — a model-agnostic context garbage collector for long-running LLM agents.

🌍 This README is bilingual. English · Español


🔌 Use it in Claude Code or Codex (save tokens now)

LETHE ships as an MCP server. With the two commands below, your agent can move large outputs out of its context and recall them on demand, so long tasks re-send fewer tokens on every turn. / LETHE viene como servidor MCP. Con los dos comandos de abajo, tu agente descarga outputs grandes fuera del contexto y los recupera cuando los necesita, de modo que las tareas largas reenvían menos tokens en cada turno.

Claude Code:

pip install "lethe-llm-context[mcp]"
claude mcp add lethe -- lethe-mcp

Codex: add an MCP block to ~/.codex/config.toml — see integrations/codex/mcp-config.md.

Then drop in the guiding skill so it happens automatically: integrations/claude-code/SKILL.md.

Tools exposed: lethe_archive · lethe_recall · lethe_status. Full guide: integrations/claude-code/mcp-config.md.


English

When an LLM agent runs a long task (tens to hundreds of steps), its context window fills with material that was useful but no longer is: stale tool outputs, files read 30 steps ago, dead reasoning branches. This causes three failures: quality decay (relevant tokens buried under noise), cost growth (every turn re-sends the bloated history), and hard limits (the agent eventually hits the context ceiling and breaks).
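The cost-growth failure can be made concrete with a small back-of-the-envelope model. This is only a sketch under simplifying assumptions that are not part of LETHE: every step appends the same number of tokens, the whole context is re-sent on each turn, and a budget simply caps the context size. The function name and the numbers are hypothetical.

```python
from typing import Optional


def total_input_tokens(
    steps: int, tokens_per_step: int, budget: Optional[int] = None
) -> int:
    """Sum the context size that is re-sent on every turn.

    Assumed model: each step appends `tokens_per_step` tokens. When
    `budget` is given, the context is trimmed to at most `budget`
    tokens before it is sent.
    """
    total = 0
    context = 0
    for _ in range(steps):
        context += tokens_per_step
        if budget is not None:
            context = min(context, budget)
        total += context
    return total


unbounded = total_input_tokens(steps=100, tokens_per_step=500)
capped = total_input_tokens(steps=100, tokens_per_step=500, budget=5_000)
print(unbounded)  # → 2525000
print(capped)     # → 477500
```

Without a cap the total grows quadratically with the number of steps; with a fixed budget it grows linearly once the budget is reached.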

LETHE sits inside the agent loop and manages the live context like an operating system manages virtual memory. A multi-agent core scores each context block's relevance to the current goal, compacts finished work into dense notes, and pages cold material to an external store — losslessly, so anything can be recalled on demand.

The mental model (OS analogy)

| Operating system | LETHE |
| --- | --- |
| Physical RAM | The context window (working set) |
| Disk | External store (SQLite) |
| Page-table entry | Stub / handle left in context |
| Page-in on fault | Rehydrating an evicted block |
| Eviction policy | Curator (relevance scoring) |
| Cold-page compression | Compactor (consolidation notes) |
| Wired / non-swappable memory | Pinned blocks |
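The "disk", "page-table entry", and "page-in" rows of the analogy can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not LETHE's actual store: the `PagingStore` class, the `make_stub` helper, the table schema, and the stub format are all hypothetical names chosen for illustration.

```python
import sqlite3
import uuid


class PagingStore:
    """Toy external store: evicted block text lives in SQLite, not in context."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS blocks ("
            "handle TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def page_out(self, body: str) -> str:
        """Persist the full text and return the handle that identifies it."""
        handle = uuid.uuid4().hex[:8]
        self.db.execute(
            "INSERT INTO blocks (handle, body) VALUES (?, ?)", (handle, body)
        )
        self.db.commit()
        return handle

    def page_in(self, handle: str) -> str:
        """Rehydrate the exact original text, or raise KeyError if unknown."""
        row = self.db.execute(
            "SELECT body FROM blocks WHERE handle = ?", (handle,)
        ).fetchone()
        if row is None:
            raise KeyError(handle)
        return row[0]


def make_stub(handle: str, body: str) -> str:
    """Build the short placeholder that stays in the context window."""
    first_line = body.splitlines()[0] if body else ""
    return (
        f"[paged out: handle={handle}, {len(body)} chars, "
        f"starts with {first_line[:40]!r}]"
    )


store = PagingStore()
tool_output = "row of tool output\n" * 200  # stand-in for a large block
handle = store.page_out(tool_output)
stub = make_stub(handle, tool_output)

# The stub is far smaller than the block, and recall is lossless.
print(len(stub) < len(tool_output))          # → True
print(store.page_in(handle) == tool_output)  # → True
```

Only the stub stays in the prompt. Because the full text is stored unchanged, the agent can page it back in whenever a later step needs it.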

The three workers

  • Curator — scores each block 0..1 for relevance to the current goal (heuristics + a cheap model).
  • Compactor — replaces runs of finished steps with one dense summary note.
  • Archivist — pages cold blocks to the store and brings them back on demand.

A Scheduler orchestrates them on triggers (every K steps, or when over budget).
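The interplay between the Scheduler triggers, the Curator's 0..1 score, and pinned blocks can be sketched as follows. Everything here is an illustrative assumption rather than LETHE's real API: the `Block` fields, the keyword-overlap heuristic, the 0.7/0.3 weights, and the function names are made up for the example, and the cheap-model half of the Curator is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    text: str
    tokens: int
    step: int             # step at which the block entered the context
    pinned: bool = False  # pinned blocks are never evicted


def score(block: Block, goal: str, current_step: int) -> float:
    """Toy relevance in 0..1: keyword overlap with the goal plus recency."""
    goal_words = set(goal.lower().split())
    block_words = set(block.text.lower().split())
    overlap = len(goal_words & block_words) / max(len(goal_words), 1)
    recency = 1.0 / (1.0 + max(current_step - block.step, 0))
    return 0.7 * overlap + 0.3 * recency


def should_run(step: int, used: int, budget: int, every_k: int) -> bool:
    """Trigger on every K-th step, or whenever the context is over budget."""
    return step % every_k == 0 or used > budget


def evict_to_budget(
    blocks: List[Block], goal: str, current_step: int, budget: int
) -> Tuple[List[Block], List[Block]]:
    """Evict the lowest-scoring unpinned blocks until the total fits."""
    used = sum(b.tokens for b in blocks)
    candidates = sorted(
        (b for b in blocks if not b.pinned),
        key=lambda b: score(b, goal, current_step),
    )
    evicted: List[Block] = []
    for block in candidates:
        if used <= budget:
            break
        evicted.append(block)
        used -= block.tokens
    evicted_ids = {id(b) for b in evicted}
    kept = [b for b in blocks if id(b) not in evicted_ids]
    return kept, evicted


blocks = [
    Block("system prompt and task goal", tokens=100, step=0, pinned=True),
    Block("old directory listing of tmp files", tokens=400, step=1),
    Block("failing test output for parser module", tokens=300, step=9),
]
goal = "fix failing parser test"
used = sum(b.tokens for b in blocks)

if should_run(step=10, used=used, budget=500, every_k=5):
    kept, evicted = evict_to_budget(blocks, goal, current_step=10, budget=500)
    print([b.text for b in evicted])  # → ['old directory listing of tmp files']
    print(sum(b.tokens for b in kept))  # → 400
```

In a fuller pipeline the evicted blocks would be handed to the Archivist for paging, and runs of finished steps would go through the Compactor before eviction is considered.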

Status & progress / Estado y progreso

This repository was built as a vertical slice first: the full block lifecycle working end-to-end with a single provider (Claude), proven by a needle-in-haystack test. The MCP adapter has since shipped in v0.6.0; multi-provider support, ensemble curation, and embeddings are still to come.

Each milestone ships as a tagged release. Full notes in CHANGELOG.md.

| Version | Milestone | What it does / Qué hace | Status |
| --- | --- | --- | --- |
| v0.1.0 | A: Foundation | Core types, fake adapter, stores: the testable bedrock | |
| v0.2.0 | B: Heuristic Engine | Curator + Scheduler + Manager: score & evict under budget | |
| v0.3.0 | C: Compactor | Summarize finished runs into dense notes | |
| v0.4.0 | D: Archivist & Paging | Lossless paging + recall + needle test (1721 → 197 tokens, ~89% reduction) | |
| v0.5.0 | E: Visualizer + Claude | Live console view + real Claude adapter + runnable demos | |
| v0.6.0 | MCP server | lethe_archive/recall/status for Claude Code + Codex, plus guiding skill | |
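The ~89% figure quoted for the needle test follows directly from the two token counts in the v0.4.0 row. A quick check, using only those two numbers (the helper name is hypothetical):

```python
def reduction(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens removed from the live context."""
    return 1 - after_tokens / before_tokens


saved = reduction(before_tokens=1721, after_tokens=197)
print(f"{saved:.1%}")  # → 88.6%
```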

🎉 Vertical slice complete and shipping via MCP. Next: PyPI + MCP registry publish, then multi-provider, ensemble, and embeddings — each its own spec → plan → release cycle.

See the design and plan:

  • docs/specs/2026-06-12-lethe-vertical-slice-design.md — approved design
  • docs/plans/2026-06-12-lethe-vertical-slice.md — task-by-task implementation plan
  • docs/LETHE_engineering_design.md — the full long-term engineering design

Quickstart (no API key needed)

python -m pytest -q                  # run the full test suite, including the needle test
python -m lethe.examples.fake_loop   # WATCH it work: live view, blocks paging out, budget held

Real Claude demo

$env:ANTHROPIC_API_KEY="sk-..."     # PowerShell
export ANTHROPIC_API_KEY="sk-..."   # bash / zsh (use whichever line matches your shell)
python -m lethe.examples.claude_loop

License

Released into the public domain under the Unlicense. Free for everyone, anywhere.


Español

Cuando un agente LLM ejecuta una tarea larga (decenas o cientos de pasos), su ventana de contexto se llena de material que fue útil pero ya no lo es: resultados de herramientas obsoletos, archivos leídos hace 30 pasos, ramas de razonamiento muertas. Esto provoca tres fallos: pérdida de calidad (lo relevante queda enterrado entre ruido), aumento de costo (cada turno reenvía todo el historial inflado) y límites duros (el agente acaba chocando con el techo de contexto y se rompe).

LETHE vive dentro del bucle del agente y gestiona el contexto vivo como un sistema operativo gestiona la memoria virtual. Un núcleo multi-agente puntúa la relevancia de cada bloque respecto al objetivo actual, compacta el trabajo terminado en notas densas, y pagina el material frío a un almacén externo — sin pérdida, de modo que todo se puede recuperar cuando haga falta.

El modelo mental (analogía con el SO)

| Sistema operativo | LETHE |
| --- | --- |
| Memoria RAM | La ventana de contexto (working set) |
| Disco | Almacén externo (SQLite) |
| Entrada de tabla de páginas | Stub / handle que queda en contexto |
| Traer página al fallar | Rehidratar un bloque expulsado |
| Política de expulsión | Curator (puntúa relevancia) |
| Compresión de páginas frías | Compactor (notas de consolidación) |
| Memoria fija / no intercambiable | Bloques fijados (pinned) |

Los tres trabajadores

  • Curator — puntúa cada bloque 0..1 según su relevancia al objetivo actual (heurísticas + un modelo barato).
  • Compactor — reemplaza secuencias de pasos terminados por una nota-resumen densa.
  • Archivist — pagina los bloques fríos al almacén y los recupera cuando se necesitan.

Un Scheduler los coordina mediante disparadores (cada K pasos, o al exceder el presupuesto).

Estado

Este repositorio se construyó primero como un corte vertical: el ciclo de vida completo de un bloque funcionando de punta a punta con un solo proveedor (Claude), demostrado por una prueba de "aguja en el pajar". El adaptador MCP ya se publicó en la v0.6.0; el soporte multi-proveedor, la curación por ensamble y los embeddings quedan para más adelante.

Consulta el diseño y el plan:

  • docs/specs/2026-06-12-lethe-vertical-slice-design.md — diseño aprobado
  • docs/plans/2026-06-12-lethe-vertical-slice.md — plan de implementación tarea por tarea
  • docs/LETHE_engineering_design.md — el diseño de ingeniería completo a largo plazo

Inicio rápido (sin API key)

python -m pytest -q                  # corre toda la suite, incluida la prueba de la aguja
python -m lethe.examples.fake_loop   # VELO funcionar: vista en vivo, bloques paginándose, presupuesto sostenido

Demo con Claude real

$env:ANTHROPIC_API_KEY="sk-..."     # PowerShell
export ANTHROPIC_API_KEY="sk-..."   # bash / zsh (usa la línea que corresponda a tu shell)
python -m lethe.examples.claude_loop

Licencia

Liberado al dominio público bajo la Unlicense. Libre para todos, en cualquier lugar.


