Lossless Context Management (LCM) for Hermes Agent — DAG-based conversation summarization that never forgets

These details have not been verified by PyPI

Project links

Project description

lossless-hermes-py

DAG-based lossless context management for LLM conversations. Never lose a message — summarize them into a directed acyclic graph.

A Python port of lossless-claw for use with Hermes Agent or as a standalone library.

What It Does

Long conversations with LLMs hit context window limits. The typical solution is to truncate or naively summarize, losing information permanently. Lossless Context Management (LCM) takes a different approach: it builds a directed acyclic graph of summaries that compresses older messages while preserving the ability to drill back into any detail.

Your conversation never loses information. It just gets more compact.

See the animated visualization at losslesscontext.ai.

How It Works

LCM operates in two compaction passes that build a summary DAG:

Leaf pass — When the context window fills up, raw messages (excluding a protected "fresh tail" of recent messages) are chunked and summarized into leaf nodes at depth 0. Each leaf summary covers a group of messages and links back to the originals.

Condensed pass — When enough leaf summaries accumulate, they are themselves summarized into condensed nodes at depth 1. This process repeats upward: depth-1 summaries get condensed into depth-2, and so on. The result is a tree-like DAG where the root captures the entire conversation at high compression, and any branch can be expanded to recover full detail.

Context assembly reconstructs the optimal prompt by combining:

The highest-level summaries (covering the full history compactly)
The fresh tail (recent messages kept verbatim for continuity)

Cache-aware compaction adapts to prompt caching behavior. When the cache is hot (high hit rate), compaction backs off to avoid invalidating cached prefixes. When the cache is cold, compaction runs more aggressively.

Agent tools (lcm_grep, lcm_expand, lcm_describe) let the LLM search and drill into compacted history on demand, recovering detail without keeping everything in context.

Installation

Hermes Agent Plugin (recommended)

hermes plugins install mssteuer/lossless-hermes-py

Then enable the engine in ~/.hermes/config.yaml (top level, not under agent:):

context:
  engine: lcm

(Optional) Configure a dedicated summarization model via environment variables in ~/.hermes/.env:

LCM_SUMMARY_MODEL=gemini-2.5-flash
LCM_SUMMARY_PROVIDER=openai          # your litellm provider name

Or edit the plugin's plugin.yaml directly:

nano ~/.hermes/plugins/lossless-hermes/plugin.yaml

Restart the gateway:

hermes gateway restart

The agent will now have lcm_grep, lcm_describe, and lcm_expand tools available.

Via pip

pip install lossless-hermes-py

This is useful for standalone library usage (see below) or if you prefer managing Python packages separately. For use with Hermes, the hermes plugins install method above is simpler.

Standalone Usage

LCM also works as a standalone library without Hermes:

from lossless_hermes import LcmContextEngine

engine = LcmContextEngine(
    model="gemini-2.5-flash",
    provider="openai",
    config_context_length=128000,
)

# Start a session
engine.on_session_start("my-session")

# Check if compaction is needed
if engine.should_compress(prompt_tokens=100000):
    compressed = engine.compress(messages, current_tokens=100000)

Configuration

Settings are resolved with three-tier precedence: environment variables > plugin.yaml > defaults.

Key Settings

Setting	Default	Description
`enabled`	`true`	Enable/disable LCM
`context_threshold`	`0.75`	Fraction of context window that triggers compaction
`fresh_tail_count`	`64`	Number of recent messages kept verbatim
`fresh_tail_max_tokens`	`null`	Optional token budget cap for fresh tail
`leaf_chunk_tokens`	`20000`	Max tokens per leaf chunk
`leaf_target_tokens`	`2400`	Target summary size for leaf nodes
`condensed_target_tokens`	`2000`	Target summary size for condensed nodes
`leaf_min_fanout`	`8`	Minimum messages per leaf chunk
`condensed_min_fanout`	`4`	Minimum summaries per condensed chunk
`condensed_min_fanout_hard`	`2`	Hard minimum for condensed chunks
`incremental_max_depth`	`1`	Max depth levels to compact per pass
`summary_provider`	`""`	LLM provider for summarization (falls back to host agent's provider)
`summary_model`	`""`	Model for summarization (falls back to host agent's model)
`summary_timeout_ms`	`60000`	Timeout for summarization calls
`circuit_breaker_threshold`	`5`	Consecutive failures before circuit opens
`circuit_breaker_cooldown_ms`	`1800000`	Cooldown before retrying after circuit break (30 min)

Cache-Aware Compaction

config:
  cache_aware_compaction:
    enabled: true
    cache_ttl_seconds: 300
    max_cold_cache_catchup_passes: 2
    hot_cache_pressure_factor: 4
    hot_cache_budget_headroom_ratio: 0.2
    cold_cache_observation_threshold: 3

Dynamic Leaf Chunk Sizing

config:
  dynamic_leaf_chunk_tokens:
    enabled: true
    max: 40000

Environment Variables

All config keys can be set via environment variables with the LCM_ prefix:

export LCM_SUMMARY_MODEL="gemini-2.5-flash"
export LCM_FRESH_TAIL_COUNT=32
export LCM_CONTEXT_THRESHOLD=0.8

Tools

LCM exposes three tools that the agent can call to interact with compacted history:

`lcm_grep`

Search conversation history and summaries using FTS5 full-text search or regex.

{
  "query": "database migration strategy",
  "mode": "full_text",
  "include_messages": true,
  "include_summaries": true,
  "limit": 20
}

`lcm_describe`

Get the current LCM state: summary statistics, DAG depth, message counts, recent compaction activity.

{
  "include_stats": true,
  "include_recent": true
}

`lcm_expand`

Drill into specific content. Expand a message to see its full text, a summary to see its children and linked messages, or search for related content across conversations.

{
  "target_type": "summary",
  "target_id": "abc123"
}

{
  "target_type": "related",
  "query": "authentication flow",
  "limit": 10
}

Architecture

src/lossless_hermes/
    __init__.py          # LcmContextEngine — main plugin class, Hermes integration
    compaction.py        # CompactionEngine — leaf/condensed passes, cache-aware policy
    assembler.py         # ContextAssembler — reconstructs optimal context from DAG
    summarizer.py        # LLM summarization with circuit breaker pattern
    retrieval.py         # RetrievalEngine — FTS5 search, related content discovery
    tokens.py            # Unicode-aware token estimation (CJK, emoji)
    tools.py             # lcm_grep, lcm_describe, lcm_expand tool implementations
    db/
        config.py        # Three-tier config resolution (env > yaml > defaults)
        connection.py    # SQLite connection management (WAL mode)
        migration.py     # Schema migrations
    store/
        conversation.py  # ConversationStore — messages, parts, sequences
        summary.py       # SummaryStore — DAG nodes, edges, depth stats
        identity.py      # Content identity hashing for deduplication

Storage

SQLite with WAL mode for concurrent reads. FTS5 virtual tables for full-text search across messages and summaries. All data is local — no external services beyond the LLM provider.

Differences from the TypeScript Version

This is a Python port of the original lossless-claw TypeScript implementation. Key adaptations:

Target platform: Hermes Agent (NousResearch) Python plugin system instead of OpenClaw
Plugin interface: Implements ContextEngine base class from agent.context_engine with register() entry point
LLM calls: Provider-agnostic via configurable summarizer (supports litellm, direct API calls) rather than being tied to a specific SDK
Default summarization model: Gemini 2.5 Flash (configurable to any model)
Async model: Synchronous by default (matching the Hermes plugin interface) rather than async-first
Config resolution: Three-tier env/yaml/defaults pattern adapted for Python conventions
Token estimation: Custom Unicode-aware estimator (tokens.py) with CJK and emoji handling
Database: Same SQLite/FTS5 approach, using Python's built-in sqlite3 module
Standalone support: Works both as a Hermes plugin and as a standalone library — the ContextEngine import is optional

The core algorithm — DAG-based compaction with leaf and condensed passes, fresh-tail protection, cache-aware compaction policies — is a faithful port of the original.

Credits

lossless-claw by Josh Lehman / Martian Engineering (MIT License) — the original TypeScript implementation this project is ported from
The LCM Paper by Voltropy — the academic foundation for lossless context management
Hermes Agent by NousResearch — the target platform whose context engine plugin system this integrates with

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.5

Apr 15, 2026

0.1.4

Apr 15, 2026

0.1.3

Apr 15, 2026

0.1.2

Apr 15, 2026

0.1.1

Apr 15, 2026

This version

0.1.0

Apr 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lossless_hermes_py-0.1.0.tar.gz (48.8 kB view details)

Uploaded Apr 15, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

lossless_hermes_py-0.1.0-py3-none-any.whl (46.0 kB view details)

Uploaded Apr 15, 2026 Python 3

File details

Details for the file lossless_hermes_py-0.1.0.tar.gz.

File metadata

Download URL: lossless_hermes_py-0.1.0.tar.gz
Upload date: Apr 15, 2026
Size: 48.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for lossless_hermes_py-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`97443fa1e0ca81c9a49d9de267ef7d3a6560f7bfd6ecf326c6882506ad3c6abe`
MD5	`b2bbc03337ab6e0c41c6ceaf8a766893`
BLAKE2b-256	`713fdb1673aea4a72fb911638987484acc8c036035d1a83d09ef80b6b080196b`

See more details on using hashes here.

File details

Details for the file lossless_hermes_py-0.1.0-py3-none-any.whl.

File metadata

Download URL: lossless_hermes_py-0.1.0-py3-none-any.whl
Upload date: Apr 15, 2026
Size: 46.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for lossless_hermes_py-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2ec5c8e980860504424450b433d59364b629ed501544b76235f0ec145fe2e316`
MD5	`c4b6b83ecc68a5787a53b1ea01e39c54`
BLAKE2b-256	`82aa76fffaa76b76f9609cd09c795cc72b2632cbb04aed0c8bcf23a3b1eadc03`

See more details on using hashes here.

lossless-hermes-py 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

lossless-hermes-py

What It Does

How It Works

Installation

Hermes Agent Plugin (recommended)

Via pip

Standalone Usage

Configuration

Key Settings

Cache-Aware Compaction

Dynamic Leaf Chunk Sizing

Environment Variables

Tools

lcm_grep

lcm_describe

lcm_expand

Architecture

Storage

Differences from the TypeScript Version

Credits

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`lcm_grep`

`lcm_describe`

`lcm_expand`