hermes-memory

Persistent, structured memory for LLM agents. MCP-native. Zero infrastructure. SQLite-backed. Language-agnostic.


The problem

During long sessions, context compression removes older messages. Once compressed, those messages are gone from the active context, absent from the memory files, and no longer searchable, even within the same session.

The agent forgets what was decided three hours ago.

The approach

hermes-memory does not fight compression. It works before it.

Every constraint, decision, and value is extracted into a structured fact using a terse notation (MEMORY_SPEC). Facts are stored in SQLite with full-text search, a two-tier hot/cold architecture, and automatic scope lifecycle management.

The context injection stays under 180 tokens regardless of session length. Cold facts are retrieved only on demand, so they cost no tokens until a search asks for them.

Notation (MEMORY_SPEC v1.0)

C[target]: constraint   — immutable project law
D[target]: decision     — technical choice, validated
V[target]: value        — IP, port, key, URL, variable
?[target]: unknown      — open question
✓[target]: done         — scope archivable
~[target]: obsolete     — replaces a previous fact

Examples:

C[db.id]: UUID mndtry, nvr autoincrement        (-65% vs raw message)
D[auth]: JWT 7d refresh 6d                      (-70%)
V[srv.prod]: api.example.com:3005               (-74%)
✓[auth]: deployed prod                          (-78%)
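
To make the notation concrete, here is a minimal parsing sketch. It is not the package's own parser: `Fact`, `KINDS`, `FACT_RE`, and `parse_fact` are hypothetical names, and the only thing taken from the spec above is the `marker[target]: body` shape and the six markers.

```python
import re
from typing import NamedTuple

# Marker characters taken from the notation table above.
KINDS = {
    "C": "constraint",
    "D": "decision",
    "V": "value",
    "?": "unknown",
    "✓": "done",
    "~": "obsolete",
}

# One marker, a bracketed target, a colon, then the free-text body.
FACT_RE = re.compile(r"^([CDV?✓~])\[([^\]]+)\]:\s*(.*)$")


class Fact(NamedTuple):
    kind: str
    target: str
    body: str


def parse_fact(line: str) -> Fact:
    """Split one notation line into (kind, target, body). Hypothetical helper."""
    match = FACT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a MEMORY_SPEC fact: {line!r}")
    marker, target, body = match.groups()
    return Fact(KINDS[marker], target.strip(), body.strip())
```

Calling `parse_fact("V[srv.prod]: api.example.com:3005")` yields a `Fact` with kind `value`, target `srv.prod`, and the address as its body.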

How memory stays bounded

Facts live in one of two tiers:

HOT    active facts, injected at session start, ~150 tokens
COLD   SQLite, unlimited, retrieved on demand by FTS5 search

When the hot tier fills, pressure levels act automatically:

70%    merge duplicate facts sharing same target+scope
85%    push closed-scope facts from cold to archived
95%    consolidate via LLM call (last resort) or push oldest to cold
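
The thresholds above can be read as a simple lookup from hot-tier fill ratio to action. The sketch below assumes that the highest threshold reached decides the action and that "merge" means keeping the most recent fact per (target, scope); both are assumptions, and the function names are hypothetical rather than taken from gauge.py.

```python
from typing import Dict, List, Tuple

FactRow = Tuple[str, str, str]  # (target, scope, body)


def pressure_action(used_tokens: int, budget_tokens: int) -> str:
    """Map the hot-tier fill ratio to the action listed for that level."""
    ratio = used_tokens / budget_tokens
    if ratio >= 0.95:
        return "consolidate"        # LLM call, or push oldest facts to cold
    if ratio >= 0.85:
        return "archive_closed"     # move closed-scope facts out
    if ratio >= 0.70:
        return "merge_duplicates"   # same target+scope collapse into one
    return "none"


def merge_duplicates(facts: List[FactRow]) -> List[FactRow]:
    """Keep one fact per (target, scope); a later write replaces an earlier one.

    Assumption: 'latest wins' is only one possible merge policy.
    """
    latest: Dict[Tuple[str, str], str] = {}
    for target, scope, body in facts:
        latest[(target, scope)] = body
    return [(target, scope, body) for (target, scope), body in latest.items()]
```

With a 150-token budget, 110 used tokens would trigger the merge step and 145 would trigger consolidation.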

The cold tier has no size limit. A cold-tier search returns at most 20 facts.
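
As an illustration of an FTS5-backed cold tier with a 20-result cap, here is a small sketch using Python's standard `sqlite3` module. The table layout, column names, and helper functions are assumptions for the example and not the schema in db.py; it also assumes the local SQLite build ships with FTS5 enabled.

```python
import sqlite3
from typing import List

MAX_RESULTS = 20  # cap on cold-tier search results stated above


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema: one FTS5 table; scope and tier are stored but not indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS facts "
        "USING fts5(content, scope UNINDEXED, tier UNINDEXED)"
    )
    return conn


def write_fact(conn: sqlite3.Connection, content: str, scope: str, tier: str = "hot") -> None:
    conn.execute(
        "INSERT INTO facts (content, scope, tier) VALUES (?, ?, ?)",
        (content, scope, tier),
    )


def search_cold(conn: sqlite3.Connection, query: str, limit: int = MAX_RESULTS) -> List[str]:
    """Full-text search restricted to cold facts, best matches first."""
    limit = min(limit, MAX_RESULTS)
    rows = conn.execute(
        "SELECT content FROM facts "
        "WHERE facts MATCH ? AND tier = 'cold' "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

The default tokenizer splits `db.id` into the tokens `db` and `id`, so a query such as `db id` matches a fact whose target is `db.id`. Queries that contain punctuation would need quoting before being passed to `MATCH`.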

Managing MEMORY.md pressure

MEMORY.md is injected into every turn, so keeping it compact reduces the token cost of every call. hermes-memory acts as a relief valve: facts that accumulate in MEMORY.md can be migrated to the DB and retrieved on demand.

Rules in order when MEMORY.md fills up:

  1. Abbreviate first: 40-60% reduction with standard shorthands (pr=for, req=required, cfg=config, w/=with, →=then/to, ↑=upgrade), plus dropping articles, filler words, and trailing punctuation.

  2. Migrate structured facts — any C/D/V entry not needed every turn belongs in hermes-memory, not MEMORY.md:

    # Before (MEMORY.md, 54 chars):
    "Database IDs must always be UUID, never autoincrement."
    
    # After — migrated:
    memory_write("C[db.id]: UUID mndtry, nvr autoincrement")
    # Entry removed from MEMORY.md, retrieved via memory_search("db id")
    
  3. Remove duplicates — facts already in the DB don't need to be in MEMORY.md.
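
The abbreviation rule can be sketched as a word-level substitution pass. This is only an illustration: `abbreviate` is a hypothetical helper, it covers only the word shorthands listed in rule 1, and it drops articles but not other filler words.

```python
# Word shorthands listed in rule 1 (long form -> short form).
SHORTHANDS = {"for": "pr", "required": "req", "config": "cfg", "with": "w/"}
ARTICLES = {"a", "an", "the"}


def abbreviate(entry: str) -> str:
    """Apply shorthands, drop articles, and strip trailing punctuation."""
    words = []
    for word in entry.split():
        key = word.lower()
        if key in ARTICLES:
            continue  # articles carry no information in a terse memory entry
        words.append(SHORTHANDS.get(key, word))
    return " ".join(words).rstrip(".,;:!")
```

Words with punctuation attached in the middle of a sentence (for example `config,`) are left untouched by this sketch; a fuller version would tokenize more carefully.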

Automated relief (cron pattern):

Run 2x/day. Check memory_status(). If MEMORY > 55% or USER > 55%:
- Abbreviate verbose entries
- Migrate C/D/V facts to memory_write()
- Remove entries already in hermes-memory DB
Target: both stores below 45%. If already under threshold, do nothing.
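
The trigger and target in the cron pattern form a small hysteresis band: relief starts above 55% and stops once both stores are below 45%. A minimal sketch, assuming the two usage percentages have already been read from memory_status() (its output format is not shown here, so the function arguments are hypothetical):

```python
TRIGGER_PCT = 55  # start relief when either store exceeds this
TARGET_PCT = 45   # stop once both stores are below this


def needs_relief(memory_pct: float, user_pct: float) -> bool:
    """True when either store is above the trigger threshold."""
    return memory_pct > TRIGGER_PCT or user_pct > TRIGGER_PCT


def relief_done(memory_pct: float, user_pct: float) -> bool:
    """True when both stores are below the target."""
    return memory_pct < TARGET_PCT and user_pct < TARGET_PCT
```

The gap between the two thresholds keeps a store that sits near 55% from triggering a relief pass on every run.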

Scope lifecycle

A scope is a unit of work (feature, phase, bug fix). It opens implicitly on the first fact write and closes when any of the following happens:

  1. A closing signal appears in the message ("merged", "deployed", "it works")
  2. Six turns pass with no reference to the scope
  3. Three consecutive turns write facts for a different scope

Closed scopes move to cold automatically. Their facts never pollute future sessions.
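
The three closing rules can be sketched as a per-turn tracker. This is an illustration under stated assumptions, not the logic in scopes.py: `ScopeTracker` and its fields are hypothetical, a scope counts as "referenced" when its name appears in the message or a fact is written to it, and a closing signal closes only the scopes referenced in that same turn.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CLOSING_SIGNALS = ("merged", "deployed", "it works")
IDLE_TURNS = 6    # rule 2: turns without any reference
SHIFT_TURNS = 3   # rule 3: consecutive turns writing to another scope


@dataclass
class ScopeState:
    last_seen: int       # last turn that referenced this scope
    elsewhere: int = 0   # consecutive turns that wrote to a different scope


@dataclass
class ScopeTracker:
    open_scopes: Dict[str, ScopeState] = field(default_factory=dict)

    def tick(self, turn: int, message: str, written_scope: Optional[str] = None) -> List[str]:
        """Process one turn and return the scopes it closed."""
        text = message.lower()
        if written_scope is not None and written_scope not in self.open_scopes:
            # A scope opens implicitly on its first fact write.
            self.open_scopes[written_scope] = ScopeState(last_seen=turn)
        has_signal = any(signal in text for signal in CLOSING_SIGNALS)
        closed: List[str] = []
        for name, state in list(self.open_scopes.items()):
            # Crude substring match; a real matcher would be stricter.
            referenced = name == written_scope or name.lower() in text
            if referenced:
                state.last_seen = turn
            if written_scope not in (None, name):
                state.elsewhere += 1
            else:
                state.elsewhere = 0  # the streak must be consecutive
            if (
                (has_signal and referenced)
                or turn - state.last_seen >= IDLE_TURNS
                or state.elsewhere >= SHIFT_TURNS
            ):
                closed.append(name)
                del self.open_scopes[name]
        return closed
```

For example, after a fact is written to scope `auth` on turn 1, a turn 2 message of "auth is deployed" closes it by rule 1.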

Installation

pip install hermes-memory

MCP server

hermes-memory

Or in your MCP config:

{
  "mcpServers": {
    "hermes-memory": {
      "command": "hermes-memory"
    }
  }
}

Set HERMES_MEMORY_DB to override the default storage path (~/.hermes/memory.db).
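
A script that needs to locate the same database file could resolve the path as follows. This is a sketch of the lookup described above; `resolve_db_path` is a hypothetical helper, and expanding `~` to the home directory is an assumption about how the default is interpreted.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DB = "~/.hermes/memory.db"  # default path stated above


def resolve_db_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the HERMES_MEMORY_DB override if set, else the default path."""
    env = os.environ if env is None else env
    raw = env.get("HERMES_MEMORY_DB") or DEFAULT_DB
    return Path(raw).expanduser()
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test.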

Tools

Tool                                      When to call
memory_write(content, scope?)             Any constraint, decision, or value established
memory_search(query, scope?, limit?)      Before answering on a topic with history
memory_tick(turn, message?)               Every user message
memory_status()                           Session start
memory_reflect(topic, limit?)             User asks about history on a topic
memory_export(scope?, status?)            Snapshot all facts as plain notation
memory_purge(scope?, older_than_days?)    After closing a scope or periodic GC
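
On the wire, an MCP client invokes these tools with a JSON-RPC `tools/call` request. The sketch below only builds that request body, using the parameter names from the table; `tool_call` is a hypothetical helper, and in practice an MCP client library handles the handshake and transport.

```python
import json
from itertools import count
from typing import Any

_request_ids = count(1)  # JSON-RPC requests need unique ids


def tool_call(name: str, **arguments: Any) -> str:
    """Build an MCP tools/call request as a JSON string."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    print(tool_call("memory_write", content="C[db.id]: UUID mndtry, nvr autoincrement"))
    print(tool_call("memory_search", query="db id", limit=5))
```

Optional parameters (marked `?` in the table) are simply left out of `arguments` when not needed.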

System prompt block

Add to your agent's system prompt (output of memory_status):

[MEMORY_SPEC v1.0]

NOTATION
C[t]: constraint  D[t]: decision  V[t]: value
?[t]: unknown     ✓[t]: done      ~[t]: obsolete
-> flows  ! critical  group by key

ABBREVS
cfg impl msg req usr resp prod feat dev deps auth err db btn
env doc perf init mgmt refct mvmt notif perms val async sync
mndtry nvr alw tmp idx tbl svc pkg repo api clt srv

RULES
- call memory_write() for any C/D/V/? detected
- call memory_search() before answering on known topics
- call memory_tick(turn, message) on every user message
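
The three rules amount to a fixed call order per turn. A minimal sketch, assuming a `call_tool(name, **arguments)` callable supplied by whatever MCP client is in use; `handle_turn`, the `topic` argument, and the `new_facts` list are hypothetical, and how facts are detected in a message is left to the agent.

```python
from typing import Any, Callable, List


def handle_turn(
    call_tool: Callable[..., Any],
    turn: int,
    message: str,
    topic: str,
    new_facts: List[str],
) -> Any:
    """Apply the RULES block to one user message and return what was recalled."""
    # Rule 3: tick on every user message so scope lifecycle stays current.
    call_tool("memory_tick", turn=turn, message=message)
    # Rule 2: look up prior facts before answering on a known topic.
    recalled = call_tool("memory_search", query=topic)
    # Rule 1: persist every C/D/V/? fact detected in this turn.
    for fact in new_facts:
        call_tool("memory_write", content=fact)
    return recalled
```

Any callable with that shape works, which makes the order easy to check with a recording stub before wiring in a real client.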

Compatibility

Works with any MCP-compatible agent: Hermes, Claude Desktop, Cursor, Continue, and others. No cloud. No API key. No embedding model required.

Architecture

hermes_memory/
    core/
        db.py        SQLite connection, schema, constants
        facts.py     CRUD, contradiction detection, FTS5 search
        scopes.py    scope lifecycle, auto-cooling, topic shift
        gauge.py     pressure levels, merge, archive, synthesis
    mcp/
        server.py    MCP server, 7 tools
    spec/
        MEMORY_SPEC.md   notation reference

License

MIT
