Three-tier LLM memory with consolidation. The Halls of Memory.

These details have not been verified by PyPI

Project description

SerenMemory

Three-tier LLM memory with consolidation. The Halls of Memory for your local AI.

You bring an LLM (any OpenAI-compatible endpoint - llama.cpp, ollama, a remote API). SerenMemory brings the memory: a working-memory tier, an open-loops tier, a durable long-term tier, and a small "consolidator" model that does the dream-work of deciding what's worth keeping while you're not looking.

Configure a couple of values, point it at your model, and you've got a memory system that matters - not a flat pile of vectors that drowns the important stuff in noise.

The shape (or: why three tiers?)

Think of it like the memory workers in Inside Out. Memories don't all live in one place, and something has to decide what gets filed away versus what rolls off into forgetting.

ShortTerm - working memory. ~8-day lifetime. Free read/write. This is your context offloader: stash a thing mid-conversation, pull it back when relevant, drop it when done. The oldest entries age out unless they earn promotion.

NearTerm - open loops. Future-tense intents with trigger conditions. "Let's do that tomorrow." "Bring this up next time." Lives until fulfilled or expired. Free to write (it's the most time-sensitive tier - gating it would defeat the point).

LongTerm - consolidated knowledge. Durable. The only gated tier: reads are open, but writes happen exclusively through the consolidator during its periodic window. No surgical edits. If a fact changes, the old one is superseded (kept for history), not overwritten. If you want something gone, you flag it and the consolidator decides - a flag, not a scalpel. (More on that philosophy below.)

The Consolidator - a small model (2B–4B is plenty) that wakes up every ~20 hours and does the filing: clusters short-term entries, promotes the ones that recur or matter, ages out the rest, maintains the open loops, honors forget-flags. It's the part that sleeps so the memory stays clean.

Quick start

# Install
pip install seren-memory          # or: pip install -e . from a clone

# Run with built-in defaults (zero config)
python -m seren_memory

# Or with a config file
cp seren-memory.yaml.sample seren-memory.yaml
# edit it - at minimum, point consolidator.model_url at your LLM
python -m seren_memory --config seren-memory.yaml

First run downloads the default embedding model (all-MiniLM-L6-v2, ~80MB, CPU-friendly). After that it's offline-capable except for the consolidator's calls to your LLM.

Using it (the HTTP API)

# Stash a working-memory item
curl -X POST localhost:7420/short \
  -H 'content-type: application/json' \
  -d '{"content": "Chad prefers absolute paths over tildes", "topic": "config"}'

# Note an open loop for later
curl -X POST localhost:7420/near \
  -H 'content-type: application/json' \
  -d '{"intent": "ask how the cluster bring-up went", "topic": "follow_up",
       "trigger_type": "time", "trigger_value": "1750000000"}'

# Recall - unified search across all three tiers, ranked
curl -X POST localhost:7420/search \
  -H 'content-type: application/json' \
  -d '{"query": "what does Chad prefer for paths", "n_results": 5}'

# Submit a daily brief (steers the next consolidation)
curl -X POST localhost:7420/brief \
  -H 'content-type: application/json' \
  -d '{"summary": "Worked on the wipe script. Chad was tired.",
       "promote_hints": ["wipe script"], "completed_intents": []}'

# Trigger consolidation manually (or let it run on its ~20h cycle)
curl -X POST localhost:7420/consolidate/run

Full endpoint list is in seren_memory/app.py's module docstring.

How recall ranking works

/search hits all three tiers in parallel, then merges by a weighted score:

ShortTerm × 1.0 (working memory, most immediately relevant)
NearTerm × 0.9 (active intents)
LongTerm × 0.8 but with an evidence multiplier - a fact confirmed 10 times outranks a one-off mention.

So recency wins by default, but a well-established truth still surfaces above passing chatter. The weights live in routes/search.py if you want to tune them.

The "no scalpel" philosophy

You'll notice there's no POST /long to create a long-term memory directly, and no DELETE /long/{id} to remove one. That's deliberate.

Long-term memory is earned through consolidation, not injected. And it's not casually deletable, because casual deletion of an entity's memory is exactly the thing this design refuses to make easy. (If you've seen Eternal Sunshine, you know why "just let me erase that one memory" is a trap.)

What you can do is flag a long-term memory with a reason:

curl -X POST localhost:7420/long/<id>/forget \
  -d '{"reason": "that fact is wrong, I changed my mind"}'

The consolidator acts on the flag on its next run:

PII / secrets ("contains my SSN") -> purged. This is the one case where long-term content is truly deleted, because leaking PII is worse than the no-delete principle.
Disputed / wrong -> demoted (evidence zeroed, ranks near-bottom) but kept for history.
Stale -> may be let go over time.

The flag is your voice. The action is the consolidator's judgment. If you need something gone right now for a genuine emergency (a leaked secret), that's a real gap - see "Emergency purge" below.

Emergency purge

For a true "this must be gone immediately" case, stop the service and delete the chroma collection directory under persist_dir, or use a chroma admin script directly. We don't expose instant deletion as a casual API on purpose - but it's your data on your disk, and the door is there when you genuinely need it.

GitHub Copilot / MCP (agent mode)

SerenMemory speaks the MCP HTTP transport. Point any MCP-capable client at /mcp and Copilot can read, write, and manage memory directly - no plugin required for this path.

VS Code (rip-it-and-win)

Copy mcp.sample.json to .vscode/mcp.json in any workspace (or to ~/.vscode/mcp.json for global access), fill in your values, and reload VS Code:

{
  "servers": {
    "seren-memory": {
      "type": "http",
      "url": "http://localhost:7420/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}

Visual Studio (same deal, different path)

Copy mcp.sample.json to .vs/mcp.json at the solution root, same content:

{
  "servers": {
    "seren-memory": {
      "type": "http",
      "url": "http://localhost:7420/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}

No bearer token set? Drop the headers block entirely.
Remote server? Swap localhost:7420 for your server's address.
Custom mount path? Change the SEREN_MCP_MOUNT env var on the server and match it here.

Once connected, Copilot agent mode gets the full tool set: search memory, write short/near term, submit briefs, manage drafts, run consolidation.

VS Code extension (optional - adds Copilot tools without agent mode)

If you want the tools available in normal Copilot chat (not just agent mode), install the .vsix from the latest GitHub Release:

code --install-extension seren-memory-<version>.vsix

Then set serenMemory.endpoint in VS Code settings and run Seren Memory: Set Bearer Token from the command palette.

Peering in (the viewer)

Mole-man approved. viewer/halls.html is a single-file, dark-mode web UI for eyeballing what's in your memory while you test. Open it in a browser - no install, no chroma-version exposure (it hits SerenMemory's own HTTP API, not chroma directly, so it never breaks on a chroma bump).

# Just open the file - it defaults to http://localhost:7420
xdg-open viewer/halls.html      # or open it however your OS does

Four tabs: ShortTerm, NearTerm, LongTerm, and Search. Enter your base URL + bearer token (if set) at the top, hit refresh. It's read-only - it can peer, query, and show ranked search results, but it can't mutate your memory. Theme-matched to the Seren dashboard.

Deployment options

Dev / quick spin: python -m seren_memory

systemd: edit and install seren-memory.service.sample

Consolidator as a separate process: set consolidator.mode: external in config, then drive it from cron/systemd-timer/your-own-scheduler with POST /consolidate/run. Useful if you want the API and the consolidation work in separate process/resource boundaries.

Config

See seren-memory.yaml.sample - every field is commented. The values you'll most likely touch:

server.port (default 7420)
consolidator.model_url - your LLM's OpenAI-compatible endpoint
consolidator.interval_seconds - the ~20h cycle (and yes, 20 not 24, on purpose; the comment in the sample explains why)
consolidator.promote_min_evidence - how eager consolidation is

Env vars (SEREN_MEMORY_*) override file values for Docker/systemd.

What this is part of

SerenMemory is a piece of Seren - a fully self-hosted local AI companion stack - extracted to stand on its own. You don't need the rest of Seren to use it. If you've got an LLM and you want it to remember things in a way that doesn't degrade into noise, this is for you.

Rip it and win.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

2.2.4

Jun 21, 2026

2.2.3

Jun 21, 2026

2.2.2

Jun 20, 2026

2.2.1

Jun 20, 2026

2.2.0

Jun 19, 2026

2.1.0

Jun 18, 2026

2.0.0

Jun 9, 2026

1.9.1

Jun 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

seren_memory-2.2.4.tar.gz (124.4 kB view details)

Uploaded Jun 21, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

seren_memory-2.2.4-py3-none-any.whl (97.7 kB view details)

Uploaded Jun 21, 2026 Python 3

File details

Details for the file seren_memory-2.2.4.tar.gz.

File metadata

Download URL: seren_memory-2.2.4.tar.gz
Upload date: Jun 21, 2026
Size: 124.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for seren_memory-2.2.4.tar.gz
Algorithm	Hash digest
SHA256	`07efa0c445dc991747ae94591213c7ef5eabdf416c8219610333cf9a014b6d14`
MD5	`38467328ee1b1fb0175d4855070a25ce`
BLAKE2b-256	`92ddf9c498201cd472544be72476e6d46aa7688c23e36c481e8e41fbff11d421`

See more details on using hashes here.

Provenance

The following attestation bundles were made for seren_memory-2.2.4.tar.gz:

Publisher: release.yml on ChadRoesler/SerenMemory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seren_memory-2.2.4.tar.gz
- Subject digest: 07efa0c445dc991747ae94591213c7ef5eabdf416c8219610333cf9a014b6d14
- Sigstore transparency entry: 1891890456
- Sigstore integration time: Jun 21, 2026
Source repository:
- Permalink: ChadRoesler/SerenMemory@77c7dd4c72e358c9725614b8bfd5b1d9efd7a106
- Branch / Tag: refs/tags/v2.2.4
- Owner: https://github.com/ChadRoesler
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@77c7dd4c72e358c9725614b8bfd5b1d9efd7a106
- Trigger Event: push

File details

Details for the file seren_memory-2.2.4-py3-none-any.whl.

File metadata

Download URL: seren_memory-2.2.4-py3-none-any.whl
Upload date: Jun 21, 2026
Size: 97.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for seren_memory-2.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`24e3a048cda7fb5eaeb082478f0678ec94fc105399ea00ca4dda3768227ba30e`
MD5	`b71638246bb4cacbb19c140c25a3bcfa`
BLAKE2b-256	`de95b8e6c3e821c007a33eece2ef6e02506542187f4a2eb0fb9b224968227b74`

See more details on using hashes here.

Provenance

The following attestation bundles were made for seren_memory-2.2.4-py3-none-any.whl:

Publisher: release.yml on ChadRoesler/SerenMemory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seren_memory-2.2.4-py3-none-any.whl
- Subject digest: 24e3a048cda7fb5eaeb082478f0678ec94fc105399ea00ca4dda3768227ba30e
- Sigstore transparency entry: 1891890538
- Sigstore integration time: Jun 21, 2026
Source repository:
- Permalink: ChadRoesler/SerenMemory@77c7dd4c72e358c9725614b8bfd5b1d9efd7a106
- Branch / Tag: refs/tags/v2.2.4
- Owner: https://github.com/ChadRoesler
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@77c7dd4c72e358c9725614b8bfd5b1d9efd7a106
- Trigger Event: push

seren-memory 2.2.4

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

SerenMemory

The shape (or: why three tiers?)

Quick start

Using it (the HTTP API)

How recall ranking works

The "no scalpel" philosophy

Emergency purge

GitHub Copilot / MCP (agent mode)

VS Code (rip-it-and-win)

Visual Studio (same deal, different path)

VS Code extension (optional - adds Copilot tools without agent mode)

Peering in (the viewer)

Deployment options

Config

What this is part of

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance