consolidation-memory

Local-first persistent memory for AI agents — store, recall, and consolidate knowledge across sessions using FAISS, SQLite, and any LLM.

Requires Python 3.10+.

Your AI forgets everything between sessions. This fixes that.

A local-first memory system that stores, retrieves, and consolidates knowledge across conversations. Episodes go in, structured knowledge comes out — automatically, via a background LLM that clusters and synthesizes what it's learned.

No cloud dependency. No subscriptions. Your data stays on your machine.

```
You: "My build is failing with a linker error"
AI:  (recalls your project uses CMake + MSVC on Windows)
     (recalls you hit the same error last month — it was a missing vcpkg dependency)
     "Last time this happened it was a missing vcpkg package. Want me to
      check if your vcpkg.json changed since we fixed it?"
```

How It Works

```
 ┌──────────┐     ┌───────────┐     ┌─────────────┐
 │  Store   │────▶│   Embed   │────▶│ FAISS Index │
 │ episodes │     │ (any LLM) │     │ + SQLite DB │
 └──────────┘     └───────────┘     └──────┬──────┘
                                           │
                  ┌───────────┐     ┌──────▼──────┐
                  │ Knowledge │◀────│   Recall    │
                  │   Docs    │     │ (semantic)  │
                  └─────┬─────┘     └─────────────┘
                        │
                 ┌──────▼──────┐
                 │ Consolidate │  ← background thread
                 │ (cluster +  │    clusters episodes
                 │  LLM synth) │    into knowledge docs
                 └─────────────┘
```
  1. Store — Save episodes (facts, solutions, preferences) with embeddings into SQLite + FAISS
  2. Recall — Semantic search with priority scoring (surprise, recency, access frequency; see the sketch below the list)
  3. Consolidate — Background LLM clusters related episodes and synthesizes structured markdown knowledge documents
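
The exact scoring formula isn't spelled out in this README, but a minimal sketch of how recall priority could blend similarity with those signals looks like this (all weights and field names here are hypothetical):

```python
import math
import time

# Hypothetical weights -- the real values live in the library, not here.
W_SIMILARITY, W_RECENCY, W_FREQUENCY, W_SURPRISE = 0.6, 0.2, 0.1, 0.1

def priority(similarity: float, last_access_ts: float,
             access_count: int, surprise: float) -> float:
    """Blend vector similarity with memory-style signals.

    similarity     -- cosine similarity from the FAISS search (0..1)
    last_access_ts -- unix timestamp of the last recall hit
    access_count   -- how often this episode has been recalled
    surprise       -- novelty score assigned at store time (0..1)
    """
    age_days = (time.time() - last_access_ts) / 86_400
    recency = math.exp(-age_days / 30)        # decays over roughly a month
    frequency = math.log1p(access_count) / 5  # diminishing returns
    return (W_SIMILARITY * similarity + W_RECENCY * recency
            + W_FREQUENCY * min(frequency, 1.0) + W_SURPRISE * surprise)
```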

How Consolidation Works

The consolidation engine runs on a background daemon thread (default: every 6 hours). It fetches all unconsolidated episodes, embeds them, and groups them using agglomerative hierarchical clustering with a configurable distance threshold. Each cluster represents a coherent topic.
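
For illustration, here is roughly what that clustering step looks like using scikit-learn's AgglomerativeClustering (a sketch, not the library's actual code; the threshold mirrors cluster_threshold from the config below):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_episodes(embeddings: np.ndarray, distance_threshold: float = 0.72):
    """Group episode embeddings into topic clusters.

    n_clusters=None plus distance_threshold cuts the dendrogram at a
    fixed cosine distance instead of a fixed cluster count.
    """
    if len(embeddings) < 2:
        return [np.arange(len(embeddings))]
    clusterer = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=distance_threshold,
        metric="cosine",
        linkage="average",
    )
    labels = clusterer.fit_predict(embeddings)
    return [np.where(labels == c)[0] for c in np.unique(labels)]
```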

For each cluster, the engine checks existing knowledge topics for semantic overlap. If a matching topic exists (above the topic-match threshold), the cluster's episodes are merged into the existing document. Otherwise, a new knowledge document is synthesized from scratch.
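
A sketch of that routing decision, assuming each knowledge topic keeps a representative embedding and using a hypothetical threshold value:

```python
import numpy as np

TOPIC_MATCH_THRESHOLD = 0.80  # illustrative; the real threshold is configurable

def route_cluster(cluster_centroid: np.ndarray,
                  topic_embeddings: dict[str, np.ndarray]) -> str | None:
    """Return the existing topic to merge into, or None to synthesize a new doc."""
    best_topic, best_sim = None, -1.0
    for topic, emb in topic_embeddings.items():
        sim = float(np.dot(cluster_centroid, emb) /
                    (np.linalg.norm(cluster_centroid) * np.linalg.norm(emb)))
        if sim > best_sim:
            best_topic, best_sim = topic, sim
    return best_topic if best_sim >= TOPIC_MATCH_THRESHOLD else None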

The LLM receives the cluster's episodes (with prompt injection patterns sanitized) and produces a structured markdown document with YAML frontmatter (title, summary, tags, confidence score). The engine validates the output, versions the previous document, writes the new one, and updates the SQLite metadata. Episodes that have been consolidated and aged past the prune threshold are soft-deleted to keep the FAISS index lean.
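
For a concrete picture, a validation step along these lines would enforce the frontmatter contract (field names taken from the list above; the library's actual schema may differ or be stricter):

```python
import yaml

REQUIRED_FIELDS = {"title", "summary", "tags", "confidence"}

def validate_knowledge_doc(text: str) -> dict:
    """Split '---'-delimited YAML frontmatter from the markdown body
    and check the required fields before the doc is written."""
    if not text.startswith("---"):
        raise ValueError("missing YAML frontmatter")
    _, frontmatter, body = text.split("---", 2)
    meta = yaml.safe_load(frontmatter) or {}
    missing = REQUIRED_FIELDS - meta.keys()
    if missing:
        raise ValueError(f"frontmatter missing fields: {missing}")
    if not 0.0 <= float(meta["confidence"]) <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return {"meta": meta, "body": body.strip()}
```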

All backends retry transient failures with exponential backoff. If 3 consecutive clusters fail (indicating the LLM backend is down), consolidation aborts early rather than burning through timeouts.
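
The retry and abort behavior can be pictured like this (a sketch; the real backoff parameters and exception types aren't documented here):

```python
import time

MAX_CONSECUTIVE_FAILURES = 3

def consolidate_all(clusters, synthesize, retries: int = 3):
    """Synthesize each cluster, retrying transient failures with
    exponential backoff; abort the whole run after three consecutive
    clusters fail, since the LLM backend is probably down."""
    consecutive_failures = 0
    for cluster in clusters:
        for attempt in range(retries):
            try:
                synthesize(cluster)
                consecutive_failures = 0
                break
            except ConnectionError:       # assumed transient-error type
                time.sleep(2 ** attempt)  # 1s, 2s, 4s ...
        else:
            consecutive_failures += 1
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                return  # abort early rather than burn through timeouts
```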

Quick Start

```bash
pip install consolidation-memory[fastembed]
consolidation-memory init
```

That's it. FastEmbed runs locally, no external services needed.

MCP Server (Claude Desktop / Claude Code / Cursor)

Add to your claude_desktop_config.json:

```json
{
  "mcpServers": {
    "consolidation_memory": {
      "command": "consolidation-memory"
    }
  }
}
```

Nine tools become available:

| Tool | What it does |
| --- | --- |
| memory_store | Save an episode (fact, solution, preference, exchange) |
| memory_store_batch | Store multiple episodes in one call (single embed + FAISS batch) |
| memory_recall | Semantic search over episodes + knowledge, with optional filters |
| memory_search | Keyword/metadata search — works without embedding backend |
| memory_status | System stats + health diagnostics + consolidation metrics |
| memory_forget | Soft-delete an episode |
| memory_consolidate | Trigger consolidation |
| memory_export | Export everything to JSON |
| memory_correct | Fix outdated knowledge documents |
memory_recall supports optional filters: content_types, tags, after, before — all applied post-vector-search so you can narrow results to specific episode types or date ranges.

memory_search does plain-text LIKE matching in SQLite. No embedding backend needed. It supports the same filters (content_types, tags, after, before) plus a limit parameter.
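
For example, a memory_recall call narrowed to one episode type and a date window might carry arguments like these (values are illustrative, and the ISO 8601 date format is an assumption):

```json
{
  "query": "linker errors",
  "content_types": ["solution"],
  "tags": ["cmake"],
  "after": "2024-01-01",
  "before": "2024-06-30"
}
```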

Python API

```python
from consolidation_memory import MemoryClient

with MemoryClient() as mem:
    mem.store("User prefers dark mode", content_type="preference", tags=["ui"])

    result = mem.recall("user interface preferences")
    for ep in result.episodes:
        print(ep["content"], ep["similarity"])

    stats = mem.status()
    print(stats.health)  # {'status': 'healthy', 'issues': [], 'backend_reachable': True}
```

OpenAI Function Calling

Works with any OpenAI-compatible API (LM Studio, Ollama, OpenAI, Azure):

```python
from consolidation_memory import MemoryClient
from consolidation_memory.schemas import openai_tools, dispatch_tool_call

mem = MemoryClient()
# Pass openai_tools to your chat completion, dispatch results with dispatch_tool_call()
```
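
A fuller sketch of that loop, assuming an OpenAI-compatible server like the LM Studio endpoint from the config below, and assuming dispatch_tool_call takes the client plus one tool call (check the schemas module for the real signature):

```python
from openai import OpenAI

llm = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
messages = [{"role": "user", "content": "Remember that I prefer dark mode."}]

response = llm.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=messages,
    tools=openai_tools,  # the memory tool schemas imported above
)

messages.append(response.choices[0].message)
for call in response.choices[0].message.tool_calls or []:
    result = dispatch_tool_call(mem, call)  # assumed: routes to MemoryClient
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": str(result)})
```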

REST API

```bash
pip install consolidation-memory[rest]
consolidation-memory serve --rest --port 8080
```

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | Version + status |
| POST | /memory/store | Store episode |
| POST | /memory/store/batch | Store multiple episodes |
| POST | /memory/recall | Semantic search (with optional filters) |
| POST | /memory/search | Keyword/metadata search (no embedding needed) |
| GET | /memory/status | System statistics + consolidation metrics |
| DELETE | /memory/episodes/{id} | Forget episode |
| POST | /memory/consolidate | Trigger consolidation |
| POST | /memory/correct | Correct knowledge doc |
| POST | /memory/export | Export to JSON |
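
With the server running on port 8080, a round trip might look like this from Python (the request body fields are assumed to mirror the MCP tool parameters):

```python
import requests

BASE = "http://localhost:8080"

# Store an episode, then recall it semantically.
resp = requests.post(f"{BASE}/memory/store", json={
    "content": "User prefers dark mode",
    "content_type": "preference",  # assumed to mirror memory_store's params
    "tags": ["ui"],
})
resp.raise_for_status()

hits = requests.post(f"{BASE}/memory/recall",
                     json={"query": "user interface preferences"})
print(hits.json())
```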

Embedding Backends

| Backend | Install | Model | Dimensions | Runs locally? |
| --- | --- | --- | --- | --- |
| FastEmbed (default) | pip install consolidation-memory[fastembed] | bge-small-en-v1.5 | 384 | Yes |
| LM Studio | Built-in | nomic-embed-text-v1.5 | 768 | Yes |
| Ollama | Built-in | nomic-embed-text | 768 | Yes |
| OpenAI | pip install consolidation-memory[openai] | text-embedding-3-small | 1536 | No |

LLM Backends (for consolidation)

The consolidation step needs a chat-capable LLM to synthesize clusters into knowledge documents. Set backend = "disabled" to skip consolidation and use store/recall only.
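
In the config, that is a single line under [llm]:

```toml
[llm]
backend = "disabled"
```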

| Backend | Requirements |
| --- | --- |
| LM Studio (default) | LM Studio running with any chat model |
| Ollama | Ollama running with any chat model |
| OpenAI | API key |
| Disabled | None — no consolidation, pure vector search |

Configuration

```bash
consolidation-memory init  # Interactive setup
```

Or edit the config directly:

| Platform | Path |
| --- | --- |
| Linux/macOS | ~/.config/consolidation_memory/config.toml |
| Windows | %APPDATA%\consolidation_memory\config.toml |
| Override | CONSOLIDATION_MEMORY_CONFIG env var |

```toml
[embedding]
backend = "fastembed"

[llm]
backend = "lmstudio"
api_base = "http://localhost:1234/v1"
model = "qwen2.5-7b-instruct"

[consolidation]
auto_run = true
interval_hours = 6
cluster_threshold = 0.72
prune_enabled = true
prune_after_days = 60
```

CLI

```bash
consolidation-memory serve              # MCP server (default)
consolidation-memory serve --rest       # REST API
consolidation-memory init               # Interactive setup
consolidation-memory status             # Show stats
consolidation-memory consolidate        # Manual consolidation
consolidation-memory export             # Export to JSON
consolidation-memory import PATH        # Import from JSON
consolidation-memory reindex            # Re-embed everything (after switching backends)
```

Data Storage

All data stays local:

| Platform | Path |
| --- | --- |
| Linux | ~/.local/share/consolidation_memory/ |
| macOS | ~/Library/Application Support/consolidation_memory/ |
| Windows | %LOCALAPPDATA%\consolidation_memory\ |

Override with data_dir under [paths] in config.

Migrating

Already have a data directory? Point your config at it:

```toml
[paths]
data_dir = "/path/to/your/existing/data"
```

Switching embedding backends (different dimensions)?

```bash
consolidation-memory reindex
```

Development

```bash
git clone https://github.com/charliee1w/consolidation-memory
cd consolidation-memory
pip install -e ".[fastembed,dev]"
python -m pytest tests/ -v      # 88 tests, no external services needed
python -m ruff check src/ tests/
```

License

MIT
