Local-first persistent memory for AI agents — store, recall, and consolidate knowledge across sessions using FAISS, SQLite, and any LLM

Maintainers

charliee1w

consolidation-memory

Memory that gets smarter while your agent sleeps.

Most AI memory systems are glorified vector stores — they embed, they retrieve, they forget. consolidation-memory does something different: it runs a background process that clusters your raw episodes, synthesizes them through an LLM, and distills structured knowledge records — automatically, without agent intervention. Your memories don't just accumulate. They consolidate.

This is the same trick your brain uses. Neuroscience calls it memory consolidation: during sleep, the hippocampus replays recent experiences and transfers distilled patterns to the neocortex for long-term storage. Raw episodes become durable knowledge. consolidation-memory applies this process to AI agents — a background thread replays stored episodes, clusters them by semantic similarity, and uses an LLM to synthesize structured knowledge records (facts, solutions, preferences) that feed back into future recall.

The result: an agent that remembers not just what happened, but what it learned.

You: "My build is failing with a linker error"
AI:  (recalls your project uses CMake + MSVC on Windows)
     (recalls you hit the same error last month — it was a missing vcpkg dependency)
     "Last time this happened it was a missing vcpkg package. Want me to
      check if your vcpkg.json changed since we fixed it?"

This isn't retrieval. The agent never explicitly stored "this user's linker errors come from vcpkg." That knowledge was synthesized during consolidation from scattered episodes across multiple sessions.

Why Consolidation Matters

Vector search finds what you stored. Consolidation finds what you learned.

	Vector store	consolidation-memory
Store	Embed text, save vector	Same
Recall	Nearest-neighbor search	Semantic search + knowledge records
Over time	Index grows, recall degrades	Background LLM distills knowledge, prunes noise
Knowledge	Whatever you explicitly saved	Emergent — synthesized from episode clusters
Maintenance	Manual curation or nothing	Automatic background consolidation

Without consolidation, your memory system is a write-once archive. With it, memory compounds.

Why Not X?

There are good AI memory tools out there. Here's why consolidation-memory exists anyway.

	consolidation-memory	Mem0	Zep	Letta (MemGPT)	Cognee
Core mechanism	Background LLM consolidation — clusters episodes, synthesizes knowledge records automatically	Write-time extraction — LLM extracts facts on every `add()` call	Session summaries — compresses conversation windows into summaries	Agent self-management — the LLM decides what to store in its own context	ETL pipeline — extracts, chunks, builds knowledge graph
When synthesis happens	Background thread (async, off the hot path)	Synchronously at write time	End of session / window	During agent turns (uses agent compute)	Explicit pipeline run
Knowledge structure	Typed records (fact, solution, preference) from episode clusters	Flat extracted facts	Session summary nodes + temporal graph	Agent-managed text blocks	Knowledge graph (nodes + edges)
Infrastructure	SQLite + FAISS (two files)	Qdrant/Postgres + graph DB (self-hosted) or cloud API	Postgres + Neo4j (cloud) or Graphiti (Apache 2.0)	Postgres + agent runtime	Neo4j or Kuzu + vector DB
Local-first	Yes — runs on a laptop with no network	Partial — OSS needs Qdrant	No — cloud-first, OSS community edition deprecated	Yes — but requires running agent server	Partial — needs graph DB
MCP native	Yes	Yes (added later)	No	No	Yes (added later)
Zero config	`pip install` + `init`	Docker compose or API key	API key + cloud setup	`pip install` + server setup	`pip install` + graph DB

Mem0 extracts facts at write time — every add() call invokes the LLM to parse and store structured facts. This works, but it means your extraction quality is bounded by what the LLM can infer from a single episode in isolation. consolidation-memory's background consolidation sees clusters of related episodes together, letting it synthesize cross-session patterns that no single episode contains.

Zep summarizes conversation sessions and builds a temporal knowledge graph. It's designed for chat applications with clear session boundaries. consolidation-memory operates on individual episodes from any source — it doesn't assume a chat-session structure, and its consolidation clusters by semantic similarity rather than temporal adjacency.

Letta (MemGPT) makes the agent itself responsible for memory management — the LLM decides what to write to its core memory and archival storage during its own turns. This is elegant but uses agent compute for memory housekeeping and requires the agent to be well-prompted for self-management. consolidation-memory moves this work to a background thread that runs independently of agent sessions.

Cognee builds knowledge graphs through an ETL-style pipeline — powerful for structured reasoning over entities and relationships, but it needs graph database infrastructure (Neo4j or Kuzu). consolidation-memory's approach is deliberately simpler: SQLite + FAISS, two files, runs on a laptop.

How It Works

flowchart LR
    A["Store"] -->|episodes + embeddings| B["SQLite + FAISS"]
    B -->|semantic search| C["Recall"]
    C -->|priority scoring| D["Results"]
    B -->|background thread| E["Consolidate"]
    E -->|cluster + synthesize| F["Knowledge Records"]
    F -->|feeds back into| B

Store — Save episodes (facts, solutions, preferences) with embeddings into SQLite + FAISS
Recall — Semantic search with priority scoring (surprise, recency, access frequency)
Consolidate — Background LLM clusters related episodes and synthesizes structured knowledge records
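The recall step ranks results by more than raw similarity. Here is a minimal sketch of how surprise, recency, and access frequency could be folded into one score. The function name, the weights, the 30-day half-life, and the saturation point are all illustrative assumptions, not the library's actual formula.

```python
import math
import time


def priority_score(similarity, surprise, last_access_ts, access_count,
                   now=None, half_life_days=30.0):
    """Scale semantic similarity by a boost built from three signals.

    All weights below are hypothetical and chosen only for illustration.
    """
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_access_ts) / 86400.0)
    # Recency: exponential decay, 1.0 for a just-accessed episode.
    recency = 0.5 ** (age_days / half_life_days)
    # Frequency: logarithmic, saturating at 1.0 around 100 accesses.
    frequency = min(math.log1p(access_count) / math.log1p(100), 1.0)
    boost = 0.4 * surprise + 0.4 * recency + 0.2 * frequency  # in [0, 1]
    # The final score never exceeds the raw similarity.
    return similarity * (0.5 + 0.5 * boost)
```

With equal similarity, an episode accessed today outranks one last touched three months ago, and neither can score higher than its raw similarity.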

Consolidation Detail

flowchart TD
    A["Fetch unconsolidated episodes"] --> B["Embed + cluster"]
    B --> C{"Match existing topic?"}
    C -->|Yes| D["Merge into topic"]
    C -->|No| E["Create new topic"]
    D --> F["LLM synthesizes structured records"]
    E --> F
    F --> G["Validate + version + write"]
    G --> H["Prune old episodes"]
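The "Embed + cluster" and "Match existing topic?" steps in the flow above can be sketched in pure Python. This is a simplified stand-in, not the library's code: it uses centroid-linkage agglomerative clustering, assumes the threshold is a cosine similarity (0.72 is borrowed from the sample config), and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_episodes(embeddings, threshold=0.72):
    """Merge the two most similar clusters until no pair reaches the threshold.

    Returns a list of clusters, each a list of episode indices.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best_sim, best_pair = threshold, None
        for i in range(len(clusters)):
            ci = centroid([embeddings[k] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([embeddings[k] for k in clusters[j]])
                sim = cosine(ci, cj)
                if sim >= best_sim:
                    best_sim, best_pair = sim, (i, j)
        if best_pair is None:
            break  # nothing left that is similar enough to merge
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def match_topic(cluster_centroid, topic_centroids, threshold=0.72):
    """Return the most similar existing topic name, or None to create a new one."""
    best_name, best_sim = None, threshold
    for name, vec in topic_centroids.items():
        sim = cosine(cluster_centroid, vec)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name
```

A cluster whose centroid matches no topic above the threshold takes the "Create new topic" branch; otherwise it merges into the returned topic.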

Runs on a background thread (default: every 6 hours). Episodes are grouped by hierarchical clustering, matched to existing knowledge topics by semantic similarity, then synthesized into structured records (facts, solutions, preferences) via LLM. Three consecutive failures trip a circuit breaker, so a failing LLM backend does not keep consuming timeouts.
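For readers unfamiliar with the circuit-breaker pattern, here is a minimal sketch that opens after three consecutive failures. The class and method names are hypothetical, and the library's own breaker may reset or recover differently.

```python
class CircuitBreaker:
    """Stop calling a failing function after N consecutive failures."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    @property
    def is_open(self):
        return self.consecutive_failures >= self.max_failures

    def call(self, fn, *args, **kwargs):
        if self.is_open:
            # Fail fast instead of waiting on another timeout.
            raise RuntimeError("circuit open: skipping call")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.consecutive_failures += 1
            raise
        self.consecutive_failures = 0  # any success resets the count
        return result

    def reset(self):
        """Close the circuit again, e.g. at the start of the next scheduled run."""
        self.consecutive_failures = 0
```

Once the breaker is open, further calls raise immediately without touching the backend, which is what keeps a dead LLM endpoint from stalling the whole run.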

Quick Start

pip install "consolidation-memory[fastembed]"
consolidation-memory init

FastEmbed runs locally — no external services needed.

Integrations

MCP Server

Add to your MCP client config (claude_desktop_config.json, .claude/settings.json, etc.):

{
  "mcpServers": {
    "consolidation_memory": {
      "command": "consolidation-memory"
    }
  }
}

Tool	Description
`memory_store`	Save an episode (fact, solution, preference, exchange)
`memory_store_batch`	Store multiple episodes in one call (single embed + FAISS batch)
`memory_recall`	Semantic search over episodes + knowledge, with optional filters
`memory_search`	Keyword/metadata search — works without embedding backend
`memory_status`	System stats, health diagnostics, and consolidation metrics
`memory_forget`	Soft-delete an episode by ID
`memory_export`	Export all episodes and knowledge to a JSON snapshot
`memory_correct`	Fix outdated knowledge documents with new information
`memory_compact`	Rebuild FAISS index, removing tombstoned vectors
`memory_consolidate`	Manually trigger a consolidation run
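The relationship between `memory_forget` (soft delete) and `memory_compact` (rebuild) can be illustrated with a toy index. This sketch uses a plain Python list in place of FAISS, and the class and its methods are hypothetical. It only shows why tombstoned vectors linger until a rebuild.

```python
class TombstoneIndex:
    """Toy vector index with soft deletes; a plain list stands in for FAISS."""

    def __init__(self):
        self.ids = []            # episode IDs, parallel to self.vectors
        self.vectors = []
        self.tombstones = set()  # forgotten IDs that are still stored

    def add(self, episode_id, vector):
        self.ids.append(episode_id)
        self.vectors.append(vector)

    def forget(self, episode_id):
        """Soft delete: mark the ID and leave the vector in place."""
        self.tombstones.add(episode_id)

    def search(self, query, k=5):
        """Return up to k live IDs ranked by dot product with the query."""
        scored = [
            (sum(q * v for q, v in zip(query, vec)), eid)
            for eid, vec in zip(self.ids, self.vectors)
            if eid not in self.tombstones
        ]
        scored.sort(reverse=True)
        return [eid for _, eid in scored[:k]]

    def compact(self):
        """Rebuild storage without tombstoned vectors; return how many were dropped."""
        live = [(e, v) for e, v in zip(self.ids, self.vectors)
                if e not in self.tombstones]
        dropped = len(self.ids) - len(live)
        self.ids = [e for e, _ in live]
        self.vectors = [v for _, v in live]
        self.tombstones.clear()
        return dropped
```

Forgetting is cheap because it only records an ID. The space is reclaimed later, in one pass, when the index is rebuilt.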

Python API

from consolidation_memory import MemoryClient

with MemoryClient() as mem:
    mem.store("User prefers dark mode", content_type="preference", tags=["ui"])

    result = mem.recall("user interface preferences")
    for ep in result.episodes:
        print(ep["content"], ep["similarity"])

    stats = mem.status()
    print(stats.health)  # e.g. {'status': 'healthy', 'issues': [], 'backend_reachable': True}

OpenAI Function Calling

Works with any OpenAI-compatible API (LM Studio, Ollama, OpenAI, Azure):

from consolidation_memory import MemoryClient
from consolidation_memory.schemas import openai_tools, dispatch_tool_call

mem = MemoryClient()
# Pass openai_tools to your chat completion, dispatch results with dispatch_tool_call()
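To make the dispatch pattern concrete without guessing at the library's signatures, the sketch below defines its own stand-ins. `EXAMPLE_TOOLS` and `example_dispatch` are hypothetical names, not the package's `openai_tools` or `dispatch_tool_call`, and the parameter schema is an assumption modelled on the `store()` call shown earlier. Only the outer tool format (a `function` entry with a JSON Schema, and arguments arriving as a JSON string) follows the OpenAI chat completions convention.

```python
import json

# Hypothetical tool schema in the OpenAI "tools" format.
EXAMPLE_TOOLS = [{
    "type": "function",
    "function": {
        "name": "memory_store",
        "description": "Save an episode to memory.",
        "parameters": {
            "type": "object",
            "properties": {
                "content": {"type": "string"},
                "content_type": {"type": "string"},
                "tags": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["content"],
        },
    },
}]


def example_dispatch(handlers, name, arguments_json):
    """Route one tool call to a Python handler and return a JSON string.

    `name` and `arguments_json` correspond to `tool_call.function.name`
    and `tool_call.function.arguments` in a chat completion response.
    """
    if name not in handlers:
        return json.dumps({"error": f"unknown tool: {name}"})
    kwargs = json.loads(arguments_json)
    return json.dumps(handlers[name](**kwargs))
```

The returned string is what you would send back to the model as the content of the matching `tool` message.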

REST API

pip install "consolidation-memory[rest]"
consolidation-memory serve --rest --port 8080

Method	Path	Description
`GET`	`/health`	Version + status
`POST`	`/memory/store`	Store episode
`POST`	`/memory/store/batch`	Store multiple episodes
`POST`	`/memory/recall`	Semantic search (with optional filters)
`POST`	`/memory/search`	Keyword/metadata search (no embedding needed)
`GET`	`/memory/status`	System statistics + consolidation metrics
`DELETE`	`/memory/episodes/{id}`	Forget episode
`POST`	`/memory/consolidate`	Trigger consolidation
`POST`	`/memory/correct`	Correct knowledge doc
`POST`	`/memory/export`	Export to JSON
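A client for the store endpoint might look like the sketch below, using only the standard library. The JSON field names (`content`, `content_type`, `tags`) are an assumption carried over from the Python API, since the request body is not documented here. The helper names are hypothetical.

```python
import json
import urllib.request


def build_store_request(base_url, content, content_type="fact", tags=None):
    """Build (but do not send) a POST request for /memory/store.

    The payload shape is assumed, not taken from the REST API's schema.
    """
    payload = {"content": content, "content_type": content_type, "tags": tags or []}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/memory/store",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server started as shown above, `send(build_store_request("http://localhost:8080", "User prefers dark mode", "preference", ["ui"]))` would issue the call.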

Backends

Embedding

Backend	Install	Model	Local
FastEmbed (default)	`pip install "consolidation-memory[fastembed]"`	bge-small-en-v1.5	Y
LM Studio	Built-in	nomic-embed-text-v1.5	Y
Ollama	Built-in	nomic-embed-text	Y
OpenAI	`pip install "consolidation-memory[openai]"`	text-embedding-3-small	N

LLM

Backend	Requirements
LM Studio (default)	LM Studio running with any chat model
Ollama	Ollama running with any chat model
OpenAI	API key
Disabled	None — no consolidation, pure vector search

Configuration

consolidation-memory init

Manual configuration

Platform	Path
Linux/macOS	`~/.config/consolidation_memory/config.toml`
Windows	`%APPDATA%\consolidation_memory\config.toml`
Override	`CONSOLIDATION_MEMORY_CONFIG` env var

[embedding]
backend = "fastembed"

[llm]
backend = "lmstudio"
api_base = "http://localhost:1234/v1"
model = "qwen2.5-7b-instruct"

[consolidation]
auto_run = true
interval_hours = 6
cluster_threshold = 0.72
prune_enabled = true
prune_after_days = 60
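If you want to inspect or validate the file from your own scripts, a sketch like the following resolves the path from the table above and reads the `[consolidation]` section. The helper names, the validation rules, and treating the sample values as defaults are assumptions for illustration. `tomllib` requires Python 3.11 or later.

```python
import os
import sys
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

# Values copied from the sample config; treated as defaults here for illustration.
SAMPLE_DEFAULTS = {
    "auto_run": True,
    "interval_hours": 6,
    "cluster_threshold": 0.72,
    "prune_enabled": True,
    "prune_after_days": 60,
}


def config_path(env=None, platform=None):
    """Resolve the config file location: env override first, then platform default."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    override = env.get("CONSOLIDATION_MEMORY_CONFIG")
    if override:
        return Path(override)
    if platform.startswith("win"):
        return Path(env["APPDATA"]) / "consolidation_memory" / "config.toml"
    return Path.home() / ".config" / "consolidation_memory" / "config.toml"


def consolidation_settings(toml_text):
    """Parse TOML text and return the [consolidation] table merged over the samples."""
    data = tomllib.loads(toml_text)
    settings = {**SAMPLE_DEFAULTS, **data.get("consolidation", {})}
    # Hypothetical sanity checks; the library may validate differently.
    if not 0.0 <= settings["cluster_threshold"] <= 1.0:
        raise ValueError("cluster_threshold must be between 0 and 1")
    if settings["interval_hours"] <= 0:
        raise ValueError("interval_hours must be positive")
    return settings
```

Usage: `consolidation_settings(config_path().read_text())` returns a plain dict you can print or assert against.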

CLI

Command	Description
`consolidation-memory serve`	Start MCP server (default)
`consolidation-memory serve --rest`	Start REST API
`consolidation-memory --project work serve`	Start MCP server for a specific project
`consolidation-memory init`	Interactive setup
`consolidation-memory status`	Show stats
`consolidation-memory consolidate`	Manual consolidation
`consolidation-memory export`	Export to JSON
`consolidation-memory import PATH`	Import from JSON
`consolidation-memory reindex`	Re-embed everything (after switching backends)

Multi-Project Support

Isolate memories per project — work memories stay in work, personal stays in personal.

# CLI flag
consolidation-memory --project work status
consolidation-memory --project personal serve --rest --port 8081

# Environment variable
CONSOLIDATION_MEMORY_PROJECT=work consolidation-memory serve

MCP (Claude Desktop) — Multiple Projects

Add separate server entries per project:

{
  "mcpServers": {
    "memory-work": {
      "command": "consolidation-memory",
      "env": { "CONSOLIDATION_MEMORY_PROJECT": "work" }
    },
    "memory-personal": {
      "command": "consolidation-memory",
      "env": { "CONSOLIDATION_MEMORY_PROJECT": "personal" }
    }
  }
}

Each project gets its own database, vector index, and knowledge files. The config file and the embedding/LLM backends are shared across projects. When no project is specified, the project named default is used. Existing single-project data is migrated automatically to projects/default/ on first run.

Data Storage

All data stays local.

Platform	Path
Linux	`~/.local/share/consolidation_memory/projects/<name>/`
macOS	`~/Library/Application Support/consolidation_memory/projects/<name>/`
Windows	`%LOCALAPPDATA%\consolidation_memory\projects\<name>\`
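The table above, combined with the project selection rules, can be expressed as a small path resolver. This is a sketch under assumptions: the function name is hypothetical, an explicit project is assumed to take precedence over the `CONSOLIDATION_MEMORY_PROJECT` variable, and any other overrides the library may honor are ignored.

```python
import os
import sys
from pathlib import PurePosixPath, PureWindowsPath


def project_data_dir(project=None, platform=None, env=None):
    """Return the per-project data directory for the given platform.

    Pure paths are used so the result can be computed for any platform.
    """
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    # Explicit argument first, then the environment variable, then "default".
    project = project or env.get("CONSOLIDATION_MEMORY_PROJECT") or "default"
    if platform.startswith("win"):
        base = PureWindowsPath(env["LOCALAPPDATA"]) / "consolidation_memory"
    elif platform == "darwin":
        base = (PurePosixPath(env["HOME"]) / "Library" / "Application Support"
                / "consolidation_memory")
    else:
        base = PurePosixPath(env["HOME"]) / ".local" / "share" / "consolidation_memory"
    return base / "projects" / project
```

Passing `platform` and `env` explicitly makes it easy to check where another machine's data would live, for example when preparing a backup script.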

Migrating

Point your config at an existing data directory:

[paths]
data_dir = "/path/to/your/existing/data"

If you switch to an embedding backend with a different vector dimension, re-embed all stored content:

consolidation-memory reindex

Development

git clone https://github.com/charliee1w/consolidation-memory
cd consolidation-memory
pip install -e ".[all,dev]"
pytest tests/ -v
ruff check src/ tests/

License

MIT

Release history

Version	Released
0.15.0	Mar 28, 2026
0.14.2	Mar 19, 2026
0.14.1	Mar 19, 2026
0.14.0	Mar 19, 2026
0.13.7	Mar 13, 2026
0.13.6	Mar 10, 2026
0.13.5	Mar 8, 2026
0.13.1	Mar 8, 2026
0.13.0	Mar 7, 2026
0.12.4	Mar 6, 2026
0.12.3	Mar 3, 2026
0.12.2	Mar 3, 2026
0.12.1	Mar 2, 2026
0.12.0	Mar 2, 2026
0.11.0	Mar 2, 2026
0.10.0	Mar 1, 2026
0.9.0	Mar 1, 2026
0.8.3	Mar 1, 2026
0.8.2	Mar 1, 2026
0.8.1	Mar 1, 2026
0.8.0	Mar 1, 2026
0.7.0	Feb 28, 2026
0.6.0 (this version)	Feb 28, 2026
0.5.0	Feb 28, 2026
0.4.0	Feb 28, 2026
0.3.0	Feb 28, 2026
0.2.0	Feb 25, 2026
0.1.0	Feb 24, 2026

Download files

Source Distribution

consolidation_memory-0.6.0.tar.gz (88.3 kB)

Uploaded Feb 28, 2026 Source

Built Distribution

consolidation_memory-0.6.0-py3-none-any.whl (76.4 kB)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file consolidation_memory-0.6.0.tar.gz.

File metadata

Download URL: consolidation_memory-0.6.0.tar.gz
Upload date: Feb 28, 2026
Size: 88.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for consolidation_memory-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`43426dfb9626eb9f9c45f7af8a19862f43b07fc54e955922c8632f8701bf7728`
MD5	`d31b89ea5f11dba9fde420feb857ddb1`
BLAKE2b-256	`85652c58ed28f1d37a027bd1a3aae48d865d50e6e96c1daea5b41b2448987d8b`

Provenance

The following attestation bundles were made for consolidation_memory-0.6.0.tar.gz:

Publisher: publish.yml on charliee1w/consolidation-memory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: consolidation_memory-0.6.0.tar.gz
- Subject digest: 43426dfb9626eb9f9c45f7af8a19862f43b07fc54e955922c8632f8701bf7728
- Sigstore transparency entry: 1005353970
- Sigstore integration time: Feb 28, 2026
Source repository:
- Permalink: charliee1w/consolidation-memory@24b9edd859e84ed5dfd734fc3892cf9c27cb7dbd
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/charliee1w
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@24b9edd859e84ed5dfd734fc3892cf9c27cb7dbd
- Trigger Event: push

File details

Details for the file consolidation_memory-0.6.0-py3-none-any.whl.

File metadata

Download URL: consolidation_memory-0.6.0-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 76.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for consolidation_memory-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`95bdf38efb9033afd03a3fcfd78d12c316040b95c57b3a894ccc59224710853b`
MD5	`09d2de719d3bf82b38710536c718fdaf`
BLAKE2b-256	`45294b75fc9e15bcafaedafd4042237fba1c9878e58c16f9c15c978eb387045e`

Provenance

The following attestation bundles were made for consolidation_memory-0.6.0-py3-none-any.whl:

Publisher: publish.yml on charliee1w/consolidation-memory

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: consolidation_memory-0.6.0-py3-none-any.whl
- Subject digest: 95bdf38efb9033afd03a3fcfd78d12c316040b95c57b3a894ccc59224710853b
- Sigstore transparency entry: 1005353974
- Sigstore integration time: Feb 28, 2026
Source repository:
- Permalink: charliee1w/consolidation-memory@24b9edd859e84ed5dfd734fc3892cf9c27cb7dbd
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/charliee1w
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@24b9edd859e84ed5dfd734fc3892cf9c27cb7dbd
- Trigger Event: push
