Memory stack for AI agents: BM25 + Qdrant + Memgraph + RRF retrieval, Gemini Flash inference layer, 8-stage recall pipeline.
mnemostack
Memory stack for AI agents — durable, structured, semantically searchable.
mnemostack is a hybrid memory system combining BM25, vector search (Qdrant), and knowledge graph (Memgraph) with a unified recall pipeline, reranker, and optional LLM inference layer.
Status: 🚧 alpha — API may change between 0.1.x releases.
Features
- 🧠 Hybrid retrieval — BM25 (exact tokens) + vector (semantic), fused via Reciprocal Rank Fusion
- 🔌 Pluggable embeddings — Gemini, Ollama, or HuggingFace (local GPU), via provider registry
- 🤖 Pluggable LLM — Gemini Flash / Ollama for answer generation and reranking
- 📚 Temporal knowledge graph — facts have valid_from/valid_until, query point-in-time state
- 💬 Answer mode — inference layer synthesizes concise factual answers with source citations and confidence
- 🔁 Reranker — LLM-based reordering of top results
- ⚙ Consolidation runtime — phase orchestrator for nightly memory lifecycle
- 🔌 MCP server — expose memory tools to Claude Desktop, ChatGPT, Cursor, etc.
- 🛡 Graceful degradation — retrieval keeps working if graph is down
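To make the fusion step in the first feature concrete, here is a minimal, standalone sketch of Reciprocal Rank Fusion. It does not use mnemostack itself: the function name `rrf_fuse`, the document IDs, and the constant `k=60` (a common default in the literature) are assumptions for illustration, and the library's actual parameters may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a keyword retriever and a vector retriever.
bm25_hits = ["doc-auth", "doc-login", "doc-billing"]
vector_hits = ["doc-sso", "doc-auth", "doc-login"]

fused = rrf_fuse([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that rank well in both lists (here `doc-auth` and `doc-login`) rise above documents that appear in only one, which is why RRF works without having to normalize BM25 scores against vector similarities.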
Installation
# From PyPI
pip install mnemostack
# Optional extras
pip install 'mnemostack[huggingface]' # local GPU embeddings
pip install 'mnemostack[mcp]' # MCP server
pip install 'mnemostack[dev]' # tests + linters
Run a local Qdrant for the vector store:
docker run -p 6333:6333 qdrant/qdrant:latest
Optionally a Memgraph for the knowledge graph:
docker run -p 7687:7687 memgraph/memgraph:latest
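Before indexing, it can help to confirm that both containers are accepting connections. The following is a small stdlib-only sketch, not part of mnemostack: the helper names `port_open` and `check_services` are hypothetical, and the ports are the ones published by the `docker run` commands above. It only checks that a TCP connection succeeds, not that the service is fully ready.

```python
import socket

# Ports published by the docker commands above.
SERVICES = {"qdrant": 6333, "memgraph": 7687}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(host="localhost"):
    """Map each service name to whether its port is reachable on host."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'unreachable'}")
```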
Quick start
CLI
# Health check
mnemostack health --provider ollama
# Index a directory of notes
mnemostack index ./my-notes/ --provider gemini --collection my-memory --recreate
# Hybrid recall
mnemostack search "what did we decide about auth" --provider gemini --collection my-memory
# Synthesize answer
mnemostack answer "what is the capital of France" --provider gemini --collection my-memory
# MCP server (for Claude Desktop, Cursor, etc.)
mnemostack mcp-serve --provider gemini --collection my-memory
Python API
from mnemostack.embeddings import get_provider
from mnemostack.vector import VectorStore
from mnemostack.recall import Recaller, AnswerGenerator
from mnemostack.llm import get_llm
emb = get_provider("gemini")
store = VectorStore(collection="my-memory", dimension=emb.dimension)
store.ensure_collection()
# ... index data here ...
recaller = Recaller(embedding_provider=emb, vector_store=store)
results = recaller.recall("what did we decide", limit=10)
# Optional: synthesize a concise answer
gen = AnswerGenerator(llm=get_llm("gemini"))
answer = gen.generate("what did we decide", results)
print(answer.text, answer.confidence, answer.sources)
Knowledge graph (optional)
from mnemostack.graph import GraphStore
graph = GraphStore(uri="bolt://localhost:7687")
graph.add_triple("alice", "works_on", "project-x", valid_from="2024-01-01")
graph.add_triple("alice", "works_on", "project-y", valid_from="2024-07-01")
# What was alice working on in March?
march_facts = graph.query_triples(subject="alice", as_of="2024-03-15")
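The point-in-time semantics can be illustrated without a running Memgraph. The sketch below is an in-memory stand-in, not the GraphStore implementation: the `Triple` dataclass, the `facts_as_of` helper, and the half-open interval convention (valid_from inclusive, valid_until exclusive, `None` meaning still valid) are assumptions for illustration. Whether adding a newer triple automatically closes an older one is not stated here, so this sketch leaves older facts open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_until: Optional[date] = None  # None means the fact is still valid


def facts_as_of(triples, subject, as_of):
    """Return the triples about subject that were valid on the date as_of."""
    return [
        t
        for t in triples
        if t.subject == subject
        and t.valid_from <= as_of
        and (t.valid_until is None or as_of < t.valid_until)
    ]


triples = [
    Triple("alice", "works_on", "project-x", date(2024, 1, 1)),
    Triple("alice", "works_on", "project-y", date(2024, 7, 1)),
]

march = facts_as_of(triples, "alice", date(2024, 3, 15))
august = facts_as_of(triples, "alice", date(2024, 8, 1))
print([t.obj for t in march])
print([t.obj for t in august])
```

In March only `project-x` is returned, because `project-y` had not started yet. In August both are returned, since neither fact has a valid_until in this sketch.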
MCP server for Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "mnemostack": {
      "command": "mnemostack",
      "args": ["mcp-serve", "--provider", "gemini", "--collection", "my-memory"],
      "env": {
        "GEMINI_API_KEY": "your-key-here"
      }
    }
  }
}
Claude will then be able to call mnemostack_search, mnemostack_answer, and graph tools.
Custom embedding provider
from mnemostack.embeddings import EmbeddingProvider, register_provider
class MyProvider(EmbeddingProvider):
    @property
    def name(self): return "my-provider"

    @property
    def dimension(self): return 512

    def embed(self, text): ...
    def embed_batch(self, texts): ...
register_provider("my-provider", MyProvider)
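For a sense of what the four members might look like when filled in, here is a toy provider that needs no model or network. It is a standalone sketch: it does not subclass `EmbeddingProvider` so that it runs without mnemostack installed, and the class name `HashingProvider`, the return type (a list of floats), and the normalization step are assumptions rather than requirements stated by the library. Hashed bag-of-words vectors carry no real semantics, so this is only useful for wiring tests.

```python
import hashlib
import math


class HashingProvider:
    """Toy embedder: hashes tokens into a fixed-size, L2-normalized vector."""

    def __init__(self, dimension=64):
        self._dimension = dimension

    @property
    def name(self):
        return "hashing-demo"

    @property
    def dimension(self):
        return self._dimension

    def embed(self, text):
        vec = [0.0] * self._dimension
        for token in text.lower().split():
            # A stable hash keeps embeddings identical across processes.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self._dimension
            vec[index] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # Empty input yields the zero vector instead of dividing by zero.
        return [v / norm for v in vec] if norm else vec

    def embed_batch(self, texts):
        return [self.embed(text) for text in texts]


provider = HashingProvider()
vector = provider.embed("what did we decide about auth")
print(provider.name, provider.dimension, len(vector))
```

The important contract is that `dimension` matches the length of every vector returned by `embed`, since the Quick start passes `emb.dimension` to `VectorStore` when the collection is created.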
Design
See ARCHITECTURE.md for detailed design: pipeline stages, Qdrant schema, Memgraph temporal model, consolidation runtime, MCP tools.
Roadmap
- Embedding provider registry (Gemini / Ollama / HuggingFace)
- LLM provider registry (Gemini Flash / Ollama)
- Qdrant wrapper
- BM25 + RRF recall pipeline
- Answer mode with confidence + citations
- LLM-based reranker
- Memgraph wrapper with temporal validity
- Consolidation runtime (phase orchestrator)
- CLI (mnemostack health/search/answer/index/mcp-serve)
- MCP server (Model Context Protocol)
- Text → graph triple extractor helpers
- Config file support (YAML/JSON)
- Async variants for high-throughput servers
- Docker compose examples
License
Apache 2.0 — see LICENSE.
Contributing
Early days. Issues and PRs are welcome once the API stabilizes.
File details
Details for the file mnemostack-0.1.0a5.tar.gz.
File metadata
- Download URL: mnemostack-0.1.0a5.tar.gz
- Upload date:
- Size: 55.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 143e66febe19884364215c9d3640d6bb37c8dd20dbb903ad6a5960d2337a3ecd |
| MD5 | a9d65c14b8484fbf0ceb572daa9f71e7 |
| BLAKE2b-256 | dae79bda13147e16ef222ea67a14a6f48482fa8b3f68e3b38e7ff4fd2bf31b8b |
File details
Details for the file mnemostack-0.1.0a5-py3-none-any.whl.
File metadata
- Download URL: mnemostack-0.1.0a5-py3-none-any.whl
- Upload date:
- Size: 55.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0be328b37ad9a2eaa7b6f5e70ef478706eea653eb97f4bf842d853041871d6fc |
| MD5 | 017495a0c8f1e009f9a3597d17c32ae1 |
| BLAKE2b-256 | ef00a5f6c229d8f727a5ba1fa11c38d510621253ea59c585b64f5fbc6885af15 |