

mnemostack

Memory stack for AI agents — durable, structured, semantically searchable.

mnemostack is a hybrid memory system combining BM25, vector search (Qdrant), and a knowledge graph (Memgraph) behind a unified 8-stage recall pipeline, with a reranker and an optional LLM inference layer (Gemini Flash or Ollama).

Status: 🚧 alpha — API may change between 0.1.x releases.

Benchmarks

Full LoCoMo run (official SNAP-Research dataset, 10 samples / 1986 QA pairs, clean state, judged by Gemini Flash):

Metric                         mnemostack 0.1.0a11
Correct (strict)               66.4% (1319 / 1986)
Partial                        12.8% (254)
Wrong                          20.8% (413)
Combined (correct + partial)   79.2%

By question category:

Category                          Correct
cat_5  adversarial open-domain    90.1%
cat_4  multi-hop reasoning        69.2%
cat_2  temporal                   64.5%
cat_1  single-hop lists           34.8%
cat_3  open-domain reasoning      31.2%

Honest numbers disclaimer. The table above reports our full-benchmark result across all 1986 questions and all 5 categories. Some vendors report only their strongest sub-category; if we did the same, we could honestly claim 90.1% on adversarial open-domain or 69.2% on multi-hop reasoning. We publish the full aggregate because that is what actually predicts how the system behaves on mixed workloads.

How that compares with reported numbers from other systems on the same benchmark (caveat: different judges, evaluation protocols, and in some cases category cherry-picking):

System                           LoCoMo correct
Hindsight (leader)               78–85%
Memobase (temporal subset)       85%
Letta filesystem agent           74%
Mem0 graph variant               ~68.5%
mnemostack 0.1.0a11              66.4%
Zep (independently replicated)   58.4%

Reproduce from a clone (the runner only needs a GEMINI_API_KEY):

python benchmarks/locomo_single.py --samples 10

Features

  • 🧠 Hybrid retrieval — BM25 (exact tokens) + vector (semantic), fused via Reciprocal Rank Fusion (see the sketch after this list)
  • 🔌 Pluggable embeddings — Gemini, Ollama, or HuggingFace (local GPU), via provider registry
  • 🤖 Pluggable LLM — Gemini Flash / Ollama for answer generation and reranking
  • 📚 Temporal knowledge graph — facts have valid_from/valid_until, query point-in-time state
  • 💬 Answer mode — inference layer synthesizes concise factual answers with source citations and confidence
  • 🔁 Reranker — LLM-based reordering of top results
  • 🌙 Consolidation runtime — phase orchestrator for the nightly memory lifecycle
  • 🔌 MCP server — expose memory tools to Claude Desktop, ChatGPT, Cursor, etc.
  • 🛡 Graceful degradation — retrieval keeps working if graph is down
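
For intuition, here is Reciprocal Rank Fusion in isolation: each document scores the sum of 1/(k + rank) over the rankings it appears in, so results ranked well by both BM25 and the vector search float to the top. A minimal standalone sketch (illustrative, not mnemostack's internal implementation; k=60 is the conventional constant):

def rrf_fuse(rankings, k=60):
    """Fuse best-first ranked lists: score(d) = sum of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]     # exact-token ranking
vector_hits = ["doc1", "doc9", "doc3"]   # semantic ranking
print(rrf_fuse([bm25_hits, vector_hits]))  # doc1 and doc3, present in both, lead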

Installation

# From PyPI
pip install mnemostack

# Optional extras
pip install 'mnemostack[huggingface]'  # local GPU embeddings
pip install 'mnemostack[mcp]'          # MCP server
pip install 'mnemostack[dev]'          # tests + linters

Run a local Qdrant for the vector store:

docker run -p 6333:6333 qdrant/qdrant:latest

Optionally, run a local Memgraph for the knowledge graph:

docker run -p 7687:7687 memgraph/memgraph:latest

Quick start

CLI

# Health check
mnemostack health --provider ollama

# Index a directory of notes
mnemostack index ./my-notes/ --provider gemini --collection my-memory --recreate

# Hybrid recall
mnemostack search "what did we decide about auth" --provider gemini --collection my-memory

# Synthesize answer
mnemostack answer "what is the capital of France" --provider gemini --collection my-memory

# MCP server (for Claude Desktop, Cursor, etc.)
mnemostack mcp-serve --provider gemini --collection my-memory
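
Commands that use --provider gemini need a GEMINI_API_KEY in the environment, the same variable the benchmark runner and the MCP config below rely on:

export GEMINI_API_KEY=your-key-here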

Python API

from mnemostack.embeddings import get_provider
from mnemostack.vector import VectorStore
from mnemostack.recall import Recaller, AnswerGenerator
from mnemostack.llm import get_llm

emb = get_provider("gemini")
store = VectorStore(collection="my-memory", dimension=emb.dimension)
store.ensure_collection()

# ... index data here ...

recaller = Recaller(embedding_provider=emb, vector_store=store)
results = recaller.recall("what did we decide", limit=10)

# Optional: synthesize a concise answer
gen = AnswerGenerator(llm=get_llm("gemini"))
answer = gen.generate("what did we decide", results)
print(answer.text, answer.confidence, answer.sources)
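
To stay fully local, the same flow should work with the Ollama providers, assuming the Python registries accept the same names as the CLI --provider flag:

emb = get_provider("ollama")                   # local embeddings
gen = AnswerGenerator(llm=get_llm("ollama"))   # local answer synthesis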

Knowledge graph (optional)

from mnemostack.graph import GraphStore

graph = GraphStore(uri="bolt://localhost:7687")
graph.add_triple("alice", "works_on", "project-x", valid_from="2024-01-01")
graph.add_triple("alice", "works_on", "project-y", valid_from="2024-07-01")

# What was alice working on in March?
march_facts = graph.query_triples(subject="alice", as_of="2024-03-15")
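
The point-in-time semantics behind as_of are the usual interval check: a fact is visible at time t when valid_from <= t and valid_until is either unset or later than t. An illustrative standalone version of that predicate (not mnemostack's internal code; the explicit valid_until below is assumed for the example, and whether mnemostack auto-closes the earlier interval when a superseding fact arrives is not shown here):

from datetime import date

def valid_at(fact, as_of):
    """True if as_of falls inside the fact's [valid_from, valid_until) window."""
    start = date.fromisoformat(fact["valid_from"])
    end = fact.get("valid_until")
    return start <= as_of and (end is None or as_of < date.fromisoformat(end))

fact = {"subject": "alice", "predicate": "works_on", "object": "project-x",
        "valid_from": "2024-01-01", "valid_until": "2024-07-01"}
print(valid_at(fact, date(2024, 3, 15)))  # True: project-x was current in March
print(valid_at(fact, date(2024, 8, 1)))   # False: outside the validity window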

MCP server for Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS path):

{
  "mcpServers": {
    "mnemostack": {
      "command": "mnemostack",
      "args": ["mcp-serve", "--provider", "gemini", "--collection", "my-memory"],
      "env": {
        "GEMINI_API_KEY": "your-key-here"
      }
    }
  }
}

Claude will then be able to call mnemostack_search, mnemostack_answer, and graph tools.

Custom embedding provider

from mnemostack.embeddings import EmbeddingProvider, register_provider

class MyProvider(EmbeddingProvider):
    @property
    def name(self):
        return "my-provider"

    @property
    def dimension(self):
        return 512

    def embed(self, text):
        ...

    def embed_batch(self, texts):
        ...

register_provider("my-provider", MyProvider)
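
Once registered, the provider resolves by name through the same registry used in the Quick start, reusing get_provider and VectorStore from above:

emb = get_provider("my-provider")
store = VectorStore(collection="my-memory", dimension=emb.dimension)  # 512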

Design

See ARCHITECTURE.md for detailed design: pipeline stages, Qdrant schema, Memgraph temporal model, consolidation runtime, MCP tools.

Roadmap

  • Embedding provider registry (Gemini / Ollama / HuggingFace)
  • LLM provider registry (Gemini Flash / Ollama)
  • Qdrant wrapper
  • BM25 + RRF recall pipeline
  • Answer mode with confidence + citations
  • LLM-based reranker
  • Memgraph wrapper with temporal validity
  • Consolidation runtime (phase orchestrator)
  • CLI (mnemostack health/search/answer/index/mcp-serve)
  • MCP server (Model Context Protocol)
  • Text → graph triple extractor helpers
  • Config file support (YAML/JSON)
  • Async variants for high-throughput servers
  • Docker compose examples

License

Apache 2.0 — see LICENSE.

Contributing

Early days. Issues and PRs welcome once the API stabilizes.
