Universal long-term memory layer for AI agents via MCP
mnemon
mnemon gives AI agents persistent, searchable memory that survives across sessions. It uses hybrid BM25 + vector search, automatic confidence decay, and contradiction detection via the Model Context Protocol. Deploy as a remote server on Fly.io for a unified vault across all your MCP clients (Claude Code, Claude Desktop, Cursor, claude.ai), or run locally for development.
Table of Contents
- Install
- Quick Start
- MCP Tools
- Memory Types
- Claude Code Hooks
- Remote Server
- S3 Vault Sync
- Architecture
- Configuration
- Known limitations
- Development
Install
pip install mnemon-memory
With optional LLM support (local 1.7B model for query expansion, contradiction detection, and smarter session extraction):
pip install "mnemon-memory[llm]"
From source:
git clone https://github.com/cipher813/mnemon.git
cd mnemon
pip install -e ".[dev]"
Quick Start
The recommended setup is a remote vault (one vault, all clients) served at https://<your-app>.fly.dev/mcp. Choose one of two paths:
- Self-host (~10 min, ~$1/mo): see Self-host on Fly.io below for the end-to-end runbook.
- Local-only mode: no remote server needed, useful for development.
1. Configure your client
# Claude Code with remote vault
mnemon setup claude-code --remote-url https://your-app.fly.dev/mcp
# Cursor with remote vault
mnemon setup cursor --remote-url https://your-app.fly.dev/mcp
# Local-only mode (development, no remote server needed)
mnemon setup claude-code
mnemon setup cursor
Verify with mnemon doctor — it runs 6 end-to-end checks against your configured remote (skip for local-only mode).
2. Use it
Once configured, mnemon works automatically:
- Context surfacing: relevant memories are injected before each prompt
- Session extraction: decisions, preferences, and observations are saved at session end
- Handoff generation: session summaries maintain continuity across sessions
You can also interact with memories directly via MCP tools or CLI:
mnemon search "deployment architecture"
mnemon save "DB migration plan" "Migrate from PostgreSQL to DynamoDB in Q3"
mnemon forget 42
mnemon status
MCP Tools
Retrieval
| Tool | Description |
|---|---|
| memory_search | Hybrid BM25 + vector search with composite scoring (relevance + recency + confidence) |
| memory_get | Fetch a specific memory by ID with full content |
| memory_timeline | Recent memories in reverse chronological order |
| memory_related | Find memories related to a given memory via the relationship graph |
Mutation
| Tool | Description |
|---|---|
| memory_save | Store a new memory with content type classification and auto-embedding |
| memory_pin | Pin a memory to boost confidence and prevent archival |
| memory_forget | Soft-delete a memory (marked as invalidated, not physically removed) |
Lifecycle
| Tool | Description |
|---|---|
| memory_status | Vault health stats: counts by type, vectors, pinned/invalidated |
| memory_sweep | Archive stale memories past their half-life (dry-run by default) |
| memory_rebuild | Re-embed all documents (use after upgrading embedding model) |
Intelligence
| Tool | Description |
|---|---|
| memory_check_contradictions | Check a memory for conflicts using vector similarity + LLM classification |
| profile_get | Synthesized user profile from stored preferences and decisions |
| profile_update | Manually add a fact to the user profile |
Memory Types
Each memory has a content type that determines its default confidence and decay half-life:
| Type | Default Confidence | Half-Life | Use for |
|---|---|---|---|
| decision | 0.85 | Never | Architectural choices, design decisions |
| preference | 0.80 | Never | User workflow habits, style preferences |
| antipattern | 0.80 | Never | Things that failed, approaches to avoid |
| observation | 0.70 | 90 days | Learned facts, discovered behaviors |
| research | 0.70 | 90 days | Investigation results, findings |
| project | 0.65 | 120 days | Project status, goals, context |
| handoff | 0.60 | 30 days | Session summaries for continuity |
| note | 0.50 | 60 days | General notes, default type |
Memories with access activity decay slower — each access extends the effective half-life by 10%, up to 3x the base value.
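The decay rule can be sketched as follows. This is a minimal illustration, not mnemon's actual code: the source states only the 10% extension per access and the 3x cap, so the exponential decay curve, the additive reading of "10% per access", and the function names are all assumptions.

```python
from typing import Optional


def effective_half_life(base_days: float, access_count: int) -> float:
    """Extend the base half-life by 10% per access, capped at 3x the base.

    Assumption: the 10% is additive on the base value, not compounding.
    """
    multiplier = min(1.0 + 0.10 * access_count, 3.0)
    return base_days * multiplier


def decayed_confidence(
    confidence: float,
    age_days: float,
    base_half_life_days: Optional[float],
    access_count: int = 0,
) -> float:
    """Apply half-life decay to a confidence value.

    Assumption: decay is exponential, so confidence halves every half-life.
    A half-life of None models the "Never" rows in the table above.
    """
    if base_half_life_days is None:
        return confidence
    half_life = effective_half_life(base_half_life_days, access_count)
    return confidence * 0.5 ** (age_days / half_life)
```

Under these assumptions, an observation (0.70, 90 days) that is never accessed halves to 0.35 after 90 days, while the same memory accessed five times has an effective half-life of 135 days.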
Claude Code Hooks
When configured via mnemon setup claude-code, three hooks are installed:
| Hook | Event | Timeout | Description |
|---|---|---|---|
| Context surfacing | UserPromptSubmit | 8s | Searches vault and injects relevant memories as context |
| Session extractor | Stop | 30s | Extracts decisions, preferences, and observations from the transcript |
| Handoff generator | Stop | 30s | Creates a session summary for the next session |
The extractor and handoff generator use LLM-based extraction when mnemon-memory[llm] is installed, and fall back to regex/heuristic extraction otherwise.
Remote Server
Deploy mnemon as a remote Streamable HTTP server for a single vault shared across all MCP clients. This is the recommended production setup — Claude Code hooks, Claude Desktop, Cursor, and claude.ai all read and write the same memories.
Run locally (development)
MNEMON_LOCAL_TOKEN=your-secret-token mnemon serve-remote
PORT=9000 mnemon serve-remote # custom port
Self-host on Fly.io
End-to-end deploy. You'll get an OAuth-protected MCP endpoint at https://<your-app>.fly.dev/mcp with no third-party auth vendor. Takes ~10 minutes the first time.
Prerequisites. A Fly.io account, flyctl on your $PATH, and this repo cloned locally. Budget ~$0.50–$2/mo for a personal vault (auto-stop idle, 1GB volume).
1. Pick an app name and copy the template.
cp fly.toml.example fly.toml
# Edit fly.toml: replace REPLACE_ME_fly_app_name (3 occurrences) with your chosen app name.
# Pick something globally unique on Fly — e.g. "my-mnemon-vault".
The real fly.toml is gitignored — it holds your specific app identity. fly.toml.example stays in the repo as the template.
2. Create the app and the persistent volume.
fly launch --copy-config --no-deploy # creates the app from your edited fly.toml; no deploy yet
fly volume create mnemon_data --size 1 --region sjc # 1GB is enough for thousands of memories; use the same region as primary_region
Without the volume step, every restart wipes your vault — the [mounts] block in fly.toml expects mnemon_data to exist.
3. Generate and set secrets.
# Generate two independent high-entropy secrets. Do not reuse credentials.
python -c "import secrets; print('MNEMON_LOCAL_TOKEN =', secrets.token_urlsafe(32))"
python -c "import secrets; print('MNEMON_AS_PASSPHRASE =', secrets.token_urlsafe(32))"
# Store both in your password manager, then:
fly secrets set MNEMON_LOCAL_TOKEN=<value-1> \
MNEMON_AS_ENABLED=true \
MNEMON_AS_PASSPHRASE=<value-2>
MNEMON_AS_PASSPHRASE is the single-user login for browser clients (claude.ai, Claude Desktop). There is no complexity enforcement in code — use a high-entropy value. MNEMON_LOCAL_TOKEN is the static bearer for headless clients (Claude Code hooks, Cursor).
4. Deploy.
fly deploy
First deploy pulls the FastEmbed model (~15–25s on first memory_search). Subsequent deploys reuse the cached layer.
5. Verify.
# Write the remote URL + bearer token to your local client config.
echo "https://<your-app>.fly.dev/mcp" > ~/.mnemon/remote_url
echo "<value-1 from step 3>" > ~/.mnemon/local_token
chmod 600 ~/.mnemon/local_token
mnemon doctor
mnemon doctor runs 6 checks: remote URL configured, local token configured + 0600 perms, /health reachable, authenticated MCP tool call round-trips, and save + search + forget cycle. All 6 should pass green. If any fail, the error message points at the specific misconfiguration.
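The token-file check can be approximated by hand if you want to confirm permissions before running the doctor. The sketch below is illustrative only: `token_file_ok` is a hypothetical helper, not part of mnemon, and the real check may apply different rules.

```python
import stat
from pathlib import Path


def token_file_ok(path: Path) -> bool:
    """Return True if the file exists, has mode exactly 0600, and is non-empty.

    Hypothetical helper; mnemon doctor's own logic is not shown in the source.
    """
    if not path.is_file():
        return False
    # S_IMODE strips the file-type bits and leaves only the permission bits.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode != 0o600:
        return False
    return path.read_text().strip() != ""


if __name__ == "__main__":
    print(token_file_ok(Path.home() / ".mnemon" / "local_token"))
```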
6. Connect clients.
# Claude Code hooks (uses MNEMON_LOCAL_TOKEN)
mnemon setup claude-code --remote-url https://<your-app>.fly.dev/mcp
# Cursor (uses MNEMON_LOCAL_TOKEN)
mnemon setup cursor --remote-url https://<your-app>.fly.dev/mcp
For claude.ai (web/mobile) and Claude Desktop — no CLI needed, these use the OAuth browser flow:
- In the client, go to Settings → Connectors → Add custom connector.
- Paste https://<your-app>.fly.dev/mcp as the connector URL.
- Click Connect. Browser redirects to your server's login page.
- Enter MNEMON_AS_PASSPHRASE from step 3 above.
- You're in. The client now sees memory_search, memory_save, etc. alongside its built-in tools.
Browser clients self-register via Dynamic Client Registration (RFC 7591) — no manual client-id provisioning. Authentication uses PKCE + RS256 JWTs signed by the AS's own keypair (auto-generated on first boot, stored in the Fly volume at /data/oauth_keys/).
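For readers unfamiliar with PKCE, the client side of the exchange looks roughly like this. It is a sketch of the S256 method from RFC 7636, not mnemon's server code; the function names are illustrative, and MCP clients normally do this for you.

```python
import base64
import hashlib
import secrets
from typing import Tuple


def s256_challenge(verifier: str) -> str:
    """Derive the code_challenge: BASE64URL(SHA256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> Tuple[str, str]:
    """Generate a random code_verifier and its matching S256 code_challenge."""
    # 32 random bytes encode to 43 URL-safe characters, the RFC 7636 minimum.
    verifier = secrets.token_urlsafe(32)
    return verifier, s256_challenge(verifier)
```

The client sends the challenge with the authorization request and the verifier with the token request; the server recomputes the hash and rejects the exchange if they do not match.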
Troubleshooting
If mnemon doctor fails, check the specific failing line:
- Health endpoint unreachable: app may be booting (cold start takes 15–25s for FastEmbed); retry after a moment. If persistent, check fly logs -a <your-app> and fly status.
- Auth + MCP tool call returns 401: MNEMON_LOCAL_TOKEN on your machine doesn't match the Fly secret. Re-copy from your password manager into ~/.mnemon/local_token.
- Round-trip fails: MNEMON_ALLOWED_HOSTS in fly.toml doesn't include the hostname you're connecting through. It should match the host portion of MNEMON_PUBLIC_URL.
S3 Vault Sync
Sync your vault across machines via S3:
# Push local vault to S3
MNEMON_S3_BUCKET=my-bucket mnemon sync push
# Pull vault from S3
MNEMON_S3_BUCKET=my-bucket mnemon sync pull
| Env var | Default | Description |
|---|---|---|
| MNEMON_S3_BUCKET | (required) | S3 bucket name |
| MNEMON_S3_PREFIX | mnemon/vaults | S3 key prefix |
| MNEMON_VAULT_NAME | default | Vault name |
Requires the AWS CLI (aws) on your PATH with valid credentials.
Architecture
Remote (production): All clients hit a single Fly-hosted vault via Streamable HTTP. Claude Code hooks use a static bearer token (MNEMON_LOCAL_TOKEN). Browser clients (claude.ai, Claude Desktop) use OAuth.
Local (development): SQLite vault at ~/.mnemon/default.sqlite with a companion vector store. Useful for testing and offline work.
~/.mnemon/
remote_url Remote server URL (written by mnemon setup --remote-url)
local_token Bearer token for remote auth (chmod 600)
default.sqlite Local SQLite vault (FTS5 + WAL mode, development only)
default.vec.npz Companion vector store (numpy, brute-force cosine)
models/ Local LLM weights (session extraction, query expansion)
- Storage: SQLite with FTS5 full-text search, content-addressable deduplication (SHA-256)
- Search: Hybrid BM25 + vector (384d, bge-small-en-v1.5 via FastEmbed) fused with Reciprocal Rank Fusion
- Scoring: Composite score = 0.5 * relevance + 0.25 * recency + 0.25 * confidence
- Diversity: MMR filtering (Jaccard bigram similarity > 0.6 demoted by 50%)
- Intelligence (optional): Local 1.7B LLM (QMD-query-expansion) for query expansion, contradiction detection, session extraction — zero API cost
- Transport: MCP stdio (local) and Streamable HTTP (remote)
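The scoring and diversity bullets above can be sketched together. The weights, the 0.6 threshold, and the 50% demotion come from the list; everything else is an assumption made for illustration, including the use of word bigrams, the comparison against higher-ranked results only, and all function names.

```python
from typing import List, Set, Tuple


def composite_score(relevance: float, recency: float, confidence: float) -> float:
    """Weighted sum from the Scoring bullet; inputs are assumed to lie in [0, 1]."""
    return 0.5 * relevance + 0.25 * recency + 0.25 * confidence


def bigrams(text: str) -> Set[Tuple[str, str]]:
    """Word bigrams of lowercased text (assumption: word-level, not character-level)."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def demote_near_duplicates(
    results: List[Tuple[str, float]], threshold: float = 0.6, penalty: float = 0.5
) -> List[Tuple[str, float]]:
    """Halve the score of any result too similar to a higher-ranked one, then re-sort."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    seen: List[Set[Tuple[str, str]]] = []
    out: List[Tuple[str, float]] = []
    for text, score in ranked:
        grams = bigrams(text)
        if any(jaccard(grams, prev) > threshold for prev in seen):
            score *= penalty
        seen.append(grams)
        out.append((text, score))
    return sorted(out, key=lambda r: r[1], reverse=True)
```

In this sketch a near-duplicate stays in the result list but drops below distinct memories with lower raw scores, which matches the "demoted" wording rather than outright removal.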
Design decisions
A small set of architectural choices shape the rest of the system. Documented here so self-host users know what they're signing up for and reviewers can evaluate the trade-offs.
Why SQLite + FTS5 (not Postgres, not a vector DB). A single-file embedded database means no operational surface area — no connection pools, no migrations against a live DB, no standalone vector store to keep in sync. FTS5 gives production-grade BM25 without a separate Elasticsearch. A numpy-backed vector store sits alongside the SQLite file; brute-force cosine over a few thousand memories is faster than any network hop to a hosted vector DB. The single-file design also makes vault portability trivial — copy one file and you've moved your entire memory.
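To show why brute force is adequate at this scale, here is a minimal cosine top-k search. It uses only the standard library so it runs anywhere; the source describes a numpy-backed store, so treat this as an illustration of the technique rather than mnemon's code. The names `cosine` and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Dict[int, Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Score every stored vector against the query and return the k best (id, score) pairs."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The cost is linear in the number of memories, which is why it suits a few thousand 384-dimensional vectors and would not suit millions.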
Why hybrid BM25 + vector (not pure semantic). Pure vector search misses exact-identifier lookups; pure keyword misses paraphrase. Reciprocal Rank Fusion combines both rankings, then composite scoring folds in recency and confidence. In practice this catches both "find my note about bge-small-en-v1.5" (keyword wins) and "memory about embedding models" (vector wins) without tuning.
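Reciprocal Rank Fusion itself is short enough to show in full. The sketch below uses the conventional constant k = 60; the value mnemon actually uses is not stated in the source, so that constant and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    bm25_ranking = ["a", "b", "c"]    # keyword results, best first
    vector_ranking = ["b", "d", "a"]  # semantic results, best first
    print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```

Because only ranks are used, the BM25 and cosine scores never need to be normalized onto a common scale before they are combined.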
Why Fly.io (not AWS / GCP). mnemon is designed to idle cheaply and wake on demand. Fly's auto_stop_machines + min_machines_running=0 costs ~$0.50–0.90/mo for a personal vault; the closest AWS equivalent (ECS Fargate or App Runner) can't scale to zero and starts at ~$10/mo. Fly volumes are local-attached SSD, which matches SQLite's access pattern — AWS's equivalent (EFS) is slower and pricier. Deploy is one fly.toml and one command, vs. the VPC + ALB + ECS + IAM setup AWS requires — which matters for any future self-host user.
Why self-hosted OAuth 2.1 + PKCE + DCR (not Auth0 / Clerk / Logto). Requiring users to register an Auth0 tenant before they can try mnemon is a near-guaranteed bounce. mnemon ships with its own Authorization Server (well-known endpoints, /oauth/authorize, /oauth/token with PKCE, /oauth/register per RFC 7591, JWT issuance) — anyone can fly deploy and have a working OAuth-protected MCP endpoint with no third-party signup. The trade-off is less battle-tested auth code; the mitigation is that browser clients are the only OAuth consumers, and headless clients (Claude Code, Cursor) use a simple static bearer.
Why MCP + a separate memory server (not Claude's native memory). Claude's native memory is account-scoped and only reaches Anthropic products (claude.ai web/mobile/desktop). It doesn't reach Claude Code, Cursor, or any other MCP-speaking client. mnemon serves the cross-client case: a single vault that Claude Code hooks, Cursor, and claude.ai can all read and write. It's also self-hosted, exportable, and programmatically introspectable — the opposite of Anthropic's closed-box model. These systems are complementary, not competing.
Configuration
Client-side (hooks, CLI)
| Env var | Default | Description |
|---|---|---|
| MNEMON_REMOTE_URL | (none) | Remote server URL (or ~/.mnemon/remote_url file) |
| MNEMON_LOCAL_TOKEN | (none) | Bearer token for remote auth (or ~/.mnemon/local_token file) |
| MNEMON_VAULT_DIR | ~/.mnemon | Local vault directory |
| MNEMON_MODEL_DIR | ~/.mnemon/models | Directory for LLM model files |
Server-side (mnemon serve-remote)
| Env var | Default | Description |
|---|---|---|
| MNEMON_AS_ENABLED | false | Enable the self-hosted OAuth Authorization Server |
| MNEMON_AS_PASSPHRASE | (none) | Single-user login passphrase (required when AS enabled) |
| MNEMON_AS_KEY_DIR | $MNEMON_VAULT_DIR/oauth_keys | RSA keypair storage directory |
| MNEMON_PUBLIC_URL | (none) | Externally-reachable base URL (required when AS enabled) |
| MNEMON_LOCAL_TOKEN | (none) | Static bearer for headless clients (hooks, Cursor) |
| MNEMON_ALLOWED_HOSTS | (none) | Comma-separated host allowlist for DNS-rebinding protection |
| PORT | 8502 | Remote server port |
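A host allowlist of this kind usually works by comparing the request's Host header against the configured names. The sketch below shows the general idea only: the parsing rules (case-insensitive match, port stripped before comparison) and both function names are assumptions, not mnemon's documented behavior.

```python
from typing import Set


def parse_allowed_hosts(raw: str) -> Set[str]:
    """Split a comma-separated value such as MNEMON_ALLOWED_HOSTS into a normalized set."""
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def host_allowed(host_header: str, allowed: Set[str]) -> bool:
    """Check a Host header against the allowlist, ignoring any trailing port.

    Assumption: the port is not part of the comparison.
    """
    host = host_header.strip().lower()
    # Strip ":port" unless the value is a bare bracketed IPv6 literal like "[::1]".
    if ":" in host and not host.endswith("]"):
        host = host.rsplit(":", 1)[0]
    return host in allowed
```

A request that reaches the server with an unexpected Host value, as happens in a DNS-rebinding attack, would be rejected before any tool call runs.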
Known limitations
Client-side behaviors that affect mnemon users but are not bugs in mnemon itself. Upstream tracking linked where applicable.
Claude Code: MCP session invalidated after server restart. When the remote mnemon server restarts (via fly deploy, fly secrets set, or Fly auto-stop/auto-start), Claude Code's cached MCP session ID becomes stale. Subsequent tool calls from within an active Claude Code session return Session not found, and the client does not auto-reinitialize. Workaround: quit and re-launch Claude Code. Hooks are unaffected — they use the static bearer path and bypass the MCP session layer. Upstream: anthropics/claude-code#46533.
Claude Code: /mcp authenticate CLI hang after browser OAuth success. When authenticating a new OAuth-protected MCP connector via /mcp, the browser passphrase flow succeeds and the server issues a JWT, but the CLI prompt that should confirm completion does not respond to Enter (only Escape). Workaround: press Escape, then quit and re-launch Claude Code; the connector state persists. Upstream: anthropics/claude-code#42707.
FastEmbed cold start. The first MCP tool call after a Fly machine auto-stop takes 15–25s while the FastEmbed ONNX model loads into memory. Subsequent calls are fast. Mitigated by a polling SessionStart hook and an eager initialization step in mnemon serve-remote; Fly's http_service.checks.grace_period is set accordingly.
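If you script against a vault that may be auto-stopped, a small polling loop avoids failing on the first slow call. This is a generic sketch, not the SessionStart hook mnemon ships; `wait_until_ready` and its parameters are hypothetical, and the readiness check is injected so the loop is independent of any HTTP client.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool], timeout_s: float = 30.0, interval_s: float = 1.0
) -> bool:
    """Call check() repeatedly until it returns True or the timeout would be exceeded."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        # Stop if another sleep would take us past the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

In practice `check` could wrap a GET against the server's /health endpoint and return True on a 200 response; a 30s timeout leaves headroom over the 15–25s cold start described above.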
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests (460 tests)
pytest
# Run tests with coverage
pytest --cov=mnemon --cov-report=term-missing
# Run a specific test file
pytest tests/test_store.py -v
License
MIT