mnemon

Universal long-term memory layer for AI agents via MCP.

mnemon gives AI agents persistent, searchable memory that survives across sessions. It uses hybrid BM25 + vector search, automatic confidence decay, and contradiction detection via the Model Context Protocol. Deploy as a remote server on Fly.io for a unified vault across all your MCP clients (Claude Code, Claude Desktop, Cursor, claude.ai), or run locally for development.

Install

pip install mnemon-memory

With optional LLM support (local 1.7B model for query expansion, contradiction detection, and smarter session extraction):

pip install "mnemon-memory[llm]"

From source:

git clone https://github.com/cipher813/mnemon.git
cd mnemon
pip install -e ".[dev]"

Quick Start

The recommended setup is a remote vault (one vault, all clients) reachable at https://<your-app>.fly.dev/mcp. You have two options:

  • Self-host (~10 min, ~$1/mo): see Self-host on Fly.io below for the end-to-end runbook.
  • Local-only mode: no remote server needed, useful for development.

1. Configure your client

# Claude Code with remote vault
mnemon setup claude-code --remote-url https://your-app.fly.dev/mcp

# Cursor with remote vault
mnemon setup cursor --remote-url https://your-app.fly.dev/mcp

# Local-only mode (development, no remote server needed)
mnemon setup claude-code
mnemon setup cursor

Verify with mnemon doctor — it runs 6 end-to-end checks against your configured remote (skip for local-only mode).

2. Use it

Once configured, mnemon works automatically:

  • Context surfacing: relevant memories are injected before each prompt
  • Session extraction: decisions, preferences, and observations are saved at session end
  • Handoff generation: session summaries maintain continuity across sessions

You can also interact with memories directly via MCP tools or CLI:

mnemon search "deployment architecture"
mnemon save "DB migration plan" "Migrate from PostgreSQL to DynamoDB in Q3"
mnemon forget 42
mnemon status

MCP Tools

Retrieval

| Tool | Description |
| --- | --- |
| memory_search | Hybrid BM25 + vector search with composite scoring (relevance + recency + confidence) |
| memory_get | Fetch a specific memory by ID with full content |
| memory_timeline | Recent memories in reverse chronological order |
| memory_related | Find memories related to a given memory via the relationship graph |

Mutation

| Tool | Description |
| --- | --- |
| memory_save | Store a new memory with content type classification and auto-embedding |
| memory_pin | Pin a memory to boost confidence and prevent archival |
| memory_forget | Soft-delete a memory (marked as invalidated, not physically removed) |

Lifecycle

| Tool | Description |
| --- | --- |
| memory_status | Vault health stats: counts by type, vectors, pinned/invalidated |
| memory_sweep | Archive stale memories past their half-life (dry-run by default) |
| memory_rebuild | Re-embed all documents (use after upgrading embedding model) |

Intelligence

| Tool | Description |
| --- | --- |
| memory_check_contradictions | Check a memory for conflicts using vector similarity + LLM classification |
| profile_get | Synthesized user profile from stored preferences and decisions |
| profile_update | Manually add a fact to the user profile |
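
As a rough illustration of what a client sends when it invokes one of these tools, the sketch below builds an MCP tools/call request as a JSON-RPC 2.0 message. The tools/call method and the name/arguments envelope follow the MCP specification; the argument names query and limit are assumptions made for this example and may not match mnemon's actual tool schema.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "query" and "limit" are hypothetical argument names, not confirmed by this README.
payload = build_tool_call(1, "memory_search", {"query": "deployment architecture", "limit": 5})
print(payload)
```

In practice an MCP client library builds this message for you; the sketch only shows the shape of the request.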

Memory Types

Each memory has a content type that determines its default confidence and decay half-life:

| Type | Default confidence | Half-life | Use for |
| --- | --- | --- | --- |
| decision | 0.85 | Never | Architectural choices, design decisions |
| preference | 0.80 | Never | User workflow habits, style preferences |
| antipattern | 0.80 | Never | Things that failed, approaches to avoid |
| observation | 0.70 | 90 days | Learned facts, discovered behaviors |
| research | 0.70 | 90 days | Investigation results, findings |
| project | 0.65 | 120 days | Project status, goals, context |
| handoff | 0.60 | 30 days | Session summaries for continuity |
| note | 0.50 | 60 days | General notes, default type |

Memories with access activity decay slower — each access extends the effective half-life by 10%, up to 3x the base value.
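
The decay rule can be sketched as follows. This is an illustration under two assumptions that the README does not confirm: that confidence decays exponentially (halving once per half-life), and that the 10% extension is added per access rather than compounded. The function names are hypothetical.

```python
from typing import Optional


def effective_half_life(base_days: float, access_count: int) -> float:
    """Extend the base half-life by 10% per access, capped at 3x the base."""
    multiplier = min(1.0 + 0.10 * access_count, 3.0)
    return base_days * multiplier


def decayed_confidence(
    initial: float,
    age_days: float,
    base_half_life: Optional[float],
    access_count: int = 0,
) -> float:
    """Assumed exponential decay; a half-life of None means the type never decays."""
    if base_half_life is None:
        return initial
    half_life = effective_half_life(base_half_life, access_count)
    return initial * 0.5 ** (age_days / half_life)


# An observation (0.70, 90-day half-life) after 90 days with no accesses.
print(decayed_confidence(0.70, 90, 90))
# The same memory accessed 10 times: the effective half-life doubles to 180 days.
print(decayed_confidence(0.70, 90, 90, access_count=10))
# A decision has no half-life, so its confidence is unchanged.
print(decayed_confidence(0.85, 365, None))
```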

Claude Code Hooks

When configured via mnemon setup claude-code, three hooks are installed:

| Hook | Event | Timeout | Description |
| --- | --- | --- | --- |
| Context surfacing | UserPromptSubmit | 8s | Searches vault and injects relevant memories as context |
| Session extractor | Stop | 30s | Extracts decisions, preferences, and observations from the transcript |
| Handoff generator | Stop | 30s | Creates a session summary for the next session |

The extractor and handoff generator use LLM-based extraction when mnemon-memory[llm] is installed, with regex/heuristic fallback otherwise.

Remote Server

Deploy mnemon as a remote Streamable HTTP server for a single vault shared across all MCP clients. This is the recommended production setup — Claude Code hooks, Claude Desktop, Cursor, and claude.ai all read and write the same memories.

Run locally (development)

MNEMON_LOCAL_TOKEN=your-secret-token mnemon serve-remote
PORT=9000 mnemon serve-remote   # custom port

Self-host on Fly.io

End-to-end deploy. You'll get an OAuth-protected MCP endpoint at https://<your-app>.fly.dev/mcp with no third-party auth vendor. Takes ~10 minutes the first time.

Prerequisites. A Fly.io account, flyctl on your $PATH, and this repo cloned locally. Budget ~$0.50–$2/mo for a personal vault (auto-stop idle, 1GB volume).

1. Pick an app name and copy the template.

cp fly.toml.example fly.toml
# Edit fly.toml: replace REPLACE_ME_fly_app_name (3 occurrences) with your chosen app name.
# Pick something globally unique on Fly — e.g. "my-mnemon-vault".

The real fly.toml is gitignored — it holds your specific app identity. fly.toml.example stays in the repo as the template.

2. Create the app and the persistent volume.

fly launch --copy-config --no-deploy      # creates the app from your edited fly.toml; no deploy yet
fly volume create mnemon_data --size 1 --region sjc   # 1GB is enough for thousands of memories; use the same region as primary_region

Without the volume step, every restart wipes your vault — the [mounts] block in fly.toml expects mnemon_data to exist.

3. Generate and set secrets.

# Generate two independent high-entropy secrets. Do not reuse credentials.
python -c "import secrets; print('MNEMON_LOCAL_TOKEN   =', secrets.token_urlsafe(32))"
python -c "import secrets; print('MNEMON_AS_PASSPHRASE =', secrets.token_urlsafe(32))"

# Store both in your password manager, then:
fly secrets set MNEMON_LOCAL_TOKEN=<value-1> \
                MNEMON_AS_ENABLED=true \
                MNEMON_AS_PASSPHRASE=<value-2>

MNEMON_AS_PASSPHRASE is the single-user login for browser clients (claude.ai, Claude Desktop). There is no complexity enforcement in code — use a high-entropy value. MNEMON_LOCAL_TOKEN is the static bearer for headless clients (Claude Code hooks, Cursor).

4. Deploy.

fly deploy

First deploy pulls the FastEmbed model (~15–25s on first memory_search). Subsequent deploys reuse the cached layer.

5. Verify.

# Write the remote URL + bearer token to your local client config.
echo "https://<your-app>.fly.dev/mcp"          > ~/.mnemon/remote_url
echo "<value-1 from step 3>"                   > ~/.mnemon/local_token
chmod 600 ~/.mnemon/local_token

mnemon doctor

mnemon doctor runs 6 checks: remote URL configured, local token configured + 0600 perms, /health reachable, authenticated MCP tool call round-trips, and save + search + forget cycle. All 6 should pass green. If any fail, the error message points at the specific misconfiguration.
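
If you want to sanity-check the local files before running the full doctor, the first two checks (remote URL configured, local token present with 0600 permissions) can be approximated as below. This is a minimal sketch, not mnemon doctor's implementation; check_local_config is a hypothetical name, and the permission check assumes a POSIX filesystem.

```python
import stat
from pathlib import Path


def check_local_config(config_dir: Path) -> list:
    """Return a list of human-readable problems; an empty list means the files look fine."""
    problems = []
    remote_url = config_dir / "remote_url"
    token = config_dir / "local_token"

    if not remote_url.is_file() or not remote_url.read_text().strip():
        problems.append("remote_url is missing or empty")

    if not token.is_file() or not token.read_text().strip():
        problems.append("local_token is missing or empty")
    else:
        # Compare only the permission bits, ignoring the file-type bits.
        mode = stat.S_IMODE(token.stat().st_mode)
        if mode != 0o600:
            problems.append(f"local_token has mode {mode:o}, expected 600")

    return problems


for problem in check_local_config(Path.home() / ".mnemon"):
    print("FAIL:", problem)
```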

6. Connect clients.

# Claude Code hooks (uses MNEMON_LOCAL_TOKEN)
mnemon setup claude-code --remote-url https://<your-app>.fly.dev/mcp

# Cursor (uses MNEMON_LOCAL_TOKEN)
mnemon setup cursor --remote-url https://<your-app>.fly.dev/mcp

For claude.ai (web/mobile) and Claude Desktop — no CLI needed, these use the OAuth browser flow:

  1. In the client, go to Settings → Connectors → Add custom connector.
  2. Paste https://<your-app>.fly.dev/mcp as the connector URL.
  3. Click Connect. Browser redirects to your server's login page.
  4. Enter MNEMON_AS_PASSPHRASE from step 3 above.
  5. You're in. The client now sees memory_search, memory_save, etc. alongside its built-in tools.

Browser clients self-register via Dynamic Client Registration (RFC 7591) — no manual client-id provisioning. Authentication uses PKCE + RS256 JWTs signed by the AS's own keypair (auto-generated on first boot, stored in the Fly volume at /data/oauth_keys/).
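
For readers unfamiliar with PKCE, the client side of the exchange can be sketched as below: the client generates a random code verifier, sends its hash (the code challenge) with the authorization request, and later proves possession by sending the verifier to the token endpoint. The sketch follows RFC 7636 and assumes the S256 challenge method; the README does not state which method mnemon's Authorization Server accepts, and the function names are illustrative.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Generate a high-entropy PKCE code verifier (43 URL-safe characters)."""
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier: str) -> str:
    """Derive the S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = code_challenge_s256(verifier)
print(len(verifier), challenge)
```

Browser clients such as claude.ai perform this step themselves; nothing here needs to be run by hand.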

Troubleshooting

If mnemon doctor fails, check the specific failing line:

  • Health endpoint unreachable: the app may still be booting (a cold start takes 15–25s for FastEmbed); retry after a moment. If the failure persists, check fly logs -a <your-app> and fly status.
  • Auth + MCP tool call returns 401: MNEMON_LOCAL_TOKEN on your machine doesn't match the Fly secret. Re-copy it from your password manager into ~/.mnemon/local_token.
  • Round-trip fails: MNEMON_ALLOWED_HOSTS in fly.toml doesn't include the hostname you're connecting through. It should match the host portion of MNEMON_PUBLIC_URL.

S3 Vault Sync

Sync your vault across machines via S3:

# Push local vault to S3
MNEMON_S3_BUCKET=my-bucket mnemon sync push

# Pull vault from S3
MNEMON_S3_BUCKET=my-bucket mnemon sync pull

| Env var | Default | Description |
| --- | --- | --- |
| MNEMON_S3_BUCKET | (required) | S3 bucket name |
| MNEMON_S3_PREFIX | mnemon/vaults | S3 key prefix |
| MNEMON_VAULT_NAME | default | Vault name |

Requires the AWS CLI (aws) on your PATH with valid credentials.

Architecture

Remote (production): All clients hit a single Fly-hosted vault via Streamable HTTP. Claude Code hooks use a static bearer token (MNEMON_LOCAL_TOKEN). Browser clients (claude.ai, Claude Desktop) use OAuth.

Local (development): SQLite vault at ~/.mnemon/default.sqlite with a companion vector store. Useful for testing and offline work.

~/.mnemon/
  remote_url           Remote server URL (written by mnemon setup --remote-url)
  local_token          Bearer token for remote auth (chmod 600)
  default.sqlite       Local SQLite vault (FTS5 + WAL mode, development only)
  default.vec.npz      Companion vector store (numpy, brute-force cosine)
  models/              Local LLM weights (session extraction, query expansion)

  • Storage: SQLite with FTS5 full-text search, content-addressable deduplication (SHA-256)
  • Search: Hybrid BM25 + vector (384d, bge-small-en-v1.5 via FastEmbed) fused with Reciprocal Rank Fusion
  • Scoring: Composite score = 0.5 * relevance + 0.25 * recency + 0.25 * confidence
  • Diversity: MMR filtering (Jaccard bigram similarity > 0.6 demoted by 50%)
  • Intelligence (optional): Local 1.7B LLM (QMD-query-expansion) for query expansion, contradiction detection, session extraction — zero API cost
  • Transport: MCP stdio (local) and Streamable HTTP (remote)
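
The scoring and diversity steps listed above can be sketched in a few lines. The weights, the 0.6 threshold, and the 50% demotion come from this README; everything else is an assumption for illustration, including the use of word-level bigrams, the comparison against higher-ranked results only, and all function names.

```python
def composite_score(relevance: float, recency: float, confidence: float) -> float:
    """Weighted blend from the README: 0.5 / 0.25 / 0.25."""
    return 0.5 * relevance + 0.25 * recency + 0.25 * confidence


def bigrams(text: str) -> set:
    """Word bigrams of a lowercased string (word-level is an assumption)."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: intersection size over union size; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def demote_near_duplicates(results: list, threshold: float = 0.6, penalty: float = 0.5) -> list:
    """Halve the score of any result too similar to a higher-ranked one, then re-sort."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    kept, output = [], []
    for text, score in ranked:
        grams = bigrams(text)
        if any(jaccard(grams, seen) > threshold for seen in kept):
            score *= penalty
        else:
            kept.append(grams)
        output.append((text, score))
    return sorted(output, key=lambda r: r[1], reverse=True)


results = [
    ("use sqlite with fts5 for the vault", composite_score(0.90, 0.8, 0.85)),
    ("use sqlite with fts5 for the vault store", composite_score(0.88, 0.8, 0.85)),
    ("deploy to fly with a 1GB volume", composite_score(0.60, 0.5, 0.70)),
]
for text, score in demote_near_duplicates(results):
    print(f"{score:.3f}  {text}")
```

In this toy run the second entry shares six of seven bigrams with the first, so it is demoted below the unrelated deployment note.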

Design decisions

A small set of architectural choices shape the rest of the system. Documented here so self-host users know what they're signing up for and reviewers can evaluate the trade-offs.

Why SQLite + FTS5 (not Postgres, not a vector DB). A single-file embedded database means no operational surface area — no connection pools, no migrations against a live DB, no standalone vector store to keep in sync. FTS5 gives production-grade BM25 without a separate Elasticsearch. A numpy-backed vector store sits alongside the SQLite file; brute-force cosine over a few thousand memories is faster than any network hop to a hosted vector DB. The single-file design also makes vault portability trivial — copy one file and you've moved your entire memory.
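
To make "brute-force cosine" concrete, here is a minimal pure-Python sketch that scores a query vector against every stored vector and keeps the best matches. It is not mnemon's code (the README describes a numpy-backed store); the function names and the toy 3-dimensional vectors are illustrative only.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list, vectors: dict, k: int = 3) -> list:
    """Score every stored vector against the query and return the k best (id, score) pairs."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors; the embeddings described in this README are 384-dimensional.
store = {1: [1.0, 0.0, 0.0], 2: [0.7, 0.7, 0.0], 3: [0.0, 0.0, 1.0]}
print(top_k([1.0, 0.1, 0.0], store, k=2))
```

The scan is linear in the number of memories, which is why it stays practical at the scale of a few thousand entries.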

Why hybrid BM25 + vector (not pure semantic). Pure vector search misses exact-identifier lookups; pure keyword misses paraphrase. Reciprocal Rank Fusion combines both rankings, then composite scoring folds in recency and confidence. In practice this catches both "find my note about bge-small-en-v1.5" (keyword wins) and "memory about embedding models" (vector wins) without tuning.
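
Reciprocal Rank Fusion itself is simple enough to sketch. Each ranked list contributes 1 / (k + rank) for every document it contains, and the contributions are summed. The constant k = 60 is the value commonly used in the literature and is an assumption here; the README does not say which value mnemon uses, and the function name is illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists of ids: each list contributes 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["a", "b", "c"]      # keyword results, best first
vector_ranking = ["c", "a", "d"]    # semantic results, best first
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print([doc_id for doc_id, _ in fused])
```

Documents that appear in both lists ("a" and "c") rise above those found by only one retriever, which is the behavior the paragraph above relies on.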

Why Fly.io (not AWS / GCP). mnemon is designed to idle cheaply and wake on demand. Fly's auto_stop_machines + min_machines_running=0 costs ~$0.50–0.90/mo for a personal vault; the closest AWS equivalent (ECS Fargate or App Runner) can't scale to zero and starts at ~$10/mo. Fly volumes are local-attached SSD, which matches SQLite's access pattern — AWS's equivalent (EFS) is slower and pricier. Deploy is one fly.toml and one command, vs. the VPC + ALB + ECS + IAM setup AWS requires — which matters for any future self-host user.

Why self-hosted OAuth 2.1 + PKCE + DCR (not Auth0 / Clerk / Logto). Requiring users to register an Auth0 tenant before they can try mnemon is a near-guaranteed bounce. mnemon ships with its own Authorization Server (well-known endpoints, /oauth/authorize, /oauth/token with PKCE, /oauth/register per RFC 7591, JWT issuance) — anyone can fly deploy and have a working OAuth-protected MCP endpoint with no third-party signup. The trade-off is less battle-tested auth code; the mitigation is that browser clients are the only OAuth consumers, and headless clients (Claude Code, Cursor) use a simple static bearer.

Why MCP + a separate memory server (not Claude's native memory). Claude's native memory is account-scoped and only reaches Anthropic products (claude.ai web/mobile/desktop). It doesn't reach Claude Code, Cursor, or any other MCP-speaking client. mnemon serves the cross-client case: a single vault that Claude Code hooks, Cursor, and claude.ai can all read and write. It's also self-hosted, exportable, and programmatically introspectable — the opposite of Anthropic's closed-box model. These systems are complementary, not competing.

Configuration

Client-side (hooks, CLI)

| Env var | Default | Description |
| --- | --- | --- |
| MNEMON_REMOTE_URL | (none) | Remote server URL (or ~/.mnemon/remote_url file) |
| MNEMON_LOCAL_TOKEN | (none) | Bearer token for remote auth (or ~/.mnemon/local_token file) |
| MNEMON_VAULT_DIR | ~/.mnemon | Local vault directory |
| MNEMON_MODEL_DIR | ~/.mnemon/models | Directory for LLM model files |
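
The "env var or file" pattern in the table can be sketched as below. The precedence shown (environment variable first, then the file under ~/.mnemon) is an assumption; the README lists both sources but does not say which wins. resolve_setting is a hypothetical helper, not part of mnemon's API.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_setting(env_var: str, filename: str, config_dir: Optional[Path] = None) -> Optional[str]:
    """Return the env var if set, else the stripped file contents, else None."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    path = (config_dir or Path.home() / ".mnemon") / filename
    if path.is_file():
        return path.read_text().strip() or None
    return None


remote_url = resolve_setting("MNEMON_REMOTE_URL", "remote_url")
token = resolve_setting("MNEMON_LOCAL_TOKEN", "local_token")
print("remote configured:", remote_url is not None, "| token configured:", token is not None)
```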

Server-side (mnemon serve-remote)

| Env var | Default | Description |
| --- | --- | --- |
| MNEMON_AS_ENABLED | false | Enable the self-hosted OAuth Authorization Server |
| MNEMON_AS_PASSPHRASE | (none) | Single-user login passphrase (required when AS enabled) |
| MNEMON_AS_KEY_DIR | $MNEMON_VAULT_DIR/oauth_keys | RSA keypair storage directory |
| MNEMON_PUBLIC_URL | (none) | Externally-reachable base URL (required when AS enabled) |
| MNEMON_LOCAL_TOKEN | (none) | Static bearer for headless clients (hooks, Cursor) |
| MNEMON_ALLOWED_HOSTS | (none) | Comma-separated host allowlist for DNS-rebinding protection |
| PORT | 8502 | Remote server port |
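
A quick way to catch a mismatch between MNEMON_PUBLIC_URL and MNEMON_ALLOWED_HOSTS before deploying is sketched below. It assumes allowlist entries are bare hostnames without ports, which the README does not specify, and the function names are illustrative rather than part of mnemon.

```python
from urllib.parse import urlsplit


def parse_allowed_hosts(raw: str) -> set:
    """Split a comma-separated allowlist into lowercase hostnames, ignoring blanks."""
    return {host.strip().lower() for host in raw.split(",") if host.strip()}


def public_host_is_allowed(public_url: str, allowed_hosts: str) -> bool:
    """True when the hostname of public_url appears in the allowlist."""
    hostname = urlsplit(public_url).hostname
    return hostname is not None and hostname in parse_allowed_hosts(allowed_hosts)


print(public_host_is_allowed("https://my-mnemon-vault.fly.dev", "my-mnemon-vault.fly.dev,localhost"))
print(public_host_is_allowed("https://my-mnemon-vault.fly.dev", "localhost"))
```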

Known limitations

Client-side behaviors that affect mnemon users but are not bugs in mnemon itself. Upstream tracking linked where applicable.

Claude Code: MCP session invalidated after server restart. When the remote mnemon server restarts (via fly deploy, fly secrets set, or Fly auto-stop/auto-start), Claude Code's cached MCP session ID becomes stale. Subsequent tool calls from within an active Claude Code session return Session not found, and the client does not auto-reinitialize. Workaround: quit and re-launch Claude Code. Hooks are unaffected — they use the static bearer path and bypass the MCP session layer. Upstream: anthropics/claude-code#46533.

Claude Code: /mcp authenticate CLI hang after browser OAuth success. When authenticating a new OAuth-protected MCP connector via /mcp, the browser passphrase flow succeeds and the server issues a JWT, but the CLI prompt that should confirm completion does not respond to Enter (only Escape). Workaround: press Escape, then quit and re-launch Claude Code; the connector state persists. Upstream: anthropics/claude-code#42707.

FastEmbed cold start. The first MCP tool call after a Fly machine auto-stop takes 15–25s while the FastEmbed ONNX model loads into memory. Subsequent calls are fast. Mitigated by a polling SessionStart hook and an eager initialization step in mnemon serve-remote; Fly's http_service.checks.grace_period is set accordingly.
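
If you script against the remote vault yourself, a small polling loop can absorb the cold start. The sketch below is not mnemon's SessionStart hook; it is a generic wait-until-ready helper under the assumption that the /health endpoint returns HTTP 200 once the server is up. The function names, timeout, and interval are illustrative.

```python
import time
import urllib.request
from typing import Callable


def health_probe(url: str) -> bool:
    """Return True if GET url answers with HTTP 200; any network error counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except OSError:
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or roughly `timeout` seconds of waiting have elapsed."""
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


# Demonstration with a fake probe that succeeds on the third attempt (no network needed).
attempts = iter([False, False, True])
print(wait_until_ready(lambda: next(attempts), sleep=lambda _: None))
```

Against a real deployment you would pass something like lambda: health_probe("https://<your-app>.fly.dev/health") as the probe.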

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests (460 tests)
pytest

# Run tests with coverage
pytest --cov=mnemon --cov-report=term-missing

# Run a specific test file
pytest tests/test_store.py -v

License

MIT
