Skip to main content

Local, token-free file digestion → knowledge-graph memory for Claude. Convert any attachment to Markdown and digest it into a graph + exportable memory, entirely on-device. Tuned for Apple silicon.

Project description

Memorised them All — local, token-free file-to-knowledge-graph memory for Claude (MCP server)

Memorised them All

Local, token-free document memory for Claude — turn any folder of files into a knowledge graph you can recall, for ~0 context tokens.

An MCP server & plugin for Claude Desktop and Claude Code that converts PDFs, Office docs, images, and audio to Markdown on your machine, then digests them into a searchable knowledge graph + mind map — privately, with no cloud and no API keys.

CI Release PyPI Downloads Python License: MIT Platforms Token cost

Quickstart · Why token-free · Features · How it works · Tools · Use cases · Comparison · Config · Platforms · Privacy · FAQ

100% local · free & open-source · auto-installing · Apple-silicon optimised · by GRU-953


The idea in one line: Claude tokens are expensive; your computer's compute is free. So every heavy step — converting documents, extracting knowledge, embedding, summarising — runs locally, and Claude only ever gets back a tiny tool result. Digesting a 500-page folder costs roughly zero context tokens.

Point Memorised them All at a folder. It converts every attachment to Markdown locally, then builds a layered knowledge graph — a global synopsis, per-theme summaries, per-document notes, an exportable Markdown bundle, and an offline interactive mind map — and lets Claude recall from it for next to nothing.

"Memorise everything in ~/Documents/research."
"What did my documents say about the Q3 budget?"
"Open the mind map."

🚀 Quickstart

Claude DesktopClaude Code

Download memorised-them-all.mcpb from the latest release and double-click it (Settings → Extensions). It bootstraps the local stack on first launch.

/plugin marketplace add GRU-953/memorised-them-all
/plugin install memorised-them-all
pip (any OS)Homebrew (macOS/Linux)
pip install memorised-them-all
mta status
mta digest ~/Documents/research
brew install GRU-953/memorised-them-all/mta

Requirements: Python ≥ 3.10. The installer fetches everything else (Ollama, Tesseract, ffmpeg, the latest MarkItDown, and local models) — see Platform support.

💡 Why token-free

Most "chat with your documents" tools stream document text into the model — you pay tokens to ingest and to recall. Memorised them All never does that:

Step Where it runs Tokens to Claude
Convert PDF / Office / image / audio → Markdown your machine (MarkItDown · Tesseract · Whisper · Ollama) 0
Extract entities · relations · facts your machine (local LLM, classical fallback) 0
Embed · resolve · build graph · summarise your machine (Ollama · NetworkX) 0
digest result counts & paths only (~140 tokens)
recall result a small, citable slice — never the documents

Tool results are hard-capped in size, so the guarantee holds even on the high-accuracy path.

✨ Features

  • 📄 Universal local conversion — PDF, Word, Excel, PowerPoint, HTML, EPub, Outlook .msg, CSV/JSON/XML, images (OCR + vision captioning), and audio (on-device transcription) → clean Markdown. Scanned PDFs are OCR'd; OCR in 100+ languages when the matching Tesseract language packs are installed (e.g. eng+ben).
  • 🕸️ Layered knowledge graph (GraphRAG-style) — entities, typed relations and atomic facts, grouped into themes by community detection, with a global synopsis and per-theme summaries — all built by local models.
  • 🧭 Offline interactive mind map — a single self-contained mindmap.html (Cytoscape inlined, zero network).
  • 📝 Exportable, portable memorygraph.json, memory.md, and per-document notes you can copy to any machine and reuse.
  • Two modes — high-accuracy (local LLM) and fast mode (--fast): deterministic and much faster — it skips the per-chunk LLM, so the factor scales with corpus size and model (benchmarked ≈25–100× with the default 7B extractor: ≈98× on a 5-file set, ≈26× on 12 files) — ideal for large or frequently-refreshed corpora.
  • 🔁 Reusable named projects — accumulate many folders into one memory; forget to delete one.
  • 🍎 Apple-silicon first — performance-core parallelism, GPU Whisper via MLX, unified-memory-aware concurrency. Runs on Intel macOS, Linux, and Windows too.
  • ⚙️ Auto-installing & auto-updating — installs a pinned MarkItDown from PyPI so the first run works offline, and refreshes it on a throttled daily check (import-checked, with rollback); the latest upstream MarkItDown is opt-in (MTA_MARKITDOWN_UPSTREAM=on, pinned to a commit). Starts the model server on demand and stops it after 5 minutes idle.
  • 🌍 Multilingual — Unicode-aware entity resolution (Bengali, CJK, Cyrillic, accented Latin) and OCR in many languages.
  • 🛟 Crash-safe & reusable — memory is written atomically, so an interrupted digest never corrupts an existing project; recall reports a low_confidence signal so Claude can decline when the answer isn't in your docs.
  • 🔒 Private by design — no cloud, no API keys, no telemetry. Your files never leave your computer.

🧠 How it works

attachments ─► CONVERT ─► SEGMENT ─► EMBED ─► EXTRACT ─► RESOLVE ─► GRAPH + COMMUNITIES ─► MATERIALISE
  pdf/docx/   MarkItDown  structure  nomic-   local LLM  canonical  NetworkX +             graph.json
  xlsx/img/   +Tesseract  +semantic  embed-   triples +  entities   Leiden / Louvain        memory.md
  audio/...   +Whisper    chunking   text     facts      (embed +   community summaries     memory/<doc>.md
              +Ollama                          (+class-   fuzzy +    (local LLM)             mindmap.html
               vision                           ical      acronym)                           vectors store
                                                fallback)
                                                                              │
   recall("…")  ◄── embed query locally ──  return ONLY a small, citable slice (themes + facts + provenance)

Everything between attachments and recall happens on your machine. The local-LLM step has a dependency-free classical fallback, so a digest always succeeds — even offline, even before any model is downloaded — and gets sharper once models are present.

🛠 Tools

Eight token-free MCP tools (plus the mta CLI). Every result is metadata or a small slice — document contents never return to the conversation.

Tool What it does
digest(paths, project?, reset?, fast?) convert + digest files/dirs/globs; accumulates into the project (reset starts fresh, fast skips the LLM)
recall(query, project?, k?) answer from memory — a small, citable slice (+ top_score & low_confidence relevance signal)
memory_overview(project?) synopsis + themes
export_memory(dest, project?) export portable Markdown + graph + mind map
list_digestible(directory) list convertible files (paths/sizes only)
memory_status() local stack health (Ollama, models, Tesseract, MarkItDown version)
open_mindmap(project?) path to the offline interactive mind map
forget(project?) delete a project's memory

CLI: mta digest <paths> [--fast] [--reset] · mta recall "<q>" · mta overview · mta export <dir> · mta mindmap --open · mta forget · mta status · mta update · mta doctor. In Claude Code, the slash commands /memorise, /recall, /memory-map, /memory-status, /export-memory are also available.

🎯 Use cases

  • Private RAG / "chat with your documents" — locally, with no cloud and no per-query token cost.
  • Research & literature memory — digest a folder of papers/PDFs and ask grounded questions.
  • Knowledge base for an agent — give Claude durable, reusable memory across sessions.
  • Meeting & audio notes — transcribe recordings on-device and fold them into the graph.
  • Scanned documents & receipts — OCR image-only PDFs and images into searchable memory.
  • Visual exploration — open the offline mind map to see how entities and themes connect.

⚖️ Comparison

Memorised them All Cloud "chat with docs" / hosted RAG Stock markitdown-mcp
Runs fully locally ✅ (conversion only)
Context-token cost to ingest and recall ~0 high high (returns text)
Knowledge graph + themes sometimes
Offline interactive mind map
Works offline / no API keys
Reusable, exportable memory files varies
Free & open-source (MIT) varies

⚙️ Configuration

All optional, sensible defaults; set via environment (CLI) or the extension settings (Desktop).

Variable Default Meaning
MTA_HOME ~/.memorised-them-all where memories are stored
MTA_FAST off fast mode — skip the LLM (deterministic; benchmarked ≈25–100× faster, scales with corpus/model)
MTA_EXTRACT_MODEL qwen2.5:7b local LLM for extraction & summaries
MTA_EMBED_MODEL nomic-embed-text local embedding model
MTA_VISION_MODEL moondream image captioning
MTA_OCR_LANG eng Tesseract languages, e.g. eng+ben
MTA_WHISPER_MODEL base on-device transcription model
MTA_IDLE 300 seconds idle before Ollama is stopped
MTA_WORKERS / MTA_EXTRACT_WORKERS 0 (auto) parallel conversion / extraction workers
MTA_MAX_CHUNKS / MTA_MAX_FILE_MB 1500 / 200 workload & input-size caps (reported)
MTA_RECALL_MIN_SCORE 0 (off) drop recall hits below this cosine score (stricter grounding)
MTA_AUTO_UPDATE on daily update check: on (PyPI, default) · off · upstream (also pull the pinned upstream MarkItDown)
MTA_MARKITDOWN_UPSTREAM off pull the latest upstream MarkItDown commit (pinned to a SHA) instead of the PyPI build
MTA_NO_OLLAMA unset hard offline switch (classical + hashing)
MTA_BACKEND auto inference backend: auto/ollama, or an OpenAI-compatible server — lmstudio · llamacpp · vllm · openai (see Use a different model server)
MTA_BACKEND_URL / MTA_BACKEND_KEY auto / unset base URL + bearer key for an OpenAI-compatible backend (URL defaults to the right loopback port)
MTA_PROFILE unset tuning profile: laptop · workstation · server · offline (an explicit MTA_* variable always wins)
MTA_HTTP_* off opt-in HTTP transport — see Remote access

🌐 Remote access (HTTP transport)

The server speaks stdio by default — how Claude Desktop and Claude Code launch it — and opens no network socket. For other MCP clients, or to reach one running engine from several tools, you can additionally serve the same eight token-free tools over MCP Streamable HTTP. It is opt-in and secure by construction:

mta serve --http          # binds 127.0.0.1:8765 and prints a bearer token

On start it prints a ready-to-paste connection command:

claude mcp add --transport http memorised-them-all \
  http://127.0.0.1:8765/mcp --header "Authorization: Bearer <TOKEN>"

Loopback-only unless you pass --allow-remote; a mandatory bearer token (auto-generated and stored 0600 at $MTA_HOME/state/http_token, or set via MTA_HTTP_TOKEN); and DNS-rebinding protection on by default. An unauthenticated GET /healthz is the only open route. To expose it beyond localhost, terminate TLS at a reverse proxy first — see SECURITY.md.

Variable Default Meaning
MTA_HTTP_HOST / MTA_HTTP_PORT 127.0.0.1 / 8765 HTTP bind address (loopback-guarded)
MTA_HTTP_PATH /mcp endpoint path
MTA_HTTP_TOKEN auto bearer token (auto-generated 0600 if unset)
MTA_HTTP_ALLOW_REMOTE off permit a non-loopback bind (network-exposed)
MTA_HTTP_ALLOWED_HOSTS / MTA_HTTP_ALLOWED_ORIGINS unset extra Host/Origin allowlist entries (reverse proxy)

🔌 Use from other AIs (tool schemas)

The same eight tools can be described to non-MCP clients in the three dominant dialects, so OpenAI, Google Gemini, or any OpenAPI consumer can call this local engine. The schemas are generated from the live MCP tool registry — never hand-maintained — so they can't drift from the tools the server actually serves:

mta export-schema                    # all three dialects → stdout (JSON)
mta export-schema --format openai    # OpenAI tools / function-calling array
mta export-schema --format gemini    # Gemini function_declarations (OpenAPI-subset)
mta export-schema --format openapi   # OpenAPI 3.1 document (POST /tools/{name})
mta export-schema --out ./schemas    # write openai.json · gemini.json · openapi.json

Pure and offline: it only reads the in-process tool definitions and prints JSON — it starts no server and returns nothing through the model.

To actually call the tools without MCP, run the local REST gateway — it serves exactly that OpenAPI 3.1 surface:

mta serve --rest     # POST http://127.0.0.1:8765/tools/{name} with a JSON arg body

Same hardening as mta serve --http (loopback-only, one mandatory bearer token shared with the MCP transport, Host allowlist); GET /openapi.json returns the live schema and GET /healthz is an unauthenticated liveness probe. Example:

curl -s http://127.0.0.1:8765/tools/recall \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"query":"deadlines","project":"contracts"}'

Not sure which snippet you need? mta recipes prints ready-to-paste connection setup for every surface — Claude Code (stdio/HTTP), Claude Desktop, the REST gateway, and OpenAI/Gemini — all pointing at the same eight tools.

🧩 Use a different model server

By default the LLM (extraction + summaries) and embeddings run on Ollama, started on demand and stopped when idle. To point them at another local model server instead, set MTA_BACKEND (and usually MTA_BACKEND_URL):

# LM Studio (defaults to http://127.0.0.1:1234/v1)
MTA_BACKEND=lmstudio MTA_EXTRACT_MODEL=your-chat-model MTA_EMBED_MODEL=your-embed-model mta digest ./docs

# llama.cpp server (http://127.0.0.1:8080/v1), or any OpenAI-compatible endpoint
MTA_BACKEND=openai MTA_BACKEND_URL=http://127.0.0.1:8080/v1 mta digest ./docs

The server must speak the OpenAI API (/v1/chat/completions + /v1/embeddings). Set MTA_EXTRACT_MODEL / MTA_EMBED_MODEL to that server's model IDs, and MTA_BACKEND_KEY if it needs a token. Image OCR and audio transcription still use Ollama / local tools. The backend defaults to loopback — pointing it at a non-local URL sends content off your machine (your explicit choice; a one-time warning is printed). If the backend is unreachable, a digest still succeeds via the classical/offline fallback.

🐳 Run in Docker

A multi-arch image (linux/amd64 + linux/arm64) is published to GHCR. It serves the eight tools over MCP Streamable HTTP; a bearer token is generated and printed on first start:

docker run -d --name mta -p 127.0.0.1:8765:8765 -v mta-data:/data \
  ghcr.io/gru-953/memorised-them-all:latest
docker logs mta          # copy the printed token + the ready-to-paste `claude mcp add …`

Ollama isn't bundled — set MTA_BACKEND_URL to a model server, or run fully offline (classical + hashing). The /data volume holds your memory; mount documents read-only (-v /path/to/docs:/docs:ro) and digest the in-container path.

💻 Platform support

Apple M-series is the primary, most-optimised target; other platforms use portable fallbacks.

Platform Status Notes
macOS (Apple silicon) ✅ optimised performance-core pool, MLX GPU Whisper, unified-memory-aware
macOS (Intel) ✅ supported physical-core sizing via psutil, CPU Whisper
Linux ✅ supported apt/dnf/pacman install paths, CUDA Whisper if a GPU is present
Windows 🧪 experimental pip install memorised-them-all + mta serve (or python launch.py). The .mcpb bundle is macOS/Linux only.

CI runs the test suite across Ubuntu, macOS, and Windows on Python 3.10 & 3.12.

📦 Generated files & reuse

Each project under MTA_HOME/projects/<name>/ is self-contained and portable:

File What it is
graph.json source of truth — nodes, edges, communities, layered summaries (version-stamped, no absolute paths)
memory.md compact, layered digest for reading / pasting
memory/<doc>.md one note per source document
mindmap.html offline interactive graph (Cytoscape inlined)
vectors.npz + vectors.json local embeddings for recall

A memory built once can be copied to another machine and reused read-only. export_memory bundles all of the above (including the vector store) into a folder you choose.

Versioning & migration: graph.json is a versioned schema (the project follows SemVer and Keep a Changelog). On upgrade, an older store is migrated in place so existing memories stay recall-readable; a store written by a newer build is backed up under projects/<name>/backups/ before anything overwrites it, so a downgrade never loses data. Public CLI flags and MCP tool signatures are preserved across minor versions.

🔒 Privacy & security

100% local — no cloud APIs, no telemetry, no API keys. Your documents, the graph, the embeddings, and the memory files never leave your machine. The only network access is (a) downloading open-source dependencies/models on install and (b) a throttled once-a-day dependency-update check (disable with MTA_AUTO_UPDATE=off).

Hardened for processing untrusted files: argv-only subprocesses (no curl | sh), path-safe and collision-free output names, per-file size + decompression-bomb caps, prompt-injection data-delimiting, allow_pickle=False, and hard-capped recall results. See the full threat model in the docs.

❓ FAQ

Does it really cost no tokens? Conversion and digestion cost zero Claude tokens. recall returns a small, hard-capped slice (a few summaries/facts), so answers are far cheaper than pasting documents into chat.

Do I need a GPU or downloaded models? No. The classical extractor and hashing embeddings keep the pipeline working with no models and offline; quality improves once Ollama and the models are present.

Where are my files? Under ~/.memorised-them-all/projects/<project>/. export_memory copies them anywhere.

Is my existing Ollama affected? No — a running Ollama is reused and left alone. Only an instance this tool starts is stopped on idle.

What's "fast mode"? --fast (or MTA_FAST=on) skips the local LLM for a fully deterministic, much faster digest (benchmarked ≈25–100× — ≈98× on a 5-file set, ≈26× on 12 files; scales with corpus and model) that still builds the graph and keeps semantic recall — ideal for large or frequently-updated corpora.

What if the answer isn't in my documents? Each recall result includes top_score and a low_confidence flag (and you can set MTA_RECALL_MIN_SCORE to drop weak hits), so Claude can say "that's not in your memory" instead of inventing an answer.

Does it work with non-English documents? Yes — entity resolution is Unicode-aware (Bengali, CJK, Cyrillic, accented Latin) and OCR supports many languages via MTA_OCR_LANG (e.g. eng+ben).

Which file types are supported? PDF (incl. scanned), DOCX, XLSX/XLS, PPTX, HTML, EPub, Outlook .msg, RTF, CSV/TSV, JSON/XML, Markdown/text, images (PNG/JPG/…), and audio (WAV/MP3/M4A/…).

✅ Quality & testing

Exercised hard: a multi-format corpus (Office, PDF, scanned PDF, OCR images, audio), a growing regression suite, green CI on three OSes, and repeated multi-agent + GitHub Copilot reviews covering accuracy, reliability, token-safety, reusability, cross-platform, and security. The token-free guarantee is enforced (recall slices are hard-capped) and the digest never returns document contents to the model. A committed offline eval harness (eval/run_eval.py) digests a reference corpus and gates retrieval recall@k in CI, so quality regressions fail the build.

🙏 Acknowledgements

Built on the shoulders of excellent open-source work — see ACKNOWLEDGEMENTS.md. In particular: Microsoft MarkItDown, Ollama, Tesseract, OpenAI Whisper / faster-whisper / Apple MLX, NetworkX, Leiden / igraph, and Cytoscape.js. Design inspiration from graphify and the author's own markitdown-mcp and mnemo-mcp.

📄 License

MIT © 2026 Aninda Sundar Howlader (GRU-953).


Keywords: Claude · MCP · Model Context Protocol · MCP server · Claude Desktop · Claude Code · local RAG · knowledge graph · GraphRAG · document memory · chat with your documents · token-free · offline · privacy · Ollama · MarkItDown · OCR · PDF to Markdown · Word/Excel/PowerPoint to Markdown · vector search · mind map · Apple silicon



If this is useful, a ⭐ helps others find it.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

memorised_them_all-1.5.0.tar.gz (319.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

memorised_them_all-1.5.0-py3-none-any.whl (314.8 kB view details)

Uploaded Python 3

File details

Details for the file memorised_them_all-1.5.0.tar.gz.

File metadata

  • Download URL: memorised_them_all-1.5.0.tar.gz
  • Upload date:
  • Size: 319.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for memorised_them_all-1.5.0.tar.gz
Algorithm Hash digest
SHA256 9678ac6f7151f0f5c1759c5bfe047c90699f3ff6087cde4d5c1512af0a5b81ed
MD5 17b021e8a0c642e328148ff95771329a
BLAKE2b-256 943ea99ee846407cb1d3648828d02cbb03c957ef63422be178bb48e9beb71ae2

See more details on using hashes here.

Provenance

The following attestation bundles were made for memorised_them_all-1.5.0.tar.gz:

Publisher: release.yml on GRU-953/memorised-them-all

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file memorised_them_all-1.5.0-py3-none-any.whl.

File metadata

File hashes

Hashes for memorised_them_all-1.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 eaa7b27b42a9506218e5559e85723a54c7e82329ddd3cced83670c33074e6c41
MD5 9be45ebda12c720d4f7be460c316bfa6
BLAKE2b-256 be8898ea8e8a981bdbac82c72b46b13a9f28c901e68a0e1df86293a220f4a8a8

See more details on using hashes here.

Provenance

The following attestation bundles were made for memorised_them_all-1.5.0-py3-none-any.whl:

Publisher: release.yml on GRU-953/memorised-them-all

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page