Generic markdown vault MCP server with FTS5 + semantic search

markdown-vault-mcp

A generic markdown collection MCP server with FTS5 full-text search, semantic vector search, frontmatter-aware indexing, incremental reindexing, and non-markdown attachment support.

Point it at a directory of Markdown files (an Obsidian vault, a docs folder, a Zettelkasten) and it exposes search, read, write, and edit tools over the Model Context Protocol.

Features

  • Full-text search: SQLite FTS5 with BM25 scoring and Porter stemming
  • Semantic search: cosine similarity over embedding vectors (Ollama, OpenAI, or Sentence Transformers)
  • Hybrid search: Reciprocal Rank Fusion combining FTS5 and vector results
  • Frontmatter-aware: indexes YAML frontmatter fields and supports required-field enforcement
  • Incremental reindexing: hash-based change detection that re-processes only modified files
  • Write operations: create, edit, delete, and rename documents with automatic index updates
  • Attachment support: read, write, delete, rename, and list non-markdown files (PDFs, images, etc.)
  • Git integration: optional periodic pull, plus auto-commit on every write with a deferred push authenticated via GIT_ASKPASS
  • OIDC authentication: optional token-based auth for HTTP deployments (Authelia, Keycloak, etc.)
  • MCP tools: 13 tools including search, read, write, edit, delete, rename, and admin operations
  • MCP resources: 6 resources exposing vault configuration, statistics, tags, folders, and document outlines
  • MCP prompts: 5 prompt templates for summarizing, researching, discussing, comparing, and finding related notes
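
Hybrid search fuses the keyword and vector rankings with Reciprocal Rank Fusion. The sketch below shows the general technique only; the function name, the constant k=60 (a common choice for RRF), and the sample paths are assumptions, not the server's actual implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document paths into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1).
    k=60 is an assumed constant, not a value taken from the project.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, path in enumerate(ranking, start=1):
            scores[path] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by path for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a keyword search and a vector search.
keyword_hits = ["notes/a.md", "notes/b.md", "notes/c.md"]
semantic_hits = ["notes/b.md", "notes/d.md", "notes/a.md"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Documents that appear near the top of both lists (here notes/b.md and notes/a.md) end up ahead of documents found by only one method.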

Installation

From PyPI

pip install markdown-vault-mcp

With optional dependencies:

pip install "markdown-vault-mcp[mcp]"             # FastMCP server
pip install "markdown-vault-mcp[embeddings-api]"  # Ollama/OpenAI embeddings via HTTP
pip install "markdown-vault-mcp[all]"             # MCP + API embeddings (lightweight, no PyTorch)
pip install "markdown-vault-mcp[all-local]"       # + sentence-transformers + PyTorch (large)

[all] vs [all-local]: The [all] extra is lightweight and does not include sentence-transformers or PyTorch. Use [all-local] if you want local CPU/GPU embeddings without Ollama. The Docker image uses [all].

From source

git clone https://github.com/pvliesdonk/markdown-vault-mcp.git
cd markdown-vault-mcp
pip install -e ".[all,dev]"

Docker

docker pull ghcr.io/pvliesdonk/markdown-vault-mcp:latest

The Docker image uses [all] (MCP + API embeddings). It does not include sentence-transformers or PyTorch — use Ollama or OpenAI for embeddings. For local sentence-transformers, build from source with [all-local].

Quick Start

As a library

from pathlib import Path
from markdown_vault_mcp import Collection

collection = Collection(source_dir=Path("/path/to/vault"))
results = collection.search("query text", limit=10)

As an MCP server

export MARKDOWN_VAULT_MCP_SOURCE_DIR=/path/to/vault
markdown-vault-mcp serve

With Docker Compose

  1. Copy an example env file:

    cp examples/obsidian-readonly.env .env
    
  2. Edit .env to set MARKDOWN_VAULT_MCP_SOURCE_DIR to the absolute path of your vault on the host.

  3. Start the service:

    docker compose up -d
    
  4. Check the logs:

    docker compose logs -f markdown-vault-mcp
    

Example env files

| File | Description |
|------|-------------|
| examples/obsidian-readonly.env | Obsidian vault, read-only, Ollama embeddings |
| examples/obsidian-readwrite.env | Obsidian vault, read-write with git auto-commit |
| examples/obsidian-oidc.env | Obsidian vault, read-only, OIDC authentication (Authelia) |
| examples/ifcraftcorpus.env | Strict frontmatter enforcement, read-only corpus |

For reverse proxy (Traefik) and deployment setup, see docs/deployment.md.

Configuration

All configuration is via environment variables with the MARKDOWN_VAULT_MCP_ prefix (except embedding provider settings, which use their own conventions).

Core

| Variable | Default | Required | Description |
|----------|---------|----------|-------------|
| MARKDOWN_VAULT_MCP_SOURCE_DIR | | Yes | Path to the markdown vault directory |
| MARKDOWN_VAULT_MCP_READ_ONLY | true | No | Set to false to enable write operations |
| MARKDOWN_VAULT_MCP_INDEX_PATH | in-memory | No | Path to the SQLite FTS5 index file; set for persistence across restarts |
| MARKDOWN_VAULT_MCP_EMBEDDINGS_PATH | disabled | No | Path to the numpy embeddings file; required to enable semantic search |
| MARKDOWN_VAULT_MCP_STATE_PATH | {SOURCE_DIR}/.markdown_vault_mcp/state.json | No | Path to the change-tracking state file |
| MARKDOWN_VAULT_MCP_INDEXED_FIELDS | | No | Comma-separated frontmatter fields to promote to the tag index for structured filtering |
| MARKDOWN_VAULT_MCP_REQUIRED_FIELDS | | No | Comma-separated frontmatter fields required on every document; documents missing any are excluded from the index |
| MARKDOWN_VAULT_MCP_EXCLUDE | | No | Comma-separated glob patterns to exclude from scanning (e.g. .obsidian/**,.trash/**) |
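
To make the conventions in this table concrete, here is a minimal sketch of how such variables could be read: a required source directory, a read-only flag that defaults to true, and comma-separated lists. The VaultSettings class, the load_settings and is_excluded helpers, and the use of fnmatch for glob matching are all hypothetical; the server's own parsing and glob semantics may differ.

```python
import fnmatch
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

PREFIX = "MARKDOWN_VAULT_MCP_"


def _split_csv(value: str) -> list[str]:
    """Split a comma-separated value, dropping empty items and stray spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass
class VaultSettings:  # hypothetical container, not the project's class
    source_dir: str
    read_only: bool = True
    required_fields: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)


def load_settings(env: Optional[Mapping[str, str]] = None) -> VaultSettings:
    env = os.environ if env is None else env
    source_dir = env.get(PREFIX + "SOURCE_DIR")
    if not source_dir:
        raise ValueError(PREFIX + "SOURCE_DIR is required")
    return VaultSettings(
        source_dir=source_dir,
        # Assumption: only the literal value "false" turns write access on.
        read_only=env.get(PREFIX + "READ_ONLY", "true").strip().lower() != "false",
        required_fields=_split_csv(env.get(PREFIX + "REQUIRED_FIELDS", "")),
        exclude=_split_csv(env.get(PREFIX + "EXCLUDE", "")),
    )


def is_excluded(relative_path: str, patterns: list[str]) -> bool:
    """Return True if the vault-relative path matches any exclude pattern."""
    return any(fnmatch.fnmatchcase(relative_path, pattern) for pattern in patterns)


settings = load_settings({
    PREFIX + "SOURCE_DIR": "/path/to/vault",
    PREFIX + "EXCLUDE": ".obsidian/**,.trash/**",
})
```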

Server identity

| Variable | Default | Description |
|----------|---------|-------------|
| MARKDOWN_VAULT_MCP_SERVER_NAME | markdown-vault-mcp | MCP server name shown to clients; useful for multi-instance setups |
| MARKDOWN_VAULT_MCP_INSTRUCTIONS | (auto) | System-level instructions injected into LLM context; defaults to a description that reflects read-only vs read-write state |

Search and embeddings

| Variable | Default | Description |
|----------|---------|-------------|
| EMBEDDING_PROVIDER | auto-detect | Embedding provider: ollama, openai, or sentence-transformers (not MARKDOWN_VAULT_MCP_-prefixed) |
| OLLAMA_HOST | http://localhost:11434 | Ollama server URL (not MARKDOWN_VAULT_MCP_-prefixed) |
| OPENAI_API_KEY | | OpenAI API key for the OpenAI embedding provider (not MARKDOWN_VAULT_MCP_-prefixed) |
| MARKDOWN_VAULT_MCP_OLLAMA_MODEL | nomic-embed-text | Ollama embedding model name |
| MARKDOWN_VAULT_MCP_OLLAMA_CPU_ONLY | false | Force Ollama to use CPU only |
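
Whichever provider produces the embeddings, semantic search ranks documents by cosine similarity between the query vector and each stored vector. The following is a pure-Python sketch of that ranking step with toy two-dimensional vectors; the function names are hypothetical, and the server itself stores embeddings in a numpy file rather than in plain lists.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: list[float], doc_vecs: dict[str, list[float]], limit: int = 10
) -> list[tuple[str, float]]:
    """Score every document against the query and return the best matches first."""
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]


# Toy vectors standing in for real embeddings.
doc_vecs = {
    "notes/a.md": [1.0, 0.0],
    "notes/b.md": [0.0, 1.0],
    "notes/c.md": [1.0, 1.0],
}
ranked = rank_by_similarity([1.0, 0.0], doc_vecs)
```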

Git integration

Git integration supports:

  • Periodic pull (ff-only): keeps the server's working tree up to date with the remote. Works in read-only mode.
  • Auto-commit + push on write: commits each MCP write and pushes after an idle delay. Requires MARKDOWN_VAULT_MCP_READ_ONLY=false.

| Variable | Default | Description |
|----------|---------|-------------|
| MARKDOWN_VAULT_MCP_GIT_PULL_INTERVAL_S | 600 | Seconds between git fetch + ff-only update attempts; 0 disables periodic pull |
| MARKDOWN_VAULT_MCP_GIT_TOKEN | | GitHub/GitLab PAT; when set, every write triggers a git commit and deferred push via GIT_ASKPASS |
| MARKDOWN_VAULT_MCP_GIT_PUSH_DELAY_S | 30 | Seconds of write-idle time before pushing; 0 = push only on shutdown |
| MARKDOWN_VAULT_MCP_GIT_COMMIT_NAME | markdown-vault-mcp | Git committer name for auto-commits; set this in Docker where git config user.name is empty |
| MARKDOWN_VAULT_MCP_GIT_COMMIT_EMAIL | noreply@markdown-vault-mcp | Git committer email for auto-commits |
| MARKDOWN_VAULT_MCP_GIT_LFS | true | Enable Git LFS: runs git lfs pull on startup to fetch LFS-tracked attachments (PDFs, images). Set to false for repos without LFS. |
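
The push delay behaves like a debounce: each write resets an idle timer, and the push happens only once no write has occurred for the configured number of seconds. The class below is a hypothetical illustration of that rule with an injected clock value; it is not the server's scheduler and performs no git operations.

```python
from typing import Optional


class PushScheduler:
    """Decides when a deferred push is due (illustrative only)."""

    def __init__(self, push_delay_s: float = 30.0) -> None:
        self.push_delay_s = push_delay_s
        self._last_write_at: Optional[float] = None

    def record_write(self, now: float) -> None:
        """Call after each committed write; restarts the idle timer."""
        self._last_write_at = now

    def should_push(self, now: float) -> bool:
        if self._last_write_at is None:
            return False  # nothing has been written since the last push
        if self.push_delay_s == 0:
            return False  # 0 means: push only on shutdown
        return now - self._last_write_at >= self.push_delay_s

    def mark_pushed(self) -> None:
        self._last_write_at = None


scheduler = PushScheduler(push_delay_s=30)
scheduler.record_write(now=100.0)
scheduler.record_write(now=120.0)        # a second write restarts the timer
due_early = scheduler.should_push(now=140.0)
due_later = scheduler.should_push(now=150.0)
```

Batching pushes this way keeps a burst of edits from producing one network round trip per write, while each write still gets its own commit.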

Attachments

Non-markdown file support. See Attachments for details.

| Variable | Default | Description |
|----------|---------|-------------|
| MARKDOWN_VAULT_MCP_ATTACHMENT_EXTENSIONS | (built-in list) | Comma-separated allowed extensions without dot (e.g. pdf,png,jpg); use * to allow all non-.md files |
| MARKDOWN_VAULT_MCP_MAX_ATTACHMENT_SIZE_MB | 10.0 | Maximum attachment size in MB for reads and writes; 0 disables the limit |

OIDC authentication

Optional token-based authentication for HTTP deployments. OIDC activates when all four required variables are set. See Authentication for setup details.

| Variable | Required | Description |
|----------|----------|-------------|
| MARKDOWN_VAULT_MCP_BASE_URL | Yes | Public base URL of the server (e.g. https://mcp.example.com) |
| MARKDOWN_VAULT_MCP_OIDC_CONFIG_URL | Yes | OIDC discovery endpoint (e.g. https://auth.example.com/.well-known/openid-configuration) |
| MARKDOWN_VAULT_MCP_OIDC_CLIENT_ID | Yes | OIDC client ID registered with your provider |
| MARKDOWN_VAULT_MCP_OIDC_CLIENT_SECRET | Yes | OIDC client secret |
| MARKDOWN_VAULT_MCP_OIDC_JWT_SIGNING_KEY | No | JWT signing key; required on Linux/Docker; the default is ephemeral and invalidates tokens on restart. Generate with openssl rand -hex 32 |
| MARKDOWN_VAULT_MCP_OIDC_AUDIENCE | No | Expected JWT audience claim; leave unset if your provider does not set one |
| MARKDOWN_VAULT_MCP_OIDC_REQUIRED_SCOPES | No | Comma-separated required scopes; default openid |

CLI Reference

markdown-vault-mcp <command> [options]

serve

Start the MCP server.

markdown-vault-mcp serve [--transport {stdio|sse|http}] [--host HOST] [--port PORT]

| Flag | Default | Description |
|------|---------|-------------|
| --transport | stdio | MCP transport: stdio (stdin/stdout, default), sse (Server-Sent Events), http (streamable-HTTP). Use http for Docker with a reverse proxy or when OIDC is enabled. |
| --host | 0.0.0.0 | Bind host for the http transport (ignored for stdio and sse) |
| --port | 8000 | Port for the http transport (ignored for stdio and sse) |

index

Build the full-text search index.

markdown-vault-mcp index [--source-dir PATH] [--index-path PATH] [--force]
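
For readers unfamiliar with FTS5, the sketch below shows what a Porter-stemmed, BM25-ranked index looks like using Python's built-in sqlite3 module. It assumes your SQLite build includes FTS5 (most do). The table name, column names, and helper functions are hypothetical and do not reflect the schema the index command actually creates.

```python
import sqlite3


def build_index(docs: dict[str, str]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table and load {path: body} pairs into it."""
    conn = sqlite3.connect(":memory:")
    # 'porter' wraps the default tokenizer with Porter stemming.
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body, tokenize='porter')"
    )
    conn.executemany("INSERT INTO docs (path, body) VALUES (?, ?)", docs.items())
    return conn


def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[str]:
    """Return matching paths, best BM25 match first."""
    rows = conn.execute(
        # bm25() returns lower (more negative) values for better matches,
        # so ascending order puts the best hit first.
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [path for (path,) in rows]


conn = build_index({
    "notes/jogging.md": "She runs every morning before work.",
    "notes/cooking.md": "A recipe for tomato soup.",
})
hits = keyword_search(conn, "running")
```

Because of stemming, the query "running" matches the document containing "runs": both reduce to the stem "run".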

search

Search the collection from the CLI.

markdown-vault-mcp search <query> [-n LIMIT] [-m {keyword|semantic|hybrid}] [--folder PATH] [--json]

reindex

Incrementally reindex the vault (only processes changed files).

markdown-vault-mcp reindex [--source-dir PATH] [--index-path PATH]
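
Incremental reindexing relies on hash-based change detection: the hashes recorded at the last run are compared with the current ones, and only files whose hash changed are re-processed. The sketch below illustrates the idea; the choice of SHA-256, the function names, and the in-memory dictionaries (standing in for the state file) are assumptions.

```python
import hashlib
from pathlib import Path


def hash_files(root: Path) -> dict[str, str]:
    """Map each Markdown file (vault-relative path) to a digest of its content."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.md"))
    }


def diff_state(
    previous: dict[str, str], current: dict[str, str]
) -> tuple[list[str], list[str], list[str]]:
    """Return (added, modified, deleted) paths between two hash snapshots."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(
        path for path in current.keys() & previous.keys() if current[path] != previous[path]
    )
    return added, modified, deleted


# Snapshots with made-up digests, to show the comparison on its own.
changes = diff_state(
    {"a.md": "hash-1", "b.md": "hash-2"},
    {"b.md": "hash-3", "c.md": "hash-4"},
)
```

Only the added and modified paths need to be re-chunked and re-indexed; deleted paths only need their index entries removed.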

MCP Tools

| Tool | Description |
|------|-------------|
| search | Hybrid full-text + semantic search with optional frontmatter filters |
| read | Read a document or attachment by relative path |
| write | Create or overwrite a document or attachment |
| edit | Replace a unique text span in a document (notes only) |
| delete | Delete a document or attachment and its index entries |
| rename | Rename/move a document or attachment, updating all index entries |
| list_documents | List indexed documents; pass include_attachments=true to also list non-markdown files |
| list_folders | List all folder paths in the vault |
| list_tags | List all unique frontmatter tag values |
| reindex | Force a full reindex of the vault |
| stats | Get collection statistics (document count, chunk count, etc.) |
| build_embeddings | Build or rebuild vector embeddings for semantic search |
| embeddings_status | Check embedding provider and index status |

Write tools (write, edit, delete, rename) are only available when MARKDOWN_VAULT_MCP_READ_ONLY=false.
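
The edit tool replaces a unique text span, which implies the span must identify exactly one location in the document. One plausible way to enforce that is sketched below; the function name and the error behavior for missing or repeated spans are assumptions, not the tool's documented contract.

```python
def replace_unique(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once in `text`."""
    if not old:
        raise ValueError("the span to replace must not be empty")
    count = text.count(old)
    if count == 0:
        raise ValueError("span not found")
    if count > 1:
        raise ValueError(f"span is ambiguous: {count} matches")
    return text.replace(old, new, 1)


note = "# Intro\n\nFirst draft of the intro.\n"
updated = replace_unique(note, "First draft", "Second draft")
```

Rejecting ambiguous spans protects against an edit landing in the wrong place when the same phrase appears more than once.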

Resources

MCP resources expose vault metadata as structured JSON that clients can read directly without invoking tools.

| URI | Description |
|-----|-------------|
| config://vault | Current collection configuration (source dir, indexed fields, read-only state, etc.) |
| stats://vault | Collection statistics (document count, chunk count, embedding count, etc.) |
| tags://vault | All frontmatter tag values grouped by indexed field |
| tags://vault/{field} | Tag values for a specific indexed frontmatter field (template) |
| folders://vault | All folder paths in the vault |
| toc://vault/{path} | Table of contents (heading outline) for a specific document (template) |

Prompts

Prompt templates guide the LLM through multi-step workflows using the vault tools.

| Prompt | Parameters | Description |
|--------|------------|-------------|
| summarize | path | Read a document and produce a structured summary with key themes and takeaways |
| research | topic | Search for a topic, synthesize findings, and create a new note at research/{topic}.md |
| discuss | path | Analyze a document and suggest improvements using edit (not write) |
| related | path | Find related notes via search and suggest cross-references as markdown links |
| compare | path1, path2 | Read two documents and produce a side-by-side comparison |

Write prompts (research, discuss) are only available when MARKDOWN_VAULT_MCP_READ_ONLY=false.

Attachments

In addition to Markdown notes, the server can read, write, delete, rename, and list non-markdown files (PDFs, images, spreadsheets, etc.). All existing tools are overloaded — no new tool names.

How it works

Path dispatch is extension-based: a path ending in .md is treated as a note; any other path is treated as an attachment if the extension is in the allowlist. The kind field on returned objects distinguishes the two: "note" or "attachment".
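
The dispatch rule above can be sketched as a small classifier. The function name, the case-insensitive comparison, and the None return value for disallowed extensions are assumptions made for illustration; the allowlist shown is a subset of the defaults.

```python
from pathlib import PurePosixPath
from typing import Optional

ALLOWED = {"pdf", "png", "jpg"}  # illustrative subset of the default allowlist


def classify(path: str, allowed: set[str]) -> Optional[str]:
    """Return "note", "attachment", or None if the extension is not allowed."""
    # Assumption: extensions are compared case-insensitively.
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    if extension == "md":
        return "note"
    if "*" in allowed or extension in allowed:
        return "attachment"  # "*" allows every non-.md file
    return None


kinds = {path: classify(path, ALLOWED) for path in (
    "notes/intro.md",
    "assets/diagram.pdf",
    "assets/setup.exe",
)}
```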

Reading attachments

read returns base64-encoded content for binary attachments:

{
  "path": "assets/diagram.pdf",
  "mime_type": "application/pdf",
  "size_bytes": 12345,
  "content_base64": "<base64 string>",
  "modified_at": 1741564800.0
}

Writing attachments

write accepts a content_base64 parameter for binary content:

{ "path": "assets/diagram.pdf", "content_base64": "<base64 string>" }
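
On the client side, binary content has to be base64-encoded before it is passed to write and decoded after it comes back from read. A minimal sketch using only the standard library follows; the helper names are hypothetical, and the field names are the ones shown in the JSON examples above.

```python
import base64


def build_write_arguments(path: str, data: bytes) -> dict[str, str]:
    """Build the arguments for a write call that uploads binary content."""
    return {"path": path, "content_base64": base64.b64encode(data).decode("ascii")}


def decode_read_result(result: dict) -> bytes:
    """Recover the raw bytes from a read result for a binary attachment."""
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(result["content_base64"], validate=True)


arguments = build_write_arguments("assets/diagram.pdf", b"hello")
round_tripped = decode_read_result(arguments)
```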

Listing attachments

list_documents with include_attachments=true returns both notes and attachments:

[
  { "path": "notes/intro.md", "kind": "note", "title": "Intro", "folder": "notes", "frontmatter": {}, "modified_at": 1741564800.0 },
  { "path": "assets/diagram.pdf", "kind": "attachment", "folder": "assets", "mime_type": "application/pdf", "size_bytes": 12345, "modified_at": 1741564800.0 }
]

Default allowed extensions

pdf, docx, xlsx, pptx, odt, ods, odp, png, jpg, jpeg, gif, webp, svg, bmp, tiff, zip, tar, gz, mp3, mp4, wav, ogg, txt, csv, tsv, json, yaml, toml, xml, html, css, js, ts

Override with MARKDOWN_VAULT_MCP_ATTACHMENT_EXTENSIONS. Use * to allow all non-.md files.

Hidden directories: Attachments inside hidden directories (.git/, .obsidian/, .markdown_vault_mcp/, etc.) are never listed, regardless of extension settings. MARKDOWN_VAULT_MCP_EXCLUDE patterns are also applied to attachments.

Authentication

OIDC authentication is optional and activates automatically when all four required variables (BASE_URL, OIDC_CONFIG_URL, OIDC_CLIENT_ID, OIDC_CLIENT_SECRET) are set.

OIDC requires --transport http (or sse). It has no effect with --transport stdio.
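
The activation rule (all four required variables set) is easy to check before starting the server. The helper below is a hypothetical pre-flight check, not part of the package; it treats an empty value as unset, which is an assumption.

```python
import os
from typing import Mapping, Optional

PREFIX = "MARKDOWN_VAULT_MCP_"
REQUIRED_OIDC_VARS = ("BASE_URL", "OIDC_CONFIG_URL", "OIDC_CLIENT_ID", "OIDC_CLIENT_SECRET")


def missing_oidc_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List the required OIDC variables that are unset or empty."""
    env = os.environ if env is None else env
    return [PREFIX + name for name in REQUIRED_OIDC_VARS if not env.get(PREFIX + name)]


def oidc_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """OIDC is active only when none of the four required variables is missing."""
    return not missing_oidc_vars(env)


partial_env = {
    PREFIX + "BASE_URL": "https://mcp.example.com",
    PREFIX + "OIDC_CLIENT_ID": "markdown-vault-mcp",
}
missing = missing_oidc_vars(partial_env)
```

A check like this makes a partially configured deployment visible, since a single missing variable leaves the server running without authentication.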

Setup with Authelia

Note: Authelia does not support Dynamic Client Registration (RFC 7591). Clients must be registered manually in configuration.yml.

  1. Register the client in Authelia:

    identity_providers:
      oidc:
        clients:
          - client_id: markdown-vault-mcp
            client_secret: '$pbkdf2-sha512$...'   # authelia crypto hash generate
            redirect_uris:
              - https://mcp.example.com/auth/callback
            grant_types: [authorization_code]
            response_types: [code]
            pkce_challenge_method: S256
            scopes: [openid, profile, email]
    
  2. Set the environment variables (see also examples/obsidian-oidc.env):

    MARKDOWN_VAULT_MCP_BASE_URL=https://mcp.example.com
    MARKDOWN_VAULT_MCP_OIDC_CONFIG_URL=https://auth.example.com/.well-known/openid-configuration
    MARKDOWN_VAULT_MCP_OIDC_CLIENT_ID=markdown-vault-mcp
    MARKDOWN_VAULT_MCP_OIDC_CLIENT_SECRET=your-client-secret
    # Generate once with: openssl rand -hex 32, then paste the value here
    MARKDOWN_VAULT_MCP_OIDC_JWT_SIGNING_KEY=your-generated-hex-key
    
  3. Start with HTTP transport:

    markdown-vault-mcp serve --transport http --port 8000
    

JWT signing key

The FastMCP default signing key is ephemeral (regenerated on startup), which forces clients to re-authenticate after every restart. Set MARKDOWN_VAULT_MCP_OIDC_JWT_SIGNING_KEY to a stable random secret to avoid this:

# Generate once, store in your .env file
openssl rand -hex 32
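
If openssl is not available, Python's standard secrets module produces an equivalent value: 32 random bytes rendered as 64 hexadecimal characters. The function name below is only illustrative.

```python
import secrets


def generate_signing_key(num_bytes: int = 32) -> str:
    """Return a cryptographically strong random key as a hex string."""
    return secrets.token_hex(num_bytes)


key = generate_signing_key()
```

As with the openssl command, generate the key once and store it; generating a new key at every start would reintroduce the re-authentication problem.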

Development

git clone https://github.com/pvliesdonk/markdown-vault-mcp.git
cd markdown-vault-mcp
uv pip install -e ".[all,dev]"

# Run tests
uv run python -m pytest tests/ -x -q

# Lint and format
ruff check src/ tests/
ruff format src/ tests/

# Type check
mypy src/

License

MIT
