Semantic RAG for Obsidian vaults via MCP
Archiver RAG
An agent-agnostic memory management system built on Obsidian-style techniques
Archiver RAG turns your Obsidian vault into a live, queryable knowledge graph that any MCP-compatible AI agent can search, update, and reorganize — without ever leaving its native interface.
Connect it once. Every agent you use (Claude Code, Cursor, Gemini CLI, or your own) gets semantic search, automatic knowledge logging, wikilink-aware graph traversal, and vault health monitoring out of the box.
How it works
Your Obsidian vault (.md files)
↓ file watcher + ingest pipeline
ChromaDB (persistent vector store)
↓ MCP server
Any MCP-compatible agent
Three layers make search smarter than plain embeddings:
- Contextual prefix — each chunk is embedded with its note's metadata (folder, tags, wikilinks), so vectors carry structural context
- Rich metadata filtering — ChromaDB stores folder, tags, incoming link count, and wikilinks for filtered retrieval
- Graph reranking — after vector search, results are re-scored by wikilink proximity to a context note and hub importance
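As a rough illustration of the first two layers, the sketch below builds the text that would be embedded and the metadata that would be stored next to it. The `NoteMeta` class, the bracketed header format, and the comma-joined metadata values are assumptions made for this example, not the project's actual internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NoteMeta:
    """Hypothetical container for the metadata attached to a note."""
    title: str
    folder: str
    tags: list[str] = field(default_factory=list)
    wikilinks: list[str] = field(default_factory=list)


def contextual_text(meta: NoteMeta, chunk: str) -> str:
    """Prepend a one-line metadata header so the embedding sees structural context."""
    parts = [f"note: {meta.title}", f"folder: {meta.folder}"]
    if meta.tags:
        parts.append("tags: " + ", ".join(meta.tags))
    if meta.wikilinks:
        parts.append("links: " + ", ".join(meta.wikilinks))
    return "[" + " | ".join(parts) + "]\n" + chunk


def chunk_metadata(meta: NoteMeta, incoming_links: int) -> dict[str, str | int]:
    """Flatten metadata to scalar values (an assumed storage layout for filtering)."""
    return {
        "folder": meta.folder,
        "tags": ",".join(meta.tags),
        "wikilinks": ",".join(meta.wikilinks),
        "incoming_links": incoming_links,
    }
```

The string returned by `contextual_text` is what gets embedded, while the dictionary from `chunk_metadata` travels with the vector so that a query can be restricted to, say, one folder.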
The file watcher runs as a background service. Edit a note in Obsidian, save it, and it's indexed and auto-linked within seconds — no manual sync needed.
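One way to picture the change detection is a snapshot-and-diff loop over modification times. This is only a standard-library sketch under that assumption; the real watcher may rely on OS file events instead, and the function names here are hypothetical.

```python
from pathlib import Path


def snapshot(vault: Path) -> dict[str, float]:
    """Record the modification time of every Markdown file under the vault."""
    return {str(p.relative_to(vault)): p.stat().st_mtime for p in vault.rglob("*.md")}


def diff(old: dict[str, float], new: dict[str, float]) -> tuple[list[str], list[str]]:
    """Return (notes to re-ingest, notes to drop from the index)."""
    changed = sorted(path for path, mtime in new.items() if old.get(path) != mtime)
    removed = sorted(path for path in old if path not in new)
    return changed, removed
```

A service loop would call `snapshot` every few seconds, pass the previous and current results to `diff`, re-ingest the changed notes, and delete the chunks of removed ones.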
Features
- Semantic search with graph reranking — finds notes by meaning, then boosts results connected via wikilinks
- Auto-linking — after every ingest, appends a `## Related` section with `[[wikilinks]]` to build the knowledge graph automatically
- Knowledge logging — create dated, categorized notes (`decision`, `lesson`, `gotcha`, `pattern`, …) from any agent
- Vault health — single call returns orphaned notes, broken links, missing frontmatter, tag stats, and recent activity
- Wikilink-aware reorganization — move files and every `[[link]]` across the vault is rewritten automatically
- Smart clustering — label-propagation algorithm groups notes by wikilink structure and suggests folder organization
- Agent-agnostic — exposes a standard MCP interface; works with any MCP-compatible client
Requirements
- Python >= 3.10
- pipx (recommended for installation)
- An Obsidian vault (local `.md` files)
- An MCP-compatible agent (Claude Code, Cursor, etc.)
Installation
pipx install --editable .
Use `pipx`, not `pip install -e .`: pipx creates an isolated environment and exposes the CLI globally on `PATH`, which is required for MCP registration to find the correct executable.
Setup
Run the one-time setup wizard:
archiver-rag init
This will:
- Ask for your vault path
- Index your vault into ChromaDB
- Register the MCP server in `~/.claude.json` (or prompt you to do it manually for other clients)
- Install the background watcher as a launchd agent (Mac) or systemd service (Linux)
MCP registration (manual)
If you prefer to register manually, add this to your MCP client config:
{
  "mcpServers": {
    "archiver-rag": {
      "command": "/path/to/archiver-rag",
      "args": ["serve"]
    }
  }
}
Find the executable path with `which archiver-rag`.
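If you would rather generate the entry than edit it by hand, a small script can resolve the executable and emit the same JSON structure. This is a convenience sketch; the helper names are hypothetical and it only prints the snippet, leaving the client config file untouched.

```python
import json
import shutil


def mcp_entry(executable: str) -> dict:
    """Build the registration block shown above for a given executable path."""
    return {"mcpServers": {"archiver-rag": {"command": executable, "args": ["serve"]}}}


def current_entry() -> dict:
    """Resolve archiver-rag on PATH, mirroring what `which archiver-rag` reports."""
    path = shutil.which("archiver-rag")
    if path is None:
        raise FileNotFoundError("archiver-rag not found on PATH; install it with pipx first")
    return mcp_entry(path)
```

Calling `print(json.dumps(current_entry(), indent=2))` prints a block ready to paste into the MCP client config.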
For Claude Code specifically, use:
claude mcp add --scope user archiver-rag "$(which archiver-rag)" serve
Agent instructions (skills)
Registering the MCP server gives an agent access to the tools — but agents tend to fall back on their own internal memory instead of reaching for the vault. The instruction files in skill/ fix that: they enforce a vault-first rule so the agent searches and stores knowledge in your vault before anything else.
What the skill enforces:
- Before answering or reading source files — call `search_vault` first; only fall back to internal memory if the vault returns nothing relevant
- When something important is missing from the vault — proactively `log_note` it. If a fact, decision, or piece of context matters to the overall picture and a `search_vault` came back empty, record it so the knowledge graph grows instead of letting that context die in a single session
- After solving a non-trivial problem — call `log_note` to capture the decision/lesson/gotcha back into the vault
- The vault is the authoritative memory system — internal agent memory is a fallback only
A version is provided for each agent, since each loads instructions differently:
| Agent | File | Install to |
|---|---|---|
| Claude Code | `skill/claude-code/SKILL.md` | `~/.claude/skills/archiver-rag/SKILL.md` (on-demand skill) |
| OpenCode | `skill/opencode/AGENTS.md` | project root `AGENTS.md` or `~/.config/opencode/AGENTS.md` |
| Codex CLI | `skill/codex/AGENTS.md` | project root `AGENTS.md` or `~/.codex/AGENTS.md` |
| GitHub Copilot | `skill/copilot/copilot-instructions.md` | `.github/copilot-instructions.md` |
Each file is self-contained — it includes the MCP registration snippet for that agent plus the full vault-first rules and tool reference. For Claude Code the file is an on-demand skill; for the others it's an always-on instruction file (loaded into every session), which makes the vault-first behavior unconditional.
CLI reference
archiver-rag init # one-time setup wizard
archiver-rag start # start the background watcher service
archiver-rag stop # stop the service
archiver-rag restart # restart the service
archiver-rag status # check if service is running
archiver-rag index # force re-index the entire vault
archiver-rag search "query" # test semantic search from the terminal
archiver-rag health # chunk count and index peek
archiver-rag logs # tail the service log
# Knowledge logging
archiver-rag log "Title" --type decision --tag arch --related NoteA
# Clustering
archiver-rag cluster # suggest folder groupings
archiver-rag cluster --apply # move files automatically
archiver-rag place <note> # suggest folder for a single note
archiver-rag place <note> --apply # move it immediately
# Config
archiver-rag config --auto-cluster # enable auto-clustering in the watcher
archiver-rag config --cluster-threshold 5 # notes before a full re-cluster
archiver-rag uninstall # remove all data, service, and MCP registration
MCP tools (for agents)
Once registered, agents have access to 7 tools:
| Tool | What it does |
|---|---|
| `search_vault` | Semantic search with graph reranking. Accepts a `context_note` to boost wikilink neighbors. |
| `vault_status` | Vault structure, health diagnostics, tag stats, and recent activity in one call. |
| `get_connections` | BFS wikilink traversal — outgoing and incoming links up to depth 3. |
| `move_notes` | Move files and auto-rewrite all `[[wikilinks]]` across the vault. |
| `log_note` | Create a dated knowledge note; watcher indexes and auto-links it immediately. |
| `cluster_note` | Suggest a folder for one note based on where its wikilink neighbors live. |
| `cluster_vault` | Label-propagation clustering of the entire vault with folder suggestions. |
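For readers unfamiliar with label propagation, the following is a small deterministic sketch of the technique applied to an undirected wikilink graph, followed by a majority-vote folder suggestion per cluster. The tie-breaking rule, the round limit, and the function names are assumptions for illustration and may differ from what `cluster_vault` does internally.

```python
from collections import Counter


def label_propagation(graph: dict[str, set[str]], max_rounds: int = 20) -> dict[str, str]:
    """Each note repeatedly adopts the most common label among its neighbors."""
    labels = {node: node for node in graph}  # every note starts in its own cluster
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed order keeps the result reproducible
            counts = Counter(labels[n] for n in graph[node] if n in labels)
            if not counts:
                continue  # isolated note keeps its own label
            top = max(counts.values())
            best = min(label for label, c in counts.items() if c == top)  # assumed tie-break
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def suggest_folders(labels: dict[str, str], folder_of: dict[str, str]) -> dict[str, str]:
    """Suggest, for each cluster label, the folder most of its members already live in."""
    by_cluster: dict[str, list[str]] = {}
    for note, label in labels.items():
        by_cluster.setdefault(label, []).append(folder_of[note])
    return {label: Counter(folders).most_common(1)[0][0] for label, folders in by_cluster.items()}
```

Notes whose suggested folder differs from their current one are the candidates a clustering pass would propose moving.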
Configuration
All runtime config lives at ~/.archiver-rag/config.json:
{
  "vault_path": "/path/to/your/vault",
  "install_path": "/Users/you/.archiver-rag",
  "chroma_path": "/Users/you/.archiver-rag/chroma_db",
  "auto_cluster": false,
  "cluster_threshold": 5
}
auto_cluster — automatically suggest and apply folder placement for new notes via the watcher.
cluster_threshold — number of new notes created before triggering a full cluster_vault run.
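A script that reads this file might look like the sketch below. Treating `false` and `5` as defaults, requiring `vault_path`, and triggering a re-cluster once the count reaches the threshold are all assumptions drawn from the sample above, not documented behavior.

```python
import json
from pathlib import Path

# Assumed defaults, taken from the sample config above.
DEFAULTS = {"auto_cluster": False, "cluster_threshold": 5}


def merge_config(data: dict) -> dict:
    """Overlay user settings on the assumed defaults; vault_path is treated as required."""
    if "vault_path" not in data:
        raise KeyError("vault_path is missing from the config")
    return {**DEFAULTS, **data}


def load_config(path: Path = Path.home() / ".archiver-rag" / "config.json") -> dict:
    """Read and merge the runtime config from disk."""
    return merge_config(json.loads(path.read_text(encoding="utf-8")))


def should_recluster(cfg: dict, new_notes: int) -> bool:
    """Assumed rule: re-cluster once enough new notes arrive and auto-clustering is on."""
    return bool(cfg["auto_cluster"]) and new_notes >= cfg["cluster_threshold"]
```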
The knowledge graph model
The vault is treated as a knowledge graph, not a file hierarchy. Notes are nodes; wikilinks are edges. Relationships range from tight (direct links) to loose (semantic proximity surfaced by search).
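The nodes-and-edges view can be sketched in a few lines: extract wikilink targets from each note, then walk outgoing and incoming links breadth-first up to a depth limit. The regular expression and function names are assumptions for this example. It handles the common `[[Note]]`, `[[Note|alias]]`, and `[[Note#Heading]]` forms and does not claim to cover every Obsidian link variant.

```python
import re
from collections import deque

# Captures the note name; an optional #heading and |alias are matched but discarded.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def build_graph(notes: dict[str, str]) -> dict[str, set[str]]:
    """Map each note name to the set of notes it links to (outgoing edges)."""
    return {
        name: {t.strip() for t in WIKILINK.findall(body) if t.strip() != name}
        for name, body in notes.items()
    }


def connections(outgoing: dict[str, set[str]], start: str, depth: int = 3) -> dict[str, int]:
    """Hop count to every note reachable via outgoing or incoming links within `depth`."""
    incoming: dict[str, set[str]] = {}
    for src, targets in outgoing.items():
        for target in targets:
            incoming.setdefault(target, set()).add(src)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue
        for nxt in sorted(outgoing.get(node, set()) | incoming.get(node, set())):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen
```

Direct links show up at hop count 1, which corresponds to the tight end of the relationship range described above.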
Note types are expressed through frontmatter, not folder structure:
---
type: decision
tags: [architecture, async]
related: [[AsyncLocalStorage]], [[PrismaExtensions]]
date: 2026-04-27
---
The ## Related section at the bottom of each note is managed automatically by the auto-linker after every ingest. Don't edit it manually — it will be overwritten.
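The overwrite behavior can be pictured as a replace-or-append step: drop any existing `## Related` block, then write a fresh one at the end of the note. The bullet layout and the `upsert_related` helper are assumptions for illustration; the auto-linker's real output format may differ.

```python
import re

# Matches a "## Related" heading and everything up to the next H2 heading or end of text.
RELATED_RE = re.compile(r"^## Related[ \t]*$.*?(?=^## |\Z)", re.MULTILINE | re.DOTALL)


def upsert_related(body: str, targets: list[str]) -> str:
    """Replace the managed section with links to `targets`, keeping the rest of the note."""
    stripped = RELATED_RE.sub("", body).rstrip()
    if not targets:
        return stripped + "\n"
    links = "\n".join(f"- [[{target}]]" for target in targets)
    return f"{stripped}\n\n## Related\n{links}\n"
```

Because the whole section is regenerated on each pass, running the function twice with the same targets leaves the note unchanged, and any manual edits inside the section are lost.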
Roadmap
Features on the way:
- RAG-Anything integration — extend ingestion beyond Markdown to handle PDFs, Office documents, images, and other file types, so the vault can become a true multi-format knowledge base rather than `.md`-only.
- Archiver subagents — dedicated subagents that take over vault management (search, logging, reorganization, clustering) on the main agent's behalf, so the primary agent can delegate knowledge work instead of context-switching into it.
License
MIT