Semantic RAG for Obsidian vaults via MCP

Archiver RAG

An agent-agnostic memory management system built on Obsidian-style techniques

Archiver RAG turns your Obsidian vault into a live, queryable knowledge graph that any MCP-compatible AI agent can search, update, and reorganize — without ever leaving its native interface.

Connect it once. Every agent you use (Claude Code, Cursor, Gemini CLI, or your own) gets semantic search, automatic knowledge logging, wikilink-aware graph traversal, and vault health monitoring out of the box.


How it works

Your Obsidian vault (.md files)
         ↓  file watcher + ingest pipeline
     ChromaDB  (persistent vector store)
         ↓  MCP server
     Any MCP-compatible agent

Three layers make search smarter than plain embeddings:

  1. Contextual prefix — each chunk is embedded with its note's metadata (folder, tags, wikilinks), so vectors carry structural context
  2. Rich metadata filtering — ChromaDB stores folder, tags, incoming link count, and wikilinks for filtered retrieval
  3. Graph reranking — after vector search, results are re-scored by wikilink proximity to a context note and hub importance
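
The first and third layers can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual code: the function names, the prefix layout, and the weights (0.3 for proximity, 0.1 for hub importance) are all assumptions made for the example. `distances` is assumed to hold the wikilink hop count from the context note to each candidate note.

```python
def contextual_prefix(folder, tags, wikilinks):
    """Build a metadata header that is prepended to a chunk before embedding."""
    return (
        f"folder: {folder}\n"
        f"tags: {', '.join(tags)}\n"
        f"links: {', '.join(wikilinks)}\n"
    )


def embed_text(chunk, folder, tags, wikilinks):
    """Text that would be sent to the embedding model instead of the bare chunk."""
    return contextual_prefix(folder, tags, wikilinks) + "\n" + chunk


def rerank(hits, distances, incoming_links, proximity_weight=0.3, hub_weight=0.1):
    """Re-score (note, similarity) pairs from vector search.

    distances: {note: hops from the context note}; missing means "not nearby".
    incoming_links: {note: incoming link count}, used as a hub-importance signal.
    The weights are hypothetical values chosen for illustration.
    """
    max_incoming = max(incoming_links.values(), default=0) or 1
    scored = []
    for note, similarity in hits:
        hops = distances.get(note)
        proximity = 0.0 if hops is None else 1.0 / (1 + hops)
        hub = incoming_links.get(note, 0) / max_incoming
        score = similarity + proximity_weight * proximity + hub_weight * hub
        scored.append((note, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With this scoring, a note one hop from the context note can overtake a slightly more similar note that has no link path to it, which is the behavior the reranking layer describes.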

The file watcher runs as a background service. Edit a note in Obsidian, save it, and it's indexed and auto-linked within seconds — no manual sync needed.


Features

  • Semantic search with graph reranking — finds notes by meaning, then boosts results connected via wikilinks
  • Auto-linking — after every ingest, appends a ## Related section with [[wikilinks]] to build the knowledge graph automatically
  • Knowledge logging — create dated, categorized notes (decision, lesson, gotcha, pattern, …) from any agent
  • Vault health — single call returns orphaned notes, broken links, missing frontmatter, tag stats, and recent activity
  • Wikilink-aware reorganization — move files and every [[link]] across the vault is rewritten automatically
  • Smart clustering — label-propagation algorithm groups notes by wikilink structure and suggests folder organization
  • Agent-agnostic — exposes a standard MCP interface; works with any MCP-compatible client
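
To make the smart clustering feature more concrete, here is a minimal label-propagation sketch over a wikilink graph. It is a generic version of the algorithm, not Archiver RAG's implementation: the tie-breaking rule, the round limit, and the majority-folder suggestion step are assumptions for the example.

```python
import random
from collections import Counter


def label_propagation(graph, max_rounds=20, seed=0):
    """Group nodes of an undirected graph {node: [neighbors]}; returns {node: label}."""
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # every note starts in its own cluster
    nodes = list(graph)
    for _ in range(max_rounds):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            neighbors = [n for n in graph[node] if n in labels]
            if not neighbors:
                continue
            counts = Counter(labels[n] for n in neighbors)
            top = max(counts.values())
            # Break ties deterministically by taking the smallest label
            best = min(label for label, count in counts.items() if count == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def suggest_folders(labels, current_folder):
    """Suggest, for each note, the folder most notes in its cluster already use."""
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, []).append(node)
    suggestions = {}
    for members in clusters.values():
        folder = Counter(current_folder[m] for m in members).most_common(1)[0][0]
        for member in members:
            suggestions[member] = folder
    return suggestions
```

Notes that link densely to each other converge on a shared label, so a stray note sitting in the wrong folder gets a suggestion to join its neighbors.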

Requirements

  • Python >= 3.10
  • pipx (recommended for installation)
  • An Obsidian vault (local .md files)
  • An MCP-compatible agent (Claude Code, Cursor, etc.)

Installation

pipx install --editable .

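Run this from the root of a local checkout of the project, since --editable . installs from the current directory. Use pipx rather than pip install -e .: pipx creates an isolated environment and exposes the CLI globally on PATH, which MCP registration needs in order to find the correct executable.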


Setup

Run the one-time setup wizard:

archiver-rag init

This will:

  1. Ask for your vault path
  2. Index your vault into ChromaDB
  3. Register the MCP server in ~/.claude.json (or prompt you to do it manually for other clients)
  4. Install the background watcher as a launchd agent (macOS) or systemd service (Linux)

MCP registration (manual)

If you prefer to register manually, add this to your MCP client config:

{
  "mcpServers": {
    "archiver-rag": {
      "command": "/path/to/archiver-rag",
      "args": ["serve"]
    }
  }
}

Find the executable path with which archiver-rag.
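
If you would rather generate the entry than edit JSON by hand, a small script along these lines can do it. This is a convenience sketch, not part of Archiver RAG: `merge_entry` is a hypothetical helper, and where the resulting JSON has to be saved depends on your MCP client.

```python
import json
import shutil


def merge_entry(config, executable):
    """Return a copy of a client config dict with the archiver-rag server added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["archiver-rag"] = {"command": executable, "args": ["serve"]}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # shutil.which is the Python equivalent of running `which archiver-rag`
    path = shutil.which("archiver-rag") or "/path/to/archiver-rag"
    print(json.dumps(merge_entry({}, path), indent=2))
```

Merging into the existing `mcpServers` object rather than overwriting it keeps any other servers the client already has registered.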

For Claude Code specifically, use:

claude mcp add --scope user archiver-rag $(which archiver-rag) serve

Agent instructions (skills)

Registering the MCP server gives an agent access to the tools — but agents tend to fall back on their own internal memory instead of reaching for the vault. The instruction files in skill/ fix that: they enforce a vault-first rule so the agent searches and stores knowledge in your vault before anything else.

What the skill enforces:

  • Before answering or reading source files — call search_vault first; only fall back to internal memory if the vault returns nothing relevant
  • When something important is missing from the vault: proactively call log_note. If a fact, decision, or piece of context matters to the overall picture and search_vault returned nothing for it, record it so the knowledge graph grows instead of that context being lost when the session ends
  • After solving a non-trivial problem — call log_note to capture the decision/lesson/gotcha back into the vault
  • The vault is the authoritative memory system — internal agent memory is a fallback only

A version is provided for each agent, since each loads instructions differently:

| Agent | File | Install to |
| --- | --- | --- |
| Claude Code | skill/claude-code/SKILL.md | ~/.claude/skills/archiver-rag/SKILL.md (on-demand skill) |
| OpenCode | skill/opencode/AGENTS.md | project root AGENTS.md or ~/.config/opencode/AGENTS.md |
| Codex CLI | skill/codex/AGENTS.md | project root AGENTS.md or ~/.codex/AGENTS.md |
| GitHub Copilot | skill/copilot/copilot-instructions.md | .github/copilot-instructions.md |

Each file is self-contained — it includes the MCP registration snippet for that agent plus the full vault-first rules and tool reference. For Claude Code the file is an on-demand skill; for the others it's an always-on instruction file (loaded into every session), which makes the vault-first behavior unconditional.


CLI reference

archiver-rag init              # one-time setup wizard
archiver-rag start             # start the background watcher service
archiver-rag stop              # stop the service
archiver-rag restart           # restart the service
archiver-rag status            # check if service is running
archiver-rag index             # force re-index the entire vault
archiver-rag search "query"    # test semantic search from the terminal
archiver-rag health            # chunk count and index peek
archiver-rag logs              # tail the service log

# Knowledge logging
archiver-rag log "Title" --type decision --tag arch --related NoteA

# Clustering
archiver-rag cluster                      # suggest folder groupings
archiver-rag cluster --apply              # move files automatically
archiver-rag place <note>                 # suggest folder for a single note
archiver-rag place <note> --apply         # move it immediately

# Config
archiver-rag config --auto-cluster        # enable auto-clustering in the watcher
archiver-rag config --cluster-threshold 5 # notes before a full re-cluster

archiver-rag uninstall         # remove all data, service, and MCP registration

MCP tools (for agents)

Once registered, agents have access to 7 tools:

| Tool | What it does |
| --- | --- |
| search_vault | Semantic search with graph reranking. Accepts a context_note to boost wikilink neighbors. |
| vault_status | Vault structure, health diagnostics, tag stats, and recent activity in one call. |
| get_connections | BFS wikilink traversal: outgoing and incoming links up to depth 3. |
| move_notes | Move files and auto-rewrite all [[wikilinks]] across the vault. |
| log_note | Create a dated knowledge note; watcher indexes and auto-links it immediately. |
| cluster_note | Suggest a folder for one note based on where its wikilink neighbors live. |
| cluster_vault | Label-propagation clustering of the entire vault with folder suggestions. |
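
The traversal behind get_connections can be pictured as a breadth-first search that follows links in both directions and stops at a depth limit. The sketch below assumes a plain dict of outgoing links and a hypothetical `bfs_connections` function; the real tool's input and output shapes may differ.

```python
from collections import deque


def bfs_connections(outgoing, start, max_depth=3):
    """Return {note: depth} for notes within max_depth links of start.

    outgoing maps each note to the notes it links to; incoming links are
    derived from it so the search follows edges in both directions.
    """
    incoming = {}
    for source, targets in outgoing.items():
        for target in targets:
            incoming.setdefault(target, set()).add(source)

    depths = {start: 0}
    queue = deque([start])
    while queue:
        note = queue.popleft()
        if depths[note] == max_depth:
            continue  # do not expand beyond the depth limit
        neighbors = set(outgoing.get(note, ())) | incoming.get(note, set())
        for neighbor in sorted(neighbors):
            if neighbor not in depths:
                depths[neighbor] = depths[note] + 1
                queue.append(neighbor)

    del depths[start]
    return depths
```

Because BFS visits notes in order of distance, the first depth recorded for a note is its shortest link distance from the starting note.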

Configuration

All runtime config lives at ~/.archiver-rag/config.json:

{
  "vault_path": "/path/to/your/vault",
  "install_path": "/Users/you/.archiver-rag",
  "chroma_path": "/Users/you/.archiver-rag/chroma_db",
  "auto_cluster": false,
  "cluster_threshold": 5
}

auto_cluster — automatically suggest and apply folder placement for new notes via the watcher.
cluster_threshold — number of new notes created before triggering a full cluster_vault run.
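
As a rough illustration of how these two settings could interact, the sketch below counts new notes and signals when a full re-cluster is due. It is a guess at the behavior based on the descriptions above: the `ClusterTrigger` class, the default values, and the choice to ignore the threshold while auto_cluster is off are all assumptions.

```python
import json
from pathlib import Path

# Assumed defaults, taken from the example config above
DEFAULTS = {"auto_cluster": False, "cluster_threshold": 5}


def load_config(path):
    """Read config.json and fill in defaults for missing optional keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {**DEFAULTS, **data}


class ClusterTrigger:
    """Counts new notes and reports when the threshold has been reached."""

    def __init__(self, config):
        self.enabled = config["auto_cluster"]
        self.threshold = config["cluster_threshold"]
        self.new_notes = 0

    def note_created(self):
        """Return True when a full re-cluster is due, then reset the counter."""
        if not self.enabled:
            return False
        self.new_notes += 1
        if self.new_notes >= self.threshold:
            self.new_notes = 0
            return True
        return False
```

A watcher could call `note_created()` for each new file and run a full clustering pass whenever it returns True.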


The knowledge graph model

The vault is treated as a knowledge graph, not a file hierarchy. Notes are nodes; wikilinks are edges. Relationships range from tight (direct links) to loose (semantic proximity surfaced by search).
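
A small sketch of how notes and wikilinks map onto nodes and edges, assuming the common Obsidian link forms [[Note]], [[Note#Heading]], and [[Note|alias]]. The regular expression and function names are illustrative and do not come from the project.

```python
import re

# Captures the target note name; the optional #heading and |alias parts are skipped
WIKILINK = re.compile(r"\[\[([^\]\[|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_wikilinks(text):
    """Return link targets in order of first appearance, without duplicates."""
    targets = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        if target and target not in targets:
            targets.append(target)
    return targets


def build_graph(notes):
    """notes maps note name -> Markdown body; returns {note: [linked notes]}."""
    return {name: extract_wikilinks(body) for name, body in notes.items()}
```

The resulting dict is a directed adjacency list: each key is a node, and each entry in its list is an outgoing edge.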

Note types are expressed through frontmatter, not folder structure:

---
type: decision
tags: [architecture, async]
related: ["[[AsyncLocalStorage]]", "[[PrismaExtensions]]"]
date: 2026-04-27
---

The ## Related section at the bottom of each note is managed automatically by the auto-linker after every ingest. Don't edit it manually — it will be overwritten.
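
The overwrite behavior can be illustrated with an idempotent "replace or append" helper. This is a sketch under assumptions, not the auto-linker itself: it assumes the Related section runs to the end of the note and that links are written as a bulleted list, and `upsert_related` is a hypothetical name.

```python
import re

# Matches a "## Related" heading and everything after it, up to the end of the note
RELATED_SECTION = re.compile(
    r"\n*^## Related[ \t]*(?:\n.*)?\Z", re.MULTILINE | re.DOTALL
)


def upsert_related(body, links):
    """Replace the trailing Related section, or append one if it is missing."""
    stripped = RELATED_SECTION.sub("", body).rstrip("\n")
    prefix = stripped + "\n\n" if stripped else ""
    lines = "\n".join(f"- [[{link}]]" for link in links)
    return f"{prefix}## Related\n{lines}\n"
```

Running the helper twice with the same links leaves the note unchanged, and any manual edits inside the section are discarded on the next run, which is why editing it by hand is pointless.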


Roadmap

Features on the way:

  • RAG-Anything integration — extend ingestion beyond Markdown to handle PDFs, Office documents, images, and other file types, so the vault can become a true multi-format knowledge base rather than .md-only.
  • Archiver subagents — dedicated subagents that take over vault management (search, logging, reorganization, clustering) on the main agent's behalf, so the primary agent can delegate knowledge work instead of context-switching into it.

License

MIT
