archiver-rag

Semantic RAG for Obsidian vaults via MCP


Archiver RAG

A finding aid for your knowledge graph

The agent-agnostic memory management system for your Obsidian vault

Archiver RAG turns your Obsidian vault into a live, queryable knowledge graph that any MCP-compatible AI agent can search, update, and reorganize — without ever leaving its native interface.

Connect it once. Every agent you use (Claude Code, Cursor, Gemini CLI, or your own) gets semantic search, automatic knowledge logging, wikilink-aware graph traversal, and vault health monitoring out of the box.

How it works

Your Obsidian vault (.md files)
         ↓  file watcher + ingest pipeline
     ChromaDB  (persistent vector store)
         ↓  MCP server
     Any MCP-compatible agent

Three layers make search more precise than plain embedding similarity:

Contextual prefix — each chunk is embedded with its note's metadata (folder, tags, wikilinks), so vectors carry structural context
Rich metadata filtering — ChromaDB stores folder, tags, incoming link count, and wikilinks for filtered retrieval
Graph reranking — after vector search, results are re-scored by wikilink proximity to a context note and hub importance
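To make the contextual-prefix and metadata layers concrete, here is a minimal Python sketch of how one chunk's embedding text and metadata could be assembled. The function names, the prefix format, and storing lists as comma-separated strings (assuming the vector store accepts only scalar metadata values) are illustrative assumptions, not archiver-rag's actual code. The incoming link count is left out because computing it needs a pass over the whole vault.

```python
import re
from pathlib import PurePosixPath

# Captures the target of [[Target]], [[Target|alias]] and [[Target#Heading]].
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_wikilinks(text: str) -> list[str]:
    """Return unique wikilink targets in order of first appearance."""
    seen: dict[str, None] = {}
    for match in WIKILINK_RE.finditer(text):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def contextual_chunk(rel_path: str, tags: list[str], note_body: str,
                     chunk: str) -> tuple[str, dict[str, str]]:
    """Build the text to embed and the metadata to store for one chunk (hypothetical)."""
    path = PurePosixPath(rel_path)
    folder = "" if str(path.parent) == "." else str(path.parent)
    links = extract_wikilinks(note_body)  # note-level links, not just this chunk's
    prefix = (
        f"Note: {path.stem} | Folder: {folder or '/'} | "
        f"Tags: {', '.join(tags) or 'none'} | Links: {', '.join(links) or 'none'}"
    )
    metadata = {
        "folder": folder,
        # Lists flattened to strings, assuming scalar-only metadata values.
        "tags": ",".join(tags),
        "wikilinks": ",".join(links),
    }
    return f"{prefix}\n\n{chunk}", metadata
```

Because the prefix is part of the embedded text, two chunks with similar wording but different folders or link neighborhoods end up with measurably different vectors.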

The file watcher runs as a background service. Edit a note in Obsidian, save it, and it's indexed and auto-linked within seconds — no manual sync needed.
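The watcher's job can be illustrated with a simple polling loop that snapshots modification times and diffs them. This is a hypothetical, standard-library-only sketch: the real service may rely on OS file-system events rather than polling, and the helper names, the hidden-folder rule, and the two-second interval are assumptions.

```python
import time
from collections.abc import Callable
from pathlib import Path


def snapshot(vault: Path) -> dict[str, float]:
    """Map every visible .md file (vault-relative POSIX path) to its mtime."""
    result = {}
    for path in vault.rglob("*.md"):
        rel = path.relative_to(vault)
        # Skip hidden folders such as .obsidian and .trash.
        if path.is_file() and not any(part.startswith(".") for part in rel.parts):
            result[rel.as_posix()] = path.stat().st_mtime
    return result


def diff_snapshots(old: dict[str, float], new: dict[str, float]) -> dict[str, list[str]]:
    """Classify files as added, modified or deleted between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "modified": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
        "deleted": sorted(old.keys() - new.keys()),
    }


def watch(vault: Path, on_change: Callable[[dict[str, list[str]]], None],
          interval: float = 2.0) -> None:
    """Poll forever, calling on_change (e.g. re-ingest and auto-link) on any change."""
    previous = snapshot(vault)
    while True:
        time.sleep(interval)
        current = snapshot(vault)
        changes = diff_snapshots(previous, current)
        if any(changes.values()):
            on_change(changes)
        previous = current
```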

Features

Semantic search with graph reranking — finds notes by meaning, then boosts results connected via wikilinks
Auto-linking — after every ingest, appends a ## Related section with [[wikilinks]] to build the knowledge graph automatically
Knowledge logging — create dated, categorized notes (decision, lesson, gotcha, pattern, …) from any agent
Vault health — single call returns orphaned notes, broken links, missing frontmatter, tag stats, and recent activity
Wikilink-aware reorganization — move files and every [[link]] across the vault is rewritten automatically
Smart clustering — label-propagation algorithm groups notes by wikilink structure and suggests folder organization
Agent-agnostic — exposes a standard MCP interface; works with any MCP-compatible client
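A rough sketch of wikilink-aware moving follows. It assumes path-qualified links such as [[inbox/Auth]]; bare-name links like [[Auth]] keep resolving in Obsidian after a folder move as long as the note name is unique. `rewrite_links` and `move_note` are hypothetical names, not the package's API.

```python
import re
from pathlib import Path


def rewrite_links(text: str, old: str, new: str) -> str:
    """Retarget [[old]], [[old|alias]] and [[old#heading]] to new, keeping suffixes."""
    pattern = re.compile(r"\[\[" + re.escape(old) + r"((?:#[^\]|]*)?(?:\|[^\]]*)?)\]\]")
    return pattern.sub(lambda m: f"[[{new}{m.group(1)}]]", text)


def move_note(vault: Path, src: str, dst: str) -> list[str]:
    """Move vault-relative src to dst, rewrite links vault-wide, return edited files."""
    old, new = src.removesuffix(".md"), dst.removesuffix(".md")
    (vault / dst).parent.mkdir(parents=True, exist_ok=True)
    (vault / src).rename(vault / dst)
    changed = []
    for path in vault.rglob("*.md"):
        text = path.read_text(encoding="utf-8")
        updated = rewrite_links(text, old, new)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.relative_to(vault).as_posix())
    return sorted(changed)
```

The pattern only matches when `]]`, `#` or `|` follows the old target, so a link to a different note whose name merely starts with the same text (for example [[inbox/AuthX]]) is left alone.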

Requirements

Python >= 3.10
pipx (recommended for installation)
An Obsidian vault (local .md files)
An MCP-compatible agent (Claude Code, Cursor, etc.)

Installation

pipx install archiver-rag

Use pipx, not pip install — pipx creates an isolated environment and exposes the CLI globally on PATH, which is required for MCP registration to find the correct executable.

For local development from a clone of this repo, use pipx install --editable . instead.

Setup

Run the one-time setup wizard:

archiver-rag init

This will:

Ask for your vault path
Index your vault into ChromaDB
Register the MCP server in ~/.claude.json (or prompt you to do it manually for other clients)
Install the background watcher as a launchd agent (Mac) or systemd service (Linux)

MCP registration (manual)

If you prefer to register manually, add this to your MCP client config:

{
  "mcpServers": {
    "archiver-rag": {
      "command": "/path/to/archiver-rag",
      "args": ["serve"]
    }
  }
}

Replace /path/to/archiver-rag with the absolute path printed by running which archiver-rag.

For Claude Code specifically, use:

claude mcp add --scope user archiver-rag $(which archiver-rag) serve

Agent instructions (skills)

Registering the MCP server gives an agent access to the tools — but agents tend to fall back on their own internal memory instead of reaching for the vault. The instruction files in skill/ fix that: they enforce a vault-first rule so the agent searches and stores knowledge in your vault before anything else.

What the skill enforces:

Before answering or reading source files — call search_vault first; only fall back to internal memory if the vault returns nothing relevant
When something important is missing from the vault — proactively log_note it. If a fact, decision, or piece of context matters to the overall picture and a search_vault came back empty, record it so the knowledge graph grows instead of letting that context die in a single session
After solving a non-trivial problem — call log_note to capture the decision/lesson/gotcha back into the vault
The vault is the authoritative memory system — internal agent memory is a fallback only

A version is provided for each agent, since each loads instructions differently:

Agent	File	Install to
Claude Code	`skill/claude-code/SKILL.md`	`~/.claude/skills/archiver-rag/SKILL.md` (on-demand skill)
OpenCode	`skill/opencode/AGENTS.md`	project root `AGENTS.md` or `~/.config/opencode/AGENTS.md`
Codex CLI	`skill/codex/AGENTS.md`	project root `AGENTS.md` or `~/.codex/AGENTS.md`
GitHub Copilot	`skill/copilot/copilot-instructions.md`	`.github/copilot-instructions.md`

Each file is self-contained — it includes the MCP registration snippet for that agent plus the full vault-first rules and tool reference. For Claude Code the file is an on-demand skill; for the others it's an always-on instruction file (loaded into every session), which makes the vault-first behavior unconditional.

CLI reference

archiver-rag init              # one-time setup wizard
archiver-rag start             # start the background watcher service
archiver-rag stop              # stop the service
archiver-rag restart           # restart the service
archiver-rag status            # check if service is running
archiver-rag index             # force re-index the entire vault
archiver-rag search "query"    # test semantic search from the terminal
archiver-rag health            # chunk count and index peek
archiver-rag logs              # tail the service log

# Knowledge logging
archiver-rag log "Title" --type decision --tag arch --related NoteA

# Clustering
archiver-rag cluster                      # suggest folder groupings
archiver-rag cluster --apply              # move files automatically
archiver-rag place <note>                 # suggest folder for a single note
archiver-rag place <note> --apply         # move it immediately

# Config
archiver-rag config --auto-cluster        # enable auto-clustering in the watcher
archiver-rag config --cluster-threshold 5 # notes before a full re-cluster

archiver-rag uninstall         # remove all data, service, and MCP registration

MCP tools (for agents)

Once registered, agents have access to 7 tools:

Tool	What it does
`search_vault`	Semantic search with graph reranking. Accepts a `context_note` to boost wikilink neighbors.
`vault_status`	Vault structure, health diagnostics, tag stats, and recent activity in one call.
`get_connections`	BFS wikilink traversal — outgoing and incoming links up to depth 3.
`move_notes`	Move files and auto-rewrite all `[[wikilinks]]` across the vault.
`log_note`	Create a dated knowledge note; watcher indexes and auto-links it immediately.
`cluster_note`	Suggest a folder for one note based on where its wikilink neighbors live.
`cluster_vault`	Label-propagation clustering of the entire vault with folder suggestions.
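Label propagation, the algorithm named for cluster_vault, fits in a few lines: every note starts with its own label and repeatedly adopts the label most common among its wikilink neighbors until nothing changes. This sketch follows the standard randomized formulation (random visiting order and tie-breaks, seeded here for reproducibility) and is not necessarily the package's exact implementation. Turning clusters into folder suggestions by majority vote over the members' current folders is likewise an assumption, as are all function names.

```python
import random
from collections import Counter, defaultdict


def undirected(links: dict[str, set[str]]) -> dict[str, set[str]]:
    """Symmetric neighbor sets; links to notes that do not exist are ignored."""
    adj: dict[str, set[str]] = {note: set() for note in links}
    for src, targets in links.items():
        for dst in targets:
            if dst in adj and dst != src:
                adj[src].add(dst)
                adj[dst].add(src)
    return adj


def label_propagation(adj: dict[str, set[str]], seed: int = 0,
                      max_iter: int = 50) -> dict[str, str]:
    """Asynchronous label propagation: notes adopt their neighbors' majority label."""
    rng = random.Random(seed)
    labels = {note: note for note in adj}
    for _ in range(max_iter):
        changed = False
        order = sorted(adj)
        rng.shuffle(order)
        for note in order:
            if not adj[note]:
                continue  # isolated notes keep their own label
            counts = Counter(labels[n] for n in adj[note])
            top = max(counts.values())
            best = sorted(label for label, c in counts.items() if c == top)
            if labels[note] not in best:  # keep the current label on ties
                labels[note] = rng.choice(best)
                changed = True
        if not changed:
            break
    return labels


def suggest_folders(labels: dict[str, str], folder_of: dict[str, str]) -> dict[str, str]:
    """Suggest, for each note, the most common current folder in its cluster."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for note, label in labels.items():
        clusters[label].append(note)
    suggestion = {}
    for members in clusters.values():
        counts = Counter(folder_of[n] for n in members)
        top = max(counts.values())
        folder = min(f for f, c in counts.items() if c == top)  # deterministic tie-break
        suggestion.update({n: folder for n in members})
    return suggestion
```

Labels can only travel along edges, so notes in disconnected parts of the graph never share a cluster; within a densely linked group the labels converge after a few passes.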

Configuration

All runtime config lives at ~/.archiver-rag/config.json:

{
  "vault_path": "/path/to/your/vault",
  "install_path": "/Users/you/.archiver-rag",
  "chroma_path": "/Users/you/.archiver-rag/chroma_db",
  "auto_cluster": false,
  "cluster_threshold": 5
}

auto_cluster — automatically suggest and apply folder placement for new notes via the watcher.
cluster_threshold — number of new notes created before triggering a full cluster_vault run.
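A hypothetical loader for this file, plus the counting logic that cluster_threshold describes. Treating false and 5 as defaults when a key is missing is an assumption based on the example values above, and `load_config` and `ReclusterCounter` are illustrative names.

```python
import json
from pathlib import Path

# Assumed defaults, taken from the example values in this README.
DEFAULTS = {"auto_cluster": False, "cluster_threshold": 5}


def load_config(path: Path = Path.home() / ".archiver-rag" / "config.json") -> dict:
    """Read config.json, fill in missing keys and validate their types."""
    config = {**DEFAULTS, **json.loads(path.read_text(encoding="utf-8"))}
    if not isinstance(config["auto_cluster"], bool):
        raise TypeError("auto_cluster must be true or false")
    threshold = config["cluster_threshold"]
    if isinstance(threshold, bool) or not isinstance(threshold, int) or threshold < 1:
        raise ValueError("cluster_threshold must be a positive integer")
    config["vault_path"] = Path(config["vault_path"]).expanduser()
    return config


class ReclusterCounter:
    """Signals a full cluster_vault run after every `threshold` new notes."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.pending = 0

    def note_created(self) -> bool:
        self.pending += 1
        if self.pending >= self.threshold:
            self.pending = 0
            return True
        return False
```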

The knowledge graph model

The vault is treated as a knowledge graph, not a file hierarchy. Notes are nodes; wikilinks are edges. Relationships range from tight (direct links) to loose (semantic proximity surfaced by search).

Note types are expressed through frontmatter, not folder structure:

---
type: decision
tags: [architecture, async]
related: ["[[AsyncLocalStorage]]", "[[PrismaExtensions]]"]
date: 2026-04-27
---

The ## Related section at the bottom of each note is managed automatically by the auto-linker after every ingest. Don't edit it manually — it will be overwritten.
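To illustrate why manual edits to ## Related do not survive, here is a hypothetical sketch of an idempotent section writer: it drops any existing ## Related section at the end of the note and writes a fresh one. It assumes the section is always the last one in the note and uses a bulleted wikilink list; the package's actual format may differ, and `set_related` is an illustrative name.

```python
import re

# A "## Related" heading line (not "## Related Work") and everything after it.
RELATED_RE = re.compile(r"\n*^## Related[ \t]*$.*\Z", re.MULTILINE | re.DOTALL)


def set_related(text: str, related: list[str]) -> str:
    """Replace the trailing ## Related section with links to the given notes."""
    body = RELATED_RE.sub("", text).rstrip("\n")
    if not related:
        return body + "\n"
    links = "\n".join(f"- [[{name}]]" for name in related)
    return f"{body}\n\n## Related\n\n{links}\n"
```

Running the writer twice with the same links yields the same file, so re-ingesting an unchanged note does not keep appending sections.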

Roadmap

Features on the way:

RAG-Anything integration — extend ingestion beyond Markdown to handle PDFs, Office documents, images, and other file types, so the vault can become a true multi-format knowledge base rather than .md-only.
Archiver subagents — dedicated subagents that take over vault management (search, logging, reorganization, clustering) on the main agent's behalf, so the primary agent can delegate knowledge work instead of context-switching into it.

License

MIT

Release history

This version: 0.1.0, released Jun 12, 2026

Download files


Source Distribution

archiver_rag-0.1.0.tar.gz (3.9 MB)

Uploaded Jun 12, 2026 Source

Built Distribution


archiver_rag-0.1.0-py3-none-any.whl (30.2 kB)

Uploaded Jun 12, 2026 Python 3

File details

Details for the file archiver_rag-0.1.0.tar.gz.

File metadata

Download URL: archiver_rag-0.1.0.tar.gz
Upload date: Jun 12, 2026
Size: 3.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for archiver_rag-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`bc8faa4b8f607d1650bb44f4df4d61f7021d4041c400db5e1beef406cad4acc8`
MD5	`f5b927cf8019e74a5d6cfb9681770a76`
BLAKE2b-256	`622288079b68a79586b5a05c0eac43d75c2af44d301820aa1465865f768d0f0d`


File details

Details for the file archiver_rag-0.1.0-py3-none-any.whl.

File metadata

Download URL: archiver_rag-0.1.0-py3-none-any.whl
Upload date: Jun 12, 2026
Size: 30.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for archiver_rag-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`88e9c549edca303c7bdae40e8addd4cdfe2e0bdce1ce183df6fab6fc632a2f10`
MD5	`054fd3417be662b55ddff078a90bd13b`
BLAKE2b-256	`42d31e0d9a12349ce1a435a9fbd98a9b3fb630fada26820c38a8471a688f955d`

