Configuration-driven RAG MCP server: ingest, watch, index, and search arbitrary knowledge sources behind a stable MCP tool surface.
ragsync
A configuration-driven Model Context Protocol server that ingests data from arbitrary sources, watches them for changes, indexes them into vector stores, and exposes a small, stable set of tools an LLM agent can call to search and retrieve that knowledge.
Guiding principle: everything that varies between deployments lives in a YAML file; the tool surface the LLM sees never changes. Adding a source, swapping an embedding model, or pointing at a different vector store is a config edit, not a code change.
See DESIGN.md for the full specification and
AGENTS.md for contributor guidance.
Install
uv sync # core dependencies
uv sync --extra openai # optional hosted embedding provider
uv sync --extra dev # test dependencies (pytest)
The default embedding provider, fastembed, runs locally (ONNX, CPU) and needs
no API key — the server works out of the box. Model weights download on first
run.
Run
uv run ragsync --config examples/config.example.yaml
The server reads the config, builds one pipeline per source, runs an initial index, starts a change watcher for each watched source, and begins serving MCP tools over stdio.
Live config reload
The config file itself is watched. Editing it applies changes without a restart: new sources are built and indexed, removed sources are dropped, and changed sources are rebuilt — unchanged sources keep running untouched. An edit that fails validation is logged and ignored; the running server is never left in a broken state.
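The reload behavior described above amounts to diffing the old and new source lists. The following is a minimal sketch of that idea, not the server's actual code: `diff_sources` and the flat settings dictionaries are hypothetical names chosen for illustration, and it assumes each source is identified by its `name`.

```python
from typing import Any


def diff_sources(
    old: dict[str, dict[str, Any]],
    new: dict[str, dict[str, Any]],
) -> dict[str, list[str]]:
    """Classify sources by comparing two name -> settings mappings.

    Hypothetical helper: the real server may compare configs differently.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    common = old.keys() & new.keys()
    changed = sorted(name for name in common if old[name] != new[name])
    unchanged = sorted(name for name in common if old[name] == new[name])
    return {
        "added": added,          # build and index
        "removed": removed,      # drop
        "changed": changed,      # rebuild
        "unchanged": unchanged,  # keep running untouched
    }


old = {
    "product-docs": {"type": "folder", "chunk_size": 800},
    "playbooks": {"type": "folder", "chunk_size": 800},
}
new = {
    "product-docs": {"type": "folder", "chunk_size": 1000},  # edited
    "playbooks": {"type": "folder", "chunk_size": 800},      # untouched
    "site": {"type": "website"},                             # newly added
}
plan = diff_sources(old, new)
print(plan)
```

Validation would run before the diff in a design like this, so a rejected edit never produces a plan and the running pipelines stay as they were.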
Use from an MCP client (Cursor, Claude, …)
The server speaks MCP over stdio, so any MCP-compatible client launches it as
a subprocess. Clients share the same mcpServers JSON shape; only the file
location differs:
| Client | Config file |
|---|---|
| Cursor | .cursor/mcp.json (project) or ~/.cursor/mcp.json (global) |
| Claude Desktop | claude_desktop_config.json |
| Claude Code | .mcp.json (or claude mcp add) |
| Windsurf / others | their mcpServers config |
Paths inside the config are resolved against the config file's directory (not the client's working directory), so a config can live in the repo and reference repo content with relative paths like path: ./docs. Give --config itself an absolute path, though: the client chooses where it launches the server from, so that's the one path it must be able to find unambiguously.
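The resolution rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's implementation: `resolve_config_path` is a hypothetical helper, and the example paths are made up.

```python
from pathlib import Path


def resolve_config_path(config_file: str, value: str) -> Path:
    """Resolve a path found in the config against the config file's directory.

    Hypothetical helper: absolute values pass through unchanged, relative
    values are anchored at the directory that contains the config file.
    """
    candidate = Path(value)
    if candidate.is_absolute():
        return candidate
    return Path(config_file).parent / candidate


# Example paths are illustrative only.
print(resolve_config_path("/srv/repo/examples/config.yaml", "./docs"))
print(resolve_config_path("/srv/repo/examples/config.yaml", "/data/docs"))
```

Because the anchor is the config file rather than the process working directory, the same config resolves to the same files no matter which client launches the server.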
Option A — run in place with uv (no install)
uv run --directory runs the server from the cloned repo without installing it:
{
"mcpServers": {
"ragsync": {
"command": "uv",
"args": [
"run",
"--directory",
"/abs/path/to/ragsync-mcp",
"ragsync",
"--config",
"/abs/path/to/ragsync-mcp/examples/config.example.yaml"
]
}
}
}
Option B — install the command, then reference it
uv tool install /abs/path/to/ragsync-mcp # provides the `ragsync` command
{
"mcpServers": {
"ragsync": {
"command": "ragsync",
"args": ["--config", "/abs/path/to/config.yaml"]
}
}
}
(Equivalently, "command": "python", "args": ["-m", "ragsync_mcp", "--config", "…"]
if the package is installed in the active environment.)
Hosted embedding keys
For openai/voyage sources, the config names an env var (api_key_env) rather
than the key itself. Provide that variable to the subprocess via env:
{
"mcpServers": {
"ragsync": {
"command": "ragsync",
"args": ["--config", "/abs/path/to/config.yaml"],
"env": { "OPENAI_API_KEY": "sk-..." }
}
}
}
After saving, restart/reload the client. It will list the five tools (search,
list_sources, get_document, get_index_status, reindex); the agent calls
search to answer questions from your indexed sources. First launch downloads
the local embedding model, so initial startup can take a little longer.
Configuration
A single YAML file defines global defaults and a list of sources. Each
source becomes one searchable collection with its own loader, chunking,
embedding model, vector-store collection, and watcher. Per-source isolation lets
different sources use different embedding models safely.
defaults:
chunking:
{ strategy: recursive_character, chunk_size: 800, chunk_overlap: 100 }
embedding: { provider: fastembed, model: BAAI/bge-small-en-v1.5 }
vector_store: { backend: chroma, persist_directory: ./vector_db }
sources:
- name: product-docs
type: folder
description: Product documentation and how-to guides.
connection:
path: ./docs # relative to the config file's directory
include: ["**/*.md"]
exclude: ["**/internal/**"]
watch: { enabled: true, mode: filesystem }
chunking: { strategy: markdown, chunk_size: 1000, chunk_overlap: 150 }
vector_store: { collection: product_docs }
metadata: { product: example, audience: public }
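To make the chunk_size and chunk_overlap settings concrete, here is a simplified sliding-window chunker. It is a sketch only: it is not the recursive_character or markdown strategy named in the config, it assumes sizes are counted in characters (the README does not state the unit), and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters.

    Simplified illustration; real strategies usually prefer to break on
    paragraph, sentence, or heading boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With a window of 4 and an overlap of 1, each chunk starts with the last character of the previous one, which is how overlap keeps context that would otherwise be cut at a boundary.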
The examples/ directory has runnable configs:
- config.example.yaml: a complete multi-source example (pointed at the sample content under examples/docs and examples/playbooks).
- folder.yaml: a single folder source.
- website.yaml: a single website source.
Source types
| type | description | watch modes |
|---|---|---|
| folder | local/mounted directory of files (text, PDF) | filesystem, poll |
| website | fixed list of web pages (fetched, not crawled) | poll |
Include/exclude globs use gitignore-style matching (e.g. **/internal/**).
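As a rough illustration of how include and exclude globs combine, the sketch below translates a small subset of gitignore-style syntax (`**`, `*`, `?`) into regular expressions. It is an assumption-laden simplification, not the matcher the project uses: negation, character classes, and anchoring rules are left out, and `glob_to_regex` and `is_selected` are hypothetical names.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified gitignore-style glob into a compiled regex."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")      # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_selected(path: str, include: list[str], exclude: list[str]) -> bool:
    """A path is selected when it matches an include and no exclude."""
    included = any(glob_to_regex(p).fullmatch(path) for p in include)
    excluded = any(glob_to_regex(p).fullmatch(path) for p in exclude)
    return included and not excluded


include, exclude = ["**/*.md"], ["**/internal/**"]
for path in ["README.md", "guide/intro.md", "guide/internal/roadmap.md", "guide/diagram.png"]:
    print(path, is_selected(path, include, exclude))
```

Paths are written relative to the source root with forward slashes; excludes win over includes in this sketch.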
Embedding providers
fastembed (local, default), openai, and voyage (hosted). Hosted providers
read their API key from the environment variable named by api_key_env — keys
are never written into config.
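The indirection through api_key_env can be pictured as follows. This is a hedged sketch, not the project's code: `resolve_api_key` is a hypothetical helper, and the error handling shown is one possible choice.

```python
import os
from typing import Mapping


def resolve_api_key(
    embedding_config: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Look up the API key in the environment variable the config names.

    Hypothetical helper: the config holds only the variable's name, so the
    secret itself never appears in the YAML file.
    """
    env_name = embedding_config.get("api_key_env")
    if not env_name:
        raise ValueError("hosted embedding providers need api_key_env in the config")
    key = environ.get(env_name)
    if not key:
        raise RuntimeError(f"environment variable {env_name} is not set")
    return key


config = {"provider": "openai", "api_key_env": "OPENAI_API_KEY"}
# A fake environment keeps the example self-contained.
print(resolve_api_key(config, {"OPENAI_API_KEY": "sk-test"}))
```

Passing the environment as a parameter is only a convenience for testing; by default the lookup reads the process environment that the MCP client populates via env.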
MCP tools
Five tools, deliberately small and source-agnostic. They never change as sources are added:
- search: semantic search across one or all sources, with optional metadata filtering. Returns results with normalized [0, 1] scores.
- list_sources: discover available sources and their health/metadata.
- get_document: fetch a full document after search surfaces a chunk.
- get_index_status: indexing freshness/health for one source or all.
- reindex: force a full re-scan of a source.
Tools return structured {"error": "..."} objects rather than raising, so the
calling agent can recover conversationally.
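One way to get this behavior is a decorator that converts exceptions into error objects. The sketch below is illustrative only: `returns_errors` is a hypothetical name, the message format is an assumption, and the `get_document` shown here is a toy stand-in with an invented signature, not the server's tool.

```python
import functools
from typing import Any, Callable


def returns_errors(tool: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Wrap a tool so failures come back as {"error": "..."} instead of raising."""

    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        try:
            return tool(*args, **kwargs)
        except Exception as exc:  # deliberately broad: the agent should always get a reply
            return {"error": f"{type(exc).__name__}: {exc}"}

    return wrapper


@returns_errors
def get_document(source: str, document_id: str) -> dict[str, Any]:
    """Toy stand-in for a tool; the in-memory store is made up for the example."""
    documents = {("product-docs", "intro.md"): "Welcome."}
    if (source, document_id) not in documents:
        raise ValueError(f"unknown document {document_id!r} in source {source!r}")
    return {"source": source, "id": document_id, "text": documents[(source, document_id)]}


print(get_document("product-docs", "intro.md"))
print(get_document("product-docs", "missing.md"))
```

The second call returns a dictionary with an "error" key, which the agent can read and react to, for example by calling list_sources or retrying with a different identifier.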
Access scoping
Per-source isolation is a security boundary: scope access by running separate server instances with separate configs. There is no cross-instance "search everything" path.
Development
uv run pytest
Tests run fully offline by injecting a deterministic embedder in place of
fastembed (see tests/conftest.py). The architecture and extension contract —
how to add a new source type — are documented in AGENTS.md.
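For readers unfamiliar with the pattern, a deterministic embedder can be as simple as hashing the text. The class below is a sketch under assumptions: it is not the fixture in tests/conftest.py, and `HashEmbedder` and its `embed` method are hypothetical names.

```python
import hashlib
import math


class HashEmbedder:
    """Deterministic stand-in for an embedding model (illustrative only).

    The same text always maps to the same unit-length vector, with no model
    download and no network access. The vectors carry no semantic meaning.
    """

    def __init__(self, dimensions: int = 8) -> None:
        self.dimensions = dimensions

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [self._embed_one(text) for text in texts]

    def _embed_one(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Center bytes around zero; 127.5 guarantees no component is exactly 0.
        raw = [digest[i % len(digest)] - 127.5 for i in range(self.dimensions)]
        norm = math.sqrt(sum(value * value for value in raw))
        return [value / norm for value in raw]


embedder = HashEmbedder(dimensions=8)
first = embedder.embed(["hello world"])[0]
second = embedder.embed(["hello world"])[0]
print(first == second, len(first))
```

An embedder like this makes indexing and search tests repeatable, at the cost of not exercising retrieval quality.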
Releasing
Releases are automated from Conventional Commits.
CI (.github/workflows/ci.yml) runs the test suite on every pull request. On
merge to main, the release workflow (.github/workflows/release.yml) runs the
tests again, then python-semantic-release
inspects the commits since the last tag and decides the next version:
| Commit type | Example | Version bump |
|---|---|---|
| fix: | fix: handle empty PDF pages | patch (0.1.0 → 0.1.1) |
| feat: | feat: add notion loader | minor (0.1.0 → 0.2.0) |
| feat!: / BREAKING CHANGE: | feat!: drop python 3.9 | major (0.1.0 → 1.0.0) |
| docs: / chore: / test: / ci: / refactor: | - | no release |
When there is a releasable change, it bumps version in pyproject.toml, updates
CHANGELOG.md, tags the commit, creates a GitHub release, and publishes the
package to PyPI. Once published, anyone can run it with
uvx ragsync --config <path> (or pip install ragsync).
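The bump rules in the table above can be sketched as a small function. This mirrors only the rows of the table and is not python-semantic-release's parser, which is configurable and handles more cases; `bump_for_commit` and `next_bump` are hypothetical names.

```python
import re

# type, optional (scope), optional "!" marker, then ": "
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<breaking>!)?: ")
_RANK = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def bump_for_commit(message: str) -> str:
    """Return the bump a single commit message implies, per the table."""
    match = _HEADER.match(message)
    if match is None:
        return "none"  # not a Conventional Commit header
    if match["breaking"] or "BREAKING CHANGE:" in message:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(match["type"], "none")


def next_bump(messages: list[str]) -> str:
    """The release bump is the largest bump among the commits since the last tag."""
    return max((bump_for_commit(m) for m in messages), key=_RANK.__getitem__, default="none")


commits = ["docs: tidy README", "fix: handle empty PDF pages", "feat: add notion loader"]
print(next_bump(commits))
```

A batch of commits releases at the level of its most significant change, so one feat: among several fix: commits yields a minor release.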
One-time setup (repo maintainer):
- On PyPI, add a Trusted Publisher to the ragsync project: owner jsbroks, repository ragsync-mcp, workflow release.yml. This lets the workflow publish via OIDC with no stored token. (Alternatively, add a PYPI_API_TOKEN secret and set password: in the publish step.)
- If main is a protected branch, allow the release workflow to push the version-bump commit (a repository ruleset bypass for github-actions[bot], or a PAT with push access).