Configuration-driven RAG MCP server: ingest, watch, index, and search arbitrary knowledge sources behind a stable MCP tool surface.

Maintainers

jsbroks

ragsync

A configuration-driven Model Context Protocol server that ingests data from arbitrary sources, watches them for changes, indexes them into vector stores, and exposes a small, stable set of tools an LLM agent can call to search and retrieve that knowledge.

Guiding principle: everything that varies between deployments lives in a YAML file; the tool surface the LLM sees never changes. Adding a source, swapping an embedding model, or pointing at a different vector store is a config edit, not a code change.

See DESIGN.md for the full specification and AGENTS.md for contributor guidance.

Install

uv sync                      # core dependencies
uv sync --extra openai       # optional hosted embedding provider
uv sync --extra dev          # test dependencies (pytest)

The default embedding provider, fastembed, runs locally (ONNX, CPU) and needs no API key — the server works out of the box. Model weights download on first run.

Run

uv run ragsync --config examples/config.example.yaml

The server reads the config, builds one pipeline per source, runs an initial index, starts a change watcher for each watched source, and begins serving MCP tools over stdio.
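The startup order described above can be sketched as follows. This is a minimal illustration, not ragsync's code: SourcePipeline, index_all, start_watcher and start are hypothetical names, and only the --config flag and the prog name ragsync come from this README. The sketch takes already-parsed source dicts so that it has no YAML dependency.

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SourcePipeline:
    """Hypothetical stand-in for one source's loader, chunker, embedder and collection."""

    name: str
    watch_enabled: bool
    log: list[str] = field(default_factory=list)

    def index_all(self) -> None:
        # Real code would load, chunk, embed and upsert every document here.
        self.log.append("initial index")

    def start_watcher(self) -> None:
        # Real code would subscribe to filesystem events or start a poll loop here.
        self.log.append("watcher started")


def parse_args(argv: list[str] | None = None) -> Path:
    parser = argparse.ArgumentParser(prog="ragsync")
    parser.add_argument("--config", required=True, type=Path)
    return parser.parse_args(argv).config


def start(sources: list[dict]) -> dict[str, SourcePipeline]:
    """Build one pipeline per source, index it, then start its watcher if enabled."""
    pipelines: dict[str, SourcePipeline] = {}
    for src in sources:
        watch = src.get("watch", {})
        pipeline = SourcePipeline(src["name"], bool(watch.get("enabled", False)))
        pipeline.index_all()  # the initial index runs before watching begins
        if pipeline.watch_enabled:
            pipeline.start_watcher()
        pipelines[pipeline.name] = pipeline
    # Serving the MCP tools over stdio would begin after this point.
    return pipelines
```

Indexing before watching means a change that lands during startup is picked up by the initial scan rather than racing a watcher that is not yet running.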

Live config reload

The config file itself is watched. Editing it applies changes without a restart: new sources are built and indexed, removed sources are dropped, and changed sources are rebuilt — unchanged sources keep running untouched. An edit that fails validation is logged and ignored; the running server is never left in a broken state.
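One way to implement this reload rule is sketched below, assuming each source's config has already been resolved into a plain dict keyed by source name: diff the old and new maps, and keep the running config if parsing or validation of the edited file fails. plan_reload, try_reload and the logger name are illustrative, not ragsync APIs.

```python
from __future__ import annotations

import logging
from typing import Any, Callable

log = logging.getLogger("ragsync.reload")  # hypothetical logger name


def plan_reload(
    old: dict[str, dict[str, Any]], new: dict[str, dict[str, Any]]
) -> dict[str, list[str]]:
    """Diff two {source name: resolved config} maps into reload actions."""
    shared = old.keys() & new.keys()
    return {
        "add": sorted(new.keys() - old.keys()),  # build and index
        "remove": sorted(old.keys() - new.keys()),  # stop watcher, drop pipeline
        "rebuild": sorted(n for n in shared if old[n] != new[n]),
        "keep": sorted(n for n in shared if old[n] == new[n]),  # left running untouched
    }


def try_reload(
    current: dict[str, dict[str, Any]],
    load_candidate: Callable[[], dict[str, dict[str, Any]]],
) -> tuple[dict[str, dict[str, Any]], dict[str, list[str]] | None]:
    """Return (active config, plan); keep the current config if the edit is invalid."""
    try:
        candidate = load_candidate()  # parse + validate the edited file
    except Exception as exc:
        log.warning("config reload rejected, keeping previous config: %s", exc)
        return current, None
    return candidate, plan_reload(current, candidate)
```

Because the candidate is fully loaded before any pipeline is touched, a bad edit never leaves a half-applied configuration behind.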

Use from an MCP client (Cursor, Claude, …)

The server speaks MCP over stdio, so any MCP-compatible client launches it as a subprocess. Clients share the same mcpServers JSON shape; only the file location differs:

Client	Config file
Cursor	`.cursor/mcp.json` (project) or `~/.cursor/mcp.json` (global)
Claude Desktop	`claude_desktop_config.json`
Claude Code	`.mcp.json` (or `claude mcp add`)
Windsurf / others	their `mcpServers` config

Paths inside the config are resolved against the config file's directory, not the client's working directory, so a config can live in the repo and reference repo content with relative paths such as path: ./docs. Pass --config itself as an absolute path, though: the client decides which working directory the server starts in, so a relative --config path may not point at the file you intended.
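The resolution rule can be sketched as below. resolve_config_path is a hypothetical helper, the example paths are POSIX, and ".." segments are collapsed lexically (no symlink resolution), which may differ from what ragsync actually does.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_config_path(config_file: str | os.PathLike[str], raw: str) -> Path:
    """Resolve a path from the config relative to the config file's own directory."""
    base = Path(os.path.abspath(config_file)).parent  # anchor on the config file, not the cwd
    candidate = Path(raw).expanduser()
    joined = candidate if candidate.is_absolute() else base / candidate
    return Path(os.path.normpath(joined))  # collapse "." and ".." segments
```

For a config at /srv/app/examples/config.yaml, path: ./docs resolves to /srv/app/examples/docs regardless of where the client launched the server, while an absolute path is used unchanged.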

Option A — run in place with uv (no install)

uv run --directory runs the server from the cloned repo without installing it:

{
  "mcpServers": {
    "ragsync": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/abs/path/to/ragsync-mcp",
        "ragsync",
        "--config",
        "/abs/path/to/ragsync-mcp/examples/config.example.yaml"
      ]
    }
  }
}

Option B — install the command, then reference it

uv tool install /abs/path/to/ragsync-mcp        # provides the `ragsync` command

{
  "mcpServers": {
    "ragsync": {
      "command": "ragsync",
      "args": ["--config", "/abs/path/to/config.yaml"]
    }
  }
}

(Equivalently, "command": "python", "args": ["-m", "ragsync_mcp", "--config", "…"] if the package is installed in the active environment.)

Hosted embedding keys

For openai/voyage sources, the config names an env var (api_key_env) rather than the key itself. Provide that variable to the subprocess via env:

{
  "mcpServers": {
    "ragsync": {
      "command": "ragsync",
      "args": ["--config", "/abs/path/to/config.yaml"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}

After saving, restart/reload the client. It will list the five tools (search, list_sources, get_document, get_index_status, reindex); the agent calls search to answer questions from your indexed sources. First launch downloads the local embedding model, so initial startup can take a little longer.

Configuration

A single YAML file defines global defaults and a list of sources. Each source becomes one searchable collection with its own loader, chunking, embedding model, vector-store collection, and watcher. Per-source isolation lets different sources use different embedding models safely.

defaults:
  chunking:
    { strategy: recursive_character, chunk_size: 800, chunk_overlap: 100 }
  embedding: { provider: fastembed, model: BAAI/bge-small-en-v1.5 }
  vector_store: { backend: chroma, persist_directory: ./vector_db }

sources:
  - name: product-docs
    type: folder
    description: Product documentation and how-to guides.
    connection:
      path: ./docs # relative to the config file's directory
      include: ["**/*.md"]
      exclude: ["**/internal/**"]
    watch: { enabled: true, mode: filesystem }
    chunking: { strategy: markdown, chunk_size: 1000, chunk_overlap: 150 }
    vector_store: { collection: product_docs }
    metadata: { product: example, audience: public }
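The README does not spell out how a source's sections combine with defaults. A plausible reading of the example, sketched here as an assumption: each of chunking, embedding and vector_store starts from defaults and the source's own keys override them, so product-docs keeps the default fastembed model and Chroma backend but gets markdown chunking and its own product_docs collection. deep_merge and effective_source are hypothetical helpers; the real loader may replace whole sections instead of merging keys.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any

SECTIONS = ("chunking", "embedding", "vector_store")  # sections that have defaults


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return base updated with override; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def effective_source(defaults: dict[str, Any], source: dict[str, Any]) -> dict[str, Any]:
    """Fill in each defaulted section of a source without mutating either input."""
    resolved = deepcopy(source)
    for section in SECTIONS:
        resolved[section] = deep_merge(defaults.get(section, {}), source.get(section, {}))
    return resolved
```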

The examples/ directory has runnable configs:

config.example.yaml — a complete multi-source example (pointed at the sample content under examples/docs and examples/playbooks).
folder.yaml — a single folder source.
website.yaml — a single website source.

Source types

type	description	watch modes
`folder`	local/mounted directory of files (text, PDF)	`filesystem`, `poll`
`website`	fixed list of web pages (fetched, not crawled)	`poll`

Include/exclude globs use gitignore-style matching (e.g. **/internal/**).
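To make the ** semantics concrete, here is a simplified matcher over POSIX relative paths. It approximates gitignore-style globs (** spans directories, * and ? stay within one segment) but omits negation and other gitignore rules; ragsync may rely on a dedicated library such as pathspec instead. glob_to_regex, is_included and the empty-include behaviour are assumptions for illustration.

```python
from __future__ import annotations

import re
from functools import lru_cache


@lru_cache(maxsize=None)
def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob with gitignore-style ** into a regex over POSIX relative paths."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, across directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one character, not a separator
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_included(rel_path: str, include: list[str], exclude: list[str]) -> bool:
    def matches(patterns: list[str]) -> bool:
        return any(glob_to_regex(p).fullmatch(rel_path) for p in patterns)

    # Assumption: an empty include list means "include everything".
    return (not include or matches(include)) and not matches(exclude)
```

With the product-docs settings, README.md and guides/setup.md are indexed, while team/internal/notes.md is skipped by the **/internal/** exclude.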

Embedding providers

fastembed (local, default), openai, and voyage (hosted). Hosted providers read their API key from the environment variable named by api_key_env — keys are never written into config.
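A sketch of the api_key_env indirection: the config stores only the variable's name, and the key is read from the process environment (for an MCP client, the env block of its server entry). resolve_api_key and MissingAPIKeyError are hypothetical names, and failing fast with a message naming the missing variable is an assumed design choice.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

HOSTED_PROVIDERS = {"openai", "voyage"}


class MissingAPIKeyError(RuntimeError):
    """Raised when a hosted provider's key cannot be found (hypothetical error type)."""


def resolve_api_key(
    embedding: Mapping[str, str], environ: Mapping[str, str] = os.environ
) -> str | None:
    """Look up a hosted provider's key via the env var *named* in the config."""
    if embedding.get("provider") not in HOSTED_PROVIDERS:
        return None  # local providers such as fastembed need no key
    var = embedding.get("api_key_env")
    if not var:
        raise MissingAPIKeyError(f"{embedding['provider']} embedding requires api_key_env")
    key = environ.get(var)
    if not key:
        raise MissingAPIKeyError(f"environment variable {var} is not set")
    return key
```

Passing environ explicitly keeps the lookup testable without touching the real environment.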

MCP tools

Five tools, deliberately small and source-agnostic. They never change as sources are added:

search — semantic search across one or all sources, with optional metadata filtering. Returns results with normalized [0, 1] scores.
list_sources — discover available sources and their health/metadata.
get_document — fetch a full document after search surfaces a chunk.
get_index_status — indexing freshness/health for one source or all.
reindex — force a full re-scan of a source.
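The README states that scores are normalized to [0, 1] but not how. A minimal sketch, assuming the vector store reports cosine distance (1 - cosine similarity, range [0, 2], as Chroma does for collections configured for cosine space): map a distance d to 1 - d/2, then rank hits from several sources by that score. Hit, normalize_score and merge_hits are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str
    chunk_id: str
    distance: float  # cosine distance from the vector store, assumed to lie in [0, 2]


def normalize_score(distance: float) -> float:
    """Map cosine distance [0, 2] onto a similarity score in [0, 1] (1 = identical)."""
    return min(1.0, max(0.0, 1.0 - distance / 2.0))


def merge_hits(per_source: dict[str, list[Hit]], top_k: int) -> list[dict]:
    """Combine results from several sources into one list ranked by normalized score."""
    results = [
        {"source": h.source, "chunk_id": h.chunk_id, "score": normalize_score(h.distance)}
        for hits in per_source.values()
        for h in hits
    ]
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]
```

Scores produced by different embedding models are not strictly comparable, so a merged ranking like this is a heuristic rather than an exact ordering.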

Tools return structured {"error": "..."} objects rather than raising, so the calling agent can recover conversationally.
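A decorator is one simple way to get this behaviour. The sketch assumes synchronous tool functions (a real MCP server may register async handlers); the get_document parameters and the in-memory _DOCS store are illustrative, not ragsync's actual API.

```python
from __future__ import annotations

import functools
from typing import Any, Callable

# Hypothetical in-memory document store used only for this illustration.
_DOCS = {"product-docs": {"guides/setup.md": "# Setup\nInstall with uv sync."}}


def errors_as_results(fn: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
    """Turn any exception raised by a tool into a structured {"error": ...} result."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # the agent sees a message instead of a protocol error
            return {"error": str(exc)}

    return wrapper


@errors_as_results
def get_document(source: str, document_id: str) -> dict[str, Any]:
    if source not in _DOCS:
        raise ValueError(f"unknown source {source!r}; call list_sources to see valid names")
    if document_id not in _DOCS[source]:
        raise ValueError(f"document {document_id!r} not found in {source!r}")
    return {"source": source, "document_id": document_id, "content": _DOCS[source][document_id]}
```

Error messages that name the next useful tool (here list_sources) give the agent a concrete way to recover.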

Access scoping

Per-source isolation is a security boundary: scope access by running separate server instances with separate configs. There is no cross-instance "search everything" path.

Development

uv run pytest

Tests run fully offline by injecting a deterministic embedder in place of fastembed (see tests/conftest.py). The architecture and extension contract — how to add a new source type — are documented in AGENTS.md.
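The embedder in tests/conftest.py is not reproduced here. A common approach, sketched as an assumption, is a hashed bag-of-words vector: identical texts always map to identical unit vectors and texts that share words land closer together, with no model download. HashingEmbedder and its embed method are hypothetical, not necessarily the interface ragsync's pipeline expects.

```python
from __future__ import annotations

import hashlib
import math


class HashingEmbedder:
    """Deterministic, offline embedder for tests: hashed bag-of-words, L2-normalized."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [self._embed_one(text) for text in texts]

    def _embed_one(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # sha256 is stable across processes, unlike Python's salted hash() for str.
            bucket = int.from_bytes(hashlib.sha256(token.encode()).digest()[:8], "big") % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
        return [v / norm for v in vec]
```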

Releasing

Releases are automated from Conventional Commits. CI (.github/workflows/ci.yml) runs the test suite on every pull request. On merge to main, the release workflow (.github/workflows/release.yml) runs the tests again, then python-semantic-release inspects the commits since the last tag and decides the next version:

Commit type	Example	Version bump
`fix:`	`fix: handle empty PDF pages`	patch — `0.1.0 → 0.1.1`
`feat:`	`feat: add notion loader`	minor — `0.1.0 → 0.2.0`
`feat!:` / `BREAKING CHANGE:`	`feat!: drop python 3.9`	major — `0.1.0 → 1.0.0`
`docs:` / `chore:` / `test:` / `ci:` / `refactor:`	—	no release

When there is a releasable change, the workflow bumps the version in pyproject.toml, updates CHANGELOG.md, tags the commit, creates a GitHub release, and publishes the package to PyPI. Once published, anyone can run it with uvx ragsync --config <path> (or pip install ragsync).
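The bump rules in the table above can be expressed in a few lines. This only illustrates the table; it is not python-semantic-release's parser, which is configurable and handles more cases. bump_level and next_version are hypothetical names, and treating ! on any commit type as breaking follows the Conventional Commits convention.

```python
from __future__ import annotations

import re

# type, optional (scope), optional "!" breaking marker, then ":"
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?:")


def bump_level(message: str) -> int:
    """0 = no release, 1 = patch, 2 = minor, 3 = major."""
    header, _, body = message.partition("\n")
    m = _HEADER.match(header)
    if not m:
        return 0
    if m["bang"] or re.search(r"^BREAKING[ -]CHANGE:", body, re.MULTILINE):
        return 3
    return {"feat": 2, "fix": 1}.get(m["type"], 0)  # docs/chore/test/ci/refactor -> 0


def next_version(current: str, messages: list[str]) -> str | None:
    """Apply the largest bump found among the commits since the last tag."""
    level = max((bump_level(msg) for msg in messages), default=0)
    if level == 0:
        return None
    major, minor, patch = (int(part) for part in current.split("."))
    if level == 3:
        return f"{major + 1}.0.0"
    if level == 2:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```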

One-time setup (repo maintainer):

On PyPI, add a Trusted Publisher to the ragsync project: owner jsbroks, repository ragsync-mcp, workflow release.yml. This lets the workflow publish via OIDC with no stored token. (Alternatively, add a PYPI_API_TOKEN repository secret and reference it in the publish step's password: setting.)
If main is a protected branch, allow the release workflow to push the version-bump commit (a repository ruleset bypass for github-actions[bot], or a PAT with push access).

Release history

This version: 0.2.0 (Jun 15, 2026)

Download files

Source Distribution

ragsync-0.2.0.tar.gz (314.5 kB)

Uploaded Jun 15, 2026 Source

Built Distribution

ragsync-0.2.0-py3-none-any.whl (32.3 kB)

Uploaded Jun 15, 2026 Python 3

File details

Details for the file ragsync-0.2.0.tar.gz.

File metadata

Download URL: ragsync-0.2.0.tar.gz
Upload date: Jun 15, 2026
Size: 314.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragsync-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`2adb76fc2ca5934a2537c55ff5076c9d186cc93c4da52b86ab284738fcf7ae35`
MD5	`57a5731e8c254f398855a44653ff207f`
BLAKE2b-256	`f81e4da7c669c845ea6237091a0658779a5e9e6f79e424be21d58c02093afe7c`

Provenance

The following attestation bundles were made for ragsync-0.2.0.tar.gz:

Publisher: release.yml on jsbroks/ragsync-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ragsync-0.2.0.tar.gz
- Subject digest: 2adb76fc2ca5934a2537c55ff5076c9d186cc93c4da52b86ab284738fcf7ae35
- Sigstore transparency entry: 1827608085
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: jsbroks/ragsync-mcp@2c1e13e7c50f7cf2352423b19b1294385cda2620
- Branch / Tag: refs/heads/main
- Owner: https://github.com/jsbroks
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2c1e13e7c50f7cf2352423b19b1294385cda2620
- Trigger Event: push

File details

Details for the file ragsync-0.2.0-py3-none-any.whl.

File metadata

Download URL: ragsync-0.2.0-py3-none-any.whl
Upload date: Jun 15, 2026
Size: 32.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ragsync-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8dd3ca46a470b79a563472ffbbc6f212fae54291d3bc6f264c026c1011beba2b`
MD5	`ecf2369267f9400c74c079ffe4214ff4`
BLAKE2b-256	`611e4a60e3e3ea232100408333489b0396a3d12d57ab06c1d3a7fc5f830120c8`

Provenance

The following attestation bundles were made for ragsync-0.2.0-py3-none-any.whl:

Publisher: release.yml on jsbroks/ragsync-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ragsync-0.2.0-py3-none-any.whl
- Subject digest: 8dd3ca46a470b79a563472ffbbc6f212fae54291d3bc6f264c026c1011beba2b
- Sigstore transparency entry: 1827608219
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: jsbroks/ragsync-mcp@2c1e13e7c50f7cf2352423b19b1294385cda2620
- Branch / Tag: refs/heads/main
- Owner: https://github.com/jsbroks
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2c1e13e7c50f7cf2352423b19b1294385cda2620
- Trigger Event: push
