W3 MCP Qdrant Server

Python MCP server for vector search using Qdrant vector database and Ollama embeddings.

Status: ✅ Working with Qdrant vector search and Ollama embeddings + Advanced query techniques

Features

  • qdrant_search - Search for similar documents using text queries (auto-embedded via Ollama)
    • ✨ Query Expansion - Generate N query variations, search all, merge with RRF
    • ✨ HyDE - Hypothetical Document Embeddings for semantic enrichment
    • ✨ Reranking - Use LLM to reorder results by relevance
  • qdrant_list_collections - List and manage Qdrant collections

Supports flexible output formats (Markdown or JSON) with configurable similarity thresholds and advanced search options.

Quick Start

1. Prerequisites Setup

Qdrant Server

# Using Docker (Recommended)
docker run -p 6333:6333 qdrant/qdrant:latest

Or install locally: Qdrant Quick Start

Ollama Server

# Install: https://ollama.ai
ollama pull bge-m3
ollama pull mistral
ollama serve

Available embedding models:

  • bge-m3 (1024 dims) - ⭐ recommended - best quality-speed balance
  • nomic-embed-text (768 dims) - balanced, good for general use
  • mxbai-embed-large (1024 dims) - highest quality
  • all-minilm (384 dims) - ultra-lightweight, good for mobile
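Whichever model you pull, the server fetches vectors over Ollama's REST API. Here is a minimal stdlib sketch of that call (the project itself depends on httpx; the model name and base URL below are just this README's defaults):

```python
import json
import urllib.request

def embed(text: str, model: str = "bge-m3:latest",
          base_url: str = "http://localhost:11434") -> list[float]:
    """Fetch an embedding vector for `text` from Ollama's /api/embeddings."""
    req = urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```

The length of the returned list must match the collection's vector size, which is why the dimension column above matters when collections are created.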

2. Clean Setup (Important!)

cd /path/to/w3-mcp-server-qdrant

# Remove old lockfile and venv
rm -rf uv.lock .venv venv

# Unset old environment variable
unset VIRTUAL_ENV

3. Install Dependencies with uv

# Install all Python dependencies using uv
uv sync

That's it! uv sync installs all dependencies including MCP, pydantic, qdrant-client, and httpx.

4. Configure Environment

Create a .env file from template:

cp .env.example .env

Edit .env:

# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=  # Optional: set only if your Qdrant instance requires API-key auth

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_EMBED_MODEL=bge-m3:latest
OLLAMA_RERANK_MODEL=mistral  # For query expansion and reranking

Or export environment variables:

export QDRANT_URL=http://localhost:6333
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBED_MODEL=bge-m3:latest
export OLLAMA_RERANK_MODEL=mistral

5. Verify Installation

# Check Qdrant
curl http://localhost:6333/healthz

# Check Ollama
curl http://localhost:11434/api/tags

# Check Python env
uv run python -c "from mcp.server.fastmcp import FastMCP; print('✓ MCP ready')"

6. Test with MCP Inspector

# Start MCP Inspector (interactive web UI)
uv run mcp dev server.py

Opens URL like:

http://localhost:6274/?MCP_PROXY_AUTH_TOKEN=...

Features:

  • ✅ Available tools listed in sidebar
  • ✅ Test each tool interactively with JSON input
  • ✅ Real-time request/response viewing
  • ✅ Server logs and debugging
  • ✅ No extra dependencies needed

Usage

Option A: MCP Inspector (Development)

Best way to test and debug:

cd /path/to/w3-mcp-server-qdrant

# Start inspector
uv run mcp dev server.py

Opens the web UI at the URL printed in the console (e.g. http://localhost:6274/...):

  • See available tools
  • Test each tool with JSON input
  • View request/response in real-time
  • See server logs

Option B: Direct Python

# Run server (stdio mode)
uv run python server.py

Option C: Claude Code Integration

Method 1: Local Source (Development)

Edit ~/.claude/claude_config.json:

{
  "mcpServers": {
    "qdrant": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "server.py"],
      "cwd": "/path/to/w3-mcp-server-qdrant",
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_EMBED_MODEL": "bge-m3:latest",
        "OLLAMA_RERANK_MODEL": "mistral"
      }
    }
  }
}

Advantages:

  • ✅ Run latest development version
  • ✅ Easy to modify and test changes
  • ✅ Direct access to source code

Method 2: PyPI Installation (When Published)

Install from PyPI (always fetch latest version):

uv run --with w3-mcp-server-qdrant --refresh w3-mcp-server-qdrant

Edit ~/.claude/claude_config.json:

{
  "mcpServers": {
    "qdrant": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--with", "w3-mcp-server-qdrant", "--refresh", "w3-mcp-server-qdrant"],
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_EMBED_MODEL": "bge-m3:latest",
        "OLLAMA_RERANK_MODEL": "mistral"
      }
    }
  }
}

Advantages:

  • ✅ No need to clone repository
  • ✅ Easy version management
  • ✅ Automatic dependency isolation

Then restart Claude Code.

Tools Documentation

qdrant_search

Search for similar documents in a collection using text query (auto-embedded via Ollama).

Supports advanced search techniques: query expansion, hypothetical document embeddings (HyDE), and LLM-based reranking.

Basic Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| collection_name | string | required | Name of the collection to search |
| query_text | string | required | Text to search for (auto-embedded via Ollama) |
| limit | integer | 5 | Max results to return (1-100) |
| score_threshold | float | 0.0 | Minimum similarity threshold (0.0-1.0) |
| fields | string | "" | Comma-separated metadata fields to return (empty = all) |
| response_format | string | "markdown" | "markdown" or "json" |

Advanced Parameters - Query Expansion

Generate N query variations, search all in parallel, merge results with Reciprocal Rank Fusion:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| expand_query | boolean | false | Enable query expansion |
| expand_query_count | integer | 3 | Number of variations to generate (1-10) |
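Reciprocal Rank Fusion itself is a small, purely rank-based computation. A sketch (the constant k=60 is the conventional choice from the RRF literature, not necessarily this server's exact value):

```python
from collections import defaultdict

def rrf_merge(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1/(k + rank) per list it appears in, so documents
    ranked well across many query variations rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# "a" appears near the top of all three lists, so it wins the fusion:
merged = rrf_merge([["a", "b", "c"], ["b", "a", "d"], ["a", "d", "b"]])
# → ["a", "b", "d", "c"]
```

Note that fused scores live on the 1/(k + rank) scale, far below cosine-similarity values.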

Advanced Parameters - HyDE

Generate a hypothetical document matching the query intent, then embed it:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| use_hyde | boolean | false | Enable HyDE |
| hyde_combine_original | boolean | true | Also search original query + HyDE doc |
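HyDE boils down to two Ollama calls: one /api/generate request to draft a hypothetical answer, then one /api/embeddings request to embed that draft. A stdlib sketch; the prompt wording and model names here are illustrative assumptions, not the server's actual prompt:

```python
import json
import urllib.request

def _ollama(path: str, payload: dict,
            base_url: str = "http://localhost:11434") -> dict:
    """POST a JSON payload to an Ollama endpoint and decode the reply."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def hyde_vector(query: str, llm: str = "mistral",
                embed_model: str = "bge-m3:latest") -> list[float]:
    """Draft a hypothetical answer to `query`, then embed the draft."""
    # Illustrative prompt -- the real server's wording may differ.
    prompt = f"Write a short passage that directly answers: {query}"
    doc = _ollama("/api/generate",
                  {"model": llm, "prompt": prompt, "stream": False})["response"]
    return _ollama("/api/embeddings",
                   {"model": embed_model, "prompt": doc})["embedding"]
```

Searching with this vector tends to help when the query is phrased very differently from the stored documents.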

Advanced Parameters - Reranking

Use LLM to reorder results by relevance to the original query:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| rerank | boolean | false | Enable LLM reranking |
| rerank_top_n | integer | 10 | Number of results to rerank (1-100) |
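LLM reranking amounts to prompting the model with the numbered candidates and parsing a ranking back out of free-form text. A sketch of that prompt-and-parse half, which is pure string handling (the prompt wording is an assumption, not the server's actual prompt):

```python
import re

def build_rerank_prompt(query: str, docs: list[str]) -> str:
    """Number the candidates and ask the LLM for a best-first ordering."""
    numbered = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs, start=1))
    return (
        "Rank the passages below by relevance to the query.\n"
        f"Query: {query}\n{numbered}\n"
        "Reply with the passage numbers only, best first, comma-separated."
    )

def parse_ranking(reply: str, n: int) -> list[int]:
    """Extract a permutation of 1..n from a free-form LLM reply."""
    seen: list[int] = []
    for token in re.findall(r"\d+", reply):
        i = int(token)
        if 1 <= i <= n and i not in seen:
            seen.append(i)
    # Append anything the model omitted, keeping the original order.
    return seen + [i for i in range(1, n + 1) if i not in seen]
```

For example, parse_ranking("2, 3, 1", 3) yields [2, 3, 1], while malformed or partial replies degrade gracefully toward the original order.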

Examples

Example 1: Basic search

{
  "collection_name": "docs",
  "query_text": "machine learning",
  "limit": 5
}

Example 2: Query expansion (good recall)

{
  "collection_name": "docs",
  "query_text": "machine learning",
  "expand_query": true,
  "expand_query_count": 5,
  "limit": 5
}

Example 3: HyDE (semantic understanding)

{
  "collection_name": "docs",
  "query_text": "machine learning",
  "use_hyde": true,
  "hyde_combine_original": true,
  "limit": 5
}

Example 4: Full combo (best quality, slower)

{
  "collection_name": "docs",
  "query_text": "machine learning",
  "expand_query": true,
  "expand_query_count": 3,
  "use_hyde": true,
  "rerank": true,
  "rerank_top_n": 15,
  "limit": 5
}

Output Format

Returns JSON with search metadata and ranked results:

{
  "query": "machine learning",
  "collection": "docs",
  "total": 3,
  "search_method": "rrf+hyde+expand+rerank",
  "results": [
    {
      "index": 1,
      "id": "doc_123",
      "score": 0.0273,
      "metadata": {
        "title": "Machine Learning Basics",
        "author": "Jane Doe"
      }
    }
  ]
}

Note: search_method field indicates which techniques were applied:

  • basic - simple vector search
  • rrf - multiple searches merged with Reciprocal Rank Fusion
  • rrf+hyde - RRF with HyDE
  • rrf+expand - RRF with query expansion
  • rrf+hyde+expand+rerank - all techniques combined

qdrant_list_collections

List all collections in Qdrant with metadata.

Parameters:

  • response_format (string): "markdown" or "json" (default: "markdown")

Example:

{
  "response_format": "json"
}

Output:

{
  "collections": [
    {
      "name": "tech_docs",
      "points_count": 1250,
      "vector_size": 768
    },
    {
      "name": "papers",
      "points_count": 3840,
      "vector_size": 1024
    }
  ]
}
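For comparison, the underlying Qdrant call is a plain GET /collections, which returns only names; point counts and vector sizes come from per-collection lookups. A stdlib sketch against that endpoint (the URL is this README's default):

```python
import json
import urllib.request

def list_collection_names(qdrant_url: str = "http://localhost:6333") -> list[str]:
    """GET /collections and return the collection names Qdrant reports."""
    with urllib.request.urlopen(f"{qdrant_url}/collections") as resp:
        body = json.load(resp)
    return [c["name"] for c in body["result"]["collections"]]
```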

Configuration

QDRANT_URL

Specifies the URL of your Qdrant server.

Set via:

  1. Environment variable:

    export QDRANT_URL=http://localhost:6333
    uv run python server.py
    
  2. .env file:

    QDRANT_URL=http://localhost:6333
    
  3. In claude_config.json:

    "env": {
      "QDRANT_URL": "http://localhost:6333"
    }
    

OLLAMA_BASE_URL

Specifies the URL of your Ollama server.

Default: http://localhost:11434

OLLAMA_EMBED_MODEL

Specifies which embedding model to use for embedding search queries and documents.

Default: bge-m3:latest

Recommended embedding models:

  • bge-m3 (1024 dims) - ⭐ Recommended - best quality-to-speed ratio
  • nomic-embed-text (768 dims) - balanced, good for most use cases
  • all-minilm (384 dims) - fast, lightweight
  • mxbai-embed-large (1024 dims) - highest quality but slower

OLLAMA_RERANK_MODEL

Specifies which LLM model to use for advanced features (query expansion, HyDE, reranking).

Default: mistral

Recommended models:

  • mistral (7B) - ⭐ Recommended - good quality, reasonable speed
  • qwen2.5-coder (7B) - high quality but optimized for code
  • llama3.2 (3B) - smaller, faster but lower quality
  • neural-chat (7B) - good for instruction-following

Note: Only used when expand_query=true, use_hyde=true, or rerank=true.

Project Structure

w3-mcp-server-qdrant/
├── server.py              # MCP server entry point
├── pyproject.toml         # Project config
├── .env.example           # Environment variables template
├── README.md              # This file
└── tests/
    └── test_mcp_server.py # Integration tests

How It Works

Architecture

MCP Client (Claude, IDE, etc.)
    ↓
MCP Server (server.py)
    ├── Ollama: text → embedding vector
    └── Qdrant: vector search

Search Flow

  1. User provides text query
  2. Ollama embeds query → embedding vector
  3. Qdrant searches for similar vectors
  4. Results returned with scores and metadata
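Steps 2-4 map onto two HTTP calls; here is the Qdrant half against its REST search endpoint, stdlib-only (the server itself uses qdrant-client; the collection name and defaults are illustrative):

```python
import json
import urllib.request

def qdrant_search(collection: str, vector: list[float], limit: int = 5,
                  score_threshold: float = 0.0,
                  qdrant_url: str = "http://localhost:6333") -> list[dict]:
    """POST /collections/{name}/points/search and return scored points."""
    payload = {
        "vector": vector,
        "limit": limit,
        "score_threshold": score_threshold,
        "with_payload": True,  # include stored metadata in each hit
    }
    req = urllib.request.Request(
        f"{qdrant_url}/collections/{collection}/points/search",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]
```

Each returned point carries an id, a score, and (because of with_payload) the stored metadata, which the server then formats as Markdown or JSON.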

Examples

Search documents

# Via Claude/MCP interface
qdrant_search(
    collection_name="tech_docs",
    query_text="machine learning algorithms",
    limit=5,
    score_threshold=0.6,
    response_format="markdown"
)

List collections

# Via Claude/MCP interface
qdrant_list_collections(response_format="json")

Development

Run tests using uv

uv run pytest tests/

Code formatting with uv

uv run black server.py
uv run ruff check server.py

Testing with MCP Inspector

uv run mcp dev server.py

The web UI (URL printed in the console) shows:

  • Available tools and schemas
  • Real-time request/response
  • Server logs
  • Interactive testing

Performance Tips

Basic Search Optimization

  • Score threshold: Use score_threshold to filter low-relevance results and reduce noise
  • Result limit: Adjust limit parameter (1-100) to balance quality vs. speed
  • Embedding model: Choose based on quality vs. speed tradeoff:
    • bge-m3: best quality-to-speed balance (recommended)
    • all-minilm: fast, lightweight
    • mxbai-embed-large: higher quality but slower

Advanced Features Trade-offs

| Feature | Quality | Speed | Use Case |
|---------|---------|-------|----------|
| Basic search | ⭐⭐ | ⚡⚡⚡ | Clear, specific queries |
| Query expansion | ⭐⭐⭐ | ⚡⚡ | Ambiguous queries, high recall needed |
| HyDE | ⭐⭐⭐ | ⚡⚡ | Semantic understanding important |
| Reranking | ⭐⭐⭐⭐ | ⚡ | Precision critical, can wait 1-2s |
| All combined | ⭐⭐⭐⭐⭐ | ⚡ | Best quality, time not critical |

Performance Strategy

  • Fast path: Basic search with limit=5
  • Balanced: expand_query=true, expand_query_count=3
  • High quality: Add use_hyde=true
  • Maximum quality: Add rerank=true (slowest, ~5-10s)

Troubleshooting

Qdrant connection error

# Check if Qdrant is running
curl http://localhost:6333/healthz

# Start Qdrant with Docker
docker run -p 6333:6333 qdrant/qdrant:latest

Ollama embedding failed

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Pull the embedding model (whichever OLLAMA_EMBED_MODEL is set to)
ollama pull bge-m3

# Start Ollama
ollama serve

Collection not found

  • Ensure collection exists in Qdrant
  • Create collection through Qdrant UI or external tools
  • Verify collection name matches exactly

MCP module not found

# Install dependencies with uv
uv sync

Server hangs on startup

  • Check if Qdrant server is running and accessible
  • Check if Ollama server is running
  • Try: curl http://localhost:6333/healthz and curl http://localhost:11434/api/tags

Implemented Features

  • Query expansion with LLM-generated variations
  • HyDE (Hypothetical Document Embeddings)
  • Reciprocal Rank Fusion (RRF) for result merging
  • LLM-based result reranking
  • Parallel async embedding and search

Future Enhancements

  • Support for additional embedding models
  • Batch vector operations
  • Collection creation/deletion tools
  • Vector update and delete operations
  • Semantic search filters
  • Caching for query expansions
  • Custom RRF weights configuration

License

MIT
