Open-source MCP Server for web search, extract, crawl, academic research, and library docs with embedded SearXNG

WET - Web Extended Toolkit MCP Server

Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.

Features

  • Web Search - Search via embedded SearXNG (metasearch: Google, Bing, DuckDuckGo, Brave)
  • Academic Research - Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs - Auto-discover and index documentation with FTS5 hybrid search
  • Content Extract - Extract clean content (Markdown/Text)
  • Deep Crawl - Crawl multiple pages from a root URL with depth control
  • Site Map - Discover website URL structure
  • Media - List and download images, videos, audio files
  • Anti-bot - Stealth mode bypasses Cloudflare, Medium, LinkedIn, Twitter
  • Local Cache - TTL-based caching for all web operations
  • Docs Sync - Sync indexed docs across machines via rclone

Quick Start

Prerequisites

  • Python 3.13 (required; Python 3.14+ is not supported because of a SearXNG incompatibility)

Add to mcp.json

uvx (Recommended)

{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"],
      "env": {
        // Optional: API keys for embedding and media analysis
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        // Optional: GitHub token for higher rate limits on library docs discovery
        "GITHUB_TOKEN": "ghp_..."
      }
    }
  }
}

Warning: You must specify --python 3.13 when using uvx. Without it, uvx may pick Python 3.14+, which causes SearXNG search to fail silently (RuntimeError: can't register atexit after shutdown, raised during DNS resolution).
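
A launcher script could guard against the wrong interpreter before starting the server. The sketch below is not part of wet-mcp: `check_python` is a hypothetical helper that encodes only what this README states (3.13 is required, 3.14+ is unsupported).

```python
import sys


def check_python(version: tuple = sys.version_info[:2]) -> str:
    """Classify an interpreter version against the README's requirement."""
    if version == (3, 13):
        return "ok"
    if version >= (3, 14):
        # README: SearXNG search fails silently on 3.14+.
        return "unsupported: Python 3.14+ breaks SearXNG search"
    # The README lists only 3.13 as required; older versions are not covered.
    return "unsupported: Python 3.13 is required"


if __name__ == "__main__":
    print(check_python())
```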

That's it! On first run, the server:

  1. Automatically installs SearXNG from GitHub
  2. Automatically installs Playwright chromium + system dependencies
  3. Starts embedded SearXNG subprocess
  4. Runs the MCP server

Docker

{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "n24q02m/wet-mcp:latest"
      ],
      "env": {
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        "GITHUB_TOKEN": "ghp_..."
      }
    }
  }
}

The -v wet-data:/data volume mount persists cached web pages, indexed library docs, and downloaded media across container restarts.
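
With Docker, a variable listed under "env" only reaches the container if it is also forwarded with a matching "-e NAME" argument. One way to keep the two lists from drifting apart is to generate the entry. This is a minimal sketch: `docker_entry` is a hypothetical helper, not something wet-mcp ships, and the image, container name, and volume are copied from the config above.

```python
import json


def docker_entry(env: dict) -> dict:
    """Build an mcp.json server entry that forwards every env var with -e."""
    args = ["run", "-i", "--rm", "--name", "mcp-wet", "-v", "wet-data:/data"]
    for name in env:
        # `-e NAME` with no value makes docker copy NAME from its own environment.
        args += ["-e", name]
    args.append("n24q02m/wet-mcp:latest")
    return {"command": "docker", "args": args, "env": dict(env)}


if __name__ == "__main__":
    entry = docker_entry(
        {"API_KEYS": "GOOGLE_API_KEY:AIza...", "GITHUB_TOKEN": "ghp_..."}
    )
    print(json.dumps({"mcpServers": {"wet": entry}}, indent=2))
```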

With docs sync (Google Drive)

Step 1: Get a drive token (one-time, requires browser):

uvx --python 3.13 wet-mcp setup-sync drive

This downloads rclone, opens a browser for Google Drive auth, and outputs a base64-encoded token for RCLONE_CONFIG_GDRIVE_TOKEN.

Step 2: Copy the token and add it to your MCP config:

{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"],
      "env": {
        "API_KEYS": "GOOGLE_API_KEY:AIza...", // optional: enables media analysis & docs embedding
        "SYNC_ENABLED": "true",               // required for sync
        "SYNC_REMOTE": "gdrive",               // required: rclone remote name
        "SYNC_INTERVAL": "300",                // optional: auto-sync seconds (default: 0 = manual)
        // "SYNC_FOLDER": "wet-mcp",            // optional: remote folder (default: wet-mcp)
        "RCLONE_CONFIG_GDRIVE_TYPE": "drive",  // required: rclone backend type
        "RCLONE_CONFIG_GDRIVE_TOKEN": "<paste base64 token>" // required: from setup-sync
      }
    }
  }
}

Both raw JSON and base64-encoded tokens are supported. Base64 is recommended — it avoids nested JSON escaping issues.
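
To illustrate why base64 sidesteps escaping, the sketch below encodes a token object and decodes a value given in either form. It is an assumption-laden example: the token fields are placeholders, and the "starts with {" detection rule is a guess at how a reader could distinguish the two forms, not wet-mcp's actual logic.

```python
import base64
import json


def encode_token(token: dict) -> str:
    """Serialize a token to compact JSON, then base64-encode it."""
    raw = json.dumps(token, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_token(value: str) -> dict:
    """Accept either raw JSON or base64-encoded JSON."""
    text = value.strip()
    if not text.startswith("{"):
        # Not raw JSON, so treat it as base64 (assumed detection rule).
        text = base64.b64decode(text).decode("utf-8")
    return json.loads(text)


if __name__ == "__main__":
    sample = {"access_token": "placeholder", "token_type": "Bearer"}
    encoded = encode_token(sample)
    # The encoded form has no quotes or braces, so it nests safely in mcp.json.
    print(encoded)
    print(decode_token(encoded) == decode_token(json.dumps(sample)))
```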

The rclone remote is configured entirely through environment variables, so the same setup works in any environment (local, Docker, CI).
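
rclone reads remote settings from variables named RCLONE_CONFIG_<REMOTE>_<KEY>, which is why a remote called gdrive uses RCLONE_CONFIG_GDRIVE_TYPE and RCLONE_CONFIG_GDRIVE_TOKEN. The sketch below derives those names and reports what is still missing for the Google Drive setup above. `missing_sync_vars` is a hypothetical helper, and treating only the string "true" as enabled is an assumption.

```python
def rclone_env_name(remote: str, key: str) -> str:
    """Name of the env var rclone reads for a remote's config key."""
    return f"RCLONE_CONFIG_{remote.upper()}_{key.upper()}"


def missing_sync_vars(env: dict) -> list:
    """Return the variables still needed before sync can work."""
    if env.get("SYNC_ENABLED", "false").lower() != "true":
        return []  # sync is off, so nothing is required
    remote = env.get("SYNC_REMOTE")
    if not remote:
        return ["SYNC_REMOTE"]
    needed = [rclone_env_name(remote, "type"), rclone_env_name(remote, "token")]
    return [name for name in needed if not env.get(name)]


if __name__ == "__main__":
    print(missing_sync_vars({"SYNC_ENABLED": "true", "SYNC_REMOTE": "gdrive"}))
```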

With sync in Docker

{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "SYNC_ENABLED",
        "-e", "SYNC_REMOTE",
        "-e", "SYNC_INTERVAL",              // optional: remove if manual sync only
        "-e", "RCLONE_CONFIG_GDRIVE_TYPE",
        "-e", "RCLONE_CONFIG_GDRIVE_TOKEN",
        "n24q02m/wet-mcp:latest"
      ],
      "env": {
        "API_KEYS": "GOOGLE_API_KEY:AIza...", // optional: enables media analysis & docs embedding
        "SYNC_ENABLED": "true",               // required for sync
        "SYNC_REMOTE": "gdrive",               // required: rclone remote name
        "SYNC_INTERVAL": "300",                // optional: auto-sync seconds (default: 0 = manual)
        // "SYNC_FOLDER": "wet-mcp",            // optional: remote folder (default: wet-mcp)
        "RCLONE_CONFIG_GDRIVE_TYPE": "drive",  // required: rclone backend type
        "RCLONE_CONFIG_GDRIVE_TOKEN": "<paste base64 token>" // required: from setup-sync
      }
    }
  }
}

Without uvx

# Standard (cloud embedding via LiteLLM)
pip install wet-mcp

# With local Qwen3 ONNX embedding & reranking (no API keys needed)
# Extras are quoted so shells such as zsh do not treat [] as a glob
pip install "wet-mcp[local]"

# With local GGUF embedding & reranking (GPU support via llama-cpp-python)
pip install "wet-mcp[gguf]"

# Full (local + all optional dependencies)
pip install "wet-mcp[full]"

# Start the server
wet-mcp

Tools

Tool     Actions                                 Description
-------  --------------------------------------  -----------
search   search, research, docs                  Web search, academic research, library documentation
extract  extract, crawl, map                     Content extraction, deep crawling, site mapping
media    list, download, analyze                 Media discovery & download
config   status, set, cache_clear, docs_reindex  Server configuration and cache management
help     -                                       Full documentation for any tool
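
The tool and action names listed above can be written down as data, which gives a client a cheap sanity check before it sends a call. This is only a sketch: `validate_call` is a hypothetical client-side helper, and the server's own validation may differ.

```python
# Tool -> allowed actions, copied from the Tools listing above.
TOOLS = {
    "search": ("search", "research", "docs"),
    "extract": ("extract", "crawl", "map"),
    "media": ("list", "download", "analyze"),
    "config": ("status", "set", "cache_clear", "docs_reindex"),
    "help": (),  # no actions listed for help
}


def validate_call(tool: str, arguments: dict) -> None:
    """Raise ValueError if the tool or its action is not in the listing."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    allowed = TOOLS[tool]
    action = arguments.get("action")
    if allowed and action not in allowed:
        raise ValueError(f"{tool!r} has no action {action!r}; expected one of {allowed}")


if __name__ == "__main__":
    validate_call("search", {"action": "docs", "query": "routes", "library": "fastapi"})
    print("ok")
```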

Usage Examples

// search tool
{"action": "search", "query": "python web scraping", "max_results": 10}
{"action": "research", "query": "transformer attention mechanism"}
{"action": "docs", "query": "how to create routes", "library": "fastapi"}
{"action": "docs", "query": "dependency injection", "library": "spring-boot", "language": "java"}

// extract tool
{"action": "extract", "urls": ["https://example.com"]}
{"action": "crawl", "urls": ["https://docs.python.org"], "depth": 2}
{"action": "map", "urls": ["https://example.com"]}

// media tool
{"action": "list", "url": "https://github.com/python/cpython"}
{"action": "download", "media_urls": ["https://example.com/image.png"]}

Configuration

Variable           Default                        Description
-----------------  -----------------------------  -----------
WET_AUTO_SEARXNG   true                           Auto-start embedded SearXNG subprocess
WET_SEARXNG_PORT   41592                          SearXNG port (optional)
SEARXNG_URL        http://localhost:41592         External SearXNG URL (optional, when auto disabled)
SEARXNG_TIMEOUT    30                             SearXNG request timeout in seconds (optional)
API_KEYS           -                              LLM API keys (optional, format: ENV_VAR:key,...)
LLM_MODELS         gemini/gemini-3-flash-preview  LiteLLM model for media analysis (optional)
EMBEDDING_BACKEND  (auto-detect)                  litellm (cloud API) or local (Qwen3 ONNX/GGUF). Auto: local > litellm > FTS5-only
EMBEDDING_MODEL    (auto-detect)                  LiteLLM embedding model, or Qwen/Qwen3-Embedding-0.6B-GGUF for GGUF (optional)
EMBEDDING_DIMS     0 (auto=768)                   Embedding dimensions (optional)
RERANK_ENABLED     true                           Enable reranking after search (auto-disabled if no backend)
RERANK_BACKEND     (follows embedding)            litellm or local. Defaults to match EMBEDDING_BACKEND
RERANK_MODEL       (auto-detect)                  LiteLLM rerank model, e.g. cohere/rerank-v3.5 (optional)
RERANK_TOP_N       10                             Return top N results after reranking
CACHE_DIR          ~/.wet-mcp                     Data directory for cache DB, docs DB, downloads (optional)
DOCS_DB_PATH       ~/.wet-mcp/docs.db             Docs database location (optional)
DOWNLOAD_DIR       ~/.wet-mcp/downloads           Media download directory (optional)
TOOL_TIMEOUT       120                            Tool execution timeout in seconds, 0=no timeout (optional)
WET_CACHE          true                           Enable/disable web cache (optional)
GITHUB_TOKEN       -                              GitHub personal access token for library discovery (optional, increases rate limit from 60 to 5000 req/hr)
SYNC_ENABLED       false                          Enable rclone sync
SYNC_REMOTE        -                              rclone remote name (required when sync enabled)
SYNC_FOLDER        wet-mcp                        Remote folder name (optional)
SYNC_INTERVAL      0                              Auto-sync interval in seconds, 0=manual (optional)
LOG_LEVEL          INFO                           Logging level (optional)
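
As an illustration of how these variables and defaults fit together, the sketch below loads a handful of them from the environment. It is not wet-mcp's actual settings code: the `Settings` class, the field names, and the rule that only "true" enables a boolean are all assumptions; only the variable names and defaults come from the listing above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    searxng_url: str
    searxng_timeout: int
    tool_timeout: int  # 0 means no timeout
    cache_enabled: bool
    cache_dir: Path


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read a subset of the documented variables, falling back to defaults."""
    env = os.environ if env is None else env
    return Settings(
        searxng_url=env.get("SEARXNG_URL", "http://localhost:41592"),
        searxng_timeout=int(env.get("SEARXNG_TIMEOUT", "30")),
        tool_timeout=int(env.get("TOOL_TIMEOUT", "120")),
        cache_enabled=env.get("WET_CACHE", "true").lower() == "true",
        cache_dir=Path(os.path.expanduser(env.get("CACHE_DIR", "~/.wet-mcp"))),
    )


if __name__ == "__main__":
    print(load_settings())
```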

LLM Configuration (Optional)

For media analysis and docs embedding, configure API keys:

API_KEYS=GOOGLE_API_KEY:AIza...
LLM_MODELS=gemini/gemini-3-flash-preview

The server auto-detects embedding models from configured API keys (Gemini > OpenAI > Mistral > Cohere).
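
The API_KEYS format (ENV_VAR:key, comma-separated) and the provider priority can be sketched as follows. This is a guess at the behaviour, not the server's code: only GOOGLE_API_KEY appears in this README, so the other variable names in `PROVIDER_PRIORITY` are assumptions, as are both function names.

```python
def parse_api_keys(value: str) -> dict:
    """Parse 'ENV_VAR:key,ENV_VAR:key' into {ENV_VAR: key}."""
    keys = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, in case the key itself contains one.
        name, sep, key = item.partition(":")
        if not sep or not name.strip() or not key.strip():
            raise ValueError(f"expected ENV_VAR:key, got {item!r}")
        keys[name.strip()] = key.strip()
    return keys


# Priority order from the README: Gemini > OpenAI > Mistral > Cohere.
# Variable names other than GOOGLE_API_KEY are assumed.
PROVIDER_PRIORITY = [
    ("gemini", "GOOGLE_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
    ("cohere", "COHERE_API_KEY"),
]


def pick_provider(keys: dict):
    """Return the highest-priority provider that has a key, or None."""
    for provider, env_name in PROVIDER_PRIORITY:
        if env_name in keys:
            return provider
    return None


if __name__ == "__main__":
    print(pick_provider(parse_api_keys("GOOGLE_API_KEY:AIza...")))
```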

Local Embedding & Reranking (Optional)

Run embedding and reranking entirely offline using Qwen3 ONNX models — no API keys needed:

# Install with local ONNX support (quoted so the shell does not expand [])
pip install "wet-mcp[local]"

# Or full (local + all dependencies)
pip install "wet-mcp[full]"

With uvx:

{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp[local]@latest"]
      // No API_KEYS needed — local Qwen3-Embedding-0.6B runs on CPU
    }
  }
}

The server auto-detects qwen3-embed when installed and uses it for both embedding and reranking. Override with EMBEDDING_BACKEND=litellm to force cloud API.
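
The selection order described here and in the Configuration section (local, then litellm, then FTS5-only, with EMBEDDING_BACKEND as an override) can be expressed as a small decision function. This is a sketch of the stated order only: the function name, its parameters, and the "fts5-only" label are hypothetical.

```python
from typing import Optional


def pick_embedding_backend(
    explicit: Optional[str], local_installed: bool, has_api_keys: bool
) -> str:
    """Choose a backend: explicit override first, then local > litellm > FTS5-only."""
    if explicit:
        if explicit not in ("local", "litellm"):
            raise ValueError(f"unsupported EMBEDDING_BACKEND: {explicit!r}")
        return explicit
    if local_installed:
        return "local"
    if has_api_keys:
        return "litellm"
    # No embedding backend available: keyword search only.
    return "fts5-only"


if __name__ == "__main__":
    print(pick_embedding_backend(None, local_installed=True, has_api_keys=True))
```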


Architecture

┌─────────────────────────────────────────────────────────┐
│                    MCP Client                           │
│            (Claude, Cursor, Windsurf)                   │
└─────────────────────┬───────────────────────────────────┘
                      │ MCP Protocol
                      v
┌─────────────────────────────────────────────────────────┐
│                   WET MCP Server                        │
│  ┌──────────┐  ┌──────────┐  ┌───────┐  ┌────────┐      │
│  │  search  │  │ extract  │  │ media │  │ config │      │
│  │ (search, │  │(extract, │  │(list, │  │(status,│      │
│  │ research,│  │ crawl,   │  │downld,│  │ set,   │      │
│  │ docs)    │  │ map)     │  │analyz)│  │ cache) │      │
│  └──┬───┬───┘  └────┬─────┘  └──┬────┘  └────────┘      │
│     │   │           │           │        + help tool     │
│     v   v           v           v                       │
│  ┌──────┐ ┌──────┐ ┌──────────┐ ┌──────────┐             │
│  │SearX │ │DocsDB│ │ Crawl4AI │ │ Reranker │             │
│  │NG    │ │FTS5+ │ │(Playwrgt)│ │(LiteLLM/ │             │
│  │      │ │sqlite│ │          │ │ Qwen3    │             │
│  │      │ │-vec  │ │          │ │ local)   │             │
│  └──────┘ └──────┘ └──────────┘ └──────────┘             │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │  WebCache (SQLite, TTL)  │  rclone sync (docs)   │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
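
The WebCache box in the diagram is labelled "SQLite, TTL". A minimal sketch of that idea, using only the standard library, is shown below. The table schema, the `TTLCache` class, and the one-hour default are assumptions made for illustration and are not taken from the wet-mcp source.

```python
import sqlite3
import time
from typing import Optional


class TTLCache:
    """Key/value cache in SQLite where each row carries an expiry timestamp."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL NOT NULL)"
        )

    def set(self, key: str, value: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
                (key, value, now + self.ttl),
            )

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, expires_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        value, expires_at = row
        if expires_at <= now:
            # Expired: drop the row so the caller refetches.
            with self.db:
                self.db.execute("DELETE FROM cache WHERE key = ?", (key,))
            return None
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    cache.set("https://example.com", "<html>...</html>")
    print(cache.get("https://example.com") is not None)
```

The optional `now` argument exists only so that expiry can be exercised without sleeping.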

Build from Source

git clone https://github.com/n24q02m/wet-mcp
cd wet-mcp

# Setup (requires mise: https://mise.jdx.dev/)
mise run setup

# Run
uv run wet-mcp

Docker Build

docker build -t n24q02m/wet-mcp:latest .

Requirements: Python 3.13 (not 3.14+)


Contributing

See CONTRIBUTING.md

License

MIT - See LICENSE
