
WET - Web Extended Toolkit MCP Server

mcp-name: io.github.n24q02m/wet-mcp

Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.


Features

  • Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
  • Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
  • Content Extract -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
  • Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
  • Media -- List, download, and analyze images, videos, audio files
  • Anti-bot -- Stealth mode for bot-protected sites, including Cloudflare-protected pages, Medium, LinkedIn, and Twitter
  • Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
  • Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)

Quick Start

Claude Code Plugin (Recommended)

```shell
claude plugin add n24q02m/wet-mcp
```

MCP Server

Python 3.13 required -- Python 3.14+ is not supported due to a SearXNG incompatibility. You must pass --python 3.13 when using uvx.

On first run, the server automatically installs SearXNG, Playwright chromium, and starts the embedded search engine.

Option 1: uvx

```jsonc
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"],
      "env": {
        // -- optional: cloud embedding + reranking (Jina AI recommended)
        "API_KEYS": "JINA_AI_API_KEY:jina_...",
        // -- or: "API_KEYS": "GOOGLE_API_KEY:AIza...,COHERE_API_KEY:co-...",
        // -- without API_KEYS, uses built-in local Qwen3 ONNX models (CPU, ~570MB first download)
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: higher rate limits for docs discovery (60 -> 5000 req/hr)
        "GITHUB_TOKEN": "ghp_...",
        // -- optional: restrict local file conversion to specific directories
        // "CONVERT_ALLOWED_DIRS": "/home/user/docs,/tmp/uploads",
        // -- optional: sync indexed docs across machines via rclone
        "SYNC_ENABLED": "true",                    // default: false
        "SYNC_INTERVAL": "300"                     // auto-sync every 5min (0 = manual only)
      }
    }
  }
}
```

Option 2: Docker

```json
{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "-e", "SYNC_ENABLED",
        "-e", "SYNC_INTERVAL",
        "n24q02m/wet-mcp:latest"
      ],
      "env": {
        "API_KEYS": "JINA_AI_API_KEY:jina_...",
        "GITHUB_TOKEN": "ghp_...",
        "SYNC_ENABLED": "true",
        "SYNC_INTERVAL": "300"
      }
    }
  }
}
```

Pre-install (optional)

Use the setup MCP tool to warm up models and install dependencies:

```
# Via MCP tool call (recommended):
setup(action="warmup")

# With cloud embedding configured, warmup validates API keys
# and skips local model download if cloud models are available.
```

The warmup action pre-downloads SearXNG, Playwright, and the embedding/reranker models (~1.1 GB total) so the first real connection does not time out.

Sync setup

Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:

  1. First sync: rclone is auto-downloaded and a browser opens for OAuth authentication
  2. Token saved: OAuth token is stored locally at ~/.wet-mcp/tokens/ (600 permissions)
  3. Subsequent runs: Token is loaded automatically -- no manual steps needed

For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:

```json
{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "dropbox",
  "SYNC_REMOTE": "dropbox"
}
```

Tools

| Tool | Actions | Description |
|------|---------|-------------|
| `search` | `search`, `research`, `docs`, `similar` | Web search (with filters, reranking, expand/enrich), academic research, library docs (HyDE), find similar |
| `extract` | `extract`, `batch`, `crawl`, `map`, `convert`, `extract_structured` | Content extraction, batch processing (up to 50 URLs), deep crawling, site mapping, local file conversion, structured data extraction (JSON Schema) |
| `media` | `list`, `download`, `analyze` | Media discovery, download, and analysis |
| `config` | `status`, `set`, `cache_clear`, `docs_reindex` | Server configuration and cache management |
| `setup` | `warmup`, `setup_sync` | Pre-download models, configure cloud sync |
| `help` | -- | Full documentation for any tool |
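
As a rough illustration, a client invokes these tools in the same style as the `setup(action="warmup")` call shown earlier. The argument names below are illustrative, not the exact tool schema -- use the `help` tool for the authoritative signatures:

```
search(action="docs", query="how to configure retries")
extract(action="extract", url="https://example.com/article")
media(action="list", url="https://example.com/gallery")
```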

Configuration

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `API_KEYS` | No | -- | LLM API keys for SDK mode (format: `ENV_VAR:key,...`). Enables cloud embedding + reranking |
| `LITELLM_PROXY_URL` | No | -- | LiteLLM Proxy URL. Enables proxy mode |
| `LITELLM_PROXY_KEY` | No | -- | LiteLLM Proxy virtual key |
| `GITHUB_TOKEN` | No | auto-detect | GitHub token for docs discovery (60 -> 5000 req/hr). Auto-detected from `gh auth token` |
| `EMBEDDING_BACKEND` | No | auto-detect | `litellm` (cloud) or `local` (Qwen3). Auto: API_KEYS -> litellm, else local |
| `EMBEDDING_MODEL` | No | auto-detect | LiteLLM embedding model name |
| `EMBEDDING_DIMS` | No | 0 (auto=768) | Embedding dimensions |
| `RERANK_ENABLED` | No | true | Enable reranking after search |
| `RERANK_BACKEND` | No | auto-detect | `litellm` or `local`. Auto: Cohere/Jina key -> litellm, else local |
| `RERANK_MODEL` | No | auto-detect | LiteLLM rerank model name |
| `RERANK_TOP_N` | No | 10 | Return top N results after reranking |
| `LLM_MODELS` | No | gemini/gemini-3-flash-preview | LiteLLM model for media analysis |
| `WET_AUTO_SEARXNG` | No | true | Auto-start embedded SearXNG subprocess |
| `WET_SEARXNG_PORT` | No | 41592 | SearXNG port |
| `SEARXNG_URL` | No | http://localhost:41592 | External SearXNG URL (when auto disabled) |
| `SEARXNG_TIMEOUT` | No | 30 | SearXNG request timeout in seconds |
| `CONVERT_MAX_FILE_SIZE` | No | 104857600 | Max file size for local conversion in bytes (100 MB) |
| `CONVERT_ALLOWED_DIRS` | No | -- | Comma-separated paths to restrict local file conversion |
| `CACHE_DIR` | No | ~/.wet-mcp | Data directory for cache, docs, downloads |
| `DOCS_DB_PATH` | No | ~/.wet-mcp/docs.db | Docs database location |
| `DOWNLOAD_DIR` | No | ~/.wet-mcp/downloads | Media download directory |
| `TOOL_TIMEOUT` | No | 120 | Tool execution timeout in seconds (0 = no timeout) |
| `WET_CACHE` | No | true | Enable/disable web cache |
| `SYNC_ENABLED` | No | false | Enable rclone sync |
| `SYNC_PROVIDER` | No | drive | rclone provider type (drive, dropbox, s3, etc.) |
| `SYNC_REMOTE` | No | gdrive | rclone remote name |
| `SYNC_FOLDER` | No | wet-mcp | Remote folder name |
| `SYNC_INTERVAL` | No | 300 | Auto-sync interval in seconds (0 = manual) |
| `LOG_LEVEL` | No | INFO | Logging level |
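
Putting a few of these together, an `env` block for a docs-heavy setup might look like the sketch below. All values are illustrative placeholders, not recommendations:

```json
{
  "GITHUB_TOKEN": "ghp_...",
  "RERANK_TOP_N": "5",
  "TOOL_TIMEOUT": "300",
  "CACHE_DIR": "/srv/wet-mcp",
  "LOG_LEVEL": "DEBUG"
}
```

Unset variables keep the defaults from the table, so only deviations need to be listed.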

Embedding & Reranking

Both embedding and reranking are always available -- local models are built-in and require no configuration.

  • Jina AI (recommended): A single JINA_AI_API_KEY enables both embedding and reranking
  • Embedding priority: Jina AI > Gemini > OpenAI > Cohere. Local Qwen3 fallback always available
  • Reranking priority: Jina AI > Cohere. Local Qwen3 fallback always available
  • GPU auto-detection: CUDA/DirectML are auto-detected; GGUF models are used for better performance
  • All embeddings stored at 768 dims. Switching providers never breaks the vector table
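
For example, to pin both stages to the built-in local models even when an API key is configured, the backend variables from the configuration table can be set explicitly (a sketch, not a required setup):

```json
{
  "API_KEYS": "JINA_AI_API_KEY:jina_...",
  "EMBEDDING_BACKEND": "local",
  "RERANK_BACKEND": "local"
}
```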

LLM Configuration (3-Mode Architecture)

| Priority | Mode | Config | Use case |
|----------|------|--------|----------|
| 1 | Proxy | `LITELLM_PROXY_URL` + `LITELLM_PROXY_KEY` | Production (self-hosted gateway) |
| 2 | SDK | `API_KEYS` | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline, embedding/rerank only (no LLM) |
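
Proxy mode, the highest-priority mode, takes only the two variables from the table (the URL and key values here are the same placeholders used in the Quick Start config):

```json
{
  "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
  "LITELLM_PROXY_KEY": "sk-your-virtual-key"
}
```

If neither proxy variables nor `API_KEYS` are set, the server falls back to local mode automatically.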

SearXNG Configuration (2-Mode)

| Mode | Config | Description |
|------|--------|-------------|
| Embedded (default) | `WET_AUTO_SEARXNG=true` | Auto-installs and manages SearXNG as a subprocess |
| External | `WET_AUTO_SEARXNG=false` + `SEARXNG_URL=http://host:port` | Connects to a pre-existing SearXNG instance |
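
A sketch of the external mode, assuming a SearXNG instance is already listening on port 8888 (the port and timeout values are examples, not defaults):

```json
{
  "WET_AUTO_SEARXNG": "false",
  "SEARXNG_URL": "http://localhost:8888",
  "SEARXNG_TIMEOUT": "60"
}
```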

Build from Source

```shell
git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp
```

Compatible With

Claude Code Claude Desktop Cursor VS Code Copilot Antigravity Gemini CLI OpenAI Codex OpenCode

Also by n24q02m

| Server | Description |
|--------|-------------|
| mnemo-mcp | Persistent AI memory with hybrid search and cross-machine sync |
| better-notion-mcp | Markdown-first Notion API with 9 composite tools |
| better-email-mcp | Email (IMAP/SMTP) with multi-account and auto-discovery |
| better-godot-mcp | Godot Engine 4.x with 18 tools for scenes, scripts, and shaders |
| better-telegram-mcp | Telegram dual-mode (Bot API + MTProto) with 6 composite tools |
| better-code-review-graph | Knowledge graph for token-efficient code reviews |

Contributing

See CONTRIBUTING.md.

License

MIT -- See LICENSE.

Download files
Source Distribution

wet_mcp-2.15.0.tar.gz (117.8 kB)

Built Distribution

wet_mcp-2.15.0-py3-none-any.whl (132.3 kB)
Hashes for wet_mcp-2.15.0.tar.gz

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 06d5c839d3d4daabce462a44d5d4d24fc934f2fc4dc15633896b6f846ab581bd |
| MD5 | 4c95162e9fc924e9df4df7b2a6778fb7 |
| BLAKE2b-256 | f288bcfb19fb91e911d100892dc909581b892c4616149ef12e5141805d6a58dd |

Hashes for wet_mcp-2.15.0-py3-none-any.whl

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | dd87dc647f36d4498a8e5971496026b83d3a847c7c1d8171c9b689d1d0b14007 |
| MD5 | bc093632e976e712c259a854cac5a1aa |
| BLAKE2b-256 | bcf5b5e9ba3b79dd71c4fc6710e6787a2879f781a1c902ce327c64e3658de8d6 |
