
WET - Web Extended Toolkit MCP Server

mcp-name: io.github.n24q02m/wet-mcp

Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.


Features

  • Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
  • Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
  • Content Extract -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
  • Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
  • Media -- List, download, and analyze images, videos, audio files
  • Anti-bot -- Stealth mode bypasses bot detection on Cloudflare-protected sites, Medium, LinkedIn, and Twitter
  • Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
  • Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)

Quick Start

Claude Code Plugin (Recommended)

Via marketplace (includes skills: /fact-check, /compare):

/plugin marketplace add n24q02m/claude-plugins
/plugin install wet-mcp@claude-plugins

Or install this plugin only:

/plugin marketplace add n24q02m/wet-mcp
/plugin install wet-mcp

Configure env vars in ~/.claude/settings.local.json or shell profile. See Environment Variables.
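A minimal sketch of the `~/.claude/settings.local.json` env block (the `env` key is Claude Code's standard mechanism for passing environment variables; the token value is a placeholder — see the Configuration table for the full variable list):

```json
{
  "env": {
    "SYNC_ENABLED": "true",
    "GITHUB_TOKEN": "ghp_your_token_here"
  }
}
```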

MCP Server

Python 3.13 required -- Python 3.14+ is not supported due to SearXNG incompatibility. You must specify `--python 3.13` when using `uvx`.

On first run, the server automatically installs SearXNG and Playwright Chromium, and starts the embedded search engine.

Option 1: uvx

```json
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"]
    }
  }
}
```
Other MCP clients (Cursor, Codex, Gemini CLI)

Cursor (`~/.cursor/mcp.json`), Windsurf, Cline, Amp, OpenCode:

```json
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"]
    }
  }
}
```

Codex (`~/.codex/config.toml`):

```toml
[mcp_servers.wet]
command = "uvx"
args = ["--python", "3.13", "wet-mcp@latest"]
```

Option 2: Docker

```json
{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "-e", "SYNC_ENABLED",
        "n24q02m/wet-mcp:latest"
      ]
    }
  }
}
```

Configure env vars in ~/.claude/settings.local.json or your shell profile. See Environment Variables below.

Pre-install (optional)

Use the setup MCP tool to warm up models and install dependencies:

```
# Via MCP tool call (recommended):
setup(action="warmup")

# With cloud embedding configured, warmup validates API keys
# and skips the local model download if cloud models are available.
```

The warmup action pre-downloads SearXNG, Playwright, and the embedding/reranker models (~1.1 GB total) so the first real connection does not time out.

Sync setup

Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:

  1. First sync: rclone is auto-downloaded and a browser opens for OAuth authentication
  2. Token saved: OAuth token is stored locally at ~/.wet-mcp/tokens/ (600 permissions)
  3. Subsequent runs: Token is loaded automatically -- no manual steps needed

For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:

```json
{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "dropbox",
  "SYNC_REMOTE": "dropbox"
}
```

Tools

| Tool | Actions | Description |
|---|---|---|
| `search` | search, research, docs, similar | Web search (with filters, reranking, expand/enrich), academic research, library docs (HyDE), find similar |
| `extract` | extract, batch, crawl, map, convert, extract_structured | Content extraction, batch processing (up to 50 URLs), deep crawling, site mapping, local file conversion, structured data extraction (JSON Schema) |
| `media` | list, download, analyze | Media discovery, download, and analysis |
| `config` | status, set, cache_clear, docs_reindex | Server configuration and cache management |
| `setup` | warmup, setup_sync | Pre-download models, configure cloud sync |
| `help` | -- | Full documentation for any tool |
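Following the call style of the warmup example above, a library-docs lookup might look like this (the parameter names other than `action` are illustrative assumptions — use the `help` tool for exact signatures):

```
search(action="docs", library="fastapi", query="dependency injection")
help(tool="extract")
```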

MCP Prompts

| Prompt | Parameters | Description |
|---|---|---|
| `research_topic` | topic | Research a topic using academic search |
| `library_docs` | library, question | Find library documentation |

Configuration

| Variable | Required | Default | Description |
|---|---|---|---|
| `API_KEYS` | No | -- | LLM API keys for SDK mode (format: `ENV_VAR:key,...`). Enables cloud embedding + reranking |
| `LITELLM_PROXY_URL` | No | -- | LiteLLM Proxy URL. Enables proxy mode |
| `LITELLM_PROXY_KEY` | No | -- | LiteLLM Proxy virtual key |
| `GITHUB_TOKEN` | No | auto-detect | GitHub token for docs discovery (60 -> 5000 req/hr). Auto-detected from `gh auth token` |
| `EMBEDDING_BACKEND` | No | auto-detect | `litellm` (cloud) or `local` (Qwen3). Auto: `API_KEYS` -> `litellm`, else `local` |
| `EMBEDDING_MODEL` | No | auto-detect | LiteLLM embedding model name |
| `EMBEDDING_DIMS` | No | 0 (auto=768) | Embedding dimensions |
| `RERANK_ENABLED` | No | true | Enable reranking after search |
| `RERANK_BACKEND` | No | auto-detect | `litellm` or `local`. Auto: Cohere/Jina key -> `litellm`, else `local` |
| `RERANK_MODEL` | No | auto-detect | LiteLLM rerank model name |
| `RERANK_TOP_N` | No | 10 | Return top N results after reranking |
| `LLM_MODELS` | No | gemini/gemini-3-flash-preview | LiteLLM model for media analysis |
| `WET_AUTO_SEARXNG` | No | true | Auto-start embedded SearXNG subprocess |
| `WET_SEARXNG_PORT` | No | 41592 | SearXNG port |
| `SEARXNG_URL` | No | http://localhost:41592 | External SearXNG URL (when auto-start is disabled) |
| `SEARXNG_TIMEOUT` | No | 30 | SearXNG request timeout in seconds |
| `CONVERT_MAX_FILE_SIZE` | No | 104857600 | Max file size for local conversion in bytes (100 MB) |
| `CONVERT_ALLOWED_DIRS` | No | -- | Comma-separated paths to restrict local file conversion |
| `CACHE_DIR` | No | ~/.wet-mcp | Data directory for cache, docs, downloads |
| `DOCS_DB_PATH` | No | ~/.wet-mcp/docs.db | Docs database location |
| `DOWNLOAD_DIR` | No | ~/.wet-mcp/downloads | Media download directory |
| `TOOL_TIMEOUT` | No | 120 | Tool execution timeout in seconds (0 = no timeout) |
| `WET_CACHE` | No | true | Enable/disable web cache |
| `SYNC_ENABLED` | No | false | Enable rclone sync |
| `SYNC_PROVIDER` | No | drive | rclone provider type (drive, dropbox, s3, etc.) |
| `SYNC_REMOTE` | No | gdrive | rclone remote name |
| `SYNC_FOLDER` | No | wet-mcp | Remote folder name |
| `SYNC_INTERVAL` | No | 300 | Auto-sync interval in seconds (0 = manual) |
| `LOG_LEVEL` | No | INFO | Logging level |
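The `API_KEYS` format above (`ENV_VAR:key,...`) can be sketched with a short illustration — this parser is not wet-mcp's actual code, just a demonstration of how the documented comma/colon format decomposes (the key values are placeholders):

```python
# Illustrative parser for the documented API_KEYS format "ENV_VAR:key,...".
# Not wet-mcp's implementation; key values below are placeholders.
def parse_api_keys(value: str) -> dict[str, str]:
    # Split entries on commas, then each entry on its first colon only,
    # so keys that themselves contain ":" survive intact.
    entries = (item.split(":", 1) for item in value.split(",") if item)
    return {name: key for name, key in entries}

keys = parse_api_keys("JINA_AI_API_KEY:jina_xxx,GEMINI_API_KEY:AIza_xxx")
print(keys["JINA_AI_API_KEY"])  # -> jina_xxx
```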

Embedding & Reranking

Both embedding and reranking are always available -- local models are built-in and require no configuration.

  • Jina AI (recommended): A single JINA_AI_API_KEY enables both embedding and reranking
  • Embedding priority: Jina AI > Gemini > OpenAI > Cohere. Local Qwen3 fallback always available
  • Reranking priority: Jina AI > Cohere. Local Qwen3 fallback always available
  • GPU auto-detection: CUDA/DirectML auto-detected, uses GGUF models for better performance
  • All embeddings stored at 768 dims. Switching providers never breaks the vector table
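Per the priority rules above, a single Jina key covers both cloud embedding and reranking. Using the documented `API_KEYS` format (the key value here is a placeholder):

```json
{
  "API_KEYS": "JINA_AI_API_KEY:your-jina-key"
}
```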

LLM Configuration (3-Mode Architecture)

| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy | `LITELLM_PROXY_URL` + `LITELLM_PROXY_KEY` | Production (self-hosted gateway) |
| 2 | SDK | `API_KEYS` | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline; embedding/rerank only (no LLM) |
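A proxy-mode (mode 1) env setup might look like this — the URL and virtual key are placeholders for your own LiteLLM gateway:

```json
{
  "LITELLM_PROXY_URL": "http://localhost:4000",
  "LITELLM_PROXY_KEY": "sk-your-virtual-key"
}
```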

SearXNG Configuration (2-Mode)

| Mode | Config | Description |
|---|---|---|
| Embedded (default) | `WET_AUTO_SEARXNG=true` | Auto-installs and manages SearXNG as a subprocess |
| External | `WET_AUTO_SEARXNG=false` + `SEARXNG_URL=http://host:port` | Connects to a pre-existing SearXNG instance |
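To point at an existing SearXNG instance (external mode), set both variables — the host and port here are placeholders:

```json
{
  "WET_AUTO_SEARXNG": "false",
  "SEARXNG_URL": "http://searxng.internal:8080"
}
```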

Security

  • SSRF prevention -- URL validation on crawl targets
  • Graceful fallbacks -- Cloud → Local embedding, multi-tier crawling
  • Error sanitization -- No credentials in error messages
  • File conversion sandboxing -- Optional CONVERT_ALLOWED_DIRS restriction

Build from Source

```shell
git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp
```

Compatible With

Claude Code, Claude Desktop, Cursor, VS Code Copilot, Antigravity, Gemini CLI, OpenAI Codex, OpenCode

Also by n24q02m

| Server | Description |
|---|---|
| mnemo-mcp | Persistent AI memory with hybrid search and cross-machine sync |
| better-notion-mcp | Markdown-first Notion API with 9 composite tools |
| better-email-mcp | Email (IMAP/SMTP) with multi-account and auto-discovery |
| better-godot-mcp | Godot Engine 4.x with 18 tools for scenes, scripts, and shaders |
| better-telegram-mcp | Telegram dual-mode (Bot API + MTProto) with 6 composite tools |
| better-code-review-graph | Knowledge graph for token-efficient code reviews |

Contributing

See CONTRIBUTING.md.

License

MIT -- See LICENSE.
