WET - Web Extended Toolkit MCP Server
mcp-name: io.github.n24q02m/wet-mcp
Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.
Features
- Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
- Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
- Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
- Content Extract -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
- Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
- Media -- List, download, and analyze images, videos, audio files
- Anti-bot -- Stealth mode bypasses Cloudflare, Medium, LinkedIn, Twitter
- Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
- Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)
Quick Start
Claude Code Plugin (Recommended)
claude plugin add n24q02m/wet-mcp
MCP Server
Python 3.13 required -- Python 3.14+ is not supported due to SearXNG incompatibility. You must specify --python 3.13 when using uvx.
On first run, the server automatically installs SearXNG, Playwright chromium, and starts the embedded search engine.
Option 1: uvx
{
"mcpServers": {
"wet": {
"command": "uvx",
"args": ["--python", "3.13", "wet-mcp@latest"],
"env": {
// -- optional: cloud embedding + reranking (Jina AI recommended)
"API_KEYS": "JINA_AI_API_KEY:jina_...",
// -- or: "API_KEYS": "GOOGLE_API_KEY:AIza...,COHERE_API_KEY:co-...",
// -- without API_KEYS, uses built-in local Qwen3 ONNX models (CPU, ~570MB first download)
// -- optional: LiteLLM Proxy (production, self-hosted gateway)
// "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
// "LITELLM_PROXY_KEY": "sk-your-virtual-key",
// -- optional: higher rate limits for docs discovery (60 -> 5000 req/hr)
"GITHUB_TOKEN": "ghp_...",
// -- optional: restrict local file conversion to specific directories
// "CONVERT_ALLOWED_DIRS": "/home/user/docs,/tmp/uploads",
// -- optional: sync indexed docs across machines via rclone
"SYNC_ENABLED": "true", // default: false
"SYNC_INTERVAL": "300" // auto-sync every 5min (0 = manual only)
}
}
}
}
Option 2: Docker
{
"mcpServers": {
"wet": {
"command": "docker",
"args": [
"run", "-i", "--rm",
"--name", "mcp-wet",
"-v", "wet-data:/data",
"-e", "API_KEYS",
"-e", "GITHUB_TOKEN",
"-e", "SYNC_ENABLED",
"-e", "SYNC_INTERVAL",
"n24q02m/wet-mcp:latest"
],
"env": {
"API_KEYS": "JINA_AI_API_KEY:jina_...",
"GITHUB_TOKEN": "ghp_...",
"SYNC_ENABLED": "true",
"SYNC_INTERVAL": "300"
}
}
}
}
Pre-install (optional)
Use the setup MCP tool to warm up models and install dependencies:
# Via MCP tool call (recommended):
setup(action="warmup")
# With cloud embedding configured, warmup validates API keys
# and skips local model download if cloud models are available.
The warmup action pre-downloads SearXNG, Playwright, and the embedding/reranker models (~1.1 GB total) so the first real connection does not time out.
Sync setup
Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:
- First sync: rclone is auto-downloaded and a browser opens for OAuth authentication
- Token saved: the OAuth token is stored locally at ~/.wet-mcp/tokens/ (600 permissions)
- Subsequent runs: the token is loaded automatically -- no manual steps needed
For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:
{
"SYNC_ENABLED": "true",
"SYNC_PROVIDER": "dropbox",
"SYNC_REMOTE": "dropbox"
}
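Under the hood, syncing a docs directory to a remote boils down to an rclone invocation. The sketch below shows how such a command could be assembled from SYNC_REMOTE and SYNC_FOLDER; the helper name, default local path, and omitted flags are illustrative, not the server's actual implementation.

```python
def build_sync_command(remote: str, folder: str,
                       local_dir: str = "~/.wet-mcp/docs") -> str:
    """Assemble an rclone sync command (illustrative sketch).

    rclone syncs a local directory to <remote>:<folder>; the real
    server may pass additional flags (filters, bandwidth limits, etc.).
    """
    return " ".join(["rclone", "sync", local_dir, f"{remote}:{folder}"])

# Using the Dropbox settings from the config above:
print(build_sync_command("dropbox", "wet-mcp"))
```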
Tools
| Tool | Actions | Description |
|---|---|---|
| search | search, research, docs, similar | Web search (with filters, reranking, expand/enrich), academic research, library docs (HyDE), find similar |
| extract | extract, batch, crawl, map, convert, extract_structured | Content extraction, batch processing (up to 50 URLs), deep crawling, site mapping, local file conversion, structured data extraction (JSON Schema) |
| media | list, download, analyze | Media discovery, download, and analysis |
| config | status, set, cache_clear, docs_reindex | Server configuration and cache management |
| setup | warmup, setup_sync | Pre-download models, configure cloud sync |
| help | -- | Full documentation for any tool |
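Each tool is dispatched by an action argument, as in the setup(action="warmup") call shown earlier. A minimal sketch of validating such calls against the table above (the mapping comes from the table; the helper itself is hypothetical, not the server's dispatcher):

```python
# Tool -> allowed actions, taken directly from the table above.
TOOL_ACTIONS = {
    "search": {"search", "research", "docs", "similar"},
    "extract": {"extract", "batch", "crawl", "map", "convert", "extract_structured"},
    "media": {"list", "download", "analyze"},
    "config": {"status", "set", "cache_clear", "docs_reindex"},
    "setup": {"warmup", "setup_sync"},
}

def is_valid_call(tool: str, action: str) -> bool:
    """Return True if the (tool, action) pair appears in the table."""
    return action in TOOL_ACTIONS.get(tool, set())
```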
Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| API_KEYS | No | -- | LLM API keys for SDK mode (format: ENV_VAR:key,...). Enables cloud embedding + reranking |
| LITELLM_PROXY_URL | No | -- | LiteLLM Proxy URL. Enables proxy mode |
| LITELLM_PROXY_KEY | No | -- | LiteLLM Proxy virtual key |
| GITHUB_TOKEN | No | auto-detect | GitHub token for docs discovery (60 -> 5000 req/hr). Auto-detected from gh auth token |
| EMBEDDING_BACKEND | No | auto-detect | litellm (cloud) or local (Qwen3). Auto: API_KEYS -> litellm, else local |
| EMBEDDING_MODEL | No | auto-detect | LiteLLM embedding model name |
| EMBEDDING_DIMS | No | 0 (auto=768) | Embedding dimensions |
| RERANK_ENABLED | No | true | Enable reranking after search |
| RERANK_BACKEND | No | auto-detect | litellm or local. Auto: Cohere/Jina key -> litellm, else local |
| RERANK_MODEL | No | auto-detect | LiteLLM rerank model name |
| RERANK_TOP_N | No | 10 | Return top N results after reranking |
| LLM_MODELS | No | gemini/gemini-3-flash-preview | LiteLLM model for media analysis |
| WET_AUTO_SEARXNG | No | true | Auto-start embedded SearXNG subprocess |
| WET_SEARXNG_PORT | No | 41592 | SearXNG port |
| SEARXNG_URL | No | http://localhost:41592 | External SearXNG URL (when auto-start is disabled) |
| SEARXNG_TIMEOUT | No | 30 | SearXNG request timeout in seconds |
| CONVERT_MAX_FILE_SIZE | No | 104857600 | Max file size for local conversion in bytes (100 MB) |
| CONVERT_ALLOWED_DIRS | No | -- | Comma-separated paths to restrict local file conversion |
| CACHE_DIR | No | ~/.wet-mcp | Data directory for cache, docs, downloads |
| DOCS_DB_PATH | No | ~/.wet-mcp/docs.db | Docs database location |
| DOWNLOAD_DIR | No | ~/.wet-mcp/downloads | Media download directory |
| TOOL_TIMEOUT | No | 120 | Tool execution timeout in seconds (0 = no timeout) |
| WET_CACHE | No | true | Enable/disable web cache |
| SYNC_ENABLED | No | false | Enable rclone sync |
| SYNC_PROVIDER | No | drive | rclone provider type (drive, dropbox, s3, etc.) |
| SYNC_REMOTE | No | gdrive | rclone remote name |
| SYNC_FOLDER | No | wet-mcp | Remote folder name |
| SYNC_INTERVAL | No | 300 | Auto-sync interval in seconds (0 = manual) |
| LOG_LEVEL | No | INFO | Logging level |
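The API_KEYS value packs several provider keys into one variable using ENV_VAR:key pairs separated by commas. A minimal sketch of parsing that format (illustrative only; the server's actual parser may handle more edge cases):

```python
def parse_api_keys(value: str) -> dict:
    """Parse 'ENV_VAR:key,ENV_VAR:key,...' into a name -> key mapping.

    partition() splits on the first ':' only, so keys containing
    colons (e.g. some provider tokens) survive intact.
    """
    pairs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        name, _, key = item.partition(":")
        pairs[name] = key
    return pairs
```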
Embedding & Reranking
Both embedding and reranking are always available -- local models are built-in and require no configuration.
- Jina AI (recommended): a single JINA_AI_API_KEY enables both embedding and reranking
- Embedding priority: Jina AI > Gemini > OpenAI > Cohere. Local Qwen3 fallback always available
- Reranking priority: Jina AI > Cohere. Local Qwen3 fallback always available
- GPU auto-detection: CUDA/DirectML auto-detected, uses GGUF models for better performance
- All embeddings stored at 768 dims. Switching providers never breaks the vector table
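Conceptually, reranking scores every candidate document against the query and keeps the best RERANK_TOP_N. The sketch below illustrates that idea with cosine similarity over toy 2-dim vectors (the real server uses 768-dim embeddings and dedicated reranker models, so this is a simplification, not its algorithm):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rerank(query_vec, doc_vecs, top_n=10):
    """Return indices of the top_n documents most similar to the query,
    mirroring what RERANK_TOP_N controls."""
    scored = sorted(enumerate(doc_vecs),
                    key=lambda p: cosine(query_vec, p[1]),
                    reverse=True)
    return [i for i, _ in scored[:top_n]]
```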
LLM Configuration (3-Mode Architecture)
| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy | LITELLM_PROXY_URL + LITELLM_PROXY_KEY | Production (self-hosted gateway) |
| 2 | SDK | API_KEYS | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline; embedding/rerank only (no LLM) |
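The priority order above can be expressed as a simple cascade over the environment. This is a sketch of the selection logic implied by the table, not the server's actual code:

```python
import os

def select_llm_mode(env=None) -> str:
    """Pick the LLM mode by the table's priority: proxy > SDK > local."""
    env = os.environ if env is None else env
    if env.get("LITELLM_PROXY_URL") and env.get("LITELLM_PROXY_KEY"):
        return "proxy"   # highest priority: self-hosted gateway
    if env.get("API_KEYS"):
        return "sdk"     # direct API access via LiteLLM SDK
    return "local"       # offline: embedding/rerank only, no LLM
```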
SearXNG Configuration (2-Mode)
| Mode | Config | Description |
|---|---|---|
| Embedded (default) | WET_AUTO_SEARXNG=true | Auto-installs and manages SearXNG as a subprocess |
| External | WET_AUTO_SEARXNG=false + SEARXNG_URL=http://host:port | Connects to a pre-existing SearXNG instance |
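Resolving the effective SearXNG endpoint from these variables could look like the sketch below; the helper is hypothetical, but the defaults match the configuration table (port 41592, embedded mode on):

```python
def searxng_url(env) -> str:
    """Resolve the SearXNG base URL from the two-mode config above."""
    if env.get("WET_AUTO_SEARXNG", "true").lower() != "false":
        # Embedded mode: subprocess listens on the local port.
        port = env.get("WET_SEARXNG_PORT", "41592")
        return f"http://localhost:{port}"
    # External mode: point at a pre-existing instance.
    return env.get("SEARXNG_URL", "http://localhost:41592")
```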
Build from Source
git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp
Also by n24q02m
| Server | Description |
|---|---|
| mnemo-mcp | Persistent AI memory with hybrid search and cross-machine sync |
| better-notion-mcp | Markdown-first Notion API with 9 composite tools |
| better-email-mcp | Email (IMAP/SMTP) with multi-account and auto-discovery |
| better-godot-mcp | Godot Engine 4.x with 18 tools for scenes, scripts, and shaders |
| better-telegram-mcp | Telegram dual-mode (Bot API + MTProto) with 6 composite tools |
| better-code-review-graph | Knowledge graph for token-efficient code reviews |
Contributing
See CONTRIBUTING.md.
License
MIT -- See LICENSE.