# WET - Web Extended Toolkit MCP Server
mcp-name: io.github.n24q02m/wet-mcp
Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.
## Features
- Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
- Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
- Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
- Content Extract -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
- Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
- Media -- List, download, and analyze images, videos, audio files
- Anti-bot -- Stealth mode bypasses Cloudflare, Medium, LinkedIn, Twitter
- Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
- Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)
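The FTS5 keyword half of the hybrid docs search can be illustrated with a toy example (the schema and sample rows here are invented for illustration and are not the server's actual index):

```python
import sqlite3

# Toy FTS5 index: the keyword side of a hybrid (keyword + vector) search.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
con.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("asyncio", "Run coroutines and tasks with asyncio.run"),
        ("sqlite3", "Embedded SQL database module for Python"),
    ],
)
# MATCH performs tokenized full-text search; ORDER BY rank sorts by BM25 relevance.
rows = con.execute(
    "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", ("coroutines",)
).fetchall()
print(rows)  # [('asyncio',)]
```

In the server, keyword hits like these are combined with embedding similarity (and HyDE-expanded queries) per the feature list above.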
## Quick Start

### Claude Code Plugin (Recommended)

Via marketplace (includes skills: `/fact-check`, `/compare`):

```
/plugin marketplace add n24q02m/claude-plugins
/plugin install wet-mcp@n24q02m-plugins
```

Configure env vars in `~/.claude/settings.local.json` or your shell profile. See Environment Variables below.
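Claude Code settings support an `env` map, so a minimal sketch might look like this (variable names come from the Environment Variables table below; the token value is a placeholder):

```json
{
  "env": {
    "GITHUB_TOKEN": "ghp_your_token_here",
    "SYNC_ENABLED": "true"
  }
}
```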
### Gemini CLI Extension

```
gemini extensions install https://github.com/n24q02m/wet-mcp
```
### Codex CLI

Add to `~/.codex/config.toml`:

```toml
[mcp_servers.wet]
command = "uvx"
args = ["--python", "3.13", "wet-mcp"]
```
### MCP Server

**Python 3.13 required** -- Python 3.14+ is not supported due to SearXNG incompatibility. You must specify `--python 3.13` when using `uvx`.

On first run, the server automatically installs SearXNG and Playwright Chromium, and starts the embedded search engine.
#### Option 1: uvx

```json
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"]
    }
  }
}
```
#### Option 2: Docker

```json
{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "-e", "SYNC_ENABLED",
        "n24q02m/wet-mcp:latest"
      ]
    }
  }
}
```
Configure env vars in `~/.claude/settings.local.json` or your shell profile. See Environment Variables below.
## Tools

| Tool | Actions | Description |
|---|---|---|
| `search` | `search`, `research`, `docs`, `similar` | Web search (with filters, reranking, expand/enrich), academic research, library docs (HyDE), find similar |
| `extract` | `extract`, `batch`, `crawl`, `map`, `convert`, `extract_structured` | Content extraction, batch processing (up to 50 URLs), deep crawling, site mapping, local file conversion, structured data extraction (JSON Schema) |
| `media` | `list`, `download`, `analyze` | Media discovery, download, and analysis |
| `config` | `status`, `set`, `cache_clear`, `docs_reindex` | Server configuration and cache management |
| `setup` | `warmup`, `setup_sync` | Pre-download models, configure cloud sync |
| `help` | -- | Full documentation for any tool |
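On the wire, a tool invocation is a standard MCP `tools/call` JSON-RPC request. For example (the `query` argument name is illustrative; use the `help` tool for each tool's actual parameters):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": { "action": "docs", "query": "fastapi middleware" }
  }
}
```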
## MCP Prompts

| Prompt | Parameters | Description |
|---|---|---|
| `research_topic` | `topic` | Research a topic using academic search |
| `library_docs` | `library`, `question` | Find library documentation |
## Zero-Config Setup

No environment variables needed. On first start, the server opens a setup page in your browser:

1. Start the server (via plugin, `uvx`, or Docker)
2. A setup URL appears -- open it in any browser
3. Fill in your credentials on the guided form
4. Credentials are encrypted and stored locally

Your credentials never leave your machine. The relay server only sees encrypted data.

For CI/automation, you can still use environment variables (see below).
## Configuration

### Pre-install (optional)

Use the `setup` MCP tool to warm up models and install dependencies:

```
# Via MCP tool call (recommended):
setup(action="warmup")

# With cloud embedding configured, warmup validates API keys
# and skips local model download if cloud models are available.
```

The warmup action pre-downloads SearXNG, Playwright, and the embedding/reranker models (~1.1 GB total) so that the first real connection does not time out.
### Sync setup

Sync is fully automatic. Just set `SYNC_ENABLED=true` and the server handles everything:

- First sync: rclone is auto-downloaded and a browser opens for OAuth authentication
- Token saved: the OAuth token is stored locally at `~/.wet-mcp/tokens/` (600 permissions)
- Subsequent runs: the token is loaded automatically -- no manual steps needed

For providers other than Google Drive, set `SYNC_PROVIDER` and `SYNC_REMOTE`:

```json
{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "dropbox",
  "SYNC_REMOTE": "dropbox"
}
```
## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `API_KEYS` | No | -- | API keys for cloud providers (format: `ENV_VAR:key,...`). Enables cloud embedding + reranking |
| `COHERE_API_KEY` | No | -- | Cohere API key (embedding + reranking) |
| `JINA_AI_API_KEY` | No | -- | Jina AI API key (embedding + reranking) |
| `GEMINI_API_KEY` | No | -- | Google Gemini API key (LLM + embedding) |
| `OPENAI_API_KEY` | No | -- | OpenAI API key (LLM + embedding) |
| `GITHUB_TOKEN` | No | auto-detect | GitHub token for docs discovery (60 -> 5000 req/hr). Auto-detected from `gh auth token` |
| `EMBEDDING_BACKEND` | No | auto-detect | `cloud` or `local` (Qwen3). Auto: `API_KEYS` -> cloud, else local |
| `EMBEDDING_MODEL` | No | auto-detect | Cloud embedding model name |
| `EMBEDDING_DIMS` | No | `0` (auto=768) | Embedding dimensions |
| `RERANK_ENABLED` | No | `true` | Enable reranking after search |
| `RERANK_BACKEND` | No | auto-detect | `cloud` or `local`. Auto: Cohere/Jina key -> cloud, else local |
| `RERANK_MODEL` | No | auto-detect | Cloud rerank model name |
| `RERANK_TOP_N` | No | `10` | Return top N results after reranking |
| `LLM_MODELS` | No | `gemini-3-flash-preview` | LLM model for media analysis (google-genai or openai) |
| `WET_AUTO_SEARXNG` | No | `true` | Auto-start embedded SearXNG subprocess |
| `WET_SEARXNG_PORT` | No | `41592` | SearXNG port |
| `SEARXNG_URL` | No | `http://localhost:41592` | External SearXNG URL (when auto disabled) |
| `SEARXNG_TIMEOUT` | No | `30` | SearXNG request timeout in seconds |
| `CONVERT_MAX_FILE_SIZE` | No | `104857600` | Max file size for local conversion in bytes (100 MB) |
| `CONVERT_ALLOWED_DIRS` | No | -- | Comma-separated paths to restrict local file conversion |
| `CACHE_DIR` | No | `~/.wet-mcp` | Data directory for cache, docs, downloads |
| `DOCS_DB_PATH` | No | `~/.wet-mcp/docs.db` | Docs database location |
| `DOWNLOAD_DIR` | No | `~/.wet-mcp/downloads` | Media download directory |
| `TOOL_TIMEOUT` | No | `120` | Tool execution timeout in seconds (0 = no timeout) |
| `WET_CACHE` | No | `true` | Enable/disable web cache |
| `SYNC_ENABLED` | No | `false` | Enable rclone sync |
| `SYNC_PROVIDER` | No | `drive` | rclone provider type (drive, dropbox, s3, etc.) |
| `SYNC_REMOTE` | No | `gdrive` | rclone remote name |
| `SYNC_FOLDER` | No | `wet-mcp` | Remote folder name |
| `SYNC_INTERVAL` | No | `300` | Auto-sync interval in seconds (0 = manual) |
| `LOG_LEVEL` | No | `INFO` | Logging level |
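The `API_KEYS` variable packs several provider keys into one value. A minimal sketch of the documented `ENV_VAR:key,...` format (an illustration of the format only, not the server's actual parser; the key values are placeholders):

```python
def parse_api_keys(spec: str) -> dict[str, str]:
    """Split the documented 'ENV_VAR:key,...' format into name/key pairs."""
    pairs: dict[str, str] = {}
    for item in spec.split(","):
        # Each item is 'ENV_VAR:key'; skip malformed or empty entries.
        name, sep, key = item.partition(":")
        if sep and name and key:
            pairs[name.strip()] = key.strip()
    return pairs

print(parse_api_keys("JINA_AI_API_KEY:jina_abc,COHERE_API_KEY:co_xyz"))
# {'JINA_AI_API_KEY': 'jina_abc', 'COHERE_API_KEY': 'co_xyz'}
```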
## Embedding & Reranking

Both embedding and reranking are always available -- local models are built-in and require no configuration.

- Jina AI (recommended): a single `JINA_AI_API_KEY` enables both embedding and reranking
- Embedding priority: Jina AI > Gemini > OpenAI > Cohere. Local Qwen3 fallback always available
- Reranking priority: Jina AI > Cohere. Local Qwen3 fallback always available
- GPU auto-detection: CUDA/DirectML auto-detected; uses GGUF models for better performance
- All embeddings are stored at 768 dims, so switching providers never breaks the vector table
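The documented embedding priority can be sketched as a simple first-match selection (illustration only; `pick_embedding_backend` and its return labels are invented here, not the server's internals):

```python
# Documented priority: Jina AI > Gemini > OpenAI > Cohere, then local Qwen3.
PRIORITY = ["JINA_AI_API_KEY", "GEMINI_API_KEY", "OPENAI_API_KEY", "COHERE_API_KEY"]

def pick_embedding_backend(env: dict[str, str]) -> str:
    """Return the first configured provider, falling back to the local model."""
    for var in PRIORITY:
        if env.get(var):
            return var.removesuffix("_API_KEY").lower()
    return "local-qwen3"

print(pick_embedding_backend({"OPENAI_API_KEY": "sk-placeholder"}))  # openai
print(pick_embedding_backend({}))  # local-qwen3
```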
## LLM Configuration (2-Mode Architecture)

| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | SDK | `GEMINI_API_KEY` or `OPENAI_API_KEY` | Direct API access (google-genai, openai) |
| 2 | Disabled | Nothing needed | Offline, embedding/rerank only (no LLM) |
## SearXNG Configuration (2-Mode)

| Mode | Config | Description |
|---|---|---|
| Embedded (default) | `WET_AUTO_SEARXNG=true` | Auto-installs and manages SearXNG as a subprocess |
| External | `WET_AUTO_SEARXNG=false` + `SEARXNG_URL=http://host:port` | Connects to a pre-existing SearXNG instance |
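Combined in an MCP client config, external mode might look like this sketch (the SearXNG URL is a placeholder for your own instance):

```json
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp"],
      "env": {
        "WET_AUTO_SEARXNG": "false",
        "SEARXNG_URL": "http://localhost:8080"
      }
    }
  }
}
```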
## Security

- SSRF prevention -- URL validation on crawl targets
- Graceful fallbacks -- cloud -> local embedding, multi-tier crawling
- Error sanitization -- no credentials in error messages
- File conversion sandboxing -- optional `CONVERT_ALLOWED_DIRS` restriction
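For example, to confine local file conversion to a couple of directories (the paths here are placeholders):

```shell
# Restrict the convert action to these comma-separated directories.
export CONVERT_ALLOWED_DIRS="$HOME/Documents,$HOME/Downloads"
```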
## Build from Source

```shell
git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp
```
## License

MIT -- See LICENSE.