
WET - Web Extended Toolkit MCP Server

mcp-name: io.github.n24q02m/wet-mcp

Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.


Features

  • Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
  • Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
  • Content Extract -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
  • Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
  • Media -- List, download, and analyze images, videos, audio files
  • Anti-bot -- Stealth mode bypasses bot detection on Cloudflare-protected sites, Medium, LinkedIn, Twitter
  • Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
  • Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)

Quick Start

Claude Code Plugin (Recommended)

Via marketplace (includes skills: /fact-check, /compare):

/plugin marketplace add n24q02m/claude-plugins
/plugin install wet-mcp@n24q02m-plugins

Configure env vars in ~/.claude/settings.local.json or shell profile. See Environment Variables.

Gemini CLI Extension

gemini extensions install https://github.com/n24q02m/wet-mcp

Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.wet]
command = "uvx"
args = ["--python", "3.13", "wet-mcp"]

MCP Server

Python 3.13 required -- Python 3.14+ is not supported due to SearXNG incompatibility. You must specify --python 3.13 when using uvx.

On first run, the server automatically installs SearXNG and Playwright Chromium, then starts the embedded search engine.

Option 1: uvx

{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"]
    }
  }
}

Option 2: Docker

{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "-e", "SYNC_ENABLED",
        "n24q02m/wet-mcp:latest"
      ]
    }
  }
}


Tools

| Tool | Actions | Description |
|------|---------|-------------|
| search | search, research, docs, similar | Web search (with filters, reranking, expand/enrich), academic research, library docs (HyDE), find similar |
| extract | extract, batch, crawl, map, convert, extract_structured | Content extraction, batch processing (up to 50 URLs), deep crawling, site mapping, local file conversion, structured data extraction (JSON Schema) |
| media | list, download, analyze | Media discovery, download, and analysis |
| config | status, set, cache_clear, docs_reindex | Server configuration and cache management |
| setup | warmup, setup_sync | Pre-download models, configure cloud sync |
| help | -- | Full documentation for any tool |
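As an illustration of how a client invokes these tools, the sketch below builds the JSON-RPC 2.0 `tools/call` payload an MCP client would send for the search tool. The method and params shape follow the MCP specification; the argument names ("action", "query") are assumptions inferred from the table above, not the server's exact input schema.

```python
import json

# Hypothetical MCP tools/call request for the "search" tool.
# "action"/"query" argument names are assumed from the table above.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search",
        "arguments": {"action": "docs", "query": "fastapi dependency injection"},
    },
}
print(json.dumps(request, indent=2))
```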

MCP Prompts

| Prompt | Parameters | Description |
|--------|------------|-------------|
| research_topic | topic | Research a topic using academic search |
| library_docs | library, question | Find library documentation |

Zero-Config Setup

No environment variables needed. On first start, the server opens a setup page in your browser:

  1. Start the server (via plugin, uvx, or Docker)
  2. A setup URL appears -- open it in any browser
  3. Fill in your credentials on the guided form
  4. Credentials are encrypted and stored locally

Your credentials never leave your machine. The relay server only sees encrypted data.

For CI/automation, you can still use environment variables (see below).

Configuration

Pre-install (optional)

Use the setup MCP tool to warm up models and install dependencies:

# Via MCP tool call (recommended):
setup(action="warmup")

# With cloud embedding configured, warmup validates API keys
# and skips local model download if cloud models are available.

The warmup action pre-downloads SearXNG, Playwright, and embedding/reranker models (~1.1GB total) so the first real connection does not time out.

Sync setup

Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:

  1. First sync: rclone is auto-downloaded, a browser opens for OAuth authentication
  2. Token saved: OAuth token is stored locally at ~/.wet-mcp/tokens/ (600 permissions)
  3. Subsequent runs: Token is loaded automatically -- no manual steps needed

For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:

{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "dropbox",
  "SYNC_REMOTE": "dropbox"
}

Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| API_KEYS | No | -- | API keys for cloud providers (format: ENV_VAR:key,...). Enables cloud embedding + reranking |
| COHERE_API_KEY | No | -- | Cohere API key (embedding + reranking) |
| JINA_AI_API_KEY | No | -- | Jina AI API key (embedding + reranking) |
| GEMINI_API_KEY | No | -- | Google Gemini API key (LLM + embedding) |
| OPENAI_API_KEY | No | -- | OpenAI API key (LLM + embedding) |
| GITHUB_TOKEN | No | auto-detect | GitHub token for docs discovery (60 -> 5000 req/hr). Auto-detected from gh auth token |
| EMBEDDING_BACKEND | No | auto-detect | cloud or local (Qwen3). Auto: API_KEYS -> cloud, else local |
| EMBEDDING_MODEL | No | auto-detect | Cloud embedding model name |
| EMBEDDING_DIMS | No | 0 (auto=768) | Embedding dimensions |
| RERANK_ENABLED | No | true | Enable reranking after search |
| RERANK_BACKEND | No | auto-detect | cloud or local. Auto: Cohere/Jina key -> cloud, else local |
| RERANK_MODEL | No | auto-detect | Cloud rerank model name |
| RERANK_TOP_N | No | 10 | Return top N results after reranking |
| LLM_MODELS | No | gemini-3-flash-preview | LLM model for media analysis (google-genai or openai) |
| WET_AUTO_SEARXNG | No | true | Auto-start embedded SearXNG subprocess |
| WET_SEARXNG_PORT | No | 41592 | SearXNG port |
| SEARXNG_URL | No | http://localhost:41592 | External SearXNG URL (when auto disabled) |
| SEARXNG_TIMEOUT | No | 30 | SearXNG request timeout in seconds |
| CONVERT_MAX_FILE_SIZE | No | 104857600 | Max file size for local conversion in bytes (100MB) |
| CONVERT_ALLOWED_DIRS | No | -- | Comma-separated paths to restrict local file conversion |
| CACHE_DIR | No | ~/.wet-mcp | Data directory for cache, docs, downloads |
| DOCS_DB_PATH | No | ~/.wet-mcp/docs.db | Docs database location |
| DOWNLOAD_DIR | No | ~/.wet-mcp/downloads | Media download directory |
| TOOL_TIMEOUT | No | 120 | Tool execution timeout in seconds (0 = no timeout) |
| WET_CACHE | No | true | Enable/disable web cache |
| SYNC_ENABLED | No | false | Enable rclone sync |
| SYNC_PROVIDER | No | drive | rclone provider type (drive, dropbox, s3, etc.) |
| SYNC_REMOTE | No | gdrive | rclone remote name |
| SYNC_FOLDER | No | wet-mcp | Remote folder name |
| SYNC_INTERVAL | No | 300 | Auto-sync interval in seconds (0 = manual) |
| LOG_LEVEL | No | INFO | Logging level |
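To make the documented API_KEYS format concrete, here is a minimal parser for the `ENV_VAR:key,...` syntax described above. It is an illustrative sketch of the format, not the server's actual parsing code; the example keys are made up.

```python
def parse_api_keys(value: str) -> dict[str, str]:
    """Parse the documented API_KEYS format: "ENV_VAR:key,ENV_VAR:key"."""
    pairs = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, so key values containing ':' survive.
        name, _, key = entry.partition(":")
        pairs[name.strip()] = key.strip()
    return pairs

# Example with placeholder keys:
print(parse_api_keys("JINA_AI_API_KEY:jina_abc123,COHERE_API_KEY:co_xyz"))
```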

Embedding & Reranking

Both embedding and reranking are always available -- local models are built-in and require no configuration.

  • Jina AI (recommended): A single JINA_AI_API_KEY enables both embedding and reranking
  • Embedding priority: Jina AI > Gemini > OpenAI > Cohere. Local Qwen3 fallback always available
  • Reranking priority: Jina AI > Cohere. Local Qwen3 fallback always available
  • GPU auto-detection: CUDA/DirectML auto-detected, uses GGUF models for better performance
  • All embeddings stored at 768 dims. Switching providers never breaks the vector table
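The fixed 768-dim storage above can be sketched as a normalization step applied to whatever a provider returns. Truncating larger vectors and zero-padding smaller ones is an assumption for illustration; the server may map dimensions differently.

```python
def to_storage_dims(vector: list[float], dims: int = 768) -> list[float]:
    # Truncate larger vectors, zero-pad smaller ones, so every provider's
    # output fits the same fixed-width vector table.
    if len(vector) >= dims:
        return vector[:dims]
    return vector + [0.0] * (dims - len(vector))
```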

LLM Configuration (2-Mode Architecture)

| Priority | Mode | Config | Use case |
|----------|------|--------|----------|
| 1 | SDK | GEMINI_API_KEY or OPENAI_API_KEY | Direct API access (google-genai, openai) |
| 2 | Disabled | Nothing needed | Offline, embedding/rerank only (no LLM) |

SearXNG Configuration (2-Mode)

| Mode | Config | Description |
|------|--------|-------------|
| Embedded (default) | WET_AUTO_SEARXNG=true | Auto-installs and manages SearXNG as a subprocess |
| External | WET_AUTO_SEARXNG=false + SEARXNG_URL=http://host:port | Connects to a pre-existing SearXNG instance |
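An external instance configured as above can be smoke-tested directly against SearXNG's /search endpoint. The sketch below only builds the query URL (no network call); note that `format=json` works only when the instance enables the json output format in its settings.

```python
from urllib.parse import urlencode

SEARXNG_URL = "http://localhost:41592"  # matches the default above

# Build a query URL against the instance's /search endpoint.
# format=json requires the instance to enable the "json" output format.
params = urlencode({"q": "asyncio tutorial", "format": "json"})
url = f"{SEARXNG_URL}/search?{params}"
print(url)
```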

Security

  • SSRF prevention -- URL validation on crawl targets
  • Graceful fallbacks -- Cloud → Local embedding, multi-tier crawling
  • Error sanitization -- No credentials in error messages
  • File conversion sandboxing -- Optional CONVERT_ALLOWED_DIRS restriction
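The SSRF check above can be illustrated with a small validator that rejects non-HTTP schemes and private/loopback IP literals. This is a sketch of the idea, not the server's actual validation logic (which would also need to handle DNS resolution of hostnames).

```python
import ipaddress
from urllib.parse import urlparse

def is_safe_crawl_target(url: str) -> bool:
    """Illustrative SSRF check: reject non-HTTP schemes and
    private/loopback/link-local IP literals."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Hostname, not an IP literal; resolving and re-checking it
        # is a separate concern not covered by this sketch.
        return True
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)

print(is_safe_crawl_target("http://169.254.169.254/latest/meta-data"))  # False
```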

Build from Source

git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp

License

MIT -- See LICENSE.
