
WET - Web Extended Toolkit MCP Server

mcp-name: io.github.n24q02m/wet-mcp

Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.


Features

  • Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
  • Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
  • Content Extract -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
  • Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
  • Media -- List, download, and analyze images, videos, audio files
  • Anti-bot -- Stealth mode bypasses Cloudflare, Medium, LinkedIn, Twitter
  • Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
  • Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)

Quick Start

Claude Code Plugin (Recommended)

Via marketplace (includes skills: /fact-check, /compare):

/plugin marketplace add n24q02m/claude-plugins
/plugin install wet-mcp@n24q02m-plugins

Configure env vars in ~/.claude/settings.local.json or shell profile. See Environment Variables.

Gemini CLI Extension

gemini extensions install https://github.com/n24q02m/wet-mcp

Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.wet]
command = "uvx"
args = ["--python", "3.13", "wet-mcp"]

MCP Server

Python 3.13 required -- Python 3.14+ is not supported due to SearXNG incompatibility. You must specify --python 3.13 when using uvx.

On first run, the server automatically installs SearXNG, Playwright chromium, and starts the embedded search engine.

Option 1: uvx

{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"]
    }
  }
}

Option 2: Docker

{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "-e", "SYNC_ENABLED",
        "n24q02m/wet-mcp:latest"
      ]
    }
  }
}

Configure env vars in ~/.claude/settings.local.json or your shell profile. See Environment Variables below.

Tools

| Tool | Actions | Description |
| --- | --- | --- |
| search | search, research, docs, similar | Web search (with filters, reranking, expand/enrich), academic research, library docs (HyDE), find similar |
| extract | extract, batch, crawl, map, convert, extract_structured | Content extraction, batch processing (up to 50 URLs), deep crawling, site mapping, local file conversion, structured data extraction (JSON Schema) |
| media | list, download, analyze | Media discovery, download, and analysis |
| config | status, set, cache_clear, docs_reindex | Server configuration and cache management |
| setup | warmup, setup_sync | Pre-download models, configure cloud sync |
| help | -- | Full documentation for any tool |
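Under the hood, an MCP client invokes these tools with a standard JSON-RPC `tools/call` request. A sketch of what that looks like for the search tool (the `action` and `query` argument names are assumptions inferred from the table above; the help tool returns the authoritative schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "action": "search",
      "query": "fastapi dependency injection"
    }
  }
}
```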

MCP Prompts

| Prompt | Parameters | Description |
| --- | --- | --- |
| research_topic | topic | Research a topic using academic search |
| library_docs | library, question | Find library documentation |

Zero-Config Setup

No environment variables needed. On first start, the server opens a setup page in your browser:

  1. Start the server (via plugin, uvx, or Docker)
  2. A setup URL appears -- open it in any browser
  3. Fill in your credentials on the guided form
  4. Credentials are encrypted and stored locally

Your credentials never leave your machine. The relay server only sees encrypted data.

For CI/automation, you can still use environment variables (see below).

Configuration

Pre-install (optional)

Use the setup MCP tool to warmup models and install dependencies:

# Via MCP tool call (recommended):
setup(action="warmup")

# With cloud embedding configured, warmup validates API keys
# and skips local model download if cloud models are available.

The warmup action pre-downloads SearXNG, Playwright, and the embedding/reranker models (~1.1 GB total) so the first real connection does not time out.

Sync setup

Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:

  1. First sync: rclone is auto-downloaded, a browser opens for OAuth authentication
  2. Token saved: OAuth token is stored locally at ~/.wet-mcp/tokens/ (600 permissions)
  3. Subsequent runs: Token is loaded automatically -- no manual steps needed

For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:

{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "dropbox",
  "SYNC_REMOTE": "dropbox"
}

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| API_KEYS | No | -- | API keys for cloud providers (format: `ENV_VAR:key,...`). Enables cloud embedding + reranking |
| COHERE_API_KEY | No | -- | Cohere API key (embedding + reranking) |
| JINA_AI_API_KEY | No | -- | Jina AI API key (embedding + reranking) |
| GEMINI_API_KEY | No | -- | Google Gemini API key (LLM + embedding) |
| OPENAI_API_KEY | No | -- | OpenAI API key (LLM + embedding) |
| GITHUB_TOKEN | No | auto-detect | GitHub token for docs discovery (60 -> 5000 req/hr). Auto-detected from `gh auth token` |
| EMBEDDING_BACKEND | No | auto-detect | `cloud` or `local` (Qwen3). Auto: API_KEYS -> cloud, else local |
| EMBEDDING_MODEL | No | auto-detect | Cloud embedding model name |
| EMBEDDING_DIMS | No | 0 (auto=768) | Embedding dimensions |
| RERANK_ENABLED | No | true | Enable reranking after search |
| RERANK_BACKEND | No | auto-detect | `cloud` or `local`. Auto: Cohere/Jina key -> cloud, else local |
| RERANK_MODEL | No | auto-detect | Cloud rerank model name |
| RERANK_TOP_N | No | 10 | Return top N results after reranking |
| LLM_MODELS | No | gemini-3-flash-preview | LLM model for media analysis (google-genai or openai) |
| WET_AUTO_SEARXNG | No | true | Auto-start embedded SearXNG subprocess |
| WET_SEARXNG_PORT | No | 41592 | SearXNG port |
| SEARXNG_URL | No | http://localhost:41592 | External SearXNG URL (when auto-start is disabled) |
| SEARXNG_TIMEOUT | No | 30 | SearXNG request timeout in seconds |
| CONVERT_MAX_FILE_SIZE | No | 104857600 | Max file size for local conversion in bytes (100 MB) |
| CONVERT_ALLOWED_DIRS | No | -- | Comma-separated paths to restrict local file conversion |
| CACHE_DIR | No | ~/.wet-mcp | Data directory for cache, docs, downloads |
| DOCS_DB_PATH | No | ~/.wet-mcp/docs.db | Docs database location |
| DOWNLOAD_DIR | No | ~/.wet-mcp/downloads | Media download directory |
| TOOL_TIMEOUT | No | 120 | Tool execution timeout in seconds (0 = no timeout) |
| WET_CACHE | No | true | Enable/disable web cache |
| SYNC_ENABLED | No | false | Enable rclone sync |
| SYNC_PROVIDER | No | drive | rclone provider type (drive, dropbox, s3, etc.) |
| SYNC_REMOTE | No | gdrive | rclone remote name |
| SYNC_FOLDER | No | wet-mcp | Remote folder name |
| SYNC_INTERVAL | No | 300 | Auto-sync interval in seconds (0 = manual) |
| LOG_LEVEL | No | INFO | Logging level |
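To make the documented `API_KEYS` format (`ENV_VAR:key,...`) concrete, here is a minimal parsing sketch. This is illustrative only; wet-mcp's actual parser may handle edge cases differently:

```python
def parse_api_keys(value: str) -> dict[str, str]:
    """Split the documented API_KEYS format (ENV_VAR:key,...) into a mapping."""
    keys: dict[str, str] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only, since key material may itself contain colons.
        name, _, secret = entry.partition(":")
        if name and secret:
            keys[name] = secret
    return keys

# Two providers in one variable, as the table above describes:
parse_api_keys("JINA_AI_API_KEY:jina_abc123,COHERE_API_KEY:co_xyz")
```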

Embedding & Reranking

Both embedding and reranking are always available -- local models are built-in and require no configuration.

  • Jina AI (recommended): A single JINA_AI_API_KEY enables both embedding and reranking
  • Embedding priority: Jina AI > Gemini > OpenAI > Cohere. Local Qwen3 fallback always available
  • Reranking priority: Jina AI > Cohere. Local Qwen3 fallback always available
  • GPU auto-detection: CUDA/DirectML auto-detected, uses GGUF models for better performance
  • All embeddings stored at 768 dims. Switching providers never breaks the vector table
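One simple way to keep every provider's output at a fixed 768 dims, as described above, is to truncate or zero-pad vectors before storage. A minimal sketch under that assumption (not wet-mcp's actual implementation):

```python
TARGET_DIMS = 768  # the fixed storage width noted above

def normalize_dims(vector: list[float], dims: int = TARGET_DIMS) -> list[float]:
    """Truncate or zero-pad an embedding so vectors from any provider share one width."""
    if len(vector) >= dims:
        return vector[:dims]
    return vector + [0.0] * (dims - len(vector))

# A larger cloud model and a smaller local model both land at 768 dims:
assert len(normalize_dims([0.1] * 1024)) == 768
assert len(normalize_dims([0.1] * 384)) == 768
```

With a fixed width like this, switching embedding providers never requires rebuilding the vector table.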

LLM Configuration (2-Mode Architecture)

| Priority | Mode | Config | Use case |
| --- | --- | --- | --- |
| 1 | SDK | GEMINI_API_KEY or OPENAI_API_KEY | Direct API access (google-genai, openai) |
| 2 | Disabled | Nothing needed | Offline, embedding/rerank only (no LLM) |

SearXNG Configuration (2-Mode)

| Mode | Config | Description |
| --- | --- | --- |
| Embedded (default) | WET_AUTO_SEARXNG=true | Auto-installs and manages SearXNG as a subprocess |
| External | WET_AUTO_SEARXNG=false + SEARXNG_URL=http://host:port | Connects to a pre-existing SearXNG instance |
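Following the style of the sync example above, the external mode corresponds to an env block like this (the URL is a placeholder for your own SearXNG instance):

```json
{
  "WET_AUTO_SEARXNG": "false",
  "SEARXNG_URL": "http://localhost:8888"
}
```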

Security

  • SSRF prevention -- URL validation on crawl targets
  • Graceful fallbacks -- Cloud → Local embedding, multi-tier crawling
  • Error sanitization -- No credentials in error messages
  • File conversion sandboxing -- Optional CONVERT_ALLOWED_DIRS restriction

Build from Source

git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp

License

MIT -- See LICENSE.
