Open-source MCP Server for web search, extract, crawl, academic research, and library docs with embedded SearXNG

WET - Web Extended Toolkit MCP Server

Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.

Features

  • Web Search - Search via embedded SearXNG (metasearch: Google, Bing, DuckDuckGo, Brave)
  • Academic Research - Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs - Auto-discover and index documentation with FTS5 hybrid search
  • Content Extract - Extract clean content (Markdown/Text)
  • Deep Crawl - Crawl multiple pages from a root URL with depth control
  • Site Map - Discover website URL structure
  • Media - List and download images, videos, audio files
  • Anti-bot - Stealth mode bypasses Cloudflare, Medium, LinkedIn, Twitter
  • Local Cache - TTL-based caching for all web operations
  • Docs Sync - Sync indexed docs across machines via rclone

Quick Start

Prerequisites

  • Python 3.13 (required; Python 3.14+ is not supported because of an incompatibility with SearXNG)
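
As a quick pre-flight check, the prerequisite above can be tested from Python before installing. This is a minimal sketch, not part of wet-mcp: the helper name `is_supported` is hypothetical, and treating 3.13 as the only accepted series is one reading of the requirement.

```python
import sys

# Assumption: only the 3.13 series is accepted, per the prerequisite above.
SUPPORTED = (3, 13)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter's major.minor version is exactly 3.13."""
    return tuple(version_info[:2]) == SUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {current}: OK")
    else:
        print(f"Python {current}: unsupported, use 3.13")
```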

Add to mcp.json

uvx (Recommended)

{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"],
      "env": {
        "API_KEYS": "GOOGLE_API_KEY:AIza..."
      }
    }
  }
}

The env block is optional: API_KEYS supplies the API keys used for embedding and media analysis.

Warning: You must specify --python 3.13 when using uvx. Without it, uvx may select Python 3.14 or later, which causes SearXNG search to fail silently (RuntimeError: can't register atexit after shutdown, raised during DNS resolution).

That's it! On first run, the server automatically:

  1. Installs SearXNG from GitHub
  2. Installs Playwright Chromium and its system dependencies
  3. Starts the embedded SearXNG subprocess
  4. Runs the MCP server

Docker

{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "API_KEYS",
        "n24q02m/wet-mcp:latest"
      ],
      "env": {
        "API_KEYS": "GOOGLE_API_KEY:AIza..."
      }
    }
  }
}

With docs sync (Google Drive)

Step 1: Get a drive token (one-time, requires browser):

uvx --python 3.13 wet-mcp setup-sync drive

This downloads rclone, opens a browser for Google Drive auth, and outputs a base64-encoded token for RCLONE_CONFIG_GDRIVE_TOKEN.

Step 2: Copy the token and add it to your MCP config:

{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"],
      "env": {
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        "SYNC_ENABLED": "true",
        "SYNC_REMOTE": "gdrive",
        "SYNC_INTERVAL": "300",
        "RCLONE_CONFIG_GDRIVE_TYPE": "drive",
        "RCLONE_CONFIG_GDRIVE_TOKEN": "<paste base64 token>"
      }
    }
  }
}

Both raw JSON and base64-encoded tokens are supported. Base64 is recommended because it avoids nested JSON escaping issues.
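
To illustrate why base64 sidesteps the escaping problem, here is a small sketch that converts a raw JSON token to base64 and reads either form back. It assumes standard base64 over the UTF-8 JSON text; the function names and the token fields are hypothetical, and this is not wet-mcp's actual decoding logic.

```python
import base64
import json


def encode_token(raw_json: str) -> str:
    """Base64-encode a raw JSON token string after checking that it parses."""
    json.loads(raw_json)  # raises ValueError if the input is not valid JSON
    return base64.b64encode(raw_json.encode("utf-8")).decode("ascii")


def decode_token(value: str) -> dict:
    """Accept either raw JSON or base64-encoded JSON and return the parsed token."""
    text = value.strip()
    if not text.startswith("{"):
        # Not raw JSON, so treat the value as base64 (assumed encoding).
        text = base64.b64decode(text, validate=True).decode("utf-8")
    return json.loads(text)


if __name__ == "__main__":
    # Placeholder token; a real rclone token has its own fields.
    raw = '{"access_token": "placeholder", "expiry": "2030-01-01T00:00:00Z"}'
    print(encode_token(raw))
```

The encoded form contains no quotes or braces, so it can be pasted into the "env" block of mcp.json without any backslash escaping.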

Without uvx

pip install wet-mcp
wet-mcp

Tools

| Tool | Actions | Description |
|---|---|---|
| search | search, research, docs | Web search, academic research, library documentation |
| extract | extract, crawl, map | Content extraction, deep crawling, site mapping |
| media | list, download, analyze | Media discovery, download, and analysis |
| help | - | Full documentation for any tool |

Usage Examples

// search tool
{"action": "search", "query": "python web scraping", "max_results": 10}
{"action": "research", "query": "transformer attention mechanism"}
{"action": "docs", "query": "how to create routes", "library": "fastapi"}

// extract tool
{"action": "extract", "urls": ["https://example.com"]}
{"action": "crawl", "urls": ["https://docs.python.org"], "depth": 2}
{"action": "map", "urls": ["https://example.com"]}

// media tool
{"action": "list", "url": "https://github.com/python/cpython"}
{"action": "download", "media_urls": ["https://example.com/image.png"]}

Configuration

| Variable | Default | Description |
|---|---|---|
| WET_AUTO_SEARXNG | true | Auto-start embedded SearXNG subprocess |
| WET_SEARXNG_PORT | 8080 | SearXNG port |
| SEARXNG_URL | http://localhost:8080 | External SearXNG URL (when auto disabled) |
| API_KEYS | - | LLM API keys (format: ENV_VAR:key,...) |
| EMBEDDING_MODEL | (auto-detect) | LiteLLM embedding model for docs vector search |
| EMBEDDING_DIMS | 0 (auto=768) | Embedding dimensions |
| WET_CACHE | true | Enable/disable web cache |
| SYNC_ENABLED | false | Enable rclone sync for docs DB |
| SYNC_REMOTE | - | rclone remote name (e.g., "gdrive") |
| SYNC_FOLDER | wet-mcp | Remote folder name |
| SYNC_INTERVAL | 0 | Auto-sync interval in seconds (0 = manual) |
| LOG_LEVEL | INFO | Logging level |
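
The interplay of the three SearXNG variables can be sketched as follows. This is a guess at the intended behavior, not the server's code: the `searxng_url` helper is hypothetical, and both the accepted truthy strings and the assumption that the embedded instance listens on localhost are mine.

```python
import os


def searxng_url(env=None) -> str:
    """Resolve the SearXNG base URL from the documented variables and defaults."""
    env = os.environ if env is None else env
    # Assumption: "1", "true" and "yes" (any case) count as enabled.
    auto = env.get("WET_AUTO_SEARXNG", "true").strip().lower() in {"1", "true", "yes"}
    if auto:
        # Embedded mode: assumed to listen on localhost at WET_SEARXNG_PORT.
        port = int(env.get("WET_SEARXNG_PORT", "8080"))
        return f"http://localhost:{port}"
    # External mode: SEARXNG_URL applies only when auto-start is disabled.
    return env.get("SEARXNG_URL", "http://localhost:8080")


if __name__ == "__main__":
    print(searxng_url())
```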

LLM Configuration (Optional)

For media analysis and docs embedding, configure API keys:

API_KEYS=GOOGLE_API_KEY:AIza...
LLM_MODELS=gemini/gemini-3-flash-preview

The server auto-detects embedding models from configured API keys (Gemini > OpenAI > Mistral > Cohere).


Architecture

┌─────────────────────────────────────────────────────────┐
│                    MCP Client                           │
│            (Claude, Cursor, Windsurf)                   │
└─────────────────────┬───────────────────────────────────┘
                      │ MCP Protocol
                      v
┌─────────────────────────────────────────────────────────┐
│                   WET MCP Server                        │
│  ┌──────────┐  ┌──────────┐  ┌───────┐  ┌──────────┐   │
│  │  search  │  │ extract  │  │ media │  │   help   │   │
│  │ (search, │  │(extract, │  │(list, │  │          │   │
│  │ research,│  │ crawl,   │  │downld,│  │          │   │
│  │ docs)    │  │ map)     │  │analyz)│  │          │   │
│  └──┬───┬───┘  └────┬─────┘  └──┬────┘  └──────────┘   │
│     │   │           │           │                       │
│     v   v           v           v                       │
│  ┌──────┐ ┌──────┐ ┌──────────┐                         │
│  │SearX │ │DocsDB│ │ Crawl4AI │                         │
│  │NG    │ │FTS5+ │ │(Playwrgt)│                         │
│  │      │ │sqlite│ │          │                         │
│  │      │ │-vec  │ │          │                         │
│  └──────┘ └──────┘ └──────────┘                         │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │  WebCache (SQLite, TTL)  │  rclone sync (docs)   │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Build from Source

git clone https://github.com/n24q02m/wet-mcp
cd wet-mcp

# Setup (requires mise: https://mise.jdx.dev/)
mise run setup

# Run
uv run wet-mcp

Docker Build

docker build -t n24q02m/wet-mcp:latest .

Requirements: Python 3.13 (not 3.14+)


Contributing

See CONTRIBUTING.md

License

MIT - See LICENSE
