
MCP server for AI agents: high-fidelity web search (Exa) + tiered web fetch (Exa → optional local browser → Firecrawl) with an SSRF guard. A drop-in replacement for built-in WebSearch/WebFetch.


web-retrieval-mcp — MCP web search & web fetch for AI agents (Exa + Firecrawl)

web-retrieval-mcp is an open-source Model Context Protocol (MCP) server that gives AI agents two web tools — neural web search (Exa) and a tiered web fetch (Exa → optional local browser → Firecrawl) — as a drop-in replacement for built-in WebSearch/WebFetch. It preserves per-source provenance, guards against SSRF, runs cross-platform (macOS/Linux/Windows), and works with Claude Code, Claude Desktop, Cursor, and any MCP client. Runs on free API tiers.

License: MIT · Python 3.10+ · MCP · Cross-platform


Why replace the built-in web tools?

An agent's stock WebSearch / WebFetch tend to flatten many sources into one blurry summary, drop provenance, and silently fail on JavaScript-heavy or anti-bot pages. This server fixes that:

| | Built-in web tools | web-retrieval-mcp |
|---|---|---|
| Search results | One merged summary, sources conflated | One block per result; each keeps its own title, URL, highlights, and text, plus a Sources trailer |
| Fetch reliability | Single attempt, gives up on hard pages | Tiered fallback: Exa contents → optional local browser → Firecrawl, with a [served by: …] provenance header |
| JS / anti-bot pages | Usually fails | Opt-in real headless browser (camoufox) on demand |
| Safety | | SSRF guard rejects loopback / private / link-local / multicast hosts before any request |
| Cost | Bundled / metered by your model vendor | Free on Exa + Firecrawl free tiers (see below) |

Runs on free API tiers — and the free tiers are more than enough

Both providers have a genuinely usable free, no-credit-card tier, and because fetches hit Exa first (Firecrawl is only the fallback), a single developer or agent rarely touches the Firecrawl quota at all:

| Provider | Free tier (verified 2026) | Role in this server |
|---|---|---|
| Exa | 1,000 requests / month, no card | Powers web_search and the first web_fetch tier |
| Firecrawl | 1,000 pages / month, no card | Fallback fetch tier only; rarely reached |
| camoufox (local browser) | Unlimited and free; runs on your machine | Opt-in render="always" tier for JS/anti-bot pages |

For a personal agent that works out to roughly 33 Exa requests a day (searches and first-tier fetches share this quota) plus roughly 33 Firecrawl fallback pages a day, indefinitely, for $0/month. Heavy production workloads can upgrade either provider independently; the tiering and code don't change.
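The per-day figures above are simple division (1,000 / 30 ≈ 33). A small calculator makes the shared Exa quota explicit. It is a sketch under our own assumptions, not a billing rule from either provider: every web_search and every default-path web_fetch is counted as one Exa request, and the hypothetical fallback_rate parameter is the share of fetches that fall through to Firecrawl.

```python
from typing import Tuple

# Free-tier quotas from the table above (per month).
EXA_FREE_REQUESTS = 1_000
FIRECRAWL_FREE_PAGES = 1_000


def monthly_usage(searches_per_day: int, fetches_per_day: int,
                  fallback_rate: float, days: int = 30) -> Tuple[int, int]:
    """Estimate (Exa requests, Firecrawl pages) per month.

    Assumption: every search and every default-path fetch costs one Exa
    request; only `fallback_rate` of fetches continue on to Firecrawl.
    """
    exa = (searches_per_day + fetches_per_day) * days
    firecrawl = round(fetches_per_day * fallback_rate * days)
    return exa, firecrawl


def fits_free_tiers(searches_per_day: int, fetches_per_day: int,
                    fallback_rate: float, days: int = 30) -> bool:
    """True if the estimated workload stays within both free quotas."""
    exa, firecrawl = monthly_usage(searches_per_day, fetches_per_day,
                                   fallback_rate, days)
    return exa <= EXA_FREE_REQUESTS and firecrawl <= FIRECRAWL_FREE_PAGES
```

For example, 20 searches and 13 fetches a day with a 10% fallback rate gives monthly_usage(20, 13, 0.1) == (990, 39), inside both free tiers.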

Features

  • 🔎 web_search — neural / keyword / auto search via Exa, one provenance-preserving block per result.
  • 🌐 web_fetch — single-URL readable content through a resilient tier chain with provenance headers.
  • 🧱 Tiered fallback — Exa contents → (opt-in) local camoufox browser → Firecrawl, so hard pages still resolve.
  • 🛡️ SSRF guard — non-public hosts (loopback, RFC-1918, link-local, multicast, NAT64) are refused up front.
  • 🔑 Cross-platform secrets — env vars, a key file, the keyring library, or an OS secret tool. No keys on the command line.
  • 🚫 Hook to disable the built-ins — bundled PreToolUse hook + one-command installer so agents must use these tools.
  • 📦 One-command install: uvx, pipx, or pip; ships two console scripts (web-retrieval-mcp and web-retrieval-mcp-install).

Quickstart

Install straight from GitHub (works today — see Publishing for the PyPI status):

# Run with no install — uvx fetches and runs it on demand:
uvx --from git+https://github.com/VelvetSP/web-retrieval-mcp web-retrieval-mcp

# Or install the CLI (isolated, recommended):
pipx install git+https://github.com/VelvetSP/web-retrieval-mcp

# Once published to PyPI this shortens to:  pipx install web-retrieval-mcp

# Optional extras (append to the pipx/pip target):
pipx install "git+https://github.com/VelvetSP/web-retrieval-mcp#egg=web-retrieval-mcp[render]"   # local browser tier
pip  install "web-retrieval-mcp[keyring]"   # cross-platform native secret store (once on PyPI)
python -m camoufox fetch                    # one-time browser download (only if you use [render])

Get free API keys from Exa (https://exa.ai) and Firecrawl (https://firecrawl.dev), then:

export EXA_API_KEY="exa-..."
export FIRECRAWL_API_KEY="fc-..."

Register with Claude Code

# After `pipx install …` above puts `web-retrieval-mcp` on your PATH:
claude mcp add web-retrieval -- web-retrieval-mcp

# Or with no prior install, straight from GitHub via uvx:
claude mcp add web-retrieval -- uvx --from git+https://github.com/VelvetSP/web-retrieval-mcp web-retrieval-mcp

Register with Claude Desktop / any MCP client

{
  "mcpServers": {
    "web-retrieval": {
      "command": "web-retrieval-mcp",
      "env": {
        "EXA_API_KEY": "exa-...",
        "FIRECRAWL_API_KEY": "fc-..."
      }
    }
  }
}

The command above assumes web-retrieval-mcp is on your PATH (for example, after pipx install). Otherwise, set command to uvx and args to ["--from", "git+https://github.com/VelvetSP/web-retrieval-mcp", "web-retrieval-mcp"].

Tools

| Tool | Signature | What it returns |
|---|---|---|
| web_search | web_search(query, num_results=8, mode="auto") | Neural web search via Exa. One block per result, each with its own title, URL, published date, highlights, and text, plus a Sources list. mode: auto, neural, or keyword. |
| web_fetch | web_fetch(url, render="auto", max_chars=20000, max_age_hours=None) | One URL's readable content through the tier chain, with a [served by: …] provenance header. |
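To make "one block per result" concrete, here is a sketch of provenance-preserving formatting. The SearchHit fields, the max_text limit, and the exact layout are hypothetical, not the server's actual output format. The point is that each source keeps its own title, URL, date, highlights, and text, and a numbered Sources trailer maps back to the URLs instead of merging everything into one summary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchHit:
    """Hypothetical shape of one search result."""
    title: str
    url: str
    published: Optional[str] = None
    highlights: List[str] = field(default_factory=list)
    text: str = ""


def format_results(hits: List[SearchHit], max_text: int = 1500) -> str:
    """One block per hit, then a numbered Sources trailer; nothing is merged."""
    blocks = []
    for i, hit in enumerate(hits, start=1):
        lines = [f"[{i}] {hit.title}", f"URL: {hit.url}"]
        if hit.published:
            lines.append(f"Published: {hit.published}")
        lines.extend(f"> {h}" for h in hit.highlights)
        if hit.text:
            lines.append(hit.text[:max_text])  # per-source text, truncated
        blocks.append("\n".join(lines))
    sources = "\n".join(f"[{i}] {hit.url}" for i, hit in enumerate(hits, start=1))
    blocks.append("Sources:\n" + sources)
    return "\n\n".join(blocks)
```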

web_fetch details

Fetch one URL's readable content through the tier chain, returned with a [served by: …] header.

render="auto"   (default) →  Exa /contents  →  Firecrawl                 # no local browser
render="never"            →  Exa /contents  →  Firecrawl                 # same, explicit
render="always"           →  camoufox (local browser)  →  Firecrawl      # for JS / anti-bot pages

max_age_hours controls Exa's freshness window (0 = force fresh; None = Exa default cache).
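The fallback behavior in the render table can be sketched as a loop over tiers, where the first tier that returns non-empty text wins and labels the result. The tier names, the fetcher callables, and the exact header text are illustrative assumptions; real tiers would call Exa, camoufox, and Firecrawl.

```python
from typing import Callable, Dict, List

# A tier takes a URL and returns readable text, raising on failure.
Fetcher = Callable[[str], str]


def tier_order(render: str) -> List[str]:
    """Tier names per the render table above (names are illustrative)."""
    if render in ("auto", "never"):
        return ["exa", "firecrawl"]
    if render == "always":
        return ["camoufox", "firecrawl"]
    raise ValueError(f"render must be auto, never, or always, not {render!r}")


def fetch_with_fallback(url: str, render: str, tiers: Dict[str, Fetcher],
                        max_chars: int = 20000) -> str:
    """Try each tier in order; return the first non-empty result with a provenance header."""
    errors = []
    for name in tier_order(render):
        try:
            text = tiers[name](url)
        except Exception as exc:  # any tier failure falls through to the next tier
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return f"[served by: {name}]\n{text[:max_chars]}"
        errors.append(f"{name}: empty content")
    raise RuntimeError(f"all tiers failed for {url}: " + "; ".join(errors))
```

Injecting the tiers as plain callables keeps the chain testable without network access: a stub that raises stands in for a blocked page, and the provenance header records which tier actually answered.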

Cross-platform API keys

Keys are resolved in-process (never on the command line, which is visible via ps), cheapest/safest source first — the same code path on macOS, Linux, and Windows:

  1. Environment variables: EXA_API_KEY, FIRECRAWL_API_KEY. Universal; required for headless / CI.
  2. Key file — a dotenv-style KEY=value file at $WEB_RETRIEVAL_MCP_ENV_FILE or <config-dir>/keys.env (~/.config/web-retrieval-mcp/ on Linux/macOS, %APPDATA%\web-retrieval-mcp\ on Windows).
  3. keyring library — native store on every OS: macOS Keychain, Windows Credential Locker, Linux Secret Service / KWallet. Install the [keyring] extra, then store under service web-retrieval-mcp:
    keyring set web-retrieval-mcp EXA_API_KEY
    keyring set web-retrieval-mcp FIRECRAWL_API_KEY
    
  4. OS-native secret CLI — macOS security, Linux secret-tool (libsecret), if present.

An unexpanded ${...} config literal is treated as absent.

Block the built-in web tools (Claude Code)

So agents and subagents can't silently fall back to the lower-fidelity built-ins, this repo ships a PreToolUse hook that denies WebSearch / WebFetch and points the agent here. Install it idempotently:

web-retrieval-mcp-install              # patch ~/.claude/settings.json (backs it up first)
web-retrieval-mcp-install --print      # preview only, write nothing
web-retrieval-mcp-install --register-mcp   # also run `claude mcp add`
web-retrieval-mcp-install --uninstall  # remove the hook
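An idempotent settings patch of the kind web-retrieval-mcp-install performs might look like the sketch below. It assumes Claude Code's hooks layout in settings.json (hooks → PreToolUse → entries with a matcher and a list of command hooks). The hook command path is hypothetical, and the real installer's backup naming and merge rules may differ.

```python
import json
import shutil
from pathlib import Path

MATCHER = "WebSearch|WebFetch"  # tools the hook should intercept


def _load(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def _save(path: Path, settings: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")


def install_hook(settings_path: Path, hook_command: str) -> bool:
    """Add the PreToolUse hook once; return True only if the file changed."""
    settings = _load(settings_path)
    pre = settings.setdefault("hooks", {}).setdefault("PreToolUse", [])
    if any(h.get("command") == hook_command
           for entry in pre for h in entry.get("hooks", [])):
        return False  # already installed: idempotent no-op
    if settings_path.exists():  # back up the original before writing
        shutil.copy2(settings_path, settings_path.with_name(settings_path.name + ".bak"))
    pre.append({"matcher": MATCHER,
                "hooks": [{"type": "command", "command": hook_command}]})
    _save(settings_path, settings)
    return True


def uninstall_hook(settings_path: Path, hook_command: str) -> bool:
    """Remove hooks that run hook_command; return True if anything was removed."""
    settings = _load(settings_path)
    pre = settings.get("hooks", {}).get("PreToolUse", [])
    kept = []
    for entry in pre:
        original = entry.get("hooks", [])
        hooks = [h for h in original if h.get("command") != hook_command]
        if len(hooks) == len(original):
            kept.append(entry)  # untouched entry
        elif hooks:
            kept.append({**entry, "hooks": hooks})  # other hooks in the entry survive
        # otherwise the entry only ran our hook, so it is dropped
    if kept == pre:
        return False
    settings["hooks"]["PreToolUse"] = kept
    _save(settings_path, settings)
    return True
```

Running install twice leaves exactly one entry, and uninstall removes only hooks that run the given command, so unrelated hooks survive.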

Break-glass: touch ~/.claude/.web-builtins-allow re-enables the built-ins for the session; remove the file to re-arm. The hook is pure POSIX sh (no jq).
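The shipped hook is POSIX sh; the decision it makes can be rendered in Python for clarity. This sketch assumes Claude Code's PreToolUse contract: the event JSON on stdin carries tool_name, and a JSON object whose hookSpecificOutput.permissionDecision is "deny" blocks the call. The deny message is hypothetical, and the field names should be checked against your Claude Code version.

```python
import json
from pathlib import Path
from typing import Optional

BLOCKED_TOOLS = {"WebSearch", "WebFetch"}
ALLOW_FLAG = Path.home() / ".claude" / ".web-builtins-allow"  # break-glass file
REASON = ("Built-in web tools are disabled; use the web-retrieval MCP tools "
          "(web_search / web_fetch) instead.")  # hypothetical wording


def decide(payload: dict, allow_flag: Path = ALLOW_FLAG) -> Optional[dict]:
    """Return a deny decision for blocked tools, or None to let the call proceed."""
    if payload.get("tool_name") not in BLOCKED_TOOLS:
        return None
    if allow_flag.exists():  # break-glass: built-ins allowed while the file exists
        return None
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": REASON,
        }
    }


def run_hook(stdin_text: str, allow_flag: Path = ALLOW_FLAG) -> str:
    """What a hook script would print to stdout for one PreToolUse event."""
    decision = decide(json.loads(stdin_text), allow_flag)
    return json.dumps(decision) if decision else ""
```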

Security — SSRF

web_fetch validates every URL before any request: non-http(s) schemes and any host that resolves to a non-public IP (loopback, private/RFC-1918, link-local, reserved/NAT64, multicast) are refused. The only tier that runs a real browser on your machine, camoufox, is opt-in (render="always"), so the default auto/never path never exposes it. Residual risk: the camoufox tier follows redirects, so the up-front check covers only the initial URL; fully closing that gap would require a validating forward proxy.
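A pre-request check of the kind described above can be built on Python's standard ipaddress and socket modules. This is a sketch, not the project's implementation: it resolves the host, rejects any non-public result, and unwraps IPv4-mapped IPv6 addresses so that ::ffff:127.0.0.1 cannot slip through.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# NAT64 well-known prefix (RFC 6052): checked explicitly because it can
# embed private IPv4 addresses.
NAT64 = ipaddress.ip_network("64:ff9b::/96")


def is_public_ip(ip) -> bool:
    """True only for globally routable unicast addresses."""
    if isinstance(ip, ipaddress.IPv6Address):
        if ip in NAT64:
            return False
        if ip.ipv4_mapped is not None:  # ::ffff:a.b.c.d -> judge the IPv4 address
            ip = ip.ipv4_mapped
    return ip.is_global and not (
        ip.is_loopback or ip.is_private or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def check_url(url: str) -> str:
    """Raise ValueError unless url is http(s) and every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {host}: {exc}") from exc
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if not is_public_ip(ip):
            raise ValueError(f"{host} resolves to non-public address {ip}")
    return url
```

Like any check-then-connect design, resolving here and again inside the HTTP client leaves a DNS-rebinding window; connecting to the validated address closes it. Redirects need the same check on every hop, which is the residual gap noted above for the camoufox tier.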

FAQ

What is web-retrieval-mcp? An open-source MCP (Model Context Protocol) server that gives AI agents two web tools, web_search (Exa) and web_fetch (Exa → optional local browser → Firecrawl), as a drop-in replacement for built-in web access, with provenance preservation and an SSRF guard.

Does it work with Claude Code? Yes. Register with claude mcp add web-retrieval -- web-retrieval-mcp, and optionally install the bundled hook so the built-in WebSearch/WebFetch are disabled in favor of these tools.

Is it free? Yes. The code is MIT-licensed, and it runs on the free tiers of Exa (1,000 requests/month) and Firecrawl (1,000 pages/month), neither of which requires a credit card. The local browser tier is free and unlimited.

Which platforms are supported? macOS, Linux, and Windows. Key resolution and the server are cross-platform; the local browser tier needs the optional [render] extra.

How is it better than built-in WebSearch/WebFetch? It returns one result block per source (no conflated summaries), preserves provenance, falls back across multiple fetch backends so hard/JS pages still resolve, and guards against SSRF.

Do I need the browser stack? No. Search and the default fetch path need only mcp + anyio. The camoufox/playwright browser is the optional [render] extra, used only for render="always".

Publishing

Status: distributed from GitHub today; not yet on PyPI, so pip install web-retrieval-mcp / uvx web-retrieval-mcp (the short forms) don't resolve yet — use the git-install commands in Quickstart.

To publish and make the package discoverable to agents, do these in order (see PUBLISHING.md for the full runbook):

  1. PyPI: python -m build, then uv publish (or twine upload dist/*). This unlocks the short pip install web-retrieval-mcp / uvx web-retrieval-mcp.
  2. Official MCP Registry (registry.modelcontextprotocol.io): the one high-leverage listing, because aggregators (PulseMCP, Glama, mcp.so, Smithery) ingest from it. Publish server.json with mcp-publisher (GitHub OAuth, namespace io.github.velvetsp/...). Requires the PyPI release from step 1.
  3. Tag a release: git tag v0.1.0 && git push --tags, then cut a GitHub Release (release pages are indexed by Google and add a freshness signal).

Contributing

Issues and PRs welcome at https://github.com/VelvetSP/web-retrieval-mcp. The server is a single module (src/web_retrieval_mcp/server.py); stdout is JSON-RPC only — keep all diagnostics on stderr.

License

MIT © VelvetSP


Keywords: MCP server, Model Context Protocol, AI agent web search, LLM web fetch, Exa API, Firecrawl API, camoufox, Claude Code MCP, web scraping for agents, RAG retrieval, SSRF-safe fetch, cross-platform, free web search API.



Download files


Source Distribution

web_retrieval_mcp-0.1.0.tar.gz (17.8 kB)

Uploaded Source

Built Distribution


web_retrieval_mcp-0.1.0-py3-none-any.whl (18.8 kB)

Uploaded Python 3

File details

Details for the file web_retrieval_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: web_retrieval_mcp-0.1.0.tar.gz
  • Size: 17.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for web_retrieval_mcp-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 065515e95fe3565228cf7bf0a4ce2f7d3bff33581c8e2bf7e9692c013cb2191c |
| MD5 | 30d06a0cefdb3a99d0cfbfd7e19e6ce0 |
| BLAKE2b-256 | 74b3f8c9cd4f6ac2dc9496aa506006534bc34b946ce9f42bebba4510960359da |


File details

Details for the file web_retrieval_mcp-0.1.0-py3-none-any.whl.


File hashes

Hashes for web_retrieval_mcp-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9eea3538e689168db62589c395a11d5d814e77f56cf7185d37a67f378a278214 |
| MD5 | f9a84b244fc0c13578ce14c895d07d19 |
| BLAKE2b-256 | 13344d0d98bc1d44cff94672fce1d47c4eb03c049c959e866663714d580e3dca |
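To check a downloaded file against the published digests, hash it locally and compare. Below is a minimal standard-library sketch that reads in chunks so large files do not need to fit in memory; the digests are copied from the SHA256 rows above, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the hash tables above.
PUBLISHED_SHA256 = {
    "web_retrieval_mcp-0.1.0.tar.gz":
        "065515e95fe3565228cf7bf0a4ce2f7d3bff33581c8e2bf7e9692c013cb2191c",
    "web_retrieval_mcp-0.1.0-py3-none-any.whl":
        "9eea3538e689168db62589c395a11d5d814e77f56cf7185d37a67f378a278214",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path) -> bool:
    """Compare a downloaded file with its published digest (looked up by file name)."""
    path = Path(path)
    expected = PUBLISHED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

pip can enforce the same check at install time through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).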

