Fully local MCP server and CLI for web research

SourceWeave Web Search

SourceWeave Web Search is a fully local MCP server and CLI for web research.

It uses SearXNG for discovery, Crawl4AI for cleaned page extraction, and Redis or Valkey as the canonical persisted page cache.

For most users, the setup is simple:

  1. run the supporting services locally in containers, or point at existing external endpoints
  2. start the MCP server with uvx
  3. connect your MCP client to the running server over stdio or local HTTP

Key Features

  • MCP server with stdio, sse, and streamable-http transports
  • fully local web research workflow with source discovery and stable follow-up reads for MCP clients
  • automatic document conversion for PDFs and other supported documents when detected
  • lean MCP contract with search_web, read_pages, and read_urls
  • publishable Python package, container image, and generated OpenWebUI artifact
  • compatible with OpenCode, VS Code Copilot, and other MCP clients

Requirements

  • Python 3.12+
  • a reachable SearXNG endpoint
  • a reachable Crawl4AI endpoint
  • a reachable Redis or Valkey instance

Optional:

  • Docker and Docker Compose for the repo-local stack
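A quick preflight along these lines can confirm the requirements before starting the server. This is a minimal sketch using only the standard library; the helper names (host_port, is_reachable, preflight) are hypothetical and not part of the package, and a successful TCP connect only shows that something is listening on the port, not that it is the right service.

```python
import socket
import sys
from urllib.parse import urlsplit

# Fallback ports by URL scheme, used when the URL omits an explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443, "redis": 6379, "rediss": 6379}


def host_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from an endpoint URL."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no port and unknown scheme in URL: {url!r}")
    return parts.hostname, port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(endpoints: dict[str, str]) -> dict[str, bool]:
    """Check the Python version and each named endpoint."""
    report = {"python>=3.12": sys.version_info >= (3, 12)}
    for name, url in endpoints.items():
        report[name] = is_reachable(url)
    return report
```

Calling preflight with a mapping such as {"searxng": "http://127.0.0.1:19080/search?format=json&q=<query>", "redis": "redis://127.0.0.1:16379/2"} returns one boolean per entry.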

Recommended Local Deployment

Start the supporting services locally:

git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
cp .env.example .env
docker compose up -d redis crawl4ai searxng

Then start the MCP server from the published package with uvx and point it at those local endpoints:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp

For a local HTTP MCP endpoint instead of stdio:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8000

You can also point the same uvx command at externally hosted SearXNG, Crawl4AI, and Redis or Valkey endpoints by changing the environment variables.
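When the server has to be launched from a script rather than a shell, the same invocation can be assembled in Python. A minimal sketch, assuming uvx is on PATH; build_launch and launch are hypothetical helper names, while the environment variables and the --transport, --host, and --port flags are the ones shown above.

```python
import os
import subprocess

# Repo-local endpoints, copied from the shell examples above.
LOCAL_ENDPOINTS = {
    "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
    "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
    "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2",
}


def build_launch(endpoints=None, transport=None, host=None, port=None):
    """Return (command, env) for starting the MCP server with uvx."""
    command = ["uvx", "--from", "sourceweave-web-search", "sourceweave-search-mcp"]
    if transport is not None:
        command += ["--transport", transport]
    if host is not None:
        command += ["--host", host]
    if port is not None:
        command += ["--port", str(port)]
    # Start from the current environment so PATH and similar variables survive.
    env = {**os.environ, **(endpoints or LOCAL_ENDPOINTS)}
    return command, env


def launch(**kwargs) -> subprocess.Popen:
    """Start the server as a child process (never called on import)."""
    command, env = build_launch(**kwargs)
    return subprocess.Popen(command, env=env)
```

With no arguments, build_launch mirrors the stdio command; passing transport="streamable-http", host="127.0.0.1", port=8000 mirrors the local HTTP command. Because no shell is involved, the <query> placeholder and the & in the SearXNG URL need no quoting.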

Installation Options

Python package

Published releases can be installed from PyPI:

pip install sourceweave-web-search

Or run directly without a global install:

uvx --from sourceweave-web-search sourceweave-search-mcp
uvx --from sourceweave-web-search sourceweave-search --query "python programming"

Repo checkout

For local development or source-based runs:

git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
uv sync --locked --group dev
uv run sourceweave-search-mcp

Container image

The release workflow can publish a container image to:

  • ghcr.io/mrnaqa/sourceweave-web-search-mcp

Example runtime:

docker run --rm -p 8000:8000 \
  -e SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://host.docker.internal:19080/search?format=json&q=<query>" \
  -e SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://host.docker.internal:19235" \
  -e SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://host.docker.internal:16379/2" \
  ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest

Example docker compose recipe:

services:
  redis:
    image: valkey/valkey:9-alpine
    command: ["redis-server", "--appendonly", "no"]

  crawl4ai:
    image: unclecode/crawl4ai:0.8.6

  searxng:
    image: searxng/searxng:2026.4.11-9e08a6771

  sourceweave-mcp:
    image: ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest
    depends_on:
      - redis
      - crawl4ai
      - searxng
    environment:
      SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL: http://searxng:8080/search?format=json&q=<query>
      SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL: http://crawl4ai:11235
      SOURCEWEAVE_SEARCH_CACHE_REDIS_URL: redis://redis:6379/2
      FASTMCP_HOST: 0.0.0.0
      FASTMCP_PORT: 8000
    ports:
      - "8000:8000"

That gives you a local HTTP MCP endpoint at http://127.0.0.1:8000/mcp with the SourceWeave container linked to the supporting services by container name.

In a repo checkout, docker compose up -d --build mcp builds and runs this same publishable image locally.

Runtime Configuration

Set these environment variables:

Variable                                Purpose
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL     SearXNG URL template. Must contain <query>.
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL    Crawl4AI base URL.
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL      Redis or Valkey URL used for caching.
FASTMCP_HOST                            Host for sse or streamable-http transport.
FASTMCP_PORT                            Port for sse or streamable-http transport.
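The requirement that the SearXNG template contain <query> can be checked before launch. The sketch below is an illustration, not the package's own validation: check_config and fill_query are hypothetical names, and URL-encoding the query before substituting it for <query> is an assumption about how such a template would typically be consumed.

```python
from urllib.parse import quote_plus

SEARXNG_VAR = "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL"


def check_config(env: dict[str, str]) -> list[str]:
    """Return human-readable problems for the variables that are set."""
    problems = []
    template = env.get(SEARXNG_VAR)
    if template is not None and "<query>" not in template:
        problems.append(f"{SEARXNG_VAR} must contain <query>")
    port = env.get("FASTMCP_PORT")
    if port is not None and not (port.isdigit() and 0 < int(port) < 65536):
        problems.append("FASTMCP_PORT must be a port number between 1 and 65535")
    return problems


def fill_query(template: str, query: str) -> str:
    """Substitute a URL-encoded query for the <query> placeholder (assumed behavior)."""
    return template.replace("<query>", quote_plus(query))
```

Passing os.environ to check_config gives an empty list when nothing is wrong; unset variables are skipped because the package has defaults for the endpoints.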

Example:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
sourceweave-search --query "python programming" --read-first-pages 2

Quick Start

The CLI is useful for smoke testing the runtime outside an MCP client.

Search and immediately read the first results:

sourceweave-search --query "python programming" --read-first-pages 2

Verified live examples from the repo-local stack:

  • sourceweave-search --read-url https://en.wikipedia.org/wiki/Comparison_of_HTTP_server_software ... returned cleaned page content
  • sourceweave-search --query 'HTTP overview' --domain developer.mozilla.org --read-first-page ... returned compact search results plus a focused page read

Constrain search to a specific host with --domain:

sourceweave-search \
  --query "react useEffect cleanup example" \
  --domain developer.mozilla.org \
  --read-first-page

Read a direct URL without running search_web first:

sourceweave-search \
  --read-url "https://packaging.python.org/en/latest/"

Pass a document URL such as a PDF alongside a query; document conversion needs no extra flags:

sourceweave-search \
  --query "guide pdf" \
  --url "https://example.com/guide.pdf"

MCP Server

Run over stdio:

sourceweave-search-mcp

Run as a local HTTP endpoint:

sourceweave-search-mcp --transport streamable-http --host 127.0.0.1 --port 8000
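To confirm that the HTTP endpoint answers, a client can send it an MCP initialize request. This sketch only builds the request and does not send it. The /mcp path is the one this README reports for the HTTP endpoint; the protocol version string and the client name are assumptions that may need adjusting to what the server negotiates.

```python
import json
import urllib.request


def build_initialize_request(url: str = "http://127.0.0.1:8000/mcp") -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC initialize request for streamable HTTP."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; the server may negotiate another.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity for this smoke test.
            "clientInfo": {"name": "smoke-test", "version": "0.0.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(build_initialize_request(), timeout=10) should produce a response from a running server. The body may arrive as plain JSON or as a server-sent event, so parsing is left to the caller.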

What MCP Clients Get

MCP clients receive a lean three-tool contract:

  • search_web(query, domains?, urls?): discover relevant sources and get compact results with stable page_id handles
  • read_pages(page_ids, focus?): read stored pages by page_id
  • read_urls(urls, focus?): read one or more direct URLs without searching first
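On the wire, each tool invocation is a JSON-RPC tools/call request. The sketch below builds such payloads for the three tools. tool_call is a hypothetical helper, the argument names follow the signatures listed above, and the argument types (strings and lists of strings) are assumptions, since this README does not spell them out.

```python
import itertools
import json

# Monotonic request ids, shared by every payload built in this module.
_ids = itertools.count(1)


def tool_call(name: str, **arguments) -> dict:
    """Build a JSON-RPC 2.0 tools/call request, omitting unset optional arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {
            "name": name,
            "arguments": {k: v for k, v in arguments.items() if v is not None},
        },
    }


search = tool_call("search_web", query="HTTP overview", domains=["developer.mozilla.org"])
# "<page_id from search_web>" is a placeholder, not a real handle.
read = tool_call("read_pages", page_ids=["<page_id from search_web>"], focus="caching")
direct = tool_call("read_urls", urls=["https://packaging.python.org/en/latest/"])

print(json.dumps(search, indent=2))
```

In practice an MCP client library builds these messages; the sketch is only meant to make the search-then-read flow concrete.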

Public result shapes are intentionally small:

  • search_web returns page_id, url, title, summary, and key_points
  • read_pages and read_urls return page_id, url, title, and content
  • content_type is only included when the content is not HTML, and truncated is only included when true
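Because content_type and truncated are omitted rather than sent with default values, consumers need to supply the defaults themselves. A minimal sketch, assuming each read result is a plain dict with string fields as listed above; normalize_read_result is a hypothetical helper, and "text/html" as the label for the default is an assumption.

```python
from typing import TypedDict


class ReadResult(TypedDict):
    page_id: str
    url: str
    title: str
    content: str
    content_type: str
    truncated: bool


def normalize_read_result(item: dict) -> ReadResult:
    """Fill in the fields that are omitted for HTML or non-truncated pages."""
    return ReadResult(
        page_id=item["page_id"],
        url=item["url"],
        title=item["title"],
        content=item["content"],
        # Absent content_type means the page was HTML (the label is an assumption).
        content_type=item.get("content_type", "text/html"),
        # Absent truncated means the content is complete.
        truncated=item.get("truncated", False),
    )
```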

Human operators usually only need to know how to run the server and where to point the runtime endpoints. MCP clients handle the exact tool parameters.

MCP Client Setup

OpenCode

Example opencode.json / opencode.jsonc / ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "local",
      "command": [
        "uvx",
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "environment": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      },
      "enabled": true,
      "timeout": 30000
    }
  }
}

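For a shared HTTP endpoint instead (18000 is the port of the repo's mcp compose service; use 8000 when the server was started directly with uvx):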

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "remote",
      "url": "http://127.0.0.1:18000/mcp",
      "enabled": true,
      "timeout": 30000
    }
  }
}

VS Code Copilot

Example .vscode/mcp.json:

{
  "servers": {
    "sourceweave": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "env": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      }
    }
  }
}

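For a shared HTTP endpoint instead (18000 is the port of the repo's mcp compose service; use 8000 when the server was started directly with uvx):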

{
  "servers": {
    "sourceweave": {
      "type": "http",
      "url": "http://127.0.0.1:18000/mcp"
    }
  }
}

OpenWebUI

This repo also ships a generated standalone OpenWebUI tool file at artifacts/sourceweave_web_search.py.

From a repo checkout, verify it is in sync with the canonical implementation:

uv run sourceweave-build-openwebui --check

Paste that artifact into OpenWebUI when you want the standalone tool-file deployment path.

Defaults

Default host-side endpoints used by the package:

  • SearXNG: http://127.0.0.1:19080/search?format=json&q=<query>
  • Crawl4AI: http://127.0.0.1:19235
  • Redis: redis://127.0.0.1:16379/2

Default repo-local ports:

  • SearXNG: 19080
  • Crawl4AI: 19235
  • Redis: 16379
  • MCP: 8000 when run directly with uvx; 18000 at /mcp when using the repo's mcp compose service
