MCP server and CLI for search-first web research with batched page reading, focused extraction, and direct URL support

SourceWeave Web Search

SourceWeave Web Search is an MCP server and CLI for search-first web research plus follow-up page reading.

It uses SearXNG for discovery, Crawl4AI for cleaned page extraction, and Redis or Valkey as the canonical persisted page cache.

For most users, the setup is simple:

  1. run the supporting services locally in containers, or point at existing external endpoints
  2. start the MCP server with uvx
  3. connect your MCP client to the running server over stdio or local HTTP

Key Features

  • MCP server with stdio, sse, and streamable-http transports
  • search-first source discovery plus batched page reading for MCP clients
  • explicit per-URL document conversion for PDFs and other supported documents
  • focused reads, direct URL reads, related-link limits, image metadata, and page-quality hints
  • publishable Python package, container image, and generated OpenWebUI artifact
  • compatible with OpenCode, VS Code Copilot, and other MCP clients

Requirements

  • Python 3.12+
  • a reachable SearXNG endpoint
  • a reachable Crawl4AI endpoint
  • a reachable Redis or Valkey instance
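Before starting the server, it can help to confirm that the three endpoints accept TCP connections. The following is a minimal sketch using only the standard library; the helper names host_and_port and is_reachable are illustrative and not part of the package, and a successful TCP connect does not prove the service behind the port is healthy.

```python
import socket
from urllib.parse import urlsplit

# Fallback ports for URLs that omit an explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443, "redis": 6379, "rediss": 6379}


def host_and_port(url: str) -> tuple[str, int]:
    """Extract the TCP host and port from an endpoint URL."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no port and unknown scheme in URL: {url!r}")
    return parts.hostname, port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host-side endpoints of the repo-local stack, as listed in this README.
    endpoints = {
        "SearXNG": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "Crawl4AI": "http://127.0.0.1:19235",
        "Redis/Valkey": "redis://127.0.0.1:16379/2",
    }
    for name, url in endpoints.items():
        state = "reachable" if is_reachable(url) else "NOT reachable"
        print(f"{name}: {state}")
```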

Optional:

  • Docker and Docker Compose for the repo-local stack

Recommended Local Deployment

Start the supporting services locally:

git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
cp .env.example .env
docker compose up -d redis crawl4ai searxng

Then start the MCP server from the published package with uvx and point it at those local endpoints:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp

For a local HTTP MCP endpoint instead of stdio:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8000

You can also point the same uvx command at externally hosted SearXNG, Crawl4AI, and Redis or Valkey endpoints by changing the environment variables.

Installation Options

Python package

Published releases can be installed from PyPI:

pip install sourceweave-web-search

Or run directly without a global install:

uvx --from sourceweave-web-search sourceweave-search-mcp
uvx --from sourceweave-web-search sourceweave-search --query "python programming"

Repo checkout

For local development or source-based runs:

git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
uv sync --locked --group dev
uv run sourceweave-search-mcp

Container image

The release workflow can publish a container image to:

  • ghcr.io/mrnaqa/sourceweave-web-search-mcp

Example runtime:

docker run --rm -p 8000:8000 \
  -e SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://host.docker.internal:19080/search?format=json&q=<query>" \
  -e SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://host.docker.internal:19235" \
  -e SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://host.docker.internal:16379/2" \
  ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest

Example docker compose recipe:

services:
  redis:
    image: valkey/valkey:9-alpine
    command: ["redis-server", "--appendonly", "no"]

  crawl4ai:
    image: unclecode/crawl4ai:0.8.6

  searxng:
    image: searxng/searxng:2026.4.11-9e08a6771

  sourceweave-mcp:
    image: ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest
    depends_on:
      - redis
      - crawl4ai
      - searxng
    environment:
      SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL: http://searxng:8080/search?format=json&q=<query>
      SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL: http://crawl4ai:11235
      SOURCEWEAVE_SEARCH_CACHE_REDIS_URL: redis://redis:6379/2
      FASTMCP_HOST: 0.0.0.0
      FASTMCP_PORT: 8000
    ports:
      - "8000:8000"

That gives you a local HTTP MCP endpoint at http://127.0.0.1:8000/mcp, with the SourceWeave container reaching the supporting services by their Compose service names (redis, crawl4ai, searxng).

From a repo checkout, docker compose up -d --build mcp builds and runs the same image locally.
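Once an HTTP endpoint such as http://127.0.0.1:8000/mcp is running, one way to confirm that it speaks MCP is to send a JSON-RPC initialize request. The sketch below uses only the Python standard library and assumes the server follows the standard MCP streamable HTTP transport. The protocolVersion string and the clientInfo values are assumptions chosen for illustration, not values taken from this project.

```python
import json
import urllib.request

MCP_URL = "http://127.0.0.1:8000/mcp"  # endpoint from the compose recipe above


def build_initialize_request(request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 initialize message for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; adjust to your server
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.0.0"},
        },
    }


def post_initialize(url: str = MCP_URL, timeout: float = 5.0) -> tuple[int, str]:
    """POST the initialize message; return (HTTP status, response content type)."""
    body = json.dumps(build_initialize_request()).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get("Content-Type", "")


if __name__ == "__main__":
    # Print the payload only; call post_initialize() once the server is up.
    print(json.dumps(build_initialize_request(), indent=2))
```

A 200 status with a JSON or text/event-stream content type suggests the endpoint is live; a connection error usually means the port or path does not match how the server was started.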

Runtime Configuration

Set these environment variables:

Variable                               Purpose
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL    SearXNG URL template. Must contain <query>.
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL   Crawl4AI base URL.
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL     Redis or Valkey URL used for caching.
FASTMCP_HOST                           Host for sse or streamable-http transport.
FASTMCP_PORT                           Port for sse or streamable-http transport.

Example:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
sourceweave-search --query "python programming" --read-first-pages 2
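The SearXNG variable is a URL template rather than a plain base URL, so a missing <query> placeholder is an easy mistake. The sketch below shows one plausible way to validate a template and expand it. The function name build_search_url is hypothetical, and the use of quote_plus for encoding is an assumption; this README does not describe how the package performs the substitution.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "<query>"


def build_search_url(template: str, query: str) -> str:
    """Validate a SearXNG URL template and substitute a URL-encoded query."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template must contain {PLACEHOLDER}: {template!r}")
    # quote_plus encodes spaces as '+' and escapes '&', '=', and similar.
    return template.replace(PLACEHOLDER, quote_plus(query))


if __name__ == "__main__":
    template = "http://127.0.0.1:19080/search?format=json&q=<query>"
    print(build_search_url(template, "python programming"))
```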

Quick Start

The CLI is useful for smoke testing the runtime outside an MCP client.

Search and immediately read the first results:

sourceweave-search --query "python programming" --read-first-pages 2

Read a discovered page and include stored related links:

sourceweave-search \
  --query "react useEffect cleanup example" \
  --read-first-page \
  --related-links-limit 3

Read a direct URL without running search_web first:

sourceweave-search \
  --read-url "https://packaging.python.org/en/latest/" \
  --max-chars 2000

Force document conversion for an explicit URL:

sourceweave-search \
  --query "guide pdf" \
  --url '{"url": "https://example.com/guide.pdf", "convert_document": true}'
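Because the --url value is a JSON object, shell quoting is easy to get wrong when the command is assembled by hand. A minimal sketch of building the same command from Python follows. The build_command helper is illustrative only; the flag names and JSON keys are the ones shown in the example above.

```python
import json
import shlex


def build_command(query: str, url: str, convert_document: bool = True) -> list[str]:
    """Assemble the sourceweave-search argv with a JSON-encoded --url value."""
    spec = json.dumps({"url": url, "convert_document": convert_document})
    return ["sourceweave-search", "--query", query, "--url", spec]


if __name__ == "__main__":
    argv = build_command("guide pdf", "https://example.com/guide.pdf")
    # shlex.join quotes each argument so the line can be pasted into a POSIX shell.
    print(shlex.join(argv))
    # To execute directly without a shell: subprocess.run(argv, check=True)
```

Passing the list to subprocess.run avoids shell quoting altogether, since each element reaches the CLI as a single argument.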

MCP Server

Run over stdio:

sourceweave-search-mcp

Run as a local HTTP endpoint:

sourceweave-search-mcp --transport streamable-http --host 127.0.0.1 --port 8000

What MCP Clients Get

MCP clients receive a simple two-step flow:

  • search_web: discover relevant sources with compact summaries, key points, metadata, and stable page_id handles for follow-up work
  • read_pages: read pages by page_id after search_web, or use it standalone as a direct-URL reader; it can batch related pages in one call, optionally focus the extraction, and return stored related-link and page-quality context when useful

Human operators usually only need to know how to run the server and where to point the runtime endpoints. MCP clients handle the exact tool parameters.

MCP Client Setup

OpenCode

Example for opencode.json, opencode.jsonc, or ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "local",
      "command": [
        "uvx",
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "environment": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      },
      "enabled": true,
      "timeout": 30000
    }
  }
}

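For a shared HTTP endpoint instead (this example uses port 18000, the repo's mcp compose service; use 8000 if you started the server directly with uvx):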

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "remote",
      "url": "http://127.0.0.1:18000/mcp",
      "enabled": true,
      "timeout": 30000
    }
  }
}

VS Code Copilot

Example .vscode/mcp.json:

{
  "servers": {
    "sourceweave": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "env": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      }
    }
  }
}

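For a shared HTTP endpoint instead (again using port 18000 from the repo's mcp compose service; use 8000 for a server started directly with uvx):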

{
  "servers": {
    "sourceweave": {
      "type": "http",
      "url": "http://127.0.0.1:18000/mcp"
    }
  }
}
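The OpenCode and VS Code stdio entries carry the same information under different key names (a single command list with environment versus command plus args with env). When several clients need to be kept in sync, generating both from one set of endpoint values is one option. The sketch below reproduces the shapes shown above; the helper names are illustrative and not part of either tool.

```python
import json

PACKAGE_ARGS = ["--from", "sourceweave-web-search", "sourceweave-search-mcp"]


def opencode_entry(env: dict[str, str]) -> dict:
    """Server entry for the "mcp" object in opencode.json (stdio via uvx)."""
    return {
        "type": "local",
        "command": ["uvx", *PACKAGE_ARGS],
        "environment": dict(env),
        "enabled": True,
        "timeout": 30000,
    }


def vscode_entry(env: dict[str, str]) -> dict:
    """Server entry for the "servers" object in .vscode/mcp.json (stdio via uvx)."""
    return {
        "type": "stdio",
        "command": "uvx",
        "args": list(PACKAGE_ARGS),
        "env": dict(env),
    }


if __name__ == "__main__":
    env = {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2",
    }
    print(json.dumps({"mcp": {"sourceweave": opencode_entry(env)}}, indent=2))
    print(json.dumps({"servers": {"sourceweave": vscode_entry(env)}}, indent=2))
```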

OpenWebUI

This repo also ships a generated standalone OpenWebUI tool file at artifacts/sourceweave_web_search.py.

From a repo checkout, verify it is in sync with the canonical implementation:

uv run sourceweave-build-openwebui --check

Paste that artifact into OpenWebUI when you want the standalone tool-file deployment path.

Defaults

Default host-side endpoints used by the package:

  • SearXNG: http://127.0.0.1:19080/search?format=json&q=<query>
  • Crawl4AI: http://127.0.0.1:19235
  • Redis: redis://127.0.0.1:16379/2

Default repo-local ports:

  • SearXNG: 19080
  • Crawl4AI: 19235
  • Redis: 16379
  • MCP: 8000 when run directly with uvx; 18000 at /mcp when using the repo's mcp compose service
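The relationship between the environment variables and these defaults can be sketched as a simple lookup with fallback. This mirrors the documented default values only; the package's actual resolution logic is not shown in this README, and treating an empty string as unset is an assumption made here.

```python
import os
from collections.abc import Mapping

# Default host-side endpoints as documented above.
DEFAULTS = {
    "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
    "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
    "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2",
}


def resolve_endpoints(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return each endpoint from env, falling back to the documented default."""
    # `or` makes an empty value fall back to the default (an assumption).
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    for name, value in resolve_endpoints().items():
        print(f"{name}={value}")
```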
