Fully local MCP server and CLI for web research

SourceWeave Web Search

SourceWeave Web Search is a fully local MCP server and CLI for web research.

It uses SearXNG for discovery, Crawl4AI for cleaned page extraction, and Redis or Valkey as the canonical persisted page cache.

For most users, the setup is simple:

  1. run the supporting services locally in containers, or point at existing external endpoints
  2. start the MCP server with uvx
  3. connect your MCP client to the running server over stdio or local HTTP

Key Features

  • MCP server with stdio, sse, and streamable-http transports
  • fully local web research workflow with source discovery and stable follow-up reads for MCP clients
  • automatic document conversion for PDFs and other supported documents when detected
  • lean MCP contract of three tools (search_web, read_pages, and read_urls), with an optional search effort setting of quick, normal, or deep
  • publishable Python package, container image, and generated OpenWebUI artifact
  • compatible with OpenCode, VS Code Copilot, and other MCP clients

Requirements

  • Python 3.12+
  • a reachable SearXNG endpoint
  • a reachable Crawl4AI endpoint
  • a reachable Redis or Valkey instance

Optional:

  • Docker and Docker Compose for the repo-local stack

Recommended Local Deployment

Start the supporting services locally:

git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
cp .env.example .env
docker compose up -d redis crawl4ai searxng
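
Before starting the MCP server, it can help to confirm that the three services are accepting connections. The following is a minimal sketch, not part of the package: it only checks that the repo-local default ports listed in this README accept TCP connections, and the helper name wait_for_port is hypothetical.

```python
import socket
import time

# Repo-local default ports as listed in this README.
SERVICES = {"searxng": 19080, "crawl4ai": 19235, "redis": 16379}


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open,
            # not that the service behind it is healthy.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def check_services(host: str = "127.0.0.1", timeout: float = 30.0) -> dict[str, bool]:
    """Map each service name to whether its port became reachable."""
    return {name: wait_for_port(host, port, timeout) for name, port in SERVICES.items()}
```

Calling check_services() after docker compose up returns a dictionary such as one entry per service with True or False, which is enough for a quick smoke check.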

Then start the MCP server from the published package with uvx and point it at those local endpoints:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp

For a local HTTP MCP endpoint instead of stdio:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8000

You can also point the same uvx command at externally hosted SearXNG, Crawl4AI, and Redis or Valkey endpoints by changing the environment variables.

Installation Options

Python package

Published releases can be installed from PyPI:

pip install sourceweave-web-search

Or run directly without a global install:

uvx --from sourceweave-web-search sourceweave-search-mcp
uvx --from sourceweave-web-search sourceweave-search --query "python programming"

Repo checkout

For local development or source-based runs:

git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
uv sync --locked --group dev
uv run sourceweave-search-mcp

Container image

The release workflow can publish a container image to:

  • ghcr.io/mrnaqa/sourceweave-web-search-mcp

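Example runtime (on Linux hosts, host.docker.internal is usually not defined by default; add --add-host=host.docker.internal:host-gateway to the docker run command if needed):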

docker run --rm -p 8000:8000 \
  -e SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://host.docker.internal:19080/search?format=json&q=<query>" \
  -e SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://host.docker.internal:19235" \
  -e SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://host.docker.internal:16379/2" \
  ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest

Example docker compose recipe:

services:
  redis:
    image: valkey/valkey:9-alpine
    command: ["redis-server", "--appendonly", "no"]

  crawl4ai:
    image: unclecode/crawl4ai:0.8.6

  searxng:
    image: searxng/searxng:2026.4.11-9e08a6771

  sourceweave-mcp:
    image: ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest
    depends_on:
      - redis
      - crawl4ai
      - searxng
    environment:
      SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL: http://searxng:8080/search?format=json&q=<query>
      SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL: http://crawl4ai:11235
      SOURCEWEAVE_SEARCH_CACHE_REDIS_URL: redis://redis:6379/2
      FASTMCP_HOST: 0.0.0.0
      FASTMCP_PORT: 8000
    ports:
      - "8000:8000"

That gives you a local HTTP MCP endpoint at http://127.0.0.1:8000/mcp with the SourceWeave container linked to the supporting services by container name.

From a repo checkout, docker compose up -d --build mcp builds and runs this same publishable image locally, serving the MCP endpoint on port 18000 at /mcp.

Runtime Configuration

Set these environment variables:

Variable                                Purpose
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL     SearXNG URL template. Must contain <query>.
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL    Crawl4AI base URL.
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL      Redis or Valkey URL used for caching.
FASTMCP_HOST                            Host for sse or streamable-http transport.
FASTMCP_PORT                            Port for sse or streamable-http transport.

Example:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
sourceweave-search --query "python programming" --read-first-pages 2
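
The <query> placeholder in the SearXNG URL template is presumably replaced with the search terms at request time. The sketch below illustrates that idea under stated assumptions: the function name build_search_url is hypothetical, and the package's real substitution and encoding rules may differ.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "<query>"


def build_search_url(template: str, query: str) -> str:
    """Substitute a URL-encoded query into a SearXNG URL template."""
    if PLACEHOLDER not in template:
        # Mirrors the documented rule that the template must contain <query>.
        raise ValueError(f"template must contain {PLACEHOLDER}: {template!r}")
    # quote_plus encodes spaces as '+', which suits a query-string value.
    return template.replace(PLACEHOLDER, quote_plus(query))
```

With the default template, build_search_url("http://127.0.0.1:19080/search?format=json&q=<query>", "python programming") yields a URL ending in q=python+programming. A template without the placeholder fails early instead of silently sending the same request for every query.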

Quick Start

The CLI is useful for smoke testing the runtime outside an MCP client.

Search and immediately read the first results:

sourceweave-search --query "python programming" --read-first-pages 2

Verified live examples from the repo-local stack:

  • sourceweave-search --read-url https://en.wikipedia.org/wiki/Comparison_of_HTTP_server_software ... returned cleaned page content
  • sourceweave-search --query 'HTTP overview' --domain developer.mozilla.org --read-first-page ... returned compact search results plus a focused page read

Constrain search to a specific host with --domain:

sourceweave-search \
  --query "react useEffect cleanup example" \
  --domain developer.mozilla.org \
  --read-first-page

Read a direct URL without running search_web first:

sourceweave-search \
  --read-url "https://packaging.python.org/en/latest/"

Read a document URL directly without extra flags:

sourceweave-search \
  --query "guide pdf" \
  --url "https://example.com/guide.pdf"

MCP Server

Run over stdio:

sourceweave-search-mcp

Run as a local HTTP endpoint:

sourceweave-search-mcp --transport streamable-http --host 127.0.0.1 --port 8000

What MCP Clients Get

MCP clients receive a lean three-tool contract:

  • search_web(query, domains?, urls?, effort?): discover relevant sources and get compact results with stable page_id handles
  • read_pages(page_ids, focus?): read stored pages by page_id
  • read_urls(urls, focus?): read one or more direct URLs without searching first
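
For readers curious about what a client sends on the wire, the three tools above are invoked through MCP's JSON-RPC tools/call method. The sketch below builds such request payloads with the standard library only. The argument types (lists of strings for domains and page_ids) and the literal effort values are assumptions inferred from this README, and the helper names are hypothetical.

```python
import json

# Assumed literal values for the optional effort argument.
EFFORTS = {"quick", "normal", "deep"}


def tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def search_web_request(query: str, domains=None, effort=None, request_id: int = 1) -> dict:
    """Build a search_web call, including optional arguments only when given."""
    arguments: dict = {"query": query}
    if domains:
        arguments["domains"] = list(domains)
    if effort is not None:
        if effort not in EFFORTS:
            raise ValueError(f"effort must be one of {sorted(EFFORTS)}")
        arguments["effort"] = effort
    return tool_call("search_web", arguments, request_id)


def read_pages_request(page_ids, focus=None, request_id: int = 1) -> dict:
    """Build a read_pages call for page_id handles returned by search_web."""
    arguments: dict = {"page_ids": list(page_ids)}
    if focus is not None:
        arguments["focus"] = focus
    return tool_call("read_pages", arguments, request_id)


def to_wire(request: dict) -> str:
    """Serialize a request to a compact JSON string."""
    return json.dumps(request, separators=(",", ":"))
```

In practice an MCP client library handles this framing; the sketch is only meant to show how small the contract is.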

Public result shapes are intentionally small:

  • search_web returns page_id, url, title, summary, and key_points
  • read_pages and read_urls return page_id, url, title, and content
  • content_type is only included when the content is not HTML, and truncated is only included when true
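
The rule for the optional fields can be sketched as a small builder. This is an illustration of the documented shape, not the package's code: the function name is hypothetical, and treating text/html as the marker for HTML content is an assumption.

```python
from typing import Any


def compact_read_result(
    page_id: str,
    url: str,
    title: str,
    content: str,
    content_type: str = "text/html",
    truncated: bool = False,
) -> dict[str, Any]:
    """Build a read result that carries optional fields only when they add information."""
    result: dict[str, Any] = {
        "page_id": page_id,
        "url": url,
        "title": title,
        "content": content,
    }
    # Assumption: HTML is identified by the text/html media type,
    # ignoring parameters such as charset.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/html":
        result["content_type"] = content_type
    if truncated:
        result["truncated"] = True
    return result
```

An HTML page read in full produces only the four core keys, while a truncated PDF read also carries content_type and truncated. Keeping the common case small reduces the tokens an MCP client has to process per result.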

Human operators usually only need to know how to run the server and where to point the runtime endpoints. MCP clients handle the exact tool parameters.

MCP Client Setup

OpenCode

Example opencode.json / opencode.jsonc / ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "local",
      "command": [
        "uvx",
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "environment": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      },
      "enabled": true,
      "timeout": 30000
    }
  }
}

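For a shared HTTP endpoint instead (port 18000 is the repo's mcp compose service; use 8000 when the server is started directly with uvx):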

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "remote",
      "url": "http://127.0.0.1:18000/mcp",
      "enabled": true,
      "timeout": 30000
    }
  }
}

VS Code Copilot

Example .vscode/mcp.json:

{
  "servers": {
    "sourceweave": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "env": {
        "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
        "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
        "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
      }
    }
  }
}

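For a shared HTTP endpoint instead (port 18000 is the repo's mcp compose service; use 8000 when the server is started directly with uvx):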

{
  "servers": {
    "sourceweave": {
      "type": "http",
      "url": "http://127.0.0.1:18000/mcp"
    }
  }
}
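
The OpenCode and VS Code stdio entries shown above differ only in key names, so both can be generated from one set of endpoints. The sketch below assumes the JSON shapes are exactly those in this README. The function names are hypothetical.

```python
import json

# Endpoint values copied from the examples in this README.
ENDPOINTS = {
    "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
    "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
    "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2",
}
COMMAND = ["uvx", "--from", "sourceweave-web-search", "sourceweave-search-mcp"]


def opencode_local_config(env: dict[str, str]) -> dict:
    """Build the OpenCode entry: command is one list, env lives under 'environment'."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "mcp": {
            "sourceweave": {
                "type": "local",
                "command": list(COMMAND),
                "environment": dict(env),
                "enabled": True,
                "timeout": 30000,
            }
        },
    }


def vscode_stdio_config(env: dict[str, str]) -> dict:
    """Build the VS Code entry: command and args are split, env lives under 'env'."""
    return {
        "servers": {
            "sourceweave": {
                "type": "stdio",
                "command": COMMAND[0],
                "args": COMMAND[1:],
                "env": dict(env),
            }
        }
    }


def render(config: dict) -> str:
    """Pretty-print a config as JSON with two-space indentation."""
    return json.dumps(config, indent=2)
```

Writing render(vscode_stdio_config(ENDPOINTS)) to .vscode/mcp.json and render(opencode_local_config(ENDPOINTS)) to opencode.json keeps the two clients pointed at the same services.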

OpenWebUI

This repo also ships a generated standalone OpenWebUI tool file at artifacts/sourceweave_web_search.py.

From a repo checkout, verify it is in sync with the canonical implementation:

uv run sourceweave-build-openwebui --check

Paste that artifact into OpenWebUI when you want the standalone tool-file deployment path. The generated file rewrites the default endpoints to the repo-local compose service names so it matches the container deployment path out of the box.

Defaults

Default host-side endpoints used by the package:

  • SearXNG: http://127.0.0.1:19080/search?format=json&q=<query>
  • Crawl4AI: http://127.0.0.1:19235
  • Redis: redis://127.0.0.1:16379/2

Default repo-local ports:

  • SearXNG: 19080
  • Crawl4AI: 19235
  • Redis: 16379
  • MCP: 8000 when run directly with uvx; 18000 at /mcp when using the repo's mcp compose service
