
Fully local MCP server and CLI for web research


SourceWeave Web Search

Python 3.12+ | MIT License | MCP | Docker managed runtime

Search-first MCP server and CLI for web research.

[!NOTE] sourceweave-search-mcp is the default local entrypoint. When explicit SOURCEWEAVE_SEARCH_* endpoint variables are absent, it discovers or starts the local Docker-backed stack automatically. If you already run the services yourself, set explicit endpoints and it will use them instead.


Overview

SourceWeave Web Search gives MCP clients a compact three-tool contract for web research:

  • search_web(query, domains?, urls?, effort?) discovers sources and returns compact results with stable page_id handles.
  • read_pages(page_ids, focus?) reads stored pages by page_id.
  • read_urls(urls, focus?) reads direct URLs without searching first.
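To make the three-tool contract concrete, here is a minimal sketch of the JSON-RPC 2.0 tools/call messages an MCP client would send. The tool and argument names come from the signatures above. The argument shapes (lists of strings, focus as free text) are assumptions, "pg_example" is a placeholder rather than a real page_id format, and effort is omitted because its accepted values are not documented here.

```python
import json
from itertools import count

# JSON-RPC 2.0 ids must be unique per request within a session.
_ids = count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for one of the SourceWeave tools."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# 1. Discover sources; restricting to one domain (list-of-strings shape assumed).
search = tool_call(
    "search_web",
    {"query": "HTTP overview", "domains": ["developer.mozilla.org"]},
)

# 2. Read stored pages by the page_id handles that search_web returned.
#    "pg_example" is a hypothetical placeholder handle.
read = tool_call("read_pages", {"page_ids": ["pg_example"], "focus": "caching headers"})

# 3. Read a URL directly, skipping the search step.
direct = tool_call("read_urls", {"urls": ["https://packaging.python.org/en/latest/"]})

for message in (search, read, direct):
    print(json.dumps(message))
```

In practice an MCP client library builds these messages for you; the sketch only shows what travels over the wire.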

It combines:

| Component | Role |
| --- | --- |
| SearXNG | Search discovery |
| Crawl4AI | Clean HTML extraction |
| Redis or Valkey | Persisted page cache and page_id store |
| MarkItDown | Document conversion for PDFs and other supported files |
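The components above form a pipeline: SearXNG finds URLs, Crawl4AI (or MarkItDown for documents) extracts their content, and the result is cached under a page_id that read_pages can resolve later. The toy in-memory store below illustrates why such handles can be stable. It is not the project's implementation: the real store is Redis or Valkey, and deriving page_id from a URL hash is a hypothetical scheme chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PageStore:
    """Toy stand-in for the Redis/Valkey page store: page_id -> extracted content."""

    pages: dict[str, str] = field(default_factory=dict)

    @staticmethod
    def page_id(url: str) -> str:
        # Hypothetical scheme: hash the URL so the same page always gets the same handle.
        return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]

    def put(self, url: str, content: str) -> str:
        """Cache extracted content and return its handle."""
        pid = self.page_id(url)
        self.pages[pid] = content
        return pid

    def read(self, page_ids: list[str]) -> list[str]:
        # Unknown ids are skipped rather than raising, to keep the sketch short.
        return [self.pages[pid] for pid in page_ids if pid in self.pages]


store = PageStore()
pid = store.put("https://packaging.python.org/en/latest/", "# Python Packaging User Guide")
print(pid, store.read([pid]))
```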

Getting started

Requirements

  • Python 3.12+
  • Docker with Compose support for the default managed local runtime
  • Explicit SOURCEWEAVE_SEARCH_* endpoints, needed only if you use hosted or self-managed services instead of the managed runtime

Managed local runtime

Run the server from the published package:

uvx --from sourceweave-web-search sourceweave-search-mcp

Or start the MCP server over HTTP:

uvx --from sourceweave-web-search sourceweave-search-mcp \
  --transport streamable-http \
  --host 127.0.0.1 \
  --port 8000
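Once the server listens over HTTP, an MCP client opens a session by POSTing a JSON-RPC initialize message that accepts both JSON and Server-Sent Events responses, as the MCP streamable HTTP transport requires. The sketch below only builds that request with the standard library. The /mcp path is an assumption based on FastMCP's default streamable-http path (it matches the /mcp path used by the compose service later on), and the client name is a placeholder.

```python
import json
import urllib.request

# Assumption: the streamable-http endpoint is served at /mcp (FastMCP's default path).
ENDPOINT = "http://127.0.0.1:8000/mcp"


def initialize_request(endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) the first request of an MCP streamable-HTTP session."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients must accept both plain JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
        },
    )


req = initialize_request()
# To send it while the server is running:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status, resp.headers.get("Mcp-Session-Id"))
```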

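When no endpoint environment variables are set, sourceweave-search-mcp picks one of these modes:

| Mode | What happens |
| --- | --- |
| Managed stack found | Joins the existing SourceWeave-managed stack for the current runtime state directory. |
| Healthy external stack found | Reuses the services already listening on the canonical local ports 19080 (SearXNG), 19235 (Crawl4AI), and 16379 (Redis) without taking ownership of them. |
| No reusable stack | Starts and supervises a Docker-backed stack on the canonical ports, or on free local ports if those are occupied. |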
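The mode selection can be sketched as a small decision function. This is a simplification under stated assumptions: detecting an existing managed stack (via Docker project identity and runtime state) is reduced to a boolean, and "healthy" is approximated by a plain TCP connect to all three canonical ports. The project's actual health checks are not described here.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Canonical ports from the mode table: SearXNG, Crawl4AI, Redis.
CANONICAL_PORTS = (19080, 19235, 16379)


def choose_mode(
    managed_stack_found: bool,
    host: str = "127.0.0.1",
    ports: tuple[int, ...] = CANONICAL_PORTS,
) -> str:
    """Join a managed stack, reuse a healthy external one, or start a new one."""
    if managed_stack_found:
        return "join-managed"
    # Assumption: all canonical ports must answer before an external stack counts as healthy.
    if all(port_is_listening(host, p) for p in ports):
        return "reuse-external"
    return "start-managed"
```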

Managed state lives under ~/.sourceweave-local/managed-runtime. Multiple MCP processes on the same machine share one managed stack per state directory.

[!IMPORTANT] Managed runtime removes containers only when the last active SourceWeave-managed process exits. Named volumes are preserved, so cache data survives restarts. If the original owning process dies, a later process can recover the same stack from Docker project identity and persisted runtime state.
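The "last process out" rule above behaves like reference counting. The toy model below keeps one lease file per running MCP process in the state directory and reports when the final lease is released, which is the point where containers (but not named volumes) would be removed. The lease layout and holder names are hypothetical, and a real multi-process implementation would also need file locking and recovery of leases left behind by crashed processes.

```python
import tempfile
from pathlib import Path


class LeaseDir:
    """Toy model of 'tear down only when the last process exits', one file per holder."""

    def __init__(self, state_dir: Path) -> None:
        self.leases = state_dir / "leases"
        self.leases.mkdir(parents=True, exist_ok=True)

    def acquire(self, holder: str) -> None:
        # Each running MCP process registers itself with its own lease file.
        (self.leases / holder).touch()

    def release(self, holder: str) -> bool:
        """Drop this holder's lease; return True if it was the last one."""
        (self.leases / holder).unlink(missing_ok=True)
        return not any(self.leases.iterdir())


state = LeaseDir(Path(tempfile.mkdtemp()))
state.acquire("pid-1001")
state.acquire("pid-1002")
print(state.release("pid-1001"))  # False: another process still uses the stack
print(state.release("pid-1002"))  # True: last one out removes containers, keeps volumes
```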

Explicit endpoint mode

If you already run SearXNG, Crawl4AI, and Redis or Valkey yourself, or want to point at hosted services, set explicit endpoints and the MCP entrypoint will bypass managed Docker startup:

SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp
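The switch between explicit and managed mode comes down to whether the endpoint variables are present. The document does not say whether one variable is enough or all three are required; this sketch assumes any non-empty one disables managed Docker startup, so check the released package's behavior if you set only some of them.

```python
import os
from collections.abc import Mapping

ENDPOINT_VARS = (
    "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL",
    "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL",
    "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL",
)


def uses_explicit_endpoints(env: Mapping[str, str] = os.environ) -> bool:
    """Assumption: any non-empty endpoint variable switches off managed Docker startup."""
    return any(env.get(name, "").strip() for name in ENDPOINT_VARS)


print(uses_explicit_endpoints())
```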

Direct CLI

sourceweave-search runs the tool directly. Use it when the supporting services are already available or when you provide explicit endpoints. It does not start Docker.

sourceweave-search --query "python programming" --read-first-pages 2
sourceweave-search --read-url "https://packaging.python.org/en/latest/"

[!TIP] The direct CLI also accepts --searxng-base-url, --crawl4ai-base-url, and --cache-redis-url overrides.

MCP client setup

OpenCode

Example opencode.json (the same block also works in opencode.jsonc or the global ~/.config/opencode/opencode.json):

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "local",
      "command": [
        "uvx",
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "enabled": true,
      "timeout": 300000
    }
  }
}

For a shared HTTP endpoint instead:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "remote",
      "url": "http://127.0.0.1:18000/mcp",
      "enabled": true,
      "timeout": 300000
    }
  }
}

VS Code Copilot

Example .vscode/mcp.json:

{
  "servers": {
    "sourceweave": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ]
    }
  }
}

For a shared HTTP endpoint instead:

{
  "servers": {
    "sourceweave": {
      "type": "http",
      "url": "http://127.0.0.1:18000/mcp"
    }
  }
}

Claude Code

Example .mcp.json:

{
  "mcpServers": {
    "sourceweave": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ]
    }
  }
}

Claude Code treats a .mcp.json file at the repository root as project-scoped configuration, so committing it shares this server with everyone who works in the repo.

CLI

The direct CLI is useful once the supporting services are already reachable. It gives you the same search-first workflow without the MCP wrapper.

sourceweave-search --query "react useEffect cleanup example" --read-first-page
sourceweave-search --query "HTTP overview" --domain developer.mozilla.org --read-first-page
sourceweave-search --read-url "https://packaging.python.org/en/latest/"

Container deployments

The managed local runtime applies only to host-side uvx or uv run launches. Containerized deployments still require explicit endpoint configuration.

  • Image: ghcr.io/mrnaqa/sourceweave-web-search-mcp
  • Repo-local compose entrypoint: docker compose up -d --build mcp

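Example container run, using services published on the host's canonical ports. On Linux, host.docker.internal does not resolve by default; add --add-host=host.docker.internal:host-gateway to the command.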

docker run --rm -p 8000:8000 \
  -e SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://host.docker.internal:19080/search?format=json&q=<query>" \
  -e SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://host.docker.internal:19235" \
  -e SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://host.docker.internal:16379/2" \
  ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest

OpenWebUI

This repo also ships a generated standalone OpenWebUI tool file at artifacts/sourceweave_web_search.py.

From a repo checkout, verify it is in sync with the canonical implementation:

uv run sourceweave-build-openwebui --check

Paste that artifact into OpenWebUI to deploy it as a standalone tool file. The generated file replaces the default endpoints with the repo-local compose service names, so it works with the container deployment out of the box.

Runtime configuration

Optional environment variables:

| Variable | Purpose |
| --- | --- |
| SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL | SearXNG URL template. Must contain <query>. |
| SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL | Crawl4AI base URL. |
| SOURCEWEAVE_SEARCH_CACHE_REDIS_URL | Redis or Valkey URL used for caching. |
| FASTMCP_HOST | Host for the sse or streamable-http transport. |
| FASTMCP_PORT | Port for the sse or streamable-http transport. |
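Two of these values have structure worth checking before launch: the SearXNG template needs its <query> placeholder, and the trailing /2 in the Redis URL selects logical database 2. The sketch below fills the template and reads the database index. Form-encoding the query (spaces become +) and accepting only the redis:// and rediss:// schemes are assumptions of this sketch, not documented behavior of the package.

```python
from urllib.parse import quote_plus, urlsplit

SEARXNG_TEMPLATE = "http://127.0.0.1:19080/search?format=json&q=<query>"
REDIS_URL = "redis://127.0.0.1:16379/2"


def search_url(template: str, query: str) -> str:
    """Fill the SearXNG template; templates without <query> are rejected."""
    if "<query>" not in template:
        raise ValueError("SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL must contain <query>")
    # Assumption: the query is form-encoded (spaces become '+').
    return template.replace("<query>", quote_plus(query))


def redis_db(url: str) -> int:
    """Return the logical database index from a redis:// URL (0 when no path is given)."""
    parts = urlsplit(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported cache URL scheme: {parts.scheme!r}")
    path = parts.path.lstrip("/")
    return int(path) if path else 0


print(search_url(SEARXNG_TEMPLATE, "python programming"))
# http://127.0.0.1:19080/search?format=json&q=python+programming
print(redis_db(REDIS_URL))  # 2
```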

If the endpoint variables are unset, sourceweave-search-mcp defaults to the managed local runtime.

  • The canonical host endpoints remain the preferred defaults, and they are the addresses probed when checking for a healthy external stack to reuse.
  • A SourceWeave-managed stack may use different free host ports when the canonical defaults are already occupied.
  • Multiple MCP processes on the same machine share one managed stack per local runtime state directory.
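Falling back from a canonical port to a free one is a common pattern: try to bind the preferred port, and if that fails, bind port 0 so the operating system picks an unused ephemeral port. This is a sketch of that general technique, not the project's code. Note the inherent race: another process can grab the port between this probe and Docker publishing it, so a real implementation should be ready to retry.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if host:port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def pick_host_port(preferred: int, host: str = "127.0.0.1") -> int:
    """Use the canonical port when it is free, otherwise let the OS choose one."""
    if port_is_free(preferred, host):
        return preferred
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 asks the kernel for any free ephemeral port
        return s.getsockname()[1]


for name, port in {"SearXNG": 19080, "Crawl4AI": 19235, "Redis": 16379}.items():
    print(name, pick_host_port(port))
```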

Default endpoint values:

  • SearXNG: http://127.0.0.1:19080/search?format=json&q=<query>
  • Crawl4AI: http://127.0.0.1:19235
  • Redis: redis://127.0.0.1:16379/2

Default preferred host ports for managed startup:

  • SearXNG: 19080
  • Crawl4AI: 19235
  • Redis: 16379
  • MCP: 8000 when run directly with uvx; 18000 at /mcp when using the repo's mcp compose service

Development

git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
uv sync --locked --group dev
uv run sourceweave-search-mcp

Useful checks:

uv run sourceweave-build-openwebui --check
uv run sourceweave-search-mcp --help
uv run pytest tests/test_config.py tests/test_packaging.py tests/test_tool.py tests/test_managed_runtime.py -m "not integration"

Download files

Source Distribution

sourceweave_web_search-0.5.0.tar.gz (38.2 kB)

Uploaded Source

Built Distribution

sourceweave_web_search-0.5.0-py3-none-any.whl (44.3 kB)

Uploaded Python 3

File details

Details for the file sourceweave_web_search-0.5.0.tar.gz.

File metadata

  • Download URL: sourceweave_web_search-0.5.0.tar.gz
  • Upload date:
  • Size: 38.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sourceweave_web_search-0.5.0.tar.gz
Algorithm Hash digest
SHA256 de87aab57cd71de53b3778399236568284fc566ad785e3b48d7c021c76a78997
MD5 c4ecc89cc400dddbc1cfec6c6b60ad3b
BLAKE2b-256 ac151eb977e140488ec844db083cd9cf89365dd67576412f2a0fce7e4a7cdf73

Provenance

The following attestation bundles were made for sourceweave_web_search-0.5.0.tar.gz:

Publisher: release.yml on MRNAQA/sourceweave-web-search

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sourceweave_web_search-0.5.0-py3-none-any.whl.

File hashes

Hashes for sourceweave_web_search-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1aad33ef3ceb6b640bf2d5949bd17e98d5e74e7d8f735c0bbcff3c0c812b4b35
MD5 154bff925a7c49c3573aa458b05d448a
BLAKE2b-256 fd4e125d0a581a5935572a566346ebc02a7aeef241b8cc2707972b07ab024f3f

Provenance

The following attestation bundles were made for sourceweave_web_search-0.5.0-py3-none-any.whl:

Publisher: release.yml on MRNAQA/sourceweave-web-search

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
