Searchlight

Free, open-source web search MCP server for AI coding tools.

Works with Claude Code, Cursor, Windsurf, VS Code Copilot, and any MCP-compatible AI tool. Zero API keys required — install, add one line to your MCP config, and start searching.

Features

  • Zero cost — Free search via native HTTP scraping (Bing, Baidu, Yandex, Brave, DuckDuckGo)
  • Zero config — Works out of the box with auto backend and language-aware routing
  • 7 search engines with automatic failover and reachability probing
  • 4 MCP tools: web_search, web_read, web_search_and_read, search_config
  • Quality Site Library — Auto-enhances queries with authoritative sources (Anthropic, OpenAI, MCP docs, LangChain, etc.)
  • Smart content extraction — trafilatura → readability → BeautifulSoup fallback chain with quality scoring
  • JS page rendering — Automatic Jina AI proxy fallback for JavaScript-rendered pages
  • Smart caching — Async SQLite with dynamic TTL (time-sensitive queries cache shorter)
  • Auto-learning — Automatically discovers and adds high-quality websites from your reading patterns
  • Security — Automatic API key/secret detection and redaction in queries
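
The secret redaction in the last feature can be pictured with a small sketch. The patterns and the `redact_secrets` name below are illustrative assumptions, not Searchlight's actual rules, which may cover more formats or behave differently.

```python
import re

# Hypothetical patterns for illustration; Searchlight's real rules may differ.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),  # OpenAI-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),       # AWS access key IDs
    re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub personal access tokens
]


def redact_secrets(query: str) -> tuple[str, bool]:
    """Return the query with likely secrets masked, plus a flag if any were found."""
    found = False
    for pattern in SECRET_PATTERNS:
        query, count = pattern.subn("[REDACTED]", query)
        found = found or count > 0
    return query, found
```

Running the check before a query leaves the machine means a pasted key never reaches a search engine or the cache.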

Installation

Option 1: PyPI (Recommended)

pip install searchlight-mcp

Or with uv:

uv pip install searchlight-mcp

Option 2: Install from GitHub

pip install git+https://github.com/McKenzieIT/smart-web-search.git

Quick Start — One-Click MCP Setup

Claude Code

Add to ~/.claude.json or project .mcp.json:

{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}

Or use the CLI one-liner:

claude mcp add searchlight -- python -m searchlight

Cursor

Add to .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}

Or global: ~/.cursor/mcp.json

VS Code Copilot

Add to .vscode/mcp.json:

{
  "servers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}

Generic MCP Client

Any MCP-compatible tool can use this configuration:

{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"]
    }
  }
}

Restart your AI tool after adding the config. Searchlight's 4 tools are immediately available.
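
If you prefer to script the setup, the sketch below merges the Searchlight entry into an existing config without touching other servers. The helper names `add_searchlight` and `update_config_file` are hypothetical, and the `root_key` argument covers the VS Code variant, which uses `servers` instead of `mcpServers`.

```python
import copy
import json
from pathlib import Path

SEARCHLIGHT_ENTRY = {"command": "python", "args": ["-m", "searchlight"]}


def add_searchlight(config: dict, root_key: str = "mcpServers") -> dict:
    """Return a copy of an MCP config with the searchlight server added."""
    merged = copy.deepcopy(config)
    servers = merged.setdefault(root_key, {})
    servers["searchlight"] = copy.deepcopy(SEARCHLIGHT_ENTRY)
    return merged


def update_config_file(path: Path, root_key: str = "mcpServers") -> None:
    """Read a JSON config (or start empty), add searchlight, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(add_searchlight(config, root_key), indent=2),
        encoding="utf-8",
    )
```

For example, `update_config_file(Path(".cursor/mcp.json"))` would cover the Cursor project setup shown above.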

Agent Prompts — Teach Your AI to Search Well

After installing searchlight, paste the appropriate prompt into your agent's system prompt or custom instructions so it knows when and how to search.

Claude Code

Add to your CLAUDE.md or project .claude/instructions.md:

## Web Search Guidelines
- Use `web_search` to find documentation, error solutions, current events, and comparisons.
- Use `web_read` to extract detailed content from a specific URL.
- Use `web_search_and_read` for deep research that requires reading multiple pages.
- For quick lookups, `web_search` alone is sufficient — no need to read every result.
- Use `time_range="month"` or `"week"` for current events or recent changes.
- Use `mode="preview"` to check if a page is relevant before reading the full content.
- The Quality Site Library automatically prioritizes authoritative sources for AI/developer topics.

Cursor

Add to .cursorrules or Cursor's custom instructions:

## Web Search
When you need current information, documentation, or solutions not in your training data:
- Use `web_search(query)` to find relevant results quickly.
- Use `web_read(url)` to read a specific page's content.
- Use `web_search_and_read(query)` for comprehensive research.
- Prefer `web_search` for quick answers; use `web_search_and_read` for in-depth analysis.
- Use `time_range="month"` for recent information.

VS Code Copilot

Add to .github/copilot-instructions.md:

## Web Search
- `web_search(query)`: Find information on the web. Returns titles, URLs, snippets.
- `web_read(url)`: Read a web page's content as Markdown.
- `web_search_and_read(query)`: Search and read top results in one call.
- Use web_search as the first choice. Use web_search_and_read for research tasks.
- Use time_range parameter for current events.

Windsurf

Add to .windsurfrules:

## Web Search
- Use web_search to find docs, error solutions, and current info.
- Use web_read to extract content from specific URLs.
- Use web_search_and_read for deep research.
- The searchlight MCP server auto-boosts results from authoritative AI/developer sources.

MCP Tools

web_search

Search the web using multiple engines. Returns Markdown-formatted results.

web_search(query="Python asyncio tutorial", max_results=10, time_range="month")

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | Search terms (max 500 chars) |
| max_results | int | 10 | Number of results (1-20) |
| language | string | null | Language code (zh, en, ja, etc.) |
| time_range | string | null | "day", "week", "month", "year" |
| backend | string | null | Override backend for this search |
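
The limits in the table can be enforced with a small validation step. This is a sketch of how such checks might look; `SearchParams` and `validate_search` are hypothetical names, and whether Searchlight clamps or rejects out-of-range values is an assumption here.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TIME_RANGES = {"day", "week", "month", "year"}


@dataclass(frozen=True)
class SearchParams:
    query: str
    max_results: int = 10
    language: Optional[str] = None
    time_range: Optional[str] = None
    backend: Optional[str] = None


def validate_search(params: SearchParams) -> SearchParams:
    """Reject invalid input and clamp max_results into the documented 1-20 range."""
    query = params.query.strip()
    if not query:
        raise ValueError("query is required")
    if len(query) > 500:
        raise ValueError("query exceeds 500 characters")
    if params.time_range is not None and params.time_range not in VALID_TIME_RANGES:
        raise ValueError(f"time_range must be one of {sorted(VALID_TIME_RANGES)}")
    clamped = min(max(params.max_results, 1), 20)
    return SearchParams(query, clamped, params.language, params.time_range, params.backend)
```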

web_read

Read and extract clean Markdown content from a web page.

web_read(url="https://docs.python.org/3/library/asyncio.html", max_length=10000, mode="full")

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | required | URL to read |
| max_length | int | 10000 | Maximum characters to return |
| mode | string | "full" | "preview" (headings only) or "full" |
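
To make the difference between the two modes concrete, here is a sketch of what a headings-only preview of extracted Markdown could look like. The `preview_headings` helper is hypothetical and only illustrates the idea; Searchlight's own preview output may be formatted differently.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # Markdown code fence marker


def preview_headings(markdown: str) -> list[str]:
    """Return the ATX headings of a Markdown document, skipping fenced code."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if not in_fence and HEADING.match(line):
            headings.append(line.strip())
    return headings
```

A preview like this lets an agent decide whether a page is relevant before spending tokens on the full text.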

web_search_and_read

Search and read the top results in one call. Best for deep research.

web_search_and_read(query="FastAPI vs Flask comparison", max_read=2)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | Search query |
| max_results | int | 5 | Max search results |
| max_read | int | 2 | Pages to read (auto-extends on failure) |
| max_length | int | 10000 | Max chars per page |
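
The "auto-extends on failure" behaviour of `max_read` can be sketched as a loop that keeps moving down the result list until enough pages have been read. The function below is an illustration under that assumption; `read_top_results` and the injected `read_page` callable are hypothetical stand-ins for the real search and read steps.

```python
from typing import Callable, Iterable


def read_top_results(
    urls: Iterable[str],
    read_page: Callable[[str], str],
    max_read: int = 2,
) -> dict[str, str]:
    """Read pages in ranking order until max_read succeed, skipping failures."""
    pages: dict[str, str] = {}
    for url in urls:
        if len(pages) >= max_read:
            break
        try:
            pages[url] = read_page(url)
        except Exception:
            # A failed read does not count toward max_read; try the next result.
            continue
    return pages
```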

search_config

View and manage searchlight configuration.

search_config(action="status")                          # Show config, cache, QSL stats
search_config(action="health_check")                    # Test engine connectivity
search_config(action="clear_cache")                     # Clear all cached results
search_config(action="set_backend", backend="bing")     # Switch default backend

Search Backends

All backends use direct HTTP scraping — no API keys needed.

| Backend | Description | Language |
| --- | --- | --- |
| auto | Best available (default) | Auto-detect |
| bing | Bing International | English |
| bing_cn | Bing China | Chinese |
| baidu | Baidu Search | Chinese |
| yandex | Yandex Search | English |
| brave | Brave Search | English |
| duckduckgo | DuckDuckGo | English |

The auto backend detects Chinese characters in the query and routes such queries to Baidu or Bing CN; English queries go to Brave, DuckDuckGo, or Bing. Engines are probed for reachability, and unreachable engines are skipped.
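
The routing rule described above can be sketched as follows. The engine order and the `choose_backends` helper are assumptions made for illustration; only the backend names come from the table.

```python
import re

# CJK Unified Ideographs: a simple heuristic for "contains Chinese characters".
# Japanese kanji fall in the same range, so this is deliberately rough.
CHINESE = re.compile(r"[\u4e00-\u9fff]")

CHINESE_ORDER = ["baidu", "bing_cn"]
ENGLISH_ORDER = ["brave", "duckduckgo", "bing"]


def choose_backends(query: str, reachable: set[str]) -> list[str]:
    """Return the engines to try, in order, skipping unreachable ones."""
    preferred = CHINESE_ORDER if CHINESE.search(query) else ENGLISH_ORDER
    return [name for name in preferred if name in reachable]
```

The `reachable` set stands in for the result of the reachability probe; an empty return value would mean no suitable engine is currently available.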

Quality Site Library

Searchlight includes a built-in Quality Site Library that enhances search results for AI/developer topics:

  • Query Enhancement — Automatically adds authoritative keywords when searching for LLM, MCP, agent, Python topics
  • Result Boosting — Moves results from quality domains (official docs, research papers) higher in rankings
  • Auto-Learning — Tracks websites you read and automatically adds high-quality ones to the library

Built-in categories with curated sources:

| Category | Quality Sources |
| --- | --- |
| LLM | OpenAI Platform, Anthropic Docs, Google AI, Hugging Face |
| MCP | modelcontextprotocol.io, Anthropic MCP Docs |
| Agents | LangChain, CrewAI, LlamaIndex |
| Anthropic | anthropic.com, docs.anthropic.com |
| Google AI | ai.google.dev, Google Cloud |
| Python | docs.python.org, PyPI, uv/Real Python |

The library is stored at ~/.searchlight/sites.json and can be manually edited.
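
Result boosting of this kind is commonly implemented as a stable sort keyed on whether a result's domain is in the library. The sketch below assumes that approach; `QUALITY_DOMAINS` is a small sample drawn from the categories above, not the contents of `sites.json`, and `is_quality` and `boost_results` are hypothetical names.

```python
from urllib.parse import urlparse

# Sample domains from the categories above; the real library is larger.
QUALITY_DOMAINS = {"docs.python.org", "modelcontextprotocol.io", "docs.anthropic.com"}


def is_quality(url: str) -> bool:
    """True if the URL's host is a quality domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in QUALITY_DOMAINS)


def boost_results(urls: list[str]) -> list[str]:
    """Move quality-domain results first, keeping the original order otherwise."""
    # sorted() is stable, so ties keep the search engine's ranking.
    return sorted(urls, key=lambda u: not is_quality(u))
```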

Content Extraction Pipeline

URL → HTTP Fetch (SSL progressive degradation)
    → trafilatura (Markdown mode)
    → readability + markdownify (fallback)
    → BeautifulSoup cleanup (fallback)
    → Quality Report (5-dimension assessment)
    → Section-aware truncation
    → Cached in SQLite

The quality report scores five dimensions: text density, structure quality, freedom from noise, completeness, and HTML cleanliness. JavaScript-rendered pages automatically fall back to the Jina AI proxy for rendering.
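
The fallback chain above boils down to trying extractors in order and accepting the first result that is good enough. The sketch below shows that control flow with injected callables instead of the real libraries. `extract_with_fallback`, the `min_chars` threshold, and the length-based acceptance test are assumptions; the actual pipeline relies on its five-dimension quality report.

```python
from typing import Callable, Optional, Sequence

Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(
    html: str,
    extractors: Sequence[tuple[str, Extractor]],
    min_chars: int = 200,
) -> tuple[Optional[str], Optional[str]]:
    """Return (extractor_name, text) from the first extractor with usable output."""
    for name, extractor in extractors:
        try:
            text = extractor(html)
        except Exception:
            text = None  # treat a crash like an empty result and fall through
        if text and len(text.strip()) >= min_chars:
            return name, text
    return None, None
```

Passing the extractors as a list keeps the order explicit and makes it easy to test the chain without network access or third-party packages.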

Configuration

Set these through the env field of your MCP config or as shell environment variables:

SEARCHLIGHT_BACKEND=auto           # Default backend
SEARCHLIGHT_CACHE_TTL=24           # Cache TTL in hours
SEARCHLIGHT_CACHE_MAX_SIZE=100     # Max cache size in MB
SEARCHLIGHT_MAX_CONTENT=10000      # Max content length in chars
SEARCHLIGHT_TIMEOUT=15             # HTTP timeout in seconds
SEARCHLIGHT_VERBOSE=true           # Enable debug logging

Example with custom backend:

{
  "mcpServers": {
    "searchlight": {
      "command": "python",
      "args": ["-m", "searchlight"],
      "env": {
        "SEARCHLIGHT_BACKEND": "bing"
      }
    }
  }
}
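
For reference, the variables above can be read into a typed settings object like this. It is a sketch, not the contents of `config.py`: the `Settings` class and `load_settings` function are hypothetical, the defaults mirror the example values listed above (assumed to be the defaults), and the `False` default for verbose logging is a guess.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    backend: str = "auto"
    cache_ttl_hours: int = 24
    cache_max_size_mb: int = 100
    max_content_chars: int = 10000
    timeout_seconds: int = 15
    verbose: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from SEARCHLIGHT_* variables, falling back to defaults."""
    d = Settings()
    truthy = {"1", "true", "yes"}
    return Settings(
        backend=env.get("SEARCHLIGHT_BACKEND", d.backend),
        cache_ttl_hours=int(env.get("SEARCHLIGHT_CACHE_TTL", d.cache_ttl_hours)),
        cache_max_size_mb=int(env.get("SEARCHLIGHT_CACHE_MAX_SIZE", d.cache_max_size_mb)),
        max_content_chars=int(env.get("SEARCHLIGHT_MAX_CONTENT", d.max_content_chars)),
        timeout_seconds=int(env.get("SEARCHLIGHT_TIMEOUT", d.timeout_seconds)),
        verbose=env.get("SEARCHLIGHT_VERBOSE", "").strip().lower() in truthy,
    )
```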

Architecture

searchlight/
├── server.py              # MCP server + 4 tools
├── sites_library.py       # Quality Site Library (QSL)
├── config.py              # Environment-based config
├── search/
│   ├── base.py            # SearchBackend ABC + SearchResult
│   └── native.py          # Native HTTP search (7 engines)
├── reader/
│   ├── fetcher.py         # HTTP fetching + SSL fallback + Jina proxy
│   └── extractor.py       # Content extraction + QualityReport
├── processing/
│   ├── filter.py          # Dedup + spam filtering
│   └── truncator.py       # Section-aware Markdown truncation
├── cache/
│   └── sqlite.py          # Async SQLite with smart TTL
├── security/
│   └── sanitizer.py       # Secret detection in queries
└── utils/
    ├── logger.py          # Logging setup
    └── health.py          # Backend health checks
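
The "smart TTL" noted for `cache/sqlite.py` refers to caching time-sensitive queries for a shorter period. One way this could work is a keyword heuristic, sketched below. The keyword list, the one-hour short TTL, and the `ttl_for_query` name are all assumptions; only the 24-hour default comes from the configuration section.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical markers of a time-sensitive query.
TIME_SENSITIVE_WORDS = {"today", "latest", "news", "now", "price", "release"}


def ttl_for_query(
    query: str,
    time_range: Optional[str] = None,
    default_hours: int = 24,
) -> timedelta:
    """Pick a shorter cache lifetime for queries that look time-sensitive."""
    words = set(query.lower().split())
    if time_range in {"day", "week"} or words & TIME_SENSITIVE_WORDS:
        return timedelta(hours=1)
    return timedelta(hours=default_hours)
```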

Publishing

pip install build twine
python -m build
twine upload dist/*

Development

pip install -e ".[dev]"
pytest tests/ -v

Run with verbose logging:

SEARCHLIGHT_VERBOSE=true python -m searchlight

License

MIT
