Python port of open-webSearch.

WebProbe (Python port of open-webSearch)

This project reimplements the core search and fetch capabilities of Aas-ee/open-webSearch in Python. It exposes a CLI, an importable package, and a small HTTP server that can:

  • Search multiple engines (Bing, DuckDuckGo, Baidu, Brave, Exa, Startpage, CSDN, Juejin, Linux.do)
  • Fetch full-length articles from CSDN, Linux.do, and Juejin
  • Download GitHub README.* files without hitting the API

Installation

python -m pip install --upgrade pip
pip install -e .

Package usage

Install the project locally and consume the webprobe package directly:

from webprobe import WebProbeServer, search, fetch_csdn

print(search("visible web", limit=5))
print(fetch_csdn("https://blog.csdn.net/example/article/details/xxxxx"))

# Start the bundled HTTP server (serves /search and /fetch?kind=<csdn|linuxdo|juejin|github>)
server = WebProbeServer(host="0.0.0.0", port=3210)
try:
    server.serve_forever()
finally:
    server.shutdown()

The HTTP server exposes /search?query=...&limit=...&engines=... and /fetch?kind=<csdn|linuxdo|juejin|github>&url=....
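As a rough illustration of the two endpoints above, the sketch below only builds request URLs with the standard library; it does not contact the server. The helper names (build_search_url, build_fetch_url), the base URL, and the comma-separated encoding of engines over HTTP are assumptions for illustration, not part of the webprobe package. The response format is not documented here, so the sketch stops at URL construction.

```python
from urllib.parse import urlencode

# Assumption: the server from the example above is running locally on port 3210.
BASE_URL = "http://127.0.0.1:3210"

# The kinds listed for the /fetch endpoint.
FETCH_KINDS = ("csdn", "linuxdo", "juejin", "github")


def build_search_url(query, limit=None, engines=None, base_url=BASE_URL):
    """Return a /search URL; limit and engines are left out when not given."""
    params = {"query": query}
    if limit is not None:
        params["limit"] = limit
    if engines:
        # Assumption: engines are comma-separated, as in the CLI flag.
        params["engines"] = ",".join(engines)
    return f"{base_url}/search?{urlencode(params)}"


def build_fetch_url(kind, url, base_url=BASE_URL):
    """Return a /fetch URL for one of the documented kinds."""
    if kind not in FETCH_KINDS:
        raise ValueError(f"unsupported kind: {kind!r}")
    return f"{base_url}/fetch?{urlencode({'kind': kind, 'url': url})}"


if __name__ == "__main__":
    print(build_search_url("open websearch", limit=12, engines=["bing", "duckduckgo"]))
    print(build_fetch_url("github", "https://github.com/Aas-ee/open-webSearch"))
```

The resulting URLs can be passed to urllib.request.urlopen or any HTTP client once the server is running.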

CLI

Run python main.py --help to see available commands. Key subcommands:

search

python main.py search "open websearch" --limit 12 --engines bing,duckduckgo

Article fetchers

Each fetcher prints JSON or plain text:

  • python main.py fetch-csdn <url>
  • python main.py fetch-linuxdo <url>
  • python main.py fetch-juejin <url>
  • python main.py fetch-github <repo-url>
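When the CLI is driven from another Python script, the subcommands above can be assembled as argument lists and run with subprocess. This is a minimal sketch: the helper names are hypothetical, it assumes the working directory is the repository root where main.py lives, and it assumes the CLI exits with a non-zero status on failure, which the README does not state.

```python
import subprocess
import sys

# Maps a short kind name to the subcommand listed above.
FETCH_SUBCOMMANDS = {
    "csdn": "fetch-csdn",
    "linuxdo": "fetch-linuxdo",
    "juejin": "fetch-juejin",
    "github": "fetch-github",
}


def search_argv(query, limit=None, engines=None):
    """Build the argument list for `python main.py search ...`."""
    argv = [sys.executable, "main.py", "search", query]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if engines:
        argv += ["--engines", ",".join(engines)]
    return argv


def fetch_argv(kind, url):
    """Build the argument list for one of the fetch-* subcommands."""
    return [sys.executable, "main.py", FETCH_SUBCOMMANDS[kind], url]


def run_cli(argv):
    """Run the command and return its stdout as text.

    Raises subprocess.CalledProcessError if the process exits non-zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces.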

Configuration

Environment variables mirror the TypeScript version:

| Variable | Default | Description |
| --- | --- | --- |
| DEFAULT_SEARCH_ENGINE | bing | Default search engine |
| ALLOWED_SEARCH_ENGINES | (empty) | Comma-separated whitelist |
| USE_PROXY / PROXY_URL | false / http://127.0.0.1:7890 | HTTP proxy for requests |

Set USE_PROXY=true to route all HTTP traffic through PROXY_URL.
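One way these variables could be read is sketched below. It is not the package's actual loader: the Settings class, load_settings, and proxies_for are hypothetical names, and the parsing rules (only the string "true" enables the proxy, engine names are trimmed and lowercased) are assumptions. The defaults are the ones listed above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    default_engine: str
    allowed_engines: tuple  # empty tuple means no whitelist is applied
    use_proxy: bool
    proxy_url: str


def load_settings(env=None):
    """Read the documented variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    raw_allowed = env.get("ALLOWED_SEARCH_ENGINES", "")
    # Assumption: names are comma-separated; blanks are ignored.
    allowed = tuple(
        name.strip().lower() for name in raw_allowed.split(",") if name.strip()
    )
    return Settings(
        default_engine=env.get("DEFAULT_SEARCH_ENGINE", "bing"),
        allowed_engines=allowed,
        # Assumption: only "true" (case-insensitive) turns the proxy on.
        use_proxy=env.get("USE_PROXY", "false").strip().lower() == "true",
        proxy_url=env.get("PROXY_URL", "http://127.0.0.1:7890"),
    )


def proxies_for(settings):
    """Return a requests-style proxies mapping, or None when the proxy is off."""
    if not settings.use_proxy:
        return None
    return {"http": settings.proxy_url, "https": settings.proxy_url}
```

Accepting the environment as a parameter keeps the loader easy to test without touching the real process environment.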

Architecture

  • src/engine/search_service.py orchestrates multi-engine searches with distribution logic.
  • src/engines/* implement individual search/fetch adapters for each provider.
  • src/utils/ contains HTTP helpers, Playwright bridges for future browser fallbacks, and shared fetch logic for CSDN articles.
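The README does not spell out the distribution logic used by the search service. One plausible reading is that the requested limit is split as evenly as possible across the selected engines; the sketch below shows that idea only. Both function names are hypothetical and this is not taken from search_service.py.

```python
def distribute_limit(total, engine_count):
    """Split total into engine_count near-equal integer quotas.

    Earlier engines receive the remainder, one extra result each.
    """
    if engine_count <= 0:
        return []
    base, remainder = divmod(total, engine_count)
    return [base + 1 if i < remainder else base for i in range(engine_count)]


def plan_search(limit, engines):
    """Map each engine to its quota, dropping engines that would get zero."""
    quotas = distribute_limit(limit, len(engines))
    return {engine: quota for engine, quota in zip(engines, quotas) if quota > 0}


if __name__ == "__main__":
    print(plan_search(12, ["bing", "duckduckgo"]))
    print(plan_search(10, ["bing", "duckduckgo", "brave"]))
```

Under this scheme the quotas always sum to the requested limit, so the merged result list never exceeds it.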

Next steps

  1. Wire this CLI into an MCP server similar to the TypeScript runtime.
  2. Add Playwright-backed fallbacks for blocked search pages and protected articles.
  3. Extend fetchers with generic web extraction (fetch_web_content) as in the original repo.
