Python port of open-webSearch.
WebProbe (Python port of open-webSearch)
This project replicates the core search and fetch capabilities of Aas-ee/open-webSearch using Python. It exposes a CLI that can:
- Search multiple engines (Bing, DuckDuckGo, Baidu, Brave, Exa, Startpage, CSDN, Juejin, Linux.do)
- Fetch full-length articles from CSDN, Linux.do, and Juejin
- Download GitHub README.* files without hitting the API
Installation
python -m pip install --upgrade pip
pip install -e .
Package usage
Install the project locally and consume the webprobe package directly:
from webprobe import WebProbeServer, search, fetch_csdn
print(search("visible web", limit=5))
print(fetch_csdn("https://blog.csdn.net/example/article/details/xxxxx"))
# Start the bundled HTTP server (serves /search and /fetch?kind=csdn)
server = WebProbeServer(host="0.0.0.0", port=3210)
try:
    server.serve_forever()
finally:
    server.shutdown()
The HTTP server exposes /search?query=...&limit=...&engines=... and /fetch?kind=<csdn|linuxdo|juejin|github>&url=....
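As a rough illustration of the two endpoints, the sketch below builds request URLs with the standard library only. The helper names `build_search_url` and `build_fetch_url` are hypothetical and not part of webprobe, and passing `engines` as a comma-separated string is an assumption borrowed from the CLI's `--engines` flag.

```python
from urllib.parse import urlencode

# Fetch kinds listed for the /fetch endpoint.
FETCH_KINDS = {"csdn", "linuxdo", "juejin", "github"}


def build_search_url(base, query, limit=None, engines=None):
    """Build a /search URL (hypothetical helper, not part of webprobe)."""
    params = {"query": query}
    if limit is not None:
        params["limit"] = limit
    if engines:
        # Assumption: the server accepts a comma-separated engine list,
        # like the CLI's --engines flag.
        params["engines"] = ",".join(engines)
    return f"{base}/search?{urlencode(params)}"


def build_fetch_url(base, kind, url):
    """Build a /fetch URL for one of the documented kinds."""
    if kind not in FETCH_KINDS:
        raise ValueError(f"unsupported kind: {kind!r}")
    return f"{base}/fetch?{urlencode({'kind': kind, 'url': url})}"


if __name__ == "__main__":
    base = "http://127.0.0.1:3210"
    print(build_search_url(base, "open websearch", limit=5, engines=["bing", "duckduckgo"]))
    print(build_fetch_url(base, "github", "https://github.com/Aas-ee/open-webSearch"))
```

The resulting URLs can be passed to any HTTP client, such as `urllib.request.urlopen`, once the server is running.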
CLI
Run python main.py --help to see available commands. Key subcommands:
search
python main.py search "open websearch" --limit 12 --engines bing,duckduckgo
Article fetchers
Each fetcher prints JSON or plain text:
python main.py fetch-csdn <url>
python main.py fetch-linuxdo <url>
python main.py fetch-juejin <url>
python main.py fetch-github <repo-url>
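Because a fetcher may print either JSON or plain text, a script that captures its output could try JSON first and fall back to the raw text. The helper below is a minimal sketch; `parse_fetcher_output` is a hypothetical name, and the exact output shape of each fetcher is not specified here.

```python
import json


def parse_fetcher_output(text):
    """Return parsed JSON if the output is valid JSON, else the stripped text.

    Hypothetical helper: the fetchers' exact output shape is an assumption.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text.strip()


if __name__ == "__main__":
    print(parse_fetcher_output('{"title": "Example"}'))
    print(parse_fetcher_output("Plain article body\n"))
```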
Configuration
Environment variables mirror the TypeScript version:
| Variable | Default | Description |
|---|---|---|
| DEFAULT_SEARCH_ENGINE | bing | Default search engine |
| ALLOWED_SEARCH_ENGINES | (empty) | Comma-separated whitelist |
| USE_PROXY / PROXY_URL | false / http://127.0.0.1:7890 | HTTP proxy for requests |
Set USE_PROXY=true to route all HTTP traffic through PROXY_URL.
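The sketch below shows one way these variables could be read, using the defaults from the table. `load_settings` is a hypothetical helper, not the package's actual loader. Treating an empty whitelist as "no restriction" and comparing `USE_PROXY` case-insensitively are both assumptions.

```python
import os


def load_settings(env=None):
    """Read the documented variables from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    # Split the comma-separated whitelist, dropping blanks.
    raw_allowed = env.get("ALLOWED_SEARCH_ENGINES", "")
    allowed = [name.strip().lower() for name in raw_allowed.split(",") if name.strip()]

    # Assumption: only the literal value "true" (any case) enables the proxy.
    use_proxy = env.get("USE_PROXY", "false").strip().lower() == "true"
    proxy_url = env.get("PROXY_URL", "http://127.0.0.1:7890")

    return {
        "default_engine": env.get("DEFAULT_SEARCH_ENGINE", "bing"),
        "allowed_engines": allowed,  # empty list: assumed to mean no restriction
        "proxies": {"http": proxy_url, "https": proxy_url} if use_proxy else {},
    }


if __name__ == "__main__":
    print(load_settings({"USE_PROXY": "true", "ALLOWED_SEARCH_ENGINES": "bing, brave"}))
```

The `proxies` mapping follows the scheme-to-URL shape that common HTTP clients accept; which client the project uses internally is not stated here.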
Architecture
- src/engine/search_service.py orchestrates multi-engine searches with distribution logic.
- src/engines/* implement individual search/fetch adapters for each provider.
- src/utils/ contains HTTP helpers, Playwright bridges for future browser fallbacks, and shared fetch logic for CSDN articles.
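The distribution logic itself is not described above. As one plausible scheme, the sketch below splits a total result limit evenly across the requested engines and gives any remainder to the engines listed first. `distribute_limit` is a hypothetical function and may differ from what search_service.py actually does.

```python
def distribute_limit(limit, engines):
    """Split `limit` results across engines as evenly as possible.

    Hypothetical sketch: earlier engines receive the remainder, one each.
    """
    # Drop duplicate engine names while keeping their original order.
    unique = list(dict.fromkeys(engines))
    if not unique:
        raise ValueError("at least one engine is required")
    if limit < 0:
        raise ValueError("limit must be non-negative")

    base, extra = divmod(limit, len(unique))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(unique)}


if __name__ == "__main__":
    print(distribute_limit(12, ["bing", "duckduckgo"]))
    print(distribute_limit(5, ["bing", "duckduckgo", "brave"]))
```

With this scheme the per-engine quotas always sum to the requested limit, so the orchestrator would not need to trim results afterwards unless an engine returns fewer than its share.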
Next steps
- Wire this CLI into an MCP server similar to the TypeScript runtime.
- Add Playwright-backed fallbacks for blocked search pages and protected articles.
- Extend fetchers with generic web extraction (fetch_web_content) as in the original repo.