Search the web, rank results, fetch any page content.

SkySearch

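A search service built on Bing search and DrissionPage dynamic rendering. Its main advantage is that it can scrape content loaded dynamically by JavaScript, such as Zhihu and single-page applications (SPAs).

DrissionPage: a lightweight Python web automation library, lighter than Selenium, that supports dynamic JS rendering.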

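Quick start

Installation

pip install skysearch

Basic usage

import skysearch

# Search (fast, recommended); the query means "Zhejiang University admission score line"
results = skysearch.search_bare("浙江大学 录取 分数线", num=10)

# Fetch a page (only when needed)
info = skysearch.fetch("https://example.com", mode='info')

Command line: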

skysearch "浙江大学 录取 分数线" --bare -n 10
skysearch --url https://example.com --mode info

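Core design philosophy

Why recommend search_bare + fetch instead of search?

	`search_bare` → `fetch`	`search`
Speed	Fast; fetches on demand	Slow; fetches every result page
Flexibility	The agent judges the URLs itself and fetches only what it needs	Fetches everything automatically; less flexible
BM25 ranking	None (usually not needed)	Yes (but not worth the loss of speed)

mode=info vs mode=text?

	`mode='info'`	`mode='text'`
Returned content	title, text, links, meta	Cleaned body text
Data completeness	✅ Complete structured data	❌ readability may trim the body text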
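The search_bare then fetch workflow recommended above can be sketched as follows. This is a minimal sketch, not part of SkySearch: `search_then_fetch`, `want`, `search_fn`, `fetch_fn`, and `max_fetch` are hypothetical names, and the search and fetch callables are passed in so that the selection logic can be run without a browser. It assumes each search result is a dict with `title`, `url`, and `snippet` keys, as shown in the Python API section.

```python
from typing import Callable


def search_then_fetch(
    query: str,
    search_fn: Callable[..., list],
    fetch_fn: Callable[..., dict],
    want: Callable[[dict], bool],
    num: int = 10,
    max_fetch: int = 3,
) -> list:
    """Search first, then fetch only the results that `want` accepts."""
    results = search_fn(query, num=num)
    pages = []
    for result in results:
        if len(pages) >= max_fetch:
            break  # stop early: every fetch may render a page in a browser
        if want(result):
            pages.append(fetch_fn(result["url"], mode="info"))
    return pages


# Stand-ins for skysearch.search_bare and skysearch.fetch (hypothetical data).
def fake_search(query, num=10):
    return [
        {"title": "Admissions", "url": "https://a.example/admit", "snippet": "scores"},
        {"title": "Sports", "url": "https://b.example/sports", "snippet": "football"},
        {"title": "Scores 2024", "url": "https://c.example/2024", "snippet": "admission scores"},
    ][:num]


def fake_fetch(url, mode="info"):
    return {"url": url, "title": "stub", "text": "", "links": [], "meta": {}}


pages = search_then_fetch(
    "admission scores",
    fake_search,
    fake_fetch,
    want=lambda r: "scores" in r["snippet"],
)
print([p["url"] for p in pages])
```

With the real library, passing `skysearch.search_bare` and `skysearch.fetch` in place of the stand-ins should work, given the signatures listed in the API reference; that substitution is untested here.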

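Command-line usage

search_bare search

# Search only: no page fetching, no ranking
skysearch "keyword" --bare

# Set the number of results
skysearch "Python tutorial" --bare -n 20

# Interactive input
skysearch --bare

URL fetching

# Structured info (recommended)
skysearch --url https://example.com --mode info

# Raw HTML
skysearch --url https://example.com --mode raw

# Plain text
skysearch --url https://example.com --mode text

# Debugging: show the browser window
skysearch --url https://example.com --mode info --headed

# Keep the browser open after fetching
skysearch --url https://example.com --mode info --keep

Search + BM25 ranking (not recommended, slow)

# Fetches every result page in order to rank them, which takes a while
skysearch "keyword" -n 10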
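For readers unfamiliar with BM25, the ranking step can be illustrated with a small, self-contained scorer. This is a sketch of the standard Okapi BM25 formula with a smoothed, non-negative IDF, and it is not the code in `ranker.py`. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, and the whitespace tokenizer are all assumptions. Whitespace tokenization in particular would not segment unspaced Chinese text.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "zhejiang university admission scores by province",
    "football results and league tables",
    "admission scores admission requirements",
]
scores = bm25_scores("admission scores", docs)
ranked = sorted(zip(scores, docs), reverse=True)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

The third document ranks first because it is shorter and repeats a query term, and the second scores zero because it shares no terms with the query. Scoring needs the text of every candidate page, which is why the ranked search has to fetch all result pages before it can return anything.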

Python API

search_bare: plain search

import skysearch

results = skysearch.search_bare("浙江大学 录取 分数线", num=10)
# Returns:
# [
#   {'title': '...', 'url': 'https://...', 'snippet': '...'},
#   ...
# ]
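Because the results are plain dicts, they are easy to post-process before deciding what to fetch. The sketch below keeps only the first result per hostname. `dedupe_by_host` and the sample data are hypothetical, and the code assumes only the `title`/`url`/`snippet` shape shown above.

```python
from urllib.parse import urlparse


def dedupe_by_host(results):
    """Keep the first result for each hostname, preserving order."""
    seen = set()
    unique = []
    for result in results:
        host = urlparse(result["url"]).netloc.lower()
        if host in seen:
            continue
        seen.add(host)
        unique.append(result)
    return unique


# Hypothetical results in the documented shape.
sample = [
    {"title": "A1", "url": "https://www.example.com/a", "snippet": ""},
    {"title": "B", "url": "https://other.example/b", "snippet": ""},
    {"title": "A2", "url": "https://WWW.example.com/c", "snippet": ""},
]
unique = dedupe_by_host(sample)
print([r["title"] for r in unique])
```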

fetch: URL fetching

# Structured info (recommended)
info = skysearch.fetch("https://example.com", mode='info')
# Returns:
# {
#   'url': 'https://example.com',
#   'title': 'page title',
#   'text': 'full body text',
#   'links': [{'text': 'link text', 'href': 'https://...'}, ...],
#   'meta': {'description': '...', 'keywords': '...'}
# }

# Raw HTML
raw = skysearch.fetch("https://example.com", mode='raw')

# Plain text
text = skysearch.fetch("https://example.com", mode='text')

# Debugging: show the browser window
info = skysearch.fetch("https://example.com", mode='info', headless=False)
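One common use of the `info` structure is choosing which links to follow next. The sketch below collects the links that stay on the fetched page's host. `same_site_links` and the sample dict are hypothetical. Whether SkySearch returns relative `href` values is not stated in this README, so the code resolves them defensively with `urljoin`.

```python
from urllib.parse import urljoin, urlparse


def same_site_links(info):
    """Return absolute URLs from info['links'] that stay on the page's host."""
    base = info["url"]
    host = urlparse(base).netloc
    urls = []
    for link in info.get("links", []):
        href = link.get("href")
        if not href:
            continue
        # urljoin resolves relative hrefs and leaves absolute ones unchanged.
        absolute = urljoin(base, href)
        if urlparse(absolute).netloc == host and absolute not in urls:
            urls.append(absolute)
    return urls


# Hypothetical data in the documented `info` shape.
info = {
    "url": "https://example.com/docs/",
    "title": "Docs",
    "text": "",
    "links": [
        {"text": "About", "href": "/about"},
        {"text": "Elsewhere", "href": "https://other.example/x"},
        {"text": "Guide", "href": "guide.html"},
        {"text": "Empty", "href": ""},
        {"text": "About again", "href": "/about"},
    ],
    "meta": {},
}
internal = same_site_links(info)
print(internal)
```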

search_and_fetch: search and fetch in one call

# Returns a list; each item holds a search result plus its fetched content
results = skysearch.search_and_fetch("keyword", num=5, mode='info')
# [{
#   'title': '...', 'url': '...', 'score': 12.5,
#   'content': {...}  # the full structure returned by fetch
# }, ...]
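The `score` field makes it straightforward to keep only the best pages. The sketch below filters and sorts a result list of the shape shown above. `top_pages`, `min_score`, and the sample data are hypothetical, and the README does not say whether the library already returns results in score order, so the code sorts explicitly.

```python
def top_pages(results, k=3, min_score=0.0):
    """Return (url, score, text) tuples for the k best-scoring results."""
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    kept.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return [
        (r["url"], r.get("score", 0.0), (r.get("content") or {}).get("text", ""))
        for r in kept[:k]
    ]


# Hypothetical results in the documented shape (mode='info').
results = [
    {"title": "A", "url": "https://a.example", "score": 3.2, "content": {"text": "alpha"}},
    {"title": "B", "url": "https://b.example", "score": 12.5, "content": {"text": "beta"}},
    {"title": "C", "url": "https://c.example", "score": 0.4, "content": {"text": "gamma"}},
]
best = top_pages(results, k=2, min_score=1.0)
print(best)
```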

fetch_info / fetch_raw / fetch_links

# Structured info
info = skysearch.fetch_info("https://example.com")

# Raw HTML
raw = skysearch.fetch_raw("https://example.com")

# Extract all links on the page
links = skysearch.fetch_links("https://example.com")
# [{'text': 'link text', 'href': 'https://...'}, ...]

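Notes on Chinese search

skysearch is sensitive to Chinese word boundaries. Queries built from short words can be segmented incorrectly, so put spaces between words:

Query	Result
`浙大2024分数线`	❌ Segmented incorrectly; the results miss the point
`浙江大学录取分数线`	✅ Accurately matches several relevant results

English searches are not affected.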
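Part of the spacing advice can be automated. The sketch below only inserts a space wherever a run of Chinese characters meets a run of Latin letters or digits. It does not split one Chinese word from another, which would need a dedicated word segmenter. `space_script_boundaries` is a hypothetical helper, and whether this spacing alone is enough to improve SkySearch results is an assumption that has not been tested.

```python
import re

# Basic CJK Unified Ideographs block; everything else counts as non-CJK here.
_CJK = r"\u4e00-\u9fff"
# Zero-width positions where a CJK character touches an ASCII letter or digit.
_BOUNDARY = re.compile(
    rf"(?<=[{_CJK}])(?=[A-Za-z0-9])|(?<=[A-Za-z0-9])(?=[{_CJK}])"
)


def space_script_boundaries(query):
    """Insert spaces between Chinese runs and Latin/digit runs."""
    spaced = _BOUNDARY.sub(" ", query)
    # Collapse any repeated whitespace into single spaces.
    return " ".join(spaced.split())


print(space_script_boundaries("浙大2024分数线"))
print(space_script_boundaries("Python教程"))
```

Queries that already contain spaces pass through unchanged, so the helper can be applied unconditionally before calling `search_bare`.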

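API reference

search_bare(query, num=10)

Parameter	Description	Default
query	Search keywords (for Chinese, separating words with spaces by hand is recommended)	-
num	Number of results (pagination is supported; no more than 50 is recommended)	10

fetch(url, mode='info', keep=False, timeout=10, retry=2, headless=True)

Parameter	Description	Default
url	Page URL	-
mode	Output mode: `info`/`text`/`raw`	info
keep	Keep the browser open after fetching	False
timeout	Request timeout in seconds	10
retry	Number of retries	2
headless	Headless browser mode	True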
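To make the `retry` parameter concrete, here is a generic retry wrapper. It is only an illustration of the idea and not SkySearch's internal logic. `call_with_retry`, `delay`, and `flaky` are hypothetical names, and the reading that `retry=2` means up to two extra attempts after the first failure is an assumption.

```python
import time


def call_with_retry(fn, *args, retry=2, delay=0.0, **kwargs):
    """Call fn, retrying up to `retry` extra times if it raises."""
    attempts = max(retry, 0) + 1
    last_error = None
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)  # pause before the next attempt
    raise last_error


# A stand-in for a fetch that fails twice and then succeeds.
calls = {"n": 0}


def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError(f"attempt {calls['n']} timed out")
    return {"url": url, "html": "<html></html>"}


result = call_with_retry(flaky, "https://example.com", retry=2)
print(calls["n"], result["url"])
```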

fetch_info(url, keep=False, timeout=10, retry=2, headless=True)

Returns a dict: {'url', 'title', 'text', 'links', 'meta'}

fetch_raw(url, keep=False, timeout=10, retry=2, headless=True)

Returns a dict: {'url', 'html'}

fetch_links(url, timeout=10, retry=2, headless=True)

Returns a list: [{'text': '...', 'href': '...'}, ...]

search_and_fetch(query, num=10, mode='info', verbose=False, keep=False, headless=True)

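Parameter	Description	Default
query	Search keywords	-
num	Number of results	10
mode	Output mode	info
verbose	Print detailed progress	False
keep	Keep the browser open	False
headless	Headless browser mode	True

Output modes

Mode	Description	Best for
`info`	Structured JSON (url, title, text, links, meta)	✅ Data analysis / agents
`text`	Cleaned body text; content may be trimmed	Human reading
`raw`	Raw HTML	In-depth parsing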

Tech stack

Component	Technology
HTTP requests	DrissionPage (SessionPage)
Dynamic rendering	DrissionPage (ChromiumPage)
HTML parsing	BeautifulSoup4 + lxml
Body text extraction	readability-lxml

Project structure

skysearch/
├── __init__.py       # Exports all APIs
├── api.py            # Concise API layer
├── cli.py            # Command-line entry point
├── search.py         # Bing search (with pagination)
├── ranker.py         # BM25 ranking
└── fetcher/          # Page-fetching package
    ├── __init__.py
    ├── core.py       # Core functions
    ├── session.py    # Session management
    └── parser.py     # HTML parsing

License

MIT

Project details

Release history

Version	Date
0.4.1 (this version)	May 6, 2026
0.4.0	May 6, 2026
0.3.1	May 4, 2026
0.3.0	May 4, 2026
0.2.0	May 4, 2026

Download files

Source Distribution

skysearch-0.4.1.tar.gz (13.3 kB), uploaded May 6, 2026, Source

Built Distribution

skysearch-0.4.1-py3-none-any.whl (14.1 kB), uploaded May 6, 2026, Python 3

File details

Details for the file skysearch-0.4.1.tar.gz.

File metadata

Download URL: skysearch-0.4.1.tar.gz
Upload date: May 6, 2026
Size: 13.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for skysearch-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`bca83b27ced9ee0f4fecabe788291e29923a47e1a81e151167aedd95b7ee1bd3`
MD5	`ac353eb17e72ba83feb699d413aa8bb0`
BLAKE2b-256	`5250b48808cf7839101cf8a3ee6bd87905c9873cf14ce75e8a3c594d2526a319`

File details

Details for the file skysearch-0.4.1-py3-none-any.whl.

File metadata

Download URL: skysearch-0.4.1-py3-none-any.whl
Upload date: May 6, 2026
Size: 14.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for skysearch-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b6a32f37f299b290c6a06df9eb9f51e30be11ee18552618c3416969b68f73aa6`
MD5	`71453daa7f481826311f158c9a5e7bd4`
BLAKE2b-256	`806f5a17db26e9b93a6df7c3e09cd2affb3cccdaf643cd48a42b7b45c2faef59`
