A lightweight, asynchronous, ready-to-use library for aggregating and parsing social media content
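✨ Features
- 🌍 Broad platform support: covers 17+ mainstream social media platforms in China and abroad
- 🧹 Link cleaning: automatically extracts links from share text and strips removable tracking parameters
- 🎬 Multimedia parsing: supports videos, image posts, animated images, live photos, and rich-text articles
- 📦 Sync / async API: provides both `async/await` and `*_sync` call styles
- 🐚 CLI support: native command-line support, lightweight and ready to use out of the box
- 🤖 Telegram Bot: a bot built on this project is live → @ParseHuBot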
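To give a rough idea of what link cleaning involves, the sketch below pulls the first URL out of a piece of share text and drops a few tracking parameters using only the standard library. It is an illustration, not ParseHub's actual implementation: the function name `clean_share_text` and the parameter lists `TRACKING_PREFIXES` and `TRACKING_KEYS` are assumptions made for this example.

```python
import re
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters; the list ParseHub really uses may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"share_source", "spm"}

# Simplification: a URL is assumed to end at the next whitespace character.
URL_RE = re.compile(r"https?://\S+")


def clean_share_text(text: str) -> Optional[str]:
    """Return the first URL found in `text` with tracking parameters removed."""
    match = URL_RE.search(text)
    if match is None:
        return None
    parts = urlsplit(match.group(0))
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(clean_share_text("Check this out https://example.com/post/1?id=7&utm_source=app&spm=x now"))
```

Share text that places other characters directly after the link, without whitespace, would need a stricter pattern than the one used here.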
🌐 Supported platforms
| Platform | Video | Image/Text | Other |
|---|---|---|---|
| Twitter / X | ✅ | ✅ | |
| | ✅ | ✅ | |
| YouTube | ✅ | 🎵 Music | |
| | ✅ | | |
| Threads | ✅ | ✅ | |
| Bilibili | ✅ | 📝 Dynamic posts | |
| Douyin (抖音) | ✅ | ✅ | |
| TikTok | ✅ | ✅ | |
| Weibo (微博) | ✅ | ✅ | |
| Xiaohongshu (小红书) | ✅ | ✅ | |
| Tieba (贴吧) | ✅ | ✅ | |
| WeChat Official Accounts (微信公众号) | ✅ | | |
| Kuaishou (快手) | ✅ | | |
| Coolapk (酷安) | ✅ | | |
| Pipixia (皮皮虾) | ✅ | ✅ | |
| Zuiyou (最右) | ✅ | ✅ | |
| Xiaoheihe (小黑盒) | ✅ | ✅ | |
📦 Installation
CLI installation
pipx install "parsehub[cli]"
ph -v
Python library installation
# uv
uv add parsehub
# pip
pip install parsehub
# For the full CLI feature set, install the `cli` extra
uv add "parsehub[cli]"
pip install "parsehub[cli]"
🚀 Quick start
CLI
Parse a link or share text
parsehub "https://example.com/post/1"
# Equivalent short command
ph "https://example.com/post/1"
Download media
ph d "https://example.com/post/1"
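Common commands
| Command | Description |
|---|---|
| `ph ls` | List supported platforms |
| `ph set proxy <platform> <proxy>` | Set both the parse proxy and the download proxy |
| `ph set proxy <platform> <proxy> --for download` | Set the download proxy only |
| `ph set cookie <platform>` | Save a platform cookie |
| `ph set list` | Show the list of saved configurations |
| `ph set show <platform>` | Show the configuration for one platform |
Saved settings are applied automatically, per platform, to later parse and download runs. To override them for a single run, pass the options directly: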
ph "https://example.com/post/1" --proxy http://127.0.0.1:7890
ph d "https://example.com/post/1" --parse-proxy http://127.0.0.1:7890 --cookie "key=value"
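When the CLI is driven from a script, building the argument list in code avoids shell-quoting mistakes with cookies and URLs. The sketch below assembles the `ph d` invocation shown above; only the `--parse-proxy` and `--cookie` flags from that example are used, and the helper name `build_download_command` is hypothetical.

```python
import shlex
from typing import List, Optional


def build_download_command(
    url: str,
    parse_proxy: Optional[str] = None,
    cookie: Optional[str] = None,
) -> List[str]:
    """Build the argv for `ph d <url>` with optional one-off overrides."""
    argv = ["ph", "d", url]
    if parse_proxy:
        argv += ["--parse-proxy", parse_proxy]
    if cookie:
        argv += ["--cookie", cookie]
    return argv


argv = build_download_command(
    "https://example.com/post/1",
    parse_proxy="http://127.0.0.1:7890",
    cookie="key=value",
)
# shlex.join gives a shell-safe rendering, which is handy for logging.
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)`, which passes each argument to the program without going through a shell.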
Python API
Synchronous parsing
from parsehub import ParseHub
ph = ParseHub()
result = ph.parse_sync("https://www.xiaoheihe.cn/app/bbs/link/174972336")
print(result)
dr = result.download_sync()
print(dr)
Asynchronous parsing
import asyncio
from parsehub import ParseHub

async def main():
    ph = ParseHub()
    result = await ph.parse("https://tieba.baidu.com/p/9939510114")
    print(result)
    dr = await result.download()
    print(dr)

asyncio.run(main())
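Because `parse` is awaitable, several links can be parsed concurrently. The sketch below shows one way to do that with a concurrency limit. The helper `parse_many` is hypothetical and takes the parse coroutine as an argument, and `fake_parse` is a stand-in so the example runs without network access.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def parse_many(
    parse: Callable[[str], Awaitable[Any]],
    urls: Iterable[str],
    limit: int = 3,
) -> List[Any]:
    """Parse several URLs concurrently, at most `limit` at a time.

    Results keep the order of `urls`; a failed parse yields its exception
    object instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str) -> Any:
        async with semaphore:
            return await parse(url)

    return await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)


async def fake_parse(url: str) -> str:
    # Stand-in for `ParseHub().parse`, used only to keep the sketch offline.
    await asyncio.sleep(0)
    return f"parsed:{url}"


urls = ["https://example.com/post/1", "https://example.com/post/2"]
print(asyncio.run(parse_many(fake_parse, urls)))
```

With the real library the call would presumably look like `await parse_many(ParseHub().parse, urls)`. That assumes one `ParseHub` instance can safely be shared across concurrent calls, which is not stated above.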
Download media
from parsehub import ParseHub
ph = ParseHub()
result = ph.download_sync("https://www.xiaoheihe.cn/app/bbs/link/174972336")
print(result)
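🔑 Advanced usage
Cookie login and proxies
Platforms that require a logged-in session accept a cookie. The parse entry points take `cookie` / `proxy`; the download entry points take `parse_cookie` / `parse_proxy` as the parameters for the parsing stage.
Platforms that currently support cookies:
- Twitter / X
- Instagram
- YouTube
- Bilibili
- Douyin (抖音)
- TikTok
- Kuaishou (快手)
- Xiaohongshu (小红书)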
from parsehub import ParseHub
ph = ParseHub()
result = ph.parse_sync(
    "https://example.com",
    cookie="key1=value1; key2=value2",
    proxy="http://127.0.0.1:7890",
)
Cookies are accepted in several formats:
# Cookie header string
ph.parse_sync("https://example.com", cookie="key1=value1; key2=value2")
# JSON string
ph.parse_sync("https://example.com", cookie='{"key1": "value1", "key2": "value2"}')
# Dictionary
ph.parse_sync("https://example.com", cookie={"key1": "value1", "key2": "value2"})
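The three formats carry the same information. The sketch below shows one way they could be normalized into a single dictionary. It only illustrates the idea and says nothing about how ParseHub handles cookies internally; `normalize_cookie` and `CookieInput` are hypothetical names.

```python
import json
from collections.abc import Mapping
from typing import Dict, Union

CookieInput = Union[str, Mapping]


def normalize_cookie(cookie: CookieInput) -> Dict[str, str]:
    """Convert a header string, a JSON object string, or a mapping into a dict."""
    if isinstance(cookie, Mapping):
        return {str(k): str(v) for k, v in cookie.items()}
    text = cookie.strip()
    if text.startswith("{"):
        # JSON object string such as '{"key1": "value1"}'
        return {str(k): str(v) for k, v in json.loads(text).items()}
    # Cookie header string such as "key1=value1; key2=value2"
    pairs = (item.split("=", 1) for item in text.split(";") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}


print(normalize_cookie("key1=value1; key2=value2"))
```

Splitting each pair on the first `=` only keeps values that themselves contain `=` intact.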
Download progress callback
from parsehub import ParseHub
from parsehub.types import ProgressUnit
class ProgressTracker:
    async def __call__(self, current: int, total: int, unit: ProgressUnit, *args, task_name: str = "", **kwargs):
        print(f"[{task_name}] {current}/{total} ({unit})")

result = ParseHub().download_sync(
    "https://example.com",
    path="./downloads",
    callback=ProgressTracker(),
    callback_args=("extra_arg",),
    callback_kwargs={"task_name": "demo"},
)
`unit` values:
- `bytes`: byte progress when downloading a single file
- `count`: file-count progress when downloading multiple files
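Since the two units mean different things, a callback may want to render them differently. The helper below is a minimal sketch of that. The name `format_progress` is hypothetical, and it assumes the unit arrives as, or has been converted to, the plain strings `"bytes"` and `"count"`. Whether `ProgressUnit` compares equal to those strings is not stated here, so convert it first if it does not.

```python
def format_progress(current: int, total: int, unit: str, task_name: str = "") -> str:
    """Render one progress update as a human-readable line."""
    prefix = f"[{task_name}] " if task_name else ""
    if unit == "bytes":
        # Single-file download: show sizes in KiB plus a percentage.
        percent = current / total * 100 if total else 0.0
        return f"{prefix}{current / 1024:.1f}/{total / 1024:.1f} KiB ({percent:.0f}%)"
    if unit == "count":
        # Multi-file download: show how many files are done.
        return f"{prefix}file {current} of {total}"
    # Unknown unit: fall back to the raw numbers.
    return f"{prefix}{current}/{total} ({unit})"


print(format_progress(512, 2048, "bytes", task_name="demo"))
print(format_progress(2, 5, "count", task_name="demo"))
```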
Saving metadata.json
from parsehub import ParseHub
result = ParseHub().download_sync(
    "https://example.com",
    path="./downloads",
    save_metadata=True,
)
print(result.output_dir / "metadata.json")
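Once the file has been written it can be read back with the standard `json` module. The sketch below makes no assumption about the keys inside `metadata.json`; it does assume the file is UTF-8 encoded, and `load_metadata` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any


def load_metadata(output_dir: Path) -> Any:
    """Load `metadata.json` from a download directory, or return None if absent."""
    metadata_path = Path(output_dir) / "metadata.json"
    if not metadata_path.is_file():
        return None
    # Assumption: the file is UTF-8 encoded JSON.
    with metadata_path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

With the example above this would be called as `load_metadata(result.output_dir)`.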
Global configuration
from pathlib import Path
from parsehub.config import GlobalConfig
GlobalConfig.default_save_dir = Path("./downloads")
Error handling
from parsehub import ParseHub
from parsehub.errors import ParseError, UnknownPlatform
try:
    result = ParseHub().parse_sync("https://example.com")
except UnknownPlatform:
    print("This platform is not supported yet")
except ParseError as exc:
    print(f"Parsing failed: {exc}")
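Parse failures caused by transient network problems are often worth retrying, while an unsupported platform is not. The sketch below wraps any parse callable with exponential backoff. `parse_with_retry` and its parameters are hypothetical, and the exception classes are passed in rather than imported so the example stays self-contained.

```python
import time
from typing import Any, Callable, Tuple, Type

Exceptions = Tuple[Type[BaseException], ...]


def parse_with_retry(
    parse: Callable[[str], Any],
    url: str,
    retry_on: Exceptions,
    give_up_on: Exceptions = (),
    attempts: int = 3,
    base_delay: float = 0.5,
) -> Any:
    """Call `parse(url)`, retrying with exponential backoff on `retry_on` errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return parse(url)
        except give_up_on:
            # Checked first, so it wins even if these errors subclass `retry_on`.
            raise
        except retry_on:
            if attempt == attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With the classes imported above, a call might look like `parse_with_retry(ParseHub().parse_sync, url, retry_on=(ParseError,), give_up_on=(UnknownPlatform,))`.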
🤝 Reference projects
- Evil0ctal/Douyin_TikTok_Download_API
- yt-dlp/yt-dlp
- instaloader/instaloader
- SocialSisterYi/bilibili-API-collect
- Nemo2011/bilibili-api
📜 License
This project is open source under the MIT License.
If this project helps you, feel free to give it a ⭐ Star!