A lightweight, asynchronous, ready-to-use library for aggregating and parsing social media content
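✨ Features
- 🌍 Broad platform support: covers 17+ mainstream social media platforms in China and abroad
- 🧹 Link cleaning: automatically extracts the link from share text and strips removable tracking parameters
- 🎬 Multimedia parsing: supports videos, image posts, animated images, live photos, and rich-text articles
- 📦 Sync / async API: offers both `async`/`await` and `*_sync` call styles
- 🤖 Telegram Bot: a bot built on this project is live → @ParseHuBot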
🌐 Supported platforms
| Platform | Video | Image/text | Other |
|---|---|---|---|
| Twitter / X | ✅ | ✅ | |
| | ✅ | ✅ | |
| YouTube | ✅ | 🎵 Music | |
| | ✅ | | |
| Threads | ✅ | ✅ | |
| Bilibili | ✅ | 📝 Dynamic posts | |
| Douyin (抖音) | ✅ | ✅ | |
| TikTok | ✅ | ✅ | |
| Weibo (微博) | ✅ | ✅ | |
| Xiaohongshu (小红书) | ✅ | ✅ | |
| Tieba (贴吧) | ✅ | ✅ | |
| WeChat Official Accounts (微信公众号) | ✅ | | |
| Kuaishou (快手) | ✅ | | |
| Coolapk (酷安) | ✅ | | |
| Pipixia (皮皮虾) | ✅ | ✅ | |
| Zuiyou (最右) | ✅ | ✅ | |
| Xiaoheihe (小黑盒) | ✅ | ✅ | |
Call `ParseHub().get_platforms()` to get the list of platforms actually registered in the current version.
📦 Installation
# uv (recommended)
uv add parsehub
# pip
pip install parsehub
Requires Python ≥ 3.12
🚀 Quick start
Synchronous parsing
from parsehub import ParseHub
ph = ParseHub()
result = ph.parse_sync("https://www.xiaoheihe.cn/app/bbs/link/174972336")
print(result.title)
print(result.raw_url)
Asynchronous parsing
import asyncio
from parsehub import ParseHub
async def main():
    ph = ParseHub()
    result = await ph.parse("https://tieba.baidu.com/p/9939510114")
    print(result)
asyncio.run(main())
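When several links need parsing, the async API can be driven concurrently. The sketch below is one possible pattern built only on `asyncio`; `parse_many` and its `limit` parameter are hypothetical names, not part of ParseHub, and the helper takes the parse coroutine as an argument so it makes no assumption about the library's internals.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def parse_many(
    parse: Callable[[str], Awaitable[Any]],
    urls: List[str],
    limit: int = 3,
) -> List[Any]:
    """Parse urls concurrently, at most `limit` at a time, keeping input order.

    Failures are returned in place as exception objects instead of being raised.
    """
    semaphore = asyncio.Semaphore(limit)

    async def worker(url: str) -> Any:
        # The semaphore caps how many parse calls run at the same time.
        async with semaphore:
            return await parse(url)

    return await asyncio.gather(*(worker(u) for u in urls), return_exceptions=True)
```

With the library, `parse` would presumably be `ParseHub().parse`. Whether a single instance is safe to share across concurrent calls is not stated here, so treat that as an assumption to verify.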
Downloading media
from parsehub import ParseHub
ph = ParseHub()
result = ph.download_sync(
    "https://www.xiaoheihe.cn/app/bbs/link/174972336",
    path="./downloads",
    save_metadata=True,
)
print(result.output_dir)
print(result.media)
When cookie login or a parsing proxy is needed, pass the parsing parameters directly to the download call:
from parsehub import ParseHub
ph = ParseHub()
downloaded = ph.download_sync(
    "https://example.com",
    path="./downloads",
    parse_cookie="key1=value1; key2=value2",
    parse_proxy="http://127.0.0.1:7890",
    save_metadata=True,
)
🧩 API overview
Parsing
await ph.parse(url, proxy=None, cookie=None)
ph.parse_sync(url, proxy=None, cookie=None)
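- `url`: share text or a share link; the first link in the text is extracted automatically
- `proxy`: proxy used during the parsing stage
- `cookie`: cookie used during the parsing stage; accepts a string, a JSON string, or a dict
Downloading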
await ph.download(
    url,
    path=None,
    callback=None,
    callback_args=(),
    callback_kwargs=None,
    proxy=None,
    parse_proxy=None,
    parse_cookie=None,
    save_metadata=False,
)
ph.download_sync(
    url,
    path=None,
    callback=None,
    callback_args=(),
    callback_kwargs=None,
    proxy=None,
    parse_proxy=None,
    parse_cookie=None,
    save_metadata=False,
)
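- `path`: directory to save downloads in; defaults to `GlobalConfig.default_save_dir`
- `proxy`: proxy used when downloading media
- `parse_proxy` / `parse_cookie`: proxy and cookie used when parsing the link before the download
- `save_metadata`: whether to save `metadata.json` in the output directory
Utility methods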
ph.get_platform(url)
ph.get_platforms()
await ph.get_raw_url(url, proxy=None, clean_all=True)
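- `get_platform()`: returns the matched platform enum, or `None` if nothing matches
- `get_platforms()`: returns the `id`, name, and supported types of every registered platform
- `get_raw_url()`: returns the cleaned original link
Parse result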
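- `result.platform`: platform enum
- `result.type`: content type, such as `video`, `image`, `multimedia`, or `richtext`
- `result.title`: title
- `result.content`: plain-text body
- `result.raw_url`: cleaned original link
- `result.media`: a media reference or a list of media references
- `result.to_dict()`: converts the result to a serializable dict
- `result.download()` / `result.download_sync()`: downloads the media in this parse result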
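Because `result.media` may hold either one media reference or a list of them, calling code often wants a uniform shape. A minimal sketch follows, assuming the list form is a `list` or `tuple` and that a missing value is `None`; `media_as_list` is a hypothetical helper, not a ParseHub function.

```python
from typing import Any, List


def media_as_list(media: Any) -> List[Any]:
    """Return media items as a list whether one item, several, or none were given."""
    if media is None:
        return []
    if isinstance(media, (list, tuple)):
        return list(media)
    # A single media reference becomes a one-element list.
    return [media]
```

A loop such as `for item in media_as_list(result.media): ...` then works for every content type.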
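🔑 Advanced usage
Share text and platform detection
The `url` argument accepts share text directly; ParseHub automatically extracts the first link in it:
from parsehub import ParseHub
ph = ParseHub()
text = "Copy this share https://tieba.baidu.com/p/9939510114 and open it"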
print(ph.get_platform(text))
print(ph.parse_sync(text).raw_url)
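The idea behind link extraction and cleaning can be sketched with the standard library alone. This is not ParseHub's implementation: the `TRACKING_PARAMS` set, the URL pattern, and both helper names are assumptions made for illustration, and the real list of removable parameters may differ.

```python
import re
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of tracking parameters; ParseHub's real list may differ.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "share_token"}

# Match http(s) URLs made of ASCII URL characters only,
# so adjacent non-ASCII text is not swallowed into the link.
URL_PATTERN = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def extract_first_url(text: str) -> Optional[str]:
    """Return the first http(s) URL found in text, or None."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None


def strip_tracking(url: str) -> str:
    """Drop query parameters listed in TRACKING_PARAMS and keep the rest in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For real use, `ph.get_raw_url()` is the documented way to obtain the cleaned link; the sketch only shows the general shape of the operation.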
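Cookie login and proxies
Platforms that require a logged-in session accept a cookie. The parsing entry points take `cookie` / `proxy`; the download entry points take `parse_cookie` / `parse_proxy` as their parsing-stage parameters.
from parsehub import ParseHub
ph = ParseHub()
result = ph.parse_sync(
    "https://example.com",
    cookie="key1=value1; key2=value2",
    proxy="http://127.0.0.1:7890",
)
Cookies are accepted in several formats: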
from parsehub import ParseHub
ph = ParseHub()
# Cookie header string
ph.parse_sync("https://example.com", cookie="key1=value1; key2=value2")
# JSON string
ph.parse_sync("https://example.com", cookie='{"key1": "value1", "key2": "value2"}')
# Dict
ph.parse_sync("https://example.com", cookie={"key1": "value1", "key2": "value2"})
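To make the three accepted shapes concrete, here is a minimal sketch of how they could be reduced to one dict. It illustrates the idea only and is not taken from ParseHub's source; `normalize_cookie` is a hypothetical name, and the rule "a string starting with `{` is JSON" is an assumption.

```python
import json
from typing import Dict, Union


def normalize_cookie(cookie: Union[str, Dict[str, str], None]) -> Dict[str, str]:
    """Convert a header string, JSON object string, or dict into a plain dict."""
    if not cookie:
        return {}
    if isinstance(cookie, dict):
        return {str(key): str(value) for key, value in cookie.items()}
    text = cookie.strip()
    if text.startswith("{"):
        # Looks like a JSON object: '{"key1": "value1"}'
        loaded = json.loads(text)
        return {str(key): str(value) for key, value in loaded.items()}
    # Otherwise treat it as a Cookie header: "key1=value1; key2=value2"
    pairs = (item.split("=", 1) for item in text.split(";") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}
```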
Platforms that currently support cookies:
- Twitter / X
- Instagram
- YouTube
- Bilibili
- Douyin (抖音)
- TikTok
- Kuaishou (快手)
Download progress callback
from parsehub import ParseHub
class ProgressTracker:
    async def __call__(self, current: int, total: int, unit: str, *args, task_name: str = "", **kwargs):
        print(f"[{task_name}] {current}/{total} ({unit})")
result = ParseHub().download_sync(
    "https://example.com",
    path="./downloads",
    callback=ProgressTracker(),
    callback_args=("extra_arg",),
    callback_kwargs={"task_name": "demo"},
)
`unit` may be:
- `bytes`: byte progress when downloading a single file
- `count`: file-count progress when downloading multiple files
Saving metadata.json
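Since the meaning of `current` and `total` depends on `unit`, a callback will usually branch on it. The sketch below shows one way to do that and runs without the library by feeding the callback hand-written updates; `format_progress`, `ProgressLog`, and `demo` are hypothetical names, and the MiB formatting is just one presentation choice.

```python
import asyncio
from typing import List


def format_progress(current: int, total: int, unit: str) -> str:
    """Render one progress update according to its unit."""
    if unit == "bytes":
        percent = current / total * 100 if total else 0.0
        return f"{current / 1_048_576:.1f}/{total / 1_048_576:.1f} MiB ({percent:.0f}%)"
    if unit == "count":
        return f"file {current} of {total}"
    # Unknown unit: fall back to a neutral rendering.
    return f"{current}/{total} {unit}"


class ProgressLog:
    """Async callable with the same call shape as ProgressTracker above."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    async def __call__(self, current: int, total: int, unit: str, *args, **kwargs) -> None:
        self.lines.append(format_progress(current, total, unit))


async def demo() -> List[str]:
    # Stand-in for the downloader: feed the callback a few hand-written updates.
    log = ProgressLog()
    await log(524_288, 1_048_576, "bytes")
    await log(2, 5, "count")
    return log.lines
```

An instance of `ProgressLog` could presumably be passed as `callback=` in the same way as `ProgressTracker()`.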
from parsehub import ParseHub
result = ParseHub().download_sync(
    "https://example.com",
    path="./downloads",
    save_metadata=True,
)
print(result.output_dir / "metadata.json")
Global configuration
from pathlib import Path
from parsehub.config import GlobalConfig
GlobalConfig.default_save_dir = Path("./downloads")
Error handling
from parsehub import ParseHub
from parsehub.errors import ParseError, UnknownPlatform
try:
    result = ParseHub().parse_sync("https://example.com")
except UnknownPlatform:
    print("This platform is not supported yet")
except ParseError as exc:
    print(f"Parsing failed: {exc}")
🤝 Reference projects
- Evil0ctal/Douyin_TikTok_Download_API
- yt-dlp/yt-dlp
- instaloader/instaloader
- SocialSisterYi/bilibili-API-collect
- Nemo2011/bilibili-api
📜 License
This project is open source under the MIT License.
If this project helps you, please consider giving it a ⭐ Star!