A lightweight, asynchronous, ready-to-use social media aggregation and parsing library

🔗 ParseHub

Social media aggregation parser

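A lightweight, asynchronous, ready-to-use library for parsing social media posts and downloading their media, with support for 17+ platforms.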


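✨ Features

  • 🌍 Broad platform support: covers 17+ mainstream social media platforms in China and abroad
  • 🧹 Link cleaning: automatically extracts the link from share text and strips removable tracking parameters
  • 🎬 Multimedia parsing: supports videos, image-text posts, animated images, live photos, and rich-text articles
  • 📦 Sync / async API: provides both async/await and *_sync calling styles
  • 🤖 Telegram Bot: a bot built on this project is live → @ParseHuBot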

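🌐 Supported platforms

Platform Video Image-text Other
Twitter / X
Instagram
YouTube 🎵 Music
Facebook
Threads
Bilibili 📝 Dynamic posts
Douyin (抖音)
TikTok
Weibo (微博)
Xiaohongshu (小红书)
Baidu Tieba (贴吧)
WeChat Official Accounts (微信公众号)
Kuaishou (快手)
Coolapk (酷安)
Pipixia (皮皮虾)
Zuiyou (最右)
Xiaoheihe (小黑盒)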

Call ParseHub().get_platforms() to get the list of platforms actually registered in the current version.

📦 Installation

# uv (recommended)
uv add parsehub

# pip
pip install parsehub

Requires Python ≥ 3.12

🚀 Quick start

Synchronous parsing

from parsehub import ParseHub

ph = ParseHub()
result = ph.parse_sync("https://www.xiaoheihe.cn/app/bbs/link/174972336")

print(result.title)
print(result.raw_url)

Asynchronous parsing

import asyncio
from parsehub import ParseHub


async def main():
    ph = ParseHub()
    result = await ph.parse("https://tieba.baidu.com/p/9939510114")
    print(result)


asyncio.run(main())
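
The two examples above show the same operation as a coroutine (parse) and as a blocking call (parse_sync). How ParseHub bridges the two internally is not stated here. As a minimal sketch of one common pattern, the hypothetical DemoParser class below (not part of parsehub) implements its sync method by driving the coroutine with asyncio.run:

```python
import asyncio


class DemoParser:
    """Hypothetical stand-in used for illustration; not the real ParseHub class."""

    async def parse(self, url: str) -> dict:
        # Placeholder for real network I/O.
        await asyncio.sleep(0)
        return {"raw_url": url}

    def parse_sync(self, url: str) -> dict:
        # Run the coroutine to completion on a fresh event loop.
        # asyncio.run() raises RuntimeError when a loop is already running
        # in this thread, so this wrapper suits plain synchronous code only.
        return asyncio.run(self.parse(url))


demo = DemoParser()
sync_result = demo.parse_sync("https://example.com/post/1")
print(sync_result)
```

With a wrapper of this kind, code that already runs inside an event loop should await parse() rather than call parse_sync().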

Downloading media

from parsehub import ParseHub

ph = ParseHub()
result = ph.download_sync(
    "https://www.xiaoheihe.cn/app/bbs/link/174972336",
    path="./downloads",
    save_metadata=True,
)

print(result.output_dir)
print(result.media)

When parsing requires a Cookie login or a proxy, pass the parse-stage parameters directly to the download call:

from parsehub import ParseHub

ph = ParseHub()
downloaded = ph.download_sync(
    "https://example.com",
    path="./downloads",
    parse_cookie="key1=value1; key2=value2",
    parse_proxy="http://127.0.0.1:7890",
    save_metadata=True,
)

🧩 API overview

Parsing

await ph.parse(url, proxy=None, cookie=None)
ph.parse_sync(url, proxy=None, cookie=None)
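  • url: share text or a share link; the first link in the text is extracted automatically
  • proxy: proxy used during the parsing stage
  • cookie: Cookie used during the parsing stage; accepts a string, a JSON string, or a dict

Downloading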

await ph.download(
    url,
    path=None,
    callback=None,
    callback_args=(),
    callback_kwargs=None,
    proxy=None,
    parse_proxy=None,
    parse_cookie=None,
    save_metadata=False,
)

ph.download_sync(
    url,
    path=None,
    callback=None,
    callback_args=(),
    callback_kwargs=None,
    proxy=None,
    parse_proxy=None,
    parse_cookie=None,
    save_metadata=False,
)
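  • path: directory to save downloads to; defaults to GlobalConfig.default_save_dir
  • proxy: proxy used when downloading media
  • parse_proxy / parse_cookie: proxy and Cookie used when parsing the link before downloading
  • save_metadata: whether to save metadata.json in the output directory

Utility methods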

ph.get_platform(url)
ph.get_platforms()
await ph.get_raw_url(url, proxy=None, clean_all=True)
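  • get_platform(): returns the matched platform enum, or None when nothing matches
  • get_platforms(): returns the id, name, and supported types of every registered platform
  • get_raw_url(): returns the cleaned original link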
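
The cleaning rules that get_raw_url() applies are not documented in this section. To illustrate the general idea of stripping tracking parameters, here is a standard-library sketch; the TRACKING_PARAMS set and the strip_tracking_params helper are assumptions made for this example and are not part of parsehub:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters to drop; ParseHub's real rules may differ.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "share_from"}


def strip_tracking_params(url: str) -> str:
    """Return url with the known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    # Rebuild the URL with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_tracking_params(
    "https://example.com/p/1?id=42&utm_source=share&utm_medium=app"
)
print(cleaned)  # https://example.com/p/1?id=42
```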

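Parse result

  • result.platform: platform enum
  • result.type: content type, such as video, image, multimedia, or richtext
  • result.title: title
  • result.content: plain-text body
  • result.raw_url: cleaned original link
  • result.media: a media reference or a list of media references
  • result.to_dict(): converts the result to a serializable dict
  • result.download() / result.download_sync(): downloads the media in the current parse result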
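
Because result.media may be either a single media reference or a list of them, calling code often wants one uniform shape. The helper below is a small sketch of that normalization; media_as_list is a hypothetical name, the strings stand in for real media references whose type is not described here, and treating None as "no media" is an assumption:

```python
from typing import Any, List


def media_as_list(media: Any) -> List[Any]:
    """Normalize a single media reference, a sequence of them, or None to a list."""
    if media is None:
        return []
    if isinstance(media, (list, tuple)):
        return list(media)
    return [media]


# Strings are used here only as stand-ins for media references.
for item in media_as_list("cover.jpg"):
    print(item)
```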

🔑 Advanced usage

Share text and platform detection

The url parameter can take share text directly; ParseHub automatically extracts the first link in it:

from parsehub import ParseHub

ph = ParseHub()
text = "Copy this share https://tieba.baidu.com/p/9939510114 and open it"

print(ph.get_platform(text))
print(ph.parse_sync(text).raw_url)
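
For readers curious what "extract the first link" can look like, the following is a simplified sketch using a regular expression. The pattern and the first_url helper are assumptions for illustration only; ParseHub's own extraction logic may handle more cases:

```python
import re
from typing import Optional

# Simplified pattern: an http(s) URL runs until whitespace or a common
# quote or CJK punctuation mark. Real share text may need more care.
URL_PATTERN = re.compile(r"https?://[^\s，。！？、）】>\"']+")


def first_url(text: str) -> Optional[str]:
    """Return the first http(s) URL found in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None


share_text = "Copy this share https://tieba.baidu.com/p/9939510114 and open it"
print(first_url(share_text))  # https://tieba.baidu.com/p/9939510114
```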

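Cookie login and proxies

Platforms that require a logged-in session accept a Cookie. The parse entry points take cookie / proxy; the download entry points take parse_cookie / parse_proxy as their parse-stage parameters.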

from parsehub import ParseHub

ph = ParseHub()
result = ph.parse_sync(
    "https://example.com",
    cookie="key1=value1; key2=value2",
    proxy="http://127.0.0.1:7890",
)

Cookies are accepted in several formats:

from parsehub import ParseHub

ph = ParseHub()

# Cookie header string
ph.parse_sync("https://example.com", cookie="key1=value1; key2=value2")

# JSON string
ph.parse_sync("https://example.com", cookie='{"key1": "value1", "key2": "value2"}')

# dict
ph.parse_sync("https://example.com", cookie={"key1": "value1", "key2": "value2"})
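
The three formats above carry the same information. As a sketch of how they could be reduced to a single dict, consider the helper below; normalize_cookie is a hypothetical function written for this example and does not describe ParseHub's internal handling:

```python
import json
from typing import Dict, Union


def normalize_cookie(cookie: Union[str, dict]) -> Dict[str, str]:
    """Convert a Cookie header string, a JSON string, or a dict into a plain dict."""
    if isinstance(cookie, dict):
        return {str(k): str(v) for k, v in cookie.items()}
    text = cookie.strip()
    if text.startswith("{"):
        # Looks like a JSON object.
        return {str(k): str(v) for k, v in json.loads(text).items()}
    # Otherwise treat it as a "key=value; key=value" header string.
    pairs = (item.split("=", 1) for item in text.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


print(normalize_cookie("key1=value1; key2=value2"))
```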

Platforms that currently support Cookies:

  • Twitter / X
  • Instagram
  • YouTube
  • Bilibili
  • Douyin (抖音)
  • TikTok
  • Kuaishou (快手)

Download progress callback

from parsehub import ParseHub


class ProgressTracker:
    async def __call__(self, current: int, total: int, unit: str, *args, task_name: str = "", **kwargs):
        print(f"[{task_name}] {current}/{total} ({unit})")


result = ParseHub().download_sync(
    "https://example.com",
    path="./downloads",
    callback=ProgressTracker(),
    callback_args=("extra_arg",),
    callback_kwargs={"task_name": "demo"},
)

unit may be:

  • bytes: byte progress when downloading a single file
  • count: file-count progress when downloading multiple files
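
Since the meaning of current and total depends on unit, a callback may want to format the two cases differently. The sketch below does that; format_progress and on_progress are hypothetical names, and the callback is invoked directly here to simulate updates rather than being driven by a real download:

```python
import asyncio


def format_progress(current: int, total: int, unit: str) -> str:
    """Render one progress update; unit is expected to be "bytes" or "count"."""
    if unit == "bytes":
        percent = current / total * 100 if total else 0.0
        mib = 1_048_576
        return f"{current / mib:.1f}/{total / mib:.1f} MiB ({percent:.0f}%)"
    return f"{current}/{total} files"


async def on_progress(current: int, total: int, unit: str, *args, **kwargs) -> None:
    # Same positional layout as the ProgressTracker example above.
    print(format_progress(current, total, unit))


# Simulated updates, one per unit type.
asyncio.run(on_progress(524_288, 1_048_576, "bytes"))  # 0.5/1.0 MiB (50%)
asyncio.run(on_progress(2, 5, "count"))                # 2/5 files
```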

Saving metadata.json

from parsehub import ParseHub

result = ParseHub().download_sync(
    "https://example.com",
    path="./downloads",
    save_metadata=True,
)

print(result.output_dir / "metadata.json")
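
Once metadata.json has been written, it can be read back with the standard library. The fields it contains are not listed in this section, so the sketch below only assumes a UTF-8 JSON file with an object at the top level; load_metadata is a hypothetical helper and the demo writes its own stand-in file:

```python
import json
import tempfile
from pathlib import Path


def load_metadata(output_dir: Path) -> dict:
    """Read metadata.json from a download directory and return it as a dict."""
    with (Path(output_dir) / "metadata.json").open(encoding="utf-8") as fh:
        return json.load(fh)


# Demo with a stand-in file; the real metadata.json fields may differ.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "metadata.json").write_text(
        json.dumps({"title": "demo"}), encoding="utf-8"
    )
    metadata = load_metadata(Path(tmp))

print(sorted(metadata))
```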

Global configuration

from pathlib import Path
from parsehub.config import GlobalConfig

GlobalConfig.default_save_dir = Path("./downloads")

Error handling

from parsehub import ParseHub
from parsehub.errors import ParseError, UnknownPlatform

try:
    result = ParseHub().parse_sync("https://example.com")
except UnknownPlatform:
    print("This platform is not supported yet")
except ParseError as exc:
    print(f"Parsing failed: {exc}")
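
When parsing many links, the same try/except idea can be applied per URL so that one failure does not stop the batch. The following sketch takes the parse function as a parameter; parse_many and fake_parse are hypothetical names introduced for this example. With parsehub installed, one could pass ParseHub().parse_sync together with errors=(ParseError, UnknownPlatform).

```python
from typing import Callable, Dict, Iterable, Tuple


def parse_many(
    parse: Callable[[str], object],
    urls: Iterable[str],
    errors: Tuple[type, ...] = (Exception,),
) -> Tuple[Dict[str, object], Dict[str, BaseException]]:
    """Call parse(url) for each URL, collecting results and failures separately."""
    results: Dict[str, object] = {}
    failures: Dict[str, BaseException] = {}
    for url in urls:
        try:
            results[url] = parse(url)
        except errors as exc:
            failures[url] = exc
    return results, failures


def fake_parse(url: str) -> str:
    # Stand-in parser for the demo; it fails on URLs containing "bad".
    if "bad" in url:
        raise ValueError(f"cannot parse {url}")
    return url.upper()


ok, failed = parse_many(
    fake_parse,
    ["https://a.example", "https://bad.example"],
    errors=(ValueError,),
)
print(ok, failed)
```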

📜 License

This project is open-sourced under the MIT License.


If this project helps you, feel free to give it a ⭐ Star!
