An asynchronous crawler for Xiaohongshu (小红书) that supports batch downloading of note content and images
XHS Crawl
A content crawler for Xiaohongshu
Features
- Fetches post content
- Downloads images
- Asynchronous processing
- Automatic retry mechanism
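The README does not document how the automatic retry mechanism works internally. As a generic illustration of the idea, and not the library's actual implementation, here is a minimal sketch of retrying an async call with exponential backoff. The helper name `retry_async` and its parameters `attempts` and `base_delay` are hypothetical.

```python
import asyncio
import random


async def retry_async(func, *args, attempts=3, base_delay=0.5, **kwargs):
    """Call an async function, retrying on any exception with exponential backoff.

    Hypothetical helper for illustration; xhs-crawl's own retry logic may differ.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await func(*args, **kwargs)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... plus a little jitter
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

A wrapper like this could be placed around any coroutine function, for example `await retry_async(spider.get_post_data, url)`, if extra retries on top of the built-in ones were wanted.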
Installation
pip install xhs-crawl
Usage
Command-line tool
After installation, you can use the command-line tool directly to download the content of a Xiaohongshu post:
xhs-crawl "https://www.xiaohongshu.com/explore/[POST_ID]" -d "./downloads"
Parameters:
- The first argument is the Xiaohongshu post URL (required)
- -d or --dir: the directory where images are saved; defaults to ./downloads
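Since the command takes one URL per invocation, downloading several posts means running it several times. The sketch below shows one possible way to drive the CLI from Python with `subprocess`. The helper names `build_command` and `run_batch` are hypothetical, and it assumes `xhs-crawl` is on the PATH. Whether the tool returns a non-zero exit code on failure is not stated in this README, so treat the collected codes as informational.

```python
import subprocess


def build_command(url, out_dir="./downloads"):
    """Build the argument list for one xhs-crawl invocation (URL plus -d)."""
    return ["xhs-crawl", url, "-d", out_dir]


def run_batch(urls, out_dir="./downloads"):
    """Run xhs-crawl once per URL and return a {url: exit code} mapping.

    Assumes the xhs-crawl executable is installed and on the PATH.
    """
    results = {}
    for url in urls:
        completed = subprocess.run(build_command(url, out_dir))
        results[url] = completed.returncode
    return results
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as `?` or `&`.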
Calling from Python
You can also call it from Python code:
import asyncio

from xhs_crawl import XHSSpider


async def main():
    # Initialize the spider
    spider = XHSSpider()
    try:
        # Fetch the post data
        url = "https://www.xiaohongshu.com/explore/[POST_ID]"
        post = await spider.get_post_data(url)
        if post:
            print(f"Title: {post.title}")
            print(f"Content: {post.content}")
            print(f"Found {len(post.images)} images")
            # Download the images
            await spider.download_images(post, "./downloads")
    finally:
        # Close the client connection
        await spider.close()


if __name__ == "__main__":
    asyncio.run(main())
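For batch downloads, the same two calls can be run for several URLs at once. The following is a minimal sketch, assuming a single spider instance can safely be shared across concurrent tasks, which this README does not confirm. The function `crawl_many` and its `max_concurrency` parameter are hypothetical; only `get_post_data` and `download_images` come from the example above.

```python
import asyncio


async def crawl_many(spider, urls, out_dir="./downloads", max_concurrency=3):
    """Fetch and download several posts with a cap on concurrent requests.

    `spider` is any object exposing async get_post_data(url) and
    download_images(post, out_dir), as in the usage example.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def crawl_one(url):
        async with semaphore:  # at most max_concurrency posts in flight
            post = await spider.get_post_data(url)
            if post:
                await spider.download_images(post, out_dir)
            return post

    # return_exceptions=True keeps one failed URL from cancelling the rest
    return await asyncio.gather(
        *(crawl_one(url) for url in urls), return_exceptions=True
    )
```

The returned list is in the same order as `urls`; each entry is the post object, a falsy value if nothing was fetched, or the exception raised for that URL. Keeping `max_concurrency` small also helps limit load on the target site.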
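Returned data structure
The post object returned by the get_post_data method has the following attributes:
- post_id: the post ID
- title: the post title
- content: the body text of the post
- images: a list of URLs for the images contained in the post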
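To make the shape of this object concrete, here is a sketch of an equivalent structure as a dataclass. The class name `PostData`, the field types, and the `summarize` helper are assumptions for illustration; the library's actual class may be named and typed differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostData:  # hypothetical name; mirrors the four documented attributes
    post_id: str
    title: str
    content: str
    images: List[str] = field(default_factory=list)  # image URLs


def summarize(post):
    """Return a one-line summary using only the documented attributes."""
    return f"{post.post_id}: {post.title} ({len(post.images)} images)"
```

Because `summarize` relies only on attribute access, it would work equally well with the object returned by `get_post_data`, provided the attributes are as listed.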
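Notes
- Make sure the URL you provide is correctly formatted
- The download directory must be writable
- Keep the crawl rate reasonable to avoid putting pressure on the target site
- This tool is intended for learning and research only; please comply with applicable laws and regulations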
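The first two notes can be checked before any request is made. The sketch below is one possible pre-flight check. It assumes valid post URLs follow the `https://www.xiaohongshu.com/explore/<id>` form shown in the examples (other URL forms may exist that this README does not mention), and both helper names are hypothetical.

```python
import os
from urllib.parse import urlparse


def looks_like_post_url(url):
    """Check the URL against the /explore/<id> form used in this README."""
    parts = urlparse(url)
    prefix = "/explore/"
    post_id = parts.path[len(prefix):] if parts.path.startswith(prefix) else ""
    return (
        parts.scheme == "https"
        and parts.netloc == "www.xiaohongshu.com"
        and bool(post_id)
        and "/" not in post_id
    )


def ensure_writable_dir(path):
    """Create the directory if needed and report whether it is writable."""
    os.makedirs(path, exist_ok=True)
    return os.access(path, os.W_OK)
```

For the rate-control note, a simple approach is to add `await asyncio.sleep(...)` between requests or to limit how many posts are fetched concurrently.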
License
MIT License
File details
Details for the file xhs_crawl-0.1.5.tar.gz.
File metadata
- Download URL: xhs_crawl-0.1.5.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0bbcee3f9e1ce8111224cc69c4bb0ed6658b2071c6bcfd84fc321f67e07a7daa |
| MD5 | f0c4fe1b8411f14d95ef5d7be7d42aca |
| BLAKE2b-256 | 5c0ed9a4a0e0b1cf0e3001f6d9604d331c7c00dc14881021297d87fad94ce96c |
File details
Details for the file xhs_crawl-0.1.5-py3-none-any.whl.
File metadata
- Download URL: xhs_crawl-0.1.5-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa411c6d282ca106fcb078b34d9ea6fe3f5471aced8c53bcd95a63972e6532ce |
| MD5 | 9654ec9c52b20d76fb5267c2435780a6 |
| BLAKE2b-256 | 52dd5e1c74fb8d27577449a7f20c4ef7e0472aa246019da3760c4d5d8d6ca72e |