An asynchronous crawler for Xiaohongshu (小红书) that supports batch downloading of note content and images.
Project description
XHS Crawl
A content crawler for Xiaohongshu.
Features
- Scrapes post content
- Downloads post images
- Asynchronous processing
- Automatic retry mechanism
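The page does not say how the automatic retry works internally. As a rough illustration of the general idea only, the sketch below retries an async call with exponential backoff. The helper name `retry_async` and its parameters are hypothetical and are not part of the xhs-crawl API.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    func: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Hypothetical helper: await func() until it succeeds or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except Exception:
            if attempt == attempts:
                raise  # no attempts left: surface the last error
            # Exponential backoff (base_delay, 2x, 4x, ...) plus a little jitter.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

A caller could wrap any coroutine factory this way, for example `await retry_async(lambda: fetch(url))`, where `fetch` stands for whatever async request function is in use.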
Installation
pip install xhs-crawl
Usage
Command-line tool
Once installed, you can download the content of a Xiaohongshu post directly from the command line:
xhs-crawl "https://www.xiaohongshu.com/explore/[POST_ID]" -d "./downloads"
Arguments:
- The first argument is the URL of the Xiaohongshu post (required)
- -d or --dir: directory where images are saved; defaults to ./downloads
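To make the argument layout concrete, here is a minimal argparse sketch that accepts the same shape of command line: one required positional URL and an optional -d/--dir with the documented default. It only mirrors the interface described above and is an assumption about how such a parser could look, not the tool's actual source; `build_parser` and `parse_args` are illustrative names.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented CLI arguments."""
    parser = argparse.ArgumentParser(
        prog="xhs-crawl",
        description="Download the content and images of a Xiaohongshu post.",
    )
    # Required positional argument: the post URL.
    parser.add_argument("url", help="URL of the Xiaohongshu post")
    # Optional output directory with the documented default.
    parser.add_argument(
        "-d",
        "--dir",
        default="./downloads",
        help="directory where images are saved (default: ./downloads)",
    )
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)
```

With this sketch, `parse_args(["<post URL>"])` yields a namespace whose `dir` is `./downloads`, and passing `-d out` overrides it.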
Calling from Python
You can also use the crawler from your own Python code:
import asyncio

from xhs_crawl import XHSSpider


async def main():
    # Create the spider
    spider = XHSSpider()
    try:
        # Fetch the post data
        url = "https://www.xiaohongshu.com/explore/[POST_ID]"
        post = await spider.get_post_data(url)
        if post:
            print(f"Title: {post.title}")
            print(f"Content: {post.content}")
            print(f"Found {len(post.images)} images")
            # Download the images
            await spider.download_images(post, "./downloads")
    finally:
        # Close the client connection
        await spider.close()


if __name__ == "__main__":
    asyncio.run(main())
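Returned data structure
The post object returned by get_post_data has the following attributes:
- post_id: the post ID
- title: the post title
- content: the body text of the post
- images: a list of URLs for the images in the post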
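For readers who want a mental model of that object, the four attributes can be pictured as a small dataclass. This is only a sketch: the class name `PostSketch` and the field types are assumptions, and the real class shipped by the package may be defined differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostSketch:
    """Hypothetical stand-in with the four documented attributes."""

    post_id: str  # post ID
    title: str  # post title
    content: str  # body text of the post
    images: List[str] = field(default_factory=list)  # image URLs


# Placeholder values for illustration only.
example = PostSketch(
    post_id="abc123",
    title="Sample title",
    content="Sample body text",
    images=["https://example.com/1.jpg", "https://example.com/2.jpg"],
)
```

Code written against these four attribute names, such as `len(post.images)`, matches the usage shown in the Python example.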
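Notes
- Make sure the URL you provide is correctly formatted
- The download directory must be writable
- Keep your crawl rate reasonable to avoid putting load on the target site
- This tool is intended for learning and research only; comply with all applicable laws and regulations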
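The first and third notes can be combined into a small helper, sketched below. It assumes post URLs follow the https://www.xiaohongshu.com/explore/[POST_ID] shape used in the examples (other link forms may exist and would be skipped), and it waits a fixed delay between requests. `looks_like_post_url` and `fetch_politely` are hypothetical names, and `fetch` stands for any async function that takes a URL.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar
from urllib.parse import urlparse

T = TypeVar("T")


def looks_like_post_url(url: str) -> bool:
    """Return True if url matches the /explore/<id> shape shown in the examples."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    return (
        parsed.scheme in ("http", "https")
        and parsed.netloc == "www.xiaohongshu.com"
        and len(parts) == 2
        and parts[0] == "explore"
    )


async def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
    delay: float = 2.0,
) -> List[T]:
    """Fetch valid URLs one at a time, sleeping `delay` seconds between requests."""
    results: List[T] = []
    for url in urls:
        if not looks_like_post_url(url):
            continue  # skip anything that does not look like a post URL
        if results:
            await asyncio.sleep(delay)  # pause before every request after the first
        results.append(await fetch(url))
    return results
```

Passing `spider.get_post_data` as `fetch` would be one way to apply this to the spider shown earlier. The 2-second default is an arbitrary illustrative value, not a limit stated by the project.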
License
MIT License