An asynchronous Xiaohongshu crawler that supports batch downloading of note content and images
XHS Crawl
A content crawler for Xiaohongshu
Features
- Fetches post content
- Downloads images
- Asynchronous processing
- Automatic retry mechanism
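To illustrate what "asynchronous processing" combined with an "automatic retry mechanism" can look like, here is a minimal sketch. It is not the package's own retry logic, which this page does not describe; the helper name `retry_async`, its parameters, and the exponential backoff policy are all assumptions made for illustration.

```python
import asyncio
import random


async def retry_async(func, *args, retries=3, base_delay=0.5, **kwargs):
    """Await func(*args, **kwargs), retrying with exponential backoff.

    Hypothetical helper for illustration; not part of xhs-crawl.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return await func(*args, **kwargs)
        except Exception as exc:  # a real tool would catch narrower errors
            last_error = exc
            if attempt < retries - 1:
                # Wait base_delay, 2x, 4x, ... plus a little random jitter.
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
                await asyncio.sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A coroutine function such as a page fetch could then be wrapped as `await retry_async(fetch, url, retries=3)`, where `fetch` stands for any async callable.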
Installation
pip install xhs-crawl
Usage
Command-line tool
After installation, you can download the content of a Xiaohongshu post directly from the command line:
xhs-crawl "https://www.xiaohongshu.com/explore/[POST_ID]" -d "./downloads"
Arguments:
- The first argument is the Xiaohongshu post URL (required)
- -d or --dir: the directory where images are saved; defaults to ./downloads
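The argument layout above (one required positional URL plus an optional `-d`/`--dir` with a default) can be expressed with the standard library's `argparse`. This is only a sketch of how such an interface could be declared, assuming nothing about how xhs-crawl actually parses its arguments; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented CLI (illustrative only)."""
    parser = argparse.ArgumentParser(prog="xhs-crawl")
    # Required positional argument: the post URL.
    parser.add_argument("url", help="Xiaohongshu post URL (required)")
    # Optional output directory, with the documented default.
    parser.add_argument(
        "-d", "--dir",
        default="./downloads",
        help="directory where images are saved",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["https://www.xiaohongshu.com/explore/[POST_ID]", "-d", "./out"]
)
```

Here `args.url` holds the post URL and `args.dir` holds `./out`; omitting `-d` would leave `args.dir` at `./downloads`.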
Calling from Python
You can also call it from Python code:
import asyncio

from xhs_crawl import XHSSpider


async def main():
    # Initialize the spider
    spider = XHSSpider()
    try:
        # Fetch the post data
        url = "https://www.xiaohongshu.com/explore/[POST_ID]"
        post = await spider.get_post_data(url)
        if post:
            print(f"Title: {post.title}")
            print(f"Content: {post.content}")
            print(f"Found {len(post.images)} images")
            # Download the images
            await spider.download_images(post, "./downloads")
    finally:
        # Close the client connection
        await spider.close()


if __name__ == "__main__":
    asyncio.run(main())
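Since the tool is described as supporting batch downloads, the single-post example can be extended to several URLs. The sketch below uses only the `get_post_data` and `download_images` methods shown above and takes the spider as a parameter. The helper name `crawl_many`, the `max_concurrent` limit, and the assumption that one spider instance can safely serve concurrent calls are mine, not statements from the project.

```python
import asyncio


async def crawl_many(spider, urls, out_dir="./downloads", max_concurrent=3):
    """Fetch several posts with bounded concurrency; return the posts found.

    `spider` is expected to offer the get_post_data / download_images
    methods shown in the usage example (hypothetical helper).
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def crawl_one(url):
        # The semaphore caps how many posts are processed at once.
        async with semaphore:
            post = await spider.get_post_data(url)
            if post:
                await spider.download_images(post, out_dir)
            return post

    results = await asyncio.gather(*(crawl_one(u) for u in urls))
    # Drop URLs for which no post data was returned.
    return [post for post in results if post]
```

Inside `main()` this would be called as `posts = await crawl_many(spider, urls)`, still followed by `await spider.close()` in the `finally` block. A small `max_concurrent` keeps the request rate modest.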
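Returned data structure
The post object returned by the get_post_data method has the following attributes:
- post_id: the post ID
- title: the post title
- content: the body text of the post
- images: a list of URLs for the images in the post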
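For readers who want a concrete picture of that shape, the four attributes can be mirrored in a small dataclass. This is a stand-in for illustration only: the class name `PostSketch`, the field types, and the `summarize` helper are assumptions, and the library's real class may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostSketch:
    """Illustrative mirror of the documented attributes (types assumed)."""
    post_id: str
    title: str
    content: str
    images: List[str] = field(default_factory=list)


def summarize(post):
    """Build a one-line summary from any object with these attributes."""
    return f"{post.post_id}: {post.title} ({len(post.images)} images)"
```

Because `summarize` relies only on attribute access, it should work equally on an object returned by `get_post_data`, provided it exposes the attributes listed above.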
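Notes
- Make sure the URL you provide is correctly formatted
- The download directory must be writable
- Keep the crawl rate reasonable to avoid putting load on the target site
- This tool is intended for learning and research only; please comply with applicable laws and regulations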
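The first two notes can be checked before any request is made. The sketch below assumes the only accepted URL form is the `https://www.xiaohongshu.com/explore/<id>` pattern used in this document (other valid forms may exist), and both function names are hypothetical.

```python
import os
import tempfile
from urllib.parse import urlparse


def looks_like_post_url(url):
    """Return True if url matches the /explore/<id> form used in this README."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    return (
        parsed.scheme in ("http", "https")
        and parsed.netloc == "www.xiaohongshu.com"
        and len(parts) == 2
        and parts[0] == "explore"
    )


def ensure_writable_dir(path):
    """Create path if needed and confirm a file can be written inside it."""
    os.makedirs(path, exist_ok=True)
    # Creating (and auto-deleting) a temp file is a direct test of write access.
    with tempfile.TemporaryFile(dir=path):
        pass
    return os.path.abspath(path)
```

If the directory is not writable, `ensure_writable_dir` raises an `OSError` subclass such as `PermissionError`, which a caller can report before starting a download.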
License
MIT License