An asynchronous Xiaohongshu (RED) crawler that supports batch downloading of note content and images

XHS Crawl

A content crawler for Xiaohongshu

Features

  • Scrapes post content
  • Downloads images
  • Asynchronous processing
  • Automatic retry mechanism
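
The README does not say how the automatic retry works internally. As a rough illustration of what an asynchronous retry mechanism generally looks like, here is a minimal sketch using exponential backoff. The helper name `retry_async` and its parameters are hypothetical and are not part of xhs-crawl.

```python
import asyncio
import random


async def retry_async(make_call, attempts=3, base_delay=0.5):
    """Await make_call() up to `attempts` times, backing off between failures.

    Hypothetical helper for illustration only; not the xhs-crawl implementation.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles after each failure; a little random jitter
            # keeps simultaneous retries from firing at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
```

`make_call` is a zero-argument function returning a fresh awaitable, because a coroutine object cannot be awaited twice.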

Installation

pip install xhs-crawl

Usage

Command-line tool

Once installed, you can download the content of a Xiaohongshu post directly from the command line:

xhs-crawl "https://www.xiaohongshu.com/explore/[POST_ID]" -d "./downloads"

Arguments:

  • The first argument is the Xiaohongshu post URL (required)
  • -d, --dir: directory where images are saved; defaults to ./downloads
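
To make the argument layout concrete, the sketch below models the documented interface with `argparse`. It is only an illustration of one required positional URL plus an optional `-d`/`--dir` flag; the function name `build_parser` and the attribute names are assumptions, and the package's real parser may be built differently.

```python
import argparse


def build_parser():
    """Model of the documented CLI; an assumption, not the package's own code."""
    parser = argparse.ArgumentParser(prog="xhs-crawl")
    # Required positional argument: the post URL.
    parser.add_argument("url", help="Xiaohongshu post URL")
    # Optional output directory, with the documented default.
    parser.add_argument("-d", "--dir", default="./downloads",
                        help="directory where images are saved")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would download {args.url} into {args.dir}")
```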

Calling from Python

You can also call it from Python code:

import asyncio
from xhs_crawl import XHSSpider

async def main():
    # Initialize the spider
    spider = XHSSpider()

    try:
        # Fetch the post data
        url = "https://www.xiaohongshu.com/explore/[POST_ID]"
        post = await spider.get_post_data(url)

        if post:
            print(f"Title: {post.title}")
            print(f"Content: {post.content}")
            print(f"Found {len(post.images)} images")

            # Download the images
            await spider.download_images(post, "./downloads")
    finally:
        # Close the client connection
        await spider.close()

if __name__ == "__main__":
    asyncio.run(main())
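
For batch downloads, the same two calls can be run over several URLs with a cap on concurrency. The sketch below uses only the `get_post_data` and `download_images` methods shown above. The helper `crawl_many` and its `max_concurrency` parameter are hypothetical, and it assumes one spider instance can safely serve concurrent requests, which the README does not confirm.

```python
import asyncio


async def crawl_many(spider, urls, out_dir="./downloads", max_concurrency=3):
    """Fetch several posts with one spider, at most max_concurrency at a time.

    Hypothetical helper: `spider` is any object exposing the async methods
    get_post_data(url) and download_images(post, out_dir).
    """
    limit = asyncio.Semaphore(max_concurrency)

    async def crawl_one(url):
        async with limit:  # wait here while too many requests are in flight
            post = await spider.get_post_data(url)
            if post:
                await spider.download_images(post, out_dir)
            return post

    # gather() preserves input order, so results line up with `urls`.
    return await asyncio.gather(*(crawl_one(u) for u in urls))
```

Inside `main()` this would be called as `posts = await crawl_many(spider, urls)`, with `spider.close()` still in the `finally` block. A small `max_concurrency` also helps keep the crawl rate modest.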

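Returned data structure

The post object returned by the get_post_data method has the following attributes:

  • post_id: the post ID
  • title: the post title
  • content: the post body text
  • images: list of URLs of the images in the post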
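
The attribute list above can be pictured as a small record type. The dataclass below is a stand-in for illustration only: the name `PostSketch` and the field types are assumptions, and the class that xhs-crawl actually returns may be defined differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostSketch:
    """Illustrative stand-in mirroring the documented attributes."""
    post_id: str                                     # post ID (type assumed)
    title: str                                       # post title
    content: str                                     # post body text
    images: List[str] = field(default_factory=list)  # image URLs


def summarize(post):
    """Build a one-line summary from the four documented attributes."""
    return f"{post.post_id}: {post.title} ({len(post.images)} images)"
```

Because `summarize` relies only on attribute names, it should work equally on the real object returned by `get_post_data`, provided those attributes behave as documented.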

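Notes

  1. Make sure the URL you provide is correctly formatted
  2. The download directory must be writable
  3. Keep the crawl rate reasonable to avoid putting load on the target site
  4. This tool is for learning and research purposes only; please comply with applicable laws and regulations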
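
The first two notes can be checked before any request is made. The sketch below is one possible pre-flight check. The accepted URL shape is inferred from the example URLs in this README and may be narrower than what the tool itself accepts. Both function names are hypothetical.

```python
import os
import re
from urllib.parse import urlparse

# Assumed URL shape, taken from the examples in this README:
# https://www.xiaohongshu.com/explore/<post id>
_POST_PATH = re.compile(r"^/explore/[^/]+/?$")


def is_post_url(url):
    """Return True if url looks like the post URLs used in this README."""
    parts = urlparse(url)
    return (
        parts.scheme == "https"
        and parts.netloc == "www.xiaohongshu.com"
        and _POST_PATH.match(parts.path) is not None
    )


def ensure_writable_dir(path):
    """Create path if it is missing, then report whether it is writable."""
    os.makedirs(path, exist_ok=True)
    return os.access(path, os.W_OK)
```

Running both checks up front turns a malformed URL or a read-only directory into an immediate, clear failure rather than an error partway through a download.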

License

MIT License

Download files

Source Distribution

xhs_crawl-0.1.1.tar.gz (4.7 kB, Source)

Built Distribution

xhs_crawl-0.1.1-py3-none-any.whl (5.6 kB, Python 3)

File details

Details for the file xhs_crawl-0.1.1.tar.gz.

File metadata

  • Download URL: xhs_crawl-0.1.1.tar.gz
  • Upload date:
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0

File hashes

Hashes for xhs_crawl-0.1.1.tar.gz

  • SHA256: b52b7d5f72aa0af234c9bc035ae59512bcecf43e95f578fb7a97486dd2beafe1
  • MD5: 5dbc9813e31cb32891794fc17b5b3751
  • BLAKE2b-256: cbb8122f26b9c9c2247909027736587948f432a2379ef9c524e31d842662c17b

File details

Details for the file xhs_crawl-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: xhs_crawl-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0

File hashes

Hashes for xhs_crawl-0.1.1-py3-none-any.whl

  • SHA256: 19ace805192a32184342f57f05630b1e5dff8043b19eca0c377b1e8b91047d9d
  • MD5: 314834634ebb392f6bc832a2fc7498f9
  • BLAKE2b-256: c0378e2dc9a2bbb5ad79c1f113e3985b65626087088bffafe2009b8b6249ab38
