An asynchronous Xiaohongshu crawler that supports batch downloading of note content and images

XHS Crawl

A content crawler for Xiaohongshu

Features

  • Scrapes post content
  • Downloads post images
  • Asynchronous processing
  • Automatic retries
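The page does not say how the automatic retries work. As a generic illustration only (not xhs-crawl's actual code), an async retry with exponential backoff could look like this; `retry_async`, `attempts`, and `base_delay` are hypothetical names.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    func: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await func(); on an exception, try again up to `attempts` times in total,
    sleeping base_delay * 2**n seconds (plus a little jitter) between tries.

    Hypothetical helper, not part of xhs-crawl."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await func()
        except Exception:
            if attempt == attempts - 1:
                raise  # last try failed: re-raise the original error
            delay = base_delay * (2 ** attempt)
            # Add up to 10% random jitter so parallel retries do not line up
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Pass a zero-argument callable, e.g. `await retry_async(lambda: spider.get_post_data(url))`, so that each attempt creates a fresh coroutine.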

Installation

pip install xhs-crawl

Usage

Command-line tool

Once installed, you can download a Xiaohongshu post directly with the command-line tool:

xhs-crawl "https://www.xiaohongshu.com/explore/[POST_ID]" -d "./downloads"

Arguments:

  • The first argument is the Xiaohongshu post URL (required)
  • -d, --dir: directory in which to save images (default: ./downloads)
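If you are wrapping or scripting the tool, here is a hypothetical reconstruction of the documented interface using `argparse`: one required URL plus `-d`/`--dir` defaulting to `./downloads`. It is not the tool's source. The URL check assumes post links look like `https://www.xiaohongshu.com/explore/<POST_ID>`, as in the command example. `extract_post_id` and `build_parser` are illustrative names.

```python
import argparse
import re
from urllib.parse import urlparse

# Assumption: post URLs use the /explore/<POST_ID> form shown in the command example
_EXPLORE_PATH = re.compile(r"^/explore/([0-9A-Za-z]+)/?$")
_HOSTS = {"www.xiaohongshu.com", "xiaohongshu.com"}


def extract_post_id(url: str) -> str:
    """Return the post ID from an explore URL, or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or parsed.netloc not in _HOSTS:
        raise ValueError(f"not a xiaohongshu.com URL: {url!r}")
    # Query strings (e.g. tracking parameters) are ignored: only the path is checked
    match = _EXPLORE_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"no /explore/<POST_ID> path in: {url!r}")
    return match.group(1)


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented CLI: a required URL plus -d/--dir."""
    parser = argparse.ArgumentParser(prog="xhs-crawl")
    parser.add_argument("url", help="Xiaohongshu post URL (required)")
    parser.add_argument(
        "-d", "--dir",
        default="./downloads",
        help="directory in which to save images (default: ./downloads)",
    )
    return parser
```

Validating the URL up front gives a clear error before any network request is made.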

Calling from Python

You can also call it from Python code:

import asyncio
from xhs_crawl import XHSSpider

async def main():
    # Create the crawler
    spider = XHSSpider()

    try:
        # Fetch the post data
        url = "https://www.xiaohongshu.com/explore/[POST_ID]"
        post = await spider.get_post_data(url)

        if post:
            print(f"Title: {post.title}")
            print(f"Content: {post.content}")
            print(f"Found {len(post.images)} images")

            # Download the images
            await spider.download_images(post, "./downloads")
    finally:
        # Close the client connection
        await spider.close()

if __name__ == "__main__":
    asyncio.run(main())
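To process several posts with the methods used in the example (`get_post_data` and `download_images`), one option is to bound concurrency with `asyncio.Semaphore`. This sketch assumes a single `XHSSpider` instance can safely serve overlapping calls, which the description does not state; if in doubt, use `max_concurrency=1`. `crawl_many` is a hypothetical helper, not part of xhs-crawl.

```python
import asyncio
from typing import Any, List, Sequence


async def crawl_many(
    spider: Any,
    urls: Sequence[str],
    out_dir: str = "./downloads",
    max_concurrency: int = 2,
) -> List[Any]:
    """Fetch several posts and download their images, with at most
    max_concurrency posts in flight at once (hypothetical helper).

    Returns the posts in input order; an entry is whatever get_post_data
    returned, so it may be falsy where no post was found."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def handle(url: str) -> Any:
        async with semaphore:  # cap the number of posts processed concurrently
            post = await spider.get_post_data(url)
            if post:
                await spider.download_images(post, out_dir)
            return post

    # gather returns results in the same order as `urls`
    return list(await asyncio.gather(*(handle(u) for u in urls)))
```

Inside the `try` block of the example you could call `posts = await crawl_many(spider, urls)`, keeping `await spider.close()` in `finally`.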

Returned data

The post object returned by get_post_data has the following attributes:

  • post_id: the post ID
  • title: the post title
  • content: the post body text
  • images: a list of the image URLs in the post
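Because only these four attributes are documented, a helper that saves a post's text alongside its images should read nothing else. A minimal sketch, assuming `images` is a sequence of URL strings and `post_id` is safe to use as a file name; `save_post_metadata` is a hypothetical name, not part of xhs-crawl.

```python
import json
from pathlib import Path
from typing import Any


def save_post_metadata(post: Any, out_dir: str = "./downloads") -> Path:
    """Write the four documented post attributes to <out_dir>/<post_id>.json."""
    record = {
        "post_id": post.post_id,
        "title": post.title,
        "content": post.content,
        "images": list(post.images),  # copy into a list so it serializes as JSON
    }
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = directory / f"{post.post_id}.json"
    # ensure_ascii=False keeps Chinese text human-readable in the file
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```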

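Notes

  1. Make sure the URL you provide is correctly formatted.
  2. The download directory must be writable.
  3. Keep the crawl rate reasonable to avoid putting load on the target site.
  4. This tool is intended for learning and research only; comply with all applicable laws and regulations.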
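Note 3 asks you to keep the crawl rate reasonable. One simple approach, not built into xhs-crawl as far as this page says, is a shared limiter that enforces a minimum gap between requests. `MinIntervalLimiter` and its `interval` parameter are hypothetical.

```python
import asyncio
import time
from typing import Optional


class MinIntervalLimiter:
    """Spaces successive wait() completions at least `interval` seconds apart,
    across every coroutine that shares this limiter (hypothetical helper)."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock: Optional[asyncio.Lock] = None
        self._last = float("-inf")  # monotonic time of the previous release

    async def wait(self) -> None:
        # Created lazily so the lock belongs to the running event loop
        # (this matters on Python < 3.10, where Lock binds a loop at creation)
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:  # one caller at a time, so gaps hold globally
            delay = self._last + self.interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()
```

Call `await limiter.wait()` before each `spider.get_post_data(url)`.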

License

MIT License


Download files

Source Distribution

xhs_crawl-0.1.2.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

xhs_crawl-0.1.2-py3-none-any.whl (6.2 kB)

Uploaded Python 3

File details

Details for the file xhs_crawl-0.1.2.tar.gz.

File metadata

  • Download URL: xhs_crawl-0.1.2.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0

File hashes

Hashes for xhs_crawl-0.1.2.tar.gz
Algorithm Hash digest
SHA256 9929c16916a02ab01c01a0b59b4c687dc2db683202c5ffd544c3e2b05cc2aec3
MD5 1de0731911c61139ed3b992901ecfbd2
BLAKE2b-256 28306c2cc2c759f9ce0613ef9c1f8436ecb8d35bca4fe3e81809c5424e543d19

File details

Details for the file xhs_crawl-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: xhs_crawl-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 6.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0

File hashes

Hashes for xhs_crawl-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b32b757a9f65acc9881c7dd6271a771f3bb5445e66a036860e053152460ed52f
MD5 55ee481b565b5b8ed036caf70606f828
BLAKE2b-256 62e871fc9f810447bda9d2d2c9ea4ee6e686e12b18cefb38ad2be618498eab37
