An asynchronous Xiaohongshu crawler that supports batch downloading of note content and images

Project description

XHS Crawl

A content crawler for Xiaohongshu

Features

  • Scrapes post content
  • Downloads images
  • Asynchronous processing
  • Automatic retries
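
The package does not document how its retry mechanism works internally. As a rough illustration of what asynchronous retrying generally looks like, here is a minimal sketch; the helper name `retry_async`, its parameters, and the backoff policy are all assumptions made for this example and are not part of xhs-crawl.

```python
import asyncio
import random


async def retry_async(make_call, attempts=3, base_delay=0.5):
    """Await make_call() up to `attempts` times, backing off between failures.

    Hypothetical helper for illustration only; not the xhs-crawl implementation.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff (base_delay, 2x, 4x, ...) plus a little jitter
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
```

`make_call` is a zero-argument callable that returns a fresh awaitable on each attempt, since a coroutine object cannot be awaited twice.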

Installation

pip install xhs-crawl

Usage

Command-line tool

Once installed, you can download a Xiaohongshu post directly from the command line:

xhs-crawl "https://www.xiaohongshu.com/explore/[POST_ID]" -d "./downloads"

Arguments:

  • The first argument is the Xiaohongshu post URL (required)
  • -d, --dir: directory where images are saved; defaults to ./downloads
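
If you want to drive the command-line tool from a script, one option is to build the argument list described above and hand it to `subprocess`. This is only a sketch: `build_command` and `download_post` are hypothetical helper names, and it assumes the `xhs-crawl` executable is on your PATH after installation.

```python
import subprocess


def build_command(url, out_dir="./downloads"):
    """Return the argv list for the documented CLI: URL first, then -d <dir>."""
    return ["xhs-crawl", url, "-d", out_dir]


def download_post(url, out_dir="./downloads"):
    """Run the CLI and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(url, out_dir), check=True)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as `?` or `&`.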

Calling from Python

You can also call it from Python code:

import asyncio
from xhs_crawl import XHSSpider

async def main():
    # Initialize the crawler
    spider = XHSSpider()

    try:
        # Fetch the post data
        url = "https://www.xiaohongshu.com/explore/[POST_ID]"
        post = await spider.get_post_data(url)

        if post:
            print(f"Title: {post.title}")
            print(f"Content: {post.content}")
            print(f"Found {len(post.images)} images")

            # Download the images
            await spider.download_images(post, "./downloads")
    finally:
        # Close the client connection
        await spider.close()

if __name__ == "__main__":
    asyncio.run(main())
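
For several posts, the same two calls can be wrapped in a loop that pauses between requests. The sketch below is one possible approach, not something the package provides: `crawl_many` and its `delay` parameter are hypothetical, and it relies only on the `get_post_data` and `download_images` methods shown above.

```python
import asyncio


async def crawl_many(spider, urls, out_dir="./downloads", delay=1.0):
    """Fetch and download posts one at a time, sleeping `delay` seconds between them.

    `spider` is any object exposing the get_post_data / download_images
    coroutines used in the example above. Returns the URLs that yielded a post.
    """
    saved = []
    for index, url in enumerate(urls):
        if index:
            # Pause before every request except the first
            await asyncio.sleep(delay)
        post = await spider.get_post_data(url)
        if post:
            await spider.download_images(post, out_dir)
            saved.append(url)
    return saved
```

With the real crawler you would create `XHSSpider()`, await `crawl_many(spider, urls)` inside a `try` block, and call `await spider.close()` in `finally`, as in the example above.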

Returned data structure

The post object returned by get_post_data has the following attributes:

  • post_id: post ID
  • title: post title
  • content: post body text
  • images: list of image URLs in the post
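
To make the shape of that object concrete, the attributes listed above can be modeled as a small dataclass. This is a stand-in for illustration: the class name `PostSketch` and the field types are assumptions, and the class the package actually returns may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostSketch:
    """Hypothetical stand-in mirroring the four documented attributes."""
    post_id: str
    title: str
    content: str
    images: List[str] = field(default_factory=list)  # image URLs


def summarize(post):
    """Build a one-line summary from any object with the documented attributes."""
    return f"{post.post_id}: {post.title} ({len(post.images)} images)"
```

Because `summarize` only reads attributes, it should work equally on the real post object as long as it exposes the same names.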

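Notes

  1. Make sure the URL you provide is correctly formatted
  2. The download directory must be writable
  3. Keep the crawl rate reasonable to avoid putting load on the target site
  4. This tool is for learning and research only; comply with applicable laws and regulations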
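
The first two notes can be checked before starting a crawl. The sketch below assumes that a valid URL looks like the example used in this document (`https://www.xiaohongshu.com/explore/<id>`); other URL forms may also be accepted by the tool, so treat the check as a conservative guess. Both function names are hypothetical.

```python
import os
import tempfile
from urllib.parse import urlparse


def looks_like_post_url(url):
    """Return True if url matches https://www.xiaohongshu.com/explore/<id>."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    return (
        parts.scheme == "https"
        and parts.netloc == "www.xiaohongshu.com"
        and len(segments) == 2
        and segments[0] == "explore"
    )


def ensure_writable_dir(path):
    """Create path if needed and confirm a file can be written there.

    Raises OSError if the directory cannot be created or written to.
    """
    os.makedirs(path, exist_ok=True)
    # Creating a throwaway file is a direct test of write permission
    with tempfile.TemporaryFile(dir=path):
        pass
    return os.path.abspath(path)
```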

License

MIT License

Download files

Source Distribution

xhs_crawl-0.1.5.tar.gz (6.2 kB)

Uploaded Source

Built Distribution

xhs_crawl-0.1.5-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file xhs_crawl-0.1.5.tar.gz.

File metadata

  • Download URL: xhs_crawl-0.1.5.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0

File hashes

Hashes for xhs_crawl-0.1.5.tar.gz
Algorithm Hash digest
SHA256 0bbcee3f9e1ce8111224cc69c4bb0ed6658b2071c6bcfd84fc321f67e07a7daa
MD5 f0c4fe1b8411f14d95ef5d7be7d42aca
BLAKE2b-256 5c0ed9a4a0e0b1cf0e3001f6d9604d331c7c00dc14881021297d87fad94ce96c

File details

Details for the file xhs_crawl-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: xhs_crawl-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0

File hashes

Hashes for xhs_crawl-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 fa411c6d282ca106fcb078b34d9ea6fe3f5471aced8c53bcd95a63972e6532ce
MD5 9654ec9c52b20d76fb5267c2435780a6
BLAKE2b-256 52dd5e1c74fb8d27577449a7f20c4ef7e0472aa246019da3760c4d5d8d6ca72e
