An asynchronous Xiaohongshu crawler that supports batch downloading of note content and images

XHS Crawl

A crawler for Xiaohongshu content

Features

  • Scrapes post content
  • Downloads images
  • Asynchronous processing
  • Automatic retry mechanism
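
The asynchronous processing and automatic retry listed above can be pictured with a small sketch. This is not the package's own implementation, which the description does not show; `retry_async`, its parameters, and the exponential backoff policy are assumptions chosen for illustration.

```python
import asyncio


async def retry_async(func, *, attempts=3, base_delay=0.5):
    """Await the zero-argument coroutine function `func` until it succeeds.

    Hypothetical helper: gives up and re-raises after `attempts` failures.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


async def demo():
    calls = {"count": 0}

    async def flaky():
        # Simulated request that fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("simulated transient failure")
        return "ok"

    result = await retry_async(flaky, attempts=5, base_delay=0.01)
    return result, calls["count"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Backing off between attempts gives a temporarily overloaded server time to recover instead of hitting it again immediately.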

Installation

pip install xhs-crawl

Usage

Command-line tool

After installation, you can download the content of a Xiaohongshu post directly from the command line:

xhs-crawl "https://www.xiaohongshu.com/explore/[POST_ID]" -d "./downloads"

Parameters:

  • The first argument is the URL of the Xiaohongshu post (required)
  • -d, --dir: directory where images are saved; defaults to ./downloads
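
To make the argument layout concrete, here is a minimal argparse sketch that mirrors the interface described above: one required positional URL and an optional -d/--dir with a default of ./downloads. How the real xhs-crawl entry point parses its arguments is not shown in this description, so `build_parser` and the help strings are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line interface."""
    parser = argparse.ArgumentParser(
        prog="xhs-crawl",
        description="Download the content and images of a Xiaohongshu post.",
    )
    # Positional argument: the post URL (required).
    parser.add_argument("url", help="URL of the Xiaohongshu post")
    # Optional argument: where to save images.
    parser.add_argument(
        "-d",
        "--dir",
        default="./downloads",
        help="directory where images are saved (default: ./downloads)",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
example_args = build_parser().parse_args(
    ["https://www.xiaohongshu.com/explore/[POST_ID]", "-d", "./downloads"]
)
print(example_args.url, example_args.dir)
```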

Calling from Python code

You can also call it from Python code:

import asyncio
from xhs_crawl import XHSSpider

async def main():
    # Initialize the crawler
    spider = XHSSpider()

    try:
        # Fetch the post data
        url = "https://www.xiaohongshu.com/explore/[POST_ID]"
        post = await spider.get_post_data(url)

        if post:
            print(f"Title: {post.title}")
            print(f"Content: {post.content}")
            print(f"Found {len(post.images)} image(s)")

            # Download the images
            await spider.download_images(post, "./downloads")
    finally:
        # Close the client connection
        await spider.close()

if __name__ == "__main__":
    asyncio.run(main())
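
For batch downloads, the same two calls can be wrapped in a helper that limits concurrency. The sketch below is an assumption-laden illustration: `crawl_many`, `max_concurrency`, and `delay` are hypothetical names, and whether a single XHSSpider instance may safely be shared across concurrent tasks is not stated in this description. To stay runnable without network access, the demo uses a stand-in `FakeSpider` that exposes the same `get_post_data` and `download_images` methods.

```python
import asyncio
from types import SimpleNamespace


async def crawl_many(spider, urls, save_dir, max_concurrency=2, delay=1.0):
    """Fetch several posts, running at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def crawl_one(url):
        async with semaphore:
            post = await spider.get_post_data(url)
            if post:
                await spider.download_images(post, save_dir)
            # Pause before releasing the slot to keep the request rate low.
            await asyncio.sleep(delay)
            return post

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(crawl_one(u) for u in urls))


class FakeSpider:
    """Stand-in for the demo; performs no network I/O."""

    def __init__(self):
        self.downloaded = []

    async def get_post_data(self, url):
        post_id = url.rsplit("/", 1)[-1]
        return SimpleNamespace(post_id=post_id, title="demo", content="", images=[])

    async def download_images(self, post, save_dir):
        self.downloaded.append(post.post_id)


async def demo():
    spider = FakeSpider()
    urls = [f"https://www.xiaohongshu.com/explore/{i}" for i in ("a", "b", "c")]
    posts = await crawl_many(spider, urls, "./downloads", max_concurrency=2, delay=0)
    return [p.post_id for p in posts], sorted(spider.downloaded)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the real package, the spider argument would be the `XHSSpider()` instance from the example above, still closed in a `finally` block once the batch has finished.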

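Returned data structure

The post object returned by the get_post_data method has the following attributes:

  • post_id: the post ID
  • title: the post title
  • content: the body text of the post
  • images: a list of URLs of the images in the post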
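
One way to picture this structure is as a small dataclass with the four attributes listed above. The class name `PostData` and the field types are assumptions; the description names only the attributes, not the actual class the package returns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PostData:
    """Hypothetical model of the object returned by get_post_data."""

    post_id: str
    title: str
    content: str
    images: List[str] = field(default_factory=list)  # image URLs


# Sample values for illustration only.
post = PostData(
    post_id="[POST_ID]",
    title="Sample title",
    content="Sample body",
    images=["https://example.com/1.jpg"],
)
summary = f"{post.post_id}: {post.title} ({len(post.images)} image(s))"
print(summary)
```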

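Notes

  1. Make sure the URL you provide is correctly formatted
  2. The download directory must be writable
  3. Keep the crawl rate reasonable to avoid putting load on the target site
  4. This tool is intended for learning and research only; comply with applicable laws and regulations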
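
The first two notes can be checked before any request is sent. The sketch below assumes that a valid post URL has the https://www.xiaohongshu.com/explore/[POST_ID] shape used in the examples above; other URL forms may also be accepted by the tool. Both helper names are hypothetical and not part of the package.

```python
import os
import tempfile
from urllib.parse import urlparse


def looks_like_post_url(url):
    """Return True if `url` matches the /explore/<id> shape used in the examples."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    return (
        parts.scheme in ("http", "https")
        and parts.netloc == "www.xiaohongshu.com"
        and len(segments) == 2
        and segments[0] == "explore"
    )


def ensure_writable_dir(path):
    """Create `path` if needed and confirm that a file can be written there."""
    os.makedirs(path, exist_ok=True)
    # Creating a temporary file probes real write permission;
    # it is deleted automatically when the block exits.
    with tempfile.TemporaryFile(dir=path):
        pass
    return os.path.abspath(path)


if __name__ == "__main__":
    print(looks_like_post_url("https://www.xiaohongshu.com/explore/abc123"))
    print(ensure_writable_dir("./downloads"))
```

If the directory is not writable, `ensure_writable_dir` raises an `OSError` subclass such as `PermissionError`, which is easier to diagnose up front than a failure in the middle of a download.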

License

MIT License

Download files

Source Distribution

xhs_crawl-0.1.4.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

xhs_crawl-0.1.4-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file xhs_crawl-0.1.4.tar.gz.

File metadata

  • Download URL: xhs_crawl-0.1.4.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0

File hashes

Hashes for xhs_crawl-0.1.4.tar.gz
Algorithm Hash digest
SHA256 923a866a884385b19315b9e13db458727443a3ef0a6215c9e59c2cdee8a9987f
MD5 68d7b62c9cc76bd677e51027f170202e
BLAKE2b-256 38c902d8db74c2b31464fb9b911509c68a03cf4abf33737de40abeef90b9a6d5

File details

Details for the file xhs_crawl-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: xhs_crawl-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.5 CPython/3.9.0 Darwin/21.6.0

File hashes

Hashes for xhs_crawl-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 d2c6bfcc036a06acd7d7977747cd05df743ef7d87bdc05f67f57e686135e8d57
MD5 0363ff00f5618ed1191d9559bcfa10e5
BLAKE2b-256 64ef24ea80e1fa2600615f87d1333581cbd1e4abc0d9e00bf5305e2c6127031b
