
Crawler for Xiaohongshu (Little Red Book) notes, user profile pages, and note detail pages

Project description

Spider_XHS


Watermark-free crawler for the images and videos on Xiaohongshu user profile pages


Requirements

Python

Node.js

Usage: put every profile or note URL you want to crawl into the list


# Profile page processing
from xhs_spider.home import Home

home = Home()
url_list = [
    'https://www.xiaohongshu.com/user/profile/6185ce66000000001000705b',
    'https://www.xiaohongshu.com/user/profile/6034d6f20000000001006fbb',
]
home.main(url_list)
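The usage note mentions IDs while the list holds full URLs, so a small helper can build the URLs from bare user IDs. This is only a minimal sketch: `profile_urls` is a hypothetical helper, not part of xhs_spider, and the URL pattern is copied from the example above.

```python
# Hypothetical helper (not part of xhs_spider): build profile URLs from user IDs.
# The URL pattern is taken from the usage example above.
PROFILE_URL = 'https://www.xiaohongshu.com/user/profile/{user_id}'


def profile_urls(user_ids):
    """Return profile URLs for the given IDs, skipping blanks and duplicates."""
    seen = set()
    urls = []
    for user_id in user_ids:
        user_id = user_id.strip()
        if not user_id or user_id in seen:
            continue
        seen.add(user_id)
        urls.append(PROFILE_URL.format(user_id=user_id))
    return urls


url_list = profile_urls([
    '6185ce66000000001000705b',
    '6034d6f20000000001006fbb',
])
```

The resulting `url_list` can then be passed to `home.main(url_list)` as shown above.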

# Note processing
from xhs_spider.note import Note

# Assumption: the class exported by xhs_spider.note is Note, matching the import above.
one_note = Note()
url_list = [
    'https://www.xiaohongshu.com/explore/64356527000000001303282b',
]
one_note.main(url_list)
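Profile URLs and note URLs go to different classes, so a mixed list has to be separated first. The sketch below sorts URLs by their path prefix. `split_urls` is a hypothetical helper, and the two prefixes are assumptions based only on the example URLs shown here.

```python
from urllib.parse import urlparse


def split_urls(urls):
    """Split URLs into (profiles, notes, unknown) by path prefix.

    The prefixes are assumptions taken from the example URLs:
    '/user/profile/' for profile pages and '/explore/' for notes.
    """
    profiles, notes, unknown = [], [], []
    for url in urls:
        path = urlparse(url).path
        if path.startswith('/user/profile/'):
            profiles.append(url)
        elif path.startswith('/explore/'):
            notes.append(url)
        else:
            unknown.append(url)
    return profiles, notes, unknown


profiles, notes, unknown = split_urls([
    'https://www.xiaohongshu.com/user/profile/6185ce66000000001000705b',
    'https://www.xiaohongshu.com/explore/64356527000000001303282b',
])
```

The `profiles` list would then go to the profile crawler and `notes` to the note crawler, while `unknown` can be logged for manual inspection.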

# Search result processing
from xhs_spider.search import Search

search = Search()
query = '你好'
# Number of results to fetch (the first N)
number = 22
search.main(query, number)

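Changelog

  1. 23/08/08 First commit.

  2. 23/09/13 [API change: two fields added to params] Fixed images failing to download and errors caused by some pages being inaccessible.

  3. 23/09/16 [Encoding problem with larger videos] Fixed the video encoding issue and added exception handling.

  4. 23/09/18 Refactored the code and added retry on failure.

  5. 23/09/19 Added support for downloading search results.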
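The changelog mentions retry on failure without showing how it works. The following is a generic sketch of the idea, not the project's actual implementation; `retry` and its parameters are hypothetical names chosen for illustration.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() up to `attempts` times, sleeping `delay` seconds between tries.

    Hypothetical helper: re-raises the last exception once all attempts fail.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

A download call could be wrapped as `retry(lambda: download(url), attempts=3)`, where `download` stands for whatever function performs the request.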

Notes

This project is for learning and exchange only. Any infringing content will be removed on request.

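Other

  1. Put your own cookies into cookies.txt in the project directory. You can find them under Application in the browser developer tools or in the network requests. See the cookie.txt file for which cookies are needed.

  2. Obtain the cookie as shown in the project's screenshots, then run the corresponding file.

  3. Stars are welcome. The project is updated from time to time.

  4. For questions, get in touch via QQ or WeChat (992822653).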
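A cookie copied from the browser's network requests is usually a single `name=value; name2=value2` string. The sketch below parses such a string into a dictionary. It assumes that cookies.txt holds exactly that one string. The layout xhs_spider actually expects is not documented here, and both function names are hypothetical.

```python
from pathlib import Path


def parse_cookie_string(raw):
    """Parse 'a=1; b=2' into {'a': '1', 'b': '2'}, ignoring malformed parts."""
    cookies = {}
    for part in raw.split(';'):
        name, sep, value = part.strip().partition('=')
        if sep and name:
            cookies[name] = value
    return cookies


def load_cookies(path='cookies.txt'):
    """Read a cookie string from a file (assumed layout) and parse it."""
    return parse_cookie_string(Path(path).read_text(encoding='utf-8'))
```

Splitting on the first `=` only keeps values that themselves contain `=` intact.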

Download files


Source Distribution

xhs_spider-1.1.0.tar.gz (44.9 kB)
