Spiders for all

Crawl and download data and videos from sites such as bilibili and Xiaohongshu. Under active development.

https://github.com/iiicebearrr/spiders-for-all/assets/110714291/d28e67f5-8ff2-4e39-b6de-14434cfb9804

https://github.com/iiicebearrr/spiders-for-all/assets/110714291/4696cd19-0940-451c-9206-c03efe4a65a5

https://github.com/iiicebearrr/spiders-for-all/assets/110714291/53079374-6c28-4b41-9b89-ed6ab5fb40a2

Features

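  • bilibili

    • Download a video by bvid
    • Batch-download videos from a list of bvids or from a file
    • Download high-definition videos by supplying SESS_DATA
    • Built-in spiders with one-command download
      • Crawl and download videos from the 综合热门 (Popular) section
      • Crawl and download videos from the 每周必看 (Weekly Must-Watch) section
      • Crawl and download videos from the 入站必刷 (Must-Watch Classics) section
      • Crawl and download videos from each category of the 排行榜 (Rankings) section
      • Visualize the crawled data
      • Crawl all videos of a given uploader
  • Xiaohongshu (Coming soon)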

  • GUI(Coming soon)

Installation

pip install spiders-for-all # requires Python >= 3.10

Usage (bilibili)

Command line

List the built-in spiders

python -m spiders_for_all list-spiders

Run a spider

By spider name:

python -m spiders_for_all run-spider -n precious

Or by alias:

python -m spiders_for_all run-spider -n 入站必刷

Analyze the crawled data

python -m spiders_for_all data-analysis -n precious

Download a video by bvid

Note: before downloading videos, make sure ffmpeg is installed on your host. If you run the project with Docker, you can skip this step.

python -m spiders_for_all download-video -b BV1hx411w7MG -s ./videos_dl
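
Because downloading depends on ffmpeg, a quick pre-flight check can catch a missing install before a run fails. The following is a minimal sketch using only the Python standard library; the helper name `ffmpeg_available` is hypothetical and is not part of spiders-for-all.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on PATH."""
    # shutil.which returns the full path to the executable, or None if absent
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found; videos can be downloaded.")
    else:
        print("ffmpeg not found on PATH; install it before downloading.")
```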

Download videos with multiple threads

Pass multiple bvids:

python -m spiders_for_all download-videos -b BVID1 BVID2 -s ./videos_dl

Or pass a file containing a list of bvids, one per line:

bvid_list.txt:

BVID1
BVID2
...

python -m spiders_for_all download-videos -b bvid_list.txt -s ./videos_dl
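
If you want to reuse the same one-bvid-per-line file from your own Python code, it can be read with a few lines of standard-library code. This is only an illustrative sketch: `read_bvids` is a hypothetical helper, and skipping blank lines is a choice made here, not a statement about how the CLI parses the file.

```python
import tempfile
from pathlib import Path


def read_bvids(path: str) -> list[str]:
    """Read one bvid per line, ignoring blank lines and surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    # Write a small sample file with placeholder ids, then read it back
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "bvid_list.txt"
        sample.write_text("BVID1\nBVID2\n\n", encoding="utf-8")
        print(read_bvids(str(sample)))
```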

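Download high-definition videos with SESS_DATA

How to obtain SESS_DATA

  • Log in to bilibili in a web browser
  • Press F12 to open the developer tools
  • Refresh the page
  • Open the Network tab
  • Select any request that carries cookies
  • Copy the SESSDATA value from the Cookie field in the Request Headers

python -m spiders_for_all download-video -b BV1hx411w7MG -s ./videos_dl -d {SESS_DATA}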
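
If you copied the entire Cookie header rather than just the one value, the SESSDATA entry can be pulled out programmatically. The sketch below assumes the usual `name=value; name=value` header layout; `extract_sessdata` is a hypothetical helper, and the cookie values in the demo are placeholders, not real credentials.

```python
from typing import Optional


def extract_sessdata(cookie_header: str) -> Optional[str]:
    """Return the SESSDATA value from a raw Cookie header, or None if absent."""
    for pair in cookie_header.split(";"):
        # Each pair looks like "name=value"; split on the first "=" only
        name, sep, value = pair.strip().partition("=")
        if sep and name == "SESSDATA":
            return value
    return None


if __name__ == "__main__":
    header = "buvid3=placeholder; SESSDATA=placeholder-value; bili_jct=placeholder"
    print(extract_sessdata(header))
```

The returned string is what you would pass to the `-d` option shown above.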

Show help

python -m spiders_for_all --help

In code

Run a spider

from spiders_for_all.bilibili.spiders import PreciousSpider

if __name__ == '__main__':
    spider = PreciousSpider()
    spider.run()

Analyze the crawled data

from spiders_for_all.bilibili.analysis import Analysis
from spiders_for_all.bilibili import db

if __name__ == '__main__':
    analysis = Analysis(db.BilibiliPreciousVideos)
    analysis.show()

Download a video by bvid

from spiders_for_all.bilibili.download import Downloader

if __name__ == '__main__':
    downloader = Downloader(
        bvid='BV1hx411w7MG',
        save_dir='./videos_dl',
        sess_data="YOUR_SESS_DATA_HERE"
    )
    downloader.download()
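
To download several videos from code, the single-video call above can be wrapped in a thread pool. The following is a rough sketch rather than the library's own batch implementation (the `download-videos` command already covers that on the command line). `download_many` and its `download_one` parameter are hypothetical names; the sketch only relies on the standard-library `concurrent.futures` module.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Optional


def download_many(
    bvids: List[str],
    download_one: Callable[[str], None],
    max_workers: int = 4,
) -> Dict[str, Optional[BaseException]]:
    """Call download_one for each bvid in a thread pool.

    Returns a mapping of bvid to None on success, or to the raised exception.
    """
    results: Dict[str, Optional[BaseException]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, bvid): bvid for bvid in bvids}
        for future in as_completed(futures):
            bvid = futures[future]
            # future.exception() is None when the call finished without error
            results[bvid] = future.exception()
    return results


if __name__ == "__main__":
    # Stand-in callable so the sketch runs without network access
    def fake_download(bvid: str) -> None:
        print(f"would download {bvid}")

    print(download_many(["BVID1", "BVID2"], fake_download))
```

With the `Downloader` class shown above, `download_one` could be a small function that builds `Downloader(bvid=bvid, save_dir='./videos_dl', sess_data="YOUR_SESS_DATA_HERE")` and calls `.download()` on it. One failed video does not stop the others, since each exception is collected per bvid.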

Built-in spiders

Use list-spiders to list the built-in spiders:

python -m spiders_for_all list-spiders

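Note: spiders that accept parameters:

  • 每周必看 (Weekly Must-Watch, spiders_for_all.bilibili.spiders.WeeklySpider):

    • With no parameters, it crawls the latest issue by default
    • Use -p week {week} to choose which issue to crawl; for example, -p week 1 crawls the first issue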
