Welcome to bilibili-spiders

Crawl, download, and display video data from bilibili's Popular, Weekly Must-Watch, Must-Watch Classics (入站必刷), and Rankings lists.

https://github.com/iiicebearrr/bilibili-spiders/assets/110714291/3ac5859c-3a1b-4048-bff0-96c298caeccd

Installation

pip install spiders-for-all # requires Python >= 3.10

Usage

Command line

List the built-in spiders

python -m spiders_for_all list-spiders

Run a spider

Run by spider name:

python -m spiders_for_all run-spider -n precious

Or by alias:

python -m spiders_for_all run-spider -n 入站必刷

Analyze the crawled data

python -m spiders_for_all data-analysis -n precious
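The crawl and analysis commands above are usually run back to back for the same spider. As a minimal sketch, assuming the package is installed in the current interpreter, the two invocations can be chained from Python. The helper names `build_commands` and `crawl_then_analyze` are hypothetical and not part of the package; only the subcommands and the `-n` flag come from the commands shown above.

```python
import subprocess
import sys


def build_commands(spider_name: str) -> list[list[str]]:
    """Return argv lists for crawling and then analyzing one spider's data."""
    base = [sys.executable, "-m", "spiders_for_all"]
    return [
        base + ["run-spider", "-n", spider_name],
        base + ["data-analysis", "-n", spider_name],
    ]


def crawl_then_analyze(spider_name: str) -> None:
    """Run both commands in order; check=True stops at the first failure."""
    for argv in build_commands(spider_name):
        subprocess.run(argv, check=True)
```

Calling `crawl_then_analyze("precious")` would run the crawl first and start the analysis only if the crawl exits successfully.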

Download a video by bvid

Note: before downloading videos, make sure ffmpeg is installed on your host. If you run the tool with Docker, you can skip this step.

python -m spiders_for_all download-video -b BV1hx411w7MG -s ./videos_dl
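Since the download step depends on ffmpeg, it can help to check for it before starting. The following is a small sketch using only the standard library; the function name `find_ffmpeg` is hypothetical and the check is not something the package is known to perform itself.

```python
import shutil


def find_ffmpeg(executable: str = "ffmpeg") -> str:
    """Return the full path to ffmpeg, or raise if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        raise RuntimeError(
            f"{executable!r} was not found on PATH; "
            "install it before downloading videos"
        )
    return path
```

Running `find_ffmpeg()` before the download command gives a clear error up front instead of a failure partway through a download.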

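Download high-definition video by specifying SESS_DATA

How to get SESS_DATA

  • Log in to bilibili in a web browser
  • Press F12 to open the developer tools
  • Refresh the page
  • Open the Network tab
  • Select any request that includes cookies
  • Copy the SESSDATA value from the Cookie field in the Request Headers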
python -m spiders_for_all download-video -b BV1hx411w7MG -s ./videos_dl -d {SESS_DATA}
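The Cookie field copied from the developer tools usually contains many `name=value` pairs separated by semicolons, and only the SESSDATA value is needed. A minimal sketch for pulling it out of a pasted header string follows; the function name `extract_sessdata` and the sample cookie values are made up for illustration.

```python
from typing import Optional


def extract_sessdata(cookie_header: str) -> Optional[str]:
    """Return the SESSDATA value from a raw Cookie header, or None if absent."""
    for pair in cookie_header.split(";"):
        # Each pair looks like "name=value"; the value may itself contain "=".
        name, sep, value = pair.strip().partition("=")
        if sep and name == "SESSDATA":
            return value
    return None
```

The returned string is what goes in place of `{SESS_DATA}` in the command above.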

Show help

python -m spiders_for_all --help

Code

Run a spider

from spiders_for_all.bilibili.spiders import PreciousSpider

if __name__ == '__main__':
    spider = PreciousSpider()
    spider.run()

Analyze the crawled data

from spiders_for_all.bilibili.analysis import Analysis
from spiders_for_all.bilibili import db

if __name__ == '__main__':
    analysis = Analysis(db.BilibiliPreciousVideos)
    analysis.show()

Download a video by bvid

from spiders_for_all.bilibili.download import Downloader

if __name__ == '__main__':
    downloader = Downloader(
        bvid='BV1hx411w7MG',
        save_path='./videos_dl',
        sess_data="YOUR_SESS_DATA_HERE"
    )
    downloader.download()
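Because SESS_DATA is a login credential, you may prefer not to hard-code it in a script. One possible approach, sketched below, reads it from an environment variable and builds the keyword arguments shown above. The variable name `BILIBILI_SESS_DATA` and the helper `downloader_kwargs` are hypothetical, and the sketch assumes `sess_data` may be omitted when no credential is set (the command-line `-d` flag appears to be optional, but this is not confirmed for the class).

```python
import os
from typing import Mapping, Optional


def downloader_kwargs(
    bvid: str,
    save_path: str,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build keyword arguments for Downloader, adding sess_data only if set."""
    env = os.environ if env is None else env
    kwargs = {"bvid": bvid, "save_path": save_path}
    # BILIBILI_SESS_DATA is an illustrative name, not one the package defines.
    sess_data = env.get("BILIBILI_SESS_DATA", "").strip()
    if sess_data:
        kwargs["sess_data"] = sess_data
    return kwargs
```

The result could then be passed as `Downloader(**downloader_kwargs("BV1hx411w7MG", "./videos_dl"))`.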

Customize your own spider

from spiders_for_all.core.base import Spider


class CustomSpider(Spider):
    api = "Your api url to request"
    name = "Your spider name"
    alias = "Your spider alias"

    # database model to save all your crawled data
    database_model = YourDatabaseModel  # type: db.Base

    # item model to validate your crawled data
    item_model = YourItemModel  # type: pydantic.BaseModel

    # response model to validate your api response
    response_model = YourResponseModel  # type: pydantic.BaseModel

    def run(self):
        # Your spider logic here.
        # Note: You must implement this method.
        pass
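The template leaves the body of `run` open. As a library-independent sketch of the validate-and-convert step such a method typically needs, the example below checks a decoded JSON payload and turns each row into a typed item. Everything here is an assumption for illustration: the response shape `{"code": 0, "data": {"list": [...]}}`, the `VideoItem` fields, and the `parse_items` helper are hypothetical, and a dataclass stands in for the pydantic models the template expects so that the example needs only the standard library.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class VideoItem:
    """Hypothetical item shape; real field names depend on the API you crawl."""

    bvid: str
    title: str


def parse_items(payload: dict[str, Any]) -> list[VideoItem]:
    """Validate a decoded JSON payload and convert its rows into items."""
    if payload.get("code") != 0:
        raise ValueError(f"unexpected response code: {payload.get('code')!r}")
    rows = payload.get("data", {}).get("list", [])
    items = []
    for row in rows:
        # A missing key raises KeyError, so partial rows are never stored.
        items.append(VideoItem(bvid=str(row["bvid"]), title=str(row["title"])))
    return items
```

In a real spider, the equivalent checks would be expressed through `response_model` and `item_model`, and the resulting items would be written through `database_model`.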
