Welcome to bilibili-spiders
Crawl, download, and browse video data from bilibili's Popular (综合热门), Weekly Picks (每周必看), Must-Watch (入站必刷), and Ranking (排行榜) lists.
Installation
Using pip
git clone https://github.com/iiicebearrr/bilibili-spiders.git
cd bilibili-spiders
# Make sure your python version >= 3.12
python -m virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
Using Docker
git clone https://github.com/iiicebearrr/bilibili-spiders.git
cd bilibili-spiders
docker build -t bilibili-spiders .
docker run --name bs -d bilibili-spiders
You can now attach to the container and run commands through the CLI, or run them directly from your host, for example: docker exec bs python -m bilibili list-spiders
Usage
Command line
List the built-in spiders
python -m bilibili list-spiders
Run a spider
Before running a spider, initialize the database with python -m bilibili.db. This only needs to be done once.
Run by spider name:
python -m bilibili run-spider -n precious
Or by alias:
python -m bilibili run-spider -n 入站必刷
Analyze the crawled data
python -m bilibili data-analysis -n precious
Download a video by bvid
Note: before downloading videos, make sure ffmpeg is installed on your host. If you started the project via Docker, you can skip this step.
python -m bilibili download-video -b BV1hx411w7MG -s ./videos_dl
Download high-definition videos with SESS_DATA
How to get SESS_DATA:
- Log in to bilibili in your browser
- Press F12 to open the developer tools
- Refresh the page
- Open the Network tab
- Select any request that contains Cookies
- In the Request Headers, copy the SESSDATA value from the Cookie field
python -m bilibili download-video -b BV1hx411w7MG -s ./videos_dl -d {SESS_DATA}
Show help
python -m bilibili --help
Code
Run a spider
from bilibili.spiders import PreciousSpider

if __name__ == '__main__':
    spider = PreciousSpider()
    spider.run()
Analyze the crawled data
from spiders_for_all.bilibili.analysis import Analysis
from spiders_for_all.bilibili import db

if __name__ == '__main__':
    analysis = Analysis(db.BilibiliPreciousVideos)
    analysis.show()
Download a video by bvid
from bilibili.download import Downloader

if __name__ == '__main__':
    downloader = Downloader(
        bvid='BV1hx411w7MG',
        save_path='./videos_dl',
        sess_data="YOUR_SESS_DATA_HERE",
    )
    downloader.download()
Customize your own spider
from spiders_for_all.core.base import Spider


class CustomSpider(Spider):
    api = "Your api url to request"
    name = "Your spider name"
    alias = "Your spider alias"

    # Database model to save all your crawled data
    database_model = YourDatabaseModel  # type: db.Base

    # Item model to validate your crawled data
    item_model = YourItemModel  # type: pydantic.BaseModel

    # Response model to validate your api response
    response_model = YourResponseModel  # type: pydantic.BaseModel

    def run(self):
        # Your spider logic here.
        # Note: you must implement this method.
        pass
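The attributes above suggest the fetch → validate → store cycle that a run() implementation typically performs: request the api, validate each record against the item model, and persist rows via the database model. A rough, self-contained sketch of that cycle (every name here is hypothetical; the real base class and models come from spiders_for_all, and a real run() would fetch the response from self.api over HTTP rather than take it as an argument):

```python
# Hypothetical sketch of a run() cycle: parse an API response, validate
# each record, and store the rows. Plain stdlib stands in for the real
# pydantic item model and SQLAlchemy database model.
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class VideoItem:  # stands in for item_model; raises TypeError on bad records
    bvid: str
    title: str


def run(raw_response: str, conn: sqlite3.Connection) -> int:
    data = json.loads(raw_response)                       # parse the response
    items = [VideoItem(**row) for row in data["list"]]    # validate each record
    conn.execute("CREATE TABLE IF NOT EXISTS videos (bvid TEXT, title TEXT)")
    conn.executemany(                                     # persist the rows
        "INSERT INTO videos VALUES (?, ?)",
        [(item.bvid, item.title) for item in items],
    )
    return len(items)


conn = sqlite3.connect(":memory:")
n = run('{"list": [{"bvid": "BV1hx411w7MG", "title": "demo"}]}', conn)
print(n)  # 1
```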