Crawl light novels from [哔哩轻小说(linovelib)](https://w.linovelib.com/) and convert them to epub.
linovelib2epub
Crawl light novels from 哔哩轻小说(linovelib) and convert them to epub.
preview
A picture is worth a thousand words. Talk is cheap, show me the real effect.
This demo was recorded with a screen recorder tool.
Features
- flexible has_illustration and divide_volume options for epub output
- support for downloading a specific volume of a novel
- built-in http request retry mechanism to improve network fault tolerance
- built-in random browser user_agent through fake_useragent library
- built-in strict integrity check for image downloads
- built-in mechanism for saving temporary book data with the pickle library
- uses multiple processes to download images
- supports adding custom CSS styles to the epub
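As a rough illustration of the retry idea in the feature list above, here is a minimal sketch of a generic retry helper. It is not the package's actual implementation: `call_with_retries` and its parameters are hypothetical names chosen for this example, and the exponential delay is an assumption.

```python
import time


def call_with_retries(func, retries=5, base_delay=0.0, retry_on=(Exception,)):
    """Call func(); if it raises, retry up to `retries` more times.

    Hypothetical helper for illustration only. `retries` counts the extra
    attempts made after the first failure.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on as exc:
            last_error = exc
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                time.sleep(base_delay * (2 ** attempt))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

In a real crawler, `func` would wrap a single HTTP request, and `retry_on` would be narrowed to network-related exceptions so that programming errors are not retried.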
Supported Websites (plan)
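No. | Website | Language | Crawl difficulty | Support status | Notes |
---|---|---|---|---|---|
1 | 哔哩轻小说(Mobile) | Simplified/Traditional Chinese | Medium 😰 | :ok: | Default option. |
2 | 哔哩轻小说(Web) | Simplified/Traditional Chinese | Medium 😰 | 🚫 | Same resources as Mobile, so not needed. |
3 | 轻之国度 | Simplified/Traditional Chinese | High 🤣👿 | 🚫 | Login required, site-currency (轻币) threshold, confusing navigation categories. |
4 | 无限轻小说 | Traditional Chinese | Medium 😰 | ? | No login required. One chapter spans multiple pages. |
5 | 轻小说文库 | Simplified/Traditional Chinese | Medium 😰 | ? | Login required. One page per chapter. |
6 | 轻小说百科 | Simplified/Traditional Chinese | Low 😆 | ? | No login required, one page per chapter. Unfortunately, the illustrations are low resolution. |
7 | 真白萌 | Simplified/Traditional Chinese | Medium 😰 | ? | Login required, one page per chapter. |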
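Crawler friendliness is judged by two main criteria:
- 1. Access barrier: whether login and points are required.
- 2. Page structure: sites that render one chapter across multiple pages are rated medium difficulty.
If you find another good light novel source with rich resources, timely updates, clear illustrations, and a reasonable crawling barrier, feel free to suggest it in an issue.
To support another light novel source in the code, the key is to subclass the BaseNovelWebsiteSpider class and override its methods.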
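The subclass-and-override approach can be sketched as follows. This is only an assumption about the general shape: the `BaseNovelWebsiteSpider` defined here is a stand-in written for this example, and the `fetch` method, its arguments, and its return value are hypothetical. Check the real base class in the source code for the methods that actually need overriding.

```python
from abc import ABC, abstractmethod


class BaseNovelWebsiteSpider(ABC):
    """Stand-in for the project's base class; the real one may differ."""

    def __init__(self, base_url):
        self.base_url = base_url

    @abstractmethod
    def fetch(self, book_id):
        """Hypothetical hook: return the crawled data for one book."""


class ExampleSiteSpider(BaseNovelWebsiteSpider):
    """Hypothetical spider for another light novel source."""

    def fetch(self, book_id):
        # A real spider would request and parse the site's pages here.
        return {"book_id": book_id, "source": self.base_url, "volumes": []}
```

Keeping the site-specific logic inside one subclass means the epub-building code does not need to know which website the data came from.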
Usage
install from source
- clone this repo
git clone https://github.com/lightnovel-center/linovelib2epub.git
- set up a clean local python venv
See also: creating-virtual-environments
Replace py with your real Python command if needed, e.g. python or python3.
# create a new venv
py -m venv venv
# activate venv
.\venv\Scripts\activate
# install dependencies
py -m pip install -r requirements.txt
# install this package locally in editable mode
# run from the project root folder: linovelib2epub/
python -m pip install -e .
- Now you can use this package as if it were installed from PyPI.
from linovelib2epub.linovel import Linovelib2Epub

# Warning: this must run under the __main__ guard because worker processes are spawned.
if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=3279)
    linovelib_epub.run()
install from pypi
- Install this package from pypi:
pip install linovelib2epub
Or update to the latest version:
pip install linovelib2epub --upgrade
- create a python file and edit the content as follows:
from linovelib2epub.linovel import Linovelib2Epub

# Warning: this must run under the __main__ guard because worker processes are spawned.
if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=3279)
    linovelib_epub.run()
If it finishes without errors, the epub file is written to the folder where your Python file is located.
Options
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
book_id | number | YES | None | Book ID. |
base_url | string | NO | 'https://w.linovelib.com/novel' | Home page URL of 哔哩轻小说. |
divide_volume | boolean | NO | False | Whether to split the output by volume. |
select_volume_mode | boolean | NO | False | Volume selection mode. When True, divide_volume is forced to True. |
has_illustration | boolean | NO | True | Whether to download illustrations. |
image_download_folder | string | NO | "images" | Temporary folder for downloaded images. Must not start with the relative path ../. |
pickle_temp_folder | string | NO | "pickle" | Folder where temporary pickle data is saved. |
http_timeout | number | NO | 10 | Timeout for a single HTTP request, in seconds. Applies to both the connect and read timeouts. |
http_retries | number | NO | 5 | Maximum number of retries after an HTTP request fails. |
http_cookie | string | NO | '' | Custom HTTP cookie. |
custom_style_cover | string | NO | '' | Custom style for cover.xhtml. |
custom_style_nav | string | NO | '' | Custom style for nav.xhtml. |
custom_style_chapter | string | NO | '' | Custom style for each chapter (?.xhtml). |
disable_proxy | boolean | NO | True | Whether to disable the proxy configured in the current environment. The proxy is disabled by default. |
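The options in the table can be combined in one call. The sketch below assumes that every option is accepted as a keyword argument of `Linovelib2Epub`, in the same way as `book_id` in the usage examples. The values are illustrative only, and `OPTIONS` and `main` are names made up for this example.

```python
# Option names come from the table above; the values are illustrative.
OPTIONS = {
    "book_id": 3279,
    "divide_volume": True,       # write one epub per volume
    "has_illustration": False,   # skip image downloads
    "http_timeout": 20,          # seconds
    "http_retries": 3,
    "custom_style_chapter": "p { text-indent: 2em; }",
}


def main():
    # Imported here so the file can be loaded without the package installed.
    from linovelib2epub.linovel import Linovelib2Epub

    # Assumption: all options are passed as keyword arguments.
    linovelib_epub = Linovelib2Epub(**OPTIONS)
    linovelib_epub.run()
```

As in the usage examples, call `main()` only under an `if __name__ == '__main__':` guard.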
Todo
- quality: set up pytest and codecov
- quality: set up more formatters and linters for maintainability
Contributors
GokouRuri 🐛 💻 |
xxxfhy 🐛 |
lesfox 🐛 |
Holence 💻 |
Acknowledgements