Crawl light novels from [哔哩轻小说(linovelib)](https://w.linovelib.com/) and convert them to epub.
linovelib2epub
Crawl light novels from 哔哩轻小说(linovelib) and convert them to epub.
Preview
A picture is worth a thousand words. Talk is cheap, show me the real effect.
This demo was recorded with a screen recorder tool.
Features
- flexible has_illustration and divide_volume options for epub output
- support for downloading a single volume of a novel
- built-in HTTP request retry mechanism to improve network fault tolerance
- built-in random browser User-Agent via the fake_useragent library
- built-in strict integrity check for image downloads
- built-in mechanism for saving temporary book data with the pickle library
- multi-process image downloading
- support for adding custom CSS styles to the epub
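The retry mechanism is tuned by the http_retries and http_timeout options listed under Options. The sketch below shows the general idea of bounded retries with a per-attempt timeout. It is not linovelib2epub's actual implementation: the helpers fetch_with_retries and fetch_url, and the exponential backoff delay, are hypothetical choices made for illustration.

```python
import time
import urllib.request
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[float], T],
    retries: int = 5,
    timeout: float = 10,
    backoff: float = 0.5,
) -> T:
    """Call fetch(timeout); on a network error, retry up to `retries` more times.

    Hypothetical helper: the defaults mirror the documented http_retries=5 and
    http_timeout=10, but this is not the package's own code.
    """
    last_error: Optional[OSError] = None
    for attempt in range(retries + 1):  # 1 initial try + `retries` retries
        try:
            return fetch(timeout)
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * 2 ** attempt)  # assumed exponential backoff
    assert last_error is not None  # the loop ran at least once
    raise last_error


def fetch_url(url: str, timeout: float) -> bytes:
    """Download url with urllib; timeout (seconds) covers blocking connect/read calls."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example, fetch_with_retries(lambda t: fetch_url("https://w.linovelib.com/", t)) would try the request up to 6 times in total. Passing the timeout into the callable keeps the retry loop independent of the HTTP client.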
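Supported Websites (Planned)
No. | Website | Language | Crawling difficulty | Support status | Notes |
---|---|---|---|---|---|
1 | 哔哩轻小说(Mobile) | Simplified/Traditional | Medium😰 | :ok: | Default option. |
2 | 哔哩轻小说(Web) | Simplified/Traditional | Medium😰 | 🚫 | Same resources as Mobile; not needed. |
3 | 轻之国度 | Simplified/Traditional | High🤣👿 | 🚫 | Requires login and a light-coin (轻币) threshold; navigation categories are messy. |
4 | 无限轻小说 | Traditional | Medium😰 | ? | No login required. One chapter spans multiple pages. |
5 | 轻小说文库 | Simplified/Traditional | Medium😰 | ? | Requires login. One chapter per page. |
6 | 轻小说百科 | Simplified/Traditional | Low😆 | ? | No login required, one chapter per page. Unfortunately, the illustrations are low resolution. |
7 | 真白萌 | Simplified/Traditional | Medium😰 | ? | Requires login, one chapter per page. |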
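There are two key indicators of how crawler-friendly a site is:
- 1. Access barrier: whether login and points are required.
- 2. Page structure: a site that renders one chapter across multiple pages is rated medium difficulty.
If you find another good light novel source with rich resources, timely updates, clear illustrations, and a reasonable crawling barrier, feel free to suggest it in an issue.
In the code, supporting another light novel source mainly comes down to subclassing and overriding the BaseNovelWebsiteSpider class.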
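As a rough illustration of the subclass-and-override pattern, the sketch below defines its own stand-in base class, because the real BaseNovelWebsiteSpider interface is not documented here. The names ExampleBaseSpider, ExampleSiteSpider, fetch_chapter_urls, parse_chapter and crawl are all hypothetical; check the actual base class in the repository before writing a real spider.

```python
from abc import ABC, abstractmethod
from typing import Callable


class ExampleBaseSpider(ABC):
    """Hypothetical stand-in for BaseNovelWebsiteSpider (not the real API)."""

    def __init__(self, book_id: int) -> None:
        self.book_id = book_id

    @abstractmethod
    def fetch_chapter_urls(self) -> list[str]:
        """Return the chapter URLs of the book (site-specific)."""

    @abstractmethod
    def parse_chapter(self, html: str) -> str:
        """Extract the chapter text from one page (site-specific)."""

    def crawl(self, download: Callable[[str], str]) -> list[str]:
        # Shared workflow lives in the base class; subclasses only
        # supply the site-specific URL discovery and parsing.
        return [self.parse_chapter(download(url)) for url in self.fetch_chapter_urls()]


class ExampleSiteSpider(ExampleBaseSpider):
    """Hypothetical spider for a site that renders one chapter per page."""

    def fetch_chapter_urls(self) -> list[str]:
        return [f"https://example.com/book/{self.book_id}/{n}.html" for n in (1, 2)]

    def parse_chapter(self, html: str) -> str:
        return html.strip()
```

Injecting the download callable keeps the spider testable without network access; a real subclass would plug in an HTTP client instead.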
Usage
Install from source
- Clone this repo:
```shell
git clone https://github.com/lightnovel-center/linovelib2epub.git
```
- Set up a clean local Python venv.
See also: creating-virtual-environments
Replace py with your actual Python command if needed, e.g. python or python3.
```shell
# create a venv
py -m venv venv
# activate the venv (Windows; on Linux/macOS use: source venv/bin/activate)
.\venv\Scripts\activate
# install dependencies
py -m pip install -r requirements.txt
# install this package locally in editable mode
# (run under the project root folder: linovelib2epub/)
py -m pip install -e .
```
- Now you can use this package just as if it were installed from PyPI:
```python
from linovelib2epub.linovel import Linovelib2Epub

# Warning: this must run under the __main__ guard, because images are
# downloaded in child processes, which may re-import this module (spawn).
if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=3279)
    linovelib_epub.run()
```
Install from PyPI
- Install this package from PyPI:
```shell
pip install linovelib2epub
```
Or update to the latest version:
```shell
pip install linovelib2epub --upgrade
```
- Create a Python file with the following content:
```python
from linovelib2epub.linovel import Linovelib2Epub

# Warning: this must run under the __main__ guard, because images are
# downloaded in child processes, which may re-import this module (spawn).
if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=3279)
    linovelib_epub.run()
```
If it finishes without errors, the epub file will be in the folder where your Python file is located.
Options
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
book_id | number | YES | None | Book ID. |
base_url | string | NO | 'https://w.linovelib.com/novel' | Homepage URL of 哔哩轻小说 (linovelib). |
divide_volume | boolean | NO | False | Whether to divide the output by volume. |
select_volume_mode | boolean | NO | False | Volume selection mode. When True, divide_volume is forced to True. |
has_illustration | boolean | NO | True | Whether to download illustrations. |
image_download_folder | string | NO | "images" | Temporary folder for downloaded images. Must not start with the relative path ../. |
pickle_temp_folder | string | NO | "pickle" | Folder where temporary pickle data is saved. |
http_timeout | number | NO | 10 | Timeout of a single HTTP request (seconds), used as both the connect and read timeout. |
http_retries | number | NO | 5 | Maximum number of retries after an HTTP request fails. |
http_cookie | string | NO | '' | Custom HTTP cookie. |
custom_style_cover | string | NO | '' | Custom style for cover.xhtml. |
custom_style_nav | string | NO | '' | Custom style for nav.xhtml. |
custom_style_chapter | string | NO | '' | Custom style for each chapter (?.xhtml). |
disable_proxy | boolean | NO | True | Whether to bypass the proxy configured in the environment (bypassed by default). |
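The option rules in the table can be checked in one place before constructing Linovelib2Epub. The helper below is a hypothetical sketch (build_options is not part of the package): it merges overrides with the documented defaults, forces divide_volume to True when select_volume_mode is True, and rejects an image_download_folder that starts with ../. The package applies its own handling, so treat this only as a pre-check.

```python
from typing import Any

# Documented defaults of the optional parameters (taken from the table).
DEFAULTS: dict[str, Any] = {
    "base_url": "https://w.linovelib.com/novel",
    "divide_volume": False,
    "select_volume_mode": False,
    "has_illustration": True,
    "image_download_folder": "images",
    "pickle_temp_folder": "pickle",
    "http_timeout": 10,
    "http_retries": 5,
    "http_cookie": "",
    "custom_style_cover": "",
    "custom_style_nav": "",
    "custom_style_chapter": "",
    "disable_proxy": True,
}


def build_options(book_id: int, **overrides: Any) -> dict[str, Any]:
    """Hypothetical pre-check: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    options = {**DEFAULTS, **overrides, "book_id": book_id}
    # Documented rule: select_volume_mode=True forces divide_volume=True.
    if options["select_volume_mode"]:
        options["divide_volume"] = True
    # Documented rule: the image folder must not start with ../
    if options["image_download_folder"].startswith("../"):
        raise ValueError("image_download_folder must not start with ../")
    return options
```

Assuming the constructor accepts these keyword arguments as listed in the table, the result could be passed as Linovelib2Epub(**build_options(3279, select_volume_mode=True)) inside the __main__ guard shown under Usage.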
Todo
- quality: set up pytest and codecov
- quality: set up more formatters and linters for maintainability
Contributors
- GokouRuri 🐛 💻
- xxxfhy 🐛
- lesfox 🐛
- Holence 💻
Acknowledgements
File details
Details for the file linovelib2epub-0.1.3-py3-none-any.whl.
File metadata
- Download URL: linovelib2epub-0.1.3-py3-none-any.whl
- Upload date:
- Size: 37.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.7
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | a269d006f986e2b9836ad0f0eae4610ca7c5b02c8f6eed902419368f98560645 |
MD5 | ed80060ddc5a31d0993f3bd214be5cf1 |
BLAKE2b-256 | aa483b943df0ec22b1280a1c7b39aa893c4ab5abf0683407b59f77682b0fc5e1 |