Crawl light novels from [哔哩轻小说(linovelib)](https://w.linovelib.com/) and convert them to epub.

Project description

linovelib2epub

Crawl light novels from 哔哩轻小说(linovelib) and convert them to epub.

preview

A picture is worth a thousand words. Talk is cheap, show me the real effect.

This demo was recorded with a screen recorder tool.

Features

  • flexible has_illustration and divide_volume options for epub output
  • supports downloading a specific volume of a novel
  • built-in HTTP request retry mechanism to improve network fault tolerance
  • built-in random browser user agent via the fake_useragent library
  • built-in strict integrity check for image downloads
  • built-in mechanism for saving temporary book data with the pickle library
  • uses multiple processes to download images
  • supports adding custom CSS styles to the epub
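
As an illustration of the retry feature listed above, here is a minimal sketch of retrying a failing call with a growing delay. It is not the package's actual implementation: the helper `call_with_retries`, its `base_delay` parameter, and the simulated `flaky_fetch` function are all hypothetical names made up for this example. Only the default of 5 retries mirrors the `http_retries` option.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(func: Callable[[], T], retries: int = 5, base_delay: float = 0.5) -> T:
    """Call func once, then retry up to `retries` more times if it raises.

    Hypothetical helper for illustration; not part of linovelib2epub.
    """
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as error:  # a real crawler would catch narrower error types
            last_error = error
            if attempt < retries:
                # Wait longer after each failure: base_delay, 2x, 4x, ...
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {retries} retries") from last_error


# Simulated request that fails twice before succeeding (no real network access).
attempts = {"count": 0}


def flaky_fetch() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "page content"


result = call_with_retries(flaky_fetch, retries=5, base_delay=0.0)
```

The delay is set to zero here only so the example runs instantly. A real crawler would keep a non-zero delay to avoid hammering the site.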

Supported Websites (plan)

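| No. | Website | Language | Crawl difficulty | Support status | Notes |
|---|---|---|---|---|---|
| 1 | 哔哩轻小说 (Mobile) | Simplified/Traditional | Medium 😰 | 🆗 | Default option. |
| 2 | 哔哩轻小说 (Web) | Simplified/Traditional | Medium 😰 | 🚫 | Same resources as Mobile, so not needed. |
| 3 | 轻之国度 | Simplified/Traditional | High 🤣👿 | 🚫 | Login required; gated by 轻币 (site coins); navigation categories are messy. |
| 4 | 无限轻小说 | | Medium 😰 | | No login required. One chapter spans multiple pages. |
| 5 | 轻小说文库 | Simplified/Traditional | Medium 😰 | | Login required. One page per chapter. |
| 6 | 轻小说百科 | Simplified/Traditional | Low 😆 | | No login required, one page per chapter. Unfortunately, illustrations are low resolution. |
| 7 | 真白萌 | Simplified/Traditional | Medium 😰 | | Login required, one page per chapter. |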

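Crawler friendliness is judged by two key indicators:

  • 1. Access barrier: whether login and points are required.
  • 2. Page structure: a site that renders one chapter across multiple pages is rated medium difficulty.

If you find another good light novel source with rich resources, timely updates, clear illustrations, and a reasonable crawling barrier, feel free to open an issue to suggest it.

In the code, the key to supporting other light novel sources is to inherit from the BaseNovelWebsiteSpider class and override it.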
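
The inherit-and-override pattern around BaseNovelWebsiteSpider can be sketched as follows. The README does not show the real class's methods, so this example defines its own stand-in base class. `BaseSpiderStandIn`, `fetch`, `Novel`, `Chapter`, and `ExampleSiteSpider` are all hypothetical names chosen for illustration, and the real base class may look quite different.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chapter:
    title: str
    content: str


@dataclass
class Novel:
    book_id: int
    title: str
    chapters: List[Chapter] = field(default_factory=list)


class BaseSpiderStandIn(ABC):
    """Hypothetical stand-in for BaseNovelWebsiteSpider (NOT the project's real API)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    @abstractmethod
    def fetch(self, book_id: int) -> Novel:
        """Return the novel identified by book_id."""


class ExampleSiteSpider(BaseSpiderStandIn):
    """Hypothetical spider for a new source; returns canned data instead of crawling."""

    def fetch(self, book_id: int) -> Novel:
        # A real spider would request and parse the site's pages here.
        chapters = [Chapter(title="Chapter 1", content="<p>Hello</p>")]
        return Novel(book_id=book_id, title=f"Book {book_id}", chapters=chapters)


spider = ExampleSiteSpider(base_url="https://example.invalid/novel")
novel = spider.fetch(3279)
```

Keeping site-specific logic inside the subclass means the shared steps, such as writing the epub, would not need to change when a new source is added.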

Usage

install from source

  1. clone this repo
git clone https://github.com/lightnovel-center/linovelib2epub.git
  2. set up a clean local Python venv

See also: creating-virtual-environments

Replace py with your actual Python command if needed, e.g. python or python3.

# create a new venv
py -m venv venv

# activate venv (Windows; on Linux/macOS use: source venv/bin/activate)
.\venv\Scripts\activate

# install dependencies
py -m pip install -r requirements.txt

# install this package locally in editable mode
# run from the project root folder: linovelib2epub/
py -m pip install -e .
  3. Now you can use this package just as if it were installed from PyPI.
from linovelib2epub.linovel import Linovelib2Epub

# Warning: this must run inside the __main__ guard because the package spawns worker processes.
if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=3279)
    linovelib_epub.run()

install from pypi

  1. Install this package from pypi:
pip install linovelib2epub

Or update to the latest version:

pip install linovelib2epub --upgrade
  2. Create a Python file with the following content:
from linovelib2epub.linovel import Linovelib2Epub

# Warning: this must run inside the __main__ guard because the package spawns worker processes.
if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=3279)
    linovelib_epub.run()

If it finishes without errors, the epub file appears in the folder where your Python file is located.

Options

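| Parameters | type | required | default | description |
|---|---|---|---|---|
| book_id | number | YES | None | Book ID. |
| base_url | string | NO | 'https://w.linovelib.com/novel' | Home page URL of 哔哩轻小说 (linovelib). |
| divide_volume | boolean | NO | False | Whether to split the output by volume. |
| select_volume_mode | boolean | NO | False | Volume selection mode. When True, divide_volume is forced to True. |
| has_illustration | boolean | NO | True | Whether to download illustrations. |
| image_download_folder | string | NO | "images" | Temporary folder for downloaded images. Must not start with the relative path ../. |
| pickle_temp_folder | string | NO | "pickle" | Folder for temporary pickle data. |
| http_timeout | number | NO | 10 | Timeout in seconds for a single HTTP request. Applies to both the connect and read timeouts. |
| http_retries | number | NO | 5 | Maximum number of retries after an HTTP request fails. |
| http_cookie | string | NO | '' | Custom HTTP cookie. |
| custom_style_cover | string | NO | '' | Custom style for cover.xhtml. |
| custom_style_nav | string | NO | '' | Custom style for nav.xhtml. |
| custom_style_chapter | string | NO | '' | Custom style for each chapter (?.xhtml). |
| disable_proxy | boolean | NO | True | Whether to ignore the proxy settings of the environment. The proxy is disabled by default. |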
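
Two rules in the table interact: select_volume_mode forces divide_volume to True, and image_download_folder must not start with ../. The sketch below shows one way to assemble and check a set of options before passing them on. `build_options` is a hypothetical helper written for this example, not part of the package, and it covers only a subset of the parameters. The package's own validation may behave differently.

```python
from typing import Any, Dict


def build_options(book_id: int, **overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults and apply the two table rules.

    Hypothetical helper for illustration; not part of linovelib2epub.
    """
    options: Dict[str, Any] = {
        "book_id": book_id,
        "divide_volume": False,
        "select_volume_mode": False,
        "has_illustration": True,
        "image_download_folder": "images",
        "http_timeout": 10,
        "http_retries": 5,
    }
    unknown = set(overrides) - set(options)
    if unknown:
        raise TypeError(f"unknown options: {sorted(unknown)}")
    options.update(overrides)

    # Rule from the table: volume selection implies a divided output.
    if options["select_volume_mode"]:
        options["divide_volume"] = True

    # Rule from the table: the image folder must not start with '../'.
    if str(options["image_download_folder"]).startswith("../"):
        raise ValueError("image_download_folder must not start with '../'")
    return options


options = build_options(3279, select_volume_mode=True, has_illustration=False)
```

Assuming the constructor accepts the table's parameters as keyword arguments, as it does for book_id in the usage examples, the result could be passed as Linovelib2Epub(**options) inside the __main__ guard.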

Todo

  • quality: set up pytest and codecov
  • quality: set up more formatters and linters for maintainability

Contributors

GokouRuri 🐛 💻
xxxfhy 🐛
lesfox 🐛
Holence 💻

Acknowledgements

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

linovelib2epub-0.1.3-py3-none-any.whl (37.7 kB view details)

Uploaded Python 3

File details

Details for the file linovelib2epub-0.1.3-py3-none-any.whl.

File metadata

File hashes

Hashes for linovelib2epub-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a269d006f986e2b9836ad0f0eae4610ca7c5b02c8f6eed902419368f98560645
MD5 ed80060ddc5a31d0993f3bd214be5cf1
BLAKE2b-256 aa483b943df0ec22b1280a1c7b39aa893c4ab5abf0683407b59f77682b0fc5e1
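
A downloaded wheel can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: the helper names `sha256_of_file` and `matches_expected` are made up for this example, and the file path in the usage note assumes the wheel was saved to the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for linovelib2epub-0.1.3-py3-none-any.whl
EXPECTED_SHA256 = "a269d006f986e2b9836ad0f0eae4610ca7c5b02c8f6eed902419368f98560645"


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring letter case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_expected(Path("linovelib2epub-0.1.3-py3-none-any.whl"), EXPECTED_SHA256) should return True for an intact download.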
