Yet another manga scraper and downloader
tankobon
What?
tankobon is a website scraper for comics and manga. It relies on stores, which define how to parse a website for its chapters and how to parse each chapter for links to its pages (somewhat like youtube-dl extractors). The following websites are currently supported:
komi-san.com
m.mangabat.com
mangadex.org
mangakakalot.com
Creating a Store
A store is a regular Python module in the stores/ folder. It should provide a Parser class, which is a subclass of tankobon.manga.Parser.
The following methods must be implemented:
chapters(self) -> Generator[Tuple[str, Dict[str, str]], None, None]
Yields chapter_info, which looks like this:
    {
        "id": ...,  # chapter number
        "title": ...,  # chapter title
        "url": ...,  # chapter url
        "volume": ...,  # volume, e.g. '0'
    }
Volume is optional and may be undefined. Example:
    def chapters(self):
        # use self.soup to access the title page
        for href in self.soup.find_all("a", href=True):
            # validate href here and parse the chapter id
            ...
            yield {"id": ..., "title": href.text, "url": href["href"]}
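The validation step elided in the example above depends on each site's URL layout. As a minimal sketch, assuming a hypothetical site whose chapter links end in chapter-12 or chapter-12.5, a helper could look like the following. The name chapter_info_from_link and the URL pattern are illustrative assumptions, not part of tankobon.

```python
import re
from typing import Dict, Optional

# Assumed URL layout: ".../chapter-<number>", optionally followed by "/".
# Real sites differ, so the pattern has to be adjusted per store.
CHAPTER_RE = re.compile(r"/chapter-(\d+(?:\.\d+)?)/?$")


def chapter_info_from_link(url: str, text: str) -> Optional[Dict[str, str]]:
    """Return a chapter_info dict for a chapter link, or None for any other link."""
    match = CHAPTER_RE.search(url)
    if match is None:
        return None
    # "volume" is optional, so it is left out here.
    return {"id": match.group(1), "title": text.strip(), "url": url}
```

Inside chapters(), a None result would mean the link is skipped; any other result can be yielded directly.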
pages(self, chapter_data: Dict[str, str]) -> List[str]
Return a list of URLs to a chapter's pages, given the chapter data yielded from chapters().
The pages must be in order (page 1 is [0], page 2 is [1], etc.). Example:
    def pages(self, chapter_data):
        pages = []
        # to get the chapter's HTML, use self.session.get (requests session)
        # or self.soup (HTML already parsed by BeautifulSoup).
        chapter_page = self.soup_from_url(chapter_data["url"])
        for href in chapter_page.find_all("a", href=True):
            # validate href here
            ...
            pages.append(href["href"])
        return pages
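Links are not always listed in reading order, so the ordering requirement may need an explicit sort. A minimal sketch, assuming page image URLs end in a page number followed by an image extension (for example .../007.jpg), is shown below. The helper name order_pages and the URL pattern are hypothetical and not part of tankobon.

```python
import re
from typing import List

# Assumption: page image URLs end in "<page number>.<extension>", e.g. ".../007.jpg".
PAGE_RE = re.compile(r"(\d+)\.(?:jpe?g|png|webp|gif)$", re.IGNORECASE)


def order_pages(urls: List[str]) -> List[str]:
    """Keep only URLs that look like page images and sort them by page number."""
    numbered = []
    for url in urls:
        match = PAGE_RE.search(url)
        if match is not None:
            numbered.append((int(match.group(1)), url))
    # Sort numerically so that page 10 comes after page 9, not after page 1.
    numbered.sort(key=lambda pair: pair[0])
    return [url for _, url in numbered]
```

Sorting on the integer value rather than on the URL string avoids the usual lexicographic trap where "10.jpg" sorts before "2.jpg".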
The following methods are optional; generic implementations are provided.
title(self) -> str
Return the title of the manga. Example:
    def title(self):
        # self.soup.title is the <title> tag; .text gives its contents as a str
        return self.soup.title.text
Index Compatibility
Between versions v3.1.0a1 and v3.2.0a0, the location of the index file moved from site-packages to ~/.tankobon/index.json, specific to each install of tankobon.
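For anyone carrying an index across that change, the move could be scripted. The sketch below is only an illustration: the helper name migrate_index is hypothetical, and the old path inside site-packages is not stated here, so it has to be supplied by the caller.

```python
import shutil
from pathlib import Path


def migrate_index(old_index: Path, new_index: Path) -> bool:
    """Move old_index to new_index unless new_index already exists.

    Returns True if the file was moved, False otherwise.
    """
    if new_index.exists() or not old_index.is_file():
        return False
    # Create ~/.tankobon (or whichever parent folder is used) if it is missing.
    new_index.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_index), str(new_index))
    return True
```

The new location would be Path.home() / ".tankobon" / "index.json". An existing file there is never overwritten.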
Todo
- download pre-parsed indexes from a special GitHub repo (tankobon-index?)
- create GUI to make downloading easier (like youtube-DLG)
Usage
tankobon download 'https://komi-san.com' # download all chapters
tankobon store info 'komi_san/https://komi-san.com' # and then get info on the chapters
Install
python(3) -m pip install tankobon
Build
All my Python projects now use flit to build and publish.
To build, run flit build.
License
MIT.