Yet another manga scraper and downloader

tankobon

What?

tankobon is a website scraper for comics and manga. It relies on stores, which define how to parse a website for chapters, and each chapter for links to its pages (somewhat like youtube-dl extractors). Currently, the following websites are supported:

  • komi-san.com
  • m.mangabat.com
  • mangadex.org
  • mangakakalot.com

Creating a Store

A store is a regular Python module in the stores/ folder. It should provide a Parser class, which is a subclass of tankobon.manga.Parser. The following methods must be implemented:

chapters(self) -> Generator[Dict[str, str], None, None]

Yields one chapter_info dict per chapter, which looks like this:

{
    "id": ...,  # chapter number
    "title": ...,  # chapter title
    "url": ...,  # chapter url
    "volume": ...,  # volume, e.g. '0'
}

The volume key is optional and may be omitted. Example:

def chapters(self):
    # use self.soup to access the title page
    for href in self.soup.find_all("a", href=True):
        # validate href here and parse chapter id
        ...
        yield {"id": ..., "title": href.text, "url": href["href"]}
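
The "validate href and parse chapter id" step depends entirely on the site. As a minimal sketch, assuming a hypothetical URL layout where chapter links end in chapter-<number>, it could look like the helper below. The name chapter_info_from_link, the regular expression, and the example.com URLs are illustrative assumptions, not part of tankobon:

```python
import re
from typing import Dict, Optional
from urllib.parse import urljoin

# Hypothetical layout: chapter links end in "chapter-<number>",
# e.g. "/manga/komi/chapter-12.5". Real sites will differ.
CHAPTER_RE = re.compile(r"chapter[-_/](\d+(?:\.\d+)?)/?$")


def chapter_info_from_link(base_url: str, href: str, text: str) -> Optional[Dict[str, str]]:
    """Return a chapter_info dict for a link, or None if it is not a chapter link."""
    match = CHAPTER_RE.search(href)
    if match is None:
        return None  # not a chapter link: skip it
    return {
        "id": match.group(1),
        "title": text.strip(),
        "url": urljoin(base_url, href),  # resolve relative links
    }
```

Inside chapters(), each link would be passed through such a helper and yielded only when the result is not None.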

pages(self, chapter_data: Dict[str, str]) -> List[str]

Return a list of URLs to a chapter's pages, given the chapter data yielded from chapters(). The pages must be in order (page 1 is [0], page 2 is [1], etc.). Example:

def pages(self, chapter_data):
    pages = []
    # To get the chapter's HTML, use self.session.get (a requests session),
    # or self.soup_from_url to get it already parsed by BeautifulSoup.
    chapter_page = self.soup_from_url(chapter_data["url"])

    for href in chapter_page.find_all("a", href=True):
        # validate href here
        ...
        pages.append(href["href"])
    return pages
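
Because pages() must return the URLs in reading order, any filtering has to preserve the order in which links appear. A minimal sketch of such a filter, assuming pages are image files with the suffixes listed below (the helper name page_urls, the suffix list, and the example URLs are assumptions for illustration only):

```python
from typing import Iterable, List
from urllib.parse import urlparse

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".webp")  # assumed page formats


def page_urls(candidates: Iterable[str]) -> List[str]:
    """Keep image URLs in document order, dropping non-images and duplicates."""
    seen = set()
    pages = []
    for url in candidates:
        path = urlparse(url).path.lower()  # ignore query strings such as ?v=2
        if not path.endswith(IMAGE_SUFFIXES) or url in seen:
            continue
        seen.add(url)
        pages.append(url)
    return pages
```

A list plus a set is used rather than a set alone, since a set would lose the page order.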

The following methods are optional; generic implementations are provided.

title(self) -> str

Return the title of the manga. Example:

def title(self):
    # self.soup.title is the <title> tag; .text gives its contents as a str
    return self.soup.title.text

Index Compatibility

Between versions v3.1.0a1 and v3.2.0a0, the index file moved from site-packages, which is specific to each install of tankobon, to ~/.tankobon/index.json.
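
For anyone carrying an index across that change, the sketch below shows one way to copy an old file to the new location. Only the destination path comes from the text above. The helper names, the location of the old file inside site-packages, and the assumption that the old file is still readable by newer versions are all hypothetical:

```python
import shutil
from pathlib import Path
from typing import Optional


def index_path(home: Optional[Path] = None) -> Path:
    """Index location from v3.2.0a0 onwards: ~/.tankobon/index.json."""
    return (home or Path.home()) / ".tankobon" / "index.json"


def migrate_index(old_index: Path, home: Optional[Path] = None) -> Path:
    """Copy an old index to the new location unless one already exists there."""
    new_index = index_path(home)
    if old_index.is_file() and not new_index.exists():
        new_index.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(old_index, new_index)  # the old file is left untouched
    return new_index
```

The copy never overwrites an existing ~/.tankobon/index.json, so running it twice is harmless.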

Todo

  • download pre-parsed indexes from a special GitHub repo (tankobon-index?)
  • create GUI to make downloading easier (like youtube-DLG)

Usage

tankobon download 'https://komi-san.com'  # download all chapters
tankobon store info 'komi_san/https://komi-san.com'  # and then get info on the chapters

Install

python(3) -m pip install tankobon

Build

All my Python projects now use flit to build and publish. To build, run flit build.

License

MIT.
