Skip to main content

Batch convert multiple web pages into one e-book by URL, xml string, etc.

Project description

xml2epub

GitHub Repo stars GitHub Workflow Status python pypi wheel license PyPI - Downloads

Batch convert multiple web pages into one e-book by URL, xml string, etc.

How to install

xml2epub is available on pypi https://pypi.org/project/xml2epub/

$ pip install xml2epub

Basic Usage

import xml2epub

## create an empty eBook
book = xml2epub.Epub("My New E-book Name")
## create chapters by url
chapter1 = xml2epub.create_chapter_from_url("https://dev.to/devteam/top-7-featured-dev-posts-from-the-past-week-h6h")
chapter2 = xml2epub.create_chapter_from_url("https://dev.to/ks1912/getting-started-with-docker-34g6")
## add chapters to your eBook
book.add_chapter(chapter1)
book.add_chapter(chapter2)
## generate epub file
book.create_epub("Your Output Directory")

After waiting for a while, if no error is reported, the following "My New E-book Name.epub" file will be generated in "Your Output Directory":

The generated epub file

API

  • create_chapter_from_file(file_name, url=None, title=None, strict=True): Create a Chapter object from an html or xhtml file.
    • file_name (string): The filename containing the html or xhtml content of the created chapter.
    • url (Option[string]): The url used to infer the chapter title.
    • title (Option[string]): The chapter name of the chapter, if None, the content of the title tag obtained from the web file will be used as the chapter name.
    • strict (Option[boolean]): Whether to perform strict page cleaning, which will remove inline styles, insignificant attributes, etc., generally True.
  • create_chapter_from_url(url, title=None, strict=True): Create a Chapter object by extracting webpage from given url.
    • url (string): website link.
    • title (Option[string]): The chapter name of the chapter, if None, the content of the title tag obtained from the web file will be used as the chapter name.
    • strict (Option[boolean]): Whether to perform strict page cleaning, which will remove inline styles, insignificant attributes, etc., generally True.
  • create_chapter_from_string(html_string, url=None, title=None, strict=True): Create a Chapter object from a string. The principle of the above two methods is to first obtain the html or xml string, and then call this method.
    • html_string (string): html or xhtml string.
    • url (Option[string]): The url used to infer the chapter title.
    • title (Option[string]): The chapter name of the chapter, if None, the content of the title tag obtained from the web file will be used as the chapter name.
    • strict (Option[boolean]): Whether to perform strict page cleaning, which will remove inline styles, insignificant attributes, etc., generally True.
  • Epub(title, creator='dfface', language='en', rights='', publisher='dfface', epub_dir=None): Constructor method to create Epub object.Mainly used to add book information and all chapters and generate epub file.
    • title (str): The title of the epub.
    • creator (Option[str]): The author of the epub.
    • language (Option[str]): The language of the epub.
    • rights (Option[str]): The copyright of the epub.
    • publisher (Option[str]): The publisher of the epub.
    • epub_dir(Option[str]): The path of intermediate file, the system's temporary file path is used by default, or you can specify it yourself.
  • Epub object add_chapter(chapter_object): Add Chapter object to Epub.
    • chapter_object (Chapter object): Use the three methods of creating a chapter object to get the object.
  • Epub object create_epub(output_directory, epub_name=None): Create an epub file from the Epub object.
    • output_directory (str): Directory to output the epub file to.
    • epub_name (Option[str]): The file name of your epub. This should not contain .epub at the end. If this argument is not provided, defaults to the title of the epub.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xml2epub-2.0.tar.gz (16.1 kB view hashes)

Uploaded Source

Built Distribution

xml2epub-2.0-py3-none-any.whl (17.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page