Extensible utility to rewrite eBook content. Useful for fixing common mistakes made by authors and publishers alike.

Motivation

While I like ebooks a lot, I am often disappointed with their technical quality, or lack thereof. For example, I recently purchased a bundle of ebooks in a bout of nostalgia, to replace paper editions long lost. Alas, the files are full of repeated typos, misspelled names, and so forth, telling a sad tale of a publisher scanning in source material and not proofreading the result.

Faced with this, I used to fix the errors I found by hand, mostly using regular expressions. This is a pain in the bum, even when employing Edit Book (a part of Kovid Goyal’s excellent Calibre). I therefore decided to write a piece of software to perform certain content rewrites automatically. I also wanted to make the mechanism extensible by dynamically loading content transformers at run-time. The result is eBookOCD. I hope that other people will come up with their own ideas for transformers, or with actual code that can be shared.

Installing

eBookOCD requires Python 3.8 or higher, due to the language features used. Installation files can be found on PyPI. In many cases, executing

pip install eBookOCD

in a command shell will suffice.

Basic usage

  • ebookocd --help

    Display all supported command line options, showing both full and abbreviated names.

  • ebookocd source.epub --dest destination.epub

    Rewrite source.epub content into a new file called destination.epub. The source file will be unaffected. This is the recommended method of rewriting files.

  • ebookocd file.epub

    Rewrite the file in place, overwriting the existing content. You should only use this method if your source file is version controlled, or if you have a backup available.

  • ebookocd file.epub --unzip destination_directory

    Create the specified directory and extract the source file’s content into it. If the destination path already exists, execution will be aborted.

  • ebookocd --zip directory destination.epub

    Bundle the directory’s content as a compressed EPUB file. If the destination file already exists, execution will be aborted.
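The unzip and zip behaviour described above can be approximated with Python's standard zipfile module. This is only a sketch of the idea, not eBookOCD's actual code: the function names unzip_epub and zip_epub are made up for illustration. It assumes the usual EPUB container convention that the "mimetype" entry comes first in the archive and is stored uncompressed.

```python
import zipfile
from pathlib import Path


def unzip_epub(source: str, dest_dir: str) -> None:
    """Extract an EPUB (a ZIP archive) into a newly created directory."""
    target = Path(dest_dir)
    if target.exists():
        # Mirror the documented behaviour: abort if the path already exists.
        raise FileExistsError(f"{dest_dir} already exists")
    target.mkdir(parents=True)
    with zipfile.ZipFile(source) as archive:
        archive.extractall(target)


def zip_epub(source_dir: str, dest: str) -> None:
    """Bundle a directory's content as a compressed EPUB file."""
    root = Path(source_dir)
    mimetype = root / "mimetype"
    # Mode "x" creates the file exclusively and raises FileExistsError
    # if the destination is already there.
    with zipfile.ZipFile(dest, "x", zipfile.ZIP_DEFLATED) as archive:
        if mimetype.is_file():
            # EPUB readers expect "mimetype" first and uncompressed.
            archive.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(root.rglob("*")):
            if path.is_file() and path != mimetype:
                archive.write(path, path.relative_to(root).as_posix())
```

Extracting to a directory, editing files by hand, and bundling the directory again is a handy way to inspect what a rewrite actually changed.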

Advanced usage

  • ebookocd in.epub --dest out.epub --transform mymodule.myfile.MyTransformer

    Rewrite in.epub content into out.epub, using the specified transformer class for content filtering.

Content transformers

Content transformers are Python classes used to process ebook content. They are loaded dynamically at run-time, providing a mechanism to expand the functionality of eBookOCD with third party transformer classes.
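Loading a class at run-time from a dotted name such as mymodule.myfile.MyTransformer is typically done with importlib. The following is a minimal sketch of that general technique; the helper name load_class is hypothetical, and eBookOCD's own loader may work differently.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(
            f"Expected 'package.module.ClassName', got {dotted_path!r}"
        )
    # Import the module part, then look up the class by name.
    module = importlib.import_module(module_name)
    loaded = getattr(module, class_name)
    if not isinstance(loaded, type):
        raise TypeError(f"{dotted_path} is not a class")
    return loaded
```

For example, load_class("collections.OrderedDict") returns the OrderedDict class from the standard library. The module must be importable, so a third-party transformer needs to be on the Python path.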

If no transformer class is specified, the internal DefaultTransformer will be used. It is primarily concerned with removing unnecessary spaces from text-type files, like HTML and CSS.
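To give an idea of what removing unnecessary spaces can mean, here is a small sketch using regular expressions. It is not the DefaultTransformer implementation; the function name squeeze_spaces and the exact rules (strip trailing whitespace, collapse runs of spaces and tabs) are assumptions chosen for illustration.

```python
import re

# Two or more consecutive spaces or tabs.
_SPACE_RUN = re.compile(r"[ \t]{2,}")
# Spaces or tabs at the end of any line.
_TRAILING = re.compile(r"[ \t]+$", re.MULTILINE)


def squeeze_spaces(text: str) -> str:
    """Strip trailing whitespace and collapse runs of spaces/tabs to one space."""
    text = _TRAILING.sub("", text)
    return _SPACE_RUN.sub(" ", text)
```

Note that a naive approach like this also collapses leading indentation and would alter whitespace inside preformatted HTML elements, so real-world code needs to be more careful about where it applies.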

Transformers can modify the content in any desired fashion. The only condition is that they implement all methods of TransformerMixin (see API). The transformer ebookocd.transform.monty.WonderfulSpam is provided as a simple reference. This example extends the DefaultTransformer class, but you can opt to write code that references ebookocd.api.TransformerMixin directly.

License

Copyright © 2020 Ralph Seichter. Please see the LICENSE file for details.
