Extensible utility to rewrite eBook content. Useful for fixing common mistakes made by authors and publishers alike.
Motivation
While I like ebooks a lot, I am often disappointed with their technical quality, or lack thereof. For example, I recently purchased a bundle of ebooks in a bout of nostalgia, to replace paper editions long lost. Alas, the files are full of repeated typos, misspelled names, and so forth, telling a sad tale of a publisher scanning in source material and not proofreading the result.
Faced with this, I used to fix the errors I found manually, mostly using regular expressions. This is a pain in the bum, even when employing Edit Books (a part of Kovid Goyal’s excellent Calibre). I therefore decided to write a piece of software to perform certain content rewrites automatically. I also wanted to make the mechanism extensible by dynamically loading content transformers at run-time. The result is eBookOCD. I hope that other people come up with either their own ideas for transformers or with actual code that can be shared.
Installing
eBookOCD requires Python 3.8 or higher, due to the language features used. Installation files can be found on PyPI. In many cases, executing
pip install eBookOCD
in a command shell will suffice.
Basic usage
ebookocd --help
Display all supported command line options, showing both full and abbreviated names.
ebookocd source.epub --dest destination.epub
Rewrite source.epub content into a new file called destination.epub. The source file will be unaffected. This is the recommended method of rewriting files.
ebookocd file.epub
Rewrite the file in place, overwriting the existing content. You should only use this method if your source file is version controlled, or if you have a backup available.
ebookocd file.epub --unzip destination_directory
Create the specified directory and extract the source file’s content into it. If the destination path already exists, execution will be aborted.
ebookocd --zip directory destination.epub
Bundle the directory’s content as a compressed EPUB file. If the destination file already exists, execution will be aborted.
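For readers curious what unzipping and zipping an EPUB involves, here is a minimal sketch using only the Python standard library. It is not eBookOCD's actual implementation; the function names unzip_epub and zip_epub are made up for illustration. It relies on two facts: an EPUB is a ZIP archive, and the EPUB container format expects the mimetype file to be the first entry, stored without compression.

```python
import zipfile
from pathlib import Path


def unzip_epub(source: Path, dest_dir: Path) -> None:
    """Extract an EPUB (a ZIP archive) into a newly created directory."""
    # Abort if the destination exists, mirroring the behaviour described above.
    if dest_dir.exists():
        raise FileExistsError(f"{dest_dir} already exists")
    dest_dir.mkdir(parents=True)
    with zipfile.ZipFile(source) as archive:
        archive.extractall(dest_dir)


def zip_epub(source_dir: Path, dest: Path) -> None:
    """Bundle a directory as an EPUB file (hypothetical helper)."""
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    mimetype = source_dir / "mimetype"
    with zipfile.ZipFile(dest, "w") as archive:
        # The mimetype entry goes first and must not be compressed.
        if mimetype.is_file():
            archive.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else is added in a stable order, deflated.
        for path in sorted(source_dir.rglob("*")):
            if path.is_file() and path != mimetype:
                arcname = path.relative_to(source_dir).as_posix()
                archive.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Round-tripping a book through unzip_epub and zip_epub should give an archive with the same entries, which is handy when you want to edit files by hand between the two steps.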
Advanced usage
ebookocd in.epub --dest out.epub --transform mymodule.myfile.MyTransformer
Rewrite in.epub content into out.epub, using the specified transformer class for content filtering. The class is given as a dotted path: module path first, class name last.
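A dotted path such as mymodule.myfile.MyTransformer can be resolved to a class at run-time with importlib. The following is a sketch of the general technique, not necessarily how eBookOCD does it internally; load_class is a hypothetical helper name.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object it names."""
    # Split at the last dot: module path on the left, class name on the right.
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    obj = getattr(module, class_name)
    if not isinstance(obj, type):
        raise TypeError(f"{dotted_path} does not name a class")
    return obj


# Usage with a standard library class, so the example runs anywhere.
ordered_dict_class = load_class("collections.OrderedDict")
instance = ordered_dict_class(a=1)
```

Note that the module must be importable, so a custom transformer has to live somewhere on the Python path.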
Content transformers
Content transformers are Python classes used to process ebook content. They are loaded dynamically at run-time, providing a mechanism to expand the functionality of eBookOCD with third-party transformer classes.
If no transformer class is specified, the internal DefaultTransformer will be used. It is primarily concerned with removing unnecessary spaces from text-type files, like HTML and CSS.
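To give an idea of what removing unnecessary spaces can mean for a text file, here is a small sketch. The exact rules applied by DefaultTransformer are not described here, so the two rules below (strip trailing whitespace, collapse runs of blank lines) are assumptions chosen for illustration, and tidy_text is a hypothetical name.

```python
import re

# Three or more consecutive newlines mean two or more blank lines in a row.
_BLANK_LINE_RUN = re.compile(r"\n{3,}")


def tidy_text(text: str) -> str:
    """Strip trailing whitespace and collapse repeated blank lines."""
    # Assumed rule 1: whitespace at the end of a line is never needed.
    stripped = "\n".join(line.rstrip() for line in text.splitlines())
    # Assumed rule 2: keep at most one blank line between blocks.
    collapsed = _BLANK_LINE_RUN.sub("\n\n", stripped).strip("\n")
    # End non-empty files with exactly one newline.
    return collapsed + "\n" if collapsed else ""
```

Leading indentation is left alone on purpose, since it can be significant inside preformatted HTML blocks.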
Transformers can modify the content in any desired fashion, the only condition being that all methods of TransformerMixin (see API) are implemented. The transformer ebookocd.transform.monty.WonderfulSpam is provided as a simple reference. This example extends the DefaultTransformer class, but you can opt to write code that references ebookocd.api.TransformerMixin directly.
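The requirement that all methods be implemented can be pictured with the sketch below. Everything in it is hypothetical: SketchTransformerMixin is a local stand-in, not the real ebookocd.api.TransformerMixin, and the transform(name, content) method and its signature are invented for illustration. Consult the API documentation for the actual method names. Whether the real mixin enforces the condition via abstract methods is also an assumption.

```python
from abc import ABC, abstractmethod


class SketchTransformerMixin(ABC):
    """Hypothetical stand-in; NOT the real ebookocd.api.TransformerMixin."""

    @abstractmethod
    def transform(self, name: str, content: str) -> str:
        """Return the (possibly rewritten) content of one file in the book."""


class TypoFixer(SketchTransformerMixin):
    """Replace known misspellings in HTML/XHTML files only."""

    def __init__(self, corrections):
        self.corrections = dict(corrections)

    def transform(self, name: str, content: str) -> str:
        # Leave stylesheets, images and other files untouched.
        if not name.endswith((".html", ".xhtml")):
            return content
        for wrong, right in self.corrections.items():
            content = content.replace(wrong, right)
        return content


class Incomplete(SketchTransformerMixin):
    """Implements nothing, so it cannot be instantiated."""


fixer = TypoFixer({"Gandalph": "Gandalf"})
fixed = fixer.transform("chapter1.xhtml", "<p>Gandalph spoke.</p>")
```

With an abstract base like this, trying to create an Incomplete instance raises a TypeError, which surfaces a missing method early rather than in the middle of a rewrite.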
License
Copyright © 2020 Ralph Seichter. Please see the LICENSE file for details.
File details
Details for the file eBookOCD-0.1.tar.gz.
File metadata
- Download URL: eBookOCD-0.1.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7a302ff1b310b2d6e2e73b7a8e3f363238330aab6fc3c4fb761c799aee0dff21
MD5 | 155d8ed2e0a06ca9c6f865e623826fe2
BLAKE2b-256 | 807feb4a2e4a86de9aae67693df4f32df995122ed4585e21b7ffa5fe27cb6dca
File details
Details for the file eBookOCD-0.1-py3-none-any.whl.
File metadata
- Download URL: eBookOCD-0.1-py3-none-any.whl
- Upload date:
- Size: 32.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 020a8f89f02269670acae06d0223e104357fb5a35513f3b257f8e38c912fd87d
MD5 | f520168f65d2d1c59e84f80f89354549
BLAKE2b-256 | cdfc0c6ea44c8774b67e43aed1a6f90a16a9ab86e3eb5737bc5ec2d8b45dfac7