
Montag is a utility which reads e-book files and scrubs them of profanity.


Montag


"Didn't firemen prevent fires rather than stoke them up and get them going?"

Montag is a utility which reads an e-book file (in any format supported by Calibre's ebook-convert) and scrubs it of profanity (or words from any other list you can provide).

There are all sorts of arguments to be had about obscenity filters, censorship, etc. That's okay! I'm not really interested in having those arguments. My 13-year-old daughter asked me if I could take some swear words out of a young adult novel she was reading, so I wrote this for her. If it's useful to you, great. If not, carry on, my wayward son.

montag is part of a family of projects with similar goals:

Prerequisites

Python Prerequisites

Montag requires Python 3 and the EbookLib and python-magic libraries. It also uses some utilities from the Calibre project.

On a Debian-based Linux distribution, these requirements can be installed with:

$ sudo apt-get install libmagic1 imagemagick calibre-bin python3 python3-magic python3-ebooklib
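To confirm that the prerequisites above are visible to Python, a quick check along the following lines may help. This is a minimal sketch and not part of Montag itself: the function name check_prerequisites is hypothetical, and it assumes EbookLib is importable as ebooklib, python-magic as magic, and that Calibre's ebook-convert is expected on the PATH.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which of the documented prerequisites appear to be available.

    Hypothetical helper for illustration; Montag does not ship this function.
    """
    return {
        # EbookLib is imported under the module name "ebooklib".
        "ebooklib": importlib.util.find_spec("ebooklib") is not None,
        # python-magic is imported under the module name "magic".
        "magic": importlib.util.find_spec("magic") is not None,
        # Calibre's converter is looked up on the PATH.
        "ebook-convert": shutil.which("ebook-convert") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as MISSING points at a package from the apt-get line that still needs to be installed.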

Docker

Alternatively, a Dockerfile is provided so you can run Montag in Docker. Build the mmguero/montag:latest Docker image with build_docker.sh, then use montag-docker.sh to process your e-book files.

Usage

Montag is easy to use. Specify the input and output e-book filenames, and, optionally, the file containing the words to be censored (one per line) and the text encoding.

$ ./montag.py 
usage: montag.py [options]

e-book profanity scrubber

required arguments:
  -i <STR>, --input <STR>
                        Input file
  -o <STR>, --output <STR>
                        Output file
  -w <STR>, --word-list <STR>
                        Profanity list text file (default: swears.txt)
  -e <STR>, --encoding <STR>
                        Text encoding (default: utf-8)

So, using Andy Weir's "The Martian" as an example:

$ ./montag.py -i "The Martian - Andy Weir.mobi" -o "The Martian - Andy Weir (scrubbed).mobi"
Processing "The Martian - Andy Weir.mobi" of type "Mobipocket E-book "The Martian", 775003 bytes uncompressed, version 6, codepage 65001"
Extracting metadata...
Converting to EPUB...
Processing book contents...
Generating output...
Converting...
Restoring metadata...

Upon opening the book, you will find the text reads something like this:

CHAPTER 1

LOG ENTRY: SOL 6

I’m pretty much ******.

That’s my considered opinion.

******.

Six days into what should be the greatest two months of my life, and it’s turned into a nightmare.

...

Alternatively, if you are using the Docker method described above, use montag-docker.sh rather than montag.py directly.
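The log output in the example above suggests a pipeline of converting the input to EPUB, scrubbing the contents, then converting back to the requested output format. The conversion step could be sketched as below. This is an illustration under assumptions, not Montag's actual code: the helper names build_convert_command and to_epub are hypothetical, and the only external interface relied on is Calibre's documented "ebook-convert input_file output_file" invocation. The metadata extraction and restoration steps are not covered here.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, target):
    """Return the argument list for converting source to target with Calibre.

    Calibre infers the input and output formats from the file extensions.
    """
    return ["ebook-convert", str(source), str(target)]


def to_epub(source, work_dir):
    """Convert an e-book to an EPUB file inside work_dir (hypothetical helper).

    Requires Calibre's ebook-convert on the PATH; raises CalledProcessError
    if the conversion fails.
    """
    target = Path(work_dir) / (Path(source).stem + ".epub")
    subprocess.run(build_convert_command(source, target), check=True)
    return target
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.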

Known Limitations

Montag is not smart enough to do any in-depth language analysis or deep filtering. For a while I was trying to use the rominf/profanity-filter library for the word detection and filtering, but I ran into issues and ended up just going with a simpler method that works but presents a few limitations:

  • Only whole words are matched and censored. In other words, if the word frick is in your list of profanity, Frick you! will be censored, but Absofrickenlutely will not. As such, if you wish to catch all of the variations of the word frick, you have to list them individually in your swears.txt word list.
  • Having phrases (i.e., multiple space-separated words) in your swears.txt word list won't do you any good.
  • Montag can't tell the difference between different meanings of the same word. For example, if the word ass is in your list, both "And he said unto his sons, Saddle me the ass. So they saddled him the ass: and he rode thereon" (from the KJV of The Bible) and "Then the high king carefully turned the golden screw. Once: Nothing. Twice: Nothing. Then he turned it the third time, and the boy’s ass fell off" (from Patrick Rothfuss' The Wise Man's Fear) will be censored.
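The whole-word behavior described in the list above can be illustrated with a regular expression anchored on word boundaries. This is a minimal sketch, not Montag's actual implementation: the function names load_words, build_pattern, and censor are hypothetical, and replacing each match with the same number of asterisks is an assumption based on the sample output shown earlier.

```python
import re


def load_words(text):
    """Parse word-list text: one word per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_pattern(words):
    """Compile a case-insensitive pattern that matches only whole words."""
    # \b anchors each alternative to word boundaries, so "frick" inside
    # "Absofrickenlutely" does not match.
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def censor(text, pattern):
    """Replace each match with asterisks of the same length (an assumption)."""
    return pattern.sub(lambda m: "*" * len(m.group(0)), text)


words = load_words("frick\nass\n")
pattern = build_pattern(words)
print(censor("Frick you!", pattern))         # ***** you!
print(censor("Absofrickenlutely", pattern))  # Absofrickenlutely
```

Like Montag, this sketch has no notion of meaning, so both uses of ass quoted above would be censored. Unlike Montag, a regular expression run over raw text would match multi-word phrases, so it does not reproduce the second limitation.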

Contributing

If you'd like to help improve Montag, pull requests are welcome!

Authors

  • Seth Grover - Initial work - mmguero

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Acknowledgments

Thanks to:


