
CLI utility to remove PII watermarks from PDFs downloaded from Move USP/ESALQ

Project description

Move Unmarker

Very small CLI utility to remove PII watermarks from PDFs downloaded from Move USP/ESALQ, using PyMuPDF.

Beware that there is no input sanitization or error checking; you are on your own. This tool will run (as opposed to work) without fail on any PDF that has at least one content stream per page, which is basically every PDF in the wild. It keeps only the first content stream of each page, so it only does something useful when the watermark is the 2nd content stream of every page. Otherwise, one of two things happens: if every page has a single content stream, nothing changes apart from compression options and possibly other idiosyncrasies of how PyMuPDF writes a PDF; if a page has 2 or more content streams, everything after the first is dropped, which likely makes the file useless, though it will still open. Most PDF writers concatenate multiple content streams into one, so chances are it won't do anything or will just crash.
This tool overwrites, without confirmation, any file with the same name as --output (default "unmarked.pdf").
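The core operation described above (keep only the first content stream of every page, then save) can be sketched with PyMuPDF as follows. This is an illustration under stated assumptions, not the project's actual source: it assumes PyMuPDF's `Page.get_contents()`, `Page.xref`, `Document.xref_set_key()` and `Document.save(garbage=...)`, assumes generation number 0 for the kept stream, and the helper names `first_stream_ref` and `unmark` are hypothetical.

```python
from typing import List


def first_stream_ref(content_xrefs: List[int]) -> str:
    """Return a PDF indirect reference pointing at the first content stream only."""
    if not content_xrefs:
        raise ValueError("page has no content streams")
    # Indirect references look like "<xref> <generation> R".
    # Assumption: the stream uses generation 0, which is the common case.
    return f"{content_xrefs[0]} 0 R"


def unmark(src: str, dst: str = "unmarked.pdf", garbage: int = 1) -> None:
    """Hypothetical sketch: keep the first content stream of each page, save to dst."""
    # Imported lazily so first_stream_ref works without PyMuPDF installed.
    try:
        import pymupdf  # PyMuPDF >= 1.24.3 exposes this module name
    except ImportError:
        import fitz as pymupdf  # older PyMuPDF releases

    doc = pymupdf.open(src)
    for page in doc:
        xrefs = page.get_contents()  # xrefs of this page's content streams
        if len(xrefs) > 1:
            # Point /Contents at the first stream; the others become unreferenced.
            doc.xref_set_key(page.xref, "Contents", first_stream_ref(xrefs))
    # Garbage collection (level >= 1) drops the now-unreferenced streams on save.
    # Like the CLI, this silently overwrites dst if it already exists.
    doc.save(dst, garbage=garbage)
    doc.close()
```

Under these assumptions, `unmark("watermarked.pdf")` would mirror the first TLDR example below.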

Installation

  1. Make sure Python 3.8 or higher and pip are installed
  2. Run pip install move-unmarker
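The Python 3.8 minimum can also be checked from inside Python. A small sketch (the function name `python_is_supported` is hypothetical; the 3.8 threshold comes from the step above):

```python
import sys

MIN_VERSION = (3, 8)  # minimum Python version stated in the installation steps


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) part of version_info meets MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
```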

Usage

usage: unmarker [-h] [-o OUTPUT] [-g GARBAGE] input

Utility to remove PII watermarks from pdfs downloaded from Move USP/ESALQ.

positional arguments:
  input                 input filename

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output filename (default: "unmarked.pdf")
  -g GARBAGE, --garbage GARBAGE
                        level of garbage collection (default: 1)

See the pymupdf.Document.save method for more details on garbage collection.
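The help text above is what Python's `argparse` prints for a parser like the one below. This is a reconstruction from the help output, not the project's source; `build_parser` and `parse` are hypothetical names. In keeping with the "no input sanitization" note, `--garbage` is only converted to `int` and not range-checked.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --help output matches the usage shown above."""
    parser = argparse.ArgumentParser(
        prog="unmarker",
        description="Utility to remove PII watermarks from pdfs downloaded from Move USP/ESALQ.",
    )
    parser.add_argument("input", help="input filename")
    parser.add_argument(
        "-o", "--output", default="unmarked.pdf",
        help='output filename (default: "unmarked.pdf")',
    )
    parser.add_argument(
        "-g", "--garbage", type=int, default=1,
        help="level of garbage collection (default: 1)",
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

The "options:" heading is what argparse prints on Python 3.10 and later; older versions label the same section "optional arguments:".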

TLDR

  • unmarker watermarked.pdf
  • unmarker -o unmarked.pdf watermarked.pdf
  • unmarker --garbage 3 watermarked.pdf
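Because every run writes to "unmarked.pdf" unless -o is given, unmarking several files in a row would overwrite the previous result. A minimal batch sketch that uses only the documented -o and --garbage options and gives each input its own output file; the `<stem>_unmarked.pdf` naming scheme and the function names are hypothetical, and it assumes `unmarker` is on PATH:

```python
import subprocess
from pathlib import Path
from typing import List


def unmarker_command(src: Path, garbage: int = 1) -> List[str]:
    """Build an unmarker argv that writes <stem>_unmarked.pdf next to src."""
    # Hypothetical naming scheme: one output per input, so nothing is
    # overwritten by the shared default "unmarked.pdf".
    dst = src.with_name(f"{src.stem}_unmarked.pdf")
    return ["unmarker", "-o", str(dst), "--garbage", str(garbage), str(src)]


def unmark_folder(folder: Path, garbage: int = 1) -> None:
    """Run unmarker on every *.pdf in folder (requires unmarker on PATH)."""
    for src in sorted(folder.glob("*.pdf")):
        if src.stem.endswith("_unmarked"):
            continue  # skip outputs from a previous run
        subprocess.run(unmarker_command(src, garbage), check=True)
```

`check=True` stops the loop at the first failing file instead of silently continuing.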

Development

  1. Check Python's version: python -V

  2. Install Python 3.8 or higher and pip, if they aren't already installed:

    • Windows: winget install Python.Python.3.X (replace X with the desired minor version)
    • Ubuntu/Debian-based distros: apt install python3 python3-pip
    • Arch-based distros: pacman -S python python-pip
    • Fedora: dnf install python3 python3-pip
  3. Install poetry

  4. Clone this repo:
    git clone https://github.com/joaofauvel/move-unmarker.git && cd move-unmarker

  5. Install requirements:
    poetry install



Download files


Source Distribution

move_unmarker-0.1.4.tar.gz (15.0 kB)

Uploaded Source

Built Distribution

move_unmarker-0.1.4-py3-none-any.whl (15.7 kB)

Uploaded Python 3

File details

Details for the file move_unmarker-0.1.4.tar.gz.

File metadata

  • Download URL: move_unmarker-0.1.4.tar.gz
  • Upload date:
  • Size: 15.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Windows/10

File hashes

Hashes for move_unmarker-0.1.4.tar.gz
Algorithm     Hash digest
SHA256        5d26070d13b193f0d6cc24212761d93578302f16e547ce95f9e450cb393dd6af
MD5           c8b4ff096fc638ffa08ea0f61b38f3e2
BLAKE2b-256   782c122b17474467fe4453c618fbe0e6933021c2ab7b06d227176b8689299bfe


File details

Details for the file move_unmarker-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: move_unmarker-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 15.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Windows/10

File hashes

Hashes for move_unmarker-0.1.4-py3-none-any.whl
Algorithm     Hash digest
SHA256        a67bd378ce123ded6aa3a447559ab1d7d1fa9ba1de5d536bd5c7628dde38b418
MD5           4e6127a1f50f219de586e0514625640e
BLAKE2b-256   bafbb627741a36ca9c6c40b173174a050f7cec8e96368becf21409fd268e297a
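To check a downloaded file against the SHA256 digests listed above, Python's standard `hashlib` is enough. A small sketch (the function names are hypothetical; the expected digest is the wheel's SHA256 from the table above):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 of move_unmarker-0.1.4-py3-none-any.whl, copied from the table above.
WHEEL_SHA256 = "a67bd378ce123ded6aa3a447559ab1d7d1fa9ba1de5d536bd5c7628dde38b418"
```

For example, `matches(Path("move_unmarker-0.1.4-py3-none-any.whl"), WHEEL_SHA256)` should return True for an intact download.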

