Skip to main content
Help us improve Python packaging – donate today!

A command to mask secret information of images using OCR

Project Description

masecret is a command to mask secret information in image files using OCR.

Prerequisite

  • Python 3.3+
  • Tesseract
    • Language data for OCR (can be specified with --lang, default is eng) must be available.

Installation

$ pip3 install masecret

You may need sudo.

masecret depends on pyocr and Pillow. If you fail to install Pillow, please see the installation instruction of Pillow.

Usage

Mask a single image file with a regular expression pattern that match AWS account number:

$ masecret -r '[-\d]{12,}' original.png -o masked.png

Mask multiple image files (output directory must exist):

$ masecret -r '[-\d]{12,}' original1.png original2.png ... -o masked_images/

Mask image files in-place with -i option:

$ masecret -i -r '[-\d]{12,}' original1.png original2.png ...

WARNING: No backup files will be saved.

SECRETS.txt

If -r option is not specified, regular expression will be read from a file named SECRETS.txt in a current directory. Content of the file is regular expression patterns that match secret information you want to mask. You can include multiple patterns line by line.

Full Usage

usage:
    masecret [options] INPUT -o OUTPUT
    masecret [options] INPUT... -o OUTPUT
    masecret -i [options] INPUT...

Mask secret information in image files using OCR. Put regular expression
matches secret information into a file named SECRETS.txt or -r option.

positional arguments:
  INPUT                 input files

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -o OUTPUT, --output OUTPUT
                        output file or directory (default: None)
  -r REGEX, --regex REGEX
                        regular expression matches secret information
                        (default: None)
  -s SECRET_PATH, --secret SECRET_PATH
                        path to file containing regexes line by line that
                        match secret information (default: ./SECRETS.txt)
  -l LANG, --lang LANG  language for OCR, can be multiple languages joined by
                        + sign, e.g. eng+jpn (default: eng)
  -c COLOR, --color COLOR
                        color to fill secrets (default: #666)
  -i, --in-place        mask image files in-place. WARNING: No backup files
                        will be saved (default: False)
  --tesseract-params PARAMS
                        (Advanced Option) additional parameters passed to
                        tesseract (default: -psm 6 makebox)

Debug

If images are not masked as expected, the environment variable DEBUG will help you. If DEBUG is set, all the characters tesseract recognized are printed with position.

$ DEBUG=1 masecret original.png -o masked.png
Processing original.png...
. ((136, 90), (160, 114))
. ((176, 90), (200, 114))
. ((216, 90), (240, 114))
I ((292, 104), (304, 126))
I ((308, 104), (320, 126))
A ((326, 104), (340, 120))
W ((341, 104), (361, 120))
S ((362, 103), (375, 120))
M ((385, 104), (401, 120))
a ((404, 108), (415, 120))
n ((417, 108), (427, 120))
a ((430, 108), (440, 120))
g ((443, 108), (453, 125))
e ((456, 108), (467, 120))
m ((469, 108), (485, 120))
e ((488, 108), (499, 120))
n ((501, 108), (511, 120))
t ((513, 105), (519, 120))
C ((528, 103), (542, 120))
o ((545, 108), (556, 120))
n ((559, 108), (569, 120))
...

License

MIT License. See: LICENSE.

Release history Release notifications

This version
History Node

0.3.0

History Node

0.2.0

History Node

0.1.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
masecret-0.3.0-py3-none-any.whl (10.0 kB) Copy SHA256 hash SHA256 Wheel 3.6 Feb 6, 2017
masecret-0.3.0.tar.gz (6.9 kB) Copy SHA256 hash SHA256 Source None Feb 6, 2017

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging CloudAMQP CloudAMQP RabbitMQ AWS AWS Cloud computing Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page