Skip to main content

A command to mask secret information of images using OCR

Project description

masecret is a command to mask secret information in image files using OCR.

DISCLAIMER: There is no guarantee that all the secret information is successfully masked. You must make sure that all the secret information is masked.

Prerequisite

  • Python 3.3+

  • Tesseract

    • Language data for OCR (can be specified with --lang, default is eng) must be available.

Installation

$ pip3 install masecret

You may need sudo.

masecret depends on pyocr and Pillow. If you fail to install Pillow, please see the installation instruction of Pillow.

Usage

Mask a single image file with a regular expression pattern that match AWS account number:

$ masecret -r '[-\d]{12,}' original.png -o masked.png

Mask multiple image files (output directory must exist):

$ masecret -r '[-\d]{12,}' original1.png original2.png ... -o masked_images/

Mask image files in-place with -i option:

$ masecret -i -r '[-\d]{12,}' original1.png original2.png ...

WARNING: No backup files will be created.

SECRETS.txt

When -r option is not provided, regular expressions are read from a file named SECRETS.txt in a current directory. Content of the file is regular expression patterns that match secret information you want to mask. You can include multiple patterns line by line.

Full Usage

usage:
    masecret [options] INPUT -o OUTPUT
    masecret [options] INPUT... -o OUTPUT
    masecret -i [options] INPUT...

Mask secret information in image files using OCR. Put regular expression
matches secret information into a file named SECRETS.txt or -r option.

positional arguments:
  INPUT                 input files

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -o OUTPUT, --output OUTPUT
                        output file or directory (default: None)
  -r REGEX, --regex REGEX
                        regular expression matches secret information
                        (default: None)
  -s SECRET_PATH, --secret SECRET_PATH
                        path to file containing regexes line by line that
                        match secret information (default: ./SECRETS.txt)
  -l LANG, --lang LANG  language for OCR, can be multiple languages joined by
                        + sign, e.g. eng+jpn (default: eng)
  -c COLOR, --color COLOR
                        color to fill secrets (default: #666)
  -i, --in-place        mask image files in-place. WARNING: No backup files
                        will be saved (default: False)
  --tesseract-params PARAMS
                        (Advanced Option) additional parameters passed to
                        tesseract (default: -psm 6 makebox)

Debug

If images are not masked as expected, the environment variable DEBUG will help you. If DEBUG is set, all the characters tesseract recognized are printed with position.

$ DEBUG=1 masecret original.png -o masked.png
Processing original.png...
. ((136, 90), (160, 114))
. ((176, 90), (200, 114))
. ((216, 90), (240, 114))
I ((292, 104), (304, 126))
I ((308, 104), (320, 126))
A ((326, 104), (340, 120))
W ((341, 104), (361, 120))
S ((362, 103), (375, 120))
M ((385, 104), (401, 120))
a ((404, 108), (415, 120))
n ((417, 108), (427, 120))
a ((430, 108), (440, 120))
g ((443, 108), (453, 125))
e ((456, 108), (467, 120))
m ((469, 108), (485, 120))
e ((488, 108), (499, 120))
n ((501, 108), (511, 120))
t ((513, 105), (519, 120))
C ((528, 103), (542, 120))
o ((545, 108), (556, 120))
n ((559, 108), (569, 120))
...

License

MIT License. See: LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

masecret-0.4.0.tar.gz (7.4 kB view hashes)

Uploaded Source

Built Distribution

masecret-0.4.0-py3-none-any.whl (8.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page