
A Python CLI for analyzing PII entities with the Microsoft Presidio framework.


Presidio CLI


A CLI tool that analyzes text for PII entities with the Microsoft Presidio framework.

Prerequisites

Python version: 3.8, 3.9, 3.10

The pipenv tool installed:

# check if app is installed
pipenv --version

# install, if not available
pip install pipenv
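
Both prerequisites can also be checked from inside Python. This is a minimal sketch: the supported version set is copied from the list above, and the function name check_prerequisites is hypothetical, not part of presidio-cli.

```python
import shutil
import sys

# Python versions listed as supported in the prerequisites above.
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10)}


def check_prerequisites() -> dict:
    """Report whether the running interpreter and pipenv meet the prerequisites."""
    return {
        # version_info[:2] is a (major, minor) tuple such as (3, 9).
        "python_supported": sys.version_info[:2] in SUPPORTED_PYTHON,
        # shutil.which returns None when pipenv is not on PATH.
        "pipenv_on_path": shutil.which("pipenv") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```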

Install presidio-cli

Install from Python Package Index

Install in the current Python environment:

python -m pip install presidio-cli

Install presidio-cli and its dependencies in a pipenv virtual environment:

pipenv install presidio-cli

Install from source

# clone from git
git clone https://github.com/insightsengineering/presidio-cli
cd presidio-cli
# install presidio-cli with its dependencies, including development dependencies
pipenv install --deploy --dev

Install language models for spaCy

Download the spaCy models for English (en) with the commands below. For further information, see the models section of the spaCy documentation.

python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg
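
To confirm the downloads succeeded, a short check can look the model packages up by name. This sketch assumes the models are installed as importable packages named after the download commands above (which is how spaCy model packages are normally installed); installed_models is a hypothetical helper name.

```python
import importlib.util

# Model package names from the download commands above.
MODELS = ("en_core_web_sm", "en_core_web_lg")


def installed_models(names=MODELS) -> dict:
    """Map each package name to whether it can be found without importing it."""
    # find_spec returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in installed_models().items():
        if not ok:
            print(f"missing, run: python -m spacy download {name}")
```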

Configuration file syntax

By default, configuration is read from the .presidiocli file in the current directory.

The configuration file is written in YAML and supports the following parameters:

  • language - by default, only models and recognizers for English (en) are available. The list of languages can be extended.

  • entities - limits detection to the listed entity types. The names are passed directly to the Presidio framework; see the Presidio list of supported entities.

  • ignore - patterns of files and directories to skip. Ignoring version control files, for example .git, is recommended.

Note: the file must set at least one parameter.

An example YAML configuration file:
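
The rules above can be sketched as a check on the parsed configuration. This assumes the file has already been loaded into a dict (as a YAML loader such as PyYAML's safe_load would return) and that ignore uses the block-scalar string form; validate_config is hypothetical and is not presidio-cli's own validation.

```python
def validate_config(config) -> list:
    """Return human-readable problems found in a parsed configuration dict."""
    if not isinstance(config, dict) or not config:
        # The configuration must set at least one parameter.
        return ["configuration must set at least one parameter"]
    problems = []
    if "language" in config and not isinstance(config["language"], str):
        problems.append("language must be a string such as 'en'")
    entities = config.get("entities")
    if entities is not None and not (
        isinstance(entities, list) and all(isinstance(e, str) for e in entities)
    ):
        problems.append("entities must be a list of entity names")
    # Assumes the `ignore: |` block form, which a YAML loader returns as one string.
    if "ignore" in config and not isinstance(config["ignore"], str):
        problems.append("ignore must be a block of newline-separated patterns")
    return problems
```

Unknown keys are deliberately not flagged, since the CLI also accepts settings (such as threshold) that are not listed here.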

---
language: en
ignore: |
  .git
  *.cfg
entities:
  - PERSON
  - CREDIT_CARD
  - EMAIL_ADDRESS
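
The ignore block above is one pattern per line. The sketch below splits it and tests paths against it using fnmatch on each path component. This is an approximation chosen for illustration; presidio-cli may apply gitignore-style matching instead, and parse_ignore and is_ignored are hypothetical names.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# The `ignore` value from the example above, as a YAML loader would return it.
IGNORE_BLOCK = ".git\n*.cfg\n"


def parse_ignore(block: str) -> list:
    """Split a newline-separated ignore block into non-empty patterns."""
    return [line.strip() for line in block.splitlines() if line.strip()]


def is_ignored(path: str, patterns: list) -> bool:
    """Treat a path as ignored if any of its components matches any pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatch(part, pattern) for part in parts for pattern in patterns)
```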

Run the Presidio CLI

Run the Presidio CLI to execute the Presidio Analyzer with the specified configuration (language, threshold, entities), skipping the files and paths listed in ignore.

Configuration from a file

Examples of running the CLI with configuration read from a file.

There are two example YAML configuration files in the presidio_cli/conf directory:

  • default.yaml - ignores the .git directory
  • limited.yaml - limits detection to three entity types and ignores the .git directory and .cfg files.

# run with default configuration (file `.presidiocli`) in the current directory
presidio .

# run with configuration limited.yaml in the "tests" directory
presidio -c presidio_cli/conf/limited.yaml tests/

# run with configuration limited.yaml on a single file, tests/test_analyzer.py
presidio -c presidio_cli/conf/limited.yaml tests/test_analyzer.py
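
The lookup order in these examples can be sketched as follows: an explicit -c path is used as given, otherwise .presidiocli in the current directory. That -c takes precedence over an existing .presidiocli, and that a missing default file means built-in defaults apply, are assumptions; resolve_config_file is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

DEFAULT_CONFIG_NAME = ".presidiocli"


def resolve_config_file(cli_config: Optional[str], cwd: str = ".") -> Optional[Path]:
    """Choose the configuration file to read, or None if there is none."""
    if cli_config is not None:
        # An explicit -c path is returned as-is; existence is not checked here.
        return Path(cli_config)
    default = Path(cwd) / DEFAULT_CONFIG_NAME
    # None signals that no file was found (assumed: built-in defaults apply).
    return default if default.is_file() else None
```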

Configuration as a parameter

Configuration can also be passed inline as YAML data with the -d parameter:

# ignore paths .git and *.cfg
presidio -d "ignore: |
  .git
  *.cfg" tests/

# limit list of entities to CREDIT_CARD
presidio -d "entities:
  - CREDIT_CARD" tests/

# equivalent to using the -c parameter
presidio -d "$(cat presidio_cli/conf/limited.yaml)" tests/
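
When calling the CLI from a script, the -d value can be generated from a dict. This minimal sketch emits exactly the YAML shapes used in the examples above (a list for entities, a block scalar for ignore) and does no quoting, so it is only suitable for simple values like these; to_yaml and build_inline_command are hypothetical names.

```python
import shlex


def to_yaml(config: dict) -> str:
    """Emit simple YAML: lists as '- item', multi-line strings as '|' blocks."""
    lines = []
    for key, value in config.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        elif isinstance(value, str) and "\n" in value:
            lines.append(f"{key}: |")
            lines.extend(f"  {line}" for line in value.splitlines())
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


def build_inline_command(config: dict, *targets: str) -> list:
    """Build an argv list that passes the configuration inline via -d."""
    return ["presidio", "-d", to_yaml(config), *targets]


if __name__ == "__main__":
    argv = build_inline_command({"entities": ["CREDIT_CARD"]}, "tests/")
    print(shlex.join(argv))  # shell-quoted form of the command
```

The argv list can be passed straight to subprocess.run, which avoids shell quoting issues with the multi-line value.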

Formatting output

The output format is set with the -f or --format parameter. The default format is auto.

Available formats:

  • standard - plain-text output listing line:column, score and entity type for each result
presidio -d "entities:
  - PERSON" -f standard tests/conftest.py
# result
tests/conftest.py
  34:58     0.85     PERSON
  37:33     0.85     PERSON

  • github - output formatted for GitHub Actions logs, with results grouped per file
presidio -d "entities:
  - PERSON" -f github tests/conftest.py
# result
::group::tests/conftest.py
::0.85 file=tests/conftest.py,line=34,col=58::34:58 [PERSON] 
::0.85 file=tests/conftest.py,line=37,col=33::37:33 [PERSON] 
::endgroup::

  • colored - the standard format, with colored output

  • parsable - one JSON object per result, easy to parse automatically

presidio -d "entities:
  - PERSON" -f parsable tests/conftest.py
# result
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}

  • auto - the default format; switches automatically between two modes:
    • github, if run in GitHub Actions (the GITHUB_ACTIONS and GITHUB_WORKFLOW environment variables are set)
    • colored, otherwise
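
A sketch of consuming these formats from Python: parse_parsable reads parsable output line by line and keeps results at or above a minimum score (client-side filtering, separate from the CLI's own threshold setting), and auto_format restates the documented auto rule. Skipping non-JSON lines is a defensive assumption; both function names are hypothetical.

```python
import json
import os


def parse_parsable(output: str, threshold: float = 0.0) -> list:
    """Parse `-f parsable` output and keep results with score >= threshold."""
    results = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank or non-JSON lines, if any appear
        result = json.loads(line)
        if result["score"] >= threshold:
            results.append(result)
    return results


def auto_format(environ=None) -> str:
    """Return 'github' when both GitHub variables are set, else 'colored'."""
    env = os.environ if environ is None else environ
    if "GITHUB_ACTIONS" in env and "GITHUB_WORKFLOW" in env:
        return "github"
    return "colored"
```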

List of all parameters

Run the following to list all available CLI options:

presidio --help



Download files


Source Distribution

presidio_cli-0.0.6.tar.gz (14.5 kB)

Uploaded Source

Built Distribution

presidio_cli-0.0.6-py2.py3-none-any.whl (14.3 kB)

Uploaded Python 2 Python 3

File details

Details for the file presidio_cli-0.0.6.tar.gz.

File metadata

  • Download URL: presidio_cli-0.0.6.tar.gz
  • Upload date:
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for presidio_cli-0.0.6.tar.gz
Algorithm Hash digest
SHA256 95e18bf280574eb42fd1c3f07c4f939d1f9f0bc8ff934b4abe2637753653e07a
MD5 aa135504092b6d1764ea3261a09fd785
BLAKE2b-256 1eb774853ac391f1b5efe0756f0124abe9c764fc2fc4a7f0384e29868b611027


File details

Details for the file presidio_cli-0.0.6-py2.py3-none-any.whl.

File metadata

  • Download URL: presidio_cli-0.0.6-py2.py3-none-any.whl
  • Upload date:
  • Size: 14.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for presidio_cli-0.0.6-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 49a8ed6948f690f2eadae5aad83bea3046d243ca62d5a1cc89e0797429202cbb
MD5 15b779c7613357972f36e6297af01a19
BLAKE2b-256 7f8255da89ec2fc0691d9caca784b38fdc2be8e7c1d0472ec28c23fef11eaca4

