A Python CLI for analyzing PII entities with the Microsoft Presidio framework.
Presidio CLI
A CLI tool that analyzes text for PII entities with the Microsoft Presidio framework.
Prerequisites
Python
Supported versions: 3.8, 3.9, 3.10
pipenv
pipenv must be installed:
# check if app is installed
pipenv --version
# install, if not available
pip install pipenv
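The same prerequisite check can be scripted. The sketch below is illustrative and not part of presidio-cli: the helper names `python_supported` and `pipenv_available` are hypothetical, and the version set simply mirrors the versions listed above.

```python
import shutil
import sys

# Versions listed under the prerequisites; mirrored here as an assumption.
SUPPORTED_VERSIONS = {(3, 8), (3, 9), (3, 10)}


def python_supported(version_info=sys.version_info):
    """Return True if the major.minor version is one of the listed versions."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


def pipenv_available():
    """Return True if a pipenv executable is found on PATH."""
    return shutil.which("pipenv") is not None


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("pipenv found on PATH:", pipenv_available())
```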
Install presidio-cli
The package can be installed into the current Python environment or into a virtual environment.
Install from Python Package Index
Install in the current Python environment:
python -m pip install presidio-cli
Install the required packages and presidio-cli in a virtual environment:
pipenv install presidio-cli
Install from source
# clone from git
git clone https://github.com/insightsengineering/presidio-cli
cd presidio-cli
# install required apps and presidio-cli
pipenv install --deploy --dev
Install language models for spaCy
Download the models for the English (en) language using the commands below. For further information, see the models section.
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg
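Because downloaded spaCy models are installed as regular Python packages, their presence can be checked before running the CLI. The following is a minimal sketch; `missing_models` is a hypothetical helper, and the model names are the two from the commands above.

```python
import importlib.util

# Model package names taken from the download commands above.
MODELS = ("en_core_web_sm", "en_core_web_lg")


def missing_models(names=MODELS):
    """Return the model packages that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing models:", ", ".join(missing))
    else:
        print("All models are installed.")
```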
Configuration file syntax
The default configuration is read from the .presidiocli file in the current directory.
The configuration file supports the following parameters, written in YAML:
- language - by default, only models and recognizers for en are available. The list of languages can be extended.
- entities - limits the recognized entities to those listed in this parameter. It is mapped directly to the Presidio framework. See Presidio's list of supported entities.
- ignore - list of ignored files and directories, based on patterns. It is recommended to ignore version control files, for example .git
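To get a feel for pattern-based ignoring, the sketch below filters paths with Python's `fnmatch` module. This is an assumption made for illustration only: the matching rules that presidio-cli actually applies may differ (for example, gitignore-style semantics), and `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Return True if any component of `path` matches any shell-style pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


patterns = [".git", "*.cfg"]
paths = [".git/config", "setup.cfg", "tests/test_analyzer.py"]

# Keep only the paths that no pattern excludes.
kept = [p for p in paths if not is_ignored(p, patterns)]
print(kept)
```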
Note: the configuration file requires at least one parameter to be set.
An example of YAML configuration file content:
---
language: en
ignore: |
  .git
  *.cfg
entities:
  - PERSON
  - CREDIT_CARD
  - EMAIL_ADDRESS
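Once such a file has been parsed into a dictionary (for example with a YAML library), the rule that at least one parameter must be set, and the multi-line `ignore` value, can be handled along these lines. This is a sketch under assumptions, not presidio-cli's own validation; `ignore_patterns` is a hypothetical helper.

```python
def ignore_patterns(config):
    """Check that the config is non-empty and return its ignore patterns as a list."""
    if not isinstance(config, dict) or not config:
        raise ValueError("configuration requires at least one parameter")
    # The `ignore` value is a multi-line string with one pattern per line.
    raw = config.get("ignore", "")
    return [line.strip() for line in raw.splitlines() if line.strip()]


# The example configuration, as it would look after YAML parsing.
config = {
    "language": "en",
    "ignore": ".git\n*.cfg\n",
    "entities": ["PERSON", "CREDIT_CARD", "EMAIL_ADDRESS"],
}
print(ignore_patterns(config))
```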
Run the Presidio CLI
Run the Presidio CLI to execute the Presidio Analyzer with the specified configuration: language, threshold, entities, and ignored files and paths.
Configuration from a file
An example of running the tool with configuration from a file.
There are two example .yaml configuration files in the conf directory:
- default.yaml - ignores the .git directory
- limited.yaml - limits the list of entities to only 3 of them, and ignores the .git directory and .cfg files
# run with default configuration (file `.presidiocli`) in the current directory
presidio .
# run with configuration limited.yaml in the "tests" directory
presidio -c presidio_cli/conf/limited.yaml tests/
# run with configuration limited.yaml in single file only tests/test_analyzer.py
presidio -c presidio_cli/conf/limited.yaml tests/test_analyzer.py
Configuration as a parameter
An example of passing configuration data directly with the -d parameter:
# ignore paths .git and *.cfg
presidio -d "ignore: |
  .git
  *.cfg" tests/
# limit list of entities to CREDIT_CARD
presidio -d "entities:
  - CREDIT_CARD" tests/
# equivalent to use -c parameter
presidio -d "$(cat presidio_cli/conf/limited.yaml)" tests/
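When the CLI is driven from a script, building the argument list in Python avoids shell quoting problems with multi-line YAML. The sketch below uses only the -c and -d options shown above; `build_command` is a hypothetical helper, and treating the two options as alternatives is an assumption of this sketch.

```python
import shlex


def build_command(path, config_file=None, config_data=None):
    """Build the argument list for a presidio run using -c or -d."""
    if config_file is not None and config_data is not None:
        raise ValueError("pass either config_file or config_data, not both")
    cmd = ["presidio"]
    if config_file is not None:
        cmd += ["-c", config_file]
    if config_data is not None:
        cmd += ["-d", config_data]
    cmd.append(path)
    return cmd


cmd = build_command("tests/", config_data="entities:\n  - CREDIT_CARD")
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=False)
```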
Formatting output
Output can be formatted using the -f or --format parameter. The default format is auto.
Available formats:
- standard - standard output format
presidio -d "entities:
- PERSON" -f standard tests/conftest.py
# result
tests/conftest.py
34:58 0.85 PERSON
37:33 0.85 PERSON
- github - similar to the diff function in GitHub
presidio -d "entities:
- PERSON" -f github tests/conftest.py
# result
::group::tests/conftest.py
::0.85 file=tests/conftest.py,line=34,col=58::34:58 [PERSON]
::0.85 file=tests/conftest.py,line=37,col=33::37:33 [PERSON]
::endgroup::
- colored - standard output format, but with colors
- parsable - easy to parse automatically
presidio -d "entities:
- PERSON" -f parsable tests/conftest.py
# result
{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}
{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}
- auto - default format, switches automatically between these 2 modes:
  - github, if run on GitHub (the environment variables GITHUB_ACTIONS and GITHUB_WORKFLOW are set)
  - colored, otherwise
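Two of the formats above lend themselves to a short illustration: the selection rule behind auto, and reading parsable output in a script. The sketch below is not taken from presidio-cli; `resolve_auto_format` and `parse_parsable` are hypothetical helpers, and it assumes that "set" means the variables are present in the environment, whatever their value.

```python
import json
import os


def resolve_auto_format(env=None):
    """Pick the format that auto would select, per the rule described above."""
    env = os.environ if env is None else env
    # Assumption: presence of both variables is enough, regardless of value.
    if "GITHUB_ACTIONS" in env and "GITHUB_WORKFLOW" in env:
        return "github"
    return "colored"


def parse_parsable(output):
    """Turn parsable output (one JSON object per line) into a list of dicts."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


# Sample lines copied from the parsable example above.
sample = (
    '{"entity_type": "PERSON", "start": 57, "end": 62, "score": 0.85, "analysis_explanation": null}\n'
    '{"entity_type": "PERSON", "start": 32, "end": 37, "score": 0.85, "analysis_explanation": null}\n'
)
findings = parse_parsable(sample)
print([(f["entity_type"], f["start"], f["end"]) for f in findings])
print(resolve_auto_format({"GITHUB_ACTIONS": "true", "GITHUB_WORKFLOW": "ci"}))
```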
List of all parameters
Simply run the following to get a list of all available options for the CLI:
presidio --help