Presidio plugin for PII detection
Pii Extractor plugin: Presidio
This repository builds a Python package that installs a pii-extract-base plugin to perform PII detection for text data using the Microsoft Presidio Python library.
Requirements
The package needs:
- at least Python 3.8
- the pii-data and the pii-extract-base base packages
- the presidio-analyzer package
- an NLP engine model for the desired language
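The first three requirements can be checked from Python before installing the plugin. The following is a minimal sketch using only the standard library; the helper name `missing_requirements` is made up for this example and is not part of any of the packages.

```python
import sys
from importlib import metadata  # in the standard library since Python 3.8

# Distribution names taken from the requirements list above.
REQUIRED = ("pii-data", "pii-extract-base", "presidio-analyzer")


def missing_requirements(required=REQUIRED, min_python=(3, 8)):
    """Return a list of human-readable problems; empty if all is fine."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python >= %d.%d is needed" % min_python)
    for name in required:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            problems.append("package not installed: " + name)
    return problems


if __name__ == "__main__":
    for line in missing_requirements() or ["all requirements found"]:
        print(line)
```

The NLP engine model is not covered by this check, since which model is needed depends on the language.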
Installation
- Install the package:
pip install pii-extract-plg-presidio
(it will automatically install its dependencies, including presidio-analyzer)
- Download the recognition model for the desired language, as instructed by
the presidio-analyzer installation instructions. For instance, for
spaCy models:
- English model:
python -m spacy download en_core_web_lg
- Spanish model:
python -m spacy download es_core_news_md
- For additional information on model specification, see customizing NLP
models in the Presidio documentation. If custom models are used, the
nlp_config element in the plugin configuration must be adjusted accordingly.
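Since spaCy models are installed as regular Python packages, a quick way to confirm that a download succeeded is to look the model up by its import name. This is only a sketch: the `MODELS` mapping and the `model_available` helper are illustrative names, covering just the two models mentioned above.

```python
from importlib.util import find_spec

# Model names from the installation steps above; spaCy models install as
# regular Python packages, so they can be located by import name.
MODELS = {"en": "en_core_web_lg", "es": "es_core_news_md"}


def model_available(lang, models=MODELS):
    """True if the model package mapped to `lang` can be imported."""
    name = models.get(lang)
    return name is not None and find_spec(name) is not None


if __name__ == "__main__":
    for lang in MODELS:
        print(lang, "ok" if model_available(lang) else "missing")
```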
Usage
The package does not have any user-facing entry points (except for one console
script, pii-extract-presidio-info, which provides information about its
capabilities).
Instead, upon installation it defines a plugin entry point. The scripts and classes in pii-extract-base pick up this plugin automatically when they run, which makes its functionality available to them.
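Plugin entry points of this kind are recorded in the package metadata, so the installed ones can be listed with the standard library. The sketch below shows the general mechanism only; the group name `pii-extract.plugins` is an assumption made for illustration, and the actual group used by pii-extract-base should be taken from its documentation.

```python
import sys
from importlib import metadata


def plugin_entry_points(group):
    """Return the list of entry points registered under `group`."""
    eps = metadata.entry_points()
    if sys.version_info >= (3, 10):
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9: dict-like result


# HYPOTHETICAL group name, used here for illustration only
GROUP = "pii-extract.plugins"

for ep in plugin_entry_points(GROUP):
    print(ep.name, "->", ep.value)
```

If the plugin is installed and the group name is right, it should appear in the printed list; an empty output means nothing is registered under that group.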
Configuration
The plugin is governed by a PIISA configuration file; there is one default
file included in the package resources. The format tag for the configuration
is "piisa:config:extract-plg-presidio:main:v1", and it has two sections:
- nlp_config defines the NLP engine to be used, and the available models (per language)
- pii_list defines the PIISA instances to be detected. It contains a list of standard pii task descriptors; each one has an additional extra field that contains the Presidio PII entity to be mapped to the descriptor.
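To make the layout more concrete, here is a rough sketch of such a configuration as a Python dictionary, with a small structural check. Only the format tag, the two section names and the extra field come from the description above. Everything else is an assumption: the `format` key, the contents of nlp_config (modelled on the NLP engine configuration that presidio-analyzer accepts), and the keys and values inside each task descriptor. The default file shipped in the package resources is the authoritative reference.

```python
FORMAT_TAG = "piisa:config:extract-plg-presidio:main:v1"

# Illustrative only: every key except nlp_config, pii_list and extra is
# an assumption, not the plugin's documented schema.
example_config = {
    "format": FORMAT_TAG,
    "nlp_config": {
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": "en", "model_name": "en_core_web_lg"}],
    },
    "pii_list": [
        # "extra" carries the Presidio entity mapped to this descriptor
        {"pii": "PERSON", "lang": "en", "extra": "PERSON"},
    ],
}


def check_config(config):
    """Return a list of structural problems found in `config`."""
    problems = []
    if config.get("format") != FORMAT_TAG:
        problems.append("unexpected format tag")
    for section in ("nlp_config", "pii_list"):
        if section not in config:
            problems.append("missing section: " + section)
    for n, task in enumerate(config.get("pii_list", [])):
        if "extra" not in task:
            problems.append("pii_list[%d] lacks the extra field" % n)
    return problems
```

The check only looks at the overall shape (tag, sections, extra field); it does not validate the values against Presidio.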
Building
The provided Makefile can be used to process the package:
- make pkg will build the Python package, creating a file that can be installed with pip
- make unit will launch all unit tests (using pytest, so pytest must be available)
- make install will install the package in a Python virtualenv. The virtualenv will be chosen as, in this order:
  - the one defined in the VENV environment variable, if it is defined
  - if there is a virtualenv activated in the shell, it will be used
  - otherwise, a default is chosen as /opt/venv/pii (it will be created if it does not exist)
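The virtualenv selection order used by make install can be expressed in a few lines of Python. This is a sketch of the rule as described, not the Makefile's actual code: it assumes the activated virtualenv is detected through the VIRTUAL_ENV variable (which activation scripts set), and it treats an empty VENV as undefined. The function name `choose_venv` is made up for the example.

```python
import os

DEFAULT_VENV = "/opt/venv/pii"  # default location named above


def choose_venv(environ=None):
    """Pick the target virtualenv: VENV, then the active one, then the default."""
    env = os.environ if environ is None else environ
    # VIRTUAL_ENV as the marker of an activated virtualenv is an assumption
    return env.get("VENV") or env.get("VIRTUAL_ENV") or DEFAULT_VENV


if __name__ == "__main__":
    print(choose_venv())
```

For example, `choose_venv({"VENV": "/tmp/v", "VIRTUAL_ENV": "/tmp/w"})` gives `/tmp/v`, because VENV takes precedence over an activated virtualenv.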
Source Distribution
Hashes for pii-extract-plg-presidio-0.0.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | a685efb0fe56e2cd037e58fcc7e3d548b360b735aa8a4d279c222a85586f6fbb
MD5 | b014f30abd487977539810ebf3c68084
BLAKE2b-256 | 5461e1644f8bb5958e53102856a658234e73dcd7bda99917985be71a50af587b