Full end-to-end processing for PII (preprocess, extract, decide, transform)
pii-process
Description
This package wraps the relevant API blocks of the full PIISA workflow:
- pii-preprocess, to read document formats
- pii-extract (plus any installed pii-extract plugins), to detect and extract PII instances from documents
- pii-decide, to consolidate the list of PII instances
- pii-transform, to substitute detected PII instances in documents
It provides both a Python API and a command-line interface.
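To make the four stages more concrete, here is a toy, self-contained sketch of the same preprocess / extract / decide / transform flow. It does not use the pii-process API: every name in it (Span, DETECTORS, preprocess, extract, decide, transform, process) is hypothetical, the two regular expressions are deliberately simplistic, and the "keep the longest span" rule is only an assumed stand-in for whatever policy pii-decide actually applies.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    """A detected PII candidate: its type and character offsets in the text."""
    kind: str
    start: int
    end: int


# Toy detectors (assumptions for illustration, far simpler than real plugins).
# The DOMAIN pattern also fires inside e-mail addresses, which produces the
# kind of overlapping detections that a "decide" stage has to resolve.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DOMAIN": re.compile(r"\b[\w-]+\.(?:com|org|net)\b"),
}


def preprocess(raw: str) -> str:
    """Stage 1: turn a raw document into clean text (here: collapse whitespace)."""
    return " ".join(raw.split())


def extract(text: str) -> List[Span]:
    """Stage 2: run every detector and collect all candidates, overlaps included."""
    spans = []
    for kind, pattern in DETECTORS.items():
        spans.extend(Span(kind, m.start(), m.end()) for m in pattern.finditer(text))
    return spans


def decide(spans: List[Span]) -> List[Span]:
    """Stage 3: consolidate candidates; on overlap, keep the longest span."""
    kept: List[Span] = []
    for cand in sorted(spans, key=lambda s: s.end - s.start, reverse=True):
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda s: s.start)


def transform(text: str, spans: List[Span]) -> str:
    """Stage 4: substitute each span with a placeholder, right to left so
    that earlier offsets remain valid while the text changes length."""
    for s in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:s.start] + f"<{s.kind}>" + text[s.end:]
    return text


def process(raw: str) -> str:
    """Chain the four stages end to end."""
    text = preprocess(raw)
    return transform(text, decide(extract(text)))
```

Calling process() on a string that contains an e-mail address replaces the whole address with a single placeholder, because the overlapping DOMAIN candidate inside it is discarded in the decide stage.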
Installation
All necessary PIISA packages are declared as dependencies of this package, so they are installed along with it. All that is needed is:
- creation of a Python virtualenv (using Python >= 3.8)
- and installation of the package in the virtualenv
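The two steps above can be scripted with the Python standard library alone. The following is a minimal sketch, not an official installer: the helper names (create_env, pip_command) and the default package argument are assumptions made for illustration, and nothing is downloaded until the returned command is actually run.

```python
import os
import sys
import venv
from pathlib import Path
from typing import List

MIN_PYTHON = (3, 8)  # minimum Python version stated in this README


def create_env(path, with_pip: bool = True) -> Path:
    """Create a virtualenv at `path`, refusing to run on an old interpreter."""
    if sys.version_info < MIN_PYTHON:
        raise RuntimeError("Python >= 3.8 is required")
    venv.EnvBuilder(with_pip=with_pip).create(str(path))
    return Path(path)


def pip_command(env_dir, package: str = "pii-process") -> List[str]:
    """Build the pip invocation that targets the virtualenv's own interpreter."""
    if os.name == "nt":
        python = Path(env_dir) / "Scripts" / "python.exe"
    else:
        python = Path(env_dir) / "bin" / "python"
    return [str(python), "-m", "pip", "install", package]


# Typical use (commented out because it needs network access):
#   import subprocess
#   env = create_env(".venv")
#   subprocess.run(pip_command(env), check=True)
```

Running pip through the virtualenv's interpreter, rather than relying on an activated shell, keeps the installation inside that environment regardless of how the script was started.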
Choices are:
- Simple installation: this installs the package, the packages for the four PIISA processing steps mentioned above, and the extraction plugin that detects PII instances using regular expressions:
  pip install pii-process
  The dependencies installed automatically are thus pii-preprocess, pii-extract-base, pii-extract-plg-regex, pii-decide and pii-transform.
- Complete installation: this installs all of the above, plus the extraction plugin that detects PII instances using trained Transformer models (usually to extract PERSON and LOCATION types for some languages):
  pip install pii-process[transformers]
  On top of the previous installation, this also adds the pii-extract-plg-transformers package. Note that PyTorch must be installed too (either the GPU or the CPU version) so that the models used by the pii-extract-plg-transformers package can run. See the transformers plugin documentation for more information.
- Alternate installation: this performs the simple installation and adds the extraction plugin that detects PII instances using the Presidio library (usually to extract PERSON and LOCATION types for some languages):
  pip install pii-process[presidio]
  The additional package installed in this case is pii-extract-plg-presidio. For it to work, the relevant models need to be downloaded; see the presidio plugin documentation for details.
It is also possible to install all plugins, i.e. pip install pii-process[transformers,presidio], though the Transformers and Presidio plugins overlap in functionality (note that overlapping detections would be resolved by the pii-decide block).
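After choosing one of the options above, it can be handy to confirm which packages actually ended up in the environment. The sketch below relies only on the standard library (importlib.metadata, available from Python 3.8); the helper names installed_versions and torch_available are hypothetical, the package names are the ones listed in this README, and the PyTorch check assumes the usual import name torch.

```python
import importlib.util
from importlib import metadata
from typing import Dict, Iterable, Optional

# Packages pulled in by the simple installation, as listed in this README
CORE = [
    "pii-process",
    "pii-preprocess",
    "pii-extract-base",
    "pii-extract-plg-regex",
    "pii-decide",
    "pii-transform",
]
# Plugins added by the [transformers] and [presidio] extras
OPTIONAL = ["pii-extract-plg-transformers", "pii-extract-plg-presidio"]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def torch_available() -> bool:
    """Report whether a module named `torch` can be imported (assumed name)."""
    return importlib.util.find_spec("torch") is not None


def report() -> str:
    """Build a short human-readable summary of the environment."""
    lines = []
    for name, version in installed_versions(CORE + OPTIONAL).items():
        lines.append(f"{name}: {version or 'not installed'}")
    lines.append(f"torch importable: {torch_available()}")
    return "\n".join(lines)
```

Printing report() in the virtualenv shows at a glance whether the optional plugins are present and whether the transformers plugin would find PyTorch.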