Full end-to-end processing for PII (preprocess, extract, decide, transform)

pii-process

Description

This package wraps around the relevant API blocks in the full PIISA workflow:

  1. pii-preprocess, to read document formats
  2. pii-extract (plus any installed pii-extract plugins), to detect and extract PII instances from documents
  3. pii-decide, to consolidate the list of PII instances
  4. pii-transform, to substitute detected PII instances in documents

It provides both a Python API and a command-line interface.
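
To make the four-stage flow concrete, here is a toy sketch of the same preprocess, extract, decide, transform sequence. It does not use the PIISA packages or their API: every name in it (`Pii`, `preprocess`, `extract`, `decide`, `transform`, `process`) is hypothetical, and the two regular expressions are deliberately simplistic stand-ins for real detectors.

```python
import re
from dataclasses import dataclass
from typing import List

# NOTE: illustrative stand-ins only; none of these names come from the
# PIISA packages, whose actual API is not shown here.


@dataclass
class Pii:
    kind: str
    start: int
    end: int


# Two deliberately overlapping toy detectors
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DOMAIN = re.compile(r"[\w-]+\.(?:com|org|net)")


def preprocess(raw: str) -> str:
    """Stage 1 (toy): turn the input 'document' into plain text."""
    return raw.strip()


def extract(text: str) -> List[Pii]:
    """Stage 2 (toy): run every detector and collect all candidates."""
    found = [Pii("EMAIL", m.start(), m.end()) for m in EMAIL.finditer(text)]
    found += [Pii("DOMAIN", m.start(), m.end()) for m in DOMAIN.finditer(text)]
    return found


def decide(candidates: List[Pii]) -> List[Pii]:
    """Stage 3 (toy): consolidate, keeping the longest of overlapping spans."""
    chosen: List[Pii] = []
    # Longest candidates first, so shorter overlapping ones get dropped
    for c in sorted(candidates, key=lambda p: (p.start - p.end, p.start)):
        if all(c.end <= k.start or c.start >= k.end for k in chosen):
            chosen.append(c)
    return sorted(chosen, key=lambda p: p.start)


def transform(text: str, instances: List[Pii]) -> str:
    """Stage 4 (toy): substitute each instance with a placeholder."""
    # Replace from the end so earlier offsets stay valid
    for p in sorted(instances, key=lambda p: p.start, reverse=True):
        text = text[:p.start] + f"<{p.kind}>" + text[p.end:]
    return text


def process(raw: str) -> str:
    """Chain the four stages in order."""
    text = preprocess(raw)
    return transform(text, decide(extract(text)))
```

In this sketch the email address is matched by both toy detectors (once as a whole, once by its domain part), and the decide stage keeps only the longer span before substitution.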

Installation

All required PIISA packages are declared as dependencies of this package, so they are installed along with it. All that is needed is:

  • creating a Python virtualenv (using Python >= 3.8)
  • installing the package in that virtualenv

The installation options are:

  • Simple installation: this installs the package, the packages for the four PIISA processing steps listed above, and the extraction plugin that detects PII instances using regular expressions:

     pip install pii-process
    

    The dependencies installed automatically are thus pii-preprocess, pii-extract-base, pii-extract-plg-regex, pii-decide and pii-transform.

  • Complete installation: this installs all the above, plus the extraction plugin for PII instances using trained Transformer models (usually to extract PERSON and LOCATION types for some languages):

     pip install pii-process[transformers]
    

    On top of the previous installation, this also adds the pii-extract-plg-transformers package. Note that PyTorch must be installed too (either the GPU or the CPU version), so that the models used by the pii-extract-plg-transformers package can run. See the transformers plugin documentation for more information.

  • Alternate installation: this option performs the simple installation and adds the extraction plugin for PII instances using the Presidio library (usually to extract PERSON and LOCATION types for some languages):

     pip install pii-process[presidio]
    

    The additional package installed in this case is pii-extract-plg-presidio. For it to work, the relevant models need to be downloaded; see the presidio plugin documentation for details.

It is also possible to install all plugins, i.e. pip install pii-process[transformers,presidio], though the Transformers and Presidio plugins overlap in functionality (any overlapping detections would be resolved by the pii-decide block).
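
After installing, one way to check which of the packages named above ended up in the virtualenv is to query their installed versions with the standard library. This is a minimal sketch: it relies only on `importlib.metadata` (available from Python 3.8) and on the distribution names listed in this section. The helper `installed_versions` and the `BASE` / `OPTIONAL` lists are names made up for this example, not part of pii-process.

```python
from importlib import metadata

# Distribution names as listed in the installation options above
BASE = [
    "pii-process",
    "pii-preprocess",
    "pii-extract-base",
    "pii-extract-plg-regex",
    "pii-decide",
    "pii-transform",
]
OPTIONAL = [
    "pii-extract-plg-transformers",
    "pii-extract-plg-presidio",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, version in installed_versions(BASE + OPTIONAL).items():
        print(f"{name}: {version or 'not installed'}")
```

With the simple installation, the two optional plugins would be expected to show as not installed.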

Source Distribution

pii-process-0.1.1.tar.gz (16.5 kB)
