Transform recognized PII instances in a document

pii-transform

This package takes a source document and a collection of detected PII instances, and transforms the document by replacing each PII instance with a different representation.

The type of substitution done is defined by transformation policies.
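To make the idea of a policy concrete, here is a minimal, self-contained sketch of span replacement. It is not the package's API: the `PiiSpan` class, the `transform` function and the policy names used here are hypothetical, chosen only to illustrate how a policy can decide what each detected instance becomes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PiiSpan:
    """A detected PII instance (hypothetical structure, not the package's own)."""
    start: int  # offset of the first character
    end: int    # offset one past the last character
    kind: str   # e.g. "EMAIL", "PHONE"


def apply_policy(value: str, kind: str, policy: str) -> str:
    """Return the replacement for one PII value under a named (illustrative) policy."""
    if policy == "redact":
        return "<PII>"
    if policy == "label":
        return f"<{kind}>"
    if policy == "hash":
        # Short, stable digest of the original value
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    if policy == "passthrough":
        return value
    raise ValueError(f"unknown policy: {policy}")


def transform(text: str, spans: list[PiiSpan], policy: str) -> str:
    """Replace every span, working right to left so earlier offsets stay valid."""
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        original = text[span.start:span.end]
        replacement = apply_policy(original, span.kind, policy)
        text = text[:span.start] + replacement + text[span.end:]
    return text


doc = "Contact jane@example.com or call 555-0100."
spans = [PiiSpan(8, 24, "EMAIL"), PiiSpan(33, 41, "PHONE")]
print(transform(doc, spans, "label"))  # Contact <EMAIL> or call <PHONE>.
```

Replacing from the last span backwards means the offsets recorded at detection time remain correct even when a replacement has a different length than the original value.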

Note: pii-transform does not implement or use Transformer models for PII purposes (for the extraction of PII Instances using Transformer models, see pii-extract-plg-transformers or pii-extract-plg-presidio).

Command-line scripts

The package provides three console scripts:

  • pii-transform loads a source document & a collection of already-detected PII, and produces a transformed document following the required policies.
  • pii-process is a full end-to-end script:
    • loads a document, from among the formats supported by pii-preprocess
    • detects PII instances, according to pii-extract and its installed plugins
    • transforms the detected PII instances (according to the indicated policy) and writes out the transformed document
  • pii-process-jsonl is also a full end-to-end script; this one reads JSONL files and processes each line as a separate text buffer (possibly in different languages), producing a transformed JSONL document
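The line-by-line JSONL handling can be pictured with a small stand-alone sketch. It does not call the package: detection is replaced by a simple email regex, and the `text` and `lang` field names, as well as the `process_jsonl` function, are assumptions made for illustration.

```python
import io
import json
import re

# Stand-in detector: a simple email regex. The real scripts delegate detection
# to pii-extract and its plugins; this pattern is only for illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def process_jsonl(src, dst, text_field="text"):
    """Read JSONL from src, replace emails in each record's text, write JSONL to dst."""
    count = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Each line is handled as an independent text buffer
        record[text_field] = EMAIL.sub("<EMAIL>", record[text_field])
        dst.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


source = io.StringIO(
    '{"text": "Write to ana@example.org", "lang": "en"}\n'
    '{"text": "Escribe a luis@example.es", "lang": "es"}\n'
)
output = io.StringIO()
process_jsonl(source, output)
print(output.getvalue(), end="")
```

Because every record is parsed and transformed on its own, records in different languages can share one file, and any fields other than the text are carried through unchanged.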

End-to-end installation

Note that pii-process & pii-process-jsonl will need additional packages to be installed:

  • pii-preprocess (only when using pii-process)
  • pii-extract-base, together with any desired detection plugins, e.g. pii-extract-plg-regex, pii-extract-plg-transformers, and/or pii-extract-plg-presidio
  • pii-decide
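Before running the end-to-end scripts, it can help to check which of these packages are already present in the environment. The sketch below uses only the standard library (`importlib.metadata`); the `installed_versions` helper is a hypothetical name, and the distribution names are the ones listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed above; the detection plugins are optional
E2E_PACKAGES = [
    "pii-preprocess",
    "pii-extract-base",
    "pii-extract-plg-regex",
    "pii-extract-plg-transformers",
    "pii-extract-plg-presidio",
    "pii-decide",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(E2E_PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```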

This installation can be performed explicitly, choosing the packages & plugins to install. There is also an automatic dependency installation, which installs a default set of packages, by adding the e2e qualifier upon installation of this package, i.e.:

      pip install pii-transform[e2e]

... and this will install pii-preprocess, pii-extract-base, pii-extract-plg-regex, pii-extract-plg-transformers and pii-decide.

Note that you will also need to install PyTorch, so that the models used by the pii-extract-plg-transformers package can run. See the transformers plugin documentation for more information.

API

The same functionality provided by the command-line scripts can also be accessed via a Python API.

Source Distribution

pii-transform-0.6.0.tar.gz (24.1 kB)
