Regex modules for the extraction of PII from text chunks

Project description

PII Extractor plugin: regex

This repository builds a Python package that installs a pii-extract-base plugin to perform PII detection on text data based on regular expressions (with optional context). The name of the plugin entry point is piisa-detectors-regex.
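
To check whether the plugin is visible in the current environment, the installed entry points can be inspected. The following is a minimal sketch using only the standard library; the helper name find_entry_points is hypothetical, and since the entry point group used by pii-extract-base is not stated here, the sketch searches all groups by name.

```python
from importlib.metadata import entry_points


def find_entry_points(name):
    """Return all installed entry points with the given name, in any group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: the result supports selection by attribute
        return list(eps.select(name=name))
    # Python 3.8/3.9: the result is a plain dict of group -> list of entry points
    return [ep for group in eps.values() for ep in group if ep.name == name]


if __name__ == "__main__":
    # Prints the matching entry points, or nothing if the plugin is not installed
    for ep in find_entry_points("piisa-detectors-regex"):
        print(ep.group, ep.name, ep.value)
```

An empty result suggests that the package is not installed in the active environment.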

The PII tasks in the package are organized by language and country, since many PII elements are language- and/or country-dependent.

Requirements

The package

  • needs at least Python 3.8
  • needs the pii-data and the pii-extract-base base packages
  • uses the regex package (instead of the standard re package in the core Python library)
  • uses the python-stdnum package to validate many identifiers (and the python-phonenumbers package to validate phone numbers)
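
The general idea of regex detection with optional context and a validation step can be sketched as follows. This is an illustration only, not the package's implementation: it uses the standard re module rather than regex, a hand-written Luhn checksum as a stand-in for the validation that python-stdnum provides, and hypothetical names (CARD_RE, CONTEXT_RE, luhn_ok, find_cards) and patterns.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Hypothetical context words that make a match more credible
CONTEXT_RE = re.compile(r"\b(?:card|visa|mastercard)\b", re.IGNORECASE)


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text, window=30, need_context=False):
    """Return (offset, matched text) pairs for card-like numbers in text."""
    found = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if not luhn_ok(digits):
            continue  # the regex matched, but validation rejected it
        if need_context:
            # Look for a context word within `window` characters on either side
            start = max(0, m.start() - window)
            nearby = text[start:m.start()] + text[m.end():m.end() + window]
            if not CONTEXT_RE.search(nearby):
                continue
        found.append((m.start(), m.group()))
    return found
```

The regex alone only proposes candidates; the checksum and the context check each remove false positives, which is the reason to combine them.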

Usage

The package has no user-facing entry points; the PIISA framework uses it automatically.

Building

The provided Makefile can be used to process the package:

  • make pkg will build the Python package, creating a file that can be installed with pip
  • make unit will launch all unit tests (using pytest, so pytest must be available)
  • make install will install the package in a Python virtualenv. The virtualenv is chosen in this order:
    • the one defined in the VENV environment variable, if it is defined
    • otherwise, the virtualenv currently activated in the shell, if there is one
    • otherwise, the default /opt/venv/bigscience (it will be created if it does not exist)
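
The selection order for make install can be expressed as a small function. This is a sketch of the logic only, not the Makefile's actual code: the name choose_venv is hypothetical, and it assumes that an activated virtualenv is detected through the VIRTUAL_ENV variable that activation scripts set, and that an empty VENV counts as undefined.

```python
import os

DEFAULT_VENV = "/opt/venv/bigscience"


def choose_venv(env=None):
    """Pick the virtualenv path following the order described above."""
    env = os.environ if env is None else env
    # 1. An explicit VENV variable takes priority
    if env.get("VENV"):
        return env["VENV"]
    # 2. Otherwise use the currently activated virtualenv (assumption: VIRTUAL_ENV)
    if env.get("VIRTUAL_ENV"):
        return env["VIRTUAL_ENV"]
    # 3. Otherwise fall back to the default location
    return DEFAULT_VENV


if __name__ == "__main__":
    print(choose_venv())
```

Passing a plain dict instead of the real environment makes the precedence easy to check, for example choose_venv({"VENV": "/tmp/v", "VIRTUAL_ENV": "/tmp/w"}) returns "/tmp/v".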

Contributing

To add a new PII processing task, please see the contributing instructions.

Source Distribution

pii-extract-plg-regex-0.5.1.tar.gz (24.1 kB)

File details

Details for the file pii-extract-plg-regex-0.5.1.tar.gz.

File metadata

  • Download URL: pii-extract-plg-regex-0.5.1.tar.gz
  • Upload date:
  • Size: 24.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for pii-extract-plg-regex-0.5.1.tar.gz
Algorithm Hash digest
SHA256 1ff882fa5a36c39633aa93c38731869acecd5f71bbecc46c856e2219b07d1d85
MD5 526e698972703cf3240043bd5eadb52c
BLAKE2b-256 ab8168ee28e00787824e53f0f00c81eec7d5cdd3ef160889dd7524d1f3632112
