Regex modules for the extraction of PII from text chunks
Project description
Pii Extractor plugin: regex
This repository builds a Python package that installs a pii-extract-base
plugin to perform PII detection on text data based on regular expressions
(with optional context). The name of the plugin entry point is piisa-detectors-regex.
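As a rough way to check that the plugin is visible after installation, the sketch below lists installed entry points by name using only the standard library. The helper `find_entry_points` is hypothetical and not part of this package; it searches every entry point group because the group that pii-extract-base uses is not stated here.

```python
from importlib.metadata import entry_points


def find_entry_points(name):
    """Return all installed entry points with the given name, in any group.

    Hypothetical helper for illustration; not part of this package.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points expose a selectable interface
        return list(eps.select(name=name))
    # Python 3.8/3.9: a dict mapping group name -> sequence of entry points
    return [ep for group in eps.values() for ep in group if ep.name == name]


# Prints nothing if the plugin is not installed in the current environment
for ep in find_entry_points("piisa-detectors-regex"):
    print(ep.group, ep.name, ep.value)
```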
The PII tasks in the package are structured by language and country, since many PII elements are language- and/or country-dependent.
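To illustrate the idea, here is a minimal sketch of regex tasks keyed by language and country, with an optional context check around each match. The table layout, the `"any"` country code, the patterns and the `detect` function are all assumptions made for this example; they are not the package's actual task definitions or API.

```python
import re

# Hypothetical task table: (language, country) -> list of regex tasks.
# "context" is an optional list of words that must appear near the match.
TASKS = {
    ("en", "any"): [
        {"pii": "EMAIL", "pattern": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "context": None},
    ],
    ("es", "es"): [
        {"pii": "PHONE", "pattern": r"\b[6789]\d{8}\b", "context": ["teléfono", "móvil"]},
    ],
}


def detect(text, lang, country, width=30):
    """Run the tasks for (lang, country), plus the country-independent ones."""
    keys = [(lang, country)]
    if country != "any":
        keys.append((lang, "any"))
    found = []
    for key in keys:
        for task in TASKS.get(key, []):
            for m in re.finditer(task["pattern"], text):
                if task["context"]:
                    # Accept the match only if a context word is within `width` characters
                    window = text[max(0, m.start() - width):m.end() + width].lower()
                    if not any(word in window for word in task["context"]):
                        continue
                found.append((task["pii"], m.group()))
    return found
```

The context check is what keeps a bare nine-digit number from being reported as a phone number unless the surrounding text suggests it is one.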
Requirements
The package
- needs at least Python 3.8
- needs the pii-data and the pii-extract-base base packages
- uses the python-stdnum package to validate numeric identifiers
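The reason for validating numeric identifiers is that a regular expression alone accepts any digit sequence of the right shape. The sketch below shows the general technique with a hand-written Luhn checksum standing in for the validators that python-stdnum provides; the pattern and function names are assumptions for illustration and do not reproduce this package's code.

```python
import re

# Candidate pattern (an assumption): 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            # Double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Keep only the regex candidates whose checksum is valid."""
    return [m.group() for m in CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

Candidates that match the pattern but fail the checksum are discarded, which reduces false positives from arbitrary digit strings.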
Usage
The package has no user-facing entry points; it is used automatically by the PIISA framework.
Building
The provided Makefile can be used to process the package:
- make pkg will build the Python package, creating a file that can be installed with pip
- make unit will launch all unit tests (using pytest, so pytest must be available)
- make install will install the package in a Python virtualenv. The virtualenv will be chosen as, in this order:
  - the one defined in the VENV environment variable, if it is defined
  - if there is a virtualenv activated in the shell, it will be used
  - otherwise, a default is chosen as /opt/venv/bigscience (it will be created if it does not exist)
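The virtualenv selection order can be sketched as a small function. This is only a model of the precedence described above, not the Makefile's logic: it assumes an activated virtualenv is detected through the `VIRTUAL_ENV` variable (which the standard activate scripts set) and treats an empty `VENV` as undefined. The name `choose_venv` is hypothetical.

```python
import os

DEFAULT_VENV = "/opt/venv/bigscience"


def choose_venv(environ=None):
    """Pick the target virtualenv: VENV, then the active one, then the default."""
    env = os.environ if environ is None else environ
    # Assumption: an activated virtualenv is visible as VIRTUAL_ENV
    return env.get("VENV") or env.get("VIRTUAL_ENV") or DEFAULT_VENV


print(choose_venv())
```

Passing a plain dictionary instead of the real environment makes the precedence easy to check, for example `choose_venv({"VIRTUAL_ENV": "/tmp/venv"})`.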
Contributing
To add a new PII processing task, please see the contributing instructions.
Download files
Source Distribution
Hashes for pii-extract-plg-regex-0.3.0.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | b4f66b9270226273e2af85f0beb6b5da9f9b595e1938b5db01f1b70de2b0302c |
MD5 | 9aaad61d62cc876f98a8bd72a15898b7 |
BLAKE2b-256 | 75e69a6ae9cb638336cf2b20f35498f9ec5815b026afa0baf89b4a95c5b2fd8d |