Regex modules for the extraction of PII from text chunks
Project description
Pii Extractor plugin: regex
This repository builds a Python package that installs a pii-extract-base plugin to perform PII detection on text data, based on regular expressions (with optional context).
The PII Tasks in the package are structured by language & country, since many of the PII elements are language- and/or country-dependent.
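As an illustration of what "regular expressions with optional context" can mean, here is a minimal sketch using only the standard re module. It is not the plugin's actual code or API: the names PiiMatch, PHONE, PHONE_CONTEXT and find_phones, the patterns themselves and the context_width parameter are all assumptions made for this example.

```python
import re
from typing import Iterator, NamedTuple, Optional


class PiiMatch(NamedTuple):
    kind: str
    value: str
    start: int


# Hypothetical pattern: a loose match for international phone numbers
PHONE = re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}")
# Hypothetical context: words expected near a genuine phone number
PHONE_CONTEXT = re.compile(r"\b(?:phone|tel|call)\b", re.IGNORECASE)


def find_phones(text: str, context_width: Optional[int] = 32) -> Iterator[PiiMatch]:
    """Yield phone-like matches found in text.

    If context_width is an integer, a match is kept only when a context
    word appears within that many characters around it. Passing None
    disables the context check.
    """
    for m in PHONE.finditer(text):
        if context_width is not None:
            lo = max(0, m.start() - context_width)
            window = text[lo : m.end() + context_width]
            if not PHONE_CONTEXT.search(window):
                continue  # pattern matched, but no supporting context nearby
        yield PiiMatch("PHONE_NUMBER", m.group(), m.start())
```

Calling list(find_phones("Call me at +34 600 123 456 tomorrow.")) yields one match, while the same number in a sentence without any context word is dropped unless context_width=None is passed. The context check trades some recall for fewer false positives on digit sequences that only look like PII.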
Requirements
The package
- needs at least Python 3.8
- needs the pii-data and the pii-extract-base base packages
- uses the python-stdnum package to validate numeric identifiers
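To show why validating numeric identifiers matters after a regex match, the sketch below pairs a candidate pattern with a checksum test. To stay dependency-free it implements the Luhn checksum by hand instead of calling python-stdnum; the names CARD_CANDIDATE, luhn_is_valid and find_valid_cards are hypothetical and do not come from the package.

```python
import re
from typing import List

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_valid_cards(text: str) -> List[str]:
    """Return regex candidates that also pass the checksum."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_is_valid(m.group())]
```

The regex alone would accept any digit run of the right length; the checksum step discards most random sequences, which is the role a validation library plays in this kind of pipeline.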
Usage
The package has no user-facing entry points; it is used automatically by the PIISA framework.
Building
The provided Makefile can be used to process the package:
- make pkg: builds the Python package, creating a file that can be installed with pip
- make unit: launches all unit tests (using pytest, so pytest must be available)
- make install: installs the package in a Python virtualenv. The virtualenv is chosen in this order:
  - the one defined in the VENV environment variable, if it is defined
  - the virtualenv currently activated in the shell, if there is one
  - otherwise, the default /opt/venv/bigscience (it will be created if it does not exist)
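The virtualenv selection order used by make install can be sketched in Python as follows. This is only an illustration of the precedence rules, not the Makefile's actual logic: the function name choose_venv is hypothetical, and detecting an activated virtualenv through the VIRTUAL_ENV variable (which standard activate scripts set) is an assumption about how the Makefile does it.

```python
import os
from typing import Mapping, Optional

# Default location named in the build instructions
DEFAULT_VENV = "/opt/venv/bigscience"


def choose_venv(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the virtualenv path following the documented precedence."""
    env = os.environ if env is None else env
    # 1. An explicit VENV variable wins (an empty value is treated as unset here)
    if env.get("VENV"):
        return env["VENV"]
    # 2. Assumption: an activated virtualenv is detected via VIRTUAL_ENV
    if env.get("VIRTUAL_ENV"):
        return env["VIRTUAL_ENV"]
    # 3. Fall back to the default path
    return DEFAULT_VENV
```

Passing the environment as a mapping keeps the function easy to test; calling choose_venv() with no argument reads the real process environment.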
Contributing
To add a new PII processing task, please see the contributing instructions.
Source Distribution
Hashes for pii-extract-plg-regex-0.0.2.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 9f50ce18bb826622974e03505cbdf963d731dfcf1d48643fd5892c74fac22b17 |
MD5 | 73427bf1460b8570c13e02b3cd034d31 |
BLAKE2b-256 | 4ab698268fc1fae21c51373c7e8af59e444bbf172d85d462038c8a663a022f1a |