Extraction of PII from text chunks
Pii Extract Base
This repository builds a Python package providing a base library for PII detection in source documents, i.e. the extraction of PII (Personally Identifiable Information, aka Personal Data) items present in a document.
The package itself does not implement any PII detection tasks; it only provides the base infrastructure for the process. Detection tasks must be supplied externally.
Requirements
The package needs
- at least Python 3.8
- the pii-data base package
- one or more pii-extract plugins (to actually do real detection work)
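As a rough illustration of the requirements above, the following sketch checks the interpreter version and looks up an installed distribution through the standard library. The helper name `missing_requirements` is hypothetical and not part of this package, and the distribution name `pii-data` is an assumption taken from the list above. Plugin names are not checked because they are not given here.

```python
import sys
from importlib import metadata  # stdlib, available from Python 3.8 onwards


def missing_requirements(dists=("pii-data",), min_python=(3, 8), version_info=None):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper: "pii-data" is assumed to be the distribution name
    of the pii-data base package.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    # Compare only (major, minor) against the minimum supported version
    if tuple(version_info[:2]) < tuple(min_python):
        problems.append("Python {}.{} or later is required".format(*min_python))
    for name in dists:
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            problems.append(f"distribution not installed: {name}")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

Running the script prints one line per unmet requirement and nothing when all checks pass.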
Usage
The package can be used:
- As an API, in two flavors: function-based API and object-based API
- As a command-line tool
For details, see the usage document.
Building
The provided Makefile can be used to process the package:
- make pkg
  will build the Python package, creating a file that can be installed with pip
- make unit
  will launch all unit tests (using pytest, so pytest must be available)
- make install
  will install the package in a Python virtualenv. The virtualenv will be chosen as, in this order:
  - the one defined in the VENV environment variable, if it is defined
  - if there is a virtualenv activated in the shell, it will be used
  - otherwise, a default is chosen as /opt/venv/pii (it will be created if it does not exist)
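The virtualenv selection order for make install can be sketched in Python as follows. This is only an illustration of the described precedence, not the Makefile's actual code. The function name `choose_venv` is hypothetical, and the sketch assumes that an activated virtualenv is detected through the `VIRTUAL_ENV` variable, which the standard activation scripts set.

```python
import os

# Default location named in the description above
DEFAULT_VENV = "/opt/venv/pii"


def choose_venv(environ=None):
    """Pick the virtualenv path: VENV first, then the active one, then the default.

    Hypothetical helper; an empty VENV value is treated as "not defined",
    which is an assumption of this sketch.
    """
    if environ is None:
        environ = os.environ
    return environ.get("VENV") or environ.get("VIRTUAL_ENV") or DEFAULT_VENV


if __name__ == "__main__":
    print(choose_venv())
```

Passing a plain dictionary instead of `os.environ` makes it easy to try the three cases without changing the shell environment.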