Extraction of PII from text chunks

Note: this repository is obsolete. It has been superseded by pii-extract-base (https://github.com/piisa/pii-extract-base) and pii-extract-plg-regex (https://github.com/piisa/pii-extract-plg-regex).


Pii Extractor

This repository builds a Python package that performs PII detection on text data, i.e. it extracts the PII (Personally Identifiable Information, also known as Personal Data) items present in the text.

The PII Tasks in the package are structured by language and country, since many PII elements are language- and/or country-dependent.
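
To illustrate what organizing tasks by language and country can look like, here is a minimal sketch. It does not reproduce this package's actual classes or API: the `TASKS` registry, the `register` decorator, the `detect` function, the `"any"` language key and the simplified regular expressions are all hypothetical names and patterns chosen for the example.

```python
import re
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A detector takes a text chunk and yields (pii_type, matched_text, start_offset).
Detector = Callable[[str], Iterator[Tuple[str, str, int]]]

# Hypothetical registry: (language, country) -> detectors.
# A country of None means "valid for every country of that language".
TASKS: Dict[Tuple[str, Optional[str]], List[Detector]] = {}


def register(lang: str, country: Optional[str] = None):
    """Decorator that files a detector under its language/country key."""
    def wrap(func: Detector) -> Detector:
        TASKS.setdefault((lang, country), []).append(func)
        return func
    return wrap


@register("en", "us")
def us_phone(text: str):
    # Simplified illustrative pattern (NNN-NNN-NNNN), not a production regex.
    for m in re.finditer(r"\b\d{3}-\d{3}-\d{4}\b", text):
        yield ("PHONE_NUMBER", m.group(), m.start())


@register("any")
def email(text: str):
    # Email addresses do not depend on language or country.
    for m in re.finditer(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text):
        yield ("EMAIL_ADDRESS", m.group(), m.start())


def detect(text: str, lang: str, country: Optional[str] = None):
    """Run the language-independent tasks plus those matching lang/country."""
    keys = [("any", None), (lang, None), (lang, country)]
    found = []
    for key in dict.fromkeys(keys):  # drop duplicate keys, keep order
        for task in TASKS.get(key, []):
            found.extend(task(text))
    return sorted(found, key=lambda item: item[2])


print(detect("Call 555-123-4567 or mail jane@example.com", "en", "us"))
```

Keying the registry on the (language, country) pair means a caller only pays for the detectors relevant to its text, while language-independent items such as email addresses are always checked.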

Requirements

The package

  • needs at least Python 3.8
  • needs the pii-data base package
  • uses the python-stdnum package to validate identifiers, and needs the
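
As an illustration of why identifier validation matters for PII detection, the sketch below implements the Luhn checksum in plain Python. It is a standalone example, not this package's code: the function name `luhn_is_valid` is hypothetical, and python-stdnum ships its own validators for this and many other identifier formats.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    # Accept common separators, then require plain ASCII digits only.
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111 1111 1111 1111"))
print(luhn_is_valid("4111 1111 1111 1112"))
```

A regular expression alone would accept any 16-digit string as a candidate card number; a checksum step like this one filters out most random digit runs and so reduces false positives.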

Usage

The package can be used:

  • As an API, in two flavors: function-based API and object-based API
  • As a command-line tool

For details, see the usage document.

Building

The provided Makefile can be used to build, test and install the package:

  • make pkg will build the Python package, creating a file that can be installed with pip
  • make unit will launch all unit tests (using pytest, so pytest must be available)
  • make install will install the package in a Python virtualenv. The virtualenv is chosen in this order:
    • the one defined in the VENV environment variable, if it is defined
    • otherwise, the virtualenv currently activated in the shell, if there is one
    • otherwise, the default /opt/venv/bigscience (it will be created if it does not exist)
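
The virtualenv selection order for make install can be sketched in Python as follows. This mirrors the documented order only and is not the Makefile's actual code. The function name `choose_venv` is hypothetical, and the sketch assumes that an activated virtualenv is detected through the `VIRTUAL_ENV` variable, which the standard activate scripts export.

```python
import os
from typing import Mapping, Optional

# Default location named in the build instructions.
DEFAULT_VENV = "/opt/venv/bigscience"


def choose_venv(environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the target virtualenv: VENV, then the active one, then the default."""
    env = os.environ if environ is None else environ

    # 1. An explicit VENV variable wins (an empty value is treated as unset).
    if env.get("VENV"):
        return env["VENV"]

    # 2. Assumption: an activated virtualenv is visible as VIRTUAL_ENV.
    if env.get("VIRTUAL_ENV"):
        return env["VIRTUAL_ENV"]

    # 3. Fall back to the documented default path.
    return DEFAULT_VENV


print(choose_venv({"VENV": "/tmp/my-venv", "VIRTUAL_ENV": "/home/me/.venv"}))
print(choose_venv({}))
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.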

Contributing

To add a new PII processing task, please see the contributing instructions.

Source Distribution

pii-extract-0.0.2.tar.gz (26.6 kB)

File details

Details for the file pii-extract-0.0.2.tar.gz.

File metadata

  • Download URL: pii-extract-0.0.2.tar.gz
  • Upload date:
  • Size: 26.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for pii-extract-0.0.2.tar.gz

Algorithm    Hash digest
SHA256       b870c2c53e20f8902658f305c4f57293d83a535d02542d6d9e9eced078bd0dbd
MD5          27f1acdc9c3e63f45760e9ce7cbed8b4
BLAKE2b-256  3e36eed05c0abe2d3cb813df13acfaeaf966d35bb86cf48604d137e6155f3c0e
