Extraction of PII from text chunks

Pii Extract Base

This repository builds a Python package that provides a base library for PII detection in source documents, i.e. extraction of the PII (Personally Identifiable Information, also known as Personal Data) items present in a document.

The package itself does not implement any PII detection tasks; it only provides the base infrastructure for the process. Detection tasks must be supplied externally.
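The split between base infrastructure and externally supplied detection tasks can be pictured with a small, generic sketch. This is not the package's API: Detection, Task, run_tasks and email_task are hypothetical names invented for illustration, and the e-mail pattern is deliberately naive.

```python
import re
from typing import Callable, Iterable, List, NamedTuple


class Detection(NamedTuple):
    """One detected PII item (hypothetical shape, for illustration only)."""
    kind: str    # label for the PII type, e.g. "EMAIL"
    value: str   # the matched text
    start: int   # character offset inside the chunk


# In this sketch a "task" is simply a callable from text to detections.
Task = Callable[[str], Iterable[Detection]]


def run_tasks(chunk: str, tasks: Iterable[Task]) -> List[Detection]:
    """Base infrastructure: apply every supplied task to one text chunk."""
    found: List[Detection] = []
    for task in tasks:
        found.extend(task(chunk))
    # Report detections in document order
    return sorted(found, key=lambda d: d.start)


# An externally supplied task: a deliberately naive e-mail matcher.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def email_task(chunk: str) -> Iterable[Detection]:
    for m in EMAIL.finditer(chunk):
        yield Detection("EMAIL", m.group(), m.start())
```

Calling run_tasks with an empty task list returns no detections, which mirrors the point above: without externally supplied tasks, the base layer finds nothing on its own.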

Requirements

The package needs:

  • Python 3.8 or later
  • the pii-data base package
  • one or more pii-extract plugins (to perform the actual detection work)
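The first two requirements can be checked from Python with the standard library alone. The following is a minimal sketch: python_ok and installed_version are hypothetical helper names, and it assumes that the installed distribution is named pii-data, as the list above suggests. It does not try to check for plugins, since how they are discovered is not described here.

```python
import sys
from importlib import metadata  # in the standard library since Python 3.8


def python_ok(version_info=None) -> bool:
    """Return True if the interpreter is Python 3.8 or later."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= (3, 8)


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    # "pii-data" as the distribution name is an assumption
    print("pii-data version:", installed_version("pii-data"))
```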

Usage

The package can be used:

  • As an API, in two flavors: function-based API and object-based API
  • As a command-line tool

For details, see the usage document.

Building

The provided Makefile can be used to build, test, and install the package:

  • make pkg will build the Python package, creating a file that can be installed with pip
  • make unit will launch all unit tests (using pytest, so pytest must be available)
  • make install will install the package in a Python virtualenv. The virtualenv is chosen in this order:
    • the one defined in the VENV environment variable, if that variable is defined
    • otherwise, the virtualenv currently activated in the shell, if there is one
    • otherwise, the default /opt/venv/pii (which will be created if it does not exist)
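The selection order for make install can be expressed as a short Python sketch. This is not the Makefile's own logic: choose_venv is a hypothetical helper, and it assumes that an activated virtualenv is recognised through the VIRTUAL_ENV variable that activation scripts export, and that an empty value counts as unset.

```python
import os
from typing import Mapping, Optional

DEFAULT_VENV = "/opt/venv/pii"  # default location named in the list above


def choose_venv(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the virtualenv path, following the documented order."""
    if env is None:
        env = os.environ
    # 1. An explicitly defined VENV variable wins
    #    (assumption: an empty value is treated as not defined)
    if env.get("VENV"):
        return env["VENV"]
    # 2. Otherwise use the virtualenv activated in the shell, if any
    #    (assumption: detected via the VIRTUAL_ENV variable)
    if env.get("VIRTUAL_ENV"):
        return env["VIRTUAL_ENV"]
    # 3. Otherwise fall back to the default path
    return DEFAULT_VENV
```

For example, choose_venv({"VIRTUAL_ENV": "/home/me/.venv"}) returns "/home/me/.venv", while an empty environment yields the default path. The sketch only picks the path; creating a missing default virtualenv is left out.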
