Base data structures for PII Processing

pii-data

This package provides base data structures for managing PII (Personally Identifiable Information). It does not contain code for processing documents or for extracting PII from them.

For the full specification embodied by these base data structures, check the PIISA Data Specification.

Data structures

Two main data types are defined to hold PII: PII Entities and PII Collections. There is also a Source Document data type.

PII Source Document

A PII Source Document holds the raw data in which PII is detected. The document is modeled as a number of chunks, each one having an identifier and its contents (a raw text excerpt, or another type of content). In this package it is managed by the SrcDocument class and its subclasses.
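The chunk model described above can be illustrated with a minimal sketch. This is not the SrcDocument API: the class names `Chunk` and `SketchDocument`, their fields, and the sequential identifier scheme are all hypothetical and exist only to show the idea of a document as a series of identified chunks.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Chunk:
    """Hypothetical chunk: an identifier plus its raw text contents."""
    id: str
    data: str


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a source document: an ordered list of chunks."""
    chunks: List[Chunk] = field(default_factory=list)

    def add(self, text: str) -> Chunk:
        # Assumption: identifiers are sequential strings; the real scheme may differ
        chunk = Chunk(id=str(len(self.chunks) + 1), data=text)
        self.chunks.append(chunk)
        return chunk

    def __iter__(self) -> Iterator[Chunk]:
        return iter(self.chunks)


doc = SketchDocument()
doc.add("My name is Jane Doe.")
doc.add("Call me at 555-0100.")
chunk_ids = [c.id for c in doc]
```

Keeping the identifier on each chunk is what later allows a PII instance to be traced back to the exact part of the document it came from.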

The package can dump a Source Document to a local file, following a standardized schema, and read it back from that file. The schema uses YAML as its file format, and it is the only document format the package can read natively (to read other formats into Source Document objects there is an auxiliary pii-preprocess package, or you can implement your own reader).

The package can also export documents as raw text files.
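The dump, reload and raw text export operations can be sketched roughly as follows. This is only an illustration under stated assumptions: chunks are plain dictionaries, the function names `dump_doc`, `load_doc` and `export_text` are hypothetical, and the standard-library `json` module is used as a stand-in serializer so the example has no dependencies. The package itself writes YAML, with a schema that is not reproduced here.

```python
import json
import os
import tempfile
from typing import Dict, List

Chunk = Dict[str, str]


def dump_doc(chunks: List[Chunk], path: str) -> None:
    """Write the chunks to a local file (JSON used here as a stand-in for YAML)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"chunks": chunks}, f, ensure_ascii=False, indent=2)


def load_doc(path: str) -> List[Chunk]:
    """Read the chunks back from a file written by dump_doc."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["chunks"]


def export_text(chunks: List[Chunk]) -> str:
    """Export only the text contents, dropping the chunk identifiers."""
    return "\n".join(c["data"] for c in chunks)


chunks = [
    {"id": "1", "data": "My name is Jane Doe."},
    {"id": "2", "data": "Call me at 555-0100."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "doc.json")
    dump_doc(chunks, path)
    restored = load_doc(path)

raw_text = export_text(restored)
```

The round trip preserves chunk identifiers, while the raw text export keeps only the contents, so the exported text alone is not enough to map a PII instance back to its chunk.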

PII Collection

A PII Collection contains a list of detected/extracted PII Entities. Each entity contains all the information needed to correctly identify one PII instance and locate it in the document it belongs to.

These are the PII data classes defined:

  • PiiEntity: a PII instance (which in turn contains a PiiEntityInfo object)
  • PiiCollection: the full collection of PII (the additional PiiCollectionLoader subclass can load a collection from a JSON file)
  • PiiDetector: an object to describe the module used to generate a given PiiEntity object
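To make the relationship between these classes concrete, here is a minimal sketch built from hypothetical stand-ins. The names `SketchDetector`, `SketchEntity`, `SketchCollection` and `locate`, and all of their fields, are assumptions made for illustration; they do not reproduce the constructors or attributes of PiiDetector, PiiEntity or PiiCollection.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SketchDetector:
    """Hypothetical description of the module that produced an entity."""
    name: str
    version: str


@dataclass(frozen=True)
class SketchEntity:
    """Hypothetical PII instance: what it is, and where it sits in the document."""
    pii_type: str
    value: str
    chunk_id: str
    pos: int  # character offset of the value inside its chunk
    detector: SketchDetector


@dataclass
class SketchCollection:
    """Hypothetical collection: a list of detected entities."""
    entities: List[SketchEntity] = field(default_factory=list)

    def add(self, entity: SketchEntity) -> None:
        self.entities.append(entity)

    def by_chunk(self, chunk_id: str) -> List[SketchEntity]:
        return [e for e in self.entities if e.chunk_id == chunk_id]


def locate(entity: SketchEntity, chunks: Dict[str, str]) -> str:
    """Use the entity's chunk id and offset to recover its text from the document."""
    text = chunks[entity.chunk_id]
    return text[entity.pos:entity.pos + len(entity.value)]


chunks = {"1": "My name is Jane Doe.", "2": "Call me at 555-0100."}
detector = SketchDetector(name="toy-detector", version="0.0")

collection = SketchCollection()
collection.add(SketchEntity("PERSON", "Jane Doe", "1",
                            chunks["1"].index("Jane Doe"), detector))
collection.add(SketchEntity("PHONE_NUMBER", "555-0100", "2",
                            chunks["2"].index("555-0100"), detector))

located = [locate(e, chunks) for e in collection.entities]
```

Storing the chunk identifier and offset with each entity is one way to satisfy the requirement that an entity can be located in the document it belongs to.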

Online behaviour

There is partial support for using these data classes in a streaming fashion, which provides a way to feed data incrementally.
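The general idea of incremental feeding can be sketched with Python generators. This does not show the package's streaming interface, which is not described here: `stream_chunks` and `detect_incrementally` are hypothetical names, and the regular expression is a toy placeholder for a real detector (the package itself contains no detection code).

```python
import re
from typing import Iterable, Iterator, Tuple

# Toy pattern standing in for a real PII detector
PHONE = re.compile(r"\b\d{3}-\d{4}\b")


def stream_chunks(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (chunk_id, text) pairs one at a time, without loading everything first."""
    for n, line in enumerate(lines, start=1):
        yield str(n), line.rstrip("\n")


def detect_incrementally(
    chunks: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, int, str]]:
    """Yield (chunk_id, offset, value) as soon as each chunk has been scanned."""
    for chunk_id, text in chunks:
        for match in PHONE.finditer(text):
            yield chunk_id, match.start(), match.group()


# Any iterable of lines works here, including an open file or a network stream
lines = iter(["My name is Jane Doe.\n", "Call me at 555-0100.\n"])

found = []
for item in detect_incrementally(stream_chunks(lines)):
    found.append(item)  # results become available chunk by chunk
```

Because both stages are generators, each chunk is processed as it arrives and results can be consumed before the input has been fully read.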
