Detect PII columns in your database and warehouse

🔍 Detect PII

Detect PII is a library, inspired by piicatcher and CommonRegex, that detects table columns likely to contain PII. It runs regex matches against column names and column values and flags the columns that match.
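
To make the name-and-value matching idea concrete, here is a minimal standalone sketch that does not use detectpii itself. The patterns, the `match_column_name` and `match_column_values` helpers, and the sample columns are all assumptions made for illustration; they are not the patterns or functions the library ships with.

```python
import re

# Illustrative patterns only (assumptions, not detectpii's own rules).
NAME_PATTERNS = {
    "email": re.compile(r"e?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
}
VALUE_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "phone": re.compile(r"^\+?\d[\d\s().-]{6,}\d$"),
}


def match_column_name(column_name: str) -> set[str]:
    """Return the PII types whose name pattern occurs in the column name."""
    return {pii for pii, pat in NAME_PATTERNS.items() if pat.search(column_name)}


def match_column_values(values: list[str]) -> set[str]:
    """Return the PII types whose value pattern matches at least one sampled value."""
    return {
        pii
        for pii, pat in VALUE_PATTERNS.items()
        if any(pat.match(value) for value in values)
    }


# Hypothetical sample: column name -> a few sampled values.
columns = {
    "contact_email": ["alice@example.com", "bob@example.org"],
    "cell": ["+1 555 010 9999"],
    "id": ["1", "2"],
}

# A column is flagged if either its name or its values match a pattern.
flagged = {
    name: match_column_name(name) | match_column_values(values)
    for name, values in columns.items()
}
flagged = {name: kinds for name, kinds in flagged.items() if kinds}
print(flagged)
```

In this sketch `contact_email` is caught by both its name and its values, while `cell` is caught only through its values, which is why checking both sources can be useful.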

Usage

Installation

pip install detectpii

Scan tables for PII

from detectpii.catalog import PostgresCatalog
from detectpii.pipeline import PiiDetectionPipeline
from detectpii.scanner import DataScanner, MetadataScanner
from detectpii.util import print_columns

# -- Create a catalog to connect to a database / warehouse
pg_catalog = PostgresCatalog(
    host="localhost",
    user="postgres",
    password="my-secret-pw",
    database="postgres",
    port=5432,
    schema="public"
)

# -- Create a pipeline to detect PII in the tables
pipeline = PiiDetectionPipeline(
    catalog=pg_catalog,
    scanners=[
        MetadataScanner(),
        DataScanner(percentage=20, times=2),
    ]
)

# -- Scan for PII columns.
pii_columns = pipeline.scan()

# -- Print them to the console
print_columns(pii_columns)

Persist the pipeline

import json
from detectpii.pipeline import pipeline_to_dict

# -- Create a pipeline
pipeline = ...

# -- Convert it into a dictionary
dictionary = pipeline_to_dict(pipeline)

# -- Print it
print(json.dumps(dictionary, indent=4))

# {
#     "catalog": {
#         "tables": [],
#         "resolver": {
#             "name": "PlaintextResolver",
#             "_type": "PlaintextResolver"
#         },
#         "user": "postgres",
#         "password": "my-secret-pw",
#         "host": "localhost",
#         "port": 5432,
#         "database": "postgres",
#         "schema": "public",
#         "_type": "PostgresCatalog"
#     },
#     "scanners": [
#         {
#             "_type": "MetadataScanner"
#         },
#         {
#             "times": 2,
#             "percentage": 20,
#             "_type": "DataScanner"
#         }
#     ]
# }
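
The dictionary shown above contains the database password in plain text, so it may be worth masking it before printing or logging. The following is a rough sketch using only the standard library: the `dictionary` literal is a trimmed stand-in for the output of `pipeline_to_dict`, and the `redact_password` helper and the file location are hypothetical, not part of detectpii.

```python
import copy
import json
import tempfile
from pathlib import Path

# Stand-in for the result of pipeline_to_dict(pipeline), trimmed for brevity.
dictionary = {
    "catalog": {
        "user": "postgres",
        "password": "my-secret-pw",
        "_type": "PostgresCatalog",
    },
    "scanners": [{"_type": "MetadataScanner"}],
}


def redact_password(config: dict) -> dict:
    """Return a deep copy with the catalog password masked (for logging only)."""
    redacted = copy.deepcopy(config)
    if "password" in redacted.get("catalog", {}):
        redacted["catalog"]["password"] = "***"
    return redacted


# Hypothetical file location; any writable path works.
path = Path(tempfile.gettempdir()) / "pii_pipeline.json"

# Persist the full dictionary so it can be loaded again later.
path.write_text(json.dumps(dictionary, indent=4))

# Print a masked copy instead of the original.
print(json.dumps(redact_password(dictionary), indent=4))
```

The file on disk still holds the real password, so it should be stored with the same care as any other credential.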

Load the pipeline

from detectpii.pipeline import dict_to_pipeline

# -- Load the persisted pipeline as a dictionary
dictionary: dict = ...

# -- Convert it back to a pipeline object
pipeline = dict_to_pipeline(dictionary=dictionary)
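
Before handing a persisted dictionary to `dict_to_pipeline`, a quick sanity check on its shape can catch a truncated or hand-edited file early. This sketch assumes the top-level `catalog` and `scanners` keys and the `_type` field seen in the persisted example; the `serialized` string is a stand-in for JSON read from wherever the pipeline was stored.

```python
import json

# Stand-in for JSON text previously produced from pipeline_to_dict(pipeline).
serialized = (
    '{"catalog": {"_type": "PostgresCatalog"}, '
    '"scanners": [{"_type": "MetadataScanner"}]}'
)

dictionary = json.loads(serialized)

# Check for the top-level keys seen in the persisted example.
missing = [key for key in ("catalog", "scanners") if key not in dictionary]
if missing:
    raise ValueError(f"persisted pipeline is missing keys: {missing}")

# List the scanner types that will be reconstructed.
scanner_types = [scanner["_type"] for scanner in dictionary["scanners"]]
print(scanner_types)

# With detectpii installed, the dictionary can then be passed to the loader:
# from detectpii.pipeline import dict_to_pipeline
# pipeline = dict_to_pipeline(dictionary=dictionary)
```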

For more detailed documentation, please see the docs folder.

Supported databases / warehouses

  • Postgres
  • Snowflake
  • Trino
  • Yugabyte

Download files

Source Distribution

detectpii-0.1.3.tar.gz (14.8 kB)

Uploaded Source

Built Distribution

detectpii-0.1.3-py3-none-any.whl (20.5 kB)

Uploaded Python 3

File details

Details for the file detectpii-0.1.3.tar.gz.

File metadata

  • Download URL: detectpii-0.1.3.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.11.9 Darwin/23.4.0

File hashes

Hashes for detectpii-0.1.3.tar.gz
Algorithm    Hash digest
SHA256       0b113b3cf87d139427527405ea123ef12e8a992d1b21157d844b57b78b5122c8
MD5          f7a67f3503d470edd01ad163ee284e71
BLAKE2b-256  012c31dc78521b213c8a4d45102c405a1b3a80296f7007b0afb4b67c68244291

File details

Details for the file detectpii-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: detectpii-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.11.9 Darwin/23.4.0

File hashes

Hashes for detectpii-0.1.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       00b4dc3f5ff29f21da2205cb2a2e5f818a97fefbed6d4aa6b9844f93a218ccfa
MD5          982153ee2a7216ca70e11971ce03d060
BLAKE2b-256  a45340e96584248ecc117954fa4ff7cfbc4a987475b224b9e09150aafa049cf5
