Find PII data in databases
Project description
PII Catcher for Databases and Data Warehouses
Overview
PIICatcher is a data catalog and scanner for PII and PHI. It finds PII in your databases and file systems and tracks critical data. The data catalog can serve as a foundation for building governance, compliance, and security applications.
Check out AWS Glue & Lake Formation Privilege Analyzer for an example of how piicatcher is used in production.
Quick Start
PIICatcher is available as a docker image or command-line application.
Docker
docker run tokern/piicatcher:latest scan sqlite --path '/db/sqlqb'
╭─────────────┬─────────────┬─────────────┬─────────────╮
│ schema      │ table       │ column      │ has_pii     │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ main        │ full_pii    │ a           │ 1           │
│ main        │ full_pii    │ b           │ 1           │
│ main        │ no_pii      │ a           │ 0           │
│ main        │ no_pii      │ b           │ 0           │
│ main        │ partial_pii │ a           │ 1           │
│ main        │ partial_pii │ b           │ 0           │
╰─────────────┴─────────────┴─────────────┴─────────────╯
Command-line
To install, use pip:
python3 -m venv .env
source .env/bin/activate
pip install piicatcher
# Install the spaCy English model
python -m spacy download en_core_web_sm
# run piicatcher on a sqlite db and print report to console
piicatcher scan sqlite --path '/db/sqlqb'
╭─────────────┬─────────────┬─────────────┬─────────────╮
│ schema      │ table       │ column      │ has_pii     │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ main        │ full_pii    │ a           │ 1           │
│ main        │ full_pii    │ b           │ 1           │
│ main        │ no_pii      │ a           │ 0           │
│ main        │ no_pii      │ b           │ 0           │
│ main        │ partial_pii │ a           │ 1           │
│ main        │ partial_pii │ b           │ 0           │
╰─────────────┴─────────────┴─────────────┴─────────────╯
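To try the scan above without pointing it at a real database, a small SQLite file can be built with Python's standard library. The sketch below is only a fixture under stated assumptions: the table and column names mirror the report above, but the row values, the `build_sample_db` helper, and the file location are hypothetical and are not taken from the PIICatcher project.

```python
import os
import sqlite3
import tempfile

# Hypothetical sample rows; only the table and column names come from the report above.
SAMPLE_TABLES = {
    "full_pii": [("Jane Doe", "Boston"), ("John Smith", "Chicago")],
    "no_pii": [("abc", "def"), ("xsfr", "asawe")],
    "partial_pii": [("917-908-2234", "plkj"), ("215-099-2234", "sfrf")],
}


def build_sample_db(path):
    """Create a SQLite file with three two-column text tables and return its path."""
    conn = sqlite3.connect(path)
    try:
        for table, rows in SAMPLE_TABLES.items():
            conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (a TEXT, b TEXT)")
            conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()
    return path


db_path = build_sample_db(os.path.join(tempfile.mkdtemp(), "sample.db"))
print(db_path)
```

The printed path can then be passed to `piicatcher scan sqlite --path`. Whether the scanner flags exactly the columns shown in the report depends on the sample values and on the detectors in the installed version.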
API
from piicatcher.api import scan_postgresql
# PIICatcher uses a catalog to store its state.
# The easiest option is to use a sqlite memory database.
# For production usage, see https://tokern.io/docs/data-catalog
catalog_params = {'catalog_path': ':memory:'}
output = scan_postgresql(catalog_params=catalog_params, name="pg_db", uri="127.0.0.1",
                         username="piiuser", password="p11secret", database="piidb",
                         include_table_regex=["sample"])
print(output)
# Example Output
[['public', 'sample', 'gender', 'PiiTypes.GENDER'],
['public', 'sample', 'maiden_name', 'PiiTypes.PERSON'],
['public', 'sample', 'lname', 'PiiTypes.PERSON'],
['public', 'sample', 'fname', 'PiiTypes.PERSON'],
['public', 'sample', 'address', 'PiiTypes.ADDRESS'],
['public', 'sample', 'city', 'PiiTypes.ADDRESS'],
['public', 'sample', 'state', 'PiiTypes.ADDRESS'],
['public', 'sample', 'email', 'PiiTypes.EMAIL']]
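The example output above is a list of `[schema, table, column, pii_type]` rows, so it can be post-processed with plain Python. The sketch below groups flagged columns by table. `group_by_table` is a hypothetical helper, not part of the PIICatcher API, and the input is a subset of the example output copied as literal strings rather than the result of a live scan.

```python
from collections import defaultdict

# Rows copied from the example output above, as [schema, table, column, pii_type].
rows = [
    ["public", "sample", "gender", "PiiTypes.GENDER"],
    ["public", "sample", "fname", "PiiTypes.PERSON"],
    ["public", "sample", "city", "PiiTypes.ADDRESS"],
    ["public", "sample", "email", "PiiTypes.EMAIL"],
]


def group_by_table(scan_rows):
    """Map 'schema.table' to a {column: pii_type} dict."""
    grouped = defaultdict(dict)
    for schema, table, column, pii_type in scan_rows:
        grouped[f"{schema}.{table}"][column] = pii_type
    return dict(grouped)


report = group_by_table(rows)
for table, columns in report.items():
    print(table, sorted(columns))
```

A grouping like this is one way to feed the scan result into a downstream report or access review; the shape of the dictionary is a design choice for illustration only.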
Supported Databases
PIICatcher supports the following databases:
- Sqlite3 v3.24.0 or greater
- MySQL 5.6 or greater
- PostgreSQL 9.4 or greater
- AWS Redshift
- AWS Athena
- Snowflake
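The SQLite version requirement in the list above can be checked from Python before running a scan. This is a minimal sketch that assumes the relevant version is that of the SQLite library linked into the Python interpreter running PIICatcher; `MIN_SQLITE` and `sqlite_is_supported` are illustrative names, not PIICatcher identifiers.

```python
import sqlite3

# Minimum version taken from the supported-databases list above.
MIN_SQLITE = (3, 24, 0)


def sqlite_is_supported(version_info=sqlite3.sqlite_version_info):
    """Return True if the given SQLite library version meets the minimum."""
    return tuple(version_info) >= MIN_SQLITE


print(sqlite3.sqlite_version, sqlite_is_supported())
```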
Documentation
For advanced usage, refer to the PIICatcher Documentation.
Survey
Please take this survey if you use PIICatcher or are considering using it. The responses will help prioritize improvements to the project.
Contributing
For contribution guidelines, see the PIICatcher Developer documentation.
Download files
Hashes for piicatcher-0.17.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 88f7cc7aa307b145363cd33584e0a565523b6928ff4589949cd756d2c15c8cc3
MD5 | 197ca9c3e9f86cf29eff506f6e584220
BLAKE2b-256 | 3ddee300fc80ac0cc2339913e60b9a51f116c02792f5f2de18935727d151dc97