Skip to main content

Find PII data in databases

Project description

CircleCI codecov PyPI image image image

PII Catcher for Files and Databases

Overview

PIICatcher is a data catalog and scanner for PII and PHI information. It finds PII data in your databases and file systems and tracks critical data. The data catalog can be used as a foundation to build governance, compliance and security applications.

Check out AWS Glue & Lake Formation Privilege Analyzer for an example of how piicatcher is used in production.

Quick Start

PIICatcher is available as a docker image or command-line application.

Docker

docker run tokern/piicatcher:latest db -c '/db/sqlqb'

╭─────────────┬─────────────┬─────────────┬─────────────╮
│   schema    │    table    │   column    │   has_pii   │
├─────────────┼─────────────┼─────────────┼─────────────┤
│        main │    full_pii │           a │           1 │
│        main │    full_pii │           b │           1 │
│        main │      no_pii │           a │           0 │
│        main │      no_pii │           b │           0 │
│        main │ partial_pii │           a │           1 │
│        main │ partial_pii │           b │           0 │
╰─────────────┴─────────────┴─────────────┴─────────────╯

Command-line

To install use pip:

python3 -m venv .env
source .env/bin/activate
pip install piicatcher

# Install Spacy English package
python -m spacy download en_core_web_sm

# run piicatcher on a sqlite db and print report to console
piicatcher db -c '/db/sqlqb'
╭─────────────┬─────────────┬─────────────┬─────────────╮
│   schema    │    table    │   column    │   has_pii   │
├─────────────┼─────────────┼─────────────┼─────────────┤
│        main │    full_pii │           a │           1 │
│        main │    full_pii │           b │           1 │
│        main │      no_pii │           a │           0 │
│        main │      no_pii │           b │           0 │
│        main │ partial_pii │           a │           1 │
│        main │ partial_pii │           b │           0 │
╰─────────────┴─────────────┴─────────────┴─────────────╯

Supported Technologies

PIICatcher supports the following filesystems:

  • POSIX
  • AWS S3 (for files that are part of tables in AWS Glue and AWS Athena)
  • Google Cloud Storage (Coming Soon)
  • ADLS (Coming Soon)

PIICatcher supports the following databases:

  1. Sqlite3 v3.24.0 or greater
  2. MySQL 5.6 or greater
  3. PostgreSQL 9.4 or greater
  4. AWS Redshift
  5. Oracle
  6. AWS Glue/AWS Athena
  7. Snowflake

Documentation

For advanced usage refer documentation PIICatcher Documentation.

Survey

Please take this survey if you are a user or considering using PIICatcher. The responses will help to prioritize improvements to the project.

Contributing

For Contribution guidelines, PIICatcher Developer documentation.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for piicatcher, version 0.13.0
Filename, size File type Python version Upload date Hashes
Filename, size piicatcher-0.13.0-py2.py3-none-any.whl (27.3 kB) File type Wheel Python version py2.py3 Upload date Hashes View
Filename, size piicatcher-0.13.0.tar.gz (18.8 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page