
Find PII data in databases



Pii Catcher for Files and Databases

Overview

PiiCatcher finds PII in your databases. It scans every column in a database and detects the following types of PII:

  • PHONE
  • EMAIL
  • CREDIT_CARD
  • ADDRESS
  • PERSON
  • LOCATION
  • BIRTH_DATE
  • GENDER
  • NATIONALITY
  • IP_ADDRESS
  • SSN
  • USER_NAME
  • PASSWORD

PiiCatcher uses two types of scanners to detect PII:

  1. CommonRegex, which matches a set of regular expressions for common types of information.
  2. spaCy Named Entity Recognition, which uses natural language processing to detect named entities. Only English is currently supported.
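
To make the regular-expression approach concrete, here is a minimal sketch of pattern-based detection. The patterns and the `detect_pii_types` helper are illustrative assumptions, not the expressions or API that CommonRegex or PiiCatcher actually use.

```python
import re

# Illustrative patterns only; these are NOT the expressions shipped with CommonRegex.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii_types(text):
    """Return the set of PII type names whose pattern matches the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}
```

Calling `detect_pii_types` on a sampled column value returns the names of every pattern that matched, so a column could be flagged as soon as any sampled value yields a non-empty set.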

Supported Technologies

PiiCatcher supports the following filesystems:

  • POSIX
  • AWS S3 (Coming Soon)
  • Google Cloud Storage (Coming Soon)
  • ADLS (Coming Soon)

PiiCatcher supports the following databases:

  1. SQLite3 v3.24.0 or greater
  2. MySQL 5.6 or greater
  3. PostgreSQL 9.4 or greater
  4. AWS Redshift
  5. SQL Server
  6. Oracle

Installation

PiiCatcher is available as a command-line application.

To install use pip:

python3 -m venv .env
source .env/bin/activate
pip install piicatcher

Or clone the repo:

git clone https://github.com/vrajat/piicatcher.git
cd piicatcher
python3 -m venv .env
source .env/bin/activate
python setup.py install

Install spaCy Language Model

python -m spacy download en_core_web_sm

Install Oracle Client

PiiCatcher requires a working Oracle client to scan Oracle databases. Refer to the cx_Oracle documentation for more information.

Usage

# Print usage to scan databases
piicatcher db -h
usage: piicatcher db [-h] -s HOST [-R PORT] [-u USER] [-p PASSWORD]
                 [-t {sqlite,mysql,postgres}] [-c {deep,shallow}]
                 [-o OUTPUT] [-f {ascii_table,json,orm}]

optional arguments:
  -h, --help            show this help message and exit
  -s HOST, --host HOST  Hostname of the database. File path if it is SQLite
  -R PORT, --port PORT  Port of database.
  -u USER, --user USER  Username to connect database
  -p PASSWORD, --password PASSWORD
                        Password of the user
  -t {sqlite,mysql,postgres}, --connection-type {sqlite,mysql,postgres}
                        Type of database
  -c {deep,shallow}, --scan-type {deep,shallow}
                        Choose deep(scan data) or shallow(scan column names
                        only)
  -o OUTPUT, --output OUTPUT
                        File path for report. If not specified, then report is
                        printed to sys.stdout
  -f {ascii_table,json,orm}, --output-format {ascii_table,json,orm}
                        Choose output format type

# Print usage to scan files
piicatcher files -h
usage: piicatcher files [-h] [--path PATH] [--output OUTPUT]
                    [--output-format {ascii_table,json,orm}]

optional arguments:
  -h, --help            show this help message and exit
  --path PATH           Path to file or directory
  --output OUTPUT       File path for report. If not specified, then report is
                        printed to sys.stdout
  --output-format {ascii_table,json,orm}
                        Choose output format type
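
The shallow scan type inspects column names only. The following is a rough sketch of that idea against SQLite, using only the standard library. The keyword pattern and the `shallow_scan` function are hypothetical and do not reflect PiiCatcher's internal rules.

```python
import re
import sqlite3

# Hypothetical keyword pattern for column names; not PiiCatcher's actual rules.
PII_COLUMN_NAME = re.compile(
    r"email|phone|ssn|address|birth|password|user_?name", re.IGNORECASE
)


def shallow_scan(conn):
    """Yield (table, column, has_pii) using column names only; no rows are read."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            name = column[1]
            yield table, name, bool(PII_COLUMN_NAME.search(name))


# Invented example schema, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, notes TEXT)")
shallow_results = list(shallow_scan(conn))
```

Because no table data is read, a scan like this is cheap, but it misses PII stored under uninformative column names, which is the case a deep scan is meant to cover.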

Example

# Run piicatcher on a SQLite database and print the report to the console
piicatcher db -s '/db/sqlqb'
╭─────────────┬─────────────┬─────────────┬─────────────╮
│   schema    │    table    │   column    │   has_pii   │
├─────────────┼─────────────┼─────────────┼─────────────┤
│        main │    full_pii │           a │           1 │
│        main │    full_pii │           b │           1 │
│        main │      no_pii │           a │           0 │
│        main │      no_pii │           b │           0 │
│        main │ partial_pii │           a │           1 │
│        main │ partial_pii │           b │           0 │
╰─────────────┴─────────────┴─────────────┴─────────────╯
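
As an illustration of how a deep scan could arrive at rows shaped like the report above, the sketch below samples values from each column of a SQLite database and flags the column when any value matches. The single email pattern, the `deep_scan` function, the `sample_size` parameter, and the sample rows are all assumptions made for this example, not PiiCatcher's implementation.

```python
import re
import sqlite3

# Stand-in detector: a single illustrative email pattern, not PiiCatcher's scanners.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def deep_scan(conn, sample_size=10):
    """Return (schema, table, column, has_pii) rows by inspecting sampled values."""
    report = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            values = conn.execute(
                f'SELECT "{column}" FROM "{table}" LIMIT ?', (sample_size,)
            ).fetchall()
            has_pii = any(EMAIL.search(str(value)) for (value,) in values)
            report.append(("main", table, column, int(has_pii)))
    return report


# Invented sample data, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partial_pii (a TEXT, b TEXT)")
conn.execute("INSERT INTO partial_pii VALUES ('jane@example.com', 'abc')")
conn.execute("CREATE TABLE no_pii (a TEXT, b TEXT)")
conn.execute("INSERT INTO no_pii VALUES ('abc', 'def')")
deep_report = deep_scan(conn)
```

Sampling a bounded number of rows keeps the scan time predictable on large tables, at the cost of possibly missing PII that appears only in unsampled rows.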
