Find PII data in databases
Pii Catcher for MySQL, PostgreSQL & AWS Redshift
Overview
PiiCatcher finds PII in your databases. It scans every column in a database and detects the following types of PII:
- PHONE
- CREDIT_CARD
- ADDRESS
- PERSON
- LOCATION
PiiCatcher uses two types of scanners to detect PII:
- CommonRegex matches values against a set of regular expressions for common types of information.
- spaCy Named Entity Recognition uses natural language processing to detect named entities. Only English is currently supported.
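The regular-expression approach can be sketched as follows. This is a minimal illustration, not PiiCatcher's code: the two patterns and the helper names `detect_pii_types` and `scan_column` are hypothetical, and the patterns are far simpler than what CommonRegex ships.

```python
import re

# Illustrative patterns only; these are NOT the expressions CommonRegex uses.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[- ]){3}\d{4}\b"),
}


def detect_pii_types(value):
    """Return the set of PII type names whose pattern matches the value."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(value)}


def scan_column(values):
    """Union the PII types found across the sampled values of one column."""
    found = set()
    for value in values:
        if value is not None:  # skip NULLs
            found |= detect_pii_types(str(value))
    return found
```

A column would then be flagged when `scan_column` returns a non-empty set for its sampled values.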
PiiCatcher supports the following databases:
- SQLite 3.24.0 or greater
- MySQL 5.6 or greater
- PostgreSQL 9.4 or greater
- AWS Redshift
Installation
PiiCatcher is available as a command-line application.
To install use pip:
python3 -m venv .env
source .env/bin/activate
pip install piicatcher
Or clone the repo and install from source:
git clone https://github.com/vrajat/piicatcher.git
cd piicatcher
python3 -m venv .env
source .env/bin/activate
python setup.py install
Install the spaCy language model:
python -m spacy download en_core_web_sm
Usage
# Print usage
$ piicatcher -h
usage: piicatcher [-h] -s HOST [-u USER] [-p PASSWORD] [-t {sqlite,mysql}]
                  [-o OUTPUT] [-f {ascii_table}]

optional arguments:
  -h, --help            show this help message and exit
  -s HOST, --host HOST  Hostname of the database. File path if it is SQLite
  -u USER, --user USER  Username to connect database
  -p PASSWORD, --password PASSWORD
                        Password of the user
  -t {sqlite,mysql}, --connection-type {sqlite,mysql}
                        Type of database
  -o OUTPUT, --output OUTPUT
                        File path for report. If not specified, then report is
                        printed to sys.stdout
  -f {ascii_table}, --output-format {ascii_table}
                        Choose output format type
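For readers who want to see how the options fit together, the help text above could be produced by an `argparse` parser along these lines. This is a reconstruction from the help output only, not PiiCatcher's source; the function name `build_parser` is hypothetical, and no defaults are set because the help text does not state any.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the help text shown above."""
    parser = argparse.ArgumentParser(prog="piicatcher")
    # -s is the only required option: a hostname, or a file path for SQLite.
    parser.add_argument("-s", "--host", required=True,
                        help="Hostname of the database. File path if it is SQLite")
    parser.add_argument("-u", "--user", help="Username to connect database")
    parser.add_argument("-p", "--password", help="Password of the user")
    parser.add_argument("-t", "--connection-type", choices=["sqlite", "mysql"],
                        help="Type of database")
    parser.add_argument("-o", "--output",
                        help="File path for report. If not specified, then "
                             "report is printed to sys.stdout")
    parser.add_argument("-f", "--output-format", choices=["ascii_table"],
                        help="Choose output format type")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-s", "/db/sqlqb", "-t", "sqlite"])
```

Here `args.host` holds the SQLite file path and `args.connection_type` holds `"sqlite"`; options that were not passed are `None`.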
Example
# Run piicatcher on a SQLite database and print the report to the console
piicatcher -s '/db/sqlqb' -t sqlite
╭─────────────┬─────────────┬─────────────┬─────────────╮
│ schema │ table │ column │ has_pii │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ main │ full_pii │ a │ 1 │
│ main │ full_pii │ b │ 1 │
│ main │ no_pii │ a │ 0 │
│ main │ no_pii │ b │ 0 │
│ main │ partial_pii │ a │ 1 │
│ main │ partial_pii │ b │ 0 │
╰─────────────┴─────────────┴─────────────┴─────────────╯
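A report with these four columns can be approximated with the standard `sqlite3` module. The sketch below is an assumption about the general approach, not PiiCatcher's implementation: the `scan_sqlite` function, the single `PHONE` pattern, the `sample_size` parameter, and the sample rows are all hypothetical.

```python
import re
import sqlite3

# Hypothetical single pattern, for illustration only.
PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def scan_sqlite(conn, sample_size=100):
    """Return (schema, table, column, has_pii) rows for every user table."""
    report = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            rows = conn.execute(
                f'SELECT "{column}" FROM "{table}" LIMIT ?', (sample_size,))
            has_pii = any(
                value is not None and PHONE.search(str(value))
                for (value,) in rows)
            report.append(("main", table, column, int(has_pii)))
    return report


# Build a small in-memory database with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE full_pii (a TEXT, b TEXT)")
conn.execute("CREATE TABLE no_pii (a TEXT, b TEXT)")
conn.execute("INSERT INTO full_pii VALUES ('415-555-0132', '212 555 0199')")
conn.execute("INSERT INTO no_pii VALUES ('hello', 'world')")

for row in scan_sqlite(conn):
    print(row)
```

Sampling a bounded number of rows per column keeps the scan cheap on large tables, at the cost of missing PII that appears only in unsampled rows.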