Find PII data in databases
Pii Catcher for Files and Databases
Overview
PiiCatcher finds PII in your databases. It scans every column in a database and detects the following types of PII:
- PHONE
- CREDIT_CARD
- ADDRESS
- PERSON
- LOCATION
- BIRTH_DATE
- GENDER
- NATIONALITY
- IP_ADDRESS
- SSN
- USER_NAME
- PASSWORD
PiiCatcher uses two types of scanners to detect PII:
- CommonRegex uses a set of regular expressions for common types of information.
- Spacy Named Entity Recognition uses natural language processing to detect named entities. Only English is currently supported.
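To make the regular-expression approach concrete, here is a minimal sketch of how a regex-based scanner can map text to PII types. The patterns and the `detect_pii_types` helper are assumptions written for illustration; they are not the expressions that CommonRegex or PiiCatcher actually ship, and they are deliberately loose.

```python
import re

# Illustrative patterns only (assumptions, not PiiCatcher's own expressions).
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect_pii_types(text):
    """Return the set of PII type names whose pattern matches the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


print(detect_pii_types("call 415-555-0132"))
print(detect_pii_types("nothing sensitive here"))
```

A real scanner would need stricter patterns (for example, range checks on IP octets) to keep false positives down.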
Supported Technologies
PiiCatcher supports the following filesystems:
- POSIX
- AWS S3 (Coming soon)
- Google Cloud Storage (Coming Soon)
- ADLS (Coming Soon)
PiiCatcher supports the following databases:
- Sqlite3 v3.24.0 or greater
- MySQL 5.6 or greater
- PostgreSQL 9.4 or greater
- AWS Redshift
- SQL Server
- Oracle
Installation
PiiCatcher is available as a command-line application.
To install use pip:
python3 -m venv .env
source .env/bin/activate
pip install piicatcher
Or clone the repo:
git clone https://github.com/vrajat/piicatcher.git
python3 -m venv .env
source .env/bin/activate
python setup.py install
Install Spacy Language Model
python -m spacy download en_core_web_sm
Install Oracle Client
Scanning Oracle databases with PiiCatcher requires a working Oracle client. Please refer to the cx_Oracle documentation for more information.
Usage
# Print usage to scan databases
piicatcher db -h
usage: piicatcher db [-h] -s HOST [-R PORT] [-u USER] [-p PASSWORD]
[-t {sqlite,mysql,postgres}] [-c {deep,shallow}]
[-o OUTPUT] [-f {ascii_table,json,orm}]
optional arguments:
-h, --help show this help message and exit
-s HOST, --host HOST Hostname of the database. File path if it is SQLite
-R PORT, --port PORT Port of database.
-u USER, --user USER Username to connect database
-p PASSWORD, --password PASSWORD
Password of the user
-t {sqlite,mysql,postgres}, --connection-type {sqlite,mysql,postgres}
Type of database
-c {deep,shallow}, --scan-type {deep,shallow}
Choose deep(scan data) or shallow(scan column names
only)
-o OUTPUT, --output OUTPUT
File path for report. If not specified, then report is
printed to sys.stdout
-f {ascii_table,json,orm}, --output-format {ascii_table,json,orm}
Choose output format type
# Print usage to scan files
piicatcher files -h
usage: piicatcher files [-h] [--path PATH] [--output OUTPUT]
[--output-format {ascii_table,json,orm}]
optional arguments:
-h, --help show this help message and exit
--path PATH Path to file or directory
--output OUTPUT File path for report. If not specified, then report is
printed to sys.stdout
--output-format {ascii_table,json,orm}
Choose output format type
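The difference between the shallow and deep scan types described in the usage above can be sketched with the standard-library sqlite3 module. This is an illustration under stated assumptions, not PiiCatcher's implementation: the patterns, the helper names (`list_columns`, `shallow_scan`, `deep_scan`), and the `sample_rows` limit are all hypothetical.

```python
import re
import sqlite3

# Hypothetical patterns for illustration; not PiiCatcher's own.
COLUMN_NAME_HINT = re.compile(r"ssn|phone|email|password|birth", re.IGNORECASE)
SSN_VALUE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def list_columns(conn):
    """Yield (table, column) pairs for every table in a SQLite database."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            yield table, row[1]  # row[1] holds the column name


def shallow_scan(conn):
    """Flag columns by name only; no row data is read."""
    return {(t, c): bool(COLUMN_NAME_HINT.search(c)) for t, c in list_columns(conn)}


def deep_scan(conn, sample_rows=10):
    """Flag columns by inspecting a sample of the stored values."""
    results = {}
    for table, column in list_columns(conn):
        rows = conn.execute(
            f'SELECT "{column}" FROM "{table}" LIMIT ?', (sample_rows,)
        )
        results[(table, column)] = any(
            SSN_VALUE.search(str(value)) for (value,) in rows if value is not None
        )
    return results


# Demo: a misleading column name and a value hidden in a neutral column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ssn TEXT, a TEXT)")
conn.execute("INSERT INTO people VALUES ('n/a', '078-05-1120')")
shallow = shallow_scan(conn)
deep = deep_scan(conn)
print(shallow)
print(deep)
```

In this demo the two modes disagree on both columns, which is the trade-off the option exposes: a name-only scan is cheap but can be misled by column names, while a data scan reads rows and catches values stored under neutral names.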
Example
# run piicatcher on a sqlite db and print report to console
piicatcher db -s '/db/sqlqb'
╭─────────────┬─────────────┬─────────────┬─────────────╮
│ schema │ table │ column │ has_pii │
├─────────────┼─────────────┼─────────────┼─────────────┤
│ main │ full_pii │ a │ 1 │
│ main │ full_pii │ b │ 1 │
│ main │ no_pii │ a │ 0 │
│ main │ no_pii │ b │ 0 │
│ main │ partial_pii │ a │ 1 │
│ main │ partial_pii │ b │ 0 │
╰─────────────┴─────────────┴─────────────┴─────────────╯