Data PII cleaning/masking for PostgreSQL
datamask
Mask sensitive data in a PostgreSQL database (PII/PHI) for development/testing purposes.
Masking is done with native PostgreSQL operations, so no data leaves the database.
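Here is a rough illustration of what masking in place means. It is a sketch, not datamask's actual SQL. A masking step can be written as an UPDATE whose replacement values PostgreSQL computes itself, so the client never fetches the original values. The helper names, the users.email column and the md5-based expression are all hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def masking_update(schema: str, table: str, column: str, expression: str) -> str:
    """Build an UPDATE that rewrites one column using a server-side SQL expression.

    The expression is evaluated by PostgreSQL, so original values never reach
    the client. This is a hypothetical illustration, not datamask's own code.
    """
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    return f"UPDATE {target} SET {quote_ident(column)} = {expression};"


# Hypothetical example: derive a fake address from a hash of the real one.
sql = masking_update("public", "users", "email", "md5(email) || '@example.com'")
print(sql)
```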
Installation
pip install datamask
Usage
1. Create a data dictionary
Generate a CSV data dictionary from your database schema:
datadict 'postgresql://<user>:<password>@<host>/<database>' <schema> my_pii_dd.csv
Edit the CSV: for each column that needs masking, set pii to yes and set pii_type to one of the
available faker types. Run datamask -l to list all available fakers.
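A quick sanity check of the edited dictionary can catch typos in pii_type before you run datamask. The sketch below assumes only that the CSV has header columns named pii and pii_type, as described above. The table_name and column_name columns in the sample data are hypothetical placeholders, so check the header of the file datadict actually generates. The faker set is copied from the list under Available fakers, but datamask -l remains the authoritative list for your installed version.

```python
import csv
import io

# Faker names copied from the "Available fakers" list; `datamask -l` is authoritative.
KNOWN_FAKERS = {
    "person_name", "person_firstname", "person_familyname", "email", "address",
    "city", "zipcode", "phonenumber", "business_name", "username", "password",
    "url", "url_image", "inet_addr", "text", "text_short", "filename", "slug",
    "serial", "int", "tla", "user_agent", "static_str", "null",
}


def check_dictionary(rows):
    """Collect rows marked pii=yes and report any with an unknown pii_type.

    Only the `pii` and `pii_type` column names come from the datamask docs.
    Reported line numbers assume no field contains an embedded newline.
    """
    pii_rows, problems = [], []
    for lineno, row in enumerate(rows, start=2):  # line 1 is the CSV header
        if (row.get("pii") or "").strip() != "yes":
            continue
        pii_rows.append(row)
        pii_type = (row.get("pii_type") or "").strip()
        if pii_type not in KNOWN_FAKERS:
            problems.append(f"line {lineno}: unknown pii_type {pii_type!r}")
    return pii_rows, problems


# Hypothetical dictionary contents; the real file comes from `datadict`.
sample = io.StringIO(
    "table_name,column_name,pii,pii_type\n"
    "users,email,yes,email\n"
    "users,created_at,no,\n"
    "users,nickname,yes,nick\n"
)
pii_rows, problems = check_dictionary(csv.DictReader(sample))
print(len(pii_rows), problems)
```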
2. Mask the data
datamask -d 'postgresql://<user>:<password>@<host>/<database>' -f my_pii_dd.csv
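The connection string is a standard postgresql:// URL. A password containing characters such as @, :, / or # must therefore be percent-encoded, or it will be misread as a delimiter. Below is a minimal sketch using the standard library. It assumes datamask passes the URL to a driver that follows the libpq URI rules, and postgres_url is a hypothetical helper.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, database: str) -> str:
    """Build a postgresql:// URL, percent-encoding user, password and database.

    safe="" makes quote() encode '/' as well, so no character in these
    components can be mistaken for a URL delimiter.
    """
    return "postgresql://{}:{}@{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,  # left as-is so a host:port value such as localhost:5432 still works
        quote(database, safe=""),
    )


url = postgres_url("dev", "p@ss:w/rd#1", "localhost", "app_dev")
print(url)  # postgresql://dev:p%40ss%3Aw%2Frd%231@localhost/app_dev
```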
3. Updating the data dictionary
When your schema changes, regenerate the data dictionary using your existing one as a seed:
datadict 'postgresql://<user>:<password>@<host>/<database>' <schema> -i my_existing_dd.csv my_new_pii_dd.csv
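Columns that were not in the seed dictionary have no earlier pii setting to carry over, so they are the ones worth reviewing after regeneration. The sketch below lists them. It assumes table_name and column_name header columns, which are hypothetical names; adjust KEY_COLUMNS to the real CSV header.

```python
import csv
import io

# Hypothetical key columns identifying a database column; check your CSV header.
KEY_COLUMNS = ("table_name", "column_name")


def column_keys(rows):
    """Set of (table, column) tuples present in a data dictionary."""
    return {tuple(row[k] for k in KEY_COLUMNS) for row in rows}


def new_columns(old_rows, new_rows):
    """Columns in the regenerated dictionary that were absent from the seed."""
    return sorted(column_keys(new_rows) - column_keys(old_rows))


old = io.StringIO("table_name,column_name,pii,pii_type\nusers,email,yes,email\n")
new = io.StringIO(
    "table_name,column_name,pii,pii_type\n"
    "users,email,yes,email\n"
    "users,phone,,\n"
)
added = new_columns(csv.DictReader(old), csv.DictReader(new))
print(added)  # [('users', 'phone')]
```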
Advanced options
To leave specific rows unmasked, pass --keep with a YAML file that lists their primary key values per table:
# keep.yaml
schema.table_name:
  - pk_value_1
  - pk_value_2
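Conceptually, a keep list excludes the listed rows from the masking UPDATE. The sketch below turns one table's entry into a SQL condition; it is not datamask's own code. The id primary key column is hypothetical, and the keep dictionary mirrors keep.yaml as a YAML parser would load it.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value) -> str:
    """Quote a value as a SQL string literal (assumes standard_conforming_strings=on)."""
    return "'" + str(value).replace("'", "''") + "'"


def keep_predicate(pk_column: str, kept) -> str:
    """SQL condition that is true only for rows NOT listed in the keep file."""
    if not kept:
        return "TRUE"  # nothing to keep: every row gets masked
    values = ", ".join(quote_literal(v) for v in kept)
    return f"{quote_ident(pk_column)} NOT IN ({values})"


# Mirrors keep.yaml above after YAML parsing; "id" is a hypothetical pk column.
keep = {"schema.table_name": ["pk_value_1", "pk_value_2"]}
print(keep_predicate("id", keep["schema.table_name"]))
# "id" NOT IN ('pk_value_1', 'pk_value_2')
```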
Set fixed values for specific rows using --fixed with a YAML file:
# fixed.yaml
schema.table_name:
  pk_value:
    column_name: "fixed value"
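One way to read a fixed.yaml entry is as a lookup that takes precedence over the faker output. The sketch below implements that precedence rule. It assumes the nested table, primary key, column layout of the example above; resolve_value and the masked sample value are hypothetical.

```python
from typing import Any

# Mirrors fixed.yaml above after YAML parsing (placeholder names kept as-is).
fixed = {"schema.table_name": {"pk_value": {"column_name": "fixed value"}}}


def resolve_value(table: str, pk: Any, column: str, masked: Any, fixed_values: dict) -> Any:
    """Return the configured fixed value for (table, pk, column), else the masked value."""
    return fixed_values.get(table, {}).get(pk, {}).get(column, masked)


print(resolve_value("schema.table_name", "pk_value", "column_name", "xk3f9", fixed))
print(resolve_value("schema.table_name", "other_pk", "column_name", "xk3f9", fixed))
```

YAML loads an unquoted numeric key such as 42 as an integer. The primary key used in the lookup therefore has to have the same type.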
Available fakers
Run datamask -l to see all available faker types. These include: person_name, person_firstname,
person_familyname, email, address, city, zipcode, phonenumber, business_name,
username, password, url, url_image, inet_addr, text, text_short, filename,
slug, serial, int, tla, user_agent, static_str, null.
Caveats
Never run this against a production database. I'm not responsible for your data.
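A pre-flight check in whatever script invokes datamask can make the warning above harder to ignore. Here is a minimal sketch; the allowlisted host names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of development/test database hosts.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "db.dev.internal"}


def assert_safe_target(url: str) -> None:
    """Raise RuntimeError unless the database URL points at an allowlisted host."""
    host = urlsplit(url).hostname  # lowercased host, without user, password or port
    if host not in ALLOWED_HOSTS:
        raise RuntimeError(f"refusing to mask data on host {host!r}")


assert_safe_target("postgresql://dev:secret@localhost/app_dev")  # passes silently
```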
License
MIT License - Copyright (c) 2021, Fredrik Håård
Download files
Source Distribution: datamask-3.0.0.tar.gz
Built Distribution: datamask-3.0.0-py3-none-any.whl
File details
Details for the file datamask-3.0.0.tar.gz.
File metadata
- Download URL: datamask-3.0.0.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.3 CPython/3.11.6 Linux/6.6.87.2-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4c322bcb1b56b3e1b1f219bf229e37c5269accbfd532befea354504189bf2ea1 |
| MD5 | e3aa9c2cba04319a203ab5f16d090a43 |
| BLAKE2b-256 | cd7743a9bf3cfd0940cf584be44f1f26f7864ffd6344a1bc27e76cd9726eb980 |
File details
Details for the file datamask-3.0.0-py3-none-any.whl.
File metadata
- Download URL: datamask-3.0.0-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.3 CPython/3.11.6 Linux/6.6.87.2-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4372caf1c1adc4d58c6b7efb0166bfd033e2b3994dc5153f5d697b5d9a24fcff |
| MD5 | 0c1d0e9e7a07b849af58c5876d66928a |
| BLAKE2b-256 | 33ed3097f55bc74a1685032aef7bd70b06abb2aebb9ec7e6c80dce65ce0ecbf9 |
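Python's hashlib can check a downloaded file against the published SHA256 digests. Here is a minimal sketch; the download paths are up to you, and verify is a hypothetical helper.

```python
import hashlib

# SHA256 digests from the File hashes tables for datamask 3.0.0.
EXPECTED_SHA256 = {
    "datamask-3.0.0.tar.gz": "4c322bcb1b56b3e1b1f219bf229e37c5269accbfd532befea354504189bf2ea1",
    "datamask-3.0.0-py3-none-any.whl": "4372caf1c1adc4d58c6b7efb0166bfd033e2b3994dc5153f5d697b5d9a24fcff",
}


def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file in chunks so it never has to be loaded into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, filename: str) -> bool:
    """True if the file at `path` matches the published SHA256 for `filename`."""
    return file_digest_hex(path) == EXPECTED_SHA256[filename]


# Example (hypothetical download location):
# verify("downloads/datamask-3.0.0.tar.gz", "datamask-3.0.0.tar.gz")
```

pip's hash-checking mode performs the same check at install time. To use it, add `--hash=sha256:...` entries to a requirements file.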