
Data PII cleaning/masking for PostgreSQL


datamask

Mask sensitive data in a PostgreSQL database (PII/PHI) for development/testing purposes.

Masking is done with native PostgreSQL operations, so no data leaves the database.

Installation

pip install datamask

Usage

1. Create a data dictionary

Generate a CSV data dictionary from your database schema:

datadict 'postgresql://<user>:<password>@<host>/<database>' <schema> my_pii_dd.csv

Edit the CSV: set pii to yes for each column that needs masking, and set pii_type to one of the available faker types. Run datamask -l to list all available fakers.
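For larger schemas, editing the CSV by hand gets tedious. The following is a minimal sketch of scripting that edit with Python's csv module. Only the pii and pii_type column names come from this README; the table and column header names, the mark_pii helper and the sample rows are assumptions for illustration, so check the header of your generated file first.

```python
import csv
import io


def mark_pii(csv_text, rules):
    """Return csv_text with pii/pii_type filled in for matching rows.

    rules maps (table, column) -> faker type name.
    The "table" and "column" header names are assumed, not confirmed.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        faker = rules.get((row["table"], row["column"]))
        if faker is not None:
            row["pii"] = "yes"
            row["pii_type"] = faker
        writer.writerow(row)
    return out.getvalue()


# Hypothetical data dictionary content, for illustration only.
sample = "table,column,pii,pii_type\nusers,email,no,\nusers,id,no,\n"
result = mark_pii(sample, {("users", "email"): "email"})
print(result)
```

Rows that do not match a rule are written back unchanged, so the script can be rerun safely after adding more rules.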

2. Mask the data

datamask -d 'postgresql://<user>:<password>@<host>/<database>' -f my_pii_dd.csv
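Passwords containing characters such as @ or / must be percent-encoded inside a PostgreSQL connection URL. The sketch below builds the URL and the argument list for the command above. It assumes datamask hands the URL to a driver that follows libpq URI rules; build_db_url and datamask_argv are hypothetical helper names, and the credentials are placeholders.

```python
from urllib.parse import quote


def build_db_url(user, password, host, database):
    """Build a postgresql:// URL with user and password percent-encoded."""
    return "postgresql://{}:{}@{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, database
    )


def datamask_argv(db_url, dd_path):
    """Argument list mirroring the documented call: datamask -d <url> -f <csv>."""
    return ["datamask", "-d", db_url, "-f", dd_path]


url = build_db_url("dev", "p@ss/word", "localhost", "devdb")
argv = datamask_argv(url, "my_pii_dd.csv")
print(argv)

# To actually run the masking (against a non-production database only):
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the URL.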

3. Updating the data dictionary

When your schema changes, regenerate the data dictionary using your existing one as a seed:

datadict 'postgresql://<user>:<password>@<host>/<database>' <schema> -i my_existing_dd.csv my_new_pii_dd.csv
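After regenerating, it can help to see which columns are new and still need a pii decision. This is a rough sketch that compares the old and new CSV files; the table and column header names and the new_columns helper are assumptions, not part of datamask.

```python
import csv
import io


def new_columns(old_csv, new_csv):
    """Return sorted (table, column) pairs present only in new_csv.

    Assumes both files have "table" and "column" headers (unconfirmed).
    """
    def keys(text):
        reader = csv.DictReader(io.StringIO(text))
        return {(row["table"], row["column"]) for row in reader}

    return sorted(keys(new_csv) - keys(old_csv))


# Hypothetical file contents, for illustration only.
old = "table,column,pii,pii_type\nusers,email,yes,email\n"
new = (
    "table,column,pii,pii_type\n"
    "users,email,yes,email\n"
    "users,phone,no,\n"
)
added = new_columns(old, new)
print(added)
```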

Advanced options

Skip specific rows from masking using --keep with a YAML file:

# keep.yaml
schema.table_name:
  - pk_value_1
  - pk_value_2
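If the list of rows to keep comes from elsewhere (for example a list of test accounts), the file can be generated. A minimal sketch using only the standard library follows; keep_yaml is a hypothetical helper, and whether datamask expects primary keys as numbers or strings is not stated here, so the sketch emits each value in its own type.

```python
import json


def keep_yaml(keep):
    """Render {'schema.table': [pk, ...]} in the keep.yaml layout shown above.

    json.dumps is used for scalars because JSON strings and numbers
    are also valid YAML scalars.
    """
    lines = []
    for table, pks in keep.items():
        lines.append(f"{table}:")
        for pk in pks:
            lines.append(f"  - {json.dumps(pk)}")
    return "\n".join(lines) + "\n"


# Hypothetical table name and keys.
keep_text = keep_yaml({"public.users": [1, "admin"]})
print(keep_text)
```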

Set fixed values for specific rows using --fixed with a YAML file:

# fixed.yaml
schema.table_name:
  pk_value:
    column_name: "fixed value"
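The fixed-value file can be generated the same way. The sketch below renders the nested layout shown above with the standard library only; fixed_yaml is a hypothetical helper, and the table, key and column names in the example are placeholders.

```python
import json


def fixed_yaml(fixed):
    """Render {'schema.table': {pk: {column: value}}} as fixed.yaml text.

    Scalars go through json.dumps, whose output is valid YAML.
    """
    lines = []
    for table, rows in fixed.items():
        lines.append(f"{table}:")
        for pk, columns in rows.items():
            lines.append(f"  {json.dumps(pk)}:")
            for column, value in columns.items():
                lines.append(f"    {column}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Hypothetical table, primary key and column.
fixed_text = fixed_yaml({"public.users": {1: {"email": "admin@example.com"}}})
print(fixed_text)
```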

Available fakers

Run datamask -l to see all available faker types. The list includes: person_name, person_firstname, person_familyname, email, address, city, zipcode, phonenumber, business_name, username, password, url, url_image, inet_addr, text, text_short, filename, slug, serial, int, tla, user_agent, static_str, null.
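A typo in pii_type is easy to miss in a long CSV. As a sketch, the data dictionary can be checked against the faker names listed above before masking. The name set is copied from this README (datamask -l remains the authoritative list), and invalid_pii_rows is a hypothetical helper.

```python
import csv
import io

# Faker names as listed in this README; verify with `datamask -l`.
KNOWN_FAKERS = {
    "person_name", "person_firstname", "person_familyname", "email",
    "address", "city", "zipcode", "phonenumber", "business_name",
    "username", "password", "url", "url_image", "inet_addr", "text",
    "text_short", "filename", "slug", "serial", "int", "tla",
    "user_agent", "static_str", "null",
}


def invalid_pii_rows(csv_text):
    """Return (line_number, pii_type) for pii=yes rows with an unknown type."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        is_pii = (row.get("pii") or "").strip().lower() == "yes"
        pii_type = (row.get("pii_type") or "").strip()
        if is_pii and pii_type not in KNOWN_FAKERS:
            problems.append((line_no, pii_type))
    return problems


# Hypothetical rows; "emial" is a deliberate typo.
sample_dd = (
    "table,column,pii,pii_type\n"
    "users,email,yes,emial\n"
    "users,name,yes,person_name\n"
    "users,id,no,\n"
)
problems = invalid_pii_rows(sample_dd)
print(problems)
```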

Caveats

Never run this against a production database. I'm not responsible for your data.
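One way to reduce the risk of pointing the tool at the wrong server is to check the target host before invoking it. The following is only a sketch of such a guard in a wrapper script; datamask itself is not known to offer this check, and ALLOWED_HOSTS and assert_safe_target are hypothetical names.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts where masking is acceptable.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "dev-db.internal"}


def assert_safe_target(db_url, allowed_hosts=ALLOWED_HOSTS):
    """Return the host of db_url, or raise if it is not on the allowlist."""
    host = urlsplit(db_url).hostname
    if host not in allowed_hosts:
        raise RuntimeError(f"refusing to mask data on host {host!r}")
    return host


checked = assert_safe_target("postgresql://dev:secret@localhost/devdb")
print(checked)

try:
    assert_safe_target("postgresql://app:secret@prod-db.example.com/app")
    refused = False
except RuntimeError:
    refused = True
print(refused)
```

An allowlist is used rather than a denylist so that an unknown host is refused by default.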

License

MIT License - Copyright (c) 2021, Fredrik Håård

