
Commandline tool to anonymize PostgreSQL databases

Project description

A commandline tool to anonymize PostgreSQL databases for DSGVO/GDPR purposes.

It uses a YAML file to define which tables and fields should be anonymized and provides various methods of anonymization. The tool requires a direct PostgreSQL connection to perform the anonymization.

Features

  • Intentionally compatible with Python 2.7 (for legacy production platforms)

  • Anonymize PostgreSQL tables at the data level using various providers (see the examples in the table below)

  • Exclude data from anonymization based on regular expressions or SQL WHERE clauses

  • Truncate entire tables that contain unwanted data

Field        Value                 Provider             Output
-----------  --------------------  -------------------  --------------------------------
first_name   John                  choice               (Bob|Larry|Lisa)
title        Dr.                   clear
street       Irving St             faker.street_name    Miller Station
password     dsf82hFxcM            mask                 XXXXXXXXXX
credit_card  1234-567-890          partial_mask         1??????????0
email        jane.doe@example.com  md5                  0cba00ca3da1b283a57287bcceb17e35
email        jane.doe@example.com  faker.unique.email   alex7@sample.com
phone_num    65923473              md5 as_number: True  3948293448
ip           157.50.1.20           set                  127.0.0.1
uuid_col     00010203-0405-……      uuid4                f7c1bd87-4d….

  • Note: faker.unique.[provider] is only supported on Python 3.6+ (the minimum Python version supported by the Faker library)

  • Note: uuid4 only works on native uuid columns

See the documentation for a more detailed description of the provided anonymization methods.
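
To make the table above more concrete, the following is a minimal Python sketch of what the mask, partial_mask and md5 providers do to a value. It is not pganonymize's own implementation: the function names, the default signs and the way the numeric md5 variant shortens the digest are assumptions made for illustration only.

```python
import hashlib


def mask(value, sign="X"):
    """Replace every character of the value with the mask sign."""
    return sign * len(value)


def partial_mask(value, sign="?", unmasked_left=1, unmasked_right=1):
    """Keep the outer characters and mask everything in between."""
    if len(value) <= unmasked_left + unmasked_right:
        # Too short to mask anything; returning the value unchanged is a
        # choice made for this sketch, not documented tool behavior.
        return value
    middle = len(value) - unmasked_left - unmasked_right
    return value[:unmasked_left] + sign * middle + value[len(value) - unmasked_right:]


def md5_hex(value):
    """Return the hexadecimal MD5 digest of the value (32 characters)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def md5_number(value, length=10):
    """Numeric variant (assumption): read the digest as an integer and
    keep only its leading digits."""
    return str(int(md5_hex(value), 16))[:length]
```

With the sample values from the table, mask("dsf82hFxcM") and partial_mask("1234-567-890") give the masked strings shown in the Output column. The digits produced by md5_number depend on the truncation rule assumed here, so they may differ from what the tool writes.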

Installation

The default installation method is to use pip:

$ pip install pganonymize

Usage

usage: pganonymize [-h] [-v] [-l] [--schema SCHEMA] [--dbname DBNAME]
               [--user USER] [--password PASSWORD] [--host HOST]
               [--port PORT] [--dry-run] [--dump-file DUMP_FILE]
               [--init-sql INIT_SQL]

Anonymize data of a PostgreSQL database

optional arguments:
-h, --help            show this help message and exit
-v, --verbose         Increase verbosity
-l, --list-providers  Show a list of all available providers
--schema SCHEMA       A YAML schema file that contains the anonymization
                        rules
--dbname DBNAME       Name of the database
--user USER           Name of the database user
--password PASSWORD   Password for the database user
--host HOST           Database hostname
--port PORT           Port of the database
--dry-run             Don't commit changes made on the database
--dump-file DUMP_FILE
                        Create a database dump file with the given name
--init-sql INIT_SQL   SQL to run before starting anonymization

Besides the database connection values, you have to define a YAML schema file that contains all anonymization rules for that database. Take a look at the schema documentation or the YAML sample schema.

Example calls:

$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    -v

$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    --init-sql "set search_path to non_public_search_path; set work_mem to '1GB';" \
    -v
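
When the anonymizer is called from a script rather than typed by hand, the calls above can be assembled programmatically. The following Python 3 sketch builds the argument list from the flags documented in the usage text. The helper names build_command and run are hypothetical and not part of pganonymize.

```python
import subprocess


def build_command(schema, dbname, user, password, host, port=None,
                  dry_run=False, dump_file=None, verbose=False):
    """Assemble a pganonymize call using only the documented flags."""
    cmd = [
        "pganonymize",
        "--schema=" + schema,
        "--dbname=" + dbname,
        "--user=" + user,
        "--password=" + password,
        "--host=" + host,
    ]
    if port is not None:
        cmd.append("--port={}".format(port))
    if dry_run:
        # Documented as: don't commit changes made on the database
        cmd.append("--dry-run")
    if dump_file:
        cmd.append("--dump-file=" + dump_file)
    if verbose:
        cmd.append("-v")
    return cmd


def run(cmd):
    """Run the command; check=True raises CalledProcessError if the
    process exits with a non-zero status."""
    return subprocess.run(cmd, check=True)
```

A first trial run might pass dry_run=True so that nothing is committed, for example run(build_command("myschema.yml", "test_database", "username", "mysecret", "db.host.example.com", dry_run=True, verbose=True)). Passing the arguments as a list avoids shell quoting issues with values such as the password.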

Database dump

The --dump-file argument creates a dump file after the database has been anonymized. Note that the pg_dump command from the postgresql-client-common package is required to create the dump file, e.g. under Linux:

$ sudo apt-get install postgresql-client-common

Example call:

$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    --dump-file=/tmp/dump.gz \
    -v
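
Because the dump depends on pg_dump being installed, a wrapper script could check for it before starting a long anonymization run. This is a small Python 3 sketch using only the standard library. The helper name require_tool is hypothetical, and the check is a convenience of this sketch rather than something pganonymize is documented to do.

```python
import shutil


def require_tool(name):
    """Return the full path of an executable found on PATH, or raise
    RuntimeError if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            "{} was not found on PATH; install it before using --dump-file".format(name)
        )
    return path
```

Calling require_tool("pg_dump") before invoking the anonymizer makes a missing client package fail early with a clear message.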

Docker

To run the anonymizer within a Docker container, first build the image:

$ docker build -t pganonymize .

After that you can pass a schema file to the container, using Docker volumes, and call the anonymizer:

$ docker run \
    -v <path to your schema>:/schema.yml \
    -it pganonymize \
    /usr/local/bin/pganonymize \
    --schema=/schema.yml \
    --dbname=<database> \
    --user=<user> \
    --password=<password> \
    --host=<host> \
    -v
