Commandline tool to anonymize PostgreSQL databases

This commandline tool makes PostgreSQL database anonymization easy. It uses a YAML definition file to define which tables and fields should be anonymized and provides various methods of anonymization.

1 Features

  • Anonymize PostgreSQL tables at the data-entry level with various methods (see the table below)

  • Exclude data from anonymization based on regular expressions

  • Truncate entire tables that contain unwanted data

Field       Value                 Provider           Output
----------  --------------------  -----------------  --------------------------------
first_name  John                  choice             (Bob|Larry|Lisa)
title       Dr.                   clear
street      Irving St             faker.street_name  Miller Station
password    dsf82hFxcM            mask               XXXXXXXXXX
email       jane.doe@example.com  md5                0cba00ca3da1b283a57287bcceb17e35
ip          157.50.1.20           set                127.0.0.1

See the documentation for a more detailed description of the provided anonymization methods.
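
To make the provider names in the table more concrete, the following Python sketch mimics what a few of them appear to do to a single value. It is not the tool's implementation: the function names, their signatures, and the use of None to represent a cleared field are assumptions made for illustration only.

```python
import hashlib
import random


def clear(value):
    # Assumption: a cleared field is represented here as None.
    return None


def mask(value, sign="X"):
    # Replace every character with the mask sign, keeping the length.
    return sign * len(value)


def md5(value):
    # Replace the value with the hex digest of its MD5 hash.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def set_value(value, replacement):
    # Overwrite the value with a fixed replacement.
    return replacement


def choice(value, values):
    # Replace the value with a random pick from a list of candidates.
    return random.choice(values)


# Hypothetical row, using the sample values from the table.
row = {
    "first_name": "John",
    "title": "Dr.",
    "password": "dsf82hFxcM",
    "email": "jane.doe@example.com",
    "ip": "157.50.1.20",
}

anonymized = {
    "first_name": choice(row["first_name"], ["Bob", "Larry", "Lisa"]),
    "title": clear(row["title"]),
    "password": mask(row["password"]),
    "email": md5(row["email"]),
    "ip": set_value(row["ip"], "127.0.0.1"),
}
```

The faker.street_name provider is left out of the sketch because it depends on a third-party library for generating fake data.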

2 Installation

The default installation method is to use pip:

$ pip install pganonymize

3 Usage

usage: pganonymize [-h] [-v] [-l] [--schema SCHEMA] [--dbname DBNAME]
                   [--user USER] [--password PASSWORD] [--host HOST]
                   [--port PORT] [--dry-run] [--dump-file DUMP_FILE]

Anonymize data of a PostgreSQL database

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase verbosity
  -l, --list-providers  Show a list of all available providers
  --schema SCHEMA       A YAML schema file that contains the anonymization
                        rules
  --dbname DBNAME       Name of the database
  --user USER           Name of the database user
  --password PASSWORD   Password for the database user
  --host HOST           Database hostname
  --port PORT           Port of the database
  --dry-run             Don't commit changes made on the database
  --dump-file DUMP_FILE
                        Create a database dump file with the given name

In addition to the database connection values, you have to define a YAML schema file that contains all anonymization rules for that database. Take a look at the schema documentation or the YAML sample schema.

Example call:

$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    -v
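
When the call is issued from a script, for example to try a schema with --dry-run before committing any changes, the argument list can be assembled programmatically. The sketch below only uses flags from the usage text above; the helper name build_command and its parameters are hypothetical and not part of pganonymize.

```python
def build_command(schema, dbname, user, password, host,
                  port=None, dry_run=False, dump_file=None, verbose=False):
    """Assemble a pganonymize argument list (hypothetical helper)."""
    cmd = [
        "pganonymize",
        "--schema=" + schema,
        "--dbname=" + dbname,
        "--user=" + user,
        "--password=" + password,
        "--host=" + host,
    ]
    if port is not None:
        cmd.append("--port=" + str(port))
    if dry_run:
        # Per the usage text: don't commit changes made on the database.
        cmd.append("--dry-run")
    if dump_file is not None:
        cmd.append("--dump-file=" + dump_file)
    if verbose:
        cmd.append("-v")
    return cmd


# Same values as the example call, plus a dry run.
dry_run_cmd = build_command(
    schema="myschema.yml",
    dbname="test_database",
    user="username",
    password="mysecret",
    host="db.host.example.com",
    dry_run=True,
    verbose=True,
)
```

The resulting list can be passed to subprocess.run(dry_run_cmd, check=True). Building a list instead of a shell string avoids quoting problems with passwords that contain special characters.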

3.1 Database dump

With the --dump-file argument you can create a dump file after anonymizing the database. Note that the pg_dump command from the postgresql-client-common package is required to create the dump file. On Linux, for example, it can be installed with:

sudo apt-get install postgresql-client-common

Example call:

$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    --dump-file=/tmp/dump.gz \
    -v
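
Since the dump depends on pg_dump being installed, a script that drives the tool may want to check for it before starting a long anonymization run. A minimal sketch using only the Python standard library; the helper name pg_dump_available is hypothetical, and the check only tests that an executable is on the PATH, not that its version matches the server.

```python
import shutil


def pg_dump_available(executable="pg_dump"):
    # shutil.which returns the full path of the executable, or None
    # if it cannot be found on the current PATH.
    return shutil.which(executable) is not None


if not pg_dump_available():
    print("pg_dump was not found on PATH; --dump-file cannot create a dump")
```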

4 Quickstart

Clone repo:

$ git clone git@github.com:rheinwerk-verlag/postgresql-anonymizer.git
$ cd postgresql-anonymizer

To make changes to pganonymizer and develop it further, you need to install poetry:

$ sudo pip install poetry

Now you can install all requirements and activate the virtualenv:

$ poetry install
$ poetry shell

5 Docker

If you want to run the anonymizer within a Docker container, you first have to build the image:

$ docker build -t pganonymizer .

After that you can pass a schema file to the container, using Docker volumes, and call the anonymizer:

$ docker run \
    -v <path to your schema>:/schema.yml \
    -it pganonymizer \
    /usr/local/bin/pganonymize \
    --schema=/schema.yml \
    --dbname=<database> \
    --user=<user> \
    --password=<password> \
    --host=<host> \
    -v
