Commandline tool to anonymize PostgreSQL databases
This commandline tool makes PostgreSQL database anonymization easy. It uses a YAML definition file to define which tables and fields should be anonymized and provides various methods of anonymization.
1 Features
Anonymize PostgreSQL tables at the field level with various methods (see table below)
Exclude data from anonymization based on regular expressions
Truncate entire tables that contain unwanted data
Field | Value | Provider | Output |
---|---|---|---|
first_name | John | choice | (Bob|Larry|Lisa) |
title | Dr. | clear | |
street | Irving St | faker.street_name | Miller Station |
password | dsf82hFxcM | mask | XXXXXXXXXX |
 | | md5 | 0cba00ca3da1b283a57287bcceb17e35 |
ip | 157.50.1.20 | set | 127.0.0.1 |
See the documentation (https://python-postgresql-anonymizer.readthedocs.io/en/latest/) for a more detailed description of the provided anonymization methods.
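To make the table above more concrete, the following Python sketch mimics what the choice, clear, mask, md5 and set providers conceptually produce, and how a regular expression could exclude a value from anonymization. It is an illustration only, not pganonymize's implementation: the function names, the default mask character, the use of None for a cleared field, and the example pattern are all assumptions.

```python
import hashlib
import random
import re


def choice(values):
    """Pick one replacement at random from a fixed list of values."""
    return random.choice(values)


def clear(_value):
    """Empty the field (modelled here as None; an assumption)."""
    return None


def mask(value, sign="X"):
    """Replace every character with the mask sign, keeping the length."""
    return sign * len(value)


def md5(value):
    """Replace the value with the hex MD5 digest of its UTF-8 bytes."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def set_value(_value, replacement):
    """Replace the value with a fixed constant."""
    return replacement


def is_excluded(value, patterns):
    """Return True if any pattern matches, i.e. the value is left untouched.

    Whether the real tool anchors the pattern is not stated here, so this
    sketch simply uses re.search.
    """
    return any(re.search(pattern, value) for pattern in patterns)


# Hypothetical row using the values from the table above.
row = {"first_name": "John", "title": "Dr.", "password": "dsf82hFxcM", "ip": "157.50.1.20"}

anonymized = {
    "first_name": choice(["Bob", "Larry", "Lisa"]),
    "title": clear(row["title"]),
    "password": mask(row["password"]),  # "XXXXXXXXXX"
    "ip": set_value(row["ip"], "127.0.0.1"),
}
```

The mask and set providers are deterministic, while choice (and the faker providers) return a different value on each run, which matters if anonymized dumps need to be reproducible.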
2 Installation
The default installation method is to use pip:
$ pip install pganonymize
3 Usage
usage: pganonymize [-h] [-v] [-l] [--schema SCHEMA] [--dbname DBNAME]
                   [--user USER] [--password PASSWORD] [--host HOST]
                   [--port PORT] [--dry-run] [--dump-file DUMP_FILE]

Anonymize data of a PostgreSQL database

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase verbosity
  -l, --list-providers  Show a list of all available providers
  --schema SCHEMA       A YAML schema file that contains the anonymization rules
  --dbname DBNAME       Name of the database
  --user USER           Name of the database user
  --password PASSWORD   Password for the database user
  --host HOST           Database hostname
  --port PORT           Port of the database
  --dry-run             Don't commit changes made on the database
  --dump-file DUMP_FILE Create a database dump file with the given name
Example call:
$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    -v
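When the tool is called from a script rather than typed by hand, building the argument list in code avoids shell quoting issues. The sketch below is one possible wrapper, using only the flags shown in the usage text above. The helper names build_command and run_pganonymize are hypothetical and not part of pganonymize.

```python
import subprocess


def build_command(schema, dbname, user, password, host,
                  port=None, dry_run=False, verbose=False):
    """Assemble the pganonymize argument list from the documented flags."""
    cmd = [
        "pganonymize",
        f"--schema={schema}",
        f"--dbname={dbname}",
        f"--user={user}",
        f"--password={password}",
        f"--host={host}",
    ]
    if port is not None:
        cmd.append(f"--port={port}")
    if dry_run:
        # --dry-run: don't commit changes made on the database
        cmd.append("--dry-run")
    if verbose:
        cmd.append("-v")
    return cmd


def run_pganonymize(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


# Same call as the example above, but as a dry run first.
command = build_command(
    schema="myschema.yml",
    dbname="test_database",
    user="username",
    password="mysecret",
    host="db.host.example.com",
    dry_run=True,
    verbose=True,
)
# run_pganonymize(command)  # uncomment to execute against a real database
```

Running once with dry_run=True before the real run is a cheap way to check that the schema file and the connection settings are accepted without committing any changes.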
3.1 Database dump
With the --dump-file argument it is possible to create a database dump file after anonymizing the database. Note that the pg_dump command from the postgresql-client-common package is required to create the dump file for the connected database. Under Linux it can be installed, for example, with apt-get:
$ sudo apt-get install postgresql-client-common
Example call:
$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    --dump-file=/tmp/dump.gz \
    -v
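Because the dump step depends on pg_dump being installed, a script can check for it before starting a long anonymization run. The following sketch shows one way to do that with the standard library. The function names are hypothetical, and it assumes pg_dump is looked up on the PATH.

```python
import shutil


def pg_dump_available(which=shutil.which):
    """Return True if a pg_dump executable can be found on the PATH.

    The lookup function is injectable so the check can be tested without
    a PostgreSQL client installed.
    """
    return which("pg_dump") is not None


def dump_file_args(dump_file, which=shutil.which):
    """Return the --dump-file argument, or fail early if pg_dump is missing."""
    if not pg_dump_available(which):
        raise RuntimeError(
            "pg_dump not found; install the PostgreSQL client tools first"
        )
    return [f"--dump-file={dump_file}"]
```

The returned list can be appended to the other pganonymize arguments, for example dump_file_args("/tmp/dump.gz").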
4 Quickstart
Clone repo:
$ git clone git@github.com:rheinwerk-verlag/postgresql-anonymizer.git
$ cd postgresql-anonymizer
For making changes and developing pganonymizer, you need to install poetry:
$ sudo pip install poetry
Now you can install all requirements and activate the virtualenv:
$ poetry install
$ poetry shell
5 Docker
If you want to run the anonymizer within a Docker container, you first have to build the image:
$ docker build -t pganonymizer .
After that you can pass a schema file to the container, using Docker volumes, and call the anonymizer:
$ docker run \
    -v <path to your schema>:/schema.yml \
    -it pganonymizer \
    /usr/local/bin/pganonymize \
    --schema=/schema.yml \
    --dbname=<database> \
    --user=<user> \
    --password=<password> \
    --host=<host> \
    -v
Hashes for pganonymize-0.3.0-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 3e99fce2df390da7780caaf70a3eddab2f56fe2caded678ffcee1cc07198a938 |
MD5 | ae59526cc7266eece15a34508f3d1619 |
BLAKE2b-256 | 329c44dc07fe3ba03a88ac2b3268a4188312619ebae66f18514ce25eb758e699 |