Commandline tool to anonymize PostgreSQL databases
A commandline tool to anonymize PostgreSQL databases for DSGVO/GDPR purposes.
It uses a YAML file to define which tables and fields should be anonymized and provides various methods of anonymization. The tool requires a direct PostgreSQL connection to perform the anonymization.
1 Features
Intentionally compatible with Python 2.7 (for old production platforms)
Anonymize PostgreSQL tables at the data entry level with various methods (see table below)
Exclude data from anonymization based on regular expressions
Truncate entire tables for unwanted data
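The regular-expression exclusion listed above can be pictured roughly as follows. This is a minimal sketch and not the tool's actual code: the helper name `row_is_excluded`, the dictionary shapes, and the choice of a case-insensitive `re.match` are all assumptions made for illustration.

```python
import re


def row_is_excluded(row, excludes):
    """Return True if any column value matches one of its exclusion patterns.

    `row` maps column names to values; `excludes` maps a column name to a
    list of regular expressions. Both shapes are assumptions for this sketch.
    """
    for column, patterns in excludes.items():
        value = row.get(column)
        if value is None:
            continue
        for pattern in patterns:
            # re.match anchors at the start of the value (an assumption).
            if re.match(pattern, value, re.IGNORECASE):
                return True
    return False


# Hypothetical rule: keep every address at example.com untouched.
excludes = {"email": [r"\S[^@]*@example\.com"]}
rows = [
    {"id": 1, "email": "admin@example.com"},
    {"id": 2, "email": "jane@other.org"},
]
to_anonymize = [row for row in rows if not row_is_excluded(row, excludes)]
print([row["id"] for row in to_anonymize])  # [2]
```

Only the second row would be handed to the anonymization providers; the first one matches the pattern and is left as it is.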
| Field | Value | Provider | Output |
|---|---|---|---|
| first_name | John | choice | (Bob\|Larry\|Lisa) |
| title | Dr. | clear | |
| street | Irving St | faker.street_name | Miller Station |
| password | dsf82hFxcM | mask | XXXXXXXXXX |
| | | md5 | 0cba00ca3da1b283a57287bcceb17e35 |
| | | faker.unique.email | |
| ip | 157.50.1.20 | set | 127.0.0.1 |
Note: faker.unique.[provider] is only supported on Python 3.5+ (the minimum Python version supported by the Faker library).
See the documentation for a more detailed description of the provided anonymization methods.
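To make the table above more concrete, the following sketch mimics what some of the simpler providers appear to do to a single value. It is an illustration only, not the library's implementation: the function names are hypothetical, `clear` is modelled as returning `None` (which a database driver would typically store as NULL), and the mask character is assumed to be `X` as in the table.

```python
import hashlib
import random


def clear(value):
    # "clear": drop the value entirely (modelled here as None).
    return None


def choice(value, options):
    # "choice": replace the value with a random pick from a fixed list.
    return random.choice(options)


def mask(value, sign="X"):
    # "mask": replace every character with the mask sign, keeping the length.
    return sign * len(value)


def md5(value):
    # "md5": replace the value with the hex MD5 digest of its UTF-8 bytes.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def set_value(value, replacement):
    # "set": overwrite the value with a fixed replacement.
    return replacement


print(mask("dsf82hFxcM"))                        # XXXXXXXXXX
print(set_value("157.50.1.20", "127.0.0.1"))     # 127.0.0.1
print(choice("John", ["Bob", "Larry", "Lisa"]))  # one of the three names
print(len(md5("some value")))                    # 32
```

The Faker-based providers are left out here because they depend on the Faker library.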
2 Installation
The default installation method is to use pip:
$ pip install pganonymize
3 Usage
usage: pganonymize [-h] [-v] [-l] [--schema SCHEMA] [--dbname DBNAME]
                   [--user USER] [--password PASSWORD] [--host HOST]
                   [--port PORT] [--dry-run] [--dump-file DUMP_FILE]

Anonymize data of a PostgreSQL database

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase verbosity
  -l, --list-providers  Show a list of all available providers
  --schema SCHEMA       A YAML schema file that contains the anonymization rules
  --dbname DBNAME       Name of the database
  --user USER           Name of the database user
  --password PASSWORD   Password for the database user
  --host HOST           Database hostname
  --port PORT           Port of the database
  --dry-run             Don't commit changes made on the database
  --dump-file DUMP_FILE Create a database dump file with the given name
Besides the database connection values, you have to define a YAML schema file that contains all anonymization rules for that database. Take a look at the schema documentation or the YAML sample schema.
Example call:
$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    -v
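For scripted runs, the example call above can also be assembled from Python. The sketch below is only an illustration: `build_pganonymize_command` and `run_pganonymize` are hypothetical helper names, the flags are taken from the usage text above, and it assumes that pganonymize is installed and available on the PATH.

```python
import subprocess


def build_pganonymize_command(schema, dbname, user, password, host,
                              port=None, dry_run=False, dump_file=None,
                              verbose=False):
    """Assemble the argument list for the pganonymize CLI (hypothetical helper)."""
    command = [
        "pganonymize",
        "--schema={}".format(schema),
        "--dbname={}".format(dbname),
        "--user={}".format(user),
        "--password={}".format(password),
        "--host={}".format(host),
    ]
    if port is not None:
        command.append("--port={}".format(port))
    if dry_run:
        # Per the usage text: don't commit changes made on the database.
        command.append("--dry-run")
    if dump_file:
        command.append("--dump-file={}".format(dump_file))
    if verbose:
        command.append("-v")
    return command


def run_pganonymize(command):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(command)


command = build_pganonymize_command(
    schema="myschema.yml",
    dbname="test_database",
    user="username",
    password="mysecret",
    host="db.host.example.com",
    dry_run=True,
    verbose=True,
)
print(" ".join(command))
# run_pganonymize(command)  # needs pganonymize and a reachable database
```

Passing the arguments as a list avoids shell quoting problems. Trying a new schema with `dry_run=True` first seems a sensible precaution, since the usage text says that changes are then not committed.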
3.1 Database dump
With the --dump-file argument it is possible to create a dump file after anonymizing the database. Note that this requires the pg_dump command from the postgresql-client-common package, which can be installed, e.g. on Linux, with:
$ sudo apt-get install postgresql-client-common
Example call:
$ pganonymize --schema=myschema.yml \
    --dbname=test_database \
    --user=username \
    --password=mysecret \
    --host=db.host.example.com \
    --dump-file=/tmp/dump.gz \
    -v
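Since the dump depends on pg_dump being installed, a script that wraps the call may want to check for it up front. This is a small sketch under the assumption that pg_dump has to be discoverable on the PATH; `pg_dump_available` is a hypothetical helper, and `shutil.which` requires Python 3.3 or newer.

```python
import shutil


def pg_dump_available():
    """Return True if a pg_dump executable can be found on the PATH."""
    return shutil.which("pg_dump") is not None


if pg_dump_available():
    print("pg_dump found, --dump-file should be usable")
else:
    print("pg_dump not found, install the PostgreSQL client tools first")
```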
4 Quickstart
Clone repo:
$ git clone git@github.com:rheinwerk-verlag/postgresql-anonymizer.git
$ cd postgresql-anonymizer
To make changes to pganonymizer and develop it, you need to install poetry:
$ sudo pip install poetry
Now you can install all requirements and activate the virtualenv:
$ poetry install
$ poetry shell
5 Docker
If you want to run the anonymizer within a Docker container, you first have to build the image:
$ docker build -t pganonymizer .
After that you can pass a schema file to the container, using Docker volumes, and call the anonymizer:
$ docker run \
    -v <path to your schema>:/schema.yml \
    -it pganonymizer \
    /usr/local/bin/pganonymize \
    --schema=/schema.yml \
    --dbname=<database> \
    --user=<user> \
    --password=<password> \
    --host=<host> \
    -v