Sanitizes contents of a database.
Database sanitation tool
database-sanitizer is a tool which retrieves a database dump from a
relational database and sanitizes the retrieved data according to
rules defined in a configuration file. Currently the tool supports
both PostgreSQL and MySQL databases.
Installation
database-sanitizer can be installed from PyPI with pip like this:
$ pip install database-sanitizer
If you are using MySQL, install the package with the MySQL extra instead so that the additional requirements are included:
$ pip install database-sanitizer[MySQL]
Usage
Once the package has been installed, database-sanitizer can be used
like this:
$ database-sanitizer <DATABASE-URL>
The DATABASE-URL command line argument is required so that the tool
knows how to retrieve the dump from the database. With PostgreSQL, it
would look something like this:
$ database-sanitizer postgres://user:password@host/database
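The DATABASE-URL follows the usual URL layout: a scheme naming the database type, then credentials, host and database name. As a rough illustration (a sketch built on Python's standard urllib.parse, not the tool's own parsing code; describe_database_url is a hypothetical helper), the example URL above breaks down like this:

```python
from urllib.parse import urlparse


def describe_database_url(url):
    """Split a database URL into its parts (illustrative helper, not part of database-sanitizer)."""
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,  # e.g. "postgres"
        "user": parsed.username,
        "password": parsed.password,
        "host": parsed.hostname,
        "port": parsed.port,  # None when the URL omits a port
        "database": parsed.path.lstrip("/"),
    }


print(describe_database_url("postgres://user:password@host/database"))
```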
However, unless a configuration file is provided, no sanitation is performed on the retrieved dump, which brings us to the next section.
Configuration
Sanitation rules are given in a configuration file written in YAML.
The path to the configuration file is passed to the command line
utility with the --config argument (-c for short), like this:
$ database-sanitizer -c config.yml postgres://user:password@host/database
The configuration file uses the following syntax:
config:
  addons:
    - some.other.package
    - yet.another.package
  extra_parameters:  # These parameters will be passed to the dump tool CLI
    mysqldump:
      - "--single-transaction"  # Included by default
    pg_dump:
      - "--exclude-table=something"
strategy:
  user:
    first_name: name.first_name
    last_name: name.last_name
    secret_key: string.empty
  access_log: skip_rows
The example configuration above first lists two "addon packages":
names of Python packages in which the sanitizer looks for sanitizer
functions. They are optional and can be omitted, in which case only
the built-in sanitizers and the sanitizer functions defined in a
package called sanitizers are used.
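The package lookup described above can be sketched with Python's importlib. The search order (addon packages first, then sanitizers, then the built-ins) and the built-in package name database_sanitizer.sanitizers are assumptions made for this illustration; sanitizer_packages and find_module are hypothetical helpers, not part of the tool's API.

```python
import importlib

# Assumption: name of the package holding the built-in sanitizers.
BUILTIN_PACKAGE = "database_sanitizer.sanitizers"


def sanitizer_packages(addons):
    """Return package names in the (assumed) order they are searched."""
    return list(addons) + ["sanitizers", BUILTIN_PACKAGE]


def find_module(module_name, addons):
    """Import <package>.<module_name> from the first package that provides it."""
    for package in sanitizer_packages(addons):
        try:
            return importlib.import_module(package + "." + module_name)
        except ImportError:
            # Package or module not present; try the next one in the search order.
            continue
    return None


# Hypothetical usage: look for a module called "name" in the example addons.
module = find_module("name", ["some.other.package", "yet.another.package"])
```

Catching ImportError keeps the sketch short, but it would also hide import errors raised from inside an existing module, so a real implementation may want to be stricter.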
It is also possible to define extra parameters to pass to the dump
tool (mysqldump or pg_dump). By default, mysqldump is given the
--single-transaction parameter. You can disable this by defining the
mysqldump extra parameters explicitly in the configuration file, e.g.
as an empty list [].
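The defaulting rule can be expressed as a small function over the parsed configuration (a plain dict, as a YAML loader would produce). The helper name extra_parameters and the treatment of a missing or empty value as "no parameters" are assumptions for this sketch, not the tool's documented behavior:

```python
# Defaults per dump tool, following the description above.
DEFAULT_EXTRA_PARAMETERS = {
    "mysqldump": ["--single-transaction"],
    "pg_dump": [],
}


def extra_parameters(config, tool):
    """Return the extra CLI arguments for `tool` ("mysqldump" or "pg_dump")."""
    section = config.get("config") or {}
    configured = section.get("extra_parameters") or {}
    if tool in configured:
        # An explicit value, even an empty list, replaces the default.
        return list(configured[tool] or [])
    return list(DEFAULT_EXTRA_PARAMETERS.get(tool, []))


# Explicit empty list: mysqldump runs without --single-transaction.
print(extra_parameters({"config": {"extra_parameters": {"mysqldump": []}}}, "mysqldump"))
```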
The strategy section of the configuration contains the actual
sanitation rules. For each database table (user in the example), you
list column names in that table, each mapped to the name of a
sanitation function. The function name consists of two parts separated
by a dot: the Python module name and the name of the actual function
without its sanitize_ prefix. For example, name.first_name refers to a
function called sanitize_first_name in a file called name.py.
Table contents can be left out of the sanitized dump completely by
setting the table strategy to skip_rows (see the access_log table in
the example configuration). This leaves out all INSERT INTO (MySQL) or
COPY (PostgreSQL) statements for that table from the sanitized dump
file. CREATE TABLE statements are not removed.
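A skip_rows filter over a plain-text dump might look roughly like this. It is a simplified sketch, not the tool's implementation: it assumes pg_dump's plain format, where row data follows a "COPY ... FROM stdin;" line and ends with a line containing only \., and mysqldump output with each INSERT INTO `table` statement on its own line. skip_table_rows is a hypothetical name.

```python
import re

# PostgreSQL plain dumps: COPY [schema.]table (columns) FROM stdin;
COPY_RE = re.compile(r'^COPY\s+(?:"?\w+"?\.)?"?(\w+)"?\s')
# MySQL dumps: INSERT INTO `table` VALUES (...);
INSERT_RE = re.compile(r"^INSERT INTO `(\w+)`")


def skip_table_rows(lines, skipped):
    """Yield dump lines, omitting row data for the tables named in `skipped`."""
    in_skipped_copy = False
    for line in lines:
        if in_skipped_copy:
            # COPY data ends with a line containing only "\."
            if line.rstrip("\n") == "\\.":
                in_skipped_copy = False
            continue
        match = COPY_RE.match(line)
        if match and match.group(1) in skipped:
            in_skipped_copy = True
            continue
        match = INSERT_RE.match(line)
        if match and match.group(1) in skipped:
            continue
        # Everything else, including CREATE TABLE, passes through unchanged.
        yield line
```

Streaming line by line keeps memory use flat regardless of dump size.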
Download files
File details
Details for the file database-sanitizer-1.1.0.tar.gz.
File metadata
- Download URL: database-sanitizer-1.1.0.tar.gz
- Upload date:
- Size: 19.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.18.4 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.23.2 CPython/3.6.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14d93f6eefcb08a4a96d5a075ba6e5a5e3e3ac2b8c57374114b6be889b5ea97a |
| MD5 | b8d52c338400a8b538baafd7854a2863 |
| BLAKE2b-256 | 16b660328d604247cabe04f7fe65aed23e7b67b7b7c3738c8e4709fce8a0a65b |
File details
Details for the file database_sanitizer-1.1.0-py2.py3-none-any.whl.
File metadata
- Download URL: database_sanitizer-1.1.0-py2.py3-none-any.whl
- Upload date:
- Size: 24.8 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.18.4 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.23.2 CPython/3.6.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f717ed4e9f64b193f580d0d744c96ec2f95c0e853b69b5bccdb85e5807e9bbca |
| MD5 | abd6abff677d9f48929fa386121b89e7 |
| BLAKE2b-256 | e046463838e59da24ff5f82e846ca176bd8c805d46869e3e51e5c0da6152ceca |