Sanitizes contents of a database.

Database sanitation tool

database-sanitizer is a tool which retrieves a database dump from a relational database and sanitizes the retrieved data according to rules defined in a configuration file. The tool currently supports both PostgreSQL and MySQL databases.

Installation

database-sanitizer can be installed from PyPI with pip like this:

$ pip install database-sanitizer

If you are using MySQL, you need to install the package like this instead, so that additional requirements are included:

$ pip install database-sanitizer[MySQL]

Usage

Once the package has been installed, database-sanitizer can be used like this:

$ database-sanitizer <DATABASE-URL>

The DATABASE-URL command line argument must be provided so that the tool knows how to retrieve the dump from the database. With PostgreSQL, it looks something like this:

$ database-sanitizer postgres://user:password@host/database
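
To make the parts of such a URL explicit, the example can be taken apart with Python's standard urllib.parse module. This is only an illustration of the URL's structure, not the sanitizer's own parsing code.

```python
from urllib.parse import urlsplit

# The example URL from above, split into its components.
url = urlsplit("postgres://user:password@host/database")

print(url.scheme)            # database type
print(url.username)          # user to connect as
print(url.password)          # that user's password
print(url.hostname)          # database server
print(url.path.lstrip("/"))  # database name
```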

However, unless a configuration file is provided, no sanitation is performed on the retrieved database dump. Configuration is covered in the next section.

Configuration

Rules for the sanitation are given in a configuration file written in YAML. The path to the configuration file is passed to the command line utility with the --config argument (-c for short) like this:

$ database-sanitizer -c config.yml postgres://user:password@host/database

The configuration file uses the following syntax:

config:
  addons:
    - some.other.package
    - yet.another.package
  extra_parameters: # These parameters will be passed to the dump tool CLI
    mysqldump:
      - "--single-transaction" # Included by default
    pg_dump:
      - "--exclude-table=something"
strategy:
  user:
    first_name: name.first_name
    last_name: name.last_name
    secret_key: string.empty
  access_log: skip_rows

In the example configuration above, two "addon packages" are listed first. These are names of Python packages in which the sanitizer looks for sanitizer functions. They are optional and can be omitted, in which case only the built-in sanitizers and the sanitizer functions defined in a package called sanitizers are used.

It's also possible to define extra parameters to pass to the dump tool (mysqldump or pg_dump). By default, mysqldump is run with the --single-transaction extra parameter. You can disable this by defining the extra parameters in the config file explicitly, e.g. with an empty array [].
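
The override behaviour described above can be sketched in Python as follows. The helper extra_parameters_for and the DEFAULT_EXTRA_PARAMETERS table are hypothetical names made up for this illustration, not part of database-sanitizer's API. The config argument stands for the already-parsed config section of the YAML file.

```python
# Hypothetical defaults table; only the mysqldump default is stated in the README.
DEFAULT_EXTRA_PARAMETERS = {"mysqldump": ["--single-transaction"]}


def extra_parameters_for(config, tool):
    """Return the extra CLI parameters for `tool`, honouring explicit overrides."""
    configured = config.get("extra_parameters") or {}
    if tool in configured:
        # An explicit value, even an empty list, replaces the default.
        return list(configured[tool] or [])
    return list(DEFAULT_EXTRA_PARAMETERS.get(tool, []))


# No explicit setting: the default parameter is used.
print(extra_parameters_for({}, "mysqldump"))
# Explicit empty array: the default is disabled.
print(extra_parameters_for({"extra_parameters": {"mysqldump": []}}, "mysqldump"))
```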

The strategy portion of the configuration contains the actual sanitation rules. First you define the name of the database table (user in the example), followed by the column names in that table, each mapped to the name of a sanitation function. The name of the sanitation function consists of two parts separated by a dot: the Python module name and the name of the actual function, which is prefixed with sanitize_. For example, name.first_name refers to a function called sanitize_first_name in a file called name.py.
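
As a minimal sketch of this naming convention, the snippet below shows what a function in a name.py module might look like and how a strategy value maps to a module and function name. The function signature (one value in, one sanitized value out) and the helper resolve_sanitizer_name are assumptions made for illustration and are not confirmed by this README.

```python
# Contents of a hypothetical sanitizers/name.py module.
def sanitize_first_name(value):
    """Replace any first name with a fixed placeholder."""
    # Assumption: the sanitizer receives the original column value
    # and returns the value to write into the sanitized dump.
    if value is None:
        return None
    return "Anonymous"


# Hypothetical helper illustrating the naming rule only.
def resolve_sanitizer_name(strategy_value):
    """Split 'name.first_name' into ('name', 'sanitize_first_name')."""
    module_name, _, function_name = strategy_value.rpartition(".")
    return module_name, "sanitize_" + function_name


print(resolve_sanitizer_name("name.first_name"))
print(sanitize_first_name("Alice"))
```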

Table content can be left out of the sanitized dump completely by setting the table's strategy to skip_rows (see the access_log table in the example config). This leaves out all INSERT INTO (MySQL) or COPY (PostgreSQL) statements for that table from the sanitized dump file. CREATE TABLE statements are not removed.
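
The effect of skip_rows on a MySQL-style dump can be pictured with a small line filter. This is a simplified sketch, not the tool's implementation: the function drop_inserts is hypothetical, and it assumes one statement per line with the table name quoted in backticks.

```python
def drop_inserts(dump_lines, table):
    """Yield dump lines, leaving out INSERT INTO statements for `table`."""
    prefix = "INSERT INTO `{}`".format(table)
    for line in dump_lines:
        if line.startswith(prefix):
            continue  # row data for the skipped table
        yield line


dump = [
    "CREATE TABLE `access_log` (`id` int);",
    "INSERT INTO `access_log` VALUES (1);",
    "INSERT INTO `user` VALUES (1);",
]

# The CREATE TABLE statement and the rows of other tables are kept.
for line in drop_inserts(dump, "access_log"):
    print(line)
```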
