
An anonymization tool for production databases


pynonymizer

pynonymizer is a tool for anonymizing sensitive production database dumps, allowing you to create realistic test datasets while maintaining GDPR/Data Protection compliance. It replaces personally identifiable information (PII) in your database with random, yet realistic data, using the Faker library and other functions.

Key features:

  • Supports MySQL, PostgreSQL, and MSSQL databases
  • Accepts various input formats (SQL, compressed files)
  • Generates anonymized output in multiple formats
  • Flexible data generation strategies for different use cases
  • Easy-to-use command-line interface and Python library

With pynonymizer, you can safely share production database copies with developers and testers, enabling better staging environments, integration tests, and database migration simulations, without compromising user privacy.

How does it work?

pynonymizer replaces personally identifiable data in your database with realistic pseudorandom data, from the Faker library or from other functions. A wide variety of data types is available to suit the column in question, for example:

  • unique_email
  • company
  • file_path
  • [...]
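As a rough illustration of the idea, a strategy can be thought of as a mapping from column names to data-type names, each backed by a value generator. The sketch below uses only the standard library instead of Faker, and the generator functions (`unique_email`, `company`, `file_path`) and `anonymize_rows` are hypothetical stand-ins named after the types listed above, not pynonymizer's own implementations.

```python
import itertools
import random
from typing import Callable, Dict, List

# Hypothetical stand-ins for Faker-backed data types (not pynonymizer's code).
_email_counter = itertools.count(1)

def unique_email() -> str:
    # A running counter guarantees distinct addresses within one run.
    return f"user{next(_email_counter)}@example.com"

def company() -> str:
    return random.choice(["Acme Ltd", "Globex plc", "Initech Inc"])

def file_path() -> str:
    return f"/srv/data/file{random.randint(1, 9999)}.txt"

GENERATORS: Dict[str, Callable[[], str]] = {
    "unique_email": unique_email,
    "company": company,
    "file_path": file_path,
}

def anonymize_rows(rows: List[dict], column_types: Dict[str, str]) -> List[dict]:
    """Return copies of rows with each listed column replaced by generated data."""
    result = []
    for row in rows:
        new_row = dict(row)  # leave the caller's rows untouched
        for column, type_name in column_types.items():
            new_row[column] = GENERATORS[type_name]()
        result.append(new_row)
    return result

rows = [
    {"id": 1, "email": "alice@real.com", "employer": "RealCo"},
    {"id": 2, "email": "bob@real.com", "employer": "RealCo"},
]
print(anonymize_rows(rows, {"email": "unique_email", "employer": "company"}))
```

Columns not named in the mapping (here `id`) pass through unchanged, which mirrors the idea of only listing sensitive columns in a strategy.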

Pynonymizer's main data replacement mechanism, fake_update, selects values at random from a small pool of generated data (--seed-rows controls how many rows of Faker data are available in that pool). This approach was chosen for compatibility and speed, but it does not guarantee unique values, which may or may not suit your use case. For a full list of data generation strategies, see the docs on strategyfiles.
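A minimal sketch of pool-based replacement, assuming a pool of `seed_rows` pre-generated values from which each row draws at random. `build_seed_pool` and `fake_update` here are hypothetical Python illustrations of the trade-off described above, not pynonymizer's internals (which operate inside the database). When there are more rows than pool entries, repeated values are unavoidable.

```python
import random
from typing import Callable, List

def build_seed_pool(generator: Callable[[], str], seed_rows: int) -> List[str]:
    # Generate the pool once; in practice this is the comparatively slow Faker step.
    return [generator() for _ in range(seed_rows)]

def fake_update(values: List[str], pool: List[str], rng: random.Random) -> List[str]:
    # Each row receives a random pool entry: cheap, but values may repeat.
    return [rng.choice(pool) for _ in values]

rng = random.Random(0)
counter = iter(range(10_000))
pool = build_seed_pool(lambda: f"name-{next(counter)}", seed_rows=5)
originals = [f"person-{i}" for i in range(20)]
replaced = fake_update(originals, pool, rng)
print(len(set(replaced)), "distinct values for", len(replaced), "rows")
```

Raising `seed_rows` reduces, but never eliminates, the chance of duplicates; a column that must be unique needs a different strategy.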

Examples

You can find strategyfile examples for existing databases in the examples folder.

Process outline

  1. Restore from dumpfile to temporary database.
  2. Anonymize temporary database with strategy.
  3. Dump resulting data to file.
  4. Drop temporary database.
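The four steps can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it needs no server; pynonymizer itself targets MySQL, PostgreSQL and MSSQL. `run_pipeline` and the `replace_emails` callback are hypothetical placeholders, the latter standing in for applying a strategy.

```python
import sqlite3
from typing import Callable

def run_pipeline(dump_sql: str, anonymize: Callable[[sqlite3.Connection], None]) -> str:
    # 1. Restore the dump into a temporary (in-memory) database.
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(dump_sql)
        # 2. Anonymize the temporary database with the strategy.
        anonymize(conn)
        conn.commit()
        # 3. Dump the resulting data back to SQL text.
        return "\n".join(conn.iterdump())
    finally:
        # 4. Drop the temporary database (closing discards an in-memory DB).
        conn.close()

def replace_emails(conn: sqlite3.Connection) -> None:
    # Hypothetical strategy: derive a fake address from each row's id.
    conn.execute("UPDATE users SET email = 'user' || id || '@example.com'")

source = """
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO users VALUES (1, 'alice@real.com'), (2, 'bob@real.com');
"""
print(run_pipeline(source, replace_emails))
```

The `try`/`finally` makes step 4 run even if anonymization fails, so a half-anonymized temporary copy is not left behind.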

If this workflow doesn't work for you, see process control to find out whether it can be adjusted to suit your needs.

mysql

  • mysql and mysqldump must be in $PATH
  • Local or remote MySQL >= 5.5
  • Supported Inputs:
    • Plain SQL over stdin
    • Plain SQL file .sql
    • GZip-compressed SQL file .gz
  • Supported Outputs:
    • Plain SQL over stdout
    • Plain SQL file .sql
    • GZip-compressed SQL file .gz
    • LZMA-compressed SQL file .xz
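The supported formats are distinguished by file extension. A minimal sketch of how such dispatch could work with the standard library's `gzip` and `lzma` modules; `open_input` and `open_output` are hypothetical helpers, and treating `-` as stdin/stdout is an assumption made for illustration.

```python
import gzip
import lzma
import sys
from typing import IO

def open_input(path: str) -> IO[str]:
    # "-" means plain SQL streamed on stdin (assumed convention).
    if path == "-":
        return sys.stdin
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".sql"):
        return open(path, "r", encoding="utf-8")
    raise ValueError(f"unsupported input: {path}")

def open_output(path: str) -> IO[str]:
    # "-" means plain SQL written to stdout (assumed convention).
    if path == "-":
        return sys.stdout
    if path.endswith(".gz"):
        return gzip.open(path, "wt", encoding="utf-8")
    if path.endswith(".xz"):
        return lzma.open(path, "wt", encoding="utf-8")
    if path.endswith(".sql"):
        return open(path, "w", encoding="utf-8")
    raise ValueError(f"unsupported output: {path}")
```

Callers should avoid closing the returned stream when it is `sys.stdin` or `sys.stdout`; for files, a `with` block closes them and flushes any compressed data.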

mssql

  • Requires extra dependencies: install package pynonymizer[mssql]
  • MSSQL >= 2008
  • For RESTORE_DB/DUMP_DB operations, the database server must be running on the same machine as pynonymizer. MSSQL RESTORE and BACKUP statements are executed by the database server itself, which reads and writes backup files on its own filesystem, so piping a local backup to a remote server is not possible.
  • The anonymize process can be performed on remote servers, but you are responsible for creating/managing the target database.
  • Supported Inputs:
    • Local backup file
  • Supported Outputs:
    • Local backup file

postgres

  • psql and pg_dump must be in $PATH
  • Local or remote PostgreSQL server
  • Supported Inputs:
    • Plain SQL over stdin
    • Plain SQL file .sql
    • GZip-compressed SQL file .gz
  • Supported Outputs:
    • Plain SQL over stdout
    • Plain SQL file .sql
    • GZip-compressed SQL file .gz
    • LZMA-compressed SQL file .xz
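Because the MySQL and PostgreSQL backends rely on client tools being in $PATH, a quick preflight check can catch a missing binary before any work starts. A minimal sketch using `shutil.which`; the `REQUIRED_TOOLS` mapping and `missing_tools` function are hypothetical helpers built from the requirements listed above.

```python
import shutil
from typing import Dict, List

# Client tools each backend expects to find on $PATH (from the lists above).
REQUIRED_TOOLS: Dict[str, List[str]] = {
    "mysql": ["mysql", "mysqldump"],
    "postgres": ["psql", "pg_dump"],
}

def missing_tools(db_type: str) -> List[str]:
    """Return the client tools for db_type that are not found on $PATH."""
    return [tool for tool in REQUIRED_TOOLS[db_type] if shutil.which(tool) is None]

if __name__ == "__main__":
    for db in REQUIRED_TOOLS:
        missing = missing_tools(db)
        print(db, "OK" if not missing else "missing: " + ", ".join(missing))
```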

Getting Started

Usage

CLI

  1. Write a strategyfile for your database
  2. Run pynonymizer --help for a description of the available options
  3. Start anonymizing!

Docker


pynonymizer is available as a Docker image, so you don't have to install the client tools for your database.

See https://hub.docker.com/repository/docker/rwnxt/pynonymizer

# pynonymizer reads its strategyfile from disk, so bind-mount the file into the container.
docker run --mount type=bind,source="$(pwd)"/strategyfile.yml,target=/tmp/strategyfile.yml rwnxt/pynonymizer -s /tmp/strategyfile.yml --db-host [...]

Package

Pynonymizer can also be invoked programmatically from other Python code. See the module entrypoint pynonymizer or pynonymizer/pynonymize.py.

import pynonymizer

pynonymizer.run(
    input_path="./backup.sql",
    strategyfile_path="./strategy.yml",
    # [...] remaining options
)


Download files


Source Distribution

pynonymizer-2.4.0.tar.gz (30.3 kB)

Uploaded Source

Built Distribution

pynonymizer-2.4.0-py3-none-any.whl (38.9 kB)

Uploaded Python 3

File details

Details for the file pynonymizer-2.4.0.tar.gz.

File metadata

  • Download URL: pynonymizer-2.4.0.tar.gz
  • Upload date:
  • Size: 30.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for pynonymizer-2.4.0.tar.gz
Algorithm Hash digest
SHA256 f94410dfa8921403202b9aa939e5f7dea908ff55831eb90b654464473d506a1d
MD5 49db958a9d9410e08c5132bdcaaf13b3
BLAKE2b-256 84cb732a821b8aa75d77bf71dc756726b076d475907996621a6c908b52d3ecf8
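To check a downloaded file against a published SHA256 digest, Python's `hashlib` is enough. A minimal sketch; `sha256_of` and `verify` are hypothetical helper names, and `EXPECTED` is the SHA256 digest listed above for the source distribution.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 65536) -> str:
    # Stream the file in chunks so large archives need not fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str, expected_hex: str) -> bool:
    # hexdigest() is lowercase, so normalise the published value before comparing.
    return sha256_of(path) == expected_hex.lower()

EXPECTED = "f94410dfa8921403202b9aa939e5f7dea908ff55831eb90b654464473d506a1d"
# Usage: verify("pynonymizer-2.4.0.tar.gz", EXPECTED)
```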


File details

Details for the file pynonymizer-2.4.0-py3-none-any.whl.

File metadata

  • Download URL: pynonymizer-2.4.0-py3-none-any.whl
  • Upload date:
  • Size: 38.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for pynonymizer-2.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b9942680c09eaa620e1595f8514d503965ee8b2c50fad1a3c61361cf6264e908
MD5 41f8ed518fb9d8c249386ed29c1491ec
BLAKE2b-256 eda32f279c5d67861e21ec4d7e37c4e37a219b1519677bb8536342306470ed0c

