An anonymization tool for production databases

pynonymizer

pynonymizer is a tool for anonymizing sensitive production database dumps, allowing you to create realistic test datasets while maintaining GDPR/Data Protection compliance. It replaces personally identifiable information (PII) in your database with random, yet realistic data, using the Faker library and other functions.

Key features:

Supports MySQL, PostgreSQL, and MSSQL databases
Accepts various input formats (SQL, compressed files)
Generates anonymized output in multiple formats
Flexible data generation strategies for different use cases
Easy-to-use command-line interface and Python library

With pynonymizer, you can safely share production database copies with developers and testers, enabling better staging environments, integration tests, and database migration simulations, without compromising user privacy.

How does it work?

pynonymizer replaces personally identifiable data in your database with realistic pseudorandom data, from the Faker library or from other functions. A wide variety of data types is available to suit the column in question, for example:

unique_email
company
file_path
[...]

Pynonymizer's main data replacement mechanism, fake_update, selects values at random from a small pool of generated data (--seed-rows controls how much Faker data is available in that pool). This approach was chosen for compatibility and speed, but it does not guarantee uniqueness, which may or may not suit your use case. For a full list of data generation strategies, see the docs on strategyfiles.
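The speed/uniqueness trade-off can be illustrated with a short, self-contained Python sketch. It is not pynonymizer's implementation (which works inside the database); build_seed_pool, fake_update_sketch and the stand-in generator are hypothetical names. It only shows why picking values at random from a fixed pool of seed rows is cheap but cannot promise unique values.

```python
import random
from typing import Callable, List


def build_seed_pool(generate: Callable[[], str], seed_rows: int) -> List[str]:
    """Pre-generate a fixed pool of fake values (the "seed rows")."""
    return [generate() for _ in range(seed_rows)]


def fake_update_sketch(row_count: int, pool: List[str], rng: random.Random) -> List[str]:
    """Give every row a value chosen at random from the pool.

    Choices are made with replacement, so duplicates are guaranteed once
    row_count exceeds len(pool) and are still possible when it does not.
    """
    return [rng.choice(pool) for _ in range(row_count)]


if __name__ == "__main__":
    rng = random.Random(0)
    counter = iter(range(1_000_000))
    # Hypothetical stand-in for a Faker provider such as company().
    pool = build_seed_pool(lambda: f"Company {next(counter)}", seed_rows=5)
    values = fake_update_sketch(row_count=20, pool=pool, rng=rng)
    print(len(values), "rows,", len(set(values)), "distinct values")
```

A larger pool (the role --seed-rows plays) makes collisions rarer but never rules them out, so a column that must stay unique needs a different strategy from the strategyfile docs.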

Examples

You can see strategyfile examples for existing databases in the examples folder.

Process outline

Restore from dumpfile to temporary database.
Anonymize temporary database with strategy.
Dump resulting data to file.
Drop temporary database.

If this workflow doesn't work for you, see process control to find out whether it can be adjusted to suit your needs.
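The four-step outline can be pictured as a small pipeline. This Python sketch is illustrative only: run_process, the lowercase step names and the skip parameter are hypothetical and do not correspond to pynonymizer's actual functions or process-control options. Each step is passed in as a callable, and dropping the temporary database in a finally block is a design choice of this sketch, so cleanup happens even when an earlier step fails.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical step names mirroring the process outline.
STEPS = ["restore", "anonymize", "dump", "drop"]


def run_process(actions: Dict[str, Callable[[], None]], skip: Iterable[str] = ()) -> List[str]:
    """Run the outline steps in order, skipping any named in `skip`.

    The "drop" step runs in a finally block so the temporary database is
    removed even if restore, anonymize or dump raises (unless skipped).
    Returns the names of the steps that completed.
    """
    skipped = set(skip)
    unknown = skipped - set(STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    executed: List[str] = []
    try:
        for step in STEPS[:-1]:
            if step not in skipped:
                actions[step]()
                executed.append(step)
    finally:
        if "drop" not in skipped:
            actions["drop"]()
            executed.append("drop")
    return executed


if __name__ == "__main__":
    demo = {name: (lambda name=name: print("running", name)) for name in STEPS}
    run_process(demo, skip=["restore"])  # e.g. the database was restored earlier
```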

mysql

mysql/mysqldump must be in $PATH
Local or remote mysql >= 5.5
Supported Inputs:
- Plain SQL over stdin
- Plain SQL file .sql
- GZip-compressed SQL file .gz
Supported Outputs:
- Plain SQL over stdout
- Plain SQL file .sql
- GZip-compressed SQL file .gz
- LZMA-compressed SQL file .xz
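Both the mysql backend and the postgres backend rely on client tools that must be on $PATH (mysql/mysqldump and psql/pg_dump respectively), so a quick pre-flight check can save a failed run. This is a minimal Python sketch using shutil.which; missing_tools and REQUIRED_TOOLS are hypothetical names, not part of pynonymizer.

```python
import shutil
from typing import Dict, List

# Client tools this document lists as required on $PATH, per database type.
REQUIRED_TOOLS: Dict[str, List[str]] = {
    "mysql": ["mysql", "mysqldump"],
    "postgres": ["psql", "pg_dump"],
}


def missing_tools(db_type: str) -> List[str]:
    """Return the required client tools that cannot be found on $PATH."""
    return [tool for tool in REQUIRED_TOOLS[db_type] if shutil.which(tool) is None]


if __name__ == "__main__":
    for db_type in REQUIRED_TOOLS:
        missing = missing_tools(db_type)
        print(db_type, "OK" if not missing else "missing: " + ", ".join(missing))
```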

mssql

Requires extra dependencies: install package pynonymizer[mssql]
MSSQL >= 2008
For RESTORE_DB/DUMP_DB operations, the database server must be running on the same machine as pynonymizer. MSSQL RESTORE and BACKUP statements are executed by the database server itself, which reads and writes the backup file on its own filesystem, so piping a local backup to a remote server is not possible.
The anonymize process can be performed on remote servers, but you are responsible for creating and managing the target database.
Supported Inputs:
- Local backup file
Supported Outputs:
- Local backup file
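The local-server restriction for RESTORE_DB/DUMP_DB can be enforced before a run starts. The Python sketch below is a hypothetical guard, not a pynonymizer feature: check_mssql_plan, is_local_host and the list of "local" host spellings are assumptions, and the hostname comparison is deliberately rough (it does not resolve DNS aliases or IP addresses).

```python
import socket
from typing import Iterable

# RESTORE_DB/DUMP_DB come from the MSSQL notes; the rest is hypothetical.
LOCAL_ONLY_STEPS = {"RESTORE_DB", "DUMP_DB"}
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "(local)", "."}


def is_local_host(host: str) -> bool:
    """Rough test for whether `host` names this machine."""
    name = host.lower()
    return name in LOCAL_HOSTS or name == socket.gethostname().lower()


def check_mssql_plan(host: str, steps: Iterable[str]) -> None:
    """Raise ValueError if a backup/restore step targets a remote server."""
    remote_steps = sorted(set(steps) & LOCAL_ONLY_STEPS)
    if remote_steps and not is_local_host(host):
        raise ValueError(
            f"{', '.join(remote_steps)} need the server on this machine; "
            f"{host!r} looks remote, so only anonymize there."
        )
```

Anonymizing a remote database passes the check, while restoring or dumping against one fails fast instead of erroring midway.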

postgres

psql/pg_dump must be in $PATH
Local or remote postgres server
Supported Inputs:
- Plain SQL over stdin
- Plain SQL file .sql
- GZip-compressed SQL file .gz
Supported Outputs:
- Plain SQL over stdout
- Plain SQL file .sql
- GZip-compressed SQL file .gz
- LZMA-compressed SQL file .xz
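The file formats listed for mysql and postgres map naturally onto Python's standard gzip and lzma modules. This sketch picks a codec from the file suffix; open_sql is a hypothetical helper, not pynonymizer's code, and it deliberately mirrors the lists above: .sql and .gz for inputs, .sql, .gz and .xz for outputs (stdin/stdout are left out).

```python
import gzip
import lzma
from pathlib import Path
from typing import IO

# Compressed formats accepted per direction, following the lists above.
_READERS = {".gz": gzip.open}
_WRITERS = {".gz": gzip.open, ".xz": lzma.open}


def open_sql(path: str, mode: str) -> IO[str]:
    """Open a SQL dump as text, choosing compression from the file suffix.

    mode is "r" for inputs (.sql, .gz) or "w" for outputs (.sql, .gz, .xz).
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    suffix = Path(path).suffix.lower()
    openers = _READERS if mode == "r" else _WRITERS
    if suffix in openers:
        return openers[suffix](path, mode + "t", encoding="utf-8")
    if suffix == ".sql":
        return open(path, mode, encoding="utf-8")
    kind = "input" if mode == "r" else "output"
    raise ValueError(f"unsupported {kind} file: {path}")
```

Because every branch returns a text-mode file object, callers can stream dump lines through the same with-block regardless of compression.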

Getting Started

Usage

CLI

Write a strategyfile for your database
Run pynonymizer --help for a description of the available options
Start anonymizing!

Docker

pynonymizer is available as a Docker image, so you don't have to install the client tools for your database.

See https://hub.docker.com/repository/docker/rwnxt/pynonymizer

# As pynonymizer depends on strategyfiles, you'll need to create a file mount so the file can be read.
docker run --mount type=bind,source=./strategyfile.yml,target=/tmp/strategyfile.yml rwnxt/pynonymizer -s /tmp/strategyfile.yml --db-host [...]
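The docker run line can also be assembled from Python, for example when scripting regular refreshes. In this sketch docker_command is a hypothetical helper: only the image name, the mount target and the -s/--db-host options come from the example, and everything else you would normally pass (the elided [...]) goes in extra_args. Resolving the strategyfile to an absolute path is a defensive choice, since some Docker versions reject relative source paths for --mount bind mounts.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def docker_command(strategyfile: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build the documented `docker run` invocation as an argument list.

    The strategyfile is bind-mounted at /tmp/strategyfile.yml inside the
    container and passed to pynonymizer with -s.
    """
    source = Path(strategyfile).resolve()  # absolute host path for the mount
    return [
        "docker", "run",
        "--mount", f"type=bind,source={source},target=/tmp/strategyfile.yml",
        "rwnxt/pynonymizer",
        "-s", "/tmp/strategyfile.yml",
        *extra_args,  # e.g. --db-host plus the other options you need
    ]


if __name__ == "__main__":
    # "db.internal" is a placeholder host name.
    print(shlex.join(docker_command("./strategyfile.yml", ["--db-host", "db.internal"])))
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting issues.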

Package

Pynonymizer can also be invoked programmatically from other Python code. See the module entrypoint pynonymizer or pynonymizer/pynonymize.py.

import pynonymizer

pynonymizer.run(
    input_path="./backup.sql",
    strategyfile_path="./strategy.yml",
    # [...] further options
)

Release history

2.5.0 (this version): Dec 27, 2024
2.4.0: Jul 30, 2024
2.3.1: May 27, 2024
2.2.1: Apr 30, 2024
2.2.0: Apr 14, 2024
2.1.1: Apr 6, 2024
2.1.0: Apr 3, 2024
2.0.0: Mar 28, 2024
1.25.0: Mar 29, 2023
1.24.0: Sep 7, 2022
1.23.0: Aug 21, 2022
1.22.0: Feb 6, 2022
1.21.3: Nov 14, 2021
1.21.2: Sep 6, 2021
1.21.1: Jun 22, 2021
1.21.0: May 31, 2021
1.20.0: May 6, 2021
1.19.0: Apr 24, 2021
1.18.1: Apr 12, 2021
1.18.0: Apr 11, 2021
1.17.0: Mar 29, 2021
1.16.0: Mar 16, 2021
1.15.0: Jan 29, 2021
1.14.0: Dec 7, 2020
1.13.0: Oct 22, 2020
1.12.0: Sep 25, 2020
1.11.2: Sep 23, 2020
1.11.1: Aug 29, 2020
1.10.1: Jul 22, 2020
1.10.0: Jul 22, 2020
1.9.0: Jun 25, 2020
1.8.0: Jan 17, 2020
1.7.0: Jan 10, 2020
1.6.2: Sep 17, 2019
1.6.1: Aug 2, 2019
1.6.0: Aug 1, 2019
1.5.0: Jul 13, 2019
1.4.1: Jun 29, 2019
1.4.0: Jun 23, 2019
1.3.0: Jun 17, 2019
1.2.0: Jun 14, 2019
1.1.2: Jun 8, 2019
1.0.0: Jun 4, 2019

Download files

Source distribution: pynonymizer-2.5.0.tar.gz (30.5 kB), uploaded Dec 27, 2024
Built distribution: pynonymizer-2.5.0-py3-none-any.whl (39.1 kB), uploaded Dec 27, 2024 (Python 3)

File details

Details for the file pynonymizer-2.5.0.tar.gz.

File metadata

Download URL: pynonymizer-2.5.0.tar.gz
Upload date: Dec 27, 2024
Size: 30.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for pynonymizer-2.5.0.tar.gz
Algorithm	Hash digest
SHA256	`fa6a68a4c3f898ee15446aeb86948bf7bbe27f5987b314e9b41de37cd5bbd519`
MD5	`7947ff865247f143302485c9ca586af3`
BLAKE2b-256	`09e89d1ed8e2a3ea849bddb96daff0b6108010d8a9cb9b2ff212f231d7258e24`


File details

Details for the file pynonymizer-2.5.0-py3-none-any.whl.

File metadata

Download URL: pynonymizer-2.5.0-py3-none-any.whl
Upload date: Dec 27, 2024
Size: 39.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for pynonymizer-2.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`31372212e6d6e9e273cb8f90c6bb87be4c20ea94121e507ca7e7bb0c43ff6c04`
MD5	`b7a1c8731013ccb9747943675b431e9a`
BLAKE2b-256	`133eecd8b213a28945ad7e502f62047718a5b2e1bc1d873db995497717d683c4`

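The published SHA256 digests can be checked against a downloaded file with Python's standard hashlib module. This is a minimal sketch; sha256_of and verify are hypothetical helper names, and the constant below is the wheel digest listed above. The file is read in chunks so large downloads need not fit in memory.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest published above for pynonymizer-2.5.0-py3-none-any.whl.
WHEEL_SHA256 = "31372212e6d6e9e273cb8f90c6bb87be4c20ea94121e507ca7e7bb0c43ff6c04"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


if __name__ == "__main__":
    wheel = "pynonymizer-2.5.0-py3-none-any.whl"
    if Path(wheel).exists():
        print("OK" if verify(wheel, WHEEL_SHA256) else "MISMATCH")
```

pip can also enforce this itself through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).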
