
Anonymization of data in pg_dump

Project description

pg_stage

A utility for generating a database dump in which the data is obfuscated. The dump can be used on development and staging servers without fear of real data being stolen.


How does it work?

The utility reads the output of pg_dump line by line and decides whether to obfuscate data based on the comments attached to a table or column.
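To make the line-by-line idea concrete, here is a minimal sketch of a filter over plain-format pg_dump output. It assumes the column rules have already been collected (passed in here as a rules dictionary) and only rewrites rows inside COPY ... FROM stdin; blocks, which end with a \. line. The function name obfuscate_copy_blocks and the rules layout are hypothetical illustrations, not pg_stage's actual API, and the sketch ignores quoted identifiers and the custom (-Fc) format.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

# Hypothetical mutation type: takes the original COPY field, returns a fake one.
# Replacements must not contain tabs, newlines or backslashes (COPY escapes).
Mutation = Callable[[str], str]


def obfuscate_copy_blocks(
    lines: Iterable[str], rules: Dict[str, Dict[str, Mutation]]
) -> Iterator[str]:
    """Yield plain-format pg_dump lines, rewriting COPY data for configured columns.

    rules maps a table name as written in the COPY header (e.g. "public.table_1")
    to {column name: mutation}. Every other line passes through unchanged.
    """
    columns: Optional[List[str]] = None  # columns of the open COPY block, if any
    table_rules: Dict[str, Mutation] = {}
    for line in lines:
        text = line.rstrip("\n")
        if columns is None:
            # A data block starts with: COPY table (col1, col2, ...) FROM stdin;
            if text.startswith("COPY ") and text.endswith(" FROM stdin;"):
                table, _, rest = text[len("COPY "):].partition(" (")
                columns = [c.strip() for c in rest.split(")", 1)[0].split(",")]
                table_rules = rules.get(table, {})
            yield line
        elif text == "\\.":
            # The end-of-data marker closes the COPY block.
            columns = None
            yield line
        else:
            # Data row: tab-separated fields in the column order of the header.
            fields = text.split("\t")
            for i, name in enumerate(columns):
                mutate = table_rules.get(name)
                if mutate is not None and fields[i] != "\\N":  # keep NULLs
                    fields[i] = mutate(fields[i])
            yield "\t".join(fields) + "\n"
```

In a script such as main.py, the filter would be wired up as sys.stdout.writelines(obfuscate_copy_blocks(sys.stdin, rules)). Because it is a generator, memory use stays flat no matter how large the dump is.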

Usage example

  1. Create a file with roughly the following contents:
# main.py
from pg_stage.obfuscators.plain import PlainObfuscator
from pg_stage.obfuscators.custom import CustomObfuscator

# Initialize obfuscator based on dump format
# Use PlainObfuscator for default SQL text format
# Use CustomObfuscator for custom format dumps (pg_dump -Fc)
obfuscator = PlainObfuscator(locale='ru_RU')  # or CustomObfuscator(locale='ru_RU')
obfuscator.run()
  2. Add comments to a column or table:
COMMENT ON COLUMN table_1.first_name IS 'anon: [{"mutation_name": "first_name"}]';
  3. Run pg_dump and pipe its output into the script:
# Run pg_dump with appropriate format and pipe to main.py for obfuscation
# For PlainObfuscator: use default SQL text format
pg_dump -d database | python3 main.py > backup.sql

# For CustomObfuscator: use custom format (-Fc)
pg_dump -Fc -d database | python3 main.py > backup.dump
  4. The resulting dump contains the obfuscated data, so tables restored from it hold anonymized values.
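Step 2 relies on the comment text being machine-readable: everything after the anon: prefix is a JSON list of mutations. The sketch below shows one way such a line from the dump could be parsed with Python's standard re and json modules. The regular expression, the ColumnRule type and parse_anon_comment are assumptions made for illustration; pg_stage's own parser is not shown here, and the sketch handles only column comments with unquoted identifiers.

```python
import json
import re
from typing import Any, Dict, List, NamedTuple, Optional

# Hypothetical pattern for lines such as:
# COMMENT ON COLUMN public.table_1.first_name IS 'anon: [...]';
COMMENT_RE = re.compile(
    r"^COMMENT ON COLUMN (?P<table>[\w.]+)\.(?P<column>\w+) IS '(?P<body>.*)';$"
)


class ColumnRule(NamedTuple):
    table: str
    column: str
    mutations: List[Dict[str, Any]]


def parse_anon_comment(line: str) -> Optional[ColumnRule]:
    """Return the parsed rule, or None if the line is not an anon: column comment."""
    match = COMMENT_RE.match(line.strip())
    if match is None:
        return None
    # SQL string literals escape a single quote by doubling it.
    body = match.group("body").replace("''", "'")
    if not body.startswith("anon:"):
        return None  # an ordinary comment, not an obfuscation rule
    mutations = json.loads(body[len("anon:"):])  # leading space is fine for json
    return ColumnRule(match.group("table"), match.group("column"), mutations)
```

The greedy [\w.]+ group backtracks to the last dot, so a schema-qualified name such as public.table_1.first_name splits into the table public.table_1 and the column first_name.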

Supported types of obfuscation

You can see the current list here.

Why did I write this utility?

I follow the rule that, for security reasons, third-party plugins should not be installed in a production database, yet most similar utilities are implemented as database extensions.

In addition, I could not find a similar utility that obfuscates data consistently across related tables. This prompted me to write my own, which produces the same obfuscated result in related tables linked by a foreign key.

Example:

COMMENT ON COLUMN table_1.first_name IS 'anon: [{"mutation_name": "first_name", "relations": [{"table_name": "table_1", "column_name": "last_name", "from_column_name": "id", "to_column_name": "id"}]}]';

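Here, relations lists the tables and columns that must be obfuscated consistently with the current field: table_name and column_name identify the related column, while from_column_name (in the current table) and to_column_name (in the related table) identify the columns that link the rows, in this example id to id.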
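One way to get the same result across a relation is to remember the fake value generated for each key and reuse it whenever that key appears again, whether as the from_column_name value in the current table or as the to_column_name value in the related one. The ConsistentObfuscator class below is a hypothetical sketch of that idea, not pg_stage's implementation. It keeps every key in memory; for very large tables, seeding the fake-data generator from a keyed hash of the key would be an alternative that needs no cache.

```python
from typing import Callable, Dict, Hashable


class ConsistentObfuscator:
    """Hypothetical helper: one fake value per linking key, reused on every lookup.

    The key is the value of the linking column (for example, the id), so a row
    whose to_column_name value equals a source row's from_column_name value
    receives the same fake value as that source row.
    """

    def __init__(self, generate: Callable[[], str]) -> None:
        self._generate = generate  # produces a fresh fake value, e.g. a name
        self._by_key: Dict[Hashable, str] = {}

    def value_for(self, key: Hashable) -> str:
        # Generate a value the first time a key is seen; afterwards return the
        # cached one, so the outcome does not depend on table order in the dump.
        if key not in self._by_key:
            self._by_key[key] = self._generate()
        return self._by_key[key]


if __name__ == "__main__":
    import itertools

    counter = itertools.count(1)
    names = ConsistentObfuscator(lambda: f"name_{next(counter)}")
    # Hypothetical rows: (id, first_name) and the related (id, last_name).
    source = [("1", "Alice"), ("2", "Bob")]
    related = [("2", "Jones"), ("1", "Smith")]
    print([(row_id, names.value_for(row_id)) for row_id, _ in source])
    print([(row_id, names.value_for(row_id)) for row_id, _ in related])
```

Because values are created on first use, it does not matter whether pg_dump emits the current table or the related table first: both lookups for the same id return the same fake value.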

Thanks for the inspiration



Download files


Source Distribution

pg_stage-0.4.1.tar.gz (25.2 kB)

Uploaded Source

Built Distribution


pg_stage-0.4.1-py3-none-any.whl (18.9 kB)

Uploaded Python 3

File details

Details for the file pg_stage-0.4.1.tar.gz.

File metadata

  • Download URL: pg_stage-0.4.1.tar.gz
  • Upload date:
  • Size: 25.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.9.2

File hashes

Hashes for pg_stage-0.4.1.tar.gz
Algorithm Hash digest
SHA256 8024b53a5b97f4d23de329866ceb784194c687d448822af2401ed3c123c12f78
MD5 2deae283c467b76a9483e5b8dbe4c8a9
BLAKE2b-256 705a99faa2ffb65388165562aed9f0e4e64183ccbab878392a3fbd2ad70a2fd3


File details

Details for the file pg_stage-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: pg_stage-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 18.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.9.2

File hashes

Hashes for pg_stage-0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 606ab029a24449de19d21f09179932055e7641d4c0709a3d50608876da347966
MD5 74bb2dac1501cddf9032bcd17da6b918
BLAKE2b-256 1dc65d1d48011c2923e5a88a4162fbf31fef17265f447b620aed52f77b24a753

