django-db-anonymiser

Django app to create configurable anonymised DB dumps.

django-db-anonymiser provides a Django app with a management command, dump_and_anonymise. This command runs pg_dump against a PostgreSQL database, applies anonymisation functions to the dumped data, and then writes the anonymised dump to S3. For an example anonymisation configuration, see the one used by lite-api: https://github.com/uktrade/lite-api/blob/dev/api/conf/anonymise_model_config.yaml

This pattern is designed as a replacement for Lite's old DB anonymisation process, although it is general purpose and can be used for any Django project that uses PostgreSQL. The previous process was built into an Airflow installation and involved taking a pg_dump from production, anonymising that dump with Python and pushing the file to S3. See: https://github.com/uktrade/lite-airflow-dags/blob/master/dags/export_lite_db.py

django-db-anonymiser follows the same overall pattern, but achieves it through a Django management command instead of running on top of Airflow. In addition, how each DB column is anonymised is configured in simple YAML.
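To make the idea of config-driven column anonymisation concrete, here is a minimal sketch. It is not the package's implementation: the function names, the `CONFIG` mapping and its shape are all hypothetical, and it uses a hash from the standard library where the real package relies on faker. The actual YAML schema may differ; see the lite-api example for a real configuration.

```python
import hashlib


def fake_email(value):
    """Replace an email with a deterministic, non-identifying address."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"user-{digest}@example.com"


def blank(value):
    """Discard the original value entirely."""
    return ""


# Hypothetical registry of anonymisation functions, keyed by made-up names.
ANONYMISERS = {"fake_email": fake_email, "blank": blank}

# Hypothetical config: table -> column -> anonymiser name.
# In the real package this information lives in the YAML file.
CONFIG = {"users": {"email": "fake_email", "notes": "blank"}}


def anonymise_row(table, row, config=CONFIG):
    """Return a copy of `row` with the configured columns anonymised."""
    rules = config.get(table, {})
    return {
        column: ANONYMISERS[rules[column]](value) if column in rules else value
        for column, value in row.items()
    }


if __name__ == "__main__":
    row = {"id": 1, "email": "jane@example.org", "notes": "called on Monday"}
    print(anonymise_row("users", row))
```

Columns that do not appear in the mapping pass through unchanged, which is why it matters that every sensitive column is listed in the configuration.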

Note: This repository depends on code forked from https://github.com/andersinno/python-database-sanitizer. The forked code lives in the database_sanitizer directory; it was forked because the upstream repository is unmaintained.

Getting started

  • Add faker>=4.18.0 and boto3>=1.26.17 to your Python requirements; it is assumed that psycopg and related packages are already installed.
  • Either add this GitHub repository to your Django application as a submodule named django_db_anonymiser, or install the Python package django-db-anonymiser from PyPI (https://pypi.org/project/django-db-anonymiser/).
  • Add django_db_anonymiser.db_anonymiser to INSTALLED_APPS
  • Set the following Django settings:
    • DB_ANONYMISER_CONFIG_LOCATION - the location of your anonymisation yaml file
    • DB_ANONYMISER_AWS_ENDPOINT_URL - optional, custom URL for AWS (e.g. if using minio)
    • DB_ANONYMISER_AWS_ACCESS_KEY_ID - AWS access key ID for the S3 bucket to upload dumps to
    • DB_ANONYMISER_AWS_SECRET_ACCESS_KEY - AWS secret key for the S3 bucket to upload dumps to
    • DB_ANONYMISER_AWS_REGION - AWS region for the S3 bucket to upload dumps to
    • DB_ANONYMISER_AWS_STORAGE_BUCKET_NAME - AWS bucket name for the S3 bucket to upload dumps to
    • DB_ANONYMISER_DUMP_FILE_NAME - Name for dumped DB file
    • DB_ANONYMISER_AWS_STORAGE_KEY - optional, key under which file will be stored in AWS S3 bucket
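As a rough illustration of the settings listed above, a settings module might look something like the sketch below. The setting names come from the list; everything else is an assumption made for the example: the config path, the dump file name, and the choice to read credentials from environment variables that share the setting's name.

```python
# settings.py (sketch): all values are placeholders, not recommendations.
import os

INSTALLED_APPS = [
    # ... your existing apps ...
    "django_db_anonymiser.db_anonymiser",
]

# Hypothetical path to the anonymisation YAML file.
DB_ANONYMISER_CONFIG_LOCATION = os.path.join("conf", "anonymise_model_config.yaml")

# Assumption: credentials are supplied through environment variables
# named after the settings themselves.
DB_ANONYMISER_AWS_ACCESS_KEY_ID = os.environ.get("DB_ANONYMISER_AWS_ACCESS_KEY_ID")
DB_ANONYMISER_AWS_SECRET_ACCESS_KEY = os.environ.get("DB_ANONYMISER_AWS_SECRET_ACCESS_KEY")
DB_ANONYMISER_AWS_REGION = os.environ.get("DB_ANONYMISER_AWS_REGION")
DB_ANONYMISER_AWS_STORAGE_BUCKET_NAME = os.environ.get("DB_ANONYMISER_AWS_STORAGE_BUCKET_NAME")

# Hypothetical file name for the dump.
DB_ANONYMISER_DUMP_FILE_NAME = "anonymised.sql"

# Optional settings: only define them when a value is actually provided.
if "DB_ANONYMISER_AWS_ENDPOINT_URL" in os.environ:
    DB_ANONYMISER_AWS_ENDPOINT_URL = os.environ["DB_ANONYMISER_AWS_ENDPOINT_URL"]
if "DB_ANONYMISER_AWS_STORAGE_KEY" in os.environ:
    DB_ANONYMISER_AWS_STORAGE_KEY = os.environ["DB_ANONYMISER_AWS_STORAGE_KEY"]
```

Keeping the AWS credentials out of the settings file itself avoids committing secrets to version control; how you source them is up to your deployment.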

Running tests

For local unit testing, run the following from the root of the repository:

$ poetry run pytest django_db_anonymiser

Note: Full test coverage currently requires running the tests in CircleCI, where we spin up a Postgres DB and test the db_anonymiser command directly.

Publishing

Publishing to PyPI is currently a manual process:

  1. Acquire API token from Passman.
    • Request access from the SRE team.
    • Note: You will need access to the platform group in Passman.
  2. Run poetry config pypi-token.pypi <token> to add the token to your Poetry configuration.

Update the version, as PyPI does not allow the same version to be published twice.

poetry version patch

More options for the version command can be found in the Poetry documentation. For example, for a minor version bump: poetry version minor.

Build the Python package.

poetry build

Publish the Python package.

Note: Make sure your Pull Request (PR) is approved and contains the version upgrade in pyproject.toml before publishing the package.

poetry publish

Check the PyPI Release history to make sure the package has been updated.

For an optional manual check, install the package locally and test everything works as expected.

Checking migration fields

After installing the package in your project, we recommend setting up a pre-commit hook to run the check_migration_fields command. For example:

# .pre-commit-config.yaml (entry under the top-level `repos:` key)
-   repo: local
    hooks:
    -   id: check-migration-fields
        name: Check new fields added to anonymiser config
        entry: python manage.py check_migration_fields
        language: system
        pass_filenames: false

This command checks staged files for DB migrations. If a migration file introduces new model fields, it then checks the file at DB_ANONYMISER_CONFIG_LOCATION to see whether these new fields have been added there. If the fields are not in the config, the command will prompt the user to confirm that the new fields do not represent any sensitive user data.
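The check described above can be pictured with the following sketch. It is a simplified illustration under stated assumptions, not the command's actual code: the helper names are hypothetical, new fields are assumed to have already been extracted from the staged migrations as (table, column) pairs, and the config is assumed to be a mapping of table name to configured columns.

```python
def find_unconfigured_fields(new_fields, config):
    """Return the (table, column) pairs in `new_fields` that `config` omits.

    new_fields: iterable of (table, column) tuples introduced by migrations.
    config: hypothetical mapping of table name -> collection of column names.
    """
    return sorted(
        (table, column)
        for table, column in new_fields
        if column not in config.get(table, ())
    )


def confirm_not_sensitive(missing, ask=input):
    """Ask the user about each missing field; return True only if all are confirmed."""
    for table, column in missing:
        answer = ask(
            f"{table}.{column} is not in the anonymiser config. "
            "Confirm it holds no sensitive data [y/N]: "
        )
        if answer.strip().lower() != "y":
            return False
    return True


if __name__ == "__main__":
    new_fields = [("users", "email"), ("users", "nickname")]
    config = {"users": {"email": "fake_email"}}
    missing = find_unconfigured_fields(new_fields, config)
    if missing and not confirm_not_sensitive(missing):
        raise SystemExit("Add the new fields to the anonymiser config.")
```

Passing the prompt function in as `ask` is a design choice for the sketch only; it makes the confirmation step easy to exercise without a terminal.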
