django-db-anonymiser
Django app to create configurable anonymised DB dumps.
django-db-anonymiser provides a django app with a management command dump_and_anonymise.
This command runs a pg_dump against a postgresql DB, applies anonymisation functions to
data dumped from the DB and then writes the anonymised dump to S3.
See lite-api's example anonymisation configuration: https://github.com/uktrade/lite-api/blob/dev/api/conf/anonymise_model_config.yaml
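The anonymisation step described above can be pictured as applying a per-column function to each dumped row. The following is a minimal sketch of that idea only, using the standard library; the STRATEGY mapping, the table and column names, and the helper functions are all hypothetical and are not the package's actual implementation or configuration format.

```python
import hashlib


def fake_email(value: str) -> str:
    """Replace an email with a stable placeholder derived from a hash."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"user-{digest}@example.com"


def redact(value: str) -> str:
    """Replace any value with a fixed marker."""
    return "REDACTED"


# Hypothetical mapping: table name -> column name -> anonymisation function.
STRATEGY = {
    "users": {"email": fake_email, "first_name": redact},
}


def anonymise_row(table: str, row: dict) -> dict:
    """Return a copy of the row with configured columns anonymised.

    Columns without a configured function, and NULL (None) values,
    are passed through unchanged.
    """
    column_functions = STRATEGY.get(table, {})
    result = {}
    for column, value in row.items():
        if column in column_functions and value is not None:
            result[column] = column_functions[column](value)
        else:
            result[column] = value
    return result
```

In this sketch, a row from a table that has no entry in STRATEGY is returned unchanged, which mirrors the idea that only explicitly configured columns are touched.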
This pattern is designed as a replacement for Lite's old DB anonymisation process (although it is general purpose and can be used for any django project which uses postgresql).
The previous process was baked into an airflow installation and involved making
a pg_dump from production, anonymising that dump with python and pushing the
file to S3. See: https://github.com/uktrade/lite-airflow-dags/blob/master/dags/export_lite_db.py
django-db-anonymiser follows the same overall pattern, but achieves it through a django management command instead of running on top of airflow. In addition, how DB columns are anonymised can be configured in simple YAML.
Note: This repository depends upon code forked from https://github.com/andersinno/python-database-sanitizer
The forked code is housed under the database_sanitizer directory. It was forked because the above repository
is unmaintained.
Getting started
- Add faker>=4.18.0 and boto3>=1.26.17 to python requirements; it is assumed python/psycopg and co are already installed.
- Either add this github repository as a submodule to your django application named django_db_anonymiser or install the python package django-db-anonymiser (https://pypi.org/project/django-db-anonymiser/) from PyPI.
- Add django_db_anonymiser.db_anonymiser to INSTALLED_APPS
- Set the following django settings:
  - DB_ANONYMISER_CONFIG_LOCATION - the location of your anonymisation yaml file
  - DB_ANONYMISER_AWS_ENDPOINT_URL - optional, custom URL for AWS (e.g. if using minio)
  - DB_ANONYMISER_AWS_ACCESS_KEY_ID - AWS access key ID for the S3 bucket to upload dumps to
  - DB_ANONYMISER_AWS_SECRET_ACCESS_KEY - AWS secret key for the S3 bucket to upload dumps to
  - DB_ANONYMISER_AWS_REGION - AWS region for the S3 bucket to upload dumps to
  - DB_ANONYMISER_AWS_STORAGE_BUCKET_NAME - AWS bucket name for the S3 bucket to upload dumps to
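As an illustration of the settings listed above, a settings.py fragment might look like the sketch below. Reading the values from environment variables of the same name is an assumption made here for illustration, as is the default config path; only the setting names and the app label come from this README.

```python
import os

# In a real project this entry is appended to your existing INSTALLED_APPS.
INSTALLED_APPS = [
    "django_db_anonymiser.db_anonymiser",
]

# Location of the anonymisation YAML file (the default path is hypothetical).
DB_ANONYMISER_CONFIG_LOCATION = os.environ.get(
    "DB_ANONYMISER_CONFIG_LOCATION", "conf/anonymise_model_config.yaml"
)

# Optional: custom endpoint, e.g. when using minio. None when unset.
DB_ANONYMISER_AWS_ENDPOINT_URL = os.environ.get("DB_ANONYMISER_AWS_ENDPOINT_URL")

# Credentials, region and bucket for the S3 bucket that receives the dumps.
DB_ANONYMISER_AWS_ACCESS_KEY_ID = os.environ.get("DB_ANONYMISER_AWS_ACCESS_KEY_ID")
DB_ANONYMISER_AWS_SECRET_ACCESS_KEY = os.environ.get(
    "DB_ANONYMISER_AWS_SECRET_ACCESS_KEY"
)
DB_ANONYMISER_AWS_REGION = os.environ.get("DB_ANONYMISER_AWS_REGION")
DB_ANONYMISER_AWS_STORAGE_BUCKET_NAME = os.environ.get(
    "DB_ANONYMISER_AWS_STORAGE_BUCKET_NAME"
)
```

Keeping the AWS credentials out of source control, as in this sketch, is a common choice; how the values are supplied in practice depends on your deployment.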
Running tests
For local unit testing, run the following from the root of the repository:
$ poetry run pytest django_db_anonymiser
Note: Full test coverage currently requires running the tests in CircleCI, where a postgres DB is spun up and
the db_anonymiser command is tested directly.
Publishing
Publishing to PyPI is currently a manual process:
- Acquire API token from Passman.
- Request access from the SRE team.
- Note: You will need access to the platform group in Passman.
- Run poetry config pypi-token.pypi <token> to add the token to your Poetry configuration.
Update the version, as the same version cannot be published to PyPI.
poetry version patch
More options for the version command can be found in the Poetry documentation. For example, for a minor version bump: poetry version minor.
Build the Python package.
poetry build
Publish the Python package.
Note: Make sure your Pull Request (PR) is approved and contains the version upgrade in pyproject.toml before publishing the package.
poetry publish
Check the PyPI Release history to make sure the package has been updated.
For an optional manual check, install the package locally and test everything works as expected.