django-scrubber

Data Anonymizer for Django

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 4 - Beta
Framework
Intended Audience
- Developers
License
- OSI Approved :: BSD License
Natural Language
- English
Programming Language
Topic
- Security
- Software Development

Project description

# Django Scrubber

[![Build Status](https://travis-ci.org/RegioHelden/django-scrubber.svg?branch=master)](https://travis-ci.org/RegioHelden/django-scrubber)
![PyPI](https://img.shields.io/pypi/v/django-scrubber.svg)

`django_scrubber` is a Django app meant to help you anonymize your project's database data. It destructively alters data directly in the DB and therefore **should not be used in production**.

The main use case is providing developers with realistic data to use during development, without having to distribute your customers' or users' potentially sensitive information.
To accomplish this, `django_scrubber` should be plugged in as a step during the creation of your database dumps.

Simply mark the fields you want to anonymize and call the `scrub_data` management command. Data will be replaced based on different *scrubbers* (see below), which define how the anonymous content will be generated.

## Installation

Simply run:
```
pip install django-scrubber
```

Then add `django_scrubber` to your Django `INSTALLED_APPS`. That is, in `settings.py` add:
```
INSTALLED_APPS = [
    ...
    'django_scrubber',
    ...
]
```

## Selecting data to scrub

There are two ways to select which data should be scrubbed: explicitly per model field, or globally by field name or field type.

Adding scrubbers directly to a model:
```python
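from django.db.models import CharField, Model
from django_scrubber import scrubbers

class MyModel(Model):
    somefield = CharField(max_length=255)  # CharField requires max_length

    class Scrubbers:
        # Replace somefield with a hash of its own content
        somefield = scrubbers.Hash('somefield')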
```

Adding scrubbers globally, either by field name or field type:

```python
# (in settings.py)

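from django.db.models import EmailField
from django_scrubber import scrubbers

SCRUBBER_GLOBAL_SCRUBBERS = {
    'name': scrubbers.Hash,      # every field called "name"
    EmailField: scrubbers.Hash,  # every field of type EmailField
}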
```

Model scrubbers override field-name scrubbers, which in turn override field-type scrubbers.

To disable global scrubbing in a specific model, simply set the field scrubber to `None`.
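The precedence rules above can be illustrated with a small, self-contained sketch. This is not the library's implementation: `resolve_scrubber` is a hypothetical helper, plain dictionaries stand in for the model's `Scrubbers` class and the global setting, and field types are matched by exact key rather than by class hierarchy.

```python
def resolve_scrubber(field_name, field_type, model_scrubbers, global_scrubbers):
    """Return the scrubber to use for a field, or None to leave it untouched.

    Hypothetical helper illustrating the documented precedence only.
    """
    # 1. A model-level entry always wins, even when it is None (scrubbing disabled).
    if field_name in model_scrubbers:
        return model_scrubbers[field_name]
    # 2. Otherwise a global scrubber registered under the field name applies.
    if field_name in global_scrubbers:
        return global_scrubbers[field_name]
    # 3. Finally fall back to a global scrubber registered for the field type.
    return global_scrubbers.get(field_type)


# Strings stand in for scrubber objects and field classes in this sketch.
global_scrubbers = {"name": "by-name", "EmailField": "by-type"}
model_scrubbers = {"name": "model-level", "contact_email": None}

by_model = resolve_scrubber("name", "CharField", model_scrubbers, global_scrubbers)
disabled = resolve_scrubber("contact_email", "EmailField", model_scrubbers, global_scrubbers)
by_type = resolve_scrubber("billing_email", "EmailField", model_scrubbers, global_scrubbers)
```

Here `by_model` resolves to the model-level scrubber, `disabled` is `None` because the model explicitly opted out, and `by_type` falls through to the field-type scrubber.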

By default, `django_scrubber` will affect all registered apps. This may lead to issues with third-party apps if the global scrubbers are too general. This can be avoided with the `SCRUBBER_APPS_LIST` setting. Using this, you might for instance split your `INSTALLED_APPS` into multiple `SYSTEM_APPS` and `LOCAL_APPS`, then set `SCRUBBER_APPS_LIST = LOCAL_APPS`, to scrub only your own apps.
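A minimal sketch of the split described above could look as follows in `settings.py`. The app names are placeholders chosen for illustration (`myproject.customers` is hypothetical); only the `SYSTEM_APPS`, `LOCAL_APPS`, `INSTALLED_APPS` and `SCRUBBER_APPS_LIST` names come from the text.

```python
# settings.py (sketch)

# Third-party and framework apps that should be left untouched by scrubbing.
SYSTEM_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django_scrubber",
]

# Your own apps; "myproject.customers" is a hypothetical example.
LOCAL_APPS = [
    "myproject.customers",
]

INSTALLED_APPS = SYSTEM_APPS + LOCAL_APPS

# Restrict scrubbing to your own apps only.
SCRUBBER_APPS_LIST = LOCAL_APPS
```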

Finally, run `./manage.py scrub_data` to **destructively** scrub the registered fields.
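Because the command is destructive, you may want to wrap it in a guard when calling it from a dump script. The sketch below is one possible approach, not something shipped with `django_scrubber`: `scrub_if_safe` and the `DJANGO_ENV` variable are hypothetical names, while `call_command` is Django's standard way of invoking a management command from Python.

```python
import os


def scrub_if_safe(environ=None):
    """Run scrub_data unless the environment looks like production."""
    environ = os.environ if environ is None else environ
    # DJANGO_ENV is a hypothetical variable name chosen for this sketch.
    if environ.get("DJANGO_ENV", "").lower() == "production":
        raise RuntimeError("Refusing to scrub: DJANGO_ENV is 'production'")
    # Imported lazily so the guard itself can run without Django being set up.
    from django.core.management import call_command

    call_command("scrub_data")
```

Calling `scrub_if_safe()` from the script that prepares the dump raises an error instead of altering data when the guard variable marks the environment as production.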

## Built-In scrubbers

### Hash

Simple hashing of content:
```python
class Scrubbers:
    somefield = scrubbers.Hash  # will use the field itself as source
    someotherfield = scrubbers.Hash('somefield')  # can optionally pass a different field name as hashing source
```

Currently this uses the MD5 hash, which is supported by a wide variety of DB engines. Additionally, since security is not the main objective, a shorter hash is preferable: it is less likely to exceed the length of the field it replaces.
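To make the length argument concrete, the following sketch compares hex digest lengths using Python's standard `hashlib`. It only illustrates digest sizes; the scrubber itself hashes inside the database, and the sample value is made up.

```python
import hashlib

value = "alice@example.com"  # made-up sample value

md5_hex = hashlib.md5(value.encode("utf-8")).hexdigest()
sha256_hex = hashlib.sha256(value.encode("utf-8")).hexdigest()

print(len(md5_hex))     # 32 hexadecimal characters
print(len(sha256_hex))  # 64 hexadecimal characters
```

A 32-character digest can still be too long for very short columns, so check the `max_length` of any field you hash.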

### Lorem

A simple scrubber that replaces the content of a `TextField` with a static block of text. It has no options.
```python
class Scrubbers:
    somefield = scrubbers.Lorem
```

### Faker

Replaces content with the help of [faker](https://pypi.python.org/pypi/Faker).

```python
class Scrubbers:
    first_name = scrubbers.Faker('first_name')
    last_name = scrubbers.Faker('last_name')
```

The replacements are done at the database level and should therefore be able to cope with large amounts of data with reasonable performance.

Any [faker providers](https://faker.readthedocs.io/en/latest/providers.html) are supported and you can also register your own custom providers.

#### Locales

Faker will be initialized with the current Django `LANGUAGE_CODE` and will populate the DB with data localized for it. To scrub with data from a different locale, set `LANGUAGE_CODE` to that locale.

#### Idempotency

By default, the faker instance used to populate the DB uses a fixed random seed, so that repeated scrubbings of the same data generate the same output. This is particularly useful if the scrubbed data is imported as a dump by developers, since data that changes during troubleshooting would otherwise be confusing.

This behaviour can be changed by setting `SCRUBBER_RANDOM_SEED=None`, which ensures every scrubbing will generate random source data.
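The effect of a fixed seed can be illustrated with Python's standard `random` module. This is an analogy under stated assumptions, not how `django_scrubber` or Faker are implemented: `fake_names` and the `NAMES` pool are hypothetical stand-ins for the generated source data.

```python
import random

# Hypothetical pool standing in for Faker-generated source data.
NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin"]


def fake_names(count, seed):
    """Draw `count` names; a fixed seed makes the sequence repeatable."""
    rng = random.Random(seed)  # seed=None seeds from the OS, so runs differ
    return [rng.choice(NAMES) for _ in range(count)]


first_run = fake_names(5, seed=42)
second_run = fake_names(5, seed=42)
```

With the same seed, `first_run` and `second_run` are identical. Passing `seed=None` gives up that guarantee, which mirrors the behaviour of `SCRUBBER_RANDOM_SEED=None`.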

#### Limitations

Scrubbing unique fields may lead to `IntegrityError`s, since there is no guarantee that the random content will not be repeated. Playing with different settings for `SCRUBBER_RANDOM_SEED` and `SCRUBBER_ENTRIES_PER_PROVIDER` may alleviate the problem.
Unfortunately, for performance reasons, the source data for scrubbing with faker is added to the database, and arbitrarily increasing `SCRUBBER_ENTRIES_PER_PROVIDER` will significantly slow down scrubbing (besides still not guaranteeing uniqueness).
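To get a feel for why collisions are likely, the sketch below estimates the chance that at least two rows receive the same value. It assumes each row independently draws one of the source entries uniformly at random, which is a simplification made for this example and not a documented property of the library; `collision_probability` is a hypothetical helper.

```python
def collision_probability(rows, pool_size):
    """Probability that at least two of `rows` uniform draws from a pool coincide."""
    if rows > pool_size:
        return 1.0  # more rows than distinct entries: a repeat is unavoidable
    p_all_unique = 1.0
    for already_used in range(rows):
        # Chance that the next draw avoids every value used so far.
        p_all_unique *= (pool_size - already_used) / pool_size
    return 1.0 - p_all_unique


# With the default of 1000 entries, even a small table is likely to collide.
small_table = collision_probability(50, 1000)
large_table = collision_probability(1001, 1000)
```

Under this assumption, 50 rows drawn from 1000 entries already collide more often than not, and any table with more rows than entries is certain to contain repeats.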

## Settings

### `SCRUBBER_GLOBAL_SCRUBBERS`:
Dictionary of global scrubbers. Keys should be either field names as strings or field type classes. Values should be one of the scrubbers provided in `django_scrubber.scrubbers`.

Alternatively, values may be anything that can be used as a value in a `QuerySet.update()` call (like a `Func`), or a `callable` that returns such an object when called with a field name as argument.

Example:
```python
SCRUBBER_GLOBAL_SCRUBBERS = {
    'name': scrubbers.Hash,
    EmailField: scrubbers.Hash,
}
```
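The alternative value types mentioned above might be used as in the following sketch. It has not been verified against this release: the field names `phone` and `city` and the `placeholder_for` function are hypothetical, while `Value` is a standard Django expression that is valid in `QuerySet.update()`.

```python
# settings.py (sketch, under the assumptions stated above)
from django.db.models import Value
from django_scrubber import scrubbers


def placeholder_for(field_name):
    """Called with the field name; returns an expression for QuerySet.update()."""
    return Value('scrubbed ' + field_name)


SCRUBBER_GLOBAL_SCRUBBERS = {
    'name': scrubbers.Hash,    # built-in scrubber
    'phone': Value('000000'),  # expression used as-is (hypothetical field)
    'city': placeholder_for,   # callable receiving the field name (hypothetical field)
}
```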

### `SCRUBBER_RANDOM_SEED`:
The seed used when generating random content by the Faker scrubber. Setting this to `None` means each scrubbing will generate different data.

(default: 42)

### `SCRUBBER_ENTRIES_PER_PROVIDER`:
Number of entries to use as source for the Faker scrubber. Increasing this value will increase the randomness of generated data, but decrease performance.

(default: 1000)

### `SCRUBBER_SKIP_UNMANAGED`:
Do not attempt to scrub models which are not managed by the ORM.

(default: True)

### `SCRUBBER_APPS_LIST`:
Only scrub models belonging to these specific Django apps. If unset, all installed apps are scrubbed.

(default: None)

## Making a new release

[bumpversion](https://github.com/peritus/bumpversion) is used to manage releases.

Add your changes to the [CHANGELOG](./CHANGELOG.md) and run `bumpversion <major|minor|patch>`, then push (including tags).

# 0.1.3 - Fix import

* fixed [import issue #1](https://github.com/RegioHelden/django-scrubber/pull/1) - Thanks to [Charlie Denton](https://github.com/meshy)

# 0.1.2 - Bumpversion support

* Use bumpversion and travis to make new releases

# 0.1.1 - Project renaming

* add pip package
* rename project: django\_scrubber → django-scrubber

# 0.1.0 - First release

* initial working release

Release history

2.0.0

Jun 27, 2024

1.3.0

Jun 5, 2024

1.2.2

Nov 6, 2023

1.2.1

Nov 3, 2023

1.2.0

Apr 1, 2023

1.1.0

Jul 11, 2022

1.0.0

Jul 11, 2022

0.9.0

Jun 27, 2022

0.8.0

May 1, 2022

0.7.0

Feb 24, 2022

0.6.2

Feb 8, 2022

0.6.1

Jan 25, 2022

0.6.0

Oct 18, 2021

0.5.6

Oct 8, 2021

0.5.4

Apr 13, 2021

0.5.3

Feb 4, 2021

0.5.2

Jan 12, 2021

0.5.1

Oct 16, 2020

0.4.4

Dec 10, 2019

0.4.3

Dec 4, 2019

0.4.1

Nov 15, 2019

0.4.0

Nov 13, 2019

0.3.1

Sep 10, 2018

0.3.0

Sep 7, 2018

0.2.1

Aug 14, 2018

0.2.0

Aug 13, 2018

0.1.4

Aug 12, 2018

This version

0.1.3

Aug 12, 2018

0.1.2

Jun 22, 2018

0.1.0

Jun 22, 2018

Download files

Source Distribution

django-scrubber-0.1.3.tar.gz (9.2 kB)

Uploaded Aug 12, 2018 Source

Hashes for django-scrubber-0.1.3.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `88c31ad60f5ca6acf204211672eae360604ab245fd081ff5613f9bd1716e96e1` |
| MD5 | `6a960e4505dbff0a021f91df9e4ef695` |
| BLAKE2b-256 | `a4718f82b3558167af82f1411370d401b6a8840225ec6b02a34455706dc3093c` |