App to anonymize data in Django models.

This app aims to help you anonymize data in a database used for development.

It is common practice in development to use a database that is very similar in content to the real data. The problem is that this can leave copies of sensitive customer data on development machines (and in backups, etc.). This Django app helps by giving you an easy and customizable way to anonymize data in your models.

The basic method is to go through all the models that you specify and generate fake data for all the fields specified. Introspection of the models will produce an anonymizer that will attempt to provide sensible fake data for each field, leaving you to fill in the gaps.

Please note that the methods provided will not provide full anonymity. Even if you anonymize the names and other details of your customers, there may well be enough data to identify them. Relationships between records in the database are not altered, in order to preserve the characteristic structure of data in your application, but this may leave you open to information leaks which might not be acceptable for your data. This application should be good enough for simpler policies like ‘remove all real telephone numbers from the database’.

Usage:

  • Install using setup.py or pip/easy_install.

  • Add ‘anonymizer’ to your INSTALLED_APPS setting.

  • To create some stub files for your anonymizers, do:

    ./manage.py create_anonymizers app_name1 [app_name2...]

    This will create a file anonymizers.py in each of the apps you specify. (It will not overwrite existing files).

    The file will contain autogenerated classes that attempt to use appropriate functions for generating fake data.

  • Edit the generated anonymizers.py files, filling out the details, and adding any filtering. You can override any of the public methods defined in anonymizer.base.Anonymizer in order to do filtering and other customization.

    The ‘attributes’ dictionary is the key attribute to edit. Its keys are the names of the attributes on the model that need to be set. Its values are either strings (a shortcut, see below) or callables that take the following arguments:

    • The Anonymizer instance.

    • The object being edited.

    • The field being edited.

    • The current value of the field.

    The Anonymizer instance has a ‘faker’ attribute, which is useful for generating fake data.

    If the value is a string, e.g. ‘email’, it is turned into a lambda as follows:

    lambda self, obj, field, val: self.faker.email(field=field)

    You will want to remove some fields from the ‘attributes’ dictionary so that their values are left unchanged, especially things like denormalised fields. You may need to override the ‘alter_object’ method to fix up any fields like that.

    An example Anonymizer for django.contrib.auth.models.User might look like this:

    from datetime import datetime
    
    from anonymizer import Anonymizer
    from django.contrib.auth.models import User
    
    class UserAnonymizer(Anonymizer):
    
        model = User
    
        attributes = {
            'username':   'username',
            'first_name': 'first_name',
            'last_name':  'last_name',
            'email':      'email',
            # Set the date_joined to a similar time to when they actually
            # joined, by passing the 'val' parameter to faker.datetime
            'date_joined': lambda self, obj, field, val: self.faker.datetime(field=field, val=val),
            # Set to today:
            'last_login': lambda *args: datetime.now()
        }
    
        def alter_object(self, obj):
            if obj.is_superuser:
                return False # don't change, so we can still log in.
            super(UserAnonymizer, self).alter_object(obj)
            # Destroy all passwords for everyone else
            obj.set_unusable_password()

  • If you need to create anonymizers for apps that you do not control, you may want to move the contents of the anonymizers.py file to an app that you do control. It doesn’t matter if the anonymizer classes are for models that do not belong to the application that contains them.

    (For example, if you want to anonymize the models in django.contrib.auth, you will probably want to move the contents of django/contrib/auth/anonymizers.py into yourprojectapp/anonymizers.py)

  • To run the anonymizers, do:

    ./manage.py anonymize_data app_name1 [app_name2...]

    This will DESTRUCTIVELY UPDATE all your data. Make sure you have backups, use at own risk, yada yada.
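
The way the ‘attributes’ dictionary mixes string shortcuts and callables can be illustrated with a small standalone sketch. This is not the app's actual implementation: StubFaker, StubAnonymizer, resolve and Person are hypothetical names invented for the illustration, and no Django models are involved. It only assumes, as described above, that a string names a method on the faker object and that callables receive the anonymizer, the object, the field and the current value.

```python
import random


class StubFaker:
    """Hypothetical stand-in for the app's faker object (not the real class)."""

    def email(self, field=None):
        return "user%d@example.com" % random.randint(1, 9999)

    def first_name(self, field=None):
        return random.choice(["Alice", "Bob", "Carol"])


class StubAnonymizer:
    """Hypothetical, simplified anonymizer; only attribute handling is shown."""

    attributes = {}

    def __init__(self):
        self.faker = StubFaker()

    def resolve(self, replacer):
        # A string such as 'email' is treated as the name of a faker method
        # and wrapped in a callable with the standard four arguments.
        if isinstance(replacer, str):
            name = replacer
            return lambda self, obj, field, val: getattr(self.faker, name)(field=field)
        # Anything else is assumed to already be a four-argument callable.
        return replacer

    def alter_object(self, obj):
        for attname, replacer in self.attributes.items():
            func = self.resolve(replacer)
            field = None  # assumption: the real app passes the model field here
            setattr(obj, attname, func(self, obj, field, getattr(obj, attname)))


class Person:
    """Plain object standing in for a model instance."""

    def __init__(self, first_name, email, notes):
        self.first_name = first_name
        self.email = email
        self.notes = notes


class PersonAnonymizer(StubAnonymizer):
    attributes = {
        "first_name": "first_name",  # string shortcut
        "email": "email",            # string shortcut
        # Custom callable: mask the text but keep its length.
        "notes": lambda self, obj, field, val: "x" * len(val),
    }


person = Person("Real", "real@customer.test", "secret")
PersonAnonymizer().alter_object(person)
```

After the call, the first name and email hold generated values, and the notes field is masked by the custom callable. Attributes left out of the dictionary would not be touched.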

Version 0.1.1

  • Removed some debug code

  • Better handling of SlugField and skipped fields in introspection

Version 0.1

Initial release
