
Sync database between Django backends


Django Data Sync

Enables you to sync non-sensitive data (including FileField) between environments with any Django backend (as long as the model definitions are the same), directly from the admin interface.

DISCLAIMER

There are no rigorous tests yet. I haven't had the chance to explore how it behaves with complex relationships. So far, it has been used in two production-grade projects where the models are not too complex (ManyToMany has not been properly tested yet).

Use this at your own risk: syncing may result in data loss. Alternatively, test it rigorously during your development phase.

Features

  • enables you to sync non-sensitive data between environments of the same Django project (as long as the model definitions are the same) directly from the admin interface
  • relation fields are supported (ManyToMany still needs testing)
  • sync synchronously or in the background (only Cloud Tasks is supported for background sync)

TO BE ADDED

  • add support for ImageField and FileField DONE
  • support multiple task queues; the current plan is to support GCP Cloud Tasks DONE
  • add authorization and authentication at the data export endpoint
  • add tests; since it doesn't seem possible to test with two Django servers locally in an automated way (or is it?), I have to think about how to implement this correctly

MIGHT GET ADDED

  • compare data in JSON for audit purposes
  • add support for other task queues so that it is cloud-platform agnostic

Installation

pip install django-data-sync

Add data_sync to your INSTALLED_APPS:

INSTALLED_APPS = [
    # ... your other apps
    'data_sync',
]

Run the migrations:

python manage.py migrate data_sync

Add the data_sync URLs to your urlpatterns. Take note of the URL prefix, as it will be needed later for the Data Source URL. For example, if you include this in an api app's urlpatterns, the prefix is /api.

    path('', include('data_sync.urls')),

Preface

Data Sync works by making use of natural keys, so I strongly recommend reading the Django documentation on this topic before going further.

You need to analyze your models and define their natural keys. You can usually infer them from unique fields and/or unique_together.

Fields declared as unique or in unique_together are referenced by field name only. For example, a Language is related to a Country, and in the Language definition the unique_together is usually the Country plus the language's ISO 639-1 code.

In code, it looks something like this:

class Meta:
    unique_together = (('country', 'code'),)

Notice that country in unique_together is itself abstract. What identifies a country? In the context of unique_together it is the country's ID (its primary key), but an ID is not a natural key. A country's natural key should be its ISO 2 (ISO 3166-1 alpha-2) code.

So we can infer that the natural key of Language is the country's ISO 2 code plus the language's ISO 639-1 code.

Implemented in code, it looks like this:

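from django.db import models

class Language(models.Model):
    country = models.ForeignKey('Country', on_delete=models.CASCADE)
    code = models.CharField(max_length=2)  # ISO 639-1

    def natural_key(self):
        return (self.country.code, self.code)  # (country ISO 2, language ISO 639-1)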

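In essence, a natural key is usually a combination of unique fields and/or unique_together, but it has to be spelled out more explicitly: related objects are expressed through their own natural key values (e.g. the country's code) rather than through their IDs.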
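To see why IDs are not enough, here is a Django-free sketch with two hypothetical environments whose rows have the same content but different primary keys. The plain dicts stand in for database tables and the helper names are made up for illustration; this shows the concept only, not how Data Sync is implemented internally. Matching on the natural key (country code, language code) pairs the records correctly, while matching on id would not.

```python
# Hypothetical rows from two environments: same content, different primary keys.
source_languages = [
    {"id": 1, "country_code": "DE", "code": "de"},
    {"id": 2, "country_code": "CH", "code": "de"},
    {"id": 3, "country_code": "CH", "code": "fr"},
]
target_languages = [
    {"id": 10, "country_code": "CH", "code": "fr"},
    {"id": 11, "country_code": "DE", "code": "de"},
    {"id": 12, "country_code": "CH", "code": "de"},
]


def natural_key(row):
    """Country ISO 2 code + language ISO 639-1 code, as in Language.natural_key()."""
    return (row["country_code"], row["code"])


def match_by_natural_key(source, target):
    """Map each source id to the id of the target row with the same natural key."""
    target_index = {natural_key(row): row["id"] for row in target}
    return {row["id"]: target_index.get(natural_key(row)) for row in source}


mapping = match_by_natural_key(source_languages, target_languages)
print(mapping)  # {1: 11, 2: 12, 3: 10}
```

Note that the language code alone ("de") is not unique here, because both Germany and Switzerland have it; the country code is what makes the key unambiguous.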

Usage

To get Data Sync working, you need to register the models you want to sync. Only register non-sensitive models, e.g. Copy. Never sync sensitive models such as User, as doing so can expose very sensitive data.

To register the models, you need to decorate them and use custom managers:

from django.db import models

import data_sync


@data_sync.register_model(natural_key=['code'])
class Country(models.Model):
    objects = data_sync.managers.DataSyncEnhancedManager()

    code = models.CharField(max_length=2)  # ISO 3166-1 alpha-2
    # ... other fields


# 'country.code' uses dot notation to follow the ForeignKey to Country
@data_sync.register_model(natural_key=['country.code', 'code'])
class Language(models.Model):
    objects = data_sync.managers.DataSyncEnhancedManager()

    country = models.ForeignKey(Country, on_delete=models.CASCADE)
    code = models.CharField(max_length=2)  # ISO 639-1
    # ... other fields


@data_sync.register_model(
    natural_key=['language.country.code', 'language.code', 'key'],
    fields=('value', 'key', 'language'),  # only these fields are synced
    file_fields=('thumbnail',),  # FileField/ImageField fields to sync
)
class Copy(models.Model):
    objects = data_sync.managers.DataSyncEnhancedManager()

    language = models.ForeignKey(Language, on_delete=models.CASCADE)
    value = models.TextField()
    key = models.CharField(max_length=255)
    default = models.TextField()
    thumbnail = models.ImageField()
    # ... other fields

@data_sync.register_model

This is where you define the model's natural key (see Preface for background).
If part of the natural key lives on a related model, use . (dot) notation, e.g. 'country.code'.

You can also pass the fields parameter to limit which fields are synced.

To include FileField (and ImageField) fields in the sync, list them in the file_fields parameter.
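Dot notation in natural_key follows relations attribute by attribute. As a rough sketch of that idea (not Data Sync's actual code), Python's operator.attrgetter already resolves dotted paths, so hypothetical helpers can turn the natural_key list from the Copy example into a tuple of values and pick out only the listed fields. The SimpleNamespace objects are stand-ins for model instances, and the sample values are made up.

```python
from operator import attrgetter
from types import SimpleNamespace


def natural_key_values(obj, natural_key):
    """Resolve each (possibly dotted) path in natural_key against obj."""
    return tuple(attrgetter(path)(obj) for path in natural_key)


def synced_values(obj, fields):
    """Keep only the attributes named in fields (hypothetical stand-in)."""
    return {name: getattr(obj, name) for name in fields}


# Stand-ins for Country -> Language -> Copy instances.
country = SimpleNamespace(code="CH")
language = SimpleNamespace(country=country, code="fr")
copy = SimpleNamespace(language=language, key="welcome", value="Bienvenue", default="Welcome")

key = natural_key_values(copy, ["language.country.code", "language.code", "key"])
print(key)  # ('CH', 'fr', 'welcome')
print(synced_values(copy, ("value", "key")))  # {'value': 'Bienvenue', 'key': 'welcome'}
```

The default field is left out of the result because it is not listed, which is the effect the fields parameter is described as having.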

DataSyncEnhancedManager

Manager initialization appears to happen when the model class is created, so adding a custom manager programmatically might be considered hacky (I would love input on this).

For now, I'm afraid you must define the custom manager yourself, under the default attribute name objects, to use DataSyncEnhancedManager.

DataSyncEnhancedManager only adds a get_by_natural_key method; nothing else.
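Django's deserializer calls a manager's get_by_natural_key(*values) with the values that natural_key() returns, in the same order. The in-memory class below is a hypothetical illustration of that contract, not DataSyncEnhancedManager's code: it returns the single object whose dotted natural-key paths match the given values and raises LookupError otherwise, similar to how Django's get() fails on zero or multiple matches.

```python
from operator import attrgetter
from types import SimpleNamespace


class InMemoryNaturalKeyManager:
    """Hypothetical stand-in for a manager that provides get_by_natural_key()."""

    def __init__(self, objects, natural_key):
        self.objects = list(objects)
        self.natural_key = natural_key  # e.g. ["country.code", "code"]

    def _key_of(self, obj):
        return tuple(attrgetter(path)(obj) for path in self.natural_key)

    def get_by_natural_key(self, *values):
        matches = [obj for obj in self.objects if self._key_of(obj) == values]
        if len(matches) != 1:
            # Django's get() would raise DoesNotExist / MultipleObjectsReturned here.
            raise LookupError(f"expected 1 match for {values}, found {len(matches)}")
        return matches[0]


de, ch = SimpleNamespace(code="DE"), SimpleNamespace(code="CH")
languages = InMemoryNaturalKeyManager(
    [SimpleNamespace(country=de, code="de"), SimpleNamespace(country=ch, code="de")],
    natural_key=["country.code", "code"],
)
swiss_german = languages.get_by_natural_key("CH", "de")
print(swiss_german.country.code, swiss_german.code)  # CH de
```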

Worker tasks

When the code is deployed to GAE (the standard environment only; the flexible environment and Kubernetes are not supported yet), data_sync automatically uses Cloud Tasks with the queue ID data_sync.

Settings and Configuration

Data Sync should work without additional settings when using synchronous mode, which is the default.

If you are deploying to GAE, it automatically uses Cloud Tasks, in which case you should fill in the optional settings below.
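The choice between synchronous mode and Cloud Tasks can be summarized as a small function. This is a hypothetical sketch (choose_sync_mode is not part of the package): it assumes the App Engine standard environment is detected through the GAE_ENV environment variable, which App Engine standard sets to "standard", and that force_sync corresponds to DATA_SYNC_FORCE_SYNC. Data Sync's actual detection logic may differ.

```python
import os


def choose_sync_mode(env=None, force_sync=False):
    """Return "sync" or "cloud_tasks" (hypothetical helper, not Data Sync's API).

    force_sync plays the role of DATA_SYNC_FORCE_SYNC. App Engine standard is
    assumed to be detected via GAE_ENV == "standard".
    """
    env = os.environ if env is None else env
    on_gae = env.get("GAE_ENV") == "standard"
    if on_gae and not force_sync:
        return "cloud_tasks"
    # Synchronous mode: the default outside GAE, or when forced on GAE.
    return "sync"


print(choose_sync_mode({}))                                        # sync
print(choose_sync_mode({"GAE_ENV": "standard"}))                   # cloud_tasks
print(choose_sync_mode({"GAE_ENV": "standard"}, force_sync=True))  # sync
```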

Optionals

DATA_SYNC_SERVICE_ACCOUNT_EMAIL

Defaults to '' (empty string). Set this to the email of a GCP service account; the GAE default service account works. It is needed for OIDC token validation, as recommended by GCP.

DATA_SYNC_FORCE_SYNC

Defaults to False. Set this to True to use synchronous mode even when deployed to GAE.

DATA_SYNC_CLOUD_TASKS_QUEUE_ID

Defaults to data_sync

DATA_SYNC_CLOUD_TASKS_LOCATION

Defaults to europe-west1

DATA_SYNC_GOOGLE_CLOUD_PROJECT

Defaults to the value of the GOOGLE_CLOUD_PROJECT environment variable.

DATA_SYNC_GAE_VERSION

Defaults to the value of the GAE_VERSION environment variable, which GAE already sets.

DATA_SYNC_GAE_SERVICE

Defaults to the value of the GAE_SERVICE environment variable, which GAE already sets.
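The defaults above amount to "use the setting if present, else the documented default or environment variable". The sketch below expresses that as a hypothetical helper (resolve_data_sync_settings is not part of the package). It also shows the queue name format Cloud Tasks uses, projects/PROJECT/locations/LOCATION/queues/QUEUE, which is what the project, location and queue ID settings would plausibly combine into. "my-project" is a placeholder, and the service account follows the GAE default account pattern PROJECT_ID@appspot.gserviceaccount.com.

```python
import os


def resolve_data_sync_settings(user_settings, env=None):
    """Apply the documented defaults (hypothetical helper, not part of data_sync)."""
    env = os.environ if env is None else env
    defaults = {
        "DATA_SYNC_SERVICE_ACCOUNT_EMAIL": "",
        "DATA_SYNC_FORCE_SYNC": False,
        "DATA_SYNC_CLOUD_TASKS_QUEUE_ID": "data_sync",
        "DATA_SYNC_CLOUD_TASKS_LOCATION": "europe-west1",
        "DATA_SYNC_GOOGLE_CLOUD_PROJECT": env.get("GOOGLE_CLOUD_PROJECT"),
        "DATA_SYNC_GAE_VERSION": env.get("GAE_VERSION"),
        "DATA_SYNC_GAE_SERVICE": env.get("GAE_SERVICE"),
    }
    # Values set explicitly (e.g. in settings.py) win over the defaults.
    return {name: user_settings.get(name, default) for name, default in defaults.items()}


def cloud_tasks_queue_path(conf):
    """Full queue name in the format used by Cloud Tasks."""
    return "projects/{}/locations/{}/queues/{}".format(
        conf["DATA_SYNC_GOOGLE_CLOUD_PROJECT"],
        conf["DATA_SYNC_CLOUD_TASKS_LOCATION"],
        conf["DATA_SYNC_CLOUD_TASKS_QUEUE_ID"],
    )


conf = resolve_data_sync_settings(
    {"DATA_SYNC_SERVICE_ACCOUNT_EMAIL": "my-project@appspot.gserviceaccount.com"},
    env={"GOOGLE_CLOUD_PROJECT": "my-project"},
)
print(cloud_tasks_queue_path(conf))
# projects/my-project/locations/europe-west1/queues/data_sync
```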

Data Source

Data Source holds information about an environment from which you want your data to be synced.

The URL depends on where and how you included data_sync.urls during installation.

For example, if you include data_sync.urls in your api app's urlpatterns, the Data Source URL must include the api prefix, so it might look something like https://example.com/api.

If you include data_sync.urls in your root URLconf, the Data Source URL will look like https://example.com.

Do not include a trailing slash.
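Put together, the Data Source URL is the environment's base URL plus the prefix under which data_sync.urls is included, with no trailing slash. A hypothetical helper (data_source_url is not part of the package) makes the rule concrete.

```python
def data_source_url(base_url, prefix=""):
    """Join the base URL and the include() prefix, dropping any trailing slash."""
    url = base_url.rstrip("/")
    prefix = prefix.strip("/")
    if prefix:
        url = f"{url}/{prefix}"
    return url


print(data_source_url("https://example.com", "/api"))  # https://example.com/api
print(data_source_url("https://example.com/"))         # https://example.com
```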

The Sync

To do a sync, simply create a Data Pull in the admin interface.

Compatibility

Python 3.7, Django 2.2 and up

Testing

No automated tests yet.

To test locally, you can run two Django servers on different ports, each with its own database, and set the Data Source accordingly.
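One way to run the two servers locally is to let settings.py pick the database file from an environment variable. This is a hypothetical sketch: the LOCAL_DB_NAME variable name is made up for illustration, and SQLite is used only because it needs no extra setup.

```python
import os

# Hypothetical: choose the SQLite file per server instance via LOCAL_DB_NAME.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": os.environ.get("LOCAL_DB_NAME", "db_source.sqlite3"),
    }
}
```

For example, migrate and start the source with python manage.py runserver 8000, and the target with LOCAL_DB_NAME=db_target.sqlite3 python manage.py runserver 8001 (running migrate with the same variable first). Then, on the target, create a Data Source pointing at http://127.0.0.1:8000 plus your URL prefix.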



Download files


Source Distribution

django_data_sync-0.5.2.tar.gz (15.2 kB)

Built Distribution

django_data_sync-0.5.2-py3-none-any.whl (17.4 kB)

File details

Details for the file django_data_sync-0.5.2.tar.gz.

File metadata

  • Download URL: django_data_sync-0.5.2.tar.gz
  • Upload date:
  • Size: 15.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.7.3

File hashes

Hashes for django_data_sync-0.5.2.tar.gz
Algorithm Hash digest
SHA256 0ead9f371509c05b3e16849a7ab3103385a3ef4137a9194e8976f46e66b19a51
MD5 1abe065feab6d5708702e2e948e4fb66
BLAKE2b-256 95fb5cfcb1ff7ed67c2db7f69544d2470dbe58c885287c3410bb5dccbaa771d4


File details

Details for the file django_data_sync-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: django_data_sync-0.5.2-py3-none-any.whl
  • Upload date:
  • Size: 17.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.7.3

File hashes

Hashes for django_data_sync-0.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 d438e975604fc948a57b4917d9396d0478316992509afd0f720fbf64cbdb7edb
MD5 a0ab68ec605a97af3b68e8b17209449b
BLAKE2b-256 754f37279ad37feb32198bbf5b6544c8006252762c93b6305c074ecca56eae06
