Efficient serialization of Django Models for use in Celery that ensure the state of the world.
It supports Django 1.7, 1.8 and 1.9 for Python versions 2.7, 3.3, 3.4, 3.5 and pypy (where Django supports the Python version).
If you’re using Django and Celery you’re most likely passing instances of models back and forth between tasks or, as the Celery docs suggest, you’re passing just the primary key to a task and then retrieving the the model instance with the primary key.
If you’re doing the former, it’s potentially inefficient and certainly dangerous as by the time the task executes the models data could be changed!
If you’re using the later, you’re probably wondering to yourself, surely there is a better way?! While it’s efficient and certainly readable it’s not exactly much fun continually fetching the model at the start of each task…
You may also be using model methods as tasks, but unless you’re using something similar to this refresh decorator, you’ll potentially have stale model data.
django-cereal to the rescue…
django-cereal works by using an alternative serializer before the task is sent to the message bus and then retrieves a fresh instance of the model during deserialization. Currently only pickle is supported (feel free to fork and implement for JSON or YAML).
Essentially when the model is serialized only the primary key and the model’s class are pickled. This is obviously not quite as efficient as pickling just the models primary key, but it’s certainly better than serializing the entire model!
When the task is picked up by a Celery worker and deserialized an instance of
the model is retrieved using
YourModel.objects.get(pk=xxx) and thus this
approach is also safe as you’re not using stale model data in your task.
The serializer is registered with kombu and safely patches
django.db.Model.__reduce__ - it only operates inside the scope of kombu
and thus doesn’t mess with a model’s pickling outside of kombu.
You can install django-cereal either via the Python Package Index (PyPI) or from github.
To install using pip;
$ pip install django-cereal
$ pip install git+https://github.com/alexhayes/django-cereal.git
All that is required is that you specify the kwarg
defining a task.
from django_cereal.pickle import DJANGO_CEREAL_PICKLE @app.task(serializer=DJANGO_CEREAL_PICKLE) def my_task(my_model): ...
There is also a helper task that you can use which defines the serializer if it’s not set.
from django_cereal.pickle import task @task def my_task(my_model): ...
Another approach is to set
You can also use task methods on your Django models, so you don’t have to define them in a tasks.py. For example;
from celery.contrib.methods import task_method from django_cereal.pickle import DJANGO_CEREAL_PICKLE from yourproject.celery import app task_method_kwargs = dict(filter=task_method, serializer=DJANGO_CEREAL_PICKLE) class MyModel(models.Model): @app.task(name='MyModel.foo', **task_method_kwargs) def foo(self): # self is an instance of MyModel
Then, you can call your task as follows;
bar = MyModel.objects.get(...) bar.foo.delay()
Just like your would a normal task but you can stop defining tasks that simply orchestrate calls on a model and just call the model directly.
While not directly related to serialization of Django models, if you are using Django Model methods as tasks, or any class methods as tasks for that matter, and you are chaining these tasks you may be interested in the @ensure_self decorator (see Celery issue #2137 for more details).
Note that if you use the
--maxtasksperworker flag in Celery, or under
other similar situations, the connection to a database in Django could become
unusable, with errors such as the following thrown;
OperationalError(2006, 'MySQL server has gone away')
This is now handled by the unpickling by closing down the database connection which forces a new connection to be created.
Perhaps in the future there may be a nicer way of handling this, for instance, a new connection is created each time a worker is created, but for now the fix in place works, even if it’s not ideal.
This software is licensed under the MIT License. See the LICENSE file in the top distribution directory for the full license text.