Skip to main content

batch select many-to-many and one-to-many fields (to help avoid n+1 query problem)

Project description

====================
Django Batch Select
====================

The idea of Django Batch Select is to provide an equivalent to Django's
select_related_ functionality. As of such it's another handy tool for avoiding
the "n+1 query problem".

select_related_ is handy for minimizing the number of queries that need
to be made in certain situations. However it is only usual for pre-selecting
ForeignKey_ relations.

batch_select is handy for pre-selecting ManyToManyField_ relations and
reverse ForeignKey_ relations.

It works by performing a single extra SQL query after a QuerySet_ has
been evaluated to stitch in the the extra fields asked for. This requires the
addition of a custom Manager_, which in turn returns a custom QuerySet_ with
extra methods attached.

Example Usage
=============

Assuming we have models defined as the following:

::

from batch_select.models import BatchManager

class Tag(models.Model):
name = models.CharField(max_length=32)

class Section(models.Model):
name = models.CharField(max_length=32)

objects = BatchManager()

class Entry(models.Model):
title = models.CharField(max_length=255)
section = models.ForeignKey(Section, blank=True, null=True)
tags = models.ManyToManyField(Tag)

objects = BatchManager()

I'll also define a helper function to show the SQL queries generated:

::

from django import db

def show_queries():
for query in db.connection.queries:
print query["sql"]
db.reset_queries()

Here are a few example (with generated sql queries):

::

>>> Entry.objects.batch_select('tags').all()
[]
>>> show_queries() # no results, so no 2nd query
SELECT "batch_select_entry"."id", "batch_select_entry"."title",
"batch_select_entry"."section_id" FROM "batch_select_entry"
>>> Entry.objects.create()
>>> Entry.objects.create()
>>> tag1 = Tag.objects.create(name='tag1')
>>> tag2 = Tag.objects.create(name='tag2')
>>> db.reset_queries()
>>> entries = Entry.objects.batch_select('tags').all()
>>> entry = entries[0]
>>> print entry.tags_all
[]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title",
"batch_select_entry"."section_id" FROM "batch_select_entry" LIMIT 1
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id",
"batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER
JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" =
"batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1)
>>> entry.tags.add(tag1)
>>> db.reset_queries()
>>> entries = Entry.objects.batch_select('tags').all()
>>> entry = entries[0]
>>> print entry.tags_all
[<Tag: Tag object>]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title",
"batch_select_entry"."section_id" FROM "batch_select_entry" LIMIT 1
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id",
"batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER
JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" =
"batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1)
>>> entries = Entry.objects.batch_select('tags').all()
>>> for entry in entries:
.... print entry.tags_all
....
[<Tag: Tag object>]
[]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title",
"batch_select_entry"."section_id" FROM "batch_select_entry"
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id",
"batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER
JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" =
"batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN
(1, 2)

Re-running that same last for loop without using batch_select generate
three queries instead of two (n+1 queries):

::

>>> entries = Entry.objects.all()
>>> for entry in entries:
.... print entry.tags.all()
....
[<Tag: Tag object>]
[]

>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title",
"batch_select_entry"."section_id" FROM "batch_select_entry"
SELECT "batch_select_tag"."id", "batch_select_tag"."name" FROM
"batch_select_tag" INNER JOIN "batch_select_entry_tags" ON
("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE
"batch_select_entry_tags"."entry_id" = 1
SELECT "batch_select_tag"."id", "batch_select_tag"."name" FROM
"batch_select_tag" INNER JOIN "batch_select_entry_tags" ON
("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE
"batch_select_entry_tags"."entry_id" = 2

More Advanced Usage
=========================

By default the batch fields are inserted into fields named <name>_all,
on each object. So:

::

Entry.objects.batch_select('tags').all()

Results in the Entry instances having fields called 'tags_all'
containing the Tag objects associated with that Entry.

If you want to give the field a different name just use a named
parameter - in the same way as using the Aggregation_ API:

::

Entry.objects.batch_select(selected_tags='tags').all()

Would means the Tag objects would be assigned to fields called
'selected_tags'.

If you want to perform filtering of the related objects you will need to
use a Batch object. By doing this you can pass extra named parameters in the
same way as when using the filter method of a QuerySet:

::

from batch_select.models import Batch

Entry.objects.batch_select(tags_containing_blue=Batch('tags',
name__contains='blue'))

Would return Entry objects with fields called 'tags_containing_name'
with only those Tags whose name contains 'blue'.

In addition to filtering by names parameters you can also the following
methods on a Batch object, with their effects being passed on to the underlying
QuerySet_ object:

* filter_
* exclude_
* annotate_
* order_by_
* reverse_
* select_related_
* extra_
* defer_
* only_

(Note that distinct(), values() etc are not included as they would have
side-effects on how the extra query is associated with the original query)
So for example to achieve the same effect as the filter above you could
do the following:

::

from batch_select.models import Batch


Entry.objects.batch_select(tags_containing_blue=Batch('tags').filter(name__contains='blue'))

Whereas the following would exclude tags containing "blue" and order by
name:

::

from batch_select.models import Batch

batch = Batch('tags').exclude(name__contains='blue').order_by('name')
Entry.objects.batch_select(tags_not_containing_blue=batch)


TODOs and BUGS
==============
See: http://github.com/lilspikey/django-batch-select/issues

.. _select_related:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#id4
.. _ForeignKey:
http://docs.djangoproject.com/en/dev/ref/models/fields/#foreignkey
.. _ManyToManyField:
http://docs.djangoproject.com/en/dev/ref/models/fields/#manytomanyfield
.. _QuerySet: http://docs.djangoproject.com/en/dev/ref/models/querysets/
.. _Manager: http://docs.djangoproject.com/en/dev/topics/db/managers/
.. _Aggregation: http://docs.djangoproject.com/en/dev/topics/db/aggregation/
.. _filter:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#filter-kwargs
.. _exclude:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#exclude-kwargs
.. _annotate:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#annotate-args-kwargs
.. _order_by:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#order-by-fields
.. _reverse:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#reverse
.. _extra:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#extra-select-none-where-none-params-none-tables-none-order-by-none-select-params-none
.. _defer:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#defer-fields
.. _only:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#only-fields

Project details


Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page