Skip to main content

batch select many-to-many and one-to-many fields (to help avoid n+1 query problem)

Project description

The idea of Django Batch Select is to provide an equivalent to Django’s select_related functionality. As of such it’s another handy tool for avoiding the “n+1 query problem”.

select_related is handy for minimizing the number of queries that need to be made in certain situations. However it is only usual for pre-selecting ForeignKey relations.

batch_select is handy for pre-selecting ManyToManyField relations and reverse ForeignKey relations.

It works by performing a single extra SQL query after a QuerySet has been evaluated to stitch in the the extra fields asked for. This requires the addition of a custom Manager, which in turn returns a custom QuerySet with extra methods attached.

Example Usage

Assuming we have models defined as the following:

from batch_select.models import BatchManager

class Tag(models.Model):
    name = models.CharField(max_length=32)

class Section(models.Model):
    name = models.CharField(max_length=32)

    objects = BatchManager()

class Entry(models.Model):
    title = models.CharField(max_length=255)
    section = models.ForeignKey(Section, blank=True, null=True)
    tags = models.ManyToManyField(Tag)

    objects = BatchManager()

I’ll also define a helper function to show the SQL queries generated:

from django import db

def show_queries():
    for query in db.connection.queries:
        print query["sql"]
    db.reset_queries()

Here are a few example (with generated sql queries):

>>> Entry.objects.batch_select('tags').all()
[]
>>> show_queries() # no results, so no 2nd query
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry"
>>> Entry.objects.create()
>>> Entry.objects.create()
>>> tag1 = Tag.objects.create(name='tag1')
>>> tag2 = Tag.objects.create(name='tag2')
>>> db.reset_queries()
>>> entries = Entry.objects.batch_select('tags').all()
>>> entry = entries[0]
>>> print entry.tags_all
[]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry" LIMIT 1
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id", "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1)
>>> entry.tags.add(tag1)
>>> db.reset_queries()
>>> entries = Entry.objects.batch_select('tags').all()
>>> entry = entries[0]
>>> print entry.tags_all
[<Tag: Tag object>]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry" LIMIT 1
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id", "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1)
>>> entries = Entry.objects.batch_select('tags').all()
>>> for entry in entries:
....     print entry.tags_all
....
[<Tag: Tag object>]
[]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry"
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id", "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1, 2)

Re-running that same last for loop without using batch_select generate three queries instead of two (n+1 queries):

>>> entries = Entry.objects.all()
>>> for entry in entries:
....     print entry.tags.all()
....
[<Tag: Tag object>]
[]

>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry"
SELECT "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags"."entry_id" = 1
SELECT "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags"."entry_id" = 2

This also works with reverse foreign keys. So for example we can get this entries that belong to each section:

>>> section1 = Section.objects.create(name='section1')
>>> section2 = Section.objects.create(name='section2')
>>> Entry.objects.create(section=section1)
>>> Entry.objects.create(section=section1)
>>> Entry.objects.create(section=section2)
>>> db.reset_queries()
>>> Section.objects.batch_select('entry_set')
[<Section: Section object>, <Section: Section object>]
>>> show_queries()
SELECT "batch_select_section"."id", "batch_select_section"."name" FROM "batch_select_section" LIMIT 21
SELECT ("batch_select_entry"."section_id") AS "__section_id", "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id", "batch_select_entry"."location_id" FROM "batch_select_entry" WHERE "batch_select_entry"."section_id" IN (1, 2)

Each section object in that query will have an entry_set_all field containing the relevant entries.

You need to pass batch_select the “related name” of the foreign key, in this case “entry_set”. NB by default the related name for a foreign key does not actually include the _set suffix, so you can use just “entry” in this case. I have made sure that the _set suffix version also works to try and keep the API simpler.

More Advanced Usage

By default the batch fields are inserted into fields named <name>_all, on each object. So:

Entry.objects.batch_select('tags').all()

results in the Entry instances having fields called 'tags_all' containing the Tag objects associated with that Entry.

If you want to give the field a different name just use a keyword argument - in the same way as using the Aggregation API:

Entry.objects.batch_select(selected_tags='tags').all()

Would means the Tag objects would be assigned to fields called 'selected_tags'.

If you want to perform filtering of the related objects you will need to use a Batch object. By doing this you can pass extra keyword arguments in the same way as when using the filter method of a QuerySet:

from batch_select.models import Batch

Entry.objects.batch_select(tags_containing_blue=Batch('tags', name__contains='blue'))

Would return Entry objects with fields called ‘tags_containing_name’ with only those Tags whose name contains ‘blue’.

In addition to filtering using keyword arguments, you can also call the following methods on a Batch object, with their effects being passed on to the underlying QuerySet object:

(Note that distinct(), values() etc are not included as they would have side-effects on how the extra query is associated with the original query) So for example to achieve the same effect as the filter above you could do the following:

from batch_select.models import Batch

Entry.objects.batch_select(tags_containing_blue=Batch('tags').filter(name__contains='blue'))

Whereas the following would exclude tags containing “blue” and order by name:

from batch_select.models import Batch

batch = Batch('tags').exclude(name__contains='blue').order_by('name')
Entry.objects.batch_select(tags_not_containing_blue=batch)

Compatibility

Django batch select should work with Django 1.1-1.3 at least.

TODOs and BUGS

See: http://github.com/lilspikey/django-batch-select/issues

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

django-batch-select-0.2.4.tar.gz (13.3 kB view details)

Uploaded Source

File details

Details for the file django-batch-select-0.2.4.tar.gz.

File metadata

File hashes

Hashes for django-batch-select-0.2.4.tar.gz
Algorithm Hash digest
SHA256 0acfb2c6b9e228cc20c77aac2f3feed084c94c32223ea0684fda5cb7248a27a1
MD5 fe4214f01d9dd334742430b74b5f9593
BLAKE2b-256 bb4ccb1af358a4d366ae5e69f5ffc6178b5d903395fc09730df6f28620f13755

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page