A memoized prefetch for Django.
django-memoized-prefetch
A Django package that provides efficient memoized prefetching for processing data in chunks, reducing database queries through intelligent caching. In some cases it can be useful even when not processing data in chunks, for example, when there are multiple foreign keys to the same table.
Overview
django-memoized-prefetch optimizes Django ORM queries when processing large datasets by:
- Reusing previously fetched objects across chunks
- Memoizing prefetched objects in an LRU (least recently used) cache
- Supporting both foreign key and many-to-many relationships
- Minimizing database queries across chunk processing operations
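The idea behind the points above can be illustrated without Django. The following is a minimal sketch, not the package's implementation: FakeTable, MemoizedFetcher and their methods are hypothetical names, and an OrderedDict stands in for the lru-dict dependency. It shows that a chunk only triggers a query for ids that are not already cached.

```python
from collections import OrderedDict


class FakeTable:
    """Stands in for a database table and counts the queries made against it."""

    def __init__(self, rows):
        self.rows = rows          # {primary key: object}
        self.queries = 0
        self.fetched_ids = []     # every id ever requested, for inspection

    def fetch_many(self, ids):
        # One "query" returns all requested rows, like a pk__in lookup would.
        self.queries += 1
        self.fetched_ids.extend(sorted(ids))
        return {pk: self.rows[pk] for pk in ids}


class MemoizedFetcher:
    """Hypothetical helper: an LRU cache in front of FakeTable."""

    def __init__(self, table, max_size):
        self.table = table
        self.max_size = max_size
        self.cache = OrderedDict()

    def get_many(self, ids):
        ids = set(ids)
        missing = {pk for pk in ids if pk not in self.cache}
        if missing:
            # Only ids that are not cached yet reach the table.
            self.cache.update(self.table.fetch_many(missing))
        result = {}
        for pk in ids:
            self.cache.move_to_end(pk)   # mark as most recently used
            result[pk] = self.cache[pk]
        # Evict least recently used entries once the cache is over capacity.
        while len(self.cache) > self.max_size:
            self.cache.popitem(last=False)
        return result


authors = FakeTable({1: "Ann", 2: "Bob", 3: "Cy"})
fetcher = MemoizedFetcher(authors, max_size=100)

chunk_1 = fetcher.get_many([1, 2])   # queries ids 1 and 2
chunk_2 = fetcher.get_many([2, 3])   # queries only id 3; id 2 comes from the cache
chunk_3 = fetcher.get_many([1, 3])   # no query at all
```

After the three calls the fake table has been queried twice, and each id has been requested exactly once.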
Installation
pip install django-memoized-prefetch
Requirements
- Python 3.9+
- Django 4.2+
- lru-dict 1.3.0+
Usage Examples
Models used in the examples:
from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=255)
    email = models.EmailField()


class Publisher(models.Model):
    name = models.CharField(max_length=255)
    country = models.CharField(max_length=100)


class Category(models.Model):
    name = models.CharField(max_length=100)


class Book(models.Model):
    title = models.CharField(max_length=255)
    isbn = models.CharField(max_length=13)
    author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name="books")
    translator = models.ForeignKey(Author, on_delete=models.CASCADE, related_name="translations", null=True)
    publisher = models.ForeignKey(Publisher, on_delete=models.CASCADE, related_name="books")
    categories = models.ManyToManyField(Category, related_name="books")


class Review(models.Model):
    book = models.ForeignKey(Book, on_delete=models.CASCADE, related_name="reviews")
    rating = models.IntegerField()
    comment = models.TextField()
Basic Usage
Imagine you want to process all books, but there are too many of them to load them all into memory at once. You therefore need to process them in chunks.
With plain Django, the code looks something like this:
from chunkator import chunkator_page

for chunk in chunkator_page(Book.objects.all().prefetch_related("author", "translator", "publisher"), 10_000):
    for book in chunk:
        print(book.author.name, book.translator.name if book.translator is not None else None)
        print(book.publisher.name)
This will work, with two caveats:
- On each chunk, Django will make separate queries to fetch the author and translator
- The author, translator and publisher objects will be fetched from the database for each chunk
This is the primary use case for this package. When used like this:
from django_memoized_prefetch import MemoizedPrefetch, MemoizedPrefetchConfig
from chunkator import chunkator_page

memoized_prefetch = MemoizedPrefetch(
    MemoizedPrefetchConfig(Author, ["author", "translator"]),
    MemoizedPrefetchConfig(Publisher, ["publisher"], prefetch_all=True),
)

for chunk in chunkator_page(Book.objects.all(), 10_000):
    memoized_prefetch.process_chunk(chunk)

    for book in chunk:
        print(book.author.name, book.translator.name if book.translator is not None else None)
        print(book.publisher.name)
The processing will be more efficient, because:
- All publishers will get fetched before processing any chunks, and they will be reused across all chunks
- The author and translator objects will be fetched using one query
- Any authors and translators that appeared in previous chunks will not be fetched again
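One query is enough for both author and translator because both foreign keys point at the same table, so their ids can be pooled before fetching. Below is a minimal sketch of that step in plain Python, under stated assumptions: Book here is a dataclass rather than the Django model, and fetch_authors is a hypothetical function that simulates a single pk__in query. It is an illustration, not the package's code.

```python
from dataclasses import dataclass
from typing import Optional

AUTHORS = {1: "Ann", 2: "Bob", 3: "Cy"}   # pretend database table
query_log = []                            # one entry per simulated query


def fetch_authors(ids):
    """Simulate SELECT ... WHERE id IN (ids)."""
    query_log.append(sorted(ids))
    return {pk: AUTHORS[pk] for pk in ids}


@dataclass
class Book:
    title: str
    author_id: int
    translator_id: Optional[int] = None
    author: Optional[str] = None
    translator: Optional[str] = None


def attach_people(books):
    # Pool the ids from both foreign keys, skipping NULL translators.
    ids = {b.author_id for b in books}
    ids |= {b.translator_id for b in books if b.translator_id is not None}
    people = fetch_authors(ids)           # a single query covers both fields
    for b in books:
        b.author = people[b.author_id]
        if b.translator_id is not None:
            b.translator = people[b.translator_id]


books = [Book("A", 1, 2), Book("B", 2), Book("C", 3, 1)]
attach_people(books)
```

Here query_log ends up with a single entry even though two attributes were filled in on every book.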
Nested attributes
You can also prefetch nested attributes using either dotted notation or double-underscore notation. In the following example, both forms work.
memoized_prefetch = MemoizedPrefetch(
    MemoizedPrefetchConfig(Publisher, ["book.publisher"]),
    MemoizedPrefetchConfig(Author, ["book__author"]),
)

for chunk in chunkator_page(Review.objects.all(), 10000):
    memoized_prefetch.process_chunk(chunk)
    ...
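The two notations name the same path. Below is a minimal sketch of how such a path could be normalised and followed. The helpers split_path and resolve are hypothetical, SimpleNamespace objects stand in for model instances, and none of this is taken from the package's source.

```python
from types import SimpleNamespace


def split_path(path):
    """Turn "book.publisher" or "book__publisher" into ["book", "publisher"]."""
    return path.replace("__", ".").split(".")


def resolve(obj, path):
    """Follow each attribute in the path, stopping early on None."""
    for name in split_path(path):
        if obj is None:
            return None
        obj = getattr(obj, name)
    return obj


publisher = SimpleNamespace(name="Acme Press")
book = SimpleNamespace(publisher=publisher, translator=None)
review = SimpleNamespace(book=book)

# Both spellings lead to the very same object.
same = resolve(review, "book.publisher") is resolve(review, "book__publisher")
```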
Many-to-Many Relationships
Many-to-many relationships are supported as well, caching the target model, while fetching the through model for each chunk.
from django_memoized_prefetch import MemoizedPrefetch, MemoizedPrefetchConfig
from chunkator import chunkator_page

# Configure for many-to-many relationships
memoized_prefetch = MemoizedPrefetch(
    MemoizedPrefetchConfig(
        model=Category,
        attributes=["categories"],
        is_many_to_many=True,
        through_model=Book.categories.through,
        source_field="book_id",
        target_field="category_id",
    )
)

# Process books with their categories
for chunk in chunkator_page(Book.objects.all(), 10000):
    memoized_prefetch.process_chunk(chunk)

    for book in chunk:
        # Categories are prefetched and available
        category_names = [cat.name for cat in book.categories.all()]
        print(f"Book: {book.title}, Categories: {', '.join(category_names)}")
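The split between cached targets and per-chunk through rows can be sketched in plain Python. Everything below is illustrative: THROUGH_ROWS, CATEGORIES, fetch_through and fetch_categories are hypothetical stand-ins for the through table, the target table and the queries against them, and a plain dict plays the role of the LRU cache.

```python
from collections import defaultdict

# (book_id, category_id) pairs, as stored in the through table.
THROUGH_ROWS = [(1, 10), (1, 11), (2, 10), (3, 12), (4, 10)]
CATEGORIES = {10: "Fiction", 11: "Drama", 12: "Science"}

category_queries = []   # ids requested from the target table, per query
category_cache = {}     # target objects kept across chunks


def fetch_through(book_ids):
    """Simulate one query on the through table for the current chunk."""
    return [(b, c) for b, c in THROUGH_ROWS if b in book_ids]


def fetch_categories(ids):
    """Simulate one query on the target table."""
    category_queries.append(sorted(ids))
    return {pk: CATEGORIES[pk] for pk in ids}


def categories_for_chunk(book_ids):
    links = fetch_through(set(book_ids))          # always fetched per chunk
    missing = {c for _, c in links} - set(category_cache)
    if missing:                                   # only uncached targets
        category_cache.update(fetch_categories(missing))
    grouped = defaultdict(list)
    for book_id, category_id in links:
        grouped[book_id].append(category_cache[category_id])
    return {book_id: grouped[book_id] for book_id in book_ids}


first = categories_for_chunk([1, 2])    # fetches categories 10 and 11
second = categories_for_chunk([3, 4])   # fetches only category 12
```

The second chunk still reads its through rows, but category 10 is served from the cache, so only category 12 is requested from the target table.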
Usage outside chunked processing
If you have multiple foreign keys to the same table, this package can be used to optimise the database queries even when not processing data in chunks.
Configuration Options
MemoizedPrefetchConfig Parameters
- model (required): The Django model class to prefetch
- attributes (required): List of attribute names to prefetch on your objects
- queryset (optional): Custom queryset for the model (for additional select_related/prefetch_related)
- prefetch_all (optional, default: False): Whether to prefetch all objects at initialisation
- lru_cache_size (optional, default: 10,000): Maximum number of objects to keep in cache
- is_many_to_many (optional, default: False): Set to True for many-to-many relationships
- through_model (optional): Through model for many-to-many relationships
- source_field (optional): Source field name in the through model
- target_field (optional): Target field name in the through model
Advanced Configuration
from django_memoized_prefetch import MemoizedPrefetchConfig

# Custom queryset with select_related
config = MemoizedPrefetchConfig(
    model=Author,
    attributes=["author"],
    queryset=Author.objects.select_related(...),
    lru_cache_size=5000,
)

# Prefetch all objects at startup (useful for small, frequently accessed tables)
config = MemoizedPrefetchConfig(
    model=Publisher,
    attributes=["publisher"],
    prefetch_all=True,
)
Integrations with other packages
The package automatically supports django-seal when it is installed: all sealable querysets are sealed automatically.
It also works with django-tenants.
Best Practices
- Use appropriate cache sizes: Set lru_cache_size based on your expected data volume and available memory
- Prefetch related objects: Use custom querysets with select_related or prefetch_related for nested relationships
- Consider prefetch_all: Use prefetch_all=True for small, frequently accessed reference tables
- Process in reasonable chunks: Balance memory usage with query efficiency when choosing chunk sizes
- Monitor cache hit rates: Ensure your cache size is appropriate for your data access patterns
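For the last point, a hit rate can be estimated during development with a small counting cache. This is a hypothetical sketch built on OrderedDict. The source does not say whether the package exposes cache statistics, so CountingLRU, replay and their attributes are assumptions made for illustration only.

```python
from collections import OrderedDict


class CountingLRU:
    """A tiny LRU cache that records hits and misses."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)          # refresh recency
            return self.data[key]
        self.misses += 1
        value = load(key)                       # e.g. a database lookup
        self.data[key] = value
        if len(self.data) > self.max_size:
            self.data.popitem(last=False)       # drop least recently used
        return value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def replay(max_size, keys):
    """Run an access pattern through a fresh cache and report its hit rate."""
    cache = CountingLRU(max_size)
    for key in keys:
        cache.get(key, load=lambda k: f"object {k}")
    return cache.hit_rate


access_pattern = [1, 2, 3, 1, 2, 3, 1, 2, 3]
too_small = replay(2, access_pattern)    # cyclic access defeats a 2-slot LRU
big_enough = replay(3, access_pattern)   # everything fits after the first pass
```

With this access pattern the two-slot cache never hits, while the three-slot cache hits on six of nine lookups. A cache slightly smaller than the working set can therefore perform far worse than its size suggests.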
Testing
Run the test suite:
uv run pytest
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Authors
- Mikuláš Poul (mikulas.poul@xelix.com)
- Cameron Hobbs (cameron.hobbs@xelix.com)