django-data-fetcher

What is this?

This library contains 3 utilities:

  1. A function to access the currently-executing Django request from anywhere
    • This can help with a wide variety of fringe use-cases, e.g. wanting to access the current request from a decoupled log helper function
  2. A caching abstraction to cache a function's return value on the request
    • This is the most frequently useful thing in this package
  3. A data-fetcher abstraction to ease batching
    • In addition to caching, this helps with batching, i.e. avoiding N+1 queries
    • This is the least important, most complicated and most specific use case. For historical reasons, the entire library is named after this part. Naming things is hard :(

Installation

pip install django-data-fetcher

After installing, you'll need to add our middleware:

# settings.py 
# ...
MIDDLEWARE = [
    # ...
    "data_fetcher.middleware.GlobalRequestMiddleware",
    # ...
]

Usage

Accessing the global request object

Thanks to the middleware, accessing the global request is simple:

from data_fetcher.util import get_request

def some_random_util_function():
    request = get_request()
    do_something_with_request(request)

Caching

If you'd just like to cache a function's result so that calling it from different places within a request doesn't repeat the work, use the cache_within_request decorator:

from data_fetcher import cache_within_request

@cache_within_request
def get_most_recent_order(user_id):
    return Order.objects.filter(user_id=user_id).order_by('-created_at').first()

Now you can call get_most_recent_order as many times as you want within a request, e.g. in template helpers and in views, and it will only hit the database once per distinct user_id. The decorator wraps functools.cache, so results are cached per set of arguments and cleared at the end of the request.
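
For instance, within a single request (a quick illustration; the key values are arbitrary):

get_most_recent_order(42)   # hits the database
get_most_recent_order(42)   # same argument: returns the cached order, no query
get_most_recent_order(99)   # different argument: one more query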

Batching

This library also supports batching fetching logic. You need to subclass our DataFetcher class and implement a batch_load (or batch_load_dict) method with the batching logic. Then you can use its factory method to get an instance of your fetcher class, and call its get(), get_many(), or prefetch_keys() methods.

For example, it's usually pretty difficult to efficiently fetch permissions for a list of objects without coupling your view to your templates/helpers. With this library, we can offload the work to a data-fetcher instance and have a re-usable template-helper that checks permissions. When we notice performance problems, we simply add a prefetch_keys call to our view to pre-populate the cache.

# my_app/fetchers.py

from data_fetcher import DataFetcher

from my_app.models import ArticlePermission  # assumed model location

class ArticlePermissionFetcher(DataFetcher):
    def batch_load_dict(self, article_ids):
        permissions = ArticlePermission.objects.filter(article_id__in=article_ids)
        return {p.article_id: p for p in permissions}

# my_app/template_helpers.py

from django import template

from my_app.fetchers import ArticlePermissionFetcher

register = template.Library()

@register.simple_tag(takes_context=True)
def can_read_article(context, article):
    """
    Called in a loop in article_list.html, e.g. to conditionally render a link.
    """
    fetcher = ArticlePermissionFetcher.get_instance()
    permission = fetcher.get(article.id)
    return permission.can_read

# my_app/views.py

from django.shortcuts import render

from my_app.fetchers import ArticlePermissionFetcher
from my_app.models import Article

def article_list(request):
    articles = Article.objects.all()
    fetcher = ArticlePermissionFetcher.get_instance()
    fetcher.prefetch_keys([a.id for a in articles])
    return render(request, 'article_list.html', {'articles': articles})

Behind the scenes, fetchers' get_instance will use the global-request middleware's request object to always return the same instance of the fetcher for the same request. This allows the fetcher to call your batch function once, when the view calls prefetch_keys, and then use the cached results for all subsequent calls to get or get_many.

Fetchers also cache values fetched with get or get_many. If you request a key that isn't cached, your batch method will be called again for that single key. It's recommended to monitor your queries while developing, with a tool like django-debug-toolbar.
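
To make that concrete, here's a rough sketch using the ArticlePermissionFetcher from above (the key values are arbitrary):

fetcher_a = ArticlePermissionFetcher.get_instance()
fetcher_b = ArticlePermissionFetcher.get_instance()
assert fetcher_a is fetcher_b  # same instance for the same request

fetcher_a.prefetch_keys([1, 2, 3])  # one call to batch_load_dict([1, 2, 3])
fetcher_a.get(2)                    # already cached, no extra query
fetcher_a.get(99)                   # not cached: batch_load_dict([99]) is called for this single key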

Fetcher API

Public methods:

  • get(key) : fetch a single resource by key
  • get_many(keys) : fetch multiple resources by key, returns a list
  • get_many_as_dict(keys) : like get_many, but returns a dict indexed by your requested keys
  • prefetch_keys(keys) : like get_many, but returns nothing. Pre-populates the cache with a list of keys. This is useful when you know you're going to need a lot of objects and want to avoid N+1 queries.
  • prime(key, value) : manually set a value in the cache. This isn't recommended, but it can be useful for performance in certain cases.
  • enqueue_keys(keys) : keys get added to a queue, which gets fetched the next time get, get_many or prefetch_keys is called. It is often more convenient to use this than to collect all required keys and call prefetch_keys (see the sketch after this list).
  • get_lazy/get_lazy_many : (experimental) enqueues the key and returns a lazy object wrapper. The lazy object's get() method will return the value when called. This API might be replaced with smarter lazy objects in the future.
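
For example, a rough sketch of using enqueue_keys with the ArticlePermissionFetcher from above (the articles list is assumed to come from the surrounding view or helper):

fetcher = ArticlePermissionFetcher.get_instance()

# register keys as you encounter them, without fetching anything yet
for article in articles:
    fetcher.enqueue_keys([article.id])

# the next get/get_many/prefetch_keys call flushes the queue in one batch,
# so every enqueued article's permission ends up in the cache
permission = fetcher.get(articles[0].id)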

Subclass-API:

You can implement batch_load(keys) OR batch_load_dict(keys).

  • batch_load(keys) needs to return a list of resources in the same order (and length) as the keys. If a resource is missing, you need an explicit None in the returned list (see the sketch after this list).
  • batch_load_dict(keys) should return a dict of resources, indexed by the keys. If a value is missing, None will be returned when that key is requested (it tolerates missing keys).
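
For example, a minimal sketch of the permission fetcher from earlier rewritten with batch_load instead of batch_load_dict (the model import path is an assumption):

from data_fetcher import DataFetcher

from my_app.models import ArticlePermission  # assumed model location

class ArticlePermissionListFetcher(DataFetcher):
    def batch_load(self, article_ids):
        permissions_by_article_id = {
            p.article_id: p
            for p in ArticlePermission.objects.filter(article_id__in=article_ids)
        }
        # one entry per key, in the same order as the keys, with an explicit None for missing permissions
        return [permissions_by_article_id.get(article_id) for article_id in article_ids]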

Shortcuts

It's extremely common to want to fetch a single object by id, or by a parent's foreign key. We provide a few base classes for this:

from data_fetcher import AbstractModelByIdFetcher, AbstractChildModelByAttrFetcher

from my_app.models import Article  # assumed model location

class ArticleByIdFetcher(AbstractModelByIdFetcher):
    model = Article

class ArticleByAuthorIdFetcher(AbstractChildModelByAttrFetcher):
    model = Article
    parent_attr = 'author_id'

In fact, the ID fetcher was so common we have a factory for it. This factory returns the same class every time, so you can use it in multiple places without worrying about creating multiple classes with distinct caches.

from data_fetcher import PrimaryKeyFetcherFactory

ArticleByIdFetcher = PrimaryKeyFetcherFactory.get_model_by_id_fetcher(Article)
ArticleByIdFetcher2 = PrimaryKeyFetcherFactory.get_model_by_id_fetcher(Article)
assert ArticleByIdFetcher == ArticleByIdFetcher2

article_1 = ArticleByIdFetcher.get_instance().get(1)

Testing data-fetchers

Batch logic is often complex and error-prone. We recommend writing tests for your fetchers. The GlobalRequest context manager provides a mock request object that you can use to test your fetchers. Without this context manager, your fetchers won't be able to cache anything and might raise errors. Here's an example in pytest:

from data_fetcher.util import GlobalRequest

from my_app.fetchers import ArticlePermissionFetcher

def test_article_permission_fetcher(django_assert_num_queries):
    with GlobalRequest():
        with django_assert_num_queries(1):
            fetcher = ArticlePermissionFetcher.get_instance()
            fetcher.prefetch_keys([1, 2, 3])
            assert fetcher.get(1).can_read
            assert not fetcher.get(2).can_read
            assert not fetcher.get(3).can_read

Note that this context-manager also allows you to use the cache decorator and data-fetchers inside other scenarios, such as celery tasks.
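
For example, a minimal sketch of a Celery task that wraps its body in GlobalRequest (the task name and its internals are hypothetical):

from celery import shared_task

from data_fetcher.util import GlobalRequest
from my_app.fetchers import ArticlePermissionFetcher

@shared_task
def audit_article_permissions(article_ids):
    # outside a web request there's nothing to cache on unless we open a GlobalRequest
    with GlobalRequest():
        fetcher = ArticlePermissionFetcher.get_instance()
        fetcher.prefetch_keys(article_ids)  # one batch query
        for article_id in article_ids:
            permission = fetcher.get(article_id)  # served from the cache
            # ... inspect or log the permission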

How to provide non-key data to fetchers

Data-fetcher's main feature is not performance, but enabling decoupling. The view layer no longer has to be responsible for passing data to downstream consumers (e.g. utils, template-helpers, service-objects, etc.).

This paradigm shift can be a challenging adjustment. For instance, our ArticlePermissionFetcher above was naïve. Permission records should be fetched with respect to a user. How can we provide the user's ID to the fetcher?

It's tempting to subclass DataFetcher and add a user argument to its get_instance() method. Unfortunately, extending the factory pattern is rather complex. There are broadly 3 different ways to solve this problem:

  1. Use the global request to get the user. This is the simplest solution, but it limits your data-fetcher to the current user. You couldn't, for example, build a view that shows a list of articles available to other users.
  2. Create composite-keys: instead of loading permissions by article id, you load them by (user_id, article_id) pairs. This is a good solution, but is often complex to implement and you usually don't need this flexibility.
  3. Dynamically create a data-fetcher class that has a reference to the user

The 3rd solution fulfills the OOP temptation of adding a user argument to the constructor, but it's a "higher-order" solution. Rather than attaching the user to the fetcher-class, we would dynamically create a class that has a reference to the user, and then use a factory to ensure we recycle the same class for the same user.

There's a built-in shortcut for this pattern, too, called ValueBoundDataFetcher. ValueBoundDataFetcher classes have a bound_value attribute available inside their batch-load methods.

# my_app/fetchers.py
from data_fetcher import ValueBoundDataFetcher

from my_app.models import ArticlePermission  # assumed model location

class UserBoundArticlePermissionFetcher(ValueBoundDataFetcher):
    def batch_load_dict(self, article_ids):
        user_id = self.bound_value
        permissions = ArticlePermission.objects.filter(user_id=user_id, article_id__in=article_ids)
        return {p.article_id: p for p in permissions}

# my_app/views.py
from django.shortcuts import render

from data_fetcher import ValueBoundDataFetcher
from my_app.fetchers import UserBoundArticlePermissionFetcher
from my_app.models import Article

def article_list(request):
    # generate a class that has a reference to the current user's id
    UserArticlePermissionFetcher = ValueBoundDataFetcher.get_value_bound_class(
        UserBoundArticlePermissionFetcher,
        request.user.id
    )
    fetcher = UserArticlePermissionFetcher.get_instance()
    articles = Article.objects.all()
    fetcher.prefetch_keys([a.id for a in articles])
    return render(request, 'article_list.html', {'articles': articles})

With this solution, we're still able to create fetchers for multiple users. However, it won't be as efficient as the composite-key solution (e.g. one query per user vs. one query for all users).

Note that bound_value can be anything, so you can use this pattern to provide more than a single piece of data to your fetcher. Just make sure it's hashable so it can be used as a key (otherwise, you'll want to pass separate value and key kwargs to get_value_bound_class).
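
For comparison, here's a rough sketch of the composite-key approach from option 2 above, where each key is a (user_id, article_id) pair (the model import path is an assumption):

from data_fetcher import DataFetcher

from my_app.models import ArticlePermission  # assumed model location

class UserArticlePermissionFetcher(DataFetcher):
    def batch_load_dict(self, keys):
        # keys are (user_id, article_id) tuples
        user_ids = {user_id for user_id, _ in keys}
        article_ids = {article_id for _, article_id in keys}
        permissions = ArticlePermission.objects.filter(
            user_id__in=user_ids, article_id__in=article_ids
        )
        return {(p.user_id, p.article_id): p for p in permissions}

# usage: UserArticlePermissionFetcher.get_instance().get((request.user.id, article.id))

A single query covers every user/article pair, at the cost of slightly over-fetching when the requested pairs don't form a full grid.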

Recipe: Caching a single data-structure with complex data

Batching logic often has a high cognitive load, so it may not be worth it to batch everything. Fortunately, the @cache_within_request decorator can cache anything; there's no need to restrict ourselves to a single resource. For instance, let's say we have a complex home-feed page that needs to fetch a lot of data for a particular user. We can use the cache_within_request decorator to cache the entire data-structure.

from django.db.models import Prefetch

from data_fetcher import cache_within_request
from my_app.models import Article, Comment, User  # assumed model locations
from my_app.util import flatten, get_more_data    # assumed helper locations

@cache_within_request
def get_home_feed_data(user_id):
    user = User.objects.prefetch_related(
        Prefetch('articles', queryset=Article.objects.filter(deleted=False), to_attr='article_list'),
        Prefetch('articles__comments', queryset=Comment.objects.filter(deleted=False), to_attr='comment_list'),
        Prefetch('articles__author', queryset=User.objects.all(), to_attr='author_list'),
        # ...
    ).get(id=user_id)
    more_data = get_more_data(user)
    # assemble a rich data structure with a convenient API
    return {
        'user': user,
        'articles': user.article_list,
        'comments': flatten([article.comment_list for article in user.article_list]),
        # 'articles_by_id': ...
        # 'comments_by_article_id': ...
        # 'comments_by_id': ...
        # ...
    }

Now any function can request the entire structure and use its rich API. We can isolate the ugly fetching logic and don't need to pass data around (e.g. view -> template -> helpers) to remain efficient.

This is not a perfect approach, as it couples our consumers (e.g. views, helpers) to this data-structure. This makes it difficult to re-use those helpers, or parts of the data-structure. However, in a pinch, it may be preferable to setting up fetchers (e.g. article-by-id, comments-by-article-id) for every atomic piece of data. A neat compromise might be to split this up into multiple cached functions, or a class that executes other cached functions lazily.

Async

Like most ORM-consuming code, data-fetcher is synchronous. You'll need to use sync_to_async to use it inside async views. Behind the scenes, the global-request middleware uses context-vars, which are both thread-safe and async-safe.
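
For example, a minimal sketch of calling the cached get_most_recent_order function from the Caching section inside an async view (the view name and import path are assumptions):

from asgiref.sync import sync_to_async
from django.http import JsonResponse

from my_app.orders import get_most_recent_order  # the cached function from the Caching section; path assumed

async def latest_order(request, user_id):
    # the ORM (and therefore the request-scoped cache) must run in a sync context
    order = await sync_to_async(get_most_recent_order)(user_id)
    return JsonResponse({'order_id': order.id if order else None})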

Cache invalidation

You can probably ignore cache invalidation, since the cache is cleared at the end of each request. However, if you change data that has been cached and want updated data during the same request, you can use the clear_request_caches function. This will clear all data-fetcher and @cache_within_request caches.

from data_fetcher.util import clear_request_caches

def update_article(request, article_id):
    article = ArticleFetcher.get_instance().get(article_id)
    article.title = 'new title'
    article.save()

    # in case render_page uses the article-fetcher, 
    # we clear all data-fetchers
    clear_request_caches()
    return render_page(article_id)
