Python caching library with tag-based invalidation and dogpile effect prevention

HermesCache

Hermes is a Python caching library. It is designed to fulfil the following requirements:

  • Tag-based cache invalidation
  • Dogpile effect prevention
  • Thread-safety
  • Straightforward design
  • Simple yet flexible decorator API
  • Interface for implementing custom backends

Implemented backends: redis, memcached, dict.

Note

Hermes 0.8+ is Python 3-only. Previous versions supported Python 2.

Installation

pip install HermesCache

Redis and Memcached support is provided via the following extras.

  • HermesCache[redis] – pure Python Redis client
  • HermesCache[redis-ext] – pure Python Redis client and C extension parser
  • HermesCache[memcached] – pure Python Memcached client
  • HermesCache[memcached-ext] – C extension Memcached client

Usage

The following demonstrates the entire end-user API.

import hermes.backend.redis


cache = hermes.Hermes(hermes.backend.redis.Backend, ttl = 600, host = 'localhost', db = 1)

@cache
def foo(a, b):
  return a * b

class Example:

  @cache(tags = ('math', 'power'), ttl = 1200)
  def bar(self, a, b):
    return a ** b

  @cache(tags = ('math', 'avg'), key = lambda fn, a, b: f'avg:{a}:{b}')
  def baz(self, a, b):
    return (a + b) / 2

print(foo(2, 333))

example = Example()
print(example.bar(2, 10))
print(example.baz(2, 10))

foo.invalidate(2, 333)
example.bar.invalidate(2, 10)
example.baz.invalidate(2, 10)

cache.clean(['math']) # invalidate entries tagged 'math'
cache.clean()         # flush cache

Note

The API encourages import-time instantiation of Hermes so that existing module functions and classes can be decorated, making caching transparent to them. Instantiation has no side effects; underlying backend server connections are lazy.

Moreover, if the backend configuration only becomes available at runtime, the backend instance of a Hermes object can be replaced at runtime.

For advanced examples, look at the test suite.

Tagging cache entries

First, let's look at how basic caching works.

import hermes.backend.dict


cache = hermes.Hermes(hermes.backend.dict.Backend)

@cache
def foo(a, b):
  return a * b

foo(2, 2)
foo(2, 4)

print(cache.backend.dump())
#  {
#    'cache:entry:foo:515d5cb1a98de31d': 8,
#    'cache:entry:foo:a1c97600eac6febb': 4
#                            ↓
#                      argument hash
#  }

Basically, we have a key-value storage with O(1) complexity for set, get and delete, meaning that the cost of each operation is constant and independent of the number of items already stored. When a callable (a function or method) is cached, a key is calculated per invocation from the callable itself and the passed arguments. The callable's return value is saved under that key. On the next invocation with the same arguments, the value is served from the cache.
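The per-invocation key derivation can be sketched in plain Python. This is an illustration only, not Hermes's exact algorithm: the helper name, the pickle-based argument serialization and the hash choice are all assumptions.

```python
import hashlib
import pickle


def make_key(fn, *args, **kwargs):
  """Illustrative cache key: the callable's name plus a short hash of
  its serialized arguments (a sketch, not Hermes's actual scheme)."""
  payload = pickle.dumps((args, sorted(kwargs.items())))
  digest = hashlib.blake2b(payload, digest_size = 8).hexdigest()
  return f'cache:entry:{fn.__name__}:{digest}'


def foo(a, b):
  return a * b


# Same arguments produce the same key; different arguments a different one
print(make_key(foo, 2, 2))
print(make_key(foo, 2, 4))
```

Because the key depends only on the callable and its arguments, a repeated call can find and return the stored value without re-executing the function body.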

“There are only two hard problems in Computer Science: cache invalidation and naming things.” — Phil Karlton

Invalidation gets harder in a complex application. It is common for a group of methods to operate on the same data, which makes invalidating individual entries impractical. In particular, this often happens when a method returns complex values spanning multiple entities. Cache tagging makes it possible to mark such a group of method results with a tag and invalidate them all at once.

To keep invalidation fast, Hermes implements Eric Florenzano's idea, explained in Tagging cache keys for O(1) batch invalidation [6]. Let's look at the code.

import hermes.backend.dict


cache = hermes.Hermes(hermes.backend.dict.Backend)

@cache(tags = ('tag1', 'tag2'))
def foo(a, b):
  return a * b

foo(2, 2)

print(cache.backend.dump())
#  {
#    'cache:tag:tag1': '0674536f9eb4eb19',
#    'cache:tag:tag2': 'db22b5ab2e504895',
#    'cache:entry:foo:a1c97600eac6febb:c1da510b3d42bad6': 4
#                                              ↓
#                                           tag hash
#  }

When we want to tag a cache entry, we first need to create its tag entries. Each tag is represented by its own entry, whose value is set to a random value at tag-creation time. Once all tag values exist, they are joined and hashed. The resulting tag hash is appended to the cache entry key.

When we want to invalidate entries by a tag, we just remove the tag entry. Without any one of the tag values the tag hash was created from, it is impossible to reconstruct the entry key, so the tagged cache entries become inaccessible and are hence invalidated.
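The mechanics above can be sketched as a tiny self-contained class. This is a conceptual model of the technique, not Hermes's implementation; the class, method names and hash choice are hypothetical.

```python
import hashlib
import os


class TaggedCache:
  """Sketch of O(1) batch invalidation: each tag maps to a random value,
  entry keys embed a hash of the joined tag values, so deleting one tag
  entry makes every key tagged with it unreachable."""

  def __init__(self):
    self.store = {}

  def _tag_hash(self, tags):
    values = []
    for tag in tags:
      # Create the tag entry with a fresh random value on first use
      values.append(self.store.setdefault(f'tag:{tag}', os.urandom(8).hex()))
    return hashlib.blake2b(':'.join(values).encode(), digest_size = 8).hexdigest()

  def set(self, key, value, tags):
    self.store[f'entry:{key}:{self._tag_hash(tags)}'] = value

  def get(self, key, tags):
    return self.store.get(f'entry:{key}:{self._tag_hash(tags)}')

  def invalidate_tag(self, tag):
    # Removing the tag entry invalidates all entries tagged with it
    self.store.pop(f'tag:{tag}', None)


cache = TaggedCache()
cache.set('foo', 4, ['tag1', 'tag2'])
print(cache.get('foo', ['tag1', 'tag2']))  # 4
cache.invalidate_tag('tag1')
print(cache.get('foo', ['tag1', 'tag2']))  # None – new tag value, new hash
```

Note that the old entry is not deleted, only orphaned, which is exactly the source of the stale-entry memory overhead discussed below.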

What happens to performance? Do all operations become O(n), where n is the number of entry tags? Actually, no. Since we rarely need more than a few dozen tags, in practice it is still O(1). Tag entry operations are batched, so the implications for the number of network operations are as follows:

  • set – 3 backend calls (get + 2 × set) in the worst case; 2 on average, once all used tag entries have been created.
  • get – 2 backend calls.
  • delete – 2 backend calls.

Memory overhead consists of tag entries and stale cache entries, as demonstrated below.

import hermes.backend.dict


cache = hermes.Hermes(hermes.backend.dict.Backend)

@cache(tags = ('tag1', 'tag2'))
def foo(a, b):
  return a * b

foo(2, 2)

print(cache.backend.dump())
#  {
#    'cache:tag:tag1': '047820ac777abe8a',
#    'cache:tag:tag2': '126365ec7175e851',
#    'cache:entry:foo:a1c97600eac6febb:5cae80f5e7d58329': 4
#  }

cache.clean(['tag1'])
foo(2, 2)

print(cache.backend.dump())
#  {
#    'cache:tag:tag1': '66336fec212def16',  ← recreated tag entry
#    'cache:tag:tag2': '126365ec7175e851',
#    'cache:entry:foo:a1c97600eac6febb:8e7e24cf70c1f0ab': 4,
#    'cache:entry:foo:a1c97600eac6febb:5cae80f5e7d58329': 4  ← garbage
#  }

Therefore TTLs should be chosen carefully. With the Redis backend, it is also recommended to set maxmemory-policy [1] to volatile-lru.
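A corresponding redis.conf fragment might look as follows; the memory limit shown is an arbitrary example value, not a recommendation.

```
# redis.conf: evict only keys that have a TTL, least recently used first,
# so orphaned tagged entries are reclaimed once memory runs low
maxmemory 256mb
maxmemory-policy volatile-lru
```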

Backend and client library

This section explains extra dependencies.

Redis

hermes.backend.redis depends on redis [2]. Optionally, hiredis [3] can be used to speed up Redis protocol parsing. However, hiredis mainly benefits big bulk operations; in the context of this package it improves performance by about 10%.

Memcached

hermes.backend.memcached depends either on the pure-Python python3-memcached [4] or on pylibmc [5], a libmemcached wrapper. pylibmc gives about a 50% performance improvement.

Dict

hermes.backend.dict is neither a complete backend nor suited for distributed use. It was originally written for development needs and is essentially a wrapper around a Python dict. It has no memory limiting, but it can be used in the limited number of cases where the cache size is known in advance to be small.


[1]https://redis.io/topics/config
[2]https://pypi.org/project/redis/
[3]https://pypi.org/project/hiredis/
[4]https://pypi.org/project/python3-memcached/
[5]https://pypi.org/project/pylibmc/
[6]http://web.archive.org/web/20160420192724/http://eflorenzano.com/blog/2009/03/02/tagging-cache-keys-o1-batch-invalidation/
