django-s3-express-cache
S3 Express Cache Backend for Django
A scalable, open source Django cache backend powered by Amazon S3 Express One Zone — cheaper, durable, and ready for production.
Why
Django ships with two main distributed cache backends, but neither is a great fit for caching many objects or very large ones:
| Backend | Pros | Cons |
|---|---|---|
| Database cache | Easy to set up | Runs a COUNT(*) query on every set, add, or touch, which performs poorly on large cache tables |
| Redis/Memcached | Fast, widely used | Expensive to run at scale (large RAM bills, cluster management) |
On the other hand, S3 Express One Zone provides an S3 bucket with single-digit-millisecond latency that is cheap, durable, and can scale to millions of objects, large and small.
S3 Express has no per-object expiration (TTL), so we combine S3 lifecycle rules, a fixed-width header prepended to each item, and time-based key prefixes to manage and cull the cache as needed.
Features
- Scalable & cost-effective - cache huge datasets without memory overhead. By using S3, you can scale to virtually unlimited capacity at a fraction of the cost.
- Simpler large-scale cleanup - delegates stale object removal to S3 Lifecycle Rules, minimizing application-level logic.
- Faster reads & fewer bytes - supports header-only range requests to detect expiry and skip downloading full objects on misses.
- Future-proof format - compact binary header with versioning and reserved fields, inspired by TCP headers, for future functionality.
- Easy integration - configure your Django CACHES settings, add the necessary S3 Lifecycle Rules, and you're ready to go.
Trade-offs
- S3 Express specifics: the biggest wins come from using S3 Express One Zone (directory buckets). Lifecycle rules in directory buckets can filter only by prefix and object size (not by tags), so prefixes must be carefully planned.
- Lifecycle rule setup: initial setup requires a script to create the rules, which adds a small implementation overhead. Once configured, cleanup is automatic, but planning and provisioning are required upfront.
Requirements
- Django 5.x
- Python ≥ 3.13
- boto3 v1.38.36+
- Works in any AWS region where S3 Express One Zone is available
- Best used in the same Availability Zone as your application
Design overview
Motivation
This backend was inspired by an issue raised in CourtListener’s repository. In short:
- Django’s DB cache can become a performance bottleneck under heavy load, especially when culling expired rows. Queries like SELECT COUNT(*) FROM django_cache cause significant slowdowns once the cache table grows large. In our experience running CourtListener, the DB cache is one of the heaviest consumers of database resources.
- Django's in-memory caches do not scale well when caching large objects or many small ones.
- S3 is highly scalable, cost-effective, and capable of storing very large objects. Instead of relying on costly culling queries (as the DB cache does), we can use S3 lifecycle rules to automatically clean up stale entries, keeping performance stable without culling scripts or app-level logic.
This implementation builds on those ideas and delivers a production-ready, efficient, and extensible cache backend, designed to integrate naturally with Django’s caching framework.
Key design for Maximal S3 Throughput and Automatic Culling
- S3 Express One Zone uses directory buckets, which support Lifecycle policies but only with limited filters (prefix and size, no tags). To work within these constraints, our design relies on explicit time-based key prefixes (e.g., 1-days/, 7-days/, 30-days/) that reflect the expiration period of each item. Expirations are supported for up to 1,000 days, and each cache key must use the prefix corresponding to the next whole day beyond the item’s expiration. For example:
  - An item expiring within the next 24 hours should use a key like 1-days:foo.
  - An item expiring in 25 hours should use a key like 2-days:bar.
  This approach allows cache entries to be automatically removed using simple prefix-based lifecycle rules.
- Keys of the form N-days:actual_key are rewritten to N-days/actual_key (with a slash instead of a colon). This spreads objects across S3 key prefixes, improving S3 partitioning and request throughput.
- When adding something to the cache, the key name is validated against the item's expiration. If the expiration exceeds the N-days limit, the write is rejected with a ValueError. This prevents accidentally storing long-lived items under a short-lived namespace and keeps lifecycle-based culling predictable. Such errors will generally be caught during development.
Header format (fixed-width, versioned)
We prepend a compact header to every object. Current layout (struct format: QHHQ):
| Field | Type | Bytes | Notes |
|---|---|---|---|
| expiration_time | Q | 8 | UNIX timestamp in seconds (int). 0 means persistent. |
| header_version | H | 2 | Starts at 1. Used for compatibility checks. |
| compression_type | H | 2 | 0 = none. Reserved for future use (e.g., zlib, zstd). |
| extra (reserved) | Q | 8 | Reserved for future metadata |
Because the header is fixed-width, the cache can Range-read only the header. Expired items remain in the bucket until S3 Lifecycle rules delete them, so this lets your application check an object's expiration before downloading it. If the item is expired, that's a cache miss. If not, the entire object is downloaded and returned.
[!NOTE] The code is written to treat mismatched versions as unsupported (safe default). You can add backward-compatible parsers in the future if needed.
Performance Optimizations
To optimize data transfer and improve performance, the backend implements early exits:
- has_key: uses an S3 Range request to fetch only the header bytes.
  - If the item is expired → treated as a cache miss without downloading the full value.
  - If the item is persistent or still valid → considered a hit.
- get: streams the object in header-sized chunks. After reading the header (first chunk), expiry is evaluated.
  - If expired → the operation exits immediately without fetching the remaining data.
  - If valid → streaming continues to reconstruct the cached object.
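The two early exits can be sketched as plain functions that take a boto3 S3 client, or any object with the same get_object method. This is a simplified illustration, not the backend's code:
- The function names are hypothetical.
- The >QHHQ big-endian header layout is an assumption.
- Version checks and missing-key errors are omitted.
- get reads the header and then the remainder, rather than looping in header-sized chunks.

The Range parameter of get_object and the read(amt) and close() methods of the response Body stream are standard boto3/botocore features.

```python
import pickle
import struct
import time
from typing import Any

HEADER_FORMAT = ">QHHQ"  # assumed byte order for the QHHQ header described above
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 20 bytes


def _is_expired(header: bytes) -> bool:
    expiration_time, _version, _compression, _extra = struct.unpack(HEADER_FORMAT, header)
    return expiration_time != 0 and time.time() >= expiration_time


def has_key(client, bucket: str, key: str) -> bool:
    """Hit or miss from a Range request for the header bytes only."""
    # HTTP byte ranges are inclusive, so bytes=0-19 returns exactly 20 bytes.
    response = client.get_object(Bucket=bucket, Key=key, Range=f"bytes=0-{HEADER_SIZE - 1}")
    return not _is_expired(response["Body"].read())


def get(client, bucket: str, key: str, default: Any = None) -> Any:
    """Read the header first; read and unpickle the rest only if it is still valid."""
    body = client.get_object(Bucket=bucket, Key=key)["Body"]
    header = body.read(HEADER_SIZE)
    if _is_expired(header):
        body.close()  # early exit: the remaining bytes are never read
        return default
    return pickle.loads(body.read())
```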
Lazy boto3 Client Initialization
Creating a boto3 client (and even importing boto3 itself) can be relatively expensive. To avoid adding this overhead to Django’s general startup time, the backend initializes the client lazily using a @cached_property.
This means:
- The boto3 client is created only on first use.
- Subsequent accesses reuse the cached client instance.
- Application startup remains fast, while still ensuring efficient reuse of the client once it’s needed.
Security
This backend uses Python’s pickle with HIGHEST_PROTOCOL, providing fast serialization and broad support for Python object types.
- Why pickle? Django’s own file-based and database-backed cache backends both pickle values internally, each in its own write method. We chose to follow this pattern for consistency, compatibility, and flexibility, especially since our goal was a backend as capable as Django’s built-ins.
- Why not JSON or other formats? Alternatives like JSON (and faster variants such as orjson or ujson) are safer but limited to basic types. This prevents caching complex objects like templates or query results, which are common use cases for Django’s cache system. We also tested msgpack, which offers more flexibility, but it failed to serialize some of the objects we needed.
[!CAUTION] Pickle should only be used with trusted data that your own application writes and reads. Never unpickle untrusted payloads. If your use case requires stricter, data-only serialization, formats like JSON or MessagePack are safer but keep in mind their type limitations.
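The type-coverage trade-off can be seen with the standard library alone: json rejects common Python types such as datetime and set, while pickle round-trips them. This is a generic demonstration, not code from this package.

```python
import json
import pickle
from datetime import datetime, timezone

value = {
    "fetched_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "tags": {"opinion", "docket"},
}

# pickle handles arbitrary Python objects; HIGHEST_PROTOCOL matches the choice described above.
restored = pickle.loads(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))
print(restored == value)  # True

# json only understands dicts, lists, strings, numbers, booleans, and None.
try:
    json.dumps(value)
except TypeError as exc:
    print(f"json failed: {exc}")
```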
Usage
There are five steps to using this cache:
1. Install it
2. Configure it in your Django settings
3. Set up the S3 Express bucket
4. Configure lifecycle rules for automatic cache culling
5. Use it!
Installation
From PyPI:
pip install django-s3-express-cache
From GitHub (latest dev):
pip install git+https://github.com/freelawproject/django-s3-express-cache.git@master
Configuration
We do not recommend this cache as your primary, default cache. Instead, use it as a secondary cache for larger or longer-lived objects by putting something like the following in your Django settings:
CACHES = {
"default": {
"BACKEND": "django_redis.cache.RedisCache",
"LOCATION": REDIS_URL,
},
"s3": {
"BACKEND": "django_s3_express_cache.S3ExpressCacheBackend",
"LOCATION": "S3_CACHE_BUCKET_NAME",
"OPTIONS": {
"HEADER_VERSION": 1,
}
}
}
Bucket Set Up
You must use an S3 Express One Zone directory bucket. Directory bucket names must follow this format and comply with the directory bucket naming rules:
bucket-base-name--zone-id--x-s3
For example, the following directory bucket name contains the Availability Zone ID usw2-az1:
bucket-base-name--usw2-az1--x-s3
When you create a directory bucket you must also provide configuration details:
aws s3api create-bucket --bucket test-cache-personal-express--usw2-az1--x-s3 --create-bucket-configuration 'Location={Type=AvailabilityZone,Name=usw2-az1},Bucket={DataRedundancy=SingleAvailabilityZone,Type=Directory}' --region us-west-2
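The same bucket can be created from Python with boto3's create_bucket. Its CreateBucketConfiguration accepts the same Location and Bucket settings as the CLI command above. This is a minimal sketch: the helper names are hypothetical, and it assumes AWS credentials are configured and a boto3 version with directory bucket support (the requirements above list 1.38.36+).

```python
def directory_bucket_name(base_name: str, zone_id: str) -> str:
    """Build a name in the required bucket-base-name--zone-id--x-s3 format."""
    return f"{base_name}--{zone_id}--x-s3"


def directory_bucket_config(zone_id: str) -> dict:
    """CreateBucketConfiguration for a single-AZ S3 Express One Zone bucket."""
    return {
        "Location": {"Type": "AvailabilityZone", "Name": zone_id},
        "Bucket": {"DataRedundancy": "SingleAvailabilityZone", "Type": "Directory"},
    }


def create_directory_bucket(base_name: str, zone_id: str, region: str) -> str:
    # Imported here so the two helpers above work without boto3 installed.
    import boto3

    s3 = boto3.client("s3", region_name=region)
    name = directory_bucket_name(base_name, zone_id)
    s3.create_bucket(Bucket=name, CreateBucketConfiguration=directory_bucket_config(zone_id))
    return name
```

Calling create_directory_bucket("test-cache-personal-express", "usw2-az1", "us-west-2") mirrors the CLI example above.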
Lifecycle Rule Set Up
A timestamp stored in the item's fixed-width header is used to ensure that items expire at the correct time.
Lifecycle rules are used to cull stale items from the cache. Rules should be configured to cull objects by prefix.
For example, without a KEY_PREFIX:
- Objects under 7-days/ expire after 7 days
- Objects under 30-days/ expire after 30 days
{
"Rules": [
{
"ID": "Expire-7-days-prefix", (1)
"Filter": { "Prefix": "7-days/" }, (2)
"Status": "Enabled", (3)
"Expiration": { "Days": 7 } (4)
},
{
"ID": "Expire-30-days-prefix",
"Filter": { "Prefix": "30-days/" },
"Status": "Enabled",
"Expiration": { "Days": 30 }
}
]
}
① Give the rule a name
② Apply the rule to the "7-days/" prefix
③ Enable the rule
④ Set the expiration time to match the directory name
[!NOTE] If you configure KEY_PREFIX in your Django settings, this prefix is prepended to all keys. Your S3 Lifecycle rules must include the KEY_PREFIX when defining the filter. For example, if KEY_PREFIX = "cache-v1", then the 7-days rule should filter on cache-v1/7-days/ instead of just 7-days/.
These lifecycle rules complement the cache’s in-object header expiration. The header allows our implementation to short-circuit reads (treating expired items as misses), while S3 lifecycle policies ensure expired data is eventually deleted from the bucket.
The following script configures one lifecycle rule per N-days/ prefix, up to the 1,000-rule maximum for a bucket's lifecycle configuration. To run it, your IAM user or role must have at least the following permissions:
s3:PutLifecycleConfiguration, s3:GetLifecycleConfiguration
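import boto3
# Replace with your bucket name and the region it lives in
BUCKET_NAME = "your-bucket-name"
REGION = "us-west-2"
s3 = boto3.client("s3", region_name=REGION)
rules = []
# One rule per N-days/ prefix, for N = 1..1000 (a lifecycle configuration holds at most 1,000 rules)
for i in range(1, 1001):
    # The prefix must match the N-days/ key layout ("1-days/", "7-days/", ...).
    # If you set KEY_PREFIX, prepend it here, e.g. f"cache-v1/{i}-days/".
    prefix = f"{i}-days/"
    rules.append({
        "ID": f"expire-{i}-days",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": i},
    })
lifecycle_config = {"Rules": rules}
# This call replaces any lifecycle configuration already on the bucket
response = s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET_NAME,
    LifecycleConfiguration=lifecycle_config
)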
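The permissions above include s3:GetLifecycleConfiguration, so you can read the rules back to confirm that every N-days/ prefix is covered. This is a sketch under assumptions. missing_prefixes is a hypothetical helper that only checks enabled rules with a Filter.Prefix whose Expiration.Days equals N. It treats "7-days" and "7-days/" alike. Its optional key_prefix argument models the KEY_PREFIX note above.

```python
def missing_prefixes(client, bucket: str, max_days: int = 1000, key_prefix: str = "") -> list[str]:
    """Return the N-days/ prefixes that have no matching enabled expiration rule."""
    response = client.get_bucket_lifecycle_configuration(Bucket=bucket)
    covered = set()
    for rule in response.get("Rules", []):
        prefix = rule.get("Filter", {}).get("Prefix")
        days = rule.get("Expiration", {}).get("Days")
        if rule.get("Status") == "Enabled" and prefix is not None:
            # Normalize so "7-days" and "7-days/" are treated alike.
            covered.add((prefix.rstrip("/"), days))
    base = f"{key_prefix.strip('/')}/" if key_prefix else ""
    return [
        f"{base}{n}-days/"
        for n in range(1, max_days + 1)
        if (f"{base}{n}-days", n) not in covered
    ]
```

For example, missing_prefixes(s3, BUCKET_NAME) with the client from the script returns an empty list once all 1,000 rules are in place.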
Use It!
Once your backend is configured and lifecycle rules are in place, you can start using it like any other Django cache client.
1. Basic set/get operations
from django.core.cache import caches
client = caches["s3"]
# Store a value for 60 seconds
client.set("1-days:example-key", {"foo": "bar"}, timeout=60)
# Retrieve the value
value = client.get("1-days:example-key")
print(value) # {'foo': 'bar'}
# Check existence
exists = client.has_key("1-days:example-key")
print(exists) # True
2. Time-based prefixes
# Allowed: timeout <= 1 day
client.set("1-days:short-lived", "value", timeout=60 * 60) # 1 hour
# Not allowed: timeout exceeds prefix
client.set("1-days:too-long", "value", timeout=7 * 24 * 60 * 60)
# Raises ValueError
3. Expiration checks
The backend embeds an expiration timestamp in the object header. Expired objects still exist in S3 until lifecycle rules delete them, but reads will return None automatically.
import time
client.set("1-days:temp", "hello", timeout=5)
time.sleep(10)
print(client.get("1-days:temp")) # None
4. Deleting a value
client.delete("1-days:example-key")
5. Persistent objects (never expire)
You can store a persistent object by passing timeout=None. These objects are never considered expired by the backend, and their header expiration timestamp is set to 0. Be careful not to use a time-based prefix (N-days:) for persistent items, as that will raise a ValueError.
# Persistent key (never expires)
client.set("persistent:config", {"feature_flag": True}, timeout=None)
# Retrieve persistent object
value = client.get("persistent:config")
print(value) # {'feature_flag': True}
# Check existence
exists = client.has_key("persistent:config")
print(exists) # True
# Deleting persistent object
client.delete("persistent:config")
# Attempting to store a persistent object under a time-based prefix
client.set("1-days:persistent_config", {"feature_flag": True}, timeout=None)
# Raises ValueError
Roadmap
- Reserved header fields allow future compression support (zlib/zstd).
- clear() and touch() methods are open for contribution.
- Performance benchmarks are welcome.
Testing
python -m django test --settings 'tests.settings'
License
This repository is available under the permissive BSD license, making it easy and safe to incorporate in your own libraries.
Pull and feature requests are welcome.
Acknowledgements
Inspired by CourtListener issue #5304 and Django issue 32785.
Project details
Download files
File details
Details for the file django_s3_express_cache-0.1.0.tar.gz.
File metadata
- Download URL: django_s3_express_cache-0.1.0.tar.gz
- Upload date:
- Size: 23.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | de993257e79161e3ccb95d7f8cd54db50d329347524a97a23d1484c528c65541 |
| MD5 | 1a9e6ab76abe27d4f76b15d0493700a4 |
| BLAKE2b-256 | 603e2a1a092c6c7d7f13172bb6c61b86a0136338c4772e664d06808573c2ca60 |
File details
Details for the file django_s3_express_cache-0.1.0-py3-none-any.whl.
File metadata
- Download URL: django_s3_express_cache-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4afdcf962c8c7001d39ba0fb1db2e1ed2fb755b5aead35c75d87e2675bdabfd6 |
| MD5 | 91b3030709e756704d8102deb054c3ff |
| BLAKE2b-256 | 2f7affad8e5118a11da20b8398bb38732167110da8cdfb0f2c47366c44faaaa2 |