On-Disk Input-keyed Cache — disk-backed memoization with pydantic-aware encoding
emboss
Version: 0.2.0
pip install emboss # core (just diskcache)
pip install emboss[pydantic] # + pydantic v2 BaseModel support
Why
functools.lru_cache is per-process. diskcache survives invocations but pickles values as-is — which breaks the moment your cached return type is a pydantic BaseModel defined in __main__ (the new process can't unpickle __main__.MyModel). emboss fixes that by detecting BaseModel return annotations and converting to/from plain dicts at the cache boundary.
Plus: a None-aware sentinel so functions returning None actually cache instead of re-running every call.
Quick start
import diskcache
from emboss import cached
cache = diskcache.Cache("/tmp/my-cache")
@cached(cache)
def fetch(url: str) -> dict:
    import requests
    return requests.get(url).json()
fetch("https://api.example.com/users/1")  # network
fetch("https://api.example.com/users/1")  # cached, no network
Pydantic BaseModel returns
emboss reads the function's return type annotation. If it sees a BaseModel, list[BaseModel], dict[str, BaseModel], or BaseModel | None, it serialises via model.model_dump() before pickling and rehydrates via Model.model_validate(...) on read. The cached value on disk is a plain dict — round-trips cleanly across process boundaries, even for models defined in __main__.
from pydantic import BaseModel
class User(BaseModel):
    id: int
    name: str
@cached(cache)
def get_user(uid: int) -> User | None:
    ...
@cached(cache)
def list_users() -> list[User]:
    ...
@cached(cache)
def users_by_id() -> dict[str, User]:
    ...
Functions returning non-BaseModel types continue to pickle as-is — fully backward-compatible.
None caching
@cached(cache)
def lookup(query: str) -> str | None:
    return external_api(query)
lookup("missing")  # returns None, cached
lookup("missing")  # returns cached None, no re-run
The previous behaviour (skip-cache-on-None) is replaced by a _MISSING sentinel internally so None is a valid cached value.
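To illustrate why a sentinel is needed, here is a minimal sketch using a plain dict as the store. Only the name `_MISSING` comes from the text above; `cached_call`, `lookup`, and the key string are hypothetical and do not reflect emboss internals.

```python
_MISSING = object()  # unique sentinel: never identical to any real cached value


def cached_call(store: dict, key: str, func, *args):
    """Look up key; call func only when the key is genuinely absent."""
    hit = store.get(key, _MISSING)
    if hit is not _MISSING:
        return hit  # may legitimately be None
    value = func(*args)
    store[key] = value
    return value


lookup_calls = []


def lookup(query: str):
    lookup_calls.append(query)
    return None  # simulate "not found"


store: dict = {}
first = cached_call(store, "lookup:missing", lookup, "missing")
second = cached_call(store, "lookup:missing", lookup, "missing")
```

With `store.get(key)` alone, a stored `None` would be indistinguishable from a miss and `lookup` would run on every call.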
Cache key
Arguments are converted via safe_jsonable_encoder (recursive JSON-friendly conversion handling sets, bytes, dates, Path, BaseModel, and objects with __dict__), then hashed with the function source + name. Re-decorating the same function body → same key; changing the function body → new key (transparent cache invalidation on code change).
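The key derivation described above can be sketched roughly as follows. This is an assumption-laden illustration: `make_key` is a hypothetical name, SHA-256 is chosen arbitrarily (the source does not say which hash is used), `json.dumps` stands in for `safe_jsonable_encoder`, and the function source is passed in as a string (in practice it could come from `inspect.getsource`).

```python
import hashlib
import json


def make_key(name: str, source: str, args: tuple, kwargs: dict) -> str:
    """Hash function identity plus JSON-encoded arguments into a hex digest."""
    payload = json.dumps(
        {"args": list(args), "kwargs": kwargs},
        sort_keys=True,  # keyword order must not affect the key
        default=str,     # loose fallback, like the documented package default
    )
    h = hashlib.sha256()
    for part in (name, source, payload):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent parts cannot run together
    return h.hexdigest()


src_v1 = "def add(x, y):\n    return x + y\n"
src_v2 = "def add(x, y):\n    return x + y + 1\n"

k1 = make_key("add", src_v1, (1,), {"y": 2})
k2 = make_key("add", src_v1, (1,), {"y": 2})  # same body, same args
k3 = make_key("add", src_v2, (1,), {"y": 2})  # edited body
k4 = make_key("add", src_v1, (1,), {"y": 3})  # different args
```

Here `k1` and `k2` match, while editing the body (`k3`) or changing an argument (`k4`) produces a different key.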
Custom or strict encoder (default=)
safe_jsonable_encoder mirrors json.dumps(default=): pass a callable that handles types no built-in handler matched, or None for strict mode that raises on unknown types.
import hashlib
# strict mode: raise on anything we can't serialise
@cached(cache, default=None)
def f(x: dict) -> str:
    ...
# custom fallback, e.g. include a deterministic hash for opaque objects
def my_default(obj):
    # Prefer an explicit cache_key(); otherwise hash repr(obj), which is only
    # stable if the repr does not embed a memory address.
    return obj.cache_key() if hasattr(obj, "cache_key") else hashlib.md5(repr(obj).encode()).hexdigest()
@cached(cache, default=my_default)
def g(complicated_input) -> dict:
    ...
The package default is default=str, which preserves the loose 0.1 behaviour of falling back to str(obj). Use strict mode when your inputs include objects without __dict__ whose str(obj) includes a memory address — those addresses change every process invocation and would silently bust the cache key.
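The memory-address problem is easy to reproduce with the standard library. The sketch below uses `json.dumps(default=...)`, which the encoder is said to mirror, rather than emboss itself; `Opaque` is a hypothetical class with no `__dict__` and no custom `__str__`.

```python
import json


class Opaque:
    __slots__ = ()  # no __dict__, so only the default repr is available


# Loose mode: the fallback stringifies the object, address and all.
loose = json.dumps({"x": Opaque()}, default=str)
# loose resembles '{"x": "<__main__.Opaque object at 0x...>"}'; the address
# differs between runs, so a key derived from it would not match again.

# Strict mode: with default=None, json.dumps raises TypeError on unknown types.
try:
    json.dumps({"x": Opaque()}, default=None)
    strict_raised = False
except TypeError:
    strict_raised = True
```

Failing loudly in strict mode surfaces the unstable input at the first call instead of leaving a cache that never hits.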
Pluggable backends (Cache protocol)
cached accepts any object satisfying the runtime-checkable Cache protocol:
from typing import Any, Protocol, runtime_checkable
@runtime_checkable
class Cache(Protocol):
    def get(self, key: str, default: Any = None) -> Any: ...
    def set(self, key: str, value: Any) -> Any: ...
Structural typing — no inheritance required. diskcache.Cache, emboss.FileCache, and any custom Redis / in-memory adapter you write all work out of the box.
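As a sketch of what a custom adapter might look like, the block below redefines the protocol exactly as shown above (so it runs without emboss installed) and adds a hypothetical dict-backed `MemoryCache`. Passing such an object to `cached` is assumed to work per the text, but is not exercised here.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Cache(Protocol):  # copied from the protocol definition above
    def get(self, key: str, default: Any = None) -> Any: ...
    def set(self, key: str, value: Any) -> Any: ...


class MemoryCache:
    """Hypothetical in-memory adapter; note that it does not inherit from Cache."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: str, value: Any) -> bool:
        self._data[key] = value
        return True


mem = MemoryCache()
mem.set("k", 1)
is_cache = isinstance(mem, Cache)  # structural check: get and set are present
```

Note that a runtime-checkable protocol only verifies that the methods exist, not that their signatures match.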
FileCache backend — NFS-safe alternative to diskcache
from emboss import FileCache, cached
cache = FileCache(".data/cache")
@cached(cache)
def expensive(x: int) -> dict:
    ...
diskcache stores entries in SQLite, and SQLite over NFS has broken file-locking — two cluster nodes hitting the same .data/cache mount on VAST get sqlite3.OperationalError: locking protocol. FileCache writes one file per key via tempfile + os.replace (atomic rename, NFS-safe), with (key, value) pickled. Concurrent writers race on the same file path but POSIX rename is atomic and the winning version is by construction equally correct (cache values are pure functions of the key).
Drop-in for the subset of diskcache.Cache API @cached uses (get, set, __contains__, __getitem__, __setitem__, __delitem__, delete, clear, close, context-manager). Extra diskcache kwargs (timeout, size_limit, eviction_policy) are accepted and ignored so call sites switch with no code changes.
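The write path described in this section (temp file, then `os.replace`) can be sketched as below. This is not FileCache's actual implementation: `atomic_write`, `read_entry`, and the `abc123.pkl` file name are hypothetical, and how keys map to file names is left out.

```python
import os
import pickle
import tempfile
from pathlib import Path


def atomic_write(directory: Path, name: str, key: str, value: object) -> Path:
    """Pickle (key, value) to a temp file, then rename it into place."""
    directory.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory: os.replace is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump((key, value), fh)
        target = directory / name
        os.replace(tmp, target)  # readers see the old file or the new one, never a partial write
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # do not leave half-written temp files behind
        raise
    return target


def read_entry(path: Path):
    """Load the (key, value) tuple stored by atomic_write."""
    with path.open("rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as root:
    p = atomic_write(Path(root) / "cache", "abc123.pkl", "my-key", {"n": 1})
    entry = read_entry(p)
    leftovers = [f.name for f in p.parent.iterdir() if f.name.endswith(".tmp")]
```

Storing the key next to the value lets a reader confirm that the file it opened really belongs to the key it asked for.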
Async support
import httpx
@cached(cache)
async def fetch_async(url: str) -> dict:
    async with httpx.AsyncClient() as c:
        return (await c.get(url)).json()
Cache hits return a fresh awaitable wrapping the cached value, so the call site keeps await-ing as normal.
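One way such behaviour could be implemented is to wrap the coroutine function in another `async def`, so hits and misses are both delivered through `await`. The sketch below is a simplified assumption, not emboss's code: `async_cached` is a hypothetical decorator keyed on positional arguments only, backed by a plain dict.

```python
import asyncio
import functools


def async_cached(store: dict):
    """Hypothetical decorator: memoize a coroutine function in a plain dict."""
    def decorate(func):
        @functools.wraps(func)
        async def wrapper(*args):
            if args in store:
                return store[args]     # hit: still delivered via `await`
            value = await func(*args)  # miss: run the real coroutine once
            store[args] = value
            return value
        return wrapper
    return decorate


async_calls = []


@async_cached({})
async def fetch_async(url: str) -> dict:
    async_calls.append(url)
    await asyncio.sleep(0)  # stand-in for network I/O
    return {"url": url}


async def main():
    a = await fetch_async("https://api.example.com/users/1")
    b = await fetch_async("https://api.example.com/users/1")
    return a, b


a, b = asyncio.run(main())
```

Each call to `wrapper` creates a new coroutine object, so the same call can be awaited any number of times without the "cannot reuse already awaited coroutine" error.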
Daily-rolling caches
The diskcache.Cache instance you pass is yours to manage. A common pattern for "expire daily" without thinking about it:
from datetime import date
import diskcache
cache = diskcache.Cache(f"/tmp/my-cache-{date.today()}")
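Each new day gets a new directory, so the cache is effectively fresh. Old directories stay in /tmp until the system's temp-file cleanup removes them (on a reboot or a periodic sweep, depending on the OS configuration).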
License
MIT.