On-Disk Input-keyed Cache — disk-backed memoization with pydantic-aware encoding

emboss

Version: 0.2.0

pip install emboss              # core (just diskcache)
pip install emboss[pydantic]    # + pydantic v2 BaseModel support

Why

functools.lru_cache is per-process. diskcache survives invocations but pickles values as-is — which breaks the moment your cached return type is a pydantic BaseModel defined in __main__ (the new process can't unpickle __main__.MyModel). emboss fixes that by detecting BaseModel return annotations and converting to/from plain dicts at the cache boundary.

Plus: a None-aware sentinel so functions returning None actually cache instead of re-running every call.

Quick start

import diskcache
from emboss import cached

cache = diskcache.Cache("/tmp/my-cache")

@cached(cache)
def fetch(url: str) -> dict:
    import requests
    return requests.get(url).json()

fetch("https://api.example.com/users/1")  # network
fetch("https://api.example.com/users/1")  # cached, no network

Pydantic BaseModel returns

emboss reads the function's return type annotation. If it sees a BaseModel, list[BaseModel], dict[str, BaseModel], or BaseModel | None, it serialises via model.model_dump() before pickling and rehydrates via Model.model_validate(...) on read. The cached value on disk is a plain dict — round-trips cleanly across process boundaries, even for models defined in __main__.

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

@cached(cache)
def get_user(uid: int) -> User | None:
    ...

@cached(cache)
def list_users() -> list[User]:
    ...

@cached(cache)
def users_by_id() -> dict[str, User]:
    ...

Functions returning non-BaseModel types continue to pickle as-is — fully backward-compatible.
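The annotation-driven conversion described above can be sketched roughly as follows. This is not emboss's implementation: FakeModel is a hypothetical stand-in for pydantic.BaseModel (so the sketch runs without pydantic installed), and _walk, encode, and decode are illustrative names.

```python
import types
from typing import Any, Callable, Optional, Union, get_args, get_origin


class FakeModel:
    """Hypothetical stand-in for pydantic.BaseModel (model_dump / model_validate only)."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)

    def model_dump(self) -> dict:
        return dict(self.__dict__)

    @classmethod
    def model_validate(cls, data: dict) -> "FakeModel":
        return cls(**data)

    def __eq__(self, other: object) -> bool:
        return type(other) is type(self) and self.__dict__ == other.__dict__


class User(FakeModel):
    pass


# Optional[X] reports typing.Union as its origin; X | None reports types.UnionType.
_UNIONS = (Union, getattr(types, "UnionType", Union))


def _is_model(tp: Any) -> bool:
    return get_origin(tp) is None and isinstance(tp, type) and issubclass(tp, FakeModel)


def _walk(value: Any, tp: Any, leaf: Callable[[type, Any], Any]) -> Any:
    """Apply leaf(model_cls, item) wherever the annotation says a model sits."""
    if value is None:
        return None
    origin, args = get_origin(tp), get_args(tp)
    if _is_model(tp):
        return leaf(tp, value)
    if origin is list and args:
        return [_walk(v, args[0], leaf) for v in value]
    if origin is dict and len(args) == 2:
        return {k: _walk(v, args[1], leaf) for k, v in value.items()}
    if origin in _UNIONS:
        inner = [a for a in args if a is not type(None)]
        if len(inner) == 1:
            return _walk(value, inner[0], leaf)
    return value  # any other type passes through untouched


def encode(value: Any, tp: Any) -> Any:
    """Model instances -> plain dicts, ready to be pickled."""
    return _walk(value, tp, lambda cls, v: v.model_dump())


def decode(value: Any, tp: Any) -> Any:
    """Plain dicts -> model instances, after unpickling."""
    return _walk(value, tp, lambda cls, d: cls.model_validate(d))
```

Because only plain dicts, lists, and scalars reach the pickler, the reading process never has to import the model class by its pickled module path.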

None caching

@cached(cache)
def lookup(query: str) -> str | None:
    return external_api(query)  # placeholder for any call that may return None

lookup("missing")  # returns None, cached
lookup("missing")  # returns cached None, no re-run

The previous behaviour (skip-cache-on-None) is replaced by a _MISSING sentinel internally so None is a valid cached value.
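A minimal sketch of the sentinel idea, assuming a plain dict as the store. The memoize helper is hypothetical and only illustrates why a dedicated marker lets None count as a real cached value; emboss's internals may differ.

```python
from typing import Any, Callable

_MISSING = object()  # unique marker meaning "no entry", distinct from a cached None


def memoize(store: dict, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical helper: cache every result, including None."""

    def wrapper(*args: Any) -> Any:
        hit = store.get(args, _MISSING)
        if hit is not _MISSING:  # a stored None is still a hit
            return hit
        result = fn(*args)
        store[args] = result
        return result

    return wrapper


calls = []


def lookup(query: str):
    calls.append(query)  # record each real execution
    return None


cached_lookup = memoize({}, lookup)
cached_lookup("missing")  # runs lookup, stores None
cached_lookup("missing")  # served from the store, lookup is not re-run
```

Testing `hit is None` instead would treat the stored None as a miss and re-run the function on every call.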

Cache key

Arguments are converted via safe_jsonable_encoder (a recursive JSON-friendly conversion that handles sets, bytes, dates, Path, BaseModel, and objects with __dict__), then hashed together with the function's source and name. Re-decorating the same function body gives the same key; changing the function body gives a new key, so code changes invalidate the cache transparently.
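The key derivation can be sketched like this. Everything here is an assumption for illustration: to_jsonable is a cut-down stand-in for safe_jsonable_encoder, make_key is a hypothetical name, and SHA-256 over a sorted JSON payload is one reasonable choice, not necessarily what emboss uses. The function source is passed in as a string; in practice it could come from inspect.getsource.

```python
import hashlib
import json
from datetime import date
from pathlib import Path
from typing import Any


def to_jsonable(obj: Any) -> Any:
    """Illustrative stand-in for safe_jsonable_encoder; covers only a few types."""
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, (set, frozenset)):
        # Sort so that set iteration order cannot change the key.
        return sorted((to_jsonable(v) for v in obj), key=repr)
    if isinstance(obj, bytes):
        return obj.hex()
    if isinstance(obj, (date, Path)):
        return str(obj)
    return obj


def make_key(name: str, source: str, args: tuple, kwargs: dict) -> str:
    """Hash the function identity (name + source) together with its arguments."""
    payload = json.dumps(
        {
            "name": name,
            "source": source,
            "args": to_jsonable(args),
            "kwargs": to_jsonable(kwargs),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


src_v1 = "def f(x):\n    return x + 1\n"
src_v2 = "def f(x):\n    return x + 2\n"

key_a = make_key("f", src_v1, ({3, 1, 2},), {"p": Path("a/b")})
key_b = make_key("f", src_v1, ({2, 3, 1},), {"p": Path("a/b")})  # same inputs
key_c = make_key("f", src_v2, ({3, 1, 2},), {"p": Path("a/b")})  # edited body
```

Here key_a and key_b match because the inputs are equal, while key_c differs because the source text changed.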

Custom or strict encoder (default=)

safe_jsonable_encoder mirrors json.dumps(default=): pass a callable that handles types no built-in handler matched, or None for strict mode that raises on unknown types.

# strict mode — raise on anything we can't serialise
@cached(cache, default=None)
def f(x: dict) -> str:
    ...

# custom fallback, e.g. a deterministic hash for opaque objects
import hashlib

def my_default(obj):
    if hasattr(obj, "cache_key"):
        return obj.cache_key()
    # Only deterministic if repr(obj) is stable across processes (no memory address).
    return hashlib.md5(repr(obj).encode()).hexdigest()

@cached(cache, default=my_default)
def g(complicated_input) -> dict:
    ...

The package default is default=str, which preserves the loose 0.1 behaviour of falling back to str(obj). Use strict mode when your inputs include objects without __dict__ whose str(obj) includes a memory address — those addresses change every process invocation and would silently bust the cache key.
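To see why the str(obj) fallback can be a problem, here is a small stdlib-only illustration using json.dumps(default=), which safe_jsonable_encoder is said to mirror. The Opaque class is hypothetical; it has no __dict__ and no custom __str__, so its string form is the default object repr.

```python
import json


class Opaque:
    """No __dict__ and no custom __str__: str() falls back to object.__repr__."""

    __slots__ = ()


obj = Opaque()

# Loose fallback: json.dumps calls default=str for types it cannot encode.
# The result embeds the object's memory address, which differs between runs.
loose = json.dumps({"arg": obj}, default=str)

# Strict: with no default, json.dumps raises TypeError on unknown types.
try:
    json.dumps({"arg": obj})
    strict_raised = False
except TypeError:
    strict_raised = True
```

With the loose fallback the address leaks into the encoded arguments, so a key built from them would change on every process start; strict mode surfaces the problem as an error instead.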

Pluggable backends (Cache protocol)

cached accepts any object satisfying the runtime-checkable Cache protocol:

from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class Cache(Protocol):
    def get(self, key: str, default: Any = None) -> Any: ...
    def set(self, key: str, value: Any) -> Any: ...

Structural typing — no inheritance required. diskcache.Cache, emboss.FileCache, and any custom Redis / in-memory adapter you write all work out of the box.
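As a sketch of what such an adapter might look like, here is a hypothetical in-memory backend. The protocol is copied from the definition above so the example runs on its own; DictCache is an illustrative name, not part of emboss.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Cache(Protocol):
    # Copied from the protocol definition shown above.
    def get(self, key: str, default: Any = None) -> Any: ...
    def set(self, key: str, value: Any) -> Any: ...


class DictCache:
    """Hypothetical in-memory adapter; note that it does not inherit from Cache."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: str, value: Any) -> Any:
        self._data[key] = value
        return True


backend = DictCache()
backend.set("k", 42)

# runtime_checkable isinstance() only checks that the methods exist,
# not that their signatures match.
is_cache = isinstance(backend, Cache)
```

An object like backend could then be passed wherever a diskcache.Cache is shown in the other examples.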

FileCache backend — NFS-safe alternative to diskcache

from emboss import FileCache, cached

cache = FileCache(".data/cache")

@cached(cache)
def expensive(x: int) -> dict:
    ...

diskcache stores entries in SQLite, and SQLite's file locking is unreliable over NFS: two cluster nodes hitting the same .data/cache mount on VAST get sqlite3.OperationalError: locking protocol. FileCache instead writes one file per key via tempfile + os.replace (an atomic rename, NFS-safe), storing the pickled (key, value) pair. Concurrent writers may race on the same file path, but POSIX rename is atomic, and whichever version wins is equally correct by construction, because cache values are pure functions of the key.
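The write path can be sketched with the standard library alone. This is a simplified illustration, not FileCache's actual code: atomic_set and atomic_get are hypothetical names, and deriving the file name from a SHA-256 of the key is an assumption.

```python
import hashlib
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any


def _path_for(root: Path, key: str) -> Path:
    # File name derived from the key; this hashing scheme is an assumption.
    return root / hashlib.sha256(key.encode()).hexdigest()


def atomic_set(root: Path, key: str, value: Any) -> Path:
    """Write (key, value) to a temp file, then rename it into place."""
    root.mkdir(parents=True, exist_ok=True)
    target = _path_for(root, key)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=root, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump((key, value), fh)
        os.replace(tmp, target)  # readers see the old file or the new one, never a partial write
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return target


def atomic_get(root: Path, key: str, default: Any = None) -> Any:
    """Read the entry back; a missing file or a mismatched key counts as a miss."""
    try:
        with open(_path_for(root, key), "rb") as fh:
            stored_key, value = pickle.load(fh)
    except FileNotFoundError:
        return default
    return value if stored_key == key else default
```

Storing the key alongside the value lets a reader detect a file-name collision instead of returning another key's value.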

Drop-in for the subset of diskcache.Cache API @cached uses (get, set, __contains__, __getitem__, __setitem__, __delitem__, delete, clear, close, context-manager). Extra diskcache kwargs (timeout, size_limit, eviction_policy) are accepted and ignored so call sites switch with no code changes.

Async support

import httpx

@cached(cache)
async def fetch_async(url: str) -> dict:
    async with httpx.AsyncClient() as c:
        return (await c.get(url)).json()

Cache hits return a fresh awaitable wrapping the cached value, so the call site keeps await-ing as normal.
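One way this behaviour can be achieved is to make the wrapper itself a coroutine function, so every call produces a new awaitable whether or not the value was cached. The sketch below assumes a plain dict store; cached_async is a hypothetical helper, not emboss's decorator.

```python
import asyncio
from typing import Any, Awaitable, Callable

_MISSING = object()


def cached_async(store: dict, fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
    """Hypothetical helper: memoize a coroutine function in a dict."""

    async def wrapper(*args: Any) -> Any:
        hit = store.get(args, _MISSING)
        if hit is not _MISSING:
            return hit  # the caller still awaits a new coroutine from wrapper(...)
        result = await fn(*args)
        store[args] = result  # store the value, never the coroutine object
        return result

    return wrapper


calls = []


async def slow_double(x: int) -> int:
    calls.append(x)  # record each real execution
    await asyncio.sleep(0)
    return x * 2


fast_double = cached_async({}, slow_double)


async def main() -> tuple:
    first = await fast_double(21)   # runs slow_double
    second = await fast_double(21)  # cache hit, still awaited as normal
    return first, second


results = asyncio.run(main())
```

Caching the resolved value rather than the coroutine matters because a coroutine object can only be awaited once.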

Daily-rolling caches

The diskcache.Cache instance you pass is yours to manage. A common pattern for "expire daily" without thinking about it:

from datetime import date
import diskcache
cache = diskcache.Cache(f"/tmp/my-cache-{date.today()}")

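Each new day gives a new directory, so the cache is effectively fresh. Old directories stay in /tmp until the OS's temp-file cleanup removes them (for example on reboot or via systemd-tmpfiles, depending on configuration).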

License

MIT.
