A Python manager for the cache

cache-manager

Description

cache-manager is a lightweight, Pythonic cache for files with an SQLite registry. It lets you:

  • Store files under a cache directory while tracking metadata in SQLite.
  • Version entries automatically based on a stable key derived from a URI/parameters.
  • Track and query item status (UNINITIALIZED, WRITE, READY, FAILED, TRASH).
  • Attach type-aware attributes (int, float, varchar, datetime, text) and search by them.
  • Open plain and compressed files (gz, zip, tar(.gz|.bz2)) via a unified interface.
  • Clean up stale records/files and keep the best/most recent ready entries.
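The type-aware attributes mentioned in the list above can be pictured with a small SQLite sketch. This is only an illustration under stated assumptions: the table names (`attr_int`, `attr_float`, `attr_varchar`), the column layout, and the helpers `set_attrs` and `find_by_attrs` are hypothetical and are not taken from cachedir's actual schema or API.

```python
import sqlite3

# Hypothetical schema: one table per value type, keyed by item id.
TABLE_FOR_TYPE = {int: "attr_int", float: "attr_float", str: "attr_varchar"}

con = sqlite3.connect(":memory:")
for table in TABLE_FOR_TYPE.values():
    con.execute(f"CREATE TABLE {table} (item_id INTEGER, name TEXT, value)")


def set_attrs(item_id, attrs):
    """Route each attribute to the table that matches its Python type."""
    for name, value in attrs.items():
        table = TABLE_FOR_TYPE[type(value)]
        con.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (item_id, name, value))


def find_by_attrs(attrs):
    """Return the ids of items that match every requested attribute."""
    matches = None
    for name, value in attrs.items():
        table = TABLE_FOR_TYPE[type(value)]
        rows = con.execute(
            f"SELECT item_id FROM {table} WHERE name = ? AND value = ?",
            (name, value),
        )
        ids = {row[0] for row in rows}
        # Intersect so that all attributes must match.
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])


set_attrs(1, {"project": "alpha", "batch": 1, "score": 0.95})
set_attrs(2, {"project": "alpha", "batch": 2, "score": 0.71})
print(find_by_attrs({"project": "alpha", "batch": 2}))
```

Keeping one table per type lets each value be compared in its native form, which is one plausible reason to separate attributes by type.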

This is ideal for reproducible data pipelines and ETL steps where you want deterministic, discoverable artifacts.

Installation

Clone and install in editable mode (no extra tools required):

git clone https://github.com/saezlab/cache-manager.git
cd cache-manager
python -m venv .venv
source .venv/bin/activate
pip install -e .

Alternatively, if you prefer Poetry:

git clone https://github.com/saezlab/cache-manager.git
cd cache-manager
poetry install

Usage

The API centers around two types: Cache (manager) and CacheItem (one file + metadata). Create a cache, create or retrieve items, write files, mark them READY, and open them later.

Minimal example:

import cachedir as cm
from cachedir._status import Status

cache = cm.Cache(path="./my_cache")
item = cache.create(
    uri="https://example.org/data.tsv",
    params={"dataset": "demo", "version": 1},
    attrs={"species": "human", "rows": 1200},
    status=Status.WRITE.value,
    filename="data.tsv",
)

with open(item.path, "w", encoding="utf-8") as f:
    f.write("col_a\tcol_b\n1\t2\n")

item.ready()
best = cache.best(
    uri="https://example.org/data.tsv",
    params={"dataset": "demo", "version": 1},
)
print(best, best.path)

Run the included example script, which downloads a real dataset and caches it:

python scripts/hello_cachedir.py

Configuration

There is no global config file; you configure the cache per instance:

  • Cache(path: str | None = None, pkg: str | None = None)
    • path: explicit directory for cache (contains the SQLite registry and files).
    • pkg: if set, uses an OS-specific cache directory for that application name via platformdirs (e.g., on Linux: ~/.cache/<pkg>).
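The choice between `path` and `pkg` could be resolved roughly as follows. This is a sketch, not the library's code: `resolve_cache_dir` is a hypothetical helper, the precedence of `path` over `pkg` is an assumption, and so is the behaviour when neither is given (the sketch raises; what cachedir does in that case is not stated here). `platformdirs.user_cache_dir` is the real platformdirs function for per-user cache directories.

```python
from pathlib import Path


def resolve_cache_dir(path=None, pkg=None):
    """Hypothetical helper: pick a cache directory from `path` or `pkg`."""
    if path is not None:
        # Assumption: an explicit path takes precedence over `pkg`.
        return Path(path).expanduser()
    if pkg is not None:
        try:
            # Per-user, OS-specific cache directory for the application name.
            from platformdirs import user_cache_dir
            return Path(user_cache_dir(pkg))
        except ImportError:
            # Assumed fallback (Linux-style layout) if platformdirs is absent.
            return Path.home() / ".cache" / pkg
    # Assumption: the sketch refuses to guess when nothing is given.
    raise ValueError("provide either `path` or `pkg`")


print(resolve_cache_dir(path="./my_cache"))
print(resolve_cache_dir(pkg="my_app"))
```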

Common item fields:

  • uri (str): a canonical identifier used for the hash key (together with params).
  • params (dict): serialized to the stable key; changing them yields a new key.
  • attrs (dict): typed attributes persisted to attribute tables for rich queries.
  • status (int): from cachedir._status.Status (READY, WRITE, etc.).
  • filename (str): filename to be used in the cache; extension is auto-inferred.
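To illustrate how `uri` and `params` can yield a stable key, here is a minimal sketch. The function `stable_key`, the JSON canonicalisation, and the use of SHA-256 are assumptions made for illustration; the actual hashing scheme used by cachedir may differ.

```python
import hashlib
import json


def stable_key(uri, params=None):
    """Hypothetical key function: hash the URI with canonicalised params."""
    payload = json.dumps(
        {"uri": uri, "params": params or {}},
        sort_keys=True,         # key order must not affect the hash
        separators=(",", ":"),  # fixed separators, no whitespace drift
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = stable_key("https://example.org/data.tsv", {"dataset": "demo", "version": 1})
b = stable_key("https://example.org/data.tsv", {"version": 1, "dataset": "demo"})
c = stable_key("https://example.org/data.tsv", {"dataset": "demo", "version": 2})
print(a == b)  # same params, different order
print(a == c)  # one param value changed
```

Sorting the keys before hashing is what makes the key independent of dictionary ordering, while any change in a value produces a different digest.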

Logging/session helpers are available under cachedir.session and cachedir.log if you want simple trace output.

Examples

  1. Create or reuse an item with best_or_new.
import os
import cachedir as cm
from cachedir._status import Status

cache = cm.Cache(path="./my_cache")
uri = "https://example.org/report.csv"
params = {"year": 2026, "cohort": "A"}

item = cache.best_or_new(
    uri=uri,
    params=params,
    attrs={"kind": "report", "format": "csv"},
    filename="report.csv",
    new_status=Status.WRITE.value,
)

if item.status != Status.READY.value or not os.path.exists(item.path):
    with open(item.path, "w", encoding="utf-8") as f:
        f.write("id,value\n1,42\n")
    item.ready()

print("Using:", item.path)
  2. Query by attributes and metadata.
import cachedir as cm
from cachedir._status import Status

cache = cm.Cache(path="./my_cache")

cache.create(
    uri="demo://sample-1",
    attrs={"project": "alpha", "batch": 1, "score": 0.95},
    status=Status.READY.value,
)
cache.create(
    uri="demo://sample-2",
    attrs={"project": "alpha", "batch": 2, "score": 0.71},
    status=Status.READY.value,
)

ids = cache.by_attrs({"project": "alpha", "batch": 2})
print("matching ids:", ids)

items = cache.search(uri="demo://sample-2", status=Status.READY.value)
for it in items:
    print(it.version_id, it.attrs)
  3. Open a cached file through CacheItem.open.
import cachedir as cm
from cachedir._status import Status

cache = cm.Cache(path="./my_cache")
item = cache.create(
    uri="demo://text-file",
    filename="hello.txt",
    status=Status.WRITE.value,
)

with open(item.path, "w", encoding="utf-8") as f:
    f.write("hello\nworld\n")

item.ready()

opened = item.open(default_mode="r", encoding="utf-8", large=True)
print(next(iter(opened.result)).strip())
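
For intuition about what a unified interface over plain and compressed files involves, the following standard-library sketch dispatches on the file suffix. It is not `CacheItem.open`: the helper `open_any` is hypothetical, it reads text only, and it covers just plain and gzip files (zip and tar archives are left out).

```python
import gzip
import os
import tempfile


def open_any(path, encoding="utf-8"):
    """Open `path` for reading text, transparently decompressing .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "hello.txt")
    packed = os.path.join(tmp, "hello.txt.gz")

    # Write the same content once as plain text and once gzip-compressed.
    with open(plain, "w", encoding="utf-8") as f:
        f.write("hello\nworld\n")
    with gzip.open(packed, "wt", encoding="utf-8") as f:
        f.write("hello\nworld\n")

    # Read the first line of each file through the same call.
    lines = []
    for p in (plain, packed):
        with open_any(p) as f:
            lines.append(next(iter(f)).strip())

print(lines)
```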

Contributing

Contributions are welcome! Please open issues and pull requests on GitHub. If you plan a larger change, consider discussing it in an issue first. (A dedicated CONTRIBUTING.md may be added later.)

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

Contact

OmniPath Team - omnipathdb@gmail.com

Project page: https://github.com/saezlab/cache-manager
