A Python manager for the cache

cache-manager

Description

cache-manager is a lightweight, Pythonic cache for files with an SQLite registry. It lets you:

  • Store files under a cache directory while tracking metadata in SQLite.
  • Version entries automatically based on a stable key derived from a URI/parameters.
  • Track and query item status (UNINITIALIZED, WRITE, READY, FAILED, TRASH).
  • Attach type-aware attributes (int, float, varchar, datetime, text) and search by them.
  • Open plain and compressed files (gz, zip, tar(.gz|.bz2)) via a unified interface.
  • Clean up stale records/files and keep the best/most recent ready entries.

This is ideal for reproducible data pipelines and ETL steps where you want deterministic, discoverable artifacts.
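
As an illustration of what a unified interface over plain and compressed files can look like, here is a minimal standard-library sketch. The helper `iter_lines` is hypothetical and is not part of cache-manager; it dispatches on the file extension only, covers plain, gz and zip inputs, and leaves tar archives out for brevity.

```python
import gzip
import io
import os
import tempfile
import zipfile
from typing import Iterator


def iter_lines(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Hypothetical helper: yield text lines from a plain, .gz, or .zip file."""
    if path.endswith(".gz"):
        with gzip.open(path, "rt", encoding=encoding) as handle:
            yield from handle
    elif path.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            # Read only the first member; a fuller version would let the caller choose.
            first = archive.namelist()[0]
            with archive.open(first) as raw:
                yield from io.TextIOWrapper(raw, encoding=encoding)
    else:
        with open(path, "r", encoding=encoding) as handle:
            yield from handle


# Write the same content as .gz and .zip, then read both back the same way.
tmp_dir = tempfile.mkdtemp()
content = "col_a\tcol_b\n1\t2\n"

gz_path = os.path.join(tmp_dir, "data.tsv.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as handle:
    handle.write(content)

zip_path = os.path.join(tmp_dir, "data.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("data.tsv", content)

gz_lines = list(iter_lines(gz_path))
zip_lines = list(iter_lines(zip_path))
print(gz_lines == zip_lines)
```

The caller iterates over lines without knowing how the file is stored, which is the property the feature list describes.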

Installation

Clone and install in editable mode (no extra tools required):

git clone https://github.com/saezlab/cache-manager.git
cd cache-manager
python -m venv .venv
source .venv/bin/activate
pip install -e .

Alternatively, if you prefer Poetry:

git clone https://github.com/saezlab/cache-manager.git
cd cache-manager
poetry install

Usage

The API centers around two types: Cache (manager) and CacheItem (one file + metadata). Create a cache, create or retrieve items, write files, mark them READY, and open them later.

Minimal example:

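import cache_manager as cm
from cache_manager._status import Status

# Use ./my_cache as the cache directory (registry and files).
cache = cm.Cache(path="./my_cache")

# Register a new item in WRITE status; uri and params determine its key.
item = cache.create(
    uri="https://example.org/data.tsv",
    params={"dataset": "demo", "version": 1},
    attrs={"species": "human", "rows": 1200},
    status=Status.WRITE.value,
    filename="data.tsv",
)

# Write the file contents at the path assigned by the cache.
with open(item.path, "w", encoding="utf-8") as f:
    f.write("col_a\tcol_b\n1\t2\n")

# Mark the item READY, then look up the best entry for the same uri and params.
item.ready()
best = cache.best(
    uri="https://example.org/data.tsv",
    params={"dataset": "demo", "version": 1},
)
print(best, best.path)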

Run the included example script, which downloads a real dataset and caches it:

python scripts/hello_cache_manager.py

Configuration

There is no global config file; you configure the cache per instance:

  • Cache(path: str | None = None, pkg: str | None = None)
    • path: explicit directory for cache (contains the SQLite registry and files).
    • pkg: if set, uses an OS-specific cache directory for that application name via platformdirs (e.g., on Linux: ~/.cache/<pkg>).
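
To make the two options concrete, the sketch below shows one way such a directory choice could be made. It is not cache-manager's code: `resolve_cache_dir` is a hypothetical helper, the precedence of `path` over `pkg` and the error when neither is given are assumptions, and the Linux-style `~/.cache/<pkg>` fallback stands in for what platformdirs provides across operating systems.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_cache_dir(path: Optional[str] = None, pkg: Optional[str] = None) -> Path:
    """Hypothetical helper: choose a cache directory from `path` or `pkg`."""
    if path is not None:
        # Assumption: an explicit path takes precedence over the application name.
        return Path(path).expanduser()
    if pkg is not None:
        # Linux-style default only; platformdirs handles other operating systems.
        base = os.environ.get("XDG_CACHE_HOME") or os.path.join("~", ".cache")
        return Path(base).expanduser() / pkg
    # Assumption: this sketch refuses to guess when neither argument is given.
    raise ValueError("either path or pkg must be given")


explicit_dir = resolve_cache_dir(path="./my_cache")
app_dir = resolve_cache_dir(pkg="demo_app")
print(explicit_dir)
print(app_dir.name)
```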

Common item fields:

  • uri (str): a canonical identifier used for the hash key (together with params).
  • params (dict): serialized to the stable key; changing them yields a new key.
  • attrs (dict): typed attributes persisted to attribute tables for rich queries.
  • status (int): from cache_manager._status.Status (READY, WRITE, etc.).
  • filename (str): filename to be used in the cache; extension is auto-inferred.
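
The sketch below shows how a stable key can be derived from `uri` and `params`. It is an illustration only: the helper `stable_key`, the JSON serialization and the choice of SHA-256 are assumptions, and the scheme cache-manager actually uses may differ.

```python
import hashlib
import json
from typing import Optional


def stable_key(uri: str, params: Optional[dict] = None) -> str:
    """Hypothetical helper: hash a URI plus parameters into a hex key."""
    # sort_keys makes the JSON text independent of dict insertion order.
    payload = json.dumps(
        {"uri": uri, "params": params or {}},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


uri = "https://example.org/data.tsv"
key_a = stable_key(uri, {"dataset": "demo", "version": 1})
key_b = stable_key(uri, {"version": 1, "dataset": "demo"})
key_c = stable_key(uri, {"dataset": "demo", "version": 2})

print(key_a == key_b)  # same parameters, different order
print(key_a == key_c)  # one parameter changed
```

Sorting the keys before hashing is what makes the key stable: two calls with the same parameters in a different order map to the same entry, while any changed value produces a different key.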

Logging/session helpers are available under cache_manager.session and cache_manager.log if you want simple trace output.

Examples

  1. Create or reuse an item with best_or_new.

import os
import cache_manager as cm
from cache_manager._status import Status

cache = cm.Cache(path="./my_cache")
uri = "https://example.org/report.csv"
params = {"year": 2026, "cohort": "A"}

item = cache.best_or_new(
    uri=uri,
    params=params,
    attrs={"kind": "report", "format": "csv"},
    filename="report.csv",
    new_status=Status.WRITE.value,
)

if item.status != Status.READY.value or not os.path.exists(item.path):
    with open(item.path, "w", encoding="utf-8") as f:
        f.write("id,value\n1,42\n")
    item.ready()

print("Using:", item.path)

  2. Query by attributes and metadata.

import cache_manager as cm
from cache_manager._status import Status

cache = cm.Cache(path="./my_cache")

cache.create(
    uri="demo://sample-1",
    attrs={"project": "alpha", "batch": 1, "score": 0.95},
    status=Status.READY.value,
)
cache.create(
    uri="demo://sample-2",
    attrs={"project": "alpha", "batch": 2, "score": 0.71},
    status=Status.READY.value,
)

ids = cache.by_attrs({"project": "alpha", "batch": 2})
print("matching ids:", ids)

items = cache.search(uri="demo://sample-2", status=Status.READY.value)
for it in items:
    print(it.version_id, it.attrs)

  3. Open a cached file through CacheItem.open.

import cache_manager as cm
from cache_manager._status import Status

cache = cm.Cache(path="./my_cache")
item = cache.create(
    uri="demo://text-file",
    filename="hello.txt",
    status=Status.WRITE.value,
)

with open(item.path, "w", encoding="utf-8") as f:
    f.write("hello\nworld\n")

item.ready()

opened = item.open(default_mode="r", encoding="utf-8", large=True)
print(next(iter(opened.result)).strip())
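
For readers curious how typed attributes can be stored and searched in SQLite, as in the attribute query example above, here is a self-contained sketch. The table names, columns and the helpers `add_attrs` and `find_by_attrs` are hypothetical; cache-manager's real schema and query logic may look different.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: one attribute table per value type.
con.executescript(
    """
    CREATE TABLE attr_int (item_id INTEGER, name TEXT, value INTEGER);
    CREATE TABLE attr_float (item_id INTEGER, name TEXT, value REAL);
    CREATE TABLE attr_varchar (item_id INTEGER, name TEXT, value TEXT);
    """
)

TABLE_FOR_TYPE = {int: "attr_int", float: "attr_float", str: "attr_varchar"}


def add_attrs(item_id: int, attrs: dict) -> None:
    """Store each attribute in the table that matches its Python type."""
    for name, value in attrs.items():
        table = TABLE_FOR_TYPE[type(value)]
        con.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (item_id, name, value))


def find_by_attrs(attrs: dict) -> list:
    """Return sorted ids of items that match every given attribute."""
    matches = None
    for name, value in attrs.items():
        table = TABLE_FOR_TYPE[type(value)]
        rows = con.execute(
            f"SELECT item_id FROM {table} WHERE name = ? AND value = ?",
            (name, value),
        )
        ids = {row[0] for row in rows}
        # Intersect so that all conditions must hold for the same item.
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])


add_attrs(1, {"project": "alpha", "batch": 1, "score": 0.95})
add_attrs(2, {"project": "alpha", "batch": 2, "score": 0.71})

print(find_by_attrs({"project": "alpha", "batch": 2}))  # [2]
print(find_by_attrs({"project": "alpha"}))  # [1, 2]
```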

Contributing

Contributions are welcome! Please open issues and pull requests on GitHub. If you plan a larger change, consider discussing it in an issue first. (A dedicated CONTRIBUTING.md may be added later.)

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

Contact

OmniPath Team - omnipathdb@gmail.com

Project page: https://github.com/saezlab/cache-manager
