A Python manager for the cache

cachedir

Description

cachedir is a lightweight, Pythonic cache for files with an SQLite registry. It lets you:

  • Store files under a cache directory while tracking metadata in SQLite.
  • Version entries automatically based on a stable key derived from a URI/parameters.
  • Track and query item status (UNINITIALIZED, WRITE, READY, FAILED, TRASH).
  • Attach type-aware attributes (int, float, varchar, datetime, text) and search by them.
  • Open plain and compressed files (gz, zip, tar(.gz|.bz2)) via a unified interface.
  • Clean up stale records/files and keep the best/most recent ready entries.

It is intended for reproducible data pipelines and ETL steps that need deterministic, discoverable artifacts.
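
The type-aware attributes in the feature list can be pictured as one SQLite table per value type. The following is a minimal sketch using only the standard `sqlite3` module. The table names (`attr_int`, `attr_varchar`), their columns, and the helpers `add_attrs` and `find_items` are hypothetical and do not describe cachedir's actual schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per value type; names and columns are illustrative only.
conn.executescript(
    """
    CREATE TABLE attr_int (item_id INTEGER, name TEXT, value INTEGER);
    CREATE TABLE attr_varchar (item_id INTEGER, name TEXT, value TEXT);
    """
)

TABLE_FOR_TYPE = {int: "attr_int", str: "attr_varchar"}


def add_attrs(item_id, attrs):
    """Route each attribute to the table matching its Python type."""
    for name, value in attrs.items():
        table = TABLE_FOR_TYPE[type(value)]
        conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (item_id, name, value))


def find_items(attrs):
    """Return the sorted ids of items that match every given attribute."""
    matches = None
    for name, value in attrs.items():
        table = TABLE_FOR_TYPE[type(value)]
        rows = conn.execute(
            f"SELECT item_id FROM {table} WHERE name = ? AND value = ?", (name, value)
        )
        ids = {row[0] for row in rows}
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])


add_attrs(1, {"project": "alpha", "batch": 1})
add_attrs(2, {"project": "alpha", "batch": 2})
print(find_items({"project": "alpha", "batch": 2}))  # [2]
```

Keeping values in typed columns lets SQLite compare integers as integers and strings as strings, which is one plausible reason to split attributes by type.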

Installation

Clone and install in editable mode (no extra tools required):

git clone https://github.com/saezlab/cachedir.git
cd cachedir
python -m venv .venv
source .venv/bin/activate
pip install -e .

Alternatively, if you prefer Poetry:

git clone https://github.com/saezlab/cachedir.git
cd cachedir
poetry install

Usage

The API centers around two types: Cache (manager) and CacheItem (one file + metadata). Create a cache, create or retrieve items, write files, mark them READY, and open them later.

Minimal example:

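import cachedir as cm
from cachedir._status import Status

# Cache rooted at ./my_cache (holds the SQLite registry and the files).
cache = cm.Cache(path="./my_cache")

# Register a new item in WRITE state; uri and params together determine its key.
item = cache.create(
    uri="https://example.org/data.tsv",
    params={"dataset": "demo", "version": 1},
    attrs={"species": "human", "rows": 1200},
    status=Status.WRITE.value,
    filename="data.tsv",
)

# Write the file content to the path reserved for this item.
with open(item.path, "w", encoding="utf-8") as f:
    f.write("col_a\tcol_b\n1\t2\n")

# Mark the item READY, then look up the best entry for the same uri and params.
item.ready()
best = cache.best(
    uri="https://example.org/data.tsv",
    params={"dataset": "demo", "version": 1},
)
print(best, best.path)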

Run the included example script which downloads a real dataset and caches it:

python scripts/hello_cachedir.py

Configuration

There is no global config file; you configure the cache per instance:

  • Cache(path: str | None = None, pkg: str | None = None)
    • path: explicit directory for cache (contains the SQLite registry and files).
    • pkg: if set, uses an OS-specific cache directory for that application name via platformdirs (e.g., on Linux: ~/.cache/<pkg>).
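
To make the `pkg` option more concrete, here is a rough standard-library sketch of how the Linux location `~/.cache/<pkg>` is commonly resolved under the XDG convention. The helper `linux_cache_dir` is hypothetical and covers Linux only. platformdirs itself also handles macOS and Windows, and how cachedir calls it internally is not shown here.

```python
import os


def linux_cache_dir(pkg, environ=None):
    """Hypothetical helper approximating the Linux cache location for an app name."""
    environ = os.environ if environ is None else environ
    # XDG_CACHE_HOME overrides the default base directory when set and non-empty.
    base = environ.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(base, pkg)


print(linux_cache_dir("my_app", environ={}))
print(linux_cache_dir("my_app", environ={"XDG_CACHE_HOME": "/tmp/xdg"}))
```

In real code, calling `platformdirs.user_cache_dir("my_app")` is preferable to reimplementing these rules.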

Common item fields:

  • uri (str): a canonical identifier used for the hash key (together with params).
  • params (dict): serialized to the stable key; changing them yields a new key.
  • attrs (dict): typed attributes persisted to attribute tables for rich queries.
  • status (int): the integer value of a cachedir._status.Status member, e.g. Status.READY.value or Status.WRITE.value.
  • filename (str): filename to be used in the cache; extension is auto-inferred.
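
To illustrate why changing `params` yields a new key, here is a minimal sketch of one common way to derive a stable key from a URI and parameters. The function name `stable_key` and the choice of canonical JSON plus SHA-256 are assumptions made for illustration; cachedir's actual hashing scheme may differ.

```python
import hashlib
import json


def stable_key(uri, params=None):
    """Hypothetical helper: hash the URI together with canonically serialized params."""
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps({"uri": uri, "params": params or {}}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = stable_key("https://example.org/data.tsv", {"dataset": "demo", "version": 1})
key_b = stable_key("https://example.org/data.tsv", {"version": 1, "dataset": "demo"})
key_c = stable_key("https://example.org/data.tsv", {"dataset": "demo", "version": 2})

print(key_a == key_b)  # True: same params, different insertion order
print(key_a == key_c)  # False: a changed param gives a different key
```

Serializing with sorted keys before hashing is what makes the key deterministic across runs and independent of how the dictionary was built.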

Logging/session helpers are available under cachedir.session and cachedir.log if you want simple trace output.

Examples

  1. Create or reuse an item with best_or_new.
import os
import cachedir as cm
from cachedir._status import Status

cache = cm.Cache(path="./my_cache")
uri = "https://example.org/report.csv"
params = {"year": 2026, "cohort": "A"}

item = cache.best_or_new(
    uri=uri,
    params=params,
    attrs={"kind": "report", "format": "csv"},
    filename="report.csv",
    new_status=Status.WRITE.value,
)

if item.status != Status.READY.value or not os.path.exists(item.path):
    with open(item.path, "w", encoding="utf-8") as f:
        f.write("id,value\n1,42\n")
    item.ready()

print("Using:", item.path)

  2. Query by attributes and metadata.
import cachedir as cm
from cachedir._status import Status

cache = cm.Cache(path="./my_cache")

cache.create(
    uri="demo://sample-1",
    attrs={"project": "alpha", "batch": 1, "score": 0.95},
    status=Status.READY.value,
)
cache.create(
    uri="demo://sample-2",
    attrs={"project": "alpha", "batch": 2, "score": 0.71},
    status=Status.READY.value,
)

ids = cache.by_attrs({"project": "alpha", "batch": 2})
print("matching ids:", ids)

items = cache.search(uri="demo://sample-2", status=Status.READY.value)
for it in items:
    print(it.version_id, it.attrs)

  3. Open a cached file through CacheItem.open.
import cachedir as cm
from cachedir._status import Status

cache = cm.Cache(path="./my_cache")
item = cache.create(
    uri="demo://text-file",
    filename="hello.txt",
    status=Status.WRITE.value,
)

with open(item.path, "w", encoding="utf-8") as f:
    f.write("hello\nworld\n")

item.ready()

opened = item.open(default_mode="r", encoding="utf-8", large=True)
print(next(iter(opened.result)).strip())
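
The idea of a unified open interface for plain and compressed files can be sketched with the standard library alone. The helper `open_text` below is hypothetical and only dispatches on the file extension for plain and gzip files. It is not cachedir's implementation, which according to the feature list also covers zip and tar archives.

```python
import gzip
import os
import tempfile


def open_text(path, encoding="utf-8"):
    """Hypothetical helper: open plain or gzip-compressed text for reading."""
    if path.endswith(".gz"):
        # "rt" makes gzip decode bytes to str, like the built-in open().
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


# Write the same content once as plain text and once gzip-compressed.
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "hello.txt")
gz_path = os.path.join(tmpdir, "hello.txt.gz")

with open(plain_path, "w", encoding="utf-8") as f:
    f.write("hello\nworld\n")
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write("hello\nworld\n")

# The caller reads both files the same way, regardless of compression.
for path in (plain_path, gz_path):
    with open_text(path) as handle:
        print(next(iter(handle)).strip())
```

Both iterations print the first line of the file, so calling code does not need to know whether the cached file is compressed.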

Contributing

Contributions are welcome! Please open issues and pull requests on GitHub. If you plan a larger change, consider discussing it in an issue first. (A dedicated CONTRIBUTING.md may be added later.)

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

Contact

OmniPath Team - omnipathdb@gmail.com

Project page: https://github.com/saezlab/cachedir
