cachetto

Disk-based caching for functions returning pickleable objects and pandas DataFrames, plain and simple.

[!WARNING]

cachetto is experimental; the API is subject to change.

Getting Started

This is a simple library, but it can be handy if you work with codebases whose functions take a long time to generate or process tabular data as dataframes, whether because of slow computations or slow queries. If that sounds like your case, take a look at the usage section to see whether it helps.

Features:

  • Seamless caching for functions or methods returning objects that can be pickled, including pandas dataframes

  • Customizable cache directory

  • Cache expiration with invalid_after (e.g., "1d", "6h")

  • Toggle caching on or off

  • Uses pickle to serialize the data
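To make the idea behind these features concrete, here is a minimal, standard-library-only sketch of a disk-based pickle cache. It is not cachetto's implementation: the name `disk_cached`, the key scheme, and the file layout are all assumptions made for illustration.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cached(cache_dir):
    """Hypothetical decorator: cache a function's return value on disk with pickle."""
    cache_path = Path(cache_dir)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            cache_path.mkdir(parents=True, exist_ok=True)
            # Build a key from the function name and its pickled arguments
            # (assumes the arguments themselves can be pickled).
            payload = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            key = hashlib.sha256(payload).hexdigest()
            file = cache_path / f"{key}.pkl"
            if file.exists():
                # Cache hit: load the stored result instead of recomputing.
                with file.open("rb") as fh:
                    return pickle.load(fh)
            # Cache miss: compute, store, and return.
            result = func(*args, **kwargs)
            with file.open("wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator


calls = []  # records every real execution of the function


@disk_cached(tempfile.mkdtemp())
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # runs the function and writes the cache file
second = slow_square(4)  # loads the result from disk
```

Because the cache is read back with pickle, only point a cache like this at directories you trust: unpickling a tampered file can execute arbitrary code.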

Installation

cachetto is available on PyPI, and can be installed with:

# Using uv
uv add cachetto
# Using pip
pip install cachetto

cachetto requires Python 3.10 or higher, and its only required dependency is pandas>=1.5.3.

Usage

The API essentially consists of a single decorator, cached.

Minimal usage (No config)

Just decorate your function. By default, it uses an internal cache directory and never invalidates:

from cachetto import cached
import pandas as pd

@cached
def get_data():
    print("Running expensive computation...")
    return {"df": pd.DataFrame({"value": range(10)}), "meta": ("some data", 1)}

result = get_data()  # Will run and cache
result = get_data()  # Will load from cache

Custom cache directory

Specify where cached files should be stored:

@cached(cache_dir="cache_files")
def load_big_dataframe():
    return pd.DataFrame({"big": range(100000)})

Add cache expiration

Expire the cache after a certain duration (e.g., 1 day, 3 hours):

@cached(cache_dir="cache_files", invalid_after="1d")
def get_fresh_data():
    return pd.DataFrame({"timestamp": [pd.Timestamp.now()]})

If the cached file is older than 1 day, the function will re-run and overwrite the cache.
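As a rough illustration of how an age check like this can work, the sketch below parses a duration string and compares it with a file's modification time. The helpers `parse_duration` and `is_expired` are hypothetical and not part of cachetto; only the "d" and "h" suffixes come from this README, and the remaining units are assumptions.

```python
import os
import re
import tempfile
import time
from datetime import timedelta

# Only "d" and "h" appear in the README; the other suffixes are assumptions.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds", "w": "weeks"}


def parse_duration(text):
    """Hypothetical helper: turn a string such as "1d" or "6h" into a timedelta."""
    match = re.fullmatch(r"(\d+)([a-z])", text.strip())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"Unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_expired(path, invalid_after, now=None):
    """Return True if the file at `path` was modified longer ago than `invalid_after`."""
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds > parse_duration(invalid_after).total_seconds()


# A freshly written file is still valid; the same file checked two days later is not.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"cached bytes")
fresh = is_expired(tmp.name, "1d")
stale = is_expired(tmp.name, "1d", now=time.time() + 2 * 24 * 3600)
os.remove(tmp.name)
```

Passing `now` explicitly keeps the check easy to test without waiting for real time to pass.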

Temporarily disable caching

Use the caching_enabled flag to bypass the cache logic (e.g., for debugging, or when running in a different environment):

@cached(caching_enabled=False)
def debug_function():
    print("No caching here")
    return pd.DataFrame({"x": range(3)})
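One way to switch caching per environment is to derive the flag from an environment variable. The sketch below is only a suggestion: the helper `caching_enabled_from_env` and the variable name `MYAPP_DISABLE_CACHE` are hypothetical and not provided by cachetto.

```python
import os


def caching_enabled_from_env(var="MYAPP_DISABLE_CACHE"):
    """Hypothetical helper: caching is on unless the variable is set to a truthy value."""
    value = os.environ.get(var, "").strip().lower()
    return value not in {"1", "true", "yes"}


# Intended use with the decorator shown above (requires cachetto to be installed):
#
#     @cached(caching_enabled=caching_enabled_from_env())
#     def debug_function(): ...

os.environ["MYAPP_DISABLE_CACHE"] = "1"
disabled = caching_enabled_from_env()  # caching switched off
os.environ.pop("MYAPP_DISABLE_CACHE")
enabled = caching_enabled_from_env()   # caching back on
```

Decorator arguments are evaluated once, when the function is defined, so the variable has to be set before the module containing the decorated function is imported.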

Clear cached files manually

You can programmatically clear the cache for a decorated function:

@cached
def some_data():
    return pd.DataFrame({"numbers": [1, 2, 3]})

some_data.clear_cache()  # Deletes all cached files for this function

Use with class methods

The decorator works the same way on methods defined in a class:

class MyModel:
    @cached(cache_dir="model_cache")
    def load_data(self):
        return pd.DataFrame({"model": ["A", "B", "C"]})

Development

Tests

Every new feature must include the corresponding tests, and coverage must stay at 100% for the CI job to succeed:

make unit-tests       # Tests while developing with the default version
make cov-tests        # Check the coverage (html report generated)
make test-all-python  # Runs the tests with all the supported python versions

Lint

Pre-commit is integrated for linting and formatting, and additionally, mypy must be run to ensure the typing is correct:

make typecheck-mypy

Release

Running make release locally builds the package and publishes it to PyPI, but the CI is set up to do this for you. Bump the version, create a new tag, and push it; pushing the tag triggers the release job:

uv version --bump [BUMP]
git tag v[NEW VERSION]
git push origin v[NEW VERSION]

License

This repository is licensed under the MIT License.

Credits

It's heavily inspired by cachier, but with built-in support for pandas dataframes and only disk-based caching based on pickle.
