cachetto
Disk-based caching for functions returning pickleable objects and pandas DataFrames, plain and simple.
Warning: cachetto is experimental; the API is subject to change.
Getting Started
This is a simple library, but it can be handy if you work with codebases whose functions take a long time to generate or process tabular data as dataframes, whether because of slow computations or slow queries. If that sounds like your case, take a look at the usage section to see whether it helps.
Features:
- Seamless caching for functions or methods returning objects that can be pickled, including pandas dataframes
- Customizable cache directory
- Cache expiration with invalid_after (e.g., "1d", "6h")
- Toggle caching on or off
- Uses pickle to serialize the data
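To make the idea behind these features concrete, here is a minimal sketch of disk-based caching with pickle, written with the standard library only. It is not cachetto's implementation: the `disk_cached` name, the file naming scheme, and the way arguments are hashed are all assumptions made for illustration.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cached(cache_dir):
    """Illustrative decorator: pickle results to disk, keyed by function name and arguments."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the pickled arguments to get one file name per distinct call.
            raw_key = pickle.dumps((args, sorted(kwargs.items())))
            key = hashlib.sha256(raw_key).hexdigest()
            target = cache_path / f"{func.__name__}_{key}.pkl"
            if target.exists():
                with target.open("rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            with target.open("wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator


calls = []


@disk_cached(tempfile.mkdtemp())
def slow_square(x):
    calls.append(x)  # Record real executions so cache hits are visible.
    return {"input": x, "square": x * x}


first = slow_square(4)   # Runs the function and writes the pickle file.
second = slow_square(4)  # Loads the result from disk; the function body is skipped.
```

Because unpickling can execute arbitrary code, a cache directory like this should only ever contain files you trust.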
Installation
cachetto is available on PyPI, and can be installed with:
# Using uv
uv add cachetto
# Using pip
pip install cachetto
cachetto requires Python 3.10 or higher, and its only required dependency is pandas>=1.5.3.
Usage
The API essentially consists of a single decorator, cached.
Minimal usage (No config)
Just decorate your function. By default, it uses an internal cache directory and the cache never expires:
from cachetto import cached
import pandas as pd
@cached
def get_data():
    print("Running expensive computation...")
    return {"df": pd.DataFrame({"value": range(10)}), "meta": ("some data", 1)}

result = get_data()  # Runs the function and caches the result
result = get_data()  # Loads the result from the cache
Custom cache directory
Specify where cached files should be stored:
@cached(cache_dir="cache_files")
def load_big_dataframe():
    return pd.DataFrame({"big": range(100000)})
Add cache expiration
Expire the cache after a certain duration (e.g., 1 day, 3 hours):
@cached(cache_dir="cache_files", invalid_after="1d")
def get_fresh_data():
    return pd.DataFrame({"timestamp": [pd.Timestamp.now()]})
If the cached file is older than 1 day, the function will re-run and overwrite the cache.
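As a rough illustration of how an age check like this can work, the sketch below parses a duration string and compares it with a file's modification time. The `parse_duration` and `is_expired` helpers are hypothetical, and the set of unit suffixes is an assumption (the text above only shows "d" and "h"); cachetto's own parsing may differ.

```python
import os
import re
import tempfile
import time
from datetime import timedelta

# Assumed unit suffixes; only "d" and "h" appear in the examples above.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text):
    """Turn a string such as "1d" or "6h" into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhdw])", text.strip())
    if match is None:
        raise ValueError(f"Unrecognized duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_expired(path, invalid_after, now=None):
    """Return True when the file was last modified more than `invalid_after` ago."""
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds > parse_duration(invalid_after).total_seconds()


# Create an empty file that stands in for a cached result.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    cache_file = tmp.name

fresh = is_expired(cache_file, "1d")
# Pretend two days have passed by moving "now" forward.
stale = is_expired(cache_file, "1d", now=time.time() + 2 * 24 * 3600)
os.remove(cache_file)
```

Passing `now` explicitly keeps the check easy to test without waiting for real time to pass.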
Temporarily disable caching
Use the caching_enabled flag to bypass the cache logic (e.g., for debugging, or when running in a different environment):
@cached(caching_enabled=False)
def debug_function():
    print("No caching here")
    return pd.DataFrame({"x": range(3)})
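One way to switch caching per environment is to derive the flag from an environment variable instead of hard-coding it. The sketch below is only a suggestion: the `env_flag` helper and the `MYAPP_CACHING` variable name are made up for this example and are not part of cachetto.

```python
import os


def env_flag(name, default=True):
    """Read a boolean from an environment variable (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# The variable name is an assumption chosen for this example.
CACHING_ENABLED = env_flag("MYAPP_CACHING", default=True)

# The resulting boolean can then be passed to the decorator:
#
# @cached(caching_enabled=CACHING_ENABLED)
# def debug_function():
#     ...
```

With this in place, running the program with `MYAPP_CACHING=0` would bypass the cache without touching the code.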
Clear cached files manually
You can programmatically clear the cache for a decorated function:
@cached
def some_data():
    return pd.DataFrame({"numbers": [1, 2, 3]})

some_data.clear_cache()  # Deletes all cached files for this function
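For readers curious how a decorated function can carry its own clear_cache helper, here is a small standard-library sketch of the pattern. It assumes one cache file per function and a decorator named `cached_with_clear`; both are simplifications for illustration and do not describe cachetto's internals.

```python
import functools
import pickle
import tempfile
from pathlib import Path


def cached_with_clear(cache_dir):
    """Illustrative decorator that caches to disk and exposes a clear_cache helper."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        # One file per function, for simplicity (arguments are not supported here).
        target = cache_path / f"{func.__name__}.pkl"

        @functools.wraps(func)
        def wrapper():
            if target.exists():
                return pickle.loads(target.read_bytes())
            result = func()
            target.write_bytes(pickle.dumps(result))
            return result

        def clear_cache():
            """Delete this function's cache file and return how many files were removed."""
            if target.exists():
                target.unlink()
                return 1
            return 0

        wrapper.clear_cache = clear_cache  # Expose the helper on the decorated function.
        return wrapper

    return decorator


runs = []


@cached_with_clear(tempfile.mkdtemp())
def some_numbers():
    runs.append("run")  # Record real executions.
    return [1, 2, 3]


some_numbers()                        # Computes and writes the cache file.
some_numbers()                        # Served from disk.
removed = some_numbers.clear_cache()  # Removes the cache file.
some_numbers()                        # Computes again because the cache is gone.
```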
Use with class methods
The decorator works the same way on methods defined in a class:
class MyModel:
    @cached(cache_dir="model_cache")
    def load_data(self):
        return pd.DataFrame({"model": ["A", "B", "C"]})
Development
Work in progress
License
This repository is licensed under the MIT License.
Credits
It is heavily inspired by cachier, but adds built-in support for pandas dataframes and offers only disk-based caching built on pickle.