Disk-based caching for functions returning pandas DataFrames, plain and simple.

cachetto


[!WARNING]

cachetto is experimental; the API is subject to change.

Getting Started

This is a simple library, but it can be handy if you work with codebases whose functions take a long time to generate or process tabular data as DataFrames, whether because of slow computations or slow queries. If that is your case, take a look at the Usage section to see whether it can help.

Features:

  • Seamless caching for functions or methods returning a pandas.DataFrame

  • Customizable cache directory

  • Cache expiration with invalid_after (e.g., "1d", "6h")

  • Toggle caching on or off

  • Uses fast, efficient .parquet format

Installation

cachetto is available on PyPI, and can be installed with:

# Using uv
uv add cachetto
# Using pip
pip install cachetto

cachetto requires Python 3.10 or higher. Its only required dependency is pandas>=1.5.3.

Usage

The API consists essentially of a single decorator, cached.

Minimal usage (No config)

Just decorate your function. By default, it uses an internal cache directory and the cache never expires:

from cachetto import cached
import pandas as pd

@cached
def get_data():
    print("Running expensive computation...")
    return pd.DataFrame({"value": range(10)})

df = get_data()  # Will run and cache
df = get_data()  # Will load from cache
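To make the run-once, load-afterwards behaviour above concrete, here is a minimal sketch of how a disk-caching decorator can work in general. It is not cachetto's implementation: the name disk_cached, the file naming scheme, and the use of pickle (chosen so the sketch needs only the standard library, where cachetto stores .parquet files) are all assumptions made for illustration.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cached(cache_dir):
    """Hypothetical stand-in for a disk-caching decorator (not cachetto's code)."""
    cache_path = Path(cache_dir)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            cache_path.mkdir(parents=True, exist_ok=True)
            # Derive a file name from the function name and its arguments
            # (an assumed keying scheme, for illustration only).
            raw_key = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(raw_key).hexdigest()[:16]
            target = cache_path / f"{func.__name__}_{digest}.pkl"

            if target.exists():
                # Cache hit: skip the function body and load the stored result.
                with target.open("rb") as fh:
                    return pickle.load(fh)

            # Cache miss: run the function and persist its result.
            result = func(*args, **kwargs)
            with target.open("wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator


calls = []  # records every real execution of the function body


@disk_cached(tempfile.mkdtemp())
def get_rows(n):
    calls.append(n)
    return [{"value": i} for i in range(n)]


first = get_rows(3)   # runs the function and writes the cache file
second = get_rows(3)  # served from disk; the function body is skipped
```

The second call returns an equal result without appending to calls, which is the same observable effect as the missing "Running expensive computation..." print in the example above.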

Custom cache directory

Specify where cached files should be stored:

@cached(cache_dir="cache_files")
def load_big_dataframe():
    return pd.DataFrame({"big": range(100000)})

Add cache expiration

Expire the cache after a certain duration (e.g., 1 day, 3 hours):

@cached(cache_dir="cache_files", invalid_after="1d")
def get_fresh_data():
    return pd.DataFrame({"timestamp": [pd.Timestamp.now()]})

If the cached file is older than 1 day, the function will re-run and overwrite the cache.
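The expiry rule can be illustrated with a small standard-library sketch that turns a duration string into a timedelta and compares it with the age of a cache entry. The helper names parse_duration and is_expired are hypothetical, and the "m" and "s" suffixes are assumptions; only "d" and "h" appear in this README, and cachetto's actual parsing rules may differ.

```python
import re
from datetime import datetime, timedelta

# Assumed unit suffixes; only "d" and "h" are shown in the README.
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text):
    """Turn a string such as "1d" or "6h" into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhms])", text.strip())
    if match is None:
        raise ValueError(f"Unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_expired(written_at, invalid_after, now):
    """Return True when an entry written at `written_at` is older than allowed."""
    return now - written_at > parse_duration(invalid_after)


now = datetime(2025, 1, 2, 12, 0)
fresh = is_expired(datetime(2025, 1, 2, 0, 0), "1d", now)  # entry is 12 hours old
stale = is_expired(datetime(2025, 1, 1, 0, 0), "1d", now)  # entry is 36 hours old
```

With a "1d" limit, the 12-hour-old entry would still be served from cache, while the 36-hour-old entry would trigger a re-run.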

Temporarily disable caching

Use the caching_enabled flag to bypass the cache logic (e.g., for debugging or when running in a different environment):

@cached(caching_enabled=False)
def debug_function():
    print("No caching here")
    return pd.DataFrame({"x": range(3)})
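One way to switch caching per environment is to compute the flag from an environment variable rather than hard-coding it. The sketch below is only a suggestion: the helper caching_enabled_from_env and the variable name CACHETTO_ENABLED are hypothetical and not part of cachetto.

```python
import os


def caching_enabled_from_env(var_name="CACHETTO_ENABLED", default=True):
    """Read an on/off switch from the environment (variable name is hypothetical)."""
    raw = os.environ.get(var_name)
    if raw is None:
        return default
    return raw.strip().lower() not in {"0", "false", "no", "off", ""}


# Intended use, assuming cachetto is installed:
#   @cached(caching_enabled=caching_enabled_from_env())
#   def debug_function(): ...

os.environ["CACHETTO_ENABLED"] = "false"
disabled = caching_enabled_from_env()            # variable set to a "falsy" word
del os.environ["CACHETTO_ENABLED"]
enabled_by_default = caching_enabled_from_env()  # variable unset: use the default
```

Because decorator arguments are evaluated when the function is defined, the variable is read once at import time; changing it later in the same process would not affect an already decorated function.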

Clear cached files manually

You can programmatically clear the cache for a decorated function:

@cached
def some_data():
    return pd.DataFrame({"numbers": [1, 2, 3]})

some_data.clear_cache()  # Deletes all cached files for this function

Use with methods

The decorator works the same way on methods defined in a class:

class MyModel:
    @cached(cache_dir="model_cache")
    def load_data(self):
        return pd.DataFrame({"model": ["A", "B", "C"]})

Development

Work in progress

License

This repository is licensed under the MIT License.

Credits

cachetto is heavily inspired by cachier, but with a narrower focus: functions that return pandas DataFrames, and disk-based caching only.

