Persistent JSON caching for Python with async support - cache function results and object state effortlessly.

Project description

Cacherator

Python 3.7+ License: MIT

Overview

Cacherator is a Python library that provides persistent JSON-based caching for class state and function results. It enables developers to cache expensive operations with minimal configuration, supporting both synchronous and asynchronous functions.

Key Features

  • Zero-configuration caching - Simple inheritance and decorator pattern
  • Async/await support - Native support for asynchronous functions
  • Persistent storage - Cache survives program restarts
  • TTL (Time-To-Live) - Automatic cache expiration
  • Selective caching - Fine-grained control over what gets cached
  • Cache management - Built-in methods for inspection and clearing
  • Cache status tracking - Per-call hit/miss detection
  • Flexible logging - Global and per-instance control
  • DynamoDB backend - Optional cloud cache for cross-machine sharing

Installation

pip install cacherator

Optional: DynamoDB Support

For cross-machine cache sharing via DynamoDB:

pip install boto3

Quick Start

Basic Function Caching

from cacherator import JSONCache, Cached
import time

class Calculator(JSONCache):
    def __init__(self):
        super().__init__(data_id="calc")
    
    @Cached()
    def expensive_calculation(self, x, y):
        time.sleep(2)  # Simulate expensive operation
        return x ** y

calc = Calculator()
result = calc.expensive_calculation(2, 10)  # Takes 2 seconds
result = calc.expensive_calculation(2, 10)  # Instant!

Async Function Caching

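import asyncio

from cacherator import JSONCache, Cached

class APIClient(JSONCache):
    @Cached(ttl=1)  # Cache for 1 day
    async def fetch_user(self, user_id):
        # Expensive API call; `api` stands in for your own HTTP client
        response = await api.get(f"/users/{user_id}")
        return response.json()

async def main():
    client = APIClient()
    user = await client.fetch_user(123)  # API call
    user = await client.fetch_user(123)  # Cached!

asyncio.run(main())  # `await` is only valid inside an async function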
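
A decorator that supports both kinds of method typically checks whether the wrapped function is a coroutine function. If it is, the decorator returns an `async` wrapper that awaits the call before storing the result. The in-memory sketch below shows that branching. It assumes nothing about cacherator's internals, and `cache_both` is a hypothetical name.

```python
import asyncio
import functools
import inspect


def cache_both(func):
    """Hypothetical decorator that memoizes sync and async functions alike."""
    store = {}

    def make_key(args, kwargs):
        return f"{func.__name__}{args!r}{kwargs!r}"

    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            key = make_key(args, kwargs)
            if key not in store:
                # Await once and keep the result, not the coroutine object
                store[key] = await func(*args, **kwargs)
            return store[key]
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        key = make_key(args, kwargs)
        if key not in store:
            store[key] = func(*args, **kwargs)
        return store[key]
    return sync_wrapper


executions = 0


@cache_both
async def fetch_user(user_id):
    global executions
    executions += 1
    await asyncio.sleep(0.01)  # stands in for a network call
    return {"id": user_id}


async def main():
    first = await fetch_user(123)
    second = await fetch_user(123)
    return first, second, executions


print(asyncio.run(main()))  # ({'id': 123}, {'id': 123}, 1)
```

This sketch has no lock. Two concurrent calls with the same arguments could therefore both execute before either stores its result.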

State Persistence

class GameState(JSONCache):
    def __init__(self, game_id):
        super().__init__(data_id=f"game_{game_id}")
        if not hasattr(self, "score"):
            self.score = 0
            self.level = 1
    
    def add_points(self, points):
        self.score += points
        self.json_cache_save()

# Session 1
game = GameState("player1")
game.add_points(100)

# Session 2 (after restart)
game = GameState("player1")
print(game.score)  # 100 - persisted!
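
The `hasattr` check works because saved attributes are restored before the subclass sets its defaults. The sketch below reproduces that ordering with the standard library. It uses a hypothetical `PersistentState` base class that writes the instance's public attributes to a JSON file; cacherator's own storage format may differ.

```python
import json
import os
import tempfile


class PersistentState:
    """Hypothetical base class: restores attributes from JSON on init, saves on demand."""

    def __init__(self, path):
        self._path = path
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                # Restore previously saved attributes onto this instance
                self.__dict__.update(json.load(fh))

    def save(self):
        # Persist public attributes only; names starting with "_" are skipped
        state = {k: v for k, v in vars(self).items() if not k.startswith("_")}
        with open(self._path, "w", encoding="utf-8") as fh:
            json.dump(state, fh)


class GameState(PersistentState):
    def __init__(self, path):
        super().__init__(path)
        if not hasattr(self, "score"):  # defaults only on the very first run
            self.score = 0
            self.level = 1

    def add_points(self, points):
        self.score += points
        self.save()


state_file = os.path.join(tempfile.mkdtemp(), "game_player1.json")
GameState(state_file).add_points(100)  # "session 1"
print(GameState(state_file).score)     # 100, restored in "session 2"
```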

Advanced Usage

DynamoDB Backend (Cross-Machine Cache Sharing)

Pass dynamodb_table to use DynamoDB as the cache backend instead of local JSON:

from cacherator import JSONCache, Cached

class WebScraper(JSONCache):
    def __init__(self):
        super().__init__(dynamodb_table='my-cache-table')
    
    @Cached(ttl=7)
    def scrape_expensive_data(self, url):
        return fetch_data(url)

# On machine 1 (laptop)
scraper = WebScraper()
data = scraper.scrape_expensive_data("https://example.com")  # Scrapes and caches

# On machine 2 (EC2 instance) - same code
scraper = WebScraper()
data = scraper.scrape_expensive_data("https://example.com")  # Uses cached data!

How it works:

  • No table specified: saves/loads from local JSON file
  • Table specified: saves/loads from DynamoDB only — no local file used
  • Compression: Payloads over 100KB are automatically gzip-compressed before writing to DynamoDB, reducing typical HTML payloads by 80-90%. A warning is logged if the compressed payload still exceeds DynamoDB's 400KB item limit.
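
The compression rule above reduces to a size check before writing. The sketch below uses only `gzip` and mirrors the stated thresholds: compress above 100 KB, and warn if the result is still over 400 KB. The following parts are assumptions:

  • The function name.
  • Reading "KB" as 1024 bytes.
  • A payload-only check. DynamoDB's item limit also counts attribute names and other attributes, so a real check may need to be stricter.

```python
import gzip
import logging

COMPRESS_THRESHOLD = 100 * 1024   # assumption: "100KB" read as 100 x 1024 bytes
DYNAMODB_ITEM_LIMIT = 400 * 1024  # DynamoDB maximum item size

logger = logging.getLogger("cache_sketch")


def prepare_payload(text):
    """Return (payload_bytes, compressed_flag) for a JSON string bound for DynamoDB."""
    raw = text.encode("utf-8")
    if len(raw) <= COMPRESS_THRESHOLD:
        return raw, False
    packed = gzip.compress(raw)
    if len(packed) > DYNAMODB_ITEM_LIMIT:
        logger.warning("compressed payload is %d bytes, above the 400 KB item limit", len(packed))
    return packed, True


small, small_flag = prepare_payload('{"a": 1}')
big, big_flag = prepare_payload('{"html": "' + "<p>hello</p>" * 20000 + '"}')
print(small_flag, big_flag)      # False True
print(gzip.decompress(big)[:9])  # b'{"html": '
```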

DynamoDB table:

  • Auto-created if missing (requires IAM permissions)
  • Partition key: cache_id (String)
  • TTL enabled for automatic expiry
  • Pay-per-request billing mode
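
Auto-creation needs IAM permissions. Where the application's role lacks them, the table can be created ahead of time with the layout listed above. The helper below only builds keyword arguments for boto3's `create_table` and `update_time_to_live` calls. The table name is an example. The TTL attribute name `ttl` is an assumption, so check which attribute the library actually writes before relying on automatic expiry.

```python
def dynamodb_table_kwargs(table_name, ttl_attribute="ttl"):
    """Build boto3 keyword arguments for a cache table with the layout above.

    ttl_attribute is an assumption: verify the attribute name the library stores.
    """
    create_kwargs = {
        "TableName": table_name,
        # Partition key: cache_id (String)
        "KeySchema": [{"AttributeName": "cache_id", "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": "cache_id", "AttributeType": "S"}],
        # On-demand (pay-per-request) billing
        "BillingMode": "PAY_PER_REQUEST",
    }
    ttl_kwargs = {
        "TableName": table_name,
        "TimeToLiveSpecification": {"Enabled": True, "AttributeName": ttl_attribute},
    }
    return create_kwargs, ttl_kwargs


create_kwargs, ttl_kwargs = dynamodb_table_kwargs("my-cache-table")

# With boto3 installed and credentials configured:
# import boto3
# client = boto3.client("dynamodb")
# client.create_table(**create_kwargs)
# client.get_waiter("table_exists").wait(TableName="my-cache-table")
# client.update_time_to_live(**ttl_kwargs)
```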

AWS credentials via standard boto3 chain:

  • Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
  • IAM role (recommended for EC2/Lambda)
  • AWS credentials file (~/.aws/credentials)

Custom TTL Configuration

class WeatherService(JSONCache):
    @Cached(ttl=0.25)  # 6 hours (0.25 days)
    def get_forecast(self, city):
        return fetch_weather(city)
    
    @Cached(ttl=30)  # 30 days
    def get_historical(self, city, year):
        return fetch_historical(city, year)
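
TTL values are given in days, so fractional values map directly onto `datetime.timedelta(days=...)`. The sketch below shows an expiry check under that reading. The `is_expired` helper is hypothetical and not part of cacherator.

```python
from datetime import datetime, timedelta


def is_expired(stored_at, ttl_days, now):
    """Return True once an entry stored at `stored_at` is older than `ttl_days`."""
    return now - stored_at > timedelta(days=ttl_days)


stored = datetime(2024, 1, 1, 0, 0)
print(timedelta(days=0.25))                                   # 6:00:00
print(is_expired(stored, 0.25, datetime(2024, 1, 1, 5, 59)))  # False
print(is_expired(stored, 0.25, datetime(2024, 1, 1, 6, 1)))   # True
print(is_expired(stored, 30, datetime(2024, 1, 20)))          # False
```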

Excluding Variables from Cache

class DataProcessor(JSONCache):
    def __init__(self):
        self._excluded_cache_vars = ["temp_data", "api_key"]
        super().__init__()
        self.results = {}
        self.temp_data = []  # Won't be cached
        self.api_key = "secret"  # Won't be cached
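
Excluding attributes amounts to filtering the instance's attributes before serialization. The sketch below shows that filtering step. It assumes, as the example above suggests, that `_excluded_cache_vars` is a plain list of attribute names. `serializable_state` is a hypothetical helper.

```python
import json


def serializable_state(obj):
    """Return obj's attributes minus those named in obj._excluded_cache_vars."""
    excluded = set(getattr(obj, "_excluded_cache_vars", []))
    excluded.add("_excluded_cache_vars")  # the list itself need not be stored
    return {k: v for k, v in vars(obj).items() if k not in excluded}


class DataProcessor:
    def __init__(self):
        self._excluded_cache_vars = ["temp_data", "api_key"]
        self.results = {"mean": 4.2}
        self.temp_data = [1, 2, 3]
        self.api_key = "secret"


print(json.dumps(serializable_state(DataProcessor())))  # {"results": {"mean": 4.2}}
```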

Cache Management

processor = DataProcessor()

# Get cache statistics
stats = processor.json_cache_stats()
print(stats)
# {'total_entries': 5, 'functions': {'process': 3, 'analyze': 2}}

# Clear specific function cache
processor.json_cache_clear("process")

# Clear all cache
processor.json_cache_clear()
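
Consider cache keys shaped like the signature strings in the Cache Status Tracking section, for example `fetch('foo',){}`. With that format, per-function statistics and selective clearing come down to grouping keys by the name before the first parenthesis. The sketch below works on a plain dict. The key format is an assumption, and this is not cacherator's code.

```python
from collections import Counter


def function_name(key):
    # "fetch('foo',){}" -> "fetch"
    return key.split("(", 1)[0]


def cache_stats(entries):
    counts = Counter(function_name(k) for k in entries)
    return {"total_entries": len(entries), "functions": dict(counts)}


def cache_clear(entries, func_name=None):
    """Drop all entries, or only those belonging to func_name."""
    if func_name is None:
        entries.clear()
    else:
        for key in [k for k in entries if function_name(k) == func_name]:
            del entries[key]


entries = {
    "process(1,){}": 10, "process(2,){}": 20, "process(3,){}": 30,
    "analyze('a',){}": "x", "analyze('b',){}": "y",
}
print(cache_stats(entries))
# {'total_entries': 5, 'functions': {'process': 3, 'analyze': 2}}
cache_clear(entries, "process")
print(cache_stats(entries))
# {'total_entries': 2, 'functions': {'analyze': 2}}
```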

Cache Status Tracking

Detect whether a @Cached method returned cached data or executed the function:

class DataService(JSONCache):
    def __init__(self):
        super().__init__(data_id="my-service", ttl=7)

    @Cached(ttl=7)
    def fetch(self, key: str) -> str:
        return expensive_operation(key)

svc = DataService()

# last_cache_status is None before any call
print(svc.last_cache_status)  # None

svc.fetch("foo")
print(svc.last_cache_status)  # "miss" (first run) or "hit" (subsequent runs)

# Full per-call history keyed by function signature
print(svc.cache_status)
# {"fetch('foo',){}": "hit"}

Status values:

  • "hit" — returned from cache
  • "miss" — function was executed (no valid cache entry)
  • None — no @Cached method has been called yet

cache_status is populated on init for all keys loaded from cache, and updated on every @Cached call. It is cleared when json_cache_clear() is called.
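
The bookkeeping behind these two attributes can be pictured as a dict keyed by call signature plus a single "last status" field. The sketch below keeps both in memory. The `StatusTrackingCache` class is hypothetical, and it reuses the signature format from the example output above.

```python
class StatusTrackingCache:
    """Hypothetical in-memory cache that records "hit"/"miss" per call signature."""

    def __init__(self):
        self.entries = {}
        self.cache_status = {}
        self.last_cache_status = None  # None until the first cached call

    def call(self, func, *args, **kwargs):
        key = f"{func.__name__}{args!r}{kwargs!r}"
        if key in self.entries:
            status = "hit"
        else:
            status = "miss"
            self.entries[key] = func(*args, **kwargs)
        self.cache_status[key] = status
        self.last_cache_status = status
        return self.entries[key]

    def clear(self):
        # Clearing the cache also forgets the recorded statuses
        self.entries.clear()
        self.cache_status.clear()


def fetch(key):
    return key.upper()


svc = StatusTrackingCache()
print(svc.last_cache_status)  # None
svc.call(fetch, "foo")
print(svc.last_cache_status)  # miss
svc.call(fetch, "foo")
print(svc.cache_status)       # {"fetch('foo',){}": 'hit'}
```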

Logging Control

from cacherator import JSONCache

# Disable logging globally
JSONCache.set_logging(False)

# Enable logging globally (default)
JSONCache.set_logging(True)

# Per-instance control (assumes the subclass accepts logging and forwards it to JSONCache.__init__)
processor = DataProcessor(logging=False)  # Silent mode

When logging is enabled:

  • DynamoDB operations are logged (table creation, reads, writes)
  • Local JSON operations are not logged (they are fast and routine)

When logging is disabled:

  • All operations are silent
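
A common way to implement this kind of switch is a class-level flag combined with an optional per-instance override. The sketch below shows that pattern with the standard `logging` module. `QuietableCache` and its attributes are hypothetical. The sketch assumes an unset instance value follows the global flag, and cacherator may resolve the two settings differently.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")


class QuietableCache:
    _global_logging = True  # shared default for every instance

    def __init__(self, logging_enabled=None):
        # None means "follow the global setting"
        self._logging = logging_enabled
        self._logger = logging.getLogger(type(self).__name__)

    @classmethod
    def set_logging(cls, enabled):
        cls._global_logging = enabled

    def logging_active(self):
        return self._global_logging if self._logging is None else self._logging

    def log(self, message):
        if self.logging_active():
            self._logger.info(message)


loud = QuietableCache()
silent = QuietableCache(logging_enabled=False)
loud.log("wrote item to DynamoDB")    # emitted
silent.log("wrote item to DynamoDB")  # suppressed
QuietableCache.set_logging(False)
print(loud.logging_active(), silent.logging_active())  # False False
```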

Configuration

JSONCache Constructor

```python
JSONCache(
    data_id="unique_id",      # Unique identifier (default: class name)
    directory="cache",         # Cache directory (default: "data/cache"), ignored if dynamodb_table set
    clear_cache=False,         # Clear existing cache on init
    ttl=999,                   # Default TTL in days
    logging=True,              # Enable logging (True/False)
    dynamodb_table=None,       # DynamoDB table name; uses DynamoDB instead of local JSON if set
)
```

@Cached Decorator

@Cached(
    ttl=7,                     # Time-to-live in days (default: class ttl)
    clear_cache=False          # Clear cache for this function
)

Use Cases

API Client with Caching

import aiohttp
from cacherator import JSONCache, Cached

class GitHubClient(JSONCache):
    def __init__(self):
        super().__init__(data_id="github_client", ttl=1)
    
    @Cached(ttl=0.5)  # 12 hours
    async def get_user(self, username):
        async with aiohttp.ClientSession() as session:
            async with session.get(f"https://api.github.com/users/{username}") as resp:
                return await resp.json()
    
    @Cached(ttl=7)  # 1 week
    async def get_repos(self, username):
        async with aiohttp.ClientSession() as session:
            async with session.get(f"https://api.github.com/users/{username}/repos") as resp:
                return await resp.json()

Database Query Caching

class UserRepository(JSONCache):
    def __init__(self):
        super().__init__(data_id="user_repo", ttl=0.1)  # 2.4 hours
    
    @Cached()
    def get_user_by_id(self, user_id):
        return db.query("SELECT * FROM users WHERE id = ?", user_id)
    
    @Cached(ttl=1)
    def get_user_stats(self, user_id):
        return db.query("SELECT COUNT(*) FROM posts WHERE user_id = ?", user_id)

Machine Learning Model Predictions

class ModelPredictor(JSONCache):
    def __init__(self):
        super().__init__(data_id="ml_predictor")
        self.model = load_model()
    
    @Cached(ttl=30)
    def predict(self, features_hash, features):
        # The cache key is built from all arguments (features included);
        # the returned prediction must be JSON-serializable
        return self.model.predict(features)

Best Practices

Recommended Use Cases

  • Expensive API calls and network requests
  • Database queries with relatively static data
  • Heavy computational operations
  • Machine learning model predictions
  • Data transformations and aggregations

When to Use TTL

  • Set short TTL (minutes to hours) for frequently changing data
  • Set long TTL (days to weeks) for stable reference data
  • Consider data freshness requirements for your application
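
Because `ttl` is given in days, windows of minutes or hours become fractions. The small conversion sketch below uses hypothetical helper names. Its results could then be passed as, for example, `@Cached(ttl=minutes(15))`.

```python
def hours(n):
    """Convert hours to the day-based ttl value."""
    return n / 24


def minutes(n):
    """Convert minutes to the day-based ttl value."""
    return n / (24 * 60)


print(hours(6))               # 0.25, same as @Cached(ttl=0.25)
print(hours(12))              # 0.5
print(round(minutes(15), 4))  # 0.0104
```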

What Not to Cache

  • Non-deterministic functions (random number generation, timestamps)
  • Very fast operations (overhead exceeds benefit)
  • Non-JSON-serializable objects without custom handling
  • Real-time data without appropriate TTL configuration
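
Results are stored as JSON, so a value can come back with a different type or fail to serialize at all. The quick check below uses Python's standard `json` module to show which results survive a round trip unchanged. `survives_json` is a hypothetical helper. Cacherator's own encoder adds datetime support and may handle other types differently.

```python
import json


def survives_json(value):
    """Return True if value serializes and deserializes back to an equal object."""
    try:
        return json.loads(json.dumps(value)) == value
    except (TypeError, ValueError):
        return False


print(survives_json({"user": "ada", "scores": [1, 2]}))  # True
print(survives_json((1, 2)))    # False: tuples come back as lists
print(survives_json({1: "a"}))  # False: int keys come back as strings
print(survives_json({1, 2}))    # False: sets are not serializable
```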

Performance

Cacherator introduces minimal overhead:

  • Cache hit: ~0.1ms
  • Cache miss: Function execution time + ~1ms
  • Disk I/O: Non-blocking, asynchronous operations

Performance Improvements

  • API calls (100ms - 5s) reduced to ~0.1ms
  • Database queries (10ms - 1s) reduced to ~0.1ms
  • Heavy computations (1s+) reduced to ~0.1ms

Compatibility

  • Python: 3.7 and above
  • Async: Full support for async/await syntax
  • Operating Systems: Windows, macOS, Linux
  • Data Types: All JSON-serializable types plus datetime objects
  • Optional Dependencies: boto3 (for DynamoDB backend), dynamorator
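
The standard `json` module cannot serialize `datetime` values by itself. A cache that supports them therefore needs a custom encoder and decoder. One common approach tags each value with its ISO 8601 string, as sketched below. This format is an assumption, not necessarily the one cacherator writes.

```python
import json
from datetime import datetime


class DatetimeEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, datetime):
            # Tag the value so the decoder can recognize it later
            return {"__datetime__": o.isoformat()}
        return super().default(o)


def decode_datetimes(d):
    # Called by json.loads for every decoded object (innermost first)
    if set(d) == {"__datetime__"}:
        return datetime.fromisoformat(d["__datetime__"])
    return d


original = {"fetched_at": datetime(2024, 5, 1, 12, 30)}
text = json.dumps(original, cls=DatetimeEncoder)
print(text)  # {"fetched_at": {"__datetime__": "2024-05-01T12:30:00"}}
restored = json.loads(text, object_hook=decode_datetimes)
print(restored == original)  # True
```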

Changelog

Version 1.3.0

  • Changed: Simplified to single-backend model — local JSON (default) or DynamoDB (when dynamodb_table is set), never both simultaneously
  • Changed: cache_status values simplified to "hit" / "miss" (removed "l1" / "l2")
  • Removed: save_on_del parameter — no longer needed
  • Removed: L1→L2 backfill on load
  • Removed: Deprecated json_cache_save_db() method

Version 1.2.6

  • Added: cache_status dict — per-function-signature hit/miss tracking with L1/L2 source, populated on init and updated on every @Cached call
  • Added: last_cache_status — status of the most recent @Cached call ("l1", "l2", "miss", or None)
  • Changed: json_cache_clear() now also clears cache_status entries

Version 1.2.5

  • Fixed: L1 cache hits now automatically backfill L2 (DynamoDB) when enabled
  • Fixed: Removed misleading json_cache_save_db branch in @Cached decorator — json_cache_save() is always used, which handles both L1 and L2

Version 1.2.4

  • Added: save_on_del parameter (default False) — __del__ no longer writes to DynamoDB unless opted in, eliminating unnecessary writes on program exit
  • Changed: __del__ always saves to local JSON (L1); DynamoDB (L2) write requires explicit json_cache_save() or save_on_del=True
  • Removed: Unreliable dirty-check on json_cache_save() — saves are now always performed when called

Version 1.2.3

  • Added: Automatic gzip compression for DynamoDB payloads exceeding 100KB
  • Added: Warning logged when compressed payload still exceeds DynamoDB's 400KB limit
  • Added: Compression is transparent — no API changes required

Version 1.2.2

  • Fixed: json_cache_save() now automatically syncs to DynamoDB (L2) when enabled
  • Deprecated: json_cache_save_db() is now redundant (use json_cache_save() instead)

Version 1.2.0

  • Added: Optional DynamoDB backend for cross-machine cache sharing via dynamorator
  • Added: Two-layer cache architecture (L1: local JSON, L2: DynamoDB)
  • Added: Constructor parameter dynamodb_table for enabling DynamoDB
  • Added: Automatic DynamoDB table creation with TTL support
  • Changed: DynamoDB backend now uses dynamorator package
  • Changed: Simplified logging to boolean (True/False)
  • Removed: Environment variable configuration (use constructor parameter)
  • Removed: LogLevel enum (simplified to boolean)

Troubleshooting

Cache Not Persisting

# Explicitly save cache
obj.json_cache_save()

# Exclude attributes that cannot be JSON-serialized
obj._excluded_cache_vars = ["problematic_attr"]

Cache Not Being Used

# Verify TTL hasn't expired
obj = MyClass(ttl=30)  # Increase TTL

# Ensure arguments are identical (type matters)
obj.func(1, 2)    # Different from
obj.func(1.0, 2)  # (int vs float)
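
In Python, `1 == 1.0`, yet the two calls above are cached separately. The cache key is a string built from the arguments' representations, as the `fetch('foo',){}` key in the Cache Status Tracking section suggests. The demonstration below assumes a repr-based key like that one, and `make_key` is a hypothetical helper.

```python
def make_key(name, args, kwargs):
    # Assumed key format: function name + repr(args) + repr(kwargs)
    return f"{name}{args!r}{kwargs!r}"


int_key = make_key("func", (1, 2), {})
float_key = make_key("func", (1.0, 2), {})
kw_key = make_key("func", (1,), {"y": 2})
print(int_key)               # func(1, 2){}
print(float_key)             # func(1.0, 2){}
print(kw_key)                # func(1,){'y': 2}
print(1 == 1.0)              # True
print(int_key == float_key)  # False: separate cache entries
```

The same applies to passing an argument by keyword instead of by position, which also produces a different key.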

Large Cache Files

# Exclude large attributes
self._excluded_cache_vars = ["large_data"]

# Use separate cache instances (assumes the subclass forwards data_id to JSONCache.__init__)
processor1 = DataProcessor(data_id="dataset1")
processor2 = DataProcessor(data_id="dataset2")

Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE file for details.


Developed by Arved Klöhn


Download files

Source Distribution

cacherator-1.2.10.tar.gz (26.5 kB)

Built Distribution

cacherator-1.2.10-py3-none-any.whl (13.3 kB)

File details

Details for the file cacherator-1.2.10.tar.gz.

File metadata

  • Download URL: cacherator-1.2.10.tar.gz
  • Size: 26.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for cacherator-1.2.10.tar.gz
  • SHA256: 6a4bf5e69be79b90591b3a7348ef1909bd2638a9562a634266a9ec5cd5dea51c
  • MD5: a07816e2ea43ab54dcb8066c8929c532
  • BLAKE2b-256: 5d7f1b6d5a8ef3892682dbd906ce67fbd6cf6658e4f13bb46330a882e3d7cbab

File details

Details for the file cacherator-1.2.10-py3-none-any.whl.

File metadata

  • Download URL: cacherator-1.2.10-py3-none-any.whl
  • Size: 13.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for cacherator-1.2.10-py3-none-any.whl
  • SHA256: 804bd7eecea16e220b817f39541d50f724fd9c673ef83efd04b8775c04f4ee08
  • MD5: 96b1d114d3f9952cc04ca63b95c64da8
  • BLAKE2b-256: 82188e7f79aeb64871415daddc3be62406097bb6878a5ff18eff2e33140bf042
