Cacherator
Persistent JSON caching for Python with async support - cache function results and object state effortlessly.
Overview
Cacherator is a Python library that provides persistent JSON-based caching for class state and function results. It enables developers to cache expensive operations with minimal configuration, supporting both synchronous and asynchronous functions.
Key Features
- Zero-configuration caching - Simple inheritance and decorator pattern
- Async/await support - Native support for asynchronous functions
- Persistent storage - Cache survives program restarts
- TTL (Time-To-Live) - Automatic cache expiration
- Selective caching - Fine-grained control over what gets cached
- Cache management - Built-in methods for inspection and clearing
- Cache status tracking - Per-call hit/miss detection with L1/L2 source
- Flexible logging - Global and per-instance control
- DynamoDB backend - Optional L2 cache for cross-machine sharing
Installation
```
pip install cacherator
```
Optional: DynamoDB Support
For cross-machine cache sharing via DynamoDB:
```
pip install boto3
```
Quick Start
Basic Function Caching
```python
from cacherator import JSONCache, Cached
import time

class Calculator(JSONCache):
    def __init__(self):
        super().__init__(data_id="calc")

    @Cached()
    def expensive_calculation(self, x, y):
        time.sleep(2)  # Simulate an expensive operation
        return x ** y

calc = Calculator()
result = calc.expensive_calculation(2, 10)  # Takes 2 seconds
result = calc.expensive_calculation(2, 10)  # Instant!
```
Async Function Caching
```python
from cacherator import JSONCache, Cached

class APIClient(JSONCache):
    @Cached(ttl=1)  # Cache for 1 day
    async def fetch_user(self, user_id):
        # Expensive API call
        response = await api.get(f"/users/{user_id}")
        return response.json()

client = APIClient()
user = await client.fetch_user(123)  # API call
user = await client.fetch_user(123)  # Cached!
```
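Since `await` is only valid inside a coroutine, a minimal runner for the example above looks like this (a sketch, reusing the `APIClient` class):

```python
import asyncio

async def main():
    client = APIClient()
    user = await client.fetch_user(123)  # first call hits the API
    user = await client.fetch_user(123)  # second call is served from cache

asyncio.run(main())
```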
State Persistence
```python
from cacherator import JSONCache

class GameState(JSONCache):
    def __init__(self, game_id):
        super().__init__(data_id=f"game_{game_id}")
        if not hasattr(self, "score"):  # Only set defaults when nothing was restored
            self.score = 0
            self.level = 1

    def add_points(self, points):
        self.score += points
        self.json_cache_save()

# Session 1
game = GameState("player1")
game.add_points(100)

# Session 2 (after restart)
game = GameState("player1")
print(game.score)  # 100 - persisted!
```
Advanced Usage
DynamoDB Backend (Cross-Machine Cache Sharing)
Enable the optional DynamoDB L2 cache to share cached data across multiple machines:
```python
from cacherator import JSONCache, Cached

class WebScraper(JSONCache):
    def __init__(self):
        super().__init__(dynamodb_table='my-cache-table')

    @Cached(ttl=7)
    def scrape_expensive_data(self, url):
        # Expensive operation
        return fetch_data(url)

# On machine 1 (laptop)
scraper = WebScraper()
data = scraper.scrape_expensive_data("https://example.com")  # Scrapes and caches

# On machine 2 (EC2 instance) - same code
scraper = WebScraper()
data = scraper.scrape_expensive_data("https://example.com")  # Uses cached data!
```
How it works:
- L1 (local JSON): Checked first for instant access
- L2 (DynamoDB): Checked on L1 miss, then written to L1; L1 hits are automatically backfilled to L2
- Writes: Saved to both L1 and L2 simultaneously
- No table specified: Works as local-only cache
- Compression: Payloads over 100KB are automatically gzip-compressed before writing to DynamoDB, reducing typical HTML payloads by 80-90% (see the sketch after this list). A warning is logged if the compressed payload still exceeds DynamoDB's 400KB item limit.
- save_on_del: By default, `__del__` only saves to local JSON (L1). Set `save_on_del=True` to also write to DynamoDB on object destruction. Use `json_cache_save()` for explicit L1+L2 saves.
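The compression step behaves roughly like the sketch below. This is an illustration of the documented behavior, not Cacherator's internal API; `prepare_payload` and the constant names are hypothetical.

```python
import gzip
import json

COMPRESSION_THRESHOLD = 100 * 1024  # 100KB threshold, per the docs above
DYNAMODB_ITEM_LIMIT = 400 * 1024    # DynamoDB's hard per-item size limit

def prepare_payload(value) -> bytes:
    """Serialize a cache value, gzip-compressing it once it exceeds the threshold."""
    raw = json.dumps(value).encode("utf-8")
    if len(raw) > COMPRESSION_THRESHOLD:
        raw = gzip.compress(raw)  # typical HTML payloads shrink by 80-90%
        if len(raw) > DYNAMODB_ITEM_LIMIT:
            print("Warning: compressed payload still exceeds DynamoDB's 400KB limit")
    return raw
```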
DynamoDB table:
- Auto-created if missing (requires IAM permissions)
- Partition key: `cache_id` (String)
- TTL enabled for automatic expiry
- Pay-per-request billing mode
AWS credentials are resolved via the standard boto3 chain:
- Environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` (example below)
- IAM role (recommended for EC2/Lambda)
- AWS credentials file (`~/.aws/credentials`)
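For local development, the environment-variable route can also be set from Python before the first cache access (a sketch; the values are placeholders):

```python
import os

# Standard boto3 environment variables; replace the placeholders with real values
os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."
os.environ["AWS_REGION"] = "eu-central-1"

scraper = WebScraper()  # boto3 picks the credentials up automatically
```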
Custom TTL Configuration
```python
from cacherator import JSONCache, Cached

class WeatherService(JSONCache):
    @Cached(ttl=0.25)  # 6 hours (0.25 days)
    def get_forecast(self, city):
        return fetch_weather(city)

    @Cached(ttl=30)  # 30 days
    def get_historical(self, city, year):
        return fetch_historical(city, year)
```
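Since `ttl` is always expressed in days, fractional values cover shorter windows. A tiny conversion helper keeps the arithmetic readable (the helper, the `NewsService` class, and `fetch_headlines` are illustrative, not part of Cacherator):

```python
from cacherator import JSONCache, Cached

def hours(h: float) -> float:
    """Convert hours to the fractional days that ttl expects."""
    return h / 24

class NewsService(JSONCache):
    @Cached(ttl=hours(6))  # equivalent to ttl=0.25
    def get_headlines(self):
        return fetch_headlines()
```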
Excluding Variables from Cache
```python
from cacherator import JSONCache

class DataProcessor(JSONCache):
    def __init__(self):
        self._excluded_cache_vars = ["temp_data", "api_key"]
        super().__init__()
        self.results = {}
        self.temp_data = []      # Won't be cached
        self.api_key = "secret"  # Won't be cached
```
Cache Management
```python
processor = DataProcessor()

# Get cache statistics
stats = processor.json_cache_stats()
print(stats)
# {'total_entries': 5, 'functions': {'process': 3, 'analyze': 2}}

# Clear a specific function's cache
processor.json_cache_clear("process")

# Clear all cached entries
processor.json_cache_clear()
```
Cache Status Tracking
Detect whether a `@Cached` method returned cached data or executed the function, and which cache layer (L1/L2) was used:
```python
from cacherator import JSONCache, Cached

class DataService(JSONCache):
    def __init__(self):
        super().__init__(data_id="my-service", ttl=7)

    @Cached(ttl=7)
    def fetch(self, key: str) -> str:
        return expensive_operation(key)

svc = DataService()

# last_cache_status is None before any call
print(svc.last_cache_status)  # None

svc.fetch("foo")
print(svc.last_cache_status)  # "miss" (first run) or "l1" / "l2" (subsequent runs)

# Full per-call history keyed by function signature
print(svc.cache_status)
# {"fetch('foo',){}": "l1"}
```
Status values:
"l1"— returned from local JSON cache"l2"— returned from DynamoDB cache"miss"— function was executed (no valid cache entry)None— no@Cachedmethod has been called yet
`cache_status` is populated on init for all keys loaded from cache and updated on every `@Cached` call. It is cleared when `json_cache_clear()` is called.
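`last_cache_status` also makes it easy to measure hit rates over a batch of calls; a sketch using the `DataService` class from above:

```python
svc = DataService()
keys = ["foo", "bar", "foo", "baz", "foo"]

hits = 0
for key in keys:
    svc.fetch(key)
    if svc.last_cache_status in ("l1", "l2"):
        hits += 1

print(f"Hit rate: {hits / len(keys):.0%}")
```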
Logging Control
```python
from cacherator import JSONCache

# Disable logging globally
JSONCache.set_logging(False)

# Enable logging globally (default)
JSONCache.set_logging(True)

# Per-instance control
processor = DataProcessor(logging=False)  # Silent mode
```
When logging is enabled:
- DynamoDB operations are logged (table creation, reads, writes)
- Local JSON operations are silent (fast, not interesting)
When logging is disabled:
- All operations are silent
Configuration
JSONCache Constructor
```python
JSONCache(
    data_id="unique_id",   # Unique identifier (default: class name)
    directory="cache",     # Cache directory (default: "data/cache")
    clear_cache=False,     # Clear existing cache on init
    ttl=999,               # Default TTL in days
    logging=True,          # Enable logging (True/False)
    dynamodb_table=None,   # DynamoDB table name (optional)
    save_on_del=False      # Write to DynamoDB on __del__ (default: False)
)
```
@Cached Decorator
```python
@Cached(
    ttl=7,              # Time-to-live in days (default: class ttl)
    clear_cache=False   # Clear the cached entries for this function
)
```
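For example, to discard previously cached results for a single method while iterating on its implementation (a sketch; `ReportBuilder` and `compute_summary` are illustrative):

```python
from cacherator import JSONCache, Cached

class ReportBuilder(JSONCache):
    @Cached(ttl=7, clear_cache=True)  # drop this function's cached entries
    def build_summary(self):
        return compute_summary()
```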
Use Cases
API Client with Caching
```python
import aiohttp
from cacherator import JSONCache, Cached

class GitHubClient(JSONCache):
    def __init__(self):
        super().__init__(data_id="github_client", ttl=1)

    @Cached(ttl=0.5)  # 12 hours
    async def get_user(self, username):
        async with aiohttp.ClientSession() as session:
            async with session.get(f"https://api.github.com/users/{username}") as resp:
                return await resp.json()

    @Cached(ttl=7)  # 1 week
    async def get_repos(self, username):
        async with aiohttp.ClientSession() as session:
            async with session.get(f"https://api.github.com/users/{username}/repos") as resp:
                return await resp.json()
```
Database Query Caching
```python
from cacherator import JSONCache, Cached

class UserRepository(JSONCache):
    def __init__(self):
        super().__init__(data_id="user_repo", ttl=0.1)  # 2.4 hours

    @Cached()
    def get_user_by_id(self, user_id):
        return db.query("SELECT * FROM users WHERE id = ?", user_id)

    @Cached(ttl=1)
    def get_user_stats(self, user_id):
        return db.query("SELECT COUNT(*) FROM posts WHERE user_id = ?", user_id)
```
Machine Learning Model Predictions
```python
from cacherator import JSONCache, Cached

class ModelPredictor(JSONCache):
    def __init__(self):
        super().__init__(data_id="ml_predictor")
        self.model = load_model()

    @Cached(ttl=30)
    def predict(self, features_hash, features):
        # Cache predictions by feature hash
        return self.model.predict(features)
```
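One way to derive `features_hash` so that equal feature sets always map to the same cache key (assuming `features` is JSON-serializable; `hash_features` is a hypothetical helper, not part of Cacherator):

```python
import hashlib
import json

def hash_features(features) -> str:
    """Stable digest of a feature set, suitable as a cache-key argument."""
    canonical = json.dumps(features, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

predictor = ModelPredictor()
features = {"age": 42, "income": 55000}
prediction = predictor.predict(hash_features(features), features)
```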
Best Practices
Recommended Use Cases
- Expensive API calls and network requests
- Database queries with relatively static data
- Heavy computational operations
- Machine learning model predictions
- Data transformations and aggregations
When to Use TTL
- Set short TTL (minutes to hours) for frequently changing data
- Set long TTL (days to weeks) for stable reference data
- Consider data freshness requirements for your application
What Not to Cache
- Non-deterministic functions (random number generation, timestamps)
- Very fast operations (overhead exceeds benefit)
- Non-JSON-serializable objects without custom handling
- Real-time data without appropriate TTL configuration
Performance
Cacherator introduces minimal overhead:
- Cache hit: ~0.1ms
- Cache miss: Function execution time + ~1ms
- Disk I/O: Non-blocking, asynchronous operations
Performance Improvements
- API calls (100ms - 5s) reduced to ~0.1ms
- Database queries (10ms - 1s) reduced to ~0.1ms
- Heavy computations (1s+) reduced to ~0.1ms
Compatibility
- Python: 3.7 and above
- Async: Full support for async/await syntax
- Operating Systems: Windows, macOS, Linux
- Data Types: All JSON-serializable types plus datetime objects
- Optional Dependencies: boto3 (for DynamoDB backend), dynamorator
Changelog
Version 1.2.6
- Added: `cache_status` dict — per-function-signature hit/miss tracking with L1/L2 source, populated on init and updated on every `@Cached` call
- Added: `last_cache_status` — status of the most recent `@Cached` call (`"l1"`, `"l2"`, `"miss"`, or `None`)
- Changed: `json_cache_clear()` now also clears `cache_status` entries
Version 1.2.5
- Fixed: L1 cache hits now automatically backfill L2 (DynamoDB) when enabled
- Fixed: Removed misleading `json_cache_save_db` branch in the `@Cached` decorator — `json_cache_save()` is always used, which handles both L1 and L2
Version 1.2.4
- Added: `save_on_del` parameter (default `False`) — `__del__` no longer writes to DynamoDB unless opted in, eliminating unnecessary writes on program exit
- Changed: `__del__` always saves to local JSON (L1); a DynamoDB (L2) write requires an explicit `json_cache_save()` or `save_on_del=True`
- Removed: Unreliable dirty-check on `json_cache_save()` — saves are now always performed when called
Version 1.2.3
- Added: Automatic gzip compression for DynamoDB payloads exceeding 100KB
- Added: Warning logged when compressed payload still exceeds DynamoDB's 400KB limit
- Added: Compression is transparent — no API changes required
Version 1.2.2
- Fixed: `json_cache_save()` now automatically syncs to DynamoDB (L2) when enabled
- Deprecated: `json_cache_save_db()` is now redundant (use `json_cache_save()` instead)
Version 1.2.0
- Added: Optional DynamoDB backend for cross-machine cache sharing via dynamorator
- Added: Two-layer cache architecture (L1: local JSON, L2: DynamoDB)
- Added: Constructor parameter `dynamodb_table` for enabling DynamoDB
- Added: Automatic DynamoDB table creation with TTL support
- Changed: DynamoDB backend now uses dynamorator package
- Changed: Simplified logging to boolean (True/False)
- Removed: Environment variable configuration (use constructor parameter)
- Removed: LogLevel enum (simplified to boolean)
Troubleshooting
Cache Not Persisting
```python
# Explicitly save the cache
obj.json_cache_save()

# Check for serialization errors by excluding problem attributes
obj._excluded_cache_vars = ["problematic_attr"]
```
Cache Not Being Used
```python
# Verify the TTL hasn't expired
obj = MyClass(ttl=30)  # Increase TTL

# Ensure arguments are identical (type matters)
obj.func(1, 2)    # Different cache entry from...
obj.func(1.0, 2)  # ...this one (int vs float)
```
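When mixed numeric types are the culprit, a simple workaround is to normalize arguments at the call site so equivalent calls share one cache entry (a sketch):

```python
# Cast to canonical types so 1 and 1.0 produce the same cache key
x, y = 1.0, 2
result = obj.func(int(x), int(y))
```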
Large Cache Files
```python
# Exclude large attributes
self._excluded_cache_vars = ["large_data"]

# Use separate cache instances
processor1 = DataProcessor(data_id="dataset1")
processor2 = DataProcessor(data_id="dataset2")
```
Contributing
Contributions are welcome. Please see CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE file for details.
Resources
- GitHub Repository: https://github.com/Redundando/cacherator
- Issue Tracker: https://github.com/Redundando/cacherator/issues
- PyPI Package: https://pypi.org/project/cacherator/
Developed by Arved Klöhn