A TTL-based caching wrapper for the Python requests library

reqcache

A simple, transparent caching wrapper for Python's requests library with TTL-based cache control. Perfect for data collection, web scraping, and reducing redundant HTTP requests during development.

Features

  • TTL-based caching - control caching with time-to-live values
  • Default caching enabled - caches for 1 day by default
  • Simple API - just use cache_ttl parameter
  • Smart cache keys based on URL, HTTP method, and request parameters
  • Thread-safe for concurrent usage
  • Complete response preservation using pickle serialization
  • Cache management utilities for inspection and cleanup

Installation

pip install -e .

Or install with development dependencies:

pip install -e ".[dev]"

Quick Start

import reqcache

# Cached request (default 1-day TTL)
response = reqcache.get('https://api.example.com/data')

# Custom TTL (1 hour = 3600 seconds)
response = reqcache.get('https://api.example.com/data', cache_ttl=3600)

# Disable caching
response = reqcache.get('https://api.example.com/data', cache_ttl=0)

# Permanent caching (never expires)
response = reqcache.get('https://api.example.com/data', cache_ttl=-1)

# Use TTL constants
response = reqcache.get('https://api.example.com/data', cache_ttl=reqcache.TTL_ONE_DAY)
response = reqcache.get('https://api.example.com/data', cache_ttl=reqcache.TTL_DISABLED)
response = reqcache.get('https://api.example.com/data', cache_ttl=reqcache.TTL_PERMANENT)

Usage Examples

Basic Caching

import reqcache

# First call makes network request and caches response (default 1 day)
response = reqcache.get('https://httpbin.org/uuid')
print(response.json())  # {'uuid': 'abc-123-def'}

# Second call returns cached response (no network request)
response = reqcache.get('https://httpbin.org/uuid')
print(response.json())  # {'uuid': 'abc-123-def'} - same UUID!

TTL Control

import reqcache

# Cache for 1 hour (3600 seconds)
response = reqcache.get('https://api.example.com/data', cache_ttl=3600)

# Cache for 5 minutes
response = reqcache.get('https://api.example.com/data', cache_ttl=300)

# Cache for 1 week
response = reqcache.get('https://api.example.com/data', cache_ttl=604800)

# Disable caching (no cache)
response = reqcache.get('https://api.example.com/data', cache_ttl=0)

# Permanent caching (never expires)
response = reqcache.get('https://api.example.com/data', cache_ttl=-1)

All HTTP Methods Supported

import reqcache

# All HTTP methods support TTL-based caching (default: 1-day cache)
reqcache.get('https://api.example.com/users')
reqcache.post('https://api.example.com/users', json={'name': 'Alice'})
reqcache.put('https://api.example.com/users/1', json={'name': 'Bob'})
reqcache.delete('https://api.example.com/users/1')
reqcache.patch('https://api.example.com/users/1', json={'active': False})
reqcache.head('https://api.example.com/users/1')
reqcache.options('https://api.example.com/users/1')

# Disable caching for any method
reqcache.get('https://api.example.com/users', cache_ttl=0)
reqcache.post('https://api.example.com/users', json={'name': 'Alice'}, cache_ttl=0)

Full requests Compatibility

All requests parameters work as expected:

import reqcache

response = reqcache.get(
    'https://api.example.com/data',
    cache_ttl=3600,  # 1 hour cache
    headers={'Authorization': 'Bearer token123'},
    params={'page': 1, 'limit': 10},
    timeout=30,
    verify=True,
)

Cache Management

import reqcache

# Get cache statistics
info = reqcache.get_cache_info()
print(f"Total cache files: {info['total_files']}")
print(f"Valid entries: {info['valid_entries']}")
print(f"Expired entries: {info['expired_entries']}")
print(f"Total size: {info['total_size_mb']} MB")

# Clear all cached responses
deleted = reqcache.clear_cache()
print(f"Deleted {deleted} cache files")

Migration Guide (pre-1.0.0 → v1.0.0)

Version 1.0.0 introduces breaking changes with a simplified TTL-only API.

Key Changes

  • Removed: cache boolean parameter
  • Removed: cache_dir parameter from request functions
  • Changed: Default behavior is now to cache for 1 day
  • Added: TTL constants (TTL_DISABLED, TTL_PERMANENT, TTL_ONE_DAY)

API Migration

Before (pre-1.0.0):

# No caching (default)
reqcache.get('https://api.example.com/data')

# Enable caching
reqcache.get('https://api.example.com/data', cache=True)

# Custom TTL
reqcache.get('https://api.example.com/data', cache=True, cache_ttl=3600)

# Custom cache directory
reqcache.get('https://api.example.com/data', cache=True, cache_dir='./my_cache')

After (v1.0.0):

# Default caching (1 day)
reqcache.get('https://api.example.com/data')

# Disable caching
reqcache.get('https://api.example.com/data', cache_ttl=0)

# Custom TTL
reqcache.get('https://api.example.com/data', cache_ttl=3600)

# Use constants
reqcache.get('https://api.example.com/data', cache_ttl=reqcache.TTL_DISABLED)
reqcache.get('https://api.example.com/data', cache_ttl=reqcache.TTL_PERMANENT)
reqcache.get('https://api.example.com/data', cache_ttl=reqcache.TTL_ONE_DAY)

TTL Values Reference

  • 0 or reqcache.TTL_DISABLED: No caching
  • -1 or reqcache.TTL_PERMANENT: Permanent caching (never expires)
  • 86400 or reqcache.TTL_ONE_DAY: 1 day (default)
  • Any positive integer: Cache for that many seconds
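
The rules above can be sketched as two small helpers. This is an illustrative reading of the reference list, not reqcache's actual implementation: should_cache and is_expired are hypothetical names, and only the three constant values come from this README.

```python
import time
from typing import Optional

# Values taken from the TTL reference; the names mirror reqcache's constants.
TTL_DISABLED = 0
TTL_PERMANENT = -1
TTL_ONE_DAY = 86400


def should_cache(ttl: int) -> bool:
    """Return True when a response should be written to the cache."""
    return ttl != TTL_DISABLED


def is_expired(stored_at: float, ttl: int, now: Optional[float] = None) -> bool:
    """Return True when an entry stored at `stored_at` has outlived `ttl`."""
    if ttl == TTL_PERMANENT:
        return False  # Permanent entries never expire.
    if ttl == TTL_DISABLED:
        return True  # A disabled TTL never yields a usable entry.
    if now is None:
        now = time.time()
    return now - stored_at > ttl
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it falls back to the current time.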

How It Works

Cache Key Generation

Cache keys are generated based on:

  • URL (normalized, without query string)
  • HTTP method (GET, POST, etc.)
  • Query parameters (sorted for consistency)
  • Request body (for POST/PUT/PATCH requests)

This ensures that:

# These produce the SAME cache key
reqcache.get('https://example.com?a=1&b=2')
reqcache.get('https://example.com?b=2&a=1')

# These produce DIFFERENT cache keys
reqcache.get('https://example.com')
reqcache.post('https://example.com')
reqcache.get('https://example.com?page=1')
reqcache.get('https://example.com?page=2')
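
For readers curious how such a key could be derived, here is a minimal sketch using only the standard library. make_cache_key is a hypothetical helper and the JSON payload layout is an assumption; reqcache's real key format may differ.

```python
import hashlib
import json
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def make_cache_key(method, url, params=None, body=None):
    """Build a SHA256 key from method, base URL, sorted query pairs, and body."""
    parts = urlsplit(url)
    # Merge query pairs found in the URL with explicit params, then sort them
    # so that ?a=1&b=2 and ?b=2&a=1 produce the same key.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [(str(k), str(v)) for k, v in (params or {}).items()]
    query.sort()
    # Rebuild the URL without its query string or fragment.
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))
    payload = json.dumps(
        {"method": method.upper(), "url": base_url, "query": query, "body": body},
        sort_keys=True,
        default=str,  # Fall back to str() for bodies that are not JSON-serializable.
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting the query pairs before hashing is what makes parameter order irrelevant, while including the method and body keeps GET and POST requests to the same URL apart.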

Storage Format

Cached responses are stored in the .cache/ directory with:

  • Filename: SHA256 hash of cache key (e.g., a3b2c1d4e5f6.pkl)
  • Format: Python pickle (preserves complete Response object)
  • Metadata: Timestamp and TTL stored with response
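
A rough sketch of what writing and reading such a file might look like follows. The entry layout (a dict with stored_at, ttl, and response keys) and the helper names write_entry and read_entry are assumptions made for illustration; the README does not document the on-disk structure beyond the points above.

```python
import pickle
import time
from pathlib import Path


def write_entry(cache_dir, key_hash, response, ttl, now=None):
    """Pickle the response with its timestamp and TTL into <key_hash>.pkl."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    entry = {
        "stored_at": time.time() if now is None else now,
        "ttl": ttl,
        "response": response,
    }
    path = directory / f"{key_hash}.pkl"
    # Write to a temporary file first, then rename, so readers never
    # see a half-written pickle.
    tmp_path = path.with_suffix(".tmp")
    with open(tmp_path, "wb") as fh:
        pickle.dump(entry, fh, protocol=pickle.HIGHEST_PROTOCOL)
    tmp_path.replace(path)
    return path


def read_entry(cache_dir, key_hash):
    """Return the stored entry dict, or None when no file exists for the key."""
    path = Path(cache_dir) / f"{key_hash}.pkl"
    if not path.exists():
        return None
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Any picklable object can stand in for the response here, which is why the sketch runs without making a network request.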

TTL Behavior

  • Default TTL: 24 hours (86400 seconds)
  • Custom TTL: Specify per-request with cache_ttl parameter
  • Expiration: Checked on cache read; expired entries treated as cache miss
  • No auto-cleanup: Expired files are never served, but they stay on disk until you clear the cache manually
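
Since expired files are not removed automatically, a small pruning script can help when you want to keep valid entries rather than call clear_cache(). The sketch below is not part of reqcache: prune_expired is a hypothetical helper, and it assumes each .pkl file holds a dict with stored_at and ttl keys, which may not match the real file layout. Because it unpickles files, run it only against a cache directory you control.

```python
import pickle
import time
from pathlib import Path


def prune_expired(cache_dir=".cache", now=None):
    """Delete expired cache files and return how many were removed."""
    now = time.time() if now is None else now
    removed = 0
    # Materialize the listing first so deletions do not disturb iteration.
    for path in list(Path(cache_dir).glob("*.pkl")):
        try:
            with open(path, "rb") as fh:
                entry = pickle.load(fh)
            ttl = entry["ttl"]
            # -1 means permanent; otherwise compare the entry's age with its TTL.
            expired = ttl != -1 and now - entry["stored_at"] > ttl
        except (OSError, pickle.UnpicklingError, EOFError, KeyError, TypeError):
            continue  # Skip files that do not match the assumed layout.
        if expired:
            path.unlink()
            removed += 1
    return removed
```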

Use Cases

Data Collection / Web Scraping

import reqcache

# Scrape data with automatic caching to avoid re-fetching
urls = [
    'https://example.com/page1',
    'https://example.com/page2',
    'https://example.com/page3',
]

for url in urls:
    response = reqcache.get(url, cache_ttl=reqcache.TTL_ONE_DAY)
    # Process response...
    # If script crashes and restarts, already-fetched pages are cached!

Development / Testing

import reqcache

# During development, cache API responses to speed up testing
def fetch_user_data(user_id):
    response = reqcache.get(
        f'https://api.example.com/users/{user_id}',
        cache_ttl=3600,  # Cache for 1 hour during dev
    )
    return response.json()

# First call: network request
user = fetch_user_data(123)

# Subsequent calls: instant response from cache
user = fetch_user_data(123)  # No network delay!

API Rate Limit Protection

import reqcache

# Cache responses to avoid hitting API rate limits
API_KEY = 'your-api-key'  # Placeholder; load the real key from your own configuration

def get_expensive_api_data():
    return reqcache.get(
        'https://api.example.com/expensive-endpoint',
        cache_ttl=3600,  # Re-fetch at most once per hour
        headers={'Authorization': f'Bearer {API_KEY}'},
    )

Configuration

Default Settings

DEFAULT_CACHE_DIR = ".cache"
DEFAULT_TTL = 86400  # 24 hours in seconds

Per-Request Configuration

import reqcache

# As of v1.0.0, request functions no longer accept cache or cache_dir
response = reqcache.get(
    'https://api.example.com/data',
    cache_ttl=3600,  # TTL in seconds (default: 86400)
)

Important Notes

Security Warning

Pickle Security: This library uses Python's pickle module for serialization. Only use the cache in trusted environments where you control the cache directory. Never load cache files from untrusted sources.

Cache Behavior

  • On by default: Responses are cached for 1 day unless you pass cache_ttl=0 (TTL_DISABLED)
  • No HTTP semantics: Cache-Control, ETag, and other HTTP caching headers are ignored
  • Error responses cached: 4xx and 5xx responses are cached just like successful responses
  • Thread-safe: Safe for use in multi-threaded applications
  • No size limits: Cache can grow unbounded; monitor with get_cache_info()
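
As an illustration of concurrent use, the sketch below fans a list of URLs out over a thread pool. fetch_all is a hypothetical helper, not part of reqcache; it takes the fetch function as an argument so the pattern can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Call fetch(url) for each URL on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

With reqcache installed, a call might look like `fetch_all(urls, lambda url: reqcache.get(url, cache_ttl=3600))`. Keep in mind that an error response fetched this way is cached like any other, so check response.status_code before trusting the result.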

When NOT to Use Caching

  • Production APIs: Use proper HTTP caching or Redis/Memcached instead
  • Authentication flows: Don't cache login/token requests
  • Real-time data: Don't cache if you need fresh data every time
  • Sensitive data: Be careful with caching responses containing secrets

Troubleshooting

Cache Not Working

# Caching is on by default; make sure cache_ttl is not 0 (TTL_DISABLED)
response = reqcache.get('https://example.com')  # ✓ (cached for 1 day)
response = reqcache.get('https://example.com', cache_ttl=0)  # ✗ (no caching)

Cache Directory Not Created

The cache directory is created automatically on the first cached request. If you see no .cache/ directory, verify that:

  1. You've made at least one request with caching enabled (cache_ttl other than 0)
  2. You have write permissions in the working directory

Stale Data

If cached data is outdated:

# Option 1: Clear all cache
reqcache.clear_cache()

# Option 2: Use a shorter TTL
reqcache.get('https://api.example.com/data', cache_ttl=60)

# Option 3: Disable caching for a specific request
reqcache.get('https://api.example.com/data', cache_ttl=0)

Cache Growing Too Large

Monitor cache size:

info = reqcache.get_cache_info()
print(f"Cache size: {info['total_size_mb']} MB")

if info['total_size_mb'] > 100:  # More than 100 MB
    reqcache.clear_cache()

Import Error

If you get ModuleNotFoundError: No module named 'reqcache':

# Install the package
pip install -e .

# Or install requests dependency manually
pip install requests

Development

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=reqcache --cov-report=html

# Run specific test class
pytest tests/test_reqcache.py::TestCacheKeyGeneration

Project Structure

reqcache/
├── reqcache/
│   └── __init__.py       # Main implementation
├── tests/
│   ├── __init__.py
│   └── test_reqcache.py  # Test suite
├── pyproject.toml        # Package configuration
├── README.md             # This file
└── .gitignore

API Reference

Main Functions

  • reqcache.get(url, cache_ttl=86400, **kwargs) - GET request
  • reqcache.post(url, cache_ttl=86400, **kwargs) - POST request
  • reqcache.put(url, cache_ttl=86400, **kwargs) - PUT request
  • reqcache.delete(url, cache_ttl=86400, **kwargs) - DELETE request
  • reqcache.patch(url, cache_ttl=86400, **kwargs) - PATCH request
  • reqcache.head(url, cache_ttl=86400, **kwargs) - HEAD request
  • reqcache.options(url, cache_ttl=86400, **kwargs) - OPTIONS request
  • reqcache.request(method, url, cache_ttl=86400, **kwargs) - Generic request

Utility Functions

  • reqcache.clear_cache(cache_dir='.cache') - Delete all cache files
  • reqcache.get_cache_info(cache_dir='.cache') - Get cache statistics

Parameters

  • cache_ttl (int): Time to live in seconds (default: 86400); 0 disables caching, -1 caches permanently
  • cache_dir (str): Cache directory path, accepted by the utility functions only (default: '.cache')
  • **kwargs: All other parameters passed to requests.request()

License

MIT License

Contributing

Contributions welcome! Please feel free to submit issues or pull requests.

Changelog

0.1.0 (Initial Release)

  • Basic caching functionality
  • Support for all HTTP methods
  • Configurable TTL
  • Cache management utilities
  • Comprehensive test suite
