A predictive caching framework that dynamically pre-loads data items with minimal latency
Dynamic Prefetching Cache for Python
A Python library for memory-efficient file reading through speculative precaching. Instead of loading entire datasets into memory, this framework uses a user-defined predictive function to anticipate data access patterns and proactively reads and caches the items most likely to be needed next.
Use Cases
This library is designed for scenarios where you need to process large files or datasets sequentially or with predictable access patterns, but cannot or do not want to load everything into memory at once.
Primary use case: Video frame analysis and MOT (Multiple Object Tracking) data processing, where users typically navigate through frames sequentially but may jump to specific positions. The library includes optimized providers and predictors for this scenario.
Other applications: Any situation where you can predict future data access patterns - time-series analysis, log-file processing, document-processing pipelines, or any sequential workload where memory usage needs to be controlled.
How It Works
Rather than reactive caching (loading data only after it's requested), this system implements speculative precaching:
- Predict: Uses a user-defined predictive function to identify the most likely next items
- Prefetch: Loads predicted data in a background thread before it's needed
- Serve: Returns cached data when requested (if prediction was correct) or loads synchronously as fallback
- Manage: Automatically evicts old data to maintain memory limits
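The four steps above can be sketched as a toy implementation. This is illustrative only: the class name, the single-key prefetch, and the oldest-first eviction are simplifications, not the library's actual API.

```python
import threading

class MiniPrefetchCache:
    """Toy sketch of the predict -> prefetch -> serve -> manage loop."""

    def __init__(self, load, predict, max_keys=8):
        self.load = load        # load(key) -> data, supplied by the user
        self.predict = predict  # predict(key) -> {candidate_key: likelihood}
        self.max_keys = max_keys
        self.cache = {}
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:                        # serve a correct prediction
                data = self.cache[key]
            else:                                        # fallback: load synchronously
                data = self.cache[key] = self.load(key)
                self._evict()
        likelihoods = self.predict(key)
        if likelihoods:                                  # prefetch most likely next key
            best = max(likelihoods, key=likelihoods.get)
            threading.Thread(target=self._prefetch, args=(best,)).start()
        return data

    def _prefetch(self, key):
        with self.lock:
            if key not in self.cache:
                self.cache[key] = self.load(key)
                self._evict()

    def _evict(self):
        # manage: drop oldest-inserted keys once over the limit (lock is held)
        while len(self.cache) > self.max_keys:
            self.cache.pop(next(iter(self.cache)))

cache = MiniPrefetchCache(load=lambda k: k * 10,
                          predict=lambda k: {k + 1: 0.9, k - 1: 0.1})
print(cache.get(3))  # 30
```

The real library adds bounded concurrent prefetching, pluggable eviction policies, and event hooks on top of this basic loop.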
Quick Start
```python
from dynamic_prefetching_cache.predictors import DynamicDataPredictor
from dynamic_prefetching_cache.providers import MOTDataProvider
from dynamic_prefetching_cache.cache import DynamicPrefetchingCache

provider = MOTDataProvider("examples/data/large_data.txt")  # Note: use generate_large_mot_data.py to generate data
predictor = DynamicDataPredictor(possible_jumps=[-5, -1, 1, 5, 15])

# Create cache with automatic resource management
with DynamicPrefetchingCache(provider, predictor, max_keys_cached=512) as cache:
    for key in range(100):
        data = cache.get(key)  # Returns immediately if prefetched, else loads synchronously
        print(data)

    # Monitor performance
    stats = cache.stats()
    print(f"Hit rate: {stats['hits'] / (stats['hits'] + stats['misses']):.2%}")
```
Core Protocols
DataProvider Protocol
Implement this interface to connect your data source:
```python
class MyDataProvider:
    def load(self, key: int) -> Any:
        """Load data for the given key. Must be thread-safe."""
        # May read line(s) from a file or database, or parse a line
        # from a text file.
        # See src/dynamic_prefetching_cache/providers.py for an example.
        return fetch_from_file_or_database(key)

    def get_available_frames(self) -> set[int]:
        """Return the set of valid keys."""
        return {1, 2, 3, 4, 5}

    def get_total_frames(self) -> int:
        """Return the total number of available keys."""
        return 5

    def get_stats(self) -> dict:
        """Return provider statistics."""
        return {"status": "ok"}
```
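As a concrete (hypothetical) example of the protocol, a provider that maps key N to line N of a text file could pre-index byte offsets so `load()` seeks instead of re-scanning. This is a sketch, not the library's `MOTDataProvider`:

```python
class LineFileProvider:
    """Illustrative DataProvider: key N maps to line N of a text file."""

    def __init__(self, path):
        self.path = path
        self.offsets = []
        # Index the byte offset of each line once, up front
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                self.offsets.append(pos)
                pos += len(line)

    def load(self, key):
        # Thread-safe: each call opens its own file handle
        with open(self.path, "rb") as f:
            f.seek(self.offsets[key])
            return f.readline().decode().rstrip("\n")

    def get_available_frames(self):
        return set(range(len(self.offsets)))

    def get_total_frames(self):
        return len(self.offsets)

    def get_stats(self):
        return {"lines_indexed": len(self.offsets)}
```

Opening a fresh handle per `load()` call keeps the provider trivially thread-safe at the cost of an `open()` per read; a shared handle behind a lock is the usual alternative.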
AccessPredictor Protocol
Implement this interface to define prediction logic:
```python
class MyAccessPredictor:
    def get_likelihoods(self, current_key: int, history: list[int]) -> dict[int, float]:
        """Return likelihood scores for potential next keys."""
        return {
            current_key + 1: 0.8,  # High likelihood
            current_key + 2: 0.3,  # Medium likelihood
            current_key - 1: 0.1,  # Low likelihood
        }
```
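A slightly richer (still hypothetical) predictor can use the access history, which the cache passes in, to adapt its scores. The class name and heuristic here are illustrative, not one of the library's built-in predictors:

```python
class SequentialPredictor:
    """Illustrative AccessPredictor: assumes mostly-forward reading."""

    def get_likelihoods(self, current_key, history):
        scores = {current_key + 1: 1.0, current_key + 2: 0.5}
        # If the last two accesses moved backward, also hedge on key - 1
        if len(history) >= 2 and history[-1] < history[-2]:
            scores[current_key - 1] = 0.8
        return scores

p = SequentialPredictor()
print(p.get_likelihoods(10, [12, 11]))  # {11: 1.0, 12: 0.5, 9: 0.8}
```

Scores are relative weights, so they do not need to sum to 1; the cache only needs a ranking of candidate keys.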
Built-in Components
Access Predictors
The library includes several ready-to-use predictors:
- DistanceDecayPredictor: Simple distance-based prediction with configurable decay rates
- DynamicDistanceDecayPredictor: Forward-biased predictor optimized for media playback
- DynamicDataPredictor: Advanced predictor with jump detection and history analysis
```python
from dynamic_prefetching_cache.predictors import DynamicDataPredictor

# Optimized for video/media navigation patterns
predictor = DynamicDataPredictor(
    possible_jumps=[-15, -5, -1, 1, 5, 15, 30],  # Common seek distances
    forward_bias=2.0,     # Favor forward progression
    jump_boost=5.0,       # Boost exact jump targets
    proximity_boost=2.0,  # Boost areas near jump targets
)
```
MOT Data Provider
High-performance provider for MOT (Multiple Object Tracking) data files:
```python
from dynamic_prefetching_cache.providers import MOTDataProvider

# Optimized for MOT-format files with built-in indexing and caching
provider = MOTDataProvider('data/tracking_results.txt', cache_size=100)

# Includes comprehensive statistics
stats = provider.get_stats()
print(f"Provider cache hit rate: {stats['cache_hit_rate']:.2%}")
```
Configuration
```python
cache = DynamicPrefetchingCache(
    provider=my_provider,
    predictor=my_predictor,
    max_keys_cached=1000,                  # Maximum items in cache
    max_keys_prefetched=8,                 # Max concurrent prefetch tasks
    history_size=30,                       # Access history for prediction
    eviction_policy=EvictionPolicyOldest,  # Cache eviction strategy
    on_event=my_event_handler,             # Optional event monitoring
)
```
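The `eviction_policy` shown above is oldest-first. A standalone sketch of what such a policy does (the class name `EvictOldest` is hypothetical, not the library's `EvictionPolicyOldest` implementation):

```python
from collections import OrderedDict

class EvictOldest:
    """Illustrative oldest-first eviction: drop the earliest-inserted key."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self.entries = OrderedDict()

    def insert(self, key, value):
        self.entries[key] = value
        while len(self.entries) > self.max_keys:
            self.entries.popitem(last=False)  # evict the oldest insertion

policy = EvictOldest(max_keys=3)
for k in range(5):
    policy.insert(k, k)
print(list(policy.entries))  # [2, 3, 4]
```

Oldest-first is a good default for sequential scans, since the oldest entries are the least likely to be revisited; an LRU variant would reorder entries on access instead.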
Event Monitoring
Monitor cache operations for debugging and optimization:
```python
def handle_cache_events(event_name: str, **kwargs):
    if event_name == 'prefetch_error':
        logger.warning(f"Prefetch failed for key {kwargs['key']}: {kwargs['error']}")
    elif event_name == 'cache_evict':
        logger.debug(f"Evicted key {kwargs['key']} from cache")

cache = DynamicPrefetchingCache(provider, predictor, on_event=handle_cache_events)
```
Available events: `cache_load_start`/`complete`/`error`, `prefetch_start`/`success`/`error`, `cache_evict`, `worker_error`
Performance Monitoring
```python
stats = cache.stats()
print(f"Cache hits: {stats['hits']}")
print(f"Cache misses: {stats['misses']}")
print(f"Hit rate: {stats['hits'] / (stats['hits'] + stats['misses']):.2%}")
print(f"Active prefetch tasks: {stats['active_prefetch_tasks']}")
```
Thread Safety
- `get()` method is thread-safe for concurrent access
- Background worker thread handles all prefetching operations
- `close()` method ensures clean resource cleanup
- All internal state is properly synchronized
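These guarantees rest on a familiar pattern: a single lock guarding all shared cache state. A standalone sketch of the pattern (illustrative only, not the library's internals):

```python
import threading

class ThreadSafeStore:
    """Sketch: one lock guards the cache dict and its counters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
        self._hits = 0

    def get(self, key, loader):
        with self._lock:
            if key in self._data:
                self._hits += 1
            else:
                self._data[key] = loader(key)
            return self._data[key]

store = ThreadSafeStore()
workers = [threading.Thread(target=lambda: [store.get(k, lambda k: k * 2)
                                            for k in range(50)])
           for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(len(store._data))  # 50
```

Without the lock, two threads could both miss on the same key and load it twice, or corrupt the hit counter; holding the lock across the check-then-load makes each key load exactly once.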
Examples
Basic Usage Example
```python
from dynamic_prefetching_cache.cache import DynamicPrefetchingCache
from dynamic_prefetching_cache.predictors import DynamicDataPredictor
from dynamic_prefetching_cache.providers import MOTDataProvider

# Set up for video frame analysis
provider = MOTDataProvider('examples/data/example_data.txt')
predictor = DynamicDataPredictor(possible_jumps=[-5, -1, 1, 5, 15])

with DynamicPrefetchingCache(provider, predictor, max_keys_cached=200) as cache:
    for frame_id in range(100):
        detections = cache.get(frame_id)
        print(f"Frame {frame_id}: {len(detections.detections)} objects detected")
```
Visual Interactive Demo
```bash
python examples/visable_example.py
```
Performance Profiling
```bash
python examples/profile_example.py
```
Installation
```bash
# Install from PyPI (when published)
pip install dynamic-prefetching-cache

# Or install from source
git clone <repository-url>
cd dynamic-prefetching-cache
pip install -e .

# Install with development dependencies
pip install -e ".[dev]"

# Install with example dependencies (for GUI demo)
pip install -e ".[examples]"
```
Getting Started
- Implement DataProvider - Connect to your data source
- Choose or implement AccessPredictor - Define prediction logic for your use case
- Configure cache parameters - Set memory limits and prefetch behavior
- Use `cache.get(key)` - The system handles prefetching automatically
The library abstracts away the complexity of memory management, concurrent prefetching, and prediction logic, allowing you to focus on your core data processing tasks.
Generating Test Data
The repository includes a script to generate realistic MOT (Multiple Object Tracking) format test data for testing and development. This eliminates the need to download or upload large data files.
```bash
# Generate small test file (100 tracks, 1000 frames, ~1MB)
python scripts/generate_large_mot_data.py -o examples/data/test_data.txt -t 100 -f 1000

# Generate medium test file (500 tracks, 10000 frames, ~50MB)
python scripts/generate_large_mot_data.py -o examples/data/medium_data.txt -t 500 -f 10000

# Generate large test file (1000 tracks, 100000 frames, ~500MB)
python scripts/generate_large_mot_data.py -o examples/data/large_data.txt -t 1000 -f 100000

# Custom data generation with full options:
#   --tracks / --frames        number of object tracks / frames
#   --width / --height         image dimensions in pixels
#   --min/max-track-length     track duration bounds
#   --seed                     random seed for reproducibility
python scripts/generate_large_mot_data.py \
    --output examples/data/custom_data.txt \
    --tracks 200 \
    --frames 5000 \
    --width 1920 \
    --height 1080 \
    --min-track-length 10 \
    --max-track-length 200 \
    --seed 42
```
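MOT-format files use the MOTChallenge comma-separated layout (frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z). A hypothetical parser for one such line, assuming the generated files follow that layout; `parse_mot_line` is not part of the library's API:

```python
def parse_mot_line(line):
    """Parse one MOTChallenge-style CSV line into a dict.

    Layout: frame, track_id, bb_left, bb_top, bb_width, bb_height,
    conf, x, y, z (the last three are -1 for 2D tracking data).
    """
    fields = line.strip().split(",")
    return {
        "frame": int(fields[0]),
        "track_id": int(fields[1]),
        "bbox": tuple(float(v) for v in fields[2:6]),  # left, top, width, height
        "conf": float(fields[6]),
    }

row = parse_mot_line("1,3,794.27,247.59,71.24,174.0,1.0,-1,-1,-1")
print(row["frame"], row["track_id"], row["bbox"])  # 1 3 (794.27, 247.59, 71.24, 174.0)
```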