# SharedHashMap
A high-performance, thread-safe and process-safe hashmap implementation for Python multiprocessing using shared memory and atomic operations.
## Features

- Process-safe: Uses atomic operations from the `atomics` package for lock-free synchronization
- Shared memory: Built on Python's `multiprocessing.shared_memory` for efficient cross-process data sharing
- Optimized serialization: Avoids pickle overhead for common types (strings, bytes, integers, None)
- Dict-like interface: Familiar Python dictionary API
- Open addressing: Linear probing for collision resolution
- Fully tested: Comprehensive test suite including multiprocess stress tests
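
Open addressing with linear probing can be illustrated with a minimal pure-Python sketch (illustrative only — the real implementation stores serialized entries in fixed-size shared-memory buckets):

```python
# Toy open-addressing table with linear probing (illustration only;
# SharedHashMap itself operates on shared memory, not a Python list).
class ToyOpenAddressingMap:
    def __init__(self, capacity=8):
        self.slots = [None] * capacity          # each slot: (key, value) or None

    def _probe(self, key):
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):     # linear probing: scan forward from the hash slot
            i = (start + step) % len(self.slots)
            if self.slots[i] is None or self.slots[i][0] == key:
                return i
        raise RuntimeError("table full")

    def set(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot is not None else default
```

On a collision, the probe simply walks to the next free slot, which is why choosing a capacity well above the expected item count keeps probe chains short.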
## Installation

```shell
# Install dependencies
pip install atomics

# Or install the entire project
pip install .
```
## Quick Start

```python
from shared_hashmap import SharedHashMap

# Create a shared hashmap
with SharedHashMap(name="my_hashmap", capacity=1024, create=True) as shm:
    # Set values
    shm["key1"] = "value1"
    shm["key2"] = 42

    # Get values
    print(shm["key1"])      # "value1"
    print(shm.get("key2"))  # 42

    # Check existence
    if "key1" in shm:
        print("key1 exists!")

    # Delete keys
    del shm["key1"]

    # Size
    print(f"Hashmap size: {shm.size()}")

    # Cleanup
    shm.unlink()  # Delete shared memory
```
## Multiprocess Usage

### Producer-Consumer Pattern

```python
import multiprocessing as mp
from shared_hashmap import SharedHashMap

def producer(hashmap_name, producer_id, num_items):
    # Attach to existing shared memory
    shm = SharedHashMap(name=hashmap_name, create=False)
    for i in range(num_items):
        shm[f"item_{producer_id}_{i}"] = f"data from producer {producer_id}"
    shm.close()

def consumer(hashmap_name, producer_id, num_items):
    shm = SharedHashMap(name=hashmap_name, create=False)
    for i in range(num_items):
        # May be None if the consumer outpaces the producer
        value = shm.get(f"item_{producer_id}_{i}")
        print(f"Consumed: {value}")
    shm.close()

# Main process
if __name__ == "__main__":
    hashmap_name = "producer_consumer_example"

    # Create the shared hashmap
    with SharedHashMap(name=hashmap_name, capacity=256, create=True) as shm:
        # Start producer and consumer processes
        p1 = mp.Process(target=producer, args=(hashmap_name, 0, 10))
        p2 = mp.Process(target=consumer, args=(hashmap_name, 0, 10))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        shm.unlink()
```
## API Reference

### Constructor

```python
SharedHashMap(
    name: str,
    capacity: int = 1024,
    max_key_size: int = 256,
    max_value_size: int = 1024,
    create: bool = True
)
```

Parameters:

- `name`: Unique name for the shared memory block
- `capacity`: Number of buckets in the hashmap
- `max_key_size`: Maximum size in bytes for serialized keys
- `max_value_size`: Maximum size in bytes for serialized values
- `create`: If True, create new shared memory; if False, attach to existing
### Methods

#### `set(key, value)`

Set a key-value pair in the hashmap.

#### `get(key, default=None)`

Get a value from the hashmap. Returns `default` if the key is not found.

#### `delete(key)`

Delete a key from the hashmap. Returns `True` if deleted, `False` if the key didn't exist.

#### `size()`

Return the number of key-value pairs in the hashmap.

#### `close()`

Close the shared memory handle (keeps shared memory alive for other processes).

#### `unlink()`

Delete the shared memory block (should be called by the last process using it).
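
The `close()`/`unlink()` split mirrors the stdlib `multiprocessing.shared_memory` module that SharedHashMap is built on; the distinction can be seen directly with the stdlib API (the block name below is arbitrary):

```python
from multiprocessing import shared_memory

# Create a block, then attach a second handle to it by name.
owner = shared_memory.SharedMemory(name="close_unlink_demo", create=True, size=16)
attached = shared_memory.SharedMemory(name="close_unlink_demo")

owner.buf[0] = 42
value = attached.buf[0]   # the write is visible through the other handle

attached.close()          # close() releases only this handle...
owner.close()
owner.unlink()            # ...unlink() destroys the block for everyone
```

If `unlink()` is never called, the shared memory block can outlive every process that used it, so the last user should always unlink.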
### Dict-like Operations

```python
shm["key"] = "value"   # Set
value = shm["key"]     # Get (raises KeyError if not found)
del shm["key"]         # Delete (raises KeyError if not found)
"key" in shm           # Check existence
```
## Serialization

SharedHashMap optimizes serialization for common types:

| Type | Serialization Method | Notes |
|---|---|---|
| `str` | UTF-8 encoding | No pickle overhead |
| `bytes` | Direct storage | No pickle overhead |
| `int` | ASCII encoding | No pickle overhead |
| `None` | Empty bytes | No pickle overhead |
| Other | `pickle.dumps()` | Fallback for complex types |
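
A tag-based scheme like the table above might look like the following sketch (the tag bytes and function names here are assumptions for illustration; the library's actual wire format may differ):

```python
import pickle

def serialize(value):
    # Hypothetical type-tagged encoder mirroring the table above.
    if isinstance(value, bool):                   # bool subclasses int; send it to pickle
        return b"p" + pickle.dumps(value)
    if isinstance(value, str):
        return b"s" + value.encode("utf-8")       # UTF-8, no pickle
    if isinstance(value, bytes):
        return b"b" + value                       # stored directly
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii")  # ASCII digits
    if value is None:
        return b"n"                               # empty payload
    return b"p" + pickle.dumps(value)             # fallback for complex types

def deserialize(data):
    tag, payload = data[:1], data[1:]
    if tag == b"s":
        return payload.decode("utf-8")
    if tag == b"b":
        return payload
    if tag == b"i":
        return int(payload)
    if tag == b"n":
        return None
    return pickle.loads(payload)
```

The fast paths avoid pickle entirely, which is where the speed advantage for strings, bytes, and integers comes from.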
## Performance
SharedHashMap delivers exceptional performance for cross-process data sharing:
Key Metrics:
- String reads: ~2,600 ops/sec (382μs mean)
- String writes: ~1,200 ops/sec (826μs mean)
- Integer operations: ~6,000+ ops/sec
- Mixed workloads: ~1,170 ops/sec (854μs mean)
- Concurrent writers: Scales to multiple processes with minimal contention
Run benchmarks:

```shell
pytest tests/test_shared_hashmap_benchmarks.py --benchmark-only -v
```
### Performance Considerations

- Capacity: Choose a capacity larger than your expected number of items to minimize collisions
- Max sizes: Set `max_key_size` and `max_value_size` appropriately for your data
- Alignment: Buckets are automatically aligned to 8-byte boundaries for optimal atomic operations
- Serialization: Use strings, bytes, or integers when possible for best performance
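
One way to apply the capacity guideline is to size the table for a bounded load factor; the 0.7 threshold and power-of-two rounding below are rules of thumb, not taken from the library:

```python
def suggest_capacity(expected_items, max_load_factor=0.7):
    # Rule of thumb (an assumption, not from the library): keep the table
    # under ~70% full so linear-probe chains stay short, and round the
    # bucket count up to a power of two.
    needed = int(expected_items / max_load_factor) + 1
    capacity = 1
    while capacity < needed:
        capacity *= 2
    return capacity
```

For example, around 700 expected items would suggest a capacity of 1024, matching the constructor's default.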
## Thread Safety
SharedHashMap uses atomic compare-and-swap operations to ensure thread safety:
- Multiple processes can safely read and write concurrently
- No locks or mutexes required
- Lock-free design for high concurrency
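
The compare-and-swap retry pattern looks roughly like this (a pure-Python stand-in for illustration; real atomicity comes from the `atomics` package's hardware CAS, not from this function):

```python
def compare_and_swap(slot, expected, desired):
    # Stand-in for a hardware CAS: succeed only if the current value
    # still matches what the caller read earlier. NOT actually atomic.
    if slot[0] == expected:
        slot[0] = desired
        return True
    return False

def lock_free_increment(slot):
    while True:
        old = slot[0]                      # 1. read the current value
        if compare_and_swap(slot, old, old + 1):
            return old + 1                 # 2. CAS succeeded, update published
        # 3. another writer got there first: loop and retry with the new value
```

Because a failed CAS just retries instead of blocking, no writer can stall the others, which is what makes the design lock-free.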
## Limitations
- Fixed capacity: The hashmap size is fixed at creation time
- No iteration: Currently doesn't support iterating over keys/values
- No resizing: Cannot dynamically grow the hashmap
- Size limits: Keys and values must fit within configured max sizes
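
Because oversized entries cannot be stored, it can help to validate sizes before inserting; `fits` below is a hypothetical helper, not part of the API, and it approximates serialized sizes with pickle:

```python
import pickle

def fits(key, value, max_key_size=256, max_value_size=1024):
    # Hypothetical pre-insert check. Uses pickle as a size estimate, which
    # may differ slightly from the library's optimized encodings for
    # strings, bytes, and integers.
    return (len(pickle.dumps(key)) <= max_key_size
            and len(pickle.dumps(value)) <= max_value_size)
```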
## Examples

See `examples/basic_usage.py` for complete examples including:
- Basic operations
- Producer-consumer pattern
- Distributed computation
- Stress testing
Run the examples:

```shell
python examples/basic_usage.py
```