
SharedHashMap

High-performance cross-process shared memory hashmap for Python multiprocessing

A high-performance, thread-safe and process-safe hashmap implementation for Python multiprocessing using shared memory and atomic operations.

Features

  • Process-safe: Uses atomic operations from the atomics package for lock-free synchronization
  • Shared memory: Built on Python's multiprocessing.shared_memory for efficient cross-process data sharing
  • Optimized serialization: Avoids pickle overhead for common types (strings, bytes, integers, None)
  • Dict-like interface: Familiar Python dictionary API
  • Open addressing: Linear probing for collision resolution
  • Fully tested: Comprehensive test suite including multiprocess stress tests
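The open-addressing scheme with linear probing can be sketched in plain Python. This is a simplified, single-process illustration of the probing idea only; the actual library stores fixed-size buckets in shared memory and coordinates access with atomic operations, and the bucket layout here is invented for the sketch:

```python
# Simplified sketch of open addressing with linear probing.
# Buckets are (key, value) tuples or None; this layout is illustrative only.
EMPTY = None

def probe_slots(buckets, key):
    """Yield candidate slot indices starting at hash(key) % capacity."""
    capacity = len(buckets)
    start = hash(key) % capacity
    for i in range(capacity):
        yield (start + i) % capacity

def put(buckets, key, value):
    for idx in probe_slots(buckets, key):
        slot = buckets[idx]
        if slot is EMPTY or slot[0] == key:
            buckets[idx] = (key, value)  # claim an empty slot or overwrite same key
            return True
    return False  # table is full

def lookup(buckets, key):
    for idx in probe_slots(buckets, key):
        slot = buckets[idx]
        if slot is EMPTY:
            return None  # reached an empty slot: key is absent
        if slot[0] == key:
            return slot[1]
    return None
```

Probing stops at the first empty slot on lookup, which is why a fixed capacity well above the expected item count (see Performance Considerations) keeps probe chains short.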

Installation

# Install dependencies
pip install atomics

# Or install the entire project
pip install .

Quick Start

from shared_hashmap import SharedHashMap

# Create a shared hashmap
with SharedHashMap(name="my_hashmap", capacity=1024, create=True) as shm:
    # Set values
    shm["key1"] = "value1"
    shm["key2"] = 42

    # Get values
    print(shm["key1"])  # "value1"
    print(shm.get("key2"))  # 42

    # Check existence
    if "key1" in shm:
        print("key1 exists!")

    # Delete keys
    del shm["key1"]

    # Size
    print(f"Hashmap size: {shm.size()}")

    # Cleanup
    shm.unlink()  # Delete shared memory

Multiprocess Usage

Producer-Consumer Pattern

import multiprocessing as mp
from shared_hashmap import SharedHashMap

def producer(hashmap_name, producer_id, num_items):
    # Attach to existing shared memory
    shm = SharedHashMap(name=hashmap_name, create=False)

    for i in range(num_items):
        shm[f"item_{producer_id}_{i}"] = f"data from producer {producer_id}"

    shm.close()

def consumer(hashmap_name, producer_id, num_items):
    shm = SharedHashMap(name=hashmap_name, create=False)

    for i in range(num_items):
        # May be None if the consumer outruns the producer for this key
        value = shm.get(f"item_{producer_id}_{i}")
        print(f"Consumed: {value}")

    shm.close()

# Main process
if __name__ == "__main__":
    hashmap_name = "producer_consumer_example"

    # Create the shared hashmap
    with SharedHashMap(name=hashmap_name, capacity=256, create=True) as shm:
        # Start producer and consumer processes
        p1 = mp.Process(target=producer, args=(hashmap_name, 0, 10))
        p2 = mp.Process(target=consumer, args=(hashmap_name, 0, 10))

        p1.start()
        p2.start()

        p1.join()
        p2.join()

        shm.unlink()

API Reference

Constructor

SharedHashMap(
    name: str,
    capacity: int = 1024,
    max_key_size: int = 256,
    max_value_size: int = 1024,
    create: bool = True
)

Parameters:

  • name: Unique name for the shared memory block
  • capacity: Number of buckets in the hashmap
  • max_key_size: Maximum size in bytes for serialized keys
  • max_value_size: Maximum size in bytes for serialized values
  • create: If True, create new shared memory; if False, attach to existing

Methods

set(key, value)

Set a key-value pair in the hashmap.

get(key, default=None)

Get a value from the hashmap. Returns default if key not found.

delete(key)

Delete a key from the hashmap. Returns True if deleted, False if key didn't exist.

size()

Return the number of key-value pairs in the hashmap.

close()

Close the shared memory handle (keeps shared memory alive for other processes).

unlink()

Delete the shared memory block (should be called by the last process using it).

Dict-like Operations

shm["key"] = "value"  # Set
value = shm["key"]     # Get (raises KeyError if not found)
del shm["key"]         # Delete (raises KeyError if not found)
"key" in shm           # Check existence

Serialization

SharedHashMap optimizes serialization for common types:

Type    Serialization method    Notes
str     UTF-8 encoding          No pickle overhead
bytes   Direct storage          No pickle overhead
int     ASCII encoding          No pickle overhead
None    Empty bytes             No pickle overhead
Other   pickle.dumps()          Fallback for complex types
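The type-dispatch idea behind this table can be sketched as follows. The one-byte tags and framing here are invented for illustration; the library's actual wire format may differ:

```python
import pickle

# Hypothetical one-byte type tags; the library's real format may differ.
TAG_STR, TAG_BYTES, TAG_INT, TAG_NONE, TAG_PICKLE = b"s", b"b", b"i", b"n", b"p"

def serialize(value):
    if isinstance(value, str):
        return TAG_STR + value.encode("utf-8")
    if isinstance(value, bytes):
        return TAG_BYTES + value
    if isinstance(value, bool):
        # bool is an int subclass; pickle it so True doesn't round-trip as 1
        return TAG_PICKLE + pickle.dumps(value)
    if isinstance(value, int):
        return TAG_INT + str(value).encode("ascii")
    if value is None:
        return TAG_NONE  # empty payload: None costs a single tag byte
    return TAG_PICKLE + pickle.dumps(value)  # fallback for complex types

def deserialize(data):
    tag, payload = data[:1], data[1:]
    if tag == TAG_STR:
        return payload.decode("utf-8")
    if tag == TAG_BYTES:
        return payload
    if tag == TAG_INT:
        return int(payload.decode("ascii"))
    if tag == TAG_NONE:
        return None
    return pickle.loads(payload)
```

The payoff is that the common cases (str, bytes, int, None) avoid pickle entirely, paying only one tag byte of overhead.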

Performance

SharedHashMap targets fast cross-process data sharing; the figures below come from the bundled benchmark suite:

Key Metrics:

  • String reads: ~2,600 ops/sec (382μs mean)
  • String writes: ~1,200 ops/sec (826μs mean)
  • Integer operations: ~6,000+ ops/sec
  • Mixed workloads: ~1,170 ops/sec (854μs mean)
  • Concurrent writers: Scales to multiple processes with minimal contention

Run benchmarks:

pytest tests/test_shared_hashmap_benchmarks.py --benchmark-only -v

Performance Considerations

  1. Capacity: Choose a capacity larger than your expected number of items to minimize collisions
  2. Max sizes: Set max_key_size and max_value_size appropriately for your data
  3. Alignment: Buckets are automatically aligned to 8-byte boundaries for optimal atomic operations
  4. Serialization: Use strings, bytes, or integers when possible for best performance
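As a rough guide for point 1, a small helper can pick a capacity for a target load factor. The 0.7 threshold is a common rule of thumb for open-addressing tables, not a documented requirement of this library:

```python
def suggest_capacity(expected_items, max_load_factor=0.7):
    """Return a power-of-two capacity that keeps the table below
    max_load_factor, so linear-probe chains stay short."""
    needed = int(expected_items / max_load_factor) + 1
    capacity = 1
    while capacity < needed:
        capacity *= 2
    return capacity
```

For example, around 700 expected items would suggest a capacity of 1024, matching the constructor's default.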

Thread Safety

SharedHashMap uses atomic compare-and-swap operations to ensure safe concurrent access from multiple threads and processes:

  • Multiple processes can safely read and write concurrently
  • No locks or mutexes required
  • Lock-free design for high concurrency
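The compare-and-swap retry pattern at the heart of such lock-free designs looks roughly like this. This is a single-process simulation over a bytearray; the bucket states are invented for illustration, and in the real library the CAS is a hardware atomic instruction (via the atomics package) on the shared-memory buffer:

```python
import struct

# Illustrative bucket states; the library's actual state encoding may differ.
BUCKET_EMPTY, BUCKET_WRITING, BUCKET_OCCUPIED = 0, 1, 2

def compare_and_swap(buf, offset, expected, desired):
    """Simulated CAS on a 4-byte slot: swap in `desired` only if the slot
    still holds `expected`. NOT actually atomic here; in the real library
    this is a single hardware atomic operation."""
    (current,) = struct.unpack_from("<i", buf, offset)
    if current != expected:
        return False
    struct.pack_into("<i", buf, offset, desired)
    return True

def claim_bucket(buf, offset):
    """Try to move a bucket from EMPTY to WRITING; on success the writer
    fills in key/value bytes and then publishes the bucket as OCCUPIED."""
    if compare_and_swap(buf, offset, BUCKET_EMPTY, BUCKET_WRITING):
        # ... write serialized key and value into the bucket here ...
        compare_and_swap(buf, offset, BUCKET_WRITING, BUCKET_OCCUPIED)
        return True
    return False  # another process claimed it first; probe the next bucket
```

A writer that loses the CAS race simply moves on to the next probe slot, which is why no locks or mutexes are needed.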

Limitations

  1. Fixed capacity: The hashmap size is fixed at creation time
  2. No iteration: Currently doesn't support iterating over keys/values
  3. No resizing: Cannot dynamically grow the hashmap
  4. Size limits: Keys and values must fit within configured max sizes

Examples

See examples/basic_usage.py for complete examples including:

  • Basic operations
  • Producer-consumer pattern
  • Distributed computation
  • Stress testing

Run the examples:

python examples/basic_usage.py
