Skip to main content

A python dictionary where keys could be lists, dictionaris, numpy arrays, etc.

Project description

About

BSDict is a dictionary for Python that supports the following objects as keys:

  • None,
  • bool, int, float (including math.nan), complex,
  • str, bytes, bytearray,
  • list, tuple, dict, set, frozenset,
  • numpy.ndarray (bool, int, float, complex).

Internally, a total lexicographical order is defined over all supported objects. A sorted array is then used for storage, and binary search is used for element lookup.

BSDict can be initialized in-memory or on-disc (persistent storage). If BSDict is initialized on-disc, then keys and values are stored in memory only as long as the client application has any references to them. So, the dictionary can be larger than available RAM.

Basic Usage

In-memory dictionary:

from bsdict import bsdict
import numpy as np
data = bsdict()
key = {'s1': np.array([1.0, 10.0, np.nan])}
data[key] = 5
data

On-disc dictionary:

from bsdict import bsdict
data = bsdict(datadir = 'cache')
# ...
data.clear()

Memoization

BSDict has been originally written to help memoize functions that accept complex data structures, including floating-point data, as arguments. (Such functions are common in data analysis.) This package includes a simple memoization wrapper that uses BSDict for caching the results.

from time import sleep
from bsdict import memoizer

cached = memoizer(verbose = True)

# Persistent memoization
#cached = memoizer(datadir = 'cache', verbose = True)

@cached
def mysum(x, y):
    print("Computing a challenging math problem...")
    sleep(1)
    return x + y

z = mysum(1, 2)
z = mysum(1, 2)

Warning

This is the first official release. Bug reports are welcome. (There are extensive test suits in the package, however.)

Technical Notes

A simpler way to support arbitrary keys would be to pickle them and store their binary representation in a dictionary. There are two minor issues with approach. Firstly, identical dictionaries and sets might be serialized differently depending on the order in which they were composed (So they need to be recursively sorted). Secondly, numerical algorithms than run on multiple processes or on multiple machines might break floating-point determinism. If it desirable to consider nearly identical numbers as the same number, then with binary search that would possible, while with serialization that won't work. Then again, lexicographical ordering of vectors in inconsistent with near-match lookups, meaning that it would work fine often but not always. If you need near-match lookups, let me know, I'll add them as an option then.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bsdict-0.1.0.tar.gz (7.7 kB view hashes)

Uploaded Source

Built Distribution

bsdict-0.1.0-py3-none-any.whl (10.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page