A Python dictionary whose keys can be lists, dictionaries, NumPy arrays, etc.

About

BSDict is a dictionary for Python that supports the following objects as keys:

  • None,
  • bool, int, float (including math.nan), complex,
  • str, bytes, bytearray,
  • list, tuple, dict, set, frozenset,
  • numpy.ndarray (bool, int, float, complex).

Internally, a total lexicographical order is defined over all supported objects. A sorted array is then used for storage, and binary search is used for element lookup.
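
The storage scheme can be illustrated with a minimal sketch. This is not BSDict's actual implementation: the names `sort_key` and `SortedDict` are hypothetical, only a handful of key types are handled, and placing NaN before all other floats is just one possible convention.

```python
import bisect
import math

# Hypothetical type ranks: objects of different types are ordered by rank first.
_RANK = {type(None): 0, bool: 1, int: 2, float: 3, str: 4, tuple: 5, list: 6, dict: 7}


def sort_key(obj):
    """Map a supported object to a nested tuple that compares lexicographically."""
    rank = _RANK[type(obj)]
    if obj is None:
        return (rank,)
    if isinstance(obj, float):
        # NaN cannot be ordered with <, so place it before every other float.
        return (rank, 0, 0.0) if math.isnan(obj) else (rank, 1, obj)
    if isinstance(obj, (bool, int, str)):
        return (rank, obj)
    if isinstance(obj, (list, tuple)):
        return (rank, tuple(sort_key(x) for x in obj))
    # dict: sort the items so that insertion order does not matter.
    items = sorted((sort_key(k), sort_key(v)) for k, v in obj.items())
    return (rank, tuple(items))


class SortedDict:
    """Toy mapping backed by a sorted list of keys and binary search."""

    def __init__(self):
        self._keys = []    # sort keys, kept in ascending order
        self._values = []  # values, aligned with self._keys

    def __setitem__(self, key, value):
        k = sort_key(key)
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            self._values[i] = value        # overwrite an existing entry
        else:
            self._keys.insert(i, k)        # insert while keeping the order
            self._values.insert(i, value)

    def __getitem__(self, key):
        k = sort_key(key)
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            return self._values[i]
        raise KeyError(key)


d = SortedDict()
d[{"s1": [1.0, 10.0, math.nan]}] = 5
d[[1, 2, 3]] = "a list"
print(d[{"s1": [1.0, 10.0, float("nan")]}])  # 5
print(d[[1, 2, 3]])                          # a list
```

Because every object is mapped to a tuple that starts with its type rank, values of different types are never compared directly, which is what makes the order total.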

BSDict can be initialized in-memory or on-disc (persistent storage). If BSDict is initialized on-disc, then keys and values are stored in memory only as long as the client application has any references to them. So, the dictionary can be larger than available RAM.
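
One common way to keep objects in memory only while the client still references them is Python's `weakref` module. The sketch below shows that general mechanism; it is an assumption for illustration, not a description of how BSDict is implemented, and the `Value` class is hypothetical.

```python
import gc
import weakref


class Value:
    """Small wrapper, since plain ints and strs cannot be weakly referenced."""

    def __init__(self, payload):
        self.payload = payload


# In-memory view of entries; it does not keep its values alive by itself.
cache = weakref.WeakValueDictionary()

v = Value([1, 2, 3])
cache["key"] = v
alive_before = "key" in cache   # the client still holds `v`

del v          # the client drops its last reference
gc.collect()   # make collection explicit for the sake of the example
alive_after = "key" in cache

print(alive_before, alive_after)
```

In a persistent store, an entry evicted this way would be reloaded from disk on the next lookup.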

Basic Usage

In-memory dictionary:

from bsdict import bsdict
import numpy as np

data = bsdict()
# A dict holding a NumPy array (with a NaN) is a valid key
key = {'s1': np.array([1.0, 10.0, np.nan])}
data[key] = 5
print(data)

On-disc dictionary:

from bsdict import bsdict
data = bsdict(datadir = 'cache')
# ...
data.clear()

Memoization

BSDict was originally written to help memoize functions that accept complex data structures, including floating-point data, as arguments. (Such functions are common in data analysis.) The package includes a simple memoization wrapper that uses BSDict to cache the results.

from time import sleep
from bsdict import memoizer

cached = memoizer(verbose = True)

# Persistent memoization
#cached = memoizer(datadir = 'cache', verbose = True)

@cached
def mysum(x, y):
    print("Computing a challenging math problem...")
    sleep(1)
    return x + y

z = mysum(1, 2)
z = mysum(1, 2)
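
The idea behind such a wrapper can be sketched without BSDict. The example below swaps in a different technique: it converts arguments to hashable tuples and caches results in a plain `dict`. The names `freeze` and `simple_memoizer` are hypothetical, and unlike BSDict this sketch does not handle NumPy arrays or NaN.

```python
import functools


def freeze(obj):
    """Recursively convert lists, dicts and sets into hashable equivalents."""
    if isinstance(obj, dict):
        return ("dict", frozenset((freeze(k), freeze(v)) for k, v in obj.items()))
    if isinstance(obj, (list, tuple)):
        return (type(obj).__name__, tuple(freeze(x) for x in obj))
    if isinstance(obj, (set, frozenset)):
        return ("set", frozenset(freeze(x) for x in obj))
    return obj  # assumed to be hashable already (int, str, None, ...)


def simple_memoizer(func):
    """Cache the results of func, keyed by its frozen arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (freeze(args), freeze(kwargs))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []  # records every real evaluation of the function


@simple_memoizer
def total(values, scale=1):
    calls.append(values)
    return scale * sum(values)


first = total([1, 2, 3], scale=2)   # computed
second = total([1, 2, 3], scale=2)  # served from the cache
print(first, second, len(calls))    # 12 12 1
```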

Warning

This is the first official release. Bug reports are welcome. (There are extensive test suites in the package, however.)

Technical Notes

A simpler way to support arbitrary keys would be to pickle them and store their binary representation in a regular dictionary. There are two minor issues with this approach. Firstly, identical dictionaries and sets might be serialized differently depending on the order in which they were composed, so they would need to be recursively sorted. Secondly, numerical algorithms that run on multiple processes or on multiple machines might break floating-point determinism. If it is desirable to treat nearly identical numbers as the same number, then with binary search that would be possible, while with serialization it would not. Then again, lexicographical ordering of vectors is inconsistent with near-match lookups, meaning that they would often work but not always. If you need near-match lookups, let me know, and I'll add them as an option.
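
The points in this section can be checked with a short standard-library example. The helpers `canonical` and `less_tol` are hypothetical names introduced for illustration, and `canonical` assumes string keys.

```python
import math
import pickle

a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}

# The dictionaries are equal, yet their pickles differ because
# pickle writes the items in insertion order.
same_value = (a == b)
same_bytes = (pickle.dumps(a) == pickle.dumps(b))


def canonical(obj):
    """Recursively turn dicts into key-sorted lists of pairs (string keys assumed)."""
    if isinstance(obj, dict):
        return [(k, canonical(obj[k])) for k in sorted(obj)]
    return obj


same_canonical = (pickle.dumps(canonical(a)) == pickle.dumps(canonical(b)))

# Floating-point results can depend on the order of evaluation.
s1 = (0.1 + 0.2) + 0.3
s2 = 0.1 + (0.2 + 0.3)
exactly_equal = (s1 == s2)
nearly_equal = math.isclose(s1, s2)


def less_tol(u, v, tol=1e-9):
    """Lexicographic 'less than' that treats components within tol as equal."""
    for p, q in zip(u, v):
        if abs(p - q) > tol:
            return p < q
    return False


# Exact and tolerant lexicographic orders can disagree, so an array
# sorted under one order is not necessarily sorted under the other.
u = (1.0, 5.0)
v = (1.0 + 1e-12, 0.0)
exact_says_u_first = u < v
tolerant_says_u_first = less_tol(u, v)

print(same_value, same_bytes, same_canonical)
print(exactly_equal, nearly_equal)
print(exact_says_u_first, tolerant_says_u_first)
```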
