A Vectorized Dictionary for Python

Project description

GetPy - A Vectorized Python Dict/Set

The goal of GetPy is to provide the highest-performance Python dict/set that integrates with the Python scientific ecosystem.

Installation

pip install getpy

If you run into problems, please open an issue. You can also build the package from source by cloning the repository and running python setup.py install.

About

GetPy is a thin binding to the Parallel Hashmap (https://github.com/greg7mdp/parallel-hashmap.git), a state-of-the-art unordered map/set with low memory overhead and fast runtime performance. The binding layer is built with pybind11 (https://github.com/pybind/pybind11.git), which is fast to compile and simple to extend.

How To Use

The gp.Dict and gp.Set objects are designed to keep an interface similar to the corresponding standard Python objects. There are some key differences, though, which are necessary for vectorization and other performance considerations.

  1. gp.Dict.__init__ takes three arguments: key_type, value_type, and default_value. The type arguments determine which compiled data structure is used under the hood; the full list of preset np.dtype combinations is available in gp.dict_types. You can also specify a default_value at construction, which must be castable to value_type. This is the value the dictionary returns when a key is not found.

  2. All gp.Dict methods support a vectorized interface, so methods like gp.Dict.__getitem__, gp.Dict.__setitem__, and gp.Dict.__delitem__ can be called with an np.ndarray. This moves the performance-critical for-loop into compiled C++. Some dunder methods, such as __contains__, cannot be vectorized, because Python's in operator always converts the result to a single bool. As a result, keywords like in do not behave as expected. Those methods are renamed without the double underscores to signal their deviation from the standard interface.

  3. If a key does not exist, gp.Dict.__getitem__ returns the default_value. If you do not specify a default_value, it falls back to the default constructor of your data type (all 0 bits). To distinguish a missing key from a key whose stored value happens to equal the default, first call gp.Dict.contains on your key or array of keys, and then retrieve values only for the keys that exist.

  4. There is also a gp.MultiDict object, which stores multiple unique values per key.
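The behavior described in items 2 and 3 can be modeled in a few lines of plain Python and NumPy. The sketch below is a reference model, not getpy's implementation: ReferenceDict is a hypothetical name, entries live in an ordinary Python dict (so it is slow), and it only mimics the documented semantics, namely vectorized get/set, a default_value for missing keys, an all-zero default otherwise, and a contains method that returns a boolean mask.

```python
import numpy as np


class ReferenceDict:
    """Hypothetical pure-Python model of the documented gp.Dict semantics.

    Entries are stored in a plain Python dict, so this only illustrates
    behavior; it is not getpy's compiled implementation.
    """

    def __init__(self, key_type, value_type, default_value=None):
        self.key_type = np.dtype(key_type)
        self.value_type = np.dtype(value_type)
        if default_value is None:
            # No default given: fall back to an all-zero-bits value.
            self.default = np.zeros((), dtype=self.value_type)[()]
        else:
            # The default must be castable to value_type.
            self.default = np.asarray(default_value, dtype=self.value_type)[()]
        self._data = {}

    def __setitem__(self, keys, values):
        keys = np.asarray(keys, dtype=self.key_type)
        values = np.asarray(values, dtype=self.value_type)
        if keys.shape != values.shape:
            raise ValueError("keys and values must have the same shape")
        # Any array shape is accepted; entries are inserted in flattened order.
        for k, v in zip(keys.ravel().tolist(), values.ravel().tolist()):
            self._data[k] = v

    def __getitem__(self, keys):
        keys = np.asarray(keys, dtype=self.key_type)
        out = [self._data.get(k, self.default) for k in keys.ravel().tolist()]
        # In this model the result has the same shape as the query array.
        return np.array(out, dtype=self.value_type).reshape(keys.shape)

    def contains(self, keys):
        keys = np.asarray(keys, dtype=self.key_type)
        found = [k in self._data for k in keys.ravel().tolist()]
        return np.array(found, dtype=bool).reshape(keys.shape)


d = ReferenceDict('u8', 'u8', default_value=42)
d[np.array([1, 2, 3], dtype='u8')] = np.array([10, 20, 42], dtype='u8')

query = np.array([1, 3, 7], dtype='u8')
print(d[query])                     # [10 42 42]
mask = d.contains(query)
print(mask)                         # [ True  True False]
print(query[mask], d[query[mask]])  # [1 3] [10 42]
```

The last lines show why the contains step matters: key 3 was stored with the value 42, so the lookup alone cannot tell it apart from the missing key 7.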

Examples

Simple Example

import numpy as np
import getpy as gp

key_type = np.dtype('u8')
value_type = np.dtype('u8')

keys = np.random.randint(1, 1000, size=10**2, dtype=key_type)
values = np.random.randint(1, 1000, size=10**2, dtype=value_type)

gp_dict = gp.Dict(key_type, value_type)
# One vectorized call inserts every key/value pair
gp_dict[keys] = values

Default Example

import numpy as np
import getpy as gp

key_type = np.dtype('u8')
value_type = np.dtype('u8')

keys = np.random.randint(1, 1000, size=10**2, dtype=key_type)
values = np.random.randint(1, 1000, size=10**2, dtype=value_type)

gp_dict = gp.Dict(key_type, value_type, default_value=42)
gp_dict[keys] = values

# Vectorized lookup of 500 keys; keys that were never inserted return 42
random_keys = np.random.randint(1, 1000, size=500, dtype=key_type)
random_values = gp_dict[random_keys]
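The default lookup above can be cross-checked in plain NumPy. This sketch is not how getpy works internally (getpy uses a hash map); it swaps in a sorted-array search with np.searchsorted to compute what the lookup should return. expected_lookup is a hypothetical helper, and the sketch draws distinct keys because the example above may generate duplicate keys, and which duplicate wins is not specified here.

```python
import numpy as np


def expected_lookup(keys, values, queries, default_value):
    """Hypothetical helper: NumPy-only model of a dict lookup with a default.

    Assumes `keys` is non-empty and contains no duplicates. Returns
    values[i] where a query equals keys[i], default_value elsewhere,
    plus a boolean mask of which queries were found.
    """
    order = np.argsort(keys)
    sorted_keys = keys[order]
    sorted_values = values[order]
    # Position each query would occupy in the sorted key array.
    pos = np.searchsorted(sorted_keys, queries)
    pos = np.clip(pos, 0, len(sorted_keys) - 1)
    found = sorted_keys[pos] == queries
    result = np.where(found, sorted_values[pos], default_value)
    return result.astype(values.dtype), found


rng = np.random.default_rng(0)
key_type = np.dtype('u8')
value_type = np.dtype('u8')

# Draw 100 distinct keys so every key maps to exactly one value.
keys = rng.choice(np.arange(1, 1000, dtype=key_type), size=100, replace=False)
values = rng.integers(1, 1000, size=100, dtype=value_type)

random_keys = rng.integers(1, 1000, size=500, dtype=key_type)
expected, found = expected_lookup(keys, values, random_keys, default_value=42)
```

With getpy installed, you could build gp.Dict(key_type, value_type, default_value=42) from these same keys and values and check that its lookup of random_keys matches expected.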

Byteset Example

import numpy as np
import getpy as gp

key_type = np.dtype('S8')
value_type = np.dtype('S8')

keys = np.array([np.random.bytes(8) for i in range(10**2)], dtype=key_type)
values = np.array([np.random.bytes(8) for i in range(10**2)], dtype=value_type)

gp_dict = gp.Dict(key_type, value_type)
gp_dict[keys] = values
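S8 and S16 keys are fixed-width byte strings, so it helps to know how NumPy casts data into them before they reach the dictionary. The snippet below shows plain NumPy behavior only; it assumes getpy hashes the full fixed-width buffer, in which case truncation and NUL padding can make distinct Python byte strings collide as keys.

```python
import numpy as np

key_type = np.dtype('S8')

# Longer byte strings are silently truncated to 8 bytes when cast to S8.
raw = [b'abcdefgh', b'abcdefgh-suffix', b'short']
keys = np.array(raw, dtype=key_type)
print(keys)  # [b'abcdefgh' b'abcdefgh' b'short']

# Shorter strings are padded with NUL bytes, so b'ab' and b'ab\x00'
# end up as the same 8-byte buffer.
padded = np.array([b'ab', b'ab\x00'], dtype=key_type)
print(padded[0] == padded[1])  # True

# A cheap pre-insert check: fewer unique keys than inputs means a collision.
print(len(np.unique(keys)) < len(raw))  # True
```

In the Byteset example this is harmless, since np.random.bytes(8) always yields exactly 8 bytes; it matters when keys come from variable-length strings.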

Multidimensional Example

import numpy as np
import getpy as gp

key_type = np.dtype('u8')
value_type = np.dtype('u8')

# Keys and values can be multidimensional arrays (here both are 10x10)
keys = np.random.randint(1, 1000, size=10**2, dtype=key_type).reshape(10,10)
values = np.random.randint(1, 1000, size=10**2, dtype=value_type).reshape(10,10)

gp_dict = gp.Dict(key_type, value_type)
gp_dict[keys] = values

Serialization Example

import os

import numpy as np
import getpy as gp

key_type = np.dtype('u8')
value_type = np.dtype('u8')

keys = np.random.randint(1, 1000, size=10**1, dtype=key_type)
values = np.random.randint(1, 1000, size=10**1, dtype=value_type)

gp_dict_1 = gp.Dict(key_type, value_type)
gp_dict_1[keys] = values

# dump() writes to the given path, so make sure the directory exists first
os.makedirs('test', exist_ok=True)
gp_dict_1.dump('test/test.hashtable.bin')

# Load into a fresh Dict constructed with the same key and value types
gp_dict_2 = gp.Dict(key_type, value_type)
gp_dict_2.load('test/test.hashtable.bin')

Supported Data Types

dict_types = {
    (np.dtype('u4'), np.dtype('u1')) : _gp.Dict_u4_u1,
    (np.dtype('u4'), np.dtype('u2')) : _gp.Dict_u4_u2,
    (np.dtype('u4'), np.dtype('u4')) : _gp.Dict_u4_u4,
    (np.dtype('u4'), np.dtype('u8')) : _gp.Dict_u4_u8,
    (np.dtype('u4'), np.dtype('i1')) : _gp.Dict_u4_i1,
    (np.dtype('u4'), np.dtype('i2')) : _gp.Dict_u4_i2,
    (np.dtype('u4'), np.dtype('i4')) : _gp.Dict_u4_i4,
    (np.dtype('u4'), np.dtype('i8')) : _gp.Dict_u4_i8,
    (np.dtype('u4'), np.dtype('f4')) : _gp.Dict_u4_f4,
    (np.dtype('u4'), np.dtype('f8')) : _gp.Dict_u4_f8,
    (np.dtype('u4'), np.dtype('S8')) : _gp.Dict_u4_S8,
    (np.dtype('u4'), np.dtype('S16')) : _gp.Dict_u4_S16,
    (np.dtype('u8'), np.dtype('u1')) : _gp.Dict_u8_u1,
    (np.dtype('u8'), np.dtype('u2')) : _gp.Dict_u8_u2,
    (np.dtype('u8'), np.dtype('u4')) : _gp.Dict_u8_u4,
    (np.dtype('u8'), np.dtype('u8')) : _gp.Dict_u8_u8,
    (np.dtype('u8'), np.dtype('i1')) : _gp.Dict_u8_i1,
    (np.dtype('u8'), np.dtype('i2')) : _gp.Dict_u8_i2,
    (np.dtype('u8'), np.dtype('i4')) : _gp.Dict_u8_i4,
    (np.dtype('u8'), np.dtype('i8')) : _gp.Dict_u8_i8,
    (np.dtype('u8'), np.dtype('f4')) : _gp.Dict_u8_f4,
    (np.dtype('u8'), np.dtype('f8')) : _gp.Dict_u8_f8,
    (np.dtype('u8'), np.dtype('S8')) : _gp.Dict_u8_S8,
    (np.dtype('u8'), np.dtype('S16')) : _gp.Dict_u8_S16,
    (np.dtype('i4'), np.dtype('u1')) : _gp.Dict_i4_u1,
    (np.dtype('i4'), np.dtype('u2')) : _gp.Dict_i4_u2,
    (np.dtype('i4'), np.dtype('u4')) : _gp.Dict_i4_u4,
    (np.dtype('i4'), np.dtype('u8')) : _gp.Dict_i4_u8,
    (np.dtype('i4'), np.dtype('i1')) : _gp.Dict_i4_i1,
    (np.dtype('i4'), np.dtype('i2')) : _gp.Dict_i4_i2,
    (np.dtype('i4'), np.dtype('i4')) : _gp.Dict_i4_i4,
    (np.dtype('i4'), np.dtype('i8')) : _gp.Dict_i4_i8,
    (np.dtype('i4'), np.dtype('f4')) : _gp.Dict_i4_f4,
    (np.dtype('i4'), np.dtype('f8')) : _gp.Dict_i4_f8,
    (np.dtype('i4'), np.dtype('S8')) : _gp.Dict_i4_S8,
    (np.dtype('i4'), np.dtype('S16')) : _gp.Dict_i4_S16,
    (np.dtype('i8'), np.dtype('u1')) : _gp.Dict_i8_u1,
    (np.dtype('i8'), np.dtype('u2')) : _gp.Dict_i8_u2,
    (np.dtype('i8'), np.dtype('u4')) : _gp.Dict_i8_u4,
    (np.dtype('i8'), np.dtype('u8')) : _gp.Dict_i8_u8,
    (np.dtype('i8'), np.dtype('i1')) : _gp.Dict_i8_i1,
    (np.dtype('i8'), np.dtype('i2')) : _gp.Dict_i8_i2,
    (np.dtype('i8'), np.dtype('i4')) : _gp.Dict_i8_i4,
    (np.dtype('i8'), np.dtype('i8')) : _gp.Dict_i8_i8,
    (np.dtype('i8'), np.dtype('f4')) : _gp.Dict_i8_f4,
    (np.dtype('i8'), np.dtype('f8')) : _gp.Dict_i8_f8,
    (np.dtype('i8'), np.dtype('S8')) : _gp.Dict_i8_S8,
    (np.dtype('i8'), np.dtype('S16')) : _gp.Dict_i8_S16,
    (np.dtype('S8'), np.dtype('u1')) : _gp.Dict_S8_u1,
    (np.dtype('S8'), np.dtype('u2')) : _gp.Dict_S8_u2,
    (np.dtype('S8'), np.dtype('u4')) : _gp.Dict_S8_u4,
    (np.dtype('S8'), np.dtype('u8')) : _gp.Dict_S8_u8,
    (np.dtype('S8'), np.dtype('i1')) : _gp.Dict_S8_i1,
    (np.dtype('S8'), np.dtype('i2')) : _gp.Dict_S8_i2,
    (np.dtype('S8'), np.dtype('i4')) : _gp.Dict_S8_i4,
    (np.dtype('S8'), np.dtype('i8')) : _gp.Dict_S8_i8,
    (np.dtype('S8'), np.dtype('f4')) : _gp.Dict_S8_f4,
    (np.dtype('S8'), np.dtype('f8')) : _gp.Dict_S8_f8,
    (np.dtype('S8'), np.dtype('S8')) : _gp.Dict_S8_S8,
    (np.dtype('S8'), np.dtype('S16')) : _gp.Dict_S8_S16,
    (np.dtype('S16'), np.dtype('u1')) : _gp.Dict_S16_u1,
    (np.dtype('S16'), np.dtype('u2')) : _gp.Dict_S16_u2,
    (np.dtype('S16'), np.dtype('u4')) : _gp.Dict_S16_u4,
    (np.dtype('S16'), np.dtype('u8')) : _gp.Dict_S16_u8,
    (np.dtype('S16'), np.dtype('i1')) : _gp.Dict_S16_i1,
    (np.dtype('S16'), np.dtype('i2')) : _gp.Dict_S16_i2,
    (np.dtype('S16'), np.dtype('i4')) : _gp.Dict_S16_i4,
    (np.dtype('S16'), np.dtype('i8')) : _gp.Dict_S16_i8,
    (np.dtype('S16'), np.dtype('f4')) : _gp.Dict_S16_f4,
    (np.dtype('S16'), np.dtype('f8')) : _gp.Dict_S16_f8,
    (np.dtype('S16'), np.dtype('S8')) : _gp.Dict_S16_S8,
    (np.dtype('S16'), np.dtype('S16')) : _gp.Dict_S16_S16,
}

set_types = {
    np.dtype('u4') : _gp.Set_u4,
    np.dtype('u8') : _gp.Set_u8,
    np.dtype('i4') : _gp.Set_i4,
    np.dtype('i8') : _gp.Set_i8,
    np.dtype('S8') : _gp.Set_S8,
    np.dtype('S16') : _gp.Set_S16,
}
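The dict_types listing follows a regular pattern: each of the six key dtypes is paired with each of the twelve value dtypes, and set_types covers the same six key dtypes. The sketch below transcribes that pattern into plain NumPy so you can check a combination, or widen an unsupported key dtype, before constructing a gp.Dict. SUPPORTED_PAIRS and widen_key_dtype are hypothetical names; nothing here is imported from getpy, so the table must be kept in sync with the installed version by hand.

```python
import itertools

import numpy as np

# Transcribed from the dict_types / set_types listings above.
KEY_DTYPES = [np.dtype(t) for t in ('u4', 'u8', 'i4', 'i8', 'S8', 'S16')]
VALUE_DTYPES = [np.dtype(t) for t in ('u1', 'u2', 'u4', 'u8', 'i1', 'i2',
                                      'i4', 'i8', 'f4', 'f8', 'S8', 'S16')]

# Every key dtype is paired with every value dtype: 6 x 12 = 72 combinations.
SUPPORTED_PAIRS = set(itertools.product(KEY_DTYPES, VALUE_DTYPES))


def widen_key_dtype(dtype):
    """Hypothetical helper: return a listed key dtype that holds `dtype` losslessly."""
    dtype = np.dtype(dtype)
    if dtype in KEY_DTYPES:
        return dtype
    for candidate in KEY_DTYPES:
        # Integers: first listed type (u4, u8, i4, i8) that is a safe cast.
        if dtype.kind in 'ui' and candidate.kind in 'ui' and np.can_cast(dtype, candidate, 'safe'):
            return candidate
        # Byte strings: the narrowest listed width that is at least as wide.
        if dtype.kind == 'S' and candidate.kind == 'S' and dtype.itemsize <= candidate.itemsize:
            return candidate
    raise TypeError(f"no supported key dtype can hold {dtype}")


print(len(SUPPORTED_PAIRS))                                 # 72
print((np.dtype('u8'), np.dtype('f4')) in SUPPORTED_PAIRS)  # True
print((np.dtype('f8'), np.dtype('u8')) in SUPPORTED_PAIRS)  # False

# int16 keys are not listed, but they widen losslessly to int32.
keys = np.array([-3, 7, 12], dtype='i2')
keys = keys.astype(widen_key_dtype(keys.dtype))
print(keys.dtype)                                           # int32
```

Widening integer keys preserves every key exactly; floating-point keys have no lossless integer target in the table, so the helper raises instead of guessing.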


Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

getpy-0.14.2-cp39-cp39-manylinux2014_x86_64.whl (23.6 MB)

Uploaded CPython 3.9

getpy-0.14.2-cp38-cp38-manylinux2014_x86_64.whl (23.7 MB)

Uploaded CPython 3.8

getpy-0.14.2-cp37-cp37m-manylinux2014_x86_64.whl (25.2 MB)

Uploaded CPython 3.7m

getpy-0.14.2-cp36-cp36m-manylinux2014_x86_64.whl (25.2 MB)

Uploaded CPython 3.6m

getpy-0.14.2-cp35-cp35m-manylinux2014_x86_64.whl (25.2 MB)

Uploaded CPython 3.5m

File details

Details for the file getpy-0.14.2-cp39-cp39-manylinux2014_x86_64.whl.

File metadata

  • Download URL: getpy-0.14.2-cp39-cp39-manylinux2014_x86_64.whl
  • Size: 23.6 MB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.9.0b3

File hashes

Hashes for getpy-0.14.2-cp39-cp39-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b0514bd2c2f7d140159daa3e439772b19375c5d2125a756d7727babf91f232cd
MD5 e690898d5ffd3658f579333d2ba79126
BLAKE2b-256 5fdd43fa571119b62571edf78bd5d397360b18a2302d3ad2583e6cbf970118cb

File details

Details for the file getpy-0.14.2-cp38-cp38-manylinux2014_x86_64.whl.

File metadata

  • Download URL: getpy-0.14.2-cp38-cp38-manylinux2014_x86_64.whl
  • Size: 23.7 MB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.9.0b3

File hashes

Hashes for getpy-0.14.2-cp38-cp38-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 62ff3662f02019f3243728b5311c9bafe5bd7f19c331a2dc04de9ab2e9151d35
MD5 b17bc51901225c52e6b23a8dd581b932
BLAKE2b-256 1173ac12b3ba37a80767697e0641d9e034c1b61e001ebbc399164c51a74a6060

File details

Details for the file getpy-0.14.2-cp37-cp37m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: getpy-0.14.2-cp37-cp37m-manylinux2014_x86_64.whl
  • Size: 25.2 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.9.0b3

File hashes

Hashes for getpy-0.14.2-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0c42c8918b84817ac22b5e2fefadc02d188b2c00ec1f38e16748635d920d2cc5
MD5 ded6760b6a3833650a8aa407e703ee02
BLAKE2b-256 809defa5fc4806bc58e57a27ef1c57c781d12542afcb91bb943b7472abc7c283

File details

Details for the file getpy-0.14.2-cp36-cp36m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: getpy-0.14.2-cp36-cp36m-manylinux2014_x86_64.whl
  • Size: 25.2 MB
  • Tags: CPython 3.6m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.9.0b3

File hashes

Hashes for getpy-0.14.2-cp36-cp36m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8594e5ff8ee1e3c88590d1ae8d5b90f57203219061678bfe9abcca9f3fb916ea
MD5 0d04f5a95e4e38ccb1d5598ca7e4c111
BLAKE2b-256 bfaf663a54d998c94894e19eb9be0430ba17fc8fd1818fa4b5a496ff87fe2c8f

File details

Details for the file getpy-0.14.2-cp35-cp35m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: getpy-0.14.2-cp35-cp35m-manylinux2014_x86_64.whl
  • Size: 25.2 MB
  • Tags: CPython 3.5m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.9.0b3

File hashes

Hashes for getpy-0.14.2-cp35-cp35m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 462923a71e4a69eff85c6306ef1013e665afa0ec003f9260f95eba4d122ef3b3
MD5 5c63f14eb4601b58d33968a94d64f66e
BLAKE2b-256 4df93602c3288edf07ff72e3451b5de3e39ab35b8baed95bdd63a527b2e0c80f

