A Vectorized Dictionary for Python
GetPy - A Vectorized Python Dict/Set
The goal of GetPy is to provide the highest-performance Python dict/set that integrates into the Python scientific ecosystem.
Installation
pip install getpy
If you run into problems, feel free to open an issue. You can also build the package from source by cloning the repository and running python setup.py install.
About
GetPy is a thin binding to the Parallel Hashmap (https://github.com/greg7mdp/parallel-hashmap.git), a state-of-the-art unordered map/set with minimal memory overhead and fast runtime speed. The binding layer is built with PyBind11 (https://github.com/pybind/pybind11.git), which is fast to compile and simple to extend.
How To Use
The gp.Dict and gp.Set objects are designed to keep an interface similar to the corresponding standard Python objects. There are some key differences, though, which are necessary for vectorization and other performance considerations.
- gp.Dict.__init__ has three arguments: key_type, value_type, and default_value. The type arguments define which compiled data structure is used under the hood, and the full list of preset np.dtype combinations is found in gp.dict_types. You can also specify a default_value at construction, which must be castable to the value_type. This is the value the dictionary returns when a key is not found.
- All getpy.Dict methods support a vectorized interface, so methods like gp.Dict.__getitem__, gp.Dict.__setitem__, and gp.Dict.__delitem__ can be called with an np.ndarray. This lets the performance-critical for-loop run inside the compiled C++. Note that some dunder methods, such as __contains__, cannot be vectorized, so operators such as in do not behave as expected. Those methods are renamed without the double underscores (for example, contains) to signal their deviation from the standard interface.
- If a key does not exist, gp.Dict.__getitem__ returns the default_value. If you do not specify a default_value, it defaults to the default constructor of your data type (all 0 bits). To tell a missing key apart from a key whose stored value equals the default, first run gp.Dict.contains on your key or array of keys, and then retrieve values only for the keys that exist.
- There is also a gp.MultiDict object, which stores multiple unique values per key.
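Why __contains__ cannot stay vectorized follows from how Python's in operator works: the interpreter converts whatever __contains__ returns into a single bool, so an array of per-key answers cannot pass through it. The toy class below is hypothetical and not part of getpy (it uses np.isin instead of a hash map); it only contrasts a vectorized contains method with the in operator.

```python
import numpy as np


class ArrayMembership:
    """Hypothetical toy container, NOT getpy: it mimics the shape of a
    vectorized membership API so the behaviour of `in` can be observed."""

    def __init__(self, keys):
        self._keys = np.asarray(keys)

    def contains(self, query):
        # Vectorized test: one boolean per queried key, same shape as `query`.
        return np.isin(query, self._keys)

    def __contains__(self, query):
        # Python passes this return value through bool(), so `in` can only
        # ever produce a single True/False.
        return self.contains(query)


store = ArrayMembership(np.array([1, 2, 3], dtype='u8'))
query = np.array([2, 5], dtype='u8')

mask = store.contains(query)  # one answer per queried key
print(mask)

try:
    query in store  # bool() of a 2-element array is ambiguous
    in_failed = False
except ValueError:
    in_failed = True
print('in operator raised ValueError:', in_failed)

# With a one-element query, `in` silently collapses the answer to one bool.
single = np.array([2], dtype='u8') in store
print(type(single), single)
```

The explicit method keeps the array result intact, while in either raises or quietly reduces the answer to a single bool.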
Examples
Simple Example
import numpy as np
import getpy as gp
key_type = np.dtype('u8')
value_type = np.dtype('u8')
keys = np.random.randint(1, 1000, size=10**2, dtype=key_type)
values = np.random.randint(1, 1000, size=10**2, dtype=value_type)
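# The (key_type, value_type) pair selects the compiled hash map
gp_dict = gp.Dict(key_type, value_type)
# One vectorized call inserts every key/value pair; the loop runs in C++
gp_dict[keys] = values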
Default Example
import numpy as np
import getpy as gp
key_type = np.dtype('u8')
value_type = np.dtype('u8')
keys = np.random.randint(1, 1000, size=10**2, dtype=key_type)
values = np.random.randint(1, 1000, size=10**2, dtype=value_type)
gp_dict = gp.Dict(key_type, value_type, default_value=42)
gp_dict[keys] = values
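# Query 500 keys, many of which were never inserted
random_keys = np.random.randint(1, 1000, size=500, dtype=key_type)
# Keys not found in the dict come back as the default value, 42
random_values = gp_dict[random_keys]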
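In the Default Example, a returned 42 could mean either "missing" or "stored 42". The check-membership-first pattern from How To Use is sketched below with a hypothetical SortedArrayDict, a NumPy-only stand-in built on np.searchsorted rather than getpy's hash map, so it runs without the compiled extension. With getpy itself, the same two steps would presumably be a call to the renamed contains method followed by indexing with the keys it reports as present.

```python
import numpy as np


class SortedArrayDict:
    """Hypothetical NumPy-only stand-in for a vectorized dict. This is NOT
    getpy's implementation (getpy wraps a compiled hash map); keys are kept
    sorted here so that lookups can use binary search via np.searchsorted."""

    def __init__(self, keys, values, default_value=0):
        keys = np.asarray(keys)
        values = np.asarray(values)
        if keys.size == 0:
            raise ValueError('this sketch assumes at least one key')
        # Reverse first so np.unique keeps the LAST value written for a
        # repeated key, like successive assignments to a Python dict.
        self._keys, first = np.unique(keys[::-1], return_index=True)
        self._values = values[::-1][first]
        self._default = np.asarray(default_value, dtype=values.dtype)

    def _locate(self, query):
        # Candidate slot for each query key, clipped to stay in bounds.
        pos = np.searchsorted(self._keys, query)
        pos = np.clip(pos, 0, len(self._keys) - 1)
        return pos, self._keys[pos] == query

    def contains(self, query):
        # One boolean per queried key.
        _, hit = self._locate(np.asarray(query))
        return hit

    def __getitem__(self, query):
        pos, hit = self._locate(np.asarray(query))
        # Missing keys fall back to the default value.
        return np.where(hit, self._values[pos], self._default)


keys = np.array([3, 7, 11], dtype='u8')
values = np.array([42, 5, 9], dtype='u8')  # key 3 stores 42, same as the default
d = SortedArrayDict(keys, values, default_value=42)

query = np.array([3, 4, 7, 100], dtype='u8')
print(d[query])  # stored 42 and "missing" look identical here

# Membership first, then fetch only the keys that exist.
mask = d.contains(query)
found_keys = query[mask]
found_values = d[found_keys]
print(mask, found_keys, found_values)
```

Splitting the query with the boolean mask keeps both steps vectorized; no Python-level loop over keys is needed.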
Byteset Example
import numpy as np
import getpy as gp
key_type = np.dtype('S8')
value_type = np.dtype('S8')
# 8 random bytes per key and per value, stored as fixed-width 'S8' strings
keys = np.array([np.random.bytes(8) for _ in range(10**2)], dtype=key_type)
values = np.array([np.random.bytes(8) for _ in range(10**2)], dtype=value_type)
gp_dict = gp.Dict(key_type, value_type)
gp_dict[keys] = values
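Fixed-width byte dtypes can also serve as composite keys. The sketch below is an assumption about usage, not a documented getpy feature: a hypothetical pack_pairs helper views two u8 IDs per row as one S16 key using only NumPy. It also shows a NumPy behavior worth knowing for the S8/S16 types: reading a single element back as Python bytes strips trailing null bytes (random keys like those in the Byteset Example can end in b'\x00'), even though the array still stores the full fixed width.

```python
import numpy as np


def pack_pairs(a, b):
    """Hypothetical helper: pack two u8 arrays into one S16 key array.
    Each output element holds the 16 raw bytes of (a[i], b[i])."""
    pairs = np.stack([np.asarray(a, dtype='u8'), np.asarray(b, dtype='u8')], axis=1)
    # View each contiguous row of 2 x 8 bytes as a single 16-byte string.
    return np.ascontiguousarray(pairs).view('S16').ravel()


a = np.array([1, 1, 2], dtype='u8')
b = np.array([0, 256, 0], dtype='u8')
packed = pack_pairs(a, b)
print(packed.dtype, packed.shape)

# Distinct pairs give distinct keys, so `packed` could be used as S16 keys.
print(len(np.unique(packed)))

# Caveat: element access strips trailing b'\x00' bytes from 'S' data.
print(len(packed[0]), packed.itemsize)

# Unpacking works on the raw buffer, so the exact pairs come back.
unpacked = packed.view('u8').reshape(-1, 2)
print(unpacked.tolist())
```

Because the unpacking step reinterprets the raw buffer with .view('u8'), it recovers the original pairs regardless of the trailing-null stripping seen on element access.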
Multidimensional Example
import numpy as np
import getpy as gp
key_type = np.dtype('u8')
value_type = np.dtype('u8')
keys = np.random.randint(1, 1000, size=10**2, dtype=key_type).reshape(10,10)
values = np.random.randint(1, 1000, size=10**2, dtype=value_type).reshape(10,10)
gp_dict = gp.Dict(key_type, value_type)
# Keys and values may be multidimensional; here both are 10x10 arrays
gp_dict[keys] = values
Serialization Example
import os
import numpy as np
import getpy as gp
key_type = np.dtype('u8')
value_type = np.dtype('u8')
keys = np.random.randint(1, 1000, size=10**1, dtype=key_type)
values = np.random.randint(1, 1000, size=10**1, dtype=value_type)
gp_dict_1 = gp.Dict(key_type, value_type)
gp_dict_1[keys] = values
# Make sure the output directory exists before writing the binary file
os.makedirs('test', exist_ok=True)
gp_dict_1.dump('test/test.hashtable.bin')
# Load into a new dict created with the same key and value types
gp_dict_2 = gp.Dict(key_type, value_type)
gp_dict_2.load('test/test.hashtable.bin')
Supported Data Types
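# Preset (key_type, value_type) combinations, exposed as gp.dict_types;
# _gp is getpy's compiled PyBind11 extension module
dict_types = {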
(np.dtype('u4'), np.dtype('u1')) : _gp.Dict_u4_u1,
(np.dtype('u4'), np.dtype('u2')) : _gp.Dict_u4_u2,
(np.dtype('u4'), np.dtype('u4')) : _gp.Dict_u4_u4,
(np.dtype('u4'), np.dtype('u8')) : _gp.Dict_u4_u8,
(np.dtype('u4'), np.dtype('i1')) : _gp.Dict_u4_i1,
(np.dtype('u4'), np.dtype('i2')) : _gp.Dict_u4_i2,
(np.dtype('u4'), np.dtype('i4')) : _gp.Dict_u4_i4,
(np.dtype('u4'), np.dtype('i8')) : _gp.Dict_u4_i8,
(np.dtype('u4'), np.dtype('f4')) : _gp.Dict_u4_f4,
(np.dtype('u4'), np.dtype('f8')) : _gp.Dict_u4_f8,
(np.dtype('u4'), np.dtype('S8')) : _gp.Dict_u4_S8,
(np.dtype('u4'), np.dtype('S16')) : _gp.Dict_u4_S16,
(np.dtype('u8'), np.dtype('u1')) : _gp.Dict_u8_u1,
(np.dtype('u8'), np.dtype('u2')) : _gp.Dict_u8_u2,
(np.dtype('u8'), np.dtype('u4')) : _gp.Dict_u8_u4,
(np.dtype('u8'), np.dtype('u8')) : _gp.Dict_u8_u8,
(np.dtype('u8'), np.dtype('i1')) : _gp.Dict_u8_i1,
(np.dtype('u8'), np.dtype('i2')) : _gp.Dict_u8_i2,
(np.dtype('u8'), np.dtype('i4')) : _gp.Dict_u8_i4,
(np.dtype('u8'), np.dtype('i8')) : _gp.Dict_u8_i8,
(np.dtype('u8'), np.dtype('f4')) : _gp.Dict_u8_f4,
(np.dtype('u8'), np.dtype('f8')) : _gp.Dict_u8_f8,
(np.dtype('u8'), np.dtype('S8')) : _gp.Dict_u8_S8,
(np.dtype('u8'), np.dtype('S16')) : _gp.Dict_u8_S16,
(np.dtype('i4'), np.dtype('u1')) : _gp.Dict_i4_u1,
(np.dtype('i4'), np.dtype('u2')) : _gp.Dict_i4_u2,
(np.dtype('i4'), np.dtype('u4')) : _gp.Dict_i4_u4,
(np.dtype('i4'), np.dtype('u8')) : _gp.Dict_i4_u8,
(np.dtype('i4'), np.dtype('i1')) : _gp.Dict_i4_i1,
(np.dtype('i4'), np.dtype('i2')) : _gp.Dict_i4_i2,
(np.dtype('i4'), np.dtype('i4')) : _gp.Dict_i4_i4,
(np.dtype('i4'), np.dtype('i8')) : _gp.Dict_i4_i8,
(np.dtype('i4'), np.dtype('f4')) : _gp.Dict_i4_f4,
(np.dtype('i4'), np.dtype('f8')) : _gp.Dict_i4_f8,
(np.dtype('i4'), np.dtype('S8')) : _gp.Dict_i4_S8,
(np.dtype('i4'), np.dtype('S16')) : _gp.Dict_i4_S16,
(np.dtype('i8'), np.dtype('u1')) : _gp.Dict_i8_u1,
(np.dtype('i8'), np.dtype('u2')) : _gp.Dict_i8_u2,
(np.dtype('i8'), np.dtype('u4')) : _gp.Dict_i8_u4,
(np.dtype('i8'), np.dtype('u8')) : _gp.Dict_i8_u8,
(np.dtype('i8'), np.dtype('i1')) : _gp.Dict_i8_i1,
(np.dtype('i8'), np.dtype('i2')) : _gp.Dict_i8_i2,
(np.dtype('i8'), np.dtype('i4')) : _gp.Dict_i8_i4,
(np.dtype('i8'), np.dtype('i8')) : _gp.Dict_i8_i8,
(np.dtype('i8'), np.dtype('f4')) : _gp.Dict_i8_f4,
(np.dtype('i8'), np.dtype('f8')) : _gp.Dict_i8_f8,
(np.dtype('i8'), np.dtype('S8')) : _gp.Dict_i8_S8,
(np.dtype('i8'), np.dtype('S16')) : _gp.Dict_i8_S16,
(np.dtype('S8'), np.dtype('u1')) : _gp.Dict_S8_u1,
(np.dtype('S8'), np.dtype('u2')) : _gp.Dict_S8_u2,
(np.dtype('S8'), np.dtype('u4')) : _gp.Dict_S8_u4,
(np.dtype('S8'), np.dtype('u8')) : _gp.Dict_S8_u8,
(np.dtype('S8'), np.dtype('i1')) : _gp.Dict_S8_i1,
(np.dtype('S8'), np.dtype('i2')) : _gp.Dict_S8_i2,
(np.dtype('S8'), np.dtype('i4')) : _gp.Dict_S8_i4,
(np.dtype('S8'), np.dtype('i8')) : _gp.Dict_S8_i8,
(np.dtype('S8'), np.dtype('f4')) : _gp.Dict_S8_f4,
(np.dtype('S8'), np.dtype('f8')) : _gp.Dict_S8_f8,
(np.dtype('S8'), np.dtype('S8')) : _gp.Dict_S8_S8,
(np.dtype('S8'), np.dtype('S16')) : _gp.Dict_S8_S16,
(np.dtype('S16'), np.dtype('u1')) : _gp.Dict_S16_u1,
(np.dtype('S16'), np.dtype('u2')) : _gp.Dict_S16_u2,
(np.dtype('S16'), np.dtype('u4')) : _gp.Dict_S16_u4,
(np.dtype('S16'), np.dtype('u8')) : _gp.Dict_S16_u8,
(np.dtype('S16'), np.dtype('i1')) : _gp.Dict_S16_i1,
(np.dtype('S16'), np.dtype('i2')) : _gp.Dict_S16_i2,
(np.dtype('S16'), np.dtype('i4')) : _gp.Dict_S16_i4,
(np.dtype('S16'), np.dtype('i8')) : _gp.Dict_S16_i8,
(np.dtype('S16'), np.dtype('f4')) : _gp.Dict_S16_f4,
(np.dtype('S16'), np.dtype('f8')) : _gp.Dict_S16_f8,
(np.dtype('S16'), np.dtype('S8')) : _gp.Dict_S16_S8,
(np.dtype('S16'), np.dtype('S16')) : _gp.Dict_S16_S16,
}
set_types = {
np.dtype('u4') : _gp.Set_u4,
np.dtype('u8') : _gp.Set_u8,
np.dtype('i4') : _gp.Set_i4,
np.dtype('i8') : _gp.Set_i8,
np.dtype('S8') : _gp.Set_S8,
np.dtype('S16') : _gp.Set_S16,
}
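The type arguments must match one of the preset pairs above. As a sketch, the hypothetical helper is_supported below rebuilds the same 6 x 12 grid of dtype codes (copied from this listing, not imported from getpy) and normalizes its inputs with np.dtype, so aliases such as np.uint64, 'uint64', and 'u8' resolve to the same entry. With getpy installed, checking membership in gp.dict_types directly would avoid keeping a copy in sync.

```python
import itertools

import numpy as np

# Dtype codes copied from the Supported Data Types listing above.
KEY_CODES = ['u4', 'u8', 'i4', 'i8', 'S8', 'S16']
VALUE_CODES = ['u1', 'u2', 'u4', 'u8', 'i1', 'i2', 'i4', 'i8', 'f4', 'f8', 'S8', 'S16']

SUPPORTED_DICT_PAIRS = {
    (np.dtype(k), np.dtype(v)) for k, v in itertools.product(KEY_CODES, VALUE_CODES)
}
SUPPORTED_SET_TYPES = {np.dtype(k) for k in KEY_CODES}


def is_supported(key_type, value_type=None):
    """Hypothetical helper: check a dtype (or dtype pair) against the table.
    np.dtype() normalizes aliases such as np.uint64, 'uint64' and 'u8'."""
    key = np.dtype(key_type)
    if value_type is None:
        return key in SUPPORTED_SET_TYPES
    return (key, np.dtype(value_type)) in SUPPORTED_DICT_PAIRS


print(len(SUPPORTED_DICT_PAIRS))           # 72
print(is_supported(np.uint64, 'float32'))  # True
print(is_supported('u2', 'u8'))            # False: u2 is not a key type
print(is_supported(np.dtype('S16')))       # True: S16 is a Set key type
```

Normalizing through np.dtype before the lookup is what lets type objects and string aliases match the np.dtype keys used in the table.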
Built Distributions
Hashes for getpy-0.14.2-cp39-cp39-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b0514bd2c2f7d140159daa3e439772b19375c5d2125a756d7727babf91f232cd
MD5 | e690898d5ffd3658f579333d2ba79126
BLAKE2b-256 | 5fdd43fa571119b62571edf78bd5d397360b18a2302d3ad2583e6cbf970118cb
Hashes for getpy-0.14.2-cp38-cp38-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 62ff3662f02019f3243728b5311c9bafe5bd7f19c331a2dc04de9ab2e9151d35
MD5 | b17bc51901225c52e6b23a8dd581b932
BLAKE2b-256 | 1173ac12b3ba37a80767697e0641d9e034c1b61e001ebbc399164c51a74a6060
Hashes for getpy-0.14.2-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0c42c8918b84817ac22b5e2fefadc02d188b2c00ec1f38e16748635d920d2cc5
MD5 | ded6760b6a3833650a8aa407e703ee02
BLAKE2b-256 | 809defa5fc4806bc58e57a27ef1c57c781d12542afcb91bb943b7472abc7c283
Hashes for getpy-0.14.2-cp36-cp36m-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 8594e5ff8ee1e3c88590d1ae8d5b90f57203219061678bfe9abcca9f3fb916ea
MD5 | 0d04f5a95e4e38ccb1d5598ca7e4c111
BLAKE2b-256 | bfaf663a54d998c94894e19eb9be0430ba17fc8fd1818fa4b5a496ff87fe2c8f
Hashes for getpy-0.14.2-cp35-cp35m-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 462923a71e4a69eff85c6306ef1013e665afa0ec003f9260f95eba4d122ef3b3
MD5 | 5c63f14eb4601b58d33968a94d64f66e
BLAKE2b-256 | 4df93602c3288edf07ff72e3451b5de3e39ab35b8baed95bdd63a527b2e0c80f