A library for deserializing various formats directly into numpy arrays

serde-numpy

serde-numpy is a library for efficiently deserializing various file formats directly into numpy arrays.

See how it works for image formats and JSON formats in the sections below.

Installation

Currently only available for Linux with Python >= 3.7.

pip install --upgrade pip
pip install serde-numpy

Image formats

Example usage

>>> from serde_numpy import decode_jpeg, read_jpeg, decode_png, read_png
>>> 
>>> img = read_jpeg("test.jpg")
>>> img
array([[[ 75,  29,  82],
        [ 96,  56, 133],
        [ 72,  47, 168],
        [ 63,  56, 179]],

       [[216, 176, 203],
        [173, 139, 190],
        [111,  93, 188],
        [129, 128, 225]],

       [[ 75,  46,  21],
        [ 73,  51,  48],
        [ 81,  73, 115],
        [157, 167, 209]],

       [[165, 142,  99],
        [181, 165, 144],
        [169, 169, 188],
        [185, 203, 222]]], dtype=uint8)
>>> 
>>> with open("test.png", "rb") as f:
...     byte_array = f.read()
>>> img = decode_png(byte_array)
>>> img
array([[[ 33,  47, 146],
        [206,  19, 120],
        [185,   8,  55],
        [ 33,  54, 176]],

       [[252, 156, 169],
        [169, 139, 100],
        [ 24, 128, 222],
        [136, 146, 213]],

       [[ 28,  24, 192],
        [184,  51,  58],
        [ 39,  61, 252],
        [237, 165, 113]],

       [[239, 111,  72],
        [ 30, 242,  38],
        [165, 161, 223],
        [ 91, 246, 217]]], dtype=uint8)

Benchmarks

All benchmarks were performed on an AMD Ryzen 9 3950X (Python 3.8.12, numpy 1.23.2, orjson 3.6.4). We compare serde_numpy's decode_png and decode_jpeg against Pillow's Image.open + np.asarray, which is the de facto standard for libraries that do a lot of image loading (e.g. PyTorch's torchvision).
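For reference, the Pillow baseline named above can be sketched as follows. This is only an illustration of the Image.open + np.asarray pattern, not the benchmark script itself; the helper name `pillow_decode` and the in-memory 4x4 test image are assumptions made for this example.

```python
import io

import numpy as np
from PIL import Image


def pillow_decode(byte_array: bytes) -> np.ndarray:
    """Decode encoded image bytes with Pillow, then convert to a numpy array."""
    with Image.open(io.BytesIO(byte_array)) as img:
        return np.asarray(img)


# Build a tiny 4x4 RGB PNG in memory so the example needs no file on disk.
buffer = io.BytesIO()
Image.new("RGB", (4, 4), color=(75, 29, 82)).save(buffer, format="PNG")

pixels = pillow_decode(buffer.getvalue())
print(pixels.shape, pixels.dtype)
```

The baseline goes through a Pillow Image object before the pixels reach numpy, which is the extra step that decode_png and decode_jpeg are compared against.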

JPEG

JPEG decoding for square images:

[Figure: benchmark plot of JPEG decoding for square images]

PNG

PNG decoding for square images:

[Figure: benchmark plot of PNG decoding for square images]

JSON Formats

Motivation

If you've ever done something like this in your code:

data = json.load(open("data.json"))

arr = np.array(data["x"])

then this library does it faster by using minimal array allocations and less Python.

Speedups range from 1.5x to 8x compared to orjson + numpy, depending on array size and CPU.

Usage

The user specifies the numpy dtypes within a structure corresponding to the data that they want to deserialize.

N-dimensional array

The structure names a subset of the JSON (or MessagePack) keys. It is used to initialize the NumpyDeserializer, which then deserializes only those keys into the requested types:

>>> import numpy as np
>>> from serde_numpy import NumpyDeserializer
>>> 
>>> json_str = b"""
... {
...     "name": "coordinates",
...     "version": "0.1.0",
...     "arr": [[1.254439975231648, -0.6893827594332794],
...             [-0.2922560025562806, 0.5204819306523419]]
... }
... """
>>> 
>>> structure = {
...     'name': str,
...     'arr': np.float32
... }
>>> 
>>> deserializer = NumpyDeserializer.from_dict(structure)
>>> 
>>> deserializer.deserialize_json(json_str)
{'arr': array([[ 1.25444   , -0.68938273],
               [-0.292256  ,  0.52048194]], dtype=float32), 
 'name': 'coordinates'}
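To make the meaning of the structure concrete, here is a rough pure-Python reference for the same result using the standard json module and numpy. It is a sketch of the observable behaviour only, not serde-numpy's implementation (which is described above as using minimal array allocations and less Python); the helper name `deserialize_subset` is hypothetical.

```python
import json

import numpy as np


def deserialize_subset(json_bytes, structure):
    """Reference behaviour only: parse everything, then keep the requested keys."""
    parsed = json.loads(json_bytes)
    result = {}
    for key, wanted in structure.items():
        value = parsed[key]
        if isinstance(wanted, type) and issubclass(wanted, np.generic):
            # numpy scalar types become the dtype of an n-dimensional array
            result[key] = np.array(value, dtype=wanted)
        else:
            # plain Python types (str, int, float, ...) convert the value directly
            result[key] = wanted(value)
    return result


json_str = b"""
{
    "name": "coordinates",
    "version": "0.1.0",
    "arr": [[1.254439975231648, -0.6893827594332794],
            [-0.2922560025562806, 0.5204819306523419]]
}
"""

out = deserialize_subset(json_str, {"name": str, "arr": np.float32})
print(sorted(out))  # "version" is not in the structure, so it is absent
print(out["arr"].dtype, out["arr"].shape)
```

Note that this reference builds nested Python lists first and only then copies them into an array, which is the overhead the Motivation section refers to.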

Transposed arrays

Sometimes data is stored in JSON row-wise rather than column-wise, so each row can contain multiple dtypes. serde-numpy lets you specify the type of each field in a row and then deserializes into columns. To tell the deserializer to transpose rows into columns, put square brackets around a dictionary, [{key: Type, ...}], as in this example:

>>> json_str = b"""
... {
...     "df": [{"a": 3, "b": 4.23},
...            {"a": 4, "b": 5.12}]
... }
... """
>>> 
>>> structure = {"df": [{"a": np.uint16, "b": np.float64}]}
>>> 
>>> deserializer = NumpyDeserializer.from_dict(structure)
>>> 
>>> deserializer.deserialize_json(json_str)
{'df': {'b': array([4.23, 5.12]), 'a': array([3, 4], dtype=uint16)}}
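The row-to-column transposition for a list of dictionaries can be written out by hand as below. This is a plain json + numpy sketch of the equivalent result, assuming a hypothetical helper `rows_to_columns`; it is not the library's code path.

```python
import json

import numpy as np


def rows_to_columns(rows, column_types):
    """Turn a list of row dicts into a dict of typed column arrays."""
    return {
        key: np.array([row[key] for row in rows], dtype=dtype)
        for key, dtype in column_types.items()
    }


parsed = json.loads(b'{"df": [{"a": 3, "b": 4.23}, {"a": 4, "b": 5.12}]}')
columns = rows_to_columns(parsed["df"], {"a": np.uint16, "b": np.float64})
print(columns["a"], columns["a"].dtype)
print(columns["b"], columns["b"].dtype)
```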

or put square brackets around a list of types, [[Type, ...]]:

>>> json_str = b"""
... {
...     "df": [["i", true],
...            ["j", false],
...            ["k", true]]
... }
... """
>>> 
>>> structure = {"df": [[str, np.bool_]]}
>>> 
>>> deserializer = NumpyDeserializer.from_dict(structure)
>>> 
>>> deserializer.deserialize_json(json_str)
{'df': [['i', 'j', 'k'], array([ True, False,  True])]}
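For rows stored as lists, the same transposition is positional rather than keyed. A hand-written equivalent using json + numpy might look like this; the helper name `row_lists_to_columns` is an assumption for illustration and does not reflect how serde-numpy does it internally.

```python
import json

import numpy as np


def row_lists_to_columns(rows, column_types):
    """Transpose a list of row lists into one column per position."""
    columns = []
    for values, wanted in zip(zip(*rows), column_types):
        if isinstance(wanted, type) and issubclass(wanted, np.generic):
            # numpy types produce a typed array for that column
            columns.append(np.array(values, dtype=wanted))
        else:
            # Python types produce a plain list for that column
            columns.append([wanted(v) for v in values])
    return columns


parsed = json.loads(b'{"df": [["i", true], ["j", false], ["k", true]]}')
columns = row_lists_to_columns(parsed["df"], [str, np.bool_])
print(columns)
```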

Currently supported data formats:

JSON :: NumpyDeserializer.deserialize_json
MessagePack :: NumpyDeserializer.deserialize_msgpack

Currently supported types:

Numpy types:

np.int8
np.int16
np.int32
np.int64
np.uint8
np.uint16
np.uint32
np.uint64
np.float32
np.float64
np.bool_

Python types:

int
float
str
dict
list
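If you build structures programmatically, a small pre-check against the lists above can catch unsupported types early. The following is a sketch only: `SUPPORTED_TYPES` and `check_structure` are hypothetical names that are not part of serde-numpy, and the check simply mirrors the two lists in this section.

```python
import numpy as np

# Mirrors the "Numpy types" and "Python types" lists above.
SUPPORTED_TYPES = {
    np.int8, np.int16, np.int32, np.int64,
    np.uint8, np.uint16, np.uint32, np.uint64,
    np.float32, np.float64, np.bool_,
    int, float, str, dict, list,
}


def check_structure(structure, path="structure"):
    """Raise TypeError if a leaf of `structure` is not in SUPPORTED_TYPES."""
    if isinstance(structure, dict):
        for key, value in structure.items():
            check_structure(value, f"{path}[{key!r}]")
    elif isinstance(structure, list):
        # covers the transposed forms [{key: Type, ...}] and [[Type, ...]]
        for index, value in enumerate(structure):
            check_structure(value, f"{path}[{index}]")
    elif structure not in SUPPORTED_TYPES:
        raise TypeError(f"{path}: unsupported type {structure!r}")


# The three structures used in the Usage section all pass.
check_structure({"name": str, "arr": np.float32})
check_structure({"df": [{"a": np.uint16, "b": np.float64}]})
check_structure({"df": [[str, np.bool_]]})
```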

Benchmarks

All benchmarks were performed on an AMD Ryzen 9 3950X (Python 3.8.12, numpy 1.23.2, orjson 3.6.4). Orjson was selected as the comparison because it is the fastest in Python JSON benchmarks, and we have also found it to be the fastest in practice.

2D Array deserialization

Two tests are performed: the number of rows is kept constant at 10 while the number of columns varies, and the number of columns is kept constant at 10 while the number of rows varies. We compare against orjson.loads + np.array with the desired data type. Results are presented below for deserializing arrays of various data types:

[Figure: benchmark plots of 2D array deserialization for various data types]
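The shape of the two sweeps can be reproduced with a small timing harness like the one below. This is a sketch, not the script behind the published results: it times only the baseline side, substitutes the standard-library json module for orjson so that it runs without extra dependencies, and the sizes, repeat count, and helper name `time_baseline` are arbitrary choices for illustration.

```python
import json
import timeit

import numpy as np


def time_baseline(n_rows, n_cols, dtype, repeats=20):
    """Average seconds per call of json.loads + np.array for one array shape."""
    rng = np.random.default_rng(0)
    payload = json.dumps({"arr": rng.random((n_rows, n_cols)).tolist()}).encode()

    def baseline():
        return np.array(json.loads(payload)["arr"], dtype=dtype)

    return timeit.timeit(baseline, number=repeats) / repeats


# Sweep 1: 10 rows, varying columns. Sweep 2: 10 columns, varying rows.
sizes = [10, 100, 1000]
vary_cols = {n: time_baseline(10, n, np.float32) for n in sizes}
vary_rows = {n: time_baseline(n, 10, np.float32) for n in sizes}

for n in sizes:
    print(f"10 x {n}: {vary_cols[n]:.2e} s   {n} x 10: {vary_rows[n]:.2e} s")
```

To compare against serde-numpy, the same payload would be passed to NumpyDeserializer.deserialize_json with a structure such as {"arr": np.float32}.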

Transposed arrays deserialization

This test measures the speed of deserializing multiple data types that were serialized row-wise and converting them to column-wise arrays during deserialization.

[Figure: benchmark plot of transposed array deserialization]

Release history

0.3.0 (Jun 15, 2023), this version
0.2.1 (Feb 10, 2023)
0.2.0 (Feb 10, 2023)
0.1.0 (Aug 18, 2022)
