Fast Python ser/des for DynamoDB native JSON format

orjson-ddb

orjson-ddb is a fast DynamoDB native JSON library for Python. It is a fork/reboot of orjson, from which it inherits its performance, adapted to the DynamoDB (DDB) native JSON format. Deserialization is available today; serialization is not yet implemented. Compared to the boto3 DynamoDB TypeDeserializer, it deserializes DynamoDB responses roughly 10x faster (about 9x to 12x in the benchmarks below) and deserializes numbers with a fractional part (e.g. {"N": "0.13"}) to float instead of Decimal.

orjson-ddb supports CPython 3.7, 3.8, 3.9, 3.10, and 3.11. It distributes x86_64/amd64, aarch64/armv8, and arm7 wheels for Linux, amd64 and aarch64 wheels for macOS, and amd64 wheels for Windows. orjson-ddb does not support PyPy. Releases follow semantic versioning and serializing a new object type without an opt-in flag is considered a breaking change.

The repository and issue tracker are at github.com/MattFanto/orjson-ddb, and patches may be submitted there. A CHANGELOG is available in the repository.

  1. Usage
    1. Install
    2. Deserialize
    3. Serialize
  2. Performance
    1. Latency
    2. Reproducing
  3. Questions
  4. Packaging
  5. License

Usage

Install

To install a wheel from PyPI:

pip install --upgrade "pip>=20.3" # manylinux_x_y, universal2 wheel support
pip install --upgrade orjson-ddb

To build a wheel, see packaging.

Deserialize

This library exposes a function, loads, which deserializes a DynamoDB response into a Python dictionary. It can be used to parse the API response from DynamoDB, for example when calling the REST API directly:

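import requests
import orjson_ddb

# DynamoDB's low-level API expects a JSON request body, so pass json= rather than data=
# (data= with a dict would send a form-encoded body instead).
response = requests.post("https://dynamodb.us-east-1.amazonaws.com/", json={
   "TableName": "some_table",
   "Key": {"pk": {"S": "pk1"}, "sk": {"S": "sk1"}}
}, headers={
   # some headers, e.g. "X-Amz-Target": "DynamoDB_20120810.GetItem",
   # "Content-Type": "application/x-amz-json-1.0", and the Signature Version 4 auth headers
})
# loads takes the raw response body and returns plain Python objects
data = orjson_ddb.loads(response.content)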
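
To make the conversion concrete, here is a minimal pure-Python sketch of the mapping from DynamoDB native JSON to plain Python values. It is only an illustration of the format, not the library's implementation (orjson-ddb does this work in Rust); the helper names from_ddb_value and from_ddb_item are hypothetical, and binary and set types (B, SS, NS, BS) are left out for brevity.

```python
import json


def from_ddb_value(value):
    """Convert one typed value such as {"S": "pk1"} into a plain Python value."""
    # Each typed value is a dict with exactly one key: the type tag.
    (type_tag, payload), = value.items()
    if type_tag == "S":
        return payload
    if type_tag == "N":
        # Numbers travel as strings; use int when possible, float otherwise.
        try:
            return int(payload)
        except ValueError:
            return float(payload)
    if type_tag == "BOOL":
        return payload
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_ddb_value(element) for element in payload]
    if type_tag == "M":
        return {key: from_ddb_value(element) for key, element in payload.items()}
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag!r}")


def from_ddb_item(item):
    """Convert a whole item (attribute name -> typed value) into a plain dict."""
    return {name: from_ddb_value(typed) for name, typed in item.items()}


raw = (
    '{"pk": {"S": "pk1"}, "sk": {"S": "sk1"}, '
    '"data": {"M": {"some_number": {"N": "0.123"}, "some_string": {"S": "hello"}}}}'
)
plain = from_ddb_item(json.loads(raw))
print(plain)
```

The sketch parses the JSON first and then walks the result, which is two passes over the data; a deserializer that builds the plain values directly while parsing avoids the intermediate structure.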

The same results can be achieved when using boto3 via a context manager provided by the library:

import boto3
from orjson_ddb import ddb_json_parser


dynamodb_client = boto3.client("dynamodb", region_name="us-east-1")
with ddb_json_parser():
   resp = dynamodb_client.get_item(
      TableName="some_table",
      Key={"pk": {"S": "pk1"}, "sk": {"S": "sk1"}}
   )
print(resp["Item"])
# {'sk': 'sk1', 'pk': 'pk1', 'data': {'some_number': 0.123, 'some_string': 'hello'}}

This context manager tells boto3 to use orjson_ddb.loads to deserialize the DynamoDB response. The output dictionary contains no DynamoDB native type descriptors, and the result is the same as what you would get with boto3.resource('dynamodb').Table('some_table').get_item(Key={"pk": "pk1", "sk": "sk1"})["Item"], except that "N" values are translated directly to int or float instead of Decimal.
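
The switch from Decimal to float is a trade-off worth checking against your data. The sketch below uses only the standard library to show both sides: floats are convenient (for example, the json module can dump them directly) but are binary approximations, while DynamoDB numbers can carry up to 38 digits of precision. The variable names are illustrative only.

```python
import json
from decimal import Decimal

wire_value = "0.13"  # the string inside {"N": "0.13"}

as_decimal = Decimal(wire_value)  # the Decimal route described above for boto3
as_float = float(wire_value)      # the float route described above for orjson-ddb

# Convenience: floats serialize with the standard json module, Decimals do not.
float_json = json.dumps({"value": as_float})
try:
    json.dumps({"value": as_decimal})
    decimal_is_json_serializable = True
except TypeError:
    decimal_is_json_serializable = False

# Exactness: Decimal arithmetic is exact in base 10, binary floats are not.
decimal_sum_is_exact = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
float_sum_is_exact = 0.1 + 0.2 == 0.3

# Precision: a float cannot hold every digit of a long DynamoDB number.
big = "12345678901234567890.123456789"
round_trips_as_float = Decimal(float(big)) == Decimal(big)

print(float_json, decimal_is_json_serializable)
print(decimal_sum_is_exact, float_sum_is_exact, round_trips_as_float)
```

If your items hold monetary amounts or identifiers stored as long numbers, the loss of precision may matter more than the speed and convenience gained.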

N.B. Unfortunately at the moment it is not possible to use this library with boto3.resource.

Serialize

Serialization of a Python dictionary to DynamoDB native JSON format is not available yet.
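
Until then, the reverse mapping can be written by hand. The following is a minimal pure-Python sketch, not part of orjson-ddb: the helper names to_ddb_value and to_ddb_item are hypothetical, sets and binary values are not handled, and rendering numbers with str() is an assumption that you should validate against the values you actually store.

```python
import math
from decimal import Decimal


def to_ddb_value(value):
    """Wrap a plain Python value in its DynamoDB type descriptor."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):
        # Check bool before int: bool is a subclass of int in Python.
        return {"BOOL": value}
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError("NaN and infinity cannot be stored as DynamoDB numbers")
    if isinstance(value, (int, float, Decimal)):
        # Numbers are sent as strings (assumption: str() formatting is acceptable).
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (list, tuple)):
        return {"L": [to_ddb_value(element) for element in value]}
    if isinstance(value, dict):
        return {"M": {key: to_ddb_value(element) for key, element in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_ddb_item(item):
    """Convert a plain dict into an item of attribute name -> typed value."""
    return {name: to_ddb_value(value) for name, value in item.items()}


item = to_ddb_item({"pk": "pk1", "data": {"some_number": 0.123, "tags": ["a", None]}})
print(item)
```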

Performance

Deserialization with orjson-ddb is faster than with boto3 or dynamodb-json-util. The benchmarks use fixtures of real data converted to DynamoDB native format:

  • twitter.json, 631.5KiB, results of a search on Twitter for "一", containing CJK strings, dictionaries of strings and arrays of dictionaries, indented.

  • github.json, 55.8KiB, a GitHub activity feed, containing dictionaries of strings and arrays of dictionaries, not indented.

  • citm_catalog.json, 1.7MiB, concert data, containing nested dictionaries of strings and arrays of integers, indented.

  • canada.json, 2.2MiB, coordinates of the Canadian border in GeoJSON format, containing floats and arrays, indented.

Latency

twitter.json deserialization

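| Library            | Median latency (milliseconds) | Operations per second | Relative (latency) |
|--------------------|-------------------------------|-----------------------|--------------------|
| orjson-ddb         | 2.17                          | 459.7                 | 1                  |
| boto3-json         | 18.61                         | 54.1                  | 8.57               |
| dynamodb-json-util | 54.13                         | 18.4                  | 24.92              |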

citm_catalog.json deserialization

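| Library            | Median latency (milliseconds) | Operations per second | Relative (latency) |
|--------------------|-------------------------------|-----------------------|--------------------|
| orjson-ddb         | 4.43                          | 240.3                 | 1                  |
| boto3-json         | 53.3                          | 18.6                  | 12.03              |
| dynamodb-json-util | 57.27                         | 17.5                  | 12.93              |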

canada.json deserialization

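| Library            | Median latency (milliseconds) | Operations per second | Relative (latency) |
|--------------------|-------------------------------|-----------------------|--------------------|
| orjson-ddb         | 19.22                         | 52                    | 1                  |
| boto3-json         | 221.83                        | 4.5                   | 11.54              |
| dynamodb-json-util | 244.14                        | 4.1                   | 12.7               |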

Reproducing

The above was measured using Python 3.9.13 on Linux (amd64) with orjson-ddb 0.1.1, boto3==1.21.27, and dynamodb-json==1.3.

The latency results can be reproduced using the pybench and graph scripts.
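
For a quick check without the project's scripts, a median-latency measurement can be sketched with the standard library alone. This is not the pybench script: the function names are hypothetical, json.loads stands in for the deserializer under test, and treating operations per second as 1000 divided by the median in milliseconds is an assumption about how the tables were derived.

```python
import json
import statistics
import time


def median_latency_ms(func, payload, repeat=50):
    """Call func(payload) repeatedly and return the median wall time in ms."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def summarize(latencies_ms, baseline):
    """Derive operations per second and latency relative to the baseline."""
    base = latencies_ms[baseline]
    return {
        name: {
            "median_ms": ms,
            "ops_per_second": 1000.0 / ms,
            "relative": ms / base,
        }
        for name, ms in latencies_ms.items()
    }


# Time a stand-in deserializer on a tiny payload.
payload = json.dumps({"pk": {"S": "pk1"}, "n": {"N": "0.13"}}).encode()
measured = median_latency_ms(json.loads, payload)

# Recompute the derived columns from the published twitter.json medians.
table = summarize({"orjson-ddb": 2.17, "boto3-json": 18.61}, baseline="orjson-ddb")
print(measured, table["boto3-json"])
```

Values recomputed from the rounded medians can differ slightly in the last digit from the published columns, which were presumably derived from unrounded measurements.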

Questions

Why can't I install it from PyPI?

Probably pip needs to be upgraded to version 20.3 or later to support the latest manylinux_x_y or universal2 wheel formats.
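
The version threshold can be checked programmatically. The helper below is a hypothetical sketch, not part of orjson-ddb; it only compares the leading major and minor numbers of a version string against 20.3.

```python
import re


def pip_supports_new_wheel_tags(pip_version):
    """Return True if pip_version (e.g. "20.3.1") is at least 20.3."""
    # Keep the first two numeric components, padding with zeros if needed.
    numbers = (re.findall(r"\d+", pip_version) + ["0", "0"])[:2]
    major, minor = (int(number) for number in numbers)
    return (major, minor) >= (20, 3)


print(pip_supports_new_wheel_tags("20.2.4"))
print(pip_supports_new_wheel_tags("23.1.2"))
```

On Python 3.8 or later, the installed version string can be obtained with importlib.metadata.version("pip") and passed to the helper.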

"Cargo, the Rust package manager, is not installed or is not on PATH."

This happens when there are no binary wheels (like manylinux) for your platform on PyPI. You can install Rust through rustup or a package manager and then it will compile.

Will it support PyPy?

Probably not.

Packaging

Packaging orjson-ddb requires at least Rust 1.57 and the maturin build tool. The recommended build command is:

maturin build --release --strip

It benefits from also having a C build environment to compile a faster deserialization backend. See this project's manylinux_2_28 builds for an example using clang and LTO.

The project's own CI tests against nightly-2022-07-26 and stable 1.54. It is prudent to pin the nightly version because that channel can introduce breaking changes.

orjson-ddb is tested for amd64, aarch64, and arm7 on Linux. It is tested for amd64 on macOS and cross-compiles for aarch64. For Windows it is tested on amd64.

There are no runtime dependencies other than libc.

Tests are included in the source distribution on PyPI. The requirements to run the tests are specified in test/requirements.txt. The tests should be run as part of the build. They can be run with pytest -q test.

License

orjson was written by ijl ijl@mailbox.org, copyright 2018 - 2021, licensed under both the Apache 2 and MIT licenses.

orjson-ddb was forked from orjson and is maintained by Mattia Fantoni mattia.fantoni@gmail.com, licensed under the same terms as orjson.
