Skip to main content

High-speed JSON parser

Project description

cysimdjson

Fast JSON parsing library for Python, 7-12 times faster than standard Python JSON parser.
It is Python bindings for the simdjson using Cython.

Standard Python JSON parser (json.load() etc.) is relatively slow, and if you need to parse large JSON files or a large number of small JSON files, it may represent a significant bottleneck.

Whilst there are other fast Python JSON parsers, such as pysimdjson, libpy_simdjson or orjson, they don't reach the raw speed that is provided by the brilliant SIMDJSON project. SIMDJSON is C++ JSON parser based on SIMD instructions, reportedly the fastest JSON parser on the planet.

Python 3.11 Python 3.10
Python 3.9 Python 3.8 Python 3.7

Usage

import cysimdjson

json_bytes = b'''
{
  "foo": [1,2,[3]]
}
'''

parser = cysimdjson.JSONParser()
json_element = parser.parse(json_bytes)

# Access using JSON Pointer
print(json_element.at_pointer("/foo/2/0"))

Note: parser object can be reused for maximum performance.

Pythonic drop-in API

parser = cysimdjson.JSONParser()
json_parsed = parser.loads(json_bytes)

# Access using JSON Pointer
print(json_parsed.json_parsed['foo'])

The json_parsed is a read-only dictionary-like object, that provides an access to JSON data.

Trade-offs

The speed of cysimdjson is based on these assumptions:

  1. The output of the parser is read-only, you cannot modify it
  2. The output of the parser is not Python dictionary, but lazily evaluated dictionary-like object
  3. If you convert the parser output into a Python dictionary, you will lose the speed

If your design is not aligned with these assumptions, cysimdjson is not a good choice.

Documentation

JSONParser.parse(json_bytes)

Parse JSON json_bytes, represented as bytes.

JSONParser.parse_in_place(bytes)

Parse JSON json_bytes, represented as bytes, assuming that there is a padding expected by SIMDJSON. This is the fastest parsing variant.

JSONParser.parse_string(string)

Parse JSON json_bytes, represented as str (string).

JSONParser.load(path)

Installation

pip3 install cysimdjson

Project cysimdjson is distributed via PyPI: https://pypi.org/project/cysimdjson/ .

If you want to install cysimdjson from source, you need to install Cython first: pip3 install cython.

Performance

----------------------------------------------------------------
# 'jsonexamples/test.json' 2397 bytes
----------------------------------------------------------------
* cysimdjson parse          510291.81 EPS (  1.00)  1223.17 MB/s
* libpy_simdjson loads      374615.54 EPS (  1.36)   897.95 MB/s
* pysimdjson parse          362195.46 EPS (  1.41)   868.18 MB/s
* orjson loads              110615.70 EPS (  4.61)   265.15 MB/s
* python json loads          72096.80 EPS (  7.08)   172.82 MB/s
----------------------------------------------------------------

SIMDJSON: 543335.93 EPS, 1241.52 MB/s
----------------------------------------------------------------
# 'jsonexamples/twitter.json' 631515 bytes
----------------------------------------------------------------
* cysimdjson parse            2556.10 EPS (  1.00)  1614.22 MB/s
* libpy_simdjson loads        2444.53 EPS (  1.05)  1543.76 MB/s
* pysimdjson parse            2415.46 EPS (  1.06)  1525.40 MB/s
* orjson loads                 387.11 EPS (  6.60)   244.47 MB/s
* python json loads            278.63 EPS (  9.17)   175.96 MB/s
----------------------------------------------------------------

SIMDJSON: 2536.16 EPS,  1527.28 MB/s
----------------------------------------------------------------
# 'jsonexamples/canada.json' 2251051 bytes
----------------------------------------------------------------
* cysimdjson parse             284.67 EPS (  1.00)   640.81 MB/s
* pysimdjson parse             284.62 EPS (  1.00)   640.70 MB/s
* libpy_simdjson loads         277.13 EPS (  1.03)   623.84 MB/s
* orjson loads                  81.80 EPS (  3.48)   184.13 MB/s
* python json loads             22.52 EPS ( 12.64)    50.68 MB/s
----------------------------------------------------------------

SIMDJSON: 307.95 EPS, 661.08 MB/s
----------------------------------------------------------------
# 'jsonexamples/gsoc-2018.json' 3327831 bytes
----------------------------------------------------------------
* cysimdjson parse             775.61 EPS (  1.00)  2581.09 MB/s
* pysimdjson parse             743.67 EPS (  1.04)  2474.81 MB/s
* libpy_simdjson loads         654.15 EPS (  1.19)  2176.88 MB/s
* orjson loads                 166.67 EPS (  4.65)   554.66 MB/s
* python json loads            113.72 EPS (  6.82)   378.43 MB/s
----------------------------------------------------------------

SIMDJSON: 703.59 EPS, 2232.92 MB/s
----------------------------------------------------------------
# 'jsonexamples/verysmall.json' 7 bytes
----------------------------------------------------------------
* cysimdjson parse         3972376.53 EPS (  1.00)    27.81 MB/s
* orjson loads             3637369.63 EPS (  1.09)    25.46 MB/s
* libpy_simdjson loads     1774211.19 EPS (  2.24)    12.42 MB/s
* pysimdjson parse          977530.90 EPS (  4.06)     6.84 MB/s
* python json loads         527932.65 EPS (  7.52)     3.70 MB/s
----------------------------------------------------------------

SIMDJSON: 3799392.10 EPS

CPU: AMD EPYC 7452

More performance testing:

Tests are reproducible

pip3 install orjson
pip3 install pysimdjson
pip3 install libpy_simdjson
python3 setup.py build_ext --inplace
PYTHONPATH=. python3 ./perftest/test_benchmark.py

Manual build

python3 setup.py build_ext --inplace

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cysimdjson-23.8.tar.gz (517.7 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cysimdjson-23.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

cysimdjson-23.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

cysimdjson-23.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB view details)

Uploaded CPython 3.9manylinux: glibc 2.17+ x86-64

cysimdjson-23.8-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.8manylinux: glibc 2.17+ x86-64

cysimdjson-23.8-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB view details)

Uploaded CPython 3.7mmanylinux: glibc 2.17+ x86-64

cysimdjson-23.8-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB view details)

Uploaded CPython 3.6mmanylinux: glibc 2.17+ x86-64

File details

Details for the file cysimdjson-23.8.tar.gz.

File metadata

  • Download URL: cysimdjson-23.8.tar.gz
  • Upload date:
  • Size: 517.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for cysimdjson-23.8.tar.gz
Algorithm Hash digest
SHA256 07a6aa41188d57af961889a97f6406162dcb53e2ce72afe24947ddb150cdbd07
MD5 525cc6e0ef266edf1d12045eeba82169
BLAKE2b-256 86631704c460f2958a283213f1d1f91fcf8016aa9b26a8addec31b3ff5f2b816

See more details on using hashes here.

File details

Details for the file cysimdjson-23.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cysimdjson-23.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e800422372283dea223d3458efb2e6cb09252a13c0f340b84ef4639f7f155925
MD5 221f8f853a67b3c2342a952889420cbb
BLAKE2b-256 679067bca9870c15b5d4f0fdfc6f65c3c02c220787c48c75a5e33abaa7fcc877

See more details on using hashes here.

File details

Details for the file cysimdjson-23.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cysimdjson-23.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c8823499c97058731b15ccb550f8052b007dc88d713ff0b58d79b323686a64f9
MD5 6aea69914d85d1ad6602d384e3a6f9ae
BLAKE2b-256 ce2953471311a4337d29af5d5d1ca874a38b3a97725a6ed825f3078cc1a1d263

See more details on using hashes here.

File details

Details for the file cysimdjson-23.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cysimdjson-23.8-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 86649275e53918f980e9ba67b5697d115fd682b42abacbfaeddd2876fe3af7a0
MD5 10fbb8258160db234b97e527d54e24d1
BLAKE2b-256 7af81fd18d3cc00ea6e24d9c350b131714c3ddda340df788396daba492303afb

See more details on using hashes here.

File details

Details for the file cysimdjson-23.8-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cysimdjson-23.8-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 635548b40063c67c5374461035363056376b4544773eea0294357845b1e48aec
MD5 ac6c21d7333e4df1e9d354fcb10e3a6f
BLAKE2b-256 fbde48e994d7067d6f0cb1639b511b99ea925726fa4544cd23e5a4226620a4d7

See more details on using hashes here.

File details

Details for the file cysimdjson-23.8-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cysimdjson-23.8-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a81c8a0937564ddef2ab915ef9c3fab8ba9ad7a886b4a0e0d0c3291324e6482f
MD5 d6fb1f8f19792abb633dbcc2399b25a3
BLAKE2b-256 37103340edf6294e9b691f2c1cb3fb35d094103765d152147464cd95ad0862f1

See more details on using hashes here.

File details

Details for the file cysimdjson-23.8-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cysimdjson-23.8-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a72da6498819c7e73385e3861f2b3c9c6563a29f855f2a6fa3824798fa6c63d8
MD5 a14fe681c2f67a99a5297746373e6e44
BLAKE2b-256 167035cb9698cfb378fe1cfa8081436fa299f3db3d9ad3dc40e1f3c6b4983ced

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page