High-speed JSON parser

cysimdjson

A fast JSON parsing library for Python, 7-12 times faster than the standard Python JSON parser.
It provides Python bindings for simdjson, built with Cython.

The standard Python JSON parser (json.load() etc.) is relatively slow. If you need to parse large JSON files or a large number of small JSON files, it can become a significant bottleneck.

Whilst there are other fast Python JSON parsers, such as pysimdjson, libpy_simdjson or orjson, they don't reach the raw speed provided by the brilliant SIMDJSON project. SIMDJSON is a C++ JSON parser based on SIMD instructions, reportedly the fastest JSON parser on the planet.

Python versions: 3.11, 3.10, 3.9, 3.8, 3.7

Usage

import cysimdjson

json_bytes = b'''
{
  "foo": [1,2,[3]]
}
'''

parser = cysimdjson.JSONParser()
json_element = parser.parse(json_bytes)

# Access using JSON Pointer
print(json_element.at_pointer("/foo/2/0"))

Note: the parser object can be reused across calls for maximum performance.
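To make the pointer syntax concrete, here is a minimal pure-Python sketch of how a JSON Pointer such as "/foo/2/0" walks a document. It runs against the standard json module's output and is not how cysimdjson implements at_pointer internally; the helper name resolve_pointer is hypothetical and exists only for illustration.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against plain Python dicts and lists."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" -> "/", then "~0" -> "~" (in that order).
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array step: token is an index
        else:
            current = current[token]  # object step: token is a key
    return current


document = json.loads(b'{"foo": [1, 2, [3]]}')
print(resolve_pointer(document, "/foo/2/0"))  # prints 3
```

Each "/"-separated token selects either an object key or an array index, so "/foo/2/0" means: key "foo", then element 2, then element 0.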

Pythonic drop-in API

parser = cysimdjson.JSONParser()
json_parsed = parser.loads(json_bytes)

# Access by key, as with a dictionary
print(json_parsed['foo'])

json_parsed is a read-only, dictionary-like object that provides access to the JSON data.

Trade-offs

The speed of cysimdjson is based on these assumptions:

  1. The output of the parser is read-only; you cannot modify it
  2. The output of the parser is not a Python dictionary, but a lazily evaluated dictionary-like object
  3. If you convert the parser output into a Python dictionary, you lose the speed advantage

If your design is not aligned with these assumptions, cysimdjson is not a good choice.

Documentation

JSONParser.parse(json_bytes)

Parse JSON json_bytes, represented as bytes.

JSONParser.parse_in_place(bytes)

Parse JSON represented as bytes, assuming that the buffer already includes the padding expected by SIMDJSON. This is the fastest parsing variant.

JSONParser.parse_string(string)

Parse JSON represented as str (string).

JSONParser.load(path)

Load and parse JSON from the file at path.

Installation

pip3 install cysimdjson

The cysimdjson project is distributed via PyPI: https://pypi.org/project/cysimdjson/

If you want to install cysimdjson from source, you need to install Cython first: pip3 install cython.

Performance

----------------------------------------------------------------
# 'jsonexamples/test.json' 2397 bytes
----------------------------------------------------------------
* cysimdjson parse          510291.81 EPS (  1.00)  1223.17 MB/s
* libpy_simdjson loads      374615.54 EPS (  1.36)   897.95 MB/s
* pysimdjson parse          362195.46 EPS (  1.41)   868.18 MB/s
* orjson loads              110615.70 EPS (  4.61)   265.15 MB/s
* python json loads          72096.80 EPS (  7.08)   172.82 MB/s
----------------------------------------------------------------

SIMDJSON: 543335.93 EPS, 1241.52 MB/s
----------------------------------------------------------------
# 'jsonexamples/twitter.json' 631515 bytes
----------------------------------------------------------------
* cysimdjson parse            2556.10 EPS (  1.00)  1614.22 MB/s
* libpy_simdjson loads        2444.53 EPS (  1.05)  1543.76 MB/s
* pysimdjson parse            2415.46 EPS (  1.06)  1525.40 MB/s
* orjson loads                 387.11 EPS (  6.60)   244.47 MB/s
* python json loads            278.63 EPS (  9.17)   175.96 MB/s
----------------------------------------------------------------

SIMDJSON: 2536.16 EPS,  1527.28 MB/s
----------------------------------------------------------------
# 'jsonexamples/canada.json' 2251051 bytes
----------------------------------------------------------------
* cysimdjson parse             284.67 EPS (  1.00)   640.81 MB/s
* pysimdjson parse             284.62 EPS (  1.00)   640.70 MB/s
* libpy_simdjson loads         277.13 EPS (  1.03)   623.84 MB/s
* orjson loads                  81.80 EPS (  3.48)   184.13 MB/s
* python json loads             22.52 EPS ( 12.64)    50.68 MB/s
----------------------------------------------------------------

SIMDJSON: 307.95 EPS, 661.08 MB/s
----------------------------------------------------------------
# 'jsonexamples/gsoc-2018.json' 3327831 bytes
----------------------------------------------------------------
* cysimdjson parse             775.61 EPS (  1.00)  2581.09 MB/s
* pysimdjson parse             743.67 EPS (  1.04)  2474.81 MB/s
* libpy_simdjson loads         654.15 EPS (  1.19)  2176.88 MB/s
* orjson loads                 166.67 EPS (  4.65)   554.66 MB/s
* python json loads            113.72 EPS (  6.82)   378.43 MB/s
----------------------------------------------------------------

SIMDJSON: 703.59 EPS, 2232.92 MB/s
----------------------------------------------------------------
# 'jsonexamples/verysmall.json' 7 bytes
----------------------------------------------------------------
* cysimdjson parse         3972376.53 EPS (  1.00)    27.81 MB/s
* orjson loads             3637369.63 EPS (  1.09)    25.46 MB/s
* libpy_simdjson loads     1774211.19 EPS (  2.24)    12.42 MB/s
* pysimdjson parse          977530.90 EPS (  4.06)     6.84 MB/s
* python json loads         527932.65 EPS (  7.52)     3.70 MB/s
----------------------------------------------------------------

SIMDJSON: 3799392.10 EPS

CPU: AMD EPYC 7452
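The columns in the tables above can be cross-checked with simple arithmetic. The sketch below assumes that the MB/s column equals EPS multiplied by the file size in bytes, divided by 10^6, and that the number in parentheses is the fastest parser's EPS divided by the row's EPS. Both assumptions are inferred from the published figures rather than taken from the benchmark script; the function names are hypothetical.

```python
def throughput_mb_per_s(eps, size_bytes):
    """Throughput in MB/s (1 MB = 10**6 bytes) for `eps` parses per second."""
    return eps * size_bytes / 1e6


def slowdown(fastest_eps, eps):
    """How many times slower a parser is than the fastest one in the table."""
    return fastest_eps / eps


# Figures copied from the 'jsonexamples/test.json' table (2397 bytes).
size_bytes = 2397
cysimdjson_eps = 510291.81
python_json_eps = 72096.80

print(round(throughput_mb_per_s(cysimdjson_eps, size_bytes), 2))   # prints 1223.17
print(round(throughput_mb_per_s(python_json_eps, size_bytes), 2))  # prints 172.82
print(round(slowdown(cysimdjson_eps, python_json_eps), 2))         # prints 7.08
```

The results match the corresponding entries in the first table, which supports reading the parenthesised column as a slowdown factor relative to the fastest parser.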

More performance testing:

Tests are reproducible

pip3 install orjson
pip3 install pysimdjson
pip3 install libpy_simdjson
python3 setup.py build_ext --inplace
PYTHONPATH=. python3 ./perftest/test_benchmark.py
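For a quick sanity check without the full benchmark suite, a rough timing loop along the following lines can be used. This is only an illustrative sketch and not the project's perftest/test_benchmark.py: the helper measure_eps, the payload and the 0.2 second window are all assumptions, and the loop's own timing overhead is not subtracted. cysimdjson is measured only when it is installed.

```python
import json
import time

try:
    import cysimdjson  # optional; only measured when installed
except ImportError:
    cysimdjson = None


def measure_eps(parse, payload, seconds=0.2):
    """Call parse(payload) repeatedly for about `seconds`; return parses per second."""
    count = 0
    start = time.perf_counter()
    deadline = start + seconds
    while time.perf_counter() < deadline:
        parse(payload)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


payload = b'{"foo": [1, 2, [3]]}'

results = {"python json loads": measure_eps(json.loads, payload)}
if cysimdjson is not None:
    parser = cysimdjson.JSONParser()  # created once and reused for every call
    results["cysimdjson parse"] = measure_eps(parser.parse, payload)

# Print the fastest parser first, with throughput computed as EPS * bytes / 10**6.
for name, eps in sorted(results.items(), key=lambda item: -item[1]):
    mb_per_s = eps * len(payload) / 1e6
    print(f"* {name:<24} {eps:>12.2f} EPS {mb_per_s:>9.2f} MB/s")
```

Absolute numbers from such a loop depend heavily on the machine and the payload, so they are not directly comparable to the tables above.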

Manual build

python3 setup.py build_ext --inplace


Built Distributions

cysimdjson-24.12-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB), CPython 3.12, manylinux: glibc 2.17+ x86-64
cysimdjson-24.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB), CPython 3.11, manylinux: glibc 2.17+ x86-64
cysimdjson-24.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB), CPython 3.10, manylinux: glibc 2.17+ x86-64
cysimdjson-24.12-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB), CPython 3.9, manylinux: glibc 2.17+ x86-64
cysimdjson-24.12-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB), CPython 3.8, manylinux: glibc 2.17+ x86-64
cysimdjson-24.12-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB), CPython 3.7m, manylinux: glibc 2.17+ x86-64
cysimdjson-24.12-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB), CPython 3.6m, manylinux: glibc 2.17+ x86-64
