High-speed JSON parser

cysimdjson

A fast JSON parsing library for Python, 7-12 times faster than the standard Python JSON parser.
It provides Python bindings for simdjson, built with Cython.

The standard Python JSON parser (json.load() etc.) is relatively slow. If you need to parse large JSON files or a large number of small JSON files, it can become a significant bottleneck.

Whilst there are other fast Python JSON parsers, such as pysimdjson, libpy_simdjson or orjson, they don't reach the raw speed provided by the brilliant SIMDJSON project. SIMDJSON is a C++ JSON parser based on SIMD instructions, reportedly the fastest JSON parser on the planet.

Python 3.11, 3.10, 3.9, 3.8, 3.7

Usage

import cysimdjson

json_bytes = b'''
{
  "foo": [1,2,[3]]
}
'''

parser = cysimdjson.JSONParser()
json_element = parser.parse(json_bytes)

# Access using JSON Pointer
print(json_element.at_pointer("/foo/2/0"))

Note: the parser object can be reused for maximum performance.
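To make the pointer string in the example above concrete, here is a minimal pure-Python sketch of how a JSON Pointer such as /foo/2/0 is resolved step by step. It uses only the standard json module; the function name resolve_pointer is hypothetical and this is not how cysimdjson implements at_pointer internally.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer (RFC 6901) against plain Python data."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in this order: "~1" becomes "/", then "~0" becomes "~"
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array step: token is an index
        else:
            current = current[token]       # object step: token is a key
    return current


document = json.loads(b'{"foo": [1, 2, [3]]}')
result = resolve_pointer(document, "/foo/2/0")
print(result)  # 3
```

The pointer walks "foo", then index 2, then index 0, which is the same path that at_pointer("/foo/2/0") addresses in the parsed element.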

Pythonic drop-in API

parser = cysimdjson.JSONParser()
json_parsed = parser.loads(json_bytes)

# Access using dictionary-style key lookup
print(json_parsed['foo'])

json_parsed is a read-only, dictionary-like object that provides access to the JSON data.

Trade-offs

The speed of cysimdjson is based on these assumptions:

  1. The output of the parser is read-only; you cannot modify it.
  2. The output of the parser is not a Python dictionary but a lazily evaluated, dictionary-like object.
  3. If you convert the parser output into a Python dictionary, you lose the speed advantage.

If your design is not aligned with these assumptions, cysimdjson is not a good choice.
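The three assumptions can be illustrated with a purely conceptual sketch. The class LazyReadOnlyView and the helper to_dict below are hypothetical and written in plain Python; the real cysimdjson object is backed by SIMDJSON structures, so this only mimics the behaviour described above, not the implementation.

```python
from collections.abc import Mapping


class LazyReadOnlyView(Mapping):
    """Hypothetical read-only view that wraps nested objects only on access."""

    def __init__(self, raw):
        self._raw = raw

    def __getitem__(self, key):
        value = self._raw[key]
        # Nested objects are wrapped lazily, at the moment they are requested.
        return LazyReadOnlyView(value) if isinstance(value, dict) else value

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


def to_dict(view):
    """Eagerly materialize the whole view; this visits every value."""
    return {
        key: to_dict(value) if isinstance(value, LazyReadOnlyView) else value
        for key, value in view.items()
    }


view = LazyReadOnlyView({"user": {"name": "alice"}, "count": 2})

# Assumption 2: only the path that is actually read gets touched.
name = view["user"]["name"]

# Assumption 1: the view offers no item assignment.
try:
    view["count"] = 3
    modified = True
except TypeError:
    modified = False

# Assumption 3: a full conversion walks the entire document up front.
materialized = to_dict(view)
print(name, modified, materialized)
```

If most of your code needs the equivalent of to_dict on every document, the lazy access pattern no longer pays off, which is the trade-off the list describes.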

Documentation

JSONParser.parse(json_bytes)

Parse the JSON document json_bytes, given as bytes.

JSONParser.parse_in_place(bytes)

Parse the JSON document given as bytes, assuming that the buffer already includes the padding expected by SIMDJSON. This is the fastest parsing variant.

JSONParser.parse_string(string)

Parse the JSON document string, given as str.

JSONParser.load(path)

Load and parse the JSON file at path.
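Since each method above expects a different input type, a small dispatcher can keep call sites tidy. The sketch below is an illustration only: parse_any is a hypothetical helper, it takes the parser as an argument, and it assumes load accepts the path as a plain string.

```python
import os


def parse_any(parser, source):
    """Route source to the JSONParser method that matches its type."""
    if isinstance(source, bytes):
        return parser.parse(source)          # raw JSON bytes
    if isinstance(source, str):
        return parser.parse_string(source)   # JSON text as str
    if isinstance(source, os.PathLike):
        # Assumption: load() takes the file path as a str.
        return parser.load(os.fspath(source))
    raise TypeError(f"unsupported JSON source type: {type(source).__name__}")
```

With the library installed, a call such as parse_any(cysimdjson.JSONParser(), b'{"foo": 1}') would go through parse, while a pathlib.Path argument would go through load. A str is always treated as JSON text here, never as a file name.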

Installation

pip3 install cysimdjson

The cysimdjson project is distributed via PyPI: https://pypi.org/project/cysimdjson/

If you want to install cysimdjson from source, you need to install Cython first: pip3 install cython.
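After installing, a quick way to confirm that the package is visible to the interpreter you intend to use is to look it up without importing it. This is a generic standard-library check, not something specific to cysimdjson; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


print("cysimdjson available:", is_installed("cysimdjson"))
```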

Performance

----------------------------------------------------------------
# 'jsonexamples/test.json' 2397 bytes
----------------------------------------------------------------
* cysimdjson parse          510291.81 EPS (  1.00)  1223.17 MB/s
* libpy_simdjson loads      374615.54 EPS (  1.36)   897.95 MB/s
* pysimdjson parse          362195.46 EPS (  1.41)   868.18 MB/s
* orjson loads              110615.70 EPS (  4.61)   265.15 MB/s
* python json loads          72096.80 EPS (  7.08)   172.82 MB/s
----------------------------------------------------------------

SIMDJSON: 543335.93 EPS, 1241.52 MB/s

----------------------------------------------------------------
# 'jsonexamples/twitter.json' 631515 bytes
----------------------------------------------------------------
* cysimdjson parse            2556.10 EPS (  1.00)  1614.22 MB/s
* libpy_simdjson loads        2444.53 EPS (  1.05)  1543.76 MB/s
* pysimdjson parse            2415.46 EPS (  1.06)  1525.40 MB/s
* orjson loads                 387.11 EPS (  6.60)   244.47 MB/s
* python json loads            278.63 EPS (  9.17)   175.96 MB/s
----------------------------------------------------------------

SIMDJSON: 2536.16 EPS, 1527.28 MB/s

----------------------------------------------------------------
# 'jsonexamples/canada.json' 2251051 bytes
----------------------------------------------------------------
* cysimdjson parse             284.67 EPS (  1.00)   640.81 MB/s
* pysimdjson parse             284.62 EPS (  1.00)   640.70 MB/s
* libpy_simdjson loads         277.13 EPS (  1.03)   623.84 MB/s
* orjson loads                  81.80 EPS (  3.48)   184.13 MB/s
* python json loads             22.52 EPS ( 12.64)    50.68 MB/s
----------------------------------------------------------------

SIMDJSON: 307.95 EPS, 661.08 MB/s

----------------------------------------------------------------
# 'jsonexamples/gsoc-2018.json' 3327831 bytes
----------------------------------------------------------------
* cysimdjson parse             775.61 EPS (  1.00)  2581.09 MB/s
* pysimdjson parse             743.67 EPS (  1.04)  2474.81 MB/s
* libpy_simdjson loads         654.15 EPS (  1.19)  2176.88 MB/s
* orjson loads                 166.67 EPS (  4.65)   554.66 MB/s
* python json loads            113.72 EPS (  6.82)   378.43 MB/s
----------------------------------------------------------------

SIMDJSON: 703.59 EPS, 2232.92 MB/s

----------------------------------------------------------------
# 'jsonexamples/verysmall.json' 7 bytes
----------------------------------------------------------------
* cysimdjson parse         3972376.53 EPS (  1.00)    27.81 MB/s
* orjson loads             3637369.63 EPS (  1.09)    25.46 MB/s
* libpy_simdjson loads     1774211.19 EPS (  2.24)    12.42 MB/s
* pysimdjson parse          977530.90 EPS (  4.06)     6.84 MB/s
* python json loads         527932.65 EPS (  7.52)     3.70 MB/s
----------------------------------------------------------------

SIMDJSON: 3799392.10 EPS

CPU: AMD EPYC 7452
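The derived columns in the tables above can be recomputed from the raw figures. The sketch below assumes that EPS counts parsed documents per second, which is consistent with the MB/s values, and that the number in parentheses is the fastest EPS divided by the row's EPS. The function names are hypothetical; the inputs are taken from the test.json table.

```python
def throughput_mb_s(eps, size_bytes):
    """Throughput in MB/s (1 MB = 1,000,000 bytes) for a given parse rate."""
    return eps * size_bytes / 1_000_000


def slowdown(fastest_eps, eps):
    """How many times slower a parser is than the fastest one in the table."""
    return fastest_eps / eps


size_bytes = 2397            # 'jsonexamples/test.json'
cysimdjson_eps = 510291.81   # cysimdjson parse
python_json_eps = 72096.80   # python json loads

print(round(throughput_mb_s(cysimdjson_eps, size_bytes), 2))   # 1223.17
print(round(throughput_mb_s(python_json_eps, size_bytes), 2))  # 172.82
print(round(slowdown(cysimdjson_eps, python_json_eps), 2))     # 7.08
```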

More performance testing:

Tests are reproducible

pip3 install orjson
pip3 install pysimdjson
pip3 install libpy_simdjson
python3 setup.py build_ext --inplace
PYTHONPATH=. python3 ./perftest/test_benchmark.py
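For a rough idea of what such a measurement involves, here is a minimal timing sketch. It is not the project's perftest script: measure_eps is a hypothetical helper, the payload is the small document from the Usage section, and cysimdjson is only included when it can be imported.

```python
import json
import time


def measure_eps(parse, payload, iterations=10_000):
    """Call parse(payload) repeatedly and return calls per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        parse(payload)
    elapsed = time.perf_counter() - start
    return iterations / elapsed


payload = b'{"foo": [1, 2, [3]]}'
candidates = {"python json loads": json.loads}

try:
    import cysimdjson
    # One parser instance is reused for every call.
    candidates["cysimdjson parse"] = cysimdjson.JSONParser().parse
except ImportError:
    pass  # fall back to measuring the standard library only

results = {name: measure_eps(fn, payload) for name, fn in candidates.items()}
for name, eps in sorted(results.items(), key=lambda item: -item[1]):
    print(f"* {name:<24} {eps:14.2f} EPS")
```

Numbers from a loop like this vary between runs and machines, so they are only comparable when all candidates are measured in the same session.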

Manual build

python3 setup.py build_ext --inplace

Download files

Source Distribution

cysimdjson-23.7.tar.gz (517.4 kB), Source

Built Distributions

cysimdjson-23.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB), CPython 3.11, manylinux: glibc 2.17+ x86-64

cysimdjson-23.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB), CPython 3.10, manylinux: glibc 2.17+ x86-64

cysimdjson-23.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB), CPython 3.9, manylinux: glibc 2.17+ x86-64

cysimdjson-23.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB), CPython 3.8, manylinux: glibc 2.17+ x86-64

cysimdjson-23.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB), CPython 3.7m, manylinux: glibc 2.17+ x86-64

cysimdjson-23.7-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB), CPython 3.6m, manylinux: glibc 2.17+ x86-64

File details

Details for the file cysimdjson-23.7.tar.gz.

File metadata

  • Download URL: cysimdjson-23.7.tar.gz
  • Size: 517.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for cysimdjson-23.7.tar.gz
Algorithm Hash digest
SHA256 34eea7865f1a56310236e2597b33fca4478439cf82a56c6116abe24cf8760145
MD5 c544858cc0c835d6348989d645e9b072
BLAKE2b-256 72c5a997f119ab359df8cd695438ee39a98390e34cdd228fea291062d0c45c79

File details

Details for the file cysimdjson-23.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for cysimdjson-23.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4aadbbb5c177cfe65f5f53bbbd5eda676069677b48dc61e1806c918bcef533f1
MD5 6ba91274bbbaeba4431ce16627926e07
BLAKE2b-256 96578b07445fd0a50ced8b6a4172abe88a8654692b6d3ac4ff7214208e4bfece

File details

Details for the file cysimdjson-23.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for cysimdjson-23.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 56bddbd5ee307ab68d715ba04ab28a259fd881f307c1abc69aa58b2e9e03dd52
MD5 db5aa2fb98c937007abfaf4c7b6868c5
BLAKE2b-256 57bc34ee47b29532c68c6f82b87e9a887c55435e80704ddd77dfb5f869b70dee

File details

Details for the file cysimdjson-23.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for cysimdjson-23.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 73eb1f5344fa34f1879635df823e81120d254be9a53088b0bd99e4089f84ed14
MD5 07a9743f678cc926261d754096ba4b9e
BLAKE2b-256 5ef991d2206729e34ef97f8b6a8fdd098ec46ad8ac1726b48d449e06086cc890

File details

Details for the file cysimdjson-23.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for cysimdjson-23.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 dd6fa5c0dbb7a308a21df41f905db4c3dab2d05d37a0523116c7e2842e52ce0e
MD5 893f6985df370d4db9782c08976a69d7
BLAKE2b-256 684d4756e424a29ec01e9a96a5656fe552dd248302276a08f5d19a412c3a2d16

File details

Details for the file cysimdjson-23.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for cysimdjson-23.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9ec18f6ff59f10f78a56a90f817dd104e9566ddc607df735b732d7d63cf355e5
MD5 d63eaed80cb8c1831e39bc9355625351
BLAKE2b-256 4e820b7f7a92c81bbed06ef65df35c626b9138d3419ebdfde17d1f1b20c7a469

File details

Details for the file cysimdjson-23.7-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for cysimdjson-23.7-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 7b485a61bbf7e719032868a115ee4742fa21a16685a74a761fba157d1072b22e
MD5 c6d88fa3d4fca513e8d98c1f74e893ed
BLAKE2b-256 c3b80c095c80193be90e409f74bfcf20d97884ad26e6cb3f5d48bf72d332eceb
