Python bindings for simdjson, using libpy

libpy Simdjson

Status: Working Alpha

Python bindings for simdjson using libpy.

Requirements

  • OS: macOS>10.15, Linux.
  • Compiler: gcc>=9, clang>=10 (C++17 code).
  • Python: libpy>=0.2.3, numpy.

Usage

from pathlib import Path
import libpy_simdjson as json
doc = json.load(Path("twitter.json"))
# or json.load(b"twitter.json")
# or json.load("twitter.json")
# we also support `loads` for strings.

doc is an Object. Objects act like Python dicts, with a few additional methods.

isinstance(doc, json.Object)
True

We can grab keys, get the length, grab items, and access specific keys:

len(doc)
2
doc.keys()
[b'statuses', b'search_metadata']
doc[b'search_metadata'].items()
[(b'completed_in', 0.087),
 (b'max_id', 505874924095815700),
 (b'max_id_str', b'505874924095815681'),
 (b'next_results',
  b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1'),
 (b'query', b'%E4%B8%80'),
 (b'refresh_url',
  b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1'),
 (b'count', 100),
 (b'since_id', 0),
 (b'since_id_str', b'0')]

If you ever want an actual Python dictionary, use as_dict:

doc[b'search_metadata'].as_dict()
{b'completed_in': 0.087,
 b'max_id': 505874924095815700,
 b'max_id_str': b'505874924095815681',
 b'next_results': b'?max_id=505874847260352512&q=%E4%B8%80&count=100&include_entities=1',
 b'query': b'%E4%B8%80',
 b'refresh_url': b'?since_id=505874924095815681&q=%E4%B8%80&include_entities=1',
 b'count': 100,
 b'since_id': 0,
 b'since_id_str': b'0'}
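
As the output above shows, keys and string values come back as bytes. If downstream code expects str, one option is to decode the result of as_dict after the fact. The helper below is a minimal sketch and is not part of libpy_simdjson: the name decode_bytes is hypothetical, and it operates on a plain dict shaped like the output above so that it runs without the library installed.

```python
def decode_bytes(value, encoding="utf-8"):
    """Recursively convert bytes keys and values to str (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {
            decode_bytes(key, encoding): decode_bytes(item, encoding)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [decode_bytes(item, encoding) for item in value]
    # Numbers, booleans and None pass through unchanged.
    return value


# A plain dict standing in for doc[b'search_metadata'].as_dict().
metadata = {b"query": b"%E4%B8%80", b"count": 100, b"completed_in": 0.087}
decoded = decode_bytes(metadata)
print(decoded)
```

Decoding walks the whole structure, so it is best applied to the small sub-document you actually need rather than to the full document.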

However, we also support JSON Pointer syntax via at. This will be much faster if you know what you're looking for:

doc.at(b"statuses/50/created_at")
b'Sun Aug 31 00:29:04 +0000 2014'
doc.at(b"statuses/50/text").decode()
'RT @Ang_Angel73: 逢坂「くっ…僕の秘められし右目が…!」\n一同「……………。」'
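
To make the path semantics concrete, here is a pure-Python sketch of how a slash-separated pointer such as b"statuses/50/created_at" can be resolved against nested dicts and lists. It only illustrates what the path means; it is not how at is implemented in libpy_simdjson. The function name resolve_pointer is hypothetical, and the sketch ignores the escape sequences defined by the JSON Pointer specification.

```python
def resolve_pointer(root, pointer):
    """Follow a slash-separated bytes path through nested dicts and lists."""
    if not pointer:
        return root  # an empty pointer refers to the whole document
    node = root
    for token in pointer.split(b"/"):
        if isinstance(node, list):
            node = node[int(token)]  # array step: the token is an index
        else:
            node = node[token]       # object step: the token is a key
    return node


# A tiny stand-in document with the same shape as twitter.json.
sample = {b"statuses": [{b"text": b"first"}, {b"text": b"second"}]}
print(resolve_pointer(sample, b"statuses/1/text"))
```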

Let's look at statuses:

statuses = doc[b'statuses']

statuses is an Array. Arrays act like Python lists, with a few additional methods.

Note: statuses and doc share a single parser instance. We cannot parse a new document while these objects are alive (though we can create new parsers via libpy_simdjson.Parser.load).

isinstance(statuses, json.Array)
True

Arrays support length, indexing, iteration:

len(statuses)
100
statuses[0][b'text'].decode()
'@aym0566x \n\n名前:前田あゆみ\n第一印象:なんか怖っ!\n今の印象:とりあえずキモい。噛み合わない\n好きなところ:ぶすでキモいとこ😋✨✨\n思い出:んーーー、ありすぎ😊❤️\nLINE交換できる?:あぁ……ごめん✋\nトプ画をみて:照れますがな😘✨\n一言:お前は一生もんのダチ💖'
for status in statuses:
    # this is a bad example but you get the picture
    if status[b'id'] % 2 == 0:
        print(status[b"text"].decode())
        break
else:
    print("no even ids?")
@aym0566x

名前:前田あゆみ
第一印象:なんか怖っ!
今の印象:とりあえずキモい。噛み合わない
好きなところ:ぶすでキモいとこ😋✨✨
思い出:んーーー、ありすぎ😊❤️
LINE交換できる?:あぁ……ごめん✋
トプ画をみて:照れますがな😘✨
一言:お前は一生もんのダチ💖

If you need to, you can convert an Array to a list using as_list:

statuses.as_list()[1][b'metadata']
{b'result_type': b'recent', b'iso_language_code': b'ja'}

However, just like for Objects, we support JSON Pointers via at, which is much faster:

statuses.at(b"33/created_at")
b'Sun Aug 31 00:29:06 +0000 2014'

Benchmarks


---------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/canada.json': 6 tests ----------------------------------------------
Name (time in ms)                                       Min                Max               Mean            StdDev             Median                IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path0-libpy_simdjson-loads]      3.4478 (1.0)      10.1485 (1.0)       4.0615 (1.0)      0.6386 (1.0)       3.9595 (1.0)       0.3985 (1.0)           8;6  246.2156 (1.0)         149           1
test_benchmark_load[path0-orjson-loads]             14.7421 (4.28)     31.9980 (3.15)     21.1131 (5.20)     4.7609 (7.45)     21.8631 (5.52)      8.2455 (20.69)        23;0   47.3639 (0.19)         61           1
test_benchmark_load[path0-pysimdjson-loads]         15.5617 (4.51)     30.0839 (2.96)     22.2207 (5.47)     4.3227 (6.77)     23.6153 (5.96)      8.4906 (21.31)        12;0   45.0031 (0.18)         30           1
test_benchmark_load[path0-ujson-loads]              20.0784 (5.82)     37.2904 (3.67)     27.4904 (6.77)     4.6357 (7.26)     27.7301 (7.00)      8.1542 (20.46)         9;0   36.3763 (0.15)         26           1
test_benchmark_load[path0-rapidjson-loads]          44.7989 (12.99)    69.9204 (6.89)     53.8819 (13.27)    6.2806 (9.83)     54.5078 (13.77)    10.5220 (26.40)         6;0   18.5591 (0.08)         20           1
test_benchmark_load[path0-python_json-loads]        45.6048 (13.23)    58.9150 (5.81)     52.6407 (12.96)    4.2356 (6.63)     53.2421 (13.45)     7.6745 (19.26)         9;0   18.9967 (0.08)         21           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


------------------------------------------------------ benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/citm_catalog.json': 6 tests -------------------------------------------------------
Name (time in us)                                           Min                    Max                   Mean                StdDev                 Median                   IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path3-libpy_simdjson-loads]        973.0290 (1.0)       1,696.1500 (1.0)       1,106.7939 (1.0)         70.3023 (1.0)       1,096.5330 (1.0)         55.0015 (1.0)        107;65  903.5106 (1.0)         496           1
test_benchmark_load[path3-orjson-loads]              6,271.9950 (6.45)     18,752.0820 (11.06)     9,199.1053 (8.31)     3,332.8687 (47.41)     7,502.8330 (6.84)     3,940.9760 (71.65)        32;1  108.7062 (0.12)        128           1
test_benchmark_load[path3-pysimdjson-loads]          7,448.6360 (7.66)     21,308.7680 (12.56)    10,668.5839 (9.64)     3,595.1711 (51.14)     8,919.9800 (8.13)     1,307.4410 (23.77)       24;24   93.7332 (0.10)        102           1
test_benchmark_load[path3-ujson-loads]               7,774.9390 (7.99)     17,898.5500 (10.55)    10,364.6843 (9.36)     3,222.6374 (45.84)     8,751.2690 (7.98)     1,562.5480 (28.41)       26;26   96.4815 (0.11)        115           1
test_benchmark_load[path3-python_json-loads]        11,643.7470 (11.97)    23,959.7150 (14.13)    15,714.9961 (14.20)    3,806.9531 (54.15)    13,973.4170 (12.74)    6,292.6375 (114.41)       12;0   63.6335 (0.07)         41           1
test_benchmark_load[path3-rapidjson-loads]          13,983.3210 (14.37)    27,216.4270 (16.05)    17,630.6505 (15.93)    4,016.1918 (57.13)    15,564.2690 (14.19)    2,136.0153 (38.84)       15;15   56.7194 (0.06)         65           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/github_events.json': 6 tests ------------------------------------------------
Name (time in us)                                        Min                   Max                Mean              StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path2-libpy_simdjson-loads]      31.8010 (1.0)      5,766.8830 (6.11)      37.5110 (1.0)       59.9135 (1.24)      37.0010 (1.0)       0.2000 (1.0)        9;3552       26.6588 (1.0)        9200           1
test_benchmark_load[path2-orjson-loads]             229.6080 (7.22)     4,736.2550 (5.02)     266.4467 (7.10)      94.5404 (1.96)     266.1090 (7.19)     40.8512 (204.26)      56;75        3.7531 (0.14)       3243           1
test_benchmark_load[path2-pysimdjson-loads]         291.1090 (9.15)     1,112.7370 (1.18)     340.7878 (9.09)      48.2980 (1.0)      336.6110 (9.10)     33.8510 (169.25)     214;48        2.9344 (0.11)       2187           1
test_benchmark_load[path2-ujson-loads]              300.1100 (9.44)     4,311.1400 (4.57)     342.2005 (9.12)      93.3709 (1.93)     346.5110 (9.36)     50.4020 (252.01)      26;36        2.9223 (0.11)       2258           1
test_benchmark_load[path2-rapidjson-loads]          379.0120 (11.92)    4,312.8390 (4.57)     518.6963 (13.83)    117.7450 (2.44)     507.6160 (13.72)    51.0268 (255.13)      37;40        1.9279 (0.07)       1717           1
test_benchmark_load[path2-python_json-loads]        382.2120 (12.02)      943.6300 (1.0)      439.8152 (11.72)     50.1689 (1.04)     443.7140 (11.99)    82.9020 (414.51)     665;18        2.2737 (0.09)       1894           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/mesh.json': 6 tests --------------------------------------------------------
Name (time in us)                                          Min                    Max                  Mean                StdDev                Median                 IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path4-libpy_simdjson-loads]       993.7280 (1.0)       2,153.3610 (1.0)      1,113.0914 (1.0)        125.6128 (1.0)      1,122.9820 (1.0)      147.0050 (1.0)         64;16  898.3988 (1.0)         898           1
test_benchmark_load[path4-pysimdjson-loads]         3,019.2900 (3.04)     13,713.0090 (6.37)     3,958.4115 (3.56)     1,763.1884 (14.04)    3,619.4070 (3.22)     300.4090 (2.04)        10;14  252.6266 (0.28)        226           1
test_benchmark_load[path4-orjson-loads]             3,075.6900 (3.10)     12,985.8830 (6.03)     4,371.5742 (3.93)     1,528.5850 (12.17)    4,067.1200 (3.62)     444.3125 (3.02)        10;14  228.7506 (0.25)        240           1
test_benchmark_load[path4-ujson-loads]              3,947.6150 (3.97)     13,696.0010 (6.36)     4,954.1335 (4.45)     1,521.1764 (12.11)    4,690.3375 (4.18)     390.0120 (2.65)          8;9  201.8516 (0.22)        218           1
test_benchmark_load[path4-python_json-loads]        7,593.0170 (7.64)     19,002.5420 (8.82)     9,068.5910 (8.15)     1,944.1363 (15.48)    8,763.6505 (7.80)     649.6190 (4.42)          5;5  110.2707 (0.12)        122           1
test_benchmark_load[path4-rapidjson-loads]          8,291.5380 (8.34)     19,017.8470 (8.83)     9,628.5255 (8.65)     1,797.5745 (14.31)    9,276.3670 (8.26)     872.3250 (5.93)          4;4  103.8581 (0.12)        102           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-------------------------------------------------------- benchmark 'Load /home/runner/work/libpy_simdjson/libpy_simdjson/libpy_simdjson/tests/jsonexamples/twitter.json': 6 tests -------------------------------------------------------
Name (time in us)                                          Min                    Max                  Mean                StdDev                Median                 IQR            Outliers         OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_load[path1-libpy_simdjson-loads]       374.2130 (1.0)      10,169.1400 (1.0)        445.6502 (1.0)        237.7491 (1.0)        443.3150 (1.0)       66.3020 (1.0)         19;29  2,243.9125 (1.0)        1790           1
test_benchmark_load[path1-orjson-loads]             2,788.1970 (7.45)     11,687.4110 (1.15)     3,351.3276 (7.52)     1,117.1151 (4.70)     3,198.9625 (7.22)     351.0120 (5.29)        10;12    298.3892 (0.13)        294           1
test_benchmark_load[path1-ujson-loads]              3,312.1150 (8.85)     12,571.4370 (1.24)     3,973.3347 (8.92)     1,221.4127 (5.14)     3,805.8815 (8.59)     447.3170 (6.75)          7;9    251.6778 (0.11)        258           1
test_benchmark_load[path1-pysimdjson-loads]         3,586.0280 (9.58)     18,704.8590 (1.84)     4,553.9661 (10.22)    1,772.5065 (7.46)     4,182.3480 (9.43)     331.1612 (4.99)         7;17    219.5888 (0.10)        169           1
test_benchmark_load[path1-python_json-loads]        4,573.6530 (12.22)    13,900.1650 (1.37)     5,396.5765 (12.11)    1,236.4753 (5.20)     5,222.7750 (11.78)    554.0430 (8.36)          6;7    185.3027 (0.08)        189           1
test_benchmark_load[path1-rapidjson-loads]          5,447.2870 (14.56)    16,226.5570 (1.60)     6,506.3766 (14.60)    1,495.7694 (6.29)     6,322.1140 (14.26)    544.9407 (8.22)          6;7    153.6954 (0.07)        165           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
================== 71 passed, 1 xfailed, 1 warning in 29.65s ===================
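
To read the tables: the number in parentheses after each value is its ratio to the best value in that column, and OPS is the reciprocal of the mean. The short sketch below re-derives both from three of the canada.json mean times, copied from the first table. Because the printed means are rounded, the results can differ from the table in the last digits.

```python
# Mean load times for canada.json in milliseconds, copied from the table above.
mean_ms = {
    "libpy_simdjson": 4.0615,
    "orjson": 21.1131,
    "python_json": 52.6407,
}

baseline = mean_ms["libpy_simdjson"]  # fastest mean in this table
for name, mean in mean_ms.items():
    ops = 1.0 / (mean / 1000.0)  # operations per second = 1 / mean in seconds
    ratio = mean / baseline      # the parenthesized factor in the Mean column
    print(f"{name}: {ops:.2f} ops/s, {ratio:.2f}x the baseline mean")
```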

Source Distribution

libpy_simdjson-0.1.0.tar.gz (488.8 kB)
