json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the NAYA project) ported to Rust almost verbatim and made available as a Python module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
Note that in editable/develop installs, it will sometimes (?) compile the Rust library in debug mode, which makes it run slower than the pure-Python tokenizer. When in doubt, run installation commands with --verbose to see the Rust compilation commands and verify that they used --release.
Usage
To use this package's RustTokenizer, simply pass it as the tokenizer argument to json-stream's load or visit:
from io import StringIO
from json_stream import load
from json_stream_rs_tokenizer import RustTokenizer
json_buf = StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }')
# uses the Rust tokenizer to load JSON:
d = load(json_buf, tokenizer=RustTokenizer)
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
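The same tokenizer argument works for visit. The following is a minimal sketch, assuming that json-stream's visit calls the visitor with the item and its path (as described in json-stream's own documentation). PathCollector and collect_paths are hypothetical names introduced for illustration only and are not part of either package.

```python
class PathCollector:
    """Hypothetical visitor that records every (path, item) pair it receives."""

    def __init__(self):
        self.seen = []

    def __call__(self, item, path):
        # Assumption: visit passes the item first and its path second.
        # The path is stored as a tuple so the records are easy to compare.
        self.seen.append((tuple(path), item))


def collect_paths(fp):
    """Visit the JSON document in fp using the Rust tokenizer."""
    # Imported lazily so PathCollector can be used without either package.
    from json_stream import visit
    from json_stream_rs_tokenizer import RustTokenizer

    collector = PathCollector()
    visit(fp, collector, tokenizer=RustTokenizer)
    return collector.seen
```

With both packages installed, calling collect_paths on the json_buf from the example above should return one record per value in the document, each paired with the path at which it was found.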
As a perhaps slightly more convenient alternative, the package also provides wrappers around json_stream's load and visit functions that do it for you:
from io import StringIO
from json_stream_rs_tokenizer import load
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
# ...
Benchmarks
The package comes with a script for rudimentary benchmarks on randomly generated JSON data. To run it, you'll need to install the optional benchmark dependencies and a version of json-stream with this patch applied:
pip install json_stream_rs_tokenizer[benchmark]
pip install --ignore-installed \
  git+https://github.com/smheidrich/json-stream.git@util-to-convert-to-py-std-types
You can then run the benchmark as follows:
python -m json_stream_rs_tokenizer.benchmark
Run it with --help to see more information.
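For a quick check on your own data (for example, to spot an accidental debug build), a rough timing comparison can also be sketched by hand. This is not the package's benchmark script; best_time, speedup and drain are hypothetical helpers introduced here, and drain assumes the document is a top-level JSON array.

```python
import time
from io import StringIO


def best_time(parse, text, repeats=5):
    """Return the fastest of `repeats` wall-clock timings of parse(StringIO(text))."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        parse(StringIO(text))
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_parse, candidate_parse, text, repeats=5):
    """Return how many times faster candidate_parse is than baseline_parse."""
    baseline = best_time(baseline_parse, text, repeats)
    candidate = best_time(candidate_parse, text, repeats)
    return baseline / candidate


def drain(fp, tokenizer=None):
    """Stream a top-level JSON array to its end, optionally with a given tokenizer."""
    # Imported lazily so the timing helpers work without json-stream installed.
    from json_stream import load

    kwargs = {} if tokenizer is None else {"tokenizer": tokenizer}
    for _ in load(fp, **kwargs):
        pass
```

With both packages installed, speedup(drain, lambda fp: drain(fp, tokenizer=RustTokenizer), text) compares the two tokenizers on a JSON string text. A value well below 1 would suggest a debug build. This assumes that load without a tokenizer argument uses the pure-Python tokenizer, which may not hold for every json-stream version.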
License
MIT license. Refer to the LICENSE file for details.