json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the
NAYA project) ported to Rust almost verbatim and made available as a Python
module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
Note that editable/develop installs sometimes compile the Rust library in
debug mode, which makes it run slower than the pure-Python tokenizer. When in
doubt, run the installation commands with --verbose to see the Rust
compilation commands and verify that they used --release.
Usage
To use this package's RustTokenizer, simply pass it as the tokenizer argument
to json-stream's load or visit:
from io import StringIO
from json_stream import load
from json_stream_rs_tokenizer import RustTokenizer
json_buf = StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }')
# uses the Rust tokenizer to load JSON:
d = load(json_buf, tokenizer=RustTokenizer)
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
As a perhaps slightly more convenient alternative, the package also provides
wrappers around json_stream's load and visit functions that do it for you:
from io import StringIO
from json_stream_rs_tokenizer import load
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
# ...
Benchmarks
The package comes with a script for rudimentary benchmarks on randomly
generated JSON data. To run it, you'll need to install the optional benchmark
dependencies and a version of json-stream with this patch applied:
pip install json_stream_rs_tokenizer[benchmark]
pip install --ignore-installed \
  git+https://github.com/smheidrich/json-stream.git@util-to-convert-to-py-std-types
You can then run the benchmark as follows:
python -m json_stream_rs_tokenizer.benchmark
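To get a rough idea of what a benchmark on randomly generated JSON data can look like, here is a small standard-library-only sketch. It is not the package's benchmark script: `random_json_value` and `time_loader` are hypothetical helpers, and `json.load` is used purely as a stand-in loader so that the example runs without any extra dependencies.

```python
import json
import random
import string
import time
from io import StringIO


def random_json_value(rng, depth=0, max_depth=3):
    """Build a random JSON-compatible Python value (hypothetical helper)."""
    kinds = ["int", "float", "str", "bool", "null"]
    if depth < max_depth:
        kinds += ["list", "dict"]  # only nest up to max_depth levels
    kind = rng.choice(kinds)
    if kind == "int":
        return rng.randint(-10**6, 10**6)
    if kind == "float":
        return rng.uniform(-1e6, 1e6)
    if kind == "str":
        length = rng.randint(0, 20)
        return "".join(rng.choices(string.ascii_letters, k=length))
    if kind == "bool":
        return rng.random() < 0.5
    if kind == "null":
        return None
    size = rng.randint(0, 8)
    if kind == "list":
        return [random_json_value(rng, depth + 1, max_depth)
                for _ in range(size)]
    return {f"key{i}": random_json_value(rng, depth + 1, max_depth)
            for i in range(size)}


def time_loader(loader, text, repeats=3):
    """Return the best wall-clock time in seconds of loader(file_like)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        loader(StringIO(text))
        best = min(best, time.perf_counter() - start)
    return best


rng = random.Random(0)  # fixed seed so every run sees the same data
text = json.dumps([random_json_value(rng) for _ in range(1000)])

# json.load is only a stand-in; swap in the loader you want to measure.
elapsed = time_loader(json.load, text)
print(f"json.load: {elapsed:.4f} s for {len(text)} characters")
```

To compare tokenizers this way, one would pass loaders that wrap json-stream's load with and without tokenizer=RustTokenizer. Since json-stream parses lazily, such a loader would also have to consume the entire result for the timing to be meaningful.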
License
MIT license. Refer to the LICENSE file for details.