json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the NAYA project) ported to Rust almost verbatim and made available as a Python module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
Note that in editable/develop installs, it will sometimes (?) compile the Rust library in debug mode, which makes it run slower than the pure-Python tokenizer. When in doubt, run installation commands with --verbose to see the Rust compilation commands and verify that they used --release.
Usage
Because json-stream currently has no mechanism to provide a custom tokenizer (which I would prefer), this package provides its own wrappers around json_stream's load and visit functions that monkeypatch the Rust tokenizer in before running them:
from io import StringIO
from json_stream_rs_tokenizer import load

# uses the Rust tokenizer to load JSON:
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
The patching is undone when the function returns.
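The patch-then-restore behaviour described above can be illustrated with a minimal sketch. This is not the package's actual implementation: `parser`, `python_tokenize`, `rust_tokenize`, and `load_patched` are hypothetical stand-ins that only show the general pattern of swapping an attribute in and restoring it in a `finally` block.

```python
from types import SimpleNamespace


def python_tokenize(text):
    """Hypothetical stand-in for the original pure-Python tokenizer."""
    return ("python", text.split())


def rust_tokenize(text):
    """Hypothetical stand-in for the faster replacement tokenizer."""
    return ("rust", text.split())


# Hypothetical stand-in for a module whose tokenizer attribute gets patched.
parser = SimpleNamespace(tokenize=python_tokenize)


def load_patched(text):
    """Swap in the replacement, run the call, and always restore the original."""
    original = parser.tokenize
    parser.tokenize = rust_tokenize
    try:
        return parser.tokenize(text)
    finally:
        # Runs even if tokenizing raises, so the patch never leaks out.
        parser.tokenize = original


during = load_patched("1 2 3")
after = parser.tokenize("1 2 3")
print(during[0], after[0])  # rust python
```

While `load_patched` is running, every other caller of `parser.tokenize` also sees the replacement, because the attribute is shared global state.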
Due to patching being a global state mutation, using json-stream-rs-tokenizer in this way is generally not thread-safe. As an alternative, you can patch it in manually using json_stream_rs_tokenizer.patch(), which should be safe if you do it before you spawn any threads, and then just call the original (but now patched) json_stream.load and json_stream.visit functions.
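The ordering that matters here is "patch once in the main thread, then start workers". The sketch below shows that ordering with the standard library only; `parser`, `patch`, and the tokenizer functions are hypothetical stand-ins. With the real packages, the one-time call would be json_stream_rs_tokenizer.patch() and the workers would call json_stream.load or json_stream.visit.

```python
import threading
from types import SimpleNamespace


def python_tokenize(text):
    """Hypothetical stand-in for the original tokenizer."""
    return ("python", text.split())


def rust_tokenize(text):
    """Hypothetical stand-in for the replacement tokenizer."""
    return ("rust", text.split())


# Hypothetical stand-in for the module being patched.
parser = SimpleNamespace(tokenize=python_tokenize)


def patch():
    """Permanently swap in the replacement (never undone in this sketch)."""
    parser.tokenize = rust_tokenize


def worker(text, results, index):
    """Each thread only reads the already-patched attribute."""
    results[index] = parser.tokenize(text)


# Patch exactly once, before any thread exists, so no thread can observe
# the attribute changing underneath it.
patch()

results = [None] * 4
threads = [
    threading.Thread(target=worker, args=(f"item {i}", results, i))
    for i in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted({tag for tag, _ in results}))  # ['rust']
```

Because nothing mutates `parser.tokenize` after the threads start, there is no window in which one thread's restore step could remove the patch another thread relies on.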
Benchmarks
The package comes with a script for rudimentary benchmarks on randomly generated JSON data. To run it, you'll need to install the optional benchmark dependencies and a version of json-stream with this patch applied:
pip install json_stream_rs_tokenizer[benchmark]
pip install git+https://github.com/smheidrich/json-stream.git@util-to-convert-to-py-std-types
You can then run the benchmark as follows:
python -m json_stream_rs_tokenizer.benchmark
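For readers who want a feel for what benchmarking on randomly generated JSON involves, here is a minimal sketch. It is not the package's benchmark script: `random_json` and `time_parser` are hypothetical helpers, and the standard library's `json.loads` is used as a stand-in parser so that the example runs without extra dependencies.

```python
import json
import random
import time


def random_json(depth, rng):
    """Build a random nested structure of dicts, lists, and integers."""
    if depth == 0:
        return rng.randint(0, 1000)
    if rng.random() < 0.5:
        return [random_json(depth - 1, rng) for _ in range(rng.randint(1, 4))]
    return {
        f"k{i}": random_json(depth - 1, rng) for i in range(rng.randint(1, 4))
    }


def time_parser(parse, text, repeats=5):
    """Return the best wall-clock time over `repeats` calls to parse(text)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(text)
        best = min(best, time.perf_counter() - start)
    return best


rng = random.Random(0)  # fixed seed so repeated runs use the same document
document = random_json(5, rng)
text = json.dumps(document)

elapsed = time_parser(json.loads, text)
print(f"{len(text)} characters parsed in {elapsed:.6f} s (best of 5)")
```

Comparing two tokenizers would mean calling `time_parser` once per parser on the same `text` and taking the ratio of the two timings.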
License
MIT license. Refer to the LICENSE file for details.