json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the NAYA project) ported to Rust almost verbatim and made available as a Python module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
Note that in editable/develop installs, it will sometimes (?) compile the Rust library in debug mode, which makes it run slower than the pure-Python tokenizer. When in doubt, run installation commands with --verbose to see the Rust compilation commands and verify that they used --release.
Usage
Because json-stream currently has no mechanism to provide a custom tokenizer (which I would prefer), this package provides its own wrappers around json_stream's load and visit functions that monkeypatch the Rust tokenizer in before running them:
from io import StringIO
from json_stream_rs_tokenizer import load
# uses the Rust tokenizer to load JSON:
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
The patching is undone when the function returns.
Because patching mutates global state, using json-stream-rs-tokenizer in this way is generally not thread-safe. As an alternative, you can patch it in manually using json_stream_rs_tokenizer.patch(), which should be safe if you do it before you spawn any threads, and then just call the original (but now patched) json_stream.load and json_stream.visit functions.
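To make the difference between the two approaches concrete, here is a minimal sketch of the general pattern. It does not use this package's actual internals: lib, lib_load, fast_tokenize, load_with_temporary_patch and the local patch function are all hypothetical stand-ins, chosen only to illustrate why a temporary patch is a global mutation and why patching once before spawning threads avoids the problem.

```python
import threading
import types

# Hypothetical stand-in for a library module with a swappable tokenizer.
lib = types.SimpleNamespace(tokenize=lambda s: ("slow", s.split()))


def fast_tokenize(s):
    # Hypothetical stand-in for a faster replacement tokenizer.
    return ("fast", s.split())


def lib_load(s):
    # Stand-in for a library function that looks up the tokenizer at call time.
    return lib.tokenize(s)


def load_with_temporary_patch(s):
    """Swap the tokenizer in, call the library, then restore the original."""
    original = lib.tokenize
    lib.tokenize = fast_tokenize  # global mutation: visible to every thread
    try:
        return lib_load(s)
    finally:
        lib.tokenize = original  # undone when the function returns


def patch():
    """One-time patch, meant to be called before any threads are spawned."""
    lib.tokenize = fast_tokenize


# Temporary patching: fast inside the wrapper, original restored afterwards.
temp_result = load_with_temporary_patch("a b")
after_temp = lib_load("a b")

# Manual patching up front: every thread then sees the same, stable state.
patch()
results = []
threads = [
    threading.Thread(target=lambda: results.append(lib_load("a b")))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In the temporary variant, a second thread calling lib_load while the first is inside the wrapper could see either tokenizer, and the restore step of one call could undo the patch of another. Patching once up front removes that window because the shared state no longer changes while threads are running.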
Benchmarks
The package comes with a script for rudimentary benchmarks on randomly generated JSON data. To run it, you'll need to install the optional benchmark dependencies and a version of json-stream with this patch applied:
pip install json_stream_rs_tokenizer[benchmark]
pip install --ignore-installed \
git+https://github.com/smheidrich/json-stream.git@util-to-convert-to-py-std-types
You can then run the benchmark as follows:
python -m json_stream_rs_tokenizer.benchmark
License
MIT license. Refer to the LICENSE file for details.