A faster tokenizer for the json-stream Python library
json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the
NAYA project) ported to Rust almost verbatim and made available as a Python
module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
Note that in editable/develop installs, it will sometimes (?) compile the
Rust library in debug mode, which makes it run slower than the pure-Python
tokenizer. When in doubt, run installation commands with --verbose to see the
Rust compilation commands and verify that they used --release.
Usage
To use this package's RustTokenizer, simply pass it as the tokenizer argument
to json-stream's load or visit:
from io import StringIO
from json_stream import load
from json_stream_rs_tokenizer import RustTokenizer
json_buf = StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }')
# uses the Rust tokenizer to load JSON:
d = load(json_buf, tokenizer=RustTokenizer)
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
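The example above only covers load. As a minimal sketch of the visit variant, the helper below collects every visited value together with its path. The helper name collect_leaves and its parameters are hypothetical and not part of either package; the visit function is passed in as an argument so that the same helper works with json-stream's visit or with this package's wrapper. It assumes the visitor is called with the item and its path, as described in json-stream's documentation.

```python
from io import StringIO


def collect_leaves(json_text, visit, **visit_kwargs):
    """Run `visit` over `json_text` and return a list of (path, item) pairs.

    `visit` is expected to behave like json_stream.visit: it takes a
    file-like object and a visitor callable, and calls the visitor with
    (item, path). Extra keyword arguments (e.g. tokenizer=RustTokenizer)
    are forwarded unchanged.
    """
    leaves = []

    def visitor(item, path):
        leaves.append((path, item))

    visit(StringIO(json_text), visitor, **visit_kwargs)
    return leaves


def demo():
    """Needs json-stream and json-stream-rs-tokenizer to be installed."""
    from json_stream import visit
    from json_stream_rs_tokenizer import RustTokenizer

    text = '{ "a": [1,2,3,4], "b": [5,6,7] }'
    for path, item in collect_leaves(text, visit, tokenizer=RustTokenizer):
        print(path, item)
```

Calling demo() with both packages installed should print one line per value in the document.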
As a perhaps slightly more convenient alternative, the package also provides
wrappers around json_stream's load and visit functions that do it for you:
from io import StringIO
from json_stream_rs_tokenizer import load
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
# ...
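If the Rust extension might not be installed everywhere your code runs, one option is to pass the tokenizer argument only when the import succeeds. This is a sketch of that pattern rather than something the package prescribes; tokenizer_kwargs and sum_values are hypothetical helper names.

```python
from io import StringIO

try:
    from json_stream_rs_tokenizer import RustTokenizer
except ImportError:  # package not installed, or the extension failed to build
    RustTokenizer = None


def tokenizer_kwargs():
    """Keyword arguments to forward to json_stream's load or visit.

    Returns an empty dict when the Rust tokenizer is unavailable, so that
    json-stream falls back to whatever tokenizer it uses by default.
    """
    return {"tokenizer": RustTokenizer} if RustTokenizer is not None else {}


def sum_values(json_text):
    """Stream a flat {"key": [numbers]} document and return the sum per key."""
    # Imported lazily so that tokenizer_kwargs() is usable without json-stream.
    from json_stream import load

    data = load(StringIO(json_text), **tokenizer_kwargs())
    # Each list is consumed by sum() before the stream moves on to the next key.
    return {key: sum(values) for key, values in data.items()}
```

With json-stream installed, sum_values('{ "a": [1,2,3,4], "b": [5,6,7] }') streams the document through whichever tokenizer is available.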
Benchmarks
The package comes with a script for rudimentary benchmarks on randomly
generated JSON data. To run it, you'll need to install the optional benchmark
dependencies and a version of json-stream with this patch applied:
pip install json_stream_rs_tokenizer[benchmark]
pip install --ignore-installed \
  git+https://github.com/smheidrich/json-stream.git@util-to-convert-to-py-std-types
You can then run the benchmark as follows:
python -m json_stream_rs_tokenizer.benchmark
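To get a rough feel for the speedup on your own machine without the patched json-stream, a small timing harness can be enough. The sketch below is not the package's benchmark script; random_json, consume and time_load are hypothetical names, and the generated data (random keys mapped to lists of integers) is just one assumed shape. The load function is passed in so that the harness can time any loader with a compatible interface.

```python
import io
import json
import random
import string
import time


def random_json(n_keys=100, list_len=50, seed=0):
    """Build a JSON document mapping random keys to lists of random integers."""
    rng = random.Random(seed)
    doc = {
        "".join(rng.choices(string.ascii_lowercase, k=8)): [
            rng.randint(0, 10**6) for _ in range(list_len)
        ]
        for _ in range(n_keys)
    }
    return json.dumps(doc)


def consume(fp, load, **load_kwargs):
    """Read the whole stream through `load` and count the numbers in it."""
    data = load(fp, **load_kwargs)
    return sum(1 for _key, values in data.items() for _ in values)


def time_load(text, load, repeats=3, **load_kwargs):
    """Return the best wall-clock time in seconds over `repeats` full passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        consume(io.StringIO(text), load, **load_kwargs)
        best = min(best, time.perf_counter() - start)
    return best
```

With both packages installed, comparing time_load(text, json_stream.load) against time_load(text, json_stream.load, tokenizer=RustTokenizer) for the same text gives an estimate of the speedup for that particular data.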
License
MIT license. Refer to the LICENSE file for details.