json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the NAYA project) ported to Rust almost verbatim and made available as a Python module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
Note that in editable/develop installs, it will sometimes (?) compile the Rust library in debug mode, which makes it run slower than the pure-Python tokenizer. When in doubt, run installation commands with --verbose to see the Rust compilation commands and verify that they used --release.
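If the build log is no longer available, timing a parse is another way to spot a debug build. The following is only a rough sketch: the helper names `best_time` and `drain` and the synthetic sample are assumptions made for illustration and are not part of this package.

```python
import io
import json
import time


def best_time(load_fn, text, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` runs of load_fn."""
    best = float("inf")
    for _ in range(repeats):
        buf = io.StringIO(text)  # fresh buffer each run; streams are single-use
        start = time.perf_counter()
        load_fn(buf)
        best = min(best, time.perf_counter() - start)
    return best


def drain(load, **kwargs):
    """Wrap a json-stream style `load` so that the whole document is consumed."""
    def run(buf):
        for _key, values in load(buf, **kwargs).items():
            for _ in values:
                pass
    return run


# Synthetic input (an assumption for illustration); real data may behave differently.
sample = json.dumps({"a": list(range(20_000)), "b": list(range(20_000))})

try:
    from json_stream import load
    from json_stream_rs_tokenizer import RustTokenizer
except ImportError:
    print("json-stream / json-stream-rs-tokenizer not installed; skipping")
else:
    rust = best_time(drain(load, tokenizer=RustTokenizer), sample)
    print(f"RustTokenizer: {rust:.4f}s")
```

Timing the same `drain(load, tokenizer=...)` call with json-stream's pure-Python tokenizer gives the baseline. If the Rust figure is not clearly lower, the extension was probably compiled in debug mode.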
Usage
To use this package's RustTokenizer, simply pass it as the tokenizer argument to json-stream's load or visit:
from io import StringIO

from json_stream import load
from json_stream_rs_tokenizer import RustTokenizer

json_buf = StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }')

# uses the Rust tokenizer to load JSON:
d = load(json_buf, tokenizer=RustTokenizer)

for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
As a perhaps slightly more convenient alternative, the package also provides wrappers around json_stream's load and visit functions that do it for you:
from io import StringIO

from json_stream_rs_tokenizer import load

d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
# ...
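The visit wrapper can be used in much the same way. The sketch below assumes it takes a file-like object and a callback that receives `(item, path)`, as json-stream's own visit does. The `collect` helper is a hypothetical name introduced only for this example.

```python
from io import StringIO


def collect(visit_fn, text):
    """Run visit_fn over text and return the (path, item) pairs it reports."""
    seen = []

    def visitor(item, path):
        # called once per value, together with the path leading to it
        seen.append((path, item))

    visit_fn(StringIO(text), visitor)
    return seen


try:
    from json_stream_rs_tokenizer import visit
except ImportError:
    visit = None  # package not installed; nothing to demonstrate

if visit is not None:
    for path, item in collect(visit, '{ "a": [1,2,3,4], "b": [5,6,7] }'):
        print(path, item)
```

Because `collect` only needs a callable with that shape, json-stream's own visit can be passed in instead to compare results.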
Benchmarks
The package comes with a script for rudimentary benchmarks on randomly generated JSON data. To run it, you'll need to install the optional benchmark dependencies and a version of json-stream with this patch applied:
pip install json_stream_rs_tokenizer[benchmark]
pip install --ignore-installed \
  git+https://github.com/smheidrich/json-stream.git@util-to-convert-to-py-std-types
You can then run the benchmark as follows:
python -m json_stream_rs_tokenizer.benchmark
Run it with --help to see more information.
License
MIT license. Refer to the LICENSE file for details.