json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the
NAYA project) ported to Rust almost verbatim and made available as a
Python module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
Note that in editable/develop installs, the Rust library is sometimes (?)
compiled in debug mode, which makes it run slower than the pure-Python
tokenizer. When in doubt, run installation commands with --verbose to see the
Rust compilation commands and verify that they used --release.
Usage
Because json-stream currently has no mechanism to provide a custom tokenizer
(which I would prefer), this package provides its own wrappers around
json_stream's load and visit functions that monkeypatch the Rust tokenizer in
before running them:
from io import StringIO
from json_stream_rs_tokenizer import load

# uses the Rust tokenizer to load JSON:
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
The patching is undone when the function returns.
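The patch-then-restore pattern described here can be sketched in plain Python. This is only an illustration under stated assumptions, not this package's actual implementation: `library`, `library_load`, `fast_tokenize` and `load_with_fast_tokenizer` are hypothetical stand-ins, and json_stream's real internals are not shown.

```python
from types import SimpleNamespace


def slow_tokenize(text):
    """Stand-in for a library's built-in (pure-Python) tokenizer."""
    return text.split()


def fast_tokenize(text):
    """Stand-in for a faster drop-in replacement tokenizer."""
    return text.split()


# Hypothetical stand-in for a library module with a swappable tokenizer.
library = SimpleNamespace(tokenize=slow_tokenize)


def library_load(text):
    """Stand-in for the library's load(): looks up the tokenizer at call time."""
    return library.tokenize(text)


def load_with_fast_tokenizer(text):
    """Patch the fast tokenizer in, run the load, then restore the original."""
    original = library.tokenize
    library.tokenize = fast_tokenize  # global state mutation
    try:
        return library_load(text)
    finally:
        library.tokenize = original  # undone when the function returns


tokens = load_with_fast_tokenizer("a b c")
restored = library.tokenize is slow_tokenize
```

The `try`/`finally` ensures the original tokenizer is restored even if loading raises. While the call is in progress, the replacement is visible to every other caller of the stand-in library.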
Because patching mutates global state, using json-stream-rs-tokenizer in this
way is generally not thread-safe. As an alternative, you can patch it in
manually using json_stream_rs_tokenizer.patch(), which should be safe if you
do it before you spawn any threads, and then just call the original (but now
patched) json_stream.load and json_stream.visit functions.
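To illustrate why patching once up front avoids the problem, here is a minimal sketch using only the standard library. It assumes a hypothetical stand-in `library` object and a stand-in `patch()` function that plays the same role as json_stream_rs_tokenizer.patch(); it does not use or reproduce the real packages.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

# Hypothetical stand-in for a library module with a swappable tokenizer.
library = SimpleNamespace(tokenize=str.split)


def fast_tokenize(text):
    """Stand-in for a faster drop-in replacement tokenizer."""
    return text.split()


def patch():
    """Stand-in for a one-time, permanent patch of the library."""
    library.tokenize = fast_tokenize


def count_tokens(text):
    """Worker run in each thread: only reads the already-patched attribute."""
    return len(library.tokenize(text))


# Patch exactly once, before any threads exist. No thread ever swaps the
# tokenizer afterwards, so workers cannot observe a half-applied patch.
patch()

with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(count_tokens, ["a b", "c d e", "f"]))
```

The key difference from the wrapper approach is that nothing is restored afterwards, so there is no window in which one thread undoes the patch while another is still relying on it.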
Benchmarks
The package comes with a script for rudimentary benchmarks on randomly
generated JSON data. To run it, you'll need to install the optional benchmark
dependencies and a version of json-stream with this patch applied:
pip install json_stream_rs_tokenizer[benchmark]
pip install --ignore-installed \
git+https://github.com/smheidrich/json-stream.git@util-to-convert-to-py-std-types
You can then run the benchmark as follows:
python -m json_stream_rs_tokenizer.benchmark
License
MIT license. Refer to the LICENSE file for details.