json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the NAYA project) ported to Rust almost verbatim and made available as a Python module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install git+https://github.com/smheidrich/py-json-stream-rs-tokenizer.git
Note that an editable install will sometimes (?) compile the Rust library in debug mode, which makes it run slower than the pure-Python tokenizer. When in doubt, run installation commands with --verbose to see the Rust compilation commands and verify that they used --release.
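One rough way to spot an accidental debug build is to time the Rust-backed load against json_stream's own load on the same input. The following is only a sketch, assuming both packages are installed; time_loader, main and the sample size are illustrative choices, not part of either library.

```python
import json
import time
from io import StringIO


def time_loader(load, text, repeats=3):
    """Return the best wall-clock time in seconds to load and fully iterate `text`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # json-stream loads lazily, so iterate to push the whole document
        # through the tokenizer.
        for _item in load(StringIO(text)):
            pass
        best = min(best, time.perf_counter() - start)
    return best


def main():
    # Imported here so the helper above stays usable without the packages.
    import json_stream
    import json_stream_rs_tokenizer

    text = json.dumps(list(range(200_000)))  # arbitrary sample size (assumption)
    pure = time_loader(json_stream.load, text)
    rust = time_loader(json_stream_rs_tokenizer.load, text)
    print(f"pure Python: {pure:.3f}s, Rust: {rust:.3f}s, ratio: {pure / rust:.1f}x")


if __name__ == "__main__":
    try:
        main()
    except ImportError as exc:
        print(f"skipping benchmark, missing package: {exc}")
```

If the Rust tokenizer comes out slower than the pure-Python one, that would be consistent with a build that did not use --release.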
Usage
Because json-stream currently has no mechanism to provide a custom tokenizer (which I would prefer), this package provides its own wrappers around json_stream's load and visit functions that monkeypatch the Rust tokenizer in before running them:
from io import StringIO
from json_stream_rs_tokenizer import load

# uses the Rust tokenizer to load JSON:
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
# print each key followed by the numbers in its list:
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
The patching is undone when the function returns.
Due to patching being a global state mutation, using json-stream-rs-tokenizer in this way is generally not thread-safe. As an alternative, you can patch it in manually using json_stream_rs_tokenizer.patch(), which should be safe if you do it before you spawn any threads, and then just call the original (but now patched) json_stream.load and json_stream.visit functions.
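The manual approach could look roughly like the sketch below: patch once in the main thread, then let worker threads call json_stream.load directly. It assumes both packages are installed and that patch() takes no arguments, as the text above suggests; sum_lists, main and the thread pool setup are hypothetical names and choices made for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from io import StringIO


def sum_lists(load, text):
    """Load a JSON object of number lists and return {key: sum of its list}."""
    totals = {}
    for key, values in load(StringIO(text)).items():
        # Consume each nested list before moving on; json-stream reads lazily.
        totals[key] = sum(values)
    return totals


def main():
    # Imported here so the helper above stays usable without the packages.
    import json_stream
    import json_stream_rs_tokenizer

    # Patch once, in the main thread, before any worker threads exist.
    json_stream_rs_tokenizer.patch()

    text = json.dumps({"a": [1, 2, 3, 4], "b": [5, 6, 7]})
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Every worker calls the original (now patched) json_stream.load.
        futures = [pool.submit(sum_lists, json_stream.load, text) for _ in range(8)]
        for future in futures:
            print(future.result())


if __name__ == "__main__":
    try:
        main()
    except (ImportError, AttributeError) as exc:
        # patch() is assumed from the text above; other releases may differ.
        print(f"skipping demo: {exc}")
```

Because no thread patches or unpatches anything after startup, the workers never observe the global state changing underneath them.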
License
MIT license. See the LICENSE file for details.