json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the
NAYA project) ported to Rust almost verbatim and made available as a Python
module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
This will install a prebuilt wheel if one is available for your platform.
Otherwise, it will try to build the package from the source distribution,
which requires a Rust toolchain to be installed and available. Note that if
the build fails, the installation will still be reported as successful, but
RustTokenizer (see below) won't be available for import. This is so that
packages can depend on the library but fall back to their own implementation
if no prebuilt wheel is available and the build fails. To see error messages
when the build fails, increase the installation command's verbosity with -v
(repeat it for even more information, e.g. -vv).
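A dependent package might use this fallback behaviour roughly as follows. This is a minimal sketch, assuming that a missing extension surfaces as an ImportError when importing RustTokenizer; the helper name tokenizer_kwargs is hypothetical and not part of either library.

```python
# Try to import the Rust tokenizer; it may be missing if no wheel was
# available and the source build failed (assumed to raise ImportError).
try:
    from json_stream_rs_tokenizer import RustTokenizer
except ImportError:
    RustTokenizer = None


def tokenizer_kwargs():
    """Hypothetical helper: extra keyword arguments for json_stream.load/visit.

    Returns {"tokenizer": RustTokenizer} when the extension is importable and
    an empty dict otherwise, so json-stream uses its own default tokenizer.
    """
    if RustTokenizer is None:
        return {}
    return {"tokenizer": RustTokenizer}
```

A caller would then write something like json_stream.load(f, **tokenizer_kwargs()), which works whether or not the Rust extension could be built.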
Note that in editable/develop installs, it will sometimes (?) compile the
Rust library in debug mode, which makes it run slower than the pure-Python
tokenizer. When in doubt, run installation commands with --verbose to see the
Rust compilation commands and verify that they used --release.
Usage
To use this package's RustTokenizer, simply pass it as the tokenizer argument
to json-stream's load or visit:
from io import StringIO
from json_stream import load
from json_stream_rs_tokenizer import RustTokenizer

json_buf = StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }')

# uses the Rust tokenizer to load JSON:
d = load(json_buf, tokenizer=RustTokenizer)

# print each key followed by its space-separated list elements
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
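The same tokenizer argument applies to visit. The following is a minimal sketch, assuming json-stream's visit calls the visitor with (item, path) as described in its README; the helper collect_items is hypothetical, and the import guard is only there so the snippet can be loaded when one of the two packages is missing.

```python
from io import StringIO

try:
    from json_stream import visit
    from json_stream_rs_tokenizer import RustTokenizer
except ImportError:  # one of the two packages is not installed
    visit = None
    RustTokenizer = None


def collect_items(text):
    """Hypothetical helper: return (path, item) pairs in visiting order."""
    if visit is None:
        raise RuntimeError(
            "json-stream and json-stream-rs-tokenizer are required"
        )
    found = []

    def visitor(item, path):
        # json-stream passes each visited value together with its path
        found.append((path, item))

    # uses the Rust tokenizer while visiting the document
    visit(StringIO(text), visitor, tokenizer=RustTokenizer)
    return found
```

Calling collect_items('{ "a": [1,2,3,4], "b": [5,6,7] }') should then yield one pair per visited value.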
As a perhaps slightly more convenient alternative, the package also provides
wrappers around json_stream's load and visit functions which do this for you,
provided that json-stream has been installed:
from io import StringIO
from json_stream_rs_tokenizer import load

# same as json_stream.load, but with the Rust tokenizer already selected
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
# ...
Limitations
- Arbitrary-size integers are currently supported neither on PyPy nor when the
extension is built against Python's limited C API (Py_LIMITED_API). This is
due to a limitation of PyO3's num-bigint extension. However, PyO3 PR #2626,
which lifts the restriction for PyPy, has been merged into PyO3 main and is
expected to make it into a release sooner or later. To find out whether a
given installation supports arbitrary-size integers, call
json_stream_rs_tokenizer.supports_bigint().
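The check mentioned above could be wrapped like this. It is a minimal sketch, assuming supports_bigint() returns a truthy value when arbitrary-size integers are supported; the helper name bigint_support is hypothetical, and the import guard only covers installations where the extension is unavailable.

```python
try:
    from json_stream_rs_tokenizer import supports_bigint
except ImportError:  # extension (or the whole package) is not available
    supports_bigint = None


def bigint_support():
    """Hypothetical helper: True/False if known, None if the extension is missing."""
    if supports_bigint is None:
        return None
    return bool(supports_bigint())


# 2**64 does not fit into a 64-bit integer, so parsing a JSON number this
# large is a case where arbitrary-size integer support matters.
example_big_int = 2**64
```

A caller could consult bigint_support() before feeding documents with very large integers to the Rust tokenizer, and use the pure-Python tokenizer otherwise.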
Benchmarks
The package comes with a script for rudimentary benchmarks on randomly
generated JSON data. To run it, you'll need to install the optional benchmark
dependencies and a version of json-stream with this patch applied:
pip install json_stream_rs_tokenizer[benchmark]
pip install --ignore-installed \
  git+https://github.com/smheidrich/json-stream.git@util-to-convert-to-py-std-types
You can then run the benchmark as follows:
python -m json_stream_rs_tokenizer.benchmark
Run it with --help to see more information.
License
MIT license. Refer to the LICENSE file for details.