json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the
NAYA project) ported to Rust almost verbatim and made available as a
Python module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install git+https://github.com/smheidrich/py-json-stream-rs-tokenizer.git
Note that in editable installs, it will sometimes (?) compile the Rust
library in debug mode, which makes it run slower than the pure-Python
tokenizer. When in doubt, run installation commands with --verbose to see the
Rust compilation commands and verify that they used --release.
Usage
Because json-stream currently has no mechanism to provide a custom tokenizer
(which I would prefer), this package provides its own wrappers around
json_stream's load and visit functions that monkeypatch the Rust tokenizer in
before running them:
from io import StringIO
from json_stream_rs_tokenizer import load

# uses the Rust tokenizer to load JSON:
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))

# each value is a JSON array; print its items next to the key
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
The patching is undone when the function returns.
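As a rough illustration of the "patch, call, restore" pattern described above, here is a minimal sketch. It does not use json-stream's real internals: `library`, `library_load`, `python_tokenize` and `rust_tokenize` are hypothetical stand-ins invented for this example, and the actual wrappers may be implemented differently.

```python
from types import SimpleNamespace


# Hypothetical stand-ins: two tokenizers with identical results.
def python_tokenize(text):
    return text.split()


def rust_tokenize(text):
    return text.split()  # imagine a much faster implementation


# Hypothetical "library module" whose tokenizer attribute can be swapped.
library = SimpleNamespace(tokenize=python_tokenize)


def library_load(text):
    # The tokenizer is looked up at call time, so a patch takes effect here.
    tokenize = library.tokenize
    return tokenize.__name__, tokenize(text)


def load(text):
    """Wrapper: swap the tokenizer in, call the library, always swap it back."""
    original = library.tokenize
    library.tokenize = rust_tokenize  # global mutation, visible to all threads
    try:
        return library_load(text)
    finally:
        library.tokenize = original  # restored even if library_load raises


result = load("a b c")
```

The `try`/`finally` is what guarantees the restore step in this sketch. The global assignment is also the reason such a wrapper is not thread-safe: another thread calling `library_load` while the wrapper is running would see the swapped tokenizer.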
Due to patching being a global state mutation, using json-stream-rs-tokenizer
in this way is generally not thread-safe. As an alternative, you can patch it
in manually using json_stream_rs_tokenizer.patch(), which should be safe if
you do it before you spawn any threads, and then just call the original (but
now patched) json_stream.load and json_stream.visit functions.
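The "patch once, then spawn threads" approach can be sketched in the same spirit. Again, everything here is a hypothetical stand-in (`library`, `library_load`, and this example's own `patch` function are not the real json_stream or json_stream_rs_tokenizer code); the point is only that no thread mutates shared state after the single up-front patch.

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


# Hypothetical stand-ins for the two tokenizers.
def python_tokenize(text):
    return text.split()


def rust_tokenize(text):
    return text.split()  # imagine a much faster implementation


# Hypothetical "library module" with a swappable tokenizer attribute.
library = SimpleNamespace(tokenize=python_tokenize)


def library_load(text):
    tokenize = library.tokenize  # read-only access from worker threads
    return tokenize.__name__, tokenize(text)


def patch():
    """Hypothetical one-time patch: swap the tokenizer and leave it in place."""
    library.tokenize = rust_tokenize


patch()  # done once, before any thread exists

# Worker threads only read the already-patched attribute.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(library_load, ["a b", "c d e", "f"]))
```

Because the mutation happens before the pool is created and is never undone, the workers in this sketch cannot observe a half-patched state.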
License
MIT license. Refer to the LICENSE file for details.