json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the NAYA project), ported to Rust almost verbatim and made available as a Python module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
Note that in editable/develop installs, it will sometimes (?) compile the
Rust library in debug mode, which makes it run slower than the pure-Python
tokenizer. When in doubt, run installation commands with --verbose to see the
Rust compilation commands and verify that they used --release.
Usage
Because json-stream currently has no mechanism to provide a custom tokenizer
(which I would prefer), this package provides its own wrappers around
json_stream's load and visit functions, which monkeypatch the Rust tokenizer
in before running them:
from io import StringIO
from json_stream_rs_tokenizer import load

# uses the Rust tokenizer to load JSON:
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))

# print each key followed by its space-separated list items
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
The patching is undone when the function returns.
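To illustrate the patch-then-restore behaviour described above, here is a minimal sketch of the general pattern. It is not this package's actual implementation: `lib`, `lib_load`, `patched` and `fast_tokenize` are hypothetical stand-ins that use only the standard library, and the replacement tokenizer deliberately produces different output so the effect of the patch is visible.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-in for a library module whose tokenizer can be swapped.
lib = SimpleNamespace(tokenize=lambda text: text.split())


def lib_load(text):
    # Looks up lib.tokenize at call time, so a patch takes effect here.
    return lib.tokenize(text)


@contextmanager
def patched(namespace, name, replacement):
    """Temporarily replace namespace.<name>, restoring it on exit."""
    original = getattr(namespace, name)
    setattr(namespace, name, replacement)
    try:
        yield
    finally:
        # Runs even if the wrapped call raises, so the patch never leaks.
        setattr(namespace, name, original)


def fast_tokenize(text):
    # Upper-cases tokens only so that the patched path is distinguishable.
    return [token.upper() for token in text.split()]


def load(text):
    # Wrapper: patch in the replacement, call the library, undo the patch.
    with patched(lib, "tokenize", fast_tokenize):
        return lib_load(text)


result = load("a b c")     # goes through the replacement tokenizer
after = lib_load("a b c")  # the original tokenizer is back in place
```

Because `lib.tokenize` is shared by every caller while the `with` block is active, two threads entering and leaving such a wrapper at different times could see each other's patch or restore.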
Because patching mutates global state, using json-stream-rs-tokenizer in this
way is generally not thread-safe. As an alternative, you can patch it in
manually using json_stream_rs_tokenizer.patch(), which should be safe if you
do it before you spawn any threads, and then just call the original (but now
patched) json_stream.load and json_stream.visit functions.
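The "patch once, before spawning threads" approach can be sketched as follows. This is an illustration with hypothetical stand-ins, not the package's code: `lib` plays the role of json_stream, the local `patch()` plays the role of json_stream_rs_tokenizer.patch(), and the "slow"/"fast" labels exist only to show which tokenizer each thread ended up using.

```python
import threading
from types import SimpleNamespace

# Hypothetical stand-in for the library with its original tokenizer.
lib = SimpleNamespace(tokenize=lambda text: ("slow", text.split()))


def fast_tokenize(text):
    return ("fast", text.split())


def patch():
    # Hypothetical stand-in for a one-off, permanent patch.
    lib.tokenize = fast_tokenize


# Patch exactly once, before any thread exists; nothing is restored later,
# so no thread can observe a half-applied or reverted patch.
patch()

results = []
results_lock = threading.Lock()


def worker(text):
    kind, _tokens = lib.tokenize(text)
    with results_lock:
        results.append(kind)


threads = [threading.Thread(target=worker, args=(f"item {i}",)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Since the global state is changed only once and never changed back, every worker sees the same tokenizer regardless of scheduling.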
License
MIT license. Refer to the LICENSE file for details.