json-stream-rs-tokenizer
A faster tokenizer for the json-stream Python library.
It's actually just json-stream's own tokenizer (itself adapted from the
NAYA project) ported to Rust almost verbatim and made available as a
Python module using PyO3.
On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.
Installation
pip install json-stream-rs-tokenizer
Note that in editable/develop installs, it will sometimes (?) compile the
Rust library in debug mode, which makes it run slower than the pure-Python
tokenizer. When in doubt, run installation commands with --verbose to see the
Rust compilation commands and verify that they used --release.
Usage
To use this package's RustTokenizer, simply pass it as the tokenizer
argument to json-stream's load or visit:
from io import StringIO
from json_stream import load
from json_stream_rs_tokenizer import RustTokenizer
json_buf = StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }')
# uses the Rust tokenizer to load JSON:
d = load(json_buf, tokenizer=RustTokenizer)
for k, l in d.items():
    print(f"{k}: {' '.join(str(n) for n in l)}")
As a perhaps slightly more convenient alternative, the package also provides
wrappers around json_stream's load and visit functions which do this for you:
from io import StringIO
from json_stream_rs_tokenizer import load
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))
# ...
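If the extension might not be installed in every environment, one option is to fall back to json-stream's default tokenizer. The following is a minimal sketch, not part of this package: optional_tokenizer_kwargs and load_fast are hypothetical helper names, and the only API it assumes is the load(..., tokenizer=...) call shown above.

```python
def optional_tokenizer_kwargs():
    """Return {'tokenizer': RustTokenizer} if the extension imports, else {}."""
    try:
        from json_stream_rs_tokenizer import RustTokenizer
    except ImportError:
        # Extension not available: pass no tokenizer argument at all so that
        # json-stream uses whatever its default tokenizer is.
        return {}
    return {"tokenizer": RustTokenizer}


def load_fast(fp):
    """Hypothetical wrapper around json_stream.load with an optional speed-up."""
    # Imported lazily so that optional_tokenizer_kwargs() works on its own.
    from json_stream import load

    return load(fp, **optional_tokenizer_kwargs())
```

Calling load_fast(json_buf) then behaves like load(json_buf, tokenizer=RustTokenizer) when the extension is importable and like a plain load(json_buf) otherwise.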
Limitations
- Arbitrary-size integers are not supported on PyPy or when the extension is
  built against Python's limited C API (Py_LIMITED_API). This is due to a
  limitation of PyO3's num-bigint extension.
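To make "arbitrary-size" concrete: Python's int has no fixed width, so the standard library's json module parses integers of any length exactly. The sketch below flags values in a document that fall outside the signed 64-bit range. That bound is an assumption chosen for illustration; this README does not say which range the affected builds handle or how they react to larger values. fits_in_i64 is a hypothetical helper.

```python
import json

# ASSUMPTION for illustration only: a signed 64-bit range. The actual limit
# in the affected builds is not stated in this README.
I64_MIN = -(2**63)
I64_MAX = 2**63 - 1


def fits_in_i64(n: int) -> bool:
    """Return True if n is representable as a signed 64-bit integer."""
    return I64_MIN <= n <= I64_MAX


doc = '{"small": 42, "big": 123456789012345678901234567890}'
parsed = json.loads(doc)  # stdlib json keeps the full value of "big"

# Keys whose integer values would not fit under the assumed bound.
oversized = [key for key, value in parsed.items() if not fits_in_i64(value)]
print(oversized)
```

A check like this can help decide whether a given data set is likely to be affected before switching tokenizers on PyPy or in a limited-API build.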
Benchmarks
The package comes with a script for rudimentary benchmarks on randomly
generated JSON data. To run it, you'll need to install the optional benchmark
dependencies and a version of json-stream with this patch applied:
pip install json_stream_rs_tokenizer[benchmark]
pip install --ignore-installed \
git+https://github.com/smheidrich/json-stream.git@util-to-convert-to-py-std-types
You can then run the benchmark as follows:
python -m json_stream_rs_tokenizer.benchmark
Run it with --help to see more information.
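For a rough idea of what benchmarking on randomly generated JSON involves, here is a minimal sketch that uses only the standard library. It is not the package's benchmark script: random_json and time_parse are hypothetical names, and json.loads merely stands in for whichever parser you want to time.

```python
import json
import random
import time


def random_json(rng, n_keys=100, max_list_len=50):
    """Build a JSON object mapping keys to random-length lists of integers."""
    data = {
        f"key{i}": [rng.randint(0, 10**6) for _ in range(rng.randint(0, max_list_len))]
        for i in range(n_keys)
    }
    return json.dumps(data)


def time_parse(parse, text, repeats=5):
    """Return the best wall-clock time (in seconds) of parse(text) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(text)
        best = min(best, time.perf_counter() - start)
    return best


rng = random.Random(0)  # fixed seed so every run times the same document
text = random_json(rng)
elapsed = time_parse(json.loads, text)
print(f"parsed {len(text)} characters in {elapsed:.6f} s (best of 5)")
```

When timing a streaming parser instead of json.loads, the timed callable should also iterate over the whole result, since a lazy loader does little work until its output is consumed.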
License
MIT license. Refer to the LICENSE file for details.