


json-stream-rs-tokenizer

A faster tokenizer for the json-stream Python library.

It's actually just json-stream's own tokenizer (itself adapted from the NAYA project) ported to Rust almost verbatim and made available as a Python module using PyO3.

On my machine, it speeds up parsing by a factor of 4–10, depending on the nature of the data.

Installation

pip install git+https://github.com/smheidrich/py-json-stream-rs-tokenizer.git

Note that editable installs will sometimes (it is not clear under which conditions) compile the Rust library in debug mode, which makes it run slower than the pure-Python tokenizer. When in doubt, run the installation command with --verbose to see the Rust compilation commands and check that they use --release.

Usage

Because json-stream currently has no mechanism for supplying a custom tokenizer (which would be my preferred solution), this package provides its own wrappers around json_stream's load and visit functions that monkeypatch the Rust tokenizer in before running them:

from io import StringIO
from json_stream_rs_tokenizer import load

# uses the Rust tokenizer to load JSON:
d = load(StringIO('{ "a": [1,2,3,4], "b": [5,6,7] }'))

for key, values in d.items():
    print(f"{key}: {' '.join(str(n) for n in values)}")

The patching is undone when the function returns.
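The patch-run-restore pattern described above can be sketched with a context manager. This is a minimal illustration under stated assumptions, not the package's actual code: fake_json_stream, fake_load, python_tokenize, rust_tokenize and the tokenize attribute name are all hypothetical stand-ins, since this README does not say which attribute of json_stream is replaced.

```python
import types
from contextlib import contextmanager


# Hypothetical stand-ins for the pure-Python and Rust tokenizers.
def python_tokenize(stream):
    return "python"


def rust_tokenize(stream):
    return "rust"


# Hypothetical stand-in for the json_stream module (not its real layout).
fake_json_stream = types.SimpleNamespace(tokenize=python_tokenize)


def fake_load(stream):
    # Looks the tokenizer up at call time, so a patch is picked up.
    return fake_json_stream.tokenize(stream)


@contextmanager
def patched(module, name, replacement):
    """Temporarily replace module.<name>, restoring it even on error."""
    original = getattr(module, name)
    setattr(module, name, replacement)
    try:
        yield
    finally:
        setattr(module, name, original)


def load_with_rust(stream):
    # Mirrors the described wrapper: patch in, run, undo on return.
    with patched(fake_json_stream, "tokenize", rust_tokenize):
        return fake_load(stream)
```

Calling load_with_rust(None) goes through the patched tokenizer, while a later fake_load(None) sees the original one again. The try/finally in patched is what guarantees the restore even if the wrapped call raises; it is also why the wrapper is not thread-safe, since another thread can observe the module while it is patched.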

Because patching mutates global state, using json-stream-rs-tokenizer this way is generally not thread-safe. As an alternative, you can patch it in manually by calling json_stream_rs_tokenizer.patch(), which should be safe as long as you do it before spawning any threads, and then simply call the original (now patched) json_stream.load and json_stream.visit functions.
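The manual alternative (patch once at start-up, before any threads exist) can be sketched as follows. As before, this uses hypothetical stand-ins: patch_globally plays the role of json_stream_rs_tokenizer.patch(), and fake_json_stream.tokenize stands in for whatever json_stream attribute the real patch replaces.

```python
import types
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the two tokenizers.
def python_tokenize(data):
    return ("python", data)


def rust_tokenize(data):
    return ("rust", data)


# Hypothetical stand-in for the json_stream module.
fake_json_stream = types.SimpleNamespace(tokenize=python_tokenize)


def patch_globally():
    # Stand-in for json_stream_rs_tokenizer.patch(): a one-time global
    # mutation that is never undone, so no thread sees it flip back.
    fake_json_stream.tokenize = rust_tokenize


def worker(data):
    # Workers just call the (already patched) module-level function.
    return fake_json_stream.tokenize(data)


def main(inputs):
    patch_globally()  # do this BEFORE any worker threads start
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(worker, inputs))
```

Here main(["a", "b", "c"]) returns [("rust", "a"), ("rust", "b"), ("rust", "c")]: every worker sees the same, already-patched function because the mutation happened before the pool was created and is never reverted.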

License

MIT license. See the LICENSE file for details.



Download files

Source Distribution

json_stream_rs_tokenizer-0.1.0.tar.gz (7.1 kB)

Built Distributions

json_stream_rs_tokenizer-0.1.0-cp310-none-win_amd64.whl (156.0 kB): CPython 3.10, Windows x86-64
json_stream_rs_tokenizer-0.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB): CPython 3.10, manylinux glibc 2.5+ x86-64
json_stream_rs_tokenizer-0.1.0-cp310-cp310-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (529.4 kB): CPython 3.10, macOS 10.9+ universal2 (ARM64, x86-64), macOS 10.9+ x86-64, macOS 11.0+ ARM64
json_stream_rs_tokenizer-0.1.0-cp39-none-win_amd64.whl (156.1 kB): CPython 3.9, Windows x86-64
json_stream_rs_tokenizer-0.1.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB): CPython 3.9, manylinux glibc 2.5+ x86-64
json_stream_rs_tokenizer-0.1.0-cp39-cp39-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (529.6 kB): CPython 3.9, macOS 10.9+ universal2 (ARM64, x86-64), macOS 10.9+ x86-64, macOS 11.0+ ARM64
json_stream_rs_tokenizer-0.1.0-cp38-none-win_amd64.whl (155.5 kB): CPython 3.8, Windows x86-64
json_stream_rs_tokenizer-0.1.0-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB): CPython 3.8, manylinux glibc 2.5+ x86-64
json_stream_rs_tokenizer-0.1.0-cp38-cp38-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (529.3 kB): CPython 3.8, macOS 10.9+ universal2 (ARM64, x86-64), macOS 10.9+ x86-64, macOS 11.0+ ARM64
json_stream_rs_tokenizer-0.1.0-cp37-none-win_amd64.whl (155.4 kB): CPython 3.7, Windows x86-64
json_stream_rs_tokenizer-0.1.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB): CPython 3.7m, manylinux glibc 2.5+ x86-64
json_stream_rs_tokenizer-0.1.0-cp37-cp37m-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (529.3 kB): CPython 3.7m, macOS 10.9+ universal2 (ARM64, x86-64), macOS 10.9+ x86-64, macOS 11.0+ ARM64
