simdjson bindings for python

pysimdjson

Python bindings for the simdjson project, a SIMD-accelerated JSON parser.

Bindings are currently tested on OS X, Linux, and Windows for Python 3.4+.

Installation

If binary wheels are available for your platform, you can install from pip with no further requirements:

pip install pysimdjson

The project is self-contained, and has no additional dependencies. If binary wheels are not available for your platform, or you want to build from source for the best performance, you'll need a C++11-capable compiler to compile the sources:

pip install 'pysimdjson[dev]' --no-binary :all:

Development and Testing

This project comes with a full test suite. To install development and testing dependencies, use:

pip install -e ".[dev]"

To also install 3rd party JSON libraries used for running benchmarks, use:

pip install -e ".[benchmark]"

To run the tests, just type pytest. To also run the benchmarks, use pytest --runslow.

To properly test on Windows, you need both a recent version of Visual Studio (VS) and VS2015 Update 3. Older versions of CPython required portable C/C++ extensions to be built with the same version of VS as the interpreter. Use the Developer Command Prompt to switch between versions.

How It Works

This project uses pybind11 to generate the low-level bindings on top of the simdjson project. You can use it just like the built-in json module, or use the simdjson-specific API for much better performance.

import simdjson
doc = simdjson.loads('{"hello": "world"}')
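Because the top-level functions mirror the built-in json module, one way to adopt pysimdjson without making it a hard requirement is an import fallback. This is a minimal sketch, not something the project prescribes. The alias json_impl is purely illustrative, and it assumes your code only relies on loads(), which both modules provide.

```python
# Prefer pysimdjson when it is installed; otherwise fall back to the
# standard library. `json_impl` is an illustrative alias, not part of
# either module's API.
try:
    import simdjson as json_impl
except ImportError:
    import json as json_impl

doc = json_impl.loads('{"hello": "world"}')
print(doc["hello"])
```

Code written against json_impl keeps working on platforms where no wheel or compiler is available, at the cost of the speedup.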

Making things faster

pysimdjson provides an API compatible with the built-in json module for convenience, and this API is already fast (beating or tying all other Python JSON libraries). However, it also provides a simdjson-specific API that can perform significantly better.

Don't load the entire document

95% of the time spent loading a JSON document into Python is spent in the creation of Python objects, not the actual parsing of the document. You can avoid all of this overhead by ignoring parts of the document you don't want.

pysimdjson supports this in two ways: JSON pointers via at(), or proxies for objects and lists.

import simdjson
parser = simdjson.Parser()
doc = parser.parse(b'{"res": [{"name": "first"}, {"name": "second"}]}')

For our sample above, we only want the name of the second entry in res; we don't care about anything else. We can get it in two ways:

assert doc['res'][1]['name'] == 'second' # True
assert doc.at('res/1/name') == 'second' # True

Both of these approaches will be much faster than using load() or loads(), since they avoid loading the parts of the document we don't care about.
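To make the path syntax concrete, here is a pure-Python sketch of how a path such as res/1/name is walked step by step. The helper resolve_path is hypothetical and is not how pysimdjson implements at(). It runs on a document that the built-in json module has already loaded in full, so it illustrates the lookup semantics only, not the performance benefit. It also ignores JSON pointer escape sequences.

```python
import json


def resolve_path(value, path):
    """Walk a slash-separated path such as 'res/1/name' through parsed JSON.

    Hypothetical helper for illustration only; escape sequences are not handled.
    """
    if not path:
        return value  # an empty path refers to the whole document
    for token in path.split("/"):
        if isinstance(value, list):
            value = value[int(token)]  # list steps are integer indexes
        else:
            value = value[token]       # object steps are key lookups
    return value


doc = json.loads('{"res": [{"name": "first"}, {"name": "second"}]}')
assert resolve_path(doc, "res/1/name") == "second"
```

Each token either indexes a list or looks up a key, which is why the bracket form and the path form in the example above return the same value.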

Re-use the parser

One of the easiest performance gains if you're working on many documents is to re-use the parser.

import simdjson
parser = simdjson.Parser()

for i in range(0, 100):
    doc = parser.parse(b'{"a": "b"}')

This will drastically reduce the number of allocations being made, as it will reuse the existing buffer when possible. If it's too small, it'll grow to fit.

Performance Considerations

The actual parsing of a document is a small fraction (~5%) of the total time spent bringing a JSON document into CPython. However, even when the entire document is brought into Python, pysimdjson will almost always be faster than or equivalent to other high-speed Python libraries.

There are two things to keep in mind when trying to get the best performance:

  1. Do you really need the entire document? If you have a JSON document with thousands of keys but just need to check if the "published" key is True, use the JSON pointer interface to pull only a single field into Python.
  2. There is significant overhead in calling a C++ function from Python. Minimizing the number of function calls can offer significant speedups in some use cases.
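Both points can be combined when scanning many documents for a single field. The sketch below is an assumption-laden illustration rather than a recommended recipe: count_published and the sample documents are hypothetical, it uses the proxy lookup (doc["published"]) instead of at(), and it falls back to the built-in json module when pysimdjson is not importable so that it runs anywhere.

```python
import json

try:
    import simdjson
    _parser = simdjson.Parser()  # created once and reused for every document
except ImportError:
    _parser = None               # fall back to the standard library below


def count_published(raw_documents):
    """Count documents whose top-level "published" field is true.

    Hypothetical helper for illustration only.
    """
    count = 0
    for raw in raw_documents:
        if _parser is not None:
            # Only this one field is converted into a Python object.
            published = _parser.parse(raw)["published"]
        else:
            # Fallback: loads the whole document, so no speedup here.
            published = json.loads(raw)["published"]
        if published:
            count += 1
    return count


docs = [
    b'{"published": true, "title": "a"}',
    b'{"published": false, "title": "b"}',
    b'{"published": true, "title": "c"}',
]
print(count_published(docs))
```

The loop makes one parse call and one lookup per document, which keeps the number of Python-to-C++ calls small while never materializing the fields it does not need.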


