
More JSON Tools!
================

This set of modules solves four problems:

- We want to iterate over massive JSON easily (``mo_json.stream``)
- We want a bijection between strictly typed JSON and dynamically typed JSON (``typed_encoder``)
- We want a flexible JSON parser that handles comments and other non-standard forms
- JSON encoding is slow (``mo_json.encode``)

Running tests
-------------

::

    pip install -r tests/requirements.txt
    set PYTHONPATH=.
    python.exe -m unittest discover tests

Module Details
--------------

Method ``mo_json.value2json()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Convert a ``dict``, ``list``, or primitive value to a UTF-8 encoded JSON
string.

Method ``mo_json.json2value()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Convert a UTF-8 encoded JSON string to a data structure.

Method ``mo_json.scrub()``
~~~~~~~~~~~~~~~~~~~~~~~~~~

Remove or convert the objects in a structure that are not JSON-izable.
It is faster to ``scrub`` a structure and then use the default (C-based)
Python encoder than to supply a ``default`` serializer, which forces the
use of an interpreted Python encoder.
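As a rough illustration of the scrub-then-encode approach, here is a sketch that uses only the standard library. The name ``scrub`` mirrors the documented method, but the set of handled types (``Decimal``, dates, sets, NaN) and how each one is converted are assumptions for illustration, not the actual rules of ``mo_json.scrub()``.

```python
import datetime
import decimal
import json
import math


def scrub(value):
    """Recursively replace values the stdlib encoder cannot handle.

    Hypothetical sketch: the types handled and their conversions are
    assumptions, not the exact rules of mo_json.scrub().
    """
    if value is None or isinstance(value, (bool, int, str)):
        return value
    if isinstance(value, float):
        # NaN and infinities are not valid JSON; map them to null
        return None if math.isnan(value) or math.isinf(value) else value
    if isinstance(value, decimal.Decimal):
        return float(value)
    if isinstance(value, datetime.date):  # also covers datetime.datetime
        return value.isoformat()
    if isinstance(value, dict):
        return {str(k): scrub(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [scrub(v) for v in value]
    return str(value)  # last resort: fall back to the string form


data = {"when": datetime.date(2016, 3, 1), "price": decimal.Decimal("9.50"), "tags": {"x"}}
# Scrub once, then let the stdlib encoder (C-accelerated in CPython) do the work
print(json.dumps(scrub(data), sort_keys=True))
# {"price": 9.5, "tags": ["x"], "when": "2016-03-01"}
```

The conversion happens in one Python pass over the structure, after which the encoder never has to call back into Python code.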

--------------

Module ``mo_json.stream``
~~~~~~~~~~~~~~~~~~~~~~~~~

A module that supports queries over very large JSON strings. The overall
objective is to make a large JSON document appear like a hierarchical
database, where arrays at any depth can be queried like tables.

Limitations
^^^^^^^^^^^

This is not a generic streaming JSON parser. It is only intended to
break down the top-level array or object to reduce memory usage.

- **Array values must be the last object property** - If you query into
  a nested array, all sibling properties found after that array must be
  ignored (they must not be in the ``expected_vars``). The code will
  raise an exception if it cannot extract all expected variables.

--------------

Method ``mo_json.stream.parse()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Returns an iterator over all objects found in the JSON stream.

**Parameters:**

- **json** - a parameter-less function that, when called, returns some
  number of bytes from the JSON stream. It can also be a string.
- **path** - a dot-delimited string specifying the path to the nested
  JSON. Use ``"."`` if your JSON starts with ``[`` and is a list. The
  path can also be a list of paths, to iterate over nested arrays, or
  ``{"items": "."}``, to iterate over the name/value pairs of a large
  top-level object.
- **expected\_vars** - a list of strings specifying the full property
  names required (all other properties are ignored)

Common Usage
^^^^^^^^^^^^

The most common use of ``parse()`` is to iterate over all the objects in
a large top-level array:

::

    parse(json, path=".", expected_vars=["."])

For example, given the following JSON:

::

    [
        {"a": 1},
        {"a": 2},
        {"a": 3},
        {"a": 4}
    ]

``parse()`` returns a generator that yields

::

    {"a": 1}
    {"a": 2}
    {"a": 3}
    {"a": 4}
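The mechanics of iterating over a top-level array without decoding it all at once can be sketched with ``json.JSONDecoder.raw_decode`` from the standard library. This is not how ``mo_json.stream`` is implemented: ``stream_array`` is a hypothetical name, and it only covers the ``path="."`` case with no ``expected_vars`` filtering. It does follow the documented convention that the input is a parameter-less function returning successive chunks of the stream.

```python
import json

WHITESPACE = " \t\r\n"


def stream_array(read):
    """Yield each element of a top-level JSON array, reading text incrementally.

    `read` is a parameter-less function returning the next chunk of text,
    or "" once the stream is exhausted. Only the unconsumed part of the
    input is buffered, so roughly one element at a time is held in memory.
    """
    decoder = json.JSONDecoder()
    buf, eof = "", False

    def more():
        # Append the next chunk to the buffer; return False at end of stream
        nonlocal buf, eof
        chunk = "" if eof else read()
        if not chunk:
            eof = True
            return False
        buf += chunk
        return True

    def peek():
        # Drop leading whitespace and return the next character ("" at end)
        nonlocal buf
        while True:
            buf = buf.lstrip(WHITESPACE)
            if buf or not more():
                return buf[:1]

    if peek() != "[":
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    if peek() == "]":
        return
    while True:
        peek()
        # A value that ends exactly at the end of the buffer may be truncated
        # (e.g. "12" of "123"), so read more input before trusting it
        while True:
            try:
                value, end = decoder.raw_decode(buf)
                if end < len(buf) or eof:
                    break
            except json.JSONDecodeError:
                if eof:
                    raise
            more()
        yield value
        buf = buf[end:]
        separator = peek()
        if separator == "]":
            return
        if separator != ",":
            raise ValueError("expected ',' or ']' between array elements")
        buf = buf[1:]


chunks = iter(['[{"a": 1}, {"a"', ': 2},', ' {"a": 3}, {"a": 4}]'])
for row in stream_array(lambda: next(chunks, "")):
    print(row)  # {'a': 1}, then {'a': 2}, {'a': 3}, {'a': 4}
```

Chunk boundaries may fall anywhere, even inside a property name; an element is only yielded once the decoder has seen enough text to be sure it is complete.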

Examples
^^^^^^^^

**Simple Iteration**

::

    json = '{"b": "done", "a": [1, 2, 3]}'
    parse(json, path="a", expected_vars=["a", "b"])

We iterate through the array found on property ``a`` and return both
the ``a`` and ``b`` variables, producing the following values:

::

    {"b": "done", "a": 1}
    {"b": "done", "a": 2}
    {"b": "done", "a": 3}

**Bad - Property follows array**

The same query, but different JSON with ``b`` following ``a``:

::

    json = '{"a": [1, 2, 3], "b": "done"}'
    parse(json, path="a", expected_vars=["a", "b"])

Since property ``b`` follows the array we're iterating over, this will
raise an error.

**Good - No need for following properties**

The same JSON, but different query, which does not require ``b``:

::

    json = '{"a": [1, 2, 3], "b": "done"}'
    parse(json, path="a", expected_vars=["a"])

If we do not require ``b``, then streaming will proceed just fine:

::

    {"a": 1}
    {"a": 2}
    {"a": 3}

**Complex Objects**

This streamer was meant for very long lists of complex objects. Use
dot-delimited naming to refer to the full name of a property:

::

    json = '[{"a": {"b": 1, "c": 2}}, {"a": {"b": 3, "c": 4}}, ...'
    parse(json, path=".", expected_vars=["a.c"])

The dot (``.``) can be used to refer to the top-most array. Notice that
the structure is maintained, but only the required variables are included.

::

    {"a": {"c": 2}}
    {"a": {"c": 4}}
    ...
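The projection described above, keeping only the dot-delimited ``expected_vars`` while preserving the nesting, can be illustrated on records that are already decoded. ``project`` is a hypothetical helper written for this sketch; it reproduces the documented output shape, not the streaming internals of ``mo_json.stream``.

```python
_MISSING = object()


def project(record, expected_vars):
    """Keep only the dot-delimited properties in `expected_vars`, preserving nesting.

    Hypothetical helper: it reproduces the documented output shape on an
    already-decoded record. "." selects the whole record.
    """
    if "." in expected_vars:
        return record
    result = {}
    for var in expected_vars:
        steps = var.split(".")
        # Walk down the source record; skip the variable if any step is absent
        value = record
        for step in steps:
            if not isinstance(value, dict) or step not in value:
                value = _MISSING
                break
            value = value[step]
        if value is _MISSING:
            continue
        # Rebuild the same nesting in the result
        target = result
        for step in steps[:-1]:
            target = target.setdefault(step, {})
        target[steps[-1]] = value
    return result


rows = [{"a": {"b": 1, "c": 2}}, {"a": {"b": 3, "c": 4}}]
for row in rows:
    print(project(row, ["a.c"]))  # {'a': {'c': 2}}, then {'a': {'c': 4}}
```

In this sketch a missing property is simply left out of the result rather than raising an error.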

**Nested Arrays**

Nested array iteration is meant to mimic a left-join from parent to
child table; as such, it includes every record in the parent.

::

    json = """[
        {"o": 1, "a": [{"b": 1}, {"b": 2}, {"b": 3}, {"b": 4}]},
        {"o": 2, "a": {"b": 5}},
        {"o": 3}
    ]"""
    parse(json, path=[".", "a"], expected_vars=["o", "a.b"])

The ``path`` parameter can be a list, which indicates which properties
are expected to hold an array and should be iterated over. Note that if
no array is found, the value is treated like a singleton array, and
records with a missing array still produce a result.

::

    {"o": 1, "a": {"b": 1}}
    {"o": 1, "a": {"b": 2}}
    {"o": 1, "a": {"b": 3}}
    {"o": 1, "a": {"b": 4}}
    {"o": 2, "a": {"b": 5}}
    {"o": 3}
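The left-join semantics above can be sketched for a single nesting level on already-decoded records. ``left_join`` is a hypothetical name, and treating an empty array the same as a missing one is an assumption of this sketch; the text above only covers missing arrays and single values.

```python
def left_join(records, child):
    """Yield one row per element of record[child], keeping childless parents.

    Hypothetical sketch of the documented path=[".", child] behavior for a
    single nesting level, applied to already-decoded records.
    """
    for record in records:
        parent = {k: v for k, v in record.items() if k != child}
        children = record.get(child)
        if children is None or children == []:
            # Left join: a parent without children still produces a row
            # (treating an empty array like a missing one is an assumption)
            yield parent
            continue
        if not isinstance(children, list):
            children = [children]  # a lone value acts like a singleton array
        for element in children:
            yield {**parent, child: element}


data = [
    {"o": 1, "a": [{"b": 1}, {"b": 2}, {"b": 3}, {"b": 4}]},
    {"o": 2, "a": {"b": 5}},
    {"o": 3},
]
for row in left_join(data, "a"):
    print(row)
```

Each parent property is copied into every child row, which is what lets a nested array be queried like a flat table.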

**Large top-level objects**

Some JSON is a single large object, rather than an array of objects. In
these cases, you can use the ``items`` operator to iterate through all
name/value pairs of an object:

::

    json = """{
        "a": "test",
        "b": 2,
        "c": [1, 2]
    }"""
    parse(json, {"items": "."}, {"name", "value"})

produces an iterator of

::

    {"name": "a", "value": "test"}
    {"name": "b", "value": 2}
    {"name": "c", "value": [1, 2]}
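Iterating the name/value pairs of a large object can be approximated with ``json.JSONDecoder.raw_decode``, decoding each property value separately so that no single ``dict`` for the whole object is built. ``iterate_items`` is a hypothetical name, and for brevity this sketch takes the whole text as a string rather than a chunk-reading function.

```python
import json

_WS = " \t\r\n"


def iterate_items(text):
    """Yield {"name": ..., "value": ...} for each property of a top-level object.

    Hypothetical sketch: each value is decoded on its own with raw_decode,
    so the full object is never materialized as one dict.
    """
    decoder = json.JSONDecoder()

    def skip(pos):
        # Advance past whitespace
        while pos < len(text) and text[pos] in _WS:
            pos += 1
        return pos

    pos = skip(0)
    if text[pos:pos + 1] != "{":
        raise ValueError("expected a top-level JSON object")
    pos = skip(pos + 1)
    if text[pos:pos + 1] == "}":
        return
    while True:
        name, pos = decoder.raw_decode(text, pos)  # property names are JSON strings
        if not isinstance(name, str):
            raise ValueError("expected a property name")
        pos = skip(pos)
        if text[pos:pos + 1] != ":":
            raise ValueError("expected ':' after property name")
        value, pos = decoder.raw_decode(text, skip(pos + 1))
        yield {"name": name, "value": value}
        pos = skip(pos)
        if text[pos:pos + 1] == "}":
            return
        if text[pos:pos + 1] != ",":
            raise ValueError("expected ',' or '}' between properties")
        pos = skip(pos + 1)


for pair in iterate_items('{"a": "test", "b": 2, "c": [1, 2]}'):
    print(pair)
```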

--------------

Module ``typed_encoder``
~~~~~~~~~~~~~~~~~~~~~~~~

One reason that NoSQL document stores are wonderful is that their schema
can automatically expand to accept new properties. Unfortunately, this
flexibility is not limitless; a string assigned to a property prevents
an object from being assigned to the same property, and vice versa. This
flexibility is under attack by the strict-typing zealots who, in their
self-righteous delusion, believe explicit types are better. They make
the lives of humans worse, as we are forced to toil over endless schema
modifications.

This module translates JSON documents into "typed" form, which allows
document containers to store both objects and primitives in the same
property. This also enables the storage of values with no containing
object!

The typed JSON has a different form than the original, and queries into
the document store must take this into account. This conversion is
intended to be hidden behind a query abstraction layer that can
understand this format.

How it works
^^^^^^^^^^^^

There are three main conversions:

1. Primitive values are replaced with single-property objects, where the
   property name indicates the data type of the value stored::

       {"a": true} -> {"a": {"~b~": true}}
       {"a": 1   } -> {"a": {"~n~": 1   }}
       {"a": "1" } -> {"a": {"~s~": "1" }}

2. JSON objects get an additional property, ``~e~``, to mark existence.
   This allows us to query for object existence, and to count the number
   of objects::

       {"a": {}} -> {"a": {}, "~e~": 1}

3. JSON arrays are contained in a new object, along with ``~e~`` to
   count the number of elements in the array::

       {"a": [1, 2, 3]} -> {"a": {
           "~e~": 3,
           "~N~": [
               {"~n~": 1},
               {"~n~": 2},
               {"~n~": 3}
           ]
       }}

Please notice that the sum of ``a.~e~`` works for both objects and
arrays, letting us interpret sub-objects as single-value nested object
arrays.
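The three conversions can be sketched as a recursive function, together with an inverse that illustrates the bijection mentioned at the top of this page. ``typed`` and ``untyped`` are hypothetical names; the marker strings come from the examples above, while leaving nulls untyped and raising on other types are assumptions, not documented behavior of ``typed_encoder``.

```python
import json

# Type markers as they appear in the examples above
BOOLEAN, NUMBER, STRING, EXISTS, NESTED = "~b~", "~n~", "~s~", "~e~", "~N~"


def typed(value):
    """Apply the three conversions recursively (sketch, not mo_json's code)."""
    if value is None:
        return None  # assumption: nulls are left untyped
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {BOOLEAN: value}
    if isinstance(value, (int, float)):
        return {NUMBER: value}
    if isinstance(value, str):
        return {STRING: value}
    if isinstance(value, dict):
        result = {k: typed(v) for k, v in value.items()}
        result[EXISTS] = 1  # rule 2: mark that this object exists
        return result
    if isinstance(value, list):
        # rule 3: wrap the array and count its elements
        return {EXISTS: len(value), NESTED: [typed(v) for v in value]}
    raise TypeError(f"not JSON-izable: {type(value).__name__}")


def untyped(value):
    """Invert typed(); assumes no original key collides with a marker."""
    if not isinstance(value, dict):
        return value
    if NESTED in value:
        return [untyped(v) for v in value[NESTED]]
    for marker in (BOOLEAN, NUMBER, STRING):
        if marker in value:
            return value[marker]
    return {k: untyped(v) for k, v in value.items() if k != EXISTS}


print(json.dumps(typed({"a": [1, 2, 3]})))
# {"a": {"~e~": 3, "~N~": [{"~n~": 1}, {"~n~": 2}, {"~n~": 3}]}, "~e~": 1}
```

Because rule 2 also applies to the outermost object, the result carries a top-level ``"~e~": 1`` that the isolated example for rule 3 omits.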

Function ``typed_encode()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Accepts a ``dict``, ``list``, or primitive value, and generates the
typed JSON that can be inserted into a document store.

Function ``json2typed()``
~~~~~~~~~~~~~~~~~~~~~~~~~

Converts an existing JSON unicode string into the equivalent typed JSON
unicode string.

--------------

Module ``mo_json.encode``
~~~~~~~~~~~~~~~~~~~~~~~~~

Function: ``mo_json.encode.json_encoder()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--------------

**Update Mar 2016** - *PyPy version 5.x appears to have improved C
integration to the point that the C library callbacks are no longer a
significant overhead: this pure Python JSON encoder is no longer faster
than a compound C/Python solution.*

Fast JSON encoder used in ``convert.value2json()`` when running in PyPy.
Run the
`speedtest <https://github.com/klahnakoski/pyLibrary/blob/dev/tests/speedtest_json.py>`__
to compare it with the default implementation and ``ujson``.
