
More JSON Tools!


This set of modules solves four problems:

  • We want to iterate over massive JSON easily (mo_json.stream)

  • We want a bijection between strictly typed JSON and dynamically typed JSON (typed_encoder)

  • We want a flexible JSON parser that handles comments and other non-standard forms

  • JSON encoding is slow (mo_json.encode)

Module mo_json.stream

A module supporting queries over very large JSON strings. The overall objective is to make a large JSON document appear like a hierarchical database, where arrays at any depth can be queried like tables.

Limitations

This is not a generic streaming JSON parser. This module has two main restrictions:

  1. Objects are not streamed: all objects reside in memory. Large objects with a multitude of properties may cause problems, and property names should be known at query time. If you must serialize large objects, use a list of name/value pairs, [{"name": <name>, "value": <value>}], instead of the {<name>: <value>} format. This format is easier to query, and gentler on the various document stores that you may put this data into.

  2. Array values must be the last object property: if you query into a nested array, all sibling properties that follow that array must be ignored (they must not be in expected_vars). Otherwise, those arrays will not benefit from streaming, and will reside in memory.
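Following the advice in restriction 1, a large {<name>: <value>} object can be reshaped into a list of name/value pairs before it is serialized. This is a minimal standard-library sketch; the helper name to_name_value_pairs is hypothetical and not part of mo_json:

```python
import json


def to_name_value_pairs(obj):
    """Hypothetical helper: turn {name: value} into [{"name": name, "value": value}, ...]."""
    # Each property becomes its own small object, so the result is an array
    # that a streaming reader can walk one element at a time.
    return [{"name": name, "value": value} for name, value in obj.items()]


big = {"alpha": 1, "beta": {"x": 2}}
print(json.dumps(to_name_value_pairs(big)))
# [{"name": "alpha", "value": 1}, {"name": "beta", "value": {"x": 2}}]
```

Because the many properties now live in an array of small objects, the data should fit the "arrays can be queried like tables" model rather than the "objects reside in memory" limitation.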

Function mo_json.stream.parse()

Returns an iterator over all objects found in the JSON stream.

Parameters:

  • json - a parameterless function that, when called, returns the next chunk of bytes from the JSON stream. It can also be a string.

  • path - a string, or a list of strings, specifying the nested JSON path(s) to iterate over. Use "." if your JSON starts with [ (that is, the document is a list).

  • expected_vars - a list of strings specifying the full, dot-delimited property names required (all other properties are ignored).
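A file or other binary stream can be adapted into the parameterless json argument with functools.partial. This is a minimal sketch: chunk_reader is a hypothetical helper, and we assume (not confirmed by this page) that parse() treats an empty result, which read() returns at end of stream, as the end of the JSON:

```python
import functools
import io


def chunk_reader(stream, size=4096):
    """Hypothetical helper: wrap a binary stream as a parameterless chunk function."""
    # functools.partial binds `size`, so each call is simply stream.read(size);
    # once the stream is exhausted, read() returns b"".
    return functools.partial(stream.read, size)


source = io.BytesIO(b'{"b": "done", "a": [1, 2, 3]}')
read = chunk_reader(source, size=8)
print(read())  # b'{"b": "d'
# Per the parameter list above, `read` could then be passed as the json argument:
# parse(read, path="a", expected_vars=["a", "b"])   (not run here)
```

For a real file, open it in binary mode, e.g. chunk_reader(open("big.json", "rb")), where the file name is only a placeholder.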

Examples

Simple Iteration

from mo_json.stream import parse
json = '{"b": "done", "a": [1, 2, 3]}'
parse(json, path="a", expected_vars=["a", "b"])

This iterates through the array found in property a, and returns both the a and b properties, yielding the following values:

{"b": "done", "a": 1}
{"b": "done", "a": 2}
{"b": "done", "a": 3}

Bad - Property follows array

The same query, but different JSON with b following a:

json = '{"a": [1, 2, 3], "b": "done"}'
parse(json, path="a", expected_vars=["a", "b"])

Since property b follows the array we’re iterating over, this will raise an error.

Good - No need for following properties

The same JSON, but different query, which does not require b:

json = '{"a": [1, 2, 3], "b": "done"}'
parse(json, path="a", expected_vars=["a"])

If we do not require b, then streaming will proceed just fine:

{"a": 1}
{"a": 2}
{"a": 3}

Complex Objects

This streamer was meant for very long lists of complex objects. Use dot-delimited naming to refer to the full name of a property:

json = '[{"a": {"b": 1, "c": 2}}, {"a": {"b": 3, "c": 4}}, ...]'
parse(json, path=".", expected_vars=["a.c"])

The dot (.) can be used to refer to the top-most array. Notice the structure is maintained, but only includes the required variables.

{"a": {"c": 2}}
{"a": {"c": 4}}
...
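The projection itself can be modeled in memory. This sketch (select_paths is a hypothetical name, not part of mo_json) shows which properties survive a list of dot-delimited names; it says nothing about how the streamer avoids loading the whole document:

```python
def select_paths(obj, names):
    """Hypothetical in-memory model: keep only the dot-delimited properties in `names`."""
    result = {}
    for name in names:
        steps = name.split(".")
        value = obj
        # Walk down the path; skip this name if any step is missing
        for step in steps:
            if not isinstance(value, dict) or step not in value:
                break
            value = value[step]
        else:
            # Path found: rebuild the same nesting in the result
            target = result
            for step in steps[:-1]:
                target = target.setdefault(step, {})
            target[steps[-1]] = value
    return result


rows = [{"a": {"b": 1, "c": 2}}, {"a": {"b": 3, "c": 4}}]
for row in rows:
    print(select_paths(row, ["a.c"]))
# {'a': {'c': 2}}
# {'a': {'c': 4}}
```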

Nested Arrays

Nested array iteration is meant to mimic a left-join from parent to child table; as such, it includes every record in the parent.

json = '''[
    {"o": 1, "a": [{"b": 1}, {"b": 2}, {"b": 3}, {"b": 4}]},
    {"o": 2, "a": {"b": 5}},
    {"o": 3}
]'''
parse(json, path=[".", "a"], expected_vars=["o", "a.b"])

The path parameter can be a list, which indicates the properties expected to hold an array, and iterates over them. Note that a value that is not an array is treated like a singleton array, and a missing array still produces a result.

{"o": 1, "a": {"b": 1}}
{"o": 1, "a": {"b": 2}}
{"o": 1, "a": {"b": 3}}
{"o": 1, "a": {"b": 4}}
{"o": 2, "a": {"b": 5}}
{"o": 3}
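To make the left-join semantics concrete, here is an in-memory Python model of what the nested-array query above yields. It is a hypothetical sketch (left_join_rows is not a mo_json function), not the streaming implementation: it loads everything into memory, handles a single nested array with one-level child names, and assumes an empty array behaves like a missing one:

```python
def left_join_rows(records, child, fields):
    """Hypothetical in-memory model of parse(..., path=[".", child], ...).

    `fields` mixes parent names ("o") with dot-delimited child names ("a.b").
    """
    parent_fields = [f for f in fields if "." not in f]
    child_fields = [f.split(".", 1)[1] for f in fields if f.startswith(child + ".")]
    for record in records:
        parent = {f: record[f] for f in parent_fields if f in record}
        value = record.get(child)
        # Assumption: a missing, null or empty array still yields the parent row (left join)
        if value is None or value == []:
            yield parent
            continue
        # A non-array value is treated like a singleton array
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            row = dict(parent)
            row[child] = {f: element[f] for f in child_fields if f in element}
            yield row


records = [
    {"o": 1, "a": [{"b": 1}, {"b": 2}, {"b": 3}, {"b": 4}]},
    {"o": 2, "a": {"b": 5}},
    {"o": 3},
]
for row in left_join_rows(records, "a", ["o", "a.b"]):
    print(row)
```

This prints the same six rows listed above, as Python dicts.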

Module typed_encoder

One reason NoSQL document stores are wonderful is that their schema can automatically expand to accept new properties. Unfortunately, this flexibility is not limitless: a string assigned to a property prevents an object from being assigned to the same property, and vice versa. This flexibility is under attack by the strict-typing zealots who, in their self-righteous delusion, believe explicit types are better, but actually make the lives of humans worse with endless schema modifications.

This module translates JSON documents into a “typed” form, which allows document stores to hold both objects and primitives in the same property. It also allows storage of values with no containing object!

How it works

Typed JSON uses $value and $object properties to mark up the original JSON:

  • All JSON objects are annotated with "$object":".", which makes querying object existence (especially the empty object) easier.

  • All primitive values are replaced with an object with a single $value property: So "value" gets mapped to {"$value": "value"}.
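The two rules above can be illustrated with a short recursive function. This is a hypothetical sketch, not mo_json's typed_encode(): the name typed_sketch is invented here, and because the rules do not say how arrays are handled, the sketch simply assumes each array element is encoded in turn:

```python
import json


def typed_sketch(value):
    """Hypothetical illustration of the two typed-JSON rules; not mo_json's code."""
    if isinstance(value, dict):
        # Rule 1: every object is annotated with "$object": "."
        out = {"$object": "."}
        for key, item in value.items():
            out[key] = typed_sketch(item)
        return out
    if isinstance(value, list):
        # Assumption: arrays are not covered by the rules; encode each element
        return [typed_sketch(item) for item in value]
    # Rule 2: every primitive is wrapped as {"$value": ...}
    return {"$value": value}


print(json.dumps(typed_sketch({"a": "value", "b": {}}), sort_keys=True))
# {"$object": ".", "a": {"$value": "value"}, "b": {"$object": "."}}
```

Note how the empty object b still carries the $object marker, which is what makes its existence easy to query.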

Of course, the typed JSON has a different form than the original, and queries into the document store must take this into account. Fortunately, typed JSON is intended to be hidden behind a query abstraction layer.
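Because the mapping is reversible, an abstraction layer can strip the markup again before handing results back. This hypothetical decoder (untype_sketch is not part of mo_json) inverts the two rules, again assuming arrays simply hold typed elements:

```python
def untype_sketch(typed):
    """Hypothetical inverse of the typed-JSON rules; not mo_json's code."""
    if isinstance(typed, list):
        # Assumption: arrays hold typed elements, decoded one by one
        return [untype_sketch(item) for item in typed]
    if isinstance(typed, dict):
        if "$value" in typed:
            # A wrapped primitive: unwrap it
            return typed["$value"]
        # A typed object: drop the "$object" marker, decode the other properties
        return {k: untype_sketch(v) for k, v in typed.items() if k != "$object"}
    # Bare primitives never appear in typed JSON
    raise ValueError("not typed JSON: %r" % (typed,))


doc = {"$object": ".", "name": {"$value": "value"}, "empty": {"$object": "."}}
print(untype_sketch(doc))  # {'name': 'value', 'empty': {}}
```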

Function typed_encode()

Accepts a dict, list, or primitive value, and generates the typed JSON that can be inserted into a document store.

Function json2typed()

Converts an existing JSON unicode string into the equivalent typed JSON unicode string.


Also see the JSON Reference draft: http://tools.ietf.org/id/draft-pbryan-zyp-json-ref-03.html

Module mo_json.encode

Function: mo_json.encode.json_encoder()

Fast JSON encoder used by convert.value2json() when running in PyPy. Run the speedtest to compare it with the default implementation and ujson.

Update Mar 2016: PyPy 5.x appears to have improved C integration to the point that C library callbacks are no longer a significant overhead, so this pure-Python JSON encoder is no longer faster than a compound C/Python solution.
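The speedtest itself is not reproduced on this page. As a hedged stand-in, this standard-library harness uses timeit to compare json.dumps with ujson.dumps when ujson is installed; compare_encoders is a hypothetical name, and mo_json's own encoder is deliberately left out because its call signature is not documented here:

```python
import json
import timeit


def compare_encoders(value, encoders, number=1000):
    """Hypothetical harness: seconds taken by each encoder for `number` calls on `value`."""
    return {
        name: timeit.timeit(lambda enc=encode: enc(value), number=number)
        for name, encode in encoders.items()
    }


encoders = {"json.dumps": json.dumps}
try:
    import ujson  # optional third-party encoder mentioned above
    encoders["ujson.dumps"] = ujson.dumps
except ImportError:
    pass
# Add mo_json's encoder here once you have confirmed its import path and signature.

sample = {"a": list(range(100)), "b": {"c": "text", "d": 1.5}}
timings = compare_encoders(sample, encoders)
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print("%-12s %.4fs" % (name, seconds))
```

Timings vary by interpreter, so run it under both CPython and PyPy to see the effect the update note describes.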


Download files


Source Distribution

mo-json-1.0.17085.zip (28.0 kB)

Built Distribution

mo_json-1.0.17085-py2.7.egg (20.0 kB)

File details

Details for the file mo-json-1.0.17085.zip.

File metadata

  • Download URL: mo-json-1.0.17085.zip
  • Size: 28.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for mo-json-1.0.17085.zip:

  • SHA256: 187f72ae28370f232f1e78a9dca81361512924b8927b451fd90f6634ef3683ec
  • MD5: 8ecc1439c82fbd994a870b0c938d2e4d
  • BLAKE2b-256: 7205aef88cb2e92b65a45dd2bd2130fc598ebf68806102fef2ea3fd5f91f272a


File details

Details for the file mo_json-1.0.17085-py2.7.egg.

File hashes

Hashes for mo_json-1.0.17085-py2.7.egg:

  • SHA256: a02a588bdc531433acdc6fb45c11cbd5dd09375e818318c5cf70be9a7e16656a
  • MD5: 7af910d3a0b363b73532244b329fbb79
  • BLAKE2b-256: 7237bdca72bb09b23b2cc8cb0b69fcc5556c84cdf273b822315c3673a76c9ba9

