More JSON Tools!
Project description
More JSON Tools!
================
This set of modules solves three problems:
- JSON encoding is slow (``mo_json.encode``)
- We want to iterate over massive JSON easily (``mo_json.stream``)
- A bi-jection between strictly typed JSON, and dynamic typed JSON.
Module ``mo_json.encode``
=========================
Function: ``mo_json.encode.json_encoder()``
-------------------------------------------
**Update Mar2016 - PyPy version 5.x appears to have improved C
integration to the point that the C library callbacks are no longer a
significant overhead: This pure Python JSON encoder is no longer faster
than a compound C/Python solution.**
Fast JSON encoder used in ``convert.value2json()`` when running in Pypy.
Run the
`speedtest <https://github.com/klahnakoski/pyLibrary/blob/dev/tests/speedtest_json.py>`__
to compare with default implementation and ujson
Module ``mo_json.stream``
=========================
A module supporting the implementation of queries over very large JSON
strings. The overall objective is to make a large JSON document appear
like a hierarchical database, where arrays of any depth, can be queried
like tables.
Limitations
~~~~~~~~~~~
This is not a generic streaming JSON parser. This module has two main
restrictions:
1. **Objects are not streamed** - All objects will reside in memory.
Large objects, with a multitude of properties, may cause problems.
Property names should be known at query time. If you must serialize
large objects; instead of ``{<name>: <value>}`` format, try a list of
name/value pairs instead: ``[{"name": <name>, "value": <value>}]``
This format is easier to query, and gentler on the various document
stores that you may put this data into.
2. **Array values must be the last object property** - If you query into
a nested array, all sibling properties found after that array must be
ignored (must not be in the ``expected_vars``). If not, then those
arrays will not benefit from streaming, and will reside in memory.
Function ``mo_json.stream.parse()``
-----------------------------------
Will return an iterator over all objects found in the JSON stream.
**Parameters:**
- **json** - a parameter-less function, when called returns some number
of bytes from the JSON stream. It can also be a string.
- **path** - a list of strings specifying the nested JSON paths. Use
``"."`` if your JSON starts with ``[``, and is a list.
- **expected\_vars** - a list of strings specifying the full property
names required (all other properties are ignored)
Examples
~~~~~~~~
**Simple Iteration**
::
json = {"b": "done", "a": [1, 2, 3]}
parse(json, path="a", required_vars=["a", "b"]}
We will iterate through the array found on property ``a``, and return
both ``a`` and ``b`` variables. It will return the following values:
::
{"b": "done", "a": 1}
{"b": "done", "a": 2}
{"b": "done", "a": 3}
**Bad - Property follows array**
The same query, but different JSON with ``b`` following ``a``:
::
json = {"a": [1, 2, 3], "b": "done"}
parse(json, path="a", required_vars=["a", "b"]}
Since property ``b`` follows the array we're iterating over, this will
raise an error.
**Good - No need for following properties**
The same JSON, but different query, which does not require ``b``:
::
json = {"a": [1, 2, 3], "b": "done"}
parse(json, path="a", required_vars=["a"]}
If we do not require ``b``, then streaming will proceed just fine:
::
{"a": 1}
{"a": 2}
{"a": 3}
**Complex Objects**
This streamer was meant for very long lists of complex objects. Use
dot-delimited naming to refer to full name of the property
::
json = [{"a": {"b": 1, "c": 2}}, {"a": {"b": 3, "c": 4}}, ...
parse(json, path=".", required_vars=["a.c"])
The dot (``.``) can be used to refer to the top-most array. Notice the
structure is maintained, but only includes the required variables.
::
{"a": {"c": 2}}
{"a": {"c": 4}}
...
**Nested Arrays**
Nested array iteration is meant to mimic a left-join from parent to
child table; as such, it includes every record in the parent.
::
json = [
{"o": 1: "a": [{"b": 1}: {"b": 2}: {"b": 3}: {"b": 4}]},
{"o": 2: "a": {"b": 5}},
{"o": 3}
]
parse(json, path=[".", "a"], required_vars=["o", "a.b"])
The ``path`` parameter can be a list, which is used to indicate which
properties are expected to have an array, and to iterate over them.
Please notice if no array is found, it is treated like a singleton
array, and missing arrays still produce a result.
::
{"o": 1, "a": {"b": 1}}
{"o": 1, "a": {"b": 2}}
{"o": 1, "a": {"b": 3}}
{"o": 1, "a": {"b": 4}}
{"o": 2, "a": {"b": 5}}
{"o": 3}
Module ``typed_encoder``
========================
One reason NoSQL documents stores are wonderful is the fact their schema
can automatically expand to accept new properties. Unfortunately, this
flexibility is not limitless; A string assigned to property prevents an
object being assigned to the same, or visa-versa. This flexibility is
under attack by the strict-typing zealots, who, in their self righteous
delusion believe explicit types are better, actually make the lives of
humans worse; with endless schema modifications.
This module translates JSON documents into "typed" form; which allows
document containers to store both objects and primitives in the same
property value. This allows storage of values with no containing object!
How it works
~~~~~~~~~~~~
Typed JSON uses ``$value`` and ``$object`` properties to markup the
original JSON:
- All JSON objects are annotated with ``"$object":"."``, which makes
querying object existence (especially the empty object) easier.
- All primitive values are replaced with an object with a single
``$value`` property: So ``"value"`` gets mapped to
``{"$value": "value"}``.
Of course, the typed JSON has a different form than the original, and
queries into the documents store must take this into account.
Fortunately, the use of typed JSON is intended to be hidden behind a
query abstraction layer.
Function ``typed_encode()``
---------------------------
Accepts a ``dict``, ``list``, or primitive value, and generates the
typed JSON that can be inserted into a document store.
Function ``json2typed()``
-------------------------
Converts an existing JSON unicode string and returns the typed JSON
unicode string for the same.
--------------
also see http://tools.ietf.org/id/draft-pbryan-zyp-json-ref-03.html
================
This set of modules solves three problems:
- JSON encoding is slow (``mo_json.encode``)
- We want to iterate over massive JSON easily (``mo_json.stream``)
- A bi-jection between strictly typed JSON, and dynamic typed JSON.
Module ``mo_json.encode``
=========================
Function: ``mo_json.encode.json_encoder()``
-------------------------------------------
**Update Mar2016 - PyPy version 5.x appears to have improved C
integration to the point that the C library callbacks are no longer a
significant overhead: This pure Python JSON encoder is no longer faster
than a compound C/Python solution.**
Fast JSON encoder used in ``convert.value2json()`` when running in Pypy.
Run the
`speedtest <https://github.com/klahnakoski/pyLibrary/blob/dev/tests/speedtest_json.py>`__
to compare with default implementation and ujson
Module ``mo_json.stream``
=========================
A module supporting the implementation of queries over very large JSON
strings. The overall objective is to make a large JSON document appear
like a hierarchical database, where arrays of any depth, can be queried
like tables.
Limitations
~~~~~~~~~~~
This is not a generic streaming JSON parser. This module has two main
restrictions:
1. **Objects are not streamed** - All objects will reside in memory.
Large objects, with a multitude of properties, may cause problems.
Property names should be known at query time. If you must serialize
large objects; instead of ``{<name>: <value>}`` format, try a list of
name/value pairs instead: ``[{"name": <name>, "value": <value>}]``
This format is easier to query, and gentler on the various document
stores that you may put this data into.
2. **Array values must be the last object property** - If you query into
a nested array, all sibling properties found after that array must be
ignored (must not be in the ``expected_vars``). If not, then those
arrays will not benefit from streaming, and will reside in memory.
Function ``mo_json.stream.parse()``
-----------------------------------
Will return an iterator over all objects found in the JSON stream.
**Parameters:**
- **json** - a parameter-less function, when called returns some number
of bytes from the JSON stream. It can also be a string.
- **path** - a list of strings specifying the nested JSON paths. Use
``"."`` if your JSON starts with ``[``, and is a list.
- **expected\_vars** - a list of strings specifying the full property
names required (all other properties are ignored)
Examples
~~~~~~~~
**Simple Iteration**
::
json = {"b": "done", "a": [1, 2, 3]}
parse(json, path="a", required_vars=["a", "b"]}
We will iterate through the array found on property ``a``, and return
both ``a`` and ``b`` variables. It will return the following values:
::
{"b": "done", "a": 1}
{"b": "done", "a": 2}
{"b": "done", "a": 3}
**Bad - Property follows array**
The same query, but different JSON with ``b`` following ``a``:
::
json = {"a": [1, 2, 3], "b": "done"}
parse(json, path="a", required_vars=["a", "b"]}
Since property ``b`` follows the array we're iterating over, this will
raise an error.
**Good - No need for following properties**
The same JSON, but different query, which does not require ``b``:
::
json = {"a": [1, 2, 3], "b": "done"}
parse(json, path="a", required_vars=["a"]}
If we do not require ``b``, then streaming will proceed just fine:
::
{"a": 1}
{"a": 2}
{"a": 3}
**Complex Objects**
This streamer was meant for very long lists of complex objects. Use
dot-delimited naming to refer to full name of the property
::
json = [{"a": {"b": 1, "c": 2}}, {"a": {"b": 3, "c": 4}}, ...
parse(json, path=".", required_vars=["a.c"])
The dot (``.``) can be used to refer to the top-most array. Notice the
structure is maintained, but only includes the required variables.
::
{"a": {"c": 2}}
{"a": {"c": 4}}
...
**Nested Arrays**
Nested array iteration is meant to mimic a left-join from parent to
child table; as such, it includes every record in the parent.
::
json = [
{"o": 1: "a": [{"b": 1}: {"b": 2}: {"b": 3}: {"b": 4}]},
{"o": 2: "a": {"b": 5}},
{"o": 3}
]
parse(json, path=[".", "a"], required_vars=["o", "a.b"])
The ``path`` parameter can be a list, which is used to indicate which
properties are expected to have an array, and to iterate over them.
Please notice if no array is found, it is treated like a singleton
array, and missing arrays still produce a result.
::
{"o": 1, "a": {"b": 1}}
{"o": 1, "a": {"b": 2}}
{"o": 1, "a": {"b": 3}}
{"o": 1, "a": {"b": 4}}
{"o": 2, "a": {"b": 5}}
{"o": 3}
Module ``typed_encoder``
========================
One reason NoSQL documents stores are wonderful is the fact their schema
can automatically expand to accept new properties. Unfortunately, this
flexibility is not limitless; A string assigned to property prevents an
object being assigned to the same, or visa-versa. This flexibility is
under attack by the strict-typing zealots, who, in their self righteous
delusion believe explicit types are better, actually make the lives of
humans worse; with endless schema modifications.
This module translates JSON documents into "typed" form; which allows
document containers to store both objects and primitives in the same
property value. This allows storage of values with no containing object!
How it works
~~~~~~~~~~~~
Typed JSON uses ``$value`` and ``$object`` properties to markup the
original JSON:
- All JSON objects are annotated with ``"$object":"."``, which makes
querying object existence (especially the empty object) easier.
- All primitive values are replaced with an object with a single
``$value`` property: So ``"value"`` gets mapped to
``{"$value": "value"}``.
Of course, the typed JSON has a different form than the original, and
queries into the documents store must take this into account.
Fortunately, the use of typed JSON is intended to be hidden behind a
query abstraction layer.
Function ``typed_encode()``
---------------------------
Accepts a ``dict``, ``list``, or primitive value, and generates the
typed JSON that can be inserted into a document store.
Function ``json2typed()``
-------------------------
Converts an existing JSON unicode string and returns the typed JSON
unicode string for the same.
--------------
also see http://tools.ietf.org/id/draft-pbryan-zyp-json-ref-03.html
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
mo-json-1.0.17035.zip
(27.7 kB
view details)
Built Distribution
mo_json-1.0.17035-py2.7.egg
(43.3 kB
view details)
File details
Details for the file mo-json-1.0.17035.zip
.
File metadata
- Download URL: mo-json-1.0.17035.zip
- Upload date:
- Size: 27.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5dae2825bc6a36eabeae047dbbcee2f51f531ad209fb09728fe4a1690ce35812 |
|
MD5 | c4f1e714e03e636a2ec6737274debbfa |
|
BLAKE2b-256 | 9153799785f7bbf5dba82700d128c614b78425e2c8f3ea3d8d271fb64c4e3b1d |
File details
Details for the file mo_json-1.0.17035-py2.7.egg
.
File metadata
- Download URL: mo_json-1.0.17035-py2.7.egg
- Upload date:
- Size: 43.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 49f2810ee4eb54773ae01cb3282f84c9f42b2277f8768e29b62c932a80d01858 |
|
MD5 | a0fa38caf149f9135d83a368b39f7def |
|
BLAKE2b-256 | 38fe9d587f9ae4d25f11bb5ccf4702b5837f97c4cba5da3df264b0514b14d850 |