
Extra features for Python's JSON: comments, order, numpy, pandas, datetimes, and many more! Simple but customizable.



JSON tricks (python)
---------------------------------------

The `pyjson-tricks` package brings several pieces of functionality to python handling of json files:

1. **Store and load numpy arrays** in human-readable format.
2. **Store and load class instances** both generic and customized.
3. **Store and load date/times** as a dictionary (including timezone).
4. **Preserve map order** ``{}`` using ``OrderedDict``.
5. **Allow for comments** in json files by starting lines with ``#``.
6. Sets, complex numbers, Decimal, Fraction, enums, compression, duplicate keys, ...

The last item includes optional gzip compression and the ability to disallow duplicate keys.

* Code: https://github.com/mverleg/pyjson_tricks
* Documentation: http://json-tricks.readthedocs.org/en/latest/
* PIP: https://pypi.python.org/pypi/json_tricks

The 2.0 series added some of the above features and broke backward compatibility. The version 3.0 series is a more readable rewrite that also makes it easier to combine encoders, again not fully backward compatible.

Several keys of the format ``__keyname__`` have special meanings, and more might be added in future releases.

If you're considering JSON-but-with-comments as a config file format, have a look at HJSON_, it might be more appropriate. For other purposes, keep reading!

Thanks for all the Github stars!

Installation and use
---------------------------------------

You can install using

.. code-block:: bash

    pip install json-tricks  # or e.g. 'json-tricks<3.0' for older versions

Decoding of some data types needs the corresponding package to be installed, e.g. ``numpy`` for arrays, ``pandas`` for dataframes and ``pytz`` for timezone-aware datetimes.

You can import the usual json functions dump(s) and load(s), as well as a separate comment removal function, as follows:

.. code-block:: python

    from json_tricks import dump, dumps, load, loads, strip_comments

The exact signatures of these and other functions are in the documentation_.

``json-tricks`` supports Python 2.7, and Python 3.4 and later, and is automatically tested on 2.7, 3.4, 3.5 and 3.6. Pypy is supported without numpy and pandas.

Preserve type vs use primitive
-------------------------------

By default, types are encoded such that they can be restored to their original type when loaded with ``json-tricks``. Example encodings in this documentation refer to that format.

You can also choose to store things as their closest primitive type (e.g. arrays and sets as lists, decimals as floats). This may be desirable if you don't care about the exact type, or you are loading the json in another language (which doesn't restore python types). It's also smaller.

To forgo metadata and store primitives instead, pass ``primitives=True`` to ``dump(s)``. This is available in version ``3.8`` and later. Example:

.. code-block:: python

    from datetime import datetime
    from decimal import Decimal
    from fractions import Fraction
    from numpy import arange
    from json_tricks import dumps

    data = [
        arange(0, 10, 1, dtype=int).reshape((2, 5)),
        datetime(year=2017, month=1, day=19, hour=23, minute=0, second=0),
        1 + 2j,
        Decimal(42),
        Fraction(1, 3),
        MyTestCls(s='ub', dct={'7': 7}),  # see later
        set(range(7)),
    ]
    # Encode with metadata to preserve types when decoding
    print(dumps(data))

.. code-block:: javascript

    // (comments added and indenting changed)
    [
        // numpy array
        {
            "__ndarray__": [
                [0, 1, 2, 3, 4],
                [5, 6, 7, 8, 9]],
            "dtype": "int64",
            "shape": [2, 5],
            "Corder": true
        },
        // datetime (naive)
        {
            "__datetime__": null,
            "year": 2017,
            "month": 1,
            "day": 19,
            "hour": 23
        },
        // complex number
        {
            "__complex__": [1.0, 2.0]
        },
        // decimal & fraction
        {
            "__decimal__": "42"
        },
        {
            "__fraction__": true,
            "numerator": 1,
            "denominator": 3
        },
        // class instance
        {
            "__instance_type__": [
                "tests.test_class",
                "MyTestCls"
            ],
            "attributes": {
                "s": "ub",
                "dct": {"7": 7}
            }
        },
        // set
        {
            "__set__": [0, 1, 2, 3, 4, 5, 6]
        }
    ]

.. code-block:: python

    # Encode as primitive types; simpler, but loses type information
    print(dumps(data, primitives=True))

.. code-block:: javascript

    // (comments added and indentation changed)
    [
        // numpy array
        [[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]],
        // datetime (naive)
        "2017-01-19T23:00:00",
        // complex number
        [1.0, 2.0],
        // decimal & fraction
        42.0,
        0.3333333333333333,
        // class instance
        {
            "s": "ub",
            "dct": {"7": 7}
        },
        // set
        [0, 1, 2, 3, 4, 5, 6]
    ]

Note that valid json is produced either way: ``json-tricks`` stores meta data as normal json, but other packages probably won't interpret it.
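
The difference between the two modes can be illustrated with the standard library alone. The following is a minimal sketch, not the ``json-tricks`` implementation: the helper names ``encode_typed``, ``encode_primitive`` and ``decode_typed`` are hypothetical, and only the ``__complex__``, ``__decimal__`` and ``__set__`` layouts shown above are imitated.

```python
import json
from decimal import Decimal


def encode_typed(obj):
    """Hypothetical ``default`` hook: keep type information in magic keys."""
    if isinstance(obj, complex):
        return {"__complex__": [obj.real, obj.imag]}
    if isinstance(obj, Decimal):
        return {"__decimal__": str(obj)}
    if isinstance(obj, set):
        return {"__set__": sorted(obj)}
    raise TypeError("cannot serialize %r" % (obj,))


def encode_primitive(obj):
    """Hypothetical ``default`` hook: fall back to the closest primitive."""
    if isinstance(obj, complex):
        return [obj.real, obj.imag]
    if isinstance(obj, Decimal):
        return float(obj)
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError("cannot serialize %r" % (obj,))


def decode_typed(dct):
    """Hypothetical ``object_hook``: rebuild objects from the magic keys."""
    if "__complex__" in dct:
        real, imag = dct["__complex__"]
        return complex(real, imag)
    if "__decimal__" in dct:
        return Decimal(dct["__decimal__"])
    if "__set__" in dct:
        return set(dct["__set__"])
    return dct


data = [1 + 2j, Decimal(42), set(range(7))]

typed_json = json.dumps(data, default=encode_typed)
primitive_json = json.dumps(data, default=encode_primitive)

# Only the typed form can be turned back into the original objects.
restored = json.loads(typed_json, object_hook=decode_typed)
```

Both strings are plain json; the primitive form is shorter, but loading it gives back lists and floats rather than the original types.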

Features
---------------------------------------

Numpy arrays
+++++++++++++++++++++++++++++++++++++++

The array is encoded in a sort-of-readable, very flexible and portable format, like so:

.. code-block:: python

    from numpy import arange, uint8
    from json_tricks import dumps

    arr = arange(0, 10, 1, dtype=uint8).reshape((2, 5))
    print(dumps({'mydata': arr}))

This yields:

.. code-block:: javascript

    {
        "mydata": {
            "dtype": "uint8",
            "shape": [2, 5],
            "Corder": true,
            "__ndarray__": [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
        }
    }

which will be converted back to a numpy array when using ``json_tricks.loads``. Note that the memory order (``Corder``) is only stored in v3.1 and later and for arrays with at least 2 dimensions.

As you've seen, this uses the magic key ``__ndarray__``. Don't use ``__ndarray__`` as a dictionary key unless you're trying to make a numpy array (and know what you're doing).

Numpy scalars are also serialized (v3.5+). They are represented by the closest python primitive type. A special representation was not feasible, because Python's json implementation serializes some numpy types as primitives, without consulting custom encoders. If you want to preserve the exact numpy type, use encode_scalars_inplace_.

**Performance**: this method has slow write times similar to other human-readable formats, although read time is worse than csv. File size (with compression) is high on a relative scale, but it's only around 30% above binary. See this benchmark_ (it's called JSONGzip). A binary alternative `might be added`_, but is not yet available.

This implementation is inspired by an answer by tlausch on stackoverflow_ that you could read for details.

Class instances
+++++++++++++++++++++++++++++++++++++++

``json_tricks`` can serialize class instances.

If the class behaves normally (not generated dynamically, no ``__new__`` or ``__metaclass__`` magic, etc.) *and* all its attributes are serializable, then this should work by default.

.. code-block:: python

    # json_tricks/test_class.py
    class MyTestCls:
        def __init__(self, **kwargs):
            for k, v in kwargs.items():
                setattr(self, k, v)

    cls_instance = MyTestCls(s='ub', dct={'7': 7})

    json = dumps(cls_instance, indent=4)
    cls_instance_again = loads(json)

You'll get your instance back. Here the json looks like this:

.. code-block:: javascript

    {
        "__instance_type__": [
            "json_tricks.test_class",
            "MyTestCls"
        ],
        "attributes": {
            "s": "ub",
            "dct": {
                "7": 7
            }
        }
    }

As you can see, this stores the module and class name. The class must be importable from the same module when decoding (and should not have changed).
If it isn't, you have to manually provide a dictionary to ``cls_lookup_map`` when loading, in which the class name can be looked up. Note that if the class is imported, then ``globals()`` is such a dictionary (so try ``loads(json, cls_lookup_map=globals())``).
Also note that if the class is defined in the 'top' script (that you're calling directly), then this isn't a module and the import part cannot be extracted. Only the class name will be stored; it can then only be deserialized in the same script, or if you provide ``cls_lookup_map``.

Note that this also works with ``slots`` without having to do anything (thanks to ``koffie``), which encodes like this (custom indentation):

.. code-block:: javascript

    {
        "__instance_type__": ["module.path", "ClassName"],
        "slots": {"slotattr": 37},
        "attributes": {"dictattr": 42}
    }

If the instance doesn't serialize automatically, or if you want custom behaviour, then you can implement ``__json_encode__(self)`` and ``__json_decode__(self, **attributes)`` methods, like so:

.. code-block:: python

    class CustomEncodeCls:
        def __init__(self):
            self.relevant = 42
            self.irrelevant = 37

        def __json_encode__(self):
            # should return primitive, serializable types like dict, list, int, string, float...
            return {'relevant': self.relevant}

        def __json_decode__(self, **attrs):
            # should initialize all properties; note that __init__ is not called implicitly
            self.relevant = attrs['relevant']
            self.irrelevant = 12

As you've seen, this uses the magic key ``__instance_type__``. Don't use ``__instance_type__`` as a dictionary key unless you know what you're doing.
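
To make the mechanism concrete, here is a minimal standard-library sketch of the idea described above: store the module and class name next to the attributes, then look the class up by name when decoding. It is not the ``json-tricks`` implementation; ``Point``, ``encode_instance`` and ``make_decoder`` are hypothetical names, and the lookup dictionary only plays the role that ``cls_lookup_map`` has in the text.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def encode_instance(obj):
    """Hypothetical ``default`` hook: record module, class name and attributes."""
    cls = type(obj)
    return {
        "__instance_type__": [cls.__module__, cls.__name__],
        "attributes": dict(vars(obj)),
    }


def make_decoder(lookup):
    """Build an ``object_hook`` that resolves class names through ``lookup``."""
    def decode_instance(dct):
        if "__instance_type__" not in dct:
            return dct
        _module, name = dct["__instance_type__"]
        cls = lookup[name]
        # Create the object without calling __init__, then restore attributes.
        obj = cls.__new__(cls)
        obj.__dict__.update(dct["attributes"])
        return obj
    return decode_instance


text = json.dumps(Point(1, 2), default=encode_instance)
point = json.loads(text, object_hook=make_decoder({"Point": Point}))
```

Because the sketch bypasses ``__init__`` when decoding, any attribute that is not stored in the json has to be set by other means, which is the situation ``__json_decode__`` is meant for.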

Date, time, datetime and timedelta
+++++++++++++++++++++++++++++++++++++++

Date, time, datetime and timedelta objects are stored as dictionaries of "day", "hour", "microsecond" etc. keys, for each nonzero property.

The timezone name is also stored if it is set. You'll need to have ``pytz`` installed to use timezone-aware date/times; it's not needed for naive date/times.

.. code-block:: javascript

    {
        "__datetime__": null,
        "year": 1988,
        "month": 3,
        "day": 15,
        "hour": 8,
        "minute": 3,
        "second": 59,
        "microsecond": 7,
        "tzinfo": "Europe/Amsterdam"
    }

This approach was chosen over timestamps for readability and consistency between date and time, and over a single string to prevent parsing problems and reduce dependencies. Note that if ``primitives=True``, date/times are encoded as ISO 8601, but they won't be restored automatically.

Don't use ``__date__``, ``__time__``, ``__datetime__``, ``__timedelta__`` or ``__tzinfo__`` as dictionary keys unless you know what you're doing, as they have special meaning.
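
The "one key per nonzero property" layout can be sketched with the standard library for naive datetimes. This is only an illustration under that assumption, not the package's own code: ``FIELDS``, ``encode_datetime`` and ``decode_datetime`` are hypothetical names, and timezones are left out.

```python
import json
from datetime import datetime

FIELDS = ("year", "month", "day", "hour", "minute", "second", "microsecond")


def encode_datetime(obj):
    """Hypothetical ``default`` hook: store only the nonzero fields."""
    if isinstance(obj, datetime) and obj.tzinfo is None:
        dct = {"__datetime__": None}
        for name in FIELDS:
            value = getattr(obj, name)
            if value != 0:
                dct[name] = value
        return dct
    raise TypeError("cannot serialize %r" % (obj,))


def decode_datetime(dct):
    """Hypothetical ``object_hook``: fields that were left out default to 0."""
    if "__datetime__" in dct:
        return datetime(**{name: dct.get(name, 0) for name in FIELDS})
    return dct


moment = datetime(year=2017, month=1, day=19, hour=23)
text = json.dumps(moment, default=encode_datetime)
restored = json.loads(text, object_hook=decode_datetime)
```

Year, month and day are never zero, so they are always present; the time fields are only written when they carry information.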

Order
+++++++++++++++++++++++++++++++++++++++

Given an ordered dictionary like this (see the tests for a longer one):

.. code-block:: python

    from collections import OrderedDict

    ordered = OrderedDict((
        ('elephant', None),
        ('chicken', None),
        ('tortoise', None),
    ))

Converting to json and back will preserve the order:

.. code-block:: python

    from json_tricks import dumps, loads
    json = dumps(ordered)
    ordered = loads(json, preserve_order=True)

where ``preserve_order=True`` is added for emphasis; it can be left out since it's the default.

As a note on performance_, both dicts and OrderedDicts have the same scaling for getting and setting items (``O(1)``). In Python versions before 3.5, OrderedDicts were implemented in Python rather than C, so were somewhat slower; since Python 3.5 both are implemented in C. In summary, you should have no scaling problems and probably no performance problems at all, especially for 3.5 and later. In CPython 3.6, dictionaries preserve insertion order as an implementation detail that should not be relied on; since Python 3.7 this ordering is guaranteed by the language, which makes ``OrderedDict`` largely redundant for this purpose.
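
For comparison, the standard library's ``json`` module can be asked to build an ``OrderedDict`` through its ``object_pairs_hook`` argument. The sketch below shows that underlying idea with the standard library only; it is not a statement about how ``json-tricks`` implements ``preserve_order``.

```python
import json
from collections import OrderedDict

text = '{"elephant": null, "chicken": null, "tortoise": null}'

# object_pairs_hook receives the (key, value) pairs in file order.
ordered = json.loads(text, object_pairs_hook=OrderedDict)
keys = list(ordered)
```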

Comments
+++++++++++++++++++++++++++++++++++++++

This package uses ``#`` and ``//`` for comments, which seem to be the most common conventions, though only the latter is valid javascript.

For example, you could call ``loads`` on the following string::

    { # "comment 1
        "hello": "Wor#d", "Bye": "\"M#rk\"", "yes\\\"": 5,# comment" 2
        "quote": "\"th#t's\" what she said", // comment "3"
        "list": [1, 1, "#", "\"", "\\", 8], "dict": {"q": 7} #" comment 4 with quotes
    }
    // comment 5

And it would return the de-commented version:

.. code-block:: javascript

    {
        "hello": "Wor#d", "Bye": "\"M#rk\"", "yes\\\"": 5,
        "quote": "\"th#t's\" what she said",
        "list": [1, 1, "#", "\"", "\\", 8], "dict": {"q": 7}
    }

Since comments aren't stored in the Python representation of the data, loading and then saving a json file will remove the comments (it also likely changes the indentation).

The implementation of comments is not particularly efficient, but it does handle all the special cases I could think of. For a few files you shouldn't notice any performance problems, but if you're reading hundreds of files, then they are presumably computer-generated, and you could consider turning comments off (``ignore_comments=False``).
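
The tricky part of comment removal is that ``#`` and ``//`` must be left alone inside string literals, including strings that contain escaped quotes. The following is a minimal sketch of one way to do that with a small state machine; ``strip_comments_sketch`` is a hypothetical helper written for this example and is not the package's ``strip_comments``.

```python
import json


def strip_comments_sketch(text):
    """Remove ``#`` and ``//`` comments that are outside of string literals."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "#" or text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


commented = r'''{ # "comment 1
  "hello": "Wor#d", "Bye": "\"M#rk\"", # comment" 2
  "quote": "\"th#t's\" what she said" // comment "3"
}
// comment 4'''

data = json.loads(strip_comments_sketch(commented))
```

Scanning character by character like this is simple but slow in pure Python, which is consistent with the advice above to turn comment handling off for large batches of generated files.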

Other features
+++++++++++++++++++++++++++++++++++++++

* Sets are serializable and can be loaded. By default the set json representation is sorted, to have a consistent representation.
* Save and load complex numbers (version 3.2) with ``1+2j`` serializing as ``{'__complex__': [1, 2]}``.
* Save and load ``Decimal`` and ``Fraction`` (including NaN, infinity, -0 for Decimal).
* Save and load ``Enum`` (thanks to ``Jenselme``), either built-in in python3.4+, or with the enum34_ package in earlier versions. ``IntEnum`` needs encode_intenums_inplace_.
* ``json_tricks`` allows for gzip compression using the ``compression=True`` argument (off by default).
* ``json_tricks`` can check for duplicate keys in maps by setting ``allow_duplicates`` to False. These are `kind of allowed`_, but are handled inconsistently between json implementations. In Python, for ``dict`` and ``OrderedDict``, duplicate keys are silently overwritten.
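
The silent overwriting of duplicate keys, and one way to detect it, can be demonstrated with the standard library. This is a sketch of the general technique only; ``reject_duplicates`` is a hypothetical hook, not the code behind ``allow_duplicates``.

```python
import json


def reject_duplicates(pairs):
    """Hypothetical ``object_pairs_hook`` that refuses repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key: %r" % (key,))
        result[key] = value
    return result


text = '{"a": 1, "a": 2}'

# Plain json.loads keeps the last value without any warning.
silently_overwritten = json.loads(text)

try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```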

Usage & contributions
---------------------------------------

Revised BSD License; at your own risk, you can mostly do whatever you want with this code, just don't use my name for promotion and do keep the license file.

Contributions (ideas, issues, pull requests) are welcome!

.. image:: https://travis-ci.org/mverleg/pyjson_tricks.svg?branch=master
    :target: https://travis-ci.org/mverleg/pyjson_tricks

.. _HJSON: https://github.com/hjson/hjson-py
.. _documentation: http://json-tricks.readthedocs.org/en/latest/#main-components
.. _stackoverflow: http://stackoverflow.com/questions/3488934/simplejson-and-numpy-array
.. _performance: http://stackoverflow.com/a/8177061/723090
.. _`kind of allowed`: http://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object
.. _benchmark: https://github.com/mverleg/array_storage_benchmark
.. _`might be added`: https://github.com/mverleg/pyjson_tricks/issues/9
.. _encode_scalars_inplace: https://json-tricks.readthedocs.io/en/latest/#json_tricks.np_utils.encode_scalars_inplace
.. _encode_intenums_inplace: https://json-tricks.readthedocs.io/en/latest/#json_tricks.utils.encode_intenums_inplace
.. _enum34: https://pypi.org/project/enum34/

