Add features to json: en/decoding of numpy arrays, preservation of ordering and ignoring of comments in input
JSON tricks (python)
At this time, the pyjson-tricks package brings three pieces of functionality to python handling of json files:
Store and load numpy arrays in human-readable format.
Preserve map order {} using OrderedDict.
Allow for comments in json files by starting lines with #.
It also allows for gzip compression using the compress=True argument (off by default).
Installation and use
You can install using
pip install json-tricks
If you want to use numpy features, you should install numpy as well. If you don’t, then numpy is not required.
You can import the usual json functions dump(s) and load(s), as well as a separate comment removal function, as follows:
from json_tricks.np import dump, dumps, load, loads, strip_hash_comments
If you do not have numpy and want to use only order preservation and commented json reading, you should import from json_tricks.nonp instead.
The exact signatures of these functions are in the documentation. In many cases, keeping the arguments of the standard json functions but changing the import will be enough to use the extra features.
Features
Numpy arrays
This implementation is mostly based on an answer by tlausch on Stack Overflow, which you can read for details.
The array is encoded in a sort-of-readable format, like so:
from numpy import arange, uint8
from json_tricks.np import dumps
arr = arange(0, 10, 1, dtype=uint8).reshape((2, 5))
print(dumps({'mydata': arr}))
After indenting, this yields:
{
    "mydata": {
        "dtype": "uint8",
        "shape": [2, 5],
        "__ndarray__": [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
    }
}
which will be converted back to a numpy array when using json_tricks.loads.
As you’ve seen, this uses the magic key __ndarray__. Don’t use __ndarray__ as a dictionary key unless you’re trying to make a numpy array (and know what you’re doing).
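To see why the magic key is reserved, here is a minimal sketch of the general mechanism using only the standard json module. It is not the package's actual code: decode_hook and MAGIC_KEY are hypothetical names, and nested tuples stand in for a numpy array so the example has no third-party dependency.

```python
import json

MAGIC_KEY = "__ndarray__"  # same key as in the output above

def decode_hook(dct):
    """Hypothetical hook: replace any dict carrying the magic key
    with a stand-in 'array' (nested tuples instead of a numpy array)."""
    if MAGIC_KEY in dct:
        return tuple(tuple(row) for row in dct[MAGIC_KEY])
    return dct

text = ('{"mydata": {"dtype": "uint8", "shape": [2, 5], '
        '"__ndarray__": [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]}}')
decoded = json.loads(text, object_hook=decode_hook)
print(decoded["mydata"])  # ((0, 1, 2, 3, 4), (5, 6, 7, 8, 9))

# An ordinary dict that happens to use the magic key is converted as well,
# and its other keys are lost.
surprise = json.loads('{"__ndarray__": [[1, 2]], "note": "not an array"}',
                      object_hook=decode_hook)
print(surprise)  # ((1, 2),)
```

The second call shows the hazard: a hook of this kind cannot tell an encoded array from a regular dictionary that uses the same key.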
Order
Given an ordered dictionary like this (see the tests for a longer one):
from collections import OrderedDict
ordered = OrderedDict((
    ('elephant', None),
    ('chicken', None),
    ('tortoise', None),
))
Converting to json and back will preserve the order:
from json_tricks import dumps, loads
json = dumps(ordered, preserve_order=True)
ordered = loads(json, preserve_order=True)
where preserve_order=True is added for emphasis; it can be left out since it’s the default.
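For comparison, the same round trip can be sketched with the standard json module, where order preservation has to be requested explicitly through object_pairs_hook. Whether json_tricks does exactly this internally is an assumption; the sketch only illustrates the behaviour described above.

```python
import json
from collections import OrderedDict

ordered = OrderedDict((
    ('elephant', None),
    ('chicken', None),
    ('tortoise', None),
))

# Encoding walks the OrderedDict in insertion order.
text = json.dumps(ordered)

# Decoding receives the key/value pairs in file order; passing OrderedDict
# as object_pairs_hook keeps that order in the result.
restored = json.loads(text, object_pairs_hook=OrderedDict)
print(list(restored.keys()))  # ['elephant', 'chicken', 'tortoise']
```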
As a note on performance, both dicts and OrderedDicts have the same scaling for getting and setting items (O(1)). In Python versions before 3.5, OrderedDicts were implemented in Python rather than C, so were somewhat slower; since Python 3.5 both are implemented in C. In summary, you should have no scaling problems and probably no performance problems at all, especially for 3.5 and later.
License
Revised BSD License; at your own risk, you can mostly do whatever you want with this code, just don’t use my name for promotion and do keep the license file.
Comments
This package uses # for comments, which seems to be the most common convention. For example, you could call loads on the following string:
And it would return the de-commented version:
Since comments aren’t stored in the Python representation of the data, loading and then saving a json file will remove the comments (it also likely changes the indentation).
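As an illustration of the idea, the following sketch strips # comments with a small hand-written scanner and then parses the result with the standard json module. It is not the package's implementation: strip_hash_comments_sketch is a hypothetical helper, the sample input is made up for this example, and handling of trailing comments after data is an assumption of the sketch.

```python
import json

def strip_hash_comments_sketch(text):
    """Hypothetical helper: drop everything from a '#' to the end of the
    line, unless the '#' is inside a double-quoted JSON string."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        kept = []
        for ch in line:
            if in_string:
                if escaped:
                    escaped = False      # this character was escaped
                elif ch == "\\":
                    escaped = True       # next character is escaped
                elif ch == '"':
                    in_string = False    # closing quote
            elif ch == '"':
                in_string = True         # opening quote
            elif ch == "#":
                break                    # comment starts here
            kept.append(ch)
        cleaned.append("".join(kept).rstrip())
    return "\n".join(cleaned)

commented = '''{ # opening comment
    "hello": "Wor#d",  # trailing comment
    "list": [1, "#", 8]
}
# closing comment'''

data = json.loads(strip_hash_comments_sketch(commented))
print(data)  # {'hello': 'Wor#d', 'list': [1, '#', 8]}
```

Writing data back out with dumps produces plain json without any of the comments, which matches the round-trip behaviour described above.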
There is already a commentjson package for Python. However, as of November 2015, it doesn’t support Python 3.x, and a pull request to add support has been left pending for five months.
The implementation of comments is not particularly efficient, but it does handle all the special cases I tested. For a few files you shouldn’t notice any performance problems, but if you’re reading hundreds of files, then they are presumably computer-generated, and you could consider turning comments off (strip_comments=False).