Extending the Python json package functionality

JSON Extended

A module to extend the Python json package functionality:

- Treat a directory structure like a nested dictionary:
  - lightweight plugin system: define bespoke classes for parsing different file extensions (out of the box: .json, .csv, .hdf5) and for encoding/decoding objects
  - lazy loading: read files only when they are indexed into
  - tab completion: access keys with tab completion, for quick exploration of data
- Manipulation of nested dictionaries:
  - enhanced pretty printer
  - JavaScript-rendered, expandable tree in the Jupyter Notebook
  - functions including filter, merge, flatten, unflatten and diff
  - output to a directory structure (of n folder levels)
- On-disk indexing option for large JSON files (using the ijson package)
- Units schema concept to apply and convert physical units (using the pint package)

Documentation: https://jsonextended.readthedocs.io

Installation

From Conda (recommended):

conda install -c conda-forge jsonextended

From PyPI:

pip install jsonextended

jsonextended has no import dependencies on Python 3.x, and only pathlib2 on Python 2.7. However, for full functionality, it is advisable to install the following packages:

conda install -c conda-forge ijson numpy pint h5py pandas

Basic Example

from jsonextended import edict, plugins, example_mockpaths

Take a directory structure, potentially containing multiple file types:

datadir = example_mockpaths.directory1
print(datadir.to_string(indentlvl=3,file_content=True))

Folder("dir1")
   File("file1.json") Contents:
    {"key2": {"key3": 4, "key4": 5}, "key1": [1, 2, 3]}
   Folder("subdir1")
     File("file1.csv") Contents:
       # a csv file
      header1,header2,header3
      val1,val2,val3
      val4,val5,val6
      val7,val8,val9
     File("file1.literal.csv") Contents:
       # a csv file with numbers
      header1,header2,header3
      1,1.1,string1
      2,2.2,string2
      3,3.3,string3
   Folder("subdir2")
     Folder("subsubdir21")
       File("file1.keypair") Contents:
         # a key-pair file
        key1 val1
        key2 val2
        key3 val3
        key4 val4
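The idea of treating a directory as a nested dictionary can be sketched with the standard library alone. This is a minimal illustration, not jsonextended's implementation: the helper name `dir_to_nested` is hypothetical, and file contents are deliberately left unread (the leaves are `pathlib.Path` objects), which is the natural starting point for lazy loading.

```python
import pathlib
import tempfile


def dir_to_nested(path):
    """Map a directory tree to a nested dict (hypothetical helper).

    Folders become sub-dicts; files become pathlib.Path leaves that
    have not been opened or read.
    """
    root = pathlib.Path(path)
    tree = {}
    for child in sorted(root.iterdir()):
        if child.is_dir():
            tree[child.name] = dir_to_nested(child)  # recurse into folders
        else:
            tree[child.name] = child  # leave files unread
    return tree


# Usage: build a small throwaway tree mirroring part of dir1 above
with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / "subdir1").mkdir()
    (base / "file1.json").write_text('{"key1": [1, 2, 3]}')
    (base / "subdir1" / "file1.csv").write_text("header1\nval1\n")
    tree = dir_to_nested(base)
    print(sorted(tree))             # ['file1.json', 'subdir1']
    print(sorted(tree["subdir1"]))  # ['file1.csv']
```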

Plugins can be defined for parsing each file type (see Creating Plugins section):

plugins.load_builtin_plugins('parsers')
plugins.view_plugins('parsers')

{'csv.basic': 'read *.csv delimited file with headers to {header:[column_values]}',
 'csv.literal': 'read *.literal.csv delimited files with headers to {header:column_values}, with number strings converted to int/float',
 'hdf5.read': 'read *.hdf5 (in read mode) files using h5py',
 'json.basic': 'read *.json files using json.load',
 'keypair': "read *.keypair, where each line should be; '<key> <pair>'"}
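The behaviour described for the two CSV parsers can be illustrated with the standard csv module. This is a hedged sketch of the described output format, not the built-in plugins' code: `read_csv_columns` and `to_number` are hypothetical names, and skipping lines that start with `#` is an assumption based on the comment lines in the mock files above.

```python
import csv
import io


def to_number(text):
    """Convert a numeric string to int or float, else return it unchanged."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_csv_columns(file_obj, literal=False):
    """Read a delimited file with headers into {header: [column_values]}."""
    # Drop blank lines and '#' comment lines before handing rows to csv
    lines = [ln for ln in file_obj
             if ln.strip() and not ln.lstrip().startswith("#")]
    reader = csv.reader(lines)
    headers = [h.strip() for h in next(reader)]
    columns = {h: [] for h in headers}
    for row in reader:
        for header, value in zip(headers, row):
            value = value.strip()
            columns[header].append(to_number(value) if literal else value)
    return columns


content = """# a csv file with numbers
header1,header2,header3
1,1.1,string1
2,2.2,string2
3,3.3,string3
"""
print(read_csv_columns(io.StringIO(content), literal=True))
# {'header1': [1, 2, 3], 'header2': [1.1, 2.2, 3.3], 'header3': ['string1', 'string2', 'string3']}
```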

LazyLoad then takes a path name, path-like object or dict-like object, and lazily loads each file with a compatible plugin.

lazy = edict.LazyLoad(datadir)
lazy

{file1.json:..,subdir1:..,subdir2:..}

LazyLoad can then be treated like a dictionary, or indexed by tab completion:

list(lazy.keys())

['subdir1', 'subdir2', 'file1.json']

lazy[['file1.json','key1']]

[1, 2, 3]

lazy.subdir1.file1_literal_csv.header2

[1.1, 2.2, 3.3]
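Lazy loading and attribute-style access can be sketched as a small wrapper that calls a loader only the first time a key is accessed. This illustrates the idea under stated assumptions and is not LazyLoad's implementation: `LazyDict` is hypothetical, and the mapping of `file1.literal.csv` to `file1_literal_csv` seen above is assumed here to be a plain `.` to `_` substitution.

```python
class LazyDict:
    """Hypothetical lazy mapping: values are zero-argument loaders,
    called (and cached) on first access."""

    def __init__(self, loaders):
        self._loaders = dict(loaders)
        self._cache = {}

    def __getitem__(self, key):
        # A list/tuple key indexes down through nested levels
        if isinstance(key, (list, tuple)):
            value = self
            for part in key:
                value = value[part]
            return value
        if key not in self._cache:
            self._cache[key] = self._loaders[key]()  # load only now
        return self._cache[key]

    def keys(self):
        return self._loaders.keys()

    def __getattr__(self, name):
        # Map attribute names back to keys, e.g. file1_json -> file1.json
        for key in self._loaders:
            if key.replace(".", "_") == name:
                return self[key]
        raise AttributeError(name)

    def __dir__(self):
        # Listing the mangled keys lets IPython/Jupyter tab-complete them
        extra = {k.replace(".", "_") for k in self._loaders}
        return sorted(set(super().__dir__()) | extra)


calls = []


def load_json():
    calls.append("file1.json")  # record that the "file" was read
    return {"key1": [1, 2, 3]}


lazy = LazyDict({"file1.json": load_json})
print(calls)                         # [] (nothing loaded yet)
print(lazy[["file1.json", "key1"]])  # [1, 2, 3]
print(lazy.file1_json["key1"])       # [1, 2, 3], served from the cache
print(calls)                         # ['file1.json'] (loaded exactly once)
```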

For pretty printing of the dictionary:

edict.pprint(lazy,depth=2)

file1.json:
  key1: [1, 2, 3]
  key2: {...}
subdir1:
  file1.csv: {...}
  file1.literal.csv: {...}
subdir2:
  subsubdir21: {...}
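Depth-limited printing of this kind can be sketched in a few lines: keys are listed with two-space indentation per level, and any non-empty dict below the depth limit collapses to `{...}`. This is a hypothetical approximation of the output format shown above (`format_tree` is not part of edict), and it sorts keys, which is an assumption based on the ordering in the example.

```python
def format_tree(d, depth=None, level=0):
    """Return indented 'key: value' lines for a nested dict, collapsing
    dicts deeper than `depth` levels to '{...}' (hypothetical sketch)."""
    lines = []
    pad = "  " * level
    for key in sorted(d, key=str):
        value = d[key]
        if isinstance(value, dict) and value:
            if depth is not None and level + 1 >= depth:
                lines.append("%s%s: {...}" % (pad, key))  # depth limit hit
            else:
                lines.append("%s%s:" % (pad, key))
                lines.extend(format_tree(value, depth, level + 1))
        else:
            lines.append("%s%s: %s" % (pad, key, value))
    return lines


example = {
    "file1.json": {"key1": [1, 2, 3], "key2": {"key3": 4, "key4": 5}},
    "subdir1": {"file1.csv": {"header1": ["val1"]}},
}
print("\n".join(format_tree(example, depth=2)))
# file1.json:
#   key1: [1, 2, 3]
#   key2: {...}
# subdir1:
#   file1.csv: {...}
```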

Numerous functions exist to manipulate the nested dictionary:

edict.flatten(lazy.subdir1)

{('file1.csv', 'header1'): ['val1', 'val4', 'val7'],
 ('file1.csv', 'header2'): ['val2', 'val5', 'val8'],
 ('file1.csv', 'header3'): ['val3', 'val6', 'val9'],
 ('file1.literal.csv', 'header1'): [1, 2, 3],
 ('file1.literal.csv', 'header2'): [1.1, 2.2, 3.3],
 ('file1.literal.csv', 'header3'): ['string1', 'string2', 'string3']}
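The tuple-keyed output above can be reproduced by a short recursive function, and unflattening simply walks each tuple to rebuild the nesting. A minimal stdlib sketch; `flatten_dict` and `unflatten_dict` are hypothetical stand-ins, and the library functions may accept options this sketch does not.

```python
def flatten_dict(d, parent=()):
    """Flatten a nested dict into {(key, subkey, ...): leaf_value}."""
    flat = {}
    for key, value in d.items():
        path = parent + (key,)
        if isinstance(value, dict) and value:
            flat.update(flatten_dict(value, path))  # descend into sub-dicts
        else:
            flat[path] = value  # leaves (and empty dicts) end a path
    return flat


def unflatten_dict(flat):
    """Invert flatten_dict: rebuild nested dicts from tuple keys."""
    nested = {}
    for path, value in flat.items():
        level = nested
        for key in path[:-1]:
            level = level.setdefault(key, {})
        level[path[-1]] = value
    return nested


subdir1 = {
    "file1.csv": {"header1": ["val1", "val4", "val7"]},
    "file1.literal.csv": {"header2": [1.1, 2.2, 3.3]},
}
flat = flatten_dict(subdir1)
print(flat)
# {('file1.csv', 'header1'): ['val1', 'val4', 'val7'], ('file1.literal.csv', 'header2'): [1.1, 2.2, 3.3]}
assert unflatten_dict(flat) == subdir1  # round trip
```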

LazyLoad passes the plugins.decode function to each parser plugin's read_file method (as the keyword argument object_hook). Therefore, bespoke decoder plugins can be set up for specific dictionary key signatures:

print(example_mockpaths.jsonfile2.to_string())

File("file2.json") Contents:
{"key1":{"_python_set_": [1, 2, 3]},"key2":{"_numpy_ndarray_": {"dtype": "int64", "value": [1, 2, 3]}}}

edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()

{u'key1': {u'_python_set_': [1, 2, 3]},
 u'key2': {u'_numpy_ndarray_': {u'dtype': u'int64', u'value': [1, 2, 3]}}}

plugins.load_builtin_plugins('decoders')
plugins.view_plugins('decoders')

{'decimal.Decimal': 'encode/decode Decimal type',
 'numpy.ndarray': 'encode/decode numpy.ndarray',
 'pint.Quantity': 'encode/decode pint.Quantity object',
 'python.set': 'decode/encode python set'}

dct = edict.LazyLoad(example_mockpaths.jsonfile2).to_dict()
dct

{u'key1': {1, 2, 3}, u'key2': array([1, 2, 3])}
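This decoding builds on the standard json object_hook mechanism: json.load and json.loads call the hook on every decoded JSON object, innermost first, and use its return value in place of the dict. A minimal stdlib sketch of a signature-based hook; the name `decode_hook` is hypothetical and only the `_python_set_` signature from the example is handled.

```python
import json


def decode_hook(dct):
    """Hypothetical object_hook: swap {'_python_set_': [...]} for a set."""
    # Only a dict whose keys exactly match the signature is decoded
    if set(dct) == {"_python_set_"}:
        return set(dct["_python_set_"])
    return dct  # anything else is left untouched


text = '{"key1": {"_python_set_": [1, 2, 3]}, "key2": {"other": 1}}'
print(json.loads(text, object_hook=decode_hook))
# {'key1': {1, 2, 3}, 'key2': {'other': 1}}
```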

This process can be reversed, using encoder plugins:

plugins.load_builtin_plugins('encoders')
plugins.view_plugins('encoders')

{'decimal.Decimal': 'encode/decode Decimal type',
 'numpy.ndarray': 'encode/decode numpy.ndarray',
 'pint.Quantity': 'encode/decode pint.Quantity object',
 'python.set': 'decode/encode python set'}

import json
json.dumps(dct,default=plugins.encode)

'{"key2": {"_numpy_ndarray_": {"dtype": "int64", "value": [1, 2, 3]}}, "key1": {"_python_set_": [1, 2, 3]}}'
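The reverse direction uses the default parameter of json.dumps, which is called only for objects json cannot serialise natively and must return something serialisable (or raise TypeError). A minimal stdlib sketch assuming the `_python_set_` key shown above; the `_decimal_` key for Decimal and the name `encode_default` are illustrative assumptions, not taken from the built-in plugins.

```python
import json
from decimal import Decimal


def encode_default(obj):
    """Hypothetical `default` hook for json.dumps."""
    if isinstance(obj, set):
        return {"_python_set_": sorted(obj)}
    if isinstance(obj, Decimal):
        return {"_decimal_": str(obj)}  # str keeps the exact digits
    raise TypeError("Object of type %s is not JSON serializable"
                    % type(obj).__name__)


data = {"key1": {1, 2, 3}, "key3": Decimal("1.10")}
print(json.dumps(data, default=encode_default, sort_keys=True))
# {"key1": {"_python_set_": [1, 2, 3]}, "key3": {"_decimal_": "1.10"}}
```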

Creating and Loading Plugins

from jsonextended import plugins, utils

Plugins are recognised as classes with a minimal set of attributes matching the plugin category interface:

plugins.view_interfaces()

{'decoders': ['plugin_name', 'plugin_descript', 'dict_signature'],
 'encoders': ['plugin_name', 'plugin_descript', 'objclass'],
 'parsers': ['plugin_name', 'plugin_descript', 'file_regex', 'read_file']}

plugins.unload_all_plugins()
plugins.view_plugins()

{'decoders': {}, 'encoders': {}, 'parsers': {}}

For example, a simple parser plugin would be:

class ParserPlugin(object):
    plugin_name = 'example'
    plugin_descript = 'a parser for *.example files, that outputs (line_number:line)'
    file_regex = '*.example'
    def read_file(self, file_obj, **kwargs):
        out_dict = {}
        for i, line in enumerate(file_obj):
            out_dict[i] = line.strip()
        return out_dict

Plugins can be loaded as classes:

plugins.load_plugin_classes([ParserPlugin],'parsers')
plugins.view_plugins()

{'decoders': {},
 'encoders': {},
 'parsers': {'example': 'a parser for *.example files, that outputs (line_number:line)'}}

Or by directory (loading all .py files):

fobj = utils.MockPath('example.py',is_file=True,content="""
class ParserPlugin(object):
    plugin_name = 'example.other'
    plugin_descript = 'a parser for *.example.other files, that outputs (line_number:line)'
    file_regex = '*.example.other'
    def read_file(self, file_obj, **kwargs):
        out_dict = {}
        for i, line in enumerate(file_obj):
            out_dict[i] = line.strip()
        return out_dict
""")
dobj = utils.MockPath(structure=[fobj])
plugins.load_plugins_dir(dobj,'parsers')
plugins.view_plugins()

{'decoders': {},
 'encoders': {},
 'parsers': {'example': 'a parser for *.example files, that outputs (line_number:line)',
  'example.other': 'a parser for *.example.other files, that outputs (line_number:line)'}}

For a more complex example of a parser, see jsonextended.complex_parsers.

Interface specifications

Parsers:
- file_regex attribute, a str denoting what files to apply it to. A file will be parsed by the longest regex it matches.
- read_file method, which takes an (open) file object and kwargs as parameters
Decoders:
- dict_signature attribute, a tuple denoting the keys which the dictionary must have, e.g. dict_signature=('a','b') decodes {'a':1,'b':2}
- from_... method(s), which take a dict object as parameter. The plugins.decode function will use the method denoted by the intype parameter, e.g. if intype='json', then from_json will be called.
Encoders:
- objclass attribute, the object class to apply the encoding to, e.g. objclass=decimal.Decimal encodes objects of that type
- to_... method(s), which take an instance of objclass as parameter. The plugins.encode function will use the method denoted by the outtype parameter, e.g. if outtype='json', then to_json will be called.
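The rule that a file is parsed by the longest pattern it matches can be sketched with fnmatch. The built-in patterns shown earlier (e.g. *.literal.csv) read as shell-style globs, so this sketch assumes glob matching via fnmatch; `select_parser` and the `PARSERS` registry are hypothetical.

```python
import fnmatch

# Hypothetical registry: file pattern -> plugin name
PARSERS = {
    "*.csv": "csv.basic",
    "*.literal.csv": "csv.literal",
    "*.json": "json.basic",
}


def select_parser(filename, parsers=PARSERS):
    """Return the plugin whose pattern matches filename, preferring the
    longest (most specific) pattern; None if nothing matches."""
    matches = [pat for pat in parsers if fnmatch.fnmatch(filename, pat)]
    if not matches:
        return None
    return parsers[max(matches, key=len)]


print(select_parser("file1.csv"))          # csv.basic
print(select_parser("file1.literal.csv"))  # csv.literal
print(select_parser("file1.keypair"))      # None
```

Preferring the longest pattern means a more specific plugin (csv.literal) wins over a generic one (csv.basic) without needing explicit priorities.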

Extended Examples

For more information, all functions contain doc-strings with tested examples.

Data Folders JSONisation

from jsonextended import ejson, edict, utils

path = utils.get_test_path()
ejson.jkeys(path)

['dir1', 'dir2', 'dir3']

jdict1 = ejson.to_dict(path)
edict.pprint(jdict1,depth=2)

dir1:
  dir1_1: {...}
  file1: {...}
  file2: {...}
dir2:
  file1: {...}
dir3:

edict.to_html(jdict1,depth=2)

To try out the rendered JSON tree, as output in the Jupyter Notebook, go to: https://chrisjsewell.github.io/

Nested Dictionary Manipulation

jdict2 = ejson.to_dict(path,['dir1','file1'])
edict.pprint(jdict2,depth=1)

initial: {...}
meta: {...}
optimised: {...}
units: {...}

filtered = edict.filter_keys(jdict2,['vol*'],use_wildcards=True)
edict.pprint(filtered)

initial:
  crystallographic:
    volume: 924.62752781
  primitive:
    volume: 462.313764
optimised:
  crystallographic:
    volume: 1063.98960509
  primitive:
    volume: 531.994803
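With use_wildcards=True, the patterns behave like shell-style wildcards. A hedged stdlib sketch of the same idea keeps only the branches that lead to a matching key; `filter_keys_sketch` is a hypothetical name, and fnmatch semantics are an assumption about how the wildcards are interpreted. The input is toy data reusing one value from the example.

```python
import fnmatch


def filter_keys_sketch(d, patterns):
    """Keep only branches leading to a key that matches one of the
    shell-style patterns (hypothetical stdlib sketch)."""
    out = {}
    for key, value in d.items():
        if any(fnmatch.fnmatchcase(str(key), p) for p in patterns):
            out[key] = value  # matching key: keep its whole value
        elif isinstance(value, dict):
            sub = filter_keys_sketch(value, patterns)
            if sub:  # drop branches with no matches
                out[key] = sub
    return out


data = {
    "initial": {"primitive": {"volume": 462.313764, "a": 5.0}},
    "meta": {"name": "test"},
}
print(filter_keys_sketch(data, ["vol*"]))
# {'initial': {'primitive': {'volume': 462.313764}}}
```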

edict.pprint(edict.flatten(filtered))

(initial, crystallographic, volume):   924.62752781
(initial, primitive, volume):          462.313764
(optimised, crystallographic, volume): 1063.98960509
(optimised, primitive, volume):        531.994803

Units Schema

from jsonextended.units import apply_unitschema, split_quantities
withunits = apply_unitschema(filtered,{'volume':'angstrom^3'})
edict.pprint(withunits)

initial:
  crystallographic:
    volume: 924.62752781 angstrom ** 3
  primitive:
    volume: 462.313764 angstrom ** 3
optimised:
  crystallographic:
    volume: 1063.98960509 angstrom ** 3
  primitive:
    volume: 531.994803 angstrom ** 3

newunits = apply_unitschema(withunits,{'volume':'nm^3'})
edict.pprint(newunits)

initial:
  crystallographic:
    volume: 0.92462752781 nanometer ** 3
  primitive:
    volume: 0.462313764 nanometer ** 3
optimised:
  crystallographic:
    volume: 1.06398960509 nanometer ** 3
  primitive:
    volume: 0.531994803 nanometer ** 3
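The conversion above is a fixed scale factor: 1 angstrom = 0.1 nm, so a volume scales by (0.1)^3 = 1e-3. A quick plain-Python check of the printed values (no pint involved; it only verifies the arithmetic, not how apply_unitschema performs it):

```python
ANGSTROM_IN_NM = 0.1             # 1 angstrom = 0.1 nanometre
FACTOR = ANGSTROM_IN_NM ** 3     # volumes scale with the cube: 1e-3

volumes_a3 = {"initial": 924.62752781, "optimised": 1063.98960509}
volumes_nm3 = {k: v * FACTOR for k, v in volumes_a3.items()}
for key, value in volumes_nm3.items():
    # round() hides the tiny binary floating-point error in FACTOR
    print(key, round(value, 11), "nanometer ** 3")
# initial 0.92462752781 nanometer ** 3
# optimised 1.06398960509 nanometer ** 3
```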

edict.pprint(split_quantities(newunits),depth=4)

initial:
  crystallographic:
    volume:
      magnitude: 0.92462752781
      units:     nanometer ** 3
  primitive:
    volume:
      magnitude: 0.462313764
      units:     nanometer ** 3
optimised:
  crystallographic:
    volume:
      magnitude: 1.06398960509
      units:     nanometer ** 3
  primitive:
    volume:
      magnitude: 0.531994803
      units:     nanometer ** 3
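split_quantities replaces each quantity with a {magnitude, units} pair, which makes the result plain-JSON serialisable. A dependency-free sketch of that transformation, using a hypothetical `Quantity` namedtuple as a stand-in for pint.Quantity (which also exposes .magnitude and .units); `split_quantities_sketch` is likewise a hypothetical name.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for pint.Quantity
Quantity = namedtuple("Quantity", ["magnitude", "units"])


def split_quantities_sketch(d):
    """Recursively replace Quantity leaves with {'magnitude', 'units'} dicts."""
    out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            out[key] = split_quantities_sketch(value)
        elif isinstance(value, Quantity):
            out[key] = {"magnitude": value.magnitude,
                        "units": str(value.units)}
        else:
            out[key] = value  # non-quantity leaves pass through
    return out


data = {"initial": {"primitive": {
    "volume": Quantity(0.462313764, "nanometer ** 3")}}}
print(json.dumps(split_quantities_sketch(data)))
# {"initial": {"primitive": {"volume": {"magnitude": 0.462313764, "units": "nanometer ** 3"}}}}
```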

Release history

- 0.7.11 (this version): Jun 18, 2019
- 0.7.10: Apr 22, 2019
- 0.7.9: Feb 26, 2019
- 0.7.8: Feb 26, 2019
- 0.7.7: Jul 2, 2018
- 0.7.6: May 4, 2018
- 0.7.4: Oct 20, 2017
- 0.7.3: Oct 16, 2017
- 0.7.2: Oct 16, 2017
- 0.7.1: Oct 16, 2017
- 0.7.0: Oct 16, 2017
- 0.6.4: Oct 15, 2017
- 0.6.3: Oct 9, 2017
- 0.6.2: Sep 23, 2017
- 0.6.1: Sep 18, 2017
- 0.6.0: Sep 15, 2017
- 0.5.7: Sep 6, 2017
- 0.5.6: Sep 5, 2017
- 0.5.5: Sep 4, 2017
- 0.5.4: Sep 2, 2017
- 0.5.3: Aug 30, 2017
- 0.5.2: Aug 30, 2017
- 0.5.0: Aug 28, 2017
- 0.4.6: Aug 25, 2017
- 0.4.5: Aug 24, 2017
- 0.4.4: Aug 23, 2017
- 0.4.3: Aug 22, 2017
- 0.4.2: Aug 11, 2017
- 0.4.1: Aug 10, 2017
- 0.4.0: Aug 10, 2017
- 0.3.7: Aug 5, 2017
- 0.3.6: Aug 1, 2017
- 0.3.5: Jul 9, 2017
- 0.3.4: Jul 9, 2017
- 0.3.3: Jul 7, 2017
- 0.3.2: Jul 7, 2017
- 0.3.1: Jul 7, 2017
- 0.3.0: Jul 5, 2017
- 0.1.4: Jun 13, 2017
- 0.1.3.4: Jun 11, 2017
- 0.1.3.3: Jun 3, 2017
- 0.1.3.2: Jun 2, 2017
- 0.1.3.1: Jun 2, 2017
- 0.1.3: Jun 2, 2017
- 0.1.2: Jun 1, 2017
- 0.1.1: Jun 1, 2017
- 0.1.0: Jun 1, 2017

Download files

Source Distribution

jsonextended-0.7.11.tar.gz (430.8 kB)

Uploaded Jun 18, 2019 Source

Built Distribution

jsonextended-0.7.11-py2.py3-none-any.whl (466.9 kB)

Uploaded Jun 18, 2019 (Python 2, Python 3)

File details

Details for the file jsonextended-0.7.11.tar.gz.

File metadata

Download URL: jsonextended-0.7.11.tar.gz
Upload date: Jun 18, 2019
Size: 430.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.7

File hashes

Hashes for jsonextended-0.7.11.tar.gz
Algorithm	Hash digest
SHA256	`8044ddc359c8ff91b5b3183be33822131bfddf85ddcc2fd91640029b2c51464a`
MD5	`f337a765dbaa6d64c0a7b842e60b676d`
BLAKE2b-256	`9a0b423feb7f13c1b1f15f9ef89c078c40a33799d56ead6465c962457a863590`

File details

Details for the file jsonextended-0.7.11-py2.py3-none-any.whl.

File metadata

Download URL: jsonextended-0.7.11-py2.py3-none-any.whl
Upload date: Jun 18, 2019
Size: 466.9 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.7

File hashes

Hashes for jsonextended-0.7.11-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`f4d8d7099af352156ad6babe9633225329183ca7a81f9d93bb55238a5f312bbe`
MD5	`d11c83914f9bf3493bfd3cf7c2d4e0be`
BLAKE2b-256	`7baae084e46ed3a7aab0b910790ca82f496e71dc5a2b7cc64793ee54f5d8bbd3`
