Skip to main content

This is a easy to use library for walking through json data.

Project description

JsonWalker

Allows simple, quick, and easy use of nested Json

JsonWalker's goal is to allow the user to specify a path through nested json, and get the items at each match in the json. Currently, it only outputs generators.

Installation

JsonWalker is a pip installable package. It is a public package, and will thus not need any special permissions to install.

Command Line Installation

To install from the command line, pip install git+https://github.com/byuawsfhtl/JsonWalker.git.

Use in other pip packages

As this is a public package, it can be added to the required packages of any pip installable packages and will be installed automatically when those are installed in other projects.

Usage

To use this in a project, install using one of the installation methods shown above.

Import the walk command into your file: from JsonWalker.walk import walk

As the walk command makes a generator, it can be used in multiple ways.

Use in a for loop

for context1, context2, item1, item2 in walk(json, path):
    ...

Use outside of a for loop

context1, context2, item1, item2 = next(walk(json, path))

The next function is a python function that gets the "next" value in the generator, and can be called multiple times in a row if desired.

Paths

JsonWalker has its own syntax for the path provided for walking through nested json.

Below are the symbols used and some example usages

Symbol Meaning
| A path divider, used to organize path sections
[ The beginning of an index specification
] The end of an index specification
: Index range indicator
( The beginning of a default specification
) The end of a default specification
; Used to separate a default value and its type
, The delimiter for the multi item return syntax
> A continuing symbol for the multi item return syntax that works much like |
^ Raises the current item as a context of the path
* The wildcard symbol for the index and dict iteration specifications
{ The beginning of a dict iteration specifictation
} The end of a dict iteration specifictation

Examples

Simple

{
  "key1": [
    {
      "key2": 100
    },
    {}
  ]
}

The following path can be used to iterate through the json, getting the 'key2' value out and raising the current key1 item as a context

'key1[*]^ | key2(-1;int)'

It would be called as such

for key1Val, key2 in walk(json, 'key1[*]^ | key2(-1;int)'):
    print(key2)

and would output

100
-1

with key1Val being the current item inside of the key1 list

Path Breakdown

First Part 'key1[*]^'

The key 'key1' is accessed, returning the inner list.

Then, the index specifier tells JsonWalker to iterate over all values in the list.

Finally, the current value in the iteration of the list is raised as a context.

Second Part 'key2(-1;int)'

The key 'key2' is accessed, returning the value at that spot in the current dictionary.

If 'key2' is not a key in the dictionary, then the default specification '(-1;int)' is used. In this case, it returns -1 when 'key2' does not exist.

Multi Item Return

{
  "key1": {
    "key2": {
      "item1": "hello",
      "item2": "world"
    }
  }
}

The following path would iterate through the json, returning item1 and item2 as values, with the outer key1 dictionary as the context

'key1^ | key2 | item1, item2'

It would be called as such

for context, item1, item2 in walk(json, 'key1^ | key2 | item1, item2'):
    print(item1 + " " + item2)

and would output

hello world

Range specification

{
  "key1": [
    {
      "item": "do not want"
    },
    {
      "item": "hello"
    },
    {
      "item": "world"
    }
  ]
}

The following path would iterate through the json, only looking in the given range of the key1 list, outputing the 'item' key values

'key1[1:*] | item'

It would be called as such

for item in walk(json, 'key1[1:*] | item'):
    print(item)

and would output

hello
world

Multi Value Continue

{
  "key1": {
    "key2": {
      "item1": "hello"
    },
    "key3": {
      "item2": "world"
    },
    "key4": {
      "item4": "not accessing"
    }
  }
}

The following path would iterate through the json, returing the three different items that have diverging paths

'key1 | key2 > item1(''), key3 > item2(''), key4 > item3('')'

It would be called as such

for item1, item2, item3 in walk(json, 'key1 | key2 > item1(''), key3 > item2(''), key4 > item3('')'):
    print(item1)
    print(item2)
    print(item3)

and would output

hello
world
Path Explanation

First Part 'key1'

The given json is accessed at the key1 key.

Second Part 'key2 > item1('';str), key3 > item2('';str), key4 > item3('';str)'

This part of the path specifies three different return items, as shown by the 2 commas separating the three sections.

Each section does the following:

Accesses the key before the >, accesses the key following the > on the item retrieved before the > and returns the default if the key does not exist on the previous item.

Dict Iteration

{
  "key1": {
    "key2": "value2",
    "key3": "value3",
    "key4": "value4"
  }
}

The following path would iterate through the json, returning the key as a context and the value as a value

'key1{*}'

It would be called as such

for key, value in walk(json, 'key1{*}'):
    print(f"{key} -- {value}")

and would output

key2 -- value2
key3 -- value3
key4 -- value4
Path Explanation

First Part: 'key1'

The given json is accessed at key1

Second Part: '{*}'

The value at 'key1' is iterated through, puting the key in the context list and the value as the current value

Involved Example and Demonstration of use benefits

Old Python Code

persons = arkInfo.get('persons', [])
familyId = ''
for person in persons:
    for name in person.get('names', []):
        for nameForms in name.get('nameForms', []):
            for parts in nameForms.get('parts', []):
                for fields in parts.get('fields', []):
                    for value in fields.get('values', []):
                        labelID: str = value.get('labelId', '')
                        if 'PR' in labelID and 'FTHR' not in labelID and 'MTHR' not in labelID:
                            arkPerson = person
                            headID = person.get('id', '')
                            personLinks = person.get('links', {})
                            persona = personLinks.get('persona', {})
                            href = persona.get('href', '')
                            if ark not in href:
                                hrefPersonaSplit = href.split('personas/')[1]
                                familyId = hrefPersonaSplit.split('?flag')[0]
                                isHead = False
                            familyId = ark
                            return headTuple(ark=ark, arkPerson=arkPerson, isHead=isHead, headID=headID, familyId=familyId)

    for fields in person.get('fields', {}):
        for values in fields.get('values', {}):
            text = values.get('text', '')
            if 'Head' in text or 'Глава' in text: #'Глава' is head in russian
                personLinks = person.get('links', {})
                persona = personLinks.get('persona', {})
                href = persona.get('href', '')
                if ark not in href:
                    hrefPersonaSplit = href.split('personas/')[1]
                    familyId = hrefPersonaSplit.split('?flag')[0]
                    isHead = False
                arkPerson = person
                familyId = ark
                return headTuple(ark=ark, arkPerson=arkPerson, isHead=isHead, headID=headID, familyId=familyId)

if persons:
    arkPerson = persons[0]
    headID = arkPerson.get('id', '')
    href = arkPerson.get('links', {}).get('persona', {}).get('href', '')
    splitId = href.split('personas/')
    if len(splitId) > 1:
        familyId = splitId[1].split('?flag')[0]
    else:
        familyId = ark

With JsonWalker

for person, headID, href in walk(arkInfo, 'persons[*]^ | id, links > persona > href('';str)'):
    for _, labelID in walk(person, 'names[*]^ | nameForms[*] | parts[*] | fields[*] | values[*] | labelId('';str)'):
        if 'PR' not in labelID or 'FTHR' in labelID or 'MTHR' in labelID:
            continue
        arkPerson = person
        if ark not in href:
            hrefPersonaSplit = href.split('personas/')[1]
            familyId = hrefPersonaSplit.split('?flag')[0]
            isHead = False
        familyId = ark
        return headTuple(ark=ark, arkPerson=arkPerson, isHead=isHead, headID=headID, familyId=familyId)

    for _, text in walk(person, "fields[*]^  | values[*] | text('';str)"):
        if 'Head' not in text and 'Глава' not in text: #'Глава' is head in russian
            continue
        if ark not in href:
            hrefPersonaSplit = href.split('personas/')[1]
            familyId = hrefPersonaSplit.split('?flag')[0]
            isHead = False
        arkPerson = person
        familyId = ark
        return headTuple(ark=ark, arkPerson=arkPerson, isHead=isHead, headID=headID, familyId=familyId)

familyId = ''
for person, href in walk(arkInfo, 'persons[0]^|links|persona|href'):
    headID = person.get('id', '')
    splitId = href.split('personas/')
    if len(splitId) > 1:
        familyId = splitId[1].split('?flag')[0]
    else:
        familyId = ark

Some Constraints

  1. Indexes must be in the form [n] or [n:m]
  2. Defaults must be in the form (value;type)
  3. MultiValues must be in the form value1, value2, ...
  4. The MultiValue must be the last path in the string
  5. The MultiValue must not contain a path divider, it instead uses a comma for separation and a greater than sign for continuation
  6. The MultiValue must not contain another MultiValue
  7. The Index must have a specific index if it is used in a MultiValue
  8. The MultiValue must not contain a DictIter or an Index with a wildcard
  9. In order to iterate through a list, a index/range must be specified
  10. The Default must contain a type
  11. Items leave the generator in this order, where their internal order is specified by order in the path: Contexts, Values
  12. DictIters must be {*}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jsonwalker-1.0.16.tar.gz (8.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

jsonwalker-1.0.16-py3-none-any.whl (11.9 kB view details)

Uploaded Python 3

File details

Details for the file jsonwalker-1.0.16.tar.gz.

File metadata

  • Download URL: jsonwalker-1.0.16.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for jsonwalker-1.0.16.tar.gz
Algorithm Hash digest
SHA256 6eb350859fcf1dfb68a55b8d67e24c34ba83ec63faa1ab550c88c6ea22c26a64
MD5 aad46f7efec05f76cb0e3f06782ea533
BLAKE2b-256 5a8948dbd429d6f16981ce536da31c9f629884e5cc4132d38ff5683b19703196

See more details on using hashes here.

File details

Details for the file jsonwalker-1.0.16-py3-none-any.whl.

File metadata

  • Download URL: jsonwalker-1.0.16-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for jsonwalker-1.0.16-py3-none-any.whl
Algorithm Hash digest
SHA256 d909ea066f62d8da79fedb42f9b48c58a7a343c528f23191143617a779205583
MD5 39070bf266e908778e628ddfe991cb95
BLAKE2b-256 115e81e4e18965f9e9a79560b50bc197199d9cd397f117cf7b637930d4d320fa

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page