easy data structure access utility

WildPath

A path abstraction to access items in composite (e.g. JSON) objects in python.

Introduction

This module is intended primarily as a practical tool to access data in complex data structures. Accessing multiple items usually requires for-loops or other constructs, and there is no straightforward way to pass nested locations as single parameters. This module solves this problem by introducing 2 classes:

  • Path lets you get, set and delete single items in the data structure; it is optimized for speed,

  • WildPath does the same and allows wildcards and boolean logic (and, or, not) in paths to get, set and delete multiple items in one call,

  • Both have iterators (in the common base class) to run through all paths and values in a data structure.

As a typical example we take the JSON response of a call to maps.googleapis.com for the route between 2 addresses. The response is over 390 lines of JSON if nicely formatted. However, we are only interested in the geo_locations of the individual steps (turn-by-turn instructions) of the route.

In normal code this would look something like (with json_route the result from the call to the google API):

def get_geo_locations(json_route):
    geo_locs = []
    for json_step in json_route["routes"][0]["legs"][0]["steps"]:  #  there is only 1 route and 1 leg in the response
        geo_locs.append({"start_location": json_step["start_location"],
                         "end_location": json_step["end_location"]})
    return geo_locs

geo_locations = get_geo_locations(json_route)

Using WildPath the same result is obtained by:

location_path = WildPath("routes.0.legs.0.steps.*.*_location")

geo_locations = location_path.get_in(json_route)  # note the path.get_in(obj) syntax versus the usual obj.get(key)

Both produce the same list of items:

[
    {
        "start_location": {
            "lat": 52.0800134,
            "lng": 4.3271703
        },
        "end_location": {
            "lat": 52.0805958,
            "lng": 4.3286669
        }
    },
    ...
]

Essentially the function definition is replaced by a string, using WildPath.get_in for the correct lookup logic. This has some advantages:

  • Fewer lines of code means lower likelihood of bugs,

  • Better readability and maintainability (once you get used to the path-notation),

  • A Path or WildPath is easily serializable (str(Path("a.b.c")) == "a.b.c"), where a function definition is not.
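
To illustrate why a path that is just a string is convenient, the sketch below stores a dotted path in a JSON document and resolves it against a nested structure. The helper `resolve`, the `distance` key and the sample data are assumptions made for this illustration; the helper is not part of wildpath and only mimics the single-item lookup described above.

```python
import json

def resolve(obj, dotted_path, sep="."):
    """Walk `obj` along `dotted_path` (hypothetical helper, not the wildpath API)."""
    for key in dotted_path.split(sep):
        if isinstance(obj, (list, tuple)):
            obj = obj[int(key)]  # sequences are indexed by integer
        else:
            obj = obj[key]       # mappings are indexed by the key as-is
    return obj

# the path travels through JSON as a plain string
config = json.loads('{"location_path": "routes.0.legs.0.distance"}')

# made-up sample data, shaped loosely like the route response
json_route = {"routes": [{"legs": [{"distance": 1200}]}]}

distance = resolve(json_route, config["location_path"])
```

A function such as get_geo_locations cannot be stored in a config file this way, whereas the path string can.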

Prerequisites

The module can be installed with pip install wildpath. It is tested for both python 2.7 and python 3.3 to 3.6.

Functionality

The Path class supports, with e.g. path = Path("a.0.b") and obj = {"a": [{"b": 1}]}:

  • get_in: getting items from data structures: path.get_in(obj),

  • set_in: setting values in data structures: path.set_in(obj, value),

  • del_in: deleting items from data structures: path.del_in(obj),

  • has_in: checking whether a value exists at path: path.has_in(obj),

  • pop_in: deleting and returning items from data structures: path.pop_in(obj).

It also has some iterators that run through all paths and values in a data structure:

  • Path.items(obj): iterator over all (path, value) tuples in the object,

  • Path.paths(obj): iterator over all paths in the object,

  • Path.values(obj): iterator over all values in the object.

The WildPath class supports the same functionality as Path, but with the following additions:

  • Keys referring to mappings (e.g. dict) or python class objects can contain wildcards: WildPath("*.a*.b?"), with * for any string and ? for any single character. Wildcards use the standard python fnmatch.fnmatchcase,

  • Keys referring to sequences (e.g. list, tuple) can contain slices: WildPath("1:3.::2"), with : from standard python slice notation some_list[start:stop:step],

  • All keys can contain boolean logic, using & for AND, | for OR and ! for NOT: WildPath("a*&!*b"): keys starting with 'a' and not ending with 'b'.
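
The three kinds of key described above can be tried out with the standard library alone. The following is a minimal sketch, not wildpath's implementation: `parse_slice` is a hypothetical helper, and the boolean expression is written out by hand in Python rather than parsed from the path string.

```python
from fnmatch import fnmatchcase

data = {"start_time": "10:00", "end_time": "11:00", "date": "2017-8-14"}

# wildcards: '*' matches any string, '?' matches exactly one character
times = {k: v for k, v in data.items() if fnmatchcase(k, "*_time")}

# boolean logic: "a*&!*b" read as "matches 'a*' AND NOT matches '*b'"
keys = ["ab", "abc", "bcd", "a"]
selected = [k for k in keys if fnmatchcase(k, "a*") and not fnmatchcase(k, "*b")]

def parse_slice(text):
    """Turn 'start:stop' or 'start:stop:step' into a slice (hypothetical helper)."""
    parts = [int(part) if part else None for part in text.split(":")]
    return slice(*parts)

# slices: the same notation as some_list[start:stop:step]
items = ["opening", "progress", "closing"]
first_two = items[parse_slice("0:2")]
every_other = items[parse_slice("::2")]
```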

Note that:

  • The iterator methods of WildPath return paths of type WildPath, instead of Path,

  • If a key or index is not found in the data, a KeyError or IndexError will be raised,

  • get_in can take a default parameter, that is returned if no value exists at the path location: path.get_in(obj, None),

  • Using wildpaths will return instances of the classes in the original object for mappings and sequences. For (other) python objects it will return a dict. For example WildPath(":2").get_in((1, 2, 3)) will return (1, 2).

Examples

Starting with this example structure of an agenda item in some tool:

agenda = {
    "meeting": "progress on project X",
    "date": "2017-8-14",
    "start_time": "10:00",
    "end_time": "11:00",
    "invited": ["Joe", "Ann", "Boo"],
    "items": [
        {
            "name": "opening",
            "duration": "5 minutes",
            "subjects": ["purpose of the meeting"],
        },
        {
            "name": "progress",
            "duration": "25 minutes",
            "subjects": ["milestones", "project delays", "actions"],
        },
        {
            "name": "closing",
            "duration": "5 minutes",
            "subjects": ["questions", "roundup"],
        },
    ]
}

class Path

The Path class lets you get, set or delete items at a specific location:

from wildpath.paths import Path

path = Path("items.0.duration")
assert str(path) == "items.0.duration"  # str(..) returns the original path string

duration = path.get_in(agenda)  # retrieves value at path location
assert duration == "5 minutes"

path.set_in(agenda, "10 minutes")  # sets value at path location
assert path.get_in(agenda) == "10 minutes"

path.del_in(agenda)  # deletes key-value at path location
assert path.has_in(agenda) == False  # has_in checks the presence of a value at the path location

class WildPath

WildPath supports the same API as Path, but additionally lets you use wildcards and slicing in the path definition to access multiple items in the structure (the Path class is there because for single lookups it is substantially faster):

from wildpath.paths import WildPath

wildpath = WildPath("items.*.duration")  # basic 'star' notation

durations = wildpath.get_in(agenda)  # retrieves all the durations of the items on the agenda
assert durations == ["5 minutes", "25 minutes", "5 minutes"]

wildpath.set_in(agenda, ["10 minutes", "50 minutes", "10 minutes"])  # setting all the values,
assert wildpath.get_in(agenda) == ["10 minutes", "50 minutes", "10 minutes"]

wildpath.set_in(agenda, "30 minutes")  #  or replacing all with a single value,
assert wildpath.get_in(agenda) == ["30 minutes", "30 minutes", "30 minutes"]

wildpath.del_in(agenda)  # delete all the items at wildpath from the structure
assert wildpath.has_in(agenda) == False  # `has_in` checks if all the items at wildpath are there

To get the start and end time of the meeting:

wildpath = WildPath("*_time")
assert wildpath.get_in(agenda) == {"start_time": "10:00", "end_time": "11:00"}

Similarly, it supports slices as wildcard-like path elements:

wildpath = WildPath("items.0:2.name")
assert wildpath.get_in(agenda) == ["opening", "progress"]

wildpath = WildPath("items.!0:2.name")  # slices can be negated
assert wildpath.get_in(agenda) == [ "closing"]

wildpath = WildPath("items.-1::-1.name")  # extended slicing also works, but orders are not reversed for a negative step parameter
assert wildpath.get_in(agenda) == ["opening", "progress", "closing"]
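
One way to picture the negated slice and the order-preserving negative step is to treat a slice as a set of selected indices. This is a sketch under that assumption, using only `slice.indices` from the standard library; wildpath may implement it differently, and the helper names are hypothetical.

```python
names = ["opening", "progress", "closing"]

def selected_indices(sl, length):
    """Indices a slice selects in a sequence of the given length, as a set."""
    return set(range(*sl.indices(length)))

def take(seq, sl, negate=False):
    """Keep the items the slice selects (or rejects), always in original order."""
    chosen = selected_indices(sl, len(seq))
    return [item for i, item in enumerate(seq) if (i in chosen) != negate]

plain = take(names, slice(0, 2))                 # like "0:2"
negated = take(names, slice(0, 2), negate=True)  # like "!0:2"
backwards = take(names, slice(-1, None, -1))     # like "-1::-1", order kept
```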

WildPath supports boolean logic:

# '|' is the OR operator

assert WildPath("start_time|end_time").get_in(agenda) == {"start_time": "10:00", "end_time": "11:00"}

# '&' is the AND operator

assert WildPath("start_*&*_time").get_in(agenda) == {"start_time": "10:00"}


# '!' is the NOT operator:

assert WildPath("!item?").get_in({"item1": "chair", "item2": "table", "count": 2}) == {"count": 2}

# parentheses can be used to indicate precedence:

assert WildPath("!(a|b)") != WildPath("!a|b")

Notes:

  • WildPath also supports attribute lookup in nested objects, list attributes in objects, etc.,

  • All the examples of WildPath.get_in also work for set_in, del_in, pop_in and has_in,

  • In wildpath.set_in(obj, value), value can either be a single value (which will be used to set all target values), or a data structure with the same ‘shape’ as the result of wildpath.get_in(obj).
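
The two modes of set_in mentioned in the last note can be pictured with a small stand-alone sketch. `set_each` is a hypothetical helper for this illustration only, and treating every list as a per-target value is a simplifying assumption; it does not claim to show how wildpath decides between the two cases.

```python
def set_each(targets, key, value):
    """Set `key` on every dict in `targets` (hypothetical helper, not wildpath code)."""
    if isinstance(value, list):
        # same 'shape' as the lookup result: one value per target
        if len(value) != len(targets):
            raise ValueError("expected %d values, got %d" % (len(targets), len(value)))
        for target, item_value in zip(targets, value):
            target[key] = item_value
    else:
        # a single value is reused for every target
        for target in targets:
            target[key] = value

items = [{"duration": "5 minutes"}, {"duration": "25 minutes"}, {"duration": "5 minutes"}]

set_each(items, "duration", ["10 minutes", "50 minutes", "10 minutes"])
per_item = [item["duration"] for item in items]

set_each(items, "duration", "30 minutes")
uniform = [item["duration"] for item in items]
```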

Iterators

The Path classes also have some iterator classmethods defined:

from wildpath.paths import Path

for path, value in Path.items(agenda):
    print(" ".join([str(path), ":", value]))

prints

date : 2017-8-14
end_time : 11:00
invited.0 : Joe
invited.1 : Ann
invited.2 : Boo
items.0.duration : 5 minutes
items.0.name : opening
items.0.subjects.0 : purpose of the meeting
items.1.duration : 25 minutes
items.1.name : progress

etc...

To create an alternative representation of the datastructure:

D = {str(path): value for path, value in Path.items(agenda)}

Path.items() has an optional argument all; if set to True, it iterates over all (path, value) combinations, including intermediary paths:

from wildpath.paths import Path

for path, value in Path.items(agenda, all=True):
    print(" ".join([str(path), ":", str(value)]))  # str(value): intermediary values are lists and dicts, not strings

will print:

date : 2017-8-14
end_time : 11:00
invited : ['Joe', 'Ann', 'Boo']
invited.0 : Joe
invited.1 : Ann
invited.2 : Boo
items : [{'duration': '5 minutes', 'subjects': ['purpose of the meeting'], ...]
items.0 : {'duration': '5 minutes', 'subjects': ['purpose of the meeting'], 'name': 'opening'}
items.0.duration : 5 minutes
items.0.name : opening
items.0.subjects : ['purpose of the meeting']
items.0.subjects.0 : purpose of the meeting

etc...

With Path.items(obj, all=True) and the order in which the items are produced, more manipulations are possible, e.g.:

from datetime import datetime
from wildpath.paths import Path

sample = {
    "name": "sample",
    "times": [datetime(1999,1,2,3), datetime(1999,1,2,4)]
}

new_sample = {}
for path, value in Path.items(sample, all=True):
    if isinstance(value, datetime):
        value = str(value)  # all values of type datetime are converted to strings
    path.set_in(new_sample, value)

# new_sample is now serializable to JSON

Notes:

  • Currently these iterators cannot handle circular relationships. This will result in a RuntimeError (recursion depth),

  • The iterators return generators, not lists or dicts. To get a list or dict, use list(Path.items(obj)) or dict(Path.items(obj)),

  • These iterators can also be useful to get an alternative view on a data structure: a starting point to define WildPaths,

  • To turn the items into a dict with string keys, use dct = {str(p): v for p, v in Path.items(obj)}.
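
For readers curious how such an iterator can work, here is a minimal recursive generator over nested dicts and lists. It is a sketch, not the wildpath source: the name `iter_items`, the `include_containers` flag (standing in for all=True) and the sorting of dict keys are assumptions made for this illustration.

```python
def iter_items(obj, prefix=(), include_containers=False):
    """Yield (path_tuple, value) pairs for nested dicts, lists and tuples."""
    if isinstance(obj, dict):
        # sorted for a deterministic order (an assumption of this sketch)
        children = sorted(obj.items(), key=lambda pair: str(pair[0]))
    elif isinstance(obj, (list, tuple)):
        children = enumerate(obj)
    else:
        yield prefix, obj  # leaf value
        return
    if include_containers and prefix:
        yield prefix, obj  # intermediary path, emitted before its children
    for key, value in children:
        yield from iter_items(value, prefix + (str(key),), include_containers)

sample = {"name": "sample", "tags": ["a", "b"]}

flat = {".".join(path): value for path, value in iter_items(sample)}
everything = list(iter_items(sample, include_containers=True))
```

Because a container is yielded before its children, a consumer that rebuilds the structure path by path always sees the parent before anything that lives inside it.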

Path manipulations

Path and WildPath are subclasses of tuple (via BasePath), so (almost) all tuple methods can be used with both, e.g.:

from wildpath.paths import Path

assert Path("a.b") + Path("c") == Path("a.b.c")
assert Path("a.b.c")[1:] == Path("b.c")
assert repr(Path("a.b.c")) == "('a', 'b', 'c')"

# however, tuple.__str__ is overridden to return the input string for the class constructor for easy (de)serialization:

assert str(Path("a.b.c")) == "a.b.c"

Note that some methods (like __add__ and slicing with path[1:]) are overridden to return the correct class (Path or WildPath).
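
The idea of a tuple subclass that round-trips through its string form can be sketched in a few lines. `DotPath` and `SlashDotPath` are hypothetical classes written for this illustration; they are not wildpath's BasePath and leave out all lookup logic.

```python
class DotPath(tuple):
    """Minimal tuple-based path (hypothetical, not wildpath's BasePath)."""
    sep = "."

    def __new__(cls, path=""):
        if isinstance(path, str):
            path = path.split(cls.sep) if path else ()
        return super().__new__(cls, path)

    def __str__(self):
        # str() gives back the constructor input, for easy (de)serialization
        return self.sep.join(self)

    def __add__(self, other):
        # keep the subclass instead of falling back to a plain tuple
        return type(self)(tuple(self) + tuple(other))

    def __getitem__(self, index):
        result = tuple.__getitem__(self, index)
        return type(self)(result) if isinstance(index, slice) else result


class SlashDotPath(DotPath):
    sep = "/"  # a different separator allows '.' inside keys


joined = DotPath("a.b") + DotPath("c")
tail = DotPath("a.b.c")[1:]
slashed = SlashDotPath("a/b.c/3")
```

Without the __add__ and __getitem__ overrides, both operations would return a plain tuple and str() would no longer produce the dotted form.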

Limitations

Because of the characters used to parse the paths, some keys in the target datastructures will cause the system to fail:

  • In python objects Path and WildPath will lookup keys in the instance __dict__. This means that some constructions like property and overridden __getattr__ will not be taken into account,

  • for Path and WildPath: keys in Mappings (e.g. dict, OrderedDict) cannot contain a .,

  • for WildPath: keys in Mappings cannot contain the characters *, ?, !, | and &, or to be precise, if they are present, they cannot be used in paths for lookups,

  • note that the . separator can easily be replaced in a subclass, allowing paths like "a/b/3/x" instead of "a.b.3.x" (and therefore path "a/b.c/3/x" with b.c a dictionary key):

from wildpath.paths import Path, WildPath

class SlashPath(Path):
    sep = '/'

class WildSlashPath(WildPath):
    sep = '/'

Overriding !, | and & will take a little more work: override the class attribute tokens in WildPath and override KeyParser.DEFAULT_TOKENS. Currently there is no way to override the tokens * and ? in WildPath.

Testing

The unittests are standard python unittests and can be run as such.

Authors

Lars van Gemerden (rational-it) - initial code and documentation.

License

This project is licensed under the license in LICENSE.txt.

Acknowledgments

  • For convincing me to open-source this module, a big thanks to Jasper Hartong,

  • For the creators of the module boolean.py, thanks for making boolean parsing a lot easier.
