Recursively flattens a JSON-like structure into a list of flat dicts.

JSON Normalize

This package contains a function, json_normalize. It takes a JSON-like structure and returns a map object that yields flat dicts. Each key in an output dict is the path to its value, joined by "." by default; the joiner can be customized.

Data association flows up and down inside dicts, whereas in iterables, e.g. lists, data in separate items is not associated with each other.

Installation

Install the package json_normalize version 1.1+ from PyPI.
The recommended requirements.txt line is json_normalize~=1.1.

json_normalize.json_normalize

json_normalize.json_normalize(
    tree: Union[dict, Iterable],
    combine_lists: Literal["chain", "product"] = None,
    drop_nodes: Iterable[str] = (),
    freeze_nodes: Iterable[str] = (),
    key_joiner: Union[str, Callable] = ".",
)
  • tree - A json like structure. Any iterable inside the object that is not a dict or a string will be treated as a list.
  • combine_lists=None - If there are lists in two different branches of the JSON-like object, the function has to know how to combine them. With the default None the function does not know how to handle them and raises an error. With combine_lists="chain" the branches are simply put after each other, similar to itertools.chain. With combine_lists="product" the branches are combined using itertools.product.
  • drop_nodes=() - Makes it possible to ignore nodes with certain names.
  • freeze_nodes=() - Makes it possible to preserve nodes with certain names; the function will not recursively normalize anything below such a node. If the node contains a dict, it will be a dict in the output as well.
  • key_joiner="." - Customizes how the path is turned into a key. key_joiner takes either a function or a string. If it is a function, it receives the path to a node in the form of a tuple. If it is a string, it is converted to a function like this: lambda p: key_joiner.join(p)
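
The string-to-function rule for key_joiner can be sketched in plain Python. The helper name as_joiner below is hypothetical and is not part of the package; it only mirrors the lambda quoted above, so it says nothing about how the package implements this internally.

```python
from typing import Callable, Tuple, Union

Path = Tuple[str, ...]


def as_joiner(key_joiner: Union[str, Callable[[Path], str]]) -> Callable[[Path], str]:
    """Hypothetical helper: turn a string separator into a path-joining function."""
    if isinstance(key_joiner, str):
        # Same rule as described above: lambda p: key_joiner.join(p)
        return lambda p: key_joiner.join(p)
    # Anything else is assumed to already be a callable that takes a tuple.
    return key_joiner


dotted = as_joiner(".")
arrow = as_joiner(" -> ")
last_only = as_joiner(lambda p: p[-1])

print(dotted(("measurements", "temp", "val")))  # measurements.temp.val
print(arrow(("b", "x")))                        # b -> x
print(last_only(("b", "x")))                    # x
```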

Examples

A General use case:

>>> from json_normalize import json_normalize
>>> json_like = {
...     "city": "Stockholm",
...     "coords": {
...         "lat": 59.331924,
...         "long": 18.062297
...     },
...     "measurements": [
...         {
...             "time": 1624363200,
...             "temp": {"val": 28, "unit": "C"},
...             "wind": {"val": 2.8, "dir": 290, "unit": "m/s"},
...         },
...         {
...             "time": 1624366800,
...             "temp": {"val": 26, "unit": "C"},
...         }
...     ]
... }
>>> normal_json = json_normalize(json_like)
>>> normal_json
<map object at ...>

>>> list(normal_json)
[
    {
        'city': 'Stockholm',
        'coords.lat': 59.331924,
        'coords.long': 18.062297,
        'measurements.time': 1624363200,
        'measurements.temp.val': 28,
        'measurements.temp.unit': 'C',
        'measurements.wind.val': 2.8,
        'measurements.wind.dir': 290,
        'measurements.wind.unit': 'm/s'
    },
    {
        'city': 'Stockholm',
        'coords.lat': 59.331924,
        'coords.long': 18.062297,
        'measurements.time': 1624366800,
        'measurements.temp.val': 26,
        'measurements.temp.unit': 'C'
    }
]
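
Because the result is a map object, it is a one-shot iterator: once it has been consumed, a second pass yields nothing. The sketch below uses a stand-in map object built from two hand-written rows instead of calling json_normalize, so it runs without the package installed.

```python
# Stand-in rows; in real use they would come from json_normalize(...).
records = [
    {"city": "Stockholm", "measurements.time": 1624363200},
    {"city": "Stockholm", "measurements.time": 1624366800},
]
rows = map(dict, records)  # any map object behaves the same way

first_pass = list(rows)   # consumes the iterator
second_pass = list(rows)  # nothing left to yield

print(len(first_pass))  # 2
print(second_pass)      # []
```

If the rows are needed more than once, store list(...) of the result in a variable and reuse that.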

Information always flows both into and out of each container. Here the data in nodes a and c is associated, linked via b, because their closest common node (the root) is a dict.

>>> json_like = {
...     "a": 1,
...     "b": {
...         "c": "x",
...         "d": 2
...     }
... }
>>> list(json_normalize(json_like))
[
    {
        "a": 1,
        "b.c": "x",
        "b.d": 2
    }
]
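
For the dict-only case, the idea can be sketched with a small recursive function. This is an illustration only, not the package's implementation; the name flatten_dicts is made up and it deliberately ignores lists.

```python
from typing import Any, Dict, Tuple


def flatten_dicts(node: Dict[str, Any], path: Tuple[str, ...] = ()) -> Dict[str, Any]:
    """Illustrative only: flatten nested dicts into one dict with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in node.items():
        child_path = path + (key,)
        if isinstance(value, dict):
            # Data from the nested dict ends up in the same output dict.
            flat.update(flatten_dicts(value, child_path))
        else:
            flat[".".join(child_path)] = value
    return flat


json_like = {"a": 1, "b": {"c": "x", "d": 2}}
print(flatten_dicts(json_like))  # {'a': 1, 'b.c': 'x', 'b.d': 2}
```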

However, if the closest common node is a list-like object, the information is not associated. For example, the closest common node of g=2 and h=3 is a list, and therefore that data ends up in different objects in the output.

>>> tree = {
...     "a": 1,
...     "b": [
...         {
...             "c": "x",
...             "g": 2
...         },
...         {
...             "c": "y",
...             "h": 3
...         }
...     ]
... }
>>> list(json_normalize(tree))
[
    {
        "a": 1,
        "b.c": "x",
        "b.g": 2
    },
    {
        "a": 1,
        "b.c": "y",
        "b.h": 3
    }
]
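
A rough way to picture the list case: every item in the list becomes its own output row, and the data shared through the surrounding dict is copied into each row. The function rows_from_list is a hypothetical, simplified sketch that handles one list of flat dicts; it is not how the package is implemented.

```python
from typing import Any, Dict, List


def rows_from_list(
    shared: Dict[str, Any], list_key: str, items: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Illustrative only: one output row per list item, plus the shared data."""
    rows = []
    for item in items:
        row = dict(shared)  # shared data is copied into every row
        for key, value in item.items():
            row[f"{list_key}.{key}"] = value  # item data stays in its own row
        rows.append(row)
    return rows


rows = rows_from_list({"a": 1}, "b", [{"c": "x", "g": 2}, {"c": "y", "h": 3}])
for row in rows:
    print(row)
# {'a': 1, 'b.c': 'x', 'b.g': 2}
# {'a': 1, 'b.c': 'y', 'b.h': 3}
```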

Even if a branch contains more data in a deeper layer, that data will be associated with the data in other branches as long as it is contained inside a dict.

>>> tree = {
...     "a": {
...         "j": 1.1,
...         "k": 1.2
...     },
...     "b": [
...         {
...             "c": "x",
...             "d": 2
...         },
...         {
...             "c": "y",
...             "d": 3
...         }
...     ]
... }
>>> list(json_normalize(tree))
[
    {
        "a.j": 1.1,
        "a.k": 1.2,
        "b.c": "x",
        "b.d": 2
    },
    {
        "a.j": 1.1,
        "a.k": 1.2,
        "b.c": "y",
        "b.d": 3
    }
]

When there are multiple lists in different branches, the function has to know how to combine them. The default is None, which raises an error in case this happens. "chain" puts the information after each other and "product" combines the information, as shown below.

>>> tree = {
...     "a": 1,
...     "b": [
...         {"x": "1"},
...         {"x": "2"}
...     ],
...     "c": [
...         {"y": "3"},
...         {"y": "4"}
...     ]
... }
>>> list(json_normalize(tree))
ValueError()

>>> list(json_normalize(tree, combine_lists="chain"))
[
    {"a": 1, "b.x": "1"},
    {"a": 1, "b.x": "2"},
    {"a": 1, "c.y": "3"},
    {"a": 1, "c.y": "4"},
]

>>> list(json_normalize(tree, combine_lists="product"))
[
    {"a": 1, "b.x": "1", "c.y": "3"},
    {"a": 1, "b.x": "1", "c.y": "4"},
    {"a": 1, "b.x": "2", "c.y": "3"},
    {"a": 1, "b.x": "2", "c.y": "4"},
]
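
The difference between the two modes can be reproduced with itertools alone. The sketch below starts from rows that are already flattened per branch (an assumption made for brevity) and only shows the combination step, not the package's internals.

```python
from itertools import chain, product

base = {"a": 1}
b_rows = [{"b.x": "1"}, {"b.x": "2"}]
c_rows = [{"c.y": "3"}, {"c.y": "4"}]

# "chain": rows from each branch follow one another, 2 + 2 = 4 rows.
chained = [{**base, **row} for row in chain(b_rows, c_rows)]

# "product": every b row is paired with every c row, 2 * 2 = 4 rows.
crossed = [{**base, **b, **c} for b, c in product(b_rows, c_rows)]

print(chained[1])  # {'a': 1, 'b.x': '2'}
print(crossed[1])  # {'a': 1, 'b.x': '1', 'c.y': '4'}
```

With longer lists the two modes diverge quickly: chain grows with the sum of the list lengths, product with their product.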

If you want to make sure you do not copy information into too many branches, you can leave combine_lists=None and instead drop problematic nodes with the argument drop_nodes=("b",).

>>> tree = {
...     "a": 1,
...     "b": [
...         {"x": "1"},
...         {"x": "2"}
...     ],
...     "c": [
...         {"y": "1"},
...         {"y": "2"}
...     ]
... }
>>> list(json_normalize(tree, drop_nodes=("b",)))
[
    {"a": 1, "c.y": "1"},
    {"a": 1, "c.y": "2"},
]
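
Conceptually, dropping a node is like removing that key from the tree before flattening. The helper prune below is hypothetical and assumes that names are matched against dict keys at any depth; the package may match nodes differently.

```python
from typing import Any, Iterable


def prune(node: Any, drop: Iterable[str]) -> Any:
    """Illustrative only: remove every dict key whose name is in `drop`."""
    names = set(drop)
    if isinstance(node, dict):
        return {k: prune(v, names) for k, v in node.items() if k not in names}
    if isinstance(node, list):
        return [prune(item, names) for item in node]
    return node  # scalars are returned unchanged


tree = {
    "a": 1,
    "b": [{"x": "1"}, {"x": "2"}],
    "c": [{"y": "1"}, {"y": "2"}],
}
pruned = prune(tree, ("b",))
print(pruned)  # {'a': 1, 'c': [{'y': '1'}, {'y': '2'}]}
```

With "b" gone only one list branch remains, so there is nothing left to combine.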

If you wish to customize the generated path, you can do that with the key_joiner argument.

>>> tree = {
...     "a": 1,
...     "b": [
...         {"x": "1"},
...         {"x": "2"}
...     ],
... }

>>> def key_joiner(path: tuple) -> str:
...     return path[-1]

>>> list(json_normalize(tree, key_joiner=key_joiner))
[
    {"a": 1, "x": "1"},
    {"a": 1, "x": "2"},
]

>>> list(json_normalize(tree, key_joiner=" -> "))
[
    {"a": 1, "b -> x": "1"},
    {"a": 1, "b -> x": "2"},
]
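
One thing to watch for with a key_joiner that keeps only the last key: different paths can map to the same output key. The sketch below only exercises the joiner functions themselves. How json_normalize treats such collisions is not covered here, so it may be safer to choose a joiner that keeps keys unique. The name last_two_keys is a hypothetical alternative, not something the package provides.

```python
from typing import Tuple


def last_key(path: Tuple[str, ...]) -> str:
    # Same idea as the key_joiner function above: keep only the final key.
    return path[-1]


def last_two_keys(path: Tuple[str, ...]) -> str:
    # Hypothetical alternative: keep up to two trailing keys for context.
    return ".".join(path[-2:])


# Paths taken from the weather example earlier in this document.
paths = [("measurements", "temp", "val"), ("measurements", "wind", "val")]

print([last_key(p) for p in paths])       # ['val', 'val']
print([last_two_keys(p) for p in paths])  # ['temp.val', 'wind.val']
```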

The function will also accept generators and similar objects.

>>> from itertools import chain


>>> def meta_generator():
...     yield {"who": "generator", "val": a_generator(1)}
...     yield {"who": "range", "val": range(10, 12)}
...     yield {"who": "map", "val": map(lambda x: x**2, range(20, 22))}
...     yield {"who": "chain", "val": chain([30], [31])}


>>> def a_generator(n):
...     yield n
...     yield 2 * n


>>> list(json_normalize(meta_generator()))
[
    {'who': 'generator', 'val': 1},
    {'who': 'generator', 'val': 2},
    {'who': 'range', 'val': 10},
    {'who': 'range', 'val': 11},
    {'who': 'map', 'val': 400},
    {'who': 'map', 'val': 441},
    {'who': 'chain', 'val': 30},
    {'who': 'chain', 'val': 31},
]
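
The rule that any iterable other than a dict or a string is treated as a list can be written as a small predicate. The function is_list_like is a hypothetical name used for illustration and is not part of the package's API.

```python
from collections.abc import Iterable
from itertools import chain


def is_list_like(value: object) -> bool:
    """Illustrative only: iterable, but neither a dict nor a string."""
    return isinstance(value, Iterable) and not isinstance(value, (dict, str))


def a_generator(n):
    yield n
    yield 2 * n


print(is_list_like(a_generator(1)))         # True
print(is_list_like(range(10, 12)))          # True
print(is_list_like(map(abs, [-1, 2])))      # True
print(is_list_like(chain([30], [31])))      # True
print(is_list_like("generator"))            # False
print(is_list_like({"who": "generator"}))   # False
```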
