
Super Json Normalize (with Array Ultra)

A generic, unlimited-level JSON normalizer which unnests objects and relationalizes arrays at any depth. Outputs a normalised set of entities which can be imported into relational tables, pulling parent keys into child entities for joinability.

Why do I need this?

  • It outputs relational data from pretty much any analytics-focused JSON
  • It creates new entities for data which should be split out into its own table
  • It links those entities by a list of ID keys you specify
  • All your join keys are yours, so you won't get random breakages if you switch tools

If you're reporting within a relational database, or have something in the pipeline which doesn't like structured data, this tool will do the heavy lifting to relationalise the JSON for you. It saves you and your team the work of unnesting objects and pulling arrays into new tables, so you can query them without the weird mappings that forced unnesting produces.

How do I use it for my project?

This works well as a transform step for your ETL (or even ELT)

  • Extract your data
  • Iterate through each entry
  • Run super_json_normalize.normalize_record(<your_dict_here>) on the entry
  • Pipe the output arrays of entities to wherever they need to land
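When iterating over many entries, each call to normalize_record returns the entities for that one record. A small merge step (a sketch, not part of the library) can collect those per-record outputs by entity name, so each table can be loaded in one pass — this assumes the `[{"name": ..., "data": [...]}]` output shape described in the Design section:

```python
from collections import defaultdict

def merge_entities(per_record_outputs):
    """Merge per-record normalizer outputs into one row list per entity name."""
    merged = defaultdict(list)
    for entities in per_record_outputs:
        for entity in entities:
            merged[entity["name"]].extend(entity["data"])
    return dict(merged)

# Hypothetical per-record outputs in the normalizer's [{"name", "data"}] shape:
outputs = [
    [{"name": "properties", "data": [{"id": "ID001"}]},
     {"name": "inspection_times", "data": [{"id": "IID001", "properties_id": "ID001"}]}],
    [{"name": "properties", "data": [{"id": "ID002"}]}],
]

merged = merge_entities(outputs)
# merged["properties"] now holds one row per extracted record
```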

Features

TODO:

  • initial dumb unnesting with id as the primary key (DONE)
  • unnesting with configurable primary keys
  • multiple primary keys for the parent
  • multi-layered primary keys (e.g. entities inside entities)

Design

This works by pulling out arrays into their own entities.

For the following example payload (properties for a house, in directory /samples/property_data/):

//for the file property_data.json
{
        "id": "ID001",
        "address": {
                "street_address": "123 Fake street",
                "suburb": "Fakeland",
                "state": "VIC",
                "country": "Australia"
        },
        "inspection_times": [
                {"id": "IID001", "description":"First inspection date on Sunday"},
                {"id": "IID002", "description":"Second inspection date on Tuesday"},
                {"id": "IID003", "description":"Final inspection date on Friday"}
        ]  
}

The following will occur when we run super_json_normalize.normalize_record() on the loaded json data:

  • the address object will be unnested
  • a new entity called inspection_times will be created, and the inspection_times array will be removed from the root entity
  • the parent id from the root will be pulled into the inspection_times entity so we can join the data

This would end up with:

[
  {
    "name": "properties",                // `properties` entity
    "data": [
      {
        "id": "ID001",                   // primary key
        "address_street_address": "123 Fake street",
        "address_suburb": "Fakeland",
        "address_state": "VIC",
        "address_country": "Australia"
      }
    ]
  },
  {
    "name": "inspection_times",          // `inspection_times` entity
    "data": [
      {
        "id": "IID001",         
        "description": "First inspection date on Sunday",
        "properties_id": "ID001"         // parent key (foreign key) inherited from the root 
      },
      {
        "id": "IID002",
        "description": "Second inspection date on Tuesday",
        "properties_id": "ID001"         // parent key (foreign key) inherited from the root
      },
      {
        "id": "IID003",
        "description": "Final inspection date on Friday",
        "properties_id": "ID001"         // parent key (foreign key) inherited from the root
      }
    ]
  }
]

This is useful for preparing data for a relational database, or for any downstream system that expects flat, relational data.
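The design can be sketched as a small recursive walk — a simplified illustration, not the package's actual implementation: flatten nested objects into prefixed columns, and pop each array of objects out into its own entity carrying the parent's id. For brevity this sketch flattens only one level of object nesting and groups rows as a dict keyed by entity name:

```python
def normalize(record, name, entities=None):
    """Simplified sketch: flatten dicts into prefixed columns, split arrays
    into child entities that inherit the parent's id as <parent>_id."""
    if entities is None:
        entities = {}
    row = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # unnest objects into prefixed columns, e.g. address_suburb
            for sub_key, sub_value in value.items():
                row[f"{key}_{sub_key}"] = sub_value
        elif isinstance(value, list):
            # pull each array item out into a child entity with the parent key
            for item in value:
                normalize({**item, f"{name}_id": record["id"]}, key, entities)
        else:
            row[key] = value
    entities.setdefault(name, []).append(row)
    return entities

house = {
    "id": "ID001",
    "address": {"street_address": "123 Fake street", "suburb": "Fakeland",
                "state": "VIC", "country": "Australia"},
    "inspection_times": [
        {"id": "IID001", "description": "First inspection date on Sunday"},
        {"id": "IID002", "description": "Second inspection date on Tuesday"},
        {"id": "IID003", "description": "Final inspection date on Friday"},
    ],
}

ents = normalize(house, "properties")
# ents maps entity name -> rows, e.g. ents["inspection_times"] has 3 rows,
# each carrying properties_id = "ID001"
```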

Usage

An example dumping each entity to its own file, using the property_data sample from /samples/ above:

import super_json_normalize

# load the data from file
my_sample_data = super_json_normalize.load_json("./samples/property_data.json")

# now create the entities
my_normalized_data = super_json_normalize.normalize_record(my_sample_data, parent_name="properties")

# finally, export each entity to its own file
super_json_normalize.export_records(my_normalized_data, path="./export_data/", format="jsonl")

###
# creates files in ./export_data/ directory 
# properties.jsonl
# inspection_times.jsonl

### properties.jsonl
# {"id": "ID001", "address_street_address": "123 Fake street", "address_suburb": "Fakeland", "address_state": "VIC", "address_country": "Australia"}

### inspection_times.jsonl
# {"id": "IID001", "description": "First inspection date on Sunday", "properties_id": "ID001"}
# {"id": "IID002", "description": "Second inspection date on Tuesday", "properties_id": "ID001"}
# {"id": "IID003", "description": "Final inspection date on Friday", "properties_id": "ID001"}
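To show the join keys in action, the exported rows can be loaded into a relational store. A sketch using sqlite3 with the rows from the example output (only a couple of columns kept for brevity — the inherited properties_id is a plain foreign key, so joins are ordinary SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE properties (id TEXT PRIMARY KEY, address_suburb TEXT)")
conn.execute("CREATE TABLE inspection_times ("
             "id TEXT PRIMARY KEY, description TEXT, properties_id TEXT)")

conn.execute("INSERT INTO properties VALUES ('ID001', 'Fakeland')")
conn.executemany(
    "INSERT INTO inspection_times VALUES (?, ?, ?)",
    [("IID001", "First inspection date on Sunday", "ID001"),
     ("IID002", "Second inspection date on Tuesday", "ID001"),
     ("IID003", "Final inspection date on Friday", "ID001")],
)

# join child rows back to the parent via the inherited properties_id key
rows = conn.execute(
    """SELECT p.address_suburb, i.description
       FROM properties p
       JOIN inspection_times i ON i.properties_id = p.id"""
).fetchall()
# rows holds one (suburb, description) pair per inspection
```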

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Cookiecutter: https://github.com/audreyr/cookiecutter
audreyr/cookiecutter-pypackage: https://github.com/audreyr/cookiecutter-pypackage

History

0.1.0 (2021-10-03)

  • First release on PyPI.


