Skip to main content
Join the official 2019 Python Developers SurveyStart the survey!

JSONB data flattener for Postgresl

Project description

The pg_jsonb_flattener is a tool to ease selection of Postgresql table which contains JSONB typed fields.

Provided with a description of the data (a sort of jsonschema) contained in a JSONB field, it is able to build a selectable sqlalchemy expression which can be used as a plain Postgresql table for SELECT queries.

Status

branch CI Coverage
master master_pipeline master_coverage
develop develop_pipeline develop_coverage

Usage

The tool is used by instantiating a JsonbTable providing it:

  • a sqlaclehmy table description as the only positionnal argument
  • a JSON data description for each JSONB field to flatten as named arguments where the argument name is the JSONB field name and the argument value is the JSON data description.

The code below illustrates how the tool can be used to obtain a flat selectable from a data description written in a JSON file.

from sqlalchemy import create_engine, MetaData, Table
from pg_jsonb_flattener import JsonbTable

ENGINE = create_engine('some database URL')
METADATA = MetaData()
ONE_JSON_FIELD_TABLE = Table(
    'one_json_field_table', METADATA,
    Column('field_one', JSONB)
)
DATA_DESCRIPTION = json.load('path/to/data_description.json')

jsonb_table = JsonbTable(ONE_JSON_FIELD_TABLE, field_one=DATA_DESCRIPTION)

flat_selectable = jsonb_table.get_query()

with engine.connect() as conn:
    result = conn.execute(flat_selectable).fetchall()

Then, the flat_selectable variable contains a selectable expression which acts as a table where the JSONB field_one field has been replaced by several scalar typed fields whose names are build from the original JSONB field name and the path of the data in the data structure, with a double undercore as separator.

You can execute directly the flat_selectable to inspect what it contains (this is what is done in the example to get the result variable) or use it as a plain table to build more complex queries.

See below examples for more details.

Simple types

Given the following data description:

{
  "type": "object",
  "properties": {
    "integer": {"type": "integer"},
    "number": {"type": "number"},
    "string": {"type": "string"}
    "boolean": {"type": "boolean"}
  }
}

flat_selectable would provide the following fields:

  • field_one__integer
  • field_one__number
  • field_one__string
  • field_one__boolean

Given the following JSON data in one record:

{
  "integer": 42,
  "number": 3.141592,
  "string": "pouet",
  "boolean": false
}

The result variable would contain this:

field_one__integer field_one__number field_one__string field_one__boolean
42 3.141592 pouet false

Simple objects

Given the following data description:

{
  "type": "object",
  "properties": {
    "object": {
      "type": "object",
      "properties": {
        "obj_int": {"type": "integer"},
        "obj_num": {"type": "number"},
        "obj_str": {"type": "string"}
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__object__obj_int
  • field_one__object__obj_num
  • field_one__object__obj_str

Given the following JSON data in one record:

{
  "object": {
    "obj_int": 42,
    "obj_num": 3.141592,
    "obj_str": "pouet"
  }
}

The result variable would contain this:

field_one__object__obj_int field_one__object__obj_num field_one__object__obj_str
42 3.141592 pouet

Simple arrays

Given the following data description:

{
  "type": "object",
  "properties": {
    "int_arr": {
      "type": "array",
      "items": {"type": "integer"}
    },
    "num_arr": {
      "type": "array",
      "items": {"type": "number"}
    },
    "str_arr": {
      "type": "array",
      "items": {"type": "string"}
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__int_arr__item
  • field_one__int_arr__item__index
  • field_one__num_arr__item
  • field_one__num_arr__item__index
  • field_one__str_arr__item
  • field_one__str_arr__item__index

Note

As you can see each array field is suffixed by an item part and each array data is identified by an index which allows to retrieve the order of the data in the JSONB array.

Given the following JSON data in one record:

{
  "int_arr": [42, 77, 1664],
  "num_arr": [3.141592, 1.618034, 37.2],
  "str_arr": ["pouet", "foo", "bar"]
}

The result variable would contain this:

field_one__int_arr__item field_one__int_arr__item__index field_one__num_arr__item field_one__num_arr__item__index field_one__str_arr__item field_one__str_arr__item__index
42 1 NULL NULL NULL NULL
77 2 NULL NULL NULL NULL
1664 3 NULL NULL NULL NULL
NULL NULL 3.141592 1 NULL NULL
NULL NULL 1.618034 2 NULL NULL
NULL NULL 37.2 3 NULL NULL
NULL NULL NULL NULL pouet 1
NULL NULL NULL NULL foo 2
NULL NULL NULL NULL bar 3

Array of objects

Given the following data description:

{
  "type": "object",
  "properties": {
    "array": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "obj_int": {"type": "integer"},
          "obj_num": {"type": "number"},
          "obj_str": {"type": "string"}
        }
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__array__item__index
  • field_one__array__item__obj_int
  • field_one__array__item__obj_num
  • field_one__array__item__obj_str

Given the following JSON data in one record:

{
  "array": [
    {
      "obj_int": 42,
      "obj_num": 3.141592,
      "obj_str": "pouet"
    },
    {
      "obj_int": 77,
      "obj_num": 1.618034,
      "obj_str": "toto"
    }
  ]
}

The result variable would contain this:

field_one__array__item__index field_one__array__item__obj_int field_one__array__item__obj_num field_one__array__item__obj_str
1 42 3.141592 pouet
2 77 1.618034 toto

Arrays in object

Given the following data description:

{
  "type": "object",
  "properties": {
    "object": {
      "type": "object",
      "properties": {
        "int_arr": {
          "type": "array",
          "items": {"type": "integer"}
        },
        "num_arr": {
          "type": "array",
          "items": {"type": "number"}
        },
        "str_arr": {
          "type": "array",
          "items": {"type": "string"}
        }
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__object__int_arr__item
  • field_one__object__int_arr__item__index
  • field_one__object__num_arr__item
  • field_one__object__num_arr__item__index
  • field_one__object__str_arr__item
  • field_one__object__str_arr__item__index

Given the following JSON data in one record:

{
  "object": {
    "int_arr": [42, 77, 1664],
    "num_arr": [3.141592, 1.618034, 37.2],
    "str_arr": ["pouet", "foo", "bar"]
  }
}

The result variable would contain this:

field_one__object__int_arr__item field_one__object__int_arr__item__index field_one__object__num_arr__item field_one__object__num_arr__item__index field_one__object__str_arr__item field_one__object__str_arr__item__index
42 1 NULL NULL NULL NULL
77 2 NULL NULL NULL NULL
1664 3 NULL NULL NULL NULL
NULL NULL 3.141592 1 NULL NULL
NULL NULL 1.618034 2 NULL NULL
NULL NULL 37.2 3 NULL NULL
NULL NULL NULL NULL pouet 1
NULL NULL NULL NULL foo 2
NULL NULL NULL NULL bar 3

Nested objects

Given the following data description:

{
  "type": "object",
  "properties": {
    "object": {
      "type": "object",
      "properties": {
        "nested": {
          "type": "object",
          "properties": {
            "obj_int": {"type": "integer"},
            "obj_num": {"type": "number"},
            "obj_str": {"type": "string"}
          }
        }
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__object__nested__obj_int
  • field_one__object__nested__obj_num
  • field_one__object__nested__obj_str

Given the following JSON data in one record:

{
  "object": {
    "nested": {
      "obj_int": 42,
      "obj_num": 3.141592,
      "obj_str": "pouet"
    }
  }
}

The result variable would contain this:

field_one__object__nested__obj_int field_one__object__nested__obj_num field_one__object__nested__obj_str
42 3.141592 pouet

Nested arrays

Given the following data description:

{
  "type": "object",
  "properties": {
    "nested_arrays": {
      "type": "array",
      "items": {
        "type": "array",
        "items": {"type": "string"}
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__nested_arrays__item__index
  • field_one__nested_arrays__item__item__index
  • field_one__nested_arrays__item__item

Given the following JSON data in one record:

{
  "nested_arrays": [
    ["abc", "def"],
    ["tuw", "xyz"]
  ]
}

The result variable would contain this:

field_one__nested_arrays__item__index field_one__nested_arrays__item__item__index field_one__nested_arrays__item__item
1 1 abc
1 2 def
2 1 tuw
2 2 xyz

Data “preservation”

the pg_jsonb_flattener tool allows to specify chunks of JSON data not to be flattened by providing a full name of data to preserve in a list passed at JsonbTable instantiation via the preserved optional argument. See examples below with nested arrays:

jsonb_table = JsonbTable(ONE_JSON_FIELD_TABLE, field_one=DATA_DESCRIPTION,
                         preserved=['field_one__nested_arrays__item'])

flat_selectable = jsonb_table.get_query()

with engine.connect() as conn:
    result = conn.execute(flat_selectable).fetchall()

In this case, the flat_selectable would provide the following fields:

  • field_one__nested_arrays__item__index
  • field_one__nested_arrays__item

The result variable would contain this:

field_one__nested_arrays__item__index field_one__nested_arrays__item
1 [“abc”, “def”]
2 [“tuw”, “xyz”]

This also works for the nesting array and preserving the field_one__nested_arrays would result in the following fields and result:

  • field_one__nested_arrays
field_one__nested_arrays
[[“abc”, “def”], [“tuw”, “xyz”]]

Objects can also be preserved, see examples with nested objects.

Preserving field_one__object__nested gives the following fields and result:

  • field_one__object__nested
field_one__object__nested
{‘obj_int’: 42, ‘obj_num’: 3.141592, ‘obj_str’: ‘pouet’}

Preserving field_one__object gives the following fields and result:

  • field_one__object
field_one__object__nested
{‘nested’: {‘obj_int’: 42, ‘obj_num’: 3.141592, ‘obj_str’: ‘pouet’}}

Note

Of course, it is possible to preserve objects in arrays and arrays in objects

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for pg-jsonb-flattener, version 0.2.0
Filename, size File type Python version Upload date Hashes
Filename, size pg_jsonb_flattener-0.2.0-py3-none-any.whl (9.8 kB) File type Wheel Python version py3 Upload date Hashes View hashes
Filename, size pg_jsonb_flattener-0.2.0.tar.gz (22.2 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page