Skip to main content

JSONB data flattener for Postgresl

Project description

The pg_jsonb_flattener is a tool to ease selection of Postgresql table which contains JSONB typed fields.

Provided with a description of the data (a sort of jsonschema) contained in a JSONB field, it is able to build a selectable sqlalchemy expression which can be used as a plain Postgresql table for SELECT queries.

Status

branch

CI

Coverage

master

master_pipeline

master_coverage

develop

develop_pipeline

develop_coverage

Usage

The tool is used by instantiating a JsonbTable providing it:

  • a sqlaclehmy table description as the only positionnal argument

  • a JSON data description for each JSONB field to flatten as named arguments where the argument name is the JSONB field name and the argument value is the JSON data description.

The code below illustrates how the tool can be used to obtain a flat selectable from a data description written in a JSON file.

from sqlalchemy import create_engine, MetaData, Table
from pg_jsonb_flattener import JsonbTable

ENGINE = create_engine('some database URL')
METADATA = MetaData()
ONE_JSON_FIELD_TABLE = Table(
    'one_json_field_table', METADATA,
    Column('field_one', JSONB)
)
DATA_DESCRIPTION = json.load('path/to/data_description.json')

jsonb_table = JsonbTable(ONE_JSON_FIELD_TABLE, field_one=DATA_DESCRIPTION)

flat_selectable = jsonb_table.get_query()

with engine.connect() as conn:
    result = conn.execute(flat_selectable).fetchall()

Then, the flat_selectable variable contains a selectable expression which acts as a table where the JSONB field_one field has been replaced by several scalar typed fields whose names are build from the original JSONB field name and the path of the data in the data structure, with a double undercore as separator.

You can execute directly the flat_selectable to inspect what it contains (this is what is done in the example to get the result variable) or use it as a plain table to build more complex queries.

See below examples for more details.

Simple types

Given the following data description:

{
  "type": "object",
  "properties": {
    "integer": {"type": "integer"},
    "number": {"type": "number"},
    "string": {"type": "string"}
    "boolean": {"type": "boolean"}
  }
}

flat_selectable would provide the following fields:

  • field_one__integer

  • field_one__number

  • field_one__string

  • field_one__boolean

Given the following JSON data in one record:

{
  "integer": 42,
  "number": 3.141592,
  "string": "pouet",
  "boolean": false
}

The result variable would contain this:

field_one__integer

field_one__number

field_one__string

field_one__boolean

42

3.141592

pouet

false

Simple objects

Given the following data description:

{
  "type": "object",
  "properties": {
    "object": {
      "type": "object",
      "properties": {
        "obj_int": {"type": "integer"},
        "obj_num": {"type": "number"},
        "obj_str": {"type": "string"}
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__object__obj_int

  • field_one__object__obj_num

  • field_one__object__obj_str

Given the following JSON data in one record:

{
  "object": {
    "obj_int": 42,
    "obj_num": 3.141592,
    "obj_str": "pouet"
  }
}

The result variable would contain this:

field_one__object__obj_int

field_one__object__obj_num

field_one__object__obj_str

42

3.141592

pouet

Simple arrays

Given the following data description:

{
  "type": "object",
  "properties": {
    "int_arr": {
      "type": "array",
      "items": {"type": "integer"}
    },
    "num_arr": {
      "type": "array",
      "items": {"type": "number"}
    },
    "str_arr": {
      "type": "array",
      "items": {"type": "string"}
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__int_arr__item

  • field_one__int_arr__item__index

  • field_one__num_arr__item

  • field_one__num_arr__item__index

  • field_one__str_arr__item

  • field_one__str_arr__item__index

Given the following JSON data in one record:

{
  "int_arr": [42, 77, 1664],
  "num_arr": [3.141592, 1.618034, 37.2],
  "str_arr": ["pouet", "foo", "bar"]
}

The result variable would contain this:

field_one__int_arr__item

field_one__int_arr__item__index

field_one__num_arr__item

field_one__num_arr__item__index

field_one__str_arr__item

field_one__str_arr__item__index

42

1

NULL

NULL

NULL

NULL

77

2

NULL

NULL

NULL

NULL

1664

3

NULL

NULL

NULL

NULL

NULL

NULL

3.141592

1

NULL

NULL

NULL

NULL

1.618034

2

NULL

NULL

NULL

NULL

37.2

3

NULL

NULL

NULL

NULL

NULL

NULL

pouet

1

NULL

NULL

NULL

NULL

foo

2

NULL

NULL

NULL

NULL

bar

3

Array of objects

Given the following data description:

{
  "type": "object",
  "properties": {
    "array": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "obj_int": {"type": "integer"},
          "obj_num": {"type": "number"},
          "obj_str": {"type": "string"}
        }
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__array__item__index

  • field_one__array__item__obj_int

  • field_one__array__item__obj_num

  • field_one__array__item__obj_str

Given the following JSON data in one record:

{
  "array": [
    {
      "obj_int": 42,
      "obj_num": 3.141592,
      "obj_str": "pouet"
    },
    {
      "obj_int": 77,
      "obj_num": 1.618034,
      "obj_str": "toto"
    }
  ]
}

The result variable would contain this:

field_one__array__item__index

field_one__array__item__obj_int

field_one__array__item__obj_num

field_one__array__item__obj_str

1

42

3.141592

pouet

2

77

1.618034

toto

Arrays in object

Given the following data description:

{
  "type": "object",
  "properties": {
    "object": {
      "type": "object",
      "properties": {
        "int_arr": {
          "type": "array",
          "items": {"type": "integer"}
        },
        "num_arr": {
          "type": "array",
          "items": {"type": "number"}
        },
        "str_arr": {
          "type": "array",
          "items": {"type": "string"}
        }
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__object__int_arr__item

  • field_one__object__int_arr__item__index

  • field_one__object__num_arr__item

  • field_one__object__num_arr__item__index

  • field_one__object__str_arr__item

  • field_one__object__str_arr__item__index

Given the following JSON data in one record:

{
  "object": {
    "int_arr": [42, 77, 1664],
    "num_arr": [3.141592, 1.618034, 37.2],
    "str_arr": ["pouet", "foo", "bar"]
  }
}

The result variable would contain this:

field_one__object__int_arr__item

field_one__object__int_arr__item__index

field_one__object__num_arr__item

field_one__object__num_arr__item__index

field_one__object__str_arr__item

field_one__object__str_arr__item__index

42

1

NULL

NULL

NULL

NULL

77

2

NULL

NULL

NULL

NULL

1664

3

NULL

NULL

NULL

NULL

NULL

NULL

3.141592

1

NULL

NULL

NULL

NULL

1.618034

2

NULL

NULL

NULL

NULL

37.2

3

NULL

NULL

NULL

NULL

NULL

NULL

pouet

1

NULL

NULL

NULL

NULL

foo

2

NULL

NULL

NULL

NULL

bar

3

Nested objects

Given the following data description:

{
  "type": "object",
  "properties": {
    "object": {
      "type": "object",
      "properties": {
        "nested": {
          "type": "object",
          "properties": {
            "obj_int": {"type": "integer"},
            "obj_num": {"type": "number"},
            "obj_str": {"type": "string"}
          }
        }
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__object__nested__obj_int

  • field_one__object__nested__obj_num

  • field_one__object__nested__obj_str

Given the following JSON data in one record:

{
  "object": {
    "nested": {
      "obj_int": 42,
      "obj_num": 3.141592,
      "obj_str": "pouet"
    }
  }
}

The result variable would contain this:

field_one__object__nested__obj_int

field_one__object__nested__obj_num

field_one__object__nested__obj_str

42

3.141592

pouet

Nested arrays

Given the following data description:

{
  "type": "object",
  "properties": {
    "nested_arrays": {
      "type": "array",
      "items": {
        "type": "array",
        "items": {"type": "string"}
      }
    }
  }
}

flat_selectable would provide the following fields:

  • field_one__nested_arrays__item__index

  • field_one__nested_arrays__item__item__index

  • field_one__nested_arrays__item__item

Given the following JSON data in one record:

{
  "nested_arrays": [
    ["abc", "def"],
    ["tuw", "xyz"]
  ]
}

The result variable would contain this:

field_one__nested_arrays__item__index

field_one__nested_arrays__item__item__index

field_one__nested_arrays__item__item

1

1

abc

1

2

def

2

1

tuw

2

2

xyz

Data “preservation”

the pg_jsonb_flattener tool allows to specify chunks of JSON data not to be flattened by providing a full name of data to preserve in a list passed at JsonbTable instantiation via the preserved optional argument. See examples below with nested arrays:

jsonb_table = JsonbTable(ONE_JSON_FIELD_TABLE, field_one=DATA_DESCRIPTION,
                         preserved=['field_one__nested_arrays__item'])

flat_selectable = jsonb_table.get_query()

with engine.connect() as conn:
    result = conn.execute(flat_selectable).fetchall()

In this case, the flat_selectable would provide the following fields:

  • field_one__nested_arrays__item__index

  • field_one__nested_arrays__item

The result variable would contain this:

field_one__nested_arrays__item__index

field_one__nested_arrays__item

1

[“abc”, “def”]

2

[“tuw”, “xyz”]

This also works for the nesting array and preserving the field_one__nested_arrays would result in the following fields and result:

  • field_one__nested_arrays

field_one__nested_arrays

[[“abc”, “def”], [“tuw”, “xyz”]]

Objects can also be preserved, see examples with nested objects.

Preserving field_one__object__nested gives the following fields and result:

  • field_one__object__nested

field_one__object__nested

{‘obj_int’: 42, ‘obj_num’: 3.141592, ‘obj_str’: ‘pouet’}

Preserving field_one__object gives the following fields and result:

  • field_one__object

field_one__object__nested

{‘nested’: {‘obj_int’: 42, ‘obj_num’: 3.141592, ‘obj_str’: ‘pouet’}}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pg_jsonb_flattener-0.2.0.tar.gz (22.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pg_jsonb_flattener-0.2.0-py3-none-any.whl (9.8 kB view details)

Uploaded Python 3

File details

Details for the file pg_jsonb_flattener-0.2.0.tar.gz.

File metadata

  • Download URL: pg_jsonb_flattener-0.2.0.tar.gz
  • Upload date:
  • Size: 22.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.0 CPython/3.6.8

File hashes

Hashes for pg_jsonb_flattener-0.2.0.tar.gz
Algorithm Hash digest
SHA256 eeb18907fe0b3265396e123f7b15eeeda73849f045afc035cd6c11f51a4e876f
MD5 5035b52ba22bc8523187427ff687ebff
BLAKE2b-256 4e8a983d0d7c6444a491b37f7a3b24d4c56bb17709961fb634217989fb8fe752

See more details on using hashes here.

File details

Details for the file pg_jsonb_flattener-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: pg_jsonb_flattener-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.0 CPython/3.6.8

File hashes

Hashes for pg_jsonb_flattener-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3ee4afa475b70ae6ad152b9c7f9e6d9b7bec2ef0784fbb9c795b2df590922fb4
MD5 5adba07cdca9a4a9bc5f0acd228bd4ce
BLAKE2b-256 7d95b5f16ec310f86fca463a778ad051cd92a2a7e2a049848a38afa0f0b35a78

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page