Get deeply nested JSON into tabular format

json_tabularize

Get arbitrarily nested JSON into tabular format

[Read the Docs](https://json_tabularize.readthedocs.io/en/latest/index.html)

Features

  • Convert deeply nested JSON into tabular format.
  • Easy to use; the build_tab function requires only one argument to parse any JSON that contains some tabular or "tabularizable" JSON.
  • Recognize multiple formats that can be converted into tables.
  • CLI tool for outputting the tabular JSON as JSON.

How to use

  • Install Python 3.6 or newer.

  • Install from PyPI:

    pip install json_tabularize

Here's a motivating example of where to use this:

>>> bball = {'leagues': [
    {
    'league': 'American',
    'teams': [
            {
                'name': 'foo',
                'players': [
                    {'name': 'alice', 'hits': [1], 'at-bats': [3]},
                ]
            },
            {
                'name': 'bar',
                'players': [
                    {'name': 'carol', 'hits': [1], 'at-bats': [2]}
                ]
            }
        ],
    },
    {
    'league': 'National',
    'teams': [
            {
                'name': 'baz',
                'players': [
                    {'name': 'bob', 'hits': [2], 'at-bats': [3]}
                ]
            }
        ]
    }
]}

This JSON has a regular structure, so it is reasonable to try converting it into a table. However, pandas' json_normalize can't fully normalize it. With default arguments it puts everything in one row, and with a record path it gets only part of the way:

>>> import pandas as pd
>>> pd.json_normalize(bball, ['leagues', 'teams', 'players'])
    name hits at-bats
0  alice  [1]     [3]
1  carol  [1]     [2]
2    bob  [2]     [3]

This is pretty good, but it loses information: the league and team names are gone, and hits and at-bats are still lists. You also have to spend some time troubleshooting and reading the documentation to be able to use it.

Let's try using my algorithm.

>>> pd.DataFrame(build_tab(bball))
  leagues.teams.players.name leagues.league leagues.teams.name  leagues.teams.players.hits  leagues.teams.players.at-bats
0                      alice       American                foo                           1                              3
1                      carol       American                bar                           1                              2
2                        bob       National                baz                           2                              3

All the information has been retained. Note that pandas is NOT a dependency of this package.
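
To make the idea concrete, here is a minimal plain-Python sketch of how nested records could be flattened into rows with dotted column names like the ones above. It is not the implementation used by build_tab; the flatten helper is hypothetical, and it assumes every nested table is a list of dicts and that parallel lists such as hits and at-bats in one record have the same length.

```python
def flatten(node, prefix="", inherited=None):
    """Yield one flat dict per innermost record (hypothetical helper)."""
    row = dict(inherited or {})
    scalar_lists = {}  # parallel columns, e.g. hits / at-bats
    record_lists = {}  # nested lists of dicts, e.g. teams / players
    for key, value in node.items():
        name = prefix + key
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            record_lists[name] = value
        elif isinstance(value, list):
            scalar_lists[name] = value
        else:
            row[name] = value  # plain value: becomes a column of this row
    if scalar_lists:
        # One row per position in the parallel lists.
        rows = [{**row, **dict(zip(scalar_lists, values))}
                for values in zip(*scalar_lists.values())]
    else:
        rows = [row]
    if not record_lists:
        yield from rows
        return
    # Descend into nested records, carrying the parent columns along.
    for name, records in record_lists.items():
        for base in rows:
            for record in records:
                yield from flatten(record, name + ".", base)


bball = {'leagues': [
    {'league': 'American', 'teams': [
        {'name': 'foo', 'players': [{'name': 'alice', 'hits': [1], 'at-bats': [3]}]},
        {'name': 'bar', 'players': [{'name': 'carol', 'hits': [1], 'at-bats': [2]}]},
    ]},
    {'league': 'National', 'teams': [
        {'name': 'baz', 'players': [{'name': 'bob', 'hits': [2], 'at-bats': [3]}]},
    ]},
]}

rows = list(flatten(bball))
for r in rows:
    print(r)
```

Each parent level contributes its plain values (league, team name) to every row produced beneath it, which is why nothing is lost on the way down. The sketch treats a nested dict that is not inside a list as a single cell value, which a real implementation would likely handle differently.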

Another advantage of this algorithm is that it recognizes all of the following formats as tables:

>>> {'a': [1, 2], 'b': ['a', 'b']} # this is a table
>>> [{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}] # also a table
>>> [[1, 'a'], [2, 'b']] # yep, still a table

The program infers table formats without user input.
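
As a rough illustration of what inferring the format can look like, the sketch below normalizes the three layouts above into a list of row dicts. The to_rows helper is hypothetical and is not part of this package's API; in particular, the col0, col1 names used for a list of lists are an assumption made for the example, not the names build_tab produces.

```python
def to_rows(table):
    """Convert one of three table layouts into a list of row dicts (sketch)."""
    if isinstance(table, dict):
        # Dict of columns: {'a': [1, 2], 'b': ['a', 'b']}
        if len({len(col) for col in table.values()}) > 1:
            raise ValueError("columns have different lengths")
        return [dict(zip(table, values)) for values in zip(*table.values())]
    if isinstance(table, list) and table:
        if all(isinstance(r, dict) for r in table):
            # List of records: [{'a': 1, 'b': 'a'}, ...]
            return [dict(r) for r in table]
        if all(isinstance(r, list) for r in table):
            # List of rows: [[1, 'a'], ...]; columns get placeholder names.
            return [{f"col{i}": v for i, v in enumerate(r)} for r in table]
    raise ValueError("not a recognized table layout")


print(to_rows({'a': [1, 2], 'b': ['a', 'b']}))
print(to_rows([{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}]))
print(to_rows([[1, 'a'], [2, 'b']]))
```

The first two calls return the same rows, which is the point: the layout differs but the table is the same.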

Limitations:

  1. This algorithm only works on JSON that contains at most one possible table.
  2. All arrays must be lists.
  3. This won't recognize a single flat list or dict as a table.
  4. You must have GenSON installed.

In conclusion, you should still use pandas for the 95+% of "tabularizable" real-world JSON that can be fully normalized into a table by read_json or json_normalize, but this package exists for those other rare cases.

Contributing

Be sure to read the contribution guidelines.

