Skip to main content

Get deeply nested JSON into tabular format

Project description

json_tabularize

Get arbitrarily nested JSON into tabular format

[Read the Docs] (https://json_tabularize.readthedocs.io/en/latest/index.html)

Features

  • Convert deeply nested JSON into tabular format.
  • Easy to use; the build_tab function only requires one argument to parse any JSON that contain some tabular or "tabularizable" JSON.
  • Recognize multiple formats that can be converted into tables.
  • CLI tool for outputting the tabular JSON as JSON.

How to use

  • Install Python 3.6 or newer.

  • Install

    # or PyPI
    pip install json_tabularize
    

Here's a motivating example of where to use this:

>>> bball = {'leagues': [
    {
    'league': 'American',
    'teams': [
            {
                'name': 'foo',
                'players': [
                    {'name': 'alice', 'hits': [1], 'at-bats': [3]},
                ]
            },
            {
                'name': 'bar',
                'players': [
                    {'name': 'carol', 'hits': [1], 'at-bats': [2]}
                ]
            }
        ],
    },
    {
    'league': 'National',
    'teams': [
            {
                'name': 'baz',
                'players': [
                    {'name': 'bob', 'hits': [2], 'at-bats': [3]}
                ]
            }
        ]
    }
]}

This JSON has a regular structure, and it would be reasonable to try converting this into a table. However, algorithms like pandas' normalize_json can't fully normalize it, but rather puts everything in one row.

>>> import pandas as pd
>>> pd.json_normalize(bball, ['leagues', 'teams', 'players'])
    name hits at-bats
0  alice  [1]     [3]
1  carol  [1]     [2]
2    bob  [2]     [3]

This is pretty good, but it results in loss of information, and you have to spend some time troubleshooting and reading the documentation to be able to use it.

Let's try using my algorithm.

>>> pd.DataFrame(build_tab(bball))
  leagues.teams.players.name leagues.league leagues.teams.name  leagues.teams.players.hits  leagues.teams.players.at-bats
0                      alice       American                foo                           1                              3
1                      carol       American                bar                           1                              2
2                        bob       National                baz                           2                              3

All the information has been retained. Note that pandas is NOT a dependency of this package.

Another advantage of this algorithm is that it recognizes all of the following formats as tables:

>>> {'a': [1, 2], 'b': ['a', 'b']} # this is a table
>>> [{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}] # also a table
>>> [[1, 'a'], [2, 'b']] # yep, still a table

The program infers table formats without user input.

Limitations:

  1. This algorithm only works on JSON that has one or fewer possible tables within it.
  2. All arrays must be lists.
  3. This won't recognize a single flat list or dict as a table.
  4. You must have GenSON installed.

In conclusion, you should still use pandas for the 95+% of "tabularizable" real-world JSON that can be fully normalized into a table by read_json or json_normalize, but this package exists for those other rare cases.

Contributing

Be sure to read the contribution guidelines.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

json_tabularize-1.0.3.tar.gz (11.0 kB view hashes)

Uploaded Source

Built Distribution

json_tabularize-1.0.3-py3-none-any.whl (11.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page