Get deeply nested JSON into tabular format
json_tabularize
Get arbitrarily nested JSON into tabular format
[Read the Docs](https://json_tabularize.readthedocs.io/en/latest/index.html)
Features
- Convert deeply nested JSON into tabular format.
- Easy to use; the build_tab function requires only one argument to parse any JSON that contains some tabular or "tabularizable" JSON.
- Recognize multiple formats that can be converted into tables.
- CLI tool for outputting the tabular JSON as JSON.
How to use
- Install Python 3.6 or newer.
- Install the package, e.g. from PyPI: pip install json_tabularize
Here's a motivating example of where to use this:
>>> bball = {'leagues': [
        {
            'league': 'American',
            'teams': [
                {
                    'name': 'foo',
                    'players': [
                        {'name': 'alice', 'hits': [1], 'at-bats': [3]},
                    ]
                },
                {
                    'name': 'bar',
                    'players': [
                        {'name': 'carol', 'hits': [1], 'at-bats': [2]}
                    ]
                }
            ],
        },
        {
            'league': 'National',
            'teams': [
                {
                    'name': 'baz',
                    'players': [
                        {'name': 'bob', 'hits': [2], 'at-bats': [3]}
                    ]
                }
            ]
        }
    ]}
This JSON has a regular structure, and it would be reasonable to try converting it into a table. However, tools like pandas' json_normalize can't fully normalize it. Called with no extra arguments, json_normalize puts everything in one row; given a record path, it gets further:
>>> import pandas as pd
>>> pd.json_normalize(bball, ['leagues', 'teams', 'players'])
    name hits at-bats
0  alice  [1]     [3]
1  carol  [1]     [2]
2    bob  [2]     [3]
This is pretty good, but it loses information (the league and team names are gone), and you have to spend some time troubleshooting and reading the documentation to be able to use it.
Let's try using my algorithm.
>>> pd.DataFrame(build_tab(bball))
   leagues.teams.players.name  leagues.league  leagues.teams.name  leagues.teams.players.hits  leagues.teams.players.at-bats
0                       alice        American                 foo                           1                              3
1                       carol        American                 bar                           1                              2
2                         bob        National                 baz                           2                              3
All the information has been retained. Note that pandas is NOT a dependency of this package.
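To make the idea concrete, here is a minimal sketch of how parent-level values can be carried down onto each row while column names are built from the dotted path. It is not the package's implementation: the function name flatten_rows, the helper is_scalar, and the list-of-row-dicts return type are all hypothetical, and the sketch assumes each dict has at most one nested branch and that lists hold either only scalars or only containers.

```python
def is_scalar(value):
    """Anything that is not a list or dict counts as a scalar here."""
    return not isinstance(value, (list, dict))


def flatten_rows(node, path=()):
    """Hypothetical helper: return flat row dicts for a nested JSON-like value.

    Assumes at most one nested branch per dict (illustration only).
    """
    if isinstance(node, list):
        # A list of records: flatten each element and concatenate the rows.
        return [row for item in node for row in flatten_rows(item, path)]

    constants, columns, nested = {}, {}, []
    for key, value in node.items():
        name = ".".join(path + (key,))
        if is_scalar(value):
            constants[name] = value      # repeated on every row below this dict
        elif isinstance(value, list) and all(is_scalar(v) for v in value):
            columns[name] = value        # parallel arrays become columns
        else:
            nested.extend(flatten_rows(value, path + (key,)))

    # Zip parallel scalar arrays into rows; with none, start from one empty row.
    if columns:
        rows = [dict(zip(columns, vals)) for vals in zip(*columns.values())]
    else:
        rows = [{}]
    if nested:
        rows = [{**row, **child} for row in rows for child in nested]
    # Finally, attach this level's scalar values to every row.
    return [{**constants, **row} for row in rows]


bball = {'leagues': [
    {'league': 'American', 'teams': [
        {'name': 'foo', 'players': [
            {'name': 'alice', 'hits': [1], 'at-bats': [3]}]},
        {'name': 'bar', 'players': [
            {'name': 'carol', 'hits': [1], 'at-bats': [2]}]},
    ]},
    {'league': 'National', 'teams': [
        {'name': 'baz', 'players': [
            {'name': 'bob', 'hits': [2], 'at-bats': [3]}]},
    ]},
]}

rows = flatten_rows(bball)
for row in rows:
    print(row)
```

Each of the three resulting rows keeps the league and team names alongside the player's values, which is the property the table above demonstrates.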
Another advantage of this algorithm is that it recognizes all of the following formats as tables:
>>> {'a': [1, 2], 'b': ['a', 'b']} # this is a table
>>> [{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}] # also a table
>>> [[1, 'a'], [2, 'b']] # yep, still a table
The program infers table formats without user input.
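As a rough illustration of what recognizing these three shapes might involve, the sketch below normalizes each one to a list of row dicts. It is an assumption-laden stand-in rather than the package's own detection logic: the function name to_rows and the generated column names col1, col2, ... are hypothetical.

```python
def to_rows(table):
    """Hypothetical helper: normalize three table shapes to a list of row dicts.

    Returns None when the input matches none of the shapes.
    """
    # Shape 1: dict mapping column names to equal-length lists.
    if isinstance(table, dict) and table:
        values = list(table.values())
        if all(isinstance(v, list) for v in values):
            if len({len(v) for v in values}) == 1:
                return [dict(zip(table, vals)) for vals in zip(*values)]
        return None

    if isinstance(table, list) and table:
        # Shape 2: list of dicts, one dict per row.
        if all(isinstance(row, dict) for row in table):
            return [dict(row) for row in table]
        # Shape 3: list of equal-length lists; column names are made up here.
        if all(isinstance(row, list) for row in table):
            if len({len(row) for row in table}) == 1:
                return [
                    {f"col{i + 1}": v for i, v in enumerate(row)}
                    for row in table
                ]
    return None


print(to_rows({'a': [1, 2], 'b': ['a', 'b']}))
print(to_rows([{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}]))
print(to_rows([[1, 'a'], [2, 'b']]))
```

The first two calls yield the same rows; the third has no column names in the input, so the sketch invents positional ones.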
Limitations:
- This algorithm only works on JSON that contains at most one possible table.
- All arrays must be lists.
- This won't recognize a single flat list or dict as a table.
- You must have GenSON installed.
In conclusion, you should still use pandas for the 95+% of "tabularizable" real-world JSON that can be fully normalized into a table by read_json or json_normalize, but this package exists for those other rare cases.
Contributing
Be sure to read the contribution guidelines.