Python library for denormalizing nested dicts or json objects to tables and back
json-flattener
Python library for denormalizing/flattening lists of complex objects to tables/data frames, with roundtripping
Notebook Example
Description
Given YAML/JSON/JSON-Lines such as:
- id: S001
  name: Lord of the Rings
  genres:
    - fantasy
  creator:
    name: JRR Tolkein
    from_country: England
  books:
    - id: S001.1
      name: Fellowship of the Ring
      price: 5.99
      summary: Hobbits
    - id: S001.2
      name: The Two Towers
      price: 5.99
      summary: More hobbits
    - id: S001.3
      name: Return of the King
      price: 6.99
      summary: Yet more hobbits
- id: S002
  name: The Culture Series
  genres:
    - scifi
  creator:
    name: Ian M Banks
    from_country: Scotland
  books:
    - id: S002.1
      name: Consider Phlebas
      price: 5.99
    - id: S002.2
      name: Player of Games
      price: 5.99
Denormalize using the jfl command:
jfl flatten -C creator=flat -C books=multivalued -i examples/books1.yaml -o examples/books1-flattened.tsv
id | name | genres | creator_name | creator_from_country | books_name | books_summary | books_price | books_id | creator_genres |
---|---|---|---|---|---|---|---|---|---|
S001 | Lord of the Rings | [fantasy] | JRR Tolkein | England | [Fellowship of the Ring|The Two Towers|Return of the King] | [Hobbits|More hobbits|Yet more hobbits] | [5.99|5.99|6.99] | [S001.1|S001.2|S001.3] | |
S002 | The Culture Series | [scifi] | Ian M Banks | Scotland | [Consider Phlebas|Player of Games] | | [5.99|5.99] | [S002.1|S002.2] | |
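Cells in this output hold either an atom or a bracketed, pipe-delimited list. The following is a minimal sketch of reading such a cell back into Python. It assumes the `[a|b|c]` syntax shown in the table and that values never contain `|` themselves. `parse_cell` is a hypothetical helper written for illustration, not part of the library.

```python
def parse_cell(cell: str):
    """Turn '[a|b|c]' into ['a', 'b', 'c']; return any other cell as a stripped string."""
    text = cell.strip()
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1]
        # An empty pair of brackets means an empty list, not [''].
        return inner.split("|") if inner else []
    return text


names = parse_cell("[Fellowship of the Ring|The Two Towers|Return of the King]")
# Values come back as strings; convert them explicitly where numbers are expected.
prices = [float(p) for p in parse_cell("[5.99|5.99|6.99]")]
print(names)
print(prices)
```

Because order is preserved, `names[i]` and `prices[i]` refer to the same book.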
Convert back to JSON/YAML:
jfl unflatten -C creator=flat -C books=multivalued -i examples/books1.tsv -o examples/books1.yaml
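To illustrate what unflattening involves, here is a minimal pure-Python sketch of rebuilding one nested record from a flattened row. It mirrors the `creator=flat` and `books=multivalued` settings used above, but it is not the library's implementation: `unflatten_row` and its parameters are hypothetical names, and the row is assumed to be already parsed into a dict whose list cells are Python lists.

```python
def unflatten_row(row, flat_keys=(), multivalued_keys=(), sep="_"):
    """Rebuild a nested record from one flattened row (dict of column -> value)."""
    flat_keys, multivalued_keys = tuple(flat_keys), tuple(multivalued_keys)
    nested, list_columns = {}, {}
    for column, value in row.items():
        # Find the configured parent key this column was derived from, if any.
        parent = next(
            (k for k in flat_keys + multivalued_keys if column.startswith(k + sep)),
            None,
        )
        if parent is None:
            nested[column] = value
            continue
        inner = column[len(parent) + len(sep):]
        if parent in flat_keys:
            nested.setdefault(parent, {})[inner] = value
        else:
            list_columns.setdefault(parent, {})[inner] = value
    # Zip the parallel lists back into one dict per position.
    for parent, columns in list_columns.items():
        length = max(len(values) for values in columns.values())
        nested[parent] = [
            {inner: values[i] for inner, values in columns.items()
             if i < len(values) and values[i] is not None}
            for i in range(length)
        ]
    return nested


row = {
    "id": "S002",
    "name": "The Culture Series",
    "genres": ["scifi"],
    "creator_name": "Ian M Banks",
    "creator_from_country": "Scotland",
    "books_id": ["S002.1", "S002.2"],
    "books_name": ["Consider Phlebas", "Player of Games"],
    "books_price": [5.99, 5.99],
}
record = unflatten_row(row, flat_keys=["creator"], multivalued_keys=["books"])
print(record["creator"])
print(record["books"])
```

The key configuration is needed in both directions: without it, a column such as `creator_name` cannot be distinguished from a top level key that happens to contain an underscore.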
This library also allows complex fields to be directly serialized as JSON or YAML (the default is to append _json to the key). For example:
jfl flatten -C creator=json -C books=json -i examples/books1.yaml -o examples/books1-jsonified.tsv
id | name | genres | creator_json | books_json |
---|---|---|---|---|
S001 | Lord of the Rings | [fantasy] | {"name": "JRR Tolkein", "from_country": "England"} | [{"id": "S001.1", "name": "Fellowship of the Ring", "summary": "Hobbits", "price": 5.99}, {"id": "S001.2", "name": "The Two Towers", "summary": "More hobbits", "price": 5.99}, {"id": "S001.3", "name": "Return of the King", "summary": "Yet more hobbits", "price": 6.99}] |
S002 | The Culture Series | [scifi] | {"name": "Ian M Banks", "from_country": "Scotland"} | [{"id": "S002.1", "name": "Consider Phlebas", "price": 5.99}, {"id": "S002.2", "name": "Player of Games", "price": 5.99}] |
S003 | Book of the New Sun | [scifi, fantasy] | {"name": "Gene Wolfe", "genres": ["scifi", "fantasy"], "from_country": "USA"} | [{"id": "S003.1", "name": "Shadow of the Torturer"}, {"id": "S003.2", "name": "Claw of the Conciliator", "price": 6.99}] |
S004 | Example with single book | | {"name": "Ms Writer", "genres": ["romance"], "from_country": "USA"} | [{"id": "S004.1", "name": "Blah"}] |
S005 | Example with no books | | {"name": "Mr Unproductive", "genres": ["romance", "scifi", "fantasy"], "from_country": "USA"} | |
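The JSON-serialized columns can be pictured with the standard `json` module. This is a hedged sketch of the idea only: `jsonify_keys` is a hypothetical helper, and the `_json` suffix is taken from the description above rather than from the library's code.

```python
import json


def jsonify_keys(record, keys, suffix="_json"):
    """Serialize the chosen keys to JSON strings and rename them with a suffix."""
    row = {}
    for key, value in record.items():
        if key in keys:
            row[key + suffix] = json.dumps(value)
        else:
            row[key] = value
    return row


record = {
    "id": "S004",
    "name": "Example with single book",
    "creator": {"name": "Ms Writer", "genres": ["romance"], "from_country": "USA"},
    "books": [{"id": "S004.1", "name": "Blah"}],
}
row = jsonify_keys(record, {"creator", "books"})
print(row["books_json"])  # [{"id": "S004.1", "name": "Blah"}]

# json.loads restores the original structure, so the roundtrip is lossless.
restored_creator = json.loads(row["creator_json"])
```

Compared with flattening, this keeps arbitrarily deep structure in a single cell, at the cost of cells that are harder to filter with tools such as grep or a spreadsheet.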
See the slides: https://docs.google.com/presentation/d/e/2PACX-1vRyM06peU9BkrZbXJazuMlajw5s4Vbj5f0t0TE4hj_X9Ex_EASLSUZuaWUxYIhWbOC6CtPRtxrTGWQD/embed?start=false&loop=false&delayms=60000
The primary use case is to go from a rich normalized data model (as python objects, JSON, or YAML) to a flatter representation that is amenable to processing with:
- Solr/Lucene
- Pandas/R Dataframes
- Excel/Google sheets
- Unix cut/grep/cat/etc
- Simple denormalized SQL database representations
The target denormalized format is a list of rows / a data matrix, where each cell is either an atom or a list of atoms.
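As an illustration of that target format, the sketch below writes a list of rows to TSV with the standard `csv` module, rendering list cells in the bracketed, pipe-delimited style used in the tables above. `format_cell` and `rows_to_tsv` are hypothetical helpers, and the sketch assumes values contain no `|` characters.

```python
import csv
import io


def format_cell(value):
    """Render an atom as text and a list of atoms as '[a|b|c]'."""
    if value is None:
        return ""
    if isinstance(value, list):
        return "[" + "|".join(str(v) for v in value) + "]"
    return str(value)


def rows_to_tsv(rows):
    """Write rows (dicts) as TSV; the header is the union of keys in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Missing keys become empty cells so every row has the same width.
        writer.writerow([format_cell(row.get(c)) for c in columns])
    return buffer.getvalue()


rows = [
    {"id": "S001", "genres": ["fantasy"], "books_price": [5.99, 5.99, 6.99]},
    {"id": "S002", "genres": ["scifi"]},
]
tsv = rows_to_tsv(rows)
print(tsv)
```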
Method
- Each top level key becomes a column
- if the key value is a dict/object, then flatten
  - by default a '_' is used to separate the parent key from the inner key
  - e.g. the composition of creator and from_country becomes creator_from_country
  - currently one level of flattening is supported
- if the key value is a list of atomic entities, then leave as is
- if the key value is a list of dicts/objects, then flatten each key of this inner dict into a list
  - e.g. if books is a list of book objects, and name is a key on book, then books_name is a list of names of each book
  - order is significant - the first element of books_name is matched to the first element of books_price, etc
- Allow any key to be serialized as yaml/json/pickle if configured
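The rules above can be sketched in a few lines of plain Python. This is an illustration of the method under simplifying assumptions (one level of nesting, every dict and every list of dicts is flattened), not the library's own code. `flatten_record` is a hypothetical name.

```python
def flatten_record(record, sep="_"):
    """Flatten one level of nesting so each value is an atom or a list of atoms."""
    row = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # dict/object: one column per inner key, named parent + sep + inner.
            for inner_key, inner_value in value.items():
                row[f"{key}{sep}{inner_key}"] = inner_value
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # list of dicts: one list-valued column per inner key, in first-seen order.
            inner_keys = []
            for item in value:
                for inner_key in item:
                    if inner_key not in inner_keys:
                        inner_keys.append(inner_key)
            for inner_key in inner_keys:
                # Missing values become None so positions stay aligned across columns.
                row[f"{key}{sep}{inner_key}"] = [item.get(inner_key) for item in value]
        else:
            # atoms and lists of atoms are left as is.
            row[key] = value
    return row


record = {
    "id": "S001",
    "name": "Lord of the Rings",
    "genres": ["fantasy"],
    "creator": {"name": "JRR Tolkein", "from_country": "England"},
    "books": [
        {"id": "S001.1", "name": "Fellowship of the Ring", "price": 5.99},
        {"id": "S001.2", "name": "The Two Towers", "price": 5.99},
        {"id": "S001.3", "name": "Return of the King", "price": 6.99},
    ],
}
row = flatten_record(record)
print(row["creator_from_country"])
print(row["books_name"])
print(row["books_price"])
```

Padding missing values with `None` is one way to keep the i-th element of every `books_*` column tied to the same book; dropping them instead would break the positional matching described above.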
Command line usage (TODO)
Usage from Python
Documentation coming soon: see test folder for now
Use within LinkML
Comparison
Pandas json_normalize
Java json-flattener
https://github.com/wnameless/json-flattener
Python
csvjson