Convert JSON to a set of tidy CSV files

tidy-json-to-csv

Converts a subset of JSON to a set of tidy CSVs. Both the input JSON and the output CSVs are processed as streams, so it is suitable for large files in memory-constrained environments.

Denormalised input JSON is assumed, and the output is normalised. If a nested object has an id field, that field is assumed to be the primary key of a top-level table. Every object that contains a nested object or array must have an id field, which serves as its primary key in the final output. If present, id must be the first key in a map. All arrays must be arrays of objects rather than primitives.
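The rules above can be checked before conversion. The following is a minimal sketch, not part of tidy-json-to-csv: check_tidy_constraints is a hypothetical helper, it loads the whole document into memory (so it only suits small samples), and it assumes the root object is exempt from the id requirement, as in the example input further down.

```python
import json

def check_tidy_constraints(value, path='$'):
    """Return a list of messages for input that breaks the documented rules."""
    problems = []
    if isinstance(value, dict):
        keys = list(value)
        # Rule: if present, id must be the first key in a map
        if 'id' in keys and keys[0] != 'id':
            problems.append(f'{path}: "id" must be the first key')
        nested = {k: v for k, v in value.items() if isinstance(v, (dict, list))}
        # Rule: objects with nested objects or arrays need an id
        # (assumption: the root object is exempt, as in the example input)
        if nested and 'id' not in keys and path != '$':
            problems.append(f'{path}: object with nested values needs an "id"')
        for key, child in nested.items():
            problems.extend(check_tidy_constraints(child, f'{path}.{key}'))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            item_path = f'{path}[{index}]'
            # Rule: arrays must contain objects rather than primitives
            if isinstance(item, dict):
                problems.extend(check_tidy_constraints(item, item_path))
            else:
                problems.append(f'{item_path}: array items must be objects')
    return problems

document = json.loads('{"songs": [{"title": "t", "id": "1", "tags": ["a"]}]}')
for problem in check_tidy_constraints(document):
    print(problem)
```

The sample document breaks two rules: id is not the first key of the song, and tags is an array of strings.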

Although mostly streaming, to support denormalised input JSON and to avoid repeating the same rows in normalised CSVs, an internal record of output IDs is maintained during processing.
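To illustrate why a record of IDs is needed, here is a minimal sketch of streaming de-duplication. It is an assumption about the general technique, not the library's actual internals; unique_rows is a hypothetical name. Memory use grows with the number of distinct IDs rather than with the size of the input.

```python
def unique_rows(rows, key='id'):
    """Yield each row the first time its ID is seen, skipping later repeats."""
    seen = set()  # the only state kept between rows
    for row in rows:
        if row[key] in seen:
            continue
        seen.add(row[key])
        yield row

# The same category can appear under several songs in denormalised input
categories = [
    {'id': '1', 'name': 'musicals'},
    {'id': '2', 'name': 'television-shows'},
    {'id': '1', 'name': 'musicals'},
    {'id': '3', 'name': 'films'},
]
unique = list(unique_rows(categories))
```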

Installation

pip install tidy-json-to-csv

Usage

from tidy_json_to_csv import to_csvs

# A save generator must be provided since a single JSON file
# maps to multiple CSVs
def save_csv_bytes(path):
    with open(f'{path}.csv', 'wb') as f:
        while True:
            chunk = yield
            f.write(chunk)

# Overkill for this example, but shows how a generator can be
# used to generate the bytes of a large JSON file
def json_bytes():
    with open('file.json', 'rb') as f:
        yield f.read()

to_csvs(json_bytes(), save_csv_bytes, null='#NA')
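A save generator written with chunk = yield receives its data through Python's generator send() protocol. The sketch below shows that protocol on its own, without calling the library: save_csv_in_memory collects each CSV in an io.BytesIO instead of a file, and feed is a hypothetical stand-in for whatever drives the generator, so how to_csvs does this internally is an assumption.

```python
import io

buffers = {}  # path -> BytesIO holding that CSV's bytes

def save_csv_in_memory(path):
    """Same shape as save_csv_bytes above, but writes to memory."""
    buffer = buffers[path] = io.BytesIO()
    while True:
        chunk = yield
        buffer.write(chunk)

def feed(save, path, chunks):
    """Hypothetical driver: prime the generator, send chunks, then close it."""
    generator = save(path)
    next(generator)  # run to the first yield so send() can deliver data
    for chunk in chunks:
        generator.send(chunk)
    generator.close()  # raises GeneratorExit inside the generator, ending the loop

feed(save_csv_in_memory, 'songs[*]', [
    b'"id","title"\r\n',
    b'"1","Walk through the fire"\r\n',
])
```

Because close() unwinds the generator, a with block inside it (as in save_csv_bytes) closes its file at that point.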

Example input and output

The JSON

{
  "songs": [
    {
      "id": "1",
      "title": "Walk through the fire",
      "categories": [
        {"id": "1", "name": "musicals"},
        {"id": "2", "name": "television-shows"}
      ],
      "comments": [
        {"content": "I love it"},
        {"content": "I've heard better"}
      ]
    },
    {
      "id": "2",
      "title": "I could have danced all night",
      "categories": [
        {"id": "1", "name": "musicals"},
        {"id": "3", "name": "films"}
      ],
      "comments": [
        {"content": "I also could have danced all night"}
      ]
    }
  ]
}

maps to four files:

songs[*].csv

"id","title"
"1","Walk through the fire"
"2","I could have danced all night"

songs[*].categories[*].id.csv

"songs.id","categories.id"
"1","1"
"1","2"
"2","1"
"2","3"

songs[*].comments[*].csv

"songs.id","name"
"1","I love it"
"1","I've heard better"
"2","I also could have danced all night"

categories[*].csv

"id","name"
"1","musicals"
"2","television-shows"
"3","films"

