Convert JSON to a set of tidy CSV files

Converts a subset of JSON to a set of tidy CSVs. Supports streaming processing of both the input JSON and the output CSVs, and so is suitable for large files in memory-constrained environments.

What problem does this solve?

Most JSON to CSV converters do not produce data suitable for immediate analysis. They usually output a single CSV and, to do this, resort to some combination of:

  • JSON inside CSV fields;
  • values in lists presented as columns;
  • data duplicated in multiple rows, or a row's position in the CSV determining its context.

These often require subsequent manual, and therefore error-prone, data manipulation. This library aims to do all the conversion up-front, so you end up with a set of tidy tables, which is often a good starting point for analysis.

Example input and output

The JSON

{
  "songs": [
    {
      "id": "1",
      "title": "Walk through the fire",
      "categories": [
        {"id": "1", "name": "musicals"},
        {"id": "2", "name": "television-shows"}
      ],
      "comments": [
        {"content": "I love it"},
        {"content": "I've heard better"}
      ],
      "artist": {
        "name": "Slayer"
      }
    },
    {
      "id": "2",
      "title": "I could have danced all night",
      "categories": [
        {"id": "1", "name": "musicals"},
        {"id": "3", "name": "films"}
      ],
      "comments": [
        {"content": "I also could have danced all night"}
      ],
      "artist": {
        "name": "Doolitle"
      }
    }
  ]
}

maps to four files:

songs.csv

"id","title","artist__name"
"1","Walk through the fire","Slayer"
"2","I could have danced all night","Doolitle"

songs__categories__id.csv

"songs__id","categories__id"
"1","1"
"1","2"
"2","1"
"2","3"

songs__comments.csv

"songs__id","content"
"1","I love it"
"1","I've heard better"
"2","I also could have danced all night"

categories.csv

"id","name"
"1","musicals"
"2","television-shows"
"3","films"

Installation

pip install tidy-json-to-csv

Usage: Convert JSON to multiple CSV files (Command line)

cat songs.json | tidy_json_to_csv

Usage: Convert JSON to multiple CSV files (Python)

from tidy_json_to_csv import to_csvs

# A save function, called by to_csvs for each CSV file to be generated.
# Will be run in a separate thread, started by to_csvs
def save_csv_bytes(path, chunks):
    with open(f'{path}.csv', 'wb') as f:
        for chunk in chunks:
            f.write(chunk)

def json_bytes():
    with open('file.json', 'rb') as f:
        while True:
            chunk = f.read(65536)
            if not chunk:
                break
            yield chunk

to_csvs(json_bytes(), save_csv_bytes, null='#NA', output_chunk_size=65536)
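
If the output is small enough to hold in memory, the same callback interface can be used to collect each generated CSV as a single bytes object rather than writing it to disk. A minimal sketch, assuming the example songs.json above as input:

from tidy_json_to_csv import to_csvs

# Collect each generated CSV into memory, keyed by its path
# (only sensible when the output fits comfortably in memory)
csvs = {}

def save_csv_bytes(path, chunks):
    csvs[path] = b''.join(chunks)

def json_bytes():
    with open('songs.json', 'rb') as f:
        while True:
            chunk = f.read(65536)
            if not chunk:
                break
            yield chunk

to_csvs(json_bytes(), save_csv_bytes, null='#NA', output_chunk_size=65536)

for path, csv_bytes in csvs.items():
    print(path)
    print(csv_bytes.decode('utf-8'))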

Usage: Convert JSON to multiple Pandas data frames (Python)

import io
import queue

import pandas as pd
from tidy_json_to_csv import to_csvs

def json_to_pandas(json_filename):
    q = queue.Queue()

    # Presents an iterable of byte chunks as a file-like object, so the
    # streamed CSV output can be wrapped in a BufferedReader for pandas
    class StreamedIterable(io.RawIOBase):
        def __init__(self, iterable):
            self.iterable = iterable
            self.remainder = b''
        def readable(self):
            return True
        def readinto(self, b):
            buffer_size = len(b)

            # Accumulate chunks until the buffer can be filled, or the
            # iterable is exhausted
            while len(self.remainder) < buffer_size:
                try:
                    self.remainder = self.remainder + next(self.iterable)
                except StopIteration:
                    if self.remainder:
                        break
                    return 0

            chunk, self.remainder = self.remainder[:buffer_size], self.remainder[buffer_size:]
            b[:len(chunk)] = chunk
            return len(chunk)

    def save_csv_bytes(path, chunks):
        q.put((path, pd.read_csv(io.BufferedReader(StreamedIterable(chunks), buffer_size=65536), na_values=['#NA'])))

    def json_bytes():
        with open(json_filename, 'rb') as f:
            while True:
                chunk = f.read(65536)
                if not chunk:
                    break
                yield chunk

    to_csvs(json_bytes(), save_csv_bytes, null='#NA')

    dfs = {}
    while not q.empty():
        path, df = q.get()
        dfs[path] = df

    return dfs

dfs = json_to_pandas('songs.json')
for path, df in dfs.items():
    print(path)
    print(df)

Constraints

Denormalised input JSON is assumed, and the output is normalised.

  • If a nested object has an id field, it is assumed to be the primary key of a top-level table.
  • All objects that have a nested object or array must have an id field that serves as their primary key in the final output.
  • If present, id must be the first key in a map.
  • All arrays must be arrays of objects rather than primitives.

Although processing is mostly streaming, an internal record of output IDs is maintained to support denormalised input JSON and to avoid repeating the same rows in the normalised CSVs.
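
As a rough illustration of the constraints above, the hypothetical check below (not part of the library) walks a parsed JSON document and reports objects that would break them. The root object is treated as exempt, since, as in the example input above, it need not have an id of its own. Unlike the library itself, this loads the whole document into memory.

import json

# Hypothetical helper, not part of tidy-json-to-csv: reports where a parsed
# JSON document appears to break the constraints described above
def check_constraints(node, path='$', is_root=True):
    problems = []
    if isinstance(node, dict):
        keys = list(node.keys())
        has_nested = any(isinstance(value, (dict, list)) for value in node.values())
        if has_nested and not is_root and 'id' not in keys:
            problems.append(f'{path}: object with nested data has no "id" field')
        if 'id' in keys and keys[0] != 'id':
            problems.append(f'{path}: "id" is not the first key')
        for key, value in node.items():
            problems += check_constraints(value, f'{path}.{key}', is_root=False)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            if not isinstance(item, dict):
                problems.append(f'{path}[{i}]: arrays must contain objects, not primitives')
            else:
                problems += check_constraints(item, f'{path}[{i}]', is_root=False)
    return problems

with open('songs.json', 'rb') as f:
    for problem in check_constraints(json.load(f)):
        print(problem)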
