Skip to main content

Convert hierarchical JSON data into multiple related CSV files

Project description

json-to-multicsv

Split a JSON file with hierarchical data into multiple CSV files.

A Python rewrite of jsnell/json-to-multicsv. See Juho Snellman's 2016 blog post for motivation and design.

Installation

pip install json-to-multicsv

Usage

$ json-to-multicsv --help
Usage: json-to-multicsv [OPTIONS]

  Split a JSON file with hierarchical data to multiple CSV files.

Options:
  --file FILENAME  JSON input file (default: stdin)
  --path TEXT      pathspec:handler[:name]
  --table TEXT     Top-level table name
  --no-prefix      Use only the last component of the table name for output
                   filenames.
  --help           Show this message and exit.

Examples

Nested objects and arrays

Given this input:

{
    "item 1": {
        "title": "The First Item",
        "genres": ["sci-fi", "adventure"],
        "rating": {
            "mean": 9.5,
            "votes": 190
        }
    },
    "item 2": {
        "title": "The Second Item",
        "genres": ["history", "economics"],
        "rating": {
            "mean": 7.4,
            "votes": 865
        },
        "sales": [
            { "count": 76, "country": "us" },
            { "count": 13, "country": "de" },
            { "count": 4, "country": "fi" }
        ]
    }
}
json-to-multicsv --file input.json \
    --path '/:table:item' \
    --path '/*/rating:column' \
    --path '/*/sales:table:sales' \
    --path '/*/genres:table:genres'

Produces three CSV files, joinable on the *._key columns:

item.csv:

item._key,rating.mean,rating.votes,title
item 1,9.5,190,The First Item
item 2,7.4,865,The Second Item

item.genres.csv:

item._key,item.genres._key,genres
item 1,0,sci-fi
item 1,1,adventure
item 2,0,history
item 2,1,economics

item.sales.csv:

item._key,item.sales._key,count,country
item 2,0,76,us
item 2,1,13,de
item 2,2,4,fi

Row handler, custom key names, and ignore

When the top-level JSON value is a single object (not a collection), use /:row with --table to name the output table. Custom key column names can be set with an extra :KEY_NAME argument on table handlers. Use :ignore to skip parts of the data.

{
    "name": "Summer Championship",
    "year": 2024,
    "games": {
        "game-1": {
            "home": "Eagles",
            "away": "Hawks",
            "score": { "home": 3, "away": 1 }
        },
        "game-2": {
            "home": "Bears",
            "away": "Lions",
            "score": { "home": 2, "away": 2 }
        }
    },
    "sponsors": ["Acme Corp", "Globex"]
}
json-to-multicsv --file tournament.json \
    --path '/:row' \
    --path '/games:table:game:gameId' \
    --path '/games/*/score:column' \
    --path '/sponsors:ignore' \
    --table main

main.csv:

name,year
Summer Championship,2024

main.game.csv:

gameId,away,home,score.away,score.home
game-1,Hawks,Eagles,1,3
game-2,Lions,Bears,2,2

Note that gameId replaces the default game._key column name, and sponsors are omitted entirely.

Top-level array

When the input is a JSON array, a single table handler at the root is all you need:

[
    {"title": "Dune", "author": "Frank Herbert", "year": 1965},
    {"title": "Neuromancer", "author": "William Gibson", "year": 1984},
    {"title": "Snow Crash", "author": "Neal Stephenson", "year": 1992}
]
json-to-multicsv --file books.json --path '/:table:book'

book.csv:

book._key,author,title,year
0,Frank Herbert,Dune,1965
1,William Gibson,Neuromancer,1984
2,Neal Stephenson,Snow Crash,1992

Options

--file INPUT

Read JSON input from a file. Defaults to stdin.

--path PATHSPEC:table:NAME[:KEY_NAME]

Values matching the pathspec open a new table with the given name. The value should be an object or array. For objects, each field produces a row, with the field name stored in the NAME._key column. For arrays, each element produces a row, with the 0-based index stored in the NAME._key column.

When tables are nested, key columns from all outer tables are included in inner tables.

An optional key name can be provided to customize the key column name (e.g., /:table:item:itemId produces an itemId column instead of item._key).

--path PATHSPEC:column

Values matching the pathspec are emitted as columns in the current table's row. If the value is a scalar, it becomes a single column. If the value is an object, its fields are flattened into multiple columns with dotted names.

--path PATHSPEC:row

Values matching the pathspec are emitted as new rows in the current table. The value must be an object. This is generally only useful for the top-level JSON value, combined with --table.

--path PATHSPEC:ignore

Values matching the pathspec (and all their children) are skipped.

--table NAME

Name the top-level table. Use this with a row handler on the root element.

--no-prefix

Use only the last component of the table name for output filenames. For example, item.sales.csv becomes sales.csv.

Paths and pathspecs

The path to a JSON value is determined by:

  • The root element's path is /
  • For values inside an object: parent path + / + field name
  • For values inside an array: parent path + / + 0-based index

In a pathspec, any path component can be replaced with *, which matches any single component. For example, /a/*/c matches /a/b/c but not /a/b/b/c.

License

MIT. Based on json-to-multicsv by Juho Snellman.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

json_to_multicsv-0.1.1.tar.gz (15.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

json_to_multicsv-0.1.1-py3-none-any.whl (9.8 kB view details)

Uploaded Python 3

File details

Details for the file json_to_multicsv-0.1.1.tar.gz.

File metadata

  • Download URL: json_to_multicsv-0.1.1.tar.gz
  • Upload date:
  • Size: 15.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for json_to_multicsv-0.1.1.tar.gz
Algorithm Hash digest
SHA256 a505ac89062061e062691859ed433f18b6b3738ba21d076d0f17c25ceecbbda8
MD5 0e8a2cbb995ec77fe59ab56f78d4dc3a
BLAKE2b-256 7e843f5aab32be3862252520871fbaf3d809dd79c987c00eee8f00c4a2c0a6be

See more details on using hashes here.

Provenance

The following attestation bundles were made for json_to_multicsv-0.1.1.tar.gz:

Publisher: publish.yml on fgregg/json-to-multicsv

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file json_to_multicsv-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for json_to_multicsv-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9c1459c243169b3a34b3457359f417122df799905e3392e989abd30cc9a9f6d9
MD5 818f285c5bbc07c4a8dca6af7195a22e
BLAKE2b-256 aaf17556cf1ff5eb2b5b07b4037fc01d8eecf1685560bb53f2241c8426d77757

See more details on using hashes here.

Provenance

The following attestation bundles were made for json_to_multicsv-0.1.1-py3-none-any.whl:

Publisher: publish.yml on fgregg/json-to-multicsv

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page