
json_tabulator

A simple query language for extracting tables from JSON-like objects.

Working with tabular data is much easier than working with nested documents. json_tabulator helps to extract tables from JSON-like objects in a simple, declarative manner. All further processing is left to the many powerful tools that exist for working with tables, such as Spark or Pandas.

Installation

Install from PyPI:

pip install json_tabulator

Quickstart

The json_tabulator module provides tools to extract a JSON document into a set of related tables. Let's start with a simple document:

data = {
    'id': 'doc-1',
    'table': [
        {'id': 1, 'name': 'row-1'},
        {'id': 2, 'name': 'row-2'}
    ]
}

The document consists of a document-level value id as well as a nested sub-table table. We want to extract it into a single table, with the document-level value repeated in every row.

To do this, we write a query that defines the conversion into a table like this:

from json_tabulator import tabulate

query = tabulate({
    'document_id': 'id',
    'row_id': 'table[*].id',
    'row_name': 'table[*].name'
})

rows = query.get_rows(data)

This returns an iterator of rows, where each row is of type Row, which is a subclass of dict. A row maps attribute names to values. In addition, extraction errors are reported in the attribute Row.errors.

>>> list(rows)
[
    {'document_id': 'doc-1', 'row_id': 1, 'row_name': 'row-1'},
    {'document_id': 'doc-1', 'row_id': 2, 'row_name': 'row-2'}
]

Path Syntax

The syntax for path expressions is very similar to a subset of JSON Path. A path consists of an optional root element $ followed by a sequence of segments that specify what is to be extracted. The child operator . and subscripts [1], ['a'], [*] can be used for arrays or dicts.

Dict key

Can be any string. Key values can be quoted with single or double quotes. Within quoted strings, the quote character must be doubled to escape it. For example, "say ""hello""" is interpreted as say "hello".

Keys must be quoted if

  • they contain any of the characters * $ . ' " [ ] ( ), or
  • they start with a digit, or
  • they are used in a subscript, e.g. $["child"] is valid, but $[child] is not.

Subscripts

Subscripts are entered as []. Allowed subscript values are:

  • Non-negative numbers representing array indices, e.g. $[123]
  • Quoted dict keys, e.g. $['a']
  • Wildcards, e.g. $[*]

Subscripts can be entered with or without a period, e.g. $[*] and $.[*] are both valid.

Wildcard *

An asterisk * is interpreted as a wildcard that iterates over dict values or array items. Note that wildcards must be entered explicitly; there is no implicit iteration over arrays. A wildcard for dict keys can be entered as .* or [*]. For arrays only [*] is allowed. This is the same behavior as in JSON Path.

Functions

Functions are enclosed in parentheses. Functions can only appear at the end of a path, separated from the rest of the path by the child operator. Example: $.*.(path).

Structural Queries

There are two functions that query document structure:

  • (index) returns the index that corresponds to the preceding wildcard
  • (path) returns the full path up to the preceding wildcard

They must be placed directly after a wildcard and must be at the end of the path. For example *.(index) is valid, but a.(index) and *.(index).b are not.

The output of (path) is unique for all rows extracted from the document.

Inline Arrays

The function (inline <relative-path>) returns multiple values in a single row as an inline array. The argument <relative-path> follows the same rules as other paths, with the exception that the initial $ is forbidden.

For example, the query $.a[*].(inline b[*].c) finds all elements that match the path $.a[*].b[*].c but aggregates them into lists, one per each match of $.a[*].

Data Extraction

Query Semantics

Values for all attributes in a query are combined into individual rows. Attributes from different parts of the document are combined by "joining" on the lowest common ancestor.

For this reason, all wildcards for all attributes must lie on a common path. Violating this condition would lead to implicit cross joins and the associated data blow-up.

For example, the paths $.a[*] and $.b[*] cannot be combined because the wildcards are not on the same path. On the other hand, $.a and $.b[*].c can be combined.

Queries are analysed and compiled independently of the data to be queried.

If you think you need a combination of attributes that is not allowed, think again. If you still think so, run multiple queries and join the results afterwards.

Returned values

By default, all extracted values are returned as-is with no further conversion. Returned values are not limited to atomic types; dicts and lists are also allowed.

All rows contain all keys in the order specified in the query, even if an attribute is missing from the document.

Processing of returned values can be specified by using the function json_tabulator.attribute instead of a plain path. It accepts a converter function and a default value or default_factory. Converters only run on values that are present; otherwise the default is applied. For example:

from json_tabulator import tabulate, attribute

query = tabulate({
    'x': attribute('$[*].a', converter=lambda x: str(x), default='nothing')
})

data = [
    {'a': 1},
    {'a': None},
    {}
]
[row['x'] for row in query.get_rows(data)]

# output
['1', 'nothing', 'nothing']

Error Reporting

The returned rows are of type Row, which is a subclass of dict. It has an additional attribute Row.errors that is a dict mapping attributes to errors. There are two possible errors:

  1. If the path to an attribute does not exist, the error is AttributeNotFound.
  2. If a converter raises an exception, the error is ConversionFailed, with details about the source value and the exception that caused the conversion to fail.

Example:

from json_tabulator import tabulate, attribute
from json_tabulator.exceptions import AttributeNotFound, ConversionFailed

query = tabulate({
    'a': '$.a',
    'b': attribute('$.b', converter=int)
})
data = {'b': 'not a number'}

row = next(query.get_rows(data))

assert isinstance(row.errors['a'], AttributeNotFound)
assert isinstance(row.errors['b'], ConversionFailed)

assert row.errors['b'].value == 'not a number'
assert isinstance(row.errors['b'].caused_by, ValueError)

Related Projects

  • jsontable has the same purpose but is not maintained.
  • jsonpath-ng and other JSON Path implementations are much more flexible, but more cumbersome for extracting nested tables.
