Skip to main content

JQ-ish tool to query/munge JSON files using Python

Project description

qpyson (WIP)

The JSON querying tool, jq, is a really powerful tool. However, it’s sometimes a bit involved and has a learning curve that requires digging into the jq manual and familiarizing yourself with a custom language.

qpyson is a thin tool to explore, transform, or munge JSON using Python as the processing language.

Goals

  • Process JSON file using Python
  • Thin layer to process or apply transforms written in Python
  • Provide the Python func as a string to the commandline or reference an external file where the function is defined
  • Custom functions can be paramaterized and configured from the commandline
  • Output results are emitted as JSON or in tabular form (using tabulate for quick viewing from the commandline

Non-Goals

  • A replacement for jq
  • No custom DSL for filtering or querying (use Python directly)
  • Does not support streaming (JSON files are loaded into memory)

Installation

Recommended to install using a virtualenv or conda env to install.

pip install qpyson

Quick Tour

Example data from the Iris dataset.

head examples/iris.json

[
  {"sepalLength": 5.1, "sepalWidth": 3.5, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.9, "sepalWidth": 3.0, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.7, "sepalWidth": 3.2, "petalLength": 1.3, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.6, "sepalWidth": 3.1, "petalLength": 1.5, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 5.0, "sepalWidth": 3.6, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 5.4, "sepalWidth": 3.9, "petalLength": 1.7, "petalWidth": 0.4, "species": "setosa"},
  {"sepalLength": 4.6, "sepalWidth": 3.4, "petalLength": 1.4, "petalWidth": 0.3, "species": "setosa"},
  {"sepalLength": 5.0, "sepalWidth": 3.4, "petalLength": 1.5, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.4, "sepalWidth": 2.9, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},

We can define a custom function to process the JSON dataset. By default the function is named f and can be customized by -f or --function-name commandline argument.

qpyson "def f(d): return d[0]" examples/iris.json

{
  "sepalLength": 5.1,
  "sepalWidth": 3.5,
  "petalLength": 1.4,
  "petalWidth": 0.2,
  "species": "setosa"
}

We can also write custom functions in a Python file.

cat examples/iris_explore.py

def f(d):
    return d[0]


def f2(d, max_items: int = 10):
    return d[:max_items]


def f3(d, max_items: int = 5):
    return [x for x in d if x["species"] == "setosa"][:max_items]


def f4(d, sort_field: str, sort_direction: str = "asc", max_items: int = 5):
    reverse = not (sort_direction == "asc")
    d.sort(key=lambda x: x[sort_field], reverse=reverse)
    return d[:max_items]


def f0(d):
    # Identity operator
    return d


def first(d):
    return d[0]


def p(d, field: str = "sepalLength"):
    import pandas as pd

    df = pd.DataFrame.from_dict(d)
    sdf = df.describe()
    return sdf.to_dict()[field]

Executing --help will show the output options.

qpyson examples/iris_explore.py examples/iris.json --help

usage: qpyson [-f FUNCTION_NAME] [-n] [--indent INDENT] [-t]
              [--table-style TABLE_STYLE]
              [--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
              [--help]
              path_or_cmd json_file

Util to use Python to process (e.g., filter, map) JSON files

positional arguments:
  path_or_cmd           Path to python file, or python cmd
  json_file             Path to JSON file

optional arguments:
  -f FUNCTION_NAME, --function-name FUNCTION_NAME
                        Function name (default: f)
  -n, --no-pretty       Non-table Pretty print the output of dicts and list of
                        dicts (default: False)
  --indent INDENT       Non-table Pretty print indent spacing (default: 2)
  -t, --print-table     Pretty print results (default: False)
  --table-style TABLE_STYLE
                        Table fmt style using Tabulate. See
                        https://github.com/astanin/python-tabulate#table-
                        format for available options (default: simple)
  --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        Log level (default: NOTSET)
  --help                Show this help message and exit (default: False)

Executing function f, yields:

qpyson examples/iris_explore.py examples/iris.json 

{
  "sepalLength": 5.1,
  "sepalWidth": 3.5,
  "petalLength": 1.4,
  "petalWidth": 0.2,
  "species": "setosa"
}

The output view can be changed to a table view using --print-table or -t.

qpyson examples/iris_explore.py examples/iris.json --print-table --table-style github

|   sepalLength |   sepalWidth |   petalLength |   petalWidth | species   |
|---------------|--------------|---------------|--------------|-----------|
|           5.1 |          3.5 |           1.4 |          0.2 | setosa    |

A better example using function f2 defined in iris_explore.py

qpyson examples/iris_explore.py examples/iris.json  --function-name f2 --print-table

  sepalLength    sepalWidth    petalLength    petalWidth  species
-------------  ------------  -------------  ------------  ---------
          5.1           3.5            1.4           0.2  setosa
          4.9           3              1.4           0.2  setosa
          4.7           3.2            1.3           0.2  setosa
          4.6           3.1            1.5           0.2  setosa
          5             3.6            1.4           0.2  setosa
          5.4           3.9            1.7           0.4  setosa
          4.6           3.4            1.4           0.3  setosa
          5             3.4            1.5           0.2  setosa
          4.4           2.9            1.4           0.2  setosa
          4.9           3.1            1.5           0.1  setosa

Custom functions can be defined with required or optional values (with defaults) combined with Python 3 type annotations to generate

cat examples/iris.py

# More examples that demonstrate generating commandline arguments


def f(d, sort_field: str, sort_direction: str = "asc", max_items: int = 5):
    reverse = not (sort_direction == "asc")
    d.sort(key=lambda x: x[sort_field], reverse=reverse)
    return d[:max_items]


def g(d, field: str = "sepalLength"):
    import pandas as pd

    df = pd.DataFrame.from_dict(d)
    sdf = df.describe()
    return sdf.to_dict()[field]

And calling --help will show the custom function specific arguments (e.g., --max_items and --sort_direction)

qpyson examples/iris.py examples/iris.json --help

usage: qpyson [-f FUNCTION_NAME] [-n] [--indent INDENT] [-t]
              [--table-style TABLE_STYLE]
              [--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
              [--help] [--sort_field SORT_FIELD]
              [--sort_direction SORT_DIRECTION] [--max_items MAX_ITEMS]
              path_or_cmd json_file

Util to use Python to process (e.g., filter, map) JSON files

positional arguments:
  path_or_cmd           Path to python file, or python cmd
  json_file             Path to JSON file

optional arguments:
  -f FUNCTION_NAME, --function-name FUNCTION_NAME
                        Function name (default: f)
  -n, --no-pretty       Non-table Pretty print the output of dicts and list of
                        dicts (default: False)
  --indent INDENT       Non-table Pretty print indent spacing (default: 2)
  -t, --print-table     Pretty print results (default: False)
  --table-style TABLE_STYLE
                        Table fmt style using Tabulate. See
                        https://github.com/astanin/python-tabulate#table-
                        format for available options (default: simple)
  --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        Log level (default: NOTSET)
  --help                Show this help message and exit (default: False)
  --sort_field SORT_FIELD
                        sort_field type:<class 'str'> from custom func `f`
                        (default: _empty)
  --sort_direction SORT_DIRECTION
                        sort_direction type:<class 'str'> from custom func `f`
                        (default: asc)
  --max_items MAX_ITEMS
                        max_items type:<class 'int'> from custom func `f`
                        (default: 5)

And calling with custom options yields:

qpyson examples/iris.py examples/iris.json -t --max_items=3 --sort_direction=desc --sort_field sepalLength

  sepalLength    sepalWidth    petalLength    petalWidth  species
-------------  ------------  -------------  ------------  ---------
          7.9           3.8            6.4           2    virginica
          7.7           3.8            6.7           2.2  virginica
          7.7           2.6            6.9           2.3  virginica

Another Example calling pandas underneath the hood to get a quick summary of the data.

qpyson examples/iris.py examples/iris.json -t -f g --field=sepalLength

  count     mean       std    min    25%    50%    75%    max
-------  -------  --------  -----  -----  -----  -----  -----
    150  5.84333  0.828066    4.3    5.1    5.8    6.4    7.9

It’s also possible to create thin JSON munging tools for configuration of systems or tools that take JSON as input.

For example a JSON configuration template with defaults.

cat examples/config_template.json

{
  "alpha":  1234,
  "beta": null,
  "gamma": 90.1234
}

And a processing function, f.

cat examples/config_processor.py

def f(dx, alpha: float, beta: float, gamma: float):
    """Simple example of config munging"""

    def _set(k, v):
        if v:
            dx[k] = v

    items = [("alpha", alpha), ("beta", beta), ("gamma", gamma)]

    for name, value in items:
        _set(name, value)

    dx["_comment"] = "Configured with qpyson"
    return dx

Running --help will show the supported configuration options.

qpyson examples/config_processor.py examples/config_template.json --help

usage: qpyson [-f FUNCTION_NAME] [-n] [--indent INDENT] [-t]
              [--table-style TABLE_STYLE]
              [--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
              [--help] [--alpha ALPHA] [--beta BETA] [--gamma GAMMA]
              path_or_cmd json_file

Util to use Python to process (e.g., filter, map) JSON files

positional arguments:
  path_or_cmd           Path to python file, or python cmd
  json_file             Path to JSON file

optional arguments:
  -f FUNCTION_NAME, --function-name FUNCTION_NAME
                        Function name (default: f)
  -n, --no-pretty       Non-table Pretty print the output of dicts and list of
                        dicts (default: False)
  --indent INDENT       Non-table Pretty print indent spacing (default: 2)
  -t, --print-table     Pretty print results (default: False)
  --table-style TABLE_STYLE
                        Table fmt style using Tabulate. See
                        https://github.com/astanin/python-tabulate#table-
                        format for available options (default: simple)
  --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        Log level (default: NOTSET)
  --help                Show this help message and exit (default: False)
  --alpha ALPHA         alpha type:<class 'float'> from custom func `f`
                        (default: _empty)
  --beta BETA           beta type:<class 'float'> from custom func `f`
                        (default: _empty)
  --gamma GAMMA         gamma type:<class 'float'> from custom func `f`
                        (default: _empty)

Now configuring alpha, beta and gamma.

qpyson examples/config_processor.py examples/config_template.json --alpha 1.23 --beta 2.34 --gamma 3.45

{
  "alpha": 1.23,
  "beta": 2.34,
  "gamma": 3.45,
  "_comment": "Configured with qpyson"
}

Testing

Testing is currently done using RMarkdown using the make target doc.

This should probably be ported to non-R based approach. However, this current approach does keep the docs (e.g., README.md) up to date.

Related JQ-ish tools

https://github.com/dbohdan/structured-text-tools#json

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

qpyson-0.2.0.tar.gz (9.4 kB view details)

Uploaded Source

Built Distribution

qpyson-0.2.0-py3-none-any.whl (8.8 kB view details)

Uploaded Python 3

File details

Details for the file qpyson-0.2.0.tar.gz.

File metadata

  • Download URL: qpyson-0.2.0.tar.gz
  • Upload date:
  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.3

File hashes

Hashes for qpyson-0.2.0.tar.gz
Algorithm Hash digest
SHA256 3159f8d0084fdcb831d97d93ac6e7fe692ff2f0d36086c43af8cf7845de48146
MD5 84a4e5e32e2fcd8e57e21da0b120c136
BLAKE2b-256 78766fc76b57a3f53c5afddb5db510371ed14d893131caf35515c8bf91301ce1

See more details on using hashes here.

File details

Details for the file qpyson-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: qpyson-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 8.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.3

File hashes

Hashes for qpyson-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1341a4029081175378a0420d45f56ee58764e4effd7e58caa0905ada3526e6d9
MD5 3923f6e4cc73fdbb92efeccfc0969153
BLAKE2b-256 565ce4e00e93881253d784e4d7d1bf7149251ba09bc80120a7e44e0c53fedc4b

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page