Skip to main content

JQ-ish tool to query/munge JSON files using Python

Project description

qpyson: Thin Commandline Tool to Explore, Transform, and Munge JSON using Python

The JSON querying tool, jq, is a really powerful tool. However, it’s sometimes a bit involved and has a learning curve that requires digging into the jq manual and familiarizing yourself with a custom language.

qpyson is a thin commandline tool to explore, transform, or munge JSON using Python as the processing language.

Goals

  • Process (filter, map, general munging) of JSON files using Python
  • Thin layer to process or apply transforms written in Python
  • Provide Python function as a string to the commandline or define Python functions in an external file
  • Custom functions can be parameterized and configured from the commandline
  • Output results are emitted as JSON or in tabular form (using tabulate for quick viewing from the commandline

Non-Goals

  • A replacement for jq
  • No custom DSL for filtering or querying (use Python directly)
  • Does not support streaming (JSON files are loaded into memory)

Installation

Recommended to install using a virtualenv or conda env to install.

pip install qpyson

Quick Tour

Example data from the Iris dataset.

head examples/iris.json

[
  {"sepalLength": 5.1, "sepalWidth": 3.5, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.9, "sepalWidth": 3.0, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.7, "sepalWidth": 3.2, "petalLength": 1.3, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.6, "sepalWidth": 3.1, "petalLength": 1.5, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 5.0, "sepalWidth": 3.6, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 5.4, "sepalWidth": 3.9, "petalLength": 1.7, "petalWidth": 0.4, "species": "setosa"},
  {"sepalLength": 4.6, "sepalWidth": 3.4, "petalLength": 1.4, "petalWidth": 0.3, "species": "setosa"},
  {"sepalLength": 5.0, "sepalWidth": 3.4, "petalLength": 1.5, "petalWidth": 0.2, "species": "setosa"},
  {"sepalLength": 4.4, "sepalWidth": 2.9, "petalLength": 1.4, "petalWidth": 0.2, "species": "setosa"},

The commandline tool takes a function written as commandline string or referenced in an external file as well as the JSON file to be processed.

qpyson --help

qpyson: error: the following arguments are required: path_or_cmd, json_file
usage: qpyson [-f FUNCTION_NAME] [-n] [--indent INDENT] [-t]
              [--table-style TABLE_STYLE]
              [--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
              [--help]
              path_or_cmd json_file

Util to use Python to process (e.g., filter, map) JSON files

positional arguments:
  path_or_cmd           Path to python file, or python cmd
  json_file             Path to JSON file

optional arguments:
  -f FUNCTION_NAME, --function-name FUNCTION_NAME
                        Function name (default: f)
  -n, --no-pretty       (Non-table) Pretty print the output of dicts and list
                        of dicts (default: False)
  --indent INDENT       (Non-table) Pretty print indent spacing (default: 2)
  -t, --print-table     Pretty print results (default: False)
  --table-style TABLE_STYLE
                        Table fmt style using Tabulate. See
                        https://github.com/astanin/python-tabulate#table-
                        format for available options (default: simple)
  --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        Log level (default: NOTSET)
  --help                Show this help message and exit (default: False)

We can define a custom function to process the JSON dataset. By default the function is named f and can be customized by -f or --function-name commandline argument.

qpyson "def f(d): return d[0]" examples/iris.json

{
  "sepalLength": 5.1,
  "sepalWidth": 3.5,
  "petalLength": 1.4,
  "petalWidth": 0.2,
  "species": "setosa"
}

We can also write custom functions in a Python file.

cat examples/iris_explore.py

def f(d):
    return d[0]


def f2(d, max_items: int = 10):
    return d[:max_items]


def f3(d, max_items: int = 5):
    return [x for x in d if x["species"] == "setosa"][:max_items]


def f4(d, sort_field: str, sort_direction: str = "asc", max_items: int = 5):
    reverse = not (sort_direction == "asc")
    d.sort(key=lambda x: x[sort_field], reverse=reverse)
    return d[:max_items]


def f0(d):
    # Identity operator
    return d


def first(d):
    return d[0]


def p(d, field: str = "sepalLength"):
    import pandas as pd

    df = pd.DataFrame.from_dict(d)
    sdf = df.describe()
    return sdf.to_dict()[field]

Executing --help will show the output options.

qpyson examples/iris_explore.py examples/iris.json --help

usage: qpyson [-f FUNCTION_NAME] [-n] [--indent INDENT] [-t]
              [--table-style TABLE_STYLE]
              [--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
              [--help]
              path_or_cmd json_file

Util to use Python to process (e.g., filter, map) JSON files

positional arguments:
  path_or_cmd           Path to python file, or python cmd
  json_file             Path to JSON file

optional arguments:
  -f FUNCTION_NAME, --function-name FUNCTION_NAME
                        Function name (default: f)
  -n, --no-pretty       (Non-table) Pretty print the output of dicts and list
                        of dicts (default: False)
  --indent INDENT       (Non-table) Pretty print indent spacing (default: 2)
  -t, --print-table     Pretty print results (default: False)
  --table-style TABLE_STYLE
                        Table fmt style using Tabulate. See
                        https://github.com/astanin/python-tabulate#table-
                        format for available options (default: simple)
  --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        Log level (default: NOTSET)
  --help                Show this help message and exit (default: False)

Executing function f, yields:

qpyson examples/iris_explore.py examples/iris.json 

{
  "sepalLength": 5.1,
  "sepalWidth": 3.5,
  "petalLength": 1.4,
  "petalWidth": 0.2,
  "species": "setosa"
}

The output view can be changed to a table view using --print-table or -t.

qpyson examples/iris_explore.py examples/iris.json --print-table --table-style github

|   sepalLength |   sepalWidth |   petalLength |   petalWidth | species   |
|---------------|--------------|---------------|--------------|-----------|
|           5.1 |          3.5 |           1.4 |          0.2 | setosa    |

A better example using function f2 defined in iris_explore.py

qpyson examples/iris_explore.py examples/iris.json  --function-name f2 --print-table

  sepalLength    sepalWidth    petalLength    petalWidth  species
-------------  ------------  -------------  ------------  ---------
          5.1           3.5            1.4           0.2  setosa
          4.9           3              1.4           0.2  setosa
          4.7           3.2            1.3           0.2  setosa
          4.6           3.1            1.5           0.2  setosa
          5             3.6            1.4           0.2  setosa
          5.4           3.9            1.7           0.4  setosa
          4.6           3.4            1.4           0.3  setosa
          5             3.4            1.5           0.2  setosa
          4.4           2.9            1.4           0.2  setosa
          4.9           3.1            1.5           0.1  setosa

Custom functions can be defined with required or optional values (with defaults) combined with Python 3 type annotations to generate

cat examples/iris.py

# More examples that demonstrate generating commandline arguments


def f(d, sort_field: str, sort_direction: str = "asc", max_items: int = 5):
    reverse = not (sort_direction == "asc")
    d.sort(key=lambda x: x[sort_field], reverse=reverse)
    return d[:max_items]


def g(d, field: str = "sepalLength"):
    import pandas as pd

    df = pd.DataFrame.from_dict(d)
    sdf = df.describe()
    return sdf.to_dict()[field]

And calling --help will show the custom function specific arguments (e.g., --max_items and --sort_direction)

qpyson examples/iris.py examples/iris.json --help

usage: qpyson [-f FUNCTION_NAME] [-n] [--indent INDENT] [-t]
              [--table-style TABLE_STYLE]
              [--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
              [--help] [--sort_field SORT_FIELD]
              [--sort_direction SORT_DIRECTION] [--max_items MAX_ITEMS]
              path_or_cmd json_file

Util to use Python to process (e.g., filter, map) JSON files

positional arguments:
  path_or_cmd           Path to python file, or python cmd
  json_file             Path to JSON file

optional arguments:
  -f FUNCTION_NAME, --function-name FUNCTION_NAME
                        Function name (default: f)
  -n, --no-pretty       (Non-table) Pretty print the output of dicts and list
                        of dicts (default: False)
  --indent INDENT       (Non-table) Pretty print indent spacing (default: 2)
  -t, --print-table     Pretty print results (default: False)
  --table-style TABLE_STYLE
                        Table fmt style using Tabulate. See
                        https://github.com/astanin/python-tabulate#table-
                        format for available options (default: simple)
  --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        Log level (default: NOTSET)
  --help                Show this help message and exit (default: False)
  --sort_field SORT_FIELD
                        sort_field type:<class 'str'> from custom func `f`
                        (default: _empty)
  --sort_direction SORT_DIRECTION
                        sort_direction type:<class 'str'> from custom func `f`
                        (default: asc)
  --max_items MAX_ITEMS
                        max_items type:<class 'int'> from custom func `f`
                        (default: 5)

And calling with custom options yields:

qpyson examples/iris.py examples/iris.json -t --max_items=3 --sort_direction=desc --sort_field sepalLength

  sepalLength    sepalWidth    petalLength    petalWidth  species
-------------  ------------  -------------  ------------  ---------
          7.9           3.8            6.4           2    virginica
          7.7           3.8            6.7           2.2  virginica
          7.7           2.6            6.9           2.3  virginica

Another Example calling pandas underneath the hood to get a quick summary of the data.

qpyson examples/iris.py examples/iris.json -t -f g --field=sepalLength

  count     mean       std    min    25%    50%    75%    max
-------  -------  --------  -----  -----  -----  -----  -----
    150  5.84333  0.828066    4.3    5.1    5.8    6.4    7.9

It’s also possible to create thin JSON munging tools for configuration of systems or tools that take JSON as input.

For example a JSON configuration template with defaults.

cat examples/config_template.json

{
  "alpha":  1234,
  "beta": null,
  "gamma": 90.1234
}

And a processing function, f.

cat examples/config_processor.py

def f(dx, alpha: float, beta: float, gamma: float):
    """Simple example of config munging"""

    def _set(k, v):
        if v:
            dx[k] = v

    items = [("alpha", alpha), ("beta", beta), ("gamma", gamma)]

    for name, value in items:
        _set(name, value)

    dx["_comment"] = "Configured with qpyson"
    return dx

Running --help will show the supported configuration options.

qpyson examples/config_processor.py examples/config_template.json --help

usage: qpyson [-f FUNCTION_NAME] [-n] [--indent INDENT] [-t]
              [--table-style TABLE_STYLE]
              [--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
              [--help] [--alpha ALPHA] [--beta BETA] [--gamma GAMMA]
              path_or_cmd json_file

Util to use Python to process (e.g., filter, map) JSON files

positional arguments:
  path_or_cmd           Path to python file, or python cmd
  json_file             Path to JSON file

optional arguments:
  -f FUNCTION_NAME, --function-name FUNCTION_NAME
                        Function name (default: f)
  -n, --no-pretty       (Non-table) Pretty print the output of dicts and list
                        of dicts (default: False)
  --indent INDENT       (Non-table) Pretty print indent spacing (default: 2)
  -t, --print-table     Pretty print results (default: False)
  --table-style TABLE_STYLE
                        Table fmt style using Tabulate. See
                        https://github.com/astanin/python-tabulate#table-
                        format for available options (default: simple)
  --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        Log level (default: NOTSET)
  --help                Show this help message and exit (default: False)
  --alpha ALPHA         alpha type:<class 'float'> from custom func `f`
                        (default: _empty)
  --beta BETA           beta type:<class 'float'> from custom func `f`
                        (default: _empty)
  --gamma GAMMA         gamma type:<class 'float'> from custom func `f`
                        (default: _empty)

Now configuring alpha, beta and gamma.

qpyson examples/config_processor.py examples/config_template.json --alpha 1.23 --beta 2.34 --gamma 3.45

{
  "alpha": 1.23,
  "beta": 2.34,
  "gamma": 3.45,
  "_comment": "Configured with qpyson"
}

Testing

Testing is currently done using RMarkdown using the make target doc.

This should probably be ported to non-R based approach. However, this current approach does keep the docs (e.g., README.md) up to date.

Related JQ-ish tools

https://github.com/dbohdan/structured-text-tools#json

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for qpyson, version 0.3.0
Filename, size File type Python version Upload date Hashes
Filename, size qpyson-0.3.0-py3-none-any.whl (8.9 kB) File type Wheel Python version py3 Upload date Hashes View hashes
Filename, size qpyson-0.3.0.tar.gz (9.6 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page