A Python-based stream editor for json documents

pyjawk is a Python-based stream editor for json files. It is a simple setup that works much like a json-parsing awk, similar to jq, but it also supports in-place editing, outputs json documents, and uses Python as the working language. It supports colorized output.

Motivation

This program exists for fairly minor convenience, mostly for my own use. Whenever I need to quickly edit some json, I find myself opening a Python REPL, writing the same obvious code to load the json, working on it a little, and then dumping it back out to the relevant file. Likewise, whenever I need to inspect json from a web page, I either curl it to a file and do the same, use requests or something similar to pull it directly into a Python REPL so I can properly inspect it, or pipe it through Python’s json.tool and less.

This is meant to replace those workflows entirely for my own use. If you find the busy work of loading, inspecting, and saving json data tedious, and especially if you are most comfortable with stream editing or with working in a REPL, this tool might make things a little more convenient for you. It is also useful for inspecting data and converting between formats, such as between msgpack and json.

Why not jq?

jq is a really great tool for a lot of what you would use this for. I wrote this because jq doesn’t give you a REPL to mangle data in, and because Python is a much more powerful and flexible language for modifying data, especially if you need filesystem or other I/O access.

jq is a powerful program with extensive development, active maintenance and maintainers, and its own filter language. If that’s what you want, use it. If you want a simple tool for loading json and working on it with Python in either a stream or REPL fashion, this is probably a better fit.

Installation

pip install --user pyjawk

Use

Display the help text with -h.

In all evaluated Python code, the name data holds the parsed input data.

The program reads its input from stdin or from the file given with -i, and writes its output to stdout or to the file given with -o. -f arguments pass in script files, which are run first; -e arguments are similar, but take Python source text directly and run after the -f scripts. A positional parameter, if present, is evaluated as a Python expression, and its result replaces the data object. The data object is then serialized and output. -c enables compact output and may be specified multiple times for more compaction with some output formats.
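The processing order described above can be sketched in a few lines of plain Python. This is a minimal illustration assuming the json format, in-memory strings, and that the positional expression is evaluated after the -e snippets; it is not pyjawk's actual code, and `run_pipeline` and its parameter names are hypothetical.

```python
import json


def run_pipeline(text, scripts=(), exprs=(), expression=None):
    """Hypothetical sketch of a pyjawk-style processing order for json input.

    scripts    -- source text of -f script files (run first)
    exprs      -- -e source strings (run after the scripts)
    expression -- optional positional expression whose result replaces data
    """
    # Every snippet shares one namespace, so `data` and any helper names
    # defined by earlier snippets are visible to later ones
    env = {"data": json.loads(text)}
    for source in scripts:
        exec(source, env)
    for source in exprs:
        exec(source, env)
    if expression is not None:
        # The positional parameter's result becomes the new data object
        env["data"] = eval(expression, env)
    # Finally, serialize whatever `data` now refers to
    return json.dumps(env["data"])


print(run_pipeline('{"a": [1, 2]}', exprs=['data["a"].append(3)'], expression='data["a"]'))
# prints [1, 2, 3]
```

In this sketch, running every snippet through one shared dict is what lets a `data = ...` reassignment in one snippet carry forward to the next, the same way the XML example later defines foo in one -e argument and uses it in another.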

-I and -O may be used to set the input and output formats, respectively.

-n and -N disable input and output respectively.

If -r / --repl is specified, output is not written automatically after processing. Instead, a ptpython REPL is started with the same environment, in which the function that writes the output is registered as write and the parsed arguments structure as args.
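In the REPL examples later on this page, write() outputs whatever data refers to at the moment it is called, even after data has been reassigned. One way to get that behavior is to have write() look data up in the shared namespace at call time. The sketch below assumes that design; `make_repl_env` and `start_repl` are hypothetical names, and the standard library's code.interact stands in for ptpython, which is what pyjawk actually starts.

```python
import code
import io
import json
import sys


def make_repl_env(data, args, out=None):
    """Hypothetical sketch: the namespace a pyjawk-style REPL might start with."""
    env = {"data": data, "args": args}

    def write():
        stream = out if out is not None else sys.stdout
        # Read `data` from the namespace at call time, so a reassignment such as
        # `data = data["baz"]` typed at the prompt is what gets written
        stream.write(json.dumps(env["data"], indent=2) + "\n")

    env["write"] = write
    return env


def start_repl(env):
    # Stand-in for ptpython: the stdlib console uses env as its namespace,
    # so assignments typed at the prompt update env directly
    code.interact(local=env)


# Non-interactive demo: exec() into env has the same effect as typing at the prompt
buffer = io.StringIO()
env = make_repl_env({"foo": "bar", "baz": ["spam", "Spam"]}, args=None, out=buffer)
exec('data = data["baz"]', env)
env["write"]()
print(buffer.getvalue(), end="")
```

Capturing data by value when write() is defined would instead keep writing the original object, which is why the lookup happens inside the function.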

Several command line tools are installed, but they differ only in their default input and output formats.

Formats

  • json

    • Available as the command line tool pyjawk

    • Supports 3 levels of compactness.

    • Outputs a trailing newline except at the highest compaction level.

    • Supports colorized output.

  • yaml

    • Available as the command line tool pyyawk

    • Supports 3 levels of compactness.

    • Outputs a trailing newline.

    • Supports colorized output.

  • xml

    • Available as the command line tool pyxawk

    • Parses into an xml.etree.ElementTree.Element object and dumps it as xml text using xml.etree.ElementTree.tostring. If uncompacted, the result is prettified with xml.dom.minidom.

    • Supports 2 levels of compactness.

    • Outputs a trailing newline.

    • Supports colorized output.

  • python

    • Available as the command line tool pypawk

    • Uses eval to pull in objects, and either pprint or repr to dump, depending on compactness.

    • Supports 3 levels of compactness.

    • Outputs a trailing newline.

    • Supports colorized output.

  • msgpack

    • Available as the command line tool pymawk

  • string

    • Available as the command line tool pysawk

    • Simply reads input into a string, and dumps data as a string by calling str on it.

    • Outputs a trailing newline unless compaction is requested.

  • bytes

    • Available as the command line tool pybawk

    • Simply reads input into bytes and outputs data as bytes.
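The python and string formats are simple enough to sketch directly from the descriptions in this list. These functions are hypothetical illustrations, not pyjawk's code; in particular, treating every nonzero compaction level as "use repr" is an assumption.

```python
import pprint


def load_python(text):
    # eval executes arbitrary code, so only feed it input you trust
    return eval(text)


def dump_python(obj, compact=0):
    # Assumption: any compaction switches from pprint to repr;
    # a trailing newline is always added
    text = repr(obj) if compact else pprint.pformat(obj)
    return text + "\n"


def dump_string(obj, compact=0):
    # str() the data; the trailing newline is dropped when compaction is requested
    return str(obj) if compact else str(obj) + "\n"


print(dump_python({"SPAM?": "SPAM!"}), end="")
print(dump_string("Spam", compact=1))
```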

Examples

Dumping some data to paste.ee

$ echo '{"a": "1", "b": null, "c": true, "d": false, "e": 7, "f": 8.5, "g": {"h": [1, 2, 3]}}' | pyjawk '{"sections": [{"contents": str(data)}]}' | curl -H 'Content-Type: application/json' -H 'X-Auth-Token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' -XPOST --data-binary '@-' https://api.paste.ee/v1/pastes
{"id":"umXKr","link":"https:\/\/paste.ee\/p\/umXKr","success":true}

The same approach works for arbitrary string data, and you can also extract the link from the response if you like:

$ echo this is some test data | pyjawk -Istring '{"sections": [{"contents": data}]}' | curl -H 'Content-Type: application/json' -H 'X-Auth-Token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' -XPOST --data-binary '@-' https://api.paste.ee/v1/pastes | pyjawk -Ostring 'data["link"]'
https://paste.ee/p/iomJR

Converting data between formats

$ echo '{"foo": "bar", "baz": ["spam", "Spam", {"SPAM?": "SPAM!"}]}' | pyjawk -Oyaml
baz:
- spam
- Spam
- SPAM?: SPAM!
foo: bar

Selecting part of a data structure with an expression

$ echo '{"foo": "bar", "baz": ["spam", "Spam", {"SPAM?": "SPAM!"}]}' | pyjawk -c 'data["baz"][2]'
{"SPAM?": "SPAM!"}

Extracting a value as a string

$ echo '{"foo": "bar", "baz": ["spam", "Spam", {"SPAM?": "SPAM!"}]}' | pyjawk -Ostring 'data["baz"][1]'
Spam

Easily embedding string data from stdin into a json structure

$ echo 'this is a test string' | pyjawk -Istring -Ojson -c '{"foo": data}'
{"foo": "this is a test string\n"}

Relocating an xml child

$ echo '<root><foo><bar>first</bar></foo><baz /></root>' | pyxawk -e 'foo = list(data)[0]; bar = list(foo)[0]; baz = list(data)[1]; baz.append(bar); foo.remove(bar)'
<?xml version="1.0" ?>
<root>
  <foo/>
  <baz>
    <bar>first</bar>
  </baz>
</root>

The -e code can also be split across several -e arguments:

$ echo '<root><foo><bar>first</bar></foo><baz /></root>' | pyxawk -e 'foo = list(data)[0]' -e 'bar = list(foo)[0]' -e 'baz = list(data)[1]' -e 'baz.append(bar)' -e 'foo.remove(bar)'

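Or just as a script file, relocate.py:

foo = list(data)[0]  # <foo>, the first child of <root>
bar = list(foo)[0]   # <bar>, the only child of <foo>
baz = list(data)[1]  # <baz>, the second child of <root>
baz.append(bar)      # attach <bar> under <baz>
foo.remove(bar)      # and detach it from <foo>

$ echo '<root><foo><bar>first</bar></foo><baz /></root>' | pyxawk -f relocate.py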
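For comparison, here is the same relocation in plain standard-library Python, with the dump step following the xml format description (ElementTree.tostring, then xml.dom.minidom for pretty-printing). It is a sketch of equivalent behavior, not pyxawk's implementation; `relocate` and `pretty` are hypothetical names. Run as-is, it prints the same document as the pyxawk command.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def relocate(xml_text):
    """Move <bar> from <foo> into <baz>, mirroring the pyxawk snippets."""
    data = ET.fromstring(xml_text)
    foo = list(data)[0]  # first child of <root>
    bar = list(foo)[0]   # the <bar> inside <foo>
    baz = list(data)[1]  # second child of <root>
    baz.append(bar)
    foo.remove(bar)
    return data


def pretty(element):
    # Serialize with ElementTree, then re-parse with minidom to indent it
    return minidom.parseString(ET.tostring(element)).toprettyxml(indent="  ")


print(pretty(relocate("<root><foo><bar>first</bar></foo><baz /></root>")), end="")
```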

Exploring a structure in a REPL

$ pyjawk -i<(echo '{"foo": "bar", "baz": ["spam", "Spam", {"SPAM?": "SPAM!"}]}') -r
>>> data
{'foo': 'bar', 'baz': ['spam', 'Spam', {'SPAM?': 'SPAM!'}]}

>>> write()
{
  "foo": "bar",
  "baz": [
    "spam",
    "Spam",
    {
      "SPAM?": "SPAM!"
    }
  ]
}

>>> data = data["baz"]

>>> write()
[
  "spam",
  "Spam",
  {
    "SPAM?": "SPAM!"
  }
]

Fixing RetroArch Playlists

Suppose you had an issue with the way RetroArch generates its playlist files for the PlayStation (by default, it searches for .cue files, but not .bin), and /tmp/Roms/psx contained these Sony PlayStation games:

Alpha.bin
Alpha.cue
Bravo.bin
Charlie.bin
Delta.bin
Delta.cue

You might end up with a playlist file like this:

{
  "version": "1.2",
  "default_core_path": "/tmp/retroarch/cores/pcsx_rearmed_libretro.so",
  "default_core_name": "Sony - PlayStation (PCSX ReARMed)",
  "label_display_mode": 0,
  "right_thumbnail_mode": 0,
  "left_thumbnail_mode": 0,
  "items": [
    {
      "path": "/tmp/Roms/psx/Alpha.cue",
      "label": "Alpha",
      "core_path": "/tmp/retroarch/cores/pcsx_rearmed_libretro.so",
      "core_name": "Sony - PlayStation (PCSX ReARMed)",
      "crc32": "00000000|crc",
      "db_name": "Sony - PlayStation.lpl"
    },
    {
      "path": "/tmp/Roms/psx/Delta.cue",
      "label": "Delta",
      "core_path": "/tmp/retroarch/cores/pcsx_rearmed_libretro.so",
      "core_name": "Sony - PlayStation (PCSX ReARMed)",
      "crc32": "00000000|crc",
      "db_name": "Sony - PlayStation.lpl"
    }
  ]
}

If you want the playlist to contain only the .bin files, you can scan the directory for them and rewrite the json in place with this tool:

$ pyjawk -i 'Sony - PlayStation.lpl' -o 'Sony - PlayStation.lpl' -e 'from pathlib import Path' -e 'data["items"] = [{"path": str(path), "label": path.stem, "core_path": data["default_core_path"], "core_name": data["default_core_name"], "crc32": "00000000|crc", "db_name": "Sony - PlayStation.lpl"} for path in (Path("/tmp") / "Roms" / "psx").iterdir() if path.suffix == ".bin"]'

This rewrites the playlist file to:

{
  "version": "1.2",
  "default_core_path": "/tmp/retroarch/cores/pcsx_rearmed_libretro.so",
  "default_core_name": "Sony - PlayStation (PCSX ReARMed)",
  "label_display_mode": 0,
  "right_thumbnail_mode": 0,
  "left_thumbnail_mode": 0,
  "items": [
    {
      "path": "/tmp/Roms/psx/Delta.bin",
      "label": "Delta",
      "core_path": "/tmp/retroarch/cores/pcsx_rearmed_libretro.so",
      "core_name": "Sony - PlayStation (PCSX ReARMed)",
      "crc32": "00000000|crc",
      "db_name": "Sony - PlayStation.lpl"
    },
    {
      "path": "/tmp/Roms/psx/Charlie.bin",
      "label": "Charlie",
      "core_path": "/tmp/retroarch/cores/pcsx_rearmed_libretro.so",
      "core_name": "Sony - PlayStation (PCSX ReARMed)",
      "crc32": "00000000|crc",
      "db_name": "Sony - PlayStation.lpl"
    },
    {
      "path": "/tmp/Roms/psx/Bravo.bin",
      "label": "Bravo",
      "core_path": "/tmp/retroarch/cores/pcsx_rearmed_libretro.so",
      "core_name": "Sony - PlayStation (PCSX ReARMed)",
      "crc32": "00000000|crc",
      "db_name": "Sony - PlayStation.lpl"
    },
    {
      "path": "/tmp/Roms/psx/Alpha.bin",
      "label": "Alpha",
      "core_path": "/tmp/retroarch/cores/pcsx_rearmed_libretro.so",
      "core_name": "Sony - PlayStation (PCSX ReARMed)",
      "crc32": "00000000|crc",
      "db_name": "Sony - PlayStation.lpl"
    }
  ]
}

That might look heavy up front, but you can rewrite it as a script file with a simpler structure:

from pathlib import Path

# Rebuild the playlist from every .bin file in the ROM directory
data["items"] = []

for path in (Path('/tmp') / 'Roms' / 'psx').iterdir():
    if path.suffix == '.bin':
        data["items"].append({
            "path": str(path),
            "label": path.stem,
            "core_path": data["default_core_path"],
            "core_name": data["default_core_name"],
            "crc32": "00000000|crc",
            "db_name": "Sony - PlayStation.lpl",
        })

and run it with pyjawk like so:

$ pyjawk -i 'Sony - PlayStation.lpl' -o 'Sony - PlayStation.lpl' -f script.py

Or instead load it into a REPL to work on it interactively:

$ pyjawk -i 'Sony - PlayStation.lpl' -o 'Sony - PlayStation.lpl' -r
>>> from pathlib import Path

>>> data["items"] = []

>>> for path in (Path('/tmp') / 'Roms' / 'psx').iterdir():
...     if path.suffix == '.bin':
...         data["items"].append({
...             "path": str(path),
...             "label": path.stem,
...             "core_path": data["default_core_path"],
...             "core_name": data["default_core_name"],
...             "crc32": "00000000|crc",
...             "db_name": "Sony - PlayStation.lpl",
...             })

>>> write()

>>> exit()

Just make sure to call write() in the REPL, or nothing will be written.
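If you rebuild playlists often, the same logic can live in a reusable function. This is a hypothetical variant (`playlist_items` is not part of pyjawk) that also sorts the paths, since iterdir() yields entries in an arbitrary, filesystem-dependent order, as the Delta-to-Alpha output above shows.

```python
import json
import tempfile
from pathlib import Path


def playlist_items(playlist, rom_dir, db_name="Sony - PlayStation.lpl"):
    """Hypothetical helper: one RetroArch playlist item per .bin file in rom_dir."""
    return [
        {
            "path": str(path),
            "label": path.stem,
            "core_path": playlist["default_core_path"],
            "core_name": playlist["default_core_name"],
            "crc32": "00000000|crc",
            "db_name": db_name,
        }
        # sorted() makes the item order deterministic
        for path in sorted(Path(rom_dir).iterdir())
        if path.suffix == ".bin"
    ]


# Demo against a throwaway directory shaped like /tmp/Roms/psx
with tempfile.TemporaryDirectory() as tmp:
    for name in ["Alpha.bin", "Alpha.cue", "Bravo.bin", "Delta.cue"]:
        (Path(tmp) / name).touch()
    playlist = {"default_core_path": "core.so", "default_core_name": "PCSX", "items": []}
    playlist["items"] = playlist_items(playlist, tmp)
    print(json.dumps([item["label"] for item in playlist["items"]]))
```

With the function defined at the top of an -f script, the body of that script could shrink to `data["items"] = playlist_items(data, "/tmp/Roms/psx")`.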

Plans

I don’t plan to add much to this, as I want it to stay useful but also as lean and manageable as possible. Things like HTTP input and output are best left to programs that do them better, like curl, especially since this program can operate as a stream filter in a pipeline.

This program needs some regression tests set up.

Download files

Built Distribution: pyjawk-1.1.0-py3-none-any.whl (16.8 kB), uploaded for Python 3. No source distribution files are available for this release.
