Skip to main content

Python jsonl query engine

Project description

jf

jf, aka json filter pipeline, is a jq-clone written in python. It supports evaluation of python oneliners, making it especially appealing for data scientists who are used to python.

Basic usage

Filter selected fields

$ cat samples.jsonl | jf 'map({id: x.id, subject: x.fields.subject})'
{"id": "87086895", "subject": "Swedish children stories"}
{"id": "87114792", "subject": "New Finnish storybooks"}

Filter selected items

$ cat samples.jsonl | jf 'map({id: x.id, subject: x.fields.subject}), filter(x.id == "87114792")'
{"id": "87114792", "subject": "New Finnish storybooks"}

Filter selected values

$ cat samples.jsonl | jf 'map(x.id)'
"87086895"
"87114792"

Filter items by age (and output yaml)

$ cat samples.jsonl | jf 'map({id: x.id, datetime: x["content-datetime"]}), filter(age(x.datetime) > age("456 days")),
        map(.update({age: age(x.datetime)}))' --indent=5 --yaml
age: 457 days, 4:07:54.932587
datetime: '2016-10-29 10:55:42+03:00'
id: '87086895'

Sort items by age and print their id, length and age

$ cat samples.jsonl|jf 'map(x.update({age: age(x["content-datetime"])})),
        sorted(x.age),
        map(.id, "length: %d" % len(.content), .age)' --indent=3 --yaml
- '14941692'
- 'length: 63'
- 184 days, 0:02:20.421829
- '90332110'
- 'length: 191'
- 215 days, 22:15:46.403613
- '88773908'
- 'length: 80'
- 350 days, 3:11:06.412088
- '14558799'
- 'length: 1228'
- 450 days, 6:30:54.419461
- '87182405'
- 'length: 251'

Import your own modules and hide fields:

$ cat test.json|jf --import demomodule --yaml 'map(x.update({id: x.sha})),
        demomodule.timestamppipe(),
        hide("sha", "committer", "parents", "html_url", "author", "commit", "comments_url"),
        islice(3,5)'
- Pipemod: was here at 2018-01-31 09:26:12.366465
  id: f5f879dd7303c35fa3712586af1e7df884a5b98b
  url: https://api.github.com/repos/alhoo/jf/commits/f5f879dd7303c35fa3712586af1e7df884a5b98b
- Pipemod: was here at 2018-01-31 09:26:12.368438
  id: b393d09215efc4fc0382dd82ec3f38ae59a287e5
  url: https://api.github.com/repos/alhoo/jf/commits/b393d09215efc4fc0382dd82ec3f38ae59a287e5

Read yaml:

$ cat test.yaml | jf --yamli 'map(x.update({id: x.sha, age: age(x.commit.author.date)})),
        filter(x.age < age("1 days"))' --indent=2 --yaml

Group duplicates (age is within the same hour):

$ cat test.json|jf --import demomodule 'map(x.update({id: x.sha})),
        sorted(.commit.author.date, reverse=True),
        demomodule.DuplicateRemover(int(age(.commit.author.date).total_seconds()/3600),
        group=1).process(lambda x: {"duplicate": x.id}),
        map(list(map(lambda y: {age: age(y.commit.author.date),
        id: y.id, date: y.commit.author.date, duplicate_of: y["duplicate"], comment: y.commit.message}, x))),
        first(2)'
[
  {
    "comment": "Add support for hiding fields",
    "duplicate_of": null,
    "id": "f8ba0ba559e39611bc0b63f236a3e67085fe8b40",
    "age": "16:19:00.102299",
    "date": "2018-01-30 19:25:30+00:00"
  },
  {
    "comment": "Enhance error handling",
    "duplicate_of": "f8ba0ba559e39611bc0b63f236a3e67085fe8b40",
    "id": "d3211e1141d8b2bf480cbbebd376b57bae9d8bdf",
    "age": "16:46:58.104188",
    "date": "2018-01-30 18:57:32+00:00"
  }
]
[
  {
    "comment": "Reduce verbosity when debugging",
    "duplicate_of": null,
    "id": "f5f879dd7303c35fa3712586af1e7df884a5b98b",
    "age": "19:26:00.106777",
    "date": "2018-01-30 16:18:30+00:00"
  },
  {
    "comment": "Print help if no input is given",
    "duplicate_of": "f5f879dd7303c35fa3712586af1e7df884a5b98b",
    "id": "b393d09215efc4fc0382dd82ec3f38ae59a287e5",
    "age": "19:35:16.108654",
    "date": "2018-01-30 16:09:14+00:00"
  }
]

Installing

pip install git+https://github.com/alhoo/jf

Features

  • json, jsonl and yaml files for input and output
  • construct generator pipeline with map, hide, filter
  • access json dict as classes with dot-notation for attributes
  • datetime and timedelta comparison
  • age() for timedelta between datetime and current time
  • first(N), last(N), islice(start, stop, step)
  • import your own modules for more complex filtering
  • Support stateful classes for complex interactions between items

Known bugs

  • Datetime-recognition is crude and will probably make mistakes

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for jf, version 0.3.3
Filename, size File type Python version Upload date Hashes
Filename, size jf-0.3.3.tar.gz (12.1 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page