
Python jsonl query engine

Project description


JF, aka “jndex fingers” or more commonly “json filter pipeline”, is a jq clone written in Python. It supports evaluating Python one-liners, which makes it especially appealing to data scientists who are used to working with Python.

How does it work

JF works by passing streaming JSON or YAML data structures through a map/filter pipeline. The pipeline is compiled from a string containing a comma-separated list of filters and mappers. The query parser assumes that each function in the pipeline reads items from a generator. That generator is passed as the last non-keyword argument to the function, so “map(conversion)” is interpreted as “map(conversion, inputgenerator)”. The output of each function becomes the input generator of the next function in the pipeline.
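This is a minimal sketch of the chaining rule in plain Python. It assumes the query string has already been parsed into (function, arguments) pairs. `run_pipeline` and `Stage` are hypothetical names for illustration, not jf's API. Python's built-in `map()` and `filter()` already take their iterable as the last argument, so they fit the convention as-is:

```python
from typing import Any, Callable, Iterable, Iterator, Sequence, Tuple

# Hypothetical representation of a parsed pipeline stage: (function, leading args).
Stage = Tuple[Callable[..., Iterable[Any]], Tuple[Any, ...]]

def run_pipeline(stages: Sequence[Stage], source: Iterable[Any]) -> Iterator[Any]:
    """Chain the stages, appending the previous stage's generator as the last argument."""
    stream: Iterable[Any] = source
    for fn, args in stages:
        # "map(conversion)" becomes "map(conversion, stream)"
        stream = fn(*args, stream)
    return iter(stream)

items = [{"id": "1", "n": 3}, {"id": "2", "n": 10}]
stages = [
    (map, (lambda x: {"id": x["id"], "double": 2 * x["n"]},)),
    (filter, (lambda x: x["double"] > 10,)),
]
result = list(run_pipeline(stages, items))
print(result)  # → [{'id': '2', 'double': 20}]
```

Every stage is lazy, so items flow through one at a time until `list()` pulls them.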

The signatures of some built-in functions have been remodeled to fit the framework more intuitively. The most noticeable is the sorted function, which normally takes the key as a keyword argument. This was done because it seems more logical to sort items by id by writing “sorted(” than “sorted(key=lambda x:”.
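For comparison, a hypothetical `sorted_by` wrapper shows the remodeled argument order in plain Python: key first, input generator last. In jf the key is written as an expression rather than a lambda, and this sketch does not attempt to reproduce that.

```python
from typing import Any, Callable, Iterable, Iterator

def sorted_by(key: Callable[[Any], Any], items: Iterable[Any], reverse: bool = False) -> Iterator[Any]:
    """Hypothetical pipeline-friendly sorted(): key first, input generator last."""
    # Sorting needs every item, so this stage buffers the whole stream.
    yield from sorted(items, key=key, reverse=reverse)

items = iter([{"id": 3}, {"id": 1}, {"id": 2}])
print([x["id"] for x in sorted_by(lambda x: x["id"], items)])  # → [1, 2, 3]
```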

Basic usage

Filter selected fields

$ cat samples.jsonl | jf 'map({id:, subject: x.fields.subject})'
{"id": "87086895", "subject": "Swedish children stories"}
{"id": "87114792", "subject": "New Finnish storybooks"}
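The query above reads nested fields with dot-notation (`x.fields.subject`). A minimal sketch of that kind of access in plain Python uses a hypothetical `AttrDict` subclass installed through `json.loads(..., object_hook=...)`. The sample record is made up, shaped only to match the fields the query uses.

```python
import json
from typing import Any

class AttrDict(dict):
    """Hypothetical dict subclass that allows x.fields.subject style access."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

line = '{"id": "87086895", "fields": {"subject": "Swedish children stories"}}'
# object_hook turns every decoded JSON object, including nested ones, into an AttrDict.
x = json.loads(line, object_hook=AttrDict)
print({"id": x.id, "subject": x.fields.subject})
```

One design caveat: keys that collide with dict methods (for example `items` or `keys`) resolve to the method, so those still need `x["items"]`.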

Filter selected items

$ cat samples.jsonl | jf 'map({id:, subject: x.fields.subject}), filter( == "87114792")'
{"id": "87114792", "subject": "New Finnish storybooks"}
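The same map-then-filter query can be sketched in plain Python with chained generator expressions over a JSONL stream. This assumes each line carries a top-level `id` and a nested `fields.subject`. `io.StringIO` stands in for `cat samples.jsonl`, and `read_jsonl` is a hypothetical helper.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO

def read_jsonl(stream: TextIO) -> Iterator[Dict[str, Any]]:
    # One JSON document per non-empty line, parsed lazily.
    for line in stream:
        if line.strip():
            yield json.loads(line)

samples = io.StringIO(
    '{"id": "87086895", "fields": {"subject": "Swedish children stories"}}\n'
    '{"id": "87114792", "fields": {"subject": "New Finnish storybooks"}}\n'
)
mapped = ({"id": x["id"], "subject": x["fields"]["subject"]} for x in read_jsonl(samples))
selected = (x for x in mapped if x["id"] == "87114792")
for item in selected:
    print(json.dumps(item))
# prints {"id": "87114792", "subject": "New Finnish storybooks"}
```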

Filter selected values

$ cat samples.jsonl | jf 'map('

Filter items by age (and output yaml)

$ cat samples.jsonl | jf 'map({id:, datetime: x["content-datetime"]}), filter(age(x.datetime) > age("456 days")),
        map(.update({age: age(x.datetime)}))' --indent=5 --yaml
age: 457 days, 4:07:54.932587
datetime: '2016-10-29 10:55:42+03:00'
id: '87086895'
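A rough sketch of an `age()` helper covering both uses in the example above. It returns a duration string like "456 days" as a `timedelta`, and otherwise the time elapsed since a datetime string. This is a hypothetical stand-in, not jf's implementation. It also assumes naive timestamps are UTC, and it accepts an explicit `now` so results are reproducible.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

def age(value: str, now: Optional[datetime] = None) -> timedelta:
    """Hypothetical age(): "N days" -> timedelta; otherwise time elapsed since a datetime."""
    match = re.fullmatch(r"\s*(\d+)\s+days?\s*", value)
    if match:
        return timedelta(days=int(match.group(1)))
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    moment = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    if now is None:
        now = datetime.now(timezone.utc)
    return now - moment

reference = datetime(2018, 1, 31, tzinfo=timezone.utc)  # fixed "now" for a reproducible result
print(age("2016-10-29 10:55:42+03:00", now=reference))  # → 458 days, 16:04:18
print(age("2016-10-29 10:55:42+03:00", now=reference) > age("456 days"))  # → True
```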

Sort items by age and print their id, length and age

$ cat samples.jsonl|jf 'map(x.update({age: age(x["content-datetime"])})),
        map(.id, "length: %d" % len(.content), .age)' --indent=3 --yaml
- '14941692'
- 'length: 63'
- 184 days, 0:02:20.421829
- '90332110'
- 'length: 191'
- 215 days, 22:15:46.403613
- '88773908'
- 'length: 80'
- 350 days, 3:11:06.412088
- '14558799'
- 'length: 1228'
- 450 days, 6:30:54.419461
- '87182405'
- 'length: 251'
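In the output above, each item contributes three consecutive entries, which suggests that a map with several comma-separated expressions emits each value as a separate item. That is an inference from the output, not documented behaviour. The hypothetical `map_many` reproduces this reading after sorting by age. The ids and lengths come from the output, and the `content` strings are placeholders of matching length.

```python
from datetime import timedelta
from typing import Any, Callable, Iterable, Iterator, Sequence

def map_many(funcs: Sequence[Callable[[Any], Any]], items: Iterable[Any]) -> Iterator[Any]:
    """Hypothetical multi-expression map: emit each expression's value as its own item."""
    for x in items:
        for fn in funcs:
            yield fn(x)

items = [
    {"id": "90332110", "content": "x" * 191, "age": timedelta(days=215)},
    {"id": "14941692", "content": "x" * 63, "age": timedelta(days=184)},
]
flat = list(map_many(
    [lambda x: x["id"], lambda x: "length: %d" % len(x["content"]), lambda x: x["age"]],
    sorted(items, key=lambda x: x["age"]),  # youngest first, as in the output above
))
for value in flat:
    print(value)
```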

Filter items after a given datetime (test.json is a git commit history):

$ jf 'map(.update({age: age(})),filter(date( > date("2018-01-30T17:00:00Z")),sorted(x.age, reverse=True), map(.sha, .age,' test.json
  "2 days, 9:40:12.137919",
  "2 days, 9:18:07.134418",
  "2 days, 8:50:09.129790",
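A plain-Python sketch of the same steps: add an age to each commit, keep commits after a cutoff, and sort by age in descending order. The commit dates are taken from the grouping example below. The `sha` values are placeholders, and the date is flattened to a top-level `date` key for brevity, which is an assumption about the input shape. With a plain `dict`, `update()` returns `None`, so the sketch builds a new dict instead. Given how `.update(...)` is used inside `map` in these queries, jf's own version presumably returns the updated item.

```python
from datetime import datetime, timezone

def parse_ts(value: str) -> datetime:
    # GitHub-style timestamps end in "Z"; fromisoformat() accepts it only from Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

now = datetime(2018, 2, 2, 3, 0, tzinfo=timezone.utc)  # fixed "current time" for reproducibility
commits = [
    {"sha": "a1", "date": "2018-01-30T16:18:30Z"},  # placeholder shas
    {"sha": "b2", "date": "2018-01-30T18:57:32Z"},
    {"sha": "c3", "date": "2018-01-30T19:25:30Z"},
]
threshold = parse_ts("2018-01-30T17:00:00Z")
# dict(c, age=...) copies the item and adds the key, unlike dict.update(), which returns None.
recent = [dict(c, age=now - parse_ts(c["date"])) for c in commits if parse_ts(c["date"]) > threshold]
ordered = sorted(recent, key=lambda c: c["age"], reverse=True)
for c in ordered:
    print(c["sha"], c["age"])  # b2 2 days, 8:02:28 then c3 2 days, 7:34:30
```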

Import your own modules and hide fields:

$ cat test.json|jf --import demomodule --yaml 'map(x.update({id: x.sha})),
        hide("sha", "committer", "parents", "html_url", "author", "commit", "comments_url"),
- Pipemod: was here at 2018-01-31 09:26:12.366465
  id: f5f879dd7303c35fa3712586af1e7df884a5b98b
- Pipemod: was here at 2018-01-31 09:26:12.368438
  id: b393d09215efc4fc0382dd82ec3f38ae59a287e5
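A minimal sketch of a `hide()` stage that follows the convention described under "How does it work": field names first, input generator as the last positional argument. This is a hypothetical version for illustration, and the sample commit is a placeholder.

```python
from typing import Any, Dict, Iterator

def hide(*args: Any) -> Iterator[Dict[str, Any]]:
    """Hypothetical hide(field, ..., items): drop the named keys from every item."""
    # Pipeline convention: the input generator is the last positional argument.
    *fields, items = args
    hidden = set(fields)
    for x in items:
        yield {k: v for k, v in x.items() if k not in hidden}

commits = [{"sha": "abc123", "author": "someone", "commit": {"message": "Initial commit"}}]
with_id = ({**c, "id": c["sha"]} for c in commits)  # like map(x.update({id: x.sha}))
print(list(hide("sha", "author", "commit", with_id)))  # → [{'id': 'abc123'}]
```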

Read yaml:

$ cat test.yaml | jf --yamli 'map(x.update({id: x.sha, age: age(})),
        filter(x.age < age("1 days"))' --indent=2 --yaml

Group duplicates (items whose age falls within the same hour):

$ cat test.json|jf --import demomodule 'map(x.update({id: x.sha})),
        sorted(, reverse=True),
        group=1).process(lambda x: {"duplicate":}),
        map(list(map(lambda y: {age: age(,
        id:, date:, duplicate_of: y["duplicate"], comment: y.commit.message}, x))),
[
  {
    "comment": "Add support for hiding fields",
    "duplicate_of": null,
    "id": "f8ba0ba559e39611bc0b63f236a3e67085fe8b40",
    "age": "16:19:00.102299",
    "date": "2018-01-30 19:25:30+00:00"
  },
  {
    "comment": "Enhance error handling",
    "duplicate_of": "f8ba0ba559e39611bc0b63f236a3e67085fe8b40",
    "id": "d3211e1141d8b2bf480cbbebd376b57bae9d8bdf",
    "age": "16:46:58.104188",
    "date": "2018-01-30 18:57:32+00:00"
  }
]
[
  {
    "comment": "Reduce verbosity when debugging",
    "duplicate_of": null,
    "id": "f5f879dd7303c35fa3712586af1e7df884a5b98b",
    "age": "19:26:00.106777",
    "date": "2018-01-30 16:18:30+00:00"
  },
  {
    "comment": "Print help if no input is given",
    "duplicate_of": "f5f879dd7303c35fa3712586af1e7df884a5b98b",
    "id": "b393d09215efc4fc0382dd82ec3f38ae59a287e5",
    "age": "19:35:16.108654",
    "date": "2018-01-30 16:09:14+00:00"
  }
]
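The query above appears to rely on a stateful class from `demomodule` with a `process()` method and a `group=1` argument, but their definitions are not shown. The following is a plain-Python sketch of one way to get the same grouping. It uses a hypothetical `DuplicateGrouper`, and it assumes the rule is "within one hour of the newest item in the current group", which matches the output above. Ids are shortened from that output and dates are copied from it.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, Iterator, List, Optional

Item = Dict[str, Any]

class DuplicateGrouper:
    """Hypothetical stateful helper: tags items dated within `window` of the current group head."""

    def __init__(self, window: timedelta = timedelta(hours=1)) -> None:
        self.window = window
        self.head: Optional[Item] = None

    def process(self, item: Item) -> Item:
        # Items must arrive newest first, so head["date"] - item["date"] is never negative.
        if self.head is not None and self.head["date"] - item["date"] <= self.window:
            return {**item, "duplicate": self.head["id"]}
        self.head = item
        return {**item, "duplicate": None}

def group_duplicates(items: Iterable[Item], window: timedelta = timedelta(hours=1)) -> Iterator[List[Item]]:
    grouper = DuplicateGrouper(window)
    group: List[Item] = []
    for item in sorted(items, key=lambda x: x["date"], reverse=True):
        tagged = grouper.process(item)
        # A new group starts whenever an item is not a duplicate of the current head.
        if tagged["duplicate"] is None and group:
            yield group
            group = []
        group.append(tagged)
    if group:
        yield group

commits = [
    {"id": "f5f879dd", "date": datetime(2018, 1, 30, 16, 18, 30)},
    {"id": "d3211e11", "date": datetime(2018, 1, 30, 18, 57, 32)},
    {"id": "b393d092", "date": datetime(2018, 1, 30, 16, 9, 14)},
    {"id": "f8ba0ba5", "date": datetime(2018, 1, 30, 19, 25, 30)},
]
for group in group_duplicates(commits):
    print([(c["id"], c["duplicate"]) for c in group])
# prints [('f8ba0ba5', None), ('d3211e11', 'f8ba0ba5')]
# then   [('f5f879dd', None), ('b393d092', 'f5f879dd')]
```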


Installation

pip install jf


Features

  • JSON, JSONL and YAML files for input and output
  • Construct generator pipelines with map, hide and filter
  • Access JSON dicts like objects, with dot-notation for attributes
  • Compare datetimes and timedeltas
  • age() returns the timedelta between a datetime and the current time
  • first(N), last(N), islice(start, stop, step)
  • Import your own modules for more complex filtering
  • Stateful classes for complex interactions between items
  • Drop your filtered data into IPython for manual data exploration
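A short sketch of what `first(N)` and `last(N)` amount to over a generator, using only the standard library. The two functions are hypothetical stand-ins, not jf's code. Note that `itertools.islice` takes the iterable first, so a pipeline-style `islice(start, stop, step)` would need a wrapper that reorders the arguments; how jf handles this is not shown here.

```python
from collections import deque
from itertools import islice
from typing import Any, Iterable, Iterator

def first(n: int, items: Iterable[Any]) -> Iterator[Any]:
    # Stops pulling from the input after n items, so it is safe on endless streams.
    return islice(items, n)

def last(n: int, items: Iterable[Any]) -> Iterator[Any]:
    # Must consume the whole stream; a bounded deque keeps only the final n items.
    return iter(deque(items, maxlen=n))

print(list(first(2, range(10))))         # → [0, 1]
print(list(last(2, range(10))))          # → [8, 9]
print(list(islice(range(10), 1, 8, 3)))  # → [1, 4, 7]
```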

Known bugs

  • IPython doesn’t launch perfectly with piped data



Files for jf, version 0.3.5:

  • jf-0.3.5.tar.gz (16.6 kB), source distribution
