
Python jsonl query engine


jf
==

jf, aka json filter pipeline, is a jq clone written in Python. It evaluates Python one-liners, which makes it
especially appealing to data scientists who are used to Python.

Basic usage
==

Filter selected fields

$ cat samples.jsonl | jf 'map({id: x.id, subject: x.fields.subject})'
{"id": "87086895", "subject": "Swedish children stories"}
{"id": "87114792", "subject": "New Finnish storybooks"}

Filter selected items

$ cat samples.jsonl | jf 'map({id: x.id, subject: x.fields.subject}), filter(x.id == "87114792")'
{"id": "87114792", "subject": "New Finnish storybooks"}
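
For readers who think in plain Python, the two pipelines above correspond roughly to chained generators. This is a minimal sketch, not jf's implementation: the helper names (`read_jsonl`, `select_fields`, `keep_id`) are hypothetical, and the input lines assume that `samples.jsonl` holds objects with an `id` and a nested `fields.subject`, which the outputs suggest but the source does not show.

```python
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select_fields(items: Iterable[dict]) -> Iterator[dict]:
    """Rough equivalent of map({id: x.id, subject: x.fields.subject})."""
    for x in items:
        yield {"id": x["id"], "subject": x["fields"]["subject"]}


def keep_id(items: Iterable[dict], wanted: str) -> Iterator[dict]:
    """Rough equivalent of filter(x.id == wanted)."""
    return (x for x in items if x["id"] == wanted)


# Assumed input shape; the real samples.jsonl is not shown in this document.
sample_lines = [
    '{"id": "87086895", "fields": {"subject": "Swedish children stories"}}',
    '{"id": "87114792", "fields": {"subject": "New Finnish storybooks"}}',
]

result = list(keep_id(select_fields(read_jsonl(sample_lines)), "87114792"))
for item in result:
    print(json.dumps(item))
```

Because every stage is a generator, items flow through one at a time, so nothing has to be loaded into memory as a whole.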

Filter selected values

$ cat samples.jsonl | jf 'map(x.id)'
"87086895"
"87114792"

Filter items by age (and output yaml)

$ cat samples.jsonl | jf 'map({id: x.id, datetime: x["content-datetime"]}), filter(age(x.datetime) > age("456 days")),
map(.update({age: age(x.datetime)}))' --indent=5 --yaml
age: 457 days, 4:07:54.932587
datetime: '2016-10-29 10:55:42+03:00'
id: '87086895'
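
The age comparison above can be pictured as a subtraction of two datetimes. The sketch below is an illustration under assumptions, not jf's code: `age_of` is a hypothetical helper, timestamps without a timezone are assumed to be UTC, and `now` is pinned to a fixed value so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def age_of(timestamp: str, now: Optional[datetime] = None) -> timedelta:
    """Time elapsed between an ISO 8601 timestamp and `now`."""
    moment = datetime.fromisoformat(timestamp)
    if moment.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - moment


# Fixed reference time, chosen only to make the example deterministic.
fixed_now = datetime(2018, 1, 30, 12, 0, 0, tzinfo=timezone.utc)
item_age = age_of("2016-10-29 10:55:42+03:00", now=fixed_now)

# Same kind of test as filter(age(x.datetime) > age("456 days")).
is_old = item_age > timedelta(days=456)
print(item_age, is_old)
```

Timedeltas compare directly with `<` and `>`, which is what makes a filter on age a one-line expression.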

Sort items by age and print their id, length and age

$ cat samples.jsonl | jf 'map(x.update({age: age(x["content-datetime"])})),
sorted(x.age),
map(.id, "length: %d" % len(.content), .age)' --indent=3 --yaml
- '14941692'
- 'length: 63'
- 184 days, 0:02:20.421829
- '90332110'
- 'length: 191'
- 215 days, 22:15:46.403613
- '88773908'
- 'length: 80'
- 350 days, 3:11:06.412088
- '14558799'
- 'length: 1228'
- 450 days, 6:30:54.419461
- '87182405'
- 'length: 251'

Import your own modules and hide fields:

$ cat test.json|jf --import demomodule --yaml 'map(x.update({id: x.sha})),
demomodule.timestamppipe(),
hide("sha", "committer", "parents", "html_url", "author", "commit", "comments_url"),
islice(3,5)'
- Pipemod: was here at 2018-01-31 09:26:12.366465
id: f5f879dd7303c35fa3712586af1e7df884a5b98b
url: https://api.github.com/repos/alhoo/jf/commits/f5f879dd7303c35fa3712586af1e7df884a5b98b
- Pipemod: was here at 2018-01-31 09:26:12.368438
id: b393d09215efc4fc0382dd82ec3f38ae59a287e5
url: https://api.github.com/repos/alhoo/jf/commits/b393d09215efc4fc0382dd82ec3f38ae59a287e5

Read yaml:

$ cat test.yaml | jf --yamli 'map(x.update({id: x.sha, age: age(x.commit.author.date)})),
filter(x.age < age("1 days"))' --indent=2 --yaml

Group duplicates (age is within the same hour):

$ cat test.json|jf --import demomodule 'map(x.update({id: x.sha})),
sorted(.commit.author.date, reverse=True),
demomodule.DuplicateRemover(int(age(.commit.author.date).total_seconds()/3600),
group=1).process(lambda x: {"duplicate": x.id}),
map(list(map(lambda y: {age: age(y.commit.author.date),
id: y.id, date: y.commit.author.date, duplicate_of: y["duplicate"], comment: y.commit.message}, x))),
first(2)'
[
{
"comment": "Add support for hiding fields",
"duplicate_of": null,
"id": "f8ba0ba559e39611bc0b63f236a3e67085fe8b40",
"age": "16:19:00.102299",
"date": "2018-01-30 19:25:30+00:00"
},
{
"comment": "Enhance error handling",
"duplicate_of": "f8ba0ba559e39611bc0b63f236a3e67085fe8b40",
"id": "d3211e1141d8b2bf480cbbebd376b57bae9d8bdf",
"age": "16:46:58.104188",
"date": "2018-01-30 18:57:32+00:00"
}
]
[
{
"comment": "Reduce verbosity when debugging",
"duplicate_of": null,
"id": "f5f879dd7303c35fa3712586af1e7df884a5b98b",
"age": "19:26:00.106777",
"date": "2018-01-30 16:18:30+00:00"
},
{
"comment": "Print help if no input is given",
"duplicate_of": "f5f879dd7303c35fa3712586af1e7df884a5b98b",
"id": "b393d09215efc4fc0382dd82ec3f38ae59a287e5",
"age": "19:35:16.108654",
"date": "2018-01-30 16:09:14+00:00"
}
]
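
The grouping above relies on a stateful class from the author's `demomodule`, whose code is not shown here. As a rough illustration of the idea only, the sketch below buckets items by age in whole hours and marks every later item in a bucket as a duplicate of the first one seen. `HourlyDuplicateMarker` is a hypothetical stand-in, not `demomodule.DuplicateRemover`; the commit ids are abbreviated, and the reference time is chosen to reproduce the ages in the output above.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator


class HourlyDuplicateMarker:
    """Remember the first id seen per hour bucket (hypothetical helper)."""

    def __init__(self, now: datetime) -> None:
        self.now = now
        self.first_seen: Dict[int, str] = {}  # hour bucket -> first id

    def bucket(self, item: dict) -> int:
        """Age of the item in whole hours."""
        age = self.now - datetime.fromisoformat(item["date"])
        return int(age.total_seconds() / 3600)

    def process(self, items: Iterable[dict]) -> Iterator[dict]:
        """Yield each item with a duplicate_of field added."""
        for item in items:
            key = self.bucket(item)
            if key in self.first_seen:
                duplicate_of = self.first_seen[key]
            else:
                self.first_seen[key] = item["id"]
                duplicate_of = None
            yield {**item, "duplicate_of": duplicate_of}


# Dates from the output above; ids shortened for readability.
commits = [
    {"id": "f8ba0ba", "date": "2018-01-30 19:25:30+00:00"},
    {"id": "d3211e1", "date": "2018-01-30 18:57:32+00:00"},
    {"id": "f5f879d", "date": "2018-01-30 16:18:30+00:00"},
    {"id": "b393d09", "date": "2018-01-30 16:09:14+00:00"},
]

# Assumed reference time that yields ages of about 16 and 19 hours.
marker = HourlyDuplicateMarker(datetime(2018, 1, 31, 11, 44, 30, tzinfo=timezone.utc))
marked = list(marker.process(commits))
for entry in marked:
    print(entry["id"], entry["duplicate_of"])
```

The state lives in the instance, so the class can be dropped into a generator pipeline and still see every item that passed before the current one.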






Installing
==

pip install git+https://github.com/alhoo/jf


Features
==

* JSON, JSONL and YAML files for input and output
* construct a generator pipeline with map, hide and filter
* access JSON dicts as objects, with dot notation for attributes
* datetime and timedelta comparison
* age() for the timedelta between a datetime and the current time
* first(N), last(N), islice(start, stop, step)
* import your own modules for more complex filtering
* support for stateful classes for complex interactions between items
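
Two of the features listed above, dot-notation access and the slicing helpers, can be approximated in a few lines of standard-library Python. This is a sketch under assumptions rather than jf's own code: `DotDict`, `first` and `last` are hypothetical names written for this example, and only nested dicts (not dicts inside lists) are wrapped.

```python
from collections import deque
from itertools import islice
from typing import Any, Iterable, Iterator


class DotDict(dict):
    """A dict whose keys can also be read as attributes."""

    def __getattr__(self, name: str) -> Any:
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so that x.fields.subject keeps working.
        return DotDict(value) if isinstance(value, dict) else value


def first(items: Iterable[Any], n: int) -> Iterator[Any]:
    """Yield the first n items without consuming the rest."""
    return islice(items, n)


def last(items: Iterable[Any], n: int) -> Iterator[Any]:
    """Yield the last n items; a bounded deque keeps memory at O(n)."""
    return iter(deque(items, maxlen=n))


x = DotDict({"id": "87086895", "fields": {"subject": "Swedish children stories"}})
subject = x.fields.subject

head = list(first(range(10), 2))
tail = list(last(range(10), 2))
middle = list(islice(range(10), 3, 5))
print(subject, head, tail, middle)
```

`islice(3, 5)` skips three items and yields the next two, which matches the two entries printed in the module import example earlier in this document.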

Known bugs
==

* Datetime recognition is crude and will probably make mistakes

Download files


Source Distribution

jf-0.3.tar.gz (10.0 kB)
