Python jsonl query engine
jf
==
jf, aka json filter pipeline, is a jq clone written in Python. It evaluates Python one-liners, which makes it
especially appealing to data scientists who are used to Python.
Basic usage
==
Filter selected fields
$ cat samples.jsonl | jf 'map({id: x.id, subject: x.fields.subject})'
{"id": "87086895", "subject": "Swedish children stories"}
{"id": "87114792", "subject": "New Finnish storybooks"}
Filter selected items
$ cat samples.jsonl | jf 'map({id: x.id, subject: x.fields.subject}), filter(x.id == "87114792")'
{"id": "87114792", "subject": "New Finnish storybooks"}
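For readers who want to see what the map and filter steps amount to, here is a minimal sketch in plain Python generators. It does not use jf itself; the helper names (`read_jsonl`, `select_fields`, `only_id`, `run_pipeline`) are hypothetical, and the nested shape of the sample records is an assumption inferred from the expressions `x.id` and `x.fields.subject` above.

```python
import json

# Assumed record shape: inferred from x.id and x.fields.subject in the examples.
SAMPLE_LINES = [
    '{"id": "87086895", "fields": {"subject": "Swedish children stories"}}',
    '{"id": "87114792", "fields": {"subject": "New Finnish storybooks"}}',
]


def read_jsonl(lines):
    """Yield one parsed object per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select_fields(items):
    """Plain-Python counterpart of map({id: x.id, subject: x.fields.subject})."""
    for x in items:
        yield {"id": x["id"], "subject": x["fields"]["subject"]}


def only_id(items, wanted):
    """Plain-Python counterpart of filter(x.id == wanted)."""
    for x in items:
        if x["id"] == wanted:
            yield x


def run_pipeline(lines, wanted):
    """Chain the generators lazily, then collect the result."""
    return list(only_id(select_fields(read_jsonl(lines)), wanted))


for item in run_pipeline(SAMPLE_LINES, "87114792"):
    print(json.dumps(item))
```

Each stage consumes the previous one lazily, so items flow through one at a time, much like records flowing through a shell pipe.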
Filter selected values
$ cat samples.jsonl | jf 'map(x.id)'
"87086895"
"87114792"
Filter items by age (and output yaml)
$ cat samples.jsonl | jf 'map({id: x.id, datetime: x["content-datetime"]}), filter(age(x.datetime) > age("456 days")),
map(.update({age: age(x.datetime)}))' --indent=5 --yaml
age: 457 days, 4:07:54.932587
datetime: '2016-10-29 10:55:42+03:00'
id: '87086895'
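The age comparison can be pictured as a timedelta between a timestamp and the current time. The sketch below is an illustration with the standard library only, not jf's implementation: `age_of` is a hypothetical helper, it accepts only ISO 8601 strings with a UTC offset, and it compares against a `timedelta` rather than parsing strings such as "456 days". A fixed `now` is passed in so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone


def age_of(timestamp, now=None):
    """Return now - timestamp as a timedelta.

    Assumes `timestamp` is an ISO 8601 string carrying a UTC offset;
    a naive timestamp would raise TypeError on subtraction.
    """
    then = datetime.fromisoformat(timestamp)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - then


# Fixed reference time (chosen for illustration) keeps the example deterministic.
fixed_now = datetime(2018, 1, 29, 12, 0, 0, tzinfo=timezone.utc)
item_age = age_of("2016-10-29 10:55:42+03:00", now=fixed_now)

print(item_age)                        # 457 days, 4:04:18
print(item_age > timedelta(days=456))  # True
```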
Sort items by age and print their id, length and age
$ cat samples.jsonl | jf 'map(x.update({age: age(x["content-datetime"])})),
sorted(x.age),
map(.id, "length: %d" % len(.content), .age)' --indent=3 --yaml
- '14941692'
- 'length: 63'
- 184 days, 0:02:20.421829
- '90332110'
- 'length: 191'
- 215 days, 22:15:46.403613
- '88773908'
- 'length: 80'
- 350 days, 3:11:06.412088
- '14558799'
- 'length: 1228'
- 450 days, 6:30:54.419461
- '87182405'
- 'length: 251'
Import your own modules and hide fields:
$ cat test.json | jf --import demomodule --yaml 'map(x.update({id: x.sha})),
demomodule.timestamppipe(),
hide("sha", "committer", "parents", "html_url", "author", "commit", "comments_url"),
islice(3,5)'
- Pipemod: was here at 2018-01-31 09:26:12.366465
id: f5f879dd7303c35fa3712586af1e7df884a5b98b
url: https://api.github.com/repos/alhoo/jf/commits/f5f879dd7303c35fa3712586af1e7df884a5b98b
- Pipemod: was here at 2018-01-31 09:26:12.368438
id: b393d09215efc4fc0382dd82ec3f38ae59a287e5
url: https://api.github.com/repos/alhoo/jf/commits/b393d09215efc4fc0382dd82ec3f38ae59a287e5
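As a rough picture of the hide and islice steps, the following sketch drops named keys from each record and then takes a slice of the stream with `itertools.islice`. The `hide` function here is a hypothetical plain-Python stand-in, not jf's own, and the commit records are made-up placeholders. The two-item result matches the two records shown above for `islice(3,5)`, which suggests (but does not confirm) that jf follows the same start/stop semantics.

```python
from itertools import islice


def hide(items, *keys):
    """Yield a copy of each dict without the named keys."""
    hidden = set(keys)
    for item in items:
        yield {k: v for k, v in item.items() if k not in hidden}


# Made-up placeholder records standing in for GitHub commit objects.
commits = [
    {"sha": "s%d" % i, "url": "u%d" % i, "author": "a%d" % i}
    for i in range(6)
]

# Items at positions 3 and 4 (zero-based); the stop index 5 is exclusive.
result = list(islice(hide(commits, "sha", "author"), 3, 5))
print(result)  # [{'url': 'u3'}, {'url': 'u4'}]
```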
Read yaml:
$ cat test.yaml | jf --yamli 'map(x.update({id: x.sha, age: age(x.commit.author.date)})),
filter(x.age < age("1 days"))' --indent=2 --yaml
Group duplicates (age is within the same hour):
$ cat test.json | jf --import demomodule 'map(x.update({id: x.sha})),
sorted(.commit.author.date, reverse=True),
demomodule.DuplicateRemover(int(age(.commit.author.date).total_seconds()/3600),
group=1).process(lambda x: {"duplicate": x.id}),
map(list(map(lambda y: {age: age(y.commit.author.date),
id: y.id, date: y.commit.author.date, duplicate_of: y["duplicate"], comment: y.commit.message}, x))),
first(2)'
[
{
"comment": "Add support for hiding fields",
"duplicate_of": null,
"id": "f8ba0ba559e39611bc0b63f236a3e67085fe8b40",
"age": "16:19:00.102299",
"date": "2018-01-30 19:25:30+00:00"
},
{
"comment": "Enhance error handling",
"duplicate_of": "f8ba0ba559e39611bc0b63f236a3e67085fe8b40",
"id": "d3211e1141d8b2bf480cbbebd376b57bae9d8bdf",
"age": "16:46:58.104188",
"date": "2018-01-30 18:57:32+00:00"
}
]
[
{
"comment": "Reduce verbosity when debugging",
"duplicate_of": null,
"id": "f5f879dd7303c35fa3712586af1e7df884a5b98b",
"age": "19:26:00.106777",
"date": "2018-01-30 16:18:30+00:00"
},
{
"comment": "Print help if no input is given",
"duplicate_of": "f5f879dd7303c35fa3712586af1e7df884a5b98b",
"id": "b393d09215efc4fc0382dd82ec3f38ae59a287e5",
"age": "19:35:16.108654",
"date": "2018-01-30 16:09:14+00:00"
}
]
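The idea behind the duplicate grouping, a stateful object that remembers what it has already seen, can be sketched as follows. This is a hypothetical stand-in and not the code of `demomodule.DuplicateRemover`, whose implementation is not shown here; the class name, the `age_hours` field, and the record ids are all invented for illustration. Items are bucketed by whole hours of age, and every later item in a bucket is marked as a duplicate of the first one.

```python
class HourlyDuplicateMarker:
    """Hypothetical sketch: mark items that share a bucket with an earlier item."""

    def __init__(self):
        # Maps bucket key -> id of the first item seen in that bucket.
        self.first_seen = {}

    def process(self, items, bucket_of):
        """Yield a copy of each item with a `duplicate_of` field added."""
        for item in items:
            bucket = bucket_of(item)
            if bucket in self.first_seen:
                duplicate_of = self.first_seen[bucket]
            else:
                self.first_seen[bucket] = item["id"]
                duplicate_of = None
            yield dict(item, duplicate_of=duplicate_of)


# Invented records; age_hours mimics ages of roughly 16 h and 19 h.
commits = [
    {"id": "a", "age_hours": 16.3},
    {"id": "b", "age_hours": 16.8},
    {"id": "c", "age_hours": 19.4},
    {"id": "d", "age_hours": 19.6},
]

marker = HourlyDuplicateMarker()
marked = list(marker.process(commits, lambda c: int(c["age_hours"])))
for item in marked:
    print(item["id"], item["duplicate_of"])
```

Because the state lives on the instance, the same marker can be reused across several input streams if duplicates should be tracked globally.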
Installing
==
pip install git+https://github.com/alhoo/jf
Features
==
* JSON, JSONL and YAML files for input and output
* construct generator pipelines with map, hide, filter
* access JSON dicts as objects with dot notation for attributes
* datetime and timedelta comparison
* age() for the timedelta between a datetime and the current time
* first(N), last(N), islice(start, stop, step)
* import your own modules for more complex filtering
* support for stateful classes for complex interactions between items
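Dot-notation access to dictionaries can be approximated with a small `dict` subclass. The sketch below is an assumption about the general technique, not jf's actual class; `DotDict` is a hypothetical name. Keys that are not valid Python identifiers, such as "content-datetime", still need bracket access, which matches the `x["content-datetime"]` form used in the examples.

```python
class DotDict(dict):
    """Hypothetical sketch: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as update() keep working as usual.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access like x.fields.subject works.
        return DotDict(value) if isinstance(value, dict) else value


record = DotDict({
    "id": "87086895",
    "fields": {"subject": "Swedish children stories"},
    "content-datetime": "2016-10-29 10:55:42+03:00",
})

print(record.id)                   # 87086895
print(record.fields.subject)       # Swedish children stories
print(record["content-datetime"])  # 2016-10-29 10:55:42+03:00
```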
Known bugs
==
* Datetime recognition is crude and will probably misidentify some values