Extend dtool-lookup-server with ability to filter by annotations
GitHub: https://github.com/jic-dtool/dtool-lookup-server-annotation-filter-plugin
PyPI: https://pypi.python.org/pypi/dtool-lookup-server-annotation-filter-plugin
Free software: MIT License
Introduction
This dtool-lookup-server plugin adds the ability to get an overview of the datasets a user has access to, based on how those datasets have been annotated with key/value pairs.
The purpose of this API is to give users an overview of all the datasets available to them and to allow them to drill down on those results by filtering based upon keys and key/value pairs.
This API could be used to build a webapp that allows users to get an “eagle-eye” view of their data.
Installation
This plugin depends on having installed and configured a dtool-lookup-server. This plugin can then be installed by running the commands below.
git clone https://github.com/jic-dtool/dtool-lookup-server-annotation-filter-plugin.git
cd dtool-lookup-server-annotation-filter-plugin
python setup.py install
See dtool-lookup-server for more information about the setup of the base system.
Routes
This plugin has five routes.
POST /annotation_filter_plugin/annotation_keys
POST /annotation_filter_plugin/annotation_values
POST /annotation_filter_plugin/num_datasets
POST /annotation_filter_plugin/datasets
GET /annotation_filter_plugin/version
The first gives access to all annotation keys that are present on at least one dataset with a basic value. The keys will only be extracted from datasets that pass any annotation filter in the POST request. The response from this route includes the number of datasets associated with each key.
The second gives access to all values for the keys specified in the POST request. The values will only be extracted from the datasets that pass the annotation filter in the POST request. The response from this route includes the number of datasets associated with each key/value pair.
The third gives the number of datasets given a particular annotation filter.
The fourth gives the list of datasets given a particular annotation filter.
The fifth returns the version of the plugin.
Filter syntax
Below are examples of JSON queries that can be posted to the routes.
No filters, i.e. get all (this only really makes sense for the /annotation_filter_plugin/annotation_keys route).
{}
Get only datasets that have the key “color”:
{ "annotation_keys": ["color"] }
Get only datasets where “color” is set to “red”:
{ "annotations": {"color": "red"} }
Get only datasets that have both the keys “color” and “pattern”:
{ "annotation_keys": ["color", "pattern"] }
Get only datasets where “color” is set to “red” and “pattern” is set to “stripey”:
{ "annotations": {"color": "red", "pattern": "stripey"} }
Get only datasets that have the keys “color” and “pattern” and where the “color” is set to “red”:
{ "annotation_keys": ["color", "pattern"], "annotations": {"color": "red"} }
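The queries above can also be assembled programmatically. Below is a minimal Python sketch; the helper name build_filter is hypothetical and not part of the plugin. It only emits the parts of the filter that were supplied, and it accepts the "base_uris" field that appears in the Usage section.

```python
import json


def build_filter(annotation_keys=None, annotations=None, base_uris=None):
    """Return a filter dict containing only the parts that were supplied."""
    payload = {}
    if annotation_keys:
        payload["annotation_keys"] = list(annotation_keys)
    if annotations:
        payload["annotations"] = dict(annotations)
    if base_uris:
        payload["base_uris"] = list(base_uris)
    return payload


# The last example above: require both keys, and require color == "red".
query = build_filter(
    annotation_keys=["color", "pattern"],
    annotations={"color": "red"},
)
body = json.dumps(query)  # JSON text suitable for a POST body
print(body)
```

Calling build_filter() with no arguments yields the empty filter {}.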
Limitations
This plugin only recognises annotations where the value is a basic type, such as a string, a number or a boolean value. In other words, annotations where the value is a data structure, such as a list or a dictionary, will be ignored.
Datasets that do not have any annotation with a basic type as a value will not be picked up by this plugin.
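To make the distinction concrete, here is a small Python sketch of what "basic type" could mean. It is an illustration only and not the plugin's own check: the function names are hypothetical, and how the plugin treats null values is not stated here, so this sketch simply excludes None.

```python
def is_basic_value(value):
    """Return True for strings, numbers and booleans; False otherwise."""
    # bool is a subclass of int in Python, so int already covers it;
    # it is listed explicitly for readability.
    return isinstance(value, (str, int, float, bool))


def basic_annotations(annotations):
    """Keep only the key/value pairs whose value is a basic type."""
    return {k: v for k, v in annotations.items() if is_basic_value(v)}


example = {
    "color": "red",
    "size": 10,
    "frozen": True,
    "tags": ["a", "b"],      # list: ignored
    "origin": {"lab": "x"},  # dictionary: ignored
}
print(basic_annotations(example))
```

Under this model, a dataset for which basic_annotations returns an empty dictionary would not be visible through the plugin.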
Usage
Preparation
The dtool lookup server makes use of the Authorization header to pass through the JSON web token for authorization. Below we create shell variables for the token and the header used in the curl commands:
$ TOKEN=$(flask user token olssont)
$ HEADER="Authorization: Bearer $TOKEN"
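The same header can be set from Python. The sketch below uses only the standard library and builds the request without sending it; the helper name make_post_request and the placeholder token are assumptions for illustration, and the host and port are copied from the curl examples in this document.

```python
import json
import urllib.request


def make_post_request(base_url, route, token, payload):
    """Build (but do not send) an authorized JSON POST request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + route,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = make_post_request(
    "http://localhost:5000",
    "/annotation_filter_plugin/annotation_keys",
    token="PLACEHOLDER_TOKEN",  # stand-in for the output of `flask user token`
    payload={},
)

# Sending it would look like this (requires a running server):
# with urllib.request.urlopen(request) as response:
#     counts = json.load(response)
```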
Find keys available for filtering and the number of datasets associated with them
The command below finds all annotation keys available for further filtering:
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{}' \
    http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below means that the annotation key “color” has 120 datasets associated with it, the annotation key “pattern” has 50 and the annotation key “size” has 10.
{"color": 120, "pattern": 50, "size": 10}
Suppose that one chooses to filter further based on the “pattern” annotation key. Using the command below one could find the annotation keys that are still relevant given that each dataset has to have the annotation key “pattern”.
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"annotation_keys": ["pattern"]}' \
    http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below shows that none of the remaining datasets have the key “size” and that 45 of the 50 datasets with the key “pattern” also have the key “color”.
{"color": 45, "pattern": 50}
It is possible to filter based on an annotation key/value pair. For example, to limit the datasets to the case where the “pattern” is “stripey” one could use the command below.
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"annotations": {"pattern": "stripey"}}' \
    http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below shows that this is more specific and that there are fewer results.
{"color": 5, "pattern": 10}
It is possible to make more complex queries. The command below also requires that the datasets have the key “color”.
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"annotation_keys": ["color"], "annotations": {"pattern": "stripey"}}' \
    http://localhost:5000/annotation_filter_plugin/annotation_keys
In the response below there are now fewer datasets with the “pattern” key. That is because some of the datasets that were picked up previously did not have the “color” key.
{"color": 5, "pattern": 3}
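The drill-down behaviour described above can be mimicked locally to see why the counts shrink as the filter tightens. The Python sketch below is a toy model and not the plugin's implementation; the function names passes and key_counts and the four sample datasets are made up for illustration.

```python
from collections import Counter


def passes(annotations, flt):
    """True if a dataset's annotations satisfy the filter."""
    has_keys = all(k in annotations for k in flt.get("annotation_keys", []))
    has_pairs = all(
        k in annotations and annotations[k] == v
        for k, v in flt.get("annotations", {}).items()
    )
    return has_keys and has_pairs


def key_counts(datasets, flt):
    """Count, per key, the datasets that pass the filter and carry that key."""
    counts = Counter()
    for annotations in datasets:
        if passes(annotations, flt):
            counts.update(annotations.keys())
    return dict(counts)


# Hypothetical annotations for four datasets.
datasets = [
    {"color": "red", "pattern": "stripey"},
    {"pattern": "stripey"},
    {"color": "blue", "pattern": "wavy"},
    {"size": 3},
]

print(key_counts(datasets, {}))
print(key_counts(datasets, {"annotations": {"pattern": "stripey"}}))
print(key_counts(datasets, {"annotation_keys": ["color"],
                            "annotations": {"pattern": "stripey"}}))
```

In this model, adding "color" to the required keys drops the second dataset, which is why the count for “pattern” falls as well.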
It is also possible to filter using base URIs. The command below limits the keys to those from the base URIs “s3://snow-white” and “s3://mr-men”:
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"base_uris": ["s3://snow-white", "s3://mr-men"]}' \
    http://localhost:5000/annotation_filter_plugin/annotation_keys
The response below shows that there are fewer hits than when all base URIs are included.
{"color": 77, "pattern": 35, "size": 4}
Find annotations available for filtering and the number of datasets associated with them
The pattern for finding annotation key/value pairs and the number of datasets associated with them is similar to that of finding the keys (above).
The command below can be used to find all the values associated with the “color” key and the number of datasets that have been annotated with each particular value.
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"annotation_keys": ["color"]}' \
    http://localhost:5000/annotation_filter_plugin/annotation_values
The response below shows that there are five colors available and that “red” is the most common one.
{ "color": { "red": 50, "pink": 30, "blue": 20, "green": 15, "yellow": 5 } }
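A response of this shape is straightforward to post-process. The Python sketch below parses the example response shown above and ranks the values of each key by dataset count; the variable names are illustrative only.

```python
import json

# The example response from above, as it would arrive in the HTTP body.
response_body = """
{"color": {"red": 50, "pink": 30, "blue": 20, "green": 15, "yellow": 5}}
"""

values = json.loads(response_body)

for key, counts in values.items():
    total = sum(counts.values())
    # Sort the (value, count) pairs from most to least common.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for value, n in ranked:
        print(f"{key}={value}: {n} datasets ({n / total:.0%})")
```

The counts for “color” sum to 120, which matches the number reported for the “color” key by the annotation_keys route earlier.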
To get data for more keys they need to be included in the filter. The command below returns the values from the datasets that have annotations for both “color” and “pattern”.
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"annotation_keys": ["color", "pattern"]}' \
    http://localhost:5000/annotation_filter_plugin/annotation_values
The response contains fewer colors because some of the datasets annotated with color did not have a pattern annotation.
{
  "color": {
    "red": 15,
    "pink": 10,
    "blue": 10,
    "green": 10
  },
  "pattern": {
    "stripey": 40,
    "wavy": 10
  }
}
It is possible to make more specific queries. The command below also requires that the datasets have the stripey pattern.
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"annotation_keys": ["color"], "annotations": {"pattern": "stripey"}}' \
    http://localhost:5000/annotation_filter_plugin/annotation_values
The response below shows that fewer datasets have been used to collect the annotation information.
{
  "color": {
    "red": 15,
    "pink": 10,
    "blue": 10,
    "green": 5
  },
  "pattern": {
    "stripey": 40
  }
}
It is also possible to filter using base URIs. The command below limits the values to those from the base URIs “s3://snow-white” and “s3://mr-men”:
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"annotation_keys": ["color"], "base_uris": ["s3://snow-white", "s3://mr-men"]}' \
    http://localhost:5000/annotation_filter_plugin/annotation_values
The response below shows that there are fewer hits than when all base URIs are included.
{
  "color": {
    "red": 50,
    "pink": 20,
    "blue": 7
  }
}
Listing the number of datasets available for a particular filter
The number of datasets selected by a particular filter can be determined using the /annotation_filter_plugin/num_datasets route. The command below counts all datasets that have at least one annotation with a basic value (see the Limitations section above for an explanation of what a basic value is).
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{}' \
    http://localhost:5000/annotation_filter_plugin/num_datasets
The response below shows that there are 145 such datasets.
145
The command below uses a filter to select only datasets that have the key/value pair “pattern”/“stripey”.
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"annotations": {"pattern": "stripey"}}' \
    http://localhost:5000/annotation_filter_plugin/num_datasets
The response shows that there are 10 such datasets.
10
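Note that the body returned by this route is a bare JSON number rather than an object. The short Python sketch below parses the two example bodies shown above and derives the share of datasets matched by the filter; the variable names are illustrative only.

```python
import json

total = json.loads("145")   # body returned for the empty filter {}
stripey = json.loads("10")  # body returned for {"annotations": {"pattern": "stripey"}}

# A bare JSON number decodes to a Python int, so it can be used directly.
share = stripey / total
print(f"{stripey} of {total} datasets ({share:.1%}) match the filter")
```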
Retrieving information about datasets selected by a particular filter
It is possible to get information about the datasets selected by a particular filter using the /annotation_filter_plugin/datasets route. The command below uses a filter to select only datasets that have the key/value pair “pattern”/“stripey”.
$ curl -H "$HEADER" -H "Content-Type: application/json" \
    -X POST -d '{"annotations": {"pattern": "stripey"}}' \
    http://localhost:5000/annotation_filter_plugin/datasets
Below is a truncated version of the response.
[
  {
    "annotations": {
      "pattern": "stripey"
    },
    "base_uri": "s3://dtool-demo",
    "created_at": "1530803916.74",
    "creator_username": "olssont",
    "dtoolcore_version": "3.3.0",
    "frozen_at": "1536749825.85",
    "name": "hypocotyl3",
    "type": "dataset",
    "uri": "s3://dtool-demo/ba92a5fa-d3b4-4f10-bcb9-947f62e652db",
    "uuid": "ba92a5fa-d3b4-4f10-bcb9-947f62e652db"
  }
  ...
]
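The records in this list can be summarised with a few lines of Python. The sketch below uses the single record shown above and assumes that "frozen_at" holds seconds since the Unix epoch in UTC, which is an assumption and not something stated in this document; the function name summarise is hypothetical.

```python
import json
from datetime import datetime, timezone

# One record, copied from the truncated response above.
response_body = """
[
  {
    "annotations": {"pattern": "stripey"},
    "base_uri": "s3://dtool-demo",
    "created_at": "1530803916.74",
    "creator_username": "olssont",
    "dtoolcore_version": "3.3.0",
    "frozen_at": "1536749825.85",
    "name": "hypocotyl3",
    "type": "dataset",
    "uri": "s3://dtool-demo/ba92a5fa-d3b4-4f10-bcb9-947f62e652db",
    "uuid": "ba92a5fa-d3b4-4f10-bcb9-947f62e652db"
  }
]
"""


def summarise(record):
    """Return (name, uri, frozen_at as a timezone-aware UTC datetime)."""
    # Assumption: the timestamp string is seconds since the Unix epoch.
    frozen = datetime.fromtimestamp(float(record["frozen_at"]), tz=timezone.utc)
    return record["name"], record["uri"], frozen


for name, uri, frozen in map(summarise, json.loads(response_body)):
    print(f"{name}\t{frozen:%Y-%m-%d}\t{uri}")
```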
Hashes for dtool-lookup-server-annotation-filter-plugin-0.2.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | bea6eef9709485662ba5acf2a5328d038d1fc65d64cef5e65997e010cd2a6c61
MD5 | 5f57cdf75436e3135be167d0ae16758b
BLAKE2b-256 | a052327e72eba0539b029b8ef69329ebefa2b7f8879ee12bf321e81f57a2525f