This plugin allows submitting MongoDB queries and aggregation pipelines to the dtool lookup server

Features

  • Query datasets using the MongoDB query language

  • Funnel datasets through aggregation pipelines

Introduction

dtool is a command line tool for packaging data and metadata into a dataset. A dtool dataset manages data and metadata without the need for a central database.

However, when managing more than a hundred datasets, it helps to store the datasets’ metadata on a central server so that datasets of interest can be found quickly.

The dtool-lookup-server provides a web API for registering datasets’ metadata, along with functionality to look up, list, and search for datasets.

This plugin allows submitting plain MongoDB queries and aggregation pipelines directly to the lookup server.

Configuration

Inform this plugin about the Mongo database to use by setting the environment variables:

export MONGO_URI="mongodb://localhost:27017/"
export MONGO_DB="dtool_lookup_server"
export MONGO_COLLECTION="metadata"

If the Mongo search and retrieve plugins are used, then you may use the same database, but must use a different collection.
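Before starting the server, it can help to confirm that all three variables are actually set in the current shell. The following is a minimal sketch; `check_mongo_env` is a hypothetical helper, not part of the plugin, and whether the plugin falls back to defaults for unset variables is not stated here.

```shell
# check_mongo_env: hypothetical helper (not shipped with the plugin).
# Prints the name of each variable that is unset or empty and
# returns non-zero if any of them is missing.
check_mongo_env() {
    missing=0
    for name in MONGO_URI MONGO_DB MONGO_COLLECTION; do
        # Indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_mongo_env || exit 1` at the top of a start-up script would stop early instead of launching the server with an incomplete configuration.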

Use

export ALLOW_DIRECT_QUERY=true
export ALLOW_DIRECT_AGGREGATION=false

to enable or disable direct mongo query and aggregation on this plugin.

ATTENTION: Direct queries respect per-user access rights to database entries at the lookup server level, but aggregation pipelines are, by design, not guaranteed to do so. Do not enable direct aggregation in a production environment.

Authentication

The dtool lookup server uses the Authorization header to pass the JSON web token for authorization. Below, we create shell variables for the token and the header used in the curl examples that follow:

$ TOKEN=$(flask user token test-user)
$ HEADER="Authorization: Bearer $TOKEN"

Refer to the core documentation of dtool-lookup-server for more information.
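If the token command fails, `$TOKEN` ends up empty and every request is sent with a useless header. A small sanity check can catch this before any curl call. The sketch below only tests the shape of the token (three non-empty, dot-separated parts, as in a signed JSON web token in compact form); it does not verify the signature or expiry. `is_jwt_shaped` is a hypothetical helper name.

```shell
# is_jwt_shaped: hypothetical helper. Succeeds if $1 consists of exactly
# three non-empty parts separated by dots (header.payload.signature).
# This is a shape check only; it does not validate the token.
is_jwt_shaped() {
    case "$1" in
        *.*.*.*)  return 1 ;;  # three or more dots: too many parts
        ?*.?*.?*) return 0 ;;  # exactly two dots, all parts non-empty
        *)        return 1 ;;  # empty string or too few parts
    esac
}

# Usage, with the token command from above:
#   TOKEN=$(flask user token test-user)
#   if is_jwt_shaped "$TOKEN"; then
#       HEADER="Authorization: Bearer $TOKEN"
#   else
#       echo "could not obtain a token" >&2
#   fi
```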

Direct query

To look for a specific field key2: 42 in a dataset’s README.yml (provided the file is properly YAML-formatted), use

$ curl -H "$HEADER" -H "Content-Type: application/json" -X POST \
    -d '{"query": {"readme.key2": 42}}' http://localhost:5000/mongo/query

Response content:

[
  {
    "base_uri": "s3://test-bucket",
    "created_at": 1683797360.056,
    "creator_username": "jotelha",
    "dtoolcore_version": "3.18.2",
    "frozen_at": 1683797362.855,
    "name": "test_dataset_2",
    "number_of_items": 1,
    "size_in_bytes": 19347,
    "tags": [],
    "type": "dataset",
    "uri": "s3://test-bucket/26785c2a-e8f8-46bf-82a1-cec92dbdf28f",
    "uuid": "26785c2a-e8f8-46bf-82a1-cec92dbdf28f"
  }
]
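The example above matches on equality only. Assuming the endpoint hands the query document to MongoDB unchanged, standard query operators such as `$gte` and `$lt` should work as well; this is an assumption, as the text above only demonstrates an equality match. The sketch below keeps the payload in a variable and wraps the request in a hypothetical helper, `mongo_query`, reusing the URL and `$HEADER` from the example above.

```shell
# Range query on the same README field: 40 <= key2 < 50.
# Single quotes keep the shell from expanding "$gte" and "$lt".
QUERY='{"query": {"readme.key2": {"$gte": 40, "$lt": 50}}}'

# mongo_query: hypothetical helper that posts a payload to the
# direct query route. Expects $HEADER to be set as shown under
# Authentication.
mongo_query() {
    curl -sS -H "$HEADER" -H "Content-Type: application/json" -X POST \
        -d "$1" http://localhost:5000/mongo/query
}

# Usage (requires a running lookup server with ALLOW_DIRECT_QUERY=true):
#   mongo_query "$QUERY"
```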

Direct aggregation

The following example of an aggregation pipeline identifies and counts instances of the same dataset at different base URIs:

$ curl -H "$HEADER" -H "Content-Type: application/json" -X POST \
    -d '{"aggregation": [
            {
                "$sort": {"base_uri": 1}
            }, {
                "$group":  {
                    "_id": "$name",
                    "count": {"$sum": 1},
                    "available_at": {"$push": "$base_uri"}
                }
            }, {
                "$project": {
                    "name": "$_id",
                    "count": true,
                    "available_at": true,
                    "_id": false
                }
            }, {
                "$sort": {"name": 1}
            }
        ]
    }' http://localhost:5000/mongo/aggregate

Response content:

[
  {
    "available_at": [
      "s3://test-bucket"
    ],
    "count": 1,
    "name": "test_dataset_1"
  },
  {
    "available_at": [
      "s3://test-bucket",
      "smb://test-share"
    ],
    "count": 2,
    "name": "test_dataset_2"
  }
]
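Longer pipelines are easier to maintain in a file than inline on the command line. As a further sketch, the pipeline below counts datasets per creator. It assumes that `creator_username` is stored on each document, as the query response above suggests, and `mongo_aggregate` is a hypothetical helper name. curl's `-d @file` form reads the request body from the given file.

```shell
# Write the pipeline to a temporary file. The quoted 'EOF' delimiter
# keeps "$group", "$sum" and "$creator_username" literal.
PIPELINE_FILE=$(mktemp)
cat > "$PIPELINE_FILE" <<'EOF'
{"aggregation": [
    {"$group": {"_id": "$creator_username", "datasets": {"$sum": 1}}},
    {"$sort": {"datasets": -1}}
]}
EOF

# mongo_aggregate: hypothetical helper that posts the file given as $1
# to the direct aggregation route. Expects $HEADER to be set as shown
# under Authentication.
mongo_aggregate() {
    curl -sS -H "$HEADER" -H "Content-Type: application/json" -X POST \
        -d @"$1" http://localhost:5000/mongo/aggregate
}

# Usage (requires ALLOW_DIRECT_AGGREGATION=true on the server):
#   mongo_aggregate "$PIPELINE_FILE"
#   rm -f "$PIPELINE_FILE"
```

As noted in the configuration section, aggregation pipelines are not guaranteed to respect per-user access rights, so a pipeline like this one belongs on a test instance only.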

Testing

Running the unit tests with pytest requires a working lookup server installation and access to the services it depends on, such as databases. Please refer to the core dtool-lookup-server documentation for setup instructions.

File details

Details for the file dtool-lookup-server-direct-mongo-plugin-0.2.0.tar.gz.

File hashes

Hashes for dtool-lookup-server-direct-mongo-plugin-0.2.0.tar.gz
Algorithm Hash digest
SHA256 827e6f358d4f5322f298ecee488c45ab12c745be00fd2bc6cd072aca20b25c75
MD5 336d42ca2506054a38d56d89a8e7ed52
BLAKE2b-256 132c197fdd5865596fca67ea3cfb12d5ef8dd87ea00dd57b187d4c748e550e16

File details

Details for the file dtool_lookup_server_direct_mongo_plugin-0.2.0-py3-none-any.whl.

File hashes

Hashes for dtool_lookup_server_direct_mongo_plugin-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 46cea176dd0f189029dbafbb12bba6f1b3404b07363b7449713eb11f91c49bdf
MD5 321fcc0b7766c86fdefece191da38a09
BLAKE2b-256 9e7982708f342cfc383cefc4a2529a40148446c6b8f15274e763b5ac571ac60b
