This plugin allows submitting MongoDB queries and aggregation pipelines directly to an underlying MongoDB database.

Features

  • Query datasets via the MongoDB query language

  • Funnel datasets through aggregation pipelines

Introduction

dtool is a command line tool for packaging data and metadata into a dataset. A dtool dataset manages data and metadata without the need for a central database.

However, when managing more than a hundred datasets, it can be helpful to store the datasets’ metadata on a central server so that datasets of interest can be found quickly.

dservercore provides a web API for registering datasets’ metadata, as well as functionality to look up, list, and search for datasets.

This plugin allows submitting plain MongoDB queries and aggregation pipelines directly to the lookup server.

Configuration

Tell this plugin which MongoDB database to use by setting the following environment variables:

export DSERVER_MONGO_URI="mongodb://localhost:27017/"
export DSERVER_MONGO_DB="dserver"
export DSERVER_MONGO_COLLECTION="metadata"

If the Mongo search and retrieve plugins are used as well, you may use the same database, but you must use a different collection.

Use

export DSERVER_ALLOW_DIRECT_QUERY=true
export DSERVER_ALLOW_DIRECT_AGGREGATION=false

to enable (true) or disable (false) direct MongoDB queries and direct aggregation, respectively, for this plugin.

ATTENTION: Direct queries respect per-user access rights to database entries at the lookup-server level, but by design there is no guarantee that aggregation pipelines do so. Do not enable direct aggregation in a production environment.
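The following Python sketch shows one way a deployment or test script might read these settings. The variable names come from this README; the helper names, the fallback values (the example values above), and the rule that only true, 1, yes, or on count as enabled are assumptions, so the plugin itself may parse the variables differently.

```python
import os

# Hypothetical helper names. The variable names match this README; the fallback
# values (the example values above) and the set of strings treated as "enabled"
# are assumptions and may differ from how the plugin itself parses them.
ENABLED_VALUES = {"true", "1", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a variable such as DSERVER_ALLOW_DIRECT_QUERY as a boolean."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ENABLED_VALUES


def read_direct_mongo_settings() -> dict:
    """Collect the settings described in the Configuration section."""
    return {
        "mongo_uri": os.environ.get("DSERVER_MONGO_URI", "mongodb://localhost:27017/"),
        "mongo_db": os.environ.get("DSERVER_MONGO_DB", "dserver"),
        "mongo_collection": os.environ.get("DSERVER_MONGO_COLLECTION", "metadata"),
        "allow_direct_query": env_flag("DSERVER_ALLOW_DIRECT_QUERY"),
        "allow_direct_aggregation": env_flag("DSERVER_ALLOW_DIRECT_AGGREGATION"),
    }


if __name__ == "__main__":
    settings = read_direct_mongo_settings()
    if settings["allow_direct_aggregation"]:
        # Mirrors the warning above: aggregation pipelines may bypass access rights.
        print("WARNING: direct aggregation is enabled; do not use this in production.")
    print(settings)
```

Treating a missing flag as disabled is a deliberately conservative choice for this sketch, in line with the warning about aggregation.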

Authentication

The dtool lookup server uses the Authorization header to pass the JSON Web Token (JWT) used for authorization. The commands below store the token and the header in shell variables for use in the following curl examples:

$ TOKEN=$(flask user token test-user)
$ HEADER="Authorization: Bearer $TOKEN"

Refer to the dservercore documentation for more information.

Direct query

To search for datasets whose README.yml contains the field key2: 42 (provided the file is valid YAML), use

$ curl -H "$HEADER" -H "Content-Type: application/json" -X POST \
    -d '{"query": {"readme.key2": 42}}' http://localhost:5000/mongo/query

Response content:

[
  {
    "base_uri": "s3://test-bucket",
    "created_at": 1683797360.056,
    "creator_username": "jotelha",
    "dtoolcore_version": "3.18.2",
    "frozen_at": 1683797362.855,
    "name": "test_dataset_2",
    "number_of_items": 1,
    "size_in_bytes": 19347,
    "tags": [],
    "type": "dataset",
    "uri": "s3://test-bucket/26785c2a-e8f8-46bf-82a1-cec92dbdf28f",
    "uuid": "26785c2a-e8f8-46bf-82a1-cec92dbdf28f"
  }
]
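The same request can be issued from Python with only the standard library (urllib.request and json). This is a sketch, not a client shipped with the plugin: the helper names are hypothetical, and the server URL and endpoint are taken from the curl example. The request is built separately from sending it so it can be inspected first.

```python
import json
import urllib.request

# Server address taken from the curl examples; adjust it for your deployment.
SERVER_URL = "http://localhost:5000"


def build_query_request(token: str, query: dict, server_url: str = SERVER_URL) -> urllib.request.Request:
    """Prepare the same POST request as the curl example without sending it."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{server_url}/mongo/query",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # same header as $HEADER above
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> list:
    """Submit the request and decode the JSON list of matching datasets."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a token from flask user token test-user, send(build_query_request(token, {"readme.key2": 42})) should return the same list as the curl call. Building the payload with json.dumps also avoids the shell quoting that, in the curl examples, keeps strings such as $regex from being expanded by the shell.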

Besides the content of README.yml, the other fields of the database-internal dataset representation returned in the example above can be queried directly as well. All queries are formulated in the MongoDB query language. The MongoDB documentation explains how to formulate queries; its list of available query operators is particularly useful. The following JSON query documents illustrate a few other possibilities.

{"base_uri": {"$regex": "^s3"}} finds all datasets whose base URI matches the provided regular expression, here any string starting with s3.

{"readme.owners.name": {"$regex": "Testing User"}} matches any dataset whose README lists an owner whose name contains the substring Testing User, such as

owners:
- name: A user who does not match the search pattern
  username: test_user
- name: Another Testing User matches the search pattern
  username: another_test_user

The query

{
  "creator_username": "jotelha",
  "readme.parameters.temperature": 298
}

will match all datasets created by user jotelha and annotated with:

parameters:
  temperature: 298

in their README.yml.
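To reason about what such query documents select without a running server, the following pure-Python sketch emulates a small subset of MongoDB's matching rules: dotted paths, implicit traversal of arrays (which is why readme.owners.name reaches every owner's name), $regex as an unanchored search, and the implicit AND between top-level fields. It is an illustration only, not how the plugin or MongoDB evaluates queries. The function names are hypothetical, Python's re module stands in for MongoDB's regular expression engine, numeric array indices in paths are not handled, and every other operator (including $options) is rejected.

```python
import re
from typing import Any, Dict, Iterator, List

# Illustration only: a tiny subset of MongoDB's matching rules, evaluated in Python.
# Function names are hypothetical; Python's re module stands in for MongoDB's regex
# engine, and operators other than $regex are deliberately rejected.


def resolve(doc: Any, path: List[str]) -> Iterator[Any]:
    """Yield every value reachable via a dotted path, descending into arrays."""
    if not path:
        yield doc
    elif isinstance(doc, dict):
        if path[0] in doc:
            yield from resolve(doc[path[0]], path[1:])
    elif isinstance(doc, list):
        # Like MongoDB, apply the remaining path to each array element.
        for item in doc:
            yield from resolve(item, path)


def matches_condition(value: Any, condition: Any) -> bool:
    """Compare one resolved value against a literal or a {"$regex": ...} condition."""
    if isinstance(condition, dict) and any(key.startswith("$") for key in condition):
        unsupported = set(condition) - {"$regex"}
        if unsupported:
            raise NotImplementedError(f"not covered by this sketch: {sorted(unsupported)}")
        pattern = re.compile(condition["$regex"])
        candidates = value if isinstance(value, list) else [value]
        # $regex is an unanchored search, so "Testing User" matches anywhere in the string.
        return any(isinstance(c, str) and pattern.search(c) is not None for c in candidates)
    if isinstance(value, list) and not isinstance(condition, list):
        return condition in value  # an array matches if any element equals the literal
    return value == condition


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Top-level fields of a query document are combined with an implicit AND."""
    return all(
        any(matches_condition(value, condition) for value in resolve(doc, field.split(".")))
        for field, condition in query.items()
    )
```

Applied to a record whose README contains the owners list shown above, matches(record, {"readme.owners.name": {"$regex": "Testing User"}}) returns True because the second owner's name contains the substring.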

Direct aggregation

The following example of an aggregation pipeline identifies and counts instances of the same dataset at different base URIs:

$ curl -H "$HEADER" -H "Content-Type: application/json" -X POST \
    -d '{"aggregation": [
            {
                "$sort": {"base_uri": 1}
            }, {
                "$group":  {
                    "_id": "$name",
                    "count": {"$sum": 1},
                    "available_at": {"$push": "$base_uri"}
                }
            }, {
                "$project": {
                    "name": "$_id",
                    "count": true,
                    "available_at": true,
                    "_id": false
                }
            }, {
                "$sort": {"name": 1}
            }
        ]
    }' http://localhost:5000/mongo/aggregate

Response content:

[
  {
    "available_at": [
      "s3://test-bucket"
    ],
    "count": 1,
    "name": "test_dataset_1"
  },
  {
    "available_at": [
      "s3://test-bucket",
      "smb://test-share"
    ],
    "count": 2,
    "name": "test_dataset_2"
  }
]
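To make the four stages easier to follow, here is a pure-Python sketch that replays them on dataset records held in memory. It is illustrative only: the function name count_copies is hypothetical, MongoDB executes the real pipeline on the server, and each record is assumed to carry the name and base_uri fields seen in the query response earlier.

```python
from typing import Any, Dict, List


def count_copies(datasets: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Replay the four pipeline stages above on in-memory dataset records."""
    # Stage 1, {"$sort": {"base_uri": 1}}: order records by base URI so that the
    # $push below collects base URIs in ascending order.
    ordered = sorted(datasets, key=lambda record: record["base_uri"])

    # Stage 2, $group by name: count the copies and push each base URI.
    groups: Dict[str, Dict[str, Any]] = {}
    for record in ordered:
        group = groups.setdefault(
            record["name"], {"_id": record["name"], "count": 0, "available_at": []}
        )
        group["count"] += 1  # {"$sum": 1}
        group["available_at"].append(record["base_uri"])  # {"$push": "$base_uri"}

    # Stage 3, $project: expose _id as "name" and drop _id itself.
    projected = [
        {"name": g["_id"], "count": g["count"], "available_at": g["available_at"]}
        for g in groups.values()
    ]

    # Stage 4, {"$sort": {"name": 1}}: MongoDB's $group output has no defined order.
    return sorted(projected, key=lambda g: g["name"])
```

Fed with the three records implied by the response above (test_dataset_1 at s3://test-bucket, test_dataset_2 at s3://test-bucket and at smb://test-share), count_copies returns the same two entries. The final sort matters because MongoDB does not guarantee the order of $group output.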

Testing

Running the unit tests with pytest requires a working lookup server installation and the availability of required services such as databases. Please refer to the dservercore documentation for setup instructions.

Download files

Source Distribution

dserver_direct_mongo_plugin-0.4.2.tar.gz (23.8 kB)

Built Distribution

dserver_direct_mongo_plugin-0.4.2-py3-none-any.whl (12.5 kB)

File details

Details for the file dserver_direct_mongo_plugin-0.4.2.tar.gz.

File hashes

Hashes for dserver_direct_mongo_plugin-0.4.2.tar.gz
Algorithm Hash digest
SHA256 0ffd5e37ac1a895a5746ebae23b9f404e401f8ac37e1f062651f83d9a0b683d1
MD5 8d39f318c955bd8e673c5795074ca375
BLAKE2b-256 0113e5f4571cf84159aa921318a95841e6698732493c9f7159f270336e333de4

Provenance

The following attestation bundles were made for dserver_direct_mongo_plugin-0.4.2.tar.gz:

Publisher: publish.yml on livMatS/dserver-direct-mongo-plugin

File details

Details for the file dserver_direct_mongo_plugin-0.4.2-py3-none-any.whl.

File hashes

Hashes for dserver_direct_mongo_plugin-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e010eb3c3c43ab182dd20eea93664d96a3b0f647dc176dc9832b63bdeafb5c27
MD5 70bd2649c04eb130f48c967f6c490a11
BLAKE2b-256 809c77bcba9cdd0b2411cef35bf8088dc567b698795e0c862f838ab7811abd37

Provenance

The following attestation bundles were made for dserver_direct_mongo_plugin-0.4.2-py3-none-any.whl:

Publisher: publish.yml on livMatS/dserver-direct-mongo-plugin
