
DRB Metadata Extractor

This application layer, built on DRB, extracts metadata from data according to its topic.

Metadata

How to extract metadata?

from drb.factory import DrbFactoryResolver
from drb_metadata import DrbMetadataResolver


if __name__ == '__main__':
    node = DrbFactoryResolver().create('<my_resource_url>')
    metadata = DrbMetadataResolver().get_metadata(node)
    for md_name, md in metadata.items():
        print(md_name, ' -- ', md.extract(node))

How to define metadata?

Metadata are defined in a cortex.yaml file following the template:

drbItemClass: <topic_uuid>           # target topic
variables:                           # variable list
  - name: <var_name>                   # variable name
    <extractor>: <extractor_content>   # an extractor
metadata:                            # metadata list
  - name: my_metadata                  # metadata name
    <extractor>: <extractor_content>   # an extractor
  • metadata apply to their target topic and to its derivatives
  • inherited metadata is overridden if it is redefined in a derivative topic
  • variables are not transitive between a topic and its derivatives
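
The three rules above can be illustrated with a small model. This is only a sketch of the semantics, not how drb_metadata is implemented; the Topic class and the resolve_metadata and visible_variables helpers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Topic:
    """Hypothetical stand-in for a DRB topic and its cortex.yaml content."""
    name: str
    parent: Optional["Topic"] = None
    metadata: Dict[str, str] = field(default_factory=dict)
    variables: Dict[str, str] = field(default_factory=dict)


def resolve_metadata(topic: Topic) -> Dict[str, str]:
    """Collect metadata from the root topic down, so derivatives override."""
    inherited = resolve_metadata(topic.parent) if topic.parent else {}
    return {**inherited, **topic.metadata}


def visible_variables(topic: Topic) -> Dict[str, str]:
    """Variables are not transitive: only the topic's own ones are visible."""
    return dict(topic.variables)


base = Topic(
    "base",
    metadata={"platformName": "generic", "mission": "demo"},
    variables={"node_platform": "<xquery>"},
)
derived = Topic("derived", parent=base, metadata={"platformName": "Sentinel-1"})

print(resolve_metadata(derived))   # platformName is redefined, mission is inherited
print(visible_variables(derived))  # the variable of the parent topic is not visible
```

In this model the derivative topic keeps the inherited mission entry, replaces platformName, and sees none of the variables of its parent.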

Extractor

As its name suggests, an extractor extracts information or data from a node. An extractor is defined by YAML content. Three extractor types currently exist:

Constant

This extractor reads nothing from the node and always gives the same value.

constant: 42

Some string values are automatically converted to a specific Python type:

Value                       Python type
2022-01-01                  datetime.date
2022-01-01T00:00:00.000Z    datetime.datetime
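
To make the conversion concrete, here is a minimal sketch of how such strings could be mapped to Python types. It is not the library's actual code: the convert_constant helper, the accepted formats (taken from the two examples in the table) and the choice to read the trailing Z as UTC are all assumptions.

```python
import datetime
from typing import Any


def convert_constant(value: Any) -> Any:
    """Hypothetical helper: turn ISO-like strings into date/datetime objects."""
    if not isinstance(value, str):
        return value  # e.g. the integer 42 is returned unchanged
    try:
        # Date-only form, e.g. 2022-01-01
        return datetime.datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        pass
    try:
        # Timestamp form, e.g. 2022-01-01T00:00:00.000Z
        parsed = datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
        # Assumption: the trailing 'Z' is interpreted as UTC.
        return parsed.replace(tzinfo=datetime.timezone.utc)
    except ValueError:
        return value  # any other string stays a plain string


for raw in (42, "Sentinel-1", "2022-01-01", "2022-01-01T00:00:00.000Z"):
    print(repr(raw), "->", repr(convert_constant(raw)))
```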

XQuery

This extractor extracts data from the node via an XQuery script. See more details about XQuery.

xquery: |
  data(./manifest.safe/XFDU/metadataSection/
  metadataObject[@ID="generalProductInformation"]/metadataWrap/xmlData/
    *[matches(name(),"standAloneProductInformation|generalProductInformation")]/
    noiseCompressionType)

Python

The Python extractor extracts data from a node via a Python script. The node variable refers to the current node, and the script returns the extracted value.

python: |
  return (node['DATASTRIP'][0]['MTD_DS.xml']['Level-1C_DataStrip_ID']
          ['General_Info']['Datatake_Info'].get_attribute('datatakeIdentifier'))
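
Because the script uses return, it behaves like the body of a function that receives node. The sketch below shows one way such a script could be wrapped and executed. It is an illustration under that assumption, not the drb_metadata implementation; run_python_extractor and FakeNode are hypothetical names.

```python
import textwrap


def run_python_extractor(script, node, **variables):
    """Hypothetical helper: execute an extractor script as a function body."""
    params = ", ".join(["node", *variables])
    body = textwrap.indent(textwrap.dedent(script), "    ")
    namespace = {}
    # exec runs arbitrary code, so only use it on trusted cortex files.
    exec(f"def _extract({params}):\n{body}", namespace)
    return namespace["_extract"](node, **variables)


class FakeNode:
    """Minimal stand-in for a DRB node; a real node offers much more."""

    def __init__(self, name):
        self.name = name


script = """
mission = node.name[0:3]
return mission
"""
print(run_python_extractor(script, FakeNode("S1A_IW_GRDH_1SDV")))
```

Extra keyword arguments become additional names visible to the script, which is one possible way to expose declared variables next to node.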

Example:

drbItemClass: aff2191f-5b06-4121-a9fa-f3d93f6c6331
variables:
  - name: node_platform
    xquery: |
      ./manifest.safe/XFDU/metadataSection/metadataObject[@ID="platform"]/
        metadataWrap/xmlData/platform
metadata:
  - name: 'platformName'
    constant: 'Sentinel-1'
  - name: 'SatelliteNumber'
    xquery: |
      declare variable $node_platform external;
      data($node_platform/number)
  - name: 'platformIdentifier'
    python: |
      return node_platform['nssdcIdentifier'].value
  - name: 'resolutionDetail'
    python: |
      resolution = node.name[10:11]
      if resolution == 'F':
        return 'Full'
      elif resolution == 'H':
        return 'High'
      elif resolution == 'M':
        return 'Medium'
      return None
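
The resolutionDetail script relies on the Sentinel-1 product naming convention, in which the character at index 10 (right after the three-letter product type) is the resolution class. The check below restates the same logic as a lookup table; the resolution_detail function and the shortened product names are illustrative assumptions, not part of the package.

```python
from typing import Optional

# Resolution classes handled by the example script.
RESOLUTIONS = {"F": "Full", "H": "High", "M": "Medium"}


def resolution_detail(product_name: str) -> Optional[str]:
    """Same logic as the resolutionDetail script, written as a lookup."""
    # Slicing [10:11] yields '' instead of raising when the name is too short.
    return RESOLUTIONS.get(product_name[10:11])


# Shortened, made-up names following the pattern MMM_BB_TTTR_...
for name in ("S1A_IW_GRDH_1SDV", "S1A_EW_GRDM_1SDH", "S1A_IW_SLC__1SDV", "S1A"):
    print(name, "->", resolution_detail(name))
```

Names whose resolution class is not F, H or M, or that are too short, give None, matching the final return None of the script.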

Packaging

The Python package containing the metadata of a DRB topic must declare the following:

  • a drb.metadata entry point whose value is the Python package containing the cortex.yaml file
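
As a quick way to verify the packaging, the installed entry points can be listed with the standard library. This sketch assumes that drb.metadata is the entry point group name and that each entry point value is the package holding cortex.yaml; find_metadata_packages and the names in the comment are hypothetical.

```python
from importlib.metadata import entry_points
from typing import List, Tuple

# Assumed shape of the declaration in a setup.py (names are placeholders):
#   entry_points={'drb.metadata': ['my_topic = my_package.metadata']}


def find_metadata_packages(group: str = "drb.metadata") -> List[Tuple[str, str]]:
    """List (entry point name, package) pairs declared under the group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later expose select()
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group
        selected = eps.get(group, [])
    return [(ep.name, ep.value) for ep in selected]


for name, package in find_metadata_packages():
    print(name, "->", package)
```

An empty result means no installed distribution declares an entry point in that group.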

