ManGO metadata conversion

Convert iRODS Metadata into a Python dictionary

The md2dict module of mango_mdconverter creates Python dictionaries from namespaced iRODS metadata items by unflattening their dotted names into nested keys. This can be done:

  • naively with regard to the semantics, simply unnesting the namespacing, and either
    • ignoring units, or
    • returning value-unit tuples if units exist
  • by reorganizing the dictionary to bring ManGO schemas together and “analysis” metadata together

The module can be imported like so:

from mango_mdconverter import md2dict

Example

To see how this works, let’s look at some examples. We’ll simulate the metadata of an iRODS item as follows:

from irods.meta import iRODSMeta

metadata_items = [
    iRODSMeta("mgs.book.author.name", "Fulano De Tal", "1"),
    iRODSMeta("mgs.book.author.age", "50", "1"),
    iRODSMeta("mgs.book.author.pet", "cat", "1"),
    iRODSMeta("mgs.book.author.name", "Jane Doe", "2"),
    iRODSMeta("mgs.book.author.age", "29", "2"),
    iRODSMeta("mgs.book.author.pet", "cat", "2"),
    iRODSMeta("mgs.book.author.pet", "parrot", "2"),
    iRODSMeta("mgs.book.title", "A random book title"),
    iRODSMeta("mg.mime_type", "text/plain"),
    iRODSMeta("page_n", "567", "analysis/reading"),
    iRODSMeta("chapter_n", "15", "analysis/reading"),
]

Naive conversion

The unflatten_namespace_into_dict() function updates a dictionary with the name-value pairs of an AVU, and optionally with the units as well. Given a dictionary metadict, we can provide it an AVU name and value to either add the respective keys and values to the dictionary or, if the key already exists, to append the value to the list of values.

metadict = {}
md2dict.unflatten_namespace_into_dict(metadict, "AVU_name", "AVU_value")
metadict
{'AVU_name': 'AVU_value'}

Metadata names with dots will be assumed to be namespaced: they will be split and their values will become dictionaries themselves.

metadict = {}
md2dict.unflatten_namespace_into_dict(metadict, "level1.level2.level3", "AVU_value")
metadict
{'level1': {'level2': {'level3': 'AVU_value'}}}
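
To make the mechanism concrete, here is a minimal pure-Python sketch of how dotted names could be unnested and how repeated keys could be collected into lists. It is not the library's implementation: unflatten_sketch is a hypothetical name, and the handling of repeated keys is inferred from the outputs shown in this document.

```python
def unflatten_sketch(metadict, name, value):
    """Hypothetical stand-in for illustration; not the library's code."""
    *namespaces, leaf = name.split(".")
    node = metadict
    for part in namespaces:
        # Descend into (or create) one nested dictionary per namespace level.
        node = node.setdefault(part, {})
    if leaf not in node:
        # First value for this key: store it as a scalar.
        node[leaf] = value
    elif isinstance(node[leaf], list):
        # Third or later value: extend the existing list.
        node[leaf].append(value)
    else:
        # Second value for this key: promote the scalar to a list.
        node[leaf] = [node[leaf], value]


metadict = {}
unflatten_sketch(metadict, "mgs.book.title", "A random book title")
unflatten_sketch(metadict, "mgs.book.author.pet", "cat")
unflatten_sketch(metadict, "mgs.book.author.pet", "parrot")
print(metadict)
```

The sketch assumes that a name is never used both as a namespace and as a leaf; it does not guard against that conflict.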

For a full list of metadata items, such as the output of the .metadata.items() method of an iRODS data object or collection, we could loop over the iterable:

metadict = {}
for avu in metadata_items:
    md2dict.unflatten_namespace_into_dict(metadict, avu.name, avu.value)
metadict
{'mgs': {'book': {'author': {'name': ['Fulano De Tal', 'Jane Doe'],
    'age': ['50', '29'],
    'pet': ['cat', 'cat', 'parrot']},
   'title': 'A random book title'}},
 'mg': {'mime_type': 'text/plain'},
 'page_n': '567',
 'chapter_n': '15'}

As the example shows, the function can ignore units entirely. This is sufficient for OpenSearch indexing.

For ManGO schemas, however, we want to use the units to keep track of repeatable composite fields. In order to achieve that, we just have to also provide the unit and set the use_units argument to True.

The unpack_metadata_into_dict() function is a wrapper around this function that always uses units and takes the whole irods.meta.iRODSMeta object as an argument instead of the name, value and units separately.

metadict = {}
for avu in metadata_items:
    md2dict.unpack_metadata_into_dict(metadict, avu)
metadict
{'mgs': {'book': {'author': {'name': [('Fulano De Tal', '1'),
     ('Jane Doe', '2')],
    'age': [('50', '1'), ('29', '2')],
    'pet': [('cat', '1'), ('cat', '2'), ('parrot', '2')]},
   'title': 'A random book title'}},
 'mg': {'mime_type': 'text/plain'},
 'page_n': ('567', 'analysis/reading'),
 'chapter_n': ('15', 'analysis/reading')}

Now items with units are rendered as tuples of values and units, but these are not interpreted in the context of ManGO. This is why this approach is the “naïve” one: in order to reorganize this dictionary into something that makes sense given how ManGO uses schemas and units, we need to use another function.
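
The value-unit tuples can be illustrated with a small self-contained sketch. It does not call the library: unflatten_with_units_sketch is a hypothetical helper, and plain (name, value, units) tuples stand in for iRODSMeta objects so that the example runs without an iRODS client.

```python
def unflatten_with_units_sketch(metadict, name, value, units=None):
    """Hypothetical helper: store (value, units) tuples when units exist."""
    *namespaces, leaf = name.split(".")
    node = metadict
    for part in namespaces:
        node = node.setdefault(part, {})
    # Items without units stay plain values; the others become tuples.
    item = (value, units) if units else value
    if leaf not in node:
        node[leaf] = item
    elif isinstance(node[leaf], list):
        node[leaf].append(item)
    else:
        node[leaf] = [node[leaf], item]


# Plain tuples standing in for AVUs (assumption made for this sketch).
avus = [
    ("mgs.book.author.name", "Fulano De Tal", "1"),
    ("mgs.book.author.name", "Jane Doe", "2"),
    ("mgs.book.title", "A random book title", None),
    ("page_n", "567", "analysis/reading"),
]

with_units = {}
for name, value, units in avus:
    unflatten_with_units_sketch(with_units, name, value, units)
print(with_units)
```

Note that the units are only carried along here; nothing yet interprets "1" and "2" as instances of the same composite field.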

ManGO-specific conversion

The convert_metadata_to_dict() function takes an iterable of irods.meta.iRODSMeta instances and returns a nested dictionary based on the namespacing of the metadata names as well as the units. It builds on the result of unpack_metadata_into_dict() and then reformats the dictionary to group all metadata schemas under the “schema” key (instead of “mgs”) and to group all items with units starting with “analysis/” under the “analysis” key. In addition, the repeatable composite fields of schemas are regrouped based on their units, so that all subfields sharing a unit end up in the same instance of the composite field.

reorganized_dict = md2dict.convert_metadata_to_dict(metadata_items)
reorganized_dict
{'schema': {'book': {'author': [{'age': '50',
     'name': 'Fulano De Tal',
     'pet': 'cat'},
    {'age': '29', 'name': 'Jane Doe', 'pet': ['cat', 'parrot']}],
   'title': 'A random book title'}},
 'mg': {'mime_type': 'text/plain'},
 'analysis': {'reading': {'page_n': '567', 'chapter_n': '15'}}}
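
The regrouping of a repeatable composite field by units can be sketched in a few lines. This illustrates the idea only, under the assumption that each distinct unit identifies one instance of the composite field; regroup_by_unit_sketch is a hypothetical name and not part of md2dict.

```python
def regroup_by_unit_sketch(composite):
    """Turn {'field': [(value, unit), ...]} into one dict per unit."""
    groups = {}
    for field, items in composite.items():
        if not isinstance(items, list):
            # A single (value, unit) tuple is treated as a one-item list.
            items = [items]
        for value, unit in items:
            instance = groups.setdefault(unit, {})
            if field not in instance:
                instance[field] = value
            elif isinstance(instance[field], list):
                instance[field].append(value)
            else:
                # Repeated subfield within the same instance becomes a list.
                instance[field] = [instance[field], value]
    # Instances are returned in the order their units were first seen.
    return list(groups.values())


author = {
    "name": [("Fulano De Tal", "1"), ("Jane Doe", "2")],
    "age": [("50", "1"), ("29", "2")],
    "pet": [("cat", "1"), ("cat", "2"), ("parrot", "2")],
}
authors = regroup_by_unit_sketch(author)
print(authors)
```

With the simulated metadata, the second author keeps both pets as a list because both values carry the unit "2".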

This function is to be used when converting ManGO metadata into a dictionary, in order to export it to a sidecar file, for downloading, or in the context of cold storage.
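
As an illustration of the sidecar use case, the sketch below writes the reorganized dictionary to a JSON file and reads it back. JSON and the file name example.metadata.json are assumptions made for this example; the text above does not prescribe a sidecar format. The dictionary is copied from the output above so that the snippet runs without an iRODS connection.

```python
import json
import tempfile
from pathlib import Path

# Copied from the example output above.
reorganized_dict = {
    "schema": {
        "book": {
            "author": [
                {"age": "50", "name": "Fulano De Tal", "pet": "cat"},
                {"age": "29", "name": "Jane Doe", "pet": ["cat", "parrot"]},
            ],
            "title": "A random book title",
        }
    },
    "mg": {"mime_type": "text/plain"},
    "analysis": {"reading": {"page_n": "567", "chapter_n": "15"}},
}

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical sidecar file name, chosen for this sketch only.
    sidecar_path = Path(tmpdir) / "example.metadata.json"
    sidecar_path.write_text(
        json.dumps(reorganized_dict, indent=2), encoding="utf-8"
    )
    # Read the file back to check that nothing was lost in serialization.
    restored = json.loads(sidecar_path.read_text(encoding="utf-8"))

print(restored == reorganized_dict)
```

Because the reorganized dictionary contains only strings, lists and dictionaries, it survives the JSON round trip unchanged; the tuple-based output of the naive conversion would come back with lists in place of tuples.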

Download files

Source Distribution

mango_mdconverter-0.0.9.tar.gz (7.7 kB view details)

Uploaded Source

Built Distribution

mango_mdconverter-0.0.9-py3-none-any.whl (6.9 kB view details)

Uploaded Python 3

File details

Details for the file mango_mdconverter-0.0.9.tar.gz.

File metadata

  • Download URL: mango_mdconverter-0.0.9.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for mango_mdconverter-0.0.9.tar.gz
Algorithm    Hash digest
SHA256       3ac57eb40e2faafe4dd4e34de08ec5536649ebca641e5f5b56a257a838570b8d
MD5          574b31e64ad0c13d95473b8b99c7fb0f
BLAKE2b-256  171b4089d5f40480ecc219d24fcc2f34dcef137909116782ee79062d7fd32b0d

Provenance

The following attestation bundles were made for mango_mdconverter-0.0.9.tar.gz:

Publisher: python-publish.yml on kuleuven/mango-mdconverter

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mango_mdconverter-0.0.9-py3-none-any.whl.

File hashes

Hashes for mango_mdconverter-0.0.9-py3-none-any.whl
Algorithm    Hash digest
SHA256       ea5f79d7680545de900ca5bf43b7d1068c6aef5705e85d2e3670b610ce15e485
MD5          d9e5ecb26a4d2905fa8da4984ffd92c1
BLAKE2b-256  78b6df5a39c7fad03b70f042c8d39b2c6e1e87b13b3ceb04eb193b57e6133966

Provenance

The following attestation bundles were made for mango_mdconverter-0.0.9-py3-none-any.whl:

Publisher: python-publish.yml on kuleuven/mango-mdconverter
