Python reader, writer and validator for Meta yAML (MAML).

pymaml

Official Python package for reading, writing, and parsing the Meta yAML format.

(see also https://github.com/asgr/MAML)

MAML

MAML is a YAML-based metadata format for tabular data (the name roughly stands for Metadata yAML). This package is the official Python interface for reading, writing, and parsing MAML files.

Why MAML?

We already have VOTable and FITS headers, so why another format? For various projects we wanted a rich metadata format that is easy for both humans and computers to read and write. VOTable headers are very hard for humans to read and write, and FITS headers are very restrictive in their formatting and only useful for FITS files. YAML, by contrast, is readable and writable by humans and machines alike. By restricting ourselves to a narrow subset of the language, we can easily describe fairly complex table metadata (including all IVOA information). The result is Meta yAML (MAML).

MAML files should be saved with the .maml extension (e.g. example.maml). The MAML string can also be inserted directly into other file formats that accept key-value metadata (such as Apache Arrow Parquet files). For Parquet files, it should be written to a 'maml' extension in the metadata section of the file.

MAML Metadata Format

The MAML metadata format is a structured way to describe datasets, surveys, and tables using YAML. This format ensures that all necessary information about the data is captured in a clear and organized manner.

Structure

The superset of allowed entries for MAML is listed below. Not all are required, but those that are present should follow this order and naming.

  • survey: The name of the survey. Scalar string. [optional]
  • dataset: The name of the dataset. Scalar string. [recommended]
  • table: The name of the table. Scalar string. [required]
  • version: The version of the dataset. Scalar string, integer or float. [required]
  • date: The date of the dataset in YYYY-MM-DD format. Scalar string. [required]
  • author: The lead author of the dataset, including their email. Scalar string. [required]
  • coauthors: A list of co-authors, each with their email. Vector string. [optional]
  • depends: A list of datasets that this dataset depends on. Vector string. [optional]
  • description: A sentence or two describing the table. Scalar string. [recommended]
  • comments: A list of comments or interesting facts about the data. Vector string. [optional]
  • fields: A list of fields in the dataset, each with the following attributes: [required]
    • name: The name of the field. Scalar string. [required]
    • unit: The unit of measurement for the field (if applicable). Scalar string. [recommended]
    • info: A short description of the field. Scalar string. [recommended]
    • ucd: Unified Content Descriptor for IVOA (can have many). Vector string. [recommended]
    • data_type: The data type of the field (e.g., int32, string, bool, double). Scalar string. [required]
    • array_size: Maximum length of character strings. Scalar integer or Scalar string. [optional]
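The ordering rule above can be illustrated with a small sketch. It uses only the Python standard library and works on metadata that has already been parsed into a dictionary. The names MAML_KEY_ORDER and keys_in_order are hypothetical helpers for this illustration and are not part of pymaml; the example values are made up.

```python
# Top-level key order, copied from the Structure list above.
MAML_KEY_ORDER = [
    "survey", "dataset", "table", "version", "date", "author",
    "coauthors", "depends", "description", "comments", "fields",
]


def keys_in_order(metadata: dict) -> bool:
    """Return True if the known top-level keys appear in the listed order."""
    positions = [MAML_KEY_ORDER.index(k) for k in metadata if k in MAML_KEY_ORDER]
    return positions == sorted(positions)


# A made-up minimal document, as it might look after parsing the YAML.
example = {
    "table": "example_table",
    "version": "0.1",
    "date": "2025-01-02",
    "author": "A. Person <a.person@example.org>",
    "fields": [{"name": "ra", "unit": "deg", "data_type": "double"}],
}

print(keys_in_order(example))
```

Python dictionaries preserve insertion order, so a parsed YAML mapping keeps the key order of the file and a check like this is meaningful.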

This metadata format can be used to document datasets in a standardised way, making it easier to understand and share data within the research community. By following this format, you ensure that all relevant information about the dataset is captured and easily accessible.

This format contains the superset of metadata requirements for IVOA, Data Central and surveys like GAMA and WAVES.

If producing a maximal MAML then the metadata can be considered a MAML-Whale, and if only containing the required minimum entries it would be a MAML-Mouse. Between these two extremes you can choose your mammal of interest to reflect the quality/quantity of metadata. The sweet spot is obviously a MAML-Honey-Badger.

pymaml

Installation

pymaml can be installed with pip:

pip install pymaml

Reading in a .maml file.

Reading a maml file is easily done using the MAML object in pymaml. Reading it in this way will include validation "for free".

from pymaml import MAML
new_maml = MAML.from_file("example.maml")

This MAML object will only be created if all the required fields are present in the maml file.

Validating a .maml file.

The pymaml package has a validate function that audits a .maml file and reports whether or not the file is valid. It also describes why a file isn't valid and lists any warnings that users might wish to consider.

from pymaml import validate
validate("example.maml")
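To make the difference between errors and warnings concrete, here is a minimal sketch of such an audit over metadata that has already been parsed into a dictionary. It is not pymaml's validate implementation. The function name audit, its return shape, and the choice to treat missing [recommended] entries as warnings are all assumptions made for illustration; only the required and recommended labels come from the Structure list.

```python
# Entry labels taken from the Structure list; everything else is hypothetical.
REQUIRED = ("table", "version", "date", "author")
RECOMMENDED = ("dataset", "description")
REQUIRED_FIELD_KEYS = ("name", "data_type")
RECOMMENDED_FIELD_KEYS = ("unit", "info", "ucd")


def audit(metadata: dict) -> tuple[bool, list[str], list[str]]:
    """Return (is_valid, errors, warnings) for parsed MAML-like metadata."""
    errors: list[str] = []
    warnings: list[str] = []

    for key in REQUIRED:
        if key not in metadata:
            errors.append(f"missing required entry: {key}")
    for key in RECOMMENDED:
        if key not in metadata:
            warnings.append(f"missing recommended entry: {key}")

    fields = metadata.get("fields") or []
    if not fields:
        errors.append("missing required entry: fields (at least one field)")
    for i, field in enumerate(fields):
        for key in REQUIRED_FIELD_KEYS:
            if key not in field:
                errors.append(f"fields[{i}]: missing required attribute: {key}")
        for key in RECOMMENDED_FIELD_KEYS:
            if key not in field:
                warnings.append(f"fields[{i}]: missing recommended attribute: {key}")

    # Warnings never affect validity; only errors do.
    return (not errors, errors, warnings)


ok, errs, warns = audit({
    "table": "t", "version": 1, "date": "2025-01-02", "author": "a",
    "fields": [{"name": "ra", "data_type": "double"}],
})
print(ok, errs, warns)
```

In this sketch the minimal document is valid but still collects warnings for each recommended entry it omits.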

Creating a new maml file

The MAML object is the core object for building and writing MAML, and it performs all validation. Building metadata this way guarantees that the MAML written is valid, including UCD checks and date formats.
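As an illustration of what a date-format check involves, the sketch below tests whether a string is a real calendar date in YYYY-MM-DD form using only the standard library. This is an assumption about what such a check could look like, not pymaml's own code; is_maml_date is a hypothetical name.

```python
from datetime import datetime


def is_maml_date(value: str) -> bool:
    """Return True if value is a valid, zero-padded YYYY-MM-DD date."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except (TypeError, ValueError):
        # Wrong type, wrong layout, or an impossible date such as 2025-02-30.
        return False
    # strptime accepts "2025-1-2"; the round trip rejects missing zero padding.
    return parsed.strftime("%Y-%m-%d") == value


for candidate in ("2025-01-02", "2025-1-2", "02/01/2025", "2025-02-30"):
    print(candidate, is_maml_date(candidate))
```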

At the very least, a table name, author name, and at least one Field need to be passed:

from pymaml.maml import MAML, Field
new_maml = MAML(table="New table Name", author="Me, myself, and I", fields=[Field(name="ra", data_type="float")])

The Field object is the main way to build new fields and will also force checks to make sure that the fields are valid.

For convenience, a default MAML object can be built quickly with the class method .default():

from pymaml import MAML
default_maml = MAML.default()

Values can be updated in the usual way for Python objects. For convenience, several setter methods are also available, including add_comment(), add_field(), and set_date():

from pymaml.maml import MAML, Field
maml = MAML.default()

maml.set_date("2025-01-02")
maml.add_field(Field(name="Declination", ucd="pos.eq.dec", data_type="float"))
maml.add_comment("This is an easy way to add a comment to the existing maml.")

Once the MAML object is built, it can be written to a file:

maml.to_file("new_maml.maml")
