Python reader, writer and validator for Meta yAML (MAML).

pymaml

Official Python package for reading, writing, and parsing the Meta yAML format.

(see also https://github.com/asgr/MAML)

MAML

MAML is a YAML-based metadata format for tabular data (the name roughly stands for Metadata yAML). This package is the official Python interface for reading, writing, and parsing MAML files.

Why MAML?

Don't we already have VOTable and FITS headers? For various projects we wanted a rich metadata format that both humans and computers can easily read and write. VOTable headers are very hard for humans to read and write, and FITS headers are very restrictive in their formatting and only useful for FITS files. YAML, by contrast, is readable and writable by humans and machines alike. By restricting ourselves to a narrow subset of the language, we can easily describe fairly complex table metadata (including all IVOA information). The result is Meta yAML (MAML).

MAML files should be saved with the .maml extension (e.g. example.maml). The MAML string can also be inserted directly into a number of file formats that accept key-value metadata (like Apache Arrow Parquet files). In the case of Parquet files, it should be written to a 'maml' extension in the metadata section of the file.

MAML Metadata Format

The MAML metadata format is a structured way to describe datasets, surveys, and tables using YAML. This format ensures that all necessary information about the data is captured in a clear and organized manner.

Structure

The superset of allowed entries for MAML is listed below. Not all are required, but those that are present should follow this order and naming.

  • survey: The name of the survey. Scalar string. [optional]
  • dataset: The name of the dataset. Scalar string. [recommended]
  • table: The name of the table. Scalar string. [required]
  • version: The version of the dataset. Scalar string, integer or float. [required]
  • date: The date of the dataset in YYYY-MM-DD format. Scalar string. [required]
  • author: The lead author of the dataset, including their email. Scalar string. [required]
  • coauthors: A list of co-authors, each with their email. Vector string. [optional]
  • depends: A list of datasets that this dataset depends on. Vector string. [optional]
  • description: A sentence or two describing the table. Scalar string. [recommended]
  • comments: A list of comments or interesting facts about the data. Vector string. [optional]
  • fields: A list of fields in the dataset, each with the following attributes: [required]
    • name: The name of the field. Scalar string. [required]
    • unit: The unit of measurement for the field (if applicable). Scalar string. [recommended]
    • info: A short description of the field. Scalar string. [recommended]
    • ucd: Unified Content Descriptor for IVOA (can have many). Vector string. [recommended]
    • data_type: The data type of the field (e.g., int32, string, bool, double). Scalar string. [required]
    • array_size: Maximum length of character strings. Scalar integer or Scalar string. [optional]
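
The required entries and the ordering rule in the list above can be sketched with a plain Python dictionary. This is only an illustration and not how pymaml validates files: check_maml, TOP_LEVEL_ORDER and the example values are hypothetical names invented for this sketch.

```python
# Hypothetical sketch: NOT pymaml's validator, just the required-entry and
# ordering rules from the list above applied to a plain dict.

# Top-level entries in the order given by the list above.
TOP_LEVEL_ORDER = [
    "survey", "dataset", "table", "version", "date", "author",
    "coauthors", "depends", "description", "comments", "fields",
]
REQUIRED_TOP_LEVEL = {"table", "version", "date", "author", "fields"}
REQUIRED_FIELD_KEYS = {"name", "data_type"}


def check_maml(maml: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []

    # Every required top-level entry must be present.
    for key in TOP_LEVEL_ORDER:
        if key in REQUIRED_TOP_LEVEL and key not in maml:
            problems.append(f"missing required entry: {key}")

    # Entries that are present must follow the listed order.
    known = [key for key in maml if key in TOP_LEVEL_ORDER]
    if known != sorted(known, key=TOP_LEVEL_ORDER.index):
        problems.append("entries are not in the specified order")

    # Each field needs at least a name and a data_type.
    for index, field in enumerate(maml.get("fields", [])):
        for key in sorted(REQUIRED_FIELD_KEYS - set(field)):
            problems.append(f"field {index} is missing: {key}")

    return problems


# A minimal document containing only the required entries (made-up values).
minimal = {
    "table": "example_table",
    "version": 1,
    "date": "2025-01-02",
    "author": "A. N. Author <author@example.org>",
    "fields": [{"name": "ra", "data_type": "double"}],
}
print(check_maml(minimal))  # prints []
```

The sketch relies on Python dictionaries preserving insertion order, so the order of the keys in the dict stands in for the order of entries in the file.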

This metadata format can be used to document datasets in a standardised way, making it easier to understand and share data within the research community. By following this format, you ensure that all relevant information about the dataset is captured and easily accessible.

This format contains the superset of metadata requirements for IVOA, Data Central and surveys like GAMA and WAVES.

If producing a maximal MAML then the metadata can be considered a MAML-Whale, and if only containing the required minimum entries it would be a MAML-Mouse. Between these two extremes you can choose your mammal of interest to reflect the quality/quantity of metadata. The sweet spot is obviously a MAML-Honey-Badger.

pymaml

Installation

pymaml can be installed with pip:

pip install pymaml

Reading in a .maml file.

Reading a maml file is easily done using the MAML object in pymaml. Reading it in this way will include validation "for free".

from pymaml import MAML
new_maml = MAML.from_file("example.maml")

This MAML object will only be created if all the required fields are present in the maml file.

Validating a .maml file.

The pymaml package has a validate function that audits a .maml file and returns whether or not the file is valid. It also describes why the file isn't valid and lists any warnings that users might wish to consider.

from pymaml import validate
validate("example.maml")

Creating a new maml file

The MAML object is the core object for building and writing MAML files, and it performs all validation. Building a file this way guarantees that the written MAML is valid, including UCD checks and date formats.
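
As a rough illustration of what a date-format check involves, the sketch below tests a string against the YYYY-MM-DD format using only the standard library. It is an assumption about the kind of check meant here, not pymaml's actual implementation, and is_valid_maml_date is a hypothetical helper name.

```python
from datetime import datetime


def is_valid_maml_date(text: str) -> bool:
    """Return True if text is a real calendar date written as YYYY-MM-DD."""
    # Checking the length first rejects unpadded forms such as 2025-1-2,
    # which strptime would otherwise accept.
    if len(text) != 10:
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(is_valid_maml_date("2025-01-02"))  # True
print(is_valid_maml_date("02/01/2025"))  # False: wrong layout
print(is_valid_maml_date("2025-02-30"))  # False: not a real date
```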

At the very least, a table name, author name, and at least one Field need to be passed:

from pymaml.maml import MAML, Field
new_maml = MAML(table="New table Name", author="Me, myself, and I", fields=[Field(name='ra', data_type='float')])

The Field object is the main way to build new fields and will also force checks to make sure that the fields are valid.

For convenience, a default maml construction can be built quickly with the class method .default():

from pymaml import MAML
default_maml = MAML.default()

Values can be updated in the usual way for Python classes. For convenience, several setter methods are also available, including add_comment(), add_field(), and set_date():

from pymaml.maml import MAML, Field
maml = MAML.default()

maml.set_date("2025-01-02")
maml.add_field(Field(name="Declination", ucd="pos.eq.dec", data_type="float"))
maml.add_comment("This is an easy way to add a comment to the existing maml.")

Once the MAML object is built, it can be written to file:

maml.to_file("new_maml.maml")
