
automated metadata validation for ONS metadata templates

Reason this release was yanked:

broken


ONS metadata validation tool

Background

This project is for automatically validating metadata templates that accompany IDS data deliveries. The fields in a filled metadata template are each checked against a set of defined conditions.

For example, many fields are mandatory; many have a maximum number of characters; some fields are not allowed to contain spaces or special characters; and so on.
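To make the idea concrete, the per-field conditions described above can be sketched as simple predicate functions. This is purely illustrative; the function names and rules below are hypothetical, not the tool's actual implementation.

```python
import re

# Hypothetical field checks: each returns True when the value passes.
def check_mandatory(value: str) -> bool:
    return bool(value and value.strip())

def check_max_length(value: str, limit: int = 100) -> bool:
    return len(value) <= limit

def check_no_special_characters(value: str) -> bool:
    # Allow letters, digits, and underscores only (no spaces).
    return re.fullmatch(r"\w+", value) is not None

def failed_checks(value: str) -> list[str]:
    """Return the names of all checks this value fails."""
    checks = {
        "mandatory": check_mandatory(value),
        "max_length": check_max_length(value),
        "no_special_characters": check_no_special_characters(value),
    }
    return [name for name, passed in checks.items() if not passed]
```

For example, `failed_checks("my table")` would report a `no_special_characters` failure because of the space, while `failed_checks("dataset_v1")` would pass everything.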

A metadata template with missing mandatory values or other format or content issues will prevent the accompanying dataset from being ingested, forcing a round of resubmission and causing delays.

What it does

This tool produces an Excel report detailing failed validation checks for a given metadata template. There are also two optional outputs:

  • A commented version of the input file, where cells with validation issues are highlighted and a mouseover note names each check that has failed.

  • An edited version of the input file, where cells with easily-fixed issues such as missing full stops or trailing whitespace have been automatically updated.

Note that some metadata requirements cannot be programmatically validated. Some human inspection will always be necessary, for example to sense-check free text fields.

The tool is designed to work with metadata templates from v2.0 onwards. When pointed at a v2.1 file, it should identify the version and update its expectations of the form's format without requiring specific input from the user.

Documentation

This readme is written for the benefit of end users, such as the CMAR team. It aims to make minimal assumptions about previous experience.

More technical documentation for future developers and maintainers can be found in the documentation folder.

Future versions of this tool will increase the supporting information available in the validation output report, for example by listing the exact set of checks run for each variable.

Contact

metadata.validation.tool@ons.gov.uk

Please contact us if you wish to report any issues or bugs, or to request features. Which tables of the output report were most useful? Did you prefer the aggregated tables, or the commented version with individual cells highlighted?

Also, we cannot currently guarantee that there will be no false positives or false negatives in the output, so your feedback is very valuable!

Please also contact us if you are using this tool and haven't yet spoken to us, the developers. We wish to keep in contact with our community of users.

Using the tool

Installation

The commands below are for use in a command prompt terminal, such as Anaconda Powershell.

To install this package: pip3 install ons-metadata-validation

Basic usage

Default settings have been chosen so that a general non-technical user will rarely need to specify optional parameters.

The only parameter that must be specified each time the tool is used is the location of the filled metadata template to validate. This can be specified as an absolute or relative path.

Thus, to run with all default settings (as the CMAR team would): python3 -m ons_metadata_validation "path/to/file.xlsx"

This will produce an Excel file reporting on failed validation checks. It will be saved in the same folder as the input file.

Note that the ability to process all metadata templates in a specified folder is planned for a future release.

Optional configurations

Optional parameters always come after the filename when calling the command.

variable_check_set

This tool is designed for users at various pipeline stages and in various contexts. Some template variables are populated later, and therefore might not exist yet for upstream users. This parameter is used to select the appropriate set of variables to check.

  • default: "cmar"
  • choices: ["cmar", "full"]

Example: python3 -m ons_metadata_validation "path/to/file.xlsx" variable_check_set="full"

save_report

Whether or not to save the output report.

  • default: True
  • choices: True, False

Example: python3 -m ons_metadata_validation "path/to/file.xlsx" save_report=False

save_commented_copy

Whether or not to save a copy of the metadata template with invalid cells highlighted and commented. Please note that you must then update and resubmit the original file - do not edit and submit this copy!

  • default: True
  • choices: True, False

Example: python3 -m ons_metadata_validation "path/to/file.xlsx" save_commented_copy=True

save_corrected_copy

Some simple validation issues, such as missing full stops, double spaces, or trailing whitespace, can be fixed programmatically. Setting this parameter to True will save an edited copy of the original file.

  • default: True
  • choices: True, False

Example: python3 -m ons_metadata_validation "path/to/file.xlsx" save_corrected_copy=True
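The kinds of automatic fixes described above can be sketched as a small text-cleaning function. This is an illustrative sketch of the sorts of edits involved, not the tool's actual correction logic.

```python
def autocorrect(text: str) -> str:
    """Fix simple issues: stray whitespace, double spaces, missing full stop.

    Hypothetical example of the easily-fixed issues mentioned above.
    """
    fixed = text.strip()                  # drop leading/trailing whitespace
    while "  " in fixed:                  # collapse runs of spaces
        fixed = fixed.replace("  ", " ")
    if fixed and not fixed.endswith("."):
        fixed += "."                      # add a missing full stop
    return fixed
```

For example, `autocorrect("A  description with issues   ")` returns `"A description with issues."`.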

destination_folder

By default, all outputs are saved in the same folder as the input file. However, you can specify a different location if you wish. This is only for specifying the output folder; the names of individual outputs combine the input file's name with a description indicating the type of output.

  • default: None

Example: python3 -m ons_metadata_validation "path/to/file.xlsx" destination_folder="some/other/directory"

Reading the output report

Types of checks

Validation checks are considered to be "hard", "soft", or "comparative".

Hard checks are conditions that can be conclusively measured automatically. Failing a hard check means that something is definitely wrong and needs changing. Untreated hard-check fails will usually also cause an ingest failure, since the ingest process has fixed expectations about machine-readable content and formats.

Soft checks are checks that require inspection, but not necessarily action, if they fail. Either they relate to style recommendations that aren't strict requirements, or they involve checking something that can't be perfectly measured automatically. For example, we may expect a certain style of response most of the time, but there may be corner cases where unusual answers are still acceptable and correct.

Comparative checks involve more than one cell value at a time. For example, a column of table names might require that each name be unique within that column. Or, for consistency, a table name appearing on one sheet might be required to also appear on a list of tables from a previous sheet.
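The two comparative checks described above, uniqueness within a column and consistency with a list on a previous sheet, can be sketched as follows. The function names and inputs are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

def duplicate_values(column: list[str]) -> list[str]:
    """Flag values that appear more than once in a column."""
    counts = Counter(column)
    return [value for value, n in counts.items() if n > 1]

def missing_from_reference(names: list[str], reference: list[str]) -> list[str]:
    """Flag names that do not appear in a list from a previous sheet."""
    known = set(reference)
    return [name for name in names if name not in known]
```

For example, `duplicate_values(["tab_a", "tab_b", "tab_a"])` flags `"tab_a"`, and `missing_from_reference(["tab_a", "tab_c"], ["tab_a", "tab_b"])` flags `"tab_c"`.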

Output tables

Sheet | Variables | Description
Short % overview | All mandatory | Each row is a tab & variable name combination; columns list the % of records that are missing or failed at least one check.
Long % overview | All mandatory | Each row is a tab & variable name combination; columns give the fail % for every check.
Missing values | All mandatory | Each row details the cells with missing values for a single variable.
Fails by cell | All mandatory | Each row details the names of all hard and soft checks failed by a single cell.
Fails by check | All mandatory | Each row details the cells of a single variable that have failed a particular hard or soft check.
Fails by value | All mandatory | Each row details a value appearing in a variable, all the cells that value appears in, and all the hard and soft checks that value fails. NOTE: this view is experimental and has some known bugs with cell ranges.
Comparative checks | Comparative only | Each row details one instance of a failed comparative check.
Non-mandatory fails by cell | Non-mandatory only | Each row details the names of all hard and soft checks failed by a single cell, including missing values.
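The relationship between the "Fails by cell" and "Fails by check" views can be illustrated as two groupings of the same underlying (cell, failed check) pairs. The cell references and check names here are invented for the example.

```python
from collections import defaultdict

# Hypothetical validation results: (cell, failed_check) pairs.
fails = [("B4", "mandatory"), ("B4", "max_length"), ("C7", "mandatory")]

by_cell = defaultdict(list)    # "Fails by cell": all checks a cell failed
by_check = defaultdict(list)   # "Fails by check": all cells failing a check
for cell, check in fails:
    by_cell[cell].append(check)
    by_check[check].append(cell)
```

Here `by_cell["B4"]` lists both checks failed by cell B4, while `by_check["mandatory"]` lists every cell that failed the mandatory check.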

