Python utility for validating IDS


TetraScience IDS Validator

Overview
Usage
Components
Changelog
- v0.9.16
- v0.9.15
- v0.9.14
- v0.9.13
- v0.9.12
- v0.9.11
- v0.9.10
- v0.9.9
- v0.9.8
- v0.9.7
- v0.9.6

Overview

The TetraScience IDS Validator checks that IDS artifacts follow a set of rules that make them compatible with the Tetra Data Platform, and optionally that they are compatible with additional IDS design conventions. The validator either passes, or fails with a list of the checks that led to the failure.

The validator checks these files in an IDS folder:

schema.json
elasticsearch.json
athena.json

You can find the validation rules in:

Version

v0.9.16

Usage

poetry run python -m ids_validator --ids_dir=path/to/ids/folder

This will run the required checks for the @idsConventionVersion mentioned in schema.json.

If @idsConventionVersion is missing from schema.json, or if it is not supported by schema_validator, only generic checks will be run.
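The version-dependent behavior can be sketched as follows. This is a minimal illustration, not the validator's code: it assumes @idsConventionVersion is stored as a `const` under the schema's top-level `properties`, and the `SUPPORTED_CONVENTIONS` set and the helper names `load_ids_folder`, `convention_version` and `check_set` are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Optional

IDS_FILES = ("schema.json", "elasticsearch.json", "athena.json")
SUPPORTED_CONVENTIONS = {"v1.0.0"}  # hypothetical; the package defines its own list


def load_ids_folder(ids_dir: str) -> Dict[str, dict]:
    """Parse each IDS artifact in ids_dir into a dict keyed by file name."""
    folder = Path(ids_dir)
    return {name: json.loads((folder / name).read_text()) for name in IDS_FILES}


def convention_version(schema: dict) -> Optional[str]:
    """Return @idsConventionVersion, assumed to be a const under top-level properties."""
    return schema.get("properties", {}).get("@idsConventionVersion", {}).get("const")


def check_set(schema: dict) -> str:
    """Pick the convention-specific checks if the version is supported, else generic only."""
    version = convention_version(schema)
    if version is not None and version in SUPPORTED_CONVENTIONS:
        return version
    return "generic"
```

With a real IDS folder, `check_set(load_ids_folder("path/to/ids/folder")["schema.json"])` would return either the supported version string or `"generic"`.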

Components

Node

Node is a UserDict subclass that serves as an abstraction for a dict in schema.json.
When crawling schema.json, each key-value pair whose value is a dict is cast to a Node.
For each key-value pair, a Node has the following attributes:
- name (default=root): The key
- data: The value (a dict)
- path (default=root): The fully qualified path of the key in schema.json
File: ids_node.py
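A minimal sketch of a Node-like class and a crawler over schema.json, assuming Node simply subclasses `collections.UserDict` (whose `data` attribute holds the wrapped dict). The `iter_nodes` helper is hypothetical, and it joins every key into the path, so its paths contain `properties` segments; the real path format may differ, as the `root.samples` example further down in this README suggests.

```python
from collections import UserDict
from typing import Iterator, Optional


class Node(UserDict):
    """Sketch: a dict from schema.json plus its key name and dotted path."""

    def __init__(self, name: str = "root", data: Optional[dict] = None, path: str = "root"):
        super().__init__(data or {})  # UserDict stores the mapping in self.data
        self.name = name
        self.path = path


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield node, then every dict-valued descendant, depth first (hypothetical helper)."""
    yield node
    for key, value in node.items():
        if isinstance(value, dict):
            yield from iter_nodes(Node(name=key, data=value, path=f"{node.path}.{key}"))


schema = {"type": "object", "properties": {"samples": {"type": "array"}}}
print([n.path for n in iter_nodes(Node(data=schema))])
# ['root', 'root.properties', 'root.properties.samples']
```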

Checker Classes

A checker class must implement AbstractChecker
When crawling schema.json, its run() method will be called for each node.
run() implements the rules/condition to be checked for validating the node.
run() accepts two arguments:
- node: Node: Node for which we are running the checks
- context: dict
  - It contains the contents of schema.json and athena.json as Python dicts, along with convention_version.
  - It provides the additional data required for running complex checks.
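The checker contract can be sketched with Python's `abc` module. `DescriptionChecker` below is a hypothetical example, not one of the package's checkers, `Node` is aliased to `UserDict` only to keep the sketch self-contained, and the plain strings used for criticality stand in for the package's `Log` values.

```python
from abc import ABC, abstractmethod
from collections import UserDict
from typing import List, Tuple

Node = UserDict  # stand-in for the real Node class (ids_node.py)


class AbstractChecker(ABC):
    @abstractmethod
    def run(self, node: Node, context: dict) -> List[Tuple[str, str]]:
        """Check one node; return (message, criticality) tuples, or [] if it passes."""


class DescriptionChecker(AbstractChecker):
    """Hypothetical checker: warn when an object node has no description."""

    def run(self, node: Node, context: dict) -> List[Tuple[str, str]]:
        # context (schema.json, athena.json, convention version) is unused by this simple check
        if node.get("type") == "object" and "description" not in node:
            return [("object node has no description", "WARNING")]
        return []
```

Because `run` is declared abstract, Python refuses to instantiate `AbstractChecker` itself or any subclass that forgets to implement `run`.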

Validator

The Validator class implements the crawler.
It has the following attributes:
- ids: dict: schema.json converted to a Python dict
- athena: dict: athena.json converted to a Python dict
- checks_list: A list of instantiated checker classes. This list of checks will be run for each node
Validator.traverse_ids() crawls from Node to Node in ids: dict, calling run() for each checker in checks_list on the node
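A simplified sketch of the traversal described above. It is not the package's implementation: the context keys, the `(path, message, criticality)` log format and `NullTypeChecker` are assumptions made for illustration; only the attribute names `ids`, `athena`, `checks_list` and the method name `traverse_ids` come from this README.

```python
from collections import UserDict
from typing import List, Optional, Tuple


class Node(UserDict):
    """Minimal stand-in for the real Node in ids_node.py."""

    def __init__(self, name: str = "root", data: Optional[dict] = None, path: str = "root"):
        super().__init__(data or {})
        self.name, self.path = name, path


class Validator:
    """Sketch of the crawler; attribute names follow the README, the rest is assumed."""

    def __init__(self, ids: dict, athena: dict, checks_list: list, convention_version: Optional[str] = None):
        self.ids = ids
        self.athena = athena
        self.checks_list = checks_list
        # Hypothetical context keys
        self.context = {"schema.json": ids, "athena.json": athena, "convention_version": convention_version}

    def traverse_ids(self, node: Optional[Node] = None) -> List[Tuple[str, str, str]]:
        """Run every checker on node, then recurse into its dict-valued children."""
        if node is None:  # an empty Node is falsy, so compare with None explicitly
            node = Node(data=self.ids)
        logs = []
        for checker in self.checks_list:
            for message, criticality in checker.run(node, self.context):
                logs.append((node.path, message, criticality))
        for key, value in node.items():
            if isinstance(value, dict):
                logs += self.traverse_ids(Node(name=key, data=value, path=f"{node.path}.{key}"))
        return logs


class NullTypeChecker:
    """Hypothetical checker: flag nodes whose type is exactly "null"."""

    def run(self, node: Node, context: dict) -> List[Tuple[str, str]]:
        return [("type must not be only null", "CRITICAL")] if node.get("type") == "null" else []


schema = {"type": "object", "properties": {"a": {"type": "string"}, "b": {"type": "null"}}}
print(Validator(schema, {}, [NullTypeChecker()]).traverse_ids())
# [('root.properties.b', 'type must not be only null', 'CRITICAL')]
```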

List of Checker Classes

Base Classes

AbstractChecker
- Every checker class must implement it.
- File: abstract_checker.py
RuleBasedChecker
- It is a base class for validating a Node against a set of rules
- It comes in handy for implementing checks for property Nodes that have a predefined template
- A child class inheriting from RuleBasedChecker must define rules
- rules is a dict that maps a Node.path to a set of rules (a dict)
- The set of rules for a Node.path may contain the following items:
  - "type", Union[List[str], str]: defines the required type value for the Node
  - "compatible_type", ids_validator.checks.rules_checker.BackwardCompatibleType: defines the allowable type values for a Node: either a preferred type, or one of a list of deprecated types, which leads to a warning.
  - "min_properties", List[str]: defines the minimum set of property names that must exist for the Node. More properties can exist in addition to min_properties
  - "min_required", List[str]: The required list of the Node must contain at least the values listed in min_required
  - "required", List[str]: The required list of the Node must contain only values listed in required
Rule-based checkers defined for the v1 conventions can be found here
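The rule semantics listed above can be illustrated with a small standalone function. This is a sketch under assumptions: the `RULES` entry and the `check_rules` function are invented, `type` is compared by exact equality, `compatible_type` is omitted, and the real RuleBasedChecker may report failures differently.

```python
from typing import Dict, List, Tuple

# Hypothetical rules table: maps a Node.path to its rule set.
RULES: Dict[str, dict] = {
    "root.properties.samples.items": {
        "type": "object",
        "min_properties": ["id", "name"],
        "min_required": ["id"],
        "required": ["id", "name"],
    },
}


def check_rules(path: str, data: dict, rules: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Check one node's data against the rule set registered for its path."""
    rule = rules.get(path)
    if rule is None:
        return []  # no rules defined for this path
    logs = []
    # "type": assumed to require an exact match (a string or a list such as ["string", "null"])
    if "type" in rule and data.get("type") != rule["type"]:
        logs.append((f"{path}: type must be {rule['type']!r}", "CRITICAL"))
    # "min_properties": these property names must exist; extra properties are allowed
    missing = set(rule.get("min_properties", [])) - set(data.get("properties", {}))
    if missing:
        logs.append((f"{path}: missing properties {sorted(missing)}", "CRITICAL"))
    required = set(data.get("required", []))
    # "min_required": the node's required list must include at least these names
    missing_required = set(rule.get("min_required", [])) - required
    if missing_required:
        logs.append((f"{path}: required must include {sorted(missing_required)}", "CRITICAL"))
    # "required": the node's required list may only contain these names
    if "required" in rule:
        extra = required - set(rule["required"])
        if extra:
            logs.append((f"{path}: unexpected required entries {sorted(extra)}", "CRITICAL"))
    return logs
```

Keying the rules by Node.path keeps the checker itself generic: supporting a new templated property means adding a table entry rather than new code.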

Generic

AdditionalPropertyChecker: additional_property.py
RequiredPropertiesChecker: required_property.py
DatacubesChecker: datacubes.py
RootNodeChecker: root_node.py
TypeChecker: type_check.py
AthenaChecker: athena.py

V1

V1ChildNameChecker: child_name.py
V1ConventionVersionChecker: convention_version_check.py
V1SystemNodeChecker: nodes_checker.py
V1SampleNodeChecker: nodes_checker.py
V1UserNodeChecker: nodes_checker.py
V1RootNodeChecker: root_node.py
V1SnakeCaseChecker: snake_case.py

Writing New Checks

Checkers must implement AbstractChecker.
The run() method implements one or more checks for the node.
If there are no failures, run() must return an empty list.
If there are failures, it must return a list of one or more tuples.
Each tuple contains two values:
- log message: str: The message to be logged when the check fails
- criticality: either Log.CRITICAL or Log.WARNING
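For example, a new checker following these rules might look like the sketch below. The `Log` enum is a hypothetical stand-in for the validator's own `Log` (this README only names its `CRITICAL` and `WARNING` members), and `PropertyNameChecker` is invented for illustration; in the package it would subclass AbstractChecker, and the real snake-case rule lives in V1SnakeCaseChecker.

```python
import re
from enum import Enum
from typing import List, Tuple


class Log(Enum):
    """Hypothetical stand-in for the validator's Log criticality levels."""
    CRITICAL = "CRITICAL"
    WARNING = "WARNING"


SNAKE_CASE = re.compile(r"[a-z][a-z0-9_]*")


class PropertyNameChecker:
    """Hypothetical checker: property names must be snake_case and should have a description."""

    def run(self, node: dict, context: dict) -> List[Tuple[str, Log]]:
        logs: List[Tuple[str, Log]] = []
        for name, prop in node.get("properties", {}).items():
            if not SNAKE_CASE.fullmatch(name):
                logs.append((f"'{name}' is not snake_case", Log.CRITICAL))
            if isinstance(prop, dict) and "description" not in prop:
                logs.append((f"'{name}' has no description", Log.WARNING))
        return logs  # an empty list means the node passed
```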

Extending Checkers Classes

Pattern 1

class ChildChecker(ParentChecker):
    def run(self, node: Node, context: dict):
        logs = []
        # Implement new checks and append failures to logs

        # Run the parent's checks and append their logs
        logs += super().run(node, context)
        return logs

If the checks_list passed to Validator contains ChildChecker, it must not also contain ParentChecker. Otherwise ParentChecker's checks run twice, and any failures they find are logged twice.
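The double-logging pitfall can be reproduced with two toy checkers and a loop that mimics the Validator calling run() for every entry in checks_list. All class and function names here are hypothetical.

```python
from typing import List, Tuple

Logs = List[Tuple[str, str]]


class ParentChecker:
    def run(self, node: dict, context: dict) -> Logs:
        return [] if "type" in node else [("node has no type", "CRITICAL")]


class ChildChecker(ParentChecker):
    def run(self, node: dict, context: dict) -> Logs:
        logs: Logs = []
        if "description" not in node:
            logs.append(("node has no description", "WARNING"))
        logs += super().run(node, context)  # Pattern 1: the parent's checks run here too
        return logs


def run_checks(checks_list: list, node: dict, context: dict) -> Logs:
    """Mimic the Validator calling run() for each checker on one node."""
    logs: Logs = []
    for checker in checks_list:
        logs += checker.run(node, context)
    return logs


node = {"description": "a node without a type"}
print(len(run_checks([ChildChecker()], node, {})))                   # 1
print(len(run_checks([ChildChecker(), ParentChecker()], node, {})))  # 2: "node has no type" appears twice
```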

Pattern 2

class ChildChecker(ParentChecker):
    def run(self, node: Node, context: dict):
        logs = []
        # Implement new checks and append failures to logs
        # Use or override helper functions of the parent class
        return logs

Running Checks for Specific Nodes

class AdhocChecker(AbstractChecker):
    def run(self, node: Node, context: dict):
        logs = []
        paths = []
        # paths is a list of fully qualified paths to keys in schema.json
        # Each path must start from root
        # e.g. root.samples
        # e.g. root.samples.items.properties.property_name
        if node.path in paths:
            # Implement new checks and append failures to logs
            # (perform_new_checks is a placeholder for your own check logic)
            logs += perform_new_checks(node, context)
        return logs

List of Checks for Validator

checks_dict, defined here, maps each type of validation to the list of checks that must be run for it.
Each list of checks is a list of instantiated checker objects.
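A sketch of what such a mapping could look like. The keys and the `StubChecker` class are hypothetical, and the labels only borrow names from the checker lists above; the point is that the values are lists of checker instances (not classes), so a convention-specific list can extend the generic one.

```python
from typing import Dict, List


class StubChecker:
    """Hypothetical placeholder standing in for a real checker class."""

    def __init__(self, label: str):
        self.label = label

    def run(self, node: dict, context: dict) -> list:
        return []


generic_checks: List[StubChecker] = [StubChecker("TypeChecker"), StubChecker("AthenaChecker")]

checks_dict: Dict[str, List[StubChecker]] = {
    "generic": generic_checks,
    "v1.0.0": generic_checks + [StubChecker("V1SnakeCaseChecker")],  # hypothetical key
}

print([c.label for c in checks_dict["v1.0.0"]])
# ['TypeChecker', 'AthenaChecker', 'V1SnakeCaseChecker']
```

Sharing the same generic instances between lists is harmless here because the stub checkers keep no per-run state.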

Changelog

v0.9.16

Remove the upper bound of what properties samples may contain for Tetra Data validation. This means the samples schema can now include properties other than the ones in the samples Tetra Data component, such as primary and foreign key fields.

v0.9.15

Limit version of typing-extensions in dependencies to avoid a bug which causes the validator to always fail in Python 3.10 or later.

v0.9.14

Update the samples[*] check to optionally allow a pk_samples property of type "string".

v0.9.13

related_files is no longer checked against annotation fields like "description".

v0.9.12

Update check for samples[*].labels[*].source.name type: previously the type was required to be "string", now it is required to be either ["string", "null"] or "string", with "string" leading to a deprecation warning. This change makes this source definition the same as samples[*].properties[*].source in a backward-compatible way.

v0.9.11

Fix bug in AthenaChecker to allow root level IDS properties as partition paths.
Update TypeChecker to catch errors related to undefined/misspelled type key.
Update jsonschema version to fix package installation error

v0.9.10

Modify V1SnakeCaseChecker to skip checks for keys in the definitions object.
Add temporary allowance for @link in *.properties

v0.9.9

Lock jsonschema version in requirements.txt

v0.9.8

Modify RulesChecker to log missing and extra properties

v0.9.7

Allow properties with const values to have non-nullable type

v0.9.6

Add checker classes for generic validation
Add checker classes for v1.0.0 convention validation

Release history

0.10.5

Mar 8, 2024

0.10.4

Feb 7, 2024

0.10.3

Feb 2, 2024

0.10.2

Dec 13, 2023

0.10.1

Dec 12, 2023

0.10.0

Dec 11, 2023

This version

0.9.16

Jun 13, 2023

0.9.15

May 23, 2023

0.9.14

Jan 23, 2023

0.9.13

Dec 7, 2022

0.9.12

May 4, 2022

0.9.11

Feb 23, 2022

0.9.10

Feb 8, 2022

0.9.9

Jan 7, 2022

0.9.8

Dec 14, 2021

0.9.7

Dec 14, 2021

0.9.6

Nov 23, 2021

0.9.5

Nov 17, 2021

0.9.4

Nov 10, 2021

0.9.3

Nov 8, 2021

0.9.2

Nov 8, 2021

0.9.1

Nov 8, 2021

0.9.0

Nov 5, 2021

