Python utility for validating IDS
TetraScience IDS Validator
Overview
The TetraScience IDS Validator checks that IDS artifacts follow a set of rules which make them compatible with the Tetra Data Platform, and optionally that they are compatible with additional IDS design conventions. The validator either passes, or fails with a list of the checks which led to the failure.
The validator checks these files in an IDS folder:
- schema.json
- elasticsearch.json
- athena.json
You can find the validation rules in:
- IDS Design Conventions - schema.json
- IDS Design Conventions - elasticsearch.json
- IDS Design Conventions - athena.json
Version
v0.9.16
Usage
poetry run python -m ids_validator --ids_dir=path/to/ids/folder
This will run the required checks for the @idsConventionVersion mentioned in schema.json.
If @idsConventionVersion is missing in schema.json or if it is not supported by schema_validator, only generic checks will be run.
Components
Node
- The Node: UserDict class is an abstraction for a dict in schema.json.
- When crawling schema.json, each key-value pair whose value is a dict is cast into the Node type.
- For each key-value pair, Node has the following attributes:
  - name (default=root): The key
  - data: The value: dict
  - path (default=root): The fully-qualified path for the key in schema.json
- File: ids_node.py
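The attributes above can be pictured with a small stand-in. This is a sketch under stated assumptions, not the class in ids_node.py: SketchNode, its constructor signature and its child() helper are hypothetical, and joining keys with dots is a simplification, so the paths it prints may not match the path format the validator actually produces. Only the name, data and path attributes and their root defaults come from the description above.

```python
from collections import UserDict


class SketchNode(UserDict):
    """Hypothetical stand-in for Node: a dict that remembers where it sits in schema.json."""

    def __init__(self, schema=None, name="root", path="root"):
        super().__init__(schema or {})  # UserDict stores the mapping in self.data
        self.name = name  # the key under which this dict was found
        self.path = path  # fully-qualified path of that key

    def child(self, key):
        """Wrap a nested dict value as a node whose path extends this node's path."""
        return SketchNode(self.data[key], name=key, path=f"{self.path}.{key}")


schema = {"type": "object", "properties": {"samples": {"type": "array"}}}
root = SketchNode(schema)
samples = root.child("properties").child("samples")
print(samples.name, samples.path)
```

Subclassing UserDict keeps the usual dict interface (node["type"], "type" in node) while leaving room for the extra name and path attributes.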
Checker Classes
- A checker class must implement AbstractChecker.
- When crawling schema.json, its run() method will be called for each node.
- run() implements the rules/conditions to be checked for validating the node.
- run() accepts two arguments:
  - node: Node: The Node for which we are running the checks
  - context: dict:
    - It contains Python dicts for schema.json, athena.json and convention_version.
    - It is used to pass supplementary data required for running complex checks.
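The contract described above might look roughly like the following. This is a minimal sketch, not the library's code: SketchAbstractChecker and RootTypeChecker are hypothetical names, the node is a plain namespace rather than a real Node, the rule being checked is invented for illustration, the string "CRITICAL" stands in for the library's log level, and the context key names are assumed from the description.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class SketchAbstractChecker(ABC):
    """Hypothetical stand-in for AbstractChecker."""

    @abstractmethod
    def run(self, node, context):
        """Check one node; context carries data shared by all checks."""


class RootTypeChecker(SketchAbstractChecker):
    """Illustrative check that reads the whole schema from context."""

    def run(self, node, context):
        schema = context["schema.json"]
        if node.path == "root" and schema.get("type") != "object":
            return [("root type should be 'object'", "CRITICAL")]
        return []


schema = {"type": "array"}
# Key names are assumed from the description of context above.
context = {"schema.json": schema, "athena.json": {}, "convention_version": "v1.0.0"}
root = SimpleNamespace(name="root", path="root", data=schema)
print(RootTypeChecker().run(root, context))
```

Because run() is declared abstract, Python refuses to instantiate a checker that forgets to define it, which catches an incomplete checker before any crawling starts.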
Validator
- The Validator class is the one that implements the crawler.
- It has the following attributes:
  - ids: dict: schema.json converted to a Python dict
  - athena: dict: athena.json converted to a Python dict
  - checks_list: A list of instantiated checker classes. This list of checks will be run for each node.
- Validator.traverse_ids() crawls from Node to Node in ids: dict, calling run() for each checker in the checks_list on the node.
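The crawl described above can be sketched as a recursive walk. This is an illustration only, not the body of Validator.traverse_ids(): the traverse function, SketchNode and PathRecorder are hypothetical, and the depth-first order and dotted path format are assumptions.

```python
from typing import NamedTuple


class SketchNode(NamedTuple):
    """Hypothetical stand-in for Node."""
    name: str
    path: str
    data: dict


class PathRecorder:
    """Illustrative checker that reports nothing and only records visited paths."""

    def __init__(self):
        self.visited = []

    def run(self, node, context):
        self.visited.append(node.path)
        return []


def traverse(node, checks_list, context):
    """Run every checker on the node, then recurse into each dict-valued key."""
    logs = []
    for checker in checks_list:
        logs += checker.run(node, context)
    for key, value in node.data.items():
        if isinstance(value, dict):
            child = SketchNode(key, f"{node.path}.{key}", value)
            logs += traverse(child, checks_list, context)
    return logs


ids = {"type": "object", "properties": {"samples": {"type": "array", "items": {}}}}
recorder = PathRecorder()
logs = traverse(SketchNode("root", "root", ids), [recorder], context={})
print(recorder.visited)
```

Every checker sees every node, so a checker that only cares about certain paths has to filter on node.path itself.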
List of Checker Classes
Base Classes
- AbstractChecker
  - Every checker class must implement it.
  - File: abstract_checker.py
- RuleBasedChecker
  - It is a base class that allows validating a Node against a set of rules.
  - It comes in handy for implementing checks for property Nodes that have a predefined template.
  - The child class inheriting RuleBasedChecker must define rules.
  - rules is a dict that maps Node.path to a set of rules: dict.
  - The set of rules for a Node.path may contain the following items:
    - "type", Union[List[str], str]: defines what the type value should be for the Node
    - "compatible_type", ids_validator.checks.rules_checker.BackwardCompatibleType: defines the allowable type values for a Node, matching either a preferred type, or one of a list of deprecated types which will lead to a warning
    - "min_properties", List[str]: defines the minimum set of property names that must exist for the Node. More properties can exist in addition to min_properties
    - "min_required", List[str]: the required list of the Node must at least contain the values mentioned in min_required
    - "required", List[str]: the required list of the Node must only contain values listed in required
- Rules-based checkers defined for v1 conventions can be found here
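To make the rule items above concrete, here is one way they could be evaluated against a node. This is a sketch, not the logic in rules_checker.py: apply_rules, the "root.example" path and the property names are all hypothetical, and "compatible_type" is left out because its class is not described here in enough detail.

```python
def apply_rules(path, node_schema, rules):
    """Return failure messages for one node, using the rule set registered for its path."""
    rule_set = rules.get(path)
    if rule_set is None:
        return []  # no rules registered for this path

    failures = []
    expected_type = rule_set.get("type")
    if expected_type is not None and node_schema.get("type") != expected_type:
        failures.append(f"type must be {expected_type!r}")

    properties = set(node_schema.get("properties", {}))
    required = set(node_schema.get("required", []))

    missing = set(rule_set.get("min_properties", [])) - properties
    if missing:
        failures.append(f"missing properties: {sorted(missing)}")

    missing = set(rule_set.get("min_required", [])) - required
    if missing:
        failures.append(f"'required' must also list: {sorted(missing)}")

    if "required" in rule_set:
        extra = required - set(rule_set["required"])
        if extra:
            failures.append(f"'required' may only list: {sorted(rule_set['required'])}")
    return failures


# Hypothetical rules for a hypothetical path.
rules = {
    "root.example": {
        "type": "object",
        "min_properties": ["id", "name"],
        "min_required": ["id"],
    }
}
node_schema = {"type": "object", "properties": {"id": {"type": "string"}}, "required": []}
for message in apply_rules("root.example", node_schema, rules):
    print(message)
```

The difference between the last two items shows up in the set arithmetic: "min_required" only looks for missing entries, while "required" rejects anything outside the listed values.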
Generic
- AdditionalPropertyChecker: additional_property.py
- RequiredPropertiesChecker: required_property.py
- DatacubesChecker: datacubes.py
- RootNodeChecker: root_node.py
- TypeChecker: type_check.py
- AthenaChecker: athena.py
V1
- V1ChildNameChecker: child_name.py
- V1ConventionVersionChecker: convention_version_check.py
- V1SystemNodeChecker: nodes_checker.py
- V1SampleNodeChecker: nodes_checker.py
- V1UserNodeChecker: nodes_checker.py
- V1RootNodeChecker: root_node.py
- V1SnakeCaseChecker: snake_case.py
Writing New Checks
- Checkers must implement AbstractChecker.
- The run() method implements one or more checks for the node.
- In case of no failure, an empty list must be returned.
- In case of failures, it must return a list of one or more tuples.
- Each tuple will contain two values:
  - log message: str: The message to be logged when the check fails
  - criticality: either Log.CRITICAL or Log.WARNING
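The return value described above could take the following shape. This is a sketch with stand-ins: the Log enum here is a local substitute for the library's Log (only the names CRITICAL and WARNING come from the text), ExampleChecker and the two rules it enforces are invented for illustration, and the node is a plain namespace rather than a real Node.

```python
from enum import Enum
from types import SimpleNamespace


class Log(Enum):
    """Local stand-in for the library's Log; the member values are arbitrary."""
    CRITICAL = "critical"
    WARNING = "warning"


class ExampleChecker:
    """Illustrative checker; a real one would implement AbstractChecker."""

    def run(self, node, context):
        logs = []
        if "type" not in node.data:
            logs.append((f"{node.path}: 'type' is missing", Log.CRITICAL))
        if "description" not in node.data:
            logs.append((f"{node.path}: 'description' is missing", Log.WARNING))
        return logs


node = SimpleNamespace(name="samples", path="root.samples", data={"type": "array"})
for message, criticality in ExampleChecker().run(node, context={}):
    print(criticality.name, message)
```

Collecting every failure before returning, rather than stopping at the first one, lets a single run() call report all problems found on a node.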
Extending Checkers Classes
Pattern 1
class ChildChecker(ParentChecker):
    def run(self, node: Node, context: dict):
        logs = []
        # Implement new checks and append failures to logs
        # Run the parent checker and append its logs
        logs += super().run(node, context)
        return logs
If the checks_list passed to Validator contains ChildChecker, then it must not contain ParentChecker in the same list.
Doing so will cause ParentChecker to run twice and populate its failure logs, if any, twice.
Pattern 2
class ChildChecker(ParentChecker):
    def run(self, node: Node, context: dict):
        logs = []
        # Implement new checks and append failures to logs
        # Use or override helper functions of the parent class
        return logs
Running Checks for Specific Nodes
class AdhocChecker(AbstractChecker):
    def run(self, node: Node, context: dict):
        logs = []
        # paths is a list of fully-qualified paths to keys in schema.json
        # each path must start from root
        # eg: root.samples
        # eg: root.samples.items.properties.property_name
        paths = []
        if node.path in paths:
            # Implement new checks and append failures to logs;
            # perform_new_checks is a placeholder for your own function
            logs += perform_new_checks(node, context)
        return logs
List of Checks for Validator
- checks_dict, defined here, maps the type of validation that we want to perform to the list of checks that need to be run for that validation.
- The list of checks is actually a list of instantiated checker objects.
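A mapping of this kind could be organised as follows. This is a sketch, not the real checks_dict: the key names, the Sketch* classes and the select_checks helper are hypothetical, and the inheritance between the two root-node checkers is assumed purely to illustrate the rule about not listing a parent and its child together.

```python
class SketchTypeChecker:
    def run(self, node, context):
        return []


class SketchRootNodeChecker:
    def run(self, node, context):
        return []


class SketchV1RootNodeChecker(SketchRootNodeChecker):
    """Assumed here to extend the generic checker and call super().run()."""

    def run(self, node, context):
        return super().run(node, context)


GENERIC = "generic"  # hypothetical key for convention-independent checks

# Each value is a list of instantiated checkers, not checker classes.
sketch_checks_dict = {
    GENERIC: [SketchTypeChecker(), SketchRootNodeChecker()],
    # The child replaces its parent so the parent's checks do not run twice.
    "v1.0.0": [SketchTypeChecker(), SketchV1RootNodeChecker()],
}


def select_checks(convention_version):
    """Fall back to the generic checks when the version is missing or unsupported."""
    return sketch_checks_dict.get(convention_version, sketch_checks_dict[GENERIC])


print([type(checker).__name__ for checker in select_checks("v1.0.0")])
print([type(checker).__name__ for checker in select_checks(None)])
```

The fallback in select_checks mirrors the behaviour described under Usage, where a missing or unsupported @idsConventionVersion results in only generic checks being run.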
Changelog
v0.9.16
- Remove the upper bound of what properties samples may contain for Tetra Data validation. This means the samples schema can now include properties other than the ones in the samples Tetra Data component, such as primary and foreign key fields.
v0.9.15
- Limit version of typing-extensions in dependencies to avoid a bug which causes the validator to always fail in Python 3.10 or later.
v0.9.14
- Update samples[*] check to optionally allow it to contain a property pk_samples of type "string".
v0.9.13
- related_files is no longer checked against annotation fields like "description".
v0.9.12
- Update check for samples[*].labels[*].source.name type: previously the type was required to be "string", now it is required to be either ["string", "null"] or "string", with "string" leading to a deprecation warning. This change makes this source definition the same as samples[*].properties[*].source in a backward-compatible way.
v0.9.11
- Fix bug in AthenaChecker to allow root level IDS properties as partition paths.
- Update TypeChecker to catch errors related to undefined/misspelled type key.
- Update jsonschema version to fix package installation error
v0.9.10
- Modify V1SnakeCaseChecker to ignore checks for keys present in definitions object.
- Add temporary allowance for @link in *.properties
v0.9.9
- Lock jsonschema version in requirements.txt
v0.9.8
- Modify RulesChecker to log missing and extra properties
v0.9.7
- Allow properties with const values to have non-nullable type
v0.9.6
- Add checker classes for generic validation
- Add checker classes for v1.0.0 convention validation