Python utility for validating IDS
TetraScience IDS Validator
Overview
- Each validation check should lead to a pass or fail
- Find as many failures as possible before terminating the validator, to make it easier to fix what’s wrong.
- Take definitions into account by using the "jsonref" library
The validator will validate these files in an IDS folder:
- schema.json
- elasticsearch.json
- athena.json (upcoming)
You can find the validation rules in:
- IDS Design Conventions - schema.json
- IDS Design Conventions - elasticsearch.json
- IDS Design Conventions - athena.json
Usage
pipenv run python -m ids_validator --ids_dir=path/to/ids/folder
This will run the required checks for the @idsConventionVersion mentioned in schema.json.
If @idsConventionVersion is missing in schema.json, or if it is not supported by schema_validator, only generic checks will be run.
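The fallback described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the package's code: the function name select_check_set, the SUPPORTED_VERSIONS set, the "generic" label, and the idea that the version is stored as properties["@idsConventionVersion"]["const"] in schema.json are all illustrative.

```python
import json
from pathlib import Path

# Hypothetical set of convention versions that have dedicated checks.
SUPPORTED_VERSIONS = {"v1.0.0"}


def select_check_set(schema: dict) -> str:
    """Return the convention version to validate against, or "generic"."""
    # Assumption: the version is stored as a "const" under the top-level properties.
    version = (
        schema.get("properties", {})
        .get("@idsConventionVersion", {})
        .get("const")
    )
    # A missing or unsupported version falls back to the generic checks only.
    return version if version in SUPPORTED_VERSIONS else "generic"


def load_schema(ids_dir: str) -> dict:
    """Read schema.json from an IDS folder."""
    return json.loads((Path(ids_dir) / "schema.json").read_text())
```

For example, select_check_set(load_schema("path/to/ids/folder")) would give "generic" for a schema.json that carries no @idsConventionVersion.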
Components
Node
- The Node class (a UserDict subclass) is an abstraction for a dict in schema.json.
- When crawling schema.json, each key-value pair whose value is a dict is cast into the Node type.
- For each key-value pair, Node has the following attributes:
  - name (default=root): The key
  - data: The value (a dict)
  - path (default=root): The fully qualified path for the key in schema.json
- File: ids_node.py
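A minimal sketch of such a class, assuming the constructor takes the dict, the key, and the path in that order. The helper child_nodes is hypothetical, and joining keys with dots is an assumption about how path is built; the real ids_node.py may differ.

```python
from collections import UserDict


class Node(UserDict):
    """A dict from schema.json plus the key and path it was found under."""

    def __init__(self, schema=None, name="root", path="root"):
        super().__init__(schema or {})  # UserDict keeps the dict in self.data
        self.name = name
        self.path = path


def child_nodes(node):
    """Wrap every dict-valued entry of a node in a Node of its own."""
    return [
        Node(value, name=key, path=f"{node.path}.{key}")
        for key, value in node.data.items()
        if isinstance(value, dict)
    ]


root = Node({"type": "object", "properties": {"samples": {"type": "array"}}})
(properties,) = child_nodes(root)  # "type" maps to a str, so it is skipped
print(properties.name, properties.path)
```

Subclassing UserDict means a Node can still be indexed like the dict it wraps, while name and path carry the position in the schema.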
Checker Classes
- A checker class must implement AbstractChecker.
- When crawling schema.json, its run() method will be called for each node.
- run() implements the rules/conditions to be checked for validating the node.
- run() accepts two arguments:
  - node: Node: The Node for which we are running the checks
  - context: dict
    - It contains Python dicts for schema.json, athena.json and convention_version.
    - It is used to pass supplementary data required for running complex checks.
Validator
- The Validator class is the one that implements the crawler.
- It has the following attributes:
  - ids: dict: schema.json converted to a Python dict
  - athena: dict: athena.json converted to a Python dict
  - checks_list: A list of instantiated checker classes. This list of checks is run for each node.
- Validator.traverse_ids() crawls from Node to Node in ids: dict, calling run() for each checker in checks_list on the node.
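The crawl can be sketched as a small recursive traversal. This is an illustration under assumptions rather than the package's implementation: the logs attribute, the context keys, the traverse_ids parameters, and the PathRecorder checker are invented for the example, and Node is a simplified stand-in.

```python
from collections import UserDict


class Node(UserDict):
    """Simplified stand-in for the Node class described earlier."""

    def __init__(self, schema=None, name="root", path="root"):
        super().__init__(schema or {})
        self.name = name
        self.path = path


class Validator:
    def __init__(self, ids, athena, checks_list):
        self.ids = ids
        self.athena = athena
        self.checks_list = checks_list
        self.logs = {}  # hypothetical bookkeeping: path -> list of failures

    def traverse_ids(self, schema=None, name="root", path="root"):
        schema = self.ids if schema is None else schema
        node = Node(schema, name=name, path=path)
        # Assumed context keys; the real names may differ.
        context = {"schema.json": self.ids, "athena.json": self.athena}
        for checker in self.checks_list:
            failures = checker.run(node, context)
            if failures:
                self.logs.setdefault(path, []).extend(failures)
        # Recurse into every nested dict so each one is visited as a Node.
        for key, value in schema.items():
            if isinstance(value, dict):
                self.traverse_ids(value, name=key, path=f"{path}.{key}")


class PathRecorder:
    """Toy checker that never fails and records the paths it was run on."""

    def __init__(self):
        self.seen = []

    def run(self, node, context):
        self.seen.append(node.path)
        return []


recorder = PathRecorder()
validator = Validator({"properties": {"a": {"type": "string"}}}, {}, [recorder])
validator.traverse_ids()
print(recorder.seen)  # ['root', 'root.properties', 'root.properties.a']
```

Because every checker runs on every node, a failure in one check does not stop the crawl, which fits the goal of finding as many failures as possible in one pass.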
List of Checker Classes
Base Classes
- AbstractChecker
  - Every checker class must implement it.
  - File: abstract_checker.py
- RuleBasedChecker
  - It is a base class that allows validating a Node against a set of rules.
  - It comes in handy for implementing checks for property Nodes that have a predefined template.
  - The child class inheriting RuleBasedChecker must define rules.
  - rules is a dict that maps Node.path to a set of rules (a dict).
  - The set of rules for a Node.path may contain the following keys:
    - type: str: defines what the type value for the Node should be
    - min_properties: list: defines the minimum set of property names that must exist for the Node. More properties can exist in addition to min_properties.
    - properties: list: defines a set of property names that must exactly match the property list of the Node
    - min_required: list: The required list of the Node must at least contain the values mentioned in min_required
    - required: list: The required list of the Node must only contain values listed in required
- Rules-based checkers defined for v1 conventions can be found here
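To illustrate how the five rule keys could be applied to one node, here is a minimal sketch. The function check_rules, the message wording, and the example path root.properties.widget are invented for illustration; they are not taken from the package or from the v1 conventions.

```python
def check_rules(node_data: dict, rules: dict) -> list:
    """Return failure messages for one node checked against one rule set."""
    failures = []
    props = set(node_data.get("properties", {}))
    required = set(node_data.get("required", []))

    if "type" in rules and node_data.get("type") != rules["type"]:
        failures.append(f"'type' must be {rules['type']!r}")
    if "min_properties" in rules:
        missing = set(rules["min_properties"]) - props
        if missing:  # extra properties are allowed, missing ones are not
            failures.append(f"missing properties: {sorted(missing)}")
    if "properties" in rules and props != set(rules["properties"]):
        failures.append(f"properties must be exactly {sorted(rules['properties'])}")
    if "min_required" in rules:
        missing = set(rules["min_required"]) - required
        if missing:
            failures.append(f"'required' is missing: {sorted(missing)}")
    if "required" in rules:
        extra = required - set(rules["required"])
        if extra:  # 'required' may only contain the listed values
            failures.append(f"'required' has unexpected values: {sorted(extra)}")
    return failures


# Hypothetical rule table keyed by Node.path.
RULES = {
    "root.properties.widget": {
        "type": "object",
        "min_properties": ["id"],
        "min_required": ["id"],
        "required": ["id"],
    },
}

widget = {
    "type": "object",
    "properties": {"id": {}, "name": {}},
    "required": ["id", "name"],
}
print(check_rules(widget, RULES["root.properties.widget"]))
```

Here the extra property name is accepted because only min_properties is set, while the extra entry in the required list is reported.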
Generic
- AdditionalPropertyChecker: additional_property.py
- RequiredPropertiesChecker: required_property.py
- DatacubesChecker: datacubes.py
- RootNodeChecker: root_node.py
- TypeChecker: type_check.py
- AthenaChecker: athena.py
V1
- V1ChildNameChecker: child_name.py
- V1ConventionVersionChecker: convention_version_check.py
- V1SystemNodeChecker: nodes_checker.py
- V1SampleNodeChecker: nodes_checker.py
- V1UserNodeChecker: nodes_checker.py
- V1RootNodeChecker: root_node.py
- V1SnakeCaseChecker: snake_case.py
Writing New Checks
- Checkers must implement AbstractChecker.
- The run() method implements one or more checks for the node.
- In case of no failure, an empty list must be returned.
- In case of failures, it must return a list of one or more tuples.
- Each tuple contains two values:
  - log message: str: The message to be logged when the check fails
  - criticality: either Log.CRITICAL or Log.WARNING
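A checker that follows this contract could look like the sketch below. Log and AbstractChecker are local stand-ins so the example runs on its own (the real classes live in the package, and the actual values behind Log.CRITICAL and Log.WARNING are not shown in this README); DescriptionChecker and its rule are hypothetical.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class Log:
    """Stand-in for the criticality levels named above; values are assumed."""

    CRITICAL = "critical"
    WARNING = "warning"


class AbstractChecker(ABC):
    """Stand-in for the package's abstract base class."""

    @abstractmethod
    def run(self, node, context):
        """Return [] on success, else a list of (message, criticality) tuples."""


class DescriptionChecker(AbstractChecker):
    """Hypothetical check: warn when a node with a "type" has no "description"."""

    def run(self, node, context):
        logs = []
        if "type" in node.data and "description" not in node.data:
            logs.append((f"{node.path}: 'description' is missing", Log.WARNING))
        return logs


# SimpleNamespace stands in for a Node with name, path and data attributes.
node = SimpleNamespace(name="a", path="root.properties.a", data={"type": "string"})
print(DescriptionChecker().run(node, context={}))
```

Returning a list rather than raising on the first problem lets one run() call report several failures for the same node.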
Extending Checker Classes
Pattern 1
class ChildChecker(ParentChecker):
    def run(self, node: Node, context: dict):
        logs = []
        # Implement new checks and append failures to logs
        # Run the parent's checks and append their logs
        logs += super().run(node, context)
        return logs
If the checks_list passed to Validator contains ChildChecker, then it must not contain ParentChecker in the same list.
Doing so will cause ParentChecker to run twice and log any failures twice.
TODO: Instead of return logs, we can return set(logs) to remove duplicates, but we cannot avoid executing the same code twice.
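The double run is easy to reproduce with stand-in classes. In this sketch, run_all and the calls counter are hypothetical helpers added for the demonstration. It also uses dict.fromkeys as an order-preserving alternative to set(logs), since a set would discard the order of the messages.

```python
class ParentChecker:
    calls = 0  # counts how many times the parent's check body executes

    def run(self, node, context):
        ParentChecker.calls += 1
        return [("parent check failed", "critical")]


class ChildChecker(ParentChecker):
    def run(self, node, context):
        logs = [("child check failed", "warning")]
        logs += super().run(node, context)
        return logs


def run_all(checks_list, node, context):
    """Run every checker on one node and collect all failures."""
    logs = []
    for checker in checks_list:
        logs += checker.run(node, context)
    return logs


# Listing both classes reports the parent's failure twice.
both = run_all([ParentChecker(), ChildChecker()], node=None, context={})
# Tuples are hashable, so dict.fromkeys drops repeats and keeps first-seen order.
deduplicated = list(dict.fromkeys(both))
print(len(both), len(deduplicated), ParentChecker.calls)  # 3 2 2
```

De-duplication cleans up the report, but the counter confirms the parent's code still executed twice, so leaving ParentChecker out of the list remains the real fix.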
Pattern 2
class ChildChecker(ParentChecker):
    def run(self, node: Node, context: dict):
        logs = []
        # Implement new checks and append failures to logs
        # Use or override helper functions of the parent class
        return logs
Running Checks for Specific Nodes
class AdhocChecker(AbstractChecker):
    def run(self, node: Node, context: dict):
        logs = []
        # paths is a list of fully qualified paths to keys in schema.json.
        # Each path must start from root, e.g.:
        #   root.samples
        #   root.samples.items.properties.property_name
        paths = []
        if node.path in paths:
            # Implement new checks and append failures to logs
            logs += perform_new_checks(node, context)
        return logs
List of Checks for Validator
- checks_dict, defined here, maps the type of validation that we want to perform to the list of checks that need to be run for that validation.
- The list of checks is actually a list of instantiated checker objects.
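The shape of such a mapping might look like the following sketch. The keys "generic" and "v1.0.0" and the two stub classes are assumptions made for illustration; the real checks_dict and its keys are defined in the package.

```python
class StubGenericChecker:
    """Hypothetical stand-in for a generic checker class."""

    def run(self, node, context):
        return []


class StubV1Checker:
    """Hypothetical stand-in for a v1 convention checker class."""

    def run(self, node, context):
        return []


# The lists hold checker instances, not classes.
generic_checks = [StubGenericChecker()]

checks_dict = {
    "generic": generic_checks,
    # Assumed layout: a convention-specific list extends the generic checks.
    "v1.0.0": generic_checks + [StubV1Checker()],
}

checks_list = checks_dict["v1.0.0"]
print([type(checker).__name__ for checker in checks_list])
```

Because the values are instances, the same checker object can appear in more than one list, so any state a checker keeps would be shared between validation types.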
Changelog
v0.9.11
- Fix bug in AthenaChecker to allow root level IDS properties as partition paths.
- Update TypeChecker to catch errors related to undefined/misspelled type key.
- Update jsonschema version to fix package installation error
v0.9.10
- Modify V1SnakeCaseChecker to ignore checks for keys present in definitions object.
- Add temporary allowance for @link in *.properties
v0.9.9
- Lock jsonschema version in requirements.txt
v0.9.8
- Modify RulesChecker to log missing and extra properties
v0.9.7
- Allow properties with const values to have non-nullable type
v0.9.6
- Add checker classes for generic validation
- Add checker classes for v1.0.0 convention validation