cd2t validates data structure, data types and values with templates

cd2t

repository: https://gitlab.com/ko.no/cd2t

Key Features

  • Feature Rich Data Type and Value Validation
  • Unlimited Data Structure: Recursive linking of data types like lists or objects can represent any data structure.
  • Data Structure Nesting: Sub schemas allow you to define repeating data structures only once. Sub schemas can be nested without limit. Loops are not allowed.
  • Multi Data Support: Multiple data sources can be checked with one schema or many schemas. You can switch schemas while iterating over data sources. Referencing and autogeneration work across schemas and data sources by using namespaces, but can also be limited to the current data.
  • Referencing: Referencing can check the uniqueness of values at different positions in the data structure (e.g. lists of objects with an ID attribute). It can also enforce a consumer/producer model. For example, strings at some positions can be collected as producers. Strings at other positions must then match one of those produced strings. The scope of references can be limited to a namespace.
  • Value Autogeneration: Some data types support creation of non-existing values. For example, unique IDs can be added to the data structure. Uniqueness can be limited to namespaces.
  • Schema Validation: Typos, syntax mistakes or missing required options are reported as SchemaErrors (Exception) during schema loading. The reason and the path through the schema structure are provided.

Change Log

Version 1.6.2:

  • Fix: Reference false positives, if Validator object changes
  • Fix: Sorting of Validation or Autogeneration findings

Version 1.6.1:

  • Fix: Data Type shortcut mode with classic dictionary schema

Version 1.6.0:

  • Changed: Python 3.9 or higher required
  • New: Data Type Shortcuts - Specify data type with defaults as string only
  • Add: Validator API - Method to get reference findings
  • Changed: Multitype Data Type - Enhanced data type support

Version 1.5.0:

  • New: Integer Data Type - Autogenerate random value
  • New: Integer Data Type - Autogenerate default value
  • New: Improved Reference Findings
  • Changed: All Data Types - Schema and data path adjustment

Version 1.4.0:

  • New: None Data Type added
  • Fix: Consumer namespace lookup/linking failed, if producer was processed later.

Version 1.3.1:

  • Fix: Schema Data Type - Validation failed

Version 1.3:

  • New: String Data Type - Consumer referencing with namespace lookup for providers support
  • Changed: Splitting Data Validation (Validator Class) and Data Autogeneration (Autogenerator Class)
  • New: 2-Phase Autogeneration
    • Phase 1: Build references for all data sources
    • Phase 2: Autogenerate data based only on full reference information
  • Fix: Object Data Type - Attribute dependencies with regex fixed
  • Fix: Object Data Type - Preserve attribute order during autogeneration

Version 1.2:

  • New: Bool Data Type - Validate allowed value
  • New: Findings - Namespace information added
  • Changed/Fix: Validator method name 'change_namespace'
  • Fix: Schema Data Type - Subschema loop false positives
  • Fix: Referencing - Options for scope definition values
  • Fix: Referencing - Namespace local consumer producer linking
  • Fix: Object Data Type - Dependencies false positives

Version 1.1: withdrawn from pypi.org

  • New: Object Data Type - Allow empty dictionaries in validation
  • Changed: List Data Type - Option 'duplicates' renamed to 'allow_duplicates'

Version 1.0: withdrawn from pypi.org

Data Structure Schema

name: < str >
description: < str >
allow_data_type_shortcuts: < bool | default -> false >
# Shortcuts are Data Type strings instead of dictionaries.
# Corresponding data type with default options is used.

root: { data type schema }

subschemas:
  < sub schema name >: { data type schema }
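
As an illustration, a small schema following the layout above could look like the dictionary below. This is the form a schema has after a YAML file was parsed, as in the Python Code Example. The schema name, the attribute names and the subschema name 'host' are hypothetical; the option keys are the ones documented under Data Type Options.

```python
# Hypothetical schema: a list of host objects, each with a name and a port.
schema = {
    "name": "hosts",
    "description": "List of hosts with a service port",
    "allow_data_type_shortcuts": False,
    "root": {
        "type": "list",
        "minimum": 1,
        # Each list element is described by the subschema named 'host'.
        "elements": {"type": "schema", "subschema": "host"},
    },
    "subschemas": {
        "host": {
            "type": "object",
            "attributes": {
                "name": {"type": "string", "minimum": 1},
                "port": {"type": "integer", "minimum": 1, "maximum": 65535},
            },
            "required_attributes": ["name"],
        },
    },
}

# Data of the shape this schema is meant to describe.
data = [{"name": "web01", "port": 443}, {"name": "db01"}]
```

Defining the object once under 'subschemas' keeps 'root' short and lets other positions in the schema reuse the same definition.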

Data Type Options

Any Data Type

Description
This data type represents any data. The validator stops further data validation or autogeneration.

Schema Path Symbol: *

Limitations

  • referencing is not supported
  • autogeneration is not supported

Any Schema Keys

type: 'any' # If type is omitted, validator uses Any Data Type

Bool Data Type

Description
This data type represents a boolean value (true/false).

Schema Path Symbol: !

Limitations

  • referencing is not supported

Bool Schema Keys

type: 'bool'

allowed_value: < bool > # true or false

autogenerate: < bool | default -> false >
# Autogenerate the default value, if data is not existing.
# Requires 'autogenerate_default'

autogenerate_default: < bool > # true or false; must match 'allowed_value' (if set)

Enum Data Type

Description
This data type represents a selection of allowed values.

Schema Path Symbol: <<

Limitations

  • referencing is not supported
  • autogeneration is not supported

Enum Schema Keys

type: 'enum'

allowed_values: # required
- < value >

Float Data Type

Description
This data type represents float values.

Schema Path Symbol: .

Float Schema Keys

type: 'float'

reference: { unique options }

maximum: < float >
# value must be lower or equal to this

minimum: < float >
# value must be greater or equal to this

maximum_decimals: < int > # >= 0
# Maximum allowed decimal places.

allowed_values:
- < float > # value must match this value
- round: < int > # value rounded to < int > digits must match 'matches'
  matches: < float >
- range_start: < float > # 'range_start' <= value <= 'range_end'
  range_end: < float >
# List of directives which values must match.

not_allowed_values:
- < float > # value mustn't match this value
- round: < int > # value rounded to < int > digits mustn't match 'matches'
  matches: < float >
- range_start: < float > # value < 'range_start' or value > 'range_end'
  range_end: < float >
# List of directives which values mustn't match.

autogenerate: < bool | default -> false >
# uses 'autogenerate_default' value.
#
# OR
#
# try for 'autogenerate_random_tries' times:
#   1. Create a random float value, which is within the 'autogenerate_ranges'
#      or 'minimum' <= random value <= 'maximum'
#   2. Check if random value passes the validation process.

autogenerate_default: < float >
# Autogenerate uses this value.

autogenerate_random_tries: < int | default -> 10 > # 0 < x < 50
# Ignored, if 'autogenerate_default' is set.
# Maximum amount of tries to find a random float value, which is not used by any reference.
# Integer value must be greater than 0 and lower than 50.

autogenerate_ranges:
- minimum: < float >
  maximum: < float >
# Ignored, if 'autogenerate_default' is set.
# '[.]minimum' <= '[.]maximum'
# Autogenerated float is within the ranges.
# If omitted, global 'minimum' and 'maximum' limits the random value.

autogenerate_random_decimals: < int | default -> 2 > # >= 0
# Ignored, if 'autogenerate_default' is set.
# Limit the decimal places for the random value.

Validation Process
If options are missing, corresponding checks are skipped.

  1. value >= minimum
  2. value <= maximum
  3. value rounded to maximum_decimals == value
  4. value is not in not_allowed_values
  5. value is in allowed_values
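
The order of checks above can be sketched in plain Python. This is not cd2t's implementation: the function names are hypothetical, and collecting every failed check (instead of stopping at the first one) is an assumption. It only mirrors the documented options and the three kinds of directives in 'allowed_values' and 'not_allowed_values'.

```python
def matches_directive(value, directive):
    """Check one entry of 'allowed_values' or 'not_allowed_values'."""
    if isinstance(directive, dict):
        if "round" in directive:
            # Value rounded to 'round' digits must equal 'matches'.
            return round(value, directive["round"]) == directive["matches"]
        # Otherwise the directive is an inclusive range.
        return directive["range_start"] <= value <= directive["range_end"]
    return value == directive


def check_float(value, options):
    """Return the names of failed checks; missing options skip their check."""
    errors = []
    if "minimum" in options and value < options["minimum"]:
        errors.append("minimum")
    if "maximum" in options and value > options["maximum"]:
        errors.append("maximum")
    decimals = options.get("maximum_decimals")
    if decimals is not None and round(value, decimals) != value:
        errors.append("maximum_decimals")
    if any(matches_directive(value, d) for d in options.get("not_allowed_values", [])):
        errors.append("not_allowed_values")
    if "allowed_values" in options and not any(
        matches_directive(value, d) for d in options["allowed_values"]
    ):
        errors.append("allowed_values")
    return errors


print(check_float(1.256, {"minimum": 0.0, "maximum_decimals": 2}))
```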

ID-List Data Type

Description
This data type represents a dictionary, where keys are IDs. IDs can be strings or integers.

Schema Path Symbol: {id}

Limitations

  • autogeneration is not supported

ID-List Schema Keys

type: 'id_list'

reference: { unique options }
# Note: Every ID is referenced as a value with the 'reference.key'.

minimum: <int | default -> 0 > # >= 0
# Minimum required amount of IDs

maximum: < int > # >= 0
# Maximum allowed amount of IDs
# If omitted, even an empty id_list is allowed.

elements: { data type schema } # required
# Data schema defining element data type

id_type: < 'integer' | 'string' | default -> 'string' >
# Indicates if IDs are integer or string

id_minimum: <int | default -> 0 > # >= 0
# Minimum required ID string length or minimum ID integer value

id_maximum: < int > # >= 0
# Maximum allowed ID string length or maximum ID integer value
# If omitted, even '' is allowed as ID string.

allowed_ids:
- < string | integer >
# List of regex strings or integers - depending on 'id_type'
# If ID matches any of it, the ID is allowed.

not_allowed_ids:
- < string | integer >
# List of regex strings or integers - depending on 'id_type'
# If ID matches any of it, the ID is not allowed.
# 'not_allowed_ids' are tested before 'allowed_ids'.
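
For orientation, an ID-List schema fragment and data of the matching shape might look as follows. The VLAN use case and all names are hypothetical; only the option keys come from the listing above.

```python
# Hypothetical schema fragment: VLANs keyed by their integer ID.
vlan_schema = {
    "type": "id_list",
    "id_type": "integer",
    "id_minimum": 1,       # minimum ID integer value
    "id_maximum": 4094,    # maximum ID integer value
    "not_allowed_ids": [1],
    "minimum": 1,          # at least one ID is required
    "elements": {
        "type": "object",
        "attributes": {"name": {"type": "string"}},
        "required_attributes": ["name"],
    },
}

# Data of the shape described by the schema: a dictionary whose keys are IDs.
vlans = {
    10: {"name": "users"},
    20: {"name": "servers"},
}
```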

Integer Data Type

Description
This data type represents integer values.

Schema Path Symbol: +

Integer Schema Keys

type: 'integer'

reference: { unique options }

maximum: < int >
# value must be lower or equal to this

minimum: < int >
# value must be greater or equal to this

not_allowed_values:
- < int >
# List of integers which values mustn't match.

autogenerate: < bool | default -> false >
# Requires 'reference.key' to be defined and not ''.
# If no unique value could be generated, autogeneration fails.

autogenerate_default: < int >
# Generate this integer value if value is None. Ignores all other 'autogenerate_*' options.

autogenerate_maximum: < int >
# Autogenerated integer must be lower than or equal to this.
# If omitted, 'maximum' key is upper limit

autogenerate_minimum: < int >
# Autogenerated integer must be greater than or equal to this.
# If omitted, 'minimum' key is lower limit

autogenerate_find: < 'next_higher' | 'next_lower' | 'random' | default -> 'next_higher' >
# Ignored, if 'autogenerate_default' is set.
# Tells autogenerate to try first available integer value
# starting at 'minimum' and increasing ('next_higher'),
# starting at 'maximum' and decreasing ('next_lower') or
# picking a random number within 'minimum' and 'maximum'.
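
The three 'autogenerate_find' strategies can be sketched as follows. This is an illustration only, not cd2t's code: the function name, the 'used' set (standing in for values already known to the reference) and the 'tries' parameter for the random mode are assumptions.

```python
import random


def find_free_integer(used, minimum, maximum, find="next_higher", tries=10):
    """Return an integer in [minimum, maximum] that is not in 'used', or None."""
    if find == "next_higher":
        # Start at 'minimum' and count upwards.
        candidates = range(minimum, maximum + 1)
    elif find == "next_lower":
        # Start at 'maximum' and count downwards.
        candidates = range(maximum, minimum - 1, -1)
    elif find == "random":
        # Draw a limited number of random candidates within the limits.
        candidates = (random.randint(minimum, maximum) for _ in range(tries))
    else:
        raise ValueError(f"unknown find mode: {find}")
    for candidate in candidates:
        if candidate not in used:
            return candidate
    return None  # No unique value found: autogeneration fails.


print(find_free_integer({1, 2, 4}, minimum=1, maximum=10))
```

Returning None corresponds to the documented case where no unique value could be generated.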

List Data Type

Description
This data type represents a list whose elements all share the same data type.
If different data types are allowed in the list, use data type 'multitype' as elements.

Schema Path Symbol: []

Limitations

  • referencing is not supported - use referencing in the 'elements' data type
  • autogeneration of list elements is not supported - but autogeneration within existing elements data structure is supported (pass-through).

List Schema Keys

type: 'list' # required

elements: { data type schema }  # required
# Data schema defining elements data type

minimum: <int | default -> 0 > # >= 0
# Minimum required amount of elements in the list.

maximum: < int > # >= 0
# Maximum allowed amount of elements in the list.

allow_duplicates: < bool | default -> true >
# Allow same element data multiple times

Multitype Data Type

Description
This data type represents a selection of allowed data types.

Schema Path Symbol: ?

Limitations

  • referencing is not supported - use referencing in the data types listed under 'types'
  • autogeneration of data types is not supported - but autogeneration within existing data structure is supported (pass-through).
  • Nesting a Multitype inside another Multitype is not allowed

Multitype Schema Keys

type: 'multitype'
types: # required
- { data type schema }
# List of data type schemas.
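
As a short illustration of combining List and Multitype, the fragment below describes a list whose elements may be integers or strings. The variable names and sample values are hypothetical; the option keys are taken from the two listings.

```python
# Hypothetical schema fragment: a list accepting integers and strings.
mixed_list_schema = {
    "type": "list",
    "allow_duplicates": False,
    "elements": {
        "type": "multitype",
        "types": [
            {"type": "integer", "minimum": 0},
            {"type": "string", "minimum": 1},
        ],
    },
}

# Data of the shape described by the schema.
sample = [1, "two", 3]
```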

None Data Type

Description
This data type represents a none or null value.

Schema Path Symbol: °

Limitations

  • referencing is not supported
  • autogeneration is not supported - the value is already none

None Schema Keys

type: 'none'

Object Data Type

Description
This data type represents an object with attributes. Technically it's a dictionary in Python.
Attributes of the object are keys in the dictionary.

Schema Path Symbol: {}

Limitations

  • autogeneration of missing keys is supported, if value data type supports autogeneration

Object Schema Keys

type: 'object'

attributes:
  < attribute_name >: { data type schema }
# Mapping with key as attribute name and value as data type schema.
# If omitted, any data which is a dictionary is accepted.

required_attributes:
- < attribute_name >
# List of attribute names, which must be in the object.

ignore_undefined_attributes: < bool | default -> false >
# Tell validator to ignore attributes in data object, which are not defined in 'attributes'.

dependencies:
  < attribute_name >:
    requires:
    - < attribute_name >
    # List of attribute names, which must be in the object, if this attribute is in.
    excludes:
    - < attribute_name >
    # List of attribute names, which must not be in the object, if this attribute is in.

allow_regex_attributes: < bool | default -> false >
# If enabled, regular expressions are allowed in:
# 'attributes': If object attribute name matches, schema is verified.
# 'required_attributes': Each element must have at least one matching attribute name.
# 'dependencies.<>.requires': Successful if any object attribute name matches each list entry.
# 'dependencies.<>.excludes': Error if any object attribute name matches any list entry.
# !!! Disables autogeneration of missing keys !!!

autogenerate: < bool | default -> true >
# Enable/Disable autogeneration of missing attributes,
# if 'allow_regex_attributes' == false and
# attribute's data type supports autogeneration and is defined within.

reference: { reference options }
# The validator checks, if the same combination of attribute values is specified at
# another data type with the same reference.key.
# Requires 'reference_attributes' to be defined.
    
reference_attributes:
- < attribute_name >
# List of attribute names, whose values are combined for the uniqueness check.
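
The 'dependencies' rules can be sketched in plain Python as shown below. This is an illustration under simplifying assumptions, not cd2t's implementation: the function name is hypothetical and regex attribute names ('allow_regex_attributes') are not covered.

```python
def check_dependencies(obj, dependencies):
    """Return human-readable violations of 'requires' and 'excludes' rules."""
    problems = []
    for attribute, rules in dependencies.items():
        if attribute not in obj:
            continue  # Rules only apply if the attribute is in the object.
        for required in rules.get("requires", []):
            if required not in obj:
                problems.append(f"{attribute} requires {required}")
        for excluded in rules.get("excludes", []):
            if excluded in obj:
                problems.append(f"{attribute} excludes {excluded}")
    return problems


# Hypothetical rule: 'password' needs 'username' and conflicts with 'token'.
dependencies = {"password": {"requires": ["username"], "excludes": ["token"]}}
print(check_dependencies({"password": "secret", "token": "abc"}, dependencies))
```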

Schema Data Type

Description
This data type does not represent an expected data value.
It uses a subschema's root data type to process the data structure.

Schema Path Symbol: < name >

Schema Schema Keys

type: 'schema'

subschema: < str >
# Name of the subschema, which is defined under 'subschemas' in schema.
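
Since subschemas may reference each other but loops are not allowed, a loop check over the 'subschemas' mapping could be sketched like this. The helper names are hypothetical and this is not how cd2t itself detects loops; it only shows the idea of following 'type: schema' references until a name repeats.

```python
def referenced_subschemas(node):
    """Yield subschema names referenced anywhere inside a data type schema."""
    if isinstance(node, dict):
        if node.get("type") == "schema" and isinstance(node.get("subschema"), str):
            yield node["subschema"]
        for value in node.values():
            yield from referenced_subschemas(value)
    elif isinstance(node, list):
        for item in node:
            yield from referenced_subschemas(item)


def find_loop(subschemas):
    """Return a list of subschema names forming a loop, or None."""
    def visit(name, path):
        if name in path:
            return path[path.index(name):] + [name]
        for ref in referenced_subschemas(subschemas.get(name, {})):
            loop = visit(ref, path + [name])
            if loop:
                return loop
        return None

    for name in subschemas:
        loop = visit(name, [])
        if loop:
            return loop
    return None


# Hypothetical subschemas: 'a' uses 'b', and 'b' uses 'a' again.
looped = {
    "a": {"type": "list", "elements": {"type": "schema", "subschema": "b"}},
    "b": {"type": "object", "attributes": {"x": {"type": "schema", "subschema": "a"}}},
}
print(find_loop(looped))
```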

String Data Type

Description
This data type represents a string.

Schema Path Symbol: #

Limitations

  • autogeneration is not supported.

String Schema Keys

type: 'string'

reference:
  # global reference options plus:
  allow_namespace_lookups: < bool > # Only valid for 'consumer' mode
  namespace_separator_char: < string >
# Process:
# - Check if *namespace_separator_char* is in string value
# - extract namespace from left part of first finding
# - extract value from right part of first finding
# - lookup for *provider value* == *extracted value* in namespace *extracted namespace* for reference.key

minimum: <int | default -> 0 > # >= 0
# Minimum required string length

maximum: < int > # >= 0
# Maximum allowed string length

allowed_values:
- < string >
# List of strings
# Depends on 'regex_mode':
# == false: String must be equal to any string in the list.
# == true: String must match with any regex in the list.

not_allowed_values:
- < string >
# List of strings
# Depends on 'regex_mode':
# == false: String mustn't be equal to any string in the list.
# == true: String mustn't match with any regex in the list.

regex_mode: < bool | default -> false >
# Use strings in 'allowed_values' and 'not_allowed_values' for regex matching.

regex_multiline: < bool | default -> false >
# Use multiline matching for regex tests or not

regex_fullmatch: < bool | default -> true >
# String must fully match.
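
How the length and value options interact can be sketched with Python's re module. This is a hedged illustration, not cd2t's implementation: the function name is hypothetical, and using re.search when 'regex_fullmatch' is false is an assumption, as the behaviour in that case is not stated above.

```python
import re


def check_string(value, options):
    """Return the names of failed checks for a string value."""
    flags = re.MULTILINE if options.get("regex_multiline", False) else 0
    fullmatch = options.get("regex_fullmatch", True)

    def hit(pattern):
        if not options.get("regex_mode", False):
            return value == pattern  # Plain string comparison.
        matcher = re.fullmatch if fullmatch else re.search  # assumption
        return matcher(pattern, value, flags) is not None

    errors = []
    if len(value) < options.get("minimum", 0):
        errors.append("minimum")
    if "maximum" in options and len(value) > options["maximum"]:
        errors.append("maximum")
    if any(hit(p) for p in options.get("not_allowed_values", [])):
        errors.append("not_allowed_values")
    if "allowed_values" in options and not any(
        hit(p) for p in options["allowed_values"]
    ):
        errors.append("allowed_values")
    return errors


# With the default full match, 'eth0.1' does not match the pattern.
print(check_string("eth0.1", {"regex_mode": True, "allowed_values": [r"eth\d+"]}))
```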

Reference Options

If data type supports referencing, these options are available.

reference:
  key: < string > # required
  # Identifier to map data at different positions in the data structure

  # Define the reference mode.
  mode: < 'unique' | 'producer' | 'consumer' | default -> 'unique' >
  # - 'producer': collect values as allowed values for 'consumer' positions.
  # - 'unique': Inherits 'producer' and checks uniqueness of the value
  #   among other values at other positions with the same key.
  # - 'consumer': data value must match to a 'producer' value.

  allow_orphan_producer: < bool | default -> true >
  # If disabled, producer values without a consumer are not allowed.

  # Select the scope of the reference.
  unique_scope: < 'namespace' | 'global' | default -> 'global' > # Ignored in 'producer' or 'consumer' mode
  provider_scope: < 'namespace' | 'global' | default -> 'global' > # Ignored in 'consumer' mode
  consumer_scope: < 'namespace' | 'global' | default -> 'global' > # Ignored in 'unique' or 'producer' mode
  # 'namespace' scopes to the same namespace data only. References across namespaces only works,
  # if both 'ends' specify 'global'.
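
The three reference modes can be illustrated with a simplified sketch. It is not cd2t's implementation: the function name and the tuple format are hypothetical, scopes and namespaces are left out, and uniqueness is only checked among 'unique' positions, which is an assumption. Consumers are resolved at the end, so a producer that is processed later still counts.

```python
from collections import defaultdict


def check_references(entries):
    """Check (mode, key, value) tuples given in processing order.

    Returns a list of finding strings. Scopes and namespaces are ignored.
    """
    produced = defaultdict(set)     # key -> values offered to consumers
    seen_unique = defaultdict(set)  # key -> values seen in 'unique' mode
    consumed = []
    findings = []
    for mode, key, value in entries:
        if mode == "unique":
            if value in seen_unique[key]:
                findings.append(f"{key}: duplicate value {value!r}")
            seen_unique[key].add(value)
            produced[key].add(value)  # 'unique' inherits 'producer'
        elif mode == "producer":
            produced[key].add(value)
        elif mode == "consumer":
            consumed.append((key, value))
        else:
            raise ValueError(f"unknown mode: {mode}")
    # Resolve consumers only after all producers are known.
    for key, value in consumed:
        if value not in produced[key]:
            findings.append(f"{key}: no producer for {value!r}")
    return findings


entries = [
    ("consumer", "vlan", 10),  # producer follows later
    ("unique", "vlan", 10),
    ("unique", "vlan", 10),    # duplicate
    ("consumer", "vlan", 30),  # never produced
]
print(check_references(entries))
```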

Python Code Example

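import os
import yaml
from cd2t import Validator

# safe_load needs no explicit Loader and does not construct arbitrary objects.
with open('my_schema.yml') as f:
    schema = yaml.safe_load(f)

validator = Validator()
validator.load_schema(schema)

data_folder = './my_data_folder'
results = list()
for filename in os.listdir(data_folder):
    # os.listdir returns bare file names, so join them with the folder path.
    with open(os.path.join(data_folder, filename)) as f:
        test_data = yaml.safe_load(f)
    # Use the file name as namespace for the data validated next.
    validator.change_namespace(filename)
    _results = validator.validate_data(test_data)
    results.extend(_results)

# Reference findings are collected after all data sources were validated.
_results = validator.get_reference_findings()
results.extend(_results)

# Findings may not be plain strings, so convert them before joining.
print('\n'.join(str(result) for result in results))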
