Skip to main content

Pure python avro schema validator

Project description

CI Documentation Status PyPI version Downloads

Avro Validator

A pure python avro schema validator.

The default avro library for Python provide validation of data against the schema, the problem is that the output of this validation doesn't provide information about the error. All you get is the the datum is not an example of the schema error message.

When working with bigger avro schemas, sometimes is not easy to visually find the field that has an issue.

This library provide clearer exceptions when validating data against the avro schema, in order to be easier to identify the field that is not compliant with the schema and the problem with that field.

Installing

Install using pip:

$ pip install -U avro_validator

Validating data against Avro schema

The validator can be used as a console application. It receives a schema file, and a data file, validating the data and returning the error message in case of failure.

The avro_validator can also be used as a library in python code.

Console usage

In order to validate the data_to_validate.json file against the schema.avsc using the avro_validator callable, just type:

$ avro_validator schema.avsc data_to_valdate.json
OK

Since the data is valid according to the schema, the return message is OK.

Error validating the data

If the data is not valid, the program returns an error message:

$ avro_validator schema.avsc data_to_valdate.json
Error validating value for field [data,my_boolean_value]: The value [123] is not from one of the following types: [[NullType, BooleanType]]

This message indicates that the field my_boolean_value inside the data dictionary has value 123, which is not compatible with the bool type.

Command usage

It is possible to get information about usage of the avro_validator using the help:

$ avro_validator -h

Library usage

Using schema file

When using the avr_validator as a library, it is possible to pass the schema as a file:

from avro_validator.schema import Schema

schema_file = 'schema.avsc'

schema = Schema(schema_file)
parsed_schema = schema.parse()

data_to_validate = {
    'name': 'My Name'
}

parsed_schema.validate(data_to_validate)

In this example, if the data_to_validate is valid according to the schema, then the parsed_schema.validate(data_to_validate) call will return True.

Using a dict as schema

It is also possible to provide the schema as a json string:

import json
from avro_validator.schema import Schema

schema = json.dumps({
    'name': 'test schema',
    'type': 'record',
    'doc': 'schema for testing avro_validator',
    'fields': [
        {
            'name': 'name',
            'type': 'string'
        }
    ]
})

schema = Schema(schema)
parsed_schema = schema.parse()

data_to_validate = {
    'name': 'My Name'
}

parsed_schema.validate(data_to_validate)

In this example, the parsed_schema.validate(data_to_validate) call will return True, since the data is valid according to the schema.

Invalid data

If the data is not valid, the parsed_schema.validate will raise a ValueError, with the message containing the error description.

import json
from avro_validator.schema import Schema

schema = json.dumps({
    'name': 'test schema',
    'type': 'record',
    'doc': 'schema for testing avro_validator',
    'fields': [
        {
            'name': 'name',
            'type': 'string',
            'doc': 'Field that stores the name'
        }
    ]
})

schema = Schema(schema)
parsed_schema = schema.parse()

data_to_validate = {
    'my_name': 'My Name'
}

parsed_schema.validate(data_to_validate)

The schema defined expects only one field, named name, but the data contains only the field name_2, making it invalid according to the schema. In this case, the validate method will return the following error:

Traceback (most recent call last):
  File "/Users/leonardo.almeida/.pyenv/versions/avro_validator_venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-a5e8ce95d21c>", line 23, in <module>
    parsed_schema.validate(data_to_validate)
  File "/opt/dwh/avro_validator/avro_validator/avro_types.py", line 563, in validate
    raise ValueError(f'The fields from value [{value}] differs from the fields '
ValueError: The fields from value [{'my_name': 'My Name'}] differs from the fields of the record type [{'name': RecordTypeField <name: name, type: StringType, doc: Field that stores the name, default: None, order: None, aliases: None>}]

The message detailed enough to enable the developer to pinpoint the error in the data.

Invalid schema

If the schema is not valid according to avro specifications, the parse method will also return a ValueError.

import json
from avro_validator.schema import Schema

schema = json.dumps({
    'name': 'test schema',
    'type': 'record',
    'doc': 'schema for testing avro_validator',
    'fields': [
        {
            'name': 'name',
            'type': 'invalid_type',
            'doc': 'Field that stores the name'
        }
    ]
})

schema = Schema(schema)
parsed_schema = schema.parse()

Since the schema tries to define the name field as invalid_type, the schema declaration is invalid, thus the following exception will be raised:

Traceback (most recent call last):
  File "/Users/leonardo.almeida/.pyenv/versions/avro_validator_venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-7f3f77000f08>", line 18, in <module>
    parsed_schema = schema.parse()
  File "/opt/dwh/avro_validator/avro_validator/schema.py", line 28, in parse
    return RecordType.build(schema)
  File "/opt/dwh/avro_validator/avro_validator/avro_types.py", line 588, in build
    record_type.__fields = {field['name']: RecordTypeField.build(field) for field in json_repr['fields']}
  File "/opt/dwh/avro_validator/avro_validator/avro_types.py", line 588, in <dictcomp>
    record_type.__fields = {field['name']: RecordTypeField.build(field) for field in json_repr['fields']}
  File "/opt/dwh/avro_validator/avro_validator/avro_types.py", line 419, in build
    field.__type = cls.__build_field_type(json_repr)
  File "/opt/dwh/avro_validator/avro_validator/avro_types.py", line 401, in __build_field_type
    raise ValueError(f'Error parsing the field [{fields}]: {actual_error}')
ValueError: Error parsing the field [name]: The type [invalid_type] is not recognized by Avro

The message is clearly indicating that the the invalid_type is not recognized by avro.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

avro_validator-1.1.0.tar.gz (14.8 kB view details)

Uploaded Source

Built Distribution

avro_validator-1.1.0-py3-none-any.whl (13.6 kB view details)

Uploaded Python 3

File details

Details for the file avro_validator-1.1.0.tar.gz.

File metadata

  • Download URL: avro_validator-1.1.0.tar.gz
  • Upload date:
  • Size: 14.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for avro_validator-1.1.0.tar.gz
Algorithm Hash digest
SHA256 ead619acd9a8d2ef4d54d83c2abc39fed30dfa8570b85ff8a98ec772d8158bfd
MD5 81b3ea379e55e6c33d0f0456ca76ce8d
BLAKE2b-256 3cdc3d47a6b374ce7d8dd9e188395a6003d2384c2700b73f8ad5ffd30b37eebd

See more details on using hashes here.

File details

Details for the file avro_validator-1.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for avro_validator-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 81d5c3b75074772f933ac00147c5500ae86140178cb2dcdb1e14dc38ac833f4a
MD5 fbb99e6fa1cb2596d72b7bce9786ceed
BLAKE2b-256 f0e8d4d3e8a8495c463f87b4870039f212916cb01bbb763cba6e7e06e7657314

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page