Tools to convert between Avro Schema and various other schema languages.

Avrotize

This tool is under very active development. Don't use it.

Avrotize is a command-line tool that allows you to convert between different schema formats. It is designed to be easy to use and flexible, supporting a variety of use cases.

Supported conversions to Avro Schema:

  • JSON Schema
  • XML Schema (XSD)
  • Protocol Buffers
  • ASN.1

Supported conversions from Avro Schema:

  • Kusto Data Table Definition (KQL)
  • T-SQL Table Definition (SQL)
  • Apache Parquet files
  • Protocol Buffers

Note that many conversions are lossy and do not transfer all information to the target schema. This is by design. The tool uses a "sane" schema format (Avro Schema) as the pivot point to and from which other schema formats are converted. It tries to preserve the most important information of the source schema, but not all of it.

The conversion issues are documented below.

Installation

You can install Avrotize from PyPI:

pip install avrotize

Usage

Avrotize provides several commands for converting between different schema formats.

Convert Proto schema to Avro schema

avrotize p2a --proto <path_to_proto_file> --avsc <path_to_avro_schema_file>

Conversion issues:

  • Protobuf allows integral and string types as map keys, whereas Avro map keys are always strings. When converting from Proto to Avro, the type information for the map keys is ignored.
  • The tool embeds all 'well-known' Protobuf 3.0 types in Avro format and injects them as needed when the respective types are included. Only the Timestamp type is mapped to the Avro logical type 'timestamp-millis'. The rest of the well-known Protobuf types are kept as Avro record types with the same field names and types.
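
The two points above can be illustrated with a small Python sketch. It is not Avrotize's code: the helpers proto_map_to_avro and timestamp_to_millis are hypothetical names introduced here. The sketch only shows why the key type has nowhere to go in an Avro map schema, and how the seconds and nanos fields of a Protobuf Timestamp relate to a 'timestamp-millis' value.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of Avrotize.

def proto_map_to_avro(proto_key_type, avro_value_type):
    """Build an Avro map schema for a Protobuf map<key, value> field.

    An Avro map schema only declares its value type because keys are
    always strings, so proto_key_type is dropped.
    """
    return {"type": "map", "values": avro_value_type}


# Avro schema of the logical type that Timestamp is mapped to.
TIMESTAMP_MILLIS = {"type": "long", "logicalType": "timestamp-millis"}


def timestamp_to_millis(seconds, nanos):
    """Convert Protobuf Timestamp fields to milliseconds since the epoch.

    Precision below one millisecond is lost.
    """
    return seconds * 1000 + nanos // 1_000_000


print(proto_map_to_avro("int32", "string"))   # {'type': 'map', 'values': 'string'}
print(timestamp_to_millis(1, 500_000_000))    # 1500
```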

Convert Avro schema to Proto schema

avrotize a2p --proto <path_to_proto_file> --avsc <path_to_avro_schema_file>

Convert JSON schema to Avro schema

avrotize j2a --jsons <path_to_json_schema_file> --avsc <path_to_avro_schema_file> [--namespace <avro_schema_namespace>]

JSON Schema is a very flexible and extremely permissive schema format. As a result, many valid JSON Schema documents contain definitions that are difficult to translate into Avro Schema.

Conversion issues:

  • All field constraints and validations associated with the JSON Schema are ignored in the translation to Avro. Avro does not support the same level of validation as JSON Schema.
  • Very large schemas with many cross references ($ref) throughout the schema may have circular references that cannot be fully resolved in Avro Schema.
  • Schemas with top-level 'allOf', 'anyOf', 'oneOf' keywords are not supported.
  • JSON type unions as well as allOf, anyOf, and oneOf expressions that are shared and referenced by a $ref expression are mapped to a record type in Avro with a field value of the type union.
  • JSON enums are converted to the Avro enum type. Avro enum symbols cannot be numeric values, so the tool ignores them. Numeric string values are prefixed with an underscore, and the result is sanitized into a valid Avro enum symbol.
  • Untyped object properties (without type attribute) are mapped to an Avro union that allows scalar values or two levels of array and/or map nesting.
  • Conditional schema validation is not translated to Avro. The tool will ignore all if/then/else, dependentRequired, and dependentSchemas keywords and the resulting Avro schema will not enforce the conditional validation.
  • JSON Schema allows arbitrary property names; Avro does not. When converting from JSON Schema to Avro, the property names in objects are sanitized by replacing any non-alphanumeric characters with underscores and prefixing the result with an underscore. This may lead to name conflicts, which the tool resolves by appending a unique index to the name.
  • All patternProperties are converted into fields holding arrays of records.
  • All external references ($ref) are resolved and embedded in the Avro schema. The tool does not support maintaining external references to other schemas. To perform a conversion, all external $ref references have to be resolvable by the tool.
  • When a JSON schema file does not define a top-level type, the tool will look for a definitions section and emit all definitions as a union of the types defined. This also works with Swagger and OpenAPI files.
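
The name sanitization described in the list above can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: avro_name and unique_field_names are hypothetical helpers, and the sketch assumes that the underscore prefix is only added when the sanitized name would otherwise start with a digit (or be empty), which may differ from what Avrotize actually does.

```python
import re

# Illustrative sketch only; hypothetical helpers, not Avrotize's implementation.

def avro_name(raw):
    """Turn an arbitrary string into a valid Avro name.

    Avro names must start with a letter or underscore and may only
    contain letters, digits, and underscores.
    """
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw)
    # Assumption: prefix only when the first character would be invalid.
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def unique_field_names(raw_names):
    """Sanitize names and append an index when two inputs collide."""
    seen = set()
    result = []
    for raw in raw_names:
        base = avro_name(raw)
        candidate, index = base, 1
        while candidate in seen:
            candidate = f"{base}{index}"
            index += 1
        seen.add(candidate)
        result.append(candidate)
    return result


print(unique_field_names(["first-name", "first name", "2nd"]))
# ['first_name', 'first_name1', '_2nd']
```

The same avro_name helper would also cover numeric string enum values such as "200", which become "_200" under these assumptions.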

Convert XML Schema (XSD) to Avro schema

avrotize x2a --xsd <path_to_xsd_file> --avsc <path_to_avro_schema_file> [--namespace <avro_schema_namespace>]

Conversion issues:

  • All XML Schema elements are mapped to Avro record types with fields, whereby both elements and attributes become fields in the record.
  • simpleType declarations and all type constraints are ignored. Avro does not support the same level of validation as XML Schema.
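
As a rough illustration of the first point, the sketch below turns a tiny XSD element into an Avro record in which both the child elements and the attribute become fields. It is not how Avrotize is implemented: the function element_to_record, the sample schema, and the small XSD_TO_AVRO type table are assumptions made for this example, and the table only works when the schema uses the xs: prefix.

```python
import xml.etree.ElementTree as ET

# Illustrative sketch only; not Avrotize's implementation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed, deliberately tiny mapping of XSD built-in types to Avro primitives.
XSD_TO_AVRO = {"xs:string": "string", "xs:int": "int", "xs:boolean": "boolean"}

SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:int"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def element_to_record(element):
    """Map one XSD element with a complexType to an Avro record schema."""
    complex_type = element.find(f"{XS}complexType")
    fields = []
    # Walk the complexType in document order; elements and attributes
    # are treated alike and each becomes one record field.
    for node in complex_type.iter():
        if node.tag in (f"{XS}element", f"{XS}attribute"):
            avro_type = XSD_TO_AVRO.get(node.get("type"), "string")
            fields.append({"name": node.get("name"), "type": avro_type})
    return {"type": "record", "name": element.get("name"), "fields": fields}


root = ET.fromstring(SAMPLE_XSD.strip())
record = element_to_record(root.find(f"{XS}element"))
print([field["name"] for field in record["fields"]])  # ['name', 'age', 'id']
```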

Convert Avro schema to Kusto table declaration

avrotize a2k --avsc <path_to_avro_schema_file> --kusto <path_to_kusto_kql_file> [--record-type <record_type>]

Conversion issues:

  • Only the Avro record type can be mapped to a Kusto table. If the Avro schema contains other types (like enum or array), the tool will ignore them.
  • Only the first record type in the Avro schema is converted to a Kusto table. If the Avro schema contains other record types, they will be ignored. The --record-type option can be used to specify which record type to convert.
  • The fields of the record are mapped to columns in the Kusto table. Fields that are records or arrays or maps are mapped to columns of type dynamic in the Kusto table.
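
The field mapping above might look roughly like the following sketch. Only the rule that records, arrays, and maps become dynamic columns comes from the description; the primitive type table, the fallback to dynamic for anything else, and the function names are assumptions, and the emitted command is not claimed to match the tool's actual output.

```python
# Illustrative sketch only; not Avrotize's implementation.

# Assumed mapping of Avro primitive types to Kusto scalar types.
KUSTO_TYPES = {
    "string": "string",
    "int": "int",
    "long": "long",
    "float": "real",
    "double": "real",
    "boolean": "bool",
}


def kusto_column_type(avro_type):
    """Return the Kusto column type for an Avro field type."""
    if isinstance(avro_type, str) and avro_type in KUSTO_TYPES:
        return KUSTO_TYPES[avro_type]
    # Records, arrays, and maps become dynamic. As an assumption, this
    # sketch also falls back to dynamic for every other type.
    return "dynamic"


def create_table_command(record):
    """Build a .create table command from an Avro record schema."""
    columns = ", ".join(
        f"{field['name']}: {kusto_column_type(field['type'])}"
        for field in record["fields"]
    )
    return f".create table {record['name']} ({columns})"


reading = {
    "type": "record",
    "name": "Reading",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "value", "type": "double"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
    ],
}
print(create_table_command(reading))
# .create table Reading (id: string, value: real, tags: dynamic)
```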

Convert Avro schema to T-SQL table definition

avrotize a2tsql --avsc <path_to_avro_schema_file> --tsql <path_to_sql_file> [--record-type <record_type>]

Conversion issues:

  • Only the Avro record type can be mapped to a T-SQL table. If the Avro schema contains other types (like enum or array), the tool will ignore them.
  • Only the first record type in the Avro schema is converted to a T-SQL table. If the Avro schema contains other record types, they will be ignored. The --record-type option can be used to specify which record type to convert.
  • The fields of the record are mapped to columns in the T-SQL table. Fields that are records, arrays, or maps are mapped to columns of type varchar(max) and are assumed to hold JSON data.
  • The emitted script sets extended properties on the columns that contain the Avro schema definition of the field in JSON format. This allows for easy introspection of the Avro schema of each field.
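
A minimal sketch of the column mapping, under stated assumptions: the only rule taken from the description is that records, arrays, and maps become varchar(max) columns. The primitive type table, the fallback for other types, and the function names are hypothetical, and the extended properties are left out because their exact names are not documented here.

```python
# Illustrative sketch only; not Avrotize's implementation.

# Assumed mapping of Avro primitive types to T-SQL column types.
TSQL_TYPES = {
    "string": "varchar(max)",
    "int": "int",
    "long": "bigint",
    "float": "real",
    "double": "float",
    "boolean": "bit",
}


def tsql_column_type(avro_type):
    """Return the T-SQL column type for an Avro field type."""
    if isinstance(avro_type, str) and avro_type in TSQL_TYPES:
        return TSQL_TYPES[avro_type]
    # Records, arrays, and maps are stored as JSON text. As an assumption,
    # this sketch uses the same fallback for every other type.
    return "varchar(max)"


def create_table_statement(record):
    """Build a CREATE TABLE statement from an Avro record schema."""
    columns = ",\n".join(
        f"    [{field['name']}] {tsql_column_type(field['type'])}"
        for field in record["fields"]
    )
    return f"CREATE TABLE [{record['name']}] (\n{columns}\n);"


order = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "items", "type": {"type": "array", "items": "string"}},
    ],
}
print(create_table_statement(order))
```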

Convert Avro schema to empty Parquet file

avrotize a2pq --avsc <path_to_avro_schema_file> --parquet <path_to_parquet_schema_file>

Conversion issues:

  • The emitted Parquet file contains only the schema, no data rows.

Convert ASN.1 schema to Avro schema

avrotize asn2a --asn <path_to_asn1_schema_file> --avsc <path_to_avro_schema_file>

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

Avrotize is released under the Apache License. See the LICENSE file for more details.
