Generation of classes from Avro schemas

Project description

Avro schema generation

:exclamation: This is quite barebones. Not all Avro structures/types will be handled correctly, since it was made with a specific Avro schema catalog in mind. If you find anything missing, please submit a PR or create an issue.

This library aims to create dataclass and/or TypedDict definitions from Avro schemas, which can be used to type check the creation of messages with a static type checker such as mypy. The aim is to catch bugs before they occur at runtime, and to provide better IDE support.

As the name suggests, it is intended as an optional extension to the excellent fastavro library. Fastavro allows users to read Avro schemas, build records and write them as Avro messages, or validate them against a schema.

Fastavro-gen uses fastavro to read .avsc files and creates classes from the resulting schema object. Classes are written one per file, using the namespace to create a directory structure. For example, the class generated for the following record will be written to ./example/avro/user.py.

{
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number",  "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]}
    ]
}
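The namespace-to-directory mapping described above can be sketched as follows. This is a minimal illustration, not fastavro-gen's actual code: the helper name `output_path` is hypothetical, and lowercasing the record name for the file name is an assumption inferred from the `User` to `user.py` example.

```python
from pathlib import PurePosixPath


def output_path(namespace: str, name: str) -> PurePosixPath:
    """Hypothetical helper: map a record's namespace and name to a module path."""
    # Each dotted namespace component becomes one directory level.
    directories = namespace.split(".") if namespace else []
    # Assumption: the file name is the lowercased record name.
    return PurePosixPath(*directories) / f"{name.lower()}.py"


print(output_path("example.avro", "User"))  # example/avro/user.py
```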

Building a User message would normally be done by building a dictionary:

{
    "name": "My User",
    "favorite_number": "1",
    "favorite_color": "green",
}

Notice that the favorite_number field in the schema has type int (or null), but the value we supplied is a string. This would cause a runtime error when writing or validating the record. Using the generated dataclass we get IDE support (screenshot using VSCode with the Pylance language server). Notice the underlined "1". Hovering over it shows the relevant error.

VSCode IDE support

Mypy will also catch this issue:

test.py:9: error: Argument "favorite_number" to "User" has incompatible type "str"; expected "Optional[int]"
Found 1 error in 1 file (checked 1 source file)
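For reference, a dataclass for the User schema might look like the sketch below. It is hand-written for illustration, so the real generated output may differ in imports, naming, or defaults. With a definition like this, the second call is the kind of line a type checker flags, while Python itself does not enforce the annotations at runtime.

```python
from dataclasses import dataclass
from typing import Optional


# Hand-written approximation of a generated class (assumption, not real output).
@dataclass
class User:
    name: str
    favorite_number: Optional[int]
    favorite_color: Optional[str]


# Accepted by a type checker: favorite_number is an int.
ok = User(name="My User", favorite_number=1, favorite_color="green")

# A type checker reports an incompatible "str" argument on this line, yet the
# call still runs, because dataclasses do not validate field types at runtime.
bad = User(name="My User", favorite_number="1", favorite_color="green")
```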

Output

The library offers the user two different output class types, dataclasses and TypedDicts. Each has its pros and cons, which have to be weighed against the user's use cases.

TypedDict

TypedDicts are only valuable during type checking; at runtime they are simply treated as normal dicts. As such, they can be built using common Python dict syntax with an added type annotation, or using class instantiation syntax:

class A(TypedDict, total=True):
    field1: int
    field2: str
    ...

instance1: A = {
    "field1": 1,
    "field2": "2",
}

instance2 = A(
    field1=1,
    field2="2",
)

:heavy_plus_sign: Messages can be built using Python dictionary syntax.
:heavy_plus_sign: Fastavro expects messages as dictionaries.
:heavy_minus_sign: All fields of the dictionary have to be given at the time of creation, unless the total option is set to False. Setting total=False, however, weakens some aspects of type checking, e.g. checking whether certain keys are set. Currently this library hardcodes total=False, but that might become configurable later.
:heavy_minus_sign: No ability to specify defaults.
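The effect of total=False can be seen in a small sketch. The User definition here is hand-written to mirror the schema above and is only an approximation of what the generator emits.

```python
from typing import Optional, TypedDict


# Hand-written approximation of a generated TypedDict (assumption).
class User(TypedDict, total=False):
    name: str
    favorite_number: Optional[int]
    favorite_color: Optional[str]


# With total=False a type checker accepts a dict that omits keys.
partial: User = {"name": "My User"}

# At runtime the value is an ordinary dict with no class wrapper,
# and omitted keys are simply absent rather than filled with defaults.
print(type(partial) is dict)         # True
print("favorite_number" in partial)  # False
```

The trade-off is that the type checker can no longer tell you that a required field such as name was forgotten.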

dataclass

Dataclasses allow for easy declaration of python classes.

:heavy_plus_sign: Can handle default values for fields, so only fields without defaults have to be supplied at instantiation.
:heavy_plus_sign: Easy to transform to dictionaries with the provided fastavro_gen.asdict function, which is simply a wrapper around dataclasses.asdict.
:heavy_minus_sign: Complex nested schemas mean a lot of objects being created.
:heavy_minus_sign: Extra overhead transforming messages to dictionaries.
:heavy_minus_sign: Overhead transforming dictionaries to dataclasses using fastavro_gen.fromdict.
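A short sketch of the default-value and conversion points above. It calls dataclasses.asdict directly, since fastavro_gen.asdict is described as a wrapper around it. The classes are hand-written, and the nested Address record is a hypothetical addition used only to show recursion.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Address:  # hypothetical nested record, not part of the example schema
    city: str


@dataclass
class User:
    name: str
    address: Address
    # Fields with defaults can be omitted when instantiating.
    favorite_number: Optional[int] = None


user = User(name="My User", address=Address(city="Oslo"))

# asdict recurses into nested dataclasses, producing plain dicts throughout.
record = asdict(user)
print(record)
```

Every nested record becomes its own object and is then copied into a dict, which is where the overhead listed above comes from.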

Usage

This is a work in progress and can't currently be installed without cloning the repository.

To generate classes use the CLI or import the generate function from fastavro_gen. The library also exposes fastavro_gen.[asdict, fromdict] to map generated dataclasses to and from dictionaries.

:bulb: When the ordered option is specified, the file parameter is ignored. Instead, schemas that would otherwise be passed through the file parameter can be defined as singletons in the toml file passed to ordered.

usage: fastavro_gen [-h] [-o ORDERED] [--class-type {dataclass,TypedDict}] [--no-black] [--prefix PREFIX] [--output-dir OUTPUT_DIR] [file [file ...]]

Generate dataclasses or TypedDicts from avro schemas

positional arguments:
  file                  file(s) to parse, use '-' for stdin

optional arguments:
  -h, --help            show this help message and exit
  -o ORDERED, --ordered ORDERED
                        Path to a .toml file for multiple schemas or ordered schemas. Overwrites 'file' parameter.
  --class-type {dataclass,TypedDict}
  --no-black            Do not run output files through 'black'
  --prefix PREFIX       Removes this prefix from namespace if it is contained
  --output-dir OUTPUT_DIR
                        Specify the output location
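The help text for --prefix leaves the exact matching rule open. One plausible reading, sketched below, strips the prefix only when the namespace starts with it. The function name `strip_prefix` is hypothetical and the behaviour is an assumption, not taken from the fastavro-gen source.

```python
def strip_prefix(namespace: str, prefix: str) -> str:
    """Hypothetical helper: drop a leading prefix from a dotted namespace."""
    # Assumption: the prefix is removed only when it leads the namespace.
    if prefix and namespace.startswith(prefix):
        # Remove the separator left behind, e.g. ".avro" becomes "avro".
        return namespace[len(prefix):].lstrip(".")
    return namespace


print(strip_prefix("example.avro", "example"))  # avro
print(strip_prefix("example.avro", "other"))    # example.avro
```

Under this reading, running with --prefix example would place the User class one directory level higher than without it.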

The --ordered option

The option allows users to specify the order in which files are read through fastavro's load_schema_ordered function. This is useful when your files are laid out in a manner that does not follow the structure that the normal load_schema expects.

The option takes as its value a path to a .toml file that describes which schemas to read and what their prerequisites are. For example, to create classes for a schema A that depends on B and C, your .toml would include:

schemaA = [
    "/path/to/C.avsc",
    "/path/to/B.avsc",
    "/path/to/A.avsc",
]

The toml file can describe multiple schema dependencies, each as their own list.
