Python classes from Avro schemas with custom serializers and deserializers, supporting all logical types.

Avro DBO 🚀

Avro DBO is a robust Python library designed for handling Apache Avro schemas. It facilitates seamless data serialization and schema management, making it ideal for data engineering pipelines and stream processing applications.

Features

  • 🏗️ Schema-First Development: Generate Python classes from Avro schemas.
  • 🔄 Full Type Support: Supports Avro logical types as well as complex types such as arrays and enums.
  • 🛠️ Custom Serialization: Offers flexible serializers and deserializers.
  • 🌐 Schema Registry Integration: Integrates natively with Confluent Schema Registry.
  • 🔒 Type Safety: Ensures full static type checking.
  • High Performance: Optimized for high-load production environments.

🚀 Quick Start

Step 1: Install from PyPI

pip install avro-dbo

Step 2: Import attrs and avro_dbo

Import the attrs and avro_dbo modules.

For more information on using attrs, visit the attrs documentation.

from attrs import field, define
from avro_dbo import avro_schema
from decimal import Decimal  # used by the model in Step 3

Step 3: Define your schema

Defining your schema is optional. If you don't define a schema, the class will be created without any schema metadata.

@define
@avro_schema(
    name="OverrideNameOptional",  # optional; defaults to the class name
    namespace="OverrideNamespaceOptional",  # optional; defaults to the class namespace
    type="record",  # optional; defaults to the class type
    doc="OverrideDocOptional"  # optional; defaults to the class doc
)
class DecimalModel:
    amount: Decimal = field(
        default=Decimal("100.00"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2
        }
    )

Step 4: Use your schema

my_model = DecimalModel()
print(my_model.amount)
# > Decimal("100.00")
# extra precision is truncated to the scale
my_model.amount = Decimal("100.00383889328932")
print(my_model.amount)
# > Decimal("100.00")

# values are validated and coerced to the correct type
# against the defined schema rules!

Avro Primitive and Logical Types Supported

At this time all Avro logical types are supported. Avro DBO will automatically coerce the values to the correct type and scale.

Avro Logical and Complex Types Supported

  • decimal
  • date
  • time-millis
  • timestamp-millis
  • uuid
  • fixed
  • enum
  • array
  • map
  • union
  • error

Avro Primitive Types Supported

  • string
  • bytes
  • int
  • long
  • float
  • double

decimal Type

decimal.Decimal values are automatically coerced to the correct type and scale. Avro DBO quantizes the value to the scale of the field every time the field is set on any instance of the attrs.define decorated class.

from attrs import field, define
from avro_dbo import avro_schema
from decimal import Decimal

@define
@avro_schema
class DecimalModel:
    amount: Decimal = field(
        default=Decimal("100.00"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2
        }
    )
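To make "quantized to the scale" concrete, here is a minimal standard-library sketch of the arithmetic. It is not Avro DBO's implementation: quantize_to_scale is a hypothetical helper, and the ROUND_DOWN rounding mode is an assumption chosen to match the "truncated" wording in the Quick Start.

```python
from decimal import Decimal, ROUND_DOWN


def quantize_to_scale(value: Decimal, precision: int, scale: int) -> Decimal:
    """Hypothetical helper: cut value to `scale` decimal places, then check `precision`."""
    # scale=2 gives the exponent Decimal("0.01")
    exponent = Decimal(1).scaleb(-scale)
    # ROUND_DOWN is an assumption; it drops the extra digits instead of rounding
    quantized = value.quantize(exponent, rounding=ROUND_DOWN)
    # In Avro, precision is the maximum total number of digits
    if len(quantized.as_tuple().digits) > precision:
        raise ValueError(f"{quantized} needs more than {precision} digits")
    return quantized


amount = quantize_to_scale(Decimal("100.00383889328932"), precision=10, scale=2)
print(amount)  # 100.00
```

With precision 10 and scale 2, a value can carry at most eight digits before the decimal point, so the helper rejects anything larger.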

timestamp-millis, time-millis, and date Types

Date and time values are automatically coerced to the matching Python type (datetime.datetime, datetime.time, or datetime.date).

All Python datetime, date, and time types are supported.

The values serialize and deserialize to the Avro type for each logical type: long (milliseconds since the epoch) for timestamp-millis, int (milliseconds after midnight) for time-millis, and int (days since the epoch) for date.

from attrs import field, define
from avro_dbo import avro_schema
import datetime

@define
@avro_schema
class TimestampModel:
    created_at: datetime.datetime = field(
        metadata={
            "logicalType": "timestamp-millis"
        }
    )
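The wire encodings that the Avro specification defines for these logical types can be sketched with the standard library alone. The function names below are hypothetical and are not part of Avro DBO. Treating naive datetimes as UTC is an assumption of this sketch.

```python
import datetime

UTC = datetime.timezone.utc
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=UTC)


def timestamp_millis(dt: datetime.datetime) -> int:
    """timestamp-millis: milliseconds since the Unix epoch (Avro long)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=UTC)  # assumption: naive values are UTC
    return (dt - EPOCH) // datetime.timedelta(milliseconds=1)


def time_millis(t: datetime.time) -> int:
    """time-millis: milliseconds after midnight (Avro int)."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * 1000 + t.microsecond // 1000


def date_days(d: datetime.date) -> int:
    """date: days since the Unix epoch (Avro int)."""
    return (d - datetime.date(1970, 1, 1)).days


print(timestamp_millis(datetime.datetime(1970, 1, 2, tzinfo=UTC)))  # 86400000
print(time_millis(datetime.time(0, 0, 1, 500000)))  # 1500
print(date_days(datetime.date(1970, 1, 2)))  # 1
```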

enum Type

Python Enum fields are supported and are serialized as Avro enum symbols.

from attrs import field, define
from avro_dbo import avro_schema
from enum import Enum

class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"

@define
@avro_schema
class EnumModel:
    status: Status = field(
        default=Status.ACTIVE,
        metadata={
            "logicalType": "enum",
            "symbols": list(Status)
        }
    )

This produces the following schema:

{
  "type": "record",
  "name": "EnumModel",
  "fields": [{"name": "status", "type": "enum", "symbols": ["ACTIVE", "INACTIVE"]}]
}
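The schema lists the symbols as plain strings, while list(Status) yields Enum members. The standard-library sketch below shows the difference and one way to derive the string symbols from member names. How Avro DBO converts members internally is not shown in this README, so this is only an illustration.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


# list(Status) returns the Enum members themselves, in definition order
members = list(Status)
print(members)  # [<Status.ACTIVE: 'ACTIVE'>, <Status.INACTIVE: 'INACTIVE'>]

# The member names are the strings that appear under "symbols" in the schema
symbols = [member.name for member in Status]
print(symbols)  # ['ACTIVE', 'INACTIVE']
```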

array Type

from attrs import field, define
from avro_dbo import avro_schema
from typing import List

@define
@avro_schema
class ArrayModel:
    tags: List[str] = field(
        factory=list,
        metadata={
            "logicalType": "array",
            "items": "string"
        }
    )

Kitchen Sink Example

The following example combines several of the supported primitive, logical, and complex types in a single model.

from attrs import field, define
from avro_dbo import avro_schema
from decimal import Decimal
from enum import Enum
from typing import List
import datetime

class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"

@define
@avro_schema
class KitchenSinkModel:
    name: str = field(default="")
    amount: Decimal = field(
        default=Decimal("999.99"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2
        }
    )
    status: Status = field(
        default=Status.ACTIVE,
        metadata={
            "logicalType": "enum",
            "symbols": list(Status)
        }
    )
    created_at: datetime.datetime = field(
        metadata={
            "logicalType": "timestamp-millis"
        }
    )
    tags: List[str] = field(
        factory=list,
        metadata={
            "logicalType": "array",
            "items": "string"
        }
    )

Exporting the Avro Schema

You can use the export_schema() method to export the schema as a JSON object.

Running the Example

# ... import the KitchenSinkModel class
print(KitchenSinkModel.export_schema())

The result will be a JSON object that can be used to define the schema in a Confluent Schema Registry.

Example Avro Schema Output

{
  "type": "record",
  "name": "KitchenSinkModel",
  "fields": [
    {"name": "name", "type": "string", "default": ""},
    {"name": "amount", "type": "decimal", "precision": 10, "scale": 2},
    {"name": "status", "type": "enum", "symbols": ["ACTIVE", "INACTIVE"]},
    {"name": "created_at", "type": "long", "logicalType": "timestamp-millis"},
    {"name": "tags", "type": "array", "items": "string"}
  ]
}
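Because the export is plain JSON, it can be inspected with the standard json module. The sketch below parses the output shown above from a string literal and indexes the fields by name. It does not call Avro DBO, and the variable names are illustrative.

```python
import json

# The schema text shown above, embedded so the sketch is self-contained
schema_json = """
{
  "type": "record",
  "name": "KitchenSinkModel",
  "fields": [
    {"name": "name", "type": "string", "default": ""},
    {"name": "amount", "type": "decimal", "precision": 10, "scale": 2},
    {"name": "status", "type": "enum", "symbols": ["ACTIVE", "INACTIVE"]},
    {"name": "created_at", "type": "long", "logicalType": "timestamp-millis"},
    {"name": "tags", "type": "array", "items": "string"}
  ]
}
"""

schema = json.loads(schema_json)

# Index the field definitions by name for quick lookups
fields_by_name = {field["name"]: field for field in schema["fields"]}

print(list(fields_by_name))  # ['name', 'amount', 'status', 'created_at', 'tags']
print(fields_by_name["amount"]["scale"])  # 2
```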

Saving an Avro Schema to a File

Pass a filename to the export_schema() method to write the schema to a JSON file.

KitchenSinkModel.export_schema(filename="kitchen_sink_model.json")

Coercing a Python Class Using Avro Schema Model

Avro DBO automatically coerces all fields in the schema to the correct type. This covers datetime, date, decimal, enum, array, and more.

Example with decimal.Decimal

from attrs import field, define
from avro_dbo import avro_schema
from decimal import Decimal

@define
@avro_schema
class DecimalModel:
    amount: Decimal = field(
        default=Decimal("100.00"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2
        }
    )

my_model = DecimalModel()
print(my_model.amount)
# > Decimal("100.00")
# extra precision is truncated to the scale
my_model.amount = Decimal("100.00383889328932")
print(my_model.amount)
# > Decimal("100.00")

🤝 Contributing

We welcome contributions! To submit issues or propose changes, please visit our GitHub repository. See the CONTRIBUTING.md file for more information on how to contribute.

📜 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

About Mac Anderson (Author)

Avro DBO was created and is maintained by Mac Anderson. For more insights into my other projects, visit Mac Anderson's GitHub.

For additional information on my professional work, explore Tradesignals and connect with me on LinkedIn.

Thank you!

-- Mac Anderson
