Python classes from Avro schemas with custom serializers and deserializers, supporting all logical types.
Avro DBO 🚀
Avro DBO is a robust Python library designed for handling Apache Avro schemas. It facilitates seamless data serialization and schema management, making it ideal for data engineering pipelines and stream processing applications.
✨ Features
- 🏗️ Schema-First Development: Generate Python classes from Avro schemas.
- 🔄 Full Type Support: Supports Avro logical types (such as decimal and timestamp-millis) as well as complex types such as arrays and enums.
- 🛠️ Custom Serialization: Offers flexible serializers and deserializers.
- 🌐 Schema Registry Integration: Integrates natively with Confluent Schema Registry.
- 🔒 Type Safety: Ensures full static type checking.
- ⚡ High Performance: Optimized for high-load production environments.
🚀 Quick Start
Step 1: Install from PyPI
pip install avro-dbo
Step 2: Import attrs and avro_dbo
Import the attrs and avro_dbo modules.
For more information on using attrs, visit the attrs documentation.
from decimal import Decimal

from attrs import field, define
from avro_dbo import avro_schema
Step 3: Define your schema
Defining your schema is optional. If you don't define a schema, the class will be created without any schema metadata.
@define
@avro_schema(
    name="OverrideNameOptional",  # optional; defaults to the class name
    namespace="OverrideNamespaceOptional",  # optional; defaults to the class namespace
    type="record",  # optional; defaults to the class type
    doc="OverrideDocOptional",  # optional; defaults to the class doc
)
class DecimalModel:
    amount: Decimal = field(
        default=Decimal("100.00"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2,
        },
    )
Step 4: Use your schema
my_model = DecimalModel()
print(my_model.amount)
# > 100.00

# extra precision is truncated to the scale
my_model.amount = Decimal("100.00383889328932")
print(my_model.amount)
# > 100.00

# values are validated and coerced to the correct type
# against the defined schema rules!
Avro Primitive and Logical Types Supported
At this time all Avro logical types are supported. Avro DBO will automatically coerce the values to the correct type and scale.
Avro Logical Types Supported
- decimal
- date
- time-millis
- timestamp-millis
- uuid
- fixed
- enum
- array
- map
- union
- error
Avro Primitive Types Supported
- string
- bytes
- int
- long
- float
- double
decimal Type
decimal.Decimal values are automatically coerced to the correct precision and scale. Avro DBO quantizes the value to the field's scale every time the field is set on any instance of a class decorated with attrs.define.
from decimal import Decimal

from attrs import field, define
from avro_dbo import avro_schema


@define
@avro_schema
class DecimalModel:
    amount: Decimal = field(
        default=Decimal("100.00"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2,
        },
    )
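To make the quantization step concrete, here is a minimal standard-library sketch of reducing a value to a field's scale. It is not Avro DBO's implementation: the helper name quantize_to_scale is hypothetical, and ROUND_DOWN is an assumption chosen to mirror the "truncated" wording used in this README.

```python
from decimal import Decimal, ROUND_DOWN


def quantize_to_scale(value: Decimal, scale: int) -> Decimal:
    """Hypothetical helper: keep only `scale` fractional digits of `value`."""
    # Build the exponent template, e.g. Decimal("0.01") for scale=2.
    template = Decimal(1).scaleb(-scale)
    # ROUND_DOWN drops the extra digits; the rounding mode is an assumption.
    return value.quantize(template, rounding=ROUND_DOWN)


print(quantize_to_scale(Decimal("100.00383889328932"), 2))  # 100.00
```

A library could equally use a rounding mode such as ROUND_HALF_EVEN; for the value above both give the same result.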
timestamp-millis, time-millis, and date Types
Date and time values are automatically coerced to the matching Python type (datetime.datetime, datetime.time, or datetime.date).
All Python datetime, date, and time types are supported.
Values serialize to and deserialize from the underlying Avro type: long (milliseconds since the epoch) for timestamp-millis, int (milliseconds after midnight) for time-millis, and int (days since the epoch) for date.
import datetime

from attrs import field, define
from avro_dbo import avro_schema


@define
@avro_schema
class TimestampModel:
    created_at: datetime.datetime = field(
        metadata={
            "logicalType": "timestamp-millis",
        },
    )
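The mapping between Python values and their Avro wire representation can be sketched with the standard library alone. The helper names below are hypothetical and this is not Avro DBO's code; it also assumes that naive datetimes are interpreted as UTC.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_timestamp_millis(value: datetime.datetime) -> int:
    """Hypothetical helper: datetime -> milliseconds since the Unix epoch."""
    if value.tzinfo is None:
        # Assumption: naive datetimes are treated as UTC.
        value = value.replace(tzinfo=datetime.timezone.utc)
    return (value - EPOCH) // datetime.timedelta(milliseconds=1)


def from_timestamp_millis(millis: int) -> datetime.datetime:
    """Hypothetical helper: milliseconds since the epoch -> aware UTC datetime."""
    return EPOCH + datetime.timedelta(milliseconds=millis)


def to_date_days(value: datetime.date) -> int:
    """Hypothetical helper: date -> days since the Unix epoch (Avro `date`)."""
    return (value - EPOCH.date()).days


created = datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc)
print(to_timestamp_millis(created))              # 1704067200000
print(to_date_days(datetime.date(2024, 1, 1)))   # 19723
```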
enum Type
Enums are supported and will be serialized to the correct type.
from enum import Enum

from attrs import field, define
from avro_dbo import avro_schema


class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


@define
@avro_schema
class EnumModel:
    status: Status = field(
        default=Status.ACTIVE,
        metadata={
            "logicalType": "enum",
            "symbols": list(Status),
        },
    )
This produces the following schema:
{
    "type": "record",
    "name": "EnumModel",
    "fields": [
        {"name": "status", "type": "enum", "symbols": ["ACTIVE", "INACTIVE"]}
    ]
}
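Note that list(Status) yields enum members, while the schema above lists plain strings. How Avro DBO performs that conversion is not shown in this README; the sketch below, with a hypothetical enum_symbols helper, shows one way the symbol names could be derived from a Python Enum.

```python
from enum import Enum
from typing import List, Type


class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


def enum_symbols(enum_cls: Type[Enum]) -> List[str]:
    """Hypothetical helper: member names in definition order."""
    return [member.name for member in enum_cls]


print(enum_symbols(Status))  # ['ACTIVE', 'INACTIVE']
```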
array Type
from typing import List

from attrs import field, define
from avro_dbo import avro_schema


@define
@avro_schema
class ArrayModel:
    tags: List[str] = field(
        factory=list,
        metadata={
            "logicalType": "array",
            "items": "string",
        },
    )
Kitchen Sink Example
The following example combines several of the supported primitive, logical, and complex types in a single model.
import datetime
from decimal import Decimal
from enum import Enum
from typing import List

from attrs import field, define
from avro_dbo import avro_schema


class Status(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


@define
@avro_schema
class KitchenSinkModel:
    name: str = field(default="")
    amount: Decimal = field(
        default=Decimal("999.99"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2,
        },
    )
    status: Status = field(
        default=Status.ACTIVE,
        metadata={
            "logicalType": "enum",
            "symbols": list(Status),
        },
    )
    created_at: datetime.datetime = field(
        metadata={
            "logicalType": "timestamp-millis",
        },
    )
    tags: List[str] = field(
        factory=list,
        metadata={
            "logicalType": "array",
            "items": "string",
        },
    )
Exporting the Avro Schema
You can use the export_schema() method to export the schema as a JSON object.
Running the Example
# ... import the KitchenSinkModel class
print(KitchenSinkModel.export_schema())
The result will be a JSON object that can be used to define the schema in a Confluent Schema Registry.
Example Avro Schema Output
{
    "type": "record",
    "name": "KitchenSinkModel",
    "fields": [
        {"name": "name", "type": "string", "default": ""},
        {"name": "amount", "type": "decimal", "precision": 10, "scale": 2},
        {"name": "status", "type": "enum", "symbols": ["ACTIVE", "INACTIVE"]},
        {"name": "created_at", "type": "long", "logicalType": "timestamp-millis"},
        {"name": "tags", "type": "array", "items": "string"}
    ]
}
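Once the schema is available as JSON text, it can be inspected with the standard json module. The sketch below assumes the exported output matches the example above and embeds it as a string; the variable names are illustrative and no Avro DBO calls are involved.

```python
import json

# Assumption: this text mirrors the example schema output shown above.
SCHEMA_JSON = """
{
    "type": "record",
    "name": "KitchenSinkModel",
    "fields": [
        {"name": "name", "type": "string", "default": ""},
        {"name": "amount", "type": "decimal", "precision": 10, "scale": 2},
        {"name": "status", "type": "enum", "symbols": ["ACTIVE", "INACTIVE"]},
        {"name": "created_at", "type": "long", "logicalType": "timestamp-millis"},
        {"name": "tags", "type": "array", "items": "string"}
    ]
}
"""

schema = json.loads(SCHEMA_JSON)

# Map each field name to its declared type for a quick overview.
field_types = {f["name"]: f["type"] for f in schema["fields"]}

print(schema["name"])             # KitchenSinkModel
print(field_types["created_at"])  # long
```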
Saving an Avro Schema to a File
Pass a filename to the export_schema() method to write the schema to a JSON file.
KitchenSinkModel.export_schema(filename="kitchen_sink_model.json")
Coercing a Python Class Using Avro Schema Model
Avro DBO automatically coerces every field in the schema to the correct Python type.
This covers datetime, date, decimal, enum, array, and more.
Example with decimal.Decimal
from decimal import Decimal

from attrs import field, define
from avro_dbo import avro_schema


@define
@avro_schema
class DecimalModel:
    amount: Decimal = field(
        default=Decimal("100.00"),
        metadata={
            "logicalType": "decimal",
            "precision": 10,
            "scale": 2,
        },
    )


my_model = DecimalModel()
print(my_model.amount)
# > 100.00

# extra precision is truncated to the scale
my_model.amount = Decimal("100.00383889328932")
print(my_model.amount)
# > 100.00
Additional Information
The following links are useful for more information on the project.
🤝 Contributing
We welcome contributions! To submit issues or propose changes, please visit our GitHub repository. See the CONTRIBUTING.md file for more information on how to contribute.
📜 License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
About Mac Anderson (Author)
Avro DBO was created and is maintained by Mac Anderson. For more insights into my other projects, visit Mac Anderson's GitHub.
For additional information on my professional work, explore Tradesignals and connect with me on LinkedIn.
Thank you!
-- Mac Anderson