Python utilities used for interacting with .avro/.avsc files
Christopher H. Todd's Python Lib for AVRO/AVSC
The ctodd-python-lib-avro project is responsible for interacting with Apache AVRO. This includes converting records to and from byte arrays, writing and reading .avro files, writing and reading .avsc files, and other minor quality-of-life wrappers.
The library relies on Python's avro-python3 package and wraps it with custom exception handling, simpler interactions, and a more functional style to reduce the amount of code needed in projects dealing with AVRO.
Dependencies
Python Packages
- avro-python3>=1.8.2
- simplejson>=3.16.0
Libraries
avro_converter_helpers.py
This library is used to convert .avro files to other formats (.json is the first format supported)
Functions:
def convert_avro_file_to_json(avro_filename, json_filename=None):
"""
Purpose:
Convert an .avro file into a .json file
Args:
avro_filename (String): Path/filename of the .avro file to convert to .json
json_filename (String): Path/filename of the .json file to generate. if none
is specified, just use the same .avro path and change the extension
Yields:
json_filename (String): Path/filename of the .json file generated
"""
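The default-output rule described in the docstring (same path, extension changed to .json) can be sketched as follows. `derive_json_filename` is a hypothetical helper name used only for illustration; it is not part of the library, and the library's own implementation may differ.

```python
import os


def derive_json_filename(avro_filename, json_filename=None):
    """Hypothetical helper: choose the .json output path for an .avro input."""
    if json_filename is not None:
        # An explicitly supplied output path is used as-is
        return json_filename
    # Otherwise keep the same path and swap the extension for .json
    base, _extension = os.path.splitext(avro_filename)
    return base + ".json"


print(derive_json_filename("./data/test_data.avro"))
print(derive_json_filename("./data/test_data.avro", "./out/records.json"))
```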
avro_exceptions.py
File for holding custom exception types that will be generated by the avro_helpers libraries
Exception Types:
class AvroTestException(Exception):
"""
Purpose:
The AvscInvalid will be raised when reading the .avsc raises an exception
"""
class AvscInvalid(Exception):
"""
Purpose:
The AvscInvalid will be raised when reading the .avsc raises an exception
"""
class AvscNotFound(Exception):
"""
Purpose:
The AvscNotFound will be raised when trying to Read a .avsc file
that cannot be found.
"""
class AvroNotFound(Exception):
"""
Purpose:
The AvroNotFound will be raised when trying to Read a .avro file
that cannot be found.
"""
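Callers can catch these exception types to tell a missing file apart from other failures. The sketch below redefines `AvroNotFound` locally so that it runs without the library installed; `fake_reader` and `read_records_or_empty` are hypothetical stand-ins, not library functions.

```python
import os


class AvroNotFound(Exception):
    """Local stand-in mirroring the exception type listed above."""


def fake_reader(avro_filename):
    """Hypothetical stand-in for a reading helper; it only checks existence."""
    if not os.path.isfile(avro_filename):
        raise AvroNotFound(f"{avro_filename} was not found")
    return []


def read_records_or_empty(avro_filename):
    """Return records, falling back to an empty list if the file is missing."""
    try:
        return fake_reader(avro_filename)
    except AvroNotFound as err:
        print(f"Skipping missing file: {err}")
        return []


records = read_records_or_empty("./data/does_not_exist.avro")
```

Catching the specific type rather than a bare `Exception` keeps unrelated errors visible to the caller.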
avro_general_helpers.py
Avro General Helpers. This library is used to interact with .avro files in ways not specifically related to reading or writing them.
Functions:
N/A
avro_reading_helpers.py
Avro Reading Helpers. This library is used to aid in the task of reading .avro files
Functions:
def get_record_from_avro_generator(avro_filename):
"""
Purpose:
Generator of records from a .avro filename (with path in the filename)
Args:
avro_filename (String): Path/filename of the .avro file to get records from
Yields:
avro_record (Record Obj from .avro): Record read from the .avro file
"""
def get_record_from_avro_buffered(avro_filename):
"""
Purpose:
Buffered Get records from a .avro filename (with path in the filename)
Args:
avro_filename (String): Path/filename of the .avro file to get records from
Returns:
avro_records (List of Record Objs from .avro): List of Records read from
the .avro file
"""
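The two reading styles differ in when records are materialized: the generator hands back one record at a time, while the buffered version builds the full list before returning. The sketch below illustrates that difference with in-memory records in place of a real .avro file; `iter_records` and `load_records` are hypothetical names, not the library's functions.

```python
def iter_records(source):
    """Generator style: yield one record at a time."""
    for record in source:
        yield record


def load_records(source):
    """Buffered style: collect every record into a list before returning."""
    return list(iter_records(source))


# In-memory stand-in for the records stored in an .avro file
sample_records = [{"id": 1}, {"id": 2}, {"id": 3}]

# The generator can stop after the first record without reading the rest
first_record = next(iter_records(sample_records))

# The buffered call always returns the complete list
all_records = load_records(sample_records)
```

The generator style tends to suit large files that are processed record by record; the buffered style is convenient when the whole dataset fits in memory.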
avro_schema_helpers.py
Avro Schema Helpers. This library is used to interact with .avsc files
Functions:
def get_schema_from_avsc_file(avsc_filename):
"""
Purpose:
Get the file schema from an .avsc filename (with path in the filename)
Args:
avsc_filename (String): Path/filename of the .avsc file to get the schema from
Return:
avro_schema (AVRO Schema Object): Schema object from the avro library
"""
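An .avsc file is a JSON document describing an Avro schema, so its contents can be inspected with the standard library alone. The sketch below writes a small schema to a temporary file and reads it back; the record name and fields are made up for illustration, and the result is a plain dict rather than the schema object the library returns.

```python
import json
import os
import tempfile

# Made-up schema used only for illustration
example_schema = {
    "type": "record",
    "name": "TestRecord",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "label", "type": "string"},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    avsc_filename = os.path.join(tmp_dir, "test_schema.avsc")

    # Write the schema out as JSON, which is the on-disk .avsc format
    with open(avsc_filename, "w") as avsc_file:
        json.dump(example_schema, avsc_file, indent=2)

    # Read it back as a plain dict
    with open(avsc_filename) as avsc_file:
        loaded_schema = json.load(avsc_file)

field_names = [field["name"] for field in loaded_schema["fields"]]
print(field_names)
```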
avro_writing_helpers.py
Avro Writing Helpers. This library is used to aid in the task of writing .avro files
Functions:
def write_raw_records_to_avro(raw_records, avro_filename, avro_schema):
"""
Purpose:
Write Records to .avro File
Args:
raw_records (List of Dicts): List of Records to Write to AVRO as Bytes
avro_filename (String): Filename and path of .avro to write
avro_schema (AVRO Schema Object): Schema object from the avro library
Returns:
N/A
"""
def serialize_data(raw_records, avro_schema):
"""
Purpose:
Serialize a record as bytes
Args:
raw_records (List of Dicts): List of Records to Serialize
avro_schema (AVRO Schema Object): Schema object from the avro library
Return:
serialized_records (List of Byte Array): Records Serialized into Byte-Array
"""
Example Scripts
Example executable Python scripts/modules for testing and interacting with the library. These show example use-cases for the libraries and can be used as templates for developing with the libraries or to use as one-off development efforts.
read_avro_file.py
Purpose:
Read an .avro File
Steps:
- Either
- Read .avro File as Buffered List
- Read .avro File as Generator
function call:
python3 read_avro_file.py {--avro=avro_filename}
example call:
python3 read_avro_file.py --avro="./data/test_data.avro"
read_avsc_file.py
Purpose:
Read an .avsc File to get the schema
Steps:
- Read .avsc Schema
function call:
python3 read_avsc_file.py {--avsc=avsc_filename}
example call:
python3 read_avsc_file.py --avsc="./avsc/test_schema.avsc"
write_avro_file.py
Purpose:
Write an .avro File
Steps:
- Write .avro File
function call:
python3.6 write_avro_file.py {--avro=avro_filename} \
{--avsc=avsc_filename}
example call:
python3.6 write_avro_file.py --avro="./data/generated_data.avro" \
--avsc="./avsc/test_schema.avsc"
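The `--avro` and `--avsc` options used in the calls above could be parsed along the following lines. This is a sketch that assumes `argparse`; the README does not show how the example scripts actually parse their arguments, and `parse_cli_args` is a hypothetical name.

```python
import argparse


def parse_cli_args(argv=None):
    """Hypothetical parser for the --avro/--avsc options shown above."""
    parser = argparse.ArgumentParser(description="Write an .avro file")
    # Assumption: both options are required; the real scripts may differ
    parser.add_argument(
        "--avro", dest="avro_filename", required=True,
        help="Path/filename of the .avro file to write",
    )
    parser.add_argument(
        "--avsc", dest="avsc_filename", required=True,
        help="Path/filename of the .avsc schema file",
    )
    return parser.parse_args(argv)


args = parse_cli_args([
    "--avro=./data/generated_data.avro",
    "--avsc=./avsc/test_schema.avsc",
])
print(args.avro_filename, args.avsc_filename)
```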
Notes
- Relies on f-string notation, which requires Python 3.6 or later. A refactor to remove f-strings could allow for development with Python 3.0.x through 3.5.x
TODO
- Unittest framework in place, but lacking tests