Python utilities used for interacting with .avro/.avsc files

Christopher H. Todd's Python Lib for AVRO/AVSC

The ctodd-python-lib-avro project provides helpers for working with Apache AVRO. This includes converting records to and from byte arrays, reading and writing .avro files, reading and writing .avsc files, and other minor quality-of-life wrappers.

The library relies on Python's avro-python3 package and wraps it with custom exception handling, simpler interactions, and a more functional style to reduce the amount of code needed in projects dealing with AVRO.

Dependencies

Python Packages

  • avro-python3>=1.8.2
  • simplejson>=3.16.0

Libraries

avro_converter_helpers.py

This library is used to convert .avro files to other formats (starting with .json)

Functions:

def convert_avro_file_to_json(avro_filename, json_filename=None):
    """
    Purpose:
        Convert an .avro file into a .json file
    Args:
        avro_filename (String): Path/filename of the .avro file to convert to .json
        json_filename (String): Path/filename of the .json file to generate. If none
            is specified, use the same .avro path and change the extension
    Returns:
        json_filename (String): Path/filename of the .json file generated
    """

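The default-filename behaviour described in the docstring above can be sketched as follows. This is a minimal illustration rather than the library's own code; default_json_filename is a hypothetical helper name that does not appear in the project.

```python
import os


def default_json_filename(avro_filename, json_filename=None):
    """Return json_filename if given, else the .avro path with a .json extension."""
    if json_filename is not None:
        return json_filename
    # splitext separates the final extension from the rest of the path
    base, _extension = os.path.splitext(avro_filename)
    return base + ".json"


# With no explicit target, only the extension changes
derived = default_json_filename("./data/test_data.avro")
```
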
avro_exceptions.py

File for holding custom exception types that will be raised by the avro_helpers libraries

Exception Types:

class AvroTestException(Exception):
    """
    Purpose:
        The AvscInvalid will be raised when reading the .avsc raises an exception
    """
class AvscInvalid(Exception):
    """
    Purpose:
        The AvscInvalid will be raised when reading the .avsc raises an exception
    """
class AvscNotFound(Exception):
    """
    Purpose:
        The AvscNotFound will be raised when trying to Read a .avsc file
        that cannot be found.
    """
class AvroNotFound(Exception):
    """
    Purpose:
        The AvroNotFound will be raised when trying to Read a .avro file
        that cannot be found.
    """

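A sketch of how exception types like these might be raised and handled. The two classes are redefined locally so the example is self-contained, and load_avsc_as_dict is a hypothetical stand-in that parses the .avsc file with the standard json module (an .avsc file is JSON); the library itself presumably builds a schema object through avro-python3 instead.

```python
import json


class AvscNotFound(Exception):
    """Raised when the .avsc file cannot be found."""


class AvscInvalid(Exception):
    """Raised when the .avsc file cannot be parsed."""


def load_avsc_as_dict(avsc_filename):
    """Hypothetical stand-in: parse an .avsc file into plain Python data."""
    try:
        with open(avsc_filename, "r") as avsc_file:
            return json.load(avsc_file)
    except FileNotFoundError as err:
        # Translate the low-level error into the domain-specific one
        raise AvscNotFound(f"{avsc_filename} not found") from err
    except json.JSONDecodeError as err:
        raise AvscInvalid(f"{avsc_filename} is not valid JSON") from err


def describe_schema_load(avsc_filename):
    """Show how a caller might branch on the specific exception types."""
    try:
        load_avsc_as_dict(avsc_filename)
    except AvscNotFound:
        return "missing"
    except AvscInvalid:
        return "invalid"
    return "loaded"
```

Catching the specific types lets calling code treat a missing schema differently from a malformed one without inspecting error messages.
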
avro_general_helpers.py

Avro General Helpers. This library is used to interact with .avro files in ways not specifically related to reading or writing them.

Functions:

N/A

avro_reading_helpers.py

Avro Reading Helpers. This library is used to aid in the task of reading .avro files

Functions:

def get_record_from_avro_generator(avro_filename):
    """
    Purpose:
        Generator of records from a .avro filename (with path in the filename)
    Args:
        avro_filename (String): Path/filename of the .avro file to get records from
    Yields:
        avro_record (Record Obj from .avro): Record read from the .avro file
    """
def get_record_from_avro_buffered(avro_filename):
    """
    Purpose:
        Get a buffered list of records from a .avro filename (with path in the filename)
    Args:
        avro_filename (String): Path/filename of the .avro file to get records from
    Returns:
        avro_records (List of Record Objs from .avro): List of Records read from
            the .avro file
    """

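The difference between the generator and buffered styles can be sketched without avro-python3. In this illustration a plain list stands in for an opened .avro reader, and iter_records and read_all_records are hypothetical names, not the library's functions.

```python
def iter_records(reader):
    """Generator style: yield one record at a time from any iterable reader."""
    for record in reader:
        yield record


def read_all_records(reader):
    """Buffered style: consume the generator and return every record as a list."""
    return list(iter_records(reader))


# Stand-in for an opened .avro reader; a real reader would yield decoded records
fake_reader = [{"id": 1}, {"id": 2}, {"id": 3}]

first_record = next(iter_records(fake_reader))  # only one record is pulled
all_records = read_all_records(fake_reader)     # every record is held in memory
```

The generator form suits large files that should not be loaded at once; the buffered form is simpler when the whole file fits comfortably in memory.
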
avro_schema_helpers.py

Avro Schema Helpers. This library is used to interact with .avsc files

Functions:

def get_schema_from_avsc_file(avsc_filename):
    """
    Purpose:
        Get the file schema from an .avsc filename (with path in the filename)
    Args:
        avsc_filename (String): Path/filename of the .avsc file to get the schema from
    Return:
        avro_schema (AVRO Schema Object): Schema object from the avro library
    """

avro_writing_helpers.py

Avro Writing Helpers. This library is used to aid in the task of writing .avro files

Functions:

def write_raw_records_to_avro(raw_records, avro_filename, avro_schema):
    """
    Purpose:
        Write Records to .avro File
    Args:
        raw_records (List of Dicts): List of Records to Write to AVRO as Bytes
        avro_filename (String): Filename and path of .avro to write
        avro_schema (AVRO Schema Object): Schema object from the avro library
    Returns:
        N/A
    """
def serialize_data(raw_records, avro_schema):
    """
    Purpose:
        Serialize a record as bytes
    Args:
        raw_records (List of Dicts): List of Records to Serialize
        avro_schema (AVRO Schema Object): Schema object from the avro library
    Return:
        serialized_records (List of Byte Array): Records Serialized into Byte-Array
    """

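To illustrate what serializing a record as bytes means at the byte level, the sketch below hand-encodes two primitive types following the Avro binary encoding rules (zig-zag variable-length longs, length-prefixed UTF-8 strings, record fields concatenated in schema order). It is not how the library works internally, which delegates to avro-python3, and it does not produce an .avro container file. The function names and the (name, type) field list are assumptions made for the example.

```python
def encode_long(value):
    """Encode an integer as an Avro long: zig-zag, then variable-length bytes."""
    zigzag = (value << 1) ^ (value >> 63)  # maps small negatives to small positives
    out = bytearray()
    while zigzag & ~0x7F:
        out.append((zigzag & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_string(value):
    """Encode a string as an Avro string: byte length as a long, then UTF-8 bytes."""
    data = value.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS = {"long": encode_long, "string": encode_string}


def serialize_records(raw_records, fields):
    """Serialize each dict by concatenating its encoded fields in schema order."""
    return [
        b"".join(ENCODERS[field_type](record[name]) for name, field_type in fields)
        for record in raw_records
    ]


# Hypothetical two-field record schema, given as (name, type) pairs
fields = [("name", "string"), ("count", "long")]
serialized = serialize_records([{"name": "foo", "count": 1}], fields)
```
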
Example Scripts

Example executable Python scripts/modules for testing and interacting with the library. These show example use-cases for the libraries and can be used as templates for developing with the libraries or run directly for one-off development tasks.

read_avro_file.py

    Purpose:
        Read an .avro File

    Steps:
        - Either
            - Read .avro File as Buffered List
            - Read .avro File as Generator

    function call:
        python3 read_avro_file.py {--avro=avro_filename}

    example call:
        python3 read_avro_file.py --avro="./data/test_data.avro"

read_avsc_file.py

    Purpose:
        Read an .avsc File to get the schema

    Steps:
        - Read .avsc Schema

    function call:
        python3 read_avsc_file.py {--avsc=avsc_filename}

    example call:
        python3 read_avsc_file.py --avsc="./avsc/test_schema.avsc"

write_avro_file.py

    Purpose:
        Write an .avro File

    Steps:
        - Write .avro File

    function call:
        python3.6 write_avro_file.py {--avro=avro_filename} \
            {--avsc=avsc_filename}

    example call:
        python3.6 write_avro_file.py --avro="./data/generated_data.avro" \
            --avsc="./avsc/test_schema.avsc"
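
The --avro and --avsc options used in the calls above could be parsed with the standard argparse module. This is a sketch under assumptions: the actual scripts' argument handling is not shown here, and the dest names and the choice to make both options required are illustrative.

```python
import argparse


def build_parser():
    """Build a parser accepting the --avro and --avsc options."""
    parser = argparse.ArgumentParser(description="Write an .avro file")
    parser.add_argument("--avro", dest="avro_filename", required=True,
                        help="Path/filename of the .avro file to write")
    parser.add_argument("--avsc", dest="avsc_filename", required=True,
                        help="Path/filename of the .avsc schema file")
    return parser


# Passing a list to parse_args mimics the example call without touching sys.argv
args = build_parser().parse_args(
    ["--avro=./data/generated_data.avro", "--avsc=./avsc/test_schema.avsc"]
)
```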

Notes

  • Relies on f-string notation, which requires Python 3.6 or later. A refactor to remove f-strings could allow for development with Python 3.0.x through 3.5.x
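
A refactor of this kind usually means replacing each f-string with an equivalent str.format call. The message below is a made-up example, not a string taken from the library.

```python
avro_filename = "./data/test_data.avro"
record_count = 3

# Python 3.6+ only: f-string interpolation
message_fstring = f"Read {record_count} records from {avro_filename}"

# Works on earlier Python 3 releases: str.format with the same values
message_format = "Read {} records from {}".format(record_count, avro_filename)
```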

TODO

  • Unittest framework in place, but lacking tests

Download files

Source Distribution

ctodd-python-lib-avro-1.0.5.tar.gz (7.1 kB)

Built Distribution

ctodd_python_lib_avro-1.0.5-py3-none-any.whl (10.8 kB, Python 3)
