
Python utilities used for interacting with .avro/.avsc files

Project description

Christopher H. Todd's Python Lib for AVRO/AVSC

The ctodd-python-lib-avro project is responsible for interacting with Apache AVRO. This includes converting records to and from byte arrays, reading and writing .avro files, reading and writing .avsc files, and other minor quality-of-life wrappers.

The library relies on the avro-python3 package and wraps it with custom, specific exception handling, simpler interactions, and a more functional style to reduce code in projects dealing with AVRO.


Dependencies

Python Packages

  • avro-python3>=1.8.2
  • simplejson>=3.16.0

Libraries

avro_converter_helpers.py

This library is used to convert .avro files to other formats (currently .json)

Functions:

def convert_avro_file_to_json(avro_filename, json_filename=None):
    """
    Purpose:
        Convert an .avro file into a .json file
    Args:
        avro_filename (String): Path/filename of the .avro file to convert to .json
        json_filename (String): Path/filename of the .json file to generate. If none
            is specified, the .avro path is reused with the extension changed to .json
    Yields:
        json_filename (String): Path/filename of the .json file generated
    """
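The docstring above says that when json_filename is omitted, the .avro path is reused with a .json extension. A minimal sketch of that default plus the JSON write step follows. It assumes the records have already been decoded into dicts (for example by get_record_from_avro_generator), uses the standard-library json module instead of the simplejson dependency, and writes a single JSON array. The helper names default_json_filename and write_records_as_json are hypothetical, and the library's actual output layout may differ.

```python
import json
import os
from typing import Iterable, Mapping, Optional


def default_json_filename(avro_filename: str) -> str:
    """Keep the same path, swapping the .avro extension for .json."""
    root, _extension = os.path.splitext(avro_filename)
    return root + ".json"


def write_records_as_json(
    records: Iterable[Mapping],
    avro_filename: str,
    json_filename: Optional[str] = None,
) -> str:
    """Write already-decoded records to a JSON array file and return its path."""
    if json_filename is None:
        json_filename = default_json_filename(avro_filename)
    with open(json_filename, "w", encoding="utf-8") as json_file:
        # default=str stringifies values json cannot encode natively (e.g. bytes);
        # this is a lossy fallback, not a faithful round-trip
        json.dump(list(records), json_file, indent=2, default=str)
    return json_filename
```

For example, default_json_filename("./data/test_data.avro") returns "./data/test_data.json".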

avro_exceptions.py

File for holding the custom exception types raised by the avro_helpers libraries

Exception Types:

class AvroTestException(Exception):
    """
    Purpose:
        The AvroTestException is an exception type intended for use in tests
    """
class AvscInvalid(Exception):
    """
    Purpose:
        The AvscInvalid will be raised when reading the .avsc file raises an exception
    """


class AvscNotFound(Exception):
    """
    Purpose:
        The AvscNotFound will be raised when trying to read an .avsc file
        that cannot be found.
    """


class AvroNotFound(Exception):
    """
    Purpose:
        The AvroNotFound will be raised when trying to read an .avro file
        that cannot be found.
    """
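The not-found exceptions above usually wrap a lower-level error. One way to do that, sketched here with a hypothetical read_file_bytes helper and local stand-in copies of the exception classes, is to catch FileNotFoundError and re-raise with `raise ... from`, so the original error stays attached as `__cause__`. This illustrates the pattern only; it is not the library's own code.

```python
from typing import Type


class AvroNotFound(Exception):
    """Local stand-in for the library's AvroNotFound, so the sketch runs alone."""


class AvscNotFound(Exception):
    """Local stand-in for the library's AvscNotFound, so the sketch runs alone."""


def read_file_bytes(filename: str, not_found_exc: Type[Exception]) -> bytes:
    """Read a whole file, translating FileNotFoundError into a library-specific type."""
    try:
        with open(filename, "rb") as handle:
            return handle.read()
    except FileNotFoundError as err:
        # "raise ... from" keeps the original error available as __cause__
        raise not_found_exc(f"File not found: {filename}") from err
```

A caller reading a schema would pass AvscNotFound, for example read_file_bytes("./avsc/test_schema.avsc", AvscNotFound), and can then catch that one type instead of a generic OSError.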

avro_general_helpers.py

Avro General Helpers. This library is used to interact with .avro files in ways not specifically related to reading or writing them.

Functions:

N/A

avro_reading_helpers.py

Avro Reading Helpers. This library is used to aid in the task of reading .avro files

Functions:

def get_record_from_avro_generator(avro_filename):
    """
    Purpose:
        Generator of records from an .avro filename (with path in the filename)
    Args:
        avro_filename (String): Path/filename of the .avro file to get records from
    Yields:
        avro_record (Record Obj from .avro): Record read from the .avro file
    """
def get_record_from_avro_buffered(avro_filename):
    """
    Purpose:
        Get all records from an .avro filename (with path in the filename),
        buffered into a single list
    Args:
        avro_filename (String): Path/filename of the .avro file to get records from
    Returns:
        avro_records (List of Record Objs from .avro): List of Records read from
            the .avro file
    """
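The two reading functions differ mainly in memory use: the generator yields one record at a time, while the buffered version materializes every record in a list. The sketch below shows that relationship with a hypothetical open_reader hook standing in for avro-python3's file reader, so it runs without the avro package; records_generator and records_buffered are illustrative names, not the library's.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
# Hypothetical hook: given a filename, return an iterable of decoded records.
# In the real library an avro-python3 file reader plays this role.
OpenReader = Callable[[str], Iterable[Record]]


def records_generator(avro_filename: str, open_reader: OpenReader) -> Iterator[Record]:
    """Yield records lazily, one at a time, as the reader produces them."""
    for record in open_reader(avro_filename):
        yield record


def records_buffered(avro_filename: str, open_reader: OpenReader) -> List[Record]:
    """Read every record into a list up front: easy to reuse, but O(n) memory."""
    return list(records_generator(avro_filename, open_reader))
```

The generator suits large files and streaming pipelines; the buffered list is convenient when the file is small and records need to be counted, indexed, or iterated more than once. A real reader should also be closed once iteration finishes.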

avro_schema_helpers.py

Avro Schema Helpers. This library is used to interact with .avsc files

Functions:

def get_schema_from_avsc_file(avsc_filename):
    """
    Purpose:
        Get the file schema from an .avsc filename (with path in the filename)
    Args:
        avsc_filename (String): Path/filename of the .avsc file to get the schema from
    Returns:
        avro_schema (AVRO Schema Object): Schema object from the avro library
    """
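.avsc files are JSON documents, so the split between AvscNotFound and AvscInvalid can be illustrated with the standard library alone. This sketch (hypothetical load_avsc_json, local stand-in exception classes) returns a plain dict and performs only a shallow check that the top-level schema is a named record with a fields list. The real get_schema_from_avsc_file returns a schema object parsed by the avro library, which does full validation; rejecting non-record top-level schemas, which Avro does allow, is an assumption of this sketch.

```python
import json
from typing import Any, Dict


class AvscNotFound(Exception):
    """Local stand-in for the library exception of the same name."""


class AvscInvalid(Exception):
    """Local stand-in for the library exception of the same name."""


def load_avsc_json(avsc_filename: str) -> Dict[str, Any]:
    """Load an .avsc file as a JSON dict and run a shallow record-schema check."""
    try:
        with open(avsc_filename, "r", encoding="utf-8") as avsc_file:
            schema = json.load(avsc_file)
    except FileNotFoundError as err:
        raise AvscNotFound(f"Schema not found: {avsc_filename}") from err
    except json.JSONDecodeError as err:
        raise AvscInvalid(f"Schema is not valid JSON: {avsc_filename}") from err

    # Shallow structural check only; full validation is the avro library's job
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise AvscInvalid("Top-level schema is not a record")
    if "name" not in schema or not isinstance(schema.get("fields"), list):
        raise AvscInvalid("Record schema needs a 'name' and a 'fields' list")
    return schema
```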

avro_writing_helpers.py

Avro Writing Helpers. This library is used to aid in the task of writing .avro files

Functions:

def write_raw_records_to_avro(raw_records, avro_filename, avro_schema):
    """
    Purpose:
        Write Records to .avro File
    Args:
        raw_records (List of Dicts): List of Records to Write to AVRO as Bytes
        avro_filename (String): Filename and path of .avro to write
        avro_schema (AVRO Schema Object): Schema object from the avro library
    Returns:
        N/A
    """
def serialize_data(raw_records, avro_schema):
    """
    Purpose:
        Serialize records as bytes
    Args:
        raw_records (List of Dicts): List of Records to Serialize
        avro_schema (AVRO Schema Object): Schema object from the avro library
    Returns:
        serialized_records (List of Bytes): Records Serialized into Byte-Arrays
    """
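What serialize_data produces is Avro's binary encoding. Per the Apache Avro specification, int and long values are zig-zag encoded and then written as variable-length base-128 integers, strings are a long byte length followed by UTF-8 bytes, and a record is simply its fields encoded in schema order with no field names. The sketch below hand-codes just those rules for a hypothetical flat schema of long and string fields. The library itself delegates encoding to avro-python3, so this is for understanding the output, not a replacement for it.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical flat schema: (field_name, avro_type) pairs, "long" or "string" only
Fields = List[Tuple[str, str]]


def encode_long(value: int) -> bytes:
    """Avro int/long: zig-zag to an unsigned value, then base-128 varint, low 7 bits first."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value outside the 64-bit signed range of an Avro long")
    zigzag = (value << 1) ^ (value >> 63)  # 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while zigzag > 0x7F:
        out.append((zigzag & 0x7F) | 0x80)  # high bit set: more bytes follow
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_string(text: str) -> bytes:
    """Avro string: the UTF-8 byte length as a long, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS: Dict[str, Callable[..., bytes]] = {"long": encode_long, "string": encode_string}


def serialize_record(record: Dict[str, object], fields: Fields) -> bytes:
    """Avro record: fields encoded back to back in schema order, no names or delimiters."""
    return b"".join(ENCODERS[avro_type](record[name]) for name, avro_type in fields)


def serialize_records(raw_records: List[Dict[str, object]], fields: Fields) -> List[bytes]:
    """One serialized byte string per input record, mirroring serialize_data's return shape."""
    return [serialize_record(record, fields) for record in raw_records]
```

For example, serialize_records([{"name": "foo", "age": 1}], [("name", "string"), ("age", "long")]) returns [b'\x06foo\x02']: 0x06 is the zig-zag encoding of the length 3, followed by the bytes of "foo", then 0x02 for the long value 1.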

Example Scripts

Example executable Python scripts/modules for testing and interacting with the library. These show example use-cases for the libraries and can be used as templates for developing with the libraries or to use as one-off development efforts.

read_avro_file.py

    Purpose:
        Read an .avro File

    Steps:
        - Either
            - Read .avro File as Buffered List
            - Read .avro File as Generator

    function call:
        python3 read_avro_file.py {--avro=avro_filename}

    example call:
        python3 read_avro_file.py --avro="./data/test_data.avro"

read_avsc_file.py

    Purpose:
        Read an .avsc File to get the schema

    Steps:
        - Read .avsc Schema

    function call:
        python3 read_avsc_file.py {--avsc=avsc_filename}

    example call:
        python3 read_avsc_file.py --avsc="./avsc/test_schema.avsc"

write_avro_file.py

    Purpose:
        Write an .avro File

    Steps:
        - Write .avro File

    function call:
        python3.6 write_avro_file.py {--avro=avro_filename} \
            {--avsc=avsc_filename}

    example call:
        python3.6 write_avro_file.py --avro="./data/generated_data.avro" \
            --avsc="./avsc/test_schema.avsc"
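The flag syntax above maps naturally onto argparse. Below is a sketch of how write_avro_file.py might parse its arguments, assuming argparse and that both flags are required; the actual script may handle them differently, and parse_args and the dest names are hypothetical.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the --avro/--avsc flags shown in the example call."""
    parser = argparse.ArgumentParser(description="Write an .avro File")
    parser.add_argument(
        "--avro", dest="avro_filename", required=True,
        help="Path/filename of the .avro file to write",
    )
    parser.add_argument(
        "--avsc", dest="avsc_filename", required=True,
        help="Path/filename of the .avsc schema to write with",
    )
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```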

Notes

  • Relies on f-string notation, which requires Python 3.6 or newer. A refactor to remove f-strings could allow development with Python 3.0.x through 3.5.x
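The refactor mentioned above amounts to replacing each f-string with an equivalent str.format call. Auto-numbered {} fields only arrived in Python 3.1, so explicit indices are needed to stay valid on 3.0. The variable names here are illustrative.

```python
avro_filename = "./data/test_data.avro"  # illustrative values
record_count = 3

# Python 3.6+ only: f-string interpolation
message_f = f"Read {record_count} records from {avro_filename}"

# Python 3.0+: str.format with explicit field indices (auto-numbered {} needs 3.1+)
message_format = "Read {0} records from {1}".format(record_count, avro_filename)
```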

TODO

  • Unittest framework in place, but lacking tests



Download files


Source Distribution

ctodd-python-lib-avro-1.0.5.tar.gz (7.1 kB)


Built Distribution

ctodd_python_lib_avro-1.0.5-py3-none-any.whl (10.8 kB)


File details

Details for the file ctodd-python-lib-avro-1.0.5.tar.gz.

File metadata

  • Download URL: ctodd-python-lib-avro-1.0.5.tar.gz
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.9.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.5

File hashes

Hashes for ctodd-python-lib-avro-1.0.5.tar.gz
Algorithm Hash digest
SHA256 4b7ae261395c8d231328d28771edea5934c19f1305089699c75a51c7f80467d6
MD5 363ecb10e5c4e9cc60f67bf5a35964a2
BLAKE2b-256 f06daa1c9e60d8c3d0b62a0a7d65a3c908bc613e83aac2412c703428ff079b14


File details

Details for the file ctodd_python_lib_avro-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: ctodd_python_lib_avro-1.0.5-py3-none-any.whl
  • Size: 10.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.9.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.5

File hashes

Hashes for ctodd_python_lib_avro-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 8ee1c2cc64ac9b62f02b4dc0f0e208720d5bdcfb6023ed4a62351bf892df3706
MD5 3e020a96765cee34dc4c97f05adc7794
BLAKE2b-256 d52eec6804c3698cdcf9f8bdcfc0abea782fe972ff7f15acaa6f01b1e66e83f8
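To check a downloaded file against the SHA256 digests listed above, hash it in chunks with hashlib and compare. The helper names are hypothetical; pip's hash-checking mode (--hash=sha256:... entries in a requirements file) performs the same check automatically during installation.

```python
import hashlib

# SHA256 digests copied from the file details above
EXPECTED_SHA256 = {
    "ctodd-python-lib-avro-1.0.5.tar.gz":
        "4b7ae261395c8d231328d28771edea5934c19f1305089699c75a51c7f80467d6",
    "ctodd_python_lib_avro-1.0.5-py3-none-any.whl":
        "8ee1c2cc64ac9b62f02b4dc0f0e208720d5bdcfb6023ed4a62351bf892df3706",
}


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA256 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """True when the file's SHA256 matches the expected digest (case-insensitive)."""
    return file_sha256(path) == expected_hex.lower()
```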

