Even faster alternative to FastAvro

Cerializer

Cerializer is an Avro serialization and deserialization library that aims to provide an even faster alternative to FastAvro and the Avro standard library.

This speed increase does not come without a cost. Cerializer works only with a predefined set of schemata, for which it generates tailor-made Cython code. This avoids the overhead that other serialization libraries incur by having to handle arbitrary schemata at runtime.

Special credit goes to the FastAvro library, which heavily inspired this project.
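To make the idea of schema-specific code generation concrete, here is a minimal pure-Python sketch. It is not Cerializer's generator: the function names, the `point` schema, and the restriction to flat records of string and int fields are all assumptions made for illustration. It emits source code for one schema and compiles it with `exec`, so the resulting function contains no schema lookups at all.

```python
def write_long(n: int) -> bytes:
    """Avro int/long encoding: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def generate_serializer_source(schema: dict) -> str:
    """Emit Python source for a serializer specialised to one flat record schema."""
    lines = ["def serialize(data):", "    out = bytearray()"]
    for field in schema["fields"]:
        name, kind = field["name"], field["type"]
        if kind == "string":
            lines.append(f"    raw = data[{name!r}].encode('utf-8')")
            lines.append("    out += write_long(len(raw)) + raw")
        elif kind in ("int", "long"):
            lines.append(f"    out += write_long(data[{name!r}])")
        else:
            raise ValueError(f"unsupported type in this sketch: {kind!r}")
    lines.append("    return bytes(out)")
    return "\n".join(lines)


# Hypothetical schema used only for this illustration
schema = {
    "type": "record",
    "name": "point",
    "fields": [
        {"name": "label", "type": "string"},
        {"name": "x", "type": "int"},
    ],
}

source = generate_serializer_source(schema)
namespace = {"write_long": write_long}
exec(source, namespace)  # compile the generated function
serialize = namespace["serialize"]

print(source)
print(serialize({"label": "a", "x": 3}))  # b'\x02a\x06'
```

The generated function is a straight sequence of writes, one per field, which is the property the Cython output shown in the next section also has.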

Example of a schema and the corresponding code

SCHEMA

{
    'name': 'array_schema',
    'doc': 'Array schema',
    'namespace': 'cerializer',
    'type': 'record',
    'fields': [
        {
            'name': 'order_id',
            'doc': 'Id of order',
            'type': 'string'
        },
        {
            'name': 'trades',
            'type': {
                'type': 'array',
                'items': ['string', 'int']
            }
        }
    ]
}

CORRESPONDING CODE

def serialize(data, output):
    cdef bytearray buffer = bytearray()
    cdef dict datum
    cdef str type_0
    write.write_string(buffer, data['order_id'])
    if len(data['trades']) > 0:
        write.write_long(buffer, len(data['trades']))
        for val_0 in data['trades']:
            if type(val_0) is tuple:
                type_0, val_1 = val_0

                if type_0 == 'string':
                    write.write_long(buffer, 0)
                    write.write_string(buffer, val_1)

                elif type_0 == 'int':
                    write.write_long(buffer, 1)
                    write.write_int(buffer, val_1)

            else:
                if type(val_0) is str:
                    write.write_long(buffer, 0)
                    write.write_string(buffer, val_0)
                elif type(val_0) is int:
                    write.write_long(buffer, 1)
                    write.write_int(buffer, val_0)
    write.write_long(buffer, 0)
    output.write(buffer)



def deserialize(fo):
    cdef long long i_0
    cdef long long i_1
    cdef long i_2
    data = {}
    data['order_id'] = read.read_string(fo)
    data['trades'] = []

    i_1 = read.read_long(fo)
    while i_1 != 0:
        if i_1 < 0:
            i_1 = -i_1
            read.read_long(fo)
        for i_0 in range(i_1):
            i_2 = read.read_int(fo)
            if i_2 == 0:
                val_2 = read.read_string(fo)
            if i_2 == 1:
                val_2 = read.read_int(fo)
            data['trades'].append(val_2)
        i_1 = read.read_long(fo)
    return data
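
For readers who want to follow the generated serializer without Cython, the following pure-Python sketch mirrors its control flow for the array schema above. It assumes that `write.write_long`, `write.write_int` and `write.write_string` follow the standard Avro binary encoding (zigzag varints and length-prefixed UTF-8); the helper names here are hypothetical, and the explicit `(type_name, value)` tuple form handled by the generated code is left out for brevity.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed integer as an Avro int/long (zigzag, then base-128 varint)."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small unsigned values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return zigzag_varint(len(raw)) + raw


def serialize_array_schema(data: dict) -> bytes:
    buffer = bytearray()
    buffer += encode_string(data["order_id"])
    trades = data["trades"]
    if trades:
        buffer += zigzag_varint(len(trades))  # a single block holding every item
        for value in trades:
            if isinstance(value, str):
                buffer += zigzag_varint(0)  # union branch 0: 'string'
                buffer += encode_string(value)
            else:
                buffer += zigzag_varint(1)  # union branch 1: 'int'
                buffer += zigzag_varint(value)
    buffer += zigzag_varint(0)  # a zero block count terminates the array
    return bytes(buffer)


print(serialize_array_schema({"order_id": "aaaa", "trades": [123, "x"]}))
```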

Usage Example

  1. Create an instance of CerializerSchemata. To initialize CerializerSchemata, supply a list of tuples of the form (schema_identifier, schema), where schema_identifier is a string of the form namespace.schema_name and schema is a dict representing the Avro schema. E.g.:

    import cerializer.schema_handler
    import os
    import yaml
    
    def list_schemata():
        # iterates through all your schemata and yields schema_identifier and path to schema folder
        raise NotImplementedError
    
    def schemata() -> cerializer.schema_handler.CerializerSchemata:
        schemata = []
        for schema_identifier, schema_root in list_schemata():
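            # unsafe_load can construct arbitrary objects; use it only on trusted schema files
            with open(os.path.join(schema_root, 'schema.yaml')) as schema_file:
                schema_tuple = schema_identifier, yaml.unsafe_load(schema_file)  # type: ignore
            schemata.append(schema_tuple)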
        return cerializer.schema_handler.CerializerSchemata(schemata)
    
  2. Create an instance of Cerializer for each of your schemata by calling cerializer_handler.Cerializer, e.g. cerializer_instance = cerializer_handler.Cerializer(cerializer_schemata, schema_namespace, schema_name). This creates an instance of Cerializer that can serialize and deserialize data in that particular schema's format.

  3. Use the instance accordingly, e.g.:

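    import cerializer.cerializer_handler

    data_record = {
        'order_id': 'aaaa',
        'trades': [123, 456, 765]
    }

    # cerializer_schemata is the CerializerSchemata instance created in step 1;
    # the namespace and name must identify a schema registered in it
    cerializer_instance = cerializer.cerializer_handler.Cerializer(cerializer_schemata, 'school', 'student')
    serialized_data = cerializer_instance.serialize(data_record)
    print(serialized_data)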
    

Serialized data

b'\x08aaaa\x06\x02\xf6\x01\x02\x90\x07\x02\xfa\x0b\x00'
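
The bytes above can be read by hand using the Avro binary encoding. The sketch below is a pure-Python decoder written for this illustration (it is not part of Cerializer, and its function names are hypothetical); it walks the payload the same way the generated `deserialize` does: a length-prefixed string, then array blocks of union-tagged items until a zero block count.

```python
import io


def read_long(fo) -> int:
    """Decode one Avro int/long: base-128 varint, then undo the zigzag mapping."""
    shift = 0
    result = 0
    while True:
        byte = fo.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte of this number
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_string(fo) -> str:
    length = read_long(fo)
    return fo.read(length).decode("utf-8")


def decode_array_schema(payload: bytes) -> dict:
    fo = io.BytesIO(payload)
    record = {"order_id": read_string(fo), "trades": []}
    count = read_long(fo)
    while count != 0:
        if count < 0:
            count = -count
            read_long(fo)  # a negative count is followed by the block's byte size
        for _ in range(count):
            branch = read_long(fo)  # union index: 0 = 'string', 1 = 'int'
            item = read_string(fo) if branch == 0 else read_long(fo)
            record["trades"].append(item)
        count = read_long(fo)
    return record


payload = b"\x08aaaa\x06\x02\xf6\x01\x02\x90\x07\x02\xfa\x0b\x00"
print(decode_array_schema(payload))
# {'order_id': 'aaaa', 'trades': [123, 456, 765]}
```

Reading the payload byte by byte: `\x08` is the string length 4, `\x06` is the block count 3, each `\x02` selects union branch 1 (`int`), and the trailing `\x00` closes the array.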

You can also use serialize_into if you already have an IO buffer:

import io

output = io.BytesIO()
cerializer_instance.serialize_into(output, data_record)
print(output.getvalue())

Benchmark

cerializer.default_schema:3            2.5661 times faster,   0.0209s : 0.0082s
cerializer.fixed_decimal_schema:1      1.2795 times faster,   0.1588s : 0.1241s
cerializer.int_date_schema:1           2.8285 times faster,   0.0273s : 0.0097s
cerializer.plain_int:1                 2.2334 times faster,   0.0146s : 0.0065s
cerializer.timestamp_schema_micros:1   2.3759 times faster,   0.0577s : 0.0243s
cerializer.default_schema:2            2.8129 times faster,   0.0240s : 0.0085s
cerializer.array_schema:3              1.2177 times faster,   0.3088s : 0.2536s
cerializer.timestamp_schema:1          2.5928 times faster,   0.0577s : 0.0223s
cerializer.array_schema:2              1.4756 times faster,   0.6542s : 0.4434s
cerializer.union_schema:1              3.0796 times faster,   0.0284s : 0.0092s
cerializer.bytes_decimal_schema:1      1.8449 times faster,   0.0490s : 0.0266s
cerializer.array_schema:1              2.1771 times faster,   0.0344s : 0.0158s
cerializer.string_uuid_schema:1        1.8887 times faster,   0.0494s : 0.0262s
cerializer.map_schema:2                2.0896 times faster,   0.0331s : 0.0158s
cerializer.fixed_schema:1              3.4042 times faster,   0.0213s : 0.0062s
cerializer.long_time_micros_schema:1   2.3747 times faster,   0.0352s : 0.0148s
cerializer.array_schema:4              2.8779 times faster,   0.0591s : 0.0205s
cerializer.default_schema:1            2.0182 times faster,   0.0393s : 0.0195s
cerializer.map_schema:1                3.4610 times faster,   0.0597s : 0.0172s
cerializer.string_schema:1             2.2048 times faster,   0.0352s : 0.0159s
cerializer.reference_schema:1          2.9309 times faster,   0.1525s : 0.0520s
cerializer.enum_schema:1               3.0065 times faster,   0.0217s : 0.0072s
cerializer.tree_schema:1               4.0494 times faster,   0.0869s : 0.0215s
cerializer.huge_schema:1               2.8161 times faster,   0.1453s : 0.0516s
AVERAGE: 1.7814 times faster

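Measured against FastAvro using the benchmark in Cerializer/tests. Each row shows the speedup factor, followed by the FastAvro time and the Cerializer time in seconds.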
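
The AVERAGE line is lower than most per-schema factors, which suggests it is the ratio of total times rather than the mean of the individual factors. That reading is an inference from the numbers, not something the benchmark output states, and the column order (FastAvro first, Cerializer second) is likewise assumed. A short check with the timings copied from the listing:

```python
# (FastAvro seconds, Cerializer seconds) for each row, in listing order
timings = [
    (0.0209, 0.0082), (0.1588, 0.1241), (0.0273, 0.0097), (0.0146, 0.0065),
    (0.0577, 0.0243), (0.0240, 0.0085), (0.3088, 0.2536), (0.0577, 0.0223),
    (0.6542, 0.4434), (0.0284, 0.0092), (0.0490, 0.0266), (0.0344, 0.0158),
    (0.0494, 0.0262), (0.0331, 0.0158), (0.0213, 0.0062), (0.0352, 0.0148),
    (0.0591, 0.0205), (0.0393, 0.0195), (0.0597, 0.0172), (0.0352, 0.0159),
    (0.1525, 0.0520), (0.0217, 0.0072), (0.0869, 0.0215), (0.1453, 0.0516),
]

# Ratio of summed times: dominated by the slowest schemata (the array ones)
total_ratio = sum(f for f, _ in timings) / sum(c for _, c in timings)

# Unweighted mean of the per-schema speedups
mean_ratio = sum(f / c for f, c in timings) / len(timings)

print(f"ratio of totals: {total_ratio:.2f}")  # ratio of totals: 1.78
print(f"mean of ratios:  {mean_ratio:.2f}")
```

Under this reading, the two slow array_schema rows account for a large share of the total time, so they pull the aggregate figure well below the typical per-schema speedup.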

Device: ASUS ZenBook 14 UM425QA, AMD Ryzen 7 5800H, 16 GB 2133 MHz LPDDR4X

Download files

Source Distribution

  • cerializer-1.5.1.tar.gz (24.5 kB)

Built Distributions

  • cerializer-1.5.1-cp313-cp313-manylinux_2_39_x86_64.whl (943.0 kB): CPython 3.13, manylinux (glibc 2.39+), x86-64
  • cerializer-1.5.1-cp313-cp313-manylinux_2_35_x86_64.whl (948.0 kB): CPython 3.13, manylinux (glibc 2.35+), x86-64
  • cerializer-1.5.1-cp312-cp312-manylinux_2_39_x86_64.whl (943.0 kB): CPython 3.12, manylinux (glibc 2.39+), x86-64
  • cerializer-1.5.1-cp312-cp312-manylinux_2_35_x86_64.whl (693.0 kB): CPython 3.12, manylinux (glibc 2.35+), x86-64
  • cerializer-1.5.1-cp311-cp311-manylinux_2_39_x86_64.whl (943.0 kB): CPython 3.11, manylinux (glibc 2.39+), x86-64
  • cerializer-1.5.1-cp311-cp311-manylinux_2_35_x86_64.whl (948.0 kB): CPython 3.11, manylinux (glibc 2.35+), x86-64
  • cerializer-1.5.1-cp310-cp310-manylinux_2_39_x86_64.whl (943.0 kB): CPython 3.10, manylinux (glibc 2.39+), x86-64
  • cerializer-1.5.1-cp310-cp310-manylinux_2_35_x86_64.whl (948.0 kB): CPython 3.10, manylinux (glibc 2.35+), x86-64