Spavro is a (sp)eedier Avro library for Python, accelerated with Cython.

(Sp)eedier Avro - Spavro

Spavro is a fork of the official Apache Avro Python 2 implementation. Its goal is to make deserialization (reads) and serialization (writes) much faster.

Spavro is also Python 2/3 compatible from a single codebase, rather than shipping as a separate project or implementation for each. It is currently tested with Python 2.7, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, and 3.11. Python 3 versions before 3.3 are not supported because the code relies on unicode literals and other compatibility features.

Implementation Details

There are three primary differences between the official implementation and Spavro. First, Spavro uses a C extension, built with Cython, to accelerate some of the low-level binary serialization logic. Second, Spavro uses a different model for handling schemas: it attempts to parse the write and read schemas once and only once, and it builds recursive reader/writer functions from the schema definition. These functions encode the type structure of the schema, so no additional lookups are necessary while processing data. Third, Spavro is compatible with both Python 2 and Python 3 through the six library. The official Apache Avro implementation has two separate codebases for Python 2 and Python 3, whereas Spavro has only one.
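The "parse once, build recursive writer functions" idea can be sketched in pure Python. This is only an illustration of the technique, not Spavro's actual code (which lives in a Cython extension): the names make_writer, write_long and PRIMITIVES are hypothetical, only five primitive types are covered, and unions are assumed to contain primitives only. The byte layout follows the Avro binary encoding (zig-zag varints for long, length-prefixed UTF-8 for string, a branch index before a union value).

```python
import struct


def write_long(out, value):
    """Avro long: zig-zag encode, then emit 7 bits per byte, low bits first."""
    value = (value << 1) ^ (value >> 63)
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def write_string(out, value):
    data = value.encode("utf-8")
    write_long(out, len(data))
    out.extend(data)


# Hypothetical table: (matcher, writer) per primitive type.
# The matcher is used to pick the branch of a union.
PRIMITIVES = {
    "null": (lambda d: d is None, lambda out, d: None),
    "boolean": (lambda d: isinstance(d, bool),
                lambda out, d: out.append(1 if d else 0)),
    "long": (lambda d: isinstance(d, int) and not isinstance(d, bool),
             write_long),
    "double": (lambda d: isinstance(d, float),
               lambda out, d: out.extend(struct.pack("<d", d))),
    "string": (lambda d: isinstance(d, str), write_string),
}


def make_writer(schema):
    """Walk the schema once and return a closure write(out, datum)."""
    if isinstance(schema, list):  # union of primitives: index, then value
        branches = [(PRIMITIVES[name][0], make_writer(name)) for name in schema]

        def write_union(out, datum):
            for index, (matches, write_branch) in enumerate(branches):
                if matches(datum):
                    write_long(out, index)
                    write_branch(out, datum)
                    return
            raise TypeError("datum matches no union branch: %r" % (datum,))

        return write_union
    if isinstance(schema, dict):  # record: fields in declaration order
        fields = [(f["name"], make_writer(f["type"])) for f in schema["fields"]]

        def write_record(out, datum):
            for name, write_field in fields:
                write_field(out, datum[name])

        return write_record
    return PRIMITIVES[schema][1]


schema = {
    "type": "record",
    "name": "somerecord",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "label", "type": ["null", "string"]},
    ],
}
write = make_writer(schema)  # the schema is inspected here, and only here
encoded = bytearray()
write(encoded, {"id": 1, "label": "a"})
print(bytes(encoded))  # b'\x02\x02\x02a'
```

After make_writer returns, writing a datum only calls the prebuilt closures. The schema dictionary is never consulted again, which is the property the paragraph above describes.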

Parsing the schema only once greatly improves the throughput of reading and writing individual datums, since the schema isn't interrogated for every datum. This is especially beneficial for "compatible" schema reading, where both a read schema and a write schema are needed to read a complete data set.
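For "compatible" reading, the same approach means the write and read schemas are compared once, up front, instead of per datum. The sketch below is a simplified illustration under stated assumptions, not Spavro's implementation: make_resolving_reader is a hypothetical name, only flat records with long and string fields are handled, and a reader-only field must carry a default.

```python
import io


def read_long(buf):
    """Decode an Avro long: 7 bits per byte, then undo the zig-zag step."""
    shift = 0
    result = 0
    while True:
        byte = buf.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_string(buf):
    return buf.read(read_long(buf)).decode("utf-8")


READERS = {"long": read_long, "string": read_string}


def make_resolving_reader(writer_schema, reader_schema):
    """Resolve the two schemas once and return a closure read(buf)."""
    wanted = {f["name"] for f in reader_schema["fields"]}
    written = {f["name"] for f in writer_schema["fields"]}
    # One step per written field, in the writer's order. Fields the reader
    # does not want are still decoded so the stream position stays correct.
    steps = [(f["name"], READERS[f["type"]], f["name"] in wanted)
             for f in writer_schema["fields"]]
    # Fields only the reader knows about are filled from their defaults
    # (a missing default raises KeyError here, at build time).
    defaults = {f["name"]: f["default"]
                for f in reader_schema["fields"] if f["name"] not in written}

    def read_record(buf):
        record = dict(defaults)
        for name, read_field, keep in steps:
            value = read_field(buf)
            if keep:
                record[name] = value
        return record

    return read_record


writer_schema = {"type": "record", "name": "somerecord", "fields": [
    {"name": "id", "type": "long"},
    {"name": "legacy", "type": "string"},
]}
reader_schema = {"type": "record", "name": "somerecord", "fields": [
    {"name": "id", "type": "long"},
    {"name": "country", "type": "string", "default": "unknown"},
]}
read = make_resolving_reader(writer_schema, reader_schema)
decoded = read(io.BytesIO(b"\x02\x02a"))  # id=1, legacy="a"
print(decoded)
```

The written field legacy is decoded and dropped, and country is filled from its default, without either schema being inspected during the read itself.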

Performance / Benchmarks

Results

These tests were run using an AWS m4.large instance running CentOS 7. They were run with the following versions: avro-python3==1.8.2, fastavro==0.17.9, spavro==1.1.10. Python 3.6.4 was used for the python 3 tests.

The TLDR is that spavro has 14-23x the throughput of the default Apache avro implementation and 2-4x the throughput of the fastavro library (depending on the shape of the records).

Deserialize avro records (read)

Records per second read:

[Charts: Read, 1 field, records per sec; Read, 500 fields, records per sec]

Datums per second (individual fields) read:

[Chart: Read, fields per second]

Serialize avro records (write)

Records per second write:

[Charts: Write, 1 field, records per sec; Write, 500 fields, records per sec]

Datums per second (individual fields) write:

[Chart: Write, fields per second]

Methodology

Benchmarks were performed with the benchmark.py script in the /benchmarks path in the repository (if you'd like to run your own tests).

Many of the records that led to the creation of spavro had the form {"type": "record", "name": "somerecord", "fields": [...]}, with 1 to n fields whose type is usually a union of "null" and a primitive type. I believe this is a very common use case for Avro, so the benchmarks were built to simulate that record structure.
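As a concrete instance of that shape, the snippet below builds such a schema as a plain Python dictionary and prints it as JSON. The helper nullable_record_schema and the field names user_id and email are made up for illustration.

```python
import json


def nullable_record_schema(name, field_types):
    """Build a record schema whose fields are unions of "null" and a primitive."""
    return {
        "type": "record",
        "name": name,
        "fields": [
            {"name": field_name, "type": ["null", primitive]}
            for field_name, primitive in field_types
        ],
    }


example_schema = nullable_record_schema(
    "somerecord", [("user_id", "long"), ("email", "string")]
)
print(json.dumps(example_schema, indent=2))
```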

The benchmark creates a random schema of a record with a mix of string, double, long and boolean types and a random record generator to test that schema. The pseudo-random generator is seeded with the same string to make the results deterministic (but with varied records). The number of fields in the record was varied from one to 500, and the performance of each Avro implementation was tested for each case.

The serializer and deserializer benchmarks create an array of simulated records in memory and then attempt to process them as quickly as possible using the three different implementations. This means the maximum working size is limited by memory (a combination of the number of records and the number of fields in the simulated record). For these benchmarks, 5 million datums were processed in each run, so the number of records is 5 million divided by the number of fields in each record.
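A seeded schema and record generator along these lines might look like the sketch below. It is not the repository's benchmark.py: the function names, the seed string, the field naming scheme and the value ranges are all assumptions, and null values are never generated here even though each field is declared as a nullable union.

```python
import random
import string

PRIMITIVE_TYPES = ("string", "double", "long", "boolean")


def random_schema(rng, field_count):
    """Record schema with field_count nullable fields of random primitive type."""
    fields = [
        {"name": "field_%d" % index, "type": ["null", rng.choice(PRIMITIVE_TYPES)]}
        for index in range(field_count)
    ]
    return {"type": "record", "name": "benchmark_record", "fields": fields}


def random_value(rng, type_name):
    if type_name == "string":
        length = rng.randint(1, 20)  # assumed length range
        return "".join(rng.choice(string.ascii_letters) for _ in range(length))
    if type_name == "double":
        return rng.random()
    if type_name == "long":
        return rng.randint(-2 ** 31, 2 ** 31)  # assumed value range
    return rng.random() < 0.5  # boolean


def random_record(rng, schema):
    """One record matching the schema; every field takes its non-null branch."""
    return {
        field["name"]: random_value(rng, field["type"][1])
        for field in schema["fields"]
    }


# Seeding with a fixed string makes every run produce the same schema and records.
rng = random.Random("hypothetical-benchmark-seed")
bench_schema = random_schema(rng, 5)
bench_records = [random_record(rng, bench_schema) for _ in range(3)]
```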
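The relationship between datums, fields and records in a run is simple arithmetic. The helper name records_per_run is hypothetical, and rounding down with integer division is an assumption.

```python
TOTAL_DATUMS = 5000000  # datums processed per run, as stated above


def records_per_run(field_count, total_datums=TOTAL_DATUMS):
    """Number of in-memory records needed to reach the datum budget."""
    return total_datums // field_count


for field_count in (1, 500):
    print(field_count, "fields ->", records_per_run(field_count), "records")
# 1 fields -> 5000000 records
# 500 fields -> 10000 records
```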

Each run of the schema/record/implementation was repeated ten times and the time to complete was averaged.
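A timing loop of that kind can be sketched as follows. This is not the benchmark script itself: average_seconds and records_per_second are hypothetical helpers, and time.perf_counter is an assumed choice of clock.

```python
import time


def average_seconds(task, repeats=10):
    """Call task() repeats times and return the mean wall-clock duration."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def records_per_second(record_count, seconds):
    """Throughput for one configuration, given its averaged duration."""
    return record_count / seconds
```

For example, a run that processes 10,000 records in an averaged 2.0 seconds has a throughput of 5,000 records per second.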

API

Spavro keeps the default Apache library's API. This allows spavro to be a drop-in replacement for code using the existing Apache implementation.
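In practice, "drop-in" should mean that only the package name in the imports changes. The sketch below assumes that spavro exposes schema and io submodules with parse, DatumWriter, DatumReader, BinaryEncoder and BinaryDecoder, mirroring the Apache Python 2 API; check these names against the installed version. The helpers find_avro_package and round_trip are hypothetical.

```python
import importlib
import importlib.util
import io
import json

SCHEMA_JSON = json.dumps({
    "type": "record",
    "name": "somerecord",
    "fields": [{"name": "id", "type": "long"}],
})


def find_avro_package(candidates=("spavro", "avro")):
    """Return the first importable package name from candidates, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def round_trip(package_name, record):
    """Serialize and deserialize one record using the Apache-style API.

    Assumes package_name provides `schema.parse`, `io.DatumWriter`,
    `io.DatumReader`, `io.BinaryEncoder` and `io.BinaryDecoder`.
    """
    schema_module = importlib.import_module(package_name + ".schema")
    io_module = importlib.import_module(package_name + ".io")
    schema = schema_module.parse(SCHEMA_JSON)
    buffer = io.BytesIO()
    io_module.DatumWriter(schema).write(record, io_module.BinaryEncoder(buffer))
    buffer.seek(0)
    return io_module.DatumReader(schema).read(io_module.BinaryDecoder(buffer))
```

With spavro installed, round_trip(find_avro_package(), {"id": 1}) would be expected to return the original record. The same function body should work unchanged against an Apache release that uses the same names.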

Tests

Since the API matches the existing library, the majority of the existing Apache test suite is used to verify the correct operation of Spavro. Spavro adds correctness tests that compare new and old behaviors, as well as logic tests beyond those in the original library. Some of the Java-based "map reduce" tests (specifically the tether tests) were removed because Spavro does not include the Java code that implements that logic.

Release history

Version	Release date
1.1.27	Jun 27, 2025 (this version)
1.1.26	Nov 8, 2023
1.1.24	Aug 3, 2021
1.1.23	Feb 21, 2020
1.1.22	Apr 10, 2019
1.1.21	Jan 18, 2019
1.1.20	Oct 6, 2018
1.1.19	Aug 21, 2018
1.1.18	Aug 21, 2018
1.1.17	May 4, 2018
1.1.16	May 4, 2018
1.1.15	May 3, 2018
1.1.14	May 3, 2018
1.1.13	May 3, 2018
1.1.12	May 3, 2018
1.1.11	May 1, 2018
1.1.10	Mar 19, 2018
1.1.9	Mar 19, 2018
1.1.8	Mar 19, 2018
1.1.7	Mar 7, 2018
1.1.6	Jan 17, 2018
1.1.5	Jan 5, 2018
1.1.4	Dec 22, 2017
1.1.3	Dec 5, 2017
1.1.2	Nov 15, 2017
1.1.1	Oct 31, 2017
1.1.0	Jun 21, 2017
1.0	Jun 5, 2017

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spavro-1.1.27.tar.gz (250.3 kB view details)

Uploaded Jun 27, 2025 Source

Built Distribution

spavro-1.1.27-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl (854.3 kB view details)

Uploaded Nov 6, 2025; CPython 3.11; manylinux: glibc 2.17+ ARM64

File details

Details for the file spavro-1.1.27.tar.gz.

File metadata

Download URL: spavro-1.1.27.tar.gz
Upload date: Jun 27, 2025
Size: 250.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.6

File hashes

Hashes for spavro-1.1.27.tar.gz
Algorithm	Hash digest
SHA256	`77a800153f7db2181a0fbd7f1b2b23fec1436c67b5cd4e982b39911693f96cad`
MD5	`df2216dce89b0b53b1d4bdc2c93a0092`
BLAKE2b-256	`d3b404097fbed1c26d2c43f8b90dcad9bd5190ab5928826620df679e7eea849c`
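A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch: the function names and the chunk size are arbitrary choices, and the local file path passed to sha256_of_file is whatever location the archive was saved to.

```python
import hashlib
import hmac

# SHA256 digest published for spavro-1.1.27.tar.gz in the table above.
EXPECTED_SHA256 = "77a800153f7db2181a0fbd7f1b2b23fec1436c67b5cd4e982b39911693f96cad"


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary file-like object in chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path):
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)


def digest_matches(actual_hex, expected_hex=EXPECTED_SHA256):
    """Compare two hex digests, ignoring case."""
    return hmac.compare_digest(actual_hex.lower(), expected_hex.lower())
```

For example, digest_matches(sha256_of_file("spavro-1.1.27.tar.gz")) should return True for an intact download saved under that name in the current directory.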

File details

Details for the file spavro-1.1.27-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl.

File metadata

Download URL: spavro-1.1.27-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl
Upload date: Nov 6, 2025
Size: 854.3 kB
Tags: CPython 3.11, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for spavro-1.1.27-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl
Algorithm	Hash digest
SHA256	`2a187e751212fc77091214d565e620413da191a3318b1a14a995e1d51303cf8b`
MD5	`80e671649d274a4d3f710161d483bba2`
BLAKE2b-256	`dfc7011f5f11aad1f9df55db9bafa3c984cf111e7702a04df812db70d59cfb1f`
