Fast, multi-threaded deserialization of schemaless Avro-encoded messages

Project description

Ruhvro

A library for deserializing schemaless Avro-encoded bytes into Apache Arrow record batches. It was created as an experiment to gauge potential improvements in Kafka message deserialization speed, particularly from the Python ecosystem.

The main speed-ups in this code come from releasing Python's GIL during deserialization and from using multiple cores. They are much more noticeable on larger datasets or when running several Python threads at once.
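
To illustrate the "several Python threads at once" case, here is a minimal sketch that fans batches of messages out over a standard-library thread pool. The helper name `deserialize_batches_in_threads` and its `decode` parameter are hypothetical and not part of pyruhvro; in practice `decode` would be something like `pyruhvro.deserialize_array`, and the threads would only overlap to the extent that the call really releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def deserialize_batches_in_threads(
    batches: Sequence[List[bytes]],
    schema: str,
    decode: Callable[[List[bytes], str], Any],
    max_workers: int = 4,
) -> List[Any]:
    """Run decode(batch, schema) for every batch on a thread pool.

    `decode` is a stand-in (assumption) for a GIL-releasing function such as
    pyruhvro.deserialize_array. Results are returned in the order of `batches`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even if threads finish out of order.
        return list(pool.map(lambda batch: decode(batch, schema), batches))
```

With pyruhvro installed, a call might look like `deserialize_batches_in_threads(batches, schema, deserialize_array)`, where `batches` is a list of message lists.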

Building

Python extension

Building a wheel:

Requires the Rust toolchain to be installed.

  • Create a Python virtual environment
  • pip install maturin
  • maturin build --release
  • The previous command should print the path to the compiled wheel file, something like /Users/currentUser/rust/pyruhvro/target/wheels/pyruhvro-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
  • Install that wheel: pip install /Users/currentUser/rust/pyruhvro/target/wheels/pyruhvro-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
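
The wheel file name encodes which interpreter and platform it was built for, which helps when checking that the right file is being installed. Below is a small sketch that splits a name such as `pyruhvro-0.1.0-cp312-cp312-macosx_11_0_arm64.whl` into its parts, following the standard wheel naming convention (`name-version[-build]-python-abi-platform.whl`). The function `parse_wheel_filename` is a hypothetical helper for illustration, not something maturin or pyruhvro provides.

```python
import os
from typing import Dict, Optional


def parse_wheel_filename(path: str) -> Dict[str, Optional[str]]:
    """Split a wheel file name into its dash-separated components."""
    filename = os.path.basename(path)
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # An optional build tag sits between the version and the python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

For the example above, the `python` and `abi` fields are both `cp312` (CPython 3.12) and the `platform` field is `macosx_11_0_arm64`.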

The extension can be used like so:

from typing import List

from pyruhvro import deserialize_array, deserialize_array_threaded

# Avro schema (as a JSON string) that the messages were written with
schema = """
    {
      "type": "record",
      "name": "UserData",
      "namespace": "com.example",
      "fields": [
        {
          "name": "userId",
          "type": "string"
        },
        {
          "name": "age",
          "type": "int"
        },
        ... more fields...
      ]
    }
    """

# serialized values from kafka messages, one raw byte buffer per message
# (the bytes type is an assumption; the original notation was List[List[u8]])
serialized_messages: List[bytes] = [...]  # serialized messages...

# deserialize all messages into an Arrow record batch
record_batch = deserialize_array(serialized_messages, schema)

# or if you'd like to leverage multiple cores
num_cores = 8
deserialize_array_threaded(serialized_messages, schema, num_cores)
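
For experimenting without a Kafka topic, schemaless Avro bytes for the two fields shown above can be produced by hand. The sketch below assumes the schema contains only `userId` (string) and `age` (int), and follows the Avro binary encoding rules: ints and longs are zigzag-encoded varints, a string is a long length followed by its UTF-8 bytes, and a record is its fields concatenated in schema order. The helper names (`encode_long`, `encode_string`, `encode_user`) are hypothetical and not part of pyruhvro.

```python
from typing import List


def encode_long(value: int) -> bytes:
    """Avro int/long: zigzag-encode, then write as a base-128 varint."""
    zigzag = (value << 1) ^ (value >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while zigzag & ~0x7F:
        out.append((zigzag & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        zigzag >>= 7
    out.append(zigzag)  # final byte, continuation bit clear
    return bytes(out)


def encode_string(text: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(user_id: str, age: int) -> bytes:
    """Encode one record with fields userId (string) and age (int), in that order."""
    return encode_string(user_id) + encode_long(age)


# A small list of messages in the shape the usage example expects.
serialized_messages: List[bytes] = [
    encode_user("alice", 30),
    encode_user("bob", 41),
]
```

A list built this way could then be passed as `serialized_messages` to `deserialize_array`, together with a schema that has exactly those two fields.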

Download files

Source Distribution

pyruhvro-0.1.0.tar.gz (486.0 kB view hashes)

Uploaded Source

Built Distributions

pyruhvro-0.1.0-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.9 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

pyruhvro-0.1.0-pp310-pypy310_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl (1.0 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ s390x

pyruhvro-0.1.0-pp310-pypy310_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (857.2 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ppc64le

pyruhvro-0.1.0-pp310-pypy310_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (796.9 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARMv7l

pyruhvro-0.1.0-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (809.8 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARM64

pyruhvro-0.1.0-pp310-pypy310_pp73-manylinux_2_12_i686.manylinux2010_i686.whl (798.2 kB view hashes)

Uploaded PyPy manylinux: glibc 2.12+ i686

pyruhvro-0.1.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.9 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

pyruhvro-0.1.0-pp39-pypy39_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl (1.0 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ s390x

pyruhvro-0.1.0-pp39-pypy39_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (857.2 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ppc64le

pyruhvro-0.1.0-pp39-pypy39_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (796.9 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARMv7l

pyruhvro-0.1.0-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (809.7 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARM64

pyruhvro-0.1.0-pp39-pypy39_pp73-manylinux_2_12_i686.manylinux2010_i686.whl (798.2 kB view hashes)

Uploaded PyPy manylinux: glibc 2.12+ i686

pyruhvro-0.1.0-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.9 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

pyruhvro-0.1.0-pp38-pypy38_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl (1.0 MB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ s390x

pyruhvro-0.1.0-pp38-pypy38_pp73-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (857.1 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ppc64le

pyruhvro-0.1.0-pp38-pypy38_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (796.9 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARMv7l

pyruhvro-0.1.0-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (809.8 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ ARM64

pyruhvro-0.1.0-pp38-pypy38_pp73-manylinux_2_12_i686.manylinux2010_i686.whl (798.1 kB view hashes)

Uploaded PyPy manylinux: glibc 2.12+ i686

pyruhvro-0.1.0-cp312-none-win_amd64.whl (643.1 kB view hashes)

Uploaded CPython 3.12 Windows x86-64

pyruhvro-0.1.0-cp312-none-win32.whl (577.1 kB view hashes)

Uploaded CPython 3.12 Windows x86

pyruhvro-0.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (800.9 kB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

pyruhvro-0.1.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl (1.0 MB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ s390x

pyruhvro-0.1.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (855.7 kB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ppc64le

pyruhvro-0.1.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (796.0 kB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ARMv7l

pyruhvro-0.1.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (808.3 kB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ARM64

pyruhvro-0.1.0-cp312-cp312-manylinux_2_12_i686.manylinux2010_i686.whl (797.0 kB view hashes)

Uploaded CPython 3.12 manylinux: glibc 2.12+ i686

pyruhvro-0.1.0-cp312-cp312-macosx_11_0_arm64.whl (628.2 kB view hashes)

Uploaded CPython 3.12 macOS 11.0+ ARM64

pyruhvro-0.1.0-cp312-cp312-macosx_10_12_x86_64.whl (742.8 kB view hashes)

Uploaded CPython 3.12 macOS 10.12+ x86-64

pyruhvro-0.1.0-cp311-none-win_amd64.whl (646.8 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

pyruhvro-0.1.0-cp311-none-win32.whl (579.8 kB view hashes)

Uploaded CPython 3.11 Windows x86

pyruhvro-0.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.3 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

pyruhvro-0.1.0-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl (1.0 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ s390x

pyruhvro-0.1.0-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (856.2 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ppc64le

pyruhvro-0.1.0-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (796.1 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARMv7l

pyruhvro-0.1.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (808.7 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARM64

pyruhvro-0.1.0-cp311-cp311-manylinux_2_12_i686.manylinux2010_i686.whl (797.4 kB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.12+ i686

pyruhvro-0.1.0-cp311-cp311-macosx_11_0_arm64.whl (707.3 kB view hashes)

Uploaded CPython 3.11 macOS 11.0+ ARM64

pyruhvro-0.1.0-cp311-cp311-macosx_10_12_x86_64.whl (743.5 kB view hashes)

Uploaded CPython 3.11 macOS 10.12+ x86-64

pyruhvro-0.1.0-cp310-none-win_amd64.whl (646.9 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

pyruhvro-0.1.0-cp310-none-win32.whl (579.7 kB view hashes)

Uploaded CPython 3.10 Windows x86

pyruhvro-0.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.3 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

pyruhvro-0.1.0-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl (1.0 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ s390x

pyruhvro-0.1.0-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (856.1 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ppc64le

pyruhvro-0.1.0-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (796.2 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARMv7l

pyruhvro-0.1.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (808.7 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

pyruhvro-0.1.0-cp310-cp310-manylinux_2_12_i686.manylinux2010_i686.whl (797.5 kB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.12+ i686

pyruhvro-0.1.0-cp310-cp310-macosx_11_0_arm64.whl (707.4 kB view hashes)

Uploaded CPython 3.10 macOS 11.0+ ARM64

pyruhvro-0.1.0-cp310-cp310-macosx_10_12_x86_64.whl (743.5 kB view hashes)

Uploaded CPython 3.10 macOS 10.12+ x86-64

pyruhvro-0.1.0-cp39-none-win_amd64.whl (646.8 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

pyruhvro-0.1.0-cp39-none-win32.whl (579.8 kB view hashes)

Uploaded CPython 3.9 Windows x86

pyruhvro-0.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.3 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

pyruhvro-0.1.0-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl (1.0 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ s390x

pyruhvro-0.1.0-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (856.2 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ppc64le

pyruhvro-0.1.0-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (796.2 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARMv7l

pyruhvro-0.1.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (808.7 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARM64

pyruhvro-0.1.0-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl (797.5 kB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.12+ i686

pyruhvro-0.1.0-cp39-cp39-macosx_11_0_arm64.whl (707.4 kB view hashes)

Uploaded CPython 3.9 macOS 11.0+ ARM64

pyruhvro-0.1.0-cp39-cp39-macosx_10_12_x86_64.whl (743.5 kB view hashes)

Uploaded CPython 3.9 macOS 10.12+ x86-64

pyruhvro-0.1.0-cp38-none-win_amd64.whl (646.7 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

pyruhvro-0.1.0-cp38-none-win32.whl (579.6 kB view hashes)

Uploaded CPython 3.8 Windows x86

pyruhvro-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (801.0 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

pyruhvro-0.1.0-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl (1.0 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ s390x

pyruhvro-0.1.0-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (855.9 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ppc64le

pyruhvro-0.1.0-cp38-cp38-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (796.0 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ARMv7l

pyruhvro-0.1.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (808.7 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ ARM64

pyruhvro-0.1.0-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl (797.4 kB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.12+ i686
