Polars io-plugin for reading and writing avro files

polars-avro

A polars io plugin for reading and writing Apache Avro files, built on arrow-avro. It provides scan support with predicate pushdown, map type reading, and continued avro support as polars deprecates its built-in implementation.

Python Usage

from polars_avro import scan_avro, read_avro, write_avro

lazy = scan_avro(path)
frame = read_avro(path)
write_avro([frame], path)

Rust Usage

There are two main exports: [Reader] for iterating DataFrames from avro sources, and [Writer] for writing DataFrames to an avro file.

use std::fs::File;

use polars_avro::{Reader, Writer, ReadOptions};

// read: iterate over DataFrame batches from one or more avro sources
let reader = Reader::try_new(
    [File::open("data.avro")],
    ReadOptions::basic(),
).unwrap();
for batch in reader {
    let frame = batch.unwrap();
}

// write: `file` and `frame` are assumed to be defined by the caller
let mut writer = Writer::try_new(file, frame.schema(), None).unwrap();
writer.write(&frame).unwrap();

ℹ️ Avro supports writing with file compression schemes. In rust these need to be enabled via feature flags: deflate, snappy, bzip2, xz, zstd. Decompression is handled automatically.

Idiosyncrasies

Avro and Arrow don't align fully, and polars only supports a subset of arrow. Some types require casting before writing, and some avro types map to different polars types than you might expect when reading.

Writing

The following polars types error when writing and must be cast first:

| Polars Type | Cast To |
| --- | --- |
| large UInt64 | Wrap to Int64 |
| Categorical | Int32 or String |
| Enum | Int32 or String |
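
As an illustration of the UInt64 row, here is a minimal sketch of what wrapping to Int64 means, assuming "wrap" refers to two's complement reinterpretation of the same 64 bits. The helper names are hypothetical and are not part of polars-avro; they only show the arithmetic on plain Python integers.

```python
U64_MAX = 2**64 - 1
I64_MAX = 2**63 - 1


def wrap_u64_to_i64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed (two's complement)."""
    if not 0 <= value <= U64_MAX:
        raise ValueError(f"{value} is not a valid UInt64")
    # Values above I64_MAX wrap around into the negative range.
    return value - 2**64 if value > I64_MAX else value


def unwrap_i64_to_u64(value: int) -> int:
    """Invert wrap_u64_to_i64, e.g. after reading the column back."""
    return value % 2**64


if __name__ == "__main__":
    for original in (5, 2**63, U64_MAX):
        wrapped = wrap_u64_to_i64(original)
        print(original, "->", wrapped, "->", unwrap_i64_to_u64(wrapped))
```

Because the wrap is a bijection on 64-bit patterns, no information is lost as long as the reader knows to reverse it.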

Time values are truncated to microseconds.
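
To make the truncation concrete, the following sketch assumes a time of day stored as nanoseconds since midnight (as polars does for Time) and drops the sub-microsecond digits. The function names are illustrative only and are not part of polars-avro.

```python
from datetime import time

NANOS_PER_MICRO = 1_000


def truncate_nanos_to_micros(nanos_since_midnight: int) -> int:
    """Drop sub-microsecond digits (truncation, not rounding)."""
    return nanos_since_midnight // NANOS_PER_MICRO


def micros_to_time(micros_since_midnight: int) -> time:
    """Convert microseconds since midnight to a datetime.time."""
    seconds, micro = divmod(micros_since_midnight, 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, micro)


if __name__ == "__main__":
    # 12:34:56.123456789 expressed as nanoseconds since midnight
    nanos = (12 * 3600 + 34 * 60 + 56) * 10**9 + 123_456_789
    micros = truncate_nanos_to_micros(nanos)
    print(micros_to_time(micros))  # the trailing 789 ns are lost
```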

Compression is supported via feature flags: deflate, snappy, bzip2, xz, zstd.

Reading

utf8_view behavior — the utf8_view option (default false) changes how certain types are read:

| Type | utf8_view=false (default) | utf8_view=true |
| --- | --- | --- |
| UUID | binary (16 bytes) | formatted string |
| nullable strings | preserves nulls | replaces null with "" (lossy) |
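
The two rows above can be illustrated with the standard library alone. This is a sketch of the semantics, not of polars-avro's implementation: it assumes the "formatted string" is the canonical hyphenated UUID form, and the helper names are hypothetical.

```python
import uuid


def format_uuid(raw16: bytes) -> str:
    """Render a 16-byte UUID as its canonical hyphenated string."""
    return str(uuid.UUID(bytes=raw16))


def fill_null_strings(values):
    """Replace nulls with "" the way a lossy string read would."""
    return ["" if v is None else v for v in values]


if __name__ == "__main__":
    raw = bytes.fromhex("12345678123456781234567812345678")
    print(len(raw), format_uuid(raw))

    column = ["a", None, ""]
    # After filling, the original null and the original "" are indistinguishable.
    print(fill_null_strings(column))
```

If downstream logic depends on telling a missing string apart from an empty one, the default setting is the safer choice.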

Since polars tends to work with string views internally, utf8_view=true is likely faster if you don't mind losing null string distinctions.

Type mappings of note:

| Avro Type | Polars Type |
| --- | --- |
| Enum | Categorical (not Enum) |
| Map | List of Struct {key, value} |
| BigDecimal | Binary |
| Duration | unsupported (errors) |
| Date | Date (days since epoch) |
| TimeMillis, TimeMicros | Time (nanoseconds) |
| TimestampMillis/Micros/Nanos | Datetime with matching precision and UTC tz |
| LocalTimestampMillis/Micros/Nanos | Datetime with matching precision and no tz |
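
Two of these mappings are easy to misread, so here is a small sketch in plain Python. It models a Map as a list of {key, value} entries and a Date as a day count from 1970-01-01; the helper names are hypothetical and only illustrate the shapes involved.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def map_to_entries(mapping: dict) -> list:
    """Represent a map as a list of {key, value} structs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def entries_to_map(entries: list) -> dict:
    """Rebuild a dict from a list of {key, value} structs."""
    return {entry["key"]: entry["value"] for entry in entries}


def days_to_date(days_since_epoch: int) -> date:
    """Convert a days-since-epoch count to a calendar date."""
    return EPOCH + timedelta(days=days_since_epoch)


if __name__ == "__main__":
    entries = map_to_entries({"a": 1, "b": 2})
    print(entries)
    print(entries_to_map(entries))
    print(days_to_date(19723).isoformat())
```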

Constraints: the root avro schema must be a Record, and all files in a multi-file read must share the same schema.
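
Both constraints can be checked up front before a multi-file read. The sketch below is not part of polars-avro: `read_schema` and `schemas_match` are hypothetical helpers that parse the header of an Avro object container file (the magic bytes followed by a metadata map containing `avro.schema`) using only the standard library.

```python
import io
import json

MAGIC = b"Obj\x01"


def _read_long(stream) -> int:
    """Decode one zigzag, variable-length encoded Avro long."""
    shift = 0
    raw = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of avro header")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)


def _read_bytes(stream) -> bytes:
    """Read a length-prefixed byte string."""
    return stream.read(_read_long(stream))


def read_schema(stream):
    """Return the parsed writer schema stored in the file header."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an avro object container file")
    meta = {}
    while True:
        count = _read_long(stream)
        if count == 0:  # a zero count terminates the metadata map
            break
        if count < 0:  # negative count: a byte size follows, unused here
            count = -count
            _read_long(stream)
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            meta[key] = _read_bytes(stream)
    return json.loads(meta["avro.schema"])


def schemas_match(streams) -> bool:
    """True if every schema is a record and all schemas are equal."""
    schemas = [read_schema(s) for s in streams]
    if any(not isinstance(s, dict) or s.get("type") != "record" for s in schemas):
        return False
    return all(s == schemas[0] for s in schemas[1:])
```

A caller would open each file in binary mode, for example `schemas_match(open(p, "rb") for p in paths)`, and skip or report the mismatching files before handing the list to a reader.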

Benchmarks

Python reports median (file reads, in-memory writes). Rust reports mean. native = polars built-in avro. Ratio relative to native; bold = fastest. Complex rows use nested/struct types.

| Benchmark | native | polars-avro | jetliner |
| --- | --- | --- | --- |
| python read 1K × 2 | 64 µs (1.00x) | 99 µs (1.54x) | 180 µs (2.79x) |
| python read 64K × 2 | 2.7 ms (1.00x) | 2.1 ms (0.78x) | 2.8 ms (1.04x) |
| python read 1K × 8 | 183 µs (1.00x) | 242 µs (1.32x) | 337 µs (1.84x) |
| python read 1M × 8 | 159 ms (1.00x) | 114 ms (0.72x) | 145 ms (0.91x) |
| python read 1M × 128 | 2.6 s (1.00x) | 1.8 s (0.69x) | 2.8 s (1.09x) |
| python read complex 1K × 8 | | 449 µs | 592 µs |
| python read complex 1M × 8 | | 181 ms | 260 ms |
| python read proj 1M × 128 → 8 | 1.6 s (1.00x) | 1.2 s (0.75x) | 1.2 s (0.77x) |
| python read proj 1K × 8 → 2 | 133 µs (1.00x) | 297 µs (2.24x) | 264 µs (1.99x) |
| python write 1K × 2 | 42 µs (1.00x) | 30 µs (0.72x) | |
| python write 64K × 2 | 1.5 ms (1.00x) | 1.1 ms (0.71x) | |
| python write 1K × 8 | 143 µs (1.00x) | 114 µs (0.80x) | |
| python write 1M × 8 | 87 ms (1.00x) | 93 ms (1.07x) | |
| python write 1M × 128 | 1.5 s (1.00x) | 2.2 s (1.48x) | |
| rust read 1K × 2 | 42 µs (1.00x) | 34 µs (0.80x) | |
| rust read 1M × 128 | 2.8 s (1.00x) | 2.0 s (0.69x) | |
| rust read proj 1M × 128 → 8 | 1.3 s (1.00x) | 1.2 s (0.87x) | |
| rust read proj 1K × 8 → 2 | 109 µs (1.00x) | 116 µs (1.06x) | |
| rust write 1K × 2 | 42 µs (1.00x) | 22 µs (0.53x) | |
| rust write 64K × 2 | 1.5 ms (1.00x) | 1.0 ms (0.67x) | |
| rust write 1K × 8 | 135 µs (1.00x) | 93 µs (0.69x) | |
| rust write 1M × 8 | 97 ms (1.00x) | 89 ms (0.92x) | |
| rust write 1M × 128 | 1.6 s (1.00x) | 1.4 s (0.88x) | |

Development

Rust

Standard cargo commands will build and test the rust library.

Python

The python library is built with uv and maturin. The rust components should build once, and otherwise allow usage and testing.

You may need to recompile the python bindings with uv run maturin develop.

Testing

cargo fmt --check
cargo clippy --all-features --tests
cargo test
uv run ruff format --check
uv run ruff check
uv run pyright
uv run pytest

Benchmarking

Python benchmarks are disabled by default. To run the Rust and Python benchmarks:

cargo +nightly bench
uv run pytest --benchmark-only

Releasing

rm -rf dist
uv build --sdist
uv run maturin build -r -o dist --target aarch64-apple-darwin
uv run maturin build -r -o dist --target aarch64-unknown-linux-gnu --zig
uv publish --username __token__

To Do

  • reimplement single column reader?
  • reimplement better workarounds for types that don't exist, e.g. serialize polars cat/enum to arrow enum and vice versa
