
Polars io-plugin for reading and writing avro files


polars-avro

A Polars IO plugin for reading and writing Avro files.

Polars is likely to deprecate its built-in support for reading and writing Avro files, and this plugin provides a replacement. It is currently about 7x slower than the built-in reader and up to 20x slower than the built-in writer.

In exchange for that speed, you get:

  1. future-proofing - this plugin won't be deprecated
  2. robust support - the current Polars Avro implementation has bugs with non-contiguous DataFrames
  3. scan support - files can be scanned lazily, with predicates pushed down and applied chunk by chunk
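Chunk-wise predicate pushdown means the filter runs on each chunk as it is read, so rows that fail the predicate are dropped before the full file is ever materialized. The sketch below uses plain Python and only illustrates the idea. It is not polars-avro's implementation: real pushdown in Polars works on Polars expressions and DataFrame batches. The names `read_in_chunks` and `scan_with_predicate` are hypothetical, and lists of dicts stand in for Avro blocks.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]


def read_in_chunks(rows: Iterable[Row], chunk_size: int) -> Iterator[list[Row]]:
    """Yield rows in fixed-size chunks (a stand-in for Avro blocks read from disk)."""
    chunk: list[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def scan_with_predicate(
    chunks: Iterable[list[Row]], predicate: Callable[[Row], bool]
) -> Iterator[list[Row]]:
    """Filter each chunk as it arrives instead of filtering after a full read."""
    for chunk in chunks:
        kept = [row for row in chunk if predicate(row)]
        if kept:  # a chunk with no matching rows produces no output at all
            yield kept


rows = [{"id": i} for i in range(10)]
matches = list(scan_with_predicate(read_in_chunks(rows, 4), lambda row: row["id"] >= 7))
# The first chunk (ids 0-3) has no match and yields nothing:
# matches == [[{"id": 7}], [{"id": 8}, {"id": 9}]]
```

Because both functions are generators, only one chunk is held in memory at a time. This is what makes scanning cheaper than reading the whole file and filtering afterwards.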

Python Usage

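import polars as pl
from polars_avro import scan_avro, read_avro, write_avro

path = "example.avro"  # any writable file path
frame = pl.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})  # example data

write_avro(frame, path)  # write a DataFrame to an Avro file
frame = read_avro(path)  # read it back eagerly as a DataFrame
lazy = scan_avro(path)   # scan it lazily, without reading everything up front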

Rust Usage

The Rust crate exports two main items: AvroScanner, which creates an iterator of DataFrames from Polars ScanSources, and sink_avro, which writes an iterator of DataFrames to any writer implementing Write.

use polars_avro::{AvroScanner, sink_avro, WriteOptions};
// ScanSources also has to be imported from polars; its path depends on the polars version.

let scanner = AvroScanner::new_from_sources(
    &ScanSources::Paths(...),
    1024,  // batch size
    false, // expand globs
    None,  // cloud options
).unwrap();

sink_avro(
    scanner.map(Result::unwrap),
    ..., // impl Write
    WriteOptions::default(),
).unwrap();
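The scanner/sink split is a streaming design. The scanner produces batches of bounded size, and the sink consumes them one at a time, so a file can be copied without being fully held in memory. The plain-Python sketch below mirrors that shape. It assumes the batch size counts rows. `scan_batches` and `sink_batches` are hypothetical names, not part of polars-avro, and JSON lines stand in for Avro encoding.

```python
import io
import json
from itertools import islice
from typing import Iterable, Iterator, TextIO


def scan_batches(rows: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `batch_size` rows, like a scanner producing batches."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def sink_batches(batches: Iterable[list[dict]], writer: TextIO) -> int:
    """Consume batches one at a time and write each row to `writer`.

    JSON lines are a stand-in for Avro encoding. Returns the number of rows written.
    """
    written = 0
    for batch in batches:
        for row in batch:
            writer.write(json.dumps(row) + "\n")
        written += len(batch)
    return written


# Pipe the scanner straight into the sink, as in the Rust example above.
buffer = io.StringIO()
rows_written = sink_batches(scan_batches(({"id": i} for i in range(2500)), 1024), buffer)
```

With a batch size of 1024, the 2500 input rows arrive as batches of 1024, 1024 and 452. The sink never sees more than one batch at a time.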

ℹ️ Avro supports writing with a few compression schemes. In Rust, the corresponding features must be enabled manually, e.g. apache-avro/bzip to enable bzip2 compression. Decompression is handled automatically.
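Decompression can be automatic because an Avro object container file records its codec name in the `avro.codec` header metadata. A reader therefore only has to dispatch on that name for each data block. The sketch below shows that dispatch using Python's standard library. It covers only the codecs the stdlib can handle, leaving out snappy and zstandard, and `decompress_block` is a hypothetical helper, not code from polars-avro or apache-avro.

```python
import bz2
import lzma
import zlib
from typing import Callable

# Keys are codec names as written in the "avro.codec" header metadata.
DECOMPRESSORS: dict[str, Callable[[bytes], bytes]] = {
    "null": lambda block: block,  # uncompressed
    # Avro's "deflate" is raw DEFLATE (RFC 1951) with no zlib header,
    # so zlib is told to expect a headerless stream via negative wbits.
    "deflate": lambda block: zlib.decompress(block, wbits=-15),
    "bzip2": bz2.decompress,
    "xz": lzma.decompress,
}


def decompress_block(codec: str, block: bytes) -> bytes:
    """Decompress one data block according to the file's declared codec."""
    if codec not in DECOMPRESSORS:
        raise ValueError(f"unsupported codec: {codec!r}")
    return DECOMPRESSORS[codec](block)
```

The reader never has to be told which codec was used, because the writer's choice travels with the file. Only the writer needs the compression feature enabled.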

Development

Rust

Standard cargo commands will build and test the rust library.

Python

The Python library is built with uv and maturin. For local Rust development, run

uv run maturin develop -m Cargo.toml

to compile the Rust library into a local build that Python can import.

Testing

cargo clippy --all-features
cargo test
uv run ruff format --check
uv run ruff check
uv run pyright
uv run pytest



Download files


Source Distribution

polars_avro-0.2.0.tar.gz (123.2 kB)

Uploaded Source

Built Distribution


polars_avro-0.2.0-cp39-abi3-macosx_11_0_arm64.whl (35.0 MB)

Uploaded for CPython 3.9+, macOS 11.0+, ARM64

File details

Details for the file polars_avro-0.2.0.tar.gz.

File metadata

  • Download URL: polars_avro-0.2.0.tar.gz
  • Size: 123.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.28

File hashes

Hashes for polars_avro-0.2.0.tar.gz
Algorithm Hash digest
SHA256 1fdbbcdd1598c8e59f253d93f29eb34ac418e37a20d4ff5a9cf580138102442b
MD5 0b6f891731ea5781825cb245fb73ec48
BLAKE2b-256 cb504bd6f261ccaf2fbd3bce8a018cb6def4ef9291be8e16ab762e8c56a7810d
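You can check a downloaded source distribution against the SHA256 digest listed above with the standard library alone. The sketch below hashes the file in chunks, so large downloads do not need to fit in memory. The names `sha256_of` and `verify` are illustrative, and you pass in the path of whatever file you downloaded.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for polars_avro-0.2.0.tar.gz.
EXPECTED_SHA256 = "1fdbbcdd1598c8e59f253d93f29eb34ac418e37a20d4ff5a9cf580138102442b"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches `expected` (case-insensitive hex)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("polars_avro-0.2.0.tar.gz"))` should return True for an intact download of the source distribution.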


File details

Details for the file polars_avro-0.2.0-cp39-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for polars_avro-0.2.0-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2d6b0afabef1e1cdaf4fd360625e44d58f6aa57009e6d24dbf0381a81a5ecd55
MD5 2400fc04bc48b0c09e61ec959fb5edf1
BLAKE2b-256 fc9ec045a43cdcc75efa32a898a1dfd39f3817f90ebd73f5c930fab2e1c165de

