Convert from protobuf to arrow and back

Protarrow

A library for converting from protobuf to arrow and back

Installation

pip install protarrow

Usage

Convert from proto to arrow

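message MyProto {
  string name = 1;
  repeated int32 values = 2;
}

import protarrow

# MyProto is the Python class that protoc generates from the definition above;
# import it from your own generated *_pb2 module.

my_protos = [
    MyProto(name="foo", values=[1, 2, 4]),
    MyProto(name="bar", values=[3, 4, 5]),
]

# Derive the arrow schema, then build a RecordBatch and a Table from the messages.
schema = protarrow.message_type_to_schema(MyProto)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto)
table = protarrow.messages_to_table(my_protos, MyProto)

name  values
foo   [1 2 4]
bar   [3 4 5]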

Convert from arrow to proto

protos_from_record_batch = protarrow.record_batch_to_messages(record_batch, MyProto)
protos_from_table = protarrow.table_to_messages(table, MyProto)

Customize arrow type

The arrow type for Enum, Timestamp and TimeOfDay can be configured:

import pyarrow as pa

config = protarrow.ProtarrowConfig(
    enum_type=pa.int32(),
    timestamp_type=pa.timestamp("ms", "America/New_York"),
    time_of_day_type=pa.time32("ms"),
)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto, config)

Type Mapping

Native Types

Proto     Pyarrow              Note
bool      bool_
bytes     binary
double    float64
enum      int32/string/binary  configurable
fixed32   int32
fixed64   int64
float     float32
int32     int32
int64     int64
message   struct
sfixed32  int32
sfixed64  int64
sint32    int32
sint64    int64
string    string
uint32    uint32
uint64    uint64

Other types

Proto                        Pyarrow                 Note
repeated                     list_
map                          map_
google.protobuf.BoolValue    bool_
google.protobuf.BytesValue   binary
google.protobuf.DoubleValue  float64
google.protobuf.FloatValue   float32
google.protobuf.Int32Value   int32
google.protobuf.Int64Value   int64
google.protobuf.StringValue  string
google.protobuf.Timestamp    timestamp("ns", "UTC")  Unit and timezone are configurable
google.protobuf.UInt32Value  uint32
google.protobuf.UInt64Value  uint64
google.type.Date             date32()
google.type.TimeOfDay        time64/time32           Unit and type are configurable

Nullability

  • Top-level native fields, lists and maps are marked as non-nullable.
  • Nested messages and their children are nullable.

Development

Set up

python3 -m venv --clear venv
source venv/bin/activate
poetry self add "poetry-dynamic-versioning[plugin]"
poetry install
python ./scripts/protoc.py
pre-commit install

Testing

This library relies on property-based testing. Tests convert randomly generated data from protobuf to arrow and back, and check that the result is equal to the input.

coverage run --branch --include "*/protarrow/*" -m pytest tests
coverage report
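
The round-trip idea described above can be sketched with the standard library alone. This is not the project's test suite and does not use protarrow. Here rows_to_columns and columns_to_rows are hypothetical stand-ins for the proto-to-arrow and arrow-to-proto conversions, and the rows are plain dicts shaped like the MyProto example.

```python
import random
import string


def rows_to_columns(rows):
    """Stand-in for the proto -> arrow direction: row dicts to column lists."""
    return {
        "name": [row["name"] for row in rows],
        "values": [list(row["values"]) for row in rows],
    }


def columns_to_rows(columns):
    """Stand-in for the arrow -> proto direction: column lists back to row dicts."""
    return [
        {"name": name, "values": list(values)}
        for name, values in zip(columns["name"], columns["values"])
    ]


def random_row(rng):
    """Generate one random row shaped like the MyProto example."""
    name = "".join(
        rng.choice(string.ascii_lowercase) for _ in range(rng.randint(0, 8))
    )
    # Stay within the int32 range, matching `repeated int32 values`.
    values = [rng.randint(-(2**31), 2**31 - 1) for _ in range(rng.randint(0, 5))]
    return {"name": name, "values": values}


def check_round_trip(seed, max_rows=20):
    """Return True if a randomly generated batch survives the round trip."""
    rng = random.Random(seed)
    rows = [random_row(rng) for _ in range(rng.randint(0, max_rows))]
    return columns_to_rows(rows_to_columns(rows)) == rows


results = [check_round_trip(seed) for seed in range(100)]
print(all(results))
```

Seeding each run makes a failing case reproducible, which is the main practical benefit of this style of test.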
