
Convert from protobuf to arrow and back

Protarrow

A library for converting from protobuf to arrow and back

Installation

pip install protarrow

Usage

Convert from proto to arrow

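Given the following proto definition:

message MyProto {
  string name = 1;
  int32 id = 2;
  repeated int32 values = 3;
}

Messages can be converted to arrow as follows:

import protarrow

# MyProto is the Python class generated by protoc from the definition above.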

my_protos = [
    MyProto(name="foo", id=1, values=[1, 2, 4]),
    MyProto(name="bar", id=2, values=[3, 4, 5]),
]

schema = protarrow.message_type_to_schema(MyProto)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto)
table = protarrow.messages_to_table(my_protos, MyProto)

| name | id | values  |
|------|----|---------|
| foo  | 1  | [1 2 4] |
| bar  | 2  | [3 4 5] |

Convert from arrow to proto

protos_from_record_batch = protarrow.record_batch_to_messages(record_batch, MyProto)
protos_from_table = protarrow.table_to_messages(table, MyProto)

Customize arrow type

The arrow type for Enum, Timestamp and TimeOfDay can be configured:

import pyarrow as pa

config = protarrow.ProtarrowConfig(
    enum_type=pa.int32(),
    timestamp_type=pa.timestamp("ms", "America/New_York"),
    time_of_day_type=pa.time32("ms"),
)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto, config)

Type Mapping

Native Types

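| Proto    | Pyarrow             | Note         |
|----------|---------------------|--------------|
| bool     | bool_               |              |
| bytes    | binary              |              |
| double   | float64             |              |
| enum     | int32/string/binary | configurable |
| fixed32  | int32               |              |
| fixed64  | int64               |              |
| float    | float32             |              |
| int32    | int32               |              |
| int64    | int64               |              |
| message  | struct              |              |
| sfixed32 | int32               |              |
| sfixed64 | int64               |              |
| sint32   | int32               |              |
| sint64   | int64               |              |
| string   | string              |              |
| uint32   | uint32              |              |
| uint64   | uint64              |              |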

Other types

| Proto                       | Pyarrow                | Note                               |
|-----------------------------|------------------------|------------------------------------|
| repeated                    | list_                  |                                    |
| map                         | map_                   |                                    |
| google.protobuf.BoolValue   | bool_                  |                                    |
| google.protobuf.BytesValue  | binary                 |                                    |
| google.protobuf.DoubleValue | float64                |                                    |
| google.protobuf.FloatValue  | float32                |                                    |
| google.protobuf.Int32Value  | int32                  |                                    |
| google.protobuf.Int64Value  | int64                  |                                    |
| google.protobuf.StringValue | string                 |                                    |
| google.protobuf.Timestamp   | timestamp("ns", "UTC") | Unit and timezone are configurable |
| google.protobuf.UInt32Value | uint32                 |                                    |
| google.protobuf.UInt64Value | uint64                 |                                    |
| google.type.Date            | date32()               |                                    |
| google.type.TimeOfDay       | time64/time32          | Unit and type are configurable     |

Nullability

  • Top-level native fields, lists and maps are marked as non-nullable.
  • Nested messages and their children are nullable.
