Convert from protobuf to arrow and back
Protarrow
A library for converting from protobuf to arrow and back
Installation
pip install protarrow
Usage
Convert from proto to arrow
message MyProto {
  string name = 1;
  repeated int32 values = 2;
}
import protarrow

# MyProto is the Python class generated by protoc from the message above.
my_protos = [
    MyProto(name="foo", values=[1, 2, 4]),
    MyProto(name="bar", values=[3, 4, 5]),
]

schema = protarrow.message_type_to_schema(MyProto)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto)
table = protarrow.messages_to_table(my_protos, MyProto)
name | values |
---|---|
foo | [1 2 4] |
bar | [3 4 5] |
Convert from arrow to proto
protos_from_record_batch = protarrow.record_batch_to_messages(record_batch, MyProto)
protos_from_table = protarrow.table_to_messages(table, MyProto)
Customize arrow type
The arrow type for Enum, Timestamp and TimeOfDay can be configured:
import pyarrow as pa

config = protarrow.ProtarrowConfig(
    enum_type=pa.int32(),
    timestamp_type=pa.timestamp("ms", "America/New_York"),
    time_of_day_type=pa.time32("ms"),
)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto, config)
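To make the effect of the timestamp unit concrete, here is a minimal sketch in plain Python of how a google.protobuf.Timestamp (a seconds and nanos pair) maps to the single integer that an Arrow timestamp column stores. The helper name timestamp_to_arrow_int is hypothetical, and truncating (rather than rounding) sub-unit precision is an assumption made for illustration, not a description of protarrow's internals.

```python
# Number of Arrow ticks per second for each timestamp unit.
TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def timestamp_to_arrow_int(seconds: int, nanos: int, unit: str) -> int:
    """Hypothetical helper: encode (seconds, nanos) as an integer count of `unit`."""
    ticks = TICKS_PER_SECOND[unit]
    # Assumption: precision finer than `unit` is truncated, not rounded.
    return seconds * ticks + nanos // (1_000_000_000 // ticks)


# 2023-01-01T00:00:00.123456789Z expressed as a protobuf Timestamp.
seconds, nanos = 1_672_531_200, 123_456_789
as_ns = timestamp_to_arrow_int(seconds, nanos, "ns")  # keeps all nine digits
as_ms = timestamp_to_arrow_int(seconds, nanos, "ms")  # keeps only the ".123"
```

Under this assumption, a coarser unit such as "ms" drops the trailing 456789 nanoseconds, so converting back to proto would not reproduce the original nanos field. The timezone does not change the stored integer: Arrow timestamps that carry a timezone hold UTC-based values, and the zone is metadata.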
Type Mapping
Native Types
Proto | Pyarrow | Note |
---|---|---|
bool | bool_ | |
bytes | binary | |
double | float64 | |
enum | int32/string/binary | configurable |
fixed32 | int32 | |
fixed64 | int64 | |
float | float32 | |
int32 | int32 | |
int64 | int64 | |
message | struct | |
sfixed32 | int32 | |
sfixed64 | int64 | |
sint32 | int32 | |
sint64 | int64 | |
string | string | |
uint32 | uint32 | |
uint64 | uint64 | |
Other types
Proto | Pyarrow | Note |
---|---|---|
repeated | list_ | |
map | map_ | |
google.protobuf.BoolValue | bool_ | |
google.protobuf.BytesValue | binary | |
google.protobuf.DoubleValue | float64 | |
google.protobuf.FloatValue | float32 | |
google.protobuf.Int32Value | int32 | |
google.protobuf.Int64Value | int64 | |
google.protobuf.StringValue | string | |
google.protobuf.Timestamp | timestamp("ns", "UTC") | Unit and timezone are configurable |
google.protobuf.UInt32Value | uint32 | |
google.protobuf.UInt64Value | uint64 | |
google.type.Date | date32() | |
google.type.TimeOfDay | time64/time32 | Unit and type are configurable |
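The two tables can be read as a lookup plus a composition rule for repeated fields. The sketch below copies a few scalar rows from the "Native Types" table and uses pyarrow factory names as plain strings so that it runs without pyarrow installed. SCALAR_TO_ARROW and arrow_type_name are hypothetical names introduced for illustration; they are not part of protarrow.

```python
# A subset of the scalar rows from the "Native Types" table.
# Values are the names of the pyarrow factory functions, kept as strings.
SCALAR_TO_ARROW = {
    "bool": "bool_",
    "bytes": "binary",
    "double": "float64",
    "float": "float32",
    "int32": "int32",
    "int64": "int64",
    "string": "string",
    "uint32": "uint32",
    "uint64": "uint64",
}


def arrow_type_name(proto_type: str, repeated: bool = False) -> str:
    """Hypothetical helper: spell out the pyarrow type for a proto scalar field."""
    base = f"{SCALAR_TO_ARROW[proto_type]}()"
    # A repeated field wraps the element type in list_, as in the "Other types" table.
    return f"list_({base})" if repeated else base


# The two fields of MyProto from the usage example.
name_type = arrow_type_name("string")
values_type = arrow_type_name("int32", repeated=True)
```

For MyProto this yields string() for name and list_(int32()) for values, which matches the shape of the table printed in the usage section.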
Nullability
- Top-level native fields, lists and maps are marked as non-nullable.
- Nested messages and their children are nullable.
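As a rough illustration of these two rules, the sketch below walks a toy schema description and reports which fields would be nullable. The schema is modelled as nested dictionaries, the nullability function is hypothetical, and the address field is invented for the example; none of this is protarrow code.

```python
from typing import Any, Dict


def nullability(
    fields: Dict[str, Any], inside_message: bool = False, prefix: str = ""
) -> Dict[str, bool]:
    """Return {dotted field path: is_nullable} following the two rules above.

    A field value is either a type name (str) for native, list and map fields,
    or a dict of sub-fields for a nested message.
    """
    result: Dict[str, bool] = {}
    for name, field in fields.items():
        path = prefix + name
        is_message = isinstance(field, dict)
        # Nullable if the field is a nested message or lives inside one.
        result[path] = inside_message or is_message
        if is_message:
            result.update(nullability(field, inside_message=True, prefix=path + "."))
    return result


# MyProto extended with a hypothetical nested message field.
example = {
    "name": "string",
    "values": "repeated int32",
    "address": {"city": "string", "zip": "int32"},
}
flags = nullability(example)
```

Here name and values come out non-nullable, while address, address.city and address.zip are nullable.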
Development
Set up
python3 -m venv --clear venv
source venv/bin/activate
poetry self add "poetry-dynamic-versioning[plugin]"
poetry install
python ./scripts/protoc.py
pre-commit install
Testing
This library relies on property-based testing. Tests convert randomly generated data from protobuf to arrow and back, and check that the end result is the same as the input.
coverage run --branch --include "*/protarrow/*" -m pytest tests
coverage report
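The round-trip idea can be sketched with the standard library alone. The example below uses toy converters that pivot row dictionaries into columns and back, standing in for the proto to arrow and arrow to proto steps. All function names are hypothetical, and this is not the project's test suite, only an illustration of the property being checked.

```python
import random
import string


def rows_to_columns(rows):
    """Toy stand-in for proto -> arrow: pivot row dicts into named columns."""
    return {
        "name": [row["name"] for row in rows],
        "values": [list(row["values"]) for row in rows],
    }


def columns_to_rows(columns):
    """Toy stand-in for arrow -> proto: pivot the columns back into row dicts."""
    return [
        {"name": name, "values": values}
        for name, values in zip(columns["name"], columns["values"])
    ]


def random_row(rng):
    """Generate one random record shaped like MyProto (string, repeated int32)."""
    name = "".join(rng.choices(string.ascii_letters, k=rng.randint(0, 8)))
    values = [rng.randint(-(2**31), 2**31 - 1) for _ in range(rng.randint(0, 5))]
    return {"name": name, "values": values}


def check_round_trip(trials=200, seed=0):
    """Property: converting to columns and back returns the original rows."""
    rng = random.Random(seed)  # seeded so failures are reproducible
    for _ in range(trials):
        rows = [random_row(rng) for _ in range(rng.randint(0, 10))]
        assert columns_to_rows(rows_to_columns(rows)) == rows
    return trials


checked = check_round_trip()
```

Generating empty names, empty lists and empty batches is deliberate: edge cases like these are where a conversion is most likely to lose information.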