Convert from protobuf to arrow and back
Protarrow
A library for converting from protobuf to arrow and back
Installation
pip install protarrow
Usage
Convert from proto to arrow
message MyProto {
  string name = 1;
  repeated int32 values = 2;
}
import protarrow

# MyProto is the Python class that protoc generates from the message above.
my_protos = [
    MyProto(name="foo", values=[1, 2, 4]),
    MyProto(name="bar", values=[3, 4, 5]),
]

schema = protarrow.message_type_to_schema(MyProto)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto)
table = protarrow.messages_to_table(my_protos, MyProto)
| name | values |
|---|---|
| foo | [1 2 4] |
| bar | [3 4 5] |
Convert from arrow to proto
protos_from_record_batch = protarrow.record_batch_to_messages(record_batch, MyProto)
protos_from_table = protarrow.table_to_messages(table, MyProto)
Customize arrow type
The arrow type for Enum, Timestamp and TimeOfDay can be configured:
import pyarrow as pa

config = protarrow.ProtarrowConfig(
    enum_type=pa.int32(),
    timestamp_type=pa.timestamp("ms", "America/New_York"),
    time_of_day_type=pa.time32("ms"),
)
record_batch = protarrow.messages_to_record_batch(my_protos, MyProto, config)
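To make the effect of the timestamp unit concrete, the sketch below shows how a `google.protobuf.Timestamp`, which stores `seconds` and `nanos` since the Unix epoch, collapses into the single integer that an arrow timestamp column holds. It is plain Python, not protarrow's implementation: `timestamp_to_unit` is a hypothetical helper, and truncating sub-unit precision is an assumption about the rounding.

```python
# Ticks per second for each arrow timestamp unit.
TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def timestamp_to_unit(seconds: int, nanos: int, unit: str) -> int:
    """Collapse (seconds, nanos) since the Unix epoch into one integer.

    Hypothetical helper for illustration: sub-unit precision is truncated.
    """
    ticks = TICKS_PER_SECOND[unit]
    return seconds * ticks + nanos // (1_000_000_000 // ticks)


# 1.5 seconds after the epoch, plus one nanosecond.
as_ns = timestamp_to_unit(1, 500_000_001, "ns")
as_ms = timestamp_to_unit(1, 500_000_001, "ms")
print(as_ns)  # 1500000001
print(as_ms)  # 1500
```

Choosing a coarser unit such as `"ms"` drops the trailing nanosecond. The timezone in `pa.timestamp("ms", "America/New_York")` does not change the stored integer, which stays relative to UTC; it only affects how values are interpreted for display.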
Type Mapping
Native Types
| Proto | Pyarrow | Note |
|---|---|---|
| bool | bool_ | |
| bytes | binary | |
| double | float64 | |
| enum | int32/string/binary | configurable |
| fixed32 | int32 | |
| fixed64 | int64 | |
| float | float32 | |
| int32 | int32 | |
| int64 | int64 | |
| message | struct | |
| sfixed32 | int32 | |
| sfixed64 | int64 | |
| sint32 | int32 | |
| sint64 | int64 | |
| string | string | |
| uint32 | uint32 | |
| uint64 | uint64 | |
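The enum row is the only configurable entry in this table. The sketch below illustrates what each of the three arrow types would store for one enum value. It is an assumption, not confirmed by this README, that the `string` and `binary` options hold the enum name rather than its number; `Color` and `encode_enum` are hypothetical names used only for illustration.

```python
import enum


class Color(enum.IntEnum):  # hypothetical stand-in for a generated proto enum
    RED = 0
    GREEN = 1


def encode_enum(value: Color, arrow_type: str):
    """Return the cell stored for `value` under each candidate arrow type."""
    if arrow_type == "int32":
        return int(value)           # the enum number
    if arrow_type == "string":
        return value.name           # assumed: the enum name as text
    if arrow_type == "binary":
        return value.name.encode()  # assumed: the enum name as UTF-8 bytes
    raise ValueError(f"unsupported enum type: {arrow_type}")


for arrow_type in ("int32", "string", "binary"):
    print(arrow_type, repr(encode_enum(Color.GREEN, arrow_type)))
```

Storing the number is compact and survives renames, while storing the name keeps the column readable without access to the proto definition.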
Other types
| Proto | Pyarrow | Note |
|---|---|---|
| repeated | list_ | |
| map | map_ | |
| google.protobuf.BoolValue | bool_ | |
| google.protobuf.BytesValue | binary | |
| google.protobuf.DoubleValue | float64 | |
| google.protobuf.FloatValue | float32 | |
| google.protobuf.Int32Value | int32 | |
| google.protobuf.Int64Value | int64 | |
| google.protobuf.StringValue | string | |
| google.protobuf.Timestamp | timestamp("ns", "UTC") | Unit and timezone are configurable |
| google.protobuf.UInt32Value | uint32 | |
| google.protobuf.UInt64Value | uint64 | |
| google.type.Date | date32() | |
| google.type.TimeOfDay | time64/time32 | Unit and type are configurable |
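The last two rows map structured proto messages onto single integers. As a rough illustration in plain Python (not protarrow's code), a `date32` cell stores days since the Unix epoch and a `time32("ms")` cell stores milliseconds since midnight. The helper names are hypothetical, and truncating `nanos` to whole milliseconds is an assumption.

```python
import datetime

EPOCH = datetime.date(1970, 1, 1)


def date_to_date32(year: int, month: int, day: int) -> int:
    """Days since the Unix epoch, which is what a date32 cell stores."""
    return (datetime.date(year, month, day) - EPOCH).days


def time_of_day_to_ms(hours: int, minutes: int, seconds: int, nanos: int) -> int:
    """Milliseconds since midnight, as a time32("ms") cell would hold."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1_000 + nanos // 1_000_000


print(date_to_date32(1970, 1, 2))             # 1
print(time_of_day_to_ms(1, 0, 0, 5_000_000))  # 3600005
```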
Nullability
- Top-level native fields, lists and maps are marked as non-nullable.
- Nested messages and their children are nullable.
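The two rules above can be sketched as a small recursive walk over a schema. This is one reading of the rules, not protarrow's implementation: it treats any message-typed field as a "nested message", and `Field` and `nullability` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a proto field descriptor."""
    name: str
    kind: str  # "native", "list", "map" or "message"
    children: List["Field"] = field(default_factory=list)


def nullability(
    fields: List[Field], inside_message: bool = False, prefix: str = ""
) -> Dict[str, bool]:
    """Map each dotted field path to True if the arrow field would be nullable."""
    result = {}
    for f in fields:
        path = prefix + f.name
        # Message fields are nullable, and so is everything beneath one.
        result[path] = inside_message or f.kind == "message"
        if f.kind == "message":
            result.update(nullability(f.children, True, path + "."))
    return result


schema = [
    Field("name", "native"),
    Field("values", "list"),
    Field("address", "message", [Field("city", "native")]),
]
flags = nullability(schema)
print(flags)
```

Here `name` and `values` come out non-nullable, while `address` and `address.city` are nullable because a message can be absent.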
Development
Set up
python3 -m venv --clear venv
source venv/bin/activate
poetry self add "poetry-dynamic-versioning[plugin]"
poetry install
python ./scripts/protoc.py
pre-commit install
Testing
This library relies on property-based testing. Tests convert randomly generated data from protobuf to arrow and back, and check that the result equals the input.
coverage run --branch --include "*/protarrow/*" -m pytest tests
coverage report
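The round-trip property described above can be sketched without any dependencies. The example below is not taken from protarrow's test suite: it uses toy row and column converters as stand-ins for the real conversion functions, and the standard library's `random` module in place of a dedicated property-based testing tool, which would normally also handle data generation and shrinking of failing cases.

```python
import random
from typing import Dict, List

Row = Dict[str, object]


def rows_to_columns(rows: List[Row]) -> Dict[str, list]:
    """Toy stand-in for a proto-to-arrow conversion: pivot rows into columns."""
    return {
        "name": [row["name"] for row in rows],
        "values": [list(row["values"]) for row in rows],
    }


def columns_to_rows(columns: Dict[str, list]) -> List[Row]:
    """Toy stand-in for an arrow-to-proto conversion: pivot columns into rows."""
    return [
        {"name": name, "values": list(values)}
        for name, values in zip(columns["name"], columns["values"])
    ]


def random_rows(rng: random.Random) -> List[Row]:
    """Generate a random batch shaped like the MyProto example."""
    return [
        {
            "name": "".join(rng.choice("abc") for _ in range(rng.randint(0, 5))),
            "values": [
                rng.randint(-(2**31), 2**31 - 1) for _ in range(rng.randint(0, 4))
            ],
        }
        for _ in range(rng.randint(0, 10))
    ]


def check_round_trip(trials: int = 100, seed: int = 0) -> int:
    """Assert that converting there and back returns the original rows."""
    rng = random.Random(seed)
    for _ in range(trials):
        rows = random_rows(rng)
        assert columns_to_rows(rows_to_columns(rows)) == rows
    return trials


print(check_round_trip())
```

Seeding the generator keeps failures reproducible, and the random batch sizes include the empty batch, which is a common edge case for columnar conversions.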
File details
Details for the file protarrow-0.0.1rc5.tar.gz.
File metadata
- Download URL: protarrow-0.0.1rc5.tar.gz
- Upload date:
- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.10.8 Linux/5.15.0-1023-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d82c0326535ef8be7dd5f7b525aa3995c6d626714d302b5683afdf5e8cbd5c2 |
| MD5 | 0f21ca5cc7cc6a0c825aab88c8526d50 |
| BLAKE2b-256 | 617c3431511813ccc029a27697d9ea141347c784942911846d882f796884f8e2 |
File details
Details for the file protarrow-0.0.1rc5-py3-none-any.whl.
File metadata
- Download URL: protarrow-0.0.1rc5-py3-none-any.whl
- Upload date:
- Size: 14.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.10.8 Linux/5.15.0-1023-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2c2ac046034c3be3e613fa069abcda71c3e8fd032ec3f100bcf6988c8ab6c6ff |
| MD5 | 76211943502548cd70f5db7271f7b97e |
| BLAKE2b-256 | 91f29428d52d85c9c64cf8918609de810a3766ef4e5b2b5db5868b6a249b071d |